[Paper Note] MaskGIT: Masked Generative Image Transformer

https://arxiv.org/abs/2202.04200

Training

Decoding

Mask Design

Ablation Studies

Questions

  • Could the mask token be replaced with a random vector, or at least with a noisy version of a VQ token (VQ token + epsilon)?
  • During training, the model is conditioned on ground-truth tokens, but during generation it is conditioned on its own predicted tokens. How can this gap be bridged?
  • Can the loss function be changed to mean squared error (MSE)?
  • In iterative decoding, does keeping the most confident predictions mean that some tokens are always retained through all remaining iterations? What if the initial choice for these tokens is bad?
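
To make the last question concrete, here is a minimal sketch of confidence-based iterative decoding with a cosine mask schedule, as I understand it from the paper. It is not the authors' implementation: `toy_predict` is a hypothetical stand-in for the transformer that returns random tokens and confidences, `MASK = -1` is an assumed id for the mask token, and using `floor` for the number of masked positions is my simplification. The point it illustrates is that a position, once filled, is never a candidate for re-masking in this loop, so an early bad choice stays for the rest of decoding.

```python
import math
import random

MASK = -1  # assumed id for the special mask token (hypothetical)


def cosine_mask_ratio(progress: float) -> float:
    """Fraction of tokens that should still be masked at progress in [0, 1]."""
    return math.cos(math.pi / 2 * progress)


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the transformer: one (token, confidence) guess per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def decode(num_tokens=16, steps=8, vocab_size=1024, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    history = [list(tokens)]  # snapshot of the canvas after every step
    for t in range(1, steps + 1):
        guesses = toy_predict(tokens, vocab_size, rng)
        # How many positions should remain masked after this step.
        num_masked = math.floor(cosine_mask_ratio(t / steps) * num_tokens)
        # Only positions that are still masked compete on confidence;
        # already-filled positions are left untouched.
        masked_pos = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked_pos.sort(key=lambda i: guesses[i][1], reverse=True)
        num_to_fill = len(masked_pos) - num_masked
        for i in masked_pos[:num_to_fill]:
            tokens[i] = guesses[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    final_tokens, history = decode()
    for step, snapshot in enumerate(history):
        print(step, sum(tok == MASK for tok in snapshot), "masked")
```

Running the script prints how many positions are still masked after each step. In a variant that allowed revisions, filled positions would also be re-scored and could be masked again, which is one way to approach the concern raised in the question.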