Friday, April 8, 2022

baby Propaganda poster

 








All of these were generated with a mini DALL-E implementation on Hugging Face that uses a VQGAN rather than a VQVAE like the original DALL-E paper. The dataset of training images is also 28x smaller than what OpenAI used for DALL-E.
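
For reference, what VQGAN and VQVAE have in common is the vector quantization step: each latent vector from the encoder gets snapped to the nearest entry in a learned codebook, and the index of that entry becomes an image token. Here is a minimal sketch of just that lookup. The `quantize` function, the tiny 2-D codebook, and the latent values are all made up for illustration; this is not the code the mini DALL-E implementation actually runs.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2 distance)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

# Hypothetical 3-entry codebook and 3 encoder outputs (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

tokens = quantize(latents, codebook)
print(tokens)  # [1, 2, 0]
```

The decoder only ever sees codebook entries, so whatever detail the codebook and decoder cannot express is lost at this point.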

But the last 'baby' example clues you in to something fundamental about the VQGAN in the mini implementation. You could probably improve the representation by adding more layers to the model. But that kind of artifact is also associated with simple ReLU nets configured to represent images. So I think a different activation function (think implicit neural representations like SIREN) would be a better strategy for that part of the system.
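
To make the activation function point concrete, here is a rough sketch comparing a ReLU layer with a SIREN-style sine layer applied to the same pixel coordinate. Everything in it is an assumption for illustration: the helper names (`make_layer`, `relu_layer`, `sine_layer`), the layer sizes, and the sample coordinate are hypothetical. The `omega_0 = 30` frequency scale and the uniform(-1/n_in, 1/n_in) first-layer weight range follow the SIREN paper's recommendations, and the biases simply reuse that range for simplicity. It is not how the mini DALL-E VQGAN is built.

```python
import math
import random

def make_layer(n_in, n_out, rng, bound):
    """Random dense layer; weights and biases drawn from uniform(-bound, bound)."""
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases

def linear(x, layer):
    """Affine map W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu_layer(x, layer):
    """Piecewise-linear activation: the output is linear in x within each region."""
    return [max(0.0, z) for z in linear(x, layer)]

def sine_layer(x, layer, omega_0=30.0):
    """SIREN-style activation: sin(omega_0 * (W x + b)), smooth and periodic."""
    return [math.sin(omega_0 * z) for z in linear(x, layer)]

rng = random.Random(0)
n_in, n_hidden = 2, 8  # 2-D pixel coordinate in, 8 hidden units out (arbitrary sizes)
first = make_layer(n_in, n_hidden, rng, bound=1.0 / n_in)

coord = [0.25, -0.5]  # hypothetical normalized (x, y) pixel position
relu_out = relu_layer(coord, first)
sine_out = sine_layer(coord, first)
print(relu_out)
print(sine_out)
```

The ReLU layer is piecewise linear in the input coordinate, which is one reason plain ReLU nets tend to have trouble with fine image detail, while the sine layer varies smoothly and at a much higher frequency for the same weights.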
