All of these experiments use the exact same set of text prompt keyframes, but you could never create any of the frame images above by working with text prompting alone. You can see that even though they follow the storyline, there is a huge amount of variability in the imagery from run to run. This is a function of how I'm modulating the latent representation of the U-Net independently of the text input.
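The post doesn't spell out the exact modulation, but the general idea of perturbing the U-Net's input latent while the text pathway stays fixed can be sketched as below. This is a minimal illustration assuming a standard Stable Diffusion pipeline via the diffusers library; the prompt, seed, and jitter strength are hypothetical, not the author's settings.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse on a stormy coast"  # hypothetical keyframe prompt

# A base initial latent from a fixed seed, perturbed independently per frame.
shape = (1, pipe.unet.config.in_channels, 64, 64)
base = torch.randn(shape, generator=torch.Generator().manual_seed(7))
base = base.to("cuda", torch.float16)

frames = []
for k in range(8):
    # The jitter is applied to the U-Net's starting latent, not to the text
    # embedding, so the prompt is identical while the imagery drifts.
    jitter = 0.15 * torch.randn(shape).to("cuda", torch.float16)
    image = pipe(prompt, latents=base + jitter, num_inference_steps=30).images[0]
    frames.append(image)
```

Because the text input never changes, any frame-to-frame variation comes entirely from the latent perturbation, which is the kind of variability described above.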
The sequence below is generated by just interpolating the text prompt latent representation, so it is recreatable from the set of text prompts alone (plus a fixed random seed). Even here, though, almost none of the frames could be created by typing in text; they are generated by exploring the latent space between specific text prompts. The potential text latent space is much bigger than what is available with absolute text prompts, and the potential generative space of the full model is much bigger still than what can be achieved working with text only (absolute text or latent text) to drive the image synthesis system.
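For readers who want to try the text-latent interpolation themselves, here is a minimal sketch of the idea, again assuming a diffusers Stable Diffusion pipeline rather than the author's actual tooling; the two prompts and the seed are placeholders. Each frame comes from a point between the two prompt embeddings, with the initial noise held constant by reusing the seed.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def embed(prompt):
    """Encode a prompt into the CLIP text-latent space used to condition the U-Net."""
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        return pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

emb_a = embed("a misty forest at dawn")            # hypothetical keyframe prompt A
emb_b = embed("a ruined castle under a red moon")  # hypothetical keyframe prompt B

frames = []
for t in torch.linspace(0.0, 1.0, 30):
    # Linear interpolation between the two text latents; none of the
    # intermediate points corresponds to a prompt you could actually type.
    emb = torch.lerp(emb_a, emb_b, float(t))
    gen = torch.Generator("cuda").manual_seed(42)  # fixed seed: only the text latent varies
    frames.append(pipe(prompt_embeds=emb, generator=gen,
                       num_inference_steps=30).images[0])
```

Swapping the linear interpolation for a spherical one, or chaining more than two keyframe embeddings, extends the same idea across a longer storyline.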