These were all constructed using a random text cut-up method, similar to how David Bowie constructed the lyrics to his songs: random pieces of text arranged into some kind of ordering. The source text is split across 5 different categories, so each random text prompt is built from 5 random picks from each of the categories.
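A minimal sketch of that kind of cut-up prompt builder, to make the idea concrete. The category names and fragments here are made up for illustration (the real lists aren't shown in this post), and `cutup_prompt` and `picks_per_category` are hypothetical names. The default of one pick per category is just an assumption; pass a larger number if you want more fragments from each category.

```python
import random

# Hypothetical category pools, invented for this example only.
CATEGORIES = {
    "subject": ["a drowned cathedral", "two glass foxes", "the last lighthouse"],
    "action": ["dissolving into static", "climbing a paper staircase", "singing to wires"],
    "setting": ["under a copper sky", "inside a folded map", "at the edge of the tide"],
    "style": ["ink wash painting", "hand tinted photograph", "stained glass mosaic"],
    "mood": ["feverish and bright", "quiet and unfinished", "slow and electric"],
}

def cutup_prompt(categories, picks_per_category=1, rng=None):
    """Pick random fragments from every category and shuffle them into one prompt."""
    if rng is None:
        rng = random.Random()
    fragments = []
    for pool in categories.values():
        # sample() picks without replacement, so no fragment repeats
        fragments.extend(rng.sample(pool, picks_per_category))
    # the arbitrary ordering is the "cut-up" part
    rng.shuffle(fragments)
    return ", ".join(fragments)

if __name__ == "__main__":
    for _ in range(3):
        print(cutup_prompt(CATEGORIES))
```

Passing a seeded `random.Random` as `rng` makes a given set of prompts repeatable, which is handy when you want to rerun the exact same prompts with different animation settings.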
The first one below shows off a problem I have discussed before with current color coherence approaches that work off of the first frame image for the rest of the animation. It also shows off a second problem I have seen before with latent feedback approaches when they are built off of high contrast black and white imagery. You can think of it as feeding the latent representation something that doesn't have any noise in it, which leaves the diffusion synthesis algorithm with nothing to grab onto as the diffusion process iterates. I stopped this first one early, because past the point where I stopped it, it never really gets out of the local minimum it has fallen into.
If you take the exact same set of text prompts and use a fixed random seed and interpolate the text embeddings (animation below), you can see that there is nothing inherent in the actual text prompting that forces a black and white high contrast imagery style. It's an artifact of feeding the U-Net latent diffusion process an initial starting latent it wasn't really trained to deal with (basically it isn't noisy enough).
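Here is a rough sketch of what interpolating text embeddings with a fixed seed amounts to. It uses plain Python lists as stand-ins for the real embedding tensors, and simple linear blending; the actual interpolation used for the animation isn't specified here, so treat that as an assumption. The function names are hypothetical.

```python
import random

def lerp(a, b, t):
    """Linear blend of two equal-length vectors, with t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def embedding_for_frame(keyframes, frame, frames_per_segment):
    """Blend between consecutive prompt embeddings for a given frame index.

    keyframes: list of embedding vectors, one per text prompt.
    """
    segment, offset = divmod(frame, frames_per_segment)
    if segment >= len(keyframes) - 1:
        # past the last prompt: hold its embedding
        return list(keyframes[-1])
    t = offset / frames_per_segment
    return lerp(keyframes[segment], keyframes[segment + 1], t)

def fixed_noise(seed, size):
    """The same seed gives the same fresh Gaussian starting noise every frame."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
```

The point of the sketch is that every frame starts from fresh, properly noisy input (`fixed_noise`) and only the conditioning changes from frame to frame. Nothing from the previous rendered frame is fed back in, so there is nothing for the process to get stuck on.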
Switching back to recursive feedback into the U-Net latent, we can once again see the same 'getting stuck' phenomenon we saw in the first animation. I'm working with a depth estimation based warp map for this one below, which is a little bit better at breaking up large areas of high contrast black and white tone in the frame images. But the high contrast woodcut print look you oftentimes see is very characteristic of SD when it is getting stuck in this particular kind of local minimum. It is very prevalent if you feed back using the same fixed random seed.
If I now take the exact same text prompts and settings, but add one additional thing to the text prompt at frame 0 to force a better color distribution for the SD rendered frame 0, watch how that changes the animation run below. Both of the problems I discussed above go away (although you are still tied to the color distribution of the first frame).
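To see why the whole run stays tied to frame 0, here is a minimal sketch of one simple form of first-frame color coherence: shifting and scaling each channel of a new frame so its mean and spread match frame 0. This is an assumption for illustration, not necessarily the exact matching used in these runs (histogram matching in another color space is also common), and the function names are hypothetical.

```python
from statistics import mean, pstdev

def match_channel(values, ref_values):
    """Shift and scale one channel so its mean and std match the reference."""
    mu, sigma = mean(values), pstdev(values)
    ref_mu, ref_sigma = mean(ref_values), pstdev(ref_values)
    # a completely flat channel has no spread to rescale
    scale = ref_sigma / sigma if sigma > 0 else 1.0
    # clamp to the 0..255 range of 8-bit image data
    return [min(255.0, max(0.0, (v - mu) * scale + ref_mu)) for v in values]

def match_to_first_frame(frame, first_frame):
    """frame, first_frame: dict of channel name -> flat list of 0..255 values."""
    return {ch: match_channel(frame[ch], first_frame[ch]) for ch in frame}
```

Whatever statistics frame 0 happens to have get stamped onto every later frame. If frame 0 comes out as near black and white, every following frame is pulled back toward that same look, which is why getting a good color distribution into frame 0 matters so much.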