In part 2 of this 'woman dancing' series, we break away from the notion of viewing the system as a video processor and instead treat it as a pure generative image synthesis system. Again, all of the posts in this series use the exact same static text prompt.

The video above starts with a random initialization of the u-net input, then recursively feeds back the last output as the new input.

The next video above uses a video generated with CLIP-guided RGB optimization (run with a different prompt) as the input to the u-net. The source video is quite noisy and abstract, so it is not representative of a woman dancing in any way.
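The recursive feedback loop can be sketched roughly as follows. This is a toy illustration, not the actual system: `unet` here is a hypothetical stand-in for the real diffusion u-net denoising step, and `recursive_feedback` is an assumed name for the loop the post describes.

```python
import numpy as np

def unet(x):
    # Hypothetical stand-in for the diffusion u-net: the real model maps an
    # image-shaped tensor to an image-shaped tensor conditioned on the text prompt.
    return np.clip(x * 0.9 + 0.05, 0.0, 1.0)

def recursive_feedback(num_frames, shape=(64, 64, 3), seed=0):
    """Start from random noise, then feed each output back in as the next input."""
    rng = np.random.default_rng(seed)
    frame = rng.random(shape)      # random initialization of the u-net input
    frames = []
    for _ in range(num_frames):
        frame = unet(frame)        # last output becomes the new input
        frames.append(frame)
    return frames

frames = recursive_feedback(8)
```

Each element of `frames` is one output image; stacking them in order gives the generated video.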
The video above starts with a random initialization of the u-net input, then recursively feeds back a locally adaptive affine transformation of the last output as the new input.
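One way to picture a locally varying affine warp in the feedback path: split the frame into tiles and warp each tile with its own small affine transform. This is only a sketch under assumptions; the post does not say how the real transforms are adapted, so the per-tile random perturbation below is purely illustrative, and `warp_tile` / `local_affine` are invented names.

```python
import numpy as np

def warp_tile(block, m):
    """Nearest-neighbour resample of one tile under a 2x2 affine matrix m."""
    h, w = block.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src = m @ np.stack([ys.ravel().astype(float), xs.ravel().astype(float)])
    yi = np.clip(np.round(src[0]).astype(int), 0, h - 1)
    xi = np.clip(np.round(src[1]).astype(int), 0, w - 1)
    return block[yi, xi].reshape(block.shape)

def local_affine(frame, tile=16, strength=0.02, rng=None):
    """Warp each tile with its own slightly perturbed affine transform.
    The perturbation rule is a placeholder for whatever 'locally adaptive'
    criterion the real system uses."""
    rng = rng or np.random.default_rng(0)
    out = np.empty_like(frame)
    h, w = frame.shape[:2]
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            m = np.eye(2) + rng.normal(scale=strength, size=(2, 2))
            out[y:y+tile, x:x+tile] = warp_tile(frame[y:y+tile, x:x+tile], m)
    return out

warped = local_affine(np.random.default_rng(1).random((64, 64, 3)))
```

In the feedback loop this warp would sit between frames, i.e. the next u-net input is `local_affine(last_output)` rather than `last_output` itself, which keeps the imagery drifting instead of settling into a fixed point.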