More Stable Diffusion video processing experiments. All of them use the same stock source video shown in the previous post, of a singer in front of a band playing music, and all use fixed random seeds.

Some of these, like the one above, incorporate minor text prompt changes over time. You can see the generated output content change when that happens.

The two above use low strength settings. If you boost the strength, the spatial anchor influence of the input video becomes more apparent. The result can be pretty horrifying at times.

The strength adjustment is strange, because you lose a lot of what I would expect the style part of the text prompt to generate on its own if you were not polluting the latent U-Net input with the VAE-encoded video frame. Like the example below: if you look at other generative examples using ukiyo-e woodcut styling on this blog, they are very flat, as you would expect, while here it just turns the content into Japanese geishas with none of the flat woodcut appearance.
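For anyone who wants to poke at this themselves, here is a minimal sketch of the general idea using Hugging Face diffusers' img2img pipeline, which is an assumption on my part; the post does not name the exact tooling, and the model id, frame paths, prompt strings, and strength value below are all just illustrative. It runs each source frame through img2img with a fixed seed, a strength knob, and a prompt switch partway through the clip.

```python
# Hypothetical sketch: per-frame img2img with a fixed seed, a strength knob,
# and a mid-clip prompt change. Assumes Hugging Face diffusers and a local
# directory of extracted frames named "frames/".
from pathlib import Path

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frames = sorted(Path("frames").glob("*.png"))             # extracted source video frames
prompts = {0: "ukiyo-e woodcut print of a singer on stage",   # prompt schedule:
           len(frames) // 2: "oil painting of a singer on stage"}  # switch halfway through

prompt = prompts[0]
for i, frame_path in enumerate(frames):
    prompt = prompts.get(i, prompt)                        # pick up any scheduled prompt change
    frame = Image.open(frame_path).convert("RGB").resize((512, 512))

    # Re-seed every frame so the only things that vary are the frame and the prompt.
    generator = torch.Generator(device="cuda").manual_seed(1234)

    # In diffusers, strength sets how much noise replaces the VAE-encoded frame:
    # low values preserve the source frame's spatial layout, high values let
    # the text prompt's style dominate.
    out = pipe(prompt=prompt, image=frame, strength=0.45,
               guidance_scale=7.5, generator=generator).images[0]
    out.save(f"out_{i:04d}.png")
```

Sweeping the strength value over a range for the same frame is a quick way to see where the source video's spatial anchoring gives way to the prompt's styling.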
Once again, the white overhead lights are tracked best in these particular experiments, which probably tells us something important about how the system works.
This second batch of Stable Diffusion video processing experiments again supports my hypothesis that you really need to view this as spatial modulation of a true generative synthesis process rather than as actual video processing.