Saturday, November 26, 2022

the masque of youth 2

 






2 comments:

Les Wagstaff said...

Absolutely stunning, John. I also like the young woman with her hair flowing off to the side. I've been trying to produce that effect but failing...

Synthetik said...

Regarding the hair flowing off to the side:

If you are only adjusting the text prompt as the input to any of the various generative AI image synthesis algorithms, you will never be able to reach a large percentage of what the system is capable of visually creating.

This is because there are a lot of different ways to modulate the range of effects the system is capable of, but with text alone you are stuck with something very constrained and limited compared to what could actually be specified in the system's text latent embedding. That text embedding (which you can think of as highly quantized relative to what it could be as a continuous vector representation) is what actually gets fed into the cross-attention mechanism in the diffusion part of the synthesis algorithm.
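To make that concrete, here is a rough sketch of working with the embedding directly, using the open source Hugging Face diffusers library as a stand-in. This is not what my setup or Studio Artist does internally; the model name, prompts, and blend weights are just illustrative, and passing prompt_embeds into the pipeline assumes a reasonably recent diffusers release.

    import torch
    from diffusers import StableDiffusionPipeline

    # Illustrative model choice, not a recommendation.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    def embed(text):
        # Tokenize and run the CLIP text encoder to get the continuous
        # embedding the U-Net's cross-attention actually conditions on.
        tokens = pipe.tokenizer(
            text, padding="max_length",
            max_length=pipe.tokenizer.model_max_length,
            truncation=True, return_tensors="pt",
        ).input_ids.to(pipe.device)
        return pipe.text_encoder(tokens)[0]

    emb_a = embed("portrait of a young woman, hair flowing off to the side")
    emb_b = embed("wind-blown strands, swirling abstract paint")

    # Blend the two embeddings into a point in the continuous text latent
    # space that no single prompt string maps to exactly.
    blended = 0.65 * emb_a + 0.35 * emb_b  # illustrative weights

    image = pipe(prompt_embeds=blended, num_inference_steps=30).images[0]
    image.save("blended_prompt.png")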

And you are also stuck with the specific random noise the algorithm uses for a given seed as the latent input to the diffusion U-Net. You can change the random seed, but what comes out of that random noise generator looks visually much the same in many respects; it exercises only a small part of the space of possible latent inputs to the U-Net diffusion part of the generative system.
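Same story here, roughly sketched with diffusers: you can hand the pipeline your own starting latent instead of the seeded white noise it would otherwise draw. The structured-noise trick below is just one illustrative possibility, not the method behind these animations.

    import torch
    import torch.nn.functional as F
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Shape of the U-Net's latent input for a 512x512 image (the VAE downsamples by 8).
    shape = (1, pipe.unet.config.in_channels, 512 // 8, 512 // 8)

    # The default path: i.i.d. Gaussian noise drawn from a seeded generator.
    gen = torch.Generator(device="cuda").manual_seed(1234)
    noise = torch.randn(shape, generator=gen, device="cuda", dtype=torch.float16)

    # One alternative starting latent: smooth the noise field so it carries
    # low-frequency structure, then rescale to roughly unit variance so the
    # scheduler still sees something noise-like.
    structured = F.avg_pool2d(noise, kernel_size=5, stride=1, padding=2)
    structured = structured / structured.std()

    image = pipe("portrait, flowing hair", latents=structured,
                 num_inference_steps=30).images[0]
    image.save("structured_latent.png")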

For the second and third examples I'm feeding different processed videos I generated in Studio Artist into the latent input of the U-Net diffusion part of the system to drive the generative animation. So the actual movement, and to some extent the base structure, of what the generative algorithm synthesizes is being pushed and controlled by what is coming in from the video I made in Studio Artist.
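In open source terms, the closest analogy is per-frame img2img: each processed video frame is VAE-encoded into the latent space, partially noised, and then denoised under the prompt, so the frame's structure and motion steer what the U-Net synthesizes. A rough sketch with diffusers follows; the file paths, prompt, and strength value are illustrative, and this is an analogy rather than Studio Artist's actual pipeline.

    import os
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "portrait of a young woman, hair flowing off to the side, oil painting"
    frame_files = sorted(os.listdir("frames"))  # hypothetical folder of source frames
    os.makedirs("out", exist_ok=True)

    for i, name in enumerate(frame_files):
        frame = Image.open(os.path.join("frames", name)).convert("RGB").resize((512, 512))
        out = pipe(
            prompt=prompt,
            image=frame,        # the source frame drives the latent input
            strength=0.55,      # lower = closer to the source frame (illustrative)
            guidance_scale=7.5,
            generator=torch.Generator(device="cuda").manual_seed(1234),  # fixed seed helps temporal coherence
        ).images[0]
        out.save(f"out/frame_{i:04d}.png")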

How all of that modulation ends up playing out in the generated animation output is super nonlinear, so you have to be willing to try different things and get a feel for what they can do. The text prompting is still influencing what comes out, but you have the ability to get far more variability and nuance than you would get working only with the text prompting.
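One simple way to build that feel is to sweep a couple of the knobs over the same source frame and compare the results side by side. A small self-contained sketch, with model, paths, and values again purely illustrative:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    frame = Image.open("frames/frame_0001.png").convert("RGB").resize((512, 512))

    for strength in (0.35, 0.55, 0.75):    # how far to wander from the source frame
        for guidance in (4.0, 7.5, 12.0):  # how hard the text prompt pulls
            img = pipe("portrait, flowing hair", image=frame,
                       strength=strength, guidance_scale=guidance).images[0]
            img.save(f"sweep_s{strength}_g{guidance}.png")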

What bugs me about a lot of the text-to-image interfaces people have put together is that they fixate entirely on the text part, treating super elaborate text prompting as the only mechanism for adjusting the system, and so they miss most of what the complete system has to offer in the way of adjustments and variability once you start exploring all of the different ways to modulate it.