• Atlas@lemmygrad.ml · 27 days ago

    I think part of the beauty of art is that it translates an idea through different mediums, and at each step of the translation it is transformed by the experiences of the author/artist. The flip-side of this is the more times it changes hands, the more the original input is distorted.

    The process for the creation of TCG art is a good analogy. When using these tools to generate an image, you are the one creating the world and providing the concepts to be represented in the piece. The AI as the artist then interprets what you’ve provided through the training data (or in the case of a human artist, their life experience) and produces an image.

    I’m not the arbiter of what is / isn’t art, but I don’t think the distinction matters for the rest of the discussion.

    As a consumer, there are multiple lenses through which you can view the art:

    • How do you interpret the image?
    • What choices did the artist make in producing the image?
    • What can you tell about the artist’s experience / worldview from the image?
    • What historical / external context might have impacted the image’s creation?

    I think you can do those for AI art, but most people aren’t interested in those questions regarding AI (its entire existence is dictated by training data scraped from the whole internet, absorbed completely uncritically), and since the prompter is one step removed from the final product, it’s harder to find interesting information about the prompter through the image.

    So ultimately I think that AI obscures the prompter in a way that makes it difficult to “see” intentionality in the final work. For example:

    • Did Yogthos intend for this to imitate a particular art style, or did the AI choose the style because of the darker themes of barbed wire / war imagery?
    • The color choices in the final panel evoke a kind of eerie twilight rather than a sunrise; is that a representation of anything?
    • Despite the red star rising, the sky is still gray; does that say anything? Does that have different meanings to us vs. the artist (cultural norms / history)? Is that because the training data contains a lot of western propaganda depicting the “specter of communism” haunting the world?

    And that’s not an argument against using AI to generate images. Sometimes you don’t need it to be that deep, but I think truly effective propaganda has an intentionality to it that is missing from the AI generated pieces I’ve seen.

    • CriticalResist8@lemmygrad.ml · 27 days ago (edited)

      I don’t disagree with your comment; I’m often asking these questions as thought exercises because, like I said, I don’t think they need to be answered for the end product to exist (and this doesn’t go for AI images only).

      There is intention, but intention is also subordinate to one’s own skill. I have in my head a very specific picture I would love to see in the real world with my eyes, but I can’t make it myself. This isn’t just me; you need to have the skillset for your intention to be represented. All I can offer from my own two hands at this time is an MS Paint stick figure, not a full, detailed digital art piece.

      Prompting is similar, and in fact teaches us to communicate our visual thoughts better. It’s like passing the information off to the artist - and freelancers have been complaining for a long time that clients don’t know what they want and can’t express it lol. If you don’t specify something, the artist/AI will just take a best guess (edit: or, if they’re a nice artist, they’ll ask you about it, which is definitely something AI could improve on: actually asking what you want before getting to work). We can argue about how the AI does it and how that differs from how a human would do it, but I also think that soon enough the difference will be imperceptible and this argument won’t matter.

      I don’t know if I’m coming across clearly lol. Perhaps to analogize your analogy:

      Did Yogthos intend for this to imitate a particular art style, or did the AI choose the style because of the darker themes of barbed wire / war imagery?

      Did Yogthos intend for this style to emerge, or did the artist pick it because of the theme? We could ask the same question of an art commission.

      This is also exactly how we find traditional artists using AI. They already know what they want to convey in their piece (it doesn’t come naturally to people when they first pick up the brush; it’s a learned skill!), so they know how to prompt the AI to get things the way they envision them and convey something specific. As a designer I know how to look at the details - that too is a learned skill that newcomers don’t have. I’ve seen my share of designs that people think look acceptable but that, to a designer, have a lot of problems; textboxes not being aligned is a big one, but most people don’t necessarily see it and therefore don’t worry about it.

      • Atlas@lemmygrad.ml · 27 days ago

        I think you are coming across clearly!

        There are only two further points I’d make:

        There is intention, but intention is also subordinate to one’s own skill. I have in my head a very specific picture I would love to see in the real world with my eyes, but I can’t make it myself. This isn’t just me; you need to have the skillset for your intention to be represented. All I can offer from my own two hands at this time is an MS Paint stick figure, not a full, detailed digital art piece.

        1. I think you’re undervaluing your own artistic ability; people like shitty drawings because they can empathize with the artist. Most propaganda now is done through the lens of memes & pop culture references, and it doesn’t necessarily resonate better for the additional detail of a fully rendered piece. So ultimately I’d only encourage you to produce some crap drawings where you think it appropriate, presuming you have time. Every pixel in that piece was placed there on purpose by you, and that has value. Also consider whether visual mediums are where you’re most effective: this piece is inspired by a poem, and there are countless mediums through which you can express yourself; I think it’s a joy to engage with them and get better.

        Did Yogthos intend for this style to emerge, or did the artist pick it because of the theme? We could ask the same question of an art commission.

        2. The artist’s worldview and perspective shaped their previous work; that work inspired the commissioner to select them; and their decisions show in the final piece. I think you’d get very different outcomes if you commissioned someone familiar with Lenin’s work vs. someone who wasn’t. Facial expression, lighting, pose, medium / constraints, etc. all tell a story, and I think that story is interesting when it’s a conscious decision made by a person.

        I would be interested to see more examples of what the prompt is vs. the image outcome, just to analyze it more.

        • CriticalResist8@lemmygrad.ml · 27 days ago (edited)

          I would be interested to see more examples of what the prompt is vs. the image outcome, just to analyze it more.

          Well, from experience I’d say a lot of it comes down to prompting differently if you can’t get the exact result you want, or adding keyword after keyword to just slightly change the outcome. There are tricks and keywords you pick up on to get a certain result. I’ll try to find a video that showcases all of this, because there is a lot that goes into it beyond the commercial LLMs; what those do is take your text prompt and reformat it for the image generator. But you also lose some control, and that’s how we end up with yellow-filter GPT images (though you can absolutely fix that with some additional prompting).

          I would say, though, the biggest factor is the seed, which determines the initial Gaussian noise that gets generated. The checkpoint (the image model) then denoises that incrementally over X many steps (all of this is decided by the user). Most models also have a sampler and scheduler combination that is clearly superior, and you would not use any other once you find it. The seed, however, completely changes how the picture looks, because what the checkpoint does is hallucinate patterns in the noise. This post is a good example: https://old.reddit.com/r/StableDiffusion/comments/1p80j9x/the_perfect_combination_for_outstanding_images/. If you click through the gallery quickly, you’ll see it immediately.
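          To make the seed’s role concrete, here’s a toy sketch (plain NumPy, not real Stable Diffusion code; the denoising step is a made-up placeholder): the seed fixes the initial Gaussian noise, and the denoising loop is deterministic from there, so the same seed with the same settings reproduces the same image, while a different seed changes everything.

```python
import numpy as np

def generate(seed, steps=20, size=(8, 8)):
    """Toy stand-in for a diffusion sampler. The seed fixes the
    initial Gaussian noise; each step then deterministically
    'denoises' it, so the output is a pure function of the seed."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(size)  # initial noise, fully determined by the seed
    for _ in range(steps):
        # placeholder update: a real checkpoint would predict the noise
        # to remove at each step; here we just shrink the values smoothly
        latent = latent - 0.1 * np.tanh(latent)
    return latent

a = generate(seed=42)
b = generate(seed=42)  # same seed, same settings
c = generate(seed=7)   # only the seed differs

assert np.allclose(a, b)      # same seed reproduces the image exactly
assert not np.allclose(a, c)  # a different seed changes everything
```

          A real sampler swaps the placeholder update for a learned noise predictor, but the point stands: everything downstream is a function of that initial seeded noise, which is why clicking through a gallery of seeds shows such different pictures.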

          You can reprompt though, even with the same seed. You could say, instead of a closeup of a wolf (or whatever keyword they used to get that picture of the wolf), “taken from afar”. Some models even start to understand “taken from 10 meters away”, “macro photography”, etc. It really depends on what the model is trained on, but you have to think like a captioner - you’re not describing what you want the picture to look like, you’re literally describing what’s in it. “Person, happy, smiling, in the pouring rain” - you have to add that “happy”, otherwise the model will just best-guess the expression, or might give them a blank one.

          People in the Stable Diffusion community (that subreddit I linked) generally share their prompts, so you can explore a little and see how they got the results they did. But it’s very dependent on the model itself, and then you can also add LoRAs, which introduce purposeful bias. LoRAs can do a whole bunch of stuff; for example, I have one that can produce pixel art. The base checkpoint still generates the picture, but the LoRA applies a small learned adjustment to the network’s weights during generation, steering the output toward pixel art. You have LoRAs for everything, and anyone can train them. This is one of the example outputs from the pixel art LoRA:
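          That “purposeful bias” can be sketched in a few lines (a toy illustration with made-up sizes, not an actual diffusion network): instead of retraining a full weight matrix W, a LoRA trains two small low-rank factors B and A and applies W + BA at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                            # layer width vs. low rank (made-up sizes)
W = rng.standard_normal((d, d))          # frozen base-model weight matrix
B = rng.standard_normal((d, r)) * 0.01   # the two small trained LoRA factors
A = rng.standard_normal((r, d)) * 0.01

alpha = 1.0                              # LoRA strength, typically user-tunable
W_adapted = W + alpha * (B @ A)          # low-rank update applied at inference

x = rng.standard_normal(d)               # an activation passing through the layer
base_out = W @ x
lora_out = W_adapted @ x

# The LoRA stores far fewer parameters than the full matrix...
assert B.size + A.size < W.size
# ...yet still shifts the layer's output, i.e. biases the model on purpose.
assert not np.allclose(base_out, lora_out)
```

          That’s also why anyone can train one: you only fit the two small factors against the frozen base model, and the alpha scale lets you dial the effect up or down per generation.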

          edit: I forgot to add, this makes the process very different from other forms of illustrative work. But that’s true of painting vs. digital painting vs. logo creation too, or sculpting vs. 3D modeling. Imo image prompting is more akin to a lottery: since it depends on the seed so much, you generate a bunch of images (people even generate a whole grid of 9 or more pictures at once and then select the best one), then once you find something good enough you lock the seed in or use img2img, and reprompt over and over again. I’m sure that’s still just entry-level stuff, and the ‘pros’ do a whole bunch of more technical things to find exactly what they want.