Frustrations with ChatGPT Image Generation

by Finn Patraic

Like many L&D people, I have been experimenting with ChatGPT image generation. Sometimes ChatGPT does really well! But other times, I find it really frustrating. I was recently working on a scenario for a client project, and I needed a series of images of two characters talking together. Midjourney is great for generating images of consistent characters, but only with one character at a time. So I thought I would try ChatGPT instead for this project. What I discovered is that ChatGPT's character consistency degrades over multiple generations. I also had trouble with character positioning; ChatGPT kept placing the characters so that they weren't looking at each other.

As I have noted elsewhere, I am not an AI expert. My perspective is that of a practitioner experimenting with AI to figure out how to improve my own work. Image generation is one of the areas where I have spent a lot of time, in part because I can often get better results for scenarios with AI-generated images than with stock libraries. But part of “showing my work” with AI means showing you when it doesn't work, too.

I will show you my results from experimenting with ChatGPT image generation, including what didn't work and how I finally got better results.

Generate the characters and stage the scene

For this scenario, I have two characters: a graduate student and a professor. They meet in a conference room or a classroom with a whiteboard in the background.

I started by generating a single character, the graduate student. So far, so good. This is a usable image.

A 25-year-old Black woman, a graduate student, taking notes

Generating another image of the same character worked well too. The consistency of the character and the scene is quite good.

The same graduate student as in the previous image, but speaking.

Add a second character

Adding a second character, the professor, is where things got more difficult. I prompted ChatGPT to change the angle to a side view so that we can see the other person in the conversation.

I prompted for the other character to sit on the “opposite side of the table,” but that placed them at a corner. He is supposed to be looking at her and listening, but the body positions are wrong. He is looking somewhere in front of her, not toward her.

The graduate student speaking to a professor, but he isn't really looking at her.

I tried to adjust the characters' positions with this prompt: “Change where they are seated so that they are on opposite sides of the table, directly facing each other, with more space between them. They should not be at the corner of the table like this. Make the aspect ratio of the image 16×9.”

As you can see below, that is not quite right. They are a little farther apart, but still not directly across from each other. However, they are turned toward each other a bit better. In addition, the character consistency is a little worse. The student's facial features have changed slightly, and the professor's hair is darker and less gray. Still, it might be good enough for a scenario.

The student and the professor talking, seated a little farther apart. She is looking at him, but he is still looking somewhere in front of her.

Change the character who speaks

Next, I tried switching which character was speaking. ChatGPT flipped the positions of the characters rather than keeping them in the same place and changing their poses. Obviously, that won't work.

The professor is on the left and the graduate student is on the right.

I got the two characters seated on the correct sides of the table, but neither character is looking at the other. Both characters lose some consistency with each successive generation.

The graduate student and the professor seated at a table. The professor is speaking and looking off camera. The student is facing at an angle, but is at least looking at the professor.

When I asked for the characters to turn their bodies so that they face each other directly, the professor's hair and the student's skin both became darker. In addition, he still seems to be speaking to a third person in the room, rather than to the graduate student, who is now so close that their arms are almost touching.

The student looks at the professor and listens, but the professor is speaking to the wall.

I tried one more time in this set before giving up: “Turn the professor's head more to look at the woman's face.” As you can see, it didn't turn his head.

The professor speaks and looks into the corner of the room, avoiding eye contact with the student.

Character inconsistency

Two versions of the graduate student character from the start and end of the ChatGPT image generation process. The two images have similar blouses and hair, but different skin tones, earrings, and facial features.

Here is a side-by-side comparison of the graduate student character, so the inconsistency is more obvious. I always expect some minor inconsistency in AI image generation; that is part of the reality of working with these tools. But having a character change this much after only a handful of iterations was really disappointing. I think ChatGPT does better with character consistency when you are only working with one character. In a scene with even two characters, I don't think it is usable if the characters change this much.

I think ChatGPT did a little better with consistency for the white male character than for the Black woman. Consistency still wasn't great, but I wonder if part of the problem is that I deliberately show diversity in my characters. We know that AI image generators show bias and tend toward stereotypes. I think there are general problems with getting character consistency across multiple characters, but I suspect that underlying bias in the training data may also be part of the problem.

Try a different approach

Clearly, this approach to image generation did not work. Prompting for images in ChatGPT is different from other tools; you can be more conversational and less precise. This conversational prompting lends itself well to iterating and refining images over multiple attempts (but only if the characters can stay consistent).

I decided to try something different. I generated the initial characters in Midjourney rather than in ChatGPT, so I started with more detailed images. I think Midjourney is better than ChatGPT for more interesting, less generic characters. I like this version of the graduate student character better than the one I generated in ChatGPT. The professor character still looks a bit AI-generated, so I will probably go back and regenerate a base image for that character. It was good enough for this experiment.

I uploaded these two reference images to ChatGPT and asked it to combine them in a scene. This is an area where ChatGPT can work well; it can combine and remix existing images into new scenes. There is some loss of detail in the characters, but I can live with that. However, it still placed the two characters at the corner of a table.

Sketch for prompt

After my earlier frustrating experience, I decided to do something different to prompt for the layout I wanted in my image. I made a very quick sketch of the scene and how I wanted the characters positioned.

Sketch showing two people sitting across from each other at a table with a whiteboard in the background.

This approach finally gave me the results I wanted. The characters really look at each other!

It also worked for changing which character is speaking. There is more inconsistency in the characters here, especially the graduate student (note her hair and the black lanyard for an ID badge around her neck). But it is definitely more on the right track.
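My sketch was drawn by hand, but a layout sketch like this could also be generated. The following is a minimal Python sketch, not what I actually used: `layout_sketch_svg` is a hypothetical helper, and the proportions, labels, and 1600×900 default size are assumptions chosen to match a 16×9 scene with a whiteboard behind a table and two heads whose "nose" lines point toward each other.

```python
def layout_sketch_svg(width=1600, height=900):
    """Return SVG markup for a rough two-person seating sketch.

    Hypothetical helper for illustration; the default size is 16:9.
    """
    head_r = height // 15            # head radius
    head_y = height // 2             # vertical position of both heads
    left_x, right_x = width // 4, 3 * width // 4
    nose = head_r // 2               # short line showing where each person faces

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">',
        f'<rect width="{width}" height="{height}" fill="white"/>',
        # Whiteboard on the back wall, centered behind the table.
        f'<rect x="{width // 3}" y="{height // 10}" width="{width // 3}" '
        f'height="{height // 4}" fill="none" stroke="black"/>',
        # Table between the two people.
        f'<rect x="{3 * width // 8}" y="{2 * height // 3}" width="{width // 4}" '
        f'height="{height // 20}" fill="none" stroke="black"/>',
    ]
    # One head per person, with a nose line pointing toward the other person.
    for x, direction, label in ((left_x, 1, "student"), (right_x, -1, "professor")):
        edge = x + direction * head_r
        parts.append(
            f'<circle cx="{x}" cy="{head_y}" r="{head_r}" fill="none" stroke="black"/>'
        )
        parts.append(
            f'<line x1="{edge}" y1="{head_y}" x2="{edge + direction * nose}" '
            f'y2="{head_y}" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x}" y="{head_y + 2 * head_r}" text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Saving the returned string to a file such as `layout_sketch.svg` and opening it in a browser gives a rough layout. I am assuming it would then need to be exported or screenshotted as a PNG or JPEG before being uploaded alongside the prompt.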

Continuing to experiment

I will keep working and experimenting with ChatGPT image generation. I'm still not sure that ChatGPT can keep the characters consistent enough over multiple iterations for what I need in this scenario. However, since I started with characters generated in Midjourney, I can also create images of each character separately in Midjourney.

If Midjourney had an easy way to maintain character consistency with multiple characters, I would probably stick with that tool.

Sometimes ChatGPT is clearly the best image generation tool. Tim Slade showed how he used ChatGPT to generate cutout characters to put on slides. Midjourney can't yet produce PNGs with transparent backgrounds. I have had success with a white background for character images in Midjourney, and it is easy to edit the image to remove the background. However, that requires an extra step. ChatGPT can do it on its own without additional editing.
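For readers who want to script that extra step, here is a minimal Python sketch of one way to turn a white background transparent. It is not the method I use; `white_to_transparent` and the threshold of 245 are assumptions for illustration. A simple threshold like this also erases white areas inside the character (a white shirt, the whites of the eyes), so an image editor's background removal will usually do a better job.

```python
def white_to_transparent(pixels, threshold=245):
    """Return a new list of RGBA pixels with near-white pixels made transparent.

    pixels: iterable of (r, g, b, a) tuples with values from 0 to 255.
    threshold: hypothetical cutoff; a pixel counts as "white" only when
    all three color channels are at or above it.
    """
    result = []
    for r, g, b, a in pixels:
        if r >= threshold and g >= threshold and b >= threshold:
            result.append((r, g, b, 0))   # keep the color, drop the opacity
        else:
            result.append((r, g, b, a))   # leave character pixels untouched
    return result


# Possible use with the Pillow library, if it is installed (file names are
# placeholders, and this part is an untested assumption):
#   from PIL import Image
#   img = Image.open("character.png").convert("RGBA")
#   img.putdata(white_to_transparent(img.getdata()))
#   img.save("character_cutout.png")
```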

I may need a different workflow for building these scenes with multiple characters. There are other tools like Runway that seem to be able to remix character images into new scenes. Tons of tools are currently working on character consistency, because it is important for storytelling in many fields (marketing, entertainment, etc.). If I can't make it work in ChatGPT with current technology, I think it should be possible in some tool.

If you have managed to generate scenes with multiple characters, let me know which tools you use. I will add them to my list of tools to test.
