Creating original images is one of the top uses for generative AI, and tools like Adobe Firefly, Microsoft Designer, Canva, and Midjourney are among the most popular. Users of X (formerly Twitter) can now take advantage of Grok as well.
All of these tools can create semi-realistic images, but all have their drawbacks as well. Grok in particular gets some details humorously wrong, such as a cigarette sticking out of Mickey Mouse’s face; a door handle and door hinge both on the same edge of a door; lights sticking out of people’s heads; and Homer Simpson with two eyes but three pupils.
Grok also struggles to follow instructions, as other users have noted. I discovered that first-hand when trying to create a simple image for a blog post. The output is impressive (and very fast), but getting Grok to do what I actually asked took many attempts, as the following example shows.
Okay, that’s not a bad start from a pretty vague prompt. But notice the odd shapes of the tables in these and the following images. In some cases, people just seem to sprout from the middle of a table.
Next, I got a little more specific with the background.
Not fantastic, but better. Making progress. Next, I asked Grok to show a wider angle and make the people look more enthusiastic.
Apparently, Grok didn’t understand “not so close-cropped”—and it kind of confused energy with awkwardness.
I thought maybe “zoom out” would work better, and asked for a smaller group.
Grok ignored both instructions. And again, note the oddly shaped tables.
“Smaller group” may have been too ambiguous, so I got more specific.
Okay, now Grok seems to just be going rogue. And the table shapes are even stranger.
I was becoming exasperated at this point. Maybe if I shouted the instructions IN ALL CAPS, Grok would understand?
Nope.
So I got even more specific.
Apparently, Grok can’t count, but does at least understand what “diverse” means.
I tried to add a bit more detail in my next prompt. And yes, it should say “they are looking.” I was flustered.
Definitely more women, but now the group size explodes. Certainly not “4 or 5 people.”
Maybe I wasn’t being fair to Grok. Maybe I was asking it to presume too much. So I made my next prompt very specific.
Ah, that helped. Still not quite there, but getting close.
After a few more tweaks to my prompts, the groups look better, and the tables are normal shapes. But somehow these teams have magical floating laptops? Amazing! Where can I get one of these?
A few more prompt tweaks, then…Eureka! I finally got a usable image for the blog post.
It’s not going to win any awards, but it will do. Grok (and I) have struggled—but prevailed.