A Deep Dive into Text-to-Image AI: How Google’s New ImageFX Stacks Up
On February 1, Google announced a new text-to-image creation tool called ImageFX. It is one of many AI models trained on a vast collection of images and their descriptive texts. The magic unfolds by tapping into this learned relationship, allowing you to transform any words into images. So, when you tell a text-to-image model to make a “blueberry muffin dog,” it conjures an image that blends these concepts.
ImageFX is built on the second generation of Google’s Imagen technology. Google will be integrating it directly into its LLM, Bard, in many countries (sorry, EU and Canada). This can be viewed as Google’s first real competitor in the text-to-image space. Is it any good?
It has been a while since I compared the major text-to-image platforms head-to-head. (I last did it in September 2022 in “The Beginners Guide to No Code AI Artwork.” As an aside, I didn’t know what to call this emerging technology at that time and dubbed it “No Code.” To me, that was the most exciting feature. The term “Generative AI” didn’t start to catch on until October of that year.)
What follows is a collection of prompts and images for four of my most used text-to-image services, namely:
· ImageFX by Google: currently free
· DALL-E 3 by OpenAI: free via Microsoft Designer, or $20/month with ChatGPT Plus
· Ideogram: 100 images/day for free
· Midjourney v6: starting at $10/month
(There are other good services out there, including Adobe Firefly, Magic Media, and Stable Diffusion XL, to name a few. I don’t go into them here.)
This is where the text-to-image technology was 18 months ago when I wrote my last comparison.
Prompt: Flying Wizard with magical powers, Cinematic, Color Grading, Photoshoot, Shot on 70mm, Ultra-Wide Angle, Depth of Field, DOF, Tilt Blur, Ultra HD, Cinematic Lighting
And here’s what that same prompt generates today.
Every service has improved significantly in quality since then. More generally, you no longer need to give these crazy-detailed prompts to generate good pictures. Models now respond well to simple, plain-language prompts, as you will see in the prompts I used below.
Let’s continue to compare the top models that exist today side-by-side.
Prompt: a 1977 movie marquee for the opening of “STAR WARS”
It is impressive that ImageFX, Ideogram, and Midjourney all have the “Star Wars font,” even if you wouldn’t have seen that on a 1977 movie marquee. DALL-E refused to generate this image. OpenAI is clearly afraid of generating anything Disney-related (i.e., they don’t want a lawsuit from “The Mouse”). Even when I explained to DALL-E that I only wanted the words “STAR WARS” on a marquee (which would NOT break any copyrights), it still refused to produce that output.
Prompt: joe biden opening presents
So what’s going on here? Google/ImageFX and Midjourney have effectively banned any generations using the term “Joe Biden” (along with “Donald Trump” and other world leaders/office holders). There is a legitimate concern that these tools can be used to spread misinformation in a far more harmful way than sharing pictures of the Pope in a puffer jacket.
Ideogram seems to have no such filters in place, although I would assume it will soon. OpenAI is taking a different path. DALL-E generated an image, but as I mentioned in this chat, that sure isn’t Joe Biden.
Now, if I change the prompt from “Joe Biden” to “the president,” here’s what I get.
Prompt: the president opening presents
In Midjourney, you can get past the filters if you use the word “president.” It generated pictures of Trump, Biden, and Obama for me (including the Trump one here, which was the highest quality). When you use the generic “president,” Ideogram seems to make a weird amalgamation of Donald Trump and maybe Tony Blair (?).
Prompt: the official presidential portrait of an anthropomorphized President Pig
And Google appears to have locked down any use of the word “president.”
Prompt: a photograph of a dog in a spacesuit, on a space station, overlooking the earth
Even though I specified a “photograph,” DALL-E generated a picture of a three-legged dog in its own DALL-E art style. ImageFX did an admirable job. Ideogram has a nice picture of the Earth, but with another Earth where the moon should be. The Midjourney photo is quite beautiful.
Prompt: a black and white photograph of a person holding a sign that says “Call Me at 555–1212.”
Close, but not quite right. Text generation has improved in all of the tools, but none are quite there.
Prompt: a detailed painting of a man’s hand in the style of a Renaissance painting
Remember the trope that “AI cannot draw hands”? It is getting better. DALL-E’s hand is still a six-fingered abomination, but Ideogram and ImageFX are about 90% there. Midjourney is near-flawless.
Prompt: Two women with brown hair, one very old man, another woman with purple hair, and a young child having a fun discussion at an outdoor café
It is difficult for text-to-image tools to handle multi-part prompts like this one, where each subject has its own description. None of them got it right, but ImageFX (missing the kid) and Midjourney (missing one woman with brown hair) came the closest. Text-to-image tools also have difficulty forming accurate faces for multiple people in photorealistic images. Look closely at the ImageFX and Ideogram faces, and you’ll see distortions. Midjourney struggles the least.
The Verdict
DALL-E 3: The best thing that DALL-E has going for it is the “Conversational AI” it gets from being embedded in ChatGPT. If you don’t get what you want on the first try, you can tell it how to tweak and change the image. Unfortunately, even after these tweaks, the results are rarely as good as those from the other image generators.
Ideogram: A good tool that has a generous free plan. It is worth trying if you don’t get the results you want from the others.
ImageFX: Google’s new service is quite good. After running these examples (and many more) in comparisons, I think it is better than DALL-E or Ideogram. And if you use it through Google’s AI Test Kitchen link, it has a nice feature called expressive chips, which take your original prompt and give you drop-downs for variations it can create. And it’s all free, at least for now.
The Winner
But the king of text-to-image still has to be Midjourney. While the gap between Midjourney and the other tools may not be as wide as it once was, it is still my go-to for image generation. It isn’t always perfect, but it is always where I start when generating images. If you have any reason to create images, the $10/month investment is well worth it.