Ever stare at a blank screen with a nasty case of writer’s block? Need a list of 10 insights about (topic) but you can only think of four? Have a 2,000-word post to write and you’d like to knock it out faster?
These are just a few of the scenarios for which copywriters and bloggers have been turning to artificial intelligence (AI), at least since ChatGPT exploded onto the scene in November 2022.
But with several more options released since then, and the field growing continually more crowded, which AI model is best for helping with blogging? I wondered exactly that, so I put eight popular tools to the test. The results?
In a moment. First, a necessary caveat.
AI Tools Can’t Replace Human Writers
While any of these AI models can be helpful for tasks like research, ideation, or even generating partial rough drafts, none of them should be used to produce final copy. Ever.
The reason lies in the core of their design. As explained in this article, “Generative-AI (genAI) tools are fundamentally sequence-prediction machines. What does this mean? By and large, these systems complete a sequence with whatever is most likely to appear.”
Predictability is highly valuable in many contexts: medicine, law, science, mathematics, and coding, just to name a few. But it’s a sure recipe for mediocrity in creative endeavors like writing.
In terms of entertainment, when a critic writes that the plot of a movie was “entirely predictable,” that’s not a compliment. In terms of business, readers are attracted to what’s new or even contrarian, such as posts that expose common myths or challenge conventional wisdom, with titles like Forget Employee Engagement and Corporate Culture, Say Experts – Focus on These Three Priorities Instead.
Articles and blog posts that are predictable, that merely tell us what we already know, are boring and provide no value.
So, yes, AI can be quite helpful in terms of speeding up research and even generating snippets of near-final copy. It can help copywriters, bloggers, and other content marketing professionals work more efficiently. But AI can’t replace human marketing pros.
Putting Eight AI Models to the Test: The Challenge
The first step was to train each model on my writing style. How do you do that? A brilliant but lengthy (and more complete) answer comes from Harshal Patil. A shorter, still useful answer comes from Forbes senior contributor Jodie Cook:
“The first thing to do is get ChatGPT (or other AI model) to understand your writing style. Explain what you’re going to do with the following prompt, so it’s ready to receive the examples:
‘I’d like your help in creating articles for [purpose]. Your first task will be to understand my writing style based on examples that I give you. After that, we’ll create some content. To start, please say GO AHEAD and I will paste examples of my writing. Keep saying GO AHEAD and I will paste new examples. When I am done I will say FINISHED. At this stage, please do not do anything except confirm that you have saved the writing style.'”
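For anyone who prefers to run that step programmatically rather than in a chat window, here is a minimal sketch of the same flow using the OpenAI Python SDK. The model name, file names, and prompt wording below are placeholder assumptions, not the exact setup used in this test.

```python
# Minimal sketch: feed writing samples to a model before asking it to draft anything.
# The sample file names and model choice are placeholders, not the setup used in this test.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "I'd like your help creating articles. Your first task is to learn my "
    "writing style from the samples I provide. Do not draft anything until "
    "I say FINISHED; just confirm you have saved the style."
)

sample_paths = ["sample_post_1.txt", "sample_post_2.txt"]  # hypothetical files
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

# Add each writing sample as its own user message.
for path in sample_paths:
    with open(path, encoding="utf-8") as f:
        messages.append({"role": "user", "content": f"Writing sample:\n\n{f.read()}"})

messages.append(
    {"role": "user", "content": "FINISHED. Please confirm you have saved my writing style."}
)

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```

The test itself was run through each tool’s own chat interface; a scripted version like this is just a convenience for anyone who wants to repeat the exercise in bulk.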
Next, since this post dealt with how to apply a specific academic business framework, I asked each tool whether it was familiar with it or whether I needed to provide additional training material. All of the AI tools recognized the concept.
Finally, I provided each AI model with the same detailed blog post outline, which included, among other elements, the following (a rough prompt sketch follows the list):
- Target word count
- Target SEO keywords
- Intended audience
- Objective of the post
- A detailed content brief
- Suggested links and anchor text
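For readers who like to see the mechanics, here is a rough, hypothetical sketch of how those brief elements might be packed into a single prompt string. Every field value below is an illustrative placeholder, not the actual brief used in this test.

```python
# Hypothetical content brief; the values are illustrative placeholders only.
brief = {
    "word_count": "1,200-1,500 words",
    "seo_keywords": ["placeholder keyword one", "placeholder keyword two"],
    "audience": "Business readers interested in the framework",
    "objective": "Show how to apply the framework in practice",
    "content_brief": "Open with a contrarian hook, then cover points A, B, and C...",
    "links": [("https://example.com/overview", "framework overview")],  # (URL, anchor text)
}

# Assemble the brief into one prompt block.
prompt = (
    "Using the writing style you just learned, draft a blog post.\n"
    f"Target length: {brief['word_count']}\n"
    f"SEO keywords: {', '.join(brief['seo_keywords'])}\n"
    f"Intended audience: {brief['audience']}\n"
    f"Objective: {brief['objective']}\n"
    f"Content brief: {brief['content_brief']}\n"
    "Links to include (URL -> anchor text):\n"
    + "\n".join(f"  {url} -> {anchor}" for url, anchor in brief["links"])
)

print(prompt)  # paste into the chat UI, or send via an API call like the one shown earlier
```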
Then I sat back and watched as each tool did its work.
Testing Eight AI Models for Blog Post Writing: The Results
Here’s how each model scored, from worst-performing to best. The specified target word count for the post was between 1,200 and 1,500 words; only one tool hit that range. On the other metrics, each model was graded on a standard A-F scale (none failed :-)).
#8: Le Chat
I had high hopes for this tool from French developer Mistral AI. It turned out to be very friendly (no warnings about my prompts having too many characters, as some others did) and easy to use. However, it missed the mark badly on word count (799), failed to insert any of the suggested links, and the copy quality was middle-of-the-pack.
#7: GPT-4o Mini via Poe
Easy to use and the copy quality was nearly as good as the leaders. But it only inserted a couple of links and the content generated was the shortest of any of these tools, just 650 words.
#6: Poe Assistant
This was also easy to use and the copy was reasonably solid. But it didn’t use anchor text for the links, which was weird (Claude, GPT, Gemini, and Grok all did), and the copy output was very short at only 669 words.
#5: ChatGPT
The most popular AI model did a nice job of following instructions and was easy to use. But the copy was again well short of the specified target (787 words) and the quality of the writing was the worst of any of these tools. It definitely read like AI-produced copy.
#4: Perplexity
Perplexity at least managed to break 1,000 words, and the copy quality was as good as it gets from AI. But it didn’t include any links, and was unnecessarily difficult to use as I had to keep feeding it shorter and shorter blocks of writing samples in order to train it on my writing style.
#3: Gemini 2.0 via Poe
Gemini also topped 1,000 words of output, included several links, and was easy to use. But the formatting was awful and the opening text was dreadful (though the copy quality was pretty decent overall).
#2: Claude
This was the only tool that actually hit the target word count range; it did an excellent job with linking; and the copy quality was comparable to Perplexity. It would have scored the top spot in this faceoff except it was the only tool that could not complete this test in one sitting. Just as it was about to generate the post, it told me I had run out of free queries for the day, and would have to wait until 12:00 PM to continue.
It was just after 10:00 in the morning when I was running the test, so I thought no problem, I’ll swing back and finish up in two hours. No luck. Perhaps it meant 12:00 Pacific time? I waited another couple of hours and tried again. Still no dice. Finally, late that evening I refreshed the page and was able to finish the test.
#1: Grok
The results actually line up pretty well, with the exceptions of ChatGPT (which has by far the highest number of user ratings, being the oldest of these tools) and Grok (which has virtually no presence on these software review sites because it’s relatively new).
Final Thoughts on the AI Model Faceoff
All the usual caveats apply: your mileage may vary; all of these tools are continually being upgraded; and these results are for one specific use case. Nevertheless, the strong correlation between my results and the ratings of hundreds of individuals across the three top software review sites seems to indicate there is at least a clear top five among these AI models.
The best advice? Do your own research. Hopefully this post has provided some helpful guidance on that front as well as warnings about some of the ease-of-use limitations each tool has at this stage.
