How Reliable Are AI Detectors? Claims vs. Reality

8 min read
SEO/Content Marketing
By: Jasmine Leechuy

In less than a year since its release, the AI-powered chatbot ChatGPT has drastically changed how people approach content creation. Many marketers see AI content generation as an attractive way to boost creativity and efficiency.

But two big questions remain:

  • Will AI just make content creation easier, or will it replace some jobs completely?
  • If you don’t want AI content, can you rely on AI detectors to avoid it?

While the first question is mostly a reflection of our concerns for the future, the second seeks answers for something that’s impacting the industry as we speak.

After all, AI content detection tools rose just as quickly as AI use did, boasting their ability to tell whether a human or a trained AI algorithm wrote something.

But how reliable are AI detectors, really? And will AI-generated content phase content writers out? The answers lie in understanding the differences between human writing and AI-generated content.

Let’s dive in and take a closer look at how content marketers use AI, the promise of AI content detectors, and the reality of current AI content detection tools.

Spoiler alert: AI content and AI content detectors have a lot of problems.

What Content Marketers Should Know About AI: The Basics

AI is short for artificial intelligence, which refers to computer systems that can simulate human intelligence. In other words, computers that can process their environment, understand context, and solve problems on their own within the guidelines they’re given.

Popular AI models like OpenAI’s ChatGPT and Google Bard rely on two technologies: natural language processing (NLP) and machine learning.

Natural language processing allows computers to understand language the way humans can. Machine learning uses training datasets and algorithms that allow computers to “learn” and solve problems or take action without needing to be pre-programmed for the specific task.

To get an idea of how these work, let’s use an example question for ChatGPT, “What are some good names for a puppy?”

ChatGPT answering, “What are some good names for a puppy?”

Natural language processing helps the chatbot understand your question without the use of coding language.

And machine learning enables the chatbot to give you a set of names without having an explicitly coded response for your question ahead of time.

The Use of AI in Content Marketing

When it comes to marketing and content generation, there are two popular applications of AI software: generating content and detecting AI-generated content.

AI Content Generators

AI tools can be used to generate different types of content, such as website copy, social media captions, and blog posts. Humans love shortcuts, and AI tools usually take less time than a human writer, but AI-generated content may not live up to the quality, relevance, and nuance of human writing.

Marketers can also leverage AI tools to help them get new ideas or break through writer’s block rather than relying on them completely. Some popular content generation tools include ChatGPT, Jasper, and Copy.ai.

AI Content Detectors

On the other side of the equation, you have AI content detectors, which are tools that claim to help you determine if an AI or a human produced a piece of writing.

These can also be used to check for quality or plagiarism issues and to spot-check if you’re likely getting human writing (not AI drafts) from employees, freelancers, and agency partners. Some of the most popular AI detectors include Originality AI, Content at Scale, and Copyleaks.

As AI has become a popular content generation tool, organizations, including schools, marketing agencies, and brands, have begun to turn to AI content detectors as a way to tell if the work they receive was written by a human or generated by software.

The questions we wanted to explore were: How do these detectors work, and can you rely on them to accurately identify AI-generated content?

Here’s what we found.

How Do AI Detectors Work?

To figure out how reliable AI detectors are, we need to understand how they work. AI detectors are trained on a set of data that typically contains both human-written and AI-generated text. The detector analyzes those samples to figure out which characteristics best identify the AI-generated pieces.

Two of the major characteristics AI detectors analyze are:

  • Perplexity: The unpredictability of the content. AI-generated text tends to have low perplexity, while human writing has higher perplexity.
  • Burstiness: Variation in the length and structure of the sentences. AI content tends to be more steady and have lower burstiness than human writing.
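
To make these two signals a little more concrete, here’s a rough, toy-level sketch in Python. Real detectors estimate perplexity with a large language model; our version approximates “surprisal” with simple word frequencies, and every function name below is our own illustration rather than any vendor’s actual code.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev

def split_sentences(text):
    # Naive sentence split on ., !, and ?
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def split_words(text):
    return re.findall(r"[a-z']+", text.lower())

def pseudo_perplexity(text):
    # Toy stand-in for LM-based perplexity: how "surprising" are the word
    # choices, judged only by word frequencies within this same text?
    tokens = split_words(text)
    counts = Counter(tokens)
    total = len(tokens)
    avg_surprisal = mean(-math.log2(counts[t] / total) for t in tokens)
    return 2 ** avg_surprisal  # higher = less predictable

def burstiness(text):
    # Spread of sentence lengths; very uniform sentences score low.
    lengths = [len(split_words(s)) for s in split_sentences(text)]
    return pstdev(lengths) if len(lengths) > 1 else 0.0

sample = ("The report was short. It covered revenue. It covered costs. "
          "Then, buried in the appendix, a surprisingly detailed chart about "
          "churn raised questions nobody had asked all quarter.")
print(f"pseudo-perplexity: {pseudo_perplexity(sample):.1f}")
print(f"burstiness (sentence-length spread): {burstiness(sample):.1f}")
```

In a real detector, both signals would come from a trained model and be combined with many other features; this sketch is only meant to show the intuition.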

That said, AI detectors can’t guarantee anywhere close to 100% accuracy because they are based in large part on probabilities. Not to mention, each detector is trained on a different dataset of content, so they often provide different results from one another.

Quote “AI detectors can’t guarantee anywhere close to 100% accuracy because they are based in large part on probabilities.”

In our experience, here are some of the other content qualities that detectors often flag as AI-generated.

Infographic showing examples of AI content signals.

Based on these signals, the AI tool gives you its best guess on how the content was produced. Some tools provide the percentage chance that a human wrote the entire copy, while others highlight the specific parts of the text they think are most likely AI-generated.
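
As an illustration of those two output styles (a document-level percentage versus per-sentence highlighting), here’s a minimal sketch. The scoring heuristic is a deliberately crude placeholder of our own, not how any real tool decides.

```python
import re

def ai_likelihood(sentence):
    # Placeholder heuristic: pretend short, uniform sentences look "more AI."
    # A real detector would use a trained model here.
    length = len(sentence.split())
    return max(0.0, min(1.0, 1.0 - length / 30))

def detector_report(text, flag_threshold=0.7):
    sents = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [ai_likelihood(s) for s in sents]
    return {
        # Document-level score, the way some tools report it
        "percent_human": round((1 - sum(scores) / len(scores)) * 100),
        # Per-sentence highlighting, the way other tools report it
        "flagged_sentences": [s for s, p in zip(sents, scores) if p >= flag_threshold],
    }

print(detector_report(
    "Content matters. Write clearly. A long, winding sentence with plenty of "
    "asides and a few odd digressions tends to score as more human here."
))
```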


Can AI Detectors Be Wrong?

Yes, there are many instances where AI detectors have failed to identify AI-generated text and others where they have flagged human-written text as AI copy (known as a false positive). Some experts believe that reliable AI detection isn’t possible with current tools.

According to AI expert Soheil Feizi of the University of Maryland, “Current detectors of AI aren’t reliable in practical scenarios.” He points specifically to a false positive in which detectors flagged the U.S. Constitution as primarily AI-generated.

Example of an AI detector flagging the U.S. Constitution as AI generated.

In particular, the use of AI-based paraphrasing tools like Quillbot or Wordtune can make AI content nearly undetectable in many cases.

On the other hand, human writing can also be incorrectly flagged as AI text. In these cases, it can damage relationships if you wrongly accuse a student, employee, or independent contractor of using AI instead of writing the copy themselves.

As such, it’s crucial that we don’t put too much stock into AI content detectors. Like ChatGPT, these tools are in their infancy and shouldn’t replace edits by an actual person.

The Importance of Human Content Review

Having an actual person review your copy is a much more reliable way to ensure high-quality writing.

Humans are best equipped to answer questions like:

  • Is it engaging?
  • Does it make sense?
  • Is the content helpful and useful?
  • Are there credible sources cited?
  • Are the facts correct?
  • Does it address our readers’ main concerns?
  • Is this an appropriate reading level for our audience?
  • Does this sound like a real person wrote it?

If you have a small team or are strapped for time, AI can be an option to do preliminary research or get ideas as long as human editors are still reviewing the content.

At The Blogsmith, we believe that unique copy written by humans is the best way to connect with your audience. We rely on our writers’ in-depth research and our editors’ extensive industry experience to ensure all the content we create is unique.

So AI content generation is not part of our process, but we do stay on top of new content developments and have extensively tested AI detectors.

Here’s how those tests went.

How Reliable Are AI Detectors? Our Experience

After experimenting with some of the most popular AI writing detectors, here’s what we found.

Claims Made by Leading AI Detectors

Several AI detector tools like Originality AI, Copyleaks, and GPTZero claim to accurately tell users if a piece of content was AI-generated, human-created, or a mixture of both.

Originality AI boasts up to 99% accuracy and less than 2% false positives (incorrectly flagging human text as AI). It claims to have the most accurate detector because its “AI team is wildly smart” and because charging customers to use the platform lets it afford more computing power.

Copyleaks, on the other hand, claims to have the lowest false positive rates in the industry at 0.2%. The company explains its success by stating that its detector is trained to look for human text instead of trying to find and flag AI text.

The Reality of AI Detectors

When testing a variety of detectors, we found most claims didn’t seem to add up. Varying results between tools, high false positive rates, and inconsistent results from the same tool all plagued our tests.

In one test where we asked several tools to analyze an AI-generated article, we saw results ranging from 100% AI-generated to just 30% of the text flagged as having a 90% probability of being AI-generated. Since the detectors often disagreed, we found that we needed to run each piece through multiple detectors to help make our decisions.
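
Here’s a rough sketch of what that multi-detector spot-check looks like in practice. The scores are placeholders for whatever your chosen tools return; nothing below calls a real vendor API.

```python
from statistics import mean

def review_scores(ai_scores, disagreement_threshold=30.0):
    """ai_scores maps a detector name to its '% likely AI' result for one draft."""
    values = list(ai_scores.values())
    spread = max(values) - min(values)
    return {
        "average_ai_score": round(mean(values), 1),
        # Large spreads were common in our tests, which is exactly why a
        # human editor still makes the final call.
        "tools_disagree": spread >= disagreement_threshold,
        "needs_human_review": True,
    }

# Example: the kind of spread we saw on a single AI-generated article.
print(review_scores({"detector_one": 100.0, "detector_two": 30.0}))
```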

Furthermore, we were able to test writing that we knew was created by our team of human writers and followed SEO best practices. In several instances, these pieces were falsely flagged by detectors as AI-generated.

And, when we asked ChatGPT to explain its reasoning for labeling an article as AI-generated, it pointed to qualities that are simply hallmarks of good writing. It cited “clear headings and subheadings,” “objective and informative” writing, and “detail and organization” as factors in the rating, suggesting that ChatGPT may interpret the very elements that make for good human writing as signs of AI.

Finally, we experienced the issue of seeing results change when running the same content through AI detection tools twice.

After we rewrote some text and reran the report, Originality completely changed its mind about what was AI and what wasn’t: it tagged the newly rephrased content as mostly human, but flip-flopped on passages it had previously called human and flagged them as more likely AI.

From our view, AI detectors may have some use for flagging repetitive unoriginal content, but there’s no foolproof tool that can reliably tell you how a piece of content was generated. Interestingly enough, Originality AI’s study also says, “We don’t believe that [AI detection scores] should be relied on 100%.”

Quote, “We don’t believe that [AI detection scores] should be relied on 100%.”

The Ramifications of Relying Solely on AI Detectors

Relying on inconsistent detectors can cause trouble, especially when there’s a high chance of false positives. Professors’ use of AI detectors in higher education has already led to false cheating accusations when tools like Turnitin incorrectly flagged students’ work as AI-generated.

In the marketing world, accusing writers and agencies of using AI tools when they’ve written the content themselves can damage valuable working relationships.

Originality offers one solution through its Chrome Extension, which claims to “watch writers write” to ensure they’re not using AI. But this level of Big Brother-esque oversight doesn’t bode well for building strong relationships based on mutual trust. Instead, you begin signaling a lack of trust in your team, which can lower morale and negatively impact performance.

The Future of AI Content Detection

As the popularity of AI-generated content has grown, many detection tools have popped up, claiming to be able to definitively tell whether an AI or a human wrote something.

However, we couldn’t find a tool that reliably identified both sources of writing. AI content generation and AI detection are in the early stages of an arms race, and right now, detection tools are behind as content generators like GPT continually raise the bar with new versions such as GPT-3 and GPT-4, each built on updated models.

That said, AI detection tools can still be helpful as a part of the editing stage of the content production process.

At this time, we use two detectors for each content piece to help identify any red flags or readability issues we can address in a draft before it gets published. Still, they’re no replacement for human judgment, especially when it comes to search engine optimization and fact-checking.

If you need a content partner that prioritizes the reader experience, accuracy, and proven SEO tactics backed by an experienced team, explore The Blogsmith’s SEO content services today.
