by Steven Vaughan-Nichols

What happens when genAI vendors kill off their best sources?

opinion

May 21, 2024 | 5 mins

Generative AI, Google, Technology Industry

You don’t really think you can depend on answers pulled from the likes of self-appointed Reddit experts, do you?

Credit: Rob van der Meijden

If you think the latest generative AI (genAI) tools such as Google AI Overviews and OpenAI GPT-4o will change the world, you’re right. They will. But will they change it for the better? That’s another question.

I’ve been playing with both tools (and other genAI programs, as well). I’ve found they’re still prone to hallucinations, but sound more convincing than ever. That’s not a good thing.

One of the reasons I’m still making a living as a tech journalist is that I’m very good at discerning fact from fantasy. Part of that skill set comes from being an excellent researcher. The large language models (LLMs) that underpin genAI chatbots? Not so much. Today, and for the foreseeable future, genAI at its best is really just very good at copying and pasting from the work of others.

That means the results they spit out are only as good as their sources. Look at it this way: if I want to know about the latest news, I go to The New York Times, the Washington Post, and the Wall Street Journal. Not only do I trust their reporters, but I know what their biases are.

For example, I know I can believe what the Journal has to say about financial news, but I take their columnists with a huge grain of salt. (That’s just me; you might love them.)

As for the Times, remember that it claims OpenAI stole its stories to train ChatGPT. If it wins its case, genAI is in trouble, because other publishers will follow in quick succession. When that happens, all the genAI engines will have to steal (uhm, learn) their content from the likes of Reddit; your “private” Slack messages; and Stack Overflow, where users are sabotaging their answers to screw up OpenAI.

That’s not going to go well. There’s a reason genAI engines often spew garbage; it’s what they were trained on. For instance, 80% of OpenAI GPT-3 tokens come from Common Crawl. Like the name says, these petabytes of data are scraped from everywhere and anywhere on the web. As a Mozilla Foundation study found, the result is not trustworthy AI.

Worse still, this will eventually lead to a time when those genAI tools start consuming their own garbage. This is a known problem, and it ends in model collapse. Or, as neuroscientist Erik Hoel pithily describes the end result: “synthetic garbage.” He’s not alone; many AI engineers think even a little bit of AI-generated data can poison their LLMs.
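
If you want to see the shape of that problem for yourself, here is a toy sketch in Python. It is not how anyone trains an LLM. It assumes a one-dimensional Gaussian stands in for the “model,” and the function name `simulate_collapse` and its parameters are made up for illustration. Each generation fits a mean and standard deviation to the previous generation’s output, then produces a fresh batch of “content” from that fit, so after the first round the model only ever learns from itself.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples over and over.

    Returns the fitted standard deviation for each generation.
    Hypothetical helper for illustration only, not a real training loop.
    """
    rng = random.Random(seed)
    # Generation 0: "human-made" data drawn from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = []
    for _ in range(generations):
        # "Train" the model: estimate mean and (population) std from the data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
        # The next generation sees only what the current model generated.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    for gen in (0, 10, 50, 199):
        print(f"generation {gen:3d}: std = {spreads[gen]:.6g}")
```

With small batches like this, the fitted spread tends to drift toward zero as the generations pile up: the variety in the original data gets averaged away. Real model collapse is messier, but the feedback loop is the same idea.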

At the same time, genAI companies aren’t doing us — or themselves, in the long run — any favors. For example, Google’s AI-powered “Overviews” provides concise AI summaries at the top of search results. This move promises quicker access to information, and Google’s Liz Reid claims it will drive more clicks to websites by piquing users’ interest.

Reid, who oversees search operations, maintains that AI Overviews really will encourage more searches and clicks to websites as users seek to “dig deeper” after getting the initial synthesized summary.

Publishers know better. Who will bother to go to the real story, which might require a subscription or, horrors, seeing an ad?

Danielle Coffey, CEO of the News Media Alliance, which represents more than 2,200 publishers, warns that the change could be “catastrophic” for an industry already struggling with declining ad revenue. “It’s offensive and potentially unlawful for a dominant monopoly like Google to dictate the rules in a way that sacrifices the interests of publishers and creators,” she said.

Google has never been a friend to publishers. Just ask leaders in countries like Spain or Canada, where governments tried to get Google to pay publishers for access to their news sites.

If Google, Microsoft, and other genAI companies keep all those search visitors (and ad revenues) to themselves, as I expect will be the case, publications will die at an even faster rate. And there goes any authoritative information Google and the other AI services need for their LLMs.

OpenAI’s co-founder, Sam Altman, recently said, “GPT-4 is the dumbest model any of you will ever have to use again by a lot” and that “GPT-5 is going to be a lot smarter.”

I’m sure it will be. GPT-4o is clearly superior to its predecessor and GPT-5 will continue the trend. But GPT-6 and beyond? Simple greed may ensure that, as reliable human-created stories disappear, AI will only get dumber and dumber.

In short, we’re looking at a future filled with AI GIGO: Garbage In, Garbage Out. No one wants that. The time to stop it is now.

Steven J. Vaughan-Nichols has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast Internet connection, WordStar was the state-of-the-art word processor, and we liked it!
