Think tanks, AI and the “interested public at large”

13 July 2023

Why writing explainers will soon be a waste of time and resources

A few years ago, I co-authored an article for On Think Tanks arguing that “engaging with ‘the interested public at large’ is an intrinsic part” of good think tank communications.

I’m not so sure that’s true anymore.

I’ll admit that I’ve been banging the “write for a broad audience” drum for a long time. In a Wonkcomms piece from 2014, I warned that the then-new trend of explainer journalism was encroaching on think tank territory and that we risked being disrupted by a good-enough product.

That’s basically what happened. It’s only fairly recently that the think tank sector has made a real push to reclaim the policy explainer territory.

But Google’s new AI threatens to put humans out of the explainer business entirely.

The state of editorial content

A little background is helpful.

About a decade ago, journalists got really into explainers. The jargony name for this kind of writing is service journalism, and it consists of things like explainers, guides and product reviews.

It turns out that service journalism generates a ton of website traffic. And website traffic, in turn, drives pretty much every aspect of revenue for media outlets. Ad sales, subscription conversions, affiliate links—they all rely on getting lots of eyeballs on the site.

In short, explainers bring traffic and traffic brings revenue. Investigative and feature journalism wins Pulitzers, but service journalism pays the bills.

A lot of the traffic to service journalism outputs comes by way of Google. Search accounts for about 35% of traffic to news sites, making it the largest single driver of traffic. And search still (mostly) means Google. (All social networks combined account for about 30% of media traffic. Media supported by viral Facebook posts proved unsustainable, as Buzzfeed News, Vice and Mic can attest.)

Think tanks are even more heavily reliant on search traffic. I’ve yet to work on a think tank website that received less than 60% of its inbound traffic from search. Most are closer to 70% and a few flirt with 90%.

And for most think tanks, the real heavy hitters—the pieces that consistently bring in a ton of inbound traffic from search—well, those are mostly explainers and blog posts. You know, the kinds of things we write for “the interested public at large.”

A lot of think tank website strategy is built around this basic pattern — traffic comes via a search result that links to a piece of evergreen content on a topic that is suddenly hot, one that’s written for the general public. We then develop content strategies around moving people deeper into the site. We give them related content. We do a lot of deep linking to the more rigorous work underlying the explainer. We ask them to sign up for newsletters or attend events on their topic of interest.

I’ve helped build a lot of sites like this.

But what happens if Google stops sending traffic to explainers?

Turning off the search spigot

Here’s The Verge reporting on how Google is using AI to fundamentally change the way it presents search results. I’m going to quote fairly extensively for context, but it’s worth taking the time to read the entire piece.

To demonstrate, Liz Reid, Google’s VP of Search, flips open her laptop and starts typing into the Google search box. “Why is sourdough bread still so popular?” she writes and hits enter. Google’s normal search results load almost immediately. Above them, a rectangular orange section pulses and glows and shows the phrase “Generative AI is experimental.” A few seconds later, the glowing is replaced by an AI-generated summary: a few paragraphs detailing how good sourdough tastes, the upsides of its prebiotic abilities, and more. To the right, there are three links to sites with information that Reid says “corroborates” what’s in the summary.

Google calls this the “AI snapshot.” All of it is generated by Google’s large language models, all of it sourced from the open web. Reid then mouses up to the top right of the box and clicks an icon Google’s designers call “the bear claw,” which looks like a hamburger menu with a vertical line to the left. The bear claw opens a new view: the AI snapshot is now split sentence by sentence, with links underneath to the sources of the information for that specific sentence.

The tl;dr [too long; didn’t read] is that Google is scraping the web for good sources, then using an AI (specifically, a large language model) to automatically generate a summary—an explainer, if you will—that directly answers the search query.

That summary appears above any links to other sites.

Google’s feature isn’t limited to questions about baked goods. New York Magazine tested it by asking about the debt ceiling and found the output to be pretty good.

That summary even included a think tank in its references.

And, yes, there are references, and users who are interested can click through to get more information. But the basic work of providing a summary suitable for “the interested public at large” has already happened. There’s no need to click through to another explainer.

For the media, this is an existential threat, as it would potentially cripple the most reliable way of generating traffic from the largest driver of said traffic.

Think tanks and other research organisations have it a little better. We don’t rely on ad revenue for funding. While our vanity metrics will take a hit if Google launches this new feature widely, our funding sources will last as long as the foundations and corporations and individual donors hold out.

But our explainers are better!

Yes, I know what you’re going to say. An explainer written by a human expert will absolutely be better than anything autogenerated by Google’s robots. That’s true!

It’s also irrelevant.

The standard for judging explainers isn’t good. It’s good enough. People read explainers because they’re vaguely curious about a topic. As long as an article is good enough to satisfy their curiosity, then they are all set.

These days “disruption” gets tossed around so much that it’s mostly meaningless. But the original theory is still solid. Disruption happens when something comes along that is both an entirely new way of doing a thing and also good enough to replace the original thing, at least at the low end.

The digital camera is the classic example. Early digital cameras weren’t good. But they were good enough to replace the sorts of low-end cameras—the 110s and the disposables—that accounted for the lion’s share of actual photographs taken. That led to widespread adoption, which in turn led to a cycle of improvements.

Flash (ha!) forward a few years, and Kodak is filing for bankruptcy and Fujifilm is building copiers, medical diagnostic equipment and computer components.

Where do we go now?

For research organisations, all is not lost. Google can translate research into public-facing summaries, but it’s not out there conducting the research itself. If we put more care into creating our research products—the reports and briefs we’ve been producing for years—then we can accomplish two goals at once: creating the material Google needs for informed summaries and giving policymakers the kind of detailed guidance they need to implement good policy.

A good, human-generated product is still a necessity for training an AI. As Clive Thompson points out, if a large language model (LLM) is trained on AI-generated content, it experiences a phenomenon known as “model collapse.”

You don’t even need four full generations of recursive training to get there. Model collapse sets in when as little as 10% of AI-generated material gets mixed in with training data.

So there’s still a role for human experts to produce the underlying content that AIs will turn into summaries for “the interested public at large.” 

That said, an AI doesn’t need to train on other explainers. It can ingest your research outputs just as well as it can your explainers. Luckily, that means we can inform AI with the same sorts of rigorous, long-form outputs needed by the humans who are actually making and implementing policy!

But we do need to approach our research outputs with the knowledge that AI will be consuming them and using our work to generate explainers.

I think that entails two approaches.

Focus on our core value-add

The value of original research isn’t in the prose we use to describe our findings. It’s in the connections between ideas. I’ve argued at length that ideas live not in texts but in the spaces between texts.

AI isn’t great at surfacing novel connections. Indeed, a large language model is essentially a fancy autocomplete. That’s not meant to be dismissive—fancy autocompletes are really powerful! But they aren’t capable of generating new knowledge or novel ideas.

So researchers need to double down on ideas, on uncovering and surfacing new connections between content.

The research report is a terrible vehicle for that.

Hypertext affords all sorts of opportunities to play with links — to instantiate the spaces between texts. My book — Screens, Research and Hypertext — shows what a version of that could look like. But it’s not the only possible model. We should be putting more effort into experimenting with alternative ways of presenting research findings.

Write for the robots

Research organisations put their content on the open web. (Well, mostly. I’m squinting at you disapprovingly, academics.) That’s a good thing! If a good idea is buried in a PDF that no one can access, does it really exist? (Looking at you again, academics.)

But the open web is fair game for AIs. If it’s out there for free, AIs are going to scrape it.

So if you can’t beat them, join them.

The best AIs will be the ones that layer generative AI (the LLM things) over an extensive knowledge graph. That’s what’s happening with Google’s autogenerated summaries. Google is using its search magic to find good sources, then applying an LLM to generate a summary using those sources as inputs.
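To make that pattern concrete, here is a minimal sketch of the retrieve-then-summarise architecture described above. It is not Google’s actual pipeline: the function names and the toy corpus are placeholders, and the “LLM” step is a stub so the example runs on its own.

```typescript
// A sketch of "knowledge graph + LLM": retrieve trusted sources first,
// then let the generative layer work only from those sources.
// Every name here is a hypothetical placeholder.

interface Source {
  url: string;
  title: string;
  text: string; // the underlying research content
}

// Stand-in for the retrieval layer (search index / knowledge graph lookup).
function findSources(query: string, corpus: Source[]): Source[] {
  const terms = query.toLowerCase().split(/\s+/);
  return corpus
    .filter((s) => terms.some((t) => s.text.toLowerCase().includes(t)))
    .slice(0, 3); // keep a handful of "good" sources
}

// Stand-in for the generative layer. A real system would call an LLM here,
// constrained to the retrieved sources; this stub just stitches them together
// so the sketch runs end to end.
function summariseFromSources(query: string, sources: Source[]): string {
  const points = sources.map((s) => `${s.title}: ${s.text}`).join(" ");
  const citations = sources.map((s) => s.url).join(", ");
  return `Q: ${query}\nA (drafted from sources): ${points}\nSources: ${citations}`;
}

// Usage: the "explainer" is generated from research outputs, not written by hand.
const corpus: Source[] = [
  {
    url: "https://example.org/debt-ceiling-brief",
    title: "Debt ceiling brief",
    text: "The debt ceiling caps how much the government may borrow.",
  },
];
const query = "what is the debt ceiling";
console.log(summariseFromSources(query, findSources(query, corpus)));
```

The structural point is that the generative step only ever works from sources the retrieval step has already judged good, which is exactly why well-marked-up research outputs matter.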

Those search robots rely on a metadata vocabulary called schema.org to help them “understand” what a particular string of text means. (That’s part of how Google finds good sources to feed to the LLM.)

The bad news is that schema.org is optimised for selling stuff online. Sure, it has markup for reports and journal articles. But the markup is mostly around the metadata for such items, not for their content.

The good news is that schema.org is extensible. It’s fairly straightforward to add the sorts of elements that make up a good research product (e.g., instead of a generic image or figure, we might have an element called barchart, with metadata allowing us to associate it with a particular passage of text).
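As a rough illustration, here is what report markup along those lines might look like, expressed as JSON-LD via a TypeScript object. Report, Organization, hasPart and isBasedOn are real schema.org terms; BarChart is the kind of hypothetical extension described above, not part of the official vocabulary, and the URLs and names are invented.

```typescript
// A sketch of schema.org markup for a research report, serialised as JSON-LD.
const reportMarkup = {
  "@context": "https://schema.org",
  "@type": "Report",
  name: "Explaining the debt ceiling",
  author: { "@type": "Organization", name: "Example Think Tank" },
  datePublished: "2023-07-13",
  hasPart: [
    {
      // Hypothetical extension: a chart described as data, not a generic image,
      // and tied back to the passage of text it supports.
      "@type": "BarChart",
      name: "Federal borrowing limit over time",
      isBasedOn: "https://example.org/debt-ceiling-report#section-2",
    },
  ],
};

// In a page this would sit inside a <script type="application/ld+json"> tag.
console.log(JSON.stringify(reportMarkup, null, 2));
```

A crawler reading that knows it has found a chart about the borrowing limit that belongs to a specific section of the report, rather than just “an image.”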

So how do we do all this?

Neither of these is a quick fix. If you’re still using a page-based tool to create texts (*cough* Word *cough*) and then dumping the results into a WYSIWYG field on your website—and let’s face it, that’s most everyone in the sector—then you’ll need some truly major changes both to your workflows and to the back end of your website.

If you really plan to take the whole linking texts thing seriously, then you’ll likely need an entirely new set of tools. Most web CMSs are limited to one-way references from one page to another, not two-way relationships between specific blocks of text.
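For illustration only, here is the sort of data model that makes those block-level, two-way relationships possible. The type and field names are made up for this sketch, not any particular CMS’s schema.

```typescript
// Content is stored as addressable blocks, and links between blocks are
// first-class records rather than one-way references embedded in a page.

interface Block {
  id: string;
  documentId: string;
  text: string;
}

interface Relationship {
  from: string; // block id
  to: string;   // block id
  kind: "supports" | "contradicts" | "elaborates";
}

// Because relationships are records of their own, they can be queried from
// either end of the link.
function linksFor(blockId: string, relationships: Relationship[]): Relationship[] {
  return relationships.filter((r) => r.from === blockId || r.to === blockId);
}

const relationships: Relationship[] = [
  { from: "explainer-para-3", to: "report-finding-2", kind: "supports" },
];
console.log(linksFor("report-finding-2", relationships)); // found from the other end too
```

The key design choice is that the relationship lives outside any single page, so the report finding “knows” which explainer passages lean on it, and vice versa.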

The changes will take time and money. Luckily you’ll have more of both once you stop writing explainers for “the interested public at large.”

Want to prepare your digital publishing for the AI revolution? Or just generally want to up your digital communications game? Fountain Digital Consulting can help. Drop us a line.