What Kind of Content Does AI Cite (Based on Prompt Type)?

  • No single site dominates AI citations—Reddit leads at just 3.49%.
  • Prompt type drives citations more than domain.
  • Blog/content makes up 53.46% of all citations, far ahead of news (14.09%) and social (8.71%).
  • Owned content appears in ~40% of brand awareness queries.
  • Earned media accounts for ~80% of citations when brand-agnostic queries are excluded.
  • ChatGPT is more controllable, with 68.9% of citations coming from owned content.
  • Social citations skew toward the bottom of the funnel, with 18% of evaluative queries vs. ~5% for informational queries.
  • Reddit drives 39.87% of social citations.

There is an ongoing narrative that listicles are your only in-road to AI citations, and that couldn’t be further from the truth.

Last week, I published a massive study on news citations that appeared in ChatGPT, AI Overviews, AI Mode, and Gemini.

Using Xofu, the AI citation-tracking tool from Citation Labs, I analyzed over 4 million citations across ~4k prompts.

I also broke our prompts into three types: informational queries, decision-making queries, and brand awareness queries, because the questions customers ask VASTLY change how AI citations occur.

I focused on news citations first because we are a digital PR-focused tool, but there are massive takeaways when you look at the data outside of just news publications.

So, in this follow-up, I wanted to take a more macro look at the data and understand how AI platforms cite sources by prompt type.

You can jump down to takeaways, but you’ll also find some scattered throughout from Garrett French, founder of Citation Labs and Xofu.

Here’s what we found:

Reddit and Wikipedia are the Most Cited Sites in AI (But This is Very Misleading)

Let’s start where everyone starts: the top-cited site from our entire dataset was Reddit.

top cited domains overall

But, as you’ll soon see, that starts to mean less and less as we go through the data.

First off, the fact that Reddit accounts for only 3.49% of the dataset suggests the dataset is incredibly fragmented.

You shouldn’t rely on one site to rule your AI strategies.

Then, the type of question that you ask an LLM drastically changes the citation set.
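If you want to check fragmentation in your own citation exports, here is a minimal sketch of how a per-domain share like Reddit's 3.49% could be computed. The function name and the sample URLs are my own illustrations, not the study's actual pipeline or data.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(urls):
    """Return {domain: share of all citations}, highest share first."""
    domains = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[4:]
        domains.append(host)
    counts = Counter(domains)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()}


# Hypothetical citation URLs, not taken from the study's dataset.
sample = [
    "https://www.reddit.com/r/travel/comments/abc",
    "https://en.wikipedia.org/wiki/Hilton",
    "https://www.reddit.com/r/hotels/comments/def",
    "https://example.com/hilton-vs-hyatt",
]
shares = domain_shares(sample)
top_domain, top_share = next(iter(shares.items()))
print(f"{top_domain}: {top_share:.2%}")
```

A low share for the top domain, as in our dataset, is the signal that no single site is worth building a whole strategy around.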

Looking at Top Sites by Prompt Type Changes Things Again

When we look at the top sites by prompt type, the picture changes.

top cited domains by prompt type

Wikipedia becomes the go-to source for brand awareness queries, while Reddit is more for head-to-head brand comparison queries.

But again, there is incredible fragmentation. This tells me that LLMs are getting much better at showing a wide variety of sources (probably due to query fan-out).

But not all LLMs behave the same way.

Let’s look at site slices by platform.

The Top-Cited Sites on ChatGPT and Google’s AI Platforms Are Very Different

When we look at ChatGPT, we see next to no overlap with Google’s platforms except for Wikipedia.

top cited domains by ai platform

Then, to break it down one more time, we can look at the top five sites for each platform and prompt type.

top cited domains by platform and prompt type

You get the picture; there is no one universal list to rule them all, and the type of prompts you ask drastically changes where the information comes from.

Keep that in mind as we go through the remaining takeaways.

Content is King

As you can see, blog/content is easily the most cited type of content by AI models, making up over half of the citations (53.46%).

Overall breakdown of citation type

As we saw in our previous study, news publications are in (far) second place with 14.09%, followed by citations from social media (8.71%).

Now we still need to look at this by prompt type to understand the breakdown.

Blog/Content is Less Dominant in Brand Awareness Queries

As we can see, while Blog/Content still appears most frequently across all prompt types, it is less dominant in citations for Brand Awareness queries.

overall breakdown of citations by prompt type

So, when a user wants to learn more about a brand and prompts something like “What products or services does United Airlines offer?”, Google Gemini pulls information from, and cites, a product page on the United Airlines website.

united airlines owned site

Other times, it may just cite the homepage, which I rolled into my “About” categorization.

For instance, when prompted “What is Booking.com known for?” it simply cited the homepage.

booking.com homepage

For now, I want to dive into the Blog/Content, given that it is the largest chunk.

Most of what was categorized as Blog/Content is actual blog content, but it can also include reports, data studies, calculators, reviews, analyses, and virtually any website content that isn’t selling something or expressly any of the other categories listed above.

To further clarify what we were looking at, I broke these down again into subcategories:

Subcategory                   Description                                     Example
Policy / Regulatory           Industry regulation, compliance.                link
Comparison / Alternatives     Head-to-head comparison.                        link
Definition / Explainer        Explains how things work.                       link
Market Analysis / Insights    Strategic or analytical market discussions.     link
Pricing / Cost                Cost comparisons, fees, prices.                 link
Company Overview / Profile    Third-party company overview or analysis.       link
Stats / Trends / Reports      Data-driven articles.                           link
Listicle / Best-of            Ranked lists or numbered insights.              link
Review                        Evaluation of a product, service, or platform.  link

With that in place, we can look at the type of content that shows up most frequently.

Comparative Content Rules, But Mainly for Evaluative Prompts

Comparative content is clearly the most cited type at 26.92%, followed by what I called Market Analysis/Insights (23.06%) and Definition/Explainer content (20.57%).

overall breakdown of blog/content citations

For example, here is a Homevy blog post comparing Airbnb vs. Booking.com:

homevy - airbnb vs booking.com

And while this may lead you to think that you should jump on the comparison content bandwagon, our evaluative prompts were exclusively head-to-head, bottom-funnel comparison queries, like:

  • Netflix vs Disney+: which is the better option?
  • Is Hilton better than Hyatt?
  • How much does Coursera cost compared to edX?

So, it makes sense that they show up.

Expert Opinions

Before you build a comparison asset, find out where you actually stand.

Try first with Vince’s prompts in ChatGPT, Gemini and Claude:

  • “Is Hilton better than Hyatt for business travel?”
  • “Coursera vs edX — which is worth paying for?”
  • “What is Delta known for compared to United?”

Notice which brand gets the recommendation – demand citations and proof from the AI.

That justification language suggests the axis the winning brand owns in that particular citation space.

Now run the same exercise with your brand + key competitors.

If you’re absent entirely, comparison assets (comparative tables for example) can put you in the conversation – even on a shared/losing axis of comparison (a given column header).

If you exist but your competitor gets the recommendation, week over week for the same prompt, then more content on a claim you both make won’t move the needle.

An asymmetric advantage, grounded in proof, is what improves your odds of moving up in recommendation lists. That’s what we call an axis of advantage.

Find your axis of advantage.

And when we break it down, we can see that comparison content appears most prominently in the evaluative prompts.

overall breakdown of blog/content by prompt type

And with this lens, we start to see how cleanly the type of content cited by LLMs is influenced by the prompts themselves.

Next, let’s leave Blog/Content and look at some other citation groups to see if the patterns hold.

About Pages and Product Pages Get Cited for Brand Awareness Queries Over One-Third of the Time

Blog/Content accounts for the largest share of brand awareness citations (33.78%), but here’s where things start getting interesting, because About and Product/Service pages combine for 35.13% of the citations.

About/product page citations by prompt type

Meaning, when users ask about a brand, about a third of the time, AI shows content from the brand’s website.

For instance, if we ask “What products or services does Harvard University offer?”, we see citations from blog content like this post from Spark Finance:

spark - your guide to harvard university's degrees

But we also see owned media from the Harvard Campus Services page appearing in the citations:

harvard campus services page

But as we saw with the domain lists, different AI platforms act differently. In this next section, you can see how Google platforms differ in content type from ChatGPT.

ChatGPT Prefers a Broader Mix of Content Types Than Google

Based on the overall breakdown, Google has a heavier focus on Blog/Content citations, whereas ChatGPT is more spread out across Product/Service pages (24.56%), Internal Newsrooms (18.71%), About pages (9.15%), and even Support/Help (7.72%).

citation type by platform

This again shows how differently these platforms behave when they pull insights for the same questions.

Next, let’s look at news content, as it’s the second-most-cited type in our dataset.

News Publisher Content Made Up 14.09% of all Citations

I already published a comprehensive breakdown of this content in our news citations analysis, so I won’t go into it too deeply here.

overall breakdown of citation type - news

When we broke it down by prompt type, we saw that news citations appear most frequently in evaluative queries.

news citations by prompt type

For instance, when we asked “Is Delta better than United?” we saw a publication called The Travel with a story covering a survey from Compare the Market, which compared both brands.

delta vs united vs southwest article from The Travel

A few other quick points about news stood out to me:

  • News citations weren’t coming from syndicated sources like MSN or Yahoo. (Syndicated news only made up 6.2% of the news citations).
  • News citations weren’t coming from press releases. (Press releases only accounted for 0.32% of the news citations.)
  • Newswire content only made up 0.21% of the entire analysis.

Newsroom content, however, was calculated separately.

These are essentially internal newsrooms where brands post press releases and company news, as shown below.

Iberdrola exceeds 42k MW renewables - press release owned

These accounted for 3% of the entire analysis—and on ChatGPT, they accounted for 18% of citations (compared to about 2% on Google’s platforms).

So, there may be some potential to influence citations with your own content.

But before I can look at that fully, let’s look at social media citations to understand how and where they come from, so we can distinguish between owned and earned.

Social Media Made Up 8.71% of the Citations

Overall, social media accounted for just 8.71% of the citations, making it the third-most-cited type in our study.

overall breakdown of citation type - social focus

Based on the breakdown, Reddit is the most cited (39.87%), followed by YouTube (25.88%), and then LinkedIn (8.96%).

overall social platforms appearing in citations

As you can see, I also included some lesser-known social platforms like Fish Bowl App, and even rolled Substack and Medium into the mix.

Next, we also need to look at the prompt type.

Social Shows Up Most for Bottom Funnel Prompts

Social citations come mainly from evaluative, decision-making prompts (18%) compared to just 5% from informational prompts and 2.5% from brand awareness prompts.

social media citations by prompt type

And remember, when we talk about evaluative prompts, it’s always those head-to-head comparisons.

For instance, when we asked, “How much does Udemy cost compared to Pluralsight?”, we saw a citation coming from this Reddit thread:

pluralsight or udemy subscription

Another nuance we haven’t looked at is how the platforms differ.

60% of ChatGPT’s Social Citations Come From Facebook

ChatGPT and Google have very different preferences for social platforms, according to our dataset.

ChatGPT has no Reddit citations; instead, 60% of its social citations in our study came from Facebook.

overall breakdown of social media citations by platform

About half of the social citations from Google AI Mode come from Reddit; AI Overviews have 36.62%, and Google Gemini has 23.71%.

But on social, there’s a chance that posting your own content can lead to citations.

So, I needed to categorize all the social citations according to the prompts.

LinkedIn Accounts for Nearly All of the Owned Social Citations

Only LinkedIn had a meaningful share of owned media at 37.27%.

top social media citations earned vs. owned

To get this breakdown, if the prompt mentioned a brand and the citation was from a social source owned by the brand, it was categorized as owned.

For instance, on Reddit, I dug deep to see whether any subreddits claimed to be owned by brands or had company members as moderators.

Most, if not all, expressly said they weren’t owned or supported by the real brand:

deloitte serious question what is deloitte and what do they do?

I repeated that process across all social platforms.
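The owned-versus-earned rule above can be sketched in a few lines. This is only an illustration of the rule as I described it: the function name and the `citation_owner` field are assumptions, and in practice that field came from the manual review of each profile or community, not from anything automatic.

```python
def classify_social_citation(prompt, citation_owner, brand_names):
    """Label a social citation 'owned' or 'earned'.

    prompt:          the prompt text sent to the AI platform
    citation_owner:  who controls the cited profile or community,
                     or None if no owner could be confirmed (hypothetical field)
    brand_names:     brands to look for in the prompt
    """
    prompt_lower = prompt.lower()
    for brand in brand_names:
        mentioned = brand.lower() in prompt_lower
        is_owner = (
            citation_owner is not None
            and citation_owner.lower() == brand.lower()
        )
        # Owned only when the prompt names the brand AND the brand runs the source.
        if mentioned and is_owner:
            return "owned"
    return "earned"


brands = ["MGM Resorts", "Deloitte"]
# A company profile run by the brand named in the prompt.
print(classify_social_citation("What is MGM Resorts known for?", "MGM Resorts", brands))
# A subreddit that states it is not affiliated with the brand.
print(classify_social_citation("What does Deloitte do?", None, brands))
```

Anything that fails either half of the rule falls back to earned, which matches how unaffiliated subreddits were treated.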

As you’d expect, LinkedIn company profiles are the main source for Brand Awareness queries.

For instance, one citation from Google AI Overviews for the query: “What is MGM Resorts known for?” returned MGM Resorts’ company profile page:

MGM Resorts international reddit

Now, what happens if we expand this owned vs. earned split to the entire dataset to answer the question: how effectively can you influence LLMs with your own content?

Earned Media Citations Account for About 80% of Citations

Citations from all earned media sources, such as news, social media, and wikis/forums, account for 47.70% of the dataset.

But when we excluded informational queries that didn’t mention brands in the prompts, earned jumped to about 80%.

True owned content (blogs written by the site, product pages, and service pages) accounted for only 20.7% when informational prompts were excluded.

overall breakdown of earned vs owned citations
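To make the jump from 47.70% to about 80% less mysterious, here is a rough sketch of the arithmetic: the earned share is recomputed after dropping one prompt type from the denominator. The rows below are made up purely to show the mechanics, and the field names (`prompt_type`, `source`) are assumptions, not the study's schema.

```python
def earned_share(citations, exclude_prompt_types=()):
    """Share of citations labeled 'earned', after dropping excluded prompt types."""
    kept = [c for c in citations if c["prompt_type"] not in exclude_prompt_types]
    if not kept:
        return 0.0
    earned = sum(1 for c in kept if c["source"] == "earned")
    return earned / len(kept)


# Hypothetical rows: informational prompts lean on third-party content,
# while brand-related prompts lean on earned media.
rows = (
    [{"prompt_type": "informational", "source": "third_party"}] * 3
    + [{"prompt_type": "informational", "source": "earned"}]
    + [{"prompt_type": "evaluative", "source": "earned"}] * 3
    + [{"prompt_type": "brand_awareness", "source": "earned"}] * 2
    + [{"prompt_type": "brand_awareness", "source": "owned"}]
)

print(f"all prompts: {earned_share(rows):.2%}")
print(f"without informational: {earned_share(rows, {'informational'}):.2%}")
```

The earned count barely moves, but the denominator shrinks, so the share rises. That is the same effect we saw when brand-agnostic informational prompts were removed.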

So, we may be tempted yet again to say that no, we can’t influence LLMs with our own content, but, again, with AI, everything is prompt-dependent.

When we look at this breakdown by prompt type, I do see a path for owned content to influence LLM output.

About 40% of Brand Awareness Citations Came From Owned Content

Earned content still accounts for almost all citations from evaluative queries (93.75%), but, as we saw in social, only 60.15% of citations from brand awareness queries are earned.

earned v owned v third party citations by prompt type

So about 40% of the time, for queries about your brand, AI will cite content that you own.

Expert Opinions

Pull up your About page and your top product or service page right now.

Read it and ask yourself: Is every claim you’d want an AI to repeat actually on the page, in plain language, as a provable and complete logical statement?

“We’re the leader in X” isn’t a claim AI can evaluate.

“We’re the only AI vendor that [specific proof]” is.

If those pages are written only for a human skimming a hero section, you’re leaving that window of influence mostly closed.

A useful 30-minute exercise: type “[your brand name] is known for” into ChatGPT, Gemini, and Claude.

Compare what comes back to a list of your UVPs, your brand identity, and what you want your ICPs to understand first and foremost.

The gap between those two things is your About Page and Sales Page editing queue – and something you could address with your next PR campaign.

This tracks with a test last year from Wil Reynolds and the Seer Interactive team, where they simply changed some wording in their site’s footer and saw how easily it surfaced in AI.

Traditional SEO Metrics Don’t Correlate Directly With Citations

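There is no meaningful correlation between traditional SEO metrics (Domain Rating, Domain Authority, links, or organic traffic) and how often a site gets cited in LLMs. Every Spearman coefficient we measured falls between −0.13 and +0.01: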

Metric                    Spearman r
Domain Rating (DR)        −0.111
Domain Authority (DA)     −0.128
Site Referring Domains    −0.108
Site Organic Traffic      −0.089
Page Traffic              −0.017
Page Keywords             +0.002
Page Links                −0.095
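For anyone who wants to run the same kind of check on their own data, here is a minimal, dependency-free sketch of a Spearman rank correlation (Pearson correlation computed on ranks, with ties sharing an average rank). The sample numbers are invented for illustration and are not from our dataset.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman r: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: one SEO metric and a citation count per site.
domain_rating = [90, 75, 60, 40, 20]
citation_count = [3, 12, 5, 9, 7]
print(round(spearman(domain_rating, citation_count), 3))
```

Values near zero, like the ones in the table, mean that sorting sites by the metric tells you almost nothing about how they sort by citations.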

However, this doesn’t mean SEO is irrelevant; there is an important nuance: we aren’t measuring query fan-out.

LLMs don’t pull from the whole internet.

They pull from a retrieved set of documents, based on how the prompt gets expanded into different variations (often called query fan-out).

This is why we see so many citations from niche blogs and long-tail comparison pages.

Those kinds of pages might not dominate traditional search metrics, but they rank well for specific variations (and probably contain content that’s easier for LLMs to extract, compare, and synthesize).

What this tells me is that how and why LLMs decide to cite something is still a metric we simply don’t have yet.

Expert Opinions

Stop measuring only WHETHER you appear — start measuring HOW you appear.

Run these two prompts across ChatGPT, Gemini, and Claude, week over week:

  • “What are the main alternatives to [your brand] in AI? Once you’ve listed them, compare [your brand] against those alternatives for [use case] and tell me what you’re basing that on.”
  • “Based on that comparison, what is [your brand] most specifically known for that its alternatives are not? What source are you drawing on for that?”

For each result, note three things:

(1) Verdict: Were you recommended, ranked, filtered on fit, warned against, or omitted entirely?
(2) Sourcing: Did the AI cite a specific URL, or is it drawing on training data with nothing attached?
(3) Language: What exact words did it use to describe your advantage, and do they match what your best content actually says?

What you’re building with this log is a gap register. Each gap has a different fix, and closing them systematically is exactly what Citation Optimization is for.

Start here with our Citation Optimization Framework.
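If a spreadsheet feels too loose for this log, a small structure like the one below could hold each week's result. It is only a sketch: the class, field names, verdict labels, and the simple phrase-matching check are all assumptions made for illustration, not part of the Citation Optimization Framework.

```python
from dataclasses import dataclass
from typing import List, Optional

# Verdict labels adapted from the three-point checklist above (names are hypothetical).
VERDICTS = {"recommended", "ranked", "filtered", "warned_against", "omitted"}


@dataclass
class LogEntry:
    week: str                 # e.g. "2026-W05"
    platform: str             # e.g. "ChatGPT"
    prompt: str
    verdict: str              # one of VERDICTS
    cited_url: Optional[str]  # None when the answer had no source attached
    language: str             # the exact words used to describe your advantage

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")


def gaps(entry: LogEntry, claimed_phrases: List[str]) -> List[str]:
    """Return which of the three checks (verdict, sourcing, language) show a gap."""
    found = []
    if entry.verdict in {"omitted", "warned_against"}:
        found.append("verdict")
    if entry.cited_url is None:
        found.append("sourcing")
    text = entry.language.lower()
    # Crude check: does the AI's wording contain any phrase you actually claim?
    if not any(phrase.lower() in text for phrase in claimed_phrases):
        found.append("language")
    return found


entry = LogEntry(
    week="2026-W05",
    platform="ChatGPT",
    prompt="What is [your brand] most specifically known for?",
    verdict="ranked",
    cited_url=None,
    language="known for low prices",
)
print(gaps(entry, ["fastest onboarding"]))
```

Logging the same prompts every week makes it easy to see whether a gap is a one-off or a pattern.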

Takeaways and Tactics to Consider

Although there are still many more ways to slice this data, I want to stop here and give some takeaways from what I’ve learned.

1. Stop Thinking in Keywords and Start Mapping to Prompts

The biggest takeaway for me is how much the citations differ across prompt types.

If you want to appear for “best auto rentals in NYC,” you need to understand how users might search for that term, because LLMs don’t just answer the query; they expand on it.

For instance, if you use a tool like Otterly’s Query Fan Out Generator, you’ll see the ways that Google AI Mode expands on a normal keyword-driven search.

otterly fan out generator

Instead of a simple takeaway like, “you need to get your brand mentioned in listicles,” you can see that it’s actually important to focus on things like:

  • Comparisons between brands
  • Location-based queries
  • Reviews
  • Supportive articles like toll policies for rental cars
  • Discounts
  • Pricing

So the real strategy isn’t just ranking for a keyword, it’s showing up across the fan-out ecosystem of the prompts.

In one study, the Citation Labs team tested building third-party comparison microsites to see whether they would influence ranking.

When the microsites were cited, the client ranked better than when they were not.

So, LLMs may often favor sources that have already done the comparison work, because that structure is easier to reuse when generating recommendations.

Then you can work backward to influence those either through owned or earned channels.

2. You Do Have Some Control Over How You Appear

Although about 80% of citations are earned once brand-agnostic informational prompts are excluded, there are nooks and crannies where you can own your own messaging.

Remember, about 35% of brand awareness citations came from About/Product Pages.

We also saw this with internal newsrooms, where brands published press releases about themselves.

This owned content even extends into social…

3. Your LinkedIn Company Page is a Free Brand Awareness Asset

Building on the previous point, when it comes to social, LinkedIn company profiles were virtually the only owned social sources surfaced.

Meaning, if someone asks AI, “What does [your company] do?”, optimizing your LinkedIn page is a free space to potentially influence LLMs.

I’m also singling out LinkedIn here because there have been numerous recent reports about how often it appears in citations.

Make sure your description, specialties, and about section are written the way you’d want AI to answer that question, not the way a recruiter would read it.

4. ChatGPT is the Most Controllable Platform Through Owned Content

Speaking of control, although ChatGPT may be used way less than Google as a search platform, it is still a very controllable platform.

68.9% of ChatGPT citations in our dataset came from owned content, compared to 35–41% on Google’s platforms.

So, again, the tactical move is to prioritize getting your own site content, like product pages, about pages, and internal newsrooms, optimized for how you want LLMs to present your brand.

5. Citations Aren’t Everything

Although I realize I’m probably undermining the value of this whole study, it’s very important to note that citations are clicked on very little (despite what Google tells us).

But it’s the same argument I’ve always made for featured snippets: even if no one clicks the links shown, you would still rather your brand appear instead of a competitor’s.

Methodology

Using XOFU, an AI citation monitoring tool powered by Citation Labs, we analyzed 4 million citations from across 10 industries and 3,600 prompts (which you can see here) to understand how prompt type influences AI citations in ChatGPT, Google AI Mode, Google AI Overviews, and Google Gemini.

We broke our prompts into three distinct categories:

  • exploratory/informative (aka top funnel queries)
  • evaluative/decision-making (aka bottom funnel)
  • general brand awareness

The industries we looked at were:

  • Business
  • Education
  • Energy
  • Entertainment
  • Finance
  • Healthcare
  • Hospitality
  • Retail
  • Technology
  • Travel

We collected data for a week starting on January 27, 2026.

You can see all of the citation data here.

Vince Nero


Vince is the Director of Content Marketing at Buzzstream. He thinks content marketers should solve for users, not just Google. He also loves finding creative content online.

His previous work includes content marketing agency Siege Media for six years, Homebuyer.com, and The Grit Group. Outside of work, you can catch Vince running, playing with his 2 kids, enjoying some video games, or watching Phillies baseball.


