- AI citations require real brand equity—authority, links, and deep coverage of the topic you want to own
- The fastest way to influence AI is to show up in its live “grounding” research, not the training data.
- Content should mirror the model’s reasoning chain, targeting the subtopics and queries it searches.
- In low-competition markets, strong third-party mentions can be enough; competitive queries demand true topical authority.
- In the LLM era, co-mentions on high-ranking pages often matter more than traditional backlinks.
In this episode, I had Bernard Huang, founder of the SEO platform Clearscope, walk us through how to get your brand cited by AI. After watching a fantastic webinar from Bernard, I wanted to get him on our podcast to pick his brain on the topic.
Bernard walks through the models’ inner workings and how you can work to get mentioned by tools like ChatGPT or Gemini. Then, most importantly, he shares how digital PR fits into all of it.

Here’s a slightly edited transcription.
What does it take to get into citations?
Bernard Huang (Clearscope) (02:29)
What does it take? Yeah, I think the short story is that it takes brand equity, which means your brand has been around for at least a little while.
It’s contributed to the topic that you want to be associated with, and it has authority, you know, it has links from reputable sources, and it has covered the topic that you want to be cited for comprehensively and well.
That’s it in a nutshell.
What’s the difference between getting cited and getting mentioned in the answer?
Bernard Huang (Clearscope) (03:45)
Sure. Yeah. I think this is a great forum to bring it up in, for an audience that’s tuned in to how AI generates responses.
So if you think of yourself as a user and you’re prompting Gemini, ChatGPT, Claude, and so on, you have to ask yourself: what mechanisms does an AI have to generate the answer or the response that you’re going to get back as the end user?
And when you break it down, what you start to understand is that there are three distinct layers in how an AI formulates a response.
At the foundational level, you have the training data that an AI has been trained on. And this is typically billions of different web documents, Common Crawl data, Wikipedia, and whatever secret sauce elements that these different models are using.
And that’s the underlying corpus of information an AI model builds on.
So that’s step one.
Let’s say I’m asking ChatGPT, what are the best SEO tools?
Then ChatGPT has a good understanding of SEO tools, and how all of that’s been discussed.
And it starts by looking at its own knowledge of that particular topic.
Step two is very interesting. But this is the reason why SEO plays such a critical role in how AIs generate responses.
And this is what’s typically referred to as the grounding layer. I call it the validation layer, because the agent is doing research on behalf of the prompter, right?
If again, we go back to the best SEO tools, what the agent or the AI model wants to do is to say, okay, well, I have an understanding of what the best SEO tools are, but I need to validate that my understanding is still up to date, fresh and relevant with what is currently on the market.
This is what agents will search, right?
They’ll perform AI web searches for best SEO tools, maybe best keyword research tools, and so on and so forth, because they want to augment and make sure that the response they’re about to give back is up to date on pricing, features, and, you know, different new tools that may have entered the market.
So that’s step two.
Step three is simply the personalization and context layer.
So as you may be aware, the AI models want to make sure that the answers they’re getting back to you are personalized and contextual to what you need.
And so if I’ve had past conversations where I’ve told the model, you know, I work at a software company, and I’m based in Austin, then the AI model will typically remember that again, depending on your settings.
And they’ll use that to infer that the best SEO tools for the B2B software segment are X, Y, or Z.
And that’s where you get the last layer of augmentation before the response gets handed back to you.
So in essence, when we’re thinking about this, and I believe the original question is, OK, well, how do you, say, influence the training data and or win an AI citation?
What we have to start to think about is, well, layer number one, which is the training data set, and layer number two, which is the research that the model does.
OK, how do we then influence the training data, or the research, the grounding, that the agent has to do? What we found is that the fastest way to influence the model is not at the training data layer, because assembling such a big data set and training a new model on it is very expensive and takes a long time.
So it’s slow.
The fastest way is to intercept the model when it’s doing research on behalf of the user, when it’s making sure that the information it’s about to give out is up to date and relevant.
So what we’re seeing is that you just create relevant high-quality content that targets the AI reasoning chain.
And when I say reasoning chain, I just mean that you can peel back the layers on how all of these different AI models work by looking at how they’re thinking about the prompt that you’ve given them.
And oftentimes it will show you like, okay, well, to find the best SEO tools, I’ve begun my search by searching best SEO tools.
I found that there are, you know, like four different categories. There’s on-page, off-page, technical, so on and so forth.
And it’s like, I am investigating each one by performing additional searches for best technical SEO tools, best keyword research tools.
And then you take that information and you say, OK, well, if I want to be present for that AI model, then I need to create content that targets the way that the AI is researching the topic.
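To make that last step concrete, here is a minimal sketch of turning an observed reasoning chain into a content-gap list. The fan-out queries, the page titles, and the word-overlap heuristic are all illustrative assumptions, not how any model actually scores coverage.

```python
# Sketch: turn an observed AI reasoning chain into a content-gap list.
# The fan-out queries and page titles below are hypothetical examples.

def content_gaps(fanout_queries, owned_page_titles):
    """Return the fan-out queries that no owned page appears to target.

    A page "targets" a query here if every query word occurs in its
    title -- a deliberately crude stand-in for real relevance scoring.
    """
    gaps = []
    for query in fanout_queries:
        words = set(query.lower().split())
        covered = any(words <= set(title.lower().split())
                      for title in owned_page_titles)
        if not covered:
            gaps.append(query)
    return gaps

# Fan-out searches observed in a model's visible reasoning chain:
queries = ["best seo tools", "best technical seo tools",
           "best keyword research tools"]
pages = ["The Best SEO Tools in 2025",
         "Best Keyword Research Tools Compared"]

print(content_gaps(queries, pages))  # only the technical-SEO query is uncovered
```

The output would be the uncovered query, i.e. the next piece of content to produce for that reasoning chain.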
Do you need to build topical authority and relevance through owned content, or can you just earn citations through earned links?
Bernard Huang (Clearscope) (09:47)
Yeah, I think the short answer is that there’s no one-size-fits-all in this particular world.
I say that in the sense that, let’s say you’re a local plumber in a smallish city like Kyle, Texas, or something like that. In that particular world, somebody geolocated in Kyle, Texas might search “best plumbers in Kyle, Texas,” or they might not even need the “Kyle, Texas” part.
Cause that’s an inference.
Then it’s possible that simply having a Google My Business and a Yelp page is enough, since there’s not much competition, right?
There aren’t that many plumbers who serve the area of Kyle, Texas.
So I think in those cases, strong third-party signals, whether they be Google My Business, Yelp, Reddit mentions, you know, so on and so forth, I think we’ll go a long way in a not-so-competitive ecosystem and environment.
That said, right, as you’re trying to influence the model on more competitive topics and higher up the funnel, in terms of informational queries, I think the models are going to want more and more demonstrated topic expertise.
It’s like, if that plumber in Kyle, Texas wants to be cited for “what is a plumber,” then you can imagine they’re going to have to do a lot more work to build up authority on the topic: create tons of content, earn citations, and get backlinks from other sources to basically prove that they are authoritative.
I would say it really depends on your industry.
And it also really depends on what stage of the funnel you want to go after.
But we’ve seen things as simple as creating one piece of content that’s like, I’m a plumber in Kyle, Texas. And that solely being enough for that plumber to start to show up in commercial prompts for that area.
Is informational content helpful if it’s not getting clicked?
Bernard Huang (Clearscope) (13:39)
I think the jury is still out in terms of the influence of informational top-of-funnel content on your site. What I mean by that, more concretely: take NerdWallet, right, who’s an authority on personal finance, credit cards, so on and so forth.
They in the past would have had tons of informational content on “What is a credit score?”, “How to improve your credit score?”, “What is a travel credit card?” “How is that different than an airline credit card?”
So on and so forth.
Whereas nowadays, when people Google “airline credit card versus travel credit card,” they get a nice little AI overview that says, well, these are the key differences and this is what you have to pay attention to.
And all of a sudden NerdWallet, as the publisher of that content, is no longer receiving as much, if any, traffic to those pages, right?
Because they end up being a little card that says, yes, that particular piece of content was used as a source, and if you want to check out the sources, here they are.
So that begs the question, is that content still valuable, right?
In the sense that it’s not really getting you much traffic, but it is building authority with the model that your brand, and more specifically your domain, knows what it’s talking about for that particular topic.
All that to say, the jury is still out on how much that is worth.
I have to believe that topic authority, which is your domain’s ability to be visible during the research process that agents take, whether it’s the training data itself or the, you know, grounding with web search that happens, has to be worth something.
I just don’t know. Is that 60%? Is that 40%? Is that 10%?
I don’t know.
And I think a lot of people are trying to figure that out, but I haven’t seen conclusive evidence pointing to a strong or weak, like, rating at this particular point.
What’s your take on the depth of content?
Bernard Huang (Clearscope) (17:10)
Yeah, I mean, honestly, it’s the Wild West when it comes to all this AI stuff.
And yes, I know exactly the research piece you’re talking about, where there’s an allotment of 2,000 characters that I believe Gemini allocates to its grounding budget. The model can only be influenced up to that number of characters, and it distributes that budget by ranking: if you’re rank number one, maybe you absorb 50% of the budget used to augment the overall response.
I mean, I think it’s not worth getting into the semantics and the micro of it all; 2,000 today could be 4,000 tomorrow, which could be 500 one year from now.
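For intuition only, here is a toy sketch of the rank-weighted grounding budget described above. The 2,000-character total and the 1/rank weighting are assumptions for illustration; they are not documented model behavior.

```python
# Toy sketch of a rank-weighted grounding budget. The total character
# budget and the 1/rank weighting are illustrative assumptions only.

def split_budget(num_sources, total_chars=2000):
    """Split a character budget across ranked sources, higher ranks first."""
    weights = [1 / (rank + 1) for rank in range(num_sources)]  # rank 1 weighs most
    total = sum(weights)
    return [round(total_chars * w / weights_total) if False else
            round(total_chars * w / total) for w in weights]

shares = split_budget(3)
print(shares)  # rank 1 absorbs roughly half of the 2,000-character budget
```

Under this weighting, the top-ranked source absorbs about half the budget, which matches the rough 50% figure from the research Bernard mentions.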
So I think that, you know, when I look at some of the data studies that are coming out, I say, okay, well, you know, at a high level, just taking a couple steps back, it’s clear that AI, you know, wants to produce a highly accurate, highly relevant and somewhat personalized response when it’s prompted with anything that you ask it.
And so, I think you’re going to see varying degrees of budget, just like, okay, how much do I need to rely on your content versus how much should I rely on my own training data to give a response?
I think that should probably be a fluid sort of analysis. Say I ask, “Who was the first president of the United States?” That’s, in essence, a fact, and the AI, having read all of the answers on the internet, has come to a 99.9999% confidence that it’s George Washington. So it’s going to allocate no budget to that particular prompt, because everything it has seen is the same; therefore the answer is always true.
There it is, right?
Now, if you were able to bluff the model into thinking, well, George Washington is close to Abraham Lincoln, and convince it to say Abraham Lincoln instead, that would obviously be a huge problem for AI.
Anyway, I think there are a lot of different theories out there.
And I’ll just say that the core theories that exist are actually somewhat disparate.
On one hand, you have, I think, advocates who talk about cosine similarity and knowledge graphs: entity relevance and cohesiveness.
And what that refers to is the idea that to cover a topic comprehensively, there are key entities and subtopics that ought to surround that particular topic. So, you know, trying to go with, I guess, best SEO tools as an example.
A great piece of content that talks about it probably has to include Ahrefs, SEMrush, Moz, and so on, because those are the best SEO tools, and leaving out one of those particular entities would demonstrate a lack of comprehensiveness for that particular topic.
So one line of thinking says, okay, to vie for inclusion, I not only need to talk about, you know, all of those different things that are known and closely associated with the topic, but you know, I need to add onto that particular list and be, you know, also semantically coherent and relevant.
So then you would say, okay, well, if we’re talking about the best SEO tools, we would add Screaming Frog, Clearscope, or Sitebulb, right?
All of these other tools that are closely enough related to the main topic, basically trying to pack it all in as tightly as possible.
That refers to aspects of what we’re seeing with the chunking, the cosine similarity, the topical relevance, the LSI-type stuff. So that’s one school of thought.
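For readers unfamiliar with cosine similarity, here is a minimal, self-contained illustration. Real systems compare dense embeddings; the bag-of-words counting below is a toy stand-in, and the topic vocabulary and passages are invented.

```python
# Minimal illustration of the cosine-similarity idea: score how closely
# a passage's vocabulary matches a topic's expected entities. Real
# systems use dense embeddings; word counting is a toy stand-in.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

topic = Counter("seo tools ahrefs semrush moz keyword research".split())
passage_a = Counter("the best seo tools include ahrefs semrush and moz".split())
passage_b = Counter("our plumbing service covers drains and water heaters".split())

print(cosine(topic, passage_a) > cosine(topic, passage_b))  # True
```

The on-topic passage scores higher simply because it shares more of the topic’s key entities, which is the intuition behind grading content on entity coverage.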
The other school of thought talks about information gain, and information gain at a high level just refers to the fact that topics evolve over time.
And just because the best-SEO-tools lists don’t currently include much of the AI visibility stuff on the market, whether it’s Peak, Profound, Scrunch, whatever, it doesn’t mean those aren’t great SEO tools, because AEO, or GEO, is converging with SEO.
And so a piece of content talking about the best SEO tools might include a bunch of disparate information sources to say, you know, this is a future state of where this topic might be heading.
And so there’s another school of thought that says it’s not about creating more and more tightly and comprehensively packed entities, but about expanding the topic and showing that you’re adding to it in unique and interesting ways, so that the topic gains information. So anyway, I think those are the two main current themes: you have a lot of people saying, OK, how do we pack in more entities?
How do we make the chunks and the passages tight, semantically relevant, and make the algorithm happy?
And then you have this other school of thought that’s like, how can we add to this topic in a meaningful way that people aren’t really even thinking about at the moment?
And have that be how we come out as unique and interesting, and something that is AI citation-worthy.
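The information-gain idea can be sketched just as simply: which entities does a draft add that the current top-ranking pages don’t already mention? The entity lists below are hypothetical.

```python
# Sketch of the "information gain" idea: which entities does a draft add
# that the current top-ranking pages don't already cover? The entity
# lists are hypothetical.

def novel_entities(draft_entities, corpus_entity_sets):
    """Entities in the draft that appear in none of the existing pages."""
    seen = set().union(*corpus_entity_sets)
    return sorted(set(draft_entities) - seen)

# Entities extracted from two existing top-ranking "best SEO tools" pages:
existing = [{"ahrefs", "semrush", "moz"},
            {"ahrefs", "screaming frog", "semrush"}]
# Entities in our draft, including newer AI-visibility tools:
draft = ["ahrefs", "semrush", "profound", "scrunch"]

print(novel_entities(draft, existing))  # the AI-visibility tools are new
```

Under this school of thought, the novel entities are exactly what makes the draft citation-worthy rather than redundant.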
How does digital PR fit into AI citations?
Bernard Huang (Clearscope) (24:54)
Yeah, I mean, in a nutshell, the practices of off-page answer engine optimization are very similar to SEO with a couple caveats.
Caveat number one is that there is a distinction between links and co-mentions.
So what I mean by links in a traditional world, I log into my BuzzStream, I see that I wanna pitch this journalist at Forbes, and I write to them and they say, okay, it’s gonna cost you whatever, $3,000 for an inclusion.
And I’m like, oh, but do I gotta do a follow link? And then they’re like, oh, if you wanna do a follow link, that’s $5,000. And then it’s this sort of dance.
And then they put it in as sponsored, because you know, that’s what they’re legally obligated to do, even though I want to fight for it not to be sponsored.
In any case, you get this link, and the link passes equity because the crawler, right?
We’re talking about Google here.
The crawler looks at Forbes and says, wow, there’s a link to BuzzStream. Like, okay, well, let me follow that.
They follow that link to the page or homepage it points to and say, okay, well, Forbes is basically voting that BuzzStream, whether it’s the article or the homepage, is good.
And that’s how SEO worked.
But large language models are different. Large language models are, again, built on corpora of documents, and they examine the associations between topics and entities.
And they’re constantly looking at that. So in the large language model world, let’s take Gemini.
For Gemini, it looks at a document that’s written on Forbes and it says, okay, that’s interesting. And then it sees BuzzStream mentioned alongside entities or topics like journalist outreach or PR or building authority.
And then it essentially creates an internal idea that BuzzStream, as an entity, is related to all those particular topics.
It did not need to follow the link from the Forbes article to BuzzStream at all to make the inference that BuzzStream helps with those particular kinds of topics, and to build that relationship. So that’s one of the key differences that we’re seeing play out in terms of digital PR: the link is becoming less important.
And this is why you’re seeing a lot more shady AEO and GEO practitioners going around, like, “Let me just spam Reddit for you.”
When they’re spamming Reddit, all they’re doing, and we’ll take Clearscope as an example, is trying to inject as many comments as possible that say, Clearscope: best SEO tool, best AEO tool. Ideally, that would be the footprint, but then mods and everyone would just ban all those accounts.
So it has to sound somewhat human-written. So they spin it that way.
But that’s why you see comment spam being something people recommend and get results with: it’s more about creating linkages between topics and entities than about backlinks.
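A rough sketch of that co-mention mechanism: count how often a brand appears in the same document as the topics you care about, with no link-following involved. The snippets and the plain substring matching are illustrative only.

```python
# Sketch of how co-mentions could build brand-topic associations without
# links: count how often a brand appears in the same document as a topic
# term. The document snippets are made up for illustration.
from collections import Counter

def co_mentions(brand, topics, documents):
    """Count documents mentioning both the brand and each topic term."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        if brand.lower() in text:
            for topic in topics:
                if topic.lower() in text:
                    counts[topic] += 1
    return counts

docs = ["BuzzStream streamlines journalist outreach for PR teams.",
        "Top digital PR platforms: BuzzStream, Pitchbox, and more.",
        "A guide to journalist outreach without any tooling."]

print(co_mentions("BuzzStream", ["journalist outreach", "pr"], docs))
```

No link is ever followed: the association comes purely from the brand and the topic appearing in the same documents, which is the distinction Bernard draws from crawler-based link equity.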
How would a reactive PR campaign benefit AI?
Bernard Huang (Clearscope) (29:47)
Yeah, I think there is like a first-order benefit and a second-order benefit.
So I think the first-order benefit is true for a lot of PR, not just AI.
A non-AI benefit is that you just get the halo effect.
And the halo effect means you have social media talking about it, you have local news outlets talking about it, and through that there’s just this general buzz that everybody picks up on, which is generally a good thing. Right? That’s why there’s that saying: all PR is good PR. So that’s kind of the first-order effect.
I think the second-order effect is that you start to create documents within authorities on the web that create new linkages between your brand and the topics that you care about, for better or for worse. Right? Like this is why, you know, negative PR is a thing.
And this is also why a lot of people who get caught in splashes of bad press will lean on positive PR to basically shove that stuff down.
I’m sure you’ve probably heard of it, but, you know, when Neil Patel got sued over FTX, because he did a deal with them and, you know, walked away with a bunch of money.
Neil ended up donating some money to a charity, and then if you Googled Neil Patel, you would see his donations to the charity rather than the fact that he was being sued by the FTX claims lawyers.
And so in any case, what you’re doing as a second-order effect is that you’re just creating documents that are generally from higher authoritative sites.
And then those are being used either during the training data step or at least during the grounding step.
And then that helps influence the model to push you, you know, closer to or away from the entities you would want to be surrounded by.
Vince Nero (32:08)
Because it is a probability play, right?
It’s: for someone searching for Neil Patel, how likely is the next thing to be “Neil Patel plus X”? Then fill in that blank. As the LLM tries to answer, it’s all probabilistic, right?
So if all the articles are talking about FTX, then the next words are going to be about FTX.
But if you’ve pushed up all that news about the charity, then the next word is likely to be about the charity. I mean, is that kind of how that works?
Bernard Huang (Clearscope) (32:45)
That’s right. That’s right. Yeah. That’s essentially how it worked in SEO, which is why there are, again, lots of parallels between influencing rankings and influencing AI models.
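That probabilistic intuition can be illustrated with a toy bigram model: trained on a handful of invented headlines, the word most likely to follow “patel” is simply whatever co-occurs with it most often, so shifting the mix of coverage shifts the odds.

```python
# Toy illustration of the probability point: a bigram model over a tiny
# corpus predicts what follows "patel" based purely on raw co-occurrence
# counts. The headlines are invented examples.
from collections import Counter

def next_word_probs(corpus, context):
    """P(next word | context) from raw bigram counts over the corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - 1):
            if words[i] == context:
                counts[words[i + 1]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

headlines = ["patel donates to charity",
             "patel donates millions",
             "patel sued by ftx"]

probs = next_word_probs(headlines, "patel")
print(probs["donates"] > probs["sued"])  # more charity coverage shifts the odds
```

Real LLMs condition on far more than the previous word, but the direction of the effect is the same: more documents about the charity means higher probability mass on charity-related continuations.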
How would you track success with digital PR and AI?
Bernard Huang (Clearscope) (33:16)
Yeah, that’s a brilliant question.
So I think the way to track success is to get a prompt tracker. Or, I mean, you don’t need a prompt tracker; just ask Gemini and ChatGPT what the best commercial queries are for your brand and your industry.
Just as an example, with Clearscope, it’d be the best SEO tool.
You know, those would be different commercial questions or prompts that we care about owning and being recommended when users ask for them.
It’s very easy: you could go to AI Mode or Gemini or ChatGPT and just ask, “What are the best X tools or services,” or whatever it is that you care about, and then see what it responds with.
Then we would say, well, you want to build a PR campaign, which basically influences the way the models think about your brand, so that you get recommended, or you somehow move up the list, or the sentiment around your brand changes, right?
Cause sometimes you ask, what are the best SEO tools?
And then you’re way down on the list.
It’s like, okay, you’re on it, but you’re not high up on the list.
And then the second thing is that you could be on it, but it could be a mixed thing, right? Like with Clearscope.
Like some say it’s too expensive.
Some say it doesn’t work, right?
So there’s a lot of different ways to think about visibility, but the tried and true is: find the commercial prompt that you care about and then do a digital PR campaign.
Don’t worry too much about links.
I don’t think links are as important. I would say that I think what’s important is rankings.
So you’re going to be much better off if you can have a digital PR campaign that targets a website that is on the front page of Google already.
So it’s much better to Google best SEO tools, see that there’s a Backlinko article or something like that, and then try really hard to get Backlinko to add Clearscope, than it is to just cold pitch a bunch of stuff and get some journalists to write about what you’ve done.
I’m not saying that that doesn’t work.
I’m just saying you’re going to get a lot more bang for your buck because that high ranking website is not only going to be crawled faster, it’s going to be seen as more authoritative by the AI models and therefore the weighting that the models give to that particular document is going to be way higher than something that’s a bit more unproven.
How does Clearscope fit into that kind of topical authority workflow?
Bernard Huang (Clearscope) (36:58)
Sure. So again, couple of caveats.
A, I recognize that AI visibility is a fast-moving and fast-changing landscape. You know, nothing is completely guaranteed. GPT going from 5.2 to 6 could just change the entire rules of the game.
And, you know, nothing is super certain.
Also, Google’s AI Mode, and I think it’s a matter of when, not if, when that fully releases, that’s going to have different implications for how all of this stuff works.
But, all right.
So, ClearScope, what we do is that we try a lot of different experiments, and then we validate what we’re seeing happen.
And then we do our best to help our customers with the same. And we wrap it all around our software product.
Where we start to have differences in how at least we’re thinking about things is number one, by giving people prompt tracking, which on the surface is fairly commoditized, right?
You could go to any of these prompt-tracking services that keep popping up like every week and charge you, you know, next to nothing.
And all they’re doing is hitting different APIs and saying, you know, was your brand mentioned and was your domain cited, right?
That’s very stock, very commoditized stuff, but we go one level deeper to help you understand how the agents research that particular topic.
So, if we go back to the best SEO tools example, we would have examined the reasoning chain for ChatGPT and Gemini.
And we would say, hey, when Gemini looked deeper into this particular topic, it searched for the best SEO tools for agencies. It searched best technical SEO tools.
And that’s very interesting stuff, because that’s content you should be producing to increase your domain’s surface area and prove to the model that you belong closely with the topics you care about. So that’s pretty unique to what we’re doing, and we’re constantly refining how that process works.
Step two in this process is to create high-quality and relevant content.
Now, high-quality and relevant content has almost always been subjective.
What makes a piece of content high quality and relevant?
And what our bread and butter is, is helping you understand what that looks like from a comprehensiveness standpoint.
So back to an earlier statement that I made: there are different schools of thought in terms of what high quality means. In the Clearscope world, we grade you on semantic comprehensiveness.
And then we suggest additional entities and concepts you should probably include to make your content even more comprehensive in covering all known entities.
And we help you package all of that together. We’ve got an AI draft builder and, soon, a way to automatically optimize your content for relevance.
Lastly is tracking, right?
Tracking in this new world also looks different.
There has been no shortage of people being really frustrated and angry about their loss in clicks.
And it’s because the AI’s responsibility is no longer to send traffic to your site. It is simply to answer the question that the user asks. So traffic is no longer a good way of demarcating whether you’re doing well in SEO or AEO.
Instead, we have to think about it differently. And the way that we think about it is more or less how everybody else thinks about it: how many times is your brand being mentioned for the topics and prompts that you care about?
And how many pages on your website are being cited for the prompts and topics that you care about? So we give you visibility into that as well.
