
Optimizing for LLMs is a Myth. Understand the 4 Pillars of Retrieval Systems.

TL;DR: There’s a lot of noise and hand-waving in the SEO space right now, with experts claiming to be able to “increase LLM visibility” or optimize for LLMs. But that’s not at all how AI systems work. In this article we break down the concept of retrieval mechanisms, and what it takes to be considered for retrieval and citation in a synthesized response.

Over the past couple of years, marketing teams have watched organic traffic decline steadily and significantly, despite strong organic rankings. The root cause is a general shift away from traditional search engines toward more complex queries in AI-search tools, a path that often ends with zero clicks. And so, naturally, the question marketers ask is: “how do we ensure our brand achieves and maintains visibility?”

Unfortunately, due to this demand, the SEO community jumped into the world of AI to develop playbooks and ‘best practices’, and the result has been a tremendous amount of hand-waving and noise. One of the most common misunderstandings we see is the notion of “optimizing for LLMs (Large Language Models)”, as though by making adjustments to content on a website, one can impact an LLM’s index and thereby increase visibility. 

But this is a basic misconception. LLMs are exactly what the name implies: “models” that are pretrained and largely static between periodic updates - they do not browse the live web. Instead, AI search tools (e.g. ChatGPT Search, Google AI Overviews, or Perplexity) use a Retrieval-Augmented Generation (RAG) architecture: they query a separate, frequently updated index (a ‘Retrieval Mechanism’) and feed the retrieved results into the LLM to synthesize an answer.

In non-technical terms: when ChatGPT is confident it can answer from its training data, it does. When it isn’t confident, or the answer requires searching (e.g. it’s time-bound), the retrieval layer finds relevant pages and extracts the information needed to answer the user’s query.
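The flow described above can be sketched in a few lines of Python. Everything here is illustrative: the tiny corpus, the keyword-overlap `retrieve` function, and the `synthesize` stub are stand-ins for a real search index and a real model.

```python
# Illustrative RAG flow: retrieve relevant documents, then hand them
# to the model alongside the user's query. The corpus and both
# functions are toy stand-ins for a real index and a real LLM.

CORPUS = {
    "https://example.com/pricing": "Acme's 2024 pricing starts at $49/month.",
    "https://example.com/about": "Acme was founded in 2010 in Toronto.",
}

def retrieve(query: str) -> dict:
    """Toy retrieval layer: keyword overlap instead of a real index."""
    terms = set(query.lower().split())
    return {
        url: text for url, text in CORPUS.items()
        if terms & set(text.lower().split())
    }

def synthesize(query: str, sources: dict) -> str:
    """Stub for the LLM step: a real system would prompt the model
    with the retrieved passages and cite the source URLs."""
    if not sources:
        return "Answered from training data (no retrieval needed)."
    cited = ", ".join(sources)
    return f"Answer grounded in retrieved pages: {cited}"

print(synthesize("what is acme pricing", retrieve("what is acme pricing")))
```

The point of the sketch is the branch in `synthesize`: your content only enters the picture when the retrieval step runs, which is why the rest of this article focuses on being retrievable and citable rather than on “optimizing the model”.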

So, the real question marketers should be asking is: how do we ensure that our content is cited and found for those types of queries? And that’s where the EACA model comes in!

Eligibility / Authority / Compressibility / Association (EACA)

EACA is the commonly accepted model for achieving visibility within retrieval systems. We’ll break down each pillar as it pertains to content, website technology and marketing tactics.

Eligibility

To understand Eligibility, one must understand the basic capabilities of a crawler. Essentially, Eligibility is a ‘gate’ for your content: if your content is not eligible, nothing else matters. A number of server, client and code issues can prevent a crawler from discovering, fetching, rendering and understanding your pages.

There are many areas to check to assess your site’s Eligibility. For example, many companies have aggressively moved to block crawler traffic with robots.txt, meta tags or WAF configuration. There are also many subtle areas that may not be as obvious, such as rendering methods in modern web development frameworks. We had a client come to us in a panic because they had relaunched a massive content site and visibility had fallen off. When we dug into it, they had used a client-side rendering library, effectively replacing 800,000 pages of HTML with a single line of JavaScript (hint: this is bad for Eligibility). 

When auditing your site for Eligibility, here’s a simple checklist your technical team can adhere to (and sample areas to dig into):

  1. Can the site/page be fetched? (Check robots.txt, auth paywalls, WAF configuration, dynamic routes and status codes)

  2. Can a crawler render my page(s)? (Are we using a crawler-friendly rendering method such as SSR, is the content actually present in the HTML, and is there no interaction gating?)

  3. Can the desired content be extracted? (Does the front-end use clean semantics, minimal boilerplate content, transcripts for video/audio, text not lost in images or PDF)

  4. Should this content even exist? (Is it unique content with minimal duplication, and is canonicalization configured?)
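The first two checks in the list can be spot-tested offline with Python’s standard library. This is a minimal sketch: the robots.txt text, the `GPTBot` user-agent, and the HTML snippets are illustrative, and in practice you would fetch both from your live site.

```python
from urllib import robotparser

# Sample robots.txt for the sketch: GPTBot is blocked from /private/
# but everything else is open to all crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

def can_fetch(bot: str, url: str) -> bool:
    """Check 1: is this URL blocked for this crawler by robots.txt?"""
    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return rp.can_fetch(bot, url)

def content_in_raw_html(html: str, phrase: str) -> bool:
    """Check 2: does the phrase exist in the *unrendered* HTML?
    If it only appears after JavaScript runs, crawlers that don't
    render will never see it (the client-side rendering trap)."""
    return phrase.lower() in html.lower()

print(can_fetch("GPTBot", "https://example.com/private/report"))
print(can_fetch("GPTBot", "https://example.com/blog/post"))
print(content_in_raw_html("<div id='root'></div>", "Our pricing"))
```

The second check is exactly the trap from the client-side rendering anecdote above: if the raw HTML response is just an empty root `<div>` plus a script tag, the content fails this test no matter how good the rendered page looks in a browser.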

Authority

Since LLMs are employed as “Answer Engines”, hallucinations are deadly to their continued usage. Simply put, Authority is a measure of the likelihood that an LLM will trust and propagate your content when generating answers, based on confidence in the truthfulness and expertise of your brand.

The good news: this is very much unlike SEO, where brands with greater domain strength can dominate SERPs while small brands and start-ups struggle to compete. Authority is less mechanically tied to domain metrics.

So how do you establish Authority? By means of highly specific content, on-site authorship and off-site reputation building. 

On-site Tactics

Ultimately, the goal of AI-Search optimization is to have your content/brand cited for complex and highly-specific queries. Therefore, the most straightforward approach is to produce exactly that - highly specific content with precise language and definitions. Additionally, articles should be associated with actual authors whose bios are tied to subject-matter expertise.

Bottom line: large volumes of generic ‘content marketing’ fluff written to garner organic traffic are at best ineffective and at worst potentially harmful.

Off-site Tactics

For years, SEO specialists have relied on ‘link building’ and guest posting to manipulate search rankings based on calculated authority. With LLMs, however, it’s not the link itself that matters (or even the unlinked mention), but rather the context and the reputation of the linking page or entity.

In a nutshell, establishing Authority within the EACA paradigm means having actual subject-matter expertise, producing useful and specific content, and then doing the hard work of establishing partnerships and mentions from credible sources.

Compressibility

On the surface, Compressibility appears to be about computation time and resources, which is often understood as an advantage for crawling: a lighter front-end makes content faster (and cheaper) to consume, and therefore more likely to be favoured by cloud-based AI retrieval mechanisms.

However, cost is only a by-product of Compressibility. Compressibility does reduce extraction and inference cost, as well as storage and retrieval overhead, but that isn’t its purpose. Its real job is to ensure accuracy: that generalization of your content doesn’t destroy truth or introduce incorrect patterns. In Machine Learning (ML) terms, Compressibility is about reducing semantic entropy - making content easier to summarize, embed, and retrieve without losing factual precision.

Guidelines for Content Editors

Here is a non-exhaustive list of guidelines when producing content:

  • Is the primary concept in the first 1-2 paragraphs?

  • Are headings clear and descriptive (not clever marketing speak)?

  • Do case study outcomes align with product USP claims?

  • Are capabilities separated from benefits?

  • Can this page be summarized in 5 bullets?

Association

Within the EACA paradigm, Association is the ‘glue’ that makes the other three factors work. It’s not sufficient to be considered and trusted; an LLM must also understand what you’re good for, which problems you solve and who you compete with, in order to assess which prompts you should appear in and be cited for.

What can you do to establish the desired Association graph for your brand?

Narrow your Category

One of the most effective ways to establish association is to understand what your core USP is, and focus purely on that product/service category. For agencies this can be a challenge, as the traditional approach is akin to being a ‘general contractor’. But for AI-search agents to consider citing your brand or content, it’s critical to have a clear focus on where your strengths lie. 

Be Explicit with Language

Put yourself in the shoes of someone who needs your product or service. What business problem are they looking to solve? What personal goal or need are they looking to fulfill? Craft your content with those requirements in mind, and specifically how you solve them. Include tangible metrics, facts and data points where possible.

Showcase Real Work

Since LLM-driven “answer engines” apply confidence thresholds when selecting which brands’ content to cite, showcasing real-world customer/client work makes your claims credible. The analogy to human trust is direct: seeing real client testimonials of 9thCO’s work makes our capability claims easier to trust.

Schema Markup

People mistakenly think that employing schema markup will make AI crawlers somehow trust their services or products above competitors’. In reality, schema markup increases Association by surfacing machine-readable relationships: it reduces inference cost and provides clarity and reinforcement for existing content.

One word of caution: incorrect or over-engineered schema markup can introduce ambiguity, so “measure twice, cut once”.
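As one hedged illustration of the kind of machine-readable relationship meant here, the sketch below builds schema.org `Article` markup that ties a page to a named author and publisher (reinforcing the on-site authorship point from the Authority section). All names and URLs are placeholders; a real implementation would mirror the visible content of the page exactly.

```python
import json

# Minimal JSON-LD sketch: an Article tied to a Person (author) and an
# Organization (publisher). All names and URLs are placeholders. The
# output would be emitted inside a <script type="application/ld+json">
# tag in the page head.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Optimizing for LLMs is a Myth",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",  # placeholder author
        "url": "https://example.com/authors/jane-doe",
        "jobTitle": "Technical SEO Lead",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Agency",  # placeholder organization
        "url": "https://example.com",
    },
}

print(json.dumps(article, indent=2))
```

Keep the markup consistent with what the page actually says: schema that contradicts or over-reaches the visible content introduces exactly the ambiguity the caution above warns against.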

Conclusion

As people increasingly rely on AI chat agents to perform complex search queries, customer journeys and evaluation cycles will continue to shorten. This represents an exciting opportunity for marketers and trustworthy brands to establish and increase visibility.

There is no magical black-box of tricks and tactics to game the system and “optimize for LLMs”. The key is to understand the fundamental technical considerations and data structures that Retrieval mechanisms evaluate, and to optimize your web presence accordingly!

Contact Us

Feel free to reach out.