What are vector embeddings? And why are search ranking signals expanding beyond keywords?
Vector embeddings are a core part of how LLMs match results to a search and how they attempt to gauge meaning. In this post, you’ll learn what vector embeddings are, how AI tools use them, and how you can leverage them to get your brand and content in front of your target audience.
What are vector embeddings?
In the past, the only game in town for SEO was keyword ranking. This $75 billion industry was built on the idea that you could game Google and other search engines into ranking your corporate website higher than the competition’s site. And, while the tricks largely worked, they also created a host of other problems, from those terrible recipe sites to keyword-stuffed blog posts, and more.
But LLMs don’t use keywords. At least, not the way you’re thinking.
Let’s back up for a moment and look at keywords. A keyword like “GEO” could mean anything from geography, to that old GM car no one drives anymore, to our particular use: “generative engine optimization.” The same goes for “cloud” (internet computing or suspended particles in the air?) and countless other words and phrases. That makes keywords a poor gauge of semantic intent.
Enter “vector embeddings.”
Vectors, from a technical perspective, are lists of numbers [1, 0.9, 4000, -6, …] that can run to hundreds or even thousands of dimensions. But that’s just the technical piece, in much the same way that “01100001” represents the letter “a” in binary. From a marketing perspective, the individual numbers in a vector are meaningless on their own.
What’s important, though, is that each (extremely long) vector represents the semantic meaning of a word, phrase, or piece of text. So, with vector embeddings, an LLM can easily distinguish between “the shore” and “to shore up”: while they’re similar from a keyword standpoint, they occupy very different spaces on the multi-dimensional semantic map. “The shore” sits semantically close to “the beach,” and “to shore up” sits close to “propping up.” You get the idea.
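To make that “semantic map” concrete, here’s a toy sketch in Python. The four-dimensional vectors below are made up purely for illustration (real embedding models produce hundreds or thousands of dimensions), but the cosine-similarity math is the standard way “closeness” between embeddings is measured:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); closer to 1.0 means more similar
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" -- these numbers are invented
# for illustration, not produced by any real model
embeddings = {
    "the shore":   [0.9, 0.1, 0.0, 0.2],
    "the beach":   [0.8, 0.2, 0.1, 0.3],
    "to shore up": [0.1, 0.9, 0.8, 0.0],
    "propping up": [0.2, 0.8, 0.9, 0.1],
}

print(cosine_similarity(embeddings["the shore"], embeddings["the beach"]))    # high
print(cosine_similarity(embeddings["the shore"], embeddings["to shore up"]))  # low
```

Even though “the shore” and “to shore up” share a word, the toy vectors place them far apart, while “the shore” and “the beach” land close together despite sharing no meaningful keyword.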
Why searches based on vector embeddings are superior to keywords alone
The three examples below are questions you could type into Perplexity, ChatGPT, or another LLM. They have no keyword overlap, but they would sit very close to each other in the kind of semantic map that vector embeddings create.
- “How do I get better visibility in AI search?”
- “How can I optimize for ChatGPT mentions?”
- “What helps with generative engine ranking?”
From the perspective of an LLM, they all ask the same question. So, it makes sense that an LLM would want to answer them in the same way, pulling from the same training data.
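You can check the “no keyword overlap” part of that claim yourself. This toy Python snippet measures surface word overlap (Jaccard similarity) between the three queries; it does no semantic embedding at all, it just shows how little the literal wording shares:

```python
import re

def keyword_overlap(q1, q2):
    # Jaccard similarity over lowercase word sets:
    # shared words / total distinct words
    w1 = set(re.findall(r"[a-z]+", q1.lower()))
    w2 = set(re.findall(r"[a-z]+", q2.lower()))
    return len(w1 & w2) / len(w1 | w2)

queries = [
    "How do I get better visibility in AI search?",
    "How can I optimize for ChatGPT mentions?",
    "What helps with generative engine ranking?",
]

for i in range(len(queries)):
    for j in range(i + 1, len(queries)):
        print(f"{queries[i]!r} vs {queries[j]!r}: "
              f"{keyword_overlap(queries[i], queries[j]):.2f}")
```

The overlap scores are near zero, so a pure keyword matcher would treat these as unrelated queries, while an embedding-based system maps them to nearly the same intent.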
The above is why vector embeddings beat keywords alone: they let the LLM get closer to underlying intent, answering a question based on the best information, not on which website has gamed the SEO system best.
Plus, vector embeddings allow the LLM to understand complicated long-tail queries by stringing together these vectors. Take for example the query:
“What’s the best low-cost ebike that goes 20mph?”
There are several elements to this question (ebike + low cost + speed), which makes keyword matching ill-suited to answer it. An LLM trained on all sorts of data, however, can retrieve an answer. And with larger datasets (which you as a marketer can provide!), you can help LLMs deliver answers that align with the messages and content you want to get out into the market. Everyone wins!
How do vector embeddings factor into AI SEO?
By adjusting your SEO strategy to take into account how vector embeddings work, you’re in a better position to get LLMs to crawl your content, train models on your website, and present your company/brand/products in answers to relevant inquiries.
Earlier, I noted that keywords are easy(ish) to “game.” That doesn’t mean there aren’t ways to influence AI results, too. The strategies are different, but here are a few AI SEO examples:
Semantic density
Writing for GEO is genuinely different from writing for SEO. The content is written for humans, by humans, with rich language: subtopics, synonyms, and signals of authority. The days of keyword stuffing and writing for the Google bots are done.
Chunky content
When you type a question into an LLM like ChatGPT, your query is transformed into one of those vectors I mentioned above. On the other side, text passages are split into segments called “chunks,” and each chunk is embedded as a vector, too. When an LLM goes off retrieving chunks similar to your query, it rank-orders them so it can deliver the chunks most relevant to your question.
How do you write content that chunks well for LLMs? Think of it as bite-sized information: clear headings, tight organization, self-contained paragraphs. Content that’s right for humans and machines. Content that LLMs can isolate and evaluate at the paragraph level. You don’t want to write meaningless platitudes (AI isn’t going to see those as novel or informative), but you also don’t want to write like James Joyce, where the LLM struggles to understand your underlying meaning or themes. (Joke: Q. What do you, me, and LLMs have in common? A. None of us understood Finnegans Wake. Errr… well, OK, at least I didn’t.)
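Here’s a minimal sketch of the chunk-retrieval idea. The chunk titles and embedding numbers below are invented for illustration; a real system would use an embedding model to produce the vectors, but the rank-by-similarity step looks roughly like this:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical content chunks with made-up 3-d embeddings --
# real embeddings come from a model and have far more dimensions
chunks = [
    ("Our ebike guide: top models under $1,000", [0.9, 0.8, 0.1]),
    ("History of the bicycle",                   [0.2, 0.1, 0.9]),
    ("How to charge an ebike battery",           [0.7, 0.3, 0.4]),
]

# Made-up embedding for the query "best low-cost ebike"
query_embedding = [0.85, 0.75, 0.15]

# Rank chunks by similarity to the query, most relevant first
ranked = sorted(chunks,
                key=lambda c: cosine_similarity(c[1], query_embedding),
                reverse=True)
print(ranked[0][0])
```

The most relevant chunk wins. That’s why self-contained, clearly scoped paragraphs matter: each one is scored on its own against the query.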
Content planning, schema design, and more
When it comes to tactics for GEO around vector embeddings, there’s a lot more than I can cover in this thousand-word post. On Marketing can help with content planning, linking (internal and external), topical modeling, schema design, and a lot more. We can also let you know what you’re doing well and where you need improvement.
How do AI SEO agencies optimize for LLMs?
An AI SEO agency like On Marketing is the best way to get the expertise marketers need to create LLM-forward strategies. As a next step, contact us for an audit or visit our services page to understand how we can help.