How can Google identify and rank relevant documents using entities, NLP & vector space analysis?

Post by **Reddi1** » Thu Jan 30, 2025 6:46 am

This article in my series on semantics and entities in SEO examines how Google identifies and ranks suitable content for entity-related search queries, using natural language processing and vector space analysis, among other techniques. To write it, I worked through more than 20 Google patents and other sources and summarized the findings below.
The Role of Entities in Search
To provide an overview, I will begin by summarizing the possible roles of entities in information retrieval systems such as Google.

An entity-based information retrieval system has to perform the following tasks:

Interpretation of search queries
Relevance determination at document level
Evaluation at domain/publisher level
Output of an ad hoc answer in the form of a Knowledge Panel, featured snippet, …
All of these tasks depend on the interplay between entities, search queries and the relevance of the content. In the article Semantic Search: Entities in the Interpretation of Search Queries, I went into detail about how Google can interpret search queries based on entities. This article focuses on determining the relevance of a document in relation to the entities and/or search terms identified in a search query.

The Role of Relevance at Google
As explained in the article Relevance, Pertinence and Usefulness at Google, a fundamental distinction must be made between relevance (objective relevance), pertinence (subjective relevance) and usefulness (situational relevance). In this article, I concentrate only on the objective relevance of a document, since pertinence and usefulness have more to do with personalization.

Relevance is determined in two steps. First, a corpus of n documents must be retrieved for a search query. This is usually done with very simple information retrieval methods, in which the occurrence of the search term or its synonyms in the document plays a role. These documents can then be given annotations, similar to tags, that classify them by topic. In theory, they could also be annotated with other tags, for example by purpose (sell, advise, inform...). However, this annotation most likely already happens when the content is parsed, so the document is stored in the index together with its annotations.
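The first step can be illustrated with a minimal sketch: documents are tagged by topic at indexing time, and a query then pulls in every document that contains each query term or one of its synonyms. The synonym map, the topic triggers and all function names here are hypothetical stand-ins chosen for illustration; they are not taken from any Google patent, and a real system would learn synonyms and topics from data rather than hard-code them.

```python
from dataclasses import dataclass, field

# Hypothetical synonym map (assumption for illustration only).
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

# Hypothetical topic tags and the words that trigger them (assumption).
TOPIC_TRIGGERS = {
    "automotive": {"car", "automobile", "vehicle", "engine"},
    "commerce": {"buy", "purchase", "price", "shop"},
}


@dataclass
class Document:
    doc_id: str
    text: str
    tags: set[str] = field(default_factory=set)

    def terms(self) -> set[str]:
        # Naive tokenization: split on whitespace, drop punctuation, lowercase.
        return {t.strip(".,!?").lower() for t in self.text.split()}


def annotate(doc: Document) -> Document:
    """Attach topic tags when the document is parsed, before any query arrives."""
    words = doc.terms()
    doc.tags = {topic for topic, triggers in TOPIC_TRIGGERS.items() if words & triggers}
    return doc


def expand(term: str) -> set[str]:
    """Return the term together with its (hypothetical) synonyms."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def retrieve_corpus(query: str, index: list[Document]) -> list[Document]:
    """Step one: keep documents that contain every query term or one of its synonyms."""
    query_terms = [expand(t) for t in query.split()]
    result = []
    for doc in index:
        words = doc.terms()
        if all(words & variants for variants in query_terms):
            result.append(doc)
    return result


index = [
    annotate(Document("d1", "Buy a used automobile today.")),
    annotate(Document("d2", "Engine maintenance tips.")),
    annotate(Document("d3", "Purchase a car online.")),
]
hits = retrieve_corpus("buy car", index)
print([d.doc_id for d in hits])  # ['d1', 'd3']
print(sorted(index[0].tags))     # ['automotive', 'commerce']
```

Because the tags are computed once by `annotate`, the query-time step only has to filter the index; the annotations travel with each document, matching the idea that documents sit in the index already labeled.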

When a search query is entered, the search engine accesses the matching corpus of documents along with their annotations. The interpretation of the search query and its search intent plays a crucial role here. I have covered this in detail in the articles Semantic Search: Entities in the Interpretation of Search Queries and Overview: Search Intent, Search Intent & User Intent.

In the second step, a ranking engine such as the Hummingbird algorithm uses scoring to determine how relevant each document is to the search query. Beyond relevance, Google uses additional scoring layers such as timeliness and trustworthiness (trust), as well as the authority and expertise of the source (EAT), to arrive at a ranking. Which of these scores are weighted, and how heavily, probably varies by industry or even by keyword. To keep response times low, this scoring is carried out in real time only for the 30-50 most relevant search results.
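The second step can be sketched as a re-ranking pass: candidates are first ordered by their cheap relevance score, and only the top `top_k` are re-scored with a weighted combination of signals. The signal names, the per-vertical weight table and the default `top_k=50` are assumptions made for illustration; they mirror the article's description (weights varying by industry, real-time scoring only for the top results) but are not Hummingbird's actual formula or Google's real weights.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: float  # first-pass relevance score in [0, 1]
    freshness: float  # hypothetical timeliness signal in [0, 1]
    trust: float      # hypothetical trust/authority signal in [0, 1]


# Hypothetical weights per vertical; the article only states that weighting
# probably varies by industry or keyword.
WEIGHTS = {
    "news": {"relevance": 0.5, "freshness": 0.4, "trust": 0.1},
    "health": {"relevance": 0.5, "freshness": 0.1, "trust": 0.4},
}


def final_score(c: Candidate, weights: dict[str, float]) -> float:
    """Linear combination of the individual signals."""
    return (weights["relevance"] * c.relevance
            + weights["freshness"] * c.freshness
            + weights["trust"] * c.trust)


def rank(candidates: list[Candidate], vertical: str, top_k: int = 50) -> list[Candidate]:
    """Re-score only the top_k candidates by relevance; the rest keep relevance order."""
    by_relevance = sorted(candidates, key=lambda c: c.relevance, reverse=True)
    head, tail = by_relevance[:top_k], by_relevance[top_k:]
    weights = WEIGHTS[vertical]
    head.sort(key=lambda c: final_score(c, weights), reverse=True)
    return head + tail


docs = [
    Candidate("a", relevance=0.9, freshness=0.1, trust=0.2),
    Candidate("b", relevance=0.8, freshness=0.9, trust=0.3),
    Candidate("c", relevance=0.7, freshness=0.2, trust=0.9),
]
print([c.doc_id for c in rank(docs, "news", top_k=2)])  # ['b', 'a', 'c']
print([c.doc_id for c in rank(docs, "health")])         # ['c', 'b', 'a']
```

With `top_k=2`, document `c` is never re-scored and stays last, even though its trust signal is high; this is the speed-versus-quality trade-off that limiting real-time scoring to the top results implies.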

In this post, I will focus on determining relevance at the document level. I will discuss possible assessments of trust and authority (EAT) with regard to entities in another post.

Two main methods are used to determine the relevance of a document.