The search engine can use data from this website representation vector classification system to return search results.
The classification system may generate a representation for each of the many websites A-N and use those representations to determine a classification for each website.
For a given search query, the search engine can use the query's classification to select a category of websites with the same or a similar classification.
It can return search results from this category of websites.
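The flow described above can be sketched as follows. This is only an illustration, assuming the query has already been assigned a classification; the Website type, the search function, the labels and the example.com URLs are hypothetical and do not come from the patent, and the substring matching is a deliberately naive stand-in for real ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Website:
    url: str             # placeholder URL, not a real site
    classification: str  # hypothetical label assigned by the classifier
    text: str


def search(query: str, query_classification: str, websites: list[Website]) -> list[str]:
    """Return URLs from the matching category whose text mentions a query term."""
    # Step 1: keep only websites that share the query's classification.
    category = [w for w in websites if w.classification == query_classification]
    # Step 2: naive term match inside that category (stand-in for real ranking).
    terms = query.lower().split()
    return [w.url for w in category if any(t in w.text.lower() for t in terms)]


SITES = [
    Website("https://example.com/clinic", "medical-expert", "flu symptoms and treatment"),
    Website("https://example.org/forum", "layperson", "my flu story"),
    Website("https://example.net/journal", "medical-expert", "cardiology research"),
]

results = search("flu treatment", "medical-expert", SITES)
print(results)
```

The forum page mentions the query term but is never considered, because it falls outside the selected category.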
The classifications of these websites are based on the pattern features that the websites contain.
How are website classifications generated?
This was the part of the patent description that interested me the most.
The part starts by saying that this website representation vector classification system could use any suitable method to generate classifications, which gives Google a lot of flexibility.
It then goes into more detail, telling us that the system can use the content of websites to generate representations of those websites.
This content may include:
Text from the website
Pictures on the website
Other website content, e.g. links
Or a combination of two or more of these elements
The patent then provides details on how a neural network is integrated:
The website classification system may use a mapping that maps the website content for the website A to a vector space that identifies a representation for website A.
For instance, the website classification system may use a neural network, that represents the mapping, to create a feature vector A that represents the website A using the content of the website A as input to the neural network.
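To make the idea of a mapping from content to a vector concrete, here is a minimal sketch. It is not the patent's network: it assumes text-only input, hashes words into a fixed number of buckets, and passes them through a single layer with fixed, untrained weights. The names hashed_bag_of_words, feature_vector, VOCAB_BUCKETS and VECTOR_SIZE are all hypothetical.

```python
import hashlib
import math
import random

VOCAB_BUCKETS = 64  # hypothetical size of the hashed input layer
VECTOR_SIZE = 8     # hypothetical size of the output feature vector


def hashed_bag_of_words(text: str) -> list[float]:
    """Count tokens into a fixed number of buckets using a stable hash."""
    counts = [0.0] * VOCAB_BUCKETS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        counts[int.from_bytes(digest[:4], "big") % VOCAB_BUCKETS] += 1.0
    return counts


# Assumption: fixed pseudo-random weights stand in for a trained network.
_rng = random.Random(0)
WEIGHTS = [
    [_rng.uniform(-1.0, 1.0) for _ in range(VOCAB_BUCKETS)]
    for _ in range(VECTOR_SIZE)
]


def feature_vector(website_text: str) -> list[float]:
    """Map website text to a unit-length feature vector via one tanh layer."""
    inputs = hashed_bag_of_words(website_text)
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in WEIGHTS]
    norm = math.sqrt(sum(v * v for v in hidden)) or 1.0  # avoid dividing by zero
    return [v / norm for v in hidden]


vector_a = feature_vector("artificial intelligence research and machine learning tutorials")
print(len(vector_a))
```

A real system would learn the weights from data and could take images and links as input as well; the point here is only that the same content always maps to the same point in the vector space.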
Labels used in website classification
Website classification can be based on the use of labels. The labels:
Can be alphanumeric, numeric or alphabetic characters, symbols, or a combination of two or more of these
Can indicate the type of company that published the corresponding website, e.g. a non-profit or for-profit company
Can indicate an industry described on the website, e.g. artificial intelligence or education
Can indicate the type of person who wrote the content, e.g. a doctor, a medical student or a layperson
Could also be scores that represent a website classification
The scores for classifications:
Can be used to meet various thresholds for fulfilling categories
Can be specific to a particular area of knowledge
Can classify websites that cover more than one knowledge domain
Can select websites that provide an answer to multiple search queries for specific knowledge areas
Can reflect the authority of the respective website for the respective knowledge domain
Or both
Input data used to classify websites can include things like:
A position of certain words in relation to each other, e.g. that the word "artificial" is generally near or adjacent to the word "intelligence".
Certain phrases contained in the website
For each of the classifications A-B, a difference measure or a similarity measure representing the similarity between the respective classification and the other website
The classification A-B that is most similar
The classification A-B with the highest similarity measure, or with the shortest distance between the other website's feature vector and the respective mean feature vector A-B, to name a few examples
A relationship between two similarity measures to select a classification for the other website
The patent provides several other ways in which input data can be considered during the classification process.
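The comparison against mean feature vectors can be sketched like this. The sketch assumes cosine similarity as the similarity measure and reads the "relationship between two similarity measures" as a ratio between the best and second-best scores; both choices, along with the min_ratio parameter and the two-dimensional example vectors, are assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(other_vector, labeled_vectors, min_ratio=1.0):
    """Return (label, similarities); label is None when the top two are too close."""
    similarities = {
        label: cosine_similarity(other_vector, mean_vector(vectors))
        for label, vectors in labeled_vectors.items()
    }
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    best = ranked[0]
    if len(ranked) > 1:
        runner_up = similarities[ranked[1]]
        # Hypothetical rule: require the best score to beat the runner-up by min_ratio.
        if runner_up > 0 and similarities[best] / runner_up < min_ratio:
            return None, similarities
    return best, similarities


# Toy feature vectors for websites already labeled with classification A or B.
labeled = {
    "A": [[0.9, 0.1], [0.8, 0.2]],
    "B": [[0.1, 0.9], [0.2, 0.8]],
}

label, scores = classify([0.7, 0.3], labeled, min_ratio=1.2)
print(label, scores)
```

Returning no label when the two best scores are nearly equal is one way to avoid forcing an ambiguous website into a category; a production system might instead assign both classifications.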
Quality ratings used to classify a website can include the following metrics:
Authority
Responsiveness to a specific knowledge area
Another feature of the website
Or a combination of two or more of these elements