What Is Google LSI?
As always, the information you find on forums can be highly inaccurate and misleading. When it comes to Google’s ranking algorithm, there are a few known tidbits and a huge amount of speculation and rumor that, although sometimes true, can lead you astray.
In the beginning, search engine spiders looked only at the presence and frequency of keywords on a web page to determine that page’s relevancy. As search grew, it became clear that this type of approach yields poor results. A good example: an engine that matches exact keywords cannot tell that “car” and “automobile” mean the same thing, and it cannot tell which meaning is intended for a word with multiple meanings, such as “mouse” (the animal, or the device that sits next to your keyboard). Latent Semantic Indexing (LSI) is an approach to understanding not only keywords, but also the context in which they are used across the entire web page.
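To make the keyword-only approach concrete, here is a minimal Python sketch. The page texts and the keyword_score function are made up for illustration and are not how any real search engine is implemented. It scores a page purely by how often the exact keyword appears.

```python
import re
from collections import Counter

def keyword_score(page_text, keyword):
    """Count how often `keyword` appears as a whole word (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    return Counter(words)[keyword.lower()]

# Hypothetical toy pages, invented for this illustration.
pages = {
    "car_page": "Buy a used car today. Every car is inspected.",
    "automobile_page": "Our automobile dealership sells every kind of automobile.",
    "rodent_page": "A mouse is a small rodent. The mouse eats grain.",
    "hardware_page": "Plug the mouse into the USB port next to the keyboard.",
}

car_scores = {name: keyword_score(text, "car") for name, text in pages.items()}
mouse_scores = {name: keyword_score(text, "mouse") for name, text in pages.items()}

print(car_scores)
print(mouse_scores)
```

A search for “car” gives the automobile page a score of zero even though it is clearly relevant, and a search for “mouse” scores the rodent page and the hardware page without any idea that they are about different things.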
Using statistical analysis, LSI can identify words that are often used in the same context. Let’s say “apple” and “computer” are keywords on a page; if “Mac OS” appears there too, it is treated as relevant as well. Another way to look at this is determining whether a page is about “windows” the operating system, or windows the invention that lets you throw things out of your car. LSI technology is about looking deeper into the context of indexed content and allowing a more natural method of search. The technology is reportedly used not only by Google but by other search engines as well.
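The “windows” example can be sketched in a few lines of Python. This is only a rough illustration of using context words, not real LSI: the SENSES vocabularies are hand-picked assumptions, whereas an actual system would derive such associations statistically from a large collection of pages.

```python
import re

# Hypothetical, hand-picked context vocabularies (an assumption for this sketch).
SENSES = {
    "operating system": {"microsoft", "install", "software", "pc", "update", "desktop"},
    "glass window": {"glass", "car", "door", "curtains", "frame", "tinted"},
}

def guess_sense(page_text):
    """Pick the sense whose context words overlap most with the page's words."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    overlap = {sense: len(words & vocab) for sense, vocab in SENSES.items()}
    best = max(overlap, key=overlap.get)
    # With no supporting context at all, refuse to guess.
    return best if overlap[best] > 0 else None

print(guess_sense("How to install the latest Windows update on your PC"))
print(guess_sense("Tinted windows keep your car cool; replace cracked glass quickly"))
```

The keyword “windows” is identical in both pages; only the surrounding words tell the two topics apart.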
Here is an excerpt from Google’s LSI patent that describes the basic requirements of the technology:
“The system is further adapted to identify phrases that are
related to each other, based on a phrase’s ability to predict
the presence of other phrases in a document. More specifically,
a prediction measure is used that relates the actual co-occurrence
rate of two phrases to an expected co-occurrence rate
of the two phrases. Information gain, as the ratio of actual
co-occurrence rate to expected co-occurrence rate, is one such
prediction measure. Two phrases are related where the prediction
measure exceeds a predetermined threshold. In that case, the
second phrase has significant information gain with respect to
the first phrase. Semantically, related phrases will be those
that are commonly used to discuss or describe a given topic or
concept, such as “President of the United States” and “White
House.” For a given phrase, the related phrases can be ordered
according to their relevance or significance based on their
respective prediction measures.”
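The prediction measure in the excerpt can be tried out on a toy example. The Python sketch below rests on several assumptions that are mine, not the patent’s: the six sample documents are invented, the “expected” co-occurrence rate is taken to be what you would see if the two phrases appeared independently of each other, and the THRESHOLD value of 1.5 is an arbitrary stand-in for the “predetermined threshold.”

```python
def information_gain(docs, phrase_a, phrase_b):
    """Ratio of actual to expected co-occurrence rate of two phrases.

    Assumption: the expected rate is the product of the two phrases'
    individual document rates (i.e. what independence would predict).
    """
    n = len(docs)
    has_a = [phrase_a.lower() in d.lower() for d in docs]
    has_b = [phrase_b.lower() in d.lower() for d in docs]
    actual = sum(a and b for a, b in zip(has_a, has_b)) / n
    expected = (sum(has_a) / n) * (sum(has_b) / n)
    return actual / expected if expected > 0 else 0.0

THRESHOLD = 1.5  # hypothetical value; the excerpt does not give one

def are_related(docs, phrase_a, phrase_b):
    """Two phrases count as related when the measure exceeds the threshold."""
    return information_gain(docs, phrase_a, phrase_b) > THRESHOLD

# Invented sample documents for illustration only.
docs = [
    "The President of the United States spoke at the White House.",
    "Reporters gathered outside the White House to hear the President of the United States.",
    "The White House garden opens to visitors in spring.",
    "A new bakery opened downtown.",
    "The bakery sells bread and coffee.",
    "Coffee prices rose again this year.",
]

gain = information_gain(docs, "President of the United States", "White House")
print(gain, are_related(docs, "President of the United States", "White House"))
print(information_gain(docs, "White House", "bakery"))
```

In this toy set the two political phrases appear together twice as often as chance alone would predict, so they clear the threshold, while “White House” and “bakery” never appear together and score zero.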
From a webmaster’s perspective there’s not much to worry about. If you are creating quality content with a theme, your rankings will most likely just improve. Those at risk, as always, are those looking to game the indexing system with keyword stuffing, or overworking their keyword density until the page no longer reads naturally. Randomly inserting keywords into an article or website will no longer get you those top rankings. In fact, over-optimizing and duplicate content could not only hurt your rankings but be the death of them. This big change in search is likely to have an effect on the way people create content. Again, the idea is to theme your content (more coming soon).
That should give you an overview of what exactly LSI technology is. The general idea, as always from the search engine’s end, is to bring natural search to the table and cut down on spam and on those who are faking their way to the top. If you want more information on LSI, I recommend reading through the patent quoted above. There’s a crap load of information in it.
I’ve always had good rankings for just being real and giving good content.
lol that’s one sweet google evil robot spider thingy picture.
I’m running a real estate business and am interested in learning about internet marketing and search engines. I didn’t know much about Google LSI, but it’s clear in my mind now. Thanks for the informative article.