The LSI Controversy That Wasn’t
April 18, 2009 by Erika
There seems to be a bit of debate on LSI lately. I’m not going to say too much about all the commentary back and forth, because frankly I feel that senior Theme Zoom engineer Kelly Reynolds (this guy BUILDS search engines) said it best:
“Latent Semantic Analysis is an algorithm. An idea. A tool. It has many applications for which it is ideally suited, and some for which it is not. Some of those applications include, but are not limited to (from Wikipedia):
- Compare the documents in the concept space (data clustering, document classification)
- Find similar documents across languages, after analyzing a base set of translated documents (cross language retrieval)
- Find relations between terms (synonymy and polysemy)
- Given a query of terms, translate it into the concept space, and find matching documents (information retrieval)
Obviously if you are a search engine, the information retrieval would be most interesting. If you have a huge amount of research abstracts that you are trying to categorize for research purposes, you’d be most interested in document classification. Cross language retrieval would be right up your alley if you are tracking the history and evolution of Indo-European languages using their dated written histories. If you are trying to discover all of the ways that a particular concept is thought of and referenced, you’d focus on synonymy and polysemy.”
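To make the information retrieval item a little more concrete, here is a minimal Python sketch. It is not what Theme Zoom or any search engine actually runs. It assumes NumPy is installed, and the toy documents, the choice of k = 2 concepts, and the helper names `fold_in`, `cosine` and `rank` are all made up for illustration.

```python
import numpy as np

# Toy corpus (hypothetical documents, chosen only for illustration).
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "fruit apple banana",
    "apple banana smoothie",
]

# Build the vocabulary and a term-by-document count matrix A.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Singular value decomposition, truncated to k "concepts" (k is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk = U[:, :k]
# Each row is one document expressed in the k-dimensional concept space.
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T


def fold_in(query):
    """Translate a query of terms into the same concept space."""
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:  # terms outside the vocabulary are ignored
            q[index[w]] += 1
    return q @ Uk


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank(query):
    """Return document indices ordered from best to worst match."""
    qv = fold_in(query)
    scores = [cosine(qv, dv) for dv in doc_vecs]
    return sorted(range(len(docs)), key=lambda j: -scores[j])


if __name__ == "__main__":
    for j in rank("automobile"):
        print(docs[j])
```

With this toy corpus, a query for "automobile" should score the "car engine repair" document highly even though the two share no words, which is the synonymy effect from the list above.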
I had to smile when I came across Ferny Ceballos’ blog and the LSI: Passive Agressive Attack or Misunderstanding post, both for the great personal writing style he always uses and for the graphic image of the bull (doing his business) he used, which of course I promptly swiped to use on my own blog, too. (I know you won’t mind, Ferny.)
~ Technology Goddess





Nice post, I like how you presented that.
LSI has been a huge target for chitter chatter, in the SEO world especially. I have also been building search engines and work in cognitive linguistics, computational linguistics and AI.
LSI in the Wikipedia format is usually introduced to students in computing because it’s a nice way of showing how you can classify things with machines and find similarity in data. The method has seen a huge amount of change since the late ’80s, though, and for the most part it is heavily modified in real systems. One variant is PLSI, for example, which Google likes. This does not mean they use it as you find it in the textbook though!
It is indeed a tool, and it’s a very useful one too. I’ve written a lot about the misunderstandings around it on my blog.
I came across yours by accident and really like it, bookmarked!