
Google Blog Search: Fixing Blog Indexing Problem Caused By RSS

December 3, 2008 by Erika

Over the last few months, Google Blog Search has shown some strange anomalies in its search results and indexing that left many of us blog owners scratching our heads. The good news is that Google is aware of the issues and a fix is in the works, according to Jeremy Hylton of the Google Blog Search Team.

The problems stemmed from Google Blog Search indexing blogs from their RSS feeds only. For blogs that publish only a “partial” feed (such as titles only, or just the introduction rather than the whole post), Google Blog Search would index ONLY that partial portion. Many blogs have partial feeds enabled because they run the feeds as post titles or summaries through social apps like FriendFeed and profile pages.

Because Google Blog Search indexed by RSS feed, any links or text in the rest of the post were not available through blog search, although the full posts were still being indexed by web search. This caused major discrepancies in search, indexing and ranking results.
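To make the difference concrete, here is a toy sketch of why a partial feed hides content from search. It is only an illustration under my own assumptions, not how Google Blog Search actually works: the post URL, the sample text and the helper names (`tokenize`, `build_index`, `search`) are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of URLs whose indexed text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, term):
    """Return the sorted list of URLs indexed under a single search term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical post: the feed carries only the introduction.
POST_URL = "http://example.com/my-post"
FULL_POST = "Intro to feed settings. Later in the post we thank Acme Widgets for the tip."
PARTIAL_FEED = "Intro to feed settings."

feed_index = build_index({POST_URL: PARTIAL_FEED})  # indexing the feed only
page_index = build_index({POST_URL: FULL_POST})     # indexing the full post

print(search(feed_index, "Acme"))  # the mention is invisible to a feed-only index
print(search(page_index, "Acme"))  # the full-content index finds the post
```

In this sketch the brand mentioned deep in the post never reaches the feed-only index, which is the kind of gap between blog search and web search described above.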

Jeremy Hylton of the Google Blog Search Team says that they will now index the FULL content of the blog page, even if the blog publishes only a partial feed. BUT this means it will also now index the non-post parts of your blog pages.

How is this currently affecting your blog indexing? Well, because the non-post parts of your pages are also being indexed, you may notice that your tags are now sometimes indexed as pages: any time a blog publishes a new post, Google Blog Search picks up the whole page the post is on, including the sidebar details.

This also causes challenges for people who have alerts set up or who search for themselves, their sites or their brand: you may get an alert or a search result saying you appear in a post, but when you go to the site you cannot find yourself anywhere on the page.

Jeremy says they are aware that indexing the entire page the post is on is not a perfect or long-term fix, but it is better than indexing by the RSS feed only.

“We do expect to fix the problem you are seeing. We’ll use the full page content, but exclude the content that isn’t really part of the post. I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved,” said Jeremy.

He also adds that once the blog indexing problems are fixed they will post an update at Google.

“We have changed the way we index blog posts to include the full content of the page. We’ve had occasional complaints about the use of the feed content, particularly the problem with partial feeds. The indexing change has improved the results for a lot of queries, both because we have the full content of the page and because we extract links that are missing from the feeds. The downside of this change is that we see more results that match only the blogroll and other parts of the page that are common to all of a blog’s posts,” explains Jeremy Hylton.

Jeremy also adds, “The algorithm will be improved to exclude ‘the content that isn’t really part of the post’ to make the results more useful.”
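Jeremy does not say how the improved algorithm will decide what is "really part of the post", so the following is only a guess at the general idea, not Google's method. One simple heuristic is to drop any text block that shows up on every page of the same blog, since blogrolls, tag lists and sidebars tend to repeat while post bodies do not. The function name `strip_shared_blocks` and the sample pages are hypothetical.

```python
from collections import Counter


def strip_shared_blocks(pages):
    """Remove text blocks that appear on every page of the same blog.

    `pages` maps a URL to the list of text blocks extracted from that page.
    With fewer than two pages there is nothing to compare, so the blocks
    are returned unchanged.
    """
    if len(pages) < 2:
        return {url: list(blocks) for url, blocks in pages.items()}

    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block once per page

    shared = {block for block, n in counts.items() if n == len(pages)}
    return {
        url: [block for block in blocks if block not in shared]
        for url, blocks in pages.items()
    }


# Hypothetical blog with two posts that share a blogroll and a tag list.
pages = {
    "post-1": ["My first post about feeds", "Blogroll: Alice, Bob", "Tags: google, rss"],
    "post-2": ["A second post about search", "Blogroll: Alice, Bob", "Tags: google, rss"],
}

cleaned = strip_shared_blocks(pages)
print(cleaned["post-1"])
print(cleaned["post-2"])
```

A heuristic like this would stop a blogroll name from matching every post on a blog, at the cost of also dropping any text an author deliberately repeats in every post.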

Google almost always does a great job of picking up on major problems like this one and getting a fix out while keeping us informed of what the issue is.

In my case, I wish I had known this two weeks ago, as I spent a few “late nights” scratching my head trying to figure out why my blogs were getting such strange indexing and search results. It’s good to know that the strangeness in the blogosphere index will soon start to stabilize!

Erika- Technology Goddess Search Marketing News

More Google News on Technology Goddess

Additional resources on this article can be found at:

Google Operating System
Nine By Blue
Google Blog Search Discussion Forum

Filed Under: Google
