Saturday, 14 July, 2007

Bulk update

Over the past year or so, many web sites using spamming tactics to push their pages into the Google index have implemented ways to automatically generate thousands, in some cases even millions, of pages. These pages almost always carried scraped content, that is, information copied from other web sites, databases, and directories, sometimes combined in ways that made it hard for the algorithms of the time to detect. The same methods were often used to create unique subdomains along the way, in an attempt to evade the anti-spam filters.

As a counter-measure, Google has implemented new ways of identifying irregular site expansion and page generation, resulting in a filter that takes a closer look at bulk updates of previously nonexistent URLs, on both new and well-established web sites. Should a domain show symptoms of being used to create massive amounts of unoriginal or spam content, the algorithms now attempt not only to filter those pages out of the index, but to act preemptively and block their entry into the index altogether.
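Google's actual detection logic is not public, but the idea of flagging "irregular site expansion" can be illustrated with a simple anomaly check on how many new URLs a domain publishes per day. The function name, the 30-day window, and the 10x spike threshold below are all illustrative assumptions, not anything Google has documented:

```python
# Hypothetical sketch: flag a domain whose daily count of newly
# discovered URLs spikes far above its historical rate.

def is_bulk_update(daily_new_urls, window=30, spike_factor=10, min_baseline=5):
    """Return True if the most recent day's new-URL count exceeds
    spike_factor times the average of the preceding `window` days."""
    if len(daily_new_urls) < window + 1:
        return False  # too little history to judge (e.g. a brand-new site)
    history = daily_new_urls[-(window + 1):-1]
    # min_baseline keeps tiny sites from being flagged for trivial growth
    baseline = max(sum(history) / window, min_baseline)
    return daily_new_urls[-1] > spike_factor * baseline

# A site steadily adding ~20 pages/day that suddenly publishes 5,000:
steady = [20] * 30
print(is_bulk_update(steady + [25]))    # → False (gradual growth)
print(is_bulk_update(steady + [5000]))  # → True (bulk update)
```

A real system would presumably look at far more signals (content uniqueness, subdomain creation, link patterns), but the post's core point maps to exactly this kind of comparison between a domain's historical publishing rate and a sudden burst of new URLs.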

Known issues

Any web site that launches a number of pages irregular for the history of its domain is likely to be closely examined. Web sites that had been producing new content at a steady pace and then suddenly expand much more rapidly, new web sites launched with several thousand pages, and web sites that are redesigned and thus serve content on thousands of new URLs all appear to be affected by this practice. In the end, however, all valid URLs that are not seen as a spam attempt are usually accepted into the index.

+ Resolution: If you'd like to be exempt from such examinations, avoid bulk updates of thousands of new pages and update your web site gradually instead. This practice is not a penalty, but a simple precaution by Google to keep the quality of its search results at an optimal level by excluding spam pages. The examination period rarely lasts longer than is reasonable, and well-established web sites will most likely not see an overall re-evaluation of all their content because of such updates. The examination itself checks whether the new pages are a rushed attempt to artificially build relevance, boost PageRank, or provide non-unique context for advertisements, or whether they are valid resources meant to serve visitors and remain on the web site. Adding a massive number of new URLs meant to do anything but the last may temporarily lower domain-related parameters and cause a visible drop in rankings for a period of time. As the content and history of the newly added pages build, the web site will gradually regain this trust, partially or completely.
