Major search engines improve Sitemap protocol

The rare collaboration between search rivals Google, Yahoo and Microsoft over site maps has yielded its first result.

The vendors announced they have enhanced Sitemap, a protocol designed to simplify how webmasters and online publishers submit their sites' content for indexing in search engines.

Along with the improvements, the vendors announced that InterActiveCorp's Ask.com will support the protocol, giving it backing from a fourth major search engine operator. IBM has also signed up to support the effort.

In November, Google, Yahoo and Microsoft agreed to support Sitemap, an open-source protocol based on Extensible Markup Language (XML).

A site map is a file that webmasters and publishers put on their sites to help the search engines' automated web crawlers properly index web pages. The Sitemap protocol aims to provide a standard format for site maps, which should simplify their creation by web publishers and their discovery and interpretation by search engines.
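A minimal site map file in the protocol's XML format looks like the sketch below; the example.com address and the date, frequency and priority values are placeholders for illustration, not part of the specification.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Placeholder URL and values for illustration only -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2007-04-11</lastmod>   <!-- when the page last changed -->
        <changefreq>weekly</changefreq> <!-- how often it tends to change -->
        <priority>0.8</priority>        <!-- relative priority, 0.0 to 1.0 -->
      </url>
    </urlset>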

The vendors announced that the Sitemap protocol, now in version 0.90, provides a uniform way of telling search index crawlers where site map files are located on a site.

All web crawlers recognise robots.txt, the file webmasters use to tell crawlers not to crawl certain parts of a site, so webmasters can now point to the location of their site map file from within it. Meanwhile, the protocol's official web site is now available in 18 languages.
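In practice, autodiscovery amounts to adding a single Sitemap line to an existing robots.txt file. In the sketch below, the crawl rules and paths are placeholders; only the Sitemap directive itself comes from the protocol.

    # Placeholder crawl rules for illustration only
    User-agent: *
    Disallow: /private/

    # The autodiscovery line: tells any crawler where the site map lives
    Sitemap: http://www.example.com/sitemap.xml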

Venus Swimwear, a £50m US retailer of swimsuits and beach-related apparel, expects to benefit from the new feature, formally called "autodiscovery," which it is already implementing. The company manually re-submits its site map to search engines when the site changes, but pointing at it in the robots.txt file should ensure the crawlers automatically find the latest version every time, said Rhea Drysdale, a Venus Swimwear e-commerce analyst.

"It's very useful in that it automates the process. There are some weeks when we'll forget to resubmit the site map, and we tend to update our web site weekly. By having [the site map address] on autodiscovery, it notifies them that [the site map] is here and whenever you come to our site, to please take a look at it. If there are changes, they should be able to pick those up quickly," Drysdale said. She also welcomes Ask.com's support for the protocol.

John Honeck, a mechanical engineer who runs several small sites and blogs in his spare time, including his personal blog, also predicts the new feature will be helpful to webmasters. "Anything that is standardised is helpful for the webmaster. We can spend more time on our sites and less time worrying about setting up different accounts, verification processes, and submissions for all of the multitude of search engines out there," he said.

However, Honeck feels the vendors could clarify some points about the autodiscovery feature, such as how it will work in sites with multiple site maps. Privacy issues may also crop up, because pointing at the site map from the robots.txt file makes the information more easily accessible. "While not normally a problem, it could cause a security risk. As search engines can crawl your site more efficiently, so can scrapers and bad bots as well," wrote Honeck.
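The protocol does already define a site map index file for sites that split their URLs across several site maps, although the announcement does not spell out how autodiscovery interacts with it. A sketch of such an index, with placeholder file names:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Placeholder file names for illustration only -->
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://www.example.com/sitemap-products.xml</loc>
        <lastmod>2007-04-11</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://www.example.com/sitemap-blog.xml</loc>
      </sitemap>
    </sitemapindex>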

More information about the enhancements can be found in official blog postings from Yahoo and from Google.

The Sitemap protocol was originally developed by Google and is offered under the terms of the Creative Commons Attribution-ShareAlike License.
