Bots

What is a bot

A bot is a computer program that systematically browses the World Wide Web, typically for indexing purposes, for example on behalf of search engines. While some bots are more sophisticated than others and can even interpret JavaScript, some features that FIT offers are simply not productive when serving bots.

Detecting a bot

FIT uses regular expressions as well as request header checks to determine whether a bot is the originator of a request. When FIT recognizes a client as a bot, the Delivery Context Property client/bot is set. You can use the DOM Filter or the Text Filter to formulate conditions based on this property.
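
The general technique can be pictured with the following Python sketch. It is illustrative only, not FIT's actual implementation: the pattern list, the is_bot function and the header heuristic are assumptions made for the example.

    import re

    # Illustrative User-Agent fragments; FIT's actual pattern list is not shown here.
    BOT_PATTERN = re.compile(
        r"bot|crawler|spider|slurp|googlebot|bingbot|yandex|baiduspider",
        re.IGNORECASE,
    )

    def is_bot(headers):
        """Heuristic bot check combining a User-Agent regex with request header checks."""
        user_agent = headers.get("User-Agent", "")
        if BOT_PATTERN.search(user_agent):
            return True
        # Assumed heuristic: many crawlers omit headers that interactive browsers send.
        if "Accept-Language" not in headers:
            return True
        return False

    # A filter condition could then branch on this result, analogous to checking
    # the client/bot Delivery Context Property.
    print(is_bot({"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"}))  # True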

Delayed Images and Delayed Iframes

While FIT will still scale images for bots, Delayed Images and Delayed Iframes are deactivated for these clients even if you activate them in the config. Delayed images and iframes work by setting a data-ai-hqsrc attribute containing the actual source URL on image and iframe tags instead of the src attribute, which could hide the source URLs from bots. In addition, Delayed Images and Delayed Iframes are a means to improve the user experience; since there is no user when a bot requests a site, putting extra effort into crafting such a response is not necessary.
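
As a rough illustration of why this markup can hide image URLs from simpler bots, the following Python sketch shows the kind of rewrite a delayed image implies. The delay_images function and the placeholder src value are assumptions for illustration; only the data-ai-hqsrc attribute name is taken from the description above, and FIT's actual markup generation may differ.

    import re

    # Hypothetical placeholder URL; FIT's actual placeholder is not shown here.
    PLACEHOLDER_SRC = "/fit/placeholder.gif"

    def delay_images(html):
        """Move the real image URL into data-ai-hqsrc and insert a placeholder src."""
        return re.sub(
            r'<img\s+src="([^"]+)"',
            '<img src="' + PLACEHOLDER_SRC + '" data-ai-hqsrc="\\1"',
            html,
        )

    print(delay_images('<img src="/images/teaser.jpg" alt="Teaser">'))
    # <img src="/fit/placeholder.gif" data-ai-hqsrc="/images/teaser.jpg" alt="Teaser">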

Detection Page

The Delivery Context Detection Page aims to probe client information that is crucial for the visual integrity of a site before the main content is requested. This includes, but is not limited to, the viewport dimensions, which are needed to deliver, for example, optimized images. Because bot clients do not surf like real browsers, there is a chance that a bot would index the empty detection page instead of the actual content. To prevent this, FIT does not deliver the detection page to bots.
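
The resulting decision logic can be pictured roughly with the following Python sketch. The respond function and its parameters are illustrative assumptions, not part of FIT's API.

    def respond(client_is_bot, viewport_known):
        """Decide whether to answer with the detection page or with the real content."""
        if client_is_bot:
            # Bots never receive the detection page, so they always index real content.
            return "main content"
        if not viewport_known:
            # Regular browsers are probed once; the detection page collects the
            # viewport dimensions and then requests the main content.
            return "detection page"
        return "main content (optimized for the detected viewport)"

    print(respond(client_is_bot=True, viewport_known=False))   # main content
    print(respond(client_is_bot=False, viewport_known=False))  # detection page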

Cloaking

Cloaking means serving different content to bots than to real users. Search engines usually forbid cloaking, and questions often arise concerning this subject.

FIT still delivers the same actual content to a bot, for example images showing the same motif; only the technique used to deliver those images changes depending on the client.

So while it is technically possible to cloak content using FIT features such as the DOM Filter or the Text Filter, the automatic disabling of FIT features for bots cannot be considered cloaking.

The Robots Exclusion Protocol (robots.txt)

The /robots.txt file is used to give instructions about a site to web robots/bots. This is called the Robots Exclusion Protocol.

The FIT Server automatically detects and rewrites the URLs of Sitemap records in /robots.txt files.
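
The effect of this rewriting can be pictured with the following Python sketch. The rewrite_robots_txt and map_to_fit_url functions as well as the example host names are assumptions for illustration, not FIT's actual rewriting rule.

    # Hypothetical mapping from origin URLs to URLs served through the FIT Server.
    def map_to_fit_url(origin_url):
        return origin_url.replace("https://www.example.com/", "https://fit.example.com/")

    def rewrite_robots_txt(robots_txt):
        """Rewrite the URLs of Sitemap records; all other lines are left untouched."""
        lines = []
        for line in robots_txt.splitlines():
            if line.lower().startswith("sitemap:"):
                prefix, _, url = line.partition(":")
                lines.append(prefix + ": " + map_to_fit_url(url.strip()))
            else:
                lines.append(line)
        return "\n".join(lines)

    original = "User-agent: *\nDisallow: /private/\nSitemap: https://www.example.com/sitemap.xml"
    print(rewrite_robots_txt(original))
    # The Sitemap record now points to https://fit.example.com/sitemap.xml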