FAQs: Excluding WWW Search Robots
How can I keep WWW search robots from indexing my e-Library pages?
A WWW robot is a program that automatically traverses the Web’s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Robots can be used for a number of purposes, including the following.
• Indexing
• HTML validation
• Link validation
• “What’s New” monitoring
• Mirroring
WWW robots can swamp system resources with rapid-fire requests or by repeatedly retrieving the same files. Some users have observed WWW robots creating new Web sessions every 2-5 seconds; once the operating system’s resources are exhausted, the workstation and e-Library servers halt.
To prevent WWW robots from accessing and indexing e-Library pages, create a robot exclusion file. For more information about WWW robots, refer to the following web site: http://info.webcrawler.com/mak/projects/robots/norobots.html.
A file named robots.txt should be placed in the root directory for the web pages. In its simplest form, which prevents access by all robots, the file requires only two lines.
User-agent: *
Disallow: /
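If you do not want to block robots from the entire site, the same file can list individual paths instead. As a sketch, the following entry (assuming a hypothetical /elibrary/ directory; substitute the actual path of your e-Library pages) excludes all robots from that directory while leaving the rest of the site accessible:
User-agent: *
Disallow: /elibrary/
Each Disallow line names a URL path prefix that compliant robots will not retrieve; multiple Disallow lines may be listed under one User-agent entry.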