Previously we talked about how to add a website to Google’s indexing service. But what if you have content that you don’t want search engines to find? You might have reasons such as security or privacy, and in those cases NOT being found by the search engines actually makes sense.
In the world of search engines there is something called the Robots Exclusion Protocol. It lets a site owner tell crawlers, through a plain text file named robots.txt, which parts of the site they may not visit. If a search engine’s crawler sees that its robot is not allowed to crawl part of the site, a compliant crawler will skip that content instead of crawling it for indexing.
To use the protocol, create a file called robots.txt inside your web root folder. For example, if your domain address is http://mydomain.com , the robots.txt file should be reachable at this URL: http://mydomain.com/robots.txt
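Because the file always lives at the root of the site, its address can be worked out from any page URL. Here is a minimal Python sketch of that rule; the helper name robots_url and the sample page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url.

    robots_url is a hypothetical helper name used only for this example.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page URL below is an invented example on the article's sample domain.
print(robots_url("http://mydomain.com/blog/post.html?page=2"))
```

If you open the printed address in a browser and see your rules, the file is in the right place.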
There are two important entries in the text file: the User-agent line, which names the robot a rule applies to, and the Allow or Disallow line, which names the directory that robot may or may not crawl. Good examples can be found at www.robotstxt.org.
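To see how those two entries work together, here is an illustrative sketch written for this post, not copied from any site. The rules, the /private/ and /tmp/ paths, and the robot name BadBot are all assumptions. It uses Python's standard urllib.robotparser module to read the rules the way a well-behaved crawler might.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every path and the BadBot name are
# invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Any robot may fetch the home page, but not the /private/ folder.
print(parser.can_fetch("Googlebot", "http://mydomain.com/index.html"))
print(parser.can_fetch("Googlebot", "http://mydomain.com/private/report.html"))
# BadBot is shut out of the whole site.
print(parser.can_fetch("BadBot", "http://mydomain.com/index.html"))
```

The first block applies to every robot, while the second block singles out one robot by name and blocks it from everything.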

Remember that the search engine’s robot still decides whether or not to follow the instructions in your robots.txt file. Malicious robots can ignore your robots.txt, but you can expect credible search engines like Yahoo! and Google to follow the robots.txt standard. Anyone can also read the file itself, so don’t rely on it alone to protect sensitive content.
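The reason compliance is voluntary is that the check happens inside the crawler's own code, not on your server. The rough Python sketch below shows what a polite crawler might do before fetching anything; polite_filter and ExampleBot are hypothetical names, and the rules and URLs are invented. A robot that skips this step is not stopped by anything in robots.txt.

```python
from urllib.robotparser import RobotFileParser


def polite_filter(robots_lines, user_agent, urls):
    """Return only the URLs that the robots.txt rules permit for user_agent.

    polite_filter is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Invented rules and URLs for illustration.
rules = ["User-agent: *", "Disallow: /private/"]
candidates = [
    "http://mydomain.com/",
    "http://mydomain.com/private/notes.html",
]

# A well-behaved crawler fetches only what survives the filter.
print(polite_filter(rules, "ExampleBot", candidates))
```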
Ben Carigtan shows you how it’s done.






