
Disallow All and Block Search Engine Spiders using Robots.txt


You can block every visitor, search engines included, and secure the data on your website with the .htaccess directive Deny from all. A related option, aimed mainly at search engines rather than at every visitor, is a robots.txt file.

To disallow all search engine visits and turn away every spider or crawler that obeys the file, create a robots.txt file containing the following text:

User-agent: *
Disallow: /
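If you want to check the rule before uploading it, Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`. The sketch below feeds it the two lines above and asks whether a few crawlers may fetch a few pages. The crawler names and URLs are placeholders chosen for illustration, and this parser follows the basic exclusion rules, so it may not mirror every extension a particular search engine supports. Every check should come back False.

```python
from urllib.robotparser import RobotFileParser

# The "disallow all" rules from above, one string per line.
RULES = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(RULES)  # parse the rules from memory instead of fetching them over HTTP

# Placeholder crawler names and URLs, used only for illustration.
for agent in ("Googlebot", "Bingbot", "SomeOtherSpider"):
    for url in ("http://www.yoursite.com/", "http://www.yoursite.com/about.html"):
        # can_fetch() answers: may this user agent crawl this URL?
        print(agent, url, parser.can_fetch(agent, url))
```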

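That is about as strong a shutdown as robots.txt allows: once the file is placed in the document root of your domain, almost all search engine spiders will stop crawling and indexing your entire site. Keep in mind, though, that robots.txt is only a request. Well-behaved crawlers honor it, but it does not prevent anyone from fetching your pages, and the file itself is publicly readable, so information that must stay private needs real access control such as .htaccess Deny from all.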

More often, you only want to exclude one subfolder (directory) of the domain from search engine crawling. In that case, use:

User-agent: *
Disallow: /data/
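Disallow rules match by path prefix, which is easy to confirm with the same standard-library parser. In this sketch the domain comes from the example in this post, while the individual paths and the user agent "AnyBot" are hypothetical. Only the paths under /data/ should be reported as not crawlable.

```python
from urllib.robotparser import RobotFileParser

# Rules that exclude only the /data/ directory.
RULES = [
    "User-agent: *",
    "Disallow: /data/",
]

parser = RobotFileParser()
parser.parse(RULES)

BASE = "http://www.yoursite.com"  # placeholder domain from the text
# Hypothetical paths: two outside /data/, two inside it, one that merely starts with "/data".
for path in ("/", "/index.html", "/data/", "/data/reports/2011.csv", "/database.html"):
    print(f"{path:28} crawlable: {parser.can_fetch('AnyBot', BASE + path)}")
```

Because matching is by prefix, the trailing slash matters: /database.html stays crawlable here, whereas a rule written as Disallow: /data (no slash) would block it as well.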

As before, place robots.txt in the root directory of the domain, and all search engines that play by the rules will stop crawling http://www.yoursite.com/data/ and everything beneath it.
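Why the root directory? Crawlers only look for robots.txt at the top of each host (scheme, host name and port), never inside subfolders, so a copy at /data/robots.txt would simply be ignored. The small helper below, `robots_url_for`, is a hypothetical name used for illustration; it derives the robots.txt address a crawler would consult for any page URL.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler consults for page_url.

    Only the scheme and host (including any port) are kept; the path is
    replaced by /robots.txt, and the query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder URLs: a page inside /data/, the home page, and a subdomain with a port.
for url in ("http://www.yoursite.com/data/report.html",
            "http://www.yoursite.com/",
            "https://blog.yoursite.com:8443/post?id=7"):
    print(url, "->", robots_url_for(url))
```

The last example also shows that a subdomain counts as a separate host, so it needs its own robots.txt.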

