
VisibleThread Web cannot scan a site because its 'robots.txt' is not allowing it. What can I do?

 

The problem:

You have attempted to scan a site in VisibleThread Web, but it reports that the site is blocking crawlers because of its 'robots.txt' settings.

 

What does this mean?

A 'robots.txt' file is a set of rules that the owner of a website puts in place to control 'crawling' of the site by 'robots', that is, automated engines that discover web content, such as Google's crawler or VisibleThread Web. If the site's robots.txt contains a rule blocking crawlers, the VisibleThread Web application will not be able to navigate the site to retrieve content.
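
For example, a robots.txt that blocks all crawlers from the entire site typically looks like the following (the exact rules on your site may differ):

User-agent: *

Disallow: /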

 

What can I do to overcome this issue?

To enable crawling by the VisibleThread Web application, you will need to ask the site's webmaster to add the following entry to the end of your robots.txt to allow VisibleThread Web to scan your website:

User-agent: ClarityGrader

Allow: /
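
As an illustration, if your existing robots.txt blocks all crawlers, the updated file might look like the example below. The 'User-agent: *' rules shown are only an example; your existing rules will vary. Because the 'ClarityGrader' group names that crawler specifically, it takes precedence over the wildcard group for the VisibleThread Web crawler:

User-agent: *

Disallow: /

User-agent: ClarityGrader

Allow: /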

 

VisibleThread Web refreshes its robots.txt cache after 10 minutes, so the scan will work for you once that period has passed.
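
If you want to confirm that the updated robots.txt now permits the crawler before re-running the scan, a quick check with Python's standard library robot parser can help. This is only an illustrative sketch; replace the example URL with your own site's address:

from urllib.robotparser import RobotFileParser

# Point the parser at your site's robots.txt (example URL, replace with your own)
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Prints True if the ClarityGrader user agent (used by VisibleThread Web) may crawl the site
print(rp.can_fetch("ClarityGrader", "https://www.example.com/"))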


 
