
Scanning for Broken Links with VisibleThread Web

As part of processing page content during a scan, VisibleThread Web finds all '<a>' and '<img>' tags and validates the destination resource of each one.
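The following is a minimal sketch of this kind of link extraction, not VisibleThread Web's actual implementation. It uses Python's standard `html.parser` module, and the names `LinkCollector` and `extract_links` are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects link targets from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> and <img> tags are of interest; each has one link attribute.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all <a> and <img> targets in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Each URL returned by `extract_links` would then be requested in turn to find out whether the destination resource exists.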

Why should I care about broken links?

Broken links are not uncommon on websites. They can damage your reader's experience and hurt search rankings. Activities such as content migration, product rebranding, and changes in organisational structure often result in broken or obsolete links.

How VisibleThread Web classifies links that it checks

VisibleThread Web uses three classifications to group links that it has checked:

'OK' Links

Links that appear with a status of 'OK' are links that VisibleThread Web resolved successfully. These links are working as expected. Requests that return the following HTTP status codes are classified as being 'OK':

  • 200 OK
  • 401 Unauthorized : In this situation VisibleThread Web could not retrieve the resource content as it was protected and required authentication. We classify this as 'OK' as the destination resource exists, but requires authentication.
  • 403 Forbidden : In this situation VisibleThread Web could not retrieve the resource content as it was forbidden to do so by the destination server. We classify this as 'OK' as the destination resource does exist, but VisibleThread Web is forbidden from accessing it.

'Suspect' Links

Links that are classed as 'Suspect' are links that VisibleThread Web had a problem resolving, but where the error encountered is likely to be a temporary issue or a destination server configuration issue rather than an incorrect link. If these links were tested again some time later, they would probably resolve successfully. Suspect links still merit investigation but are not as serious as Broken links. Link requests that return the following HTTP status codes are classified as being 'Suspect':

  • 500 Internal Server Error : This means that the destination server encountered an internal error while attempting to resolve the link. This usually occurs where there is an infrastructure or configuration issue with the destination server, for example a database crash.
  • 429 Too Many Requests : This means that the destination server has blocked or limited VisibleThread Web's access to the resource. Web server administrators sometimes rate limit or block crawlers from accessing parts of their websites. These links are classified as 'Suspect' as VisibleThread Web is unable to verify them due to the destination server configuration.
  • 667 Timeout : 667 is not a standard HTTP status code. It is a code specific to VisibleThread Web, which we use to identify requests that time out. These links are classified as 'Suspect' as VisibleThread Web is unable to verify them because all requests to retrieve them timed out.

'Broken' Links

Links that could not be resolved by VisibleThread Web are classified as 'Broken' links. These links are likely to be permanent issues and should be fixed as soon as possible. 
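The three classifications can be summarised as a simple lookup. This is an illustrative sketch only: the function name `classify` is hypothetical, and treating every status code not listed above as 'Broken' is an assumption, since this article does not list the individual codes that are reported as 'Broken'.

```python
# Status codes listed in this article as 'OK' and 'Suspect'.
OK_CODES = {200, 401, 403}
SUSPECT_CODES = {429, 500, 667}  # 667 is VisibleThread Web's own timeout code


def classify(status_code):
    """Map a status code to 'OK', 'Suspect' or 'Broken'."""
    if status_code in OK_CODES:
        return "OK"
    if status_code in SUSPECT_CODES:
        return "Suspect"
    # Assumption: anything not listed above is treated as 'Broken'.
    return "Broken"
```

Under this assumption, a link that returns 404 would be classified as 'Broken', while one that returns 403 would be classified as 'OK'.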

 

A link is reported as broken, but when I try it in a browser it works fine

We have put a lot of effort into ensuring the accuracy of our link checking results. However, there may be cases where VisibleThread Web reports a link as broken even though it works in your web browser. There are a number of reasons why this might be the case:

  • The webpage or link has been modified since the VisibleThread Web scan was run.
  • The webpage may be returning the error code “404”, which effectively says “Page not found”, even if it looks like it's returning a valid page. This is a technical problem with that website which would negatively impact SEO and should be fixed.
  • The link URL that is specified in the web page source HTML is being manipulated by JavaScript prior to being rendered by the browser. For example, a web page may contain the following HTML code "<a href='http://www.visiblethread.com/_locale/products'>", but when this is processed and rendered by a web browser it is changed to "<a href='http://www.visiblethread.com/en_us/products'>". JavaScript manipulation of link URLs prior to rendering the webpage will typically cause link checkers to report these links as broken.
  • The link URL resolves to a resource that is only available to users from certain regions (GEO restricted resources). For example, the link 'https://play.google.com/store/magazines/details/Beautiful_Kitchens?id=CAowp5vxAw' will resolve correctly for users in the U.S. but will return a '403 Forbidden' status for users in other regions.
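To check whether a page that looks fine in the browser is actually returning an error status, you can request it directly and inspect the status code. The sketch below uses Python's standard `urllib` module. The function name `fetch_status` is hypothetical, and this is not how VisibleThread Web itself performs the check.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, including error statuses.

    Redirects are followed automatically. Network-level failures such as
    DNS errors or timeouts are not handled here and will raise.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is
        # still available even when the server sends a normal-looking page.
        return err.code
```

If `fetch_status` returns 404 for a URL that displays a normal page in your browser, the destination website is sending the wrong status code with that page.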

 

A link is reported as broken, but I can't find the link in the web page

There are a number of reasons why this might be the case:

  • The webpage or link has been modified since the VisibleThread Web scan was run.
  • The link URL that is specified in the web page source HTML is being manipulated by JavaScript prior to being rendered by the browser. For example, a web page may contain the following HTML code "<a href='http://www.visiblethread.com/_locale/products'>", but when this is processed and rendered by a web browser it is changed to "<a href='http://www.visiblethread.com/en_us/products'>". JavaScript manipulation of link URLs prior to rendering the webpage will typically cause link checkers to report these links as broken.

 

 

 

 
