Q. I have a PDF that consists only of images. Can VisibleThread parse it?
A. VisibleThread cannot parse PDFs that consist only of images. Technically, this is because reliably extracting text from images is very difficult. There are, however, third-party conversion utilities that may help.
If you have PDFs that consist only of scanned images, what are your options?
- Use a third-party utility to convert the images to text first, then upload the result. OCR (Optical Character Recognition) technology exists, but VisibleThread cannot vouch for its reliability. Many utilities are available, and our tests indicate that their accuracy is mixed.
The best way to find available utilities is to search the web for these words (or similar): ‘image conversion pdf to word OCR’
- If it is possible, request the PDF in text form from the issuing authority.