Author Topic: PDF OCR X FREE converts the 1st page of your PDFs into text documents  (Read 2480 times)


Software Santa

  • Administrator
  • *****
  • Posts: 4280
PDF OCR X converts the 1st page of your PDFs into text documents for Windows XP or later - or Intel Mac, OS X 10.5 or Later

Yeah, the Community version (that is, the FREE version) ONLY converts the first page of a multi-page PDF to editable text. They want $30 for the version that converts ALL the pages at once.
Well, why not use a program (like the FREE ImagesFromPDF program for the Mac) to cut a multi-page PDF file into a bunch of single pages first, and then convert them one at a time? That's a way around paying $30 for it!
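
If you'd rather script the page-splitting step than use a separate app, here is a minimal Python sketch. It assumes the third-party pypdf package is installed (pip install pypdf); the function names and the output file naming are made up for illustration, and none of this is part of PDF OCR X or ImagesFromPDF.

```python
from pathlib import Path


def page_filename(stem: str, index: int, total: int) -> str:
    """Name for one single-page file, zero-padded so files sort in page order."""
    width = len(str(total))
    return f"{stem}_page{index:0{width}d}.pdf"


def split_pdf(src: str, out_dir: str) -> list[Path]:
    """Write each page of `src` to its own one-page PDF inside `out_dir`."""
    # Assumption: the third-party pypdf package is available.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    total = len(reader.pages)
    written = []
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copy just this one page
        target = out / page_filename(Path(src).stem, i, total)
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

Calling split_pdf("scan.pdf", "pages") would leave one file per page in the "pages" folder, and each of those can then be dragged onto PDF OCR X separately.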

Quote
Convert Scanned and Image PDFs into Text Documents:

PDF OCR is a simple drag-and-drop utility for Windows or Mac OS X, that converts your PDFs into text documents. It uses advanced OCR (optical character recognition) technology to extract the text of the PDF even if that text is contained in an image. This is particularly useful for dealing with PDFs that were created via a Scan-to-PDF function in a scanner or photo copier.

Free
Community Version
Limited to PDFs of 1 page or less
   

Simple drag-and-drop utility for Mac OS X and Windows
   Supports over 20 languages.
   Also works with images such as GIF, JPEG, TIF, BMP, PSD, PNG, and more...
   Converts to text or searchable PDF (New in version 1.9)
   
   Convert Scanned and Image PDFs into Text Documents
PDF OCR is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files. It uses advanced OCR (optical character recognition) technology to extract the text of the PDF even if that text is contained in an image. This is particularly useful for dealing with PDFs that were created via a Scan-to-PDF function in a scanner or photo copier.

Requirements
Intel Mac running Mac OS X 10.5 or higher.
Windows XP/Vista/7 with Java 1.6 or higher installed.

How it Works    
»    Drag the PDF that you want to convert onto PDF OCR X.
»    Select your conversion settings (e.g. language, output format).
»    PDF OCR X converts your document to text or searchable PDF.
   
Installation
»    Download PDF OCR and place it in your Applications folder.
»    Drag the PDF OCR icon down to your dock so that you can easily access it.
»    Now you can drag your PDFs onto the PDF OCR icon on your dock to have it converted to text.
   
Features
»    Works with any PDF, whether it is a scanned PDF, or a PDF generated from a document.
»    Easy drag and drop interface.
»    Support for multi-column documents and advanced formatting (New in version 1.3)

Disclaimer
PDF OCR uses OCR (optical character recognition) to convert images of text into text. While the technology is quite good at deciphering legible text, there are limitations and some text may not be extracted correctly.


Using PDF OCR X

PDF OCR X is a very simple application. There is only one dialog box, which lets you choose your input and output settings. This dialog appears after you drag a PDF onto the PDF OCR X icon to be converted.

About these options

    Output Format: Text will result in plain text output. Selecting "Searchable PDF" will embed the text in the PDF so that it is searchable.
    Language: The language that the source document is in. Some languages include special characters, and knowing the language of your source document helps PDF OCR X achieve maximum accuracy. Additional language packs can be downloaded (see "PDF OCR X Language Packs" below).
    Layout: If your document is formatted in a single column with flowing text, then you should select the "Single Column" layout option as it is faster than the multi-column option. If, however, your document is formatted in multiple columns or sections, you should select the "Multi-Column" option, as this will instruct PDF OCR X to try to guess the structure of the document and detect where columns begin and end.
    Text Wrap:
        Soft wrap: Assume that the text is meant to flow from one line to the next in most cases.
        Hard wrap: Forcefully add line breaks at the end of each line, even if it may occur mid-sentence.
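
To make the difference between the two Text Wrap options concrete, here is a small Python sketch. It is only an illustration of the idea on a list of recognized lines, not how PDF OCR X actually implements it; the function names and the "blank line ends a paragraph" rule are assumptions.

```python
def soft_wrap(lines):
    """Join consecutive lines into paragraphs; a blank line ends a paragraph."""
    paragraphs, current = [], []
    for line in lines:
        text = line.strip()
        if text:
            current.append(text)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


def hard_wrap(lines):
    """Keep every recognized line break exactly where it was on the page."""
    return "\n".join(line.rstrip() for line in lines)


ocr_lines = [
    "This sentence was split",
    "across two scanned lines.",
    "",
    "Next paragraph.",
]
print(soft_wrap(ocr_lines))
print(hard_wrap(ocr_lines))
```

With soft wrap the first two lines come out as one flowing sentence; with hard wrap the break after "split" is kept even though it falls mid-sentence.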

 
Disabling Interactive Mode

As of version 1.9.7, PDF OCR X allows you to disable interactive mode. This means that you are able to skip all of the settings dialogs that pop up during conversion. The settings from your last conversion are automatically used for conversion when interactive mode is disabled.

Steps to disable Interactive Mode

On Mac, you can select "Preferences" from the "Apple" menu. Then uncheck the "Use Interactive Mode" checkbox.

After this box is unchecked, you will no longer be shown the settings dialog when you select a file for conversion. You can always reactivate interactive mode by returning to this same dialog and re-checking the box.





PDF OCR X Language Packs

As of version 1.4, the default installation of PDF OCR X includes support for English only. However, you can add support for any of the available language packs by downloading the appropriate file and dragging the contained .traineddata file onto your PDF OCR X application icon.

Installation Instructions

    Download the language that you want to install from the table below. This should result in a file like xxx.traineddata.zip
    Unzip the file. This should result in a file like xxx.traineddata.
    Drag this file onto your PDF OCR X Application Icon (same way you convert PDF files to text).
    If everything went OK you should receive a message saying that the language has been installed correctly.

That's it! The next time you try to convert a PDF with PDF OCR X, you should see this new language listed in the list of available languages.
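
If you download several language packs at once, the unzip step in the instructions above can be scripted. This is a rough Python sketch using only the standard library; the function name and the destination folder are made up for illustration, and you still have to drag each resulting .traineddata file onto the PDF OCR X icon yourself.

```python
import zipfile
from pathlib import Path


def extract_traineddata(zip_path, dest_dir):
    """Extract every *.traineddata member of the archive into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".traineddata"):
                # Flatten any folder inside the archive to a bare file name.
                target = dest / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

For example, extract_traineddata("xxx.traineddata.zip", "language_packs") would return the paths of the unpacked .traineddata files, ready to be dragged onto the application icon.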


Acknowledgements:

PDF OCR X uses the following software libraries:
1. ImageMagick
2. Tesseract-OCR
3. Jasypt
4. ICU4J

http://solutions.weblite.ca/pdfocrx/
« Last Edit: February 23, 2014, 04:49:58 PM by Software Santa »

 
