Author Topic: Capture2Text enables users to quickly OCR a small portion of the screen  (Read 3337 times)


Software Santa




Quote
Description

Capture2Text enables users to quickly OCR a small portion of the screen and, by default, save the result to the clipboard. Supports 50+ languages including Chinese, English, French, German, Japanese, and Spanish. Portable and does not require installation.

What is Capture2Text?
Capture2Text enables users to do the following:
 
  • Optical Character Recognition (OCR)
     Allows the user to quickly snapshot a small portion of the screen, OCR it and (by default) save the result to the clipboard.
  • Speech Recognition
     Using speech recognition the user can speak into their microphone and Capture2Text will convert the speech to text. If the speech recognition technology is not 100% sure, Capture2Text will present the user with a list of the most likely transcriptions. The selected result will (by default) be copied to the clipboard.
Download
The latest version can be found on the Capture2Text download page hosted by SourceForge. Source code is included.
How to Install
  • Unzip the contents of the zip file. Make sure that there are no Asian or other non-ASCII characters in the path where you unzipped it. Also, if you are on Windows 7, don't unzip it to the Program Files directory (this will avoid issues related to write privileges).
  • Double-click on Capture2Text.exe. You should see the Capture2Text icon on the bottom-right of your screen (though it might be hidden in which case you will have to click on the "Show hidden icons" arrow).
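The two path warnings in the install steps can be checked before unzipping. The following is only a rough sketch: the function names are made up for illustration and are not part of Capture2Text.

```python
from pathlib import PureWindowsPath


def path_is_ascii(path: str) -> bool:
    """Return True if the path contains only ASCII characters."""
    return path.isascii()


def in_program_files(path: str) -> bool:
    """Return True if any folder in the path starts with 'Program Files'.

    This also catches 'Program Files (x86)'. PureWindowsPath is used so
    the check behaves the same no matter which OS runs the script.
    """
    parts = PureWindowsPath(path).parts
    return any(part.lower().startswith("program files") for part in parts)


def is_safe_install_dir(path: str) -> bool:
    """Combine both checks from the install instructions."""
    return path_is_ascii(path) and not in_program_files(path)
```

For example, `is_safe_install_dir(r"C:\Tools\Capture2Text")` passes both checks, while a folder under Program Files or one with non-ASCII characters in its name does not.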
OCR
Capture2Text can OCR the following languages:

Afrikaans      Frankish     Maltese
Albanian       French       Norwegian
Ancient Greek  Galician     Polish
Arabic         German       Portuguese
Azerbaijani    Greek        Romanian
Basque         Hebrew       Russian
Belarusian     Hindi        Serbian
Bengali        Hungarian    Slovakian
Bulgarian      Icelandic    Slovenian
Catalan        Indonesian   Spanish
Cherokee       Italian      Swahili
Chinese        Japanese     Swedish
Croatian       Kannada      Tagalog
Czech          Korean       Tamil
Danish         Latvian      Telugu
Dutch          Lithuanian   Thai
English        Macedonian   Turkish
Esperanto      Malay        Ukrainian
Estonian       Malayalam    Vietnamese
Finnish
By default only Chinese, English, French, German, Japanese, and Spanish are installed.
To acquire other languages:
 
  • Download the appropriate OCR language dictionaries from http://code.google.com/p/tesseract-ocr/downloads/list. These files end in ".tar.gz" (ex. tesseract-ocr-3.02.rus.tar.gz).
  • Open the ".tar.gz" file you just downloaded with 7-Zip or similar decompression software and navigate to the directory that has the file that ends in ".traineddata".
  • Drag the ".traineddata" file (and any other file in this directory) to this path in the Capture2Text directory: Capture2Text\Utils\tesseract\tessdata
  • Restart Capture2Text
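If you would rather script the unpack-and-copy steps than use 7-Zip, something like the sketch below should do the same thing using only the Python standard library. The function name and the example paths are assumptions for illustration; only the ".traineddata" extension and the tessdata folder come from the instructions above.

```python
import posixpath
import shutil
import tarfile
from pathlib import Path


def install_language(archive_path: str, tessdata_dir: str) -> list[str]:
    """Copy the .traineddata file (and its sibling files) into tessdata_dir.

    Mirrors the manual steps: find the directory inside the archive that
    holds a .traineddata file, then copy every file from that directory.
    Returns the sorted names of the files that were copied.
    """
    dest = Path(tessdata_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with tarfile.open(archive_path, "r:gz") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]
        # Directories inside the archive that contain a .traineddata file.
        data_dirs = {
            posixpath.dirname(m.name)
            for m in files
            if m.name.endswith(".traineddata")
        }
        for member in files:
            if posixpath.dirname(member.name) not in data_dirs:
                continue
            # Write the file flat into tessdata_dir, dropping archive folders.
            target = dest / posixpath.basename(member.name)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            copied.append(target.name)
    return sorted(copied)
```

A call might look like `install_language("tesseract-ocr-3.02.rus.tar.gz", r"Capture2Text\Utils\tesseract\tessdata")`. You still need to restart Capture2Text afterwards.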
Note: Arabic and Hindi are more CPU-intensive and will thus be slower to OCR.
OCR Usage
Press the OCR capture key (default: Windows Key + Q) to start the capture. Now, using your mouse, resize the capture box over the area of the screen that you want to OCR. A preview of the captured OCR'd text will appear in the top-left corner of the screen. Press the capture key again or the left mouse button to complete the capture. The captured screen area will be OCR'd and the textual result will be stored in the clipboard by default.
To cancel an OCR capture, press Esc.
To move the capture box, hold down the right mouse button and drag the mouse.
To nudge the capture box, use the arrow keys.
To toggle the active capture box corner, press the space bar.
To change the OCR language, right-click the Capture2Text tray icon, select the OCR Language option and then select the desired language.
To quickly switch between 3 languages, use the OCR language quick access keys: Windows Key + 1, Windows Key + 2, and Windows Key + 3.
When the Tesseract version of Chinese or Japanese is selected, you should specify the text direction (vertical or horizontal) using the text direction key: Windows Key + W. The text direction has no effect on the NHocr Chinese or NHocr Japanese dictionaries.
Using the Preferences dialog, you can change the following OCR settings:
 
  • OCR Hotkeys.
  • Current OCR Language.
  • The 3 Quick-Access OCR Languages.
  • Capture Box color and opacity.
  • Enable/Disable the preview box and change its colors, font and opacity.
  • Change the text direction (used for Chinese and Japanese).
Speech Recognition
Capture2Text can perform speech recognition for the following languages:

Afrikaans  French    Polish
Chinese    German    Portuguese
Czech      Italian   Russian
Dutch      Japanese  Spanish
English    Korean    Turkish
Speech Recognition Usage
Press the speech recognition capture key (default: Windows Key + A) to start the capture. You will see a box that says "Recording..." in the top-left corner of your screen. Speak a word or phrase or sentence into your microphone. Capture2Text will automatically recognize when you are done speaking and will display a box that says "Analyzing...". The speech recognition will take a couple of seconds. When the speech recognition is complete you will see a list of possible transcriptions to choose from. When you choose a transcription, it will be stored in the clipboard by default.
When the results window is displayed, you can press Enter to select the first transcription or use the number keys (1-9) to select the corresponding transcription.
To cancel a speech recognition capture, press Esc.
To change the speech recognition language, right-click the Capture2Text tray icon, select the Speech Recognition Language option and then select the desired language.
To quickly toggle between 2 languages, use the speech recognition language hotkey: Windows Key + 4.
Using the Preferences dialog, you can change the following speech recognition settings:
 
  • Speech recognition Hotkeys.
  • Current speech recognition Language.
  • The 2 speech recognition languages to toggle between.
  • The properties of the Results window (font, color, number of results).
  • How much silence to wait for before recording stops.
Output Options
By default, the OCR'd or speech recognized text will be placed in the clipboard.
You also have 3 more ways to output the text.
To send the text to a pop-up window you can right-click the Capture2Text tray icon and select Show Popup Window.
To send the text to whichever textbox currently contains the blinking cursor/I-beam, right-click the Capture2Text tray icon and select Send to Cursor.
Advanced: To send the text directly to a window/control (for example, Notepad++), first fill in the Send to Control settings in the Preferences dialog. Once this is done you may enable/disable the option by right-clicking the Capture2Text tray icon and selecting Send to Control.
Using the Preferences dialog, you can change the following output settings:
 
  • Text to prepend/append to the captured text.
  • Enable/Disable outputting to the clipboard.
  • Enable/Disable outputting to a popup window.
  • Popup window properties (default width and height).
  • Enable/Disable sending the output text to the cursor.
  • Enable/Disable outputting to a control.
  • Additional command to send to the output control.
Configuration
Right-click the Capture2Text tray icon in the bottom-right of your screen and then select the "Preferences..." option to bring up the Preferences dialog.
Substitutions
Sometimes Capture2Text consistently makes the same OCR mistakes, such as recognizing an "M" as "I\/|".
By editing the substitutions.txt file in the Capture2Text directory, you may tell Capture2Text to substitute one text string for another text string.
Just find the appropriate language section and add one substitution per line in this format:
 from_text = to_text
Example (adding 3 substitutions to the English section):
 English:
 I\/| = M
 >< = X
 some%space%text = some_text
To create a substitution regardless of language, add the substitution to the "All:" section.
Special tokens and escape characters:
 
%space%  Space character
%tab%    Tab character
%eq%     Equals (=)
%perc%   Percent sign (%)
%lf%     Linefeed character (\n)
%cr%     Carriage return character (\r)
You may disable a substitution by adding a "#" in front.
When done editing substitutions.txt, either restart Capture2Text or switch language for the substitutions to take effect.
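To make the file format more concrete, here is a rough Python sketch of how a file like this could be parsed and applied. It is not Capture2Text's actual code. The function names, the treatment of section headers, and the choice to apply the "All:" rules before the language-specific ones are all assumptions for illustration.

```python
import re

# Special tokens from the table above, mapped to the characters they stand for.
TOKENS = {"space": " ", "tab": "\t", "eq": "=", "perc": "%", "lf": "\n", "cr": "\r"}
TOKEN_RE = re.compile(r"%(space|tab|eq|perc|lf|cr)%")


def expand_tokens(text: str) -> str:
    """Replace each special token in a single left-to-right pass."""
    return TOKEN_RE.sub(lambda m: TOKENS[m.group(1)], text)


def parse_substitutions(text: str) -> dict[str, list[tuple[str, str]]]:
    """Parse sections such as 'English:' followed by 'from = to' lines."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a substitution disabled with "#"
        if "=" not in line and line.endswith(":"):
            current = line[:-1]  # section header, e.g. "English:"
            sections.setdefault(current, [])
        elif "=" in line and current is not None:
            left, right = line.split("=", 1)
            # Strip first, then expand, so token-made whitespace survives.
            pair = (expand_tokens(left.strip()), expand_tokens(right.strip()))
            sections[current].append(pair)
    return sections


def apply_substitutions(text: str, sections: dict, language: str) -> str:
    """Apply the 'All' rules, then the rules for the given language."""
    for src, dst in sections.get("All", []) + sections.get(language, []):
        if src:
            text = text.replace(src, dst)
    return text
```

With the three English substitutions from the example above loaded, `apply_substitutions` would turn "I\/|ary" into "Mary" when the language is English.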

https://sourceforge.net/projects/capture2text/

http://capture2text.sourceforge.net/


 
