Software Santa

Title: BabelPad is a free Unicode text editor and converter for Windows
Post by: Software Santa on November 18, 2010, 08:30:53 AM
BabelPad is a free Unicode text editor and converter for Windows

http://www.babelstone.co.uk/Software/BabelPad.html

Quote
BabelPad (Unicode Text Editor for Windows)


Overview

BabelPad is a free Unicode text editor for Windows that supports the proper rendering of most complex scripts, and allows you to assign different fonts to different scripts in order to facilitate multi-script text editing. BabelPad supports the latest version of Unicode, currently Unicode 6.0.

Summary of Features
User Interface

    * Swap between Edit Mode and Browser Mode :
          o Edit Mode allows documents of any size to be edited in plain text format.
          o Browser Mode allows the current document to be viewed in an Internet Explorer browser window.
    * The user interface menus and other text elements may be displayed in any of the following languages :
          o English
          o Chinese (simplified)
          o Chinese (traditional)
    * Multiple instances of BabelPad may be tiled (horizontally, vertically or patchwork), cascaded, minimized, maximized, restored or closed from the "Window" menu of any open BabelPad window.


File Features

    * Open files encoded as :
          o Unicode : UTF-8
          o Unicode : UTF-16 (Big Endian or Little Endian)
          o Unicode : UTF-32 (Big Endian or Little Endian)
          o Unicode : UTF-7
          o Unicode : SCSU
          o Unicode : CESU-8
          o Unicode 1.0 : UCS-2
          o Unicode 1.1 : UCS-2
          o Unicode 1.1 : UTF-7
          o ISO-8859-1 (Latin1) : Western European
          o ISO-8859-2 (Latin2) : Non-Cyrillic Central European
          o ISO-8859-3 (Latin3) : Esperanto, Galician, Maltese, Turkish
          o ISO-8859-4 (Latin4) : Baltic Rim
          o ISO-8859-5 (Cyrillic)
          o ISO-8859-6 (Arabic)
          o ISO-8859-7 (Greek)
          o ISO-8859-8 (Hebrew)
          o ISO-8859-9 (Latin5) : Improved Turkish
          o ISO-8859-10 (Latin6) : Inuit, Lappish
          o ISO-8859-11 (Thai)
          o ISO-8859-13 (Latin7) : Improved Baltic Rim
          o ISO-8859-14 (Latin8) : Celtic
          o ISO-8859-15 (Latin9, a.k.a. Latin0) : Improved Western European
          o ISO-8859-16 (Latin10) : South-Eastern European
          o Windows CP 874 (Thai)
          o Windows CP 932 (extension of Shift-JIS) : Japanese
          o Windows CP 936 (extension of GB2312) : Simplified Chinese
          o Windows CP 949 (Unified Hangul Code) : Korean
          o Windows CP 950 (extension of Big5) : Traditional Chinese
          o Windows CP 1133 (Lao)
          o Windows CP 1250 (East European)
          o Windows CP 1251 (Cyrillic)
          o Windows CP 1252 (West European)
          o Windows CP 1253 (Greek)
          o Windows CP 1254 (Turkish)
          o Windows CP 1255 (Hebrew)
          o Windows CP 1256 (Arabic)
          o Windows CP 1257 (Baltic)
          o Windows CP 1258 (Vietnamese)
          o EUC-JP (Japanese)
          o EUC-KR (Korean)
          o GB18030 (Extended Chinese) : Unicode-mapped superset of GB2312
          o GB2312 (Simplified Chinese)
          o Big5 (Traditional Chinese)
          o Big5-HKSCS (Big5 plus Hong Kong Supplementary Character Set)
          o Shift-JIS (Japanese) (optionally converting DoCoMo/KDDI/SoftBank emoji extensions)
          o JIS X 0201 (Latin plus Katakana)
          o JIS X 0208 (Japanese)
          o KSC 5601 (KS X 1001) (Korean)
          o Wansung (Korean)
          o Johab (Korean)
          o KOI8-R (Russian)
          o KOI8-U (Ukrainian)
          o ARMSCII-8 (Armenian)
          o VISCII (Vietnamese)
          o VIQR (Vietnamese Quoted Readable)
          o TIS-620 (Thai)
          o Mulelao-1 (Lao)
          o TSCII (Tamil)
          o TAM (Tamil Monolingual)
          o TAB (Tamil Bilingual)
          o I.S. 434 (Ogham)
    * Autodetects Unicode encoding forms and character sets declared in HTML or XML documents.
    * Automatically convert CR/LF, CR, LF, Line Separator and Paragraph Separator characters.
    * Option to convert Numeric Character References (NCR) and/or Universal Character Names (UCN) to Unicode characters on Open.
    * Save the current document as :
          o Unicode : UTF-8 (with or without a Byte Order Mark)
          o Unicode : UTF-16 Big Endian or Little Endian (with or without a Byte Order Mark)
          o Unicode : UTF-32 Big Endian or Little Endian (with or without a Byte Order Mark)
          o GB18030 (with or without a Byte Order Mark)
          o ASCII with Hexadecimal Numeric Character Reference (NCR) substitution of non Basic Latin characters
          o ASCII with Decimal Numeric Character Reference (NCR) substitution of non Basic Latin characters
          o ASCII with Universal Character Name (UCN) substitution of non Basic Latin characters
          o ASCII with HTML Entity substitution of non Basic Latin characters
          o SCSU (Standard Compression Scheme for Unicode) [encoder/decoder code kindly supplied by Doug Ewell]
    * Save line breaks as CR/LF, LF, CR, or as Unicode Line Separator [U+2028] or Paragraph Separator characters [U+2029].
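
As a rough illustration of two of the file features listed above (recognising a Unicode encoding form from its Byte Order Mark, and converting between line break conventions), here is a minimal Python sketch. It is not BabelPad's code; the function names `sniff_bom` and `convert_line_breaks` are made up for this example, and real autodetection would also have to handle files without a BOM.

```python
import codecs

# BOM signatures. UTF-32 is checked before UTF-16 because the UTF-32 LE
# signature begins with the same two bytes as the UTF-16 LE signature.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def convert_line_breaks(text: str, newline: str = "\n") -> str:
    """Rewrite CR/LF, CR, LF, U+2028 and U+2029 as one chosen line break."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\u2028", "\n").replace("\u2029", "\n")
    return text.replace("\n", newline)
```

Once the encoding is known, the text can be decoded with `data[bom_length:].decode(encoding)`.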


Edit Features

    * Left-To-Right (LTR) or Right-To-Left (RTL) page layout.
    * Line Wrap mode or No Line Wrap mode.
    * Drag and Drop editing.
    * Multiple Undo/Redo.
    * Indent and Unindent selected lines of text using TAB and Shift-TAB.
    * Option to Auto-Indent text as you type (useful for writing code).
    * Select a "word" by double-clicking and navigate by "word" by means of the left/right arrows (works for most Unicode scripts).
    * Select a line of text by left-clicking in the margin (select a paragraph by double-clicking in the margin).
    * Find and Replace functions.
    * Transcode from one list of characters or codepoints to another list of characters or codepoints
    * Batch replace one list of text strings with another list of text strings
    * Select default font and font size from dropdown list on the toolbar.
    * Configure individual Unicode blocks to always use a particular font regardless of which font is currently selected for default display.
    * Status Bar displays codepoint and Unicode name of the character at the current caret position.
    * For CJK ideographs the status bar also displays the Mandarin, Korean or Vietnamese reading for the character at the current caret position (choice of reading is user-selectable).
    * Able to open and edit very large (multi-megabyte) files with little degradation in performance.
    * Standard printing functionality enabled.
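
For anyone wondering what the "transcode" and "batch replace" features above amount to, the idea can be sketched in a few lines of Python. This is only an illustration; `transcode` and `batch_replace` are hypothetical names, and applying the replacements one after another in list order is an assumption (BabelPad may handle overlapping strings differently).

```python
def transcode(text: str, source: str, target: str) -> str:
    """Replace each character in `source` with the character at the same
    position in `target`; all other characters pass through unchanged."""
    if len(source) != len(target):
        raise ValueError("source and target lists must have equal length")
    return text.translate(str.maketrans(source, target))


def batch_replace(text: str, pairs) -> str:
    """Apply (old, new) string replacements one after another, in order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text
```

For example, `transcode("abc", "ab", "xy")` maps a to x and b to y and leaves c alone.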


Text Conversion

    * Case Conversion (covering all scripts that have upper/lower case distinctions, including Latin, Greek, Cyrillic, Armenian and Deseret) :
          o Convert the selected alphabetic text to upper case.
          o Convert the selected alphabetic text to lower case.
          o Convert the selected alphabetic text to title case.
    * Normalization (conforms to Unicode 6.0 normalization algorithm) :
          o Convert the selected text to Normalization Form NFD (canonical decomposition).
          o Convert the selected text to Normalization Form NFC (canonical composition).
          o Convert the selected text to Normalization Form NFKD (canonical decomposition with compatibility characters replaced).
          o Convert the selected text to Normalization Form NFKC (canonical composition with compatibility characters replaced).
    * CJK Conversion :
          o Convert the selected Simplified Chinese text to Traditional Chinese.
          o Convert the selected Traditional Chinese text to Simplified Chinese.
    * Entity Conversion :
          o Convert all HTML Entities (e.g. &uuml;) in the selected text to Unicode characters.
          o Convert all non-Basic Latin characters in the selected text to HTML Entities or hexadecimal Numeric Character References (NCRs).
          o Convert all Numeric Character References (e.g. &#xFC; or &#252;) in the selected text to Unicode characters.
          o Convert all non-Basic Latin characters in the selected text to hexadecimal Numeric Character References (NCRs).
          o Convert all non-Basic Latin characters in the selected text to decimal Numeric Character References (NCRs).
          o Convert all Universal Character Names (e.g. \u00FC) in the selected text to Unicode characters.
          o Convert all non-Basic Latin characters in the selected text to Universal Character Names (UCNs).
          o Convert all characters in the selected text to their Unicode Names (e.g. LATIN SMALL LETTER U WITH DIAERESIS).
          o Convert the selected Unicode character name to its corresponding character
          o Convert all characters in the selected text to U+XXXX notation (e.g. U+00FC).
          o Convert hexadecimal scalar value in front of the caret to a Unicode character or vice versa by hitting Alt-X (emulates the Alt-X functionality in Microsoft Word).
    * Transliteration Conversion :
          o Convert the selected Extended Wylie Tibetan transliteration to Unicode Tibetan.
          o Convert the selected Mongolian transliteration to Unicode Mongolian.
          o Convert the selected Manchu transliteration to Unicode Manchu.
          o Convert the selected Yi romanisation to Unicode Yi.
          o Convert the selected Yi romanisation to International Phonetic Alphabet (IPA).
          o Convert the selected Unicode Yi text to Yi romanisation.
          o Convert the selected Unicode Yi text to International Phonetic Alphabet (IPA).
          o Convert the selected Vietnamese Unicode text to VIQR transliteration.
          o Convert the selected VIQR transliteration to Vietnamese Unicode.
    * PUA Conversion :
          o Convert precomposed Tibetan (SetA) to standard Unicode Tibetan.
          o Convert standard Unicode Tibetan to precomposed Tibetan (SetA).
          o Convert Hong Kong Supplementary Character Set (HKSCS) PUA characters to CJK Unified Ideograph characters.
    * Reordering :
          o Reverse the order of all selected characters in a line.


Rendering Features

    * Utilises Microsoft's Uniscribe rendering engine to correctly render complex text.
    * Option to render all Unicode characters as individual spacing glyphs (i.e. with no shaping or ligation of complex text, and combining characters not combined).
    * Option to display text in different colours for all the different Unicode-defined scripts.


Input Methods

    * Select any installed Windows Keyboard Layout or IME from a dropdown list on the toolbar.
    * Romanised input methods for the following scripts :
          o Tibetan (using the Tibetan & Himalayan Digital Library [THDL] Extended Wylie Transcription System [EWTS])
          o Manchu
          o Mongolian
          o Uyghur
          o Yi (using the Liangshan Yi Phonetic Alphabet)
    * Unicode Input Mode :
          o Enter Unicode characters in the range U+0001 through U+10FFFF as scalar hexadecimal values (with or without leading zeros), demarcated by pressing the Space or Return key.
          o Select One-off Unicode Input Mode by pressing Ctrl+Q (this allows you to enter a single Unicode character as described above, but on pressing Space, Enter or Escape you are returned to the original keyboard/IME).
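
The Unicode Input Mode described above boils down to reading hexadecimal scalar values separated by spaces. A minimal Python sketch of that parsing step follows; `parse_scalar_values` is a made-up name, and rejecting surrogate code points is my assumption based on the wording "scalar values" (how BabelPad itself treats them is not stated).

```python
def parse_scalar_values(typed: str) -> str:
    """Turn space-separated hexadecimal scalar values into characters."""
    chars = []
    for token in typed.split():
        cp = int(token, 16)  # leading zeros are accepted
        if not 0x1 <= cp <= 0x10FFFF:
            raise ValueError(f"out of range: {token}")
        if 0xD800 <= cp <= 0xDFFF:
            # Assumption: surrogates are refused, since they are not scalar values.
            raise ValueError(f"surrogate code point: {token}")
        chars.append(chr(cp))
    return "".join(chars)
```

So typing `FC 00fc 10400` would give U+00FC twice followed by U+10400.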


Tools and Utilities

    * Font Analysis Utility : lists all Unicode blocks covered by a particular font or lists all fonts that cover a particular Unicode block.
    * Font Information Utility : provides information about the currently selected font.
    * Font Glyph Export Utility : export any or all glyphs from any font to file in BMP, GIF, JPG or PNG format.
    * Font Coverage Utility : List all fonts that cover a particular character or all the characters in a piece of text or all the characters in the BabelMap edit buffer.
    * Advanced Character Search Utility : lists all characters that meet specified criteria.
    * UCD Data Utility : generates UCD-format data for a given range of characters for any version of Unicode.
    * Character History Utility : enumerates the UCD properties for a given character for all versions of Unicode, including mappings to Unicode 1.0.0 and 1.0.1 where appropriate.
    * Han Radical Lookup Utility : lists all Han ideographs with a given radical and number of strokes (covers all 74,616 characters in the CJK, CJK-A, CJK-B, CJK-C and CJK-D blocks).
    * Mandarin Pinyin Lookup Utility : lists all Han ideographs with a given Mandarin pinyin pronunciation.
    * Cantonese Jyutping Lookup Utility : lists all Han ideographs with a given Cantonese jyutping pronunciation.
    * Yi Radical Lookup Utility : lists all Yi syllables with a given radical and number of strokes.
    * Unicode Summary Utility : provides a summary of the script, block and character coverage of the current version of Unicode.
    * Unicode Version History Utility : provides a summary of the repertoire of each version of Unicode from 1.0 onwards.
    * Document Analysis Utility : provides statistical information about the current document, and highlights any invalid characters.
    * Character Frequency Utility : lists all the characters in the document by frequency.
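
To give an idea of what a character frequency listing like the one above contains, here is a small Python sketch. It is a guess at the general shape of such a report, not BabelPad's actual output; `character_frequency` and the `<unnamed>` placeholder are invented for the example.

```python
from collections import Counter
import unicodedata


def character_frequency(text: str):
    """Return (character, U+XXXX, name, count) rows, most frequent first."""
    rows = []
    for ch, count in Counter(text).most_common():
        # Control characters and unassigned code points have no name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, f"U+{ord(ch):04X}", name, count))
    return rows
```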


Download
BabelPad Version 6.0.0.1 (supports Unicode 6.0) [2010-10-27]

For an overview of the new features in BabelPad version 6.0.0.0 and subsequent minor updates, please see the BabelStone Blog.

BabelPad is distributed as a single executable (no installer). Simply download the zipped file, and then unzip the file BabelPad.exe to the desired location on your computer. A help file is available, but is currently out of date. Windows 95, 98 and Me are no longer supported, but if you do need a version of BabelPad that runs under Windows 9X/Me, an unsupported build of BabelPad version 1.9.3 is available here.

    * BabelPad.zip (for Windows 2000, XP, Vista or 7) [3,228 KB]


BabelPad is free and fully functional for personal or commercial use, but you are welcome to make a small donation to help support its continued development if you want ($5 suggested).


BabelPad Limitations

    * Horizontal scroll is fixed width, and so some extremely long lines may be truncated when not in Line Wrap mode.
    * When in Line Wrap mode, it is not possible to scroll into view the trailing part of a line that is so long that it does not completely fit onto the screen.
    * The Unicode Bidirectional algorithm has not yet been implemented, and so complex bidirectional text may not be displayed as expected. However, simple bidirectional text (e.g. a Hebrew phrase embedded in English text) should display correctly.
    * Line breaking behaviour does not conform to the Line Breaking Properties specified by Unicode.


BabelPad Tips

    * When making global changes to a huge (multi-megabyte) document, first disable Undo/Redo (Options : Edit Options from the menu). This will greatly improve the speed of Replace operations.
    * If you want to convert a large file with a high proportion of characters above U+007F to UCN (\uABCD) format, NCR (&#xABCD; or &#1234;) format or HTML entity (&entity;) format, then Save As and select "ASCII plus ..." from the encoding dropdown list (and then reopen the file if necessary). This takes a fraction of the time compared with selecting the entire document and using the appropriate function from the Convert menu.
    * If you want to convert a large file with a high proportion of UCN (\uABCD) or NCR (&#xABCD; or &#1234;) entities to Unicode characters, then check the Convert NCRs and/or the Convert UCNs checkbox when opening the file. This takes a fraction of the time compared with selecting the entire document and using the appropriate function from the Convert menu after the file has been opened.
    * To enter a single Unicode character by hexadecimal codepoint value, press Ctrl-Q and enter the codepoint value followed by Enter or Space.
    * To enter a sequence of Unicode characters by hexadecimal codepoint values, select the Unicode Input Mode (Input : Unicode from the menu or "U+" from the Input toolbar), and type in the codepoint values separated by spaces. Press Ctrl-D to return to the default input mode. Alternatively, type the hexadecimal codepoint value and hit Alt-X.
    * To convert a single Unicode character to its hexadecimal codepoint value, put the cursor immediately after the character to convert, and hit Alt-X.
    * When entering Tibetan, Mongolian, Manchu or Yi text using BabelPad's custom input methods for these scripts or entering Unicode text as scalar values using BabelPad's Unicode input method, you may access the keyboard normally by using the AltGr key (or Ctrl + Alt if your keyboard does not have an AltGr key). For example, when using BabelPad's Tibetan Input Method, pressing the numeral keys will enter Tibetan digits, but holding down the AltGr key at the same time as pressing the numeral keys will allow Arabic digits to be entered instead.
    * If you are working with GB2312-encoded documents, open and save your files as GB18030 (GB18030 is a superset of GB2312 that has a one-to-one mapping to Unicode).
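
The GB2312 versus GB18030 tip above is easy to check outside BabelPad. The Python sketch below uses the standard `gb2312` and `gb18030` codecs to show that text containing characters outside GB2312 survives a GB18030 round trip but not a GB2312 one. The helper name `round_trips` and the sample string are just for illustration.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if `text` survives encoding to `encoding` and back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


# Two common Chinese characters, Tibetan letter KA, and a CJK-B ideograph.
sample = "\u4e2d\u6587 \u0f40 \U00020000"

survives_gb18030 = round_trips(sample, "gb18030")
survives_gb2312 = round_trips(sample, "gb2312")
```

With this sample, only the GB18030 round trip succeeds, because the Tibetan letter and the CJK-B ideograph have no GB2312 encoding.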


Uniscribe Issues

BabelPad uses Microsoft's Unicode Script processor, Uniscribe (filename usp10.dll), to format and render Unicode text. The more recent the version of Uniscribe installed on your computer, the better the support you will have for complex scripts such as Indic and south-east Asian scripts, Tibetan and Mongolian. The version of Uniscribe that BabelPad is using is indicated in the About BabelPad... dialogue box.

Uniscribe should come pre-installed on Windows 2000 and Windows XP, and should also have been installed if you are running Internet Explorer Version 5 or above on other Windows operating systems. However, Uniscribe may not be available on some PCs running Windows 95, 98, ME or NT 4.0 that do not have Internet Explorer 5 or above. If, when you attempt to run BabelPad, a dialog box entitled "Unable to Locate DLL" appears with the message "The dynamic link library USP10.dll could not be found in the specified path", this means that Uniscribe is not installed on your PC.

Uniscribe is constantly being updated to support new scripts and to add new functionality to existing script support, so it is important that you have the latest possible version of Uniscribe installed on your PC. Even if you do not use complex scripts, you will only get advanced features for Latin script such as ligatures with a recent version of Uniscribe (to see this try entering <s ZWJ t> with Code2000). You may run BabelPad with a particular version of Uniscribe by simply placing a copy of the Uniscribe file (usp10.dll) in the same directory as BabelPad.exe.

Some versions of Uniscribe may have bugs that may produce unexpected rendering behaviour, or even cause BabelPad to crash. Those that I know of are outlined below :

    * Versions of Uniscribe greater than Version 1.405.2416.1 may only work correctly when running under the Windows XP operating system. The following unexpected display behaviour may be observed in BabelPad's character map utility when running under Windows 95/98/ME, Windows NT4 or Windows 2000 with a version of Uniscribe that is greater than Version 1.405.2416.1 :
          o Strange characters displayed in the range U+0000 through U+001F of the Basic Latin block
          o Strange characters displayed in the range U+0080 through U+009F of the Latin-1 Supplement block
          o Only digits displayed for Indic scripts (including Tibetan and Mongolian) if the selected font does not have OpenType tables defined for the particular script
    * Version 1.453.3665.0 has a bug that may cause any application that relies on it to crash if an attempt is made to display any character in the Lao script range.
    * Version 1.460.3707.0 has a bug that may cause any application that relies on it to crash if an attempt is made to display a sequence of 16 or more consecutive Tibetan letters without a break (i.e. a space, tsheg or shad).
    * Some early versions of Uniscribe may crash when an attempt is made to display Arabic text with the "Arial Unicode MS" font.
    * Early versions of Uniscribe have a bug that causes it to return to BabelPad the wrong character position and screen point of Unicode characters outside of the Basic Multilingual Plane when in Right-To-Left (RTL) mode. This means that you may be unable to click on or select text composed of characters from Unicode Planes 1-16 when in RTL layout mode.
    * Version 1.626.7600.16385 that ships with Windows 7 causes any characters in the Supplementary Multilingual Plane (Plane 1) that are not defined in Unicode 5.1 to be rendered as two square boxes (this affects Unicode 5.2 additions such as Avestan, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Kaithi, Old South Arabian, Old Turkic, Enclosed Alphanumeric Supplement, Enclosed Ideographic Supplement and Rumi Numeral Symbols). If you encounter this problem on Windows 7, you can render the affected characters correctly in BabelPad by disabling shaping and joining (Ctrl+0 [zero]); however complex scripts will not render correctly in this mode.

http://www.babelstone.co.uk/Software/BabelPad.html