Ellié Computing Merge allows you to compare ASCII, MBCS
(multi-byte character set), and Unicode text files.
The following settings apply both to text files compared
individually and to text files within compared folders. If you
specify the settings for folders, all the files they contain
automatically use these settings rather than the default
options set in the application.
Ellié Computing Merge uses the Unicode (UTF-8) format
for its in-memory representation of the files that it opens
because it is practical and generally uses less memory than
UTF-16 or UTF-32. Codepages are used at load- and
save-time to translate files from their disk-based single- or
multi-byte character set representations into Unicode and vice
versa.
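The load/save translation described above can be sketched in Python. This is a minimal illustration under assumptions, not Merge's code: the names `load_to_memory` and `save_from_memory` are hypothetical, and Python's built-in codecs stand in for Merge's codepage tables. The sketch also prints the byte size of a short ASCII line in UTF-8, UTF-16 and UTF-32, which shows why UTF-8 is usually the most compact of the three for such text.

```python
def load_to_memory(raw: bytes, codepage: str) -> bytes:
    """Hypothetical helper: translate bytes read from disk (in `codepage`) to UTF-8."""
    return raw.decode(codepage).encode("utf-8")


def save_from_memory(utf8: bytes, codepage: str) -> bytes:
    """Hypothetical helper: translate the in-memory UTF-8 buffer back to `codepage`."""
    return utf8.decode("utf-8").encode(codepage)


if __name__ == "__main__":
    on_disk = "café".encode("cp1252")           # single-byte file: b'caf\xe9'
    in_memory = load_to_memory(on_disk, "cp1252")
    print(in_memory)                            # b'caf\xc3\xa9'
    assert save_from_memory(in_memory, "cp1252") == on_disk

    # Size of a short ASCII line in each Unicode form (no signature).
    line = "Hello, world"
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(enc, len(line.encode(enc)))       # 12, 24, 48 bytes
```

For text made mostly of CJK characters the balance changes (three bytes per character in UTF-8 versus two in UTF-16), which is why the saving is only "generally" true.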
Once a file is opened, you can use the Set text
output options... command from the Browse
button or Side menu to change the
codepage used to save the file.
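Choosing a different output codepage amounts to re-encoding the in-memory text with that codepage. The sketch below assumes that characters the target codepage cannot represent should be reported and replaced with `?` (Python's `replace` error handler); this policy is an assumption, not documented Merge behavior, and `reencode_for_output` is a hypothetical name.

```python
from typing import List, Tuple


def reencode_for_output(text: str, codepage: str) -> Tuple[bytes, List[str]]:
    """Encode in-memory text with the chosen output codepage.

    Returns the encoded bytes plus the characters the codepage cannot represent.
    Those characters are replaced by '?' (assumed policy: Python's "replace" handler).
    """
    missing = []
    for ch in sorted(set(text)):
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return text.encode(codepage, errors="replace"), missing


if __name__ == "__main__":
    text = "Price: 10 €"
    print(reencode_for_output(text, "cp1252"))   # (b'Price: 10 \x80', [])
    print(reencode_for_output(text, "latin-1"))  # (b'Price: 10 ?', ['€'])
```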
ASCII and MBCS
About one hundred different encodings are supported.
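Picking the right codepage matters because, in single-byte codepages, the same byte stands for different characters, while MBCS codepages can use several bytes per character. A small Python illustration follows; it uses Python's codec names (`cp1252`, `cp1251`, `cp437`, `shift_jis`), which may differ from the names Merge lists, and `decode_in_each` is a hypothetical helper.

```python
from typing import Dict, List


def decode_in_each(raw: bytes, codepages: List[str]) -> Dict[str, str]:
    """Interpret the same bytes with several codepages (strict decoding)."""
    return {cp: raw.decode(cp) for cp in codepages}


if __name__ == "__main__":
    # One byte, three single-byte codepages, three different characters.
    print(decode_in_each(b"\xe9", ["cp1252", "cp1251", "cp437"]))
    # {'cp1252': 'é', 'cp1251': 'й', 'cp437': 'Θ'}

    # Shift JIS (an MBCS codepage) uses two bytes for each of these characters.
    print(len("日本".encode("shift_jis")))  # 4
```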
Unicode
Unicode files typically start with a two-byte signature
(0xFF 0xFE or 0xFE 0xFF) or a three-byte signature
(0xEF 0xBB 0xBF), called the BOM (Byte Order Mark), which
applications can use to determine that a file uses Unicode.
Merge also detects the signatures for UTF-32 and UTF-7. Text
editors usually do not display the signature. Merge can write
it when it saves a Unicode text file.
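Signature detection and writing can be sketched as a prefix check against the known byte sequences. This is an illustration under assumptions, not Merge's implementation: `detect_bom` and `encode_with_bom` are hypothetical names, Python's `codecs` BOM constants supply the byte values, and the UTF-7 check uses the common form `+/v` followed by `8`, `9`, `+` or `/`. The UTF-32 little-endian signature must be tested before the UTF-16 one, because `FF FE 00 00` starts with `FF FE`.

```python
import codecs
from typing import Optional

# Known signatures. UTF-32-LE (FF FE 00 00) must be checked before UTF-16-LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading signature, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # UTF-7 signature: "+/v" followed by '8', '9', '+' or '/'.
    if data[:3] == b"+/v" and data[3:4] in (b"8", b"9", b"+", b"/"):
        return "utf-7"
    return None


def encode_with_bom(text: str, encoding: str) -> bytes:
    """Encode `text` and prepend the matching signature (UTF-8/16/32 only)."""
    bom = {name: sig for sig, name in _BOMS}[encoding]
    return bom + text.encode(encoding)


if __name__ == "__main__":
    saved = encode_with_bom("A", "utf-16-be")
    print(saved)              # b'\xfe\xff\x00A'
    print(detect_bom(saved))  # utf-16-be
```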
By default, the encoding used to load files is taken from the
Application options box, in the Edition panel. You can use the
Reopen with encoding... menu item from the Browse button menu
to specify how to interpret input text. Merge can detect almost
all UTF variants, with or without their signatures (the
exception is UTF-7 without a signature). If detection fails,
you can still use the box to specify the correct encoding:
check Ignore signatures to prevent detection, select the
correct encoding in the combo box, or both.
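One way these options could interact is sketched below: an encoding chosen in the combo box wins, then a signature (unless Ignore signatures is checked), then a guess from byte patterns, and finally the default from the options. This precedence and the heuristic are assumptions, not Merge's detection algorithm: `choose_encoding`, `sniff_utf` and their parameters are hypothetical, the byte-pattern guess only recognizes UTF-16/UTF-32 text in the U+0000 to U+00FF range and valid non-ASCII UTF-8, and, as in the text above, UTF-7 without a signature is not attempted (the UTF-7 signature check is also omitted for brevity).

```python
import codecs
from typing import Optional

# Signature table (UTF-7 omitted for brevity). UTF-32-LE must precede UTF-16-LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def _all_zero(chunk: bytes) -> bool:
    """True if `chunk` is non-empty and contains only 0x00 bytes."""
    return len(chunk) > 0 and chunk.count(0) == len(chunk)


def sniff_utf(data: bytes) -> Optional[str]:
    """Hypothetical heuristic: guess a UTF variant when there is no signature."""
    n = len(data)
    # UTF-32: for BMP characters, two of every four bytes are zero.
    if n >= 4 and n % 4 == 0:
        if _all_zero(data[2::4]) and _all_zero(data[3::4]):
            return "utf-32-le"
        if _all_zero(data[0::4]) and _all_zero(data[1::4]):
            return "utf-32-be"
    # UTF-16: for characters U+0000 to U+00FF, one byte of each pair is zero.
    if n >= 2 and n % 2 == 0:
        if _all_zero(data[1::2]):
            return "utf-16-le"
        if _all_zero(data[0::2]):
            return "utf-16-be"
    # UTF-8: must decode strictly and contain at least one non-ASCII byte.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8" if any(b >= 0x80 for b in data) else None


def choose_encoding(data: bytes, default: str,
                    ignore_signatures: bool = False,
                    forced: Optional[str] = None) -> str:
    """Pick an encoding: forced choice > signature > heuristic > default."""
    if forced:                      # stands in for the combo box selection
        return forced
    if not ignore_signatures:       # stands in for the Ignore signatures check box
        for bom, name in _BOMS:
            if data.startswith(bom):
                return name
    return sniff_utf(data) or default


if __name__ == "__main__":
    tricky = "ï»¿café".encode("cp1252")   # cp1252 text that starts with EF BB BF
    print(choose_encoding(tricky, "cp1252"))                          # utf-8
    print(choose_encoding(tricky, "cp1252", ignore_signatures=True))  # cp1252
```

The usage lines show why the check box is useful: a cp1252 file that happens to begin with the characters `ï»¿` carries bytes identical to a UTF-8 signature, and ignoring signatures lets the default encoding apply instead.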