Department of Engineering
End-of-line characters in text files
Suppose in an editor you open a new file, type A, press the Enter
key, then type BC and save as raw ASCII. What's in the file and how
big will the file be? The answer depends on the type of machine you're
using. Typically:
- In Unix the file will be 4 bytes long, the values being: 65 10 66 67. The '10' denotes end-of-line (EOL). There's no end-of-file character.
- In Windows the file will have 5 bytes: 65 13 10 66 67. The '13' and '10' represent Carriage Return and Line Feed.
- On old Macs (classic Mac OS, before Mac OS X), the file will have 4 bytes: 65 13 66 67. Here a lone Carriage Return ends the line.
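The byte counts above can be reproduced with a short Python sketch. The names `CONVENTIONS` and `file_bytes` are made up for this illustration; the byte values are the ones listed above.

```python
# Byte values for: A, Enter, BC under each end-of-line convention.
# CONVENTIONS and file_bytes are illustrative names, not a standard API.
CONVENTIONS = {
    "unix": b"\n",        # Line Feed (10)
    "windows": b"\r\n",   # Carriage Return + Line Feed (13 10)
    "old mac": b"\r",     # Carriage Return (13)
}


def file_bytes(eol):
    """Return the list of byte values the saved file would contain."""
    return list(b"A" + eol + b"BC")


for name, eol in CONVENTIONS.items():
    values = file_bytes(eol)
    print(name, len(values), "bytes:", values)
```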
These differences don't cause trouble until you copy files between different
types of machines. Even then you may well have no trouble, because
many programs that read text files are aware of the differences. If you're
programming at a low level, however, you may need to take care.
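As one example of taking care at a low level, the sketch below counts which end-of-line conventions appear in a block of raw bytes. The function name `count_line_endings` is hypothetical. When applying it to a file, open the file in binary mode (`open(path, "rb")`), because Python's default text mode translates line endings while reading.

```python
def count_line_endings(data):
    """Count CR LF pairs, lone CRs and lone LFs in a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CR LF pair contains one CR and one LF, so subtract the pairs
    # to get the characters that stand alone.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "cr": cr, "lf": lf}


print(count_line_endings(b"A\r\nBC"))   # a Windows-style file
print(count_line_endings(b"A\nBC"))     # a Unix-style file
```

A file with more than one non-zero count has mixed line endings, which often happens after editing the same file on different machines.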
Note also that some programs expect the final line of a file to
end with an EOL character. C++ include files, for example, should end with one.
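A quick way to check this is to look at the last byte of the file. The sketch below is only an illustration; the helper name `ends_with_eol` is made up, and it treats an empty file as not ending with an end-of-line character.

```python
def ends_with_eol(data):
    """Return True if the bytes end with LF or CR (CR LF ends with LF)."""
    return data.endswith((b"\n", b"\r"))


print(ends_with_eol(b"A\nBC"))     # final line has no end-of-line character
print(ends_with_eol(b"A\nBC\n"))   # final line is terminated
```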
Converting
- On Linux, dos2unix converts plain text files
in DOS format to Unix format, and unix2dos
does the inverse. See the manual pages for examples.
- Emacs tries to detect the "coding system" of the files it loads,
and lets you choose the "coding system" of saved files. Options include
mac, dos and unix, but also many other variations such as iso-latin-9-with-esc-dos, iso-2022-7bit-unix, etc., which take account of the charset as well as the end-of-line convention. Emacs will usually display the file in a readable way and, by default, saves it in its original format.
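If neither tool is to hand, the same kind of conversion can be sketched in a few lines of Python working on raw bytes. This is not how dos2unix or unix2dos are implemented; `to_unix` and `to_dos` are hypothetical helpers, and this version also converts old-Mac carriage returns, which is a choice made for this sketch.

```python
def to_unix(data):
    """Convert CR LF and lone CR line endings to LF."""
    # Replace CR LF first so that a pair doesn't turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(data):
    """Convert any of the three line endings to CR LF."""
    # Normalise to LF first so existing CR LF pairs aren't doubled.
    return to_unix(data).replace(b"\n", b"\r\n")


print(list(to_unix(b"A\r\nBC")))
print(list(to_dos(b"A\nBC")))
```

To convert a real file, read it with `open(path, "rb")` and write the result with `open(path, "wb")` so that Python doesn't translate the line endings itself.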
See also
Wikipedia's Newline page
offers more detail and history.