Department of Engineering

IT Services

End-of-line characters in text files

Suppose you open a new file in an editor, type A, press the Enter key, type BC, and save the file as raw ASCII. What's in the file, and how big is it? The answer depends on the type of machine you're using. Usually:

  • On Unix the file will be 4 bytes long, the values being 65 10 66 67. The '10' (LineFeed) denotes end-of-line (EOL). There's no end-of-file character.
  • On Windows the file will have 5 bytes: 65 13 10 66 67. The '13' and '10' represent Carriage Return and LineFeed.
  • On old Macs (classic Mac OS, before OS X), the file will have 4 bytes: 65 13 66 67. A lone Carriage Return marks the end of the line.
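The byte values above can be reproduced with a short Python sketch. It joins the two lines A and BC with each convention's line separator, the way an editor would when saving; the dictionary `EOL` and the function name `encode_lines` are illustrative choices, not part of any standard tool.

```python
# Line separators for the three conventions described above.
EOL = {
    "unix": b"\n",       # LF, byte 10
    "windows": b"\r\n",  # CR LF, bytes 13 10
    "old_mac": b"\r",    # CR, byte 13
}

def encode_lines(lines, convention):
    """Join ASCII lines with the chosen separator (hypothetical helper).

    As in the example in the text, nothing is added after the last line.
    """
    sep = EOL[convention]
    return sep.join(line.encode("ascii") for line in lines)

for name in EOL:
    data = encode_lines(["A", "BC"], name)
    print(f"{name:8} {len(data)} bytes: {list(data)}")

# Output:
# unix     4 bytes: [65, 10, 66, 67]
# windows  5 bytes: [65, 13, 10, 66, 67]
# old_mac  4 bytes: [65, 13, 66, 67]
```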

These differences don't cause trouble until you copy files between different types of machines. Even then you might well not have trouble, because many programs that use text files are aware of the differences. If you're programming at a low level, however, you may need to take care.
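At a low level, one way to take care is to read the file in binary mode and inspect the raw bytes. Python's text mode, with its default `newline=None`, translates all three conventions to '\n' when reading, which hides the difference; opening with "rb" does not. The sketch below guesses a file's convention by counting CR LF pairs, lone LFs and lone CRs. The function name `detect_eol`, its labels, and the file name `example.txt` are hypothetical.

```python
def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention of raw file contents."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf  # each CR LF contains exactly one LF
    lone_cr = data.count(b"\r") - crlf  # each CR LF contains exactly one CR
    counts = {"windows": crlf, "unix": lone_lf, "old_mac": lone_cr}
    present = [name for name, n in counts.items() if n > 0]
    if not present:
        return "none"   # a single line with no terminator
    if len(present) > 1:
        return "mixed"  # more than one convention in the same file
    return present[0]

print(detect_eol(b"A\r\nBC"))  # windows

# On a real file, open it in binary mode so nothing is translated:
# with open("example.txt", "rb") as f:
#     print(detect_eol(f.read()))
```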

Note also that some programs expect the final line of a file to end with an end-of-line character. C++ include files, for example, should end with one.
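Checking for, and adding, a missing final end-of-line is straightforward when working on raw bytes. The function `add_final_eol` below is a hypothetical helper; the caller passes the separator that matches the file's existing convention, so that the fix doesn't introduce mixed line endings.

```python
def add_final_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Append eol if non-empty contents have an unterminated last line.

    A trailing LF (Unix, and the end of a Windows CR LF) or CR (old Mac)
    counts as terminated. An empty file has no last line, so it is unchanged.
    """
    if data and not data.endswith((b"\n", b"\r")):
        return data + eol
    return data

print(add_final_eol(b"A\nBC"))             # b'A\nBC\n'
print(add_final_eol(b"A\r\nBC", b"\r\n"))  # b'A\r\nBC\r\n'
```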

Converting

  • On Linux, dos2unix converts plain text files in DOS format to Unix format, and unix2dos does the inverse. See their manual pages for examples.
  • Emacs tries to detect the "coding system" of the files it loads, and lets you choose the "coding system" of the files it saves. Options include mac, dos and unix, but also many other variations such as iso-latin-9-with-esc-dos, iso-2022-7bit-unix, etc., which take account of the character set as well as the end-of-line convention. Emacs will usually display the file in a readable way and, by default, saves it in its original format.
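The core of a DOS-to-Unix conversion, and its inverse, can be sketched in a few lines of Python. This is only a rough approximation of what dos2unix and unix2dos do: it ignores their extra options and safety checks (for example, their handling of binary files), and the names `dos_to_unix`, `unix_to_dos` and `convert_file` are hypothetical.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Convert CR LF line endings to LF; all other bytes are unchanged."""
    return data.replace(b"\r\n", b"\n")

def unix_to_dos(data: bytes) -> bytes:
    """Convert LF line endings to CR LF."""
    # Normalise first, so that a file already in DOS format doesn't end up
    # with CR CR LF sequences.
    return dos_to_unix(data).replace(b"\n", b"\r\n")

def convert_file(path: str, to_dos: bool = False) -> None:
    """Rewrite a file in place, using binary mode so Python translates nothing."""
    with open(path, "rb") as f:
        data = f.read()
    data = unix_to_dos(data) if to_dos else dos_to_unix(data)
    with open(path, "wb") as f:
        f.write(data)

print(dos_to_unix(b"A\r\nBC"))  # b'A\nBC'
print(unix_to_dos(b"A\nBC"))    # b'A\r\nBC'
```

Reading and writing in binary mode matters here: in text mode Python would translate line endings itself and the conversion would not behave as intended.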

See also

Wikipedia's Newline page offers more detail and history.