
Text file, displayed by the command cat in an xterm window
In computing, plain text is the contents of an ordinary sequential file that can be read as textual material without much processing. It is usually contrasted with formatted text.
The encoding has traditionally been ASCII, one of its many derivatives such as ISO/IEC 646, or sometimes EBCDIC.
Unicode is gradually replacing the older ASCII derivatives, which are limited to 7- or 8-bit codes. It serves much the same purposes but covers almost every human language, along with punctuation and symbols such as mathematical relations (≠ ≤ ≥ ≈) and multiplication signs (× •), which are not included in the more restricted ASCII set.
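The difference between the restricted ASCII set and Unicode can be illustrated with a short Python sketch. The helper name fits_in_ascii is hypothetical and chosen only for this example; UTF-8 is assumed as the Unicode encoding, though others exist.

```python
# Symbols from the paragraph above that lie outside 7-bit ASCII.
SYMBOLS = "≠≤≥≈×•"


def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


print(fits_in_ascii("plain text"))   # only ASCII letters and a space
print(fits_in_ascii(SYMBOLS))        # none of these are ASCII
print("≠".encode("utf-8"))           # UTF-8 needs three bytes for U+2260

try:
    "≠".encode("ascii")
except UnicodeEncodeError as err:
    # ASCII has no code for this character, so encoding fails.
    print("cannot encode in ASCII:", err.reason)
```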
Usage
The main purpose of plain text today is to serve as a "lowest common denominator": it is independent of programs that require their own special encoding or formatting, at the cost of the features those formats provide. Plain text files can be opened, read, and edited with most text editors. Examples include Notepad (Windows), edit (DOS), ed, vi, vim, and Gedit (Unix, Linux), SimpleText (Mac OS), and TextEdit (Mac OS X). Other computer programs are also capable of reading and importing plain text.
Plain text can also be handled by simple tools, such as the commands type (DOS and Windows) and cat (Unix), which print the contents of a file.
Plain text files are almost universal in programming; a source code file containing instructions in a programming language is almost always a plain text file. Plain text is also commonly used for configuration files, which are read for saved settings at the startup of a program.
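As a rough illustration of how a program might read saved settings from a plain text configuration file, the following Python sketch parses lines of the form key = value. The format, the "#" comment convention, and the function name parse_settings are assumptions made for this example; real programs use many different configuration formats.

```python
def parse_settings(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents, as they might be read at program startup.
example = "# saved settings\nname = demo\nwidth=80\n"
print(parse_settings(example))
```

Because the file is plain text, the same settings can be inspected or changed with any text editor.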
Plain text is the original and still popular method of conveying e-mail. HTML-formatted e-mail messages often include an automatically generated plain text copy as well, for compatibility reasons.
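A message that carries both an HTML body and a plain text copy can be sketched with Python's standard email package. The addresses, subject, and the function name build_message are placeholders invented for this example; mail programs typically generate the plain text copy automatically rather than taking it as an argument.

```python
from email.message import EmailMessage


def build_message(plain: str, html: str) -> EmailMessage:
    """Build a message with a plain text body and an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "Example"
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "recipient@example.org"     # placeholder address
    msg.set_content(plain)                     # the text/plain part
    msg.add_alternative(html, subtype="html")  # the text/html part
    return msg


msg = build_message("Hello, world.", "<p>Hello, <b>world</b>.</p>")
print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
```

A mail reader that cannot display HTML can fall back to the plain text part of such a message.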
Encoding
Character encodings
Text was once commonly encoded in ASCII, which defines 7-bit codes, allowing 128 values. Each character was usually stored or transmitted in 8 bits, with the eighth bit often used as a parity bit to detect errors when transferring a file. This allowed only the ordinary Latin alphabet, transfer control codes, parentheses, and punctuation, which was a particular problem for users of languages with accented letters, such as Portuguese and Swedish. When data transfer became more reliable, the remaining 128 values were put to use, but differently from system to system, and in a way that made multilingual texts impossible to encode. Eventually Unicode was defined. It currently provides 1,114,112 code points, intended to cover modern writing systems and many extinct ones. For example, Unicode encodes Chinese, Hebrew, and Cyrillic as well as Latin. Some of these scripts may be quite complicated to process correctly, but the text still contains no structural data, such as markers for the start and end of bold text, and is therefore plain text.
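The use of the eighth bit as a check during transfer can be sketched as follows. A parity bit is one common form of such a check; even parity is assumed here, and the function names are hypothetical. The last line derives the number of Unicode code values from the interpreter's own limit.

```python
import sys


def add_even_parity(code: int) -> int:
    """Set bit 7 so that the resulting byte has an even number of 1 bits."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def has_even_parity(byte: int) -> bool:
    """Return True if the byte contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))     # 'C' has three 1 bits, so bit 7 is set
print(hex(sent), has_even_parity(sent))

corrupted = sent ^ 0b00000001        # flip one bit, as line noise might
print(has_even_parity(corrupted))    # the single-bit error is detected

# Code points run from 0 to sys.maxunicode (0x10FFFF).
print(sys.maxunicode + 1)
```

A single parity bit detects any odd number of flipped bits but cannot correct them, and it misses errors that flip an even number of bits.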
Control codes
The ASCII codes before SPACE (= 32 = 20H) are not intended as displayable characters but as control characters, and their interpretation varies. For example, the code NUL (= 0, sometimes denoted Ctrl-@) is used as the string terminator in the programming language C and its successors. The most troublesome of these are the codes LF (= LINE FEED = 10 = 0AH) and CR (= CARRIAGE RETURN = 13 = 0DH). Windows and OS/2 use the sequence CR,LF to represent a newline, while Unix and its relatives use just LF, and Classic Mac OS (but not Mac OS X) uses just CR. This was once a slight problem when transferring files between Windows and Unix systems, but today most computer programs handle the difference seamlessly.
See also
- Plaintext, most commonly used in a cryptographic context
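As an illustration of the newline conventions described under Control codes, the following Python sketch converts between the CR,LF, LF, and CR forms. The function names are hypothetical; this is one simple way to normalize line endings, not a description of how any particular program does it.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR,LF and bare CR line endings to a single LF."""
    # Replace CR,LF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert any of the three conventions to CR,LF."""
    return normalize_newlines(data).replace(b"\n", b"\r\n")


mixed = b"windows\r\nclassic mac\runix\n"
print(normalize_newlines(mixed))
print(to_crlf(b"one\ntwo\n"))

# Python's str.splitlines() accepts all three conventions directly.
print("a\r\nb\rc\n".splitlines())
```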