In computer science, data is anything in a form suitable for use with a computer. Data is often distinguished from programs. A program is a set of instructions that detail a task for the computer to perform. In this sense, data is everything that is not program code.
In an alternate usage, binary files (which are not human-readable) are sometimes called "data" as distinguished from human-readable "text". The total amount of digital data in 2007 was estimated to be 281 billion gigabytes.
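There is no single rule for telling "text" from binary "data". The sketch below uses a common rule of thumb, chosen here as an assumption: bytes count as text if they contain no NUL byte, decode cleanly as UTF-8, and contain no control characters other than tabs and line breaks. The function name `looks_like_text` is hypothetical.

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristically decide whether a byte string is human-readable text.

    Assumption: a NUL byte, a failed decode, or a non-whitespace control
    character means "binary". This is a rough heuristic, not a standard.
    """
    if b"\x00" in data:
        return False
    try:
        decoded = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    # Allow printable characters plus common whitespace controls.
    return all(ch.isprintable() or ch in "\t\n\r" for ch in decoded)


print(looks_like_text(b"Hello, world!\n"))         # True
print(looks_like_text(b"\x7fELF\x02\x01\x01\x00"))  # False (start of an ELF header)
```

Real tools use more elaborate checks, so the same file may be classified differently by different programs.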
Data vs programs
Fundamentally, computers follow the instructions they are given. A set of instructions to perform a given task (or tasks) is called a "program". In the typical case, the program, as executed by the computer, consists of binary machine code. The elements of storage that the program manipulates, but that the CPU does not actually execute, contain data.
Typically, programs and data are stored in different files. Executable files contain programs; all other files are data files. However, executable files may also contain data that is "built in" to the program. In particular, some executable files have a data segment, which nominally contains constants and initial values for variables (both of which are data).
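Python offers a loose analogy to data built into a program. This is an illustration using Python bytecode, not a native executable's data segment. The built-in `compile()` returns a code object whose `co_code` attribute holds the bytecode instructions and whose `co_consts` attribute holds literal values stored alongside them. The helper name `split_code_and_constants` and the sample source are hypothetical.

```python
def split_code_and_constants(source: str):
    """Compile Python source and return (bytecode, constants).

    co_code holds the instructions the interpreter executes;
    co_consts holds literal values the program carries with it.
    """
    code = compile(source, "<example>", "exec")
    return code.co_code, code.co_consts


bytecode, constants = split_code_and_constants('greeting = "hello"\nratio = 0.75\n')
print(len(bytecode), "bytes of bytecode")
print(constants)  # includes 'hello' and 0.75; exact contents vary by Python version
```

The exact bytecode and the ordering of constants differ between Python releases, which is why the comment does not show a fixed tuple.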
For example, a user might first instruct the operating system to load a word processor program from one file, and then edit a document stored in another file. In this example, the document would be considered data. If the word processor also features a spell checker, then the dictionary (word list) for the spell checker would also be considered data. The algorithms the spell checker uses to suggest corrections, by contrast, would be considered code.
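The split between the spell checker's word list (data) and its suggestion algorithm (code) can be sketched as follows. The suggestion method here, Levenshtein edit distance, is one common technique chosen as an assumption; the source does not say which algorithm a given spell checker uses. `DICTIONARY`, `edit_distance`, and `suggest` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set[str], max_distance: int = 1) -> list[str]:
    """Code: return dictionary words within max_distance edits, sorted."""
    if word in dictionary:
        return [word]
    return sorted(w for w in dictionary if edit_distance(word, w) <= max_distance)


# Data: the word list. Replacing it changes the suggestions without touching the code.
DICTIONARY = {"data", "date", "program", "code", "computer"}

print(suggest("dota", DICTIONARY))  # ['data']
print(suggest("dat", DICTIONARY))   # ['data', 'date']
```

Shipping a different language's word list would change the program's behavior while the code stays the same, which is what makes the list data.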
The line between program and data can become blurry. An interpreter, for example, is a program. The input data to an interpreter is itself a program, just not one expressed in native machine language. In many cases, the interpreted program will be a human-readable text file, which is manipulated with a text editor, a program more commonly associated with plain text data.
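A minimal sketch of this blurring: the interpreter below is a program, and its input is an ordinary string that is simultaneously data (to Python) and a program (in a tiny stack language). The language and its tokens (integers, `+`, `-`, `*`, `dup`) are invented here for illustration.

```python
def run(program_text: str) -> list[int]:
    """Interpret a tiny, hypothetical stack language given as plain text.

    Integers are pushed onto the stack; "+", "-", and "*" pop two values
    and push the result; "dup" duplicates the top value.
    """
    stack: list[int] = []
    for token in program_text.split():
        if token in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif token == "dup":
            stack.append(stack[-1])
        else:
            stack.append(int(token))  # raises ValueError on unknown tokens
    return stack


# To the interpreter this string is just data, yet it is also a program.
source = "2 3 + dup *"
print(run(source))  # [25]
```

The string could just as well be read from a text file and edited in a text editor, which is exactly the situation described above.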