
A peer-to-peer system of nodes without central infrastructure.

Centralized server-based service model.
A peer-to-peer, commonly abbreviated to P2P, distributed network architecture is composed of participants that make a portion of their resources (such as processing power, disk storage or network bandwidth) directly available to other network participants, without the need for central coordination instances (such as servers or stable hosts). Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model where only servers supply, and clients consume.
Peer-to-peer was popularized by file sharing systems like Napster. Peer-to-peer file sharing networks have inspired new structures and philosophies in other areas of human interaction. In such social contexts, peer-to-peer as a meme refers to the egalitarian social networking that is currently emerging throughout society, enabled by Internet technologies in general.
Architecture of P2P systems
Peer-to-peer networks are typically formed dynamically by ad-hoc additions of nodes. In an 'ad-hoc' network, the removal of nodes has no significant impact on the network. The distributed architecture of an application in a peer-to-peer system provides enhanced scalability and service robustness.
Peer-to-peer systems often implement an Application Layer overlay network on top of the native or physical network topology. Such overlays are used for indexing and peer discovery. Content is typically exchanged directly over the underlying Internet Protocol (IP) network. Anonymous peer-to-peer systems are an exception, and implement extra routing layers to obscure the identity of the source or destination of queries.
In structured peer-to-peer networks, connections in the overlay are fixed. They typically use distributed hash table-based (DHT) indexing, such as in the Chord system (MIT).
Unstructured peer-to-peer networks do not provide any algorithm for organization or optimization of network connections. In particular, three models of unstructured architecture are defined. In pure peer-to-peer systems the entire network consists solely of equipotent peers. There is only one routing layer, as there are no preferred nodes with any special infrastructure function. Hybrid peer-to-peer systems allow such infrastructure nodes to exist, often called supernodes. In centralized peer-to-peer systems, a central server is used for indexing functions and to bootstrap the entire system. Although this has similarities with a structured architecture, the connections between peers are not determined by any algorithm. The first prominent and popular peer-to-peer file sharing system, Napster, was an example of the centralized model. Gnutella and Freenet, on the other hand, are examples of the decentralized (pure) model. Kazaa is an example of the hybrid model.
P2P networks are typically used for connecting nodes via largely ad hoc connections. Sharing content files (see file sharing) containing audio, video, data or anything in digital format is very common, and real-time data, such as telephony traffic, is also passed using P2P technology.
A pure P2P network does not have the notion of clients or servers but only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network. This model of network arrangement differs from the client-server model, where communication is usually to and from a central server. A typical example of a file transfer that is not P2P is an FTP server, where the client and server programs are quite distinct: the clients initiate the downloads and uploads, and the servers react to and satisfy these requests.
The P2P overlay network consists of all the participating peers as network nodes. There is a link between any two nodes that know each other: if a participating peer knows the location of another peer in the P2P network, then there is a directed edge from the former node to the latter in the overlay network. Based on how the nodes in the overlay network are linked to each other, P2P networks can be classified as unstructured or structured.
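The overlay described above can be sketched as a directed graph. The following is a minimal illustration only; the class name `Overlay`, its methods and the peer names are hypothetical and do not come from any particular P2P system.

```python
from collections import defaultdict


class Overlay:
    """Directed graph: an edge u -> v means peer u knows the location of peer v."""

    def __init__(self):
        self.knows = defaultdict(set)

    def learn(self, peer: str, other: str) -> None:
        """Record that `peer` has learned the location of `other`."""
        self.knows[peer].add(other)
        self.knows[other]          # make sure `other` exists as a node too

    def known_by(self, peer: str) -> set:
        """Peers that hold an edge pointing at `peer`."""
        return {p for p, targets in self.knows.items() if peer in targets}


overlay = Overlay()
overlay.learn("alice", "bob")      # hypothetical peer names
overlay.learn("carol", "bob")
overlay.learn("bob", "alice")
```

Because the edges are directed, "carol" knowing "bob" does not imply that "bob" knows "carol".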
Structured peer-to-peer systems
Structured P2P networks employ a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has the desired file, even if the file is extremely rare. Such a guarantee necessitates a more structured pattern of overlay links. By far the most common type of structured P2P network is the distributed hash table (DHT), in which a variant of consistent hashing is used to assign ownership of each file to a particular peer, in a way analogous to a traditional hash table's assignment of each key to a particular array slot.
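As a rough sketch of how consistent hashing can assign file ownership, the example below places peers and file names on a ring and gives each file to the first peer at or after its position. The ring size (2**32), the use of SHA-1, and all peer and file names are assumptions made for illustration; real DHTs differ in these details.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a peer name or file name to a point on a 2**32 ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Each file is owned by the first peer at or after its ring position."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, filename: str) -> str:
        i = bisect.bisect_left(self._positions, ring_position(filename))
        return self._ring[i % len(self._ring)][1]   # wrap past the last peer


peers = ["peer-a", "peer-b", "peer-c"]   # hypothetical peer names
files = [f"file-{i}.dat" for i in range(1000)]

before = HashRing(peers)
after = HashRing(peers + ["peer-d"])     # one peer joins

# Files whose owner changed because of the join.
moved = [f for f in files if before.owner(f) != after.owner(f)]
```

Only files whose position falls between the new peer and its predecessor on the ring change owner, so every file in `moved` ends up on `peer-d` and all other assignments are untouched.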
Distributed hash tables
Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
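The lookup service and the limited disruption on membership change can be illustrated with a toy, single-process DHT. This sketch assumes the XOR distance metric used by Kademlia-style DHTs to decide which node is responsible for a key; the class `ToyDHT`, the 32-bit identifiers and the node names are hypothetical. A real DHT would find the responsible node by routing through a few hops rather than by scanning every node as done here.

```python
import hashlib


def ident(text: str) -> int:
    """Hypothetical 32-bit identifier derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


class ToyDHT:
    def __init__(self, node_names):
        # One local key/value store per participating node.
        self.stores = {name: {} for name in node_names}

    def responsible(self, key: str) -> str:
        """The node whose identifier is closest to the key under XOR distance."""
        k = ident(key)
        return min(self.stores, key=lambda name: ident(name) ^ k)

    def put(self, key: str, value) -> None:
        self.stores[self.responsible(key)][key] = value

    def get(self, key: str):
        return self.stores[self.responsible(key)].get(key)

    def leave(self, name: str) -> int:
        """Remove a node and re-home only the pairs it was holding."""
        orphaned = self.stores.pop(name)
        for key, value in orphaned.items():
            self.put(key, value)
        return len(orphaned)


dht = ToyDHT(["n1", "n2", "n3", "n4"])   # hypothetical node names
for i in range(200):
    dht.put(f"key-{i}", i)
```

When a node leaves, only the pairs it stored are moved; pairs held by the remaining nodes stay where they were, since removing one node cannot change which of the others is closest to a key.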
DHTs form an infrastructure that can be used to build peer-to-peer networks. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Kad network, the Storm botnet, YaCy, and the Coral Content Distribution Network.
Some prominent research projects include the Chord project, the PAST storage utility, the P-Grid, a self-organized and emerging overlay network, and the CoopNet content distribution system (see below for external links related to these projects).
DHT-based networks have been widely utilized for accomplishing efficient resource discovery for grid computing systems, as this aids in resource management and the scheduling of applications. Resource discovery involves searching for the appropriate resource types that match the user’s application requirements. Recent advances in the domain of decentralized resource discovery have been based on extending the existing DHTs with the capability of multi-dimensional data organization and query routing. The majority of the efforts have looked at embedding spatial database indices, such as Space Filling Curves (SFCs, including Hilbert curves and Z-curves), the k-d tree, the MX-CIF Quad tree and the R*-tree, for managing, routing, and indexing complex Grid resource query objects over DHT networks. Spatial indices are well suited for handling the complexity of Grid resource queries. Although some spatial indices can have issues with regard to routing load balance in the case of a skewed data set, all the spatial indices are more scalable in terms of the number of hops traversed and messages generated while searching and routing Grid resource queries.
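To make the idea of a space-filling curve concrete, the sketch below computes a Z-curve (Morton) code, which interleaves the bits of two attribute values into a single one-dimensional number that could then serve as a DHT key. The two attributes (CPU cores and memory in GB) and the function names are hypothetical; how such keys are actually routed differs between the systems mentioned above.

```python
def z_encode(x: int, y: int, bits: int = 8) -> int:
    """Z-order code: bit i of x goes to position 2*i, bit i of y to 2*i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_decode(code: int, bits: int = 8) -> tuple:
    """Inverse of z_encode: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Hypothetical resource description: 3 CPU cores, 5 GB of memory.
key = z_encode(3, 5, bits=4)   # 0b100111 == 39
```

Points that are close in both attributes tend to share high-order bits of the code, which is what lets a one-dimensional key space approximate a multi-dimensional query.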
Unstructured peer-to-peer systems
An unstructured P2P network is formed when the overlay links are established arbitrarily. Such networks can be easily constructed, as a new peer that wants to join the network can copy existing links of another node and then form its own links over time. In an unstructured P2P network, if a peer wants to find a desired piece of data in the network, the query has to be flooded through the network to find as many peers as possible that share the data. The main disadvantage of such networks is that queries may not always be resolved. Popular content is likely to be available at several peers, so any peer searching for it is likely to find it. But if a peer is looking for rare data shared by only a few other peers, then it is highly unlikely that the search will be successful. Since there is no correlation between a peer and the content managed by it, there is no guarantee that flooding will find a peer that has the desired data. Flooding also causes a high amount of signaling traffic in the network, and hence such networks typically have very poor search efficiency. Many of the popular P2P networks are unstructured.
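The trade-off described above can be seen in a small simulation of flooding with a hop limit (time-to-live, TTL). This is only a sketch: the topology, the peer names and the TTL mechanism as written here are assumptions for illustration and do not reproduce any specific protocol.

```python
from collections import deque


def flood_search(links, holders, start, ttl):
    """Flood a query from `start`; return (peers found holding the data, messages sent)."""
    seen = {start}
    found = {start} if start in holders else set()
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue                      # query expires here
        for neighbour in links[peer]:
            messages += 1                 # every forwarded copy costs one message
            if neighbour in seen:
                continue                  # duplicate query is dropped
            seen.add(neighbour)
            if neighbour in holders:
                found.add(neighbour)
            frontier.append((neighbour, hops_left - 1))
    return found, messages


# Hypothetical overlay: the rare file is held only by peer "f".
links = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
holders = {"f"}

near = flood_search(links, holders, "a", ttl=2)   # (set(), 6)
far = flood_search(links, holders, "a", ttl=4)    # ({"f"}, 11)
```

With a TTL of 2 the query costs six messages and still misses the only holder; raising the TTL to 4 finds it, at the price of almost twice the traffic, much of it duplicate queries that are discarded on arrival.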
In pure P2P networks, peers act as equals, merging the roles of client and server. In such networks, there is no central server managing the network, nor is there a central router. Some examples of pure P2P Application Layer networks designed for file sharing are Gnutella (pre v0.4) and Freenet.
There also exist hybrid P2P systems, which distribute their clients into two groups: client nodes and overlay nodes. Typically, each client is able to act according to the momentary need of the network and can become part of the respective overlay network used to coordinate the P2P structure. This division between normal and 'better' nodes is done in order to address the scaling problems of early pure P2P networks. Examples of such networks are Gnutella (after v0.4) and G2.
Another type of hybrid P2P network uses central servers or bootstrapping mechanisms on the one hand, and P2P for data transfers on the other. These networks are in general called 'centralized networks' because they cannot work without their central server(s). An example of such a network is the eDonkey network (eD2k).
Indexing and resource discovery
Older peer-to-peer networks duplicate resources across each node in the network configured to carry that type of information. This allows local searching, but requires much traffic.
Modern networks use central coordinating servers and directed search requests. Central servers are typically used for listing potential peers (Tor), coordinating their activities (Folding@home), and searching (Napster, eMule). Decentralized searching was first done by flooding search requests out across peers. More efficient directed search strategies, including supernodes and distributed hash tables, are now used.
Many P2P systems use stronger peers (super-peers, super-nodes) as servers, and client-peers are connected in a star-like fashion to a single super-peer.
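A minimal sketch of the super-peer arrangement follows, assuming that each super-peer keeps an index of the files offered by its own client-peers and consults neighbouring super-peers when searching. The class `SuperPeer`, its methods and all names are hypothetical, not taken from any real network.

```python
class SuperPeer:
    def __init__(self, name: str):
        self.name = name
        self.index = {}        # file name -> set of client-peers offering it
        self.neighbours = []   # other super-peers this one can ask

    def register(self, client: str, files) -> None:
        """A client-peer attaches to this super-peer and announces its files."""
        for filename in files:
            self.index.setdefault(filename, set()).add(client)

    def search(self, filename: str) -> set:
        """Answer from the local index, then from neighbouring super-peers."""
        hits = set(self.index.get(filename, ()))
        for other in self.neighbours:
            hits |= other.index.get(filename, set())
        return hits


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbours.append(sp2)
sp1.register("client-1", ["song.mp3"])
sp2.register("client-2", ["song.mp3", "doc.pdf"])

hits = sp1.search("song.mp3")
```

Client-peers never receive search traffic in this arrangement; only the super-peers exchange queries, which is the point of the star-like layout.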
Peer-to-peer-like systems
In modern definitions of peer-to-peer technology, the term implies the general architectural concepts outlined in this article. However, the basic concept of peer-to-peer computing was envisioned in earlier software systems and networking discussions, reaching back to principles stated in the first Request for Comments, RFC 1.
A distributed messaging system that is often likened to an early peer-to-peer architecture is the USENET network news system, which is in principle a client-server model from the user or client perspective, when they read or post news articles. However, news servers communicate with one another as peers to propagate Usenet news articles over the entire group of network servers. The same consideration applies to SMTP email, in the sense that the core email relaying network of mail transfer agents has a peer-to-peer character, while the periphery of e-mail clients and their direct connections is strictly a client-server relationship.
Tim Berners-Lee's vision for the World Wide Web, as evidenced by his WorldWideWeb editor/browser, was close to a peer-to-peer design in that it assumed each user of the web would be an active editor and contributor, creating and linking content to form an interlinked web of links. This contrasts with the broadcasting-like structure of the web as it has developed over the years.
Advantages and weaknesses of P2P networks
In P2P networks, all clients provide resources, which may include bandwidth, storage space, and computing power. As nodes arrive and demand on the system increases, the total capacity of the system also increases. This is not true of a client-server architecture with a fixed set of servers, in which adding more clients could mean slower data transfer for all users.
The distributed nature of P2P networks also increases robustness and, in pure P2P systems, enables peers to find data without relying on a centralized index server. In the latter case, there is no single point of failure in the system.
As with most network systems, insecure and unsigned code may allow remote access to files on a victim's computer or even compromise the entire network. In the past this has happened, for example, to the FastTrack network, when anti-P2P companies managed to introduce faked chunks into downloads, and downloaded files (mostly MP3 files) were unusable afterwards or even contained malicious code. Consequently, the P2P networks of today have seen an enormous increase in their security and file verification mechanisms. Modern hashing, chunk verification and different encryption methods have made most networks resistant to almost any type of attack, even when major parts of the respective network have been replaced by faked or nonfunctional hosts.
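Chunk verification of the kind mentioned above can be sketched as follows, assuming the downloader has obtained a list of per-chunk hashes from a source it trusts. The choice of SHA-256, the tiny chunk size and the function names are assumptions for illustration; actual networks use their own hash functions and chunk layouts.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list:
    """Cut the file content into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes, size: int) -> list:
    """Hash list published alongside the file by a trusted source."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data, size)]


def verify_chunk(index: int, chunk: bytes, trusted_hashes: list) -> bool:
    """Accept a downloaded chunk only if its hash matches the trusted list."""
    return hashlib.sha256(chunk).hexdigest() == trusted_hashes[index]


original = b"peer-to-peer payload"           # stand-in for real file content
trusted = chunk_hashes(original, size=8)     # three chunks: 8 + 8 + 4 bytes

good = split_chunks(original, size=8)[0]
fake = b"FAKEFAKE"                           # chunk injected by a hostile peer
```

A peer that receives `fake` in place of the first chunk can detect the mismatch, discard that chunk and request it again from another peer, instead of ending up with an unusable file.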
Internet service providers (ISPs) have been known to throttle P2P file-sharing traffic due to its high bandwidth usage. Compared to Web browsing, e-mail or many other uses of the Internet, where data is only transferred in short intervals and relatively small quantities, P2P file-sharing often consists of relatively heavy bandwidth usage due to ongoing file transfers and swarm/network coordination packets.
A possible solution to this is called P2P caching, where an ISP stores the parts of files most accessed by P2P clients in order to reduce traffic to the wider Internet.
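The effect of such a cache can be sketched with a small least-recently-used (LRU) store of chunks. LRU eviction is an assumption chosen for simplicity, not a description of how any ISP's caching product works, and the class `ChunkCache` and the `fetch_upstream` callback are hypothetical.

```python
from collections import OrderedDict


class ChunkCache:
    def __init__(self, capacity: int, fetch_upstream):
        self.capacity = capacity
        self.fetch_upstream = fetch_upstream   # called only on a cache miss
        self.chunks = OrderedDict()            # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, chunk_id: str) -> bytes:
        if chunk_id in self.chunks:
            self.hits += 1
            self.chunks.move_to_end(chunk_id)  # mark as recently used
            return self.chunks[chunk_id]
        self.misses += 1
        data = self.fetch_upstream(chunk_id)   # traffic leaves the ISP network
        self.chunks[chunk_id] = data
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)    # evict the least recently used
        return data


cache = ChunkCache(capacity=2, fetch_upstream=lambda cid: f"data for {cid}".encode())
for requested in ["a", "b", "a", "c", "a", "b"]:
    cache.get(requested)
```

In this run the frequently requested chunk "a" is served from the cache twice, while the other requests still go upstream; each hit is a transfer that stays inside the ISP's own network.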
Social and economic impact
The concept of P2P is increasingly evolving to an expanded usage as the relational dynamic active in distributed networks, i.e., not just computer to computer, but human to human.
Yochai Benkler has coined the term commons-based peer production to denote collaborative projects such as free software. Associated with peer production are the concepts of:
- peer governance (referring to the manner in which peer production projects are managed)
- peer distribution (or the manner in which products, particularly peer-produced products, are distributed)
Some researchers have explored the benefits of enabling virtual communities to self-organize and introduce incentives for resource sharing and cooperation, arguing that what is missing from today's peer-to-peer systems should be seen both as a goal and a means for self-organized virtual communities to be built and fostered. Ongoing research efforts for designing effective incentive mechanisms in P2P systems, based on principles from game theory, are beginning to take on a more psychological and information-processing direction.
Applications
Active peer-to-peer technologies include:
- Completely decentralized networks of peers: Usenet (1979) and WWIVnet (1987).
- Software publication and distribution (Linux, several games) via file sharing networks.
- In bioinformatics, drug candidate identification. The first such program was begun in 2001 at the Centre for Computational Drug Discovery at the University of Oxford in cooperation with the National Foundation for Cancer Research. There are now several similar programs running under the United Devices Cancer Research Project.
- Pennsylvania State University, MIT and Simon Fraser University are carrying on a project called LionShare designed for facilitating file sharing among educational institutions globally.
- The U.S. Department of Defense has started research on P2P networks as part of its modern network warfare strategy. In May 2003, Dr. Tether, Director of the Defense Advanced Research Projects Agency, testified that the U.S. military is using P2P networks.
- Kato et al.’s studies indicate that over 200 companies, with approximately US$400 million, are investing in P2P networks. Besides file sharing, companies are also interested in distributed computing and content distribution.
- Delivery of TV content over a P2P network (P2PTV).
- Skype, one of the most widely used Internet phone applications, uses P2P technology.
- VoIP (using application layer protocols such as SIP).
- An earlier generation of peer-to-peer systems were called "metacomputing" or were classed as "middleware". These include Legion and Globus.
- Windows Peer-to-Peer: distributed peer application development and collaboration. Shipped with the Advanced Networking Pack for Windows XP, Windows XP SP2 and Windows Vista, this is a Windows component that runs only over IPv6 and provides a 'meta' peer-to-peer network that applications can utilize. It does not have file sharing support, but third parties can develop one. It also includes the Peer Name Resolution Protocol, which allows dynamic domain name publication and resolution of names to endpoints. Windows Meeting Space and the People Near Me feature of Windows Vista use this protocol. It can be used to set up a Windows Internet Computer Name (WICN) using netsh p2p. The Windows Remote Assistance and HomeGroup features of Windows 7 also use it.
Examples
Usenet and SMTP servers are connected in a P2P structure, with users connecting to these servers as clients, in the standard client-server architecture.
Tim Berners-Lee's vision for the World Wide Web was close to a P2P network in that it assumed each user of the web would be an active editor and contributor, creating and linking content to form an interlinked "web" of links. This contrasts with the current broadcasting-like structure of the web.
Some networks and channels such as Napster, OpenNAP and IRC serving channels use a client-server structure for some tasks (e.g. searching) and a P2P structure for others. Networks such as Gnutella or Freenet use a P2P structure for nearly all tasks, with the exception of finding peers to connect to when first setting up.
P2P architecture embodies one of the key technical concepts of the Internet, described in the first Internet Request for Comments, RFC 1, "Host Software", dated April 7, 1969. More recently, the concept has achieved recognition among the general public in the context of the absence of central indexing servers in architectures used for exchanging multimedia files.
See also