CiteSeer was a public search engine and digital library for scientific and academic papers. It is being replaced by CiteSeerx. It was created by researchers Steve Lawrence, Kurt Bollacker and Lee Giles while they were at the NEC Research Institute (now NEC Labs) in Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and to use autonomous citation indexing to permit querying by citation or by document, ranking results by citation impact. It is hosted on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University, and has over 700,000 documents, primarily in the fields of computer and information science and engineering.
CiteSeer freely provides Open Archives Initiative metadata for all indexed documents and, when possible, links indexed documents to other sources of metadata such as DBLP and the ACM portal.
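The idea of citation indexing with ranking by citation impact can be illustrated with a minimal sketch. It is not CiteSeer's actual implementation: the paper identifiers, the `references` mapping, and the function names are hypothetical, and raw citation counts stand in for whatever impact measure the real system uses.

```python
from collections import defaultdict

def build_citation_index(references):
    """Invert a 'paper -> papers it cites' mapping into 'paper -> papers citing it'."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:  # skip a paper listing itself as a reference
                cited_by[cited].add(citing)
    return cited_by

def rank_by_citation_count(references):
    """Order all known papers by number of distinct citing papers, most cited first."""
    cited_by = build_citation_index(references)
    papers = set(references) | set(cited_by)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(papers, key=lambda p: (-len(cited_by.get(p, ())), p))

# Hypothetical toy data: each key cites the papers in its list.
references = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}

citation_index = build_citation_index(references)
ranking = rank_by_citation_count(references)
```

Querying "by citation" then amounts to looking up an entry in `citation_index`, while querying "by document" uses the original `references` mapping.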
CiteSeer's goal was to improve the dissemination of and access to academic and scientific literature. As a non-profit service that can be freely used by anyone, it has been considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature.
The name can be read in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' would be a prophet of citations.
CiteSeer has not been comprehensively updated since 2005 due to limitations in its architecture design. It has a representative sampling of research documents in computer and information science but is limited in coverage because it only has access to papers that are publicly available, usually at an author's homepage, or that are submitted by an author.
A new version and design of CiteSeer is available at the Next Generation CiteSeer (CiteSeerx) website. CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. As such, authors whose documents are freely available are more likely to be represented in the index.
Compared to DBLP
A comparison of DBLP references to actual papers in CiteSeer is somewhat like comparing apples to oranges. DBLP is a manually compiled bibliography gleaned from publisher websites. Consider the DBLP references to well-known authors such as Alex Pentland (MIT) or Ramesh Jain (UCI) (DBLP listings for Alex Pentland - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/p/Pentland:Alex.html or Ramesh Jain - http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/j/Jain:Ramesh.html). DBLP shows a steady number of publications (~9) each year for these authors through 2007, whereas CiteSeer has only one of their publications after 2000. However, DBLP has not cached any of their actual publications and only links to them on publisher websites. In addition, CiteSeer has technical reports and papers in other fields, such as statistics, that DBLP does not index.
Recent developments
Other CiteSeer Engines
The CiteSeer model was extended to cover academic documents in business with SmealSearch and in e-business with eBizSearch. However, these were not maintained by their sponsors. An older version of both could once be found at BizSeer.IST, but it is no longer in service. For enhanced access and performance, similar versions of CiteSeer were supported at universities such as the Massachusetts Institute of Technology, the University of Zürich and the National University of Singapore. However, these versions of CiteSeer proved difficult to maintain, and many are no longer available.
Versions of CiteSeer have been or are available at the following links:
Other Seer-like search and repository systems have been built for chemistry (ChemXSeer) and for archaeology (ArchSeer). Another, BotSeer, has been built for robots.txt file search. All of these are built on the open source indexer Lucene.
Next Generation CiteSeer (CiteSeerx)
The Next Generation CiteSeer project, CiteSeerx, funded by the National Science Foundation and Microsoft Research, enhances CiteSeer both as a search engine and as a digital library. As an example, it extends CiteSeer's notion of "contribution" to cover acknowledgments in addition to citations, which would make it the first automatically generated acknowledgment index. CiteSeerx is designed differently from CiteSeer, with new algorithms for entity extraction and a modular, expandable, robust, scalable architecture based on open source tools such as Lucene and many Apache projects. As such, CiteSeerx will promote the creation of other Seer-like systems.
The Next Generation CiteSeer, CiteSeerx, is now available in beta, with over one million documents indexed and constantly growing.
See also
- DBLP (Digital Bibliography & Library Project)