Research and Development (2010-2015)


SNAC's research and development work demonstrated the feasibility of separating the description of people from the description of the historical records that were created by them and that document their lives and work. This separation serves two complementary objectives: improving the economy and effectiveness of archival description and providing researchers with a novel tool that integrates access to distributed historical records and reveals the social context within which the records were created.

Data Sources

The bulk of the source data consisted of descriptions in three forms:

  • Detailed guides or finding aids that are encoded using the international standard Encoded Archival Description (EAD).
  • Summary descriptions of archival collections that use the library standard MARC21.
  • Original descriptions of people (traditionally described as authority records and represented in a wide variety of non-standard formats).

SNAC worked with nearly 190,000 finding aids contributed by scores of repositories, 2.2 million MARC21 descriptions contributed by OCLC WorldCat, and approximately 500,000 original descriptions of people contributed by the British Library, the U.S. National Archives and Records Administration, the Smithsonian Institution Archives, and others.

Data Processing Methods

The source data was processed in three steps:

  • Data was extracted from the source descriptions and assembled into descriptions of individuals, families, and organizations using the international standard Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF).
  • The resulting EAC-CPF descriptions were matched and combined with one another, and then matched against more than 25 million Virtual International Authority File (VIAF) records. Data from matching VIAF records supplemented the data in the EAC-CPF descriptions.
  • Finally, the resulting EAC-CPF descriptions formed the foundation of a public research tool that serves both as an integrated pathway into distributed historical resources and as a biographical-historical resource.

Each step in the processing was performed by one of the three SNAC collaborators. The first step, extracting/assembling, was done at the Institute for Advanced Technology in the Humanities, University of Virginia. The second step, matching/combining, was performed at the School of Information, University of California, Berkeley. The third and last step, developing the public research tool, was carried out at the California Digital Library, University of California Office of the President.
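
To make the first two steps concrete, the following is a minimal sketch in Python, not SNAC's actual code. It assumes that source descriptions have already been reduced to (source id, name, entity type) triples and that matching is done on a crude normalized-name key; the real matching was presumably far more careful. The class EntityRecord, the functions normalize, extract, combine, and match_authority, and all identifiers in the sample data are hypothetical. The third step, building the public research tool, is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityRecord:
    """Hypothetical, simplified stand-in for an EAC-CPF description."""
    name: str
    entity_type: str  # "person", "family", or "corporateBody"
    sources: set = field(default_factory=set)
    authority_id: Optional[str] = None


def normalize(name: str) -> str:
    """Crude match key: lowercase, replace punctuation, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def extract(source_descriptions):
    """Step 1: assemble one record per name found in a source description."""
    return [EntityRecord(name, etype, {source_id})
            for source_id, name, etype in source_descriptions]


def combine(records):
    """Step 2a: merge records that share a normalized name and entity type."""
    merged = {}
    for rec in records:
        key = (normalize(rec.name), rec.entity_type)
        if key in merged:
            merged[key].sources |= rec.sources
        else:
            merged[key] = EntityRecord(rec.name, rec.entity_type, set(rec.sources))
    return list(merged.values())


def match_authority(records, authority_index):
    """Step 2b: attach an authority identifier when the match key is found."""
    for rec in records:
        rec.authority_id = authority_index.get(normalize(rec.name))
    return records


# Illustrative sample data; the source and authority identifiers are made up.
sources = [
    ("ead-001", "Whitman, Walt, 1819-1892", "person"),
    ("marc-042", "Whitman, Walt, 1819-1892.", "person"),
    ("ead-007", "Smithsonian Institution", "corporateBody"),
]
authority_index = {normalize("Whitman, Walt, 1819-1892"): "authority-0001"}

records = match_authority(combine(extract(sources)), authority_index)
for rec in records:
    print(rec.name, sorted(rec.sources), rec.authority_id)
```

In this sketch the two Whitman entries collapse into a single record that lists both sources, which mirrors the idea of describing a person once and linking that description to every collection that documents them.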

The U.S. National Endowment for the Humanities (2010-2012) and the Andrew W. Mellon Foundation (2012-2015) funded the R&D.