Digital archives allow scholars who lack the means or the time to travel to brick-and-mortar archives to work with newspaper and magazine collections. However, digital archives come with their own challenges and limitations. The advice below outlines common challenges and highlights the unique opportunities presented by digital and microform archives.
Platforms and Search Interfaces
Publication Years, Frequency, and Missing Issues
Microfilm v. Digital Archives
Text Capture and Accuracy
Platforms and Search Interfaces
Today, few scholars work with print archives of newspapers and magazines on a regular basis. Most use digital or microfilm copies for ease of access and to save time and resources. Digital archives also offer research opportunities that print and microform archives lack, such as large-scale text analysis.
While a small number of large companies dominate the market for digital newspaper and magazine archives, there are also many smaller commercial and non-commercial players. Researchers may prefer the functionality and search interface of certain publishers, but personal preferences usually do not determine which archive they use. Content is typically the deciding factor, and given the sheer size and complexity of most newspaper and magazine archives, few are available on multiple platforms. Scholars are used to adapting to the limitations of whichever digital archive holds the content they need. For example, the “complete” archive (1860-2009) of the Philadelphia Inquirer is currently available only on the ProQuest platform. However, issues that are no longer protected by copyright are also available in the America’s Historical Newspapers (Readex) archive, and NewsBank offers a digital archive covering the last four decades. Scholars who need issues from the last ten years will choose NewsBank, the exclusive provider for that period, while those who need earlier years are most likely to use ProQuest.
Having identified the appropriate digital archive and secured access to it, scholars should familiarize themselves with the platform’s features. Most large commercial providers offer the same set of features, but labels and presentation vary widely. Common features include advanced search options, faceting by document type and date range, permalinks, and ready-made citations in the most common citation styles. A little-used and therefore often hard-to-find option is the whole-page view, typically presented as a browse feature that lets scholars read a complete issue page by page. Publishers that specialize in newspaper and magazine archives may offer simultaneous searching across multiple archives. Such cross-searching is limited to newspapers and magazines that come from a single company and that a library has purchased. For example, ProQuest offers a large number of newspaper archives, but the University community can cross-search only those purchased by Falvey Library, which include the New York Times, the Washington Post, Black Historical Newspapers, the Philadelphia Inquirer, the Pittsburgh Post-Gazette, the Wall Street Journal, and the Irish Times. Other ProQuest newspaper archives, such as those for the Los Angeles Times, the Boston Globe, and the Chicago Tribune, are not part of the Library’s digital collections and cannot be cross-searched with the available archives.
Publication Years, Frequency, and Missing Issues
Finding out which years, months, weeks, and days of a newspaper or magazine are accessible can be tricky. It helps to start with the publication history of a news source to determine the years during which it was published, how frequently it appeared, and whether publication was ever interrupted or the title changed. Once these publication facts are established, the next step is to figure out which issues survived, which archives hold copies, and whether the paper copies have been microfilmed and/or digitized. The last step is to contact libraries, archives, and publishers to negotiate access.
Three resources introduced earlier can assist with this task: WorldCat, Wikipedia, and the US Newspaper Directory published by the Library of Congress. Contact a librarian if you need assistance.
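Once a publication schedule is known, checking an archive’s holdings against it is simple date arithmetic. The Python sketch below assumes a fixed interval between issues (daily or weekly) and a hand-entered set of dates held by an archive; the function names and sample dates are hypothetical, and real schedules with skipped Sundays, holidays, or suspended publication would need extra rules.

```python
from datetime import date, timedelta


def expected_issues(start: date, end: date, step_days: int) -> list[date]:
    """Return every issue date from start to end (inclusive) at a fixed interval.

    step_days=1 models a daily paper, step_days=7 a weekly one.
    Irregular schedules are not handled by this sketch.
    """
    issues = []
    current = start
    while current <= end:
        issues.append(current)
        current += timedelta(days=step_days)
    return issues


def missing_issues(expected: list[date], held: set[date]) -> list[date]:
    """Return the expected issue dates that are absent from an archive's holdings."""
    return [d for d in expected if d not in held]


# Hypothetical weekly paper, January 1765, with one issue missing from the holdings.
expected = expected_issues(date(1765, 1, 3), date(1765, 1, 31), 7)
held = {date(1765, 1, 3), date(1765, 1, 10), date(1765, 1, 24), date(1765, 1, 31)}
print(missing_issues(expected, held))  # [datetime.date(1765, 1, 17)]
```

Comparing the gaps found in one archive with the holdings of another shows whether a missing issue survives elsewhere, for example on microfilm.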
Microfilm v. Digital Archives
High publication frequency and poor paper quality were the main motivations for copying newspaper and magazine archives into alternative formats. Shelf space is limited in most institutions, and print originals of old newspapers are fragile and difficult to preserve; the poor paper quality of the originals is partly why newspapers and magazines were among the first publications to be microfilmed. Most research libraries have thus amassed extensive newspaper archives on microfilm. Microfilm copies are created from print originals and are an economical and efficient way to replicate large newspaper and magazine archives. Microfilm was widely adopted in the 20th century and is well suited to the long-term preservation of text-based publications. Images such as photographs are a different matter: on microfilm they are in many cases of extremely poor quality.
Digital copies of newspapers and magazines are often created from existing microfilm rather than from print originals. While such digital copies reproduce the microfilm faithfully, they generally cannot improve on the quality of the microfilm master. Occasionally, publishers go back to the print originals or clean up the microfilm images where possible. Digital archives are often more convenient to access than microfilm: a scholar needs only a laptop and an internet connection, whereas reading microfilm requires a specialized reader. Digital archives also offer full-text searching, which lets scholars work faster and more efficiently. Finally, many digital archives allow simultaneous searching of multiple newspapers and magazines.
Text Capture and Accuracy
Digital archives consist mostly of page images rather than machine-readable text, even though most of their content is text. These images must be converted into machine-readable text for indexing, searching, and data mining. The conversion can be done in various ways, ranging from manually rekeying the text to running OCR (optical character recognition) software.
Rekeying entire text corpora is labor-intensive and often cost-prohibitive, but highly accurate. It is mostly used for pre-18th-century printed texts, which are difficult to read under the best of circumstances. OCR software cannot deliver useful results from images of poorly preserved originals, from typefaces whose letters closely resemble each other (such as the long s and the f in early printing), or from layouts with uneven spacing between words, to name just a few scenarios. In such cases, only rekeying achieves reliable search recall. For example, the Early English Books Online Text Creation Partnership is a scholarly project dedicated to rekeying a subset of the books printed in the British Isles and their dependencies between 1473 and 1700. Early English Books Online, an image-based digital archive derived from a microfilm collection, includes all of these books, but its text-search accuracy is limited for the reasons given above.
Some archives include both a rekeyed text archive and an image-based archive, combining high search accuracy with exact reproductions of the original text in context. African American Newspapers from Readex is such an archive. Other services, such as Nexis Uni, offer only text-based copies of news stories, without the original layout or the historical context of individual issues. The Nexis Uni archive also omits accompanying images.
In recent years, archives and libraries have begun to crowdsource transcription and translation work. Exemplary projects include the Citizen Archivist initiative under the auspices of the National Archives and the What’s on the Menu? project of the New York Public Library.
Newspaper and magazine archives typically lack a subject index, which limits searches to keywords in the text or titles. This matters because words and phrases in common use today may not retrieve the desired results in newspapers from the colonial era. For example, anybody looking for ads placed by pharmacists in the Pennsylvania Gazette (1728-1800) will come up empty-handed, even though a fair number of pharmacists plied their trade in early Philadelphia. Changing the search terms to "apothecary" or "druggist" changes the results dramatically. A small number of ads use the term "chemist" or "chymist."
Differences in word usage and spelling between American and British English should also be kept in mind. Standardized spelling is a fairly recent phenomenon, and older newspapers and magazines often contain several spellings of the same word within a single issue. The ad featured here spells chemist with a "y," while other articles in the same newspaper use today's standardized spelling. Spelling in late 19th- and 20th-century publications is more consistent, but scholars still need to account for differences between British and American spelling, such as "colour" and "color."
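Because these archives rely on keyword searching, period synonyms and spelling variants are best searched together. The Python sketch below builds a single Boolean OR query from the pharmacist terms discussed above and shows a regular expression that covers the "chemist"/"chymist" variation in text a scholar has downloaded for local analysis. It assumes a platform that accepts double-quoted terms joined by an uppercase OR; query syntax, including support for wildcards or truncation, differs between providers, so check each platform's search help. The function name or_query is hypothetical.

```python
import re


def or_query(terms: list[str]) -> str:
    """Join search terms into one Boolean OR query, quoting each term.

    Assumes the target platform accepts double-quoted terms joined by OR.
    """
    return " OR ".join(f'"{t}"' for t in terms)


# Period synonyms and spelling variants for "pharmacist".
variants = ["pharmacist", "apothecary", "druggist", "chemist", "chymist"]
print(or_query(variants))
# "pharmacist" OR "apothecary" OR "druggist" OR "chemist" OR "chymist"

# For local text analysis, one pattern can match both spellings (e/y), case-insensitively.
chemist_pattern = re.compile(r"\bch[ey]mists?\b", re.IGNORECASE)
sample = "Imported by the Chymist at his shop; also sold by another chemist."
print(chemist_pattern.findall(sample))  # ['Chymist', 'chemist']
```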
Word usage varies not only over time but also between regions and countries. For example, the word "wench" was commonly used for girls in British English from the 13th to the 19th century. In the American colonies, however, the word took a racial turn: up to the 19th century it referred to African American female servants and slaves. It is always advisable to examine one's search terms closely, and the Oxford English Dictionary, which covers American etymology and usage, is well suited for this purpose.
Newspapers and magazines are rarely translated because of the sheer volume of text. This may change with advances in artificial intelligence, but for now scholars are mostly limited to English-language newspapers and magazines unless they read other languages. Fortunately, English-language news is widely published around the globe, including in countries where English is not the national language. Falvey Library recently acquired the archive of the Moscow News, an English-language newspaper published in Moscow from 1930 to 2014. News aggregators such as Nexis Uni include English-language newspapers from China, Japan, India, and many other countries. Chinese news outlets include the South China Morning Post, a Hong Kong-based English-language newspaper; the China Daily and the Global Times, English-language newspapers published by the Chinese government; and numerous Chinese-language newspapers, including the overseas edition published by Xinhua News.
Among the few exceptions that offer news in translation are MideastWire.com, The Current Digest of the Russian Press, and the Foreign Broadcast Information Service Daily Reports, better known as FBIS Daily Reports. MideastWire.com offers translations of selected news from 22 Arab countries and the Arab diaspora. Similarly, the Current Digest selects articles on important issues from Russian news sources and translates them into English; its digital archive goes back to 1949 and covers all major Russian news outlets. FBIS was an office of the CIA that monitored and translated selected foreign news for dissemination within the U.S. government. One drawback of these resources is that only selected articles are translated, and neither the original article nor its publication context is included.