PhiloLogic @ NU Merges EEBO-TCP and ProQuest Databases into a Single Searchable Corpus

After the announcement of the latest release of TCP texts, Jeff Garrett got in touch and asked us to remind TCP users, especially those at libraries in the Committee on Institutional Cooperation (CIC), about the specialized searching made possible by the PhiloLogic implementation at Northwestern University. Today, I’m glad to publish this guest post from him.

Reaching the 40,000-text milestone offers a good opportunity to remind EEBO-TCP users of a powerful alternative way to search EEBO-TCP texts, and many other texts besides: the PhiloLogic implementation at Northwestern, or PhiloLogic @ NU for short. This site was developed as a joint CIC project in 2005 and 2006 to create a large merged database of early modern English texts searchable through the University of Chicago’s PhiloLogic search engine, enhanced by the Virtual Modernization (VM) tool developed at Northwestern University. During 2010 and 2011, in collaboration with staff at the University of Chicago’s Electronic Text Service, ProQuest, and the Text Creation Partnership, the launch version of PhiloLogic @ NU was expanded to more than double its original size. It now includes exciting new material: several ProQuest databases absent from the launch version, e.g., The Bible in English and Editions and Adaptations of Shakespeare; revised and expanded versions of seven ProQuest databases already represented in PhiloLogic @ NU; and finally about 15,000 new texts from both Phases I and II of EEBO-TCP. Visit PhiloLogic.northwestern.edu to see what’s new! If you access PhiloLogic @ NU from a CIC member school or from one of several other participating institutions, you can also go right to work using the resource.

What the enhanced version of PhiloLogic @ NU can mean for students and researchers is best illustrated by an example. Let’s say you are studying the resonance of the Bible’s second great commandment (“Love thy neighbour as thyself”) in English-language literature of the last 500 years. You might start by searching ProQuest’s “Bible in English” database through PhiloLogic @ NU. To find the relevant biblical passages in PhiloLogic (there are at least a dozen in most Bible editions), it’s best to do a proximity search for “love” and “neighbour,” restricting proximity to within three words. Thanks to the VM tool, you will uncover occurrences in, for example, the King James Version of 1611 and The New English Bible of 1970 (“The second is this: ‘Love your neighbour as yourself’”), along with numerous others, 194 in all, most of which would not show up in a literal-string keyword or phrase search in other online versions of the Bible. Wholly obsolete as well as typographically variant spellings will be retrieved, as in the Bishops’ Bible of 1568 (Matt. 22): “And the seconde is lyke vnto this: Thou shalt loue thy neyghbour as thy selfe.” Virtual Modernization is so powerful because it invokes variant spellings and typographical variants of both search terms, of “love” (e.g., “loue”) and of “neighbour” (e.g., “neghbour,” “neigbour,” “neighbor,” “neighbour,” “neighboure,” “neyboure,” “neygbour,” “neyghbour,” “neyghboure”), and then searches them against each other in the same word order, but otherwise in all possible combinations. No keyword search could have done this before VM.
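For readers who like to see the idea spelled out, here is a minimal Python sketch of variant expansion combined with an ordered proximity search. It is not PhiloLogic's or the VM tool's actual code: the `VARIANTS` table, the function names, and the reading of "within three words" as "the second term appears no more than three positions after the first" are all assumptions made for illustration. The spellings are the ones quoted in the paragraph above.

```python
import re

# Hypothetical variant table (an assumption, not VM's real data);
# the spellings are the examples quoted in this post.
VARIANTS = {
    "love": {"love", "loue"},
    "neighbour": {
        "neghbour", "neigbour", "neighbor", "neighbour", "neighboure",
        "neyboure", "neygbour", "neyghbour", "neyghboure",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def proximity_search(text, first, second, max_gap=3):
    """Find variants of `first` followed by variants of `second`.

    A hit is recorded when any spelling of `second` occurs at most
    `max_gap` token positions after any spelling of `first`.
    Returns a list of (first_spelling, second_spelling) pairs.
    """
    tokens = tokenize(text)
    first_forms = VARIANTS.get(first, {first})
    second_forms = VARIANTS.get(second, {second})
    hits = []
    for i, token in enumerate(tokens):
        if token not in first_forms:
            continue
        # Look only forward, so the original word order is preserved.
        window = tokens[i + 1 : i + 1 + max_gap]
        for candidate in window:
            if candidate in second_forms:
                hits.append((token, candidate))
                break
    return hits


bishops = ("And the seconde is lyke vnto this: "
           "Thou shalt loue thy neyghbour as thy selfe.")
print(proximity_search(bishops, "love", "neighbour"))
```

Because every spelling of the first term is tested against every spelling of the second, a single query for "love" and "neighbour" covers all of the spelling combinations in the table, which a literal-string search would have to issue one at a time.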

But now for the next step: bringing in EEBO-TCP and other databases to find instances in English literature where this biblical commandment is mentioned, altered, and commented upon. PhiloLogic @ NU’s “combo2” file pools a host of very large ProQuest databases with 30,000 EEBO-TCP files. No surprise that the results for our “love thy neighbour” search now skyrocket to 3,528, opening up access to occurrences of and variations upon this biblical phrase in works from Geoffrey Chaucer to H.G. Wells. At the early end of the spectrum is this passage from a vita of Saint Catherine of Siena printed in 1500: “Knewest thou not well that in thise two thynges scondeth the perfection of myn commaundementys that is in loue off god and loue of thyn neyghbour.” But we also uncover interesting 19th and early 20th century material useful for our study by including some of the more modern ProQuest databases. A passage from George Eliot’s Adam Bede of 1859, for example, reads: “. . . she went clean again’ the Scriptur, for that says, ‘Love your neighbour as yourself;’ but I said, ‘If you loved your neighbour no better nor you do yourself, Dinah, it’s little enough you’d do for him. You’d be thinking he might do well enough on a half-empty stomach.’” This comes up because ProQuest’s Nineteenth-Century Fiction database is included in the new combo2 file.

Our only regret at present is that, although all 25,353 Phase I EEBO-TCP texts are represented in PhiloLogic @ NU, so far only 4,180 of the newer Phase II texts are. We look forward to adding the new Phase II material once the new version of PhiloLogic is introduced, an exciting development described in an earlier post to this blog. But even with the smaller corpus base, and a few quirks, PhiloLogic @ NU is an enormously powerful tool, supporting creative searching across a database of close to 100,000 texts.

For now, PhiloLogic @ NU is available only to CIC member institutions and to several partners outside the CIC, with access to individual and cumulated files customized for each institution based on existing ProQuest licenses and membership in Phases I and II of the Text Creation Partnership. Drop us a line at speciallibraries@northwestern.edu if you’d like to know more.

Jeff Garrett, Northwestern University Library
