In my previous blog post, “Meet a TCP Editor: Sarah Wingo,” I noted that one of my favorite things about being a TCP editor is the way in which each text is like a puzzle in need of solving.
This post will outline one of TCP’s basic rules for marking up text and how that rule affects what readers see when using TCP texts. The basic idea behind this rule is function over form. In other words, TCP aims to capture structural information that will be useful for intelligible display, informed searching, and intelligent navigation. In this way we capture the content of each book and the meaning or purpose of any special formatting, but we do not exactly reproduce the look or specific style of the original printed work. One slight exception to this rule is how we capture the information on title pages. Title pages tend to receive the highest frequency of searches, so we avoid cluttering them with markup: title pages are left relatively markup-free, and unnecessary markup is sometimes even removed.
Take for example the following title page:
The above image is a simple title page. The markup for this title page is shown below in Image 1:
In Image 1, the <P> tags indicate where paragraphs start and end. No alterations are made to the spelling or capitalization of the original. However, you will notice in Image 2, which shows how this text would display to the viewer, that certain visual elements from the original page are not captured in the TCP text:
Most notably, the font size is now uniform, whereas in the original title page “London, Printed in the Yeer 1642.” appeared much smaller than the rest of the text. Furthermore, the decorative illustration between the author’s name and the publication information is not represented in the TCP markup and thus does not appear when that markup is rendered to the viewer. Font sizes and decorative figures are stylistic elements that do not contribute directly to our ability to understand the content of the title page. Font size is standardized because recording subtle changes in font size would not be cost-effective: the time it would take outweighs any benefit it would provide. Another reason these features are left out is that they are unlikely to be the subject of a search.
As a rule, TCP does not capture detailed information about illustrations in texts. This is because TCP is primarily concerned with text-based searches and analysis. However, figures that do more than decorate or divide the page are noted and can be searched.
Take for example the following image:
The illustration in Image 3 would be represented in the text with very simple <FIGURE></FIGURE> tags. If there were any text within the illustration, it would be captured in <HEAD> tags inside the <FIGURE> element:
Editors may also choose to add a very basic description of an illustration, especially if the illustration is highly detailed. Such a description might look like this:
<FIGURE><HEAD>text here</HEAD><FIGDESC>description here </FIGDESC></FIGURE>
However, this illustration has no text, and the editor did not see fit to add a description, so it is simply represented in the markup as <FIGURE></FIGURE>:
As you can see in the above image, the only information conveyed here is that an illustration exists at a specific location on this page. Viewers who want to know more about the illustration will have to pull up the EEBO page image for this text. This is essentially a compromise: the primary objective for TCP is to create searchable texts, but we recognize that illustrations are also important to a text and can add meaning. Editors account for this by notifying viewers that an illustration is present, by capturing any useful text associated with it, and by describing it when feasible.
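To make the three cases concrete, here is a minimal sketch of how a display layer might turn a <FIGURE> element into a plain-text placeholder. It reports a bare illustration, any <HEAD> text, and any <FIGDESC> description. It assumes the markup is well-formed XML, as the example snippet above is. The function name `render_figure` and the bracketed output format are hypothetical illustrations and do not reflect TCP's or EEBO's actual display code.

```python
import xml.etree.ElementTree as ET


def render_figure(markup: str) -> str:
    """Render a TCP-style <FIGURE> element as a plain-text placeholder.

    Hypothetical sketch: the "[illustration]" label and the " | " separators
    are illustrative choices, not TCP's real display format.
    """
    figure = ET.fromstring(markup)
    if figure.tag != "FIGURE":
        raise ValueError(f"expected <FIGURE>, got <{figure.tag}>")

    # Every figure at least tells the reader that an illustration is here.
    parts = ["[illustration]"]

    # Text printed inside the illustration, captured in <HEAD>, if any.
    head = figure.find("HEAD")
    if head is not None and head.text:
        parts.append(f"text: {head.text.strip()}")

    # An editor's optional description, captured in <FIGDESC>, if any.
    desc = figure.find("FIGDESC")
    if desc is not None and desc.text:
        parts.append(f"description: {desc.text.strip()}")

    return " | ".join(parts)


if __name__ == "__main__":
    # The bare case: only the figure's presence and location are recorded.
    print(render_figure("<FIGURE></FIGURE>"))
    # The fuller case, using the example markup from this post.
    print(render_figure(
        "<FIGURE><HEAD>text here</HEAD>"
        "<FIGDESC>description here </FIGDESC></FIGURE>"
    ))
```

For the bare `<FIGURE></FIGURE>` case, the sketch yields only `[illustration]`, mirroring the minimal information a reader gets for Image 3. The text in <HEAD> and <FIGDESC> is ordinary character data, which is what makes it available to text search.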
Function over form is also demonstrated by the way lists are coded. Take for example the page containing a list in Image 6: