Function over Form: understanding the TCP encoding philosophy

In my previous blog post, “Meet a TCP Editor: Sarah Wingo,” I noted that one of my favorite things about being a TCP editor is that each text is like a puzzle waiting to be solved.

This post outlines one of TCP’s basic rules for marking up text and how that rule affects what readers see when using TCP texts. The idea behind this rule is function over form: TCP aims to capture structural information that supports intelligible display, informed searching, and intelligent navigation. We capture the content of each book and the meaning or purpose of any special formatting, but we do not reproduce the exact look or style of the original printed work. One slight exception to this rule is how we handle title pages. Because title-page information is searched more often than any other part of a text, we avoid cluttering it with markup: title pages are left relatively markup-free, and unnecessary markup is sometimes even removed.

Take for example the following title page:

PDF of title page

The image above shows a simple title page. Its markup is shown in Image 1:

Image 1. title page markup

In Image 1, the <P> tags mark where paragraphs begin and end. The spelling and capitalization of the original are preserved exactly. However, Image 2, which shows how this text is displayed to the viewer, reveals that certain visual elements of the original page are not captured in the TCP text:

Image 2. markup from Image 1. rendered.

Most notably, the font size is now uniform, whereas on the original title page “London, Printed in the Yeer 1642.” appeared much smaller than the rest of the text. Furthermore, the decorative illustration between the author’s name and the publication information is not represented in the TCP markup and so does not appear when that markup is rendered. Font sizes and decorative figures are stylistic elements that do not contribute directly to our understanding of the title page’s content. Font size is standardized because recording subtle changes in it would not be cost-effective: the time it would take outweighs any benefit it would provide. These elements are also left out because they are unlikely to be the subject of a search.
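To make the point concrete, here is a minimal Python sketch of why the rendered title page loses its font sizes. It assumes the markup can be parsed as well-formed XML with the standard `xml.etree.ElementTree` module; the `<DIV1>` wrapper, the function name `render_paragraphs`, and every title-page line except “London, Printed in the Yeer 1642.” are hypothetical placeholders, not the actual markup in Image 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical title-page markup. Only the last line's wording comes from
# the post; the wrapper element and the other lines are placeholders.
TITLE_PAGE = """
<DIV1>
  <P>A HYPOTHETICAL TITLE</P>
  <P>By a Hypothetical Author.</P>
  <P>London, Printed in the Yeer 1642.</P>
</DIV1>
"""


def render_paragraphs(markup: str) -> list[str]:
    """Return one plain-text line per <P> element.

    The markup records only where paragraphs begin and end. There is no
    font-size or ornament information for a renderer to use, so every
    paragraph is output the same way.
    """
    root = ET.fromstring(markup.strip())
    # itertext() gathers all text inside the paragraph; split/join only
    # collapses whitespace and leaves spelling and letter case untouched.
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("P")]


for line in render_paragraphs(TITLE_PAGE):
    print(line)
```

The output keeps the original spelling (“Yeer”) and capitalization but carries no size distinctions, mirroring the uniform font size described above.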

As a rule, TCP does not capture detailed information about illustrations in a text, because TCP is primarily concerned with text-based searching and analysis. However, figures that do more than decorate or divide the page are noted and can be searched.

Take for example the following image:

Image 3. title page with detailed image.

The illustration in Image 3 would be represented in the text with very simple <FIGURE></FIGURE> tags. If the illustration contained any text, that text would also be captured in a <HEAD> element within the <FIGURE>:

<FIGURE><HEAD>text here</HEAD></FIGURE>

Editors may also add a very basic description of an illustration, especially if the illustration is highly detailed. Such a description might look like this:

<FIGURE><HEAD>text here</HEAD><FIGDESC>description here </FIGDESC></FIGURE>
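The levels of detail a figure can carry (a bare <FIGURE>, one with a <HEAD>, and one that also has a <FIGDESC>) can be illustrated with a short Python sketch. It assumes the markup is well-formed XML parsed with `xml.etree.ElementTree`; `describe_figure` is a hypothetical helper, and the placeholder strings are the ones used in this post.

```python
import xml.etree.ElementTree as ET

# The figure encodings discussed in the post, treated as well-formed XML.
# The placeholder text ("text here", "description here ") is the post's own.
BARE = "<FIGURE></FIGURE>"
WITH_HEAD = "<FIGURE><HEAD>text here</HEAD></FIGURE>"
WITH_DESC = "<FIGURE><HEAD>text here</HEAD><FIGDESC>description here </FIGDESC></FIGURE>"


def describe_figure(markup: str) -> dict:
    """Collect what a reader (or a search) can learn from one <FIGURE>."""
    fig = ET.fromstring(markup)
    head = fig.findtext("HEAD")     # None when the figure carries no text
    desc = fig.findtext("FIGDESC")  # None when the editor added no description
    return {
        # Every <FIGURE>, even an empty one, records that an illustration is here.
        "present": True,
        "head": head.strip() if head else None,
        "description": desc.strip() if desc else None,
    }


for markup in (BARE, WITH_HEAD, WITH_DESC):
    print(describe_figure(markup))
```

Even the bare case is not empty of information: it still tells the reader where an illustration sits, which is the compromise this post describes.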

The illustration in Image 3, however, contains no text, and the editor did not see fit to add a description, so it is represented in the markup simply as <FIGURE></FIGURE>:

Image 4. markup for text containing figure

Online, this is rendered as:

Image 5. rendered text containing figure

As the image above shows, the only information conveyed here is that an illustration exists at a specific location on this page. Viewers who want to know more about the illustration will have to pull up the EEBO page image for this text. This is essentially a compromise: the primary objective of TCP is to create searchable texts, but we recognize that illustrations are also important to a text and can add meaning. Editors account for this by notifying viewers that an illustration is present, by capturing any useful text associated with it, and by describing it when feasible.

The way lists are encoded also demonstrates function over form. Take, for example, the page containing a list in Image 6:

Image 6. Lists

This page is complicated because at first glance it appears to contain three separate lists. However, the lists are not independent of one another: their relationship is important to the meaning of the text and must therefore be captured. There are three levels of hierarchy on this page: the top level is the list of names and families; the second level is the list of places they come from; and the third level consists of two sibling lists of place names from two different regions.

The relationship among these pieces of data (in other words, the function of these lists) must be captured. The layout of the original page, with the two lowest-level lists sitting side by side, will not be. Image 7 shows how an editor would mark up these lists, and Image 8 shows how they would be rendered to the viewer on our platform:

Image 7. markup for lists in Image 6.

Image 8. rendering of list from Images 6 & 7.

You will notice that although the form, or layout, of the information in Image 8 differs from that in Image 6, the information conveyed by each image is the same. The TCP editor captures the hierarchical relationships among the lists but does not record that, in the original volume, the lowest-level lists were displayed side by side in two columns.
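The difference between hierarchy and layout can be sketched in Python. The tag names `<LIST>` and `<ITEM>` are an assumption about how such nested lists are encoded, every name and place below is a placeholder rather than the content of Image 6, and `outline` is a hypothetical helper. Walking the nesting recovers all three levels, while nothing in the markup says the two sibling lists were printed side by side.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical nested lists shaped like the page in Image 6. LIST and ITEM
# are assumed tag names; every name and place is a placeholder.
PAGE = """
<LIST>
  <ITEM>Family A
    <LIST>
      <ITEM>Place 1
        <LIST>
          <ITEM>Region X, town 1</ITEM>
          <ITEM>Region X, town 2</ITEM>
        </LIST>
        <LIST>
          <ITEM>Region Y, town 1</ITEM>
        </LIST>
      </ITEM>
    </LIST>
  </ITEM>
</LIST>
"""


def outline(list_el: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) for every ITEM, following nesting only.

    The two sibling third-level lists are simply two LIST children of the
    same ITEM; the markup holds no record of their side-by-side columns.
    """
    for item in list_el.findall("ITEM"):
        # The item's own label is the text before its first child element.
        yield depth, (item.text or "").strip()
        for sub in item.findall("LIST"):
            yield from outline(sub, depth + 1)


root = ET.fromstring(PAGE.strip())
for depth, label in outline(root):
    print("  " * depth + label)
```

Any renderer working from this structure can indent by depth, but it has no basis for reproducing the original two-column arrangement.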

By understanding TCP’s approach to tagging, readers can better understand the information they are looking at and make informed decisions about how to search and view a given set of texts. For example, a reader looking for text associated with images could search for figures and, based on the information above, would know that in Images 3 and 4 “IHS” is the text related to that figure. The reader could also view the page image for more detail, but if the text content were their only goal, they would not need to.
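A figure search of the kind just described can be sketched in Python. The fragment below is hypothetical (the `IHS` heading is borrowed from the example above and the surrounding paragraphs are placeholder text), it assumes well-formed XML parsed with `xml.etree.ElementTree`, and `figure_heads` is a hypothetical helper rather than an actual TCP or EEBO search feature.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical fragment of a TCP-style text with two figures: one whose
# <HEAD> carries text and one bare <FIGURE> with no text at all.
DOC = """
<DIV1>
  <P>Placeholder running text.</P>
  <FIGURE><HEAD>IHS</HEAD></FIGURE>
  <P>More placeholder text.</P>
  <FIGURE></FIGURE>
</DIV1>
"""


def figure_heads(markup: str, query: Optional[str] = None) -> list[Optional[str]]:
    """List the <HEAD> text of every <FIGURE>, in document order.

    A figure without a <HEAD> is reported as None, so its presence is still
    visible. If `query` is given, only heads containing it (case-insensitive)
    are kept.
    """
    root = ET.fromstring(markup.strip())
    heads: list[Optional[str]] = []
    for fig in root.iter("FIGURE"):
        text = fig.findtext("HEAD")
        head = text.strip() if text else None
        if query is None or (head is not None and query.lower() in head.lower()):
            heads.append(head)
    return heads


print(figure_heads(DOC))         # → ['IHS', None]
print(figure_heads(DOC, "ihs"))  # → ['IHS']
```

Reporting bare figures as `None` rather than dropping them reflects the point made earlier: even an empty <FIGURE> tells the reader an illustration is there.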
By: Sarah Wingo