What is structured content?

By J. F. MacDonald


Structured content adds metadata to content to serve a specific and limited purpose in an automated content creation and delivery system.

Information is structured when its functional components are tagged with metadata such that computer programs may retain content structure and purpose when processing it.

If you've written a letter, you've written structured content. If it was a formal letter, you likely wrote your name, return address, and date at the top, aligned on either the left or right side of the page, followed by the name and address of the person or institution for whom the letter was intended. After the salutation and body of the letter, you would have closed the letter with the appropriate farewell, your signature, name, and possibly your title. The letter followed a recognizable structure.

Correctly recognizing and interpreting such a letter requires a human reader familiar with the form and conventions of letters. Increasingly in today's world, the content we create must first be interpreted, transmitted, transformed, and reformatted by computer software before a human sees it. For a human, a properly formatted letter is structured content, whether it is typed or handwritten. But for a computer, it is unstructured. Scan or fax the letter, and software sees only bits to encode, transmit, and decode. The software won't recognize it as a letter; it can only reproduce the image that was scanned.

Here, we consider content to be structured if computer programs can recognize and retain its structure and purpose when processing it.

Structured content is tagged with metadata

For software to retain content structure while reformatting or otherwise processing it, the content must be tagged with metadata. Metadata is additional information that identifies the structural role that content components play. Information that identifies your letter's return address as a return address would be such metadata.
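As a minimal sketch of this idea, the Python below tags the parts of a letter with XML element names and then looks up the return address by its role rather than by its position on the page. The element names (`letter`, `return-address`, and so on) and the sample addresses are invented for illustration; they are not taken from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are illustrative, not a real schema.
LETTER = """
<letter>
  <sender>Jane Doe</sender>
  <return-address>12 Elm Street, Springfield</return-address>
  <date>2014-03-01</date>
  <recipient>Acme Widgets Inc.</recipient>
  <destination-address>500 Industrial Way, Shelbyville</destination-address>
  <salutation>Dear Sir or Madam,</salutation>
  <body>Please send me your current catalog.</body>
  <closing>Sincerely,</closing>
</letter>
"""

def find_component(xml_text, role):
    """Return the text of the component tagged with the given role, or None."""
    root = ET.fromstring(xml_text)
    element = root.find(role)
    return element.text if element is not None else None

# The program asks for the component by its structural role.
print(find_component(LETTER, "return-address"))
```

Because the lookup depends on the tag and not on layout, the same function would keep working if the letter were reformatted for a different page size or output medium.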

Many methods can be, and are, used to tag content with metadata. The following are widely used:

But tagging information alone does not make it structured content. Content must function after it's processed by a computer program for it to serve its purpose. Simply tagging your letter's destination address as right-aligned and bold will not guarantee it will reach its destination. A program must recognize the address as a destination address.
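The difference can be sketched in a few lines of Python. Both fragments below are hypothetical: the first tags the addresses only with formatting attributes, the second tags them by role. The function `destination_of` is an illustrative helper, not part of any real mail-handling system.

```python
import xml.etree.ElementTree as ET

# Formatting-only markup: says how each address looks, not what it is.
PRESENTATIONAL = """
<page>
  <text align="left" weight="normal">12 Elm Street, Springfield</text>
  <text align="right" weight="bold">500 Industrial Way, Shelbyville</text>
</page>
"""

# Role-based markup: says what each address is for.
STRUCTURAL = """
<letter>
  <return-address>12 Elm Street, Springfield</return-address>
  <destination-address>500 Industrial Way, Shelbyville</destination-address>
</letter>
"""

def destination_of(xml_text):
    """Return the destination address if the markup identifies one, else None."""
    root = ET.fromstring(xml_text)
    return root.findtext("destination-address")

print(destination_of(PRESENTATIONAL))  # None: formatting says nothing about role
print(destination_of(STRUCTURAL))
```

A program could guess that the bold, right-aligned text is the destination, but that would be a convention about appearance, and it would break as soon as the formatting changed.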

Structured content functions as a whole to serve a purpose

Information takes on meaning when it serves a purpose. An address found on a torn slip of paper could mean many things. It could have been a letter's return or destination address. It could have been given to someone who needed directions. It could have been torn from a tax return. You can assign the address's purpose and meaning only when you can place it in context. To find context, you need additional information.

For a body of content to serve a purpose, it must contain sufficient information to identify that purpose. For structured content to function as a stand-alone unit of information, it must not only be complete, but also have its intent captured in its metadata.

But metadata that does not serve the content's purpose is superfluous. It isn't necessary to know why a letter was written to deliver it. Structured content serves a specific and limited purpose.

Structured content conforms to a specific type

To serve a specific purpose, structured content must conform to a specific type. That is, its structure must conform to a type. Whatever the content of a letter, it will function as a letter if it has the structure of a letter. We can identify the letter with the type "document," but that isn't sufficient to identify its purpose. We and our computer programs need more metadata than that.

Content types can form a hierarchy. A letter is not only a letter but is also a document. For some purposes, such as where to store it on my personal computer, I may need only identify it as type "document." But likely I'll store it in a folder that depends on the type of letter. I'll keep letters to loved ones and letters to the IRS in different folders. To identify that purpose, a structured letter would need to have metadata that identifies it not only as type "letter," but also identifies the type of letter.

As we can see in the case of a letter, an item of structured content may need to function in different ways. Thus its metadata needs to serve all its purposes. Yet we must identify one specific item of content with one specific type. Rather than encumber content typing with all the ways that an item of content may be used, we identify the content type with those essential features common to all its purposes. Whether we send it or file it away, a letter is a letter.

So, some essential metadata identifies the content type, and optional metadata can further help the content fulfill its purpose.
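One way to picture the split between essential and optional metadata is with a small type hierarchy. The sketch below is only an analogy in Python; the class names, the `subtype` field, and the folder names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:
    title: str

@dataclass
class Letter(Document):
    # Essential to being a letter, whether it is sent or filed away.
    destination_address: str
    # Optional: serves one purpose (filing) but is not needed for delivery.
    subtype: Optional[str] = None

def folder_for(item: Document) -> str:
    """Choose a storage folder using the most specific type information available."""
    if isinstance(item, Letter) and item.subtype:
        return f"letters/{item.subtype}"
    if isinstance(item, Letter):
        return "letters"
    return "documents"

tax_letter = Letter("Filing question", "Internal Revenue Service", subtype="tax")
print(folder_for(tax_letter))
```

A `Letter` is still a `Document`, so code that only needs the general type keeps working, while code that files letters can use the optional subtype when it is present.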

This document is an example of structured content

Though this is a generic document, it has structure. Its structure is illustrated in the following structure diagram.

Figure 1   EPPO-simple generic-topic structure

The tag <generic-topic> labels the root element of a generic-topic document.

The head element holds the document header, which comprises the identity group, the revision history group, and the index group. The latter contains the set of terms discussed in the topic.

The body element holds the topic contents. A body-content group follows the main title element and every section title.

The header holds additional metadata that helps programs process the document in the context of a larger body of content. This header metadata is intended for computer processing rather than for a human reader.
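To make the description above concrete, here is a rough Python sketch that reads a topic shaped like the one described and reports its outline. Only `generic-topic`, `head`, and `body` are names taken from the text; `identity`, `history`, `index`, `title`, `section`, and `p` are hypothetical stand-ins, and the real EPPO-simple element names may differ.

```python
import xml.etree.ElementTree as ET

# Only <generic-topic>, <head>, and <body> come from the description above.
# Every other element name and all sample text are hypothetical.
TOPIC = """
<generic-topic>
  <head>
    <identity>what-is-structured-content</identity>
    <history>first draft</history>
    <index>structured content; metadata</index>
  </head>
  <body>
    <title>What is structured content?</title>
    <p>Structured content adds metadata to content.</p>
    <section>
      <title>Structured content is tagged with metadata</title>
      <p>Metadata identifies the structural role of each component.</p>
    </section>
  </body>
</generic-topic>
"""

def outline(xml_text):
    """Return (header group names, main title, section titles) for a topic."""
    root = ET.fromstring(xml_text)
    if root.tag != "generic-topic":
        raise ValueError("not a generic-topic document")
    header_groups = [child.tag for child in root.find("head")]
    body = root.find("body")
    main_title = body.findtext("title")
    section_titles = [s.findtext("title") for s in body.findall("section")]
    return header_groups, main_title, section_titles

print(outline(TOPIC))
```

The function never looks at fonts or layout. It relies only on the tags, which is what lets a program build a table of contents or an index from many such topics.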

This document has optional metadata that helps it serve its purpose:

As with a letter, a little metadata goes a long way in helping this content serve its purpose when it's processed by a computer. Or rather, preserve its purpose. It may or may not be a good letter or a good essay, but that's not the intent of structure metadata. The purpose of structure metadata is simply to preserve content's purpose when it's processed by computer programs.

Structured content serves both writers and readers

Granted, we all know how to structure a letter. But a letter has many subtypes and purposes. You wouldn't write a love letter the same way you'd write a business letter. As evidenced by the number of how-to-write-a-letter books that have been published (amazon.com lists more than a thousand), determining the form a specific letter should take is not a trivial task. As with creating most any kind of content, half the battle is in determining the structure that best serves its purpose.

Information serves many purposes, and many of those purposes, like those of a letter, are served by widely recognized structures. Developing information types and defining content structures is both art and science. Writers who begin with a clearly defined purpose and develop or use structures that serve that purpose will have a better shot at achieving it. A reader may well have a different purpose. Yet, by preserving intent when it's distributed, structured content helps content users as well as authors. It helps users find what they need.