SAM Quick Start

SAM is a markup language for semantic authoring. It is not a fixed language like Markdown. It is a meta-language like XML. Markdown has a fixed set of document structures that you can use. SAM, like XML, lets you define and create your own structures to suit your own needs. Unlike XML, and like Markdown, SAM is designed to be easy to write in a plain text editor.

To make it easier to write, SAM provides a set of common text structures, such as paragraphs and lists, that are shared by all SAM document types. These common structures deliberately basic. There are other document structures you will probably need that you will have to design for your self using SAM's generic structures: blocks and annotations. See the SAM Recipes document for some suggestions on how to create some commonly used document structures.

SAM is designed for structured writing. That is, writing that obeys a specific defined structure that tells you what goes into a document and how it is expressed. A structured document isn't designed to be read directly by the reader. It must be processed with algorithms to produce readable output. But it also supports using algorithms to perform many other functions, such as guiding the writer, validating the content, linking the content, managing the content, creating new forms of content, or publishing the content to different media and for different purposes. (In this document I refer to such algorithms collectively as "the application layer".)

To enable all these algorithms, the writers must obey the structure of the markup language and must embed certain items of metadata into the content for the algorithms to work with. To do this easily and effectively, the writer must be able to clearly see the structure as they are writing. That is not the case with XML, whether you write it in a plain text editor or in a sophisticated XML editor that makes it look like a Word document. Verbose XML tags obscure structure and Word-like interfaces hide structure behind an approximation of final formatting. SAM is designed to make the structure of structured documents clear and explicit to the writer, while also making them simple and straightforward to type.

Structured writing essentially divides content into a set of nested named blocks. SAM introduces blocks with a one word label followed by a colon. It shows the nesting of one block inside another by indenting the nested block under its parent. Here is an example of a structured document, a recipe, written in SAM:

recipe: Hard Boiled Egg
    introduction:
        A hard boiled egg is simple and nutritious. It makes
        a quick and delicious addition to any breakfast.

        Hard boiled eggs can also be an ingredients in
        other dishes, such as a {Cobb Salad}(recipe).

    ingredients:: ingredient, quantity
        eggs, 12
        water, 2qt
    preparation:
        1. Place eggs in pan and cover with water.
        2. Bring water to a boil.
        3. Remove from heat and cover for 12 minutes.
        4. Place eggs in cold water to stop cooking.
        5. Peel and serve.
    prep-time: 15 minutes
    serves: 6
    wine-match: champagne and orange juice
    beverage-match: orange juice
    nutrition:
        serving: 1 large (50 g)
        calories: 78
        total-fat: 5 g
        saturated-fat: 0.7 g
        polyunsaturated-fat: 0.7 g
        monounsaturated-fat: 2 g
        cholesterol: 186.5 mg
        sodium: 62 mg
        potassium: 63 mg
        total-carbohydrate: 0.6 g
        dietary-fiber: 0 g
        sugar: 0.6 g
        protein: 6 g

All SAM documents consist of a single root block, in this case the block labeled "recipe". That block contains a number of other blocks with labels like "ingredients" and "nutrition". Some of these block also contain other blocks. For example, the preparation block contain a list block (the list is one of SAM's default block types). Some blocks may contain just a single value. These blocks are called "fields". The individual fields that make up the nutrition block are indented are fields.

This is a semantic version of a recipe. It does not tell you how the recipe should be formatted. It does not even contain the section headings that will probably be used when the recipe is printed on paper or screen. To actually make a printed recipe, the semantic version needs to be processed by an algorithm that decides which pieces of information to publish, what order to publish them in, what titles to use for the sections, how to format the text, and which media to publish to. However, the semantic version lets you do cool stuff like make a collection of recipes with less than 80 calories per serving that can be prepared in less than 20 minutes, because it makes those pieces of information clearly available to algorithms for querying and processing. It also helps to make sure the people writing recipes provide all the information we need in the order and form we want it in.

The main parts of a SAM document as as follows.

Blocks

A block is a container for other structures. In the example above, recipe: starts a block that contains the entire recipe document. nutrition: starts a block that contains the nutrition information. Blocks are delimited by indentation. Everything indented under the block tag is part of the block. Every SAM document must have a single root block that contains all the content of the document.

The recipe block has a title, "Hard Boiled Egg", which is the text after the colon in recipe:.

recipe: Hard Boiled Egg
    introduction:
        A hard boiled egg is simple and nutritious. It makes
        a quick and delicious addition to any breakfast.

This is a shortcut and is exactly equivalent to this (provided there is nothing indented under the title tag):

recipe:
    title: Hard Boiled Egg
    introduction:
        A hard boiled egg is simple and nutritious. It makes
        a quick and delicious addition to any breakfast.

Fields

A field is a container for a single value. Each of the lines in the nutrition block is a field. For example, sugar: 0.6 g is a field with the label sugar and the value 0.6 g. A field has the same format as a block except that there is nothing indented under it. If you indent something under a field it will become a block and its value will become the title of the block.

Paragraphs

A paragraph is a string of text. Paragraphs may run over several lines. A paragraph ends with a blank line or when the block it belongs to ends (which means when next line is less indented). A paragraph is a kind of block but it does not require a label. Its block name is p but this is completely implicit. Paragraphs may not have children. That is, you can't indent anything under a paragraph.

Lists

You can create ordered lists by using numbers followed by a period to start each line of the list.

1. Place eggs in pan and cover with water.
2. Bring water to a boil.
3. Remove from heat and cover for 12 minutes.
4. Place eggs in cold water to stop cooking.
5. Peel and serve.

Unordered lists can be created the same way using asterisks.

* Dog
* Cat
* Monkey

Lists are blocks, as are each item in the list. The content of a list item is a paragraph. Therefore the equivalent XML structure to the SAM structure above is:

<ul>
    <li>
        <p>Dog</p>
    </li>
    <li>
        <p>Cat</p>
    </li>
    <li>
        <p>Monkey</p>
    </li>
</ul>

Since paragraphs cannot have children, when you create a list following a paragraph, the number or asterisk that starts the list items must be at the same indent level as the paragraph above.

Lists can be nested:

1. Cats
    * Tabby
    * Manx
    * Siamese
2. Dogs
    * Collie
    * Beagle
    * Spaniel

You can create more than one paragraph in a list item by indenting the second paragraph to match the indent of the first one:

1. Cats are fluffy.

   Dogs are silly.

   Cows fly over the moon.

2. Rocks are hard.

   Earth is round.

   The moon is made of green cheese. 

Phrases:

A phrase is a piece of text within a paragraph that you want to apply metadata to. A phrase is marked up with curly braces. In the recipe example, the words "Cobb Salad" are marked up as a phrase: {Cobb Salad}. We mark up phrases so that we can add metadata to them with annotations and citations.

Characters

SAM documents are Unicode documents encoded in UTF-8. You can therefore enter any character you like as literal text. However, there are two circumstances where you may need to enter characters in a different way:

To enter a character that is a SAM markup character, precede it with a backslash:

The curly brace \{ is not a commonly used character.

Note that characters such as the curly brace are only recognized as markup in SAM if they form part of a complete SAM markup structure. Because there is no closing curly brace in the above sentence, the curly brace by itself would still be recognized as text. However, using the backslash character escape is probably still a good idea to prevent accidentally creating markup sequences without realizing it.

To enter characters that are not on your keyboard, you can either type unicode character using whatever keyboard sequence your OS provides or you can enter an XML/HTML style character entity. For example, to enter the symbol for a British pound, you can use the &pound character entity.

The doggy in the window costs &pound;5.00.

SAM recognizes all of the HTML character entities. (Note, however, that the recognition of HTML character entities could vary based on the SAM parser used. The current SAM Parser uses the Python html library to do character entity decoding. Running the parser with a different version of the html library might result in a different set of named entities being recognized.)

You can also use XML style numeric and hexadecimal character entities:

The doggy in the window costs &#163;5.00.

and:

The doggy in the window costs &#xA3;5.00.

Note that the use of character entities is an alternative to using backslash character escapes, so you can do:

The curly brace &lbrace; is not a commonly used character.

instead of

The curly brace \{ is not a commonly used character.

To enter a literal backslash into your text you can either do:

\\

or

&bsol;

or, if the backslash is not preceding a SAM markup character, you can just enter it alone:

\

To enter a literal character escape sequence, you can escape the leading & either with a backslash:

\&pound;

or with a character escape:

&amp;pound;

Annotations

An annotation is a way of adding descriptive labels or metadata to individual phrases within a text. Annotations can cover anything from formatting (bold, italic) to linking to semantic annotation. Annotations are applied to phrases. In the example above, the phrase "Cobb Salad" has an annotation "recipe" ({Cobb Salad}(recipe)) which tells us that the phrase is the name of a recipe. There is no fixed set of annotations in SAM. It is up to you to determine what annotations you need for your content.

Annotations have three parts, the second and third parts being optional.

The first part is the type. It asserts that the annotated phrase contains a certain type of information or performs a certain type of role in the document.

{John Wayne}(actor) plays an ex-Union Colonel out for revenge. 

This annotation asserts that the phrase "John Wayne" is the name of an actor (its type).

The second part is the "specifically" attribute. Sometimes the meaning of the phrase is not clear from the words of the phrase itself. In this case you can make the meaning clear using the "specifically" attribute, which is contained in quotation marks after the type:

{The Duke}(actor "John Wayne") plays an ex-Union Colonel out for revenge. 

In this case the phrase itself is not the canonical name of the actor it refers to, so we add the specifically attribute "John Wayne" to clarify the meaning of the annotated phrase.

The third part of the annotation is the namespace attribute. In some cases the type and specifically attributes may not be enough to uniquely identify the phrase being annotated because the same phrase, even it its canonical form, my refer to the same type of object in another context. In this case we can use the namespace attribute, which is contained in a second set of parentheses, to specify the context in which the annotated phrase should be understood.

{The Duke}(actor "John Wayne" (SAG)) plays an ex-Union Colonel out for revenge. 

In this case, the names of American actors are governed by the Screen Actors Guild. It is possible there could be another actor in another jurisdiction who also uses the name "John Wayne". So here we specify that the name, as we are using it, belongs to the namespace of actors that is governed by SAG.

Annotation chaining

You cannot markup one phrase inside another phrase, however, you can add more than one annotation to a phrase. To add additional annotations, simply follow one with another without any space between them:

{Clint Eastwood}(actor)(director) stared in and directed {Gran Torino}(movie).

Annotation lookup

You do not have to repeat the full annotation of a phrase that occurs multiple times in a document. If you have already annotated a phrase once in a document, you can simply mark it as a phrase (by wrapping it in curly braces) and the parser will look back through the document for the last time that phrase was annotated and copy the most recent annotation to the current phrase. In no annotation of that phrase is found, the parser will raise a warning. This is informational only, it is not an error to have an unannotated phrase.

Links

Links are a form of annotation. As a meta-language, SAM does not support links itself. It is up to you to decide if your markup language includes support for links or not. However, SAM does reserve the annotation type link for creating links, and has a shortcut for creating links, which looks like this:

{Cobb Salad}(http://allrecipes.com/recipe/14415/cobb-salad/)

That is, if the first thing in the annotation is recognized as a URL, it is assumed to be a link annotation pointing to that URL. This the example above is equivalent to:

{Cobb Salad}(link "http://allrecipes.com/recipe/14415/cobb-salad/") 

Bold and Italic

You can indicate bold and italic text with annotations, like this:

This text {is in bold}(bold) type.

This text {is in italic}(italic) type.

However, SAM provides shortcuts for bold and italic, as follows:

This text *is in bold* type.

This text _is in italic_ type.

These forms are equivalent, but notice that the shortcut forms don't support the other annotation attributes and that you cannot chain other annotations with the shortcuts. Thus if you wanted to annotate a book title as both italic and a title, you would have to do it like this:

{Moby Dick}(italic)(book-title) is a long book. 

(But note that you should not actually mark up book titles like this as the fact that the phrase is marked up as a book-title is enough to tell the formatting algorithm to format it in italic.)

Also note that the bold and italic shortcuts cannot be nested. If you want to mark some text as both bold and italic, you have to use regular annotations:

This information is _really_, *really*, {really}(bold)(italic) important.

Attributes

SAM does not have a general attribute mechanism like XML. XML attributes are one of the things that make it hard to write in, even in a structured editor. However, SAM does provide some basic attributes that are commonly needed for content management purposes.

You can attach attributes to blocks and block-like things and to phrases. An attribute looks like an annotation, in the sense that it is contained in parentheses, but attributes as distinguished by an initial character the specifies their type. Thus a condition starts with a question mark (?deluxe).

To attach an attribute to a block, place it directly after the colon of the block header:

note:(?basic)
    Do not immerse your Basic model Widget in water. Only higher
    trim levels are waterproof.

The supported attribute types are:

ID

An identifier for the block or field. An ID must be unique in the document. An ID is preceded by an asterisk: *my-id.

Name

A name also identifies a block or field but it has wider scope. Names are used when you want to identify something across a set of documents. SAM does not check that names are unique across a set of documents (it has no way of knowing what that set might be). Resolution of names is up to the processing software. A name is preceded by a # symbol: #my-name.

Condition

Specifying a condition token is a way of telling the publishing software to include this block only when a certain condition is true. How the publishing software determines if a condition is true is entirely up to the application layer. A condition attribute is preceded by a ?: ?my-condition.

Language

You can specify the language of a block using a language attribute. The language attribute should a W3C language tag. It is preceded by a !: !en-CA.

You can apply multiple attributes to a block:

note:(*note.waterproof)(?basic)
    Do not immerse you Basic model Widget in Water. Only higher
    trim levels are waterproof.

You can only apply one each of the id, name, and language attributes, but you can apply as many condition attributes to a block as you need.

Note that if you wish to add other forms of management domain metadata to your blocks you can do so using fields within the block. SAM's set of management domain annotations are just a convenience feature. You can add any management domain metadata you like using regular fields. As a semantic format, there is nothing in SAM that says that all of the content of blocks or fields has to appear in print. Blocks and fields can contain any kind of data or metadata you like. It is up to the publishing algorithms to decide which content to publish and which to use for management purposes.

Citations

Citations are used to refer to another resource. The expectation is that on output the citation markup will be replaced by text that creates a reference to that resource. (Note that this is different from an annotation which might lead to a link being created on a phrase, but will not change the content of the phrase.)

Citations come in two forms. Citations of internal names/ids, and citations of external resources. Citations can be applied to:

To cite a resource that has an id within the current SAM document, reference the id like this:

Moby Dick is about a big fish[*moby]. See [*whale].

fig:(*whale)
    >>(image whale.png)

footnote:(*moby)
    Actually, Moby Dick is a whale, not a fish.

(Note that fig and footnote are not part of SAM itself. They are block types in a particular language defined using SAM. Whether or not your SAM-based tagging language supports fig or footnote, and what they mean in that language, is entirely up to you. This is just an example of how you might use citations in your tagging language.)

To cite a named resource within your content set, reference the name like this:

Moby Dick[#MobyDick] is about a big fish.

Or like this:

{Moby Dick}[#MobyDick] is about a big fish.

The advantage of applying the citation to a phrase is that it allows you to turn the citation into a link in online content.

This example is using a name citation as a means of referencing a bibliographical entry that could be used by a publishing algorithm to build a citation in the desired format on output. This assumes that there is a bibliographic entry with the named MobyDick somewhere in your content set.

To cite an external work without referring to another resource in your content set, use a standard reference syntax.

Moby Dick[Melville, 1851] is about a big fish.

SAM does not attempt to decode the format of this style of citation. It just delivers it as a string to the application layer.

The application layer is responsible for all processing of references. Since the syntax does not make any distinction between the types of references being made (*foo could be the name of any block, such as a graphic, a footnote, or a bibliographic entry) the presumed semantics of a tagging language that uses citations is that the treatment of the citation is based on the type of object that the citation references, not the type of the citation. Thus in the example above, [*whale] is a figure reference because the block with the ID *whale is a figure and [*moby] is a footnote reference because the block with the ID *moby is a footnote.

You can chain citations the same way you chain annotations and you can also chain citations and annotations together.

Record sets

A record set is a kind of table. But because SAM is designed for semantic authoring, record sets are designed more like a database table than a formatted table. The content in a record set could be presented in a table, but it could also be presented in other ways or queried like a database table.

A record set consists of records, one per line. A record is a set of field values separated by commas. Each record in a record set has the same set of fields of the same type, like a record in a database table. The names of the fields are specified in the record set header. In the recipe example, the ingredients are specified using a record set, with each ingredient forming a record.

ingredients:: ingredient, quantity
    eggs, 12
    water, 2qt

The record set label is ingredients:: and it is marked with two colons instead of one. The field names follow the two colons, separated by commas. Each line that follows contains a record of an ingredient, with the fields separated by commas. If the above were written using regular blocks and fields rather than a record set, it would look like this:

ingredients:
    row:
        ingredient: eggs
        quantity: 12
    row:
        ingredient: water
        quantity: 2qt

You could use record sets to create a simple table layout:

table:: cell, cell
    eggs, 12
    water, 2qt

This is equivalent to:

table:
    row:
        cell: eggs
        cell: 12
    row:
        cell: water
        cell: 2qt

However, here the semantics of the content has been lost. Record sets exist to allow you to retain and express the semantics of tabular data.

Grids

There is another way to do simple table layouts, using grids. A grid looks like this:

+++
    eggs  | 12
    water | 2qt

This is equivalent to a record set in the form:

grid:: cell, cell
    eggs, 12
    water, 2qt

Which is equivalent to blocks and fields like this:

grid:
    row:
        cell: eggs
        cell: 12
    row:
        cell: water
        cell: 2qt

However, using grids can make it easier to read small table-like structures in the SAM source.

There are no advanced table features like table heads or row and column spanning in grids. SAM is intended for semantic authoring rather than complex layout effects. For suggestions on how to handle complex tables in SAM, see the SAM Recipes document.

Code Blocks

A code block is a piece of computer code or perhaps the text of a terminal display. Code blocks are usually presented literally as written, often in a fixed space font and with line breaks in the same place as in the source. Most code blocks are in a specific programming or data structure language which the processing routine may need to know to do things like color coding the syntax. In SAM, a codeblock is introduced with three back ticks. (```).

```(python)
    for i in range(1,10):
        print("Hello World " + str(i))

The code block must be intended under the codeblock header (the three backticks). The indentation of the code in the code block will be calculated based on the least indented line. Thus in the example above, the line for i in range(1,10): will be treated as having an indent of 0, and the line print("Hello World " + str(i)) will be treated as having an indent of 4.

The language of the codeblock is given in the annotation immediately following the three back ticks of the header.

The content of a code block is not processed as SAM markup. This means that you do not have to escape any of the characters in your code sample. It also means that you can quote SAM markup itself without it being parsed as SAM markup. (Every example in this document uses exactly this technique.)

Inline code

Inline code is contained between backticks:

Python uses the `print()` function to print output.

Note that this is not a generic monospaced type decoration. It is intended specifically for code. Text in a code decoration is not parsed the way ordinary SAM markup is parsed. Character escapes are not recognized. The text is presented verbatim. Thus if you want to insert the code for a character entity, you can do it like this:

In XML use `&quot;` to enter a & character.

The only character escaping that is done in inline code is for the back tick symbol. To enter a literal back tick into inline code, use two back ticks in a row.

Sam creates inline code like this: ```&quot```.

This will render as:

Sam creates inline code like this: `&quot`.

Inline code can be annotated in the same way as a codeblock, so if you want to specify the language of a piece of inline code you can do this:

Python uses the `print()`(python) function to print output.

Blockquotes

A block quote is a quotation from another document that is set apart from the main text. In SAM, a block quote is introduced by three quotation marks in a row. You can use either double or single quotation marks. The body of the block quote is indented under the block quote header:

As Lewis Carroll observed:

"""[Lewis Carroll, Alice in Wonderland]
    Why, sometimes I've believed as many as six
    impossible things before breakfast.

The content of a block quote is regular SAM markup and is processed just like any other SAM markup.

Lines

A line is a piece of text with a fixed line ending. Poetry, for example, is a set of lines. Normally, SAM runs the lines of a paragraph together and leaves it up to the publishing software to determine line breaks at publishing time. To preserve the line breaks in your text, use lines. In SAM, lines are created by preceding each line with a pipe character followed by a space (the space is important!):

Jabberwocky is a nonsense poem by Lewis Carroll.

"""[Lewis Carroll, Through the Looking-Glass, and What Alice Found There (1871)]

    | 'Twas brillig, and the slithy toves
    | Did gyre and gimble in the wabe;
    | All mimsy were the borogoves,
    | And the mome raths outgrabe.

Inserts

An insert is an instruction to the application layer to insert a resource into the document output.

Inserts may be created at the block level or inside a paragraph or field value. At the block level, and insert is placed on a line by itself and is indicated by three greater-than signs:

>>>(image foo.png)

Inside a paragraph or field value, an insert is indicated by a single greater-than followed the the identification of the resource in parentheses:

My favorite flavor of ice cream is >($favorite-flavor).

The resource to be inserted may be identified either by type and url as in the block example above or by reference to an id, name, fragment, variable, or key (see below for information on variables and keys) as in the inline example, which inserts the value of a variable.

When a insert in indicated by URL, the type of the insert must be indicated (as it is with the word image) in the example above. SAM reserves the following instert type names: image, video, audio, feed, app, and object. You can also add your own insert type names.

Note that SAM does not process inserts. That is entirely up to the applications layer. SAM just provides standardized syntax for specifying an insert.

You can also assign names, conditions, or ids to an insert.

>>>(image fancy.png)(?model=deluxe)

Remember that it is up to the application layer to implement such conditions.

Includes

You can include one SAM file in another. The included file must be a complete SAM file. Its structured is included in the structure of the included file at the indent level of the include statement.

<<<(foo.sam)

Unlike inserts, which are simply parsed by the parser and passed on to the application layer for processing, includes are executed by the parser. The result of the include is presented to the application layer as part of the parsed document.

The ID uniqueness constraint that applies to individual SAM document also applies to included files. The IDs must be unique across the entire document parsed from the source file and any included files.

Includes cannot be made conditional, since conditions are parsed by the application layer, not the parser. If you want a conditional include in your tagging language, add a field for this purpose to your tagging language.

my-include:(?bar) foo.sam

Naturally, this include must be processed by the application layer.

For similar reasons, an include cannot have a name, id, or language attribute, since it does not produce an artifact in the output. If you need to apply any of these things to the included content, you can wrap the include statement in another structure such as a block or a fragment and apply the attributes to that. This will result in the included content being wrapped in that block or fragment in the output where the application layer can deal with it appropriately.

Variables

You can define a variable using the form:

$foo=bar

The value of a variable may contains annotations and/or citations.

To insert the value of a variable, use an insert (block level or inline) with the variable name:

>($title)

Note that the SAM parser does not resolve variables. The variable definitions and variable insert instructions are passed through to the application layer and must be resolved by the processing application. This allows the processing application to determine the scope within which variable names will be resolved.