SAM Parser

SAM Parser is an application to process SAM files into an equivalent XML representation which you can then further process in any desired form of output.

SAM Parser allows to specify an XSLT 1.0 stylesheet to post-process the output XML into a desired output format. In particular, it only allows for processing individual SAM files to individual output files. It does not support any kind of assembly or linking of multiple source files. This facility is not intended to meet all output needs, but it provides a simple way to create basic outputs.

SAM Parser is a Python 3 program that requires the regex and libxml libraries that are not distributed with the default Python distribution. If you don't already have a Python 3 install, the easiest way to get one with the required libraries installed is to install Anaconda.

SAM Parser is invoked as follows:

samparser <output_mode> <options>

Three output modes are available:

All output modes take the following options:

in the output file.

XML output mode takes the following options:

HTML output mode takes the follow options:

Regurgitate mode does not take any additional options.

Short forms of the options are available as follows

option short form
-outfile -o
-outdir -od
-expandrelativepaths -xrp
-outputextension -oext
-smartquotes -sq
-xslt -x
-xsd
-transformedoutputfile -to
-transformedoutputdir -tod
-transformedextension -toext
-css
-javascript

Validating with an XML schema

Eventually, SAM is going to have its own schema language, but until that is available (and probably afterward) you can validate your document against an XML schema. Schema validation is done on the XML output format, not the input (because it is an XML schema, not a SAM schema). To invoke schema validation, use the -xsd option on the command line:

-xsd <scehma.xsd>

Expanding relative paths

The SAM parser can expand relative paths of insert statements in the source document while serializing the output. This can be useful if the location of the output file is not the same relative to the included resources as the location of the source file. To tell the parser to expand relative paths into absolute URLs, use the -expandrelativepaths option. The short form is -xrp.

- xrp

Note that this applies to paths in SAM insert statements only. If you include paths in custom structures in your markup, they will not be expanded as the parser has no way of knowing that the value of a custom structure is a path.

Regurgitating the SAM document

The parser can also regurgitate the SAM document (that is, create a SAM serialization of the structure defined by the original SAM document). The regurgitated version may be different in small ways from the input document but will create the same structures internally and will serialize the same way as the original. Some of the difference are:

To regurgitate, use the regurgitate output mode.

Smart quotes

The parser incorporates a smart quotes feature. The writer can specify that they want smartquotes processing for their document by including the smartquotes declaration at the start of their document.

!smart-quotes: on

By default, the parser supports two values for the smart quotes declaration, on and off (the default). The built-in on setting supports the following translations:

Note that no smart quote algorithm is perfect. This option will miss some instances and may get some wrong. To ensure you always get the characters you want, enter the unicode characters directly or use a character entity.

Smart quote processing is not applied to code phrases or to codeblocks or embedded markup.

Because different writers may want different smart quote rules, or different rules may be appropriate to different kinds of materials. the parser lets you specify your own sets of smart quote rules. Essentially this lets you detect any pattern in the text and define a substitution for it. You can use it for any characters substitutions that you find useful, even those having nothing to do with quotes.

To define a set of smart quote substitutions, create a XML file like the sq.xml file included with the parser. This file includes two alternate sets of smart quote rules, justquotes and justdashes, which contains rulesets which process just quotes and just dashes respectively. The dashes and quotes rules in this file are the same as those built in to the parser. Note, however, that the parser does not use these files by default.

To invoke the justquotes rule set:

  1. Add the declaration !smart-quotes: justquotes to the document.

  2. Use the command line parameter -sq <path-to-sam-directory>/sq.xml.

To add a custom rule set, create your own rule set file and invoke it in the same way.

Note that the rules in each rule set are represented by regular expressions. The rules detect characters based on their surroundings. They do not detect quotations by finding the opening and closing quotes as a pair. They find them separately. This means that the order of rules in the rule file may be important. In the default rules, close quote rules are listed first. Reversing the order might result in some close quotes being detected as open quotes.

HTML Output Mode

Normally SAM is serialized to XML which you can then process to produce HTML or any other output you want. However, the parser also supports outputting HTML directly. The attraction of this is that it allows you to have a semantically constrained input format that can be validated with a schema but which can still output to HTML5 directly.

SAM structures are output to HTML as follows:

To generate HTML output, use the html output mode the command line.

To specify a stylesheet to be imported by the resulting HTML file, use the -css option with the URL of the css file to be included (relative to the location of the HTML file). You can specify the -css option more than once.

To specify a javascript file to be imported by the resulting HTML file, use the -javascript option with the URL of the javascript file to be included (relative to the location of the HTML file). You can specify the -javascript option more than once.

Running SAM Parser on Windows

To run SAM Parser on Windows, use the samparser batch file:

samparser xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html

To run SAM Parser on Xnix or Mac, invoke Python 3 as appropriate on your system. For example:

python3 samparser.py xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html