htmacros documentation

The htmacros package is a macro-based system for the production of HTML. The author does not like writing pure HTML, but does enjoy writing LaTeX with the AUCTeX package, and wanted a system which provided some extra features for building websites (where LaTeX is only for single documents). This package lets the user write in a LaTeX-like syntax to structure and compose websites.

1. Using htmacros

The program is called by

runhm.py infile.hm outdir

where infile.hm is the source text of the website and outdir is some path in which the output is placed. If outdir does not exist, it will be created.

For an example of how to structure a website with this system, we refer the reader to test/test.hm in the package’s main directory. The command runtest in the root of the package’s directory calls runhm.py and places the output in test/out.

This documentation may be built by running doc/makedoc, and the result is placed in doc/doc_html.

2. Reference

This section gives an overview of all of the macros in the language.

2.1. Basic execution

The execution model of the system is that the source text is read in character by character, and a handler function is executed depending on what the character is. The result of the function is then added to the current output stream (which consists of LazyTokens). Thus, a handler can itself invoke the parser on the text that follows and take the result for further processing.
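
As a rough illustration of this dispatch loop, here is a minimal Python sketch. The names parse, default_handler, and comment_handler are hypothetical and are not taken from parser.py, and tokens are simplified to plain strings. Whether the real comment handler consumes the trailing newline is not stated here, so the sketch assumes it does not.

```python
# Hypothetical sketch of character-by-character dispatch; not the actual parser.py code.

def default_handler(text, pos):
    """Return the character itself and the position just after it."""
    return text[pos], pos + 1

def comment_handler(text, pos):
    """Skip from '%' up to (but not including) the next newline or the end of input."""
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    return "", end

def parse(text, char_handlers):
    """Read text one character at a time, calling the handler registered for it."""
    output = []
    pos = 0
    while pos < len(text):
        handler = char_handlers.get(text[pos], default_handler)
        result, pos = handler(text, pos)
        output.append(result)
    return "".join(output)

handlers = {"%": comment_handler}
print(parse("abc % a comment\ndef", handlers))
```

Each handler receives the whole text and the current position, so it may read as far ahead as it likes before handing control back to the loop.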

There are two environments: the character environment and the escape environment. Environments may nest like in Scheme. The character environment contains the handler mapping for each character, and the escape environment contains the handler for each executable token. These are not to be confused with text environments, although it may be the case that text environments extend these environments.
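
The nesting can be pictured with a small Python sketch. The Environment class below is an assumption made for illustration (it is not the package's own class); it only shows how a lookup falls back to the enclosing environment, as in Scheme.

```python
# Hypothetical sketch of Scheme-style nested environments.

class Environment:
    def __init__(self, parent=None):
        self.bindings = {}
        self.parent = parent

    def define(self, key, handler):
        """Bind a handler in this environment only."""
        self.bindings[key] = handler

    def lookup(self, key):
        """Search this environment, then each enclosing one in turn."""
        env = self
        while env is not None:
            if key in env.bindings:
                return env.bindings[key]
            env = env.parent
        raise KeyError(key)

    def extend(self):
        """Create a child environment whose lookups fall back to this one."""
        return Environment(parent=self)

# Handlers are stood in for by strings to keep the example short.
base = Environment()
base.define("%", "comment-handler")
base.define("&", "error-handler")

tabular = base.extend()
tabular.define("&", "column-break-handler")

print(tabular.lookup("&"))  # the inner binding shadows the outer one
print(tabular.lookup("%"))  # found in the enclosing environment
```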

The evaluation of a handler may be delayed by way of LambdaTokens. That is, the result of parsing must at some point be evaluated. This allows one to define macros which use the value of a variable at some point in the future. Evaluation also has access to evaluation variables, which may be retrieved with \var; this is what lets the system serve as a templating language.
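
A minimal Python sketch of the idea follows. The classes below only mimic the roles of StringToken and LambdaToken; their constructors and the evaluate method are assumptions made for illustration, not the package's actual interface.

```python
# Hypothetical sketch of delayed evaluation with evaluation variables.

class StringToken:
    def __init__(self, text):
        self.text = text

    def evaluate(self, variables):
        return self.text

class LambdaToken:
    def __init__(self, func):
        self.func = func

    def evaluate(self, variables):
        # The function runs only now, with whatever variables exist at this point.
        return self.func(variables)

def var(name):
    """Stand-in for \\var{name}: look the variable up when evaluated, not when parsed."""
    return LambdaToken(lambda variables: variables[name])

def evaluate_all(tokens, variables):
    return "".join(token.evaluate(variables) for token in tokens)

# Parsed once, before the page title is known ...
template = [StringToken("<H1>"), var("pagetitle"), StringToken("</H1>")]

# ... and evaluated later, once per page.
print(evaluate_all(template, {"pagetitle": "Home"}))
print(evaluate_all(template, {"pagetitle": "About"}))
```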

The following section describes these handlers.

2.2. Character handlers

The default handler for characters just returns itself as a StringToken.

The following is a list of basic character handlers in parser.py:

End of file. Returns EOFToken to end parsing.
%. Marks the beginning of a comment, and everything up to a newline or the end of the file is ignored.
\. Marks the beginning of an executable token. See Token handlers for more information.
{ and }. When the parser comes across a {, it tries to find a matching closing brace, and the contents are returned. This is how arguments to executable tokens are handled.
[ and ]. For handling optional arguments. The resulting LazyToken is ArgumentToken.
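
Brace matching of this kind can be sketched in Python as follows. The function read_braced is hypothetical, and for brevity it counts every brace, so escaped braces and braces inside comments are not treated specially.

```python
# Hypothetical sketch of finding the matching closing brace.

def read_braced(text, pos):
    """Given text[pos] == '{', return (contents, position after the matching '}')."""
    if text[pos] != "{":
        raise ValueError("expected '{'")
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unmatched '{'")

contents, next_pos = read_braced("{a{b}c}d", 0)
print(contents, next_pos)
```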

The following is a list of character handlers from textmarkup.py:

` and '. These render as curly single quotes.
`` and ''. These render as curly double quotes.
Newlines. Paragraphing like in an e-mail, such as
```
This is the first paragraph.

This is the second paragraph.
```
is handled by replacing any sequence of any number of spaces along with at least two newlines by a ParagraphToken.
-, --, and ---. These are replaced by a hyphen (-), an en-dash (–), and an em-dash (—), respectively.
&. Represents a column break inside a tabular environment; anywhere else, it is an error. See \& for actually rendering an ampersand.
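
The replacements above can be approximated on plain strings, as in the following Python sketch. The function names are hypothetical, the real handlers work on the token stream rather than with string replacement, and the sketch assumes that a backquote opens a quotation and an apostrophe closes it.

```python
# Hypothetical sketch of the quote, dash, and paragraph rules on plain strings.
import re

def typographic(text):
    # Longest sequences first, so '---' is not consumed as '--' followed by '-'.
    text = text.replace("---", "\u2014")  # em-dash
    text = text.replace("--", "\u2013")   # en-dash
    text = text.replace("``", "\u201c").replace("''", "\u201d")  # curly double quotes
    text = text.replace("`", "\u2018").replace("'", "\u2019")    # curly single quotes
    return text

def split_paragraphs(text):
    # A paragraph break: at least two newlines, with any number of spaces around them.
    parts = re.split(r"(?: *\n){2,} *", text)
    return [part for part in parts if part]

print(typographic("``Hi,'' she said---it's 1--2"))
print(split_paragraphs("This is the first paragraph.\n\nThis is the second paragraph."))
```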

2.3. Token handlers

To enter a token, it must be escaped with a backslash (a “\”). For instance, \textit is a token for beginning italicized text.

By default, the name of a token must be composed of alphabetical characters (namely A-Z and a-z).

The following is a list of basic token handlers in parser.py:

\var{varname}: Gets the value of variable varname in the evaluation dictionary. This token is often used in a page template (for instance, <H1>\var{pagetitle}</H1>). Evaluation is delayed.
\def{macroname}{arg1,arg2,...}{replacement}: Defines a macro called macroname with some number of arguments. Evaluation of the definition is delayed (that is, the definition does not enter the current escape environment until it is evaluated). When the macro is evaluated, the replacement is given arguments as evaluation variables. The result of calling the macro is also delayed. User-defined macros can’t have optional arguments at the moment.
\begin{envname}...\end{envname}: Enters a text environment called envname. An environment is like a macro which takes a long textual argument, but has the benefit of not requiring one to carefully make sure all braces are matched. See Text environments.
\#: Just expands to #. (I have no idea why this is here.)
\setoutputdir{dirname}: Changes which directory, relative to the global output directory, the output files should be placed.
\include{filename}: Loads a file with respect to _global_input_dir, while also saving the variables in the list _fluid_let.
\&: Renders a plain ampersand.
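
To make the behaviour of \def and \var concrete, here is a simplified Python sketch. The names define_macro and call_macro are hypothetical, and the sketch substitutes the arguments by textual replacement, whereas the real system passes them as evaluation variables and delays evaluation.

```python
# Hypothetical sketch of macro definition and expansion by textual substitution.
import re

macros = {}

def define_macro(name, arg_names, replacement):
    """Mimic \\def{name}{arg1,arg2,...}{replacement}."""
    names = [a.strip() for a in arg_names.split(",") if a.strip()]
    macros[name] = (names, replacement)

def call_macro(name, *args):
    """Expand a macro: every \\var{argname} in the replacement becomes the argument."""
    names, replacement = macros[name]
    if len(args) != len(names):
        raise TypeError("%s expects %d argument(s)" % (name, len(names)))
    variables = dict(zip(names, args))
    return re.sub(r"\\var\{(\w+)\}", lambda m: variables[m.group(1)], replacement)

define_macro("greet", "first,last", r"Hello, \var{first} \var{last}!")
print(call_macro("greet", "Ada", "Lovelace"))
```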

The following is a list of token handlers from textmarkup.py:

\textit{text}: Sets the font face to italic for text.
\textbf{text}: Sets the font face to bold for text.
\texttt{text}: Switches the font to typewriter text for text.
\emph{text}: Like \textit but switches back and forth between italic and nonitalic text when nested.
\\: Renders as a line break. Inside a table, begins a new row.
\hline: Renders as a horizontal rule. Inside a table, adds a border between rows.
\`{a}: Puts a grave accent on a. Example: à.
\'{a}: Puts an acute accent on a. Example: á.
\^{a}: Puts a circumflex on a. Example: â.
\"{a}: Puts an umlaut on a. Example: ä.
\~{a}: Puts a tilde on a. Example: ã.
\r{a}: Puts a ring accent on a. Example: å.
\c{c}: Puts a cedilla on c. Example: ç.
\copyright: Renders as ©.
\char{code}: Lets one write something like \char{trade} for ™.
\title{text}: Sets the title of the current page to text.
\footnote{text}: Adds a footnote to the current page.
\item[caption]: An error unless inside a list environment such as itemize, enumerate, or description. Inside these environments, \item represents the beginning of a list item. The caption (which is optional) makes most sense in the description environment, where it is the term before the indented description. Otherwise, the caption is set in bold.
\multicolumn{numcols}{formatting}: Lets a cell span multiple columns. The formatting string is the same as in the tabular environment, but must only describe a single column.
\multirow{numrows}{width}: Lets a cell span multiple rows. The width is ignored; it is present for LaTeX compatibility.
\verb#text#: The text is taken verbatim (literatim, even) and rendered using typewriter text. The character immediately after \verb is taken to be the delimiter. For instance, \verb|\textit{text}| is also valid. The resulting text is escaped for HTML, so HTML-tag-looking text will not be mistaken for actual HTML.
\rawverb#text#: Like \verb, but there is no HTML escaping.
\setpagetemplate{filename}: Sets the page template for page rendering to the contents of the file filename.
\setstylesheet{filename}: Copies the file filename to {_global_base_out_dir}/css and sets _page_css to the filename in the output directory.
\section{text}: Creates a section with text text. Sets the next object to be labeled to this heading.
\subsection{text}: Creates a subsection with text text. Sets the next object to be labeled to this heading.
\subsubsection{text}: Creates a subsubsection with text text. Sets the next object to be labeled to this heading.
\addbreadcrumb[name]{label}: Adds to the breadcrumb trail the item with label label. The name argument may be used to override the default name of the labeled item. This also adds the _breadcrumbs variable to the _fluid_let list if it’s not already there so that an included file may modify the breadcrumb trail without having to worry about rectifying it (since \include will fix it automatically).
\popbreadcrumb: Removes the last breadcrumb from the breadcrumb trail.
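
The delimiter rule for \verb can be sketched in Python as follows. The function parse_verb is hypothetical, and wrapping the result in a tt tag is an assumption; this documentation does not say which tag the package emits for typewriter text.

```python
# Hypothetical sketch of delimiter-based verbatim parsing with HTML escaping.
import html

def parse_verb(text, pos):
    """text[pos] is the delimiter that follows the verb token.

    Returns (rendered HTML, position after the closing delimiter).
    """
    delimiter = text[pos]
    end = text.find(delimiter, pos + 1)
    if end == -1:
        raise ValueError("unterminated verbatim text")
    raw = text[pos + 1:end]
    # Escaping keeps tag-like text from being read as real HTML.
    return "<tt>" + html.escape(raw) + "</tt>", end + 1

print(parse_verb("|<b>x</b>| rest", 0))
print(parse_verb("#a|b# rest", 0))
```

Dropping the call to html.escape gives the behaviour described for \rawverb.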

The following is a list of tokens from references.py:

\label{name}

Takes the last-defined object which can be labeled and binds it to the label name. The label must be unique for the current page.

\ref[text]{name}

Creates a link whose text is either text, if it is specified, or the default text associated with the label name. The label has the following syntax:

If the referenced object is a page, then it is of the form page_label.
If the referenced object is in a page, then it is of the form page_label#object_label. But, if the object is in the current page, the shorthand #object_label may be used.
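
The label syntax can be taken apart as in the following Python sketch. The function split_label and its current_page parameter are hypothetical names; references.py may well organize this differently.

```python
# Hypothetical sketch of splitting a reference label into its parts.

def split_label(label, current_page):
    """Return (page_label, object_label), where object_label is None for a page."""
    page, separator, obj = label.partition("#")
    if not separator:
        return label, None
    # An empty page part is the shorthand for the current page.
    return (page or current_page), obj

print(split_label("about", "home"))
print(split_label("about#contact", "home"))
print(split_label("#intro", "home"))
```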

\link[text]{linkurl}

Links to an external site whose URL is linkurl. If the text text for the link is not given, then linkurl is used instead.

The following is a list of tokens from images.py:

\includegraphics[width=xxx,height=yyy,ext=zzz,alt=text,page=nnn]{filename}

Places the image filename in the current page. The file is copied to an appropriate location and renamed suitably if the width, height, ext, and page attributes are set. This function requires imagemagick to be installed to operate. Any of the attributes may be omitted. The attributes do the following:

width: Sets the maximum width of the resulting, scaled image. If height is not given, then it is automatically computed.
height: Sets the maximum height of the resulting, scaled image. If width is not given, then it is automatically computed.
ext: Sets the extension that the included file should have (imagemagick does the conversion). Examples of valid extensions are jpg and png, depending on your distribution of imagemagick.
alt: Sets the ALT text of the included image.
page: Takes this page number from the document (for PDF documents).
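
How the missing dimension might be computed is sketched below in Python. This is only arithmetic under an assumption, namely that the image is scaled uniformly to fit within the given maximums; the package leaves the actual scaling to imagemagick, and the function scaled_size is hypothetical. The sketch also does not settle whether small images are enlarged: here they are.

```python
# Hypothetical sketch: fit an image inside optional maximum width and height.

def scaled_size(orig_w, orig_h, max_w=None, max_h=None):
    """Return (width, height) after uniform scaling to the given maximums."""
    if max_w is None and max_h is None:
        return orig_w, orig_h
    scales = []
    if max_w is not None:
        scales.append(max_w / orig_w)
    if max_h is not None:
        scales.append(max_h / orig_h)
    # The smaller factor guarantees that both limits are respected.
    scale = min(scales)
    return max(1, round(orig_w * scale)), max(1, round(orig_h * scale))

print(scaled_size(1400, 700, max_w=140))
print(scaled_size(1400, 700, max_w=140, max_h=35))
```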

A useful macro to have is the following:

\def{thumbnail}{image}{
  \file[\includegraphics[width=140,page=0,ext=png]{\var{image}}]{\var{image}}}

which creates a 140-pixel-wide thumbnail of any image or PDF document by \thumbnail{filename}.

The following is a list of tokens from filerefs.py:

\file[text]{filepath}: Copies the file at filepath to the output directory and inserts a link. The text of the link is text, if it is given, or otherwise the filename part of filepath.

2.4. Text environments

Text environments (brought about by an incantation such as \begin{env}...\end{env}) can change the current character and escape environments, and then process the result of the ....

Text environments from textmarkup.py:

\begin{page}{filename}...\end{page}

This environment takes the result of ... and writes it to filename, with respect to _curr_out_dir. This will create the output directory if it doesn’t already exist. The environment also ensures that two pages do not have the same filename. The page environment also sets up the following evaluation variables which are fed into the template in _page_template:

pagetitle: Text defined by \title.
pagecontent: The result of evaluating ....
css: A string which includes necessary stylesheet data.
pagemodified: Text defined by the \modified token.
breadcrumbs: Text which contains the breadcrumb links.
relpagepath: The path to this page, relative to the base URL.

The page requires a title (defined by \title) and text describing when it was modified (defined by \modified).

The page environment declares itself as the next object for being labeled with the title text as its text.
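
The checks that the page environment performs can be pictured with the following Python sketch. The function write_page and the written_pages set are hypothetical, and the sketch writes the content directly instead of feeding it through the page template.

```python
# Hypothetical sketch of the page environment's bookkeeping.
import os

written_pages = set()

def write_page(out_dir, filename, content, title=None, modified=None):
    """Write one page, refusing duplicates and pages without title or modified text."""
    if title is None or modified is None:
        raise ValueError("a page needs both a title and a modification text")
    path = os.path.normpath(os.path.join(out_dir, filename))
    if path in written_pages:
        raise ValueError("two pages share the filename " + path)
    written_pages.add(path)
    # Create the output directory if it does not exist yet.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path
```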

\begin{itemize}...\end{itemize}

Creates an unordered list whose elements are marked by \item.

\begin{enumerate}...\end{enumerate}

Creates an ordered list whose elements are marked by \item.

\begin{description}...\end{description}

Creates a definition list whose elements are marked by \item[caption], where caption is the term for each list element.

\begin{tabular}{formatting}...\end{tabular}

Renders a table, where & represents column breaks and \\ represents row breaks. The formatting string is some string of lcr|, where l, c, and r align a column left, center, and right, respectively, and | inserts a border between columns. A double border can be inserted using ||. Table cells can be modified using \multirow and \multicolumn. Borders between rows can be inserted using \hline.
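
A formatting string such as |l|c||r can be read as in the following Python sketch. The function parse_table_format and the shape of its result are assumptions made for illustration.

```python
# Hypothetical sketch of reading a tabular formatting string.

def parse_table_format(formatting):
    """Return (alignments, borders).

    borders[i] counts the '|' characters before column i; the final entry
    counts those after the last column.
    """
    align_names = {"l": "left", "c": "center", "r": "right"}
    alignments = []
    borders = [0]
    for ch in formatting:
        if ch == "|":
            borders[-1] += 1
        elif ch in align_names:
            alignments.append(align_names[ch])
            borders.append(0)
        else:
            raise ValueError("unexpected character in formatting string: %r" % ch)
    return alignments, borders

print(parse_table_format("|l|c||r"))
```

A border count of 2 corresponds to the double border written as ||.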

\begin{center}...\end{center}

Surrounds the text with a center tag.

\begin{quote}...\end{quote}

Surrounds the text with a blockquote tag.

\begin{abstract}...\end{abstract}

Requires CSS for proper formatting, but lets one create an abstract for a page.

\begin{figure}[placement]...\end{figure}

Sets up a figure environment which can be floated (using CSS) and which sets the current object for reference to this figure. The \caption token may be used to caption the figure. The placements use CSS classes of the form figure_placement, and all figures are of the CSS class figure.

\begin{framebox}...\end{framebox}

Wraps text with a span tag, whose class is framebox.

\begin{fbox}...\end{fbox}

Same as the framebox environment.

\begin{verbatim}...\end{verbatim}

Takes the inside text verbatim and wraps it in PRE tags. Beware: the text before and including the first newline is removed so that one can begin the verbatim text on its own line. Also, the resulting text is escaped for HTML, so HTML-tag-looking text will not be mistaken for actual HTML.

2.5. Variables

_global_base_out_dir: The base output directory. This is set by the command line.
_curr_out_dir: This is the current output directory. This can be set by \setoutputdir.
_global_input_dir: This is the current directory relative to which input files are referenced. This can be modified by \include.
_fluid_let: This is a list of variables which should be saved while including another file. By default, the list consists of _global_input_dir and _curr_out_dir.
_curr_pageid: This is the id of the current page for handling references.
_page_template: This is a template which the page environment uses to construct the output file. The template is read in without being evaluated (but it is parsed).
_page_title: The title of the current page.
_page_modified: This is text set by \modified.
_page_footnotes: This is a list of footnotes to be rendered at the end of the current page.
_curr_page_reference: This is the reference object for the current page (see references.py).
_table_formatting: Contains the formatting string for the current tabular environment.
_figure_placement: This is the placement for the current figure environment.
_page_css: This is the filename of the CSS stylesheet, relative to the output directory.
_breadcrumbs: This is a list of (name, label) pairs which form the “breadcrumb trail” to put at the top of a page (for helping show website structure).
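
The saving and restoring governed by _fluid_let can be sketched in Python with a context manager. The function fluid_let below is hypothetical; it takes shallow copies so that a list such as _breadcrumbs is restored even if it was modified in place.

```python
# Hypothetical sketch of saving variables around an included file.
import copy
from contextlib import contextmanager

@contextmanager
def fluid_let(variables, names):
    """Save the named variables on entry and restore them on exit."""
    saved = {name: copy.copy(variables[name]) for name in names if name in variables}
    missing = [name for name in names if name not in variables]
    try:
        yield variables
    finally:
        variables.update(saved)
        for name in missing:
            variables.pop(name, None)

state = {
    "_global_input_dir": "src",
    "_curr_out_dir": "out",
    "_breadcrumbs": [("Home", "index")],
    "_fluid_let": ["_global_input_dir", "_curr_out_dir", "_breadcrumbs"],
}

# Stands in for the body of an included file changing the saved variables.
with fluid_let(state, state["_fluid_let"]):
    state["_global_input_dir"] = "src/blog"
    state["_breadcrumbs"].append(("Blog", "blog"))

print(state["_global_input_dir"], state["_breadcrumbs"])
```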

2.6. Math mode

Math mode kind of works to some degree, but it really shouldn’t be part of this language: typesetting mathematics in HTML is way too hard! But, it is possible to do some simple math such as $x^2+2x+1=(x+1)(x+1)$ for x² + 2x + 1 = (x + 1) ⋅ (x + 1). Greek letters (such as α) are defined just as in LaTeX, as well as all LaTeX operators (but not very well).

Again: it is recommended to not do anything too complicated in math mode.

For centered math, \begin{equation*}...\end{equation*} is implemented.