JoelNothman.com

25 March, 2012

Reporting data with datatemplate

Filed under: Open software,Research,Technology by Joel @ 10:30 pm, 25 March 2012.

You have a lot of data, but you only need to show a little here, a little there. The data also might change, and you want to easily update a table in LaTeX or HTML. You could just format it by hand, but I see a lot of copy-paste, regex or shell hacking in your future… and you risk forgetting to update your table to match the changed raw data.

datatemplate intends to be a generic tool for loading, extracting and
formatting the necessary data. It draws on three tools:

  • Django templates
  • An easy Python or command-line interface for loading common data sources (CSV, JSON, etc.) into the template context
  • New template tags for using SQL directly in Django templates, plus the ability to import comma-delimited or tab-delimited data as a SQLite table for random access

Django templates are a fairly powerful way to include custom formatting and template control flow. Datatemplate is not limited to producing tables either, and might also be used to generate LaTeX \newcommand definitions from data.

See also the readme at the datatemplate github page.

Example

Say you have experiments.csv containing:

experiment,variable,precision,recall
e1,A,0.5,0.4
e1,B,0.6,0.7
e2,A,0.5,0.6
e2,B,0.4,0.6
e3,A,0.5,0.3
e3,B,0.8,0.7
e4,A,0.2,0.4
e4,B,0.5,0.6
e5,A,0.6,0.6
e5,B,0.4,0.5

You want to output a LaTeX table of F1 scores, with rows for experiments e1, e3 and e5, and columns for variables A and B.

Write experiments.tpl:

\begin{tabular}{l|*{ {{columns|length}} }{r}}
\hline
Experiment {% for variable in columns %}& {{variable}}{% endfor %} \\
\hline
\hline
{% for row_label, experiment in rows %}
  {{row_label|texescape}}
  {% for variable in columns %}{% select 2 * precision * recall / (precision + recall) AS f1 FROM data WHERE experiment = "{{experiment}}" AND variable = "{{variable}}" %}
    & {{f1|floatformat}}
  {% endselect %}{% endfor %} \\
{% endfor %}
\hline
\end{tabular}

This depends on:

  • an SQLite database with a table called data loaded with the CSV content
  • rows, a sequence of (row label, selected experiment value) tuples
  • cols, a sequence of selected variable values

The {% select ... %} statement selects a single row from the database and places its results in context. {% forselect ... %}, which iterates over SQL query results, could have also been used. In loading the CSV data, columns are recognised as FLOAT or INT as necessary, so you can perform numerical select queries, or use Django filters such as floatformat.

We may then execute:

datatemplate --csvsql data=experiments.csv \
  --json-var rows='[["Baseline", "e1"], ["Improved", "e3"], ["Best", "e5"]]' \
  --json-var columns='["A", "B"]' \
  < experiments.tpl

This generates the LaTeX code for the following table:

Sample output table

[Apologies, the data is nonsense.]

Related tools

Powered by WordPress