Archiving cryptographic secrets on paper

For storing rarely used secrets that should not be kept on a networked computer, it is convenient to print them on paper. However, ordinary barcodes can store little more than 2000 octets of data, and in practice even that amount cannot be read reliably by widely used software (e.g. ZXing).

In this note I show a script that splits a small amount of data across multiple barcodes and generates a printable document. Specifically, the script handles fewer than 7650 alphanumeric characters, such as those of the Base64 alphabet. It can be used for archiving Tarsnap keys, GPG keys, SSH keys, etc.
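The limit is consistent with the chunk size used in the script: each barcode carries up to 850 characters, and 7650 = 9 × 850, which presumably corresponds to nine barcodes on one page. A minimal sketch of the splitting step follows; `split_chunks` is a hypothetical helper name, and the script itself achieves the same effect with a regular expression.

```python
def split_chunks(data, size=850):
    """Split data into consecutive pieces of at most `size` characters."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Stand-in data of the maximum length mentioned above.
sample = "A" * 7650
chunks = split_chunks(sample)

print(len(chunks))                 # number of barcodes needed
assert "".join(chunks) == sample   # concatenation restores the input
```

Concatenating the chunks in order restores the input exactly, which is the property the recovery procedure relies on.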

The script is implemented in Python, since it is one of the most widespread interpreters. It is compatible with both Python 2 and Python 3 and has one external dependency, the iec16022 binary. On Debian-based systems both can be installed using apt-get install python3 iec16022.
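Since the script shells out to iec16022, it can help to confirm that the binary is on the PATH before running it. The following is a small sketch, not part of the script; `find_tool` is a hypothetical helper, and `shutil.which` requires Python 3.3 or later.

```python
import shutil

def find_tool(name="iec16022"):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)

path = find_tool()
if path is None:
    print("iec16022 not found; install it before running the script")
else:
    print("using", path)
```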

The script accepts any ASCII sequence and generates an HTML page, sized for printing on A5 paper, that contains multiple ISO/IEC 16022 (Data Matrix) barcodes. The barcodes can be read with any off-the-shelf software, e.g. ZXing. Even if up to 30% of the barcode area is corrupted, the data can still be recovered.

Warning: versions of iec16022 prior to 0.2.7 are likely to randomly drop characters at the end of the barcode. Every time you use this tool, check that the key is actually recoverable before irreversibly erasing it.
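One way to perform this check is to scan every barcode back into text, concatenate the results in reading order, and compare the SHA-256 digest with the one printed on the page. The sketch below assumes the scanned strings are already available as a Python list; `verify_chunks` is a hypothetical helper, not part of the script.

```python
import hashlib

def verify_chunks(scanned_chunks, expected_digest):
    """Return True if the concatenated chunks hash to expected_digest."""
    data = "".join(scanned_chunks)
    digest = hashlib.sha256(data.encode()).hexdigest()
    return digest == expected_digest.strip().lower()

# Stand-in data: the digest is computed here only for the demonstration;
# in practice it is read off the printed page.
original = "example secret\n"
printed_digest = hashlib.sha256(original.encode()).hexdigest()

print(verify_chunks(["example ", "secret\n"], printed_digest))  # True
print(verify_chunks(["example ", "secret"], printed_digest))    # False
```

The second call simulates a barcode that lost its final character, which is the failure mode described in the warning; the digest comparison catches it.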

multi_iec16022.py (download)
#!/usr/bin/env python3
# encoding:utf-8

# This code is released under CC0.
# https://creativecommons.org/publicdomain/zero/1.0/

import argparse, re, os, subprocess, hashlib

parser = argparse.ArgumentParser(description='Convert ASCII data to printable A5 HTML page.')
parser.add_argument('input', metavar='INPUT', type=argparse.FileType('r'),
                    help='input tarsnap.key file')
parser.add_argument('output', metavar='OUTPUT', type=str,
                    help='output directory with HTML and DataMatrix images')
parser.add_argument('title', metavar='TITLE', type=str,
                    help='title for the output page')
args = parser.parse_args()

key = args.input.read()
os.mkdir(args.output, 0o700)

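# Split the input into chunks of at most 850 characters; DOTALL lets
# '.' match newlines, so line breaks in the key are preserved.
chunks = re.findall(re.compile('.{1,850}', re.DOTALL), key)
for index, chunk in enumerate(chunks):
    # check_call raises if iec16022 exits with a nonzero status,
    # instead of silently leaving a missing barcode image.
    subprocess.check_call([
        "iec16022", "--ecc=200", "--format=PNG",
        "--barcode={}".format(chunk),
        "--outfile={}/chunk{}.png".format(args.output, index)
    ])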

with open("{}/index.html".format(args.output), "w") as html:
    images = ""
    for index, chunk in enumerate(chunks):
        images += """<img src="chunk{}.png">""".format(index)
    digest = hashlib.sha256(key.encode()).hexdigest()
    html.write("""<!DOCTYPE html>
<head>
    <style type="text/css">
    * {{ margin: 0; padding: 0; }}
    body {{ width: 128mm; height: 190mm;
            margin: 10mm; padding: 1mm;
            border: 1px solid black;
            font-size: 14px; }}
    p {{ padding-top: 1em; }}
    img {{ width: 41mm; height: 41mm;
           image-rendering: pixelated; }}
    </style>
</head>
<body>
    <p>This page contains {}.</p>
    <p>This data is encoded using multiple ISO/IEC 16022:2006
    (Data&nbsp;Matrix) ECC 200 barcodes.
    To reproduce the data, scan every barcode from left to right
    and from top to bottom, and concatenate their contents without
    anything in between.</p>
    <p>The SHA-256 digest of the original data is <small>{}</small>.</p>

    <p>{}</p>
</body>
""".format(args.title, digest, images))

It can be invoked as follows:

$ python multi_iec16022.py tarsnap.key tarsnap_datamatrix "Tarsnap key for foobar.com"

Afterwards, tarsnap_datamatrix/index.html will contain the printable page, next to the barcode images chunk0.png, chunk1.png, and so on.

This page can now be printed on a laser printer (with no margins) and laminated. If done properly, it is likely to outlast the service for which it holds the secrets.

Note that ZXing works most reliably when the barcode types it looks for are restricted to Data Matrix alone. It also has both a maximum distance to the symbol (beyond which the data is no longer recoverable) and a minimum distance to the symbol (below which the symbol takes up too much of the camera frame and confuses the pattern recognizer).

The ISO/IEC 16022 format was chosen because it is widely supported and admits a flexible character set (by contrast, the alphanumeric mode of QR codes does not include lowercase letters). However, I did not put extensive thought into this choice, and it is possible that another 2D barcode would be more efficient.
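To illustrate the character set point, the sketch below checks whether a string fits the 45-character QR alphanumeric set (digits, uppercase letters, space, and the symbols $ % * + - . / :). The helper name `fits_qr_alphanumeric` is hypothetical, and the input is stand-in data rather than a real key.

```python
import base64

# The 45 characters allowed in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

def fits_qr_alphanumeric(text):
    """Return True if every character belongs to the QR alphanumeric set."""
    return all(ch in QR_ALPHANUMERIC for ch in text)

encoded = base64.b64encode(b"secret key material").decode("ascii")

print(fits_qr_alphanumeric(encoded))            # Base64 uses lowercase letters
print(fits_qr_alphanumeric("HELLO WORLD 123"))  # uppercase, digits, spaces
```

Base64 output falls outside the set because of its lowercase letters (and the `=` padding), so a QR code would have to fall back to the less compact byte mode for such data.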

