Adventures with single-drive backup to LTO tape using open source tools

I got a Tandberg LTO-6 drive off eBay recently as a way to have an offline, air-gapped third backup of data that normally lives on my NAS or backup storage server.

Although my NAS is already backed up daily to a ZFS pool on another server, all of these systems are networked—and therefore, vulnerable to ransomware, malware, sloppy sysadmin commands on the terminal, and even electric-surge-caused hardware malfunction. And although I do back up some data to cloud storage, not all data is worth the recurring monthly charges of S3/Glacier/Backblaze B2. Besides, playing with hardware is fun.

Magnetic tape, which can store as much as 2.5 TB uncompressed (in LTO-6, the generation I started with) or 12 TB uncompressed (in LTO-8, the current generation as of mid-2021), is a time-tested option that fits in perfectly.

Veeam Backup & Replication Community Edition works well with standalone tape drives. However, it’s a proprietary system that uses Microsoft Tape Format for the on-tape format—a format that is very challenging to recover yourself without using proprietary tools. Moreover, the tape backup mechanism in Community Edition (i.e., without using licensed NAS backup features) is not meant for backing up large volumes of general purpose files—it’s really designed for archiving VM backups from disk.

LTFS also works. However, my initial attempts to use it were foiled by a Microsemi HBA that doesn’t support TLR. Also, if you don’t use proprietary tape software, LTFS can actually perform more slowly for a bunch of reasons (e.g., multithreaded copying, large number of small files, etc.).

When using a Linux desktop, way more options are available using decades-old software that was designed for tape from the get-go.

Remember tar (tape archive)?

Turns out: tar is as usable in 2021 for tape as it was thirty years ago.

Piping a sorted list of filenames into tar

find 'Folder/' -type f | sort | tar -cvf /dev/nst0 --no-recursion -T -

Getting checksums before tar

find 'Folder/' -type f -print0 | sort -z | xargs -0 sha256sum | tee Folder.20210812.sha256sum

You can use sha256sum directly with a glob spec of files, but find will recurse through directories.

Buffering tar with mbuffer

tar --label="archive-name" -b512 -cvf - | sudo mbuffer -m 8G -P 80 -s 262144 -o /dev/nst0

tar -b512 sets the blocking factor to 512 so that each tar record matches the 262144-byte block size of the tape drive (512 × 512 = 262144).

mbuffer -P 80 tries to fill the buffer to 80% before starting to write out.

mbuffer -s 262144 matches the 262144-byte block size.

Verifying the contents of the tape archive

tar -tvf /dev/nst0

This only reads through the end of the file. You need to advance to the next file to read through another tape archive.

Advancing the tape

mt -f /dev/nst0 status
mt -f /dev/nst0 fsf 1
mt -f /dev/nst0 bsf
mt -f /dev/nst0 rewind
mt -f /dev/nst0 eject

Enabling hardware encryption (drive dependent)

This Tandberg drive seems to have the same guts as an HP LTO-6 drive. 256-bit encryption keys can be generated and loaded, but these drives require an extra flag (-a 1). The convenience advantage of enabling hardware encryption is that we can stream from tar directly to tape and back, and the encryption is all transparent to the applications.

stenc -g 256 -k keyfile.key -kd "optional key description"
stenc -f /dev/nst0 -e on -a 1 --ckod --protect -k keyfile.key
stenc -f /dev/nst0 --detail
stenc -f /dev/nst0 -e off -a 1

Bonus: Encoding a barcode into cartridge memory (aka LTO-CM or MAM) using IBM ITDT

The barcode is set in the RFID memory chip and is assigned attribute number 0806. HPE’s LTFS utilities can encode it as part of the LTFS format process, but I figured out how to do this when not using LTFS.

Every attribute is preceded by a 5-byte attribute header, which contains:

  • 2 bytes: the attribute number itself (hex 08 06)
  • 2 bytes: format—apparently ASCII (hex 01 00)
  • 1 byte: length—this has to be 32 decimal (hex 20)

The remaining 32 bytes should be padded with spaces. An example 37-byte binary file, when dumped using xxd (hexadecimal representation on the left, ASCII on the right) should look like this:

$ xxd 0806.bin
00000000: 0806 0100 2046 4a4b 3637 304c 3620 2020  .... FJK670L6
00000010: 2020 2020 2020 2020 2020 2020 2020 2020
00000020: 2020 2020 20

We can try to read the attribute from the cartridge using ITDT:

.\itdt.exe -f \\.\tape0 readattr -p 0 -a 0806 -d 0806.bin

And we can try to encode it to the cartridge using ITDT:

.\itdt.exe -f \\.\tape0 writeattr -p 0 -a 0806 -s 0806.bin

Here’s the evidence that the barcode was properly encoded:

Screenshot of HPE Library and Tape Tools showing barcode field

Removing useless Windows 10 preinstalled apps

Based on https://community.spiceworks.com/topic/1408834-removing-windows-10-apps-gpo, with the additional refinement of a filter that removes only Store apps and not system apps or frameworks, using PowerShell:

1. List the apps that would be uninstalled.

The -AllUsers flag requires an elevated PowerShell run on an administrator account. Omit the -AllUsers flag if running as a nonadministrator for the current user.

Get-AppxPackage -AllUsers | where-object {$_.IsFramework -eq $false -And $_.name -notlike "*store*" -And $_.name -notlike "*calc*" -And $_.SignatureKind -eq "Store"} | select Name

On a 1711 newly installed VM, this resulted in this list:

Name
----
Microsoft.MicrosoftOfficeHub
Microsoft.Microsoft3DViewer
Microsoft.ZuneVideo
Microsoft.WindowsMaps
Microsoft.WindowsFeedbackHub
Microsoft.BingWeather
Microsoft.Messaging
Microsoft.MicrosoftStickyNotes
Microsoft.XboxIdentityProvider
Microsoft.XboxSpeechToTextOverlay
Microsoft.Print3D
Microsoft.GetHelp
Microsoft.WindowsSoundRecorder
Microsoft.Getstarted
Microsoft.WindowsCamera
Microsoft.3DBuilder
Microsoft.Xbox.TCUI
Microsoft.People
Microsoft.RemoteDesktop
Microsoft.XboxGameOverlay
Microsoft.Office.Sway
Microsoft.Windows.Photos
Microsoft.MSPaint
Microsoft.SkypeApp
Microsoft.XboxApp
Microsoft.DesktopAppInstaller
Microsoft.WindowsAlarms
Microsoft.OneConnect
Microsoft.Wallet
Microsoft.ZuneMusic
Microsoft.Office.OneNote
microsoft.windowscommunicationsapps
Microsoft.MicrosoftSolitaireCollection

2. Actually uninstall them.

Get-AppxPackage -AllUsers | where-object {$_.IsFramework -eq $false -And $_.name -notlike "*store*" -And $_.name -notlike "*calc*" -And $_.SignatureKind -eq "Store"} | Remove-AppxPackage

Certain apps that cannot be uninstalled might be listed in the output.

Creating a LUKS-encrypted DVD/BD data disc

I’ve been backing up some of my larger files to Bluray lately, instead of trying to upload them over a 10 Mbps uplink.

In the past, I used GPG (on a .tar or compressed .tar.xz) or Veracrypt (on a file container) to encrypt at rest, before burning those files onto a standard UDF/ISO9660 optical disc. Now that I use a Linux desktop, I wanted something slightly more native — a method that

  1. protects the directory structure and filenames without needing to use an archive file (like .tar);
  2. would be generally unintelligible on a Windows PC (this is a feature, not a bug); and
  3. could be scripted on the command line for server backups, without requiring a GUI.

Based on some resources online, I settled on using LUKS.

Continue reading “Creating a LUKS-encrypted DVD/BD data disc”

Microsoft derps on Excel ad

Original resolution of Excel ad showing treemap chart

Microsoft’s Facebook ad for new features in Excel highlights the Treemap visualization, but gets it totally wrong.

Ad for Excel visualization features

A treemap is supposed to visualize relative size in a hierarchy. But in the illustration here, the data don’t fit this type of visualization (it’s a time series of one flat variable—without hierarchy).

Original resolution of Excel ad showing treemap chart

But it’s even worse than that. The relative sizes don’t make sense! Why would the 31 MPG box for January be so much larger than the 32 MPG box for May?

This seems like a great illustration of why math/statistical education should be required for everyone—even visual designers and marketers. Or at least, the people selling the product should understand what the software actually does.

CRA PDF form validation frustrations

While trying to file my Canadian taxes as a nonresident, using the “Income Tax and Benefit Return for Non-residents … of Canada” — since I live in the United States and am a tax resident of the United States — I ran into a really frustrating bug in the first 5 form fields.

The form doesn’t accept non-Canadian provinces/territories and postal codes!

Can't put Massachusetts as a state
MA? not allowed.

Can't input a ZIP code
ZIP code? not allowed.

It’s really foolish, because many of the people who would be filing this form are likely residing outside of Canada. That’s why this version of the T1 return has an added Country field in the address block.

This is the kind of situation when PDF forms should just step back and allow free-form, unvalidated input.