Combining tar with logging and mbuffer

As a supplement to my last blog post on archiving to an LTO tape drive with conventional open source utilities like tar, here is how you can save a file list of the tar output (the same listing that tar -tvf /dev/nst0 would produce) to a text file, while still piping tar through mbuffer to the tape drive.

tar --label="backup-20230101-volume2" -b512 -cvf - --exclude='.DS_Store' --files-from=/tmp/directories-list.txt 2> >(tee /tmp/tar-filelist.txt >&2) | mbuffer -m 4G -P 80 -s 262144 -o /dev/nst0

It’s critical that the output of tee is redirected back to stderr using >&2. If you don’t do this, the log text ends up in the stdout stream that gets piped to mbuffer and written to tape, and tar will not be able to understand the archive when reading it back, since there will be spurious text data mixed in.

If you aren’t piping tar’s stdout through mbuffer, you can avoid the redirection problem because tar won’t be outputting to stdout at all. For example:

tar --label="backup-20230101-volume2" -b512 -cvf /dev/nst0 --exclude='.DS_Store' --files-from=/tmp/directories-list.txt 2> >(tee /tmp/tar-filelist.txt)

Adventures with single-drive backup to LTO tape using open source tools

I got a Tandberg LTO-6 drive off eBay recently as a way to have an offline, air-gapped third backup of data that normally lives on my NAS or backup storage server.

Although my NAS is already backed up daily to a ZFS pool on another server, all of these systems are networked—and therefore vulnerable to ransomware, malware, sloppy sysadmin commands at the terminal, and even hardware damage from an electrical surge. And although I do back up some data to cloud storage, not all data is worth the recurring monthly charges of S3/Glacier/Backblaze B2. Besides, playing with hardware is fun.

Magnetic tape, which can store as much as 2.5 TB uncompressed (in LTO-6, the generation I started with) or 12 TB uncompressed (in LTO-8, the current generation as of mid-2021), is a time-tested option that fits in perfectly.

Veeam Backup & Replication Community Edition works well with standalone tape drives. However, it’s a proprietary system that uses Microsoft Tape Format for the on-tape format—a format that is very challenging to recover yourself without using proprietary tools. Moreover, the tape backup mechanism in Community Edition (i.e., without using licensed NAS backup features) is not meant for backing up large volumes of general purpose files—it’s really designed for archiving VM backups from disk.

LTFS also works. However, my initial attempts to use it were foiled by a Microsemi HBA that doesn’t support TLR (Transport Layer Retries). Also, unless you use proprietary tape software, LTFS can actually be slower in a number of situations (e.g., multithreaded copying, large numbers of small files).

On a Linux desktop, far more options are available, thanks to decades-old software that was designed for tape from the get-go.

Remember tar (tape archive)?

Turns out: tar is as usable in 2021 for tape as it was thirty years ago.

Piping a sorted list of filenames into tar

find 'Folder/' -type f | sort | tar -cvf /dev/nst0 --no-recursion -T -
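
If filenames might contain newlines or other odd characters, a NUL-delimited variant of the same pipeline (a sketch using GNU find/sort/tar options) is safer:

find 'Folder/' -type f -print0 | sort -z | tar -cvf /dev/nst0 --no-recursion --null -T -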

Getting checksums before tar

find 'Folder/' -type f -print0 | sort -z | xargs -0 sha256sum | tee Folder.20210812.sha256sum

You can use sha256sum directly with a glob spec of files, but find will recurse through directories.
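
When it comes time to verify a restore (or to spot-check the originals), the saved file can be fed back to sha256sum with -c; run it from the directory that contains Folder/ (the path below is just a hypothetical restore location):

cd /mnt/restore && sha256sum -c Folder.20210812.sha256sum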

Buffering tar with mbuffer

tar --label="archive-name" -b512 -cvf - 'Folder/' | sudo mbuffer -m 8G -P 80 -s 262144 -o /dev/nst0

tar -b512 sets the blocking factor to 512 so that each tar record matches the 262144-byte block size of the tape drive (512 × 512 = 262144).

mbuffer -P 80 tries to fill the buffer to 80% before starting to write out.

mbuffer -s 262144 matches the 262144-byte block size.
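
Restoring works the same way in reverse; a minimal sketch, assuming the same 262144-byte block size, that lets mbuffer do the tape reads and hand the stream to tar:

sudo mbuffer -i /dev/nst0 -m 4G -s 262144 | tar -b512 -xvf -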

Verifying the contents of the tape archive

tar -tvf /dev/nst0

This only reads to the end of the current file on the tape. To read another tape archive, you need to advance to the next file first.

Advancing the tape

mt -f /dev/nst0 status    # show drive status and current position
mt -f /dev/nst0 fsf 1     # forward-space past one filemark, to the start of the next file
mt -f /dev/nst0 bsf       # backward-space over one filemark
mt -f /dev/nst0 rewind    # rewind to the beginning of the tape
mt -f /dev/nst0 eject     # rewind and eject the cartridge

Enabling hardware encryption (drive dependent)

This Tandberg drive seems to have the same guts as an HP LTO-6 drive. 256-bit encryption keys can be generated and loaded, but these drives require an extra flag (-a 1). The convenience advantage of enabling hardware encryption is that we can stream from tar directly to tape and back, and the encryption is all transparent to the applications.

stenc -g 256 -k keyfile.key -kd "optional key description"      # generate a 256-bit key file
stenc -f /dev/nst0 -e on -a 1 --ckod --protect -k keyfile.key   # enable encryption with that key (-a 1 is the extra flag these drives need)
stenc -f /dev/nst0 --detail                                     # show the drive's current encryption status
stenc -f /dev/nst0 -e off -a 1                                  # disable encryption

Bonus: Encoding a barcode into cartridge memory (aka LTO-CM or MAM) using IBM ITDT

The barcode is set in the RFID memory chip and is assigned attribute number 0806. HPE’s LTFS utilities can encode it as part of the LTFS format process, but I figured out how to do this when not using LTFS.

Every attribute is preceded by a 5-byte attribute header, which contains:

  • 2 bytes: the attribute number itself (hex 08 06)
  • 1 byte: format—01 indicates ASCII text (hex 01)
  • 2 bytes: length—this has to be 32 decimal (hex 00 20)

The 32 bytes that follow hold the barcode itself, padded to the full length with spaces. An example 37-byte binary file, when dumped using xxd (hexadecimal representation on the left, ASCII on the right), should look like this:

$ xxd 0806.bin
00000000: 0806 0100 2046 4a4b 3637 304c 3620 2020  .... FJK670L6
00000010: 2020 2020 2020 2020 2020 2020 2020 2020
00000020: 2020 2020 20
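
One way to build this 37-byte file without a hex editor is a pair of printf commands (a sketch; the barcode value is just the example from the dump above):

BARCODE="FJK670L6"
printf '\x08\x06\x01\x00\x20' > 0806.bin   # 5-byte attribute header
printf '%-32s' "$BARCODE" >> 0806.bin      # barcode left-justified, space-padded to 32 bytes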

We can try to read the attribute from the cartridge using ITDT:

.\itdt.exe -f \\.\tape0 readattr -p 0 -a 0806 -d 0806.bin

And we can try to encode it to the cartridge using ITDT:

.\itdt.exe -f \\.\tape0 writeattr -p 0 -a 0806 -s 0806.bin

Here’s the evidence that the barcode was properly encoded:

Screenshot of HPE Library and Tape Tools showing barcode field

Appendix: Source Code

These are backups of the open source programs used above, providing some assurance that even if these programs end up disappearing from Linux distributions’ package repositories, I will still be able to access the data stored on these tapes. (There’s probably nothing to worry about here; it’s more likely LTO-6 drives will be EOL long before tar and mt-st disappear.)

Creating a LUKS-encrypted DVD/BD data disc

I’ve been backing up some of my larger files to Blu-ray lately, instead of trying to upload them over a 10 Mbps uplink.

In the past, I used GPG (on a .tar or compressed .tar.xz) or Veracrypt (on a file container) to encrypt at rest, before burning those files onto a standard UDF/ISO9660 optical disc. Now that I use a Linux desktop, I wanted something slightly more native — a method that

  1. protects the directory structure and filenames without needing to use an archive file (like .tar);
  2. would be generally unintelligible on a Windows PC (this is a feature, not a bug); and
  3. could be scripted on the command line for server backups, without requiring a GUI.

Based on some resources online, I settled on using LUKS.
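
The general shape of the process is worth sketching here (container size, mount point, device name, and paths are all assumptions; the full post covers the details): create a file-backed LUKS container sized for the disc, fill it through a device-mapper mapping, then burn the container file itself to the disc.

truncate -s 23G backup.img                  # roughly single-layer BD-R capacity
cryptsetup luksFormat backup.img
cryptsetup open backup.img bd_backup        # attaches a loop device automatically
mkfs.ext4 /dev/mapper/bd_backup
mkdir -p /mnt/bd_backup && mount /dev/mapper/bd_backup /mnt/bd_backup
rsync -a /data/to/archive/ /mnt/bd_backup/
umount /mnt/bd_backup && cryptsetup close bd_backup
growisofs -dvd-compat -Z /dev/sr0=backup.img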

Continue reading “Creating a LUKS-encrypted DVD/BD data disc”

Using OpenStack images on XenServer – Fedora 22, CentOS 7

For a long time, I’ve been using kickstart scripts (link to GitHub repo) to set up Fedora and CentOS virtual machines on a XenServer host. In the last year or so, the trend of cloud computing has led distributions to release prebuilt “cloud” images in OpenStack-compatible qcow2 or raw disk format, which happen to be broadly compatible with hypervisors. Fedora Cloud’s introduction with F21 prompted me to look into ways of using cloud-init/cloud-config without an entire private cloud infrastructure.

It should no longer be necessary to use a kickstart to install a new VM, because the distribution’s prebuilt images easily work on XenServer with a few conversions.

(Kickstart scripts remain useful for customizing an image, of course; they are often the mechanism with which Linux distros build such images.)

What are prebuilt images?

When I say “prebuilt images”, I mean VM hard disk files released by the Linux distribution. For instance, Fedora 22’s Cloud Base and Atomic Host images are provided in qcow2 and xz’d raw files:

Fedora 22 Cloud Base and Atomic Host images

These releases are designed to work in actual cloud infrastructure—meaning a compute hypervisor (usually KVM), a metadata service that supplies configuration like hostname and networking at boot time, and some APIs that can programmatically affect the virtual machine’s behaviour and configuration. OpenStack is the leading example.

But OpenStack is overkill when you’re just virtualizing a handful of VMs. You don’t need a private cloud when you’re not running a cluster or spinning up machines programmatically. That’s exactly why I found myself running XenServer.

Nonetheless, unless you’re using Xen full paravirtualization (which there are now good reasons to avoid), these images should broadly work with all major hypervisors: QEMU-KVM, VirtualBox, Xen PVHVM, VMware, etc… with minor format tweaks.

How to convert a prebuilt image for use in XenServer

Broadly, there are three steps in the process, the first of which is most important:

  1. Convert qcow2 disk image to VHD.
  2. Import VHD in XenCenter.
  3. Customize imported machine and convert to template.

You can optionally also export the template to an XVA file.

1. Convert qcow2 to VHD

The qemu-img utility can do this. Use your package manager of choice to install (e.g. yum install qemu-img or dnf install qemu-img on F22+). You should do this on another Linux machine (even a VM is okay), because messing with the Xen dom0 is not recommended.

Locate your downloaded *.qcow2 file, which might look something like Fedora-Cloud-Base-22-20150521.x86_64.qcow2. If it’s compressed, like CentOS-Atomic-Host-7.1.2-GenericCloud.qcow2.xz, decompress it first.

Use the command $ qemu-img convert -f qcow2 -O vpc [input file] [output file] to do the conversion. For example,

$ qemu-img convert -f qcow2 -O vpc Fedora-Cloud-Base-22-20150521.x86_64.qcow2 Fedora-Cloud-Base-22-20150521.x86_64.vhd

2. Import the new VHD

If you have XenCenter installed on Windows, use the File -> Import… option to load the VHD. Follow the prompts to set up the VM’s CPU, memory, storage, and networking allocations.

Manual import on command line

Ugh, not using the UI? That means a whole lot more work to import. Are you sure about this???

If you do not have access to XenCenter, it’s a more involved process.

Transfer the newly converted disk image to the hypervisor dom0, such as by copying it into a shared storage location (e.g. NFS image library), and you should be able to use xe vdi-import to load the VHD:

First, get the size of the disk image with $ qemu-img info [VHD file]. Note the size in bytes.

$ qemu-img info Fedora-Cloud-Base-22-20150521.x86_64.vhd
image: Fedora-Cloud-Base-22-20150521.x86_64.vhd
file format: vpc
virtual size: 3.0G (3221471232 bytes)
disk size: 516M
cluster_size: 2097152

Create a VDI in XenServer using the command line tool to hold this new data:

# set SIZE to size in bytes, e.g.
$ SIZE=3221471232
# set SR to the UUID of a storage repository in which to store the VDI
$ SR=$(xe sr-list name-label='NFS virtual disk storage' --minimal)
$ UUID=$(xe vdi-create name-label=Fedora-Cloud-Base-22-20150521.x86_64 virtual-size=$SIZE sr-uuid=$SR type=user)

Then load the VHD:

$ xe vdi-import uuid=$UUID filename=Fedora-Cloud-Base-22-20150521.x86_64.vhd format=vhd --progress

If all has gone well, you get output to the effect of

[|] ######################################################> (100% ETA 00:00:00) 
Total time: 00:00:24

You can check that it’s there by doing

$ xe vdi-list uuid=$UUID

It’s time to make a VM (important: it must be PVHVM) to which to attach this VHD. You’ll need to create the CD drive, set up networking, and so on, all on the command line, and the CD drive should be loaded with a cloud-init/cloud-config datasource; one way to build such a seed ISO is sketched below. (Aren’t you regretting not using the GUI now?)
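
If you need that datasource, one common approach is a cloud-init “NoCloud” seed ISO (a hypothetical example: the user-data/meta-data contents and ISO name are placeholders, and the ISO still has to be uploaded to an ISO library SR before vm-cd-add can attach it):

$ cat > meta-data <<'EOF'
instance-id: fedora-cloud-001
local-hostname: fedora-cloud-001
EOF
$ cat > user-data <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAA... user@example
EOF
# NoCloud requires the volume label "cidata"
$ genisoimage -output cloud-init-example.iso -volid cidata -joliet -rock user-data meta-data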

$ VM=$(xe vm-install new-name-label=Fedora-Cloud-Base-22-20150521 template='Other install media')
# make an optical drive, which you might need for cloud-init
$ xe vm-cd-add cd-name='cloud-init-example.iso' vm=$VM device=3

# get the list of networks and their UUIDs; select one
$ xe network-list
# the following line is an example
$ xe vif-create network-uuid=b4187ad6-916e-d1d4-90a7-2b7f1353bca2 vm-uuid=$VM device=0

Now, create the virtual block device (VBD) that associates the VHD disk image with the VM.

$ VBD=$(xe vbd-create vm-uuid=$VM device=0 vdi-uuid=$UUID bootable=true mode=RW type=Disk)


The VM is now ready (although you’ll need to adjust CPU and RAM, which is outside the scope of this guide), either to be booted or to be stored as a template!
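
If you do want to tweak memory and vCPUs from the CLI before booting, a sketch (the values here are arbitrary examples):

$ xe vm-memory-limits-set uuid=$VM static-min=2GiB dynamic-min=2GiB dynamic-max=2GiB static-max=2GiB
$ xe vm-param-set uuid=$VM VCPUs-max=2
$ xe vm-param-set uuid=$VM VCPUs-at-startup=2
$ xe vm-start uuid=$VM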

3. Customize and convert to template

I like to convert the now-ready VM to a template before using it for anything. This makes it a lot easier to deploy from this point onward. It’s also helpful to tweak the default CPU/memory parameters if desired.

When it’s ready, you can select a halted VM, and choose VM -> Convert to Template… in XenCenter. The equivalent for the xe CLI is something I haven’t figured out yet; the process might require taking a snapshot, and copying the snapshot to become a template.
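
One approach that may work (untested here) is simply flipping the halted VM’s is-a-template parameter:

$ xe vm-param-set uuid=$VM is-a-template=true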


Installing a Puppet master on CentOS 7 with nginx and Unicorn

Puppet master node successful test

I was experimenting with configuration management tools, and wanted to set up a Puppet master node for my virtualized machines.

It is unfortunate that most guides out there today are tailored specifically for Ubuntu, or rely on prepackaged DEBs that do all the work (which, obviously, don’t really help on CentOS/Fedora/Red Hat). This guide on DigitalOcean for setting up a Puppet master on Ubuntu 14.04 is pretty solid, but my own preferences are for CentOS and Fedora. Furthermore, I have completely migrated to nginx on all my servers, though many deployment guides for Puppet still use Apache and Passenger. And the closest thing I could find, a guide for CentOS 6, nginx, and Unicorn, used SysVinit and God… both unnecessary now that systemd has been adopted.

(If you’re not as picky, just use Foreman Installer. It’ll take care of everything… even on CentOS 7.)

This guide will use:

  • CentOS 7 (at the time of writing, latest release)
    • systemd
  • nginx 1.7.x (mainline, from official nginx repository)
  • Unicorn
  • Puppet open source 3.7.x

Continue reading “Installing a Puppet master on CentOS 7 with nginx and Unicorn”

Fedora 21 on XenServer


In this post:

  1. Prebuilt Fedora Cloud images for XenServer
  2. Kickstart scripts for Fedora Server on XenServer

Fedora 21 was just released earlier this week on December 9, 2014. The biggest change was the separation of the distribution into three “products”:

  • Fedora Cloud, a lightweight, optimized distribution for public/private clouds and for containerization with Docker and Project Atomic.
  • Fedora Server, the traditional platform for running services, usually on a headless host, whether virtualized or on bare metal.
  • Fedora Workstation, a developer-friendly desktop running a cutting-edge OS.

Of course, as always, OpenStack/KVM and Docker get a lot of love, but Xen and XenServer are once again ignored. This post is my solution. With the prebuilt images distributed here and the kickstart scripts below, you can deploy Fedora 21 on your own XenServer (6.2+).
Continue reading “Fedora 21 on XenServer”

Found some old screenshots…

When I first came to Penn, the website for the Nominations & Elections Committee looked like this:

Old NEC site circa 2011
No, this wasn’t the year 1999… this was in 2011.

NEC website redesign

I set out to redevelop and redesign this, upgrading it from a static HTML site edited over SFTP to a WordPress CMS on Canvas. More importantly, the website redesign in 2012 needed to fit the rebranding that Penn underwent that academic year. In other words, I wanted it to look more like the university’s design. (An email to the Communications office responsible for web assets clarified that we could, in fact, do this.)

Continue reading “Found some old screenshots…”

PVHVM CentOS 7 on XenServer

In this post:

  1. Benchmarks
  2. Prebuilt image
  3. Kickstart script

Following my previous post on running CentOS 7 and Ubuntu 14.04 as fully-paravirtualized guests on XenServer, I ran some benchmarks to compare the relative performance of fully-paravirtualized (henceforth abbreviated PV) domUs against HVM guests using paravirt drivers and interrupts/timers (henceforth PVHVM).

The performance differences between the two types have been studied for some time. Once upon a time, PV was undoubtedly faster, free of the overhead associated with full hardware emulation. With newer hardware features that have been supported for the last few years, PVHVM, which takes advantage of features in the processor as well as a Linux kernel that recognizes it is operating as a virtual guest, has surpassed PV performance in most arenas.

Benchmarks

Benchmarks have severe limitations. The statistics here are only meant to be compared relatively among themselves—between the PV and PVHVM guests running exactly the same specs and software. It would be a futile exercise to compare them against VMs running on other servers, which might have better SANs, lighter workloads, or faster CPUs and RAM. The specific test profiles in the Phoronix software are also based on outdated versions of Apache httpd and nginx, which makes them unreliable for assessing real-world performance.

Some of the relevant comparisons:

It’s worth noting that CentOS 7 with a 3.10 kernel performed poorly compared to other distributions—both Fedora 20 (kernel 3.15) and Ubuntu 14.04 (kernel 3.13) outperformed CentOS on web serving workloads (not shown). But the evidence pretty conclusively showed that PVHVM generally performed better than PV on all of the operating systems.

Prebuilt image

Update (2017-04-28): Because these images are now out of date and insecure, the .xva images have been deleted. You should instead use the distribution’s latest cloud images in .qcow2 format, converted for XenServer.

To that end, I’d like to offer a prebuilt CentOS 7.0.1406 image that runs in PVHVM on XenServer. You should feel free to choose between this and the PV version from my previous post, depending on your needs. (If you need to accommodate higher density on your server, you might be better off with PV. Run benchmarks yourself to decide what you should use.)

As before, you can decompress (xz -d ___.xva.xz or use your GUI of choice) then import through XenCenter (File – Import…) or the command line (xe vm-import filename=___.xva).

This image is provided with no guarantees. Please let me know in the comments if you find an issue with it.

  • CentOS 7.0.1406 (as of 2014-07-31)
    Filename: centos-7.0.1406-20140731-pvhvm-template.xva.xz
    Size: 325 MB xz-compressed; 1.4 GB decompressed
    Specs: 2 vCPUs, 2 GB RAM, 8 GB disk without swap, installed software, with XenServer Tools 6.4.93
    SHA256 hash: c3ef221ae886cea4c3be09996d0cb2049dc2ac8f10dd5323f85beee25ce9d4cd
    MD5 hash: 44583aa3cdbf1db1c62b2db05530ce6f
    Username: centos
    Password: Asdfqwerty

Kickstart script

A PVHVM system requires no special accommodations when installing, except that UEFI and GPT are not certain to be supported. Merely select the “Other install media” option in XenCenter, and use a standard installer ISO/DVD. Do NOT use any of the CentOS or RHEL templates! Those will create PV guests.

An automated kickstart like the one used to create the image above may help you build a generic template. Hit <Tab> at the CentOS DVD menu and append a ks=__ parameter to use a kickstart script hosted at an HTTP location.
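
For example (the URL here is purely hypothetical):

ks=http://192.168.0.10/cent70-server-pvhvm.ks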

The image above was built with the cent70-server-pvhvm.ks script at revision e278f2a8139fb624bc2cdcd9a80d8b51b7910de3, linked below. If there are any updates to this script, they will be added to the develop branch on GitHub. You can also edit it yourself before deploying.

https://github.com/frederickding/xenserver-kickstart/blob/e278f2a8139fb624bc2cdcd9a80d8b51b7910de3/centos-7.0/cent70-server-pvhvm.ks

Did this help you?

If you were able to use this image or the kickstart, I’d appreciate a brief comment to let me know it worked for you. I’d hope that the bandwidth costs are going to good use!