EXOTIC SILICON
“Delving in to the depths of disklabels, and discovering differences between BSDs”
Jay looks at disk partitioning and BSD disklabels in detail
Reckless Guide
Part 6
How well do you really understand disk partitioning on BSD systems?
Beginners are often confused by this topic, and even advanced users sometimes seem to have gaps in their knowledge. Urban myths abound, so this week we'll cut through them and de-mystify things a bit with Jay's help.
This article is part of a series - check out the index.
A matter of opinion...
The disklabel describes the way that OpenBSD sees the storage volume as being partitioned, which might be completely different to the view that other operating systems have of it.
And also different to the way that you consider it to be partitioned!
Scope of this discussion
A large number of disk partitioning schemes have evolved over the years to meet the needs of changing hardware, but in this article we'll mostly be discussing those that you're likely to encounter when running a recent version of an OpenBSD system, on either an amd64 based PC, or an arm based SBC.
Since some of the details of disk partitioning are similar between OpenBSD, NetBSD, and FreeBSD systems, we'll note some of the most relevant compatibilities and differences between OpenBSD and the other systems along the way. We'll also include some historical perspective where it might be useful.
Basic knowledge - MBR partitioning
If you have any experience at all of installing an operating system on commodity PC hardware, you're almost certainly familiar with the MBR partitioning scheme. Also known as ‘fdisk’ partitioning, after the name of the program commonly used to manipulate it, (which is simply a contraction of Fixed Disk), this partitioning scheme has been used since the 1980s.
In its most basic form, the MBR partitioning scheme allows up to four partitions to be defined on a disk. Since many or even most operating systems default to configuring all of the space that is allocated to them on any particular physical disk as a single logical volume, this four partition limit was more than adequate for a machine booting one or two different operating systems from the same, and perhaps only, hard disk.
Operating systems other than BSD systems that did want to create multiple storage volumes on the same physical disk would usually use multiple MBR partitions. Linux systems, for example, often use two MBR partitions. One of them is used for both the OS itself and user data, and the second partition is used as swap space.
Basic knowledge - BSD disklabels
In contrast to this, BSD systems have traditionally divided disk space into many partitions. Historically, these partitions would also often have been located on different physical disks. This not only allowed the storage capacity to exceed that of a single disk, but also provided an opportunity to tune performance by placing heavily used partitions such as /var on higher performance disks. This is why /var is traditionally the place to store ‘variable’ data.
On non-PC architectures, where the MBR scheme was not an established standard, BSD had its own partitioning scheme known as the ‘BSD disklabel’. This originally allowed up to eight partitions to be defined, and the number was later increased to sixteen. When BSD was introduced to the PC architecture, the disklabel scheme was retained and made to work in conjunction with a single MBR partition. In this way, any other operating systems could identify the area of the disk used by the BSD system from the presence of its MBR partition, whilst the BSD system itself could define as many of its own partitions as it liked using the disklabel.
Few other operating systems use a similar notion of allocating a single MBR partition on PC architectures, and then dividing it using their own partitioning scheme. One that does is Plan 9, and although the Plan 9 partitioning scheme is somewhat different to a BSD disklabel, it has an analogous function. (Although note that when contained within an MBR partition on a disk that contains other operating systems, the Plan 9 partitioning scheme only describes the system's own partitions, in contrast to a BSD disklabel on OpenBSD where the disklabel describes the whole disk.)
The disklabel does not live within the BSD MBR partition
Urban myth alert!
Forget the false concept that the disklabel lives inside the MBR partition.
The first thing to be aware of when trying to understand how disk partitioning works on an OpenBSD system, is that the two partitioning schemes are quite separate, and yet can happily co-inhabit the same physical disk.
To explain this unfamiliar concept in simple terms to new users of OpenBSD systems, a lie is often told that the disklabel partitioning scheme somehow lives ‘inside’ the BSD MBR partition.
Whilst this is, indeed, a convenient way to think about the way that disks are partitioned when you're creating your first simple and straightforward setup, it's simply not true. This notion of the disklabel being within the MBR partition often causes problems and confusion when those same users try to create more complex partitioning schemes later on, and fail miserably.
As difficult as it might be, especially if you've been believing this lie for some time, to really grasp the concepts of disk partitioning on OpenBSD systems you should absolutely discard the mental image of the disklabel living inside the MBR partition.
Once again, the concept that the BSD MBR partition defines the whole area of the disk that the OpenBSD system will use, and that this broad and general information is used by all other installed operating systems whilst a second partitioning scheme within that MBR partition, which only the OpenBSD system reads, merely contains further details of how that chunk of the disk is used by the OpenBSD system itself, is plain wrong.
The BSD disklabel, if present, is responsible for describing the entire disk layout to the OpenBSD kernel, including space allocated to other operating systems, such as FAT, or EXT-2 partitions.
Multiple partitioning schemes on the same disk
The idea that a disk can not only contain two distinct partitioning schemes, but that they can be independent of each other and that any particular operating system might only look at one of them, might sound like a recipe for disaster. After all, surely both partition tables need to have the same overall view of the disk, and be kept in sync with each other, in order to avoid one operating system writing data over another operating system's filesystems? Surely we would need to ensure that any changes made in one partitioning scheme were accurately reflected in the other one? And what about the case where a disk has a BSD disklabel, but no BSD MBR partition? What if we have more than one BSD MBR partition? Or multiple BSD systems installed on the same disk?
In fact, the system works quite well. Having the OpenBSD kernel consult only the BSD disklabel for its partitioning information, rather than trying to build it from multiple sources, greatly simplifies the setup and reduces the risk of an unusual disk layout breaking something.
As a result, none of the scenarios described above are necessarily problematic. Assuming that you understand what is going on, of course.
GPT partitioning
GPT is intended as a replacement for the MBR partitioning scheme and addresses some of the limitations that the MBR scheme imposes. GPT partitioning uses 64-bit addressing, and allows for many more partitions to be defined on a single disk, (typically 128).
Broadly speaking, on a BSD system there are two ways that such an enhanced partitioning scheme can be used. One approach is to define a single GPT partition covering the whole disk area allocated to BSD, and then arrange for this to be sub-divided using a BSD disklabel just as was done when using MBR partitioning. Alternatively, we could dispense with the BSD disklabel scheme altogether, and simply create a separate GPT partition for each BSD partition that was traditionally defined in it.
Both approaches have advantages and disadvantages. Using the GPT partitioning scheme directly allows for other operating systems to more easily parse the disk layout and potentially access the data in the BSD partitions, without having to understand a disklabel structure.
On the other hand, the BSD disklabel is a more compact layout, which is easier to parse on smaller systems. It's also a ‘home-grown’ standard which is intimately linked to BSD, thus making it easier to adapt for future requirements by re-purposing obsolete data fields. This has already been done several times in OpenBSD, as we will see shortly. Finally, storing all of the data currently contained in a disklabel entry within a GPT might be difficult.
The disklabel structure has evolved during the development of OpenBSD, and several obsolete data fields have been re-purposed for new uses.
BSD disklabels allocate 32 bytes for free-text naming of the device, four bytes for flags, (rarely used), another four bytes mostly used for long obsolete low-level formatting purposes, and three bytes encoding parameters about the future filesystem which is to be created on the partition. These filesystem values are simply used as hints to newfs when it is invoked to initialize the new filesystem, and the actual parameters it uses are then written to the filesystem superblock. This allows the filesystem parameters to be decided in advance at the time of partitioning, and helps to ensure that if a partition is re-formatted then the same parameters are used again. On OpenBSD, each disklabel also includes an eight byte DUID.
Since GPT provides just 16 bits of space per partition for private usage, some trickery would be necessary to retain most or all of the features of the disklabel. Eliminating the free-text fields, which are less necessary in an era when hardware has other ways to report such identifying information, and the flags that are effectively never used, reduces the storage requirements to three bytes of filesystem parameters, and the DUID. GPT provides 16 bytes for a disk-wide GUID, but if the disk is shared with another OS then this identifier would not be exclusive to the BSD system, unlike the DUID in the OpenBSD disklabel. This could be considered an advantage or disadvantage depending on individual requirements. The remaining 24 bits of data define the fragment size, block size and cylinders per group. The range of common values for fragment and block size can be encoded in as little as six bits, and the cylinders per group parameter in the disklabel is virtually unused anyway.
Other BSD systems such as FreeBSD and NetBSD have adopted the GPT in such a way that the use of a traditional BSD disklabel is no longer necessary. OpenBSD has taken the other approach, and simply uses a single GPT partition in just the same way as it would use a single MBR partition. Even when using GPT, the OpenBSD kernel relies on a BSD disklabel to represent its view of the different partitions.
The upshot of all this is that unless we need to use GPT for compatibility with another operating system or because the BIOS requires it for booting, we can happily continue using MBR and a BSD disklabel, even on disks larger than 2 TB that would necessitate the use of GPT with some other operating systems.
BSD disklabels - structure and location
On the amd64 and arm architectures, the BSD disklabel is stored in a single 512 byte sector on the disk, and typically looks something like this:
00000000 57 45 56 82 0c 00 00 00 76 6e 64 20 64 65 76 69 |WEV.....vnd devi|
00000010 63 65 00 00 00 00 00 00 66 69 63 74 69 74 69 6f |ce......fictitio|
00000020 75 73 00 00 00 00 00 00 00 02 00 00 64 00 00 00 |us..........d...|
00000030 01 00 00 00 f2 05 2a 01 64 00 00 00 88 52 6a 74 |......*.d....Rjt|
00000040 f0 c7 69 61 fe 68 25 bd 00 00 00 00 00 00 00 00 |..ia.h%.........|
00000050 00 00 00 00 88 52 6a 74 00 00 00 00 00 00 00 00 |.....Rjt........|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 57 45 56 82 b8 ea 10 00 00 20 00 00 |....WEV...... ..|
00000090 00 00 01 00 00 00 20 00 00 00 00 00 00 00 00 00 |...... .........|
000000a0 07 14 01 00 d4 09 85 0b 00 00 20 00 00 00 00 00 |.......... .....|
000000b0 01 00 00 00 88 52 6a 74 00 00 00 00 00 00 00 00 |.....Rjt........|
000000c0 00 00 00 00 e0 ff 7f 00 e0 09 a5 0b 00 00 00 00 |................|
000000d0 07 14 01 00 e0 ff f8 0e c0 09 25 0c 00 00 00 00 |..........%.....|
000000e0 07 14 01 00 00 00 c0 00 a0 09 1e 1b 00 00 00 00 |................|
000000f0 07 14 01 00 00 00 20 00 a0 09 de 1b 00 00 00 00 |...... .........|
00000100 07 14 01 00 00 00 80 02 a0 09 fe 1b 00 00 00 00 |................|
00000110 07 14 01 00 00 00 40 00 a0 09 7e 1e 00 00 00 00 |......@...~.....|
00000120 07 14 01 00 00 00 c0 00 a0 09 be 1e 00 00 00 00 |................|
00000130 07 14 01 00 c0 ff 7f 25 c0 09 7e 1f 00 00 00 00 |.......%..~.....|
00000140 07 1c 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The disklabel shown describes the following disk:
# /dev/rvnd0c:
type: vnd
disk: vnd device
label: fictitious
duid: f0c76961fe6825bd
flags:
bytes/sector: 512
sectors/track: 100
tracks/cylinder: 1
sectors/cylinder: 100
cylinders: 19531250
total sectors: 1953125000 # total bytes: 931.3G
boundstart: 0
boundend: 1953125000
drivedata: 0
16 partitions:
# size offset fstype [fsize bsize cpg]
a: 1.0G 0 4.2BSD 2048 16384 1
b: 92.2G 2097152 swap
c: 931.3G 0 unused
d: 4.0G 195365344 4.2BSD 2048 16384 1
e: 119.8G 203753920 4.2BSD 2048 16384 1
f: 6.0G 454953376 4.2BSD 2048 16384 1
g: 1.0G 467536288 4.2BSD 2048 16384 1
h: 20.0G 469633440 4.2BSD 2048 16384 1
i: 2.0G 511576480 4.2BSD 2048 16384 1
j: 6.0G 515770784 4.2BSD 2048 16384 1
k: 300.0G 528353728 4.2BSD 4096 32768 1
In the hexdump, the DUID occupies the eight bytes at offset 0x40, and the parameters for partition ‘a’ occupy the sixteen bytes starting at offset 0x94. The other partitions follow sequentially, each being described by 16 bytes. These disklabel features, together with the 32-bit magic of 0x57, 0x45, 0x56, 0x82, are usually enough to make it obvious when you're looking at a BSD disklabel structure on a disk, although note that the magic is also repeated at offset 0x84 into the label.
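To make the layout a little more concrete, here is a minimal sketch that decodes a few fields from the hexdump above by hand. The helper functions le16 and le32 are our own invention for this illustration and are not part of OpenBSD. The offsets in the comments are simply where the bytes sit in the hexdump, and the values agree with the disklabel output shown above.

```shell
#!/bin/sh
# Hypothetical helpers: join little-endian bytes, given as two-digit hex
# values in on-disk order, into a single decimal number.
le16() { echo $(( 0x$2$1 )); }
le32() { echo $(( 0x$4$3$2$1 )); }

# Header fields, copied byte for byte from the hexdump.
printf 'magic:         0x%08x\n' "$(le32 57 45 56 82)"   # offset 0x00
printf 'bytes/sector:  %d\n' "$(le32 00 02 00 00)"       # offset 0x28
printf 'total sectors: %d\n' "$(le32 88 52 6a 74)"       # offset 0x3c
printf 'partitions:    %d\n' "$(le16 10 00)"             # offset 0x8a

# Partition 'a': size, offset, then the filesystem type byte.
printf 'a size:        %d sectors\n' "$(le32 00 00 20 00)"   # offset 0x94
printf 'a offset:      %d\n' "$(le32 00 00 00 00)"           # offset 0x98
printf 'a fstype:      %d\n' "$(( 0x07 ))"                   # offset 0xa0
```

The size of 2097152 sectors of 512 bytes is the 1.0G that disklabel reports for partition ‘a’, and the fstype value of 7 is what disklabel prints as 4.2BSD.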
On OpenBSD, the basic structure of the BSD disklabel is defined in /usr/src/sys/sys/disklabel.h. This much is somewhat standardised across hardware architectures, and the byte locations of some of the parameters are the same in other BSD operating systems, so there is limited compatibility with NetBSD, for example.
However, the location on disk where the disklabel is actually written is not consistent between architectures. The relevant architecture-specific defines are in /usr/src/sys/arch/*/include/disklabel.h, and some architectures also have a fair amount of code in /usr/src/sys/arch/*/*/disksubr.c to handle disklabels in formats that their native operating systems use.
In theory, the lack of a standard location to store the BSD disklabel means that a machine of one architecture may not read BSD disklabel information from a disk which has been partitioned on a machine of a different architecture.
In practice, though, if we look through the disklabel.h header file for each of the hardware architectures currently supported by OpenBSD, we find that all of them except alpha, luna88k, and sparc64 define the same values for sector and offset. As a result, disk-interchange problems are likely to be minimal.
Fun fact!
Magic numbers
The same magic number of 0x82 0x56 0x45 0x57 is used by the disklabel structures in OpenBSD, NetBSD, and FreeBSD, even though the structures are different and not entirely compatible. As a 32-bit value this is 0x82564557, which a little-endian machine stores on disk as the bytes 57 45 56 82 seen in the hexdump. DragonFly BSD also uses 0x82 0x56 0x45 0x57 in its 32-bit disklabel structure, although it does have a different magic number for its newer 64-bit disklabel format.
On both amd64 and arm, sector 0 is reserved for MBR partitioning data, and also boot code in the case of amd64. If a disk only has a BSD disklabel, and no MBR partitions at all, then the BSD disklabel will usually be written to sector 1. If sector 1 is in use by GPT, but no GPT partition is defined for OpenBSD, then recent versions of OpenBSD will write a BSD disklabel elsewhere. Older versions of OpenBSD would refuse to write a disklabel to such a disk.
If an MBR partition for a BSD system is present, then the disklabel for that OS usually lives in sector 1 of that partition.
BSD disklabels - compatibility between BSDs
Although the on-disk format of the BSD disklabel is, for many purposes, broadly compatible between OpenBSD and NetBSD, a disklabel in sector 1 of an MBR partition of type A9, (NetBSD), will be ignored by the OpenBSD kernel. When presented with such a disk, the OpenBSD system will construct a ‘default’ or ‘spoofed’ disklabel based on the data in the MBR, just as it would do if no BSD disklabel was present.
Looking at an installation image for NetBSD 9.2, (for the arm64 architecture), using /sbin/disklabel on an OpenBSD machine, we see the following:
# /dev/rvnd0c:
type: vnd
disk: vnd device
label: fictitious
duid: 0000000000000000
flags:
bytes/sector: 512
sectors/track: 100
tracks/cylinder: 1
sectors/cylinder: 100
cylinders: 23540
total sectors: 2354048
boundstart: 0
boundend: 2354048
drivedata: 0
16 partitions:
# size offset fstype [fsize bsize cpg]
c: 2354048 0 unused
i: 163840 32768 MSDOS
j: 2157440 196608 unknown
NetBSD installation image read by OpenBSD
As this is an image intended for arm64-based systems, it has a FAT partition defined in the MBR, (which is used to hold boot code for this architecture), as well as the NetBSD partition. This can be confirmed by looking at the output of /sbin/fdisk:
Disk: vnd0 geometry: 23540/1/100 [2354048 Sectors]
Offset: 0 Signature: 0xAA55
Starting Ending LBA Info:
#: id C H S - C H S [ start: size ]
-------------------------------------------------------------------------------
*0: 0C 327 0 69 - 1966 0 8 [ 32768: 163840 ] Win95 FAT32L
1: A9 1966 0 9 - 23540 0 48 [ 196608: 2157440 ] NetBSD
2: 00 0 0 0 - 0 0 0 [ 0: 0 ] unused
3: 00 0 0 0 - 0 0 0 [ 0: 0 ] unused
The same image as read by /sbin/fdisk on OpenBSD
The OpenBSD kernel has simply created its own BSD disklabel from the data in the MBR, and just as the MBR FAT partition is assigned to disklabel partition ‘i’, the NetBSD MBR partition is assigned to disklabel partition ‘j’, with the filesystem type set to unknown.
The code that does this parsing of MBR partitions to automatically create a disklabel when none is present can be found in /usr/src/sys/kern/subr_disk.c.
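As a rough illustration of the allocation just described, the following sketch assigns disklabel letters from ‘i’ onwards to the MBR entries of the NetBSD image. This is a deliberately simplified model written for this article, not a transcription of the kernel code. The function name spoof_letters is hypothetical, and it only knows about the type bytes that appear in the fdisk output above, whereas the real code recognises more types and performs additional sanity checks.

```shell
#!/bin/sh
# Simplified model: walk the MBR type bytes in table order and hand out
# disklabel letters starting at 'i'.
spoof_letters() {
	letters="i j k l m n o p"
	for typ in "$@"; do
		case "$typ" in
		00|A6) continue ;;		# unused slots and OpenBSD partitions get no entry
		0C) fstype=MSDOS ;;		# the FAT partition in the example image
		*) fstype=unknown ;;	# anything not recognised, such as A9
		esac
		echo "${letters%% *}: $fstype"	# first remaining letter
		letters=${letters#* }			# drop it from the list
	done
}

# The four MBR entries of the NetBSD installation image.
spoof_letters 0C A9 00 00
```

For the example image, this prints an ‘i’ entry of type MSDOS followed by a ‘j’ entry of type unknown, matching the spoofed disklabel shown earlier.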
However, as we might expect, a BSD disklabel exists in the second sector of the NetBSD partition.
This can be copied to sector 1 of the whole disk quite easily:
# dd if=/dev/vnd0c of=label skip=196609 count=1
# dd if=label of=/dev/vnd0c seek=1
Copying the disklabel to the sector at absolute offset 1
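The skip value used above is not arbitrary. A minimal sketch of the arithmetic, assuming 512 byte sectors and taking the start of the NetBSD MBR partition from the fdisk output, looks like this. The variable names are ours, chosen purely for illustration.

```shell
#!/bin/sh
# Start of the NetBSD MBR partition in sectors, from the fdisk output.
part_start=196608

# The disklabel lives in the second sector of that partition.
label_sector=$(( part_start + 1 ))

echo "skip=$label_sector"                           # skip=196609
echo "byte offset: $(( label_sector * 512 ))"       # byte offset: 100663808
```

The same calculation applies to any other MBR partition that holds a BSD disklabel in its second sector. Only the starting sector changes.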
With that done, at first glance, our OpenBSD system now seems to be happily reading and parsing this copy of the NetBSD disklabel:
# /dev/rvnd0c:
type: SCSI
disk: STORAGE DEVICE
label: fictitious
duid: 0000000000000000
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 1149
total sectors: 2354048
boundstart: 0
boundend: 0
drivedata: 0
8 partitions:
# size offset fstype [fsize bsize cpg]
a: 2157440 196608 4.2BSD 0 0 0
c: 2354048 0 unused
e: 163840 32768 MSDOS
It works!
At least to a certain extent...
Alternative method
Changing the partition type
Instead of copying the disklabel elsewhere, we could obviously also have changed the partition type of the NetBSD MBR partition from A9, (NetBSD), to A6, (OpenBSD), and our OpenBSD machine would have then read the disklabel in its original location.
Note that in this particular example, only one native BSD filesystem partition is defined in the NetBSD disklabel anyway, so we still only see a single ‘a’ partition of type 4.2BSD, but it's now recognised as the root partition. The FAT partition is assigned to disklabel partition ‘e’ rather than ‘i’, as the information defining it is now being read from the BSD disklabel, and so the code that automatically allocates such partitions from ‘i’ onwards is redundant.
At this point, we can mount the NetBSD FFS root partition, and read files contained on it.
BSD disklabels - compatibility caveats
As we have just seen, there is a degree of basic compatibility between the structure of disklabels on OpenBSD and NetBSD systems.
However, although the data for the actual partition parameters was correctly read in our example above, some other fields of the disklabel which differ between the two BSDs warrant our attention.
We can see that the on-disk DUID has been read as 0000000000000000. If we attempt to write a DUID of 0000000000000000 using /sbin/disklabel on an OpenBSD machine, a new random DUID will be created instead, so seeing 0000000000000000 in the disklabel output on an OpenBSD machine usually implies that we are looking at a spoofed disklabel. However, the OpenBSD kernel will also happily read and report a DUID of 0000000000000000 that is actually stored on-disk.
In fact, the NetBSD disklabel doesn't use these bytes to store a DUID at all. Instead, they are used to store information about sectors reserved as spares, and cylinders reserved as spares or for other purposes. Since modern hardware doesn't really need to deal with the concept of sparing at the OS level, these bytes will usually be set to 0x00, but in any case, if we want to ensure reliable operation of the disklabel on both systems, we should either avoid writing a DUID to this disklabel or overwrite it with 0x00 bytes again before transferring it back to a NetBSD machine.
Fun fact!
Support for DUIDs
Support for DUIDs in OpenBSD was added in 2010, in version 1.47 of disklabel.h. This change moved the location of d_acylinders, a field which also exists in the disklabel structure on NetBSD, but in a different byte location.
Less obviously, the bounds for the OpenBSD area are also set to zero. The bytes that store this information in a disklabel created on an OpenBSD machine were previously used to store track and cylinder skew values, as well as head switching time, and track seeking times. This was changed in revision 1.45 of disklabel.h in OpenBSD, but the bytes retain their original purpose on NetBSD.
Once again, modern systems will almost certainly have these bytes set to zero, as the concept of sector skewing relates to an aspect of low-level disk formatting which has been opaque to everything outside of the disk's on-board controller for decades.
The fragment and block sizes in the disklabel are actually set to zero. These values are defined in the p_fragblock field on OpenBSD, which exists in the same byte position as p_frag on NetBSD. However, these parameters are also defined in the FFS filesystem superblock, so their presence in the disklabel is not strictly necessary to access an existing filesystem contained in the partition. For reference, the definition for the superblock structure can be found in /usr/src/sys/ufs/ffs/fs.h within the OpenBSD source code.
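For the curious, the single p_fragblock byte can be unpacked with a few shifts. The sketch below assumes that the encoding matches the DISKLABELV1_FFS_BSIZE and DISKLABELV1_FFS_FRAG macros in disklabel.h, so do check it against your own source tree. The function name fragblock is hypothetical. The sample bytes 0x14 and 0x1c are taken from the partition entries in the hexdump shown earlier, and 0x00 is what we find in the copied NetBSD label.

```shell
#!/bin/sh
# Hypothetical helper: decode a p_fragblock byte, printing "fsize bsize".
fragblock() {
	fb=$(( $1 ))
	# A zero byte, or one with no fragment bits set, carries no information.
	if [ "$fb" -eq 0 ] || [ $(( fb & 7 )) -eq 0 ]; then
		echo "0 0"
		return
	fi
	bsize=$(( 1 << ((fb >> 3) + 12) ))	# upper five bits: block size
	frag=$(( 1 << ((fb & 7) - 1) ))		# lower three bits: fragments per block
	echo "$(( bsize / frag )) $bsize"
}

fragblock 0x14	# partitions 'a' to 'j' in the earlier hexdump: 2048 16384
fragblock 0x1c	# partition 'k': 4096 32768
fragblock 0x00	# the copied NetBSD label: 0 0
```

The first two results agree with the fsize and bsize columns printed by disklabel for the example disk, and the last one explains the zeros reported for the NetBSD ‘a’ partition.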
Another issue to consider is that the disklabel structure on both NetBSD and FreeBSD is limited to 32-bit values for the size of the disk, so it can only directly address 2 TB, (assuming a 512 byte sector size). OpenBSD has expanded this to a 48-bit value, which should allow up to 128 PB to be addressed, although the manual page for disklabel(8) states that the maximum size for disks and partitions is 64 PB.
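These limits follow directly from the width of the sector count. As a quick sanity check, and assuming a shell that evaluates arithmetic with at least 64-bit integers, we can reproduce the figures using binary units, (2^40 bytes per terabyte and 2^50 bytes per petabyte). The variable names are only for illustration.

```shell
#!/bin/sh
# Assumes the shell evaluates $(( )) with at least 64-bit integers.
sector_bytes=512

# 32-bit sector count, expressed in units of 2^40 bytes.
limit32=$(( ((1 << 32) * sector_bytes) >> 40 ))
# 48-bit sector count, expressed in units of 2^50 bytes.
limit48=$(( ((1 << 48) * sector_bytes) >> 50 ))

echo "32-bit limit: $limit32 TB"	# 32-bit limit: 2 TB
echo "48-bit limit: $limit48 PB"	# 48-bit limit: 128 PB
```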
DragonFly BSD is interesting here, as it has both a 32-bit disklabel which is basically the same as NetBSD, as well as a new 64-bit disklabel format which is totally different. Here, if the 64-bit disklabel is contained within an MBR partition, it actually does define everything else relative to the start of that MBR partition rather than relative to the whole disk. Additionally, the offsets are in bytes and not sectors, so the addressable capacity reaches 16 EB.
BSD disklabels - enhancements in OpenBSD
In OpenBSD, the disklabel format has been updated and enhanced to meet the requirements of modern systems. Since this has mostly been achieved by re-purposing obsolete and little-used fields of the disklabel, a degree of backwards compatibility exists.
As a result of this, we can usually read and parse disklabels from disks that have been partitioned on NetBSD systems, as long as we ensure that the disklabel is in a location where it will be noticed by the OpenBSD kernel. However, modifications, such as writing a DUID to the media, might make the disklabel unusable on the original system.
Whereas the documentation for NetBSD and FreeBSD recommends the use of alternative partitioning schemes for disks larger than 2 TB on those systems, large disks that will be used exclusively with OpenBSD can be successfully partitioned using a BSD disklabel alone.
Of course, it may still be desirable to create an MBR partition on such a disk, so that any other operating system accessing it doesn't assume that the disk is completely unused. It might also be necessary to satisfy some BIOSes which expect to see an MBR structure on each connected disk, especially if we hope to be able to boot from it. Since the MBR partitioning scheme is also limited to 32-bit values, if a single MBR partition of type A6, (OpenBSD), is created on such a disk, it can only be made to span the area which is addressable with 32 bits. This will, however, likely be sufficient to prevent another OS from overwriting data in the OpenBSD area.
By default, invoking /sbin/disklabel on a disk with an MBR partition of type A6, (OpenBSD), but no BSD disklabel, will cause the bounds of the OpenBSD area to be set to those of the OpenBSD MBR partition. To create disklabel partitions that span the area past the 32-bit limit, these bounds need to be changed. This can easily be done, either by using the ‘b’ command in the interactive editor, or otherwise changing the values of boundstart and boundend when manually editing the disklabel.
Advanced topic - sharing some space between BSD systems
We've seen that we can place more than one BSD disklabel on the same physical disk, for example in sector 1 of an OpenBSD MBR partition, and sector 1 of a NetBSD MBR partition. Each operating system will ignore the disklabel of the other.
We've also seen that the disklabel defines partitions across the entire disk, and not just within a particular MBR partition.
As a result, it's perfectly possible to create a disk that contains several FFS filesystems, some of which are accessible from both OpenBSD and NetBSD, and others which are only accessible from one of the two operating systems.
This could be used, for example, if we wanted to prepare a USB flash drive that contained source code and binaries for various utilities, with two sets of binaries, one for each operating system, in such a way that the correct binaries always appeared on the ‘d’ partition, and the source code on the ‘e’ partition, without the need to store two identical copies of the source code archives.
Working out how to create such a disk is left as an exercise to the reader.
Recovering an overwritten disklabel with the in-memory copy
One day, you might unintentionally erase the beginning of a disk by supplying the wrong argument to a command such as this:
Overwrite the beginning of a storage volume:
# dd if=/dev/zero of=/dev/rsd9c bs=1m count=256
Probably not the best start to your day...
In such cases, the on-disk copy of the disklabel will most likely have been overwritten, assuming that the OpenBSD partition was located at the start of the disk.
However, if you realise your mistake quickly enough and abort the dd process, or alternatively had specified a fairly low number for the count parameter, a large amount of your data is likely still intact.
In these cases do not immediately reboot the system, as the kernel keeps a copy of the disklabel in memory, and this can be used to restore the on-disk copy that has been overwritten by direct access to the raw device.
We can view the in-memory copy of the disklabel easily enough, as this is what /sbin/disklabel uses by default:
# disklabel sd9
View the in-memory copy of the disklabel
If the disk that has been partially overwritten was the main boot and root disk, you might be lucky enough that /sbin/disklabel is still accessible.
If so, it's probably best to manually write down the partition offsets before either booting into a fresh installation on another disk, or transferring the overwritten disk to another system for data recovery. If the root partition has been overwritten, then the system is likely to become unstable quite quickly.
On the other hand, if the disk did not contain the root filesystem, we can easily write the in-memory copy of the disklabel to a file:
# disklabel sd9 > /root/partitions
Write the in-memory copy of the disklabel to a file
We could also write it directly back to the disk itself:
# disklabel sd9 | disklabel -R sd9 /dev/stdin
Write the in-memory copy of the disklabel back to the disk
However, remember that the MBR will also have been overwritten.
If there was previously an OpenBSD MBR partition on the disk, then the disklabel would have been stored in its second sector. As the MBR is now blank, the above command will write the disklabel to sector 1 of the physical disk, instead of where it originally was. This will, of course, allow the partitions to be accessed within OpenBSD, but if we later create an OpenBSD MBR partition, then from that point onwards the kernel will expect to find the disklabel in sector 1 of that partition, so we will need to copy it there.
Important!
Invoking fdisk and writing MBR or GPT partition data will invalidate the in-memory copy of the old disklabel, so be sure to record the data from it somewhere before trying to re-create any MBR or GPT partitions!
Of course, if the disk was previously formatted with GPT on OpenBSD, there will be a backup of the GPT at the end of the disk which can be used to recreate the primary GPT.
Multiple MBR partitions of type A6
Although space for 16 disklabel partition entries per physical storage device is enough for most use-cases, it can sometimes be a limitation. This is especially true on the main system disk, which in many machines is the only disk present.
Naïve users often wonder if it's possible to create multiple MBR partitions of type A6, (OpenBSD), and in this way gain extra sets of 16 disklabel partitions. After reading this article, it should be immediately obvious why this approach won't work.
However, there are other ways to make good use of more than one OpenBSD partition, and there are also some indirect ways to get more than 16 disklabel partitions on a single physical storage volume, (which don't involve creating a second MBR partition for OpenBSD).
The code that looks for the OpenBSD MBR partition when building a disklabel is in /usr/src/sys/kern/subr_disk.c, and we can clearly see that once an MBR partition of type A6 has been found, any subsequent additional partitions of the same type will be ignored. Unlike unrecognised partition types, they won't even be allocated a disklabel partition entry in the ‘i’ to ‘p’ range.
Handy hint!
Multiple MBR partitions of type A6 can be used to allocate non-contiguous areas of disk space to an OpenBSD installation.
This is actually a good thing, as it means that we can use multiple MBR partitions to reserve an area made up of non-contiguous sections of the physical disk for use with OpenBSD. If we had an 80 GB disk, with the first 40 GB allocated to OpenBSD, followed by two 20 GB FAT partitions, we might later decide to re-purpose the last 20 GB for use with OpenBSD as well. By changing the type of the second FAT partition to A6, the area it covers remains allocated in the MBR, but the OpenBSD kernel will by default simply ignore it. However, by changing the boundend parameter in the disklabel to point to the end of the second OpenBSD MBR partition, we will be able to create disklabel partitions in this area, (after removing any reference to the second FAT partition that was previously defined within the disklabel).
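The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 512-byte sectors, a gigabyte taken as 2^30 bytes, the first partition starting at sector 0, (a real layout would leave some sectors free at the start of the disk), and boundend treated as the first sector past the usable area. The layout list and the openbsd_bounds function are hypothetical names.

```python
SECTOR_SIZE = 512
GIB = 2 ** 30


def sectors(gib):
    """Convert a size in GiB to a count of 512-byte sectors."""
    return gib * GIB // SECTOR_SIZE


# (MBR type, first sector, sector count) for the example disk:
# 40 GB OpenBSD, 20 GB FAT, then the re-purposed 20 GB, now type A6.
layout = [
    (0xA6, 0, sectors(40)),
    (0x0C, sectors(40), sectors(20)),
    (0xA6, sectors(60), sectors(20)),
]


def openbsd_bounds(layout):
    """Return (boundstart, default_boundend, extended_boundend).

    default_boundend is the end of the first A6 partition only.
    extended_boundend is the end of the last A6 partition, which is
    the value we would set by hand to reach the re-purposed area.
    """
    a6 = [(start, size) for ptype, start, size in layout if ptype == 0xA6]
    boundstart = a6[0][0]
    default_end = a6[0][0] + a6[0][1]
    extended_end = max(start + size for start, size in a6)
    return boundstart, default_end, extended_end
```

Note that the extended range also spans the first FAT partition, so it is still up to us to avoid defining disklabel partitions that overlap it.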
Obviously, it's probably much easier and more convenient to avoid creating such non-standard disk layouts, but the discussion above certainly demonstrates a possible use case for multiple MBR partitions of type A6.
More than 16 disklabel partitions
For systems that are going to be installed on a softraid crypto volume, for full disk encryption, one obvious way to increase the number of disklabel partitions available is to create two or more RAID partitions on the unencrypted volume instead of just one. Since each softraid crypto volume appears as a separate device, each one can have its own disklabel with up to 16 partitions.
It's also possible to create softraid concat volumes with single chunks, effectively making areas of a disk appear as separate devices. Since each separate device can have its own disklabel, more partitions can be created overall.
If all of the disk space is already allocated, and re-partitioning is not a practical option, then creating vnode pseudo devices using the vnd driver is another possibility.
But of course, these are just workarounds, and a much better solution would be to actually expand the number of available partition entries in the disklabel. If we look again at the hexdump of the disklabel presented earlier, we can see that there is clearly enough space for another six partition entries:
00000000 57 45 56 82 0c 00 00 00 76 6e 64 20 64 65 76 69 |WEV.....vnd devi|
00000010 63 65 00 00 00 00 00 00 66 69 63 74 69 74 69 6f |ce......fictitio|
00000020 75 73 00 00 00 00 00 00 00 02 00 00 64 00 00 00 |us..........d...|
00000030 01 00 00 00 f2 05 2a 01 64 00 00 00 88 52 6a 74 |......*.d....Rjt|
00000040 f0 c7 69 61 fe 68 25 bd 00 00 00 00 00 00 00 00 |..ia.h%.........|
00000050 00 00 00 00 88 52 6a 74 00 00 00 00 00 00 00 00 |.....Rjt........|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 57 45 56 82 b8 ea 10 00 00 20 00 00 |....WEV...... ..|
00000090 00 00 01 00 00 00 20 00 00 00 00 00 00 00 00 00 |...... .........|
000000a0 07 14 01 00 d4 09 85 0b 00 00 20 00 00 00 00 00 |.......... .....|
000000b0 01 00 00 00 88 52 6a 74 00 00 00 00 00 00 00 00 |.....Rjt........|
000000c0 00 00 00 00 e0 ff 7f 00 e0 09 a5 0b 00 00 00 00 |................|
000000d0 07 14 01 00 e0 ff f8 0e c0 09 25 0c 00 00 00 00 |..........%.....|
000000e0 07 14 01 00 00 00 c0 00 a0 09 1e 1b 00 00 00 00 |................|
000000f0 07 14 01 00 00 00 20 00 a0 09 de 1b 00 00 00 00 |...... .........|
00000100 07 14 01 00 00 00 80 02 a0 09 fe 1b 00 00 00 00 |................|
00000110 07 14 01 00 00 00 40 00 a0 09 7e 1e 00 00 00 00 |......@...~.....|
00000120 07 14 01 00 00 00 c0 00 a0 09 be 1e 00 00 00 00 |................|
00000130 07 14 01 00 c0 ff 7f 25 c0 09 7e 1f 00 00 00 00 |.......%..~.....|
00000140 07 1c 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
There is enough space for six more entries!
In the hexdump above, the 256 bytes defining the parameters of the first sixteen partitions run from offset 0x94 to 0x193. The following 96 spare bytes, from offset 0x194 to 0x1f3, are enough to define a further six partitions.
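As a cross-check of those numbers, the following Python sketch decodes one partition entry and counts how many extra whole entries would fit in the sector. The field layout used here, (32-bit size and offset, 16-bit high words for each, then fstype, fragblock and cpg), is my reading of struct partition in /usr/src/sys/sys/disklabel.h, and the 0x94 starting offset is taken from the hexdump, so treat both as assumptions to verify against your own source tree. The sixteen bytes in partition_a are copied from offsets 0x94 to 0xa3 of the dump, and the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
PARTITIONS_OFFSET = 0x94          # where the first entry starts in the dump
ENTRY_FORMAT = "<IIHHBBH"         # size, offset, offseth, sizeh, fstype, fragblock, cpg
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)
MAXPARTITIONS = 16


def decode_entry(raw):
    """Decode one 16-byte partition entry into size, offset and fstype."""
    size_lo, offset_lo, offset_hi, size_hi, fstype, _fragblock, _cpg = struct.unpack(
        ENTRY_FORMAT, raw
    )
    return {
        "size": (size_hi << 32) | size_lo,       # in sectors
        "offset": (offset_hi << 32) | offset_lo,  # in sectors
        "fstype": fstype,
    }


def spare_entries(maxpartitions=MAXPARTITIONS):
    """Count the whole entries that fit after the existing table in one sector."""
    used = PARTITIONS_OFFSET + maxpartitions * ENTRY_SIZE
    return (SECTOR_SIZE - used) // ENTRY_SIZE


# Bytes 0x94-0xa3 of the hexdump: the entry for partition 'a'.
partition_a = bytes.fromhex("00 00 20 00 00 00 00 00 00 00 00 00 07 14 01 00")
```

Decoding partition_a gives a size of 2097152 sectors starting at offset 0, and spare_entries() confirms that six more 16-byte entries fit before the end of the 512-byte sector, with a few bytes left over.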
The number of disklabel partitions that the kernel will parse is controlled by the MAXPARTITIONS definition. In theory, this is architecture-specific, as it is defined in /usr/src/sys/arch/*/include/disklabel.h. In practice, though, all of the currently supported hardware architectures have had MAXPARTITIONS defined as 16 for a very long time.
However, before rushing to change this kernel value, we should realise that it is referenced in the define for DISKMINOR in /usr/src/sys/sys/disklabel.h. At a minimum, the device special files in /dev/ that relate to disks would have to be changed to support the new mapping of minor numbers. Other changes might also be necessary for reliable operation.
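To see why the device special files would need to change, here is a small Python model of the mapping. It assumes that DISKMINOR computes unit * MAXPARTITIONS + part, and that the reverse mapping is a simple division and remainder, which is how I read the macros in disklabel.h. The Python function names are hypothetical stand-ins, not kernel interfaces.

```python
def disk_minor(unit, part, maxpartitions=16):
    """Model of DISKMINOR: minor number for a given disk unit and partition."""
    return unit * maxpartitions + part


def decode_minor(minor, maxpartitions=16):
    """Reverse mapping: return (unit, part) for a minor number."""
    return divmod(minor, maxpartitions)


# Partition 'a' of the second disk, (unit 1, part 0), under both settings.
old_minor = disk_minor(1, 0, maxpartitions=16)
new_minor = disk_minor(1, 0, maxpartitions=22)

# A device node created with the old minor number, as seen by a kernel
# built with MAXPARTITIONS set to 22.
stale_node = decode_minor(old_minor, maxpartitions=22)
```

Under this model, a node that used to mean the first partition of the second disk would be interpreted as a partition beyond 'p' on the first disk, which is why the existing nodes could not simply be left in place.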
Summary
This week we looked at some interesting details of the BSD disklabel structure as it is implemented in OpenBSD, and how this implementation differs from that of other BSD systems. We saw that on an OpenBSD system, it's usually possible to define the desired partitioning scheme simply and easily using just a BSD disklabel, without the need for GPT, and we also looked at various ways to create more than sixteen disklabel partitions on a single physical disk.
Next week, we'll be busy creating our own encryption keys and self-signed certificates, then putting them to good use for email delivery, web serving, and ipsec tunnels. Not only that, but we'll set up our own private certificate authority as well! On top of all that, we'll be adding in some techniques for debugging connections, and learning how to get a real CA-signed certificate in a few simple steps! It sounds like a lot of fun, so don't forget to come back and join us!
IN NEXT WEEK'S INSTALLMENT, JAY WILL BE DE-MYSTIFYING TLS KEYS AND CERTIFICATES. DON'T MISS IT!