Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up GPT partition tables automatically #119

Merged
merged 3 commits into from
Dec 8, 2024

Conversation

DemiMarie
Copy link
Contributor

@DemiMarie DemiMarie commented Nov 17, 2024

This has many problems:

  • Old partition tables are not cleared before writing the partition table entry that would overwrite them.
  • An invalid partition table header is not written before the partition table entries that the valid one would refer to.
  • A valid partition table header is written before the entries it refers to, not after.
  • CRC32 checksums are not checked when reading (but are created when writing). are now always checked.
  • Lots and lots of debug prints.
  • No tests (it does pass a local test, though).
  • No installation (so this file would not even be included in a built package!).
  • The fixer is recompiled even when there is no need to rebuild it.
  • The default CFLAGS are meant for local development & debugging.
  • NDEBUG is forcibly undefined.

Fixes: QubesOS/qubes-issues#4974

@marmarek
Copy link
Member

This fails to build on several distributions due to cc1: error: unrecognized command-line option ...-std=c23...

Copy link
Member

@marmarek marmarek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read the older version, and after you pushed b3e1f4d, I've removed comments you addressed already (before me posting them ;) ).

gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated
Comment on lines 876 to 880
new_gpt = gpt_grow_sectors(used_primary ? sectors - 1 : 1, sector_size,
new_sector_size, sectors, gpt, &to_write);
} else {
new_gpt = gpt_shrink_sectors(used_primary ? sectors - 1 : 1, sector_size,
new_sector_size, sectors, gpt, &to_write);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take location of the other GPT from the header, instead of hardcoding to sectors - 1 - otherwise it will explode on first boot after resizing disk (if that happens to be also first boot on a disk with different sector size). Same in few other locations below.
(don't try to adjust GPT size to the actual disk size - leave that to the OS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backup GPT is only looked at if the primary GPT is invalid. If the primary GPT is valid, the backup GPT is written before the primary GPT is. Since a GPT cannot be written atomically, this requires that the backup GPT must be written at the end of the disk, so that if there is a problem (such as power failure) either the backup GPT is now valid (at the new sector size) or the old primary GPT (with the old sector size) is still present. Furthermore, writing a GPT will clobber any other GPTs at the same end of the disk, so if the primary GPT was used the backup GPT must be written first and visa versa. Therefore, adjusting the GPT size to the actual disk size is unavoidable, unless I am missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backup GPT is only looked at if the primary GPT is invalid.

That's not true. At the very least, fdisk will complain if backup GPT doesn't match the primary GPT even if primary GPT is valid.

Anyway, I'm not going to argue about this too much, it's a very rare case anyway (resizing the disk and changing sector size at the same time). Just ensure you don't resize GPT if you don't change sector size (I think this case is okay currently).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a corner case, but it is still a case I want to handle correctly. Can you explain, in detail, how this could go wrong?

My understanding is:

  • Tools must create a primary and backup GPT.
  • Tools must accept a primary GPT that is valid, even if the backup GPT is invalid.
  • Tools must use the primary GPT if it is valid.
  • If the primary GPT is valid but the backup GPT is invalid or visa versa, tools must not reject the GPT altogether but are allowed to issue a warning.

Do real-world tools behave differently than this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take location of the other GPT from the header, instead of hardcoding to sectors - 1 - otherwise it will explode on first boot after resizing disk (if that happens to be also first boot on a disk with different sector size). Same in few other locations below.

Do you mean that if the primary GPT header is valid, but the primary GPT entries are invalid, the backup GPT header should be read from AlternateLBA in the primary header?

Most of the places where sectors - 1 is written refer to either:

  1. Where to write the backup GPT. The backup GPT is meant as a backup if the primary GPT is bad, so it needs to be written at the well-known location regardless of where the primary GPT is.
  2. Where to find the backup GPT if the primary GPT is bad. In this case (such as if e.g. the primary header CRC32 checksum is invalid), there is no point (that I can think of) in using the primary GPT header’s AlternateLBA at all, as it’s presumably garbage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(don't try to adjust GPT size to the actual disk size - leave that to the OS)

Ah, qubes-rootfs-resize.service. That could also use this tool to find the partition to resize, instead of hardcoding 3 (but if you want to hardcode 3, that’s fine too).

gptfixer/gpt.c Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
gptfixer/gpt.c Outdated Show resolved Hide resolved
dracut/simple/init.sh Outdated Show resolved Hide resolved
@marmarek
Copy link
Member

(the current version fails to build)

@codecov-commenter
Copy link

codecov-commenter commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 41.56118% with 277 lines in your changes missing coverage. Please review.

Project coverage is 60.56%. Comparing base (7e2d762) to head (f2517cc).

Files with missing lines Patch % Lines
gptfixer/gpt.c 41.56% 187 Missing and 90 partials ⚠️

❗ There is a different number of reports uploaded between BASE (7e2d762) and HEAD (f2517cc). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (7e2d762) HEAD (f2517cc)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #119       +/-   ##
===========================================
- Coverage   79.45%   60.56%   -18.89%     
===========================================
  Files           5        6        +1     
  Lines         477      951      +474     
  Branches        0      104      +104     
===========================================
+ Hits          379      576      +197     
- Misses         98      285      +187     
- Partials        0       90       +90     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DemiMarie DemiMarie requested a review from marmarek November 20, 2024 05:53
@DemiMarie DemiMarie marked this pull request as ready for review November 20, 2024 06:17
@DemiMarie DemiMarie changed the title Draft: Fix up GPT partition tables automatically Fix up GPT partition tables automatically Nov 20, 2024
@DemiMarie
Copy link
Contributor Author

This is now ready for review, and I recommend backporting to R4.2 if possible. It’s a rather large change, but the bug it fixes is somewhat embarrassing.

@marmarek marmarek added the openqa-pending PR to be tested in the next OpenQA run label Nov 20, 2024
@marmarek
Copy link
Member

Debian 12 HVM fails to start with in-vm kernel (with no sector change needed):

[2024-11-20 16:36:03] �[2J�[3J�[-1;-1ffixgpt: Sector size is 512, device size is 0x500000000, 0x2800000 sectors

[2024-11-20 16:36:03] Partition table is 16384 bytes long

[2024-11-20 16:36:03] Partition table entries consume 16384 bytes

[2024-11-20 16:36:03] Setting up swapspace version 1, size = 1073737728 bytes

[2024-11-20 16:36:03] UUID=a1448fb4-6ee4-44f2-b7b4-9a8dbc392229

[2024-11-20 16:36:03] mount: mounting /dev/mapper/dmroot on /root failed: No such file or directory

[2024-11-20 16:36:03] Failed to mount /dev/mapper/dmroot as root file system.

[2024-11-20 16:36:04] 
[2024-11-20 16:36:04] 
[2024-11-20 16:36:04] BusyBox v1.35.0 (Debian 1:1.35.0-4+b3) built-in shell (ash)
[2024-11-20 16:36:04] Enter 'help' for a list of built-in commands.
[2024-11-20 16:36:04] 
[2024-11-20 16:36:04] (initramfs) �[6n[2024-11-20 16:38:14] Logfile Opened

@marmarek
Copy link
Member

(and one log line got junk prefix)

@marmarek
Copy link
Member

marmarek commented Nov 21, 2024

(and one log line got junk prefix)

I don't see how that would be from this tool, maybe that's still from grub cleaning up terminal settings.

@marmarek
Copy link
Member

marmarek commented Nov 21, 2024

And it looks like dom0-provided initramfs lacks the fixgpt binary:

/init: line 32: /usr/sbin/fixgpt: not found

@qubesos-bot
Copy link

qubesos-bot commented Nov 21, 2024

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024120723-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024111705-4.3&flavor=update

  • system_tests_audio@hw1

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_whonix-workstation-17-pool: test_012_qubes_desktop_run (error + cleanup)
      raise TimeoutError from exc_val... TimeoutError
  • system_tests_kde_gui_interactive

    • kde_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/WxLpl-\d+-/...

    • kde_install: Failed (test died + timed out)
      # Test died: command 'curl --form upload=@/tmp/kde-install.log --fo...

Failed tests

5 failures
  • system_tests_basic_vm_qrexec_gui_zfs

    • switch_pool: Failed (test died)
      # Test died: command 'dnf install -y ./zfs-release.rpm' failed at /...
  • system_tests_audio@hw1

  • system_tests_basic_vm_qrexec_gui_xfs

    • TC_20_NonAudio_whonix-workstation-17-pool: test_012_qubes_desktop_run (error + cleanup)
      raise TimeoutError from exc_val... TimeoutError
  • system_tests_kde_gui_interactive

    • kde_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/WxLpl-\d+-/...

    • kde_install: Failed (test died + timed out)
      # Test died: command 'curl --form upload=@/tmp/kde-install.log --fo...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/119126#dependencies

3 fixed
  • system_tests_extra

    • TC_00_QVCTest_whonix-gateway-17: test_010_screenshare (failure)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 0 == 0
  • system_tests_audio@hw1

  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...

Unstable tests

  • system_tests_audio

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
  • system_tests_audio@hw1

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (2/5 times with errors)
    • job 116847 AssertionError: too short audio, expected 10s, got 0.00013605442176...
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...

@marmarek
Copy link
Member

Another test iteration, this time Debian 12 HVM worked, but Debian 12 PV and Fedora 40 PVH failed:

[2024-11-22 01:36:34] [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.11.8-200.fc40.x86_64 root=/dev/mapper/dmroot ro root=/dev/mapper/dmroot console=tty0 console=hvc0 swiotlb=8192 noresume clocksource=tsc xen_scrub_pages=0
...
[2024-11-22 01:36:37] [    3.210251] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
[2024-11-22 01:36:37] [    3.231226]  xvda: xvda1 xvda2 xvda3
[2024-11-22 01:36:37] [    3.250629] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
[2024-11-22 01:36:37] [    3.267314] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
[2024-11-22 01:36:37] [    3.177537] dracut-pre-trigger[411]: gptfix: Sector size is 512, device size is 0x500000000, 0x2800000 sectors
[2024-11-22 01:36:37] [    3.394376]  xvda: xvda1 xvda2 xvda3
[2024-11-22 01:36:37] [    3.409244]  xvda: xvda1 xvda2 xvda3
[2024-11-22 01:36:37] [    3.711063]  xvdc: xvdc1 xvdc3
[2024-11-22 01:36:37] [    3.766307] Adding 1048572k swap on /dev/xvdc1.  Priority:-2 extents:1 across:1048572k SS
...
[2024-11-22 01:36:38] [�[0;32m  OK  �[0m] Finished �[0;1;39msystemd-fsck-root.service�[0m…System Check on /dev/mapper/dmroot.


[2024-11-22 01:36:38]          Mounting �[0;1;39msysroot.mount�[0m - /sysroot...


[2024-11-22 01:36:38] [    4.438454] EXT4-fs (xvda): VFS: Can't find ext4 filesystem
[2024-11-22 01:36:38] [    4.446723] EXT4-fs (xvda): VFS: Can't find ext4 filesystem
[2024-11-22 01:36:38] [    4.454289] EXT4-fs (xvda): VFS: Can't find ext4 filesystem
[2024-11-22 01:36:38] [�[0;1;31mFAILED�[0m] Failed to mount �[0;1;39msysroot.mount�[0m - /sysroot.


[2024-11-22 01:36:38] See 'systemctl status sysroot.mount' for details.

Looking at dracut/full-dmroot/qubes_cow_setup.sh, could it be maybe some race condition causing [ -b /dev/xvda1 ] test to fail (udev being in process of re-creating partition devices)?
I guess similar thing might have happened to Debian 12 HVM before.

@marmarek
Copy link
Member

marmarek commented Nov 22, 2024

And also, even though GPT didn't need changes here, I see reloading partition table, twice. Lets fix it last, as having this issue makes it easier to debug the other one.

@marmarek
Copy link
Member

Just to not forget - there are a bunch of fixup! commits in the history here, to be squashed before merge. And TBH, this whole thing may want squashing to much smaller number of commits, especially since it is a candidate for backport.

@DemiMarie
Copy link
Contributor Author

@marmarek: can you retest the most recent push? It should fix all of the race conditions, though it loses a couple features (and so is not suitable for merging).

@marmarek
Copy link
Member

marmarek commented Dec 3, 2024

you forgot to update gitlab-ci conf

@marmarek
Copy link
Member

marmarek commented Dec 4, 2024

The test fails due to difference on "Syncing disks." line.

And also, if you extend after_script section in gitlab config, test coverage will be collected.

@marmarek
Copy link
Member

marmarek commented Dec 4, 2024

There is still partition table reload in no-change case. Can this be avoided?

@marmarek
Copy link
Member

marmarek commented Dec 4, 2024

I think it's automatic on device close, if it was opened for writing. Maybe it can be opened read-only for the checking phase?

@marmarek
Copy link
Member

marmarek commented Dec 7, 2024

the CI test still fails

@marmarek
Copy link
Member

marmarek commented Dec 7, 2024

Attention: Patch coverage is 41.56118% with 277 lines in your changes missing coverage. Please review.

Now test coverage is properly reported :) The not covered lines are the expected ones - mostly verbose messages and various error handling. Especially, all the code that is expected to be used normally looks to be covered, so all good :)

@marmarek
Copy link
Member

marmarek commented Dec 8, 2024

* [ ]  The default CFLAGS are meant for local development & debugging.

This is the only point left. Otherwise all good and tested.

This includes a C program that adjusts partition tables when the device
sector size changes.  It is intended to be secure against malicious
inputs, although it will not be used with untrusted inputs in Qubes OS.
In particular, all calculations are done with sufficiently wide integer
types to prevent overflow or wraparound, partition tables read from disk
are strictly validated before use, and the total size of the partition
table entries is limited to 1MiB.

The partition table adjustment does not change the start or end of any
partition.  If the old sector size is 512 and the new size is 4096, the
tool will return an error if any partition does not start or end on a
4096-byte boundary.  Furthermore, this tool will refuse to write a
partition table if either the primary or backup GPT would overwrite data
that is part of any partition.  This ensures that user data is not
corrupted.  All used entries in the original GPT are preserved.
However, unused entries will be truncated if necessary.

The FirstUsableLBA and LastUsableLBA fields are converted if necessary,
and are rounded in the direction of _less_ usable space.  This means
that FirstUsableLBA is rounded up, while LastUsableLBA is rounded down.
If the new primary GPT uses space after FirstUsableLBA, or the new
backup GPT uses space before LastUsableLBA, the values of these fields
are adjusted accordingly.

The new partition tables are written in a crash-safe way.  This ensures
that in the event of a system crash or power loss, there is always a
valid GPT that can be used.  The specific algorithm used is as follows:

1. If the GPT being written is the primary GPT, start by writing the
   pMBR, followed by (new sector size - 512) bytes of zeroes.
2. If the new sector size is 512, overwrite the old 512-byte-sector GPT
   header with zeroes, then overwrite the old 4096-byte-sector GPT
   header with zeroes.
3. Write the partition entries.
4. Write the new partition header.

All writes are made with O_SYNC, so they cannot be reordered on-disk.
Furthermore, overwriting a single sector is atomic.  If the primary GPT
was valid, the backup GPT is written first; otherwise, the primary GPT
is written first.

The default CFLAGS enable sanitizers and are intended for local
development only.  Production builds should set CFLAGS via the 'make'
command line or an environment variable, which will override the CFLAGS
in the Makefile.

The tool will exit with a failure status if file descriptor 0, 1, or 2
is closed when it is invoked.  Using these FDs could potentially cause
the device to get clobbered or read from accidentally, which would be
bad.

The block device is opened exclusively to prevent racing with other code
that will use the device.  It is fsync()d before being used to ensure
that the code operates on the most recent on-disk state, not on stale
state cached in memory.  The whole block device is locked exclusively to
avoid racing with udev.  If the user passes a partition, the tool exits
with an error.

A BIOS Parameter Block (BPB) is checked for, to avoid accidentally
corrupting a FAT or NTFS file system that happens to have a GPT with
valid checksums at the wrong location.
@DemiMarie
Copy link
Contributor Author

* [ ]  The default CFLAGS are meant for local development & debugging.

This is the only point left. Otherwise all good and tested.

Is it okay to keep -Werror on in the default build, under the assumption that anyone building without setting CFLAGS is doing development? Or is that a bad assumption?

This adjusts the GPT in the initramfs, and (obviously) requires the
previous commit.

Fixes: QubesOS/qubes-issues#4974
No functional change.  The large number of partitions in the test is to
ensure that the code correctly handles such partition tables without a
buffer overflow.  Previous versions of the code did have a buffer
overflow due to incorrectly computing the buffer size.
@DemiMarie DemiMarie force-pushed the gpt-fixer branch 2 times, most recently from f3a860d to 01c0506 Compare December 8, 2024 02:51
@marmarek
Copy link
Member

marmarek commented Dec 8, 2024

Is it okay to keep -Werror on in the default build,

Yes, this is fine.

under the assumption that anyone building without setting CFLAGS is doing development? Or is that a bad assumption?

This should be okay too

@marmarek marmarek merged commit f3a860d into QubesOS:main Dec 8, 2024
2 checks passed
@DemiMarie DemiMarie deleted the gpt-fixer branch December 8, 2024 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openqa-pending PR to be tested in the next OpenQA run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support 4k storage
4 participants