Commit graph

11747 commits

Author SHA1 Message Date
Andrei Vagin
2e26b36d44 pagemap: print page regions in the format start - end
During investigations, it’s much easier to read logs when regions are
printed in the start - end format rather than `start/size`.

In addition, all page counters and memory sizes are now printed in
hexadecimal, as they are hard to read in decimal form.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
7e0da4d975 pagemap: use unsigned long for page counts
Variables storing page counts were previously `unsigned int`, limiting
them to a maximum of 2^32 pages. With a 4k page size, this corresponds
to a 16TB memory mapping, which is insufficient for larger mappings.

This commit changes the type for these variables to `unsigned long` to
support larger memory mappings.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
afb2e6c3f9 pagemap: change PagemapEntry.nr_pages to uint64 to support huge mappings
Update the nr_pages field in PagemapEntry to uint64 to prepare for
checkpointing and restoring huge memory mappings.

Backward compatibility with older pagemap images is preserved.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
c7395f4cbe files: fork helpers without CLONE_FILES | CLONE_FS
On restore, CRIU needs to change mount namespaces to properly restore
files and unix sockets. However, the kernel prevents this if a process
is sharing its file system information (fs) with other processes.

Fixes #2687

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Filip Hejsek
a8c5e11715 lsm: use attr/apparmor/current to get apparmor label
On some kernels, attr/current can be intercepted by BPF LSM, causing
errors (#2033). Using attr/apparmor/current is preferable, because it
is guaranteed to return the apparmor label. attr/current will still be
used as a fallback for older kernels.

Fixes: #2033

Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
2025-11-02 07:48:23 -08:00
dong sunchao
80c280610e compel/mips: Relax ELF magic check to support MIPS libraries
On MIPS platforms, shared libraries may use EI_ABIVERSION = 5 to indicate
support for .MIPS.xhash sections. The previous ELF header check in
handle_binary() strictly compared e_ident against a hardcoded value,
causing legitimate shared objects to be rejected.

This patch replaces the memcmp-based check with a structured validation
of ELF magic and class, and allows EI_ABIVERSION values beside 0.

fixes: #2745
Signed-off-by: dong sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Lorenzo Fontana
053a22a23b pagemap: prevent integer overflow in pagemap_len
Fixes #2738

Original-patch-by: Andrey Vagin <avagin@google.com>
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
a779417a3f zdtm: stop importing junit_xml
We are dropping support for generating JUnit XML reports in zdtm.py as we've
migrated testing infrastructure entirely to `GitHub Actions` and other
third-party test runners.

This package has been removed from some distribution repositories (e.g.,
Fedora), making it simpler to remove the dependency than to force installation
via pip.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
254ba3e8cc ci: avoid Docker 28 due to regression
This change modifies the CI script to avoid Docker version 28, which has
a known regression that breaks Checkpoint/Restore (C/R) functionality.
The issue is tracked in the moby/moby project as
https://github.com/moby/moby/issues/50750.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Dong Sunchao
4b73985955 criu/sockets: Restrict SO_PASSCRED and SO_PASSSEC to supported families
Linux 6.16+ restricts SO_PASSCRED and SO_PASSSEC to AF_UNIX, AF_NETLINK, and AF_BLUETOOTH
This patch updates CRIU to check the socket family before dumping these options

Fixes: #2705
Signed-off-by: Dong Sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Dong Sunchao
fa1b399064 zdtm/static/sock_opts00: use unix socket to test SO_PASSCRED and SO_PASSSEC
SO_PASSCRED and SO_PASSSEC are only valid for AF_UNIX and AF_NETLINK
This patch updates the test logic to use a unix socket for these options,
while preserving the original value consistency check

Fixes: #2705
Signed-off-by: Dong Sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
2ba3430106 test/zdtm/static/maps12: fix pointer-to-int cast
The `offset` argument to `mmap()` was computed with a direct cast from
pointer to `off_t`:

`(off_t)addr_hint - (off_t)map_base`

This causes a build failure when compiling since pointers and `off_t`
may differ in size on some platforms.

maps12.c: In function 'mmap_pages':
maps12.c:114:50: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  114 |                    filemap ? fd : -1, filemap ? ((off_t)addr_hint - (off_t)map_base) : 0);
      |                                                  ^
maps12.c:114:69: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  114 |                    filemap ? fd : -1, filemap ? ((off_t)addr_hint - (off_t)map_base) : 0);

The fix in this patch is to cast both pointers to `intptr_t`,
perform the subtraction in that type, and then cast the result
back to `off_t`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Andrei Vagin
dcee5bd6ff make: Disable branch-protection for PIE code on ARM64
Branch protection uses PAC. It cryptographically "signs" a function's
return address before it is stored on the stack. Upon return, the address
is authenticated using a secret key. If the signature is invalid, the
program will fault.

The PIE code is used for the parasite and the restorer. In both cases, it
runs in a foreign process. The case of the restorer is even trickier
because it needs to restore the original PAC keys, which invalidates
all previously "signed" pointers within the restorer itself.

Fixes #2709

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
98f2bd525a ci/vagrant: install vanilla kernel for Fedora Rawhide test
We need at least 6.16 to test MADV_GUARD_INSTALL support, but
our current Fedora Rawhide test uses only Rawhide's user space,
while using Fedora 42 kernel. Let's start using a vanilla kernel.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
01265cfc69 test/zdtm/static/maps12: add madv guards test
Test for madvise(MADV_GUARD_INSTALL).

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
9c0f725a62 criu/mem: dump: note MADV_GUARD pages as VMA_AREA_GUARD VMAs
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
59b4d662ae criu/pie/restorer: add madvise(MADV_GUARD_INSTALL) restore logic
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
63c7029686 criu/{mem, vdso, cr-restore}: introduce VMA_AREA_GUARD fake VMAs
Introduce a new kind of VMA - VMA_AREA_GUARD. In fact, it is not
a real VMA as it is not represented as struct vm_area_struct in
the kernel.

We want to reuse an existing vma infrastructure in CRIU to dump
an information about MADV_GUARD_INSTALL-covered address space
ranges as VMAs. Then, on restore, we need to carefully skip
those fake VMAs everywhere we expect a normal VMAs to be processed.
And only in restorer we use these VMAs to get an information about
where to call MADV_GUARD_INSTALL.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
cc047d595f criu/mem: dump: skip MADV_GUARD pages content dump
1. get info about MADV_GUARD_INSTALL-protected pages with
help of pagemap by looking for PME_GUARD_REGION flag if /proc/<pid>/pagemap
is used or by looking for PAGE_IS_GUARD flag if ioctl(PAGEMAP_SCAN) is used
2. skip those pages

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
5843cbf975 criu/mem: refactor should_dump_page helper
Make should_dump_page to return int to indicate failure, also
return useful data back through the struct page_info structure
passed as a pointer.

Also, correspondingly convert all call sites.

No functional changes intended, except fixing a bug in
should_dump_page() as it could return (-1) when pmc_fill()
fails, while caller didn't expect that before.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
42580fcb16 criu/pagemap-cache: pagescan: look for PAGE_IS_GUARD pages
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
1873e8f502 cr-dump: warn if MADV_GUARD is supported but isn't shown in pagemap
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
4fc07a8a41 kerndat: add pagemap_scan_guard_pages feature check logic
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
2bb77daa92 kerndat: add madvise(MADV_GUARD_INSTALL) feature-detection
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
fce491113b criu/include/mman: define MADV_GUARD_INSTALL
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
5f94dd71e7 CI: Consolidate arm64 tests on GitHub runners
The arm64 tests are currently being executed on both actuated and GitHub
runners. This change removes the actuated runner to avoid redundancy and
streamline our CI process.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
c6c6f6f231 zdtm/socket-tcp-closing: fill socket buffers effectivly
Send large chunks to fill socket buffers.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
d586b30c6b vagrant: fix tar including archive in itself
The tar command was failing with the following message:

  $ tar cf criu.tar ../../../criu
  tar: Removing leading `../../../' from member names
  tar: ../../../criu/scripts/ci/criu.tar: archive cannot contain itself; not dumped

In addition, the /vagrant no-longer exist in the new Fedora images.

  bash: line 1: cd: /vagrant: No such file or directory

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
2762b21e4a vagrant: update image to fedora 42
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
0d1e280d09 vagrant: fix 'qemu' install
Installing this package currently fails with the following message:

  Package qemu is not available, but is referred to by another package.
  This may mean that the package is missing, has been obsoleted, or
  is only available from another source

  E: Package 'qemu' has no installation candidate

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Ignacio Moreno Gonzalez
64276874d8 restore: flush caches during restore
See the previous commit for rationale and architecture-specific details.

[ avagin: tweak code comment ]

Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Ignacio Moreno Gonzalez
95d5e2e59b compel: flush caches after parasite injection
After the CRIU process saves the parasite code for the target thread in
the shared mmap, it is necessary to call __clear_cache before the target
thread executes the code.

Without this step, the target thread may not see the correct code to
execute, which can result in a SIGILL signal.

For the specific arm64 case. this is important so that the newly copied
code is flushed from d-cache to RAM, so that the target thread sees the
new code.

The change is based on commit 6be10a2 by @fu.lin and on input received
from @adrianreber.

[ avagin: tweak code comment ]

Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Kir Kolyshkin
22c83e3eba images/Makefile: use msg-gen
In general, we use "$(E)" instead of "$(Q) echo", but we also have
a msg-gen macro which can be used here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Kir Kolyshkin
066bf7bf3c Keep images/google/protobuf directory
Commit 68f92b551 removed images/google/protobuf directory, so it is
re-created each time during the build process.

This resulted in a weird behavior change. Previously, one could do
something like this:

	git clone $CRURL criu
	(cd criu && sudo make install-criu)
	rm -rf criu

This worked fine, including running rm -rf as a non-root user, since no
new directories were created under criu -- all directories were still
owned by the original user.

Since commit 68f92b551 the same sequence fails:

	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.c': Permission denied
	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.d': Permission denied
	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.h': Permission denied

A workaround is to keep empty images/google/protobuf directory,
which is what this commit does.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Kir Kolyshkin
21c3b9c005 images/Makefile: fix using $(Q)
Commit 68f92b551 used `$$(Q)` instead of `$(Q)` in the Makefile target,
which resulted in the following error:

$(Q) echo "Generating descriptor.pb-c.c"
/bin/sh: 1: Q: not found
Generating descriptor.pb-c.c
$(Q) protoc --proto_path=/usr/include --proto_path=images/ --c_out=images/ /usr/include/google/protobuf/descriptor.proto
/bin/sh: 1: Q: not found

as well as:

$(Q) rm -rf images/google
/bin/sh: line 1: Q: command not found

Fix it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
7fbf7b2be4 images: remove symlink for descriptor.proto
Currently the build scripts create the following symlink:

  criu-4.1/images/google/protobuf/descriptor.proto -> /usr/include/google/protobuf/descriptor.proto

This symlink points to a system-wide absolute-path target. Also,
this symlink ends up in the release tarball. The tarball may later be
downloaded and unpacked by e.g. OS distributions. If unpacking is
done using Python 3.14+, it will fail.

This happens because Python 3.14 will switch the default behavior of
extractall() from "fully trusting the content of archive" to
"disallow common attack vectors while extracting the archive".
With this new behavior, extractall() raises an exception when at
least one file in the archive extracts or points to outside of the
extraction directory (these are called path traversal attacks and
zip slip attacks).

Reported-by: Dmitrii Kuvaiskii <dimakuv@amazon.de>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
455c677399 zdtm: Add ztatic/mnt_ext_file_bind_auto test
The test creates a file bindmount in criu mntns and binds it into test
mntns, this external file bindmount is autodetected and restored via
"--external mnt[]" criu option.

Note: In previous patch we fix the problem on this code path where file
bindmount restore fails as there is excess "/" in source path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Chuan Qiu
e31828ed8c mount: Fix trailing / when a file is bind-mounted
E.g. I have a /etc/hosts in workspace mounted from the host, and get the following message.

(00.141008)      1: mnt-v2: Create plain mountpoint /tmp/.criu.mntns.K1biY1/mnt-0000000938 for 938
(00.141546)      1: mnt-v2:     Mounting unsupported @938 (0)
(00.141887)      1: mnt-v2:     Bind /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/ to /tmp/.criu.mntns.K1biY1/mnt-0000000938
(00.142179)      1: Error (criu/mount-v2.c:319): mnt-v2: Failed to open_tree /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/: Not a directory
(00.143774) Error (criu/cr-restore.c:2320): Restoring FAILED.

Signed-off-by: Chuan Qiu <qiuc12@gmail.com>
2025-11-02 07:48:22 -08:00
समीर सिंह Sameer Singh
3dc865bc80 test: add static tests for ICMP socket
Add ZDTM static tests for IP4/ICMP and IP6/ICMP
socket feature.

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:22 -08:00
समीर सिंह Sameer Singh
a80c544845 sk-inet: Add support for checkpoint/restore of ICMP sockets
Currently there is no option to checkpoint/restore programs that use
ICMP sockets, such as `ping`. This patch adds support for the same.

Fixes #2557

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
2025-11-02 07:48:22 -08:00
Andrei Vagin
677a568919 zdtm/netns_sub_sysctl: skip unsupported sysctls
net/unix/max_dgram_qlen can't be tuned from non-root userns before:
v5.17-rc1~170^2~215 ("net: Enable max_dgram_qlen unix sysctl to be
configurable by non-init user namespaces")

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
87bd09a0d1 net/sysctl: make ipv4/ping_group_range work in user namespaces
We dump sysctls from criu user namespace, but restore from restored user
namespace. So group id values should be mapped to the restored user
namespace gid space to restore correctly.

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
45d09ae17e net/sysctl: fix broken ipv4_sysctls_op
We have ability to skip sysctl if there is no value, but we still give
n requests to sysctl_op, that is not correct and probably can segfault
on nullptr access. Fix it by adding ri to count non skipped requests.

To be on the safe side, let's add a check that ri == n on read, as we
should not do any skips there.

While on it lets fix bad error message prefix: s/unix/ipv4/.

Remove excess has_iarg set, and add sarg reset to NULL for the case
sysctl_op skipped it.

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
4f057a6aeb net/sysctl: fix missprint in an error message
Fixes: f38e58836 ("net/sysctl: c/r ipv4/ping_group_range value")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
4c7d42f67a ipc/sysctl: fix CTL_FLAGS_IPC_EACCES_SKIP by making it a flag
Having CTL_FLAGS_IPC_EACCES_SKIP == (CTL_FLAGS_OPTIONAL |
CTL_FLAGS_READ_EIO_SKIP) is probably not what we want. So let's make it
a real distinct flag.

Fixes: 840735aa0 ("ipc_sysctl: Prioritize restoring IPC variables using non usernsd approach")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Ivan Pravdin
922754dffd rpc/log: return first error always
Use shared first error buffer to return correct
first error in rpc.

Fixes: #338

Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
a79b33d0c5 cpuinfo: show error when image is missing
The `criu cpuinfo check` command calls cpu_validate_cpuinfo(), which
attempts to open the cpuinfo.img file using `open_image()`. If the
image file is not found, `open_image()` returns an "empty image"
object. As a result, `cpu_validate_cpuinfo()` tries to read from it
and fails with the following error:

(00.002473) Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)

This patch adds a check for an empty image and appropriate error message.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Andrei Vagin
99ba6db89b crtools: do a few minor cleanups
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:22 -08:00
Liana Koleva
fcbaac0598 crtools: simplify check for cpuinfo subcommands
The cpuinfo command requires a "dump" or "check" subcommand. Thus, we
replace `CR_CPUINFO` with `CR_CPUINFO_DUMP` and `CR_CPUINFO_CHECK`.
This allows us to remove unnecessary subcommand check in
`image_dir_mode()` and perform all parsing in `parse_criu_mode()`.

With this change the check for validating the cpuinfo subcommand is
now done only once with `CR_CPUINFO_DUMP` or `CR_CPUINFO_CHECK` enum.

Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Prajwal S N
fbfed312e0 feat: introduce Nix flake
CRIU currently requires a number of dependencies in order to build from
source. The package names vary across distributions and package
managers. A Nix flake allows developers to spin up a dev environment
with `nix develop`, eliminating the hassle of manual dependency
management. It also prevents polluting the global package set on the
machine.

Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
2025-11-02 07:48:22 -08:00