When `opts.pid` is explicitly set to `os.getpid()`, `pycriu` fails to
daemonize the `criu` process. This causes `criu` to run as a child of
the dumped process, leading to the error "The criu itself is within
dumped tree".
This can be fixed by modifying `_send_req_and_recv_resp` to check if the
target PID matches the current process PID. If so, it enables daemon
mode, ensuring `criu` is detached and the dump succeeds.
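
The check described above can be sketched as follows. This is a minimal illustration, not the actual pycriu code: the helper name `needs_daemon_mode` and its parameters are hypothetical, and only the explicit-PID comparison is modelled.

```python
import os


def needs_daemon_mode(target_pid, current_pid=None):
    """Return True when criu has to be detached from the caller.

    Hypothetical helper: it only models the PID comparison from the
    commit message, not the real pycriu internals.
    """
    if current_pid is None:
        current_pid = os.getpid()
    # Dumping ourselves: criu must not stay inside the dumped tree.
    return target_pid == current_pid
```

A caller would enable daemon mode whenever this returns True and keep the existing behaviour otherwise.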
Signed-off-by: unichronic <ishuvam.pal@gmail.com>
In the simple case where the parent process and the child are in the same
pid namespace, we can safely use vpid(item) to ptrace the child. But in
the cases where the child is a pid namespace init, or the child is put
into an external pid namespace, the parent and the child have different pid
namespaces, and using vpid(item) (which e.g. for init will always be
1 here) to ptrace the child process is incorrect.
Let's use the pid reported to us from clone as it's always the right pid
of the child from the parent's point of view.
Fixes: 7dd583002 ("restore: add infrastructure to enable shadow stack")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The return value of sys_rseq was previously ignored during
unregistration, under the assumption that it would not fail if the rseq
structure was properly registered.
However, if sys_rseq fails, the kernel retains the registration. If the
memory containing the rseq structure is subsequently unmapped or reused,
kernel updates to the rseq area can cause the process to crash (e.g.,
via SIGSEGV).
Check the return value of sys_rseq. If it fails, log the error code and
abort the restoration process. This makes rseq unregistration failures
fatal and explicit, aiding in debugging and preventing later obscure
crashes.
Signed-off-by: liqiang2020 <liqiang64@huawei.com>
Fedora rawhide ships a pre-release of GCC 16 which produces the following
error:
uprobes.c:34:22: error: variable ‘dummy’ set but not used [-Werror=unused-but-set-variable=]
34 | volatile int dummy = 0;
| ^~~~~
Mark this variable as "__maybe_unused" to fix the error.
Signed-off-by: Adrian Reber <areber@redhat.com>
In some cases, CRIU can observe tasks that exit during checkpointing,
and sets the state of these tasks to COMPEL_TASK_DEAD.
This patch adds a string representation of this value that can be used
by CRIT when decoding the images.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
CRIU defines the following constants for task state in compel/include/uapi/task-state.h
COMPEL_TASK_ALIVE = 0x01
COMPEL_TASK_STOPPED = 0x03
COMPEL_TASK_ZOMBIE = 0x06
Thus, we need to swap the values for "zombie" and "stopped" used in CRIT.
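
A minimal sketch of the corrected mapping, using only the three values quoted above. The table name, the helper, and the lowercase strings are illustrative assumptions and not necessarily what CRIT uses.

```python
# Values quoted from compel/include/uapi/task-state.h in the text above.
COMPEL_TASK_ALIVE = 0x01
COMPEL_TASK_STOPPED = 0x03
COMPEL_TASK_ZOMBIE = 0x06

# Hypothetical lookup table; the string names are illustrative only.
TASK_STATE_NAMES = {
    COMPEL_TASK_ALIVE: "alive",
    COMPEL_TASK_STOPPED: "stopped",
    COMPEL_TASK_ZOMBIE: "zombie",
}


def task_state_name(value):
    """Translate a numeric task state into a readable string."""
    return TASK_STATE_NAMES.get(value, "unknown(%#x)" % value)
```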
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
When running the command 'make docker-test', almost all zdtm tests fail,
logging 'ip: not found', because the 'ip' command from the iproute2 package
is missing. Add the package to the list of dependencies in 'apt-packages.sh'
so that the tests can run.
Signed-off-by: ImranullahKhann <imranullahkhann2004@gmail.com>
As of Alpine Linux 3.19, musl libc no longer contains separate
fopen64(), fseeko64(), or ftello64() functions. This causes building
CRIU with amdgpu plugin to fail with the following error:
amdgpu_plugin.c: In function 'parallel_restore_bo_contents':
amdgpu_plugin.c:2286:17: error: implicit declaration of function 'fseeko64'; did you mean 'fseeko'? [-Wimplicit-function-declaration]
2286 | fseeko64(bo_contents_fp, entry->read_offset + offset, SEEK_SET);
| ^~~~~~~~
| fseeko
make[2]: *** [Makefile:31: amdgpu_plugin.so] Error 1
make[1]: *** [Makefile:363: amdgpu_plugin] Error 2
To fix this, add the missing $(DEFINES) to plugin builds, and since we
always compile with _FILE_OFFSET_BITS=64, we don't need the 64 suffix.
Fixes: #2826
Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The initialization of the struct timespec used as clockid input
parameter was removed in commit:
b4441d1bd8 ("restorer.c: rm unneded struct init")
This causes the build to fail on Alpine with clang version 21.1.2:
GEN criu/pie/parasite-blob.h
criu/pie/restorer.c:1230:39: error: variable 'ts' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
1230 | if (sys_clock_gettime(t->clockid, &ts)) {
| ^~
1 error generated.
make[2]: *** [/criu/scripts/nmk/scripts/build.mk:118: criu/pie/restorer.o] Error 1
make[1]: *** [criu/Makefile:59: pie] Error 2
make: *** [Makefile:278: criu] Error 2
To fix this, we remove the "const" from the declaration of
clock_gettime. Since the kernel writes the current time into
the struct timespec provided by the caller, the pointer must
be writable.
Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
After commit [1] we accidentally stopped reporting errors from
kerndat_has_timer_cr_ids(). Let's fix that.
Fixes: 1eaa870cc ("kerndat: check that hardware breakpoints work") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The "Dealing with error returns from close()" section of close(2) says:
"Retrying the close() after a failure return is the wrong thing to do"
We should not leave the fd in place: attempting to close it again on the next
close()/close_safe() may accidentally close something else.
This is consistent with the kernel code, where sys_close() removes the fd from
the fdtable in this stack:
+-> sys_close
  +-> file_close_fd
    +-> file_close_fd_locked
      +-> rcu_assign_pointer(fdt->fd[fd], NULL)
If there was an fd this stack is always reached and fd is always
removed.
Let's replace the fd with -1 after close no matter what.
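
The idea can be modelled in a few lines. This is a sketch of the behaviour only, not the C implementation: the one-element list `fd_holder` stands in for the `int *` that the C helper takes, and the return convention (0 or -1) is an assumption.

```python
import os


def close_safe(fd_holder):
    """Close fd_holder[0] and always forget it afterwards.

    fd_holder is a one-element list standing in for an `int *fd`
    (an assumption made for this illustration).
    """
    fd = fd_holder[0]
    if fd < 0:
        return 0  # nothing to close
    # Forget the number first: whatever close() reports, it must never
    # be closed again, since it may already belong to another file.
    fd_holder[0] = -1
    try:
        os.close(fd)
    except OSError:
        return -1
    return 0
```

A second call on the same holder is then a harmless no-op rather than a potential close of an unrelated descriptor.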
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The README currently uses an external link to criu.org for the embedded
CRIU logo. Loading this URL when viewing the README on GitHub sometimes
fails with "Error Fetching Resource". Using a local copy of the logo
fixes this issue.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Docker version 28 broke container restore in combination with network
namespaces. The workaround in the CI script was excluding Docker version
28. Now that there is also Docker version 29, which is still broken,
this also excludes Docker version 29.
Signed-off-by: Adrian Reber <areber@redhat.com>
Commit "plugin: Add DUMP_DEVICES_LATE callback" introduced a new plugin
callback that is invoked in cr_dump_tasks(). The return value of this
callback was assigned to the variable ret. However, this variable is later
used as the return value when goto err is triggered in subsequent
conditions. As a result, CRIU exits with "Dumping finished successfully" even
when some actions have failed and inventory.img has not been created.
To fix this, we replace ret with exit_code and use it only when it is
actually needed.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Introduce an opt-in mode for building and running ZDTM static tests
with Guarded Control Stack (GCS) enabled on AArch64.
Changes:
- Support `GCS_ENABLE=1` builds, adding `-mbranch-protection=standard`
and `-z experimental-gcs=check` to CFLAGS/LDFLAGS.
- Export required GLIBC_TUNABLES at runtime via `TEST_ENV`.
- Update %.pid rules to prefix test binaries with `$(TEST_ENV)`
so the tunables are set when running tests.
- Add Makefile rules for selectively enabling GCS in tests.
Usage:
# Build and run with GCS enabled
make -C zdtm/static GCS_ENABLE=1 posix_timers
GCS_ENABLE=1 ./zdtm.py run --keep-img=always \
-t zdtm/static/posix_timers
By default (`GCS_ENABLE` unset or 0), test builds and runs are
unchanged.
NOTE: This assumes that the test victim was also compiled with
GCS_ENABLE=1 so that the proper GCS AArch64 ELF headers are present.
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
This commit finalizes AArch64 Guarded Control Stack (GCS)
support by wiring the full dump and restore flow.
The restore path adds the following steps:
- Define shared AArch64 GCS types and constants in a dedicated header
for both compel and CRIU inclusion
- compel: add get/set NT_ARM_GCS via ptrace, enabling user-space
GCS state save and restore.
- During restore, switch to the new GCS (via GCSSTR) to place the capability
token and the sa_restorer address
- arch_shstk_trampoline(): enable GCS in a trampoline using
prctl(PR_SET_SHADOW_STACK_STATUS, ...) via inline SVC. The trampoline
is needed because we can’t RET without a valid GCS.
- restorer: map the recorded GCS VMA, populate contents top-down with
GCSSTR, write the signal capability at GCSPR_EL0 and the valid token at
GCSPR_EL0-8, then switch to the rebuilt GCS (GCSSS1)
- Save and restore registers via ptrace
- Extend restorer argument structures to carry GCS state
into post-restore execution
- Add shstk_set_restorer_stack(): sets tmp_gcs to temporary restorer
shadow stack start
- Add gcs_vma_restore implementation (required for mremap of the GCS VMA)
Tested with:
GCS_ENABLE=1 ./zdtm.py run -t zdtm/static/env00
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Add debug and info messages to log Guarded Control Stack state when
dumping AArch64 threads. This includes the following values:
- gcspr_el0
- features_enabled
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: cleanup fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
- Define user_aarch64_gcs_entry in core-aarch64.proto to store
Guarded Control Stack state (gcspr_el0, features_enabled).
- Extend thread_info_aarch64 with an optional gcs field
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Introduce an opt-in mode for building and running compel tests
with Guarded Control Stack (GCS) enabled on AArch64.
Changes:
- Extend compel/test/infect to support `GCS_ENABLE=1` builds,
adding `-mbranch-protection=standard` and
`-z experimental-gcs=check` to CFLAGS/LDFLAGS.
- Export required GLIBC_TUNABLES at runtime via `TEST_ENV`.
Usage:
make -C compel/test/infect GCS_ENABLE=1
make -C compel/test/infect GCS_ENABLE=1 run
By default (`GCS_ENABLE` unset or 0), builds and runs are unchanged.
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
When GCS is enabled, the kernel expects a capability token at GCSPR_EL0-8
and sa_restorer at GCSPR_EL0-16 on rt_sigreturn. The sigframe must be
consistent with the kernel’s expectations, with GCSPR_EL0 moved down by 8
so that it points to the token on signal entry. On rt_sigreturn, the kernel
verifies the cap at GCSPR_EL0, invalidates it and increments GCSPR_EL0 by 8
at the end of gcs_restore_signal().
Implement parasite_setup_gcs() to:
- read NT_ARM_GCS via ptrace(PTRACE_GETREGSET)
- write (via ptrace) the computed capability token and restorer address
- update GCSPR_EL0 to point to the token's location
Call parasite_setup_gcs() from parasite_start_daemon() so that the sigreturn
frame satisfies the kernel's expectations.
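
The address arithmetic described above can be sketched as follows. This only models the slot layout stated in this message (token at GCSPR_EL0-8, sa_restorer at GCSPR_EL0-16); the function names and the dictionary keys are hypothetical, and the token encoding itself is deliberately left out.

```python
GCS_ENTRY_SIZE = 8  # one GCS slot is 8 bytes


def plan_sigreturn_slots(gcspr_el0):
    """Compute where the token and restorer address have to be written.

    Hypothetical helper mirroring the layout described in the text;
    it performs no ptrace access.
    """
    token_addr = gcspr_el0 - GCS_ENTRY_SIZE
    restorer_addr = gcspr_el0 - 2 * GCS_ENTRY_SIZE
    return {
        "token_addr": token_addr,
        "restorer_addr": restorer_addr,
        # GCSPR_EL0 is updated to point at the token's location.
        "new_gcspr_el0": token_addr,
    }


def gcspr_after_sigreturn(gcspr_el0):
    """The kernel consumes the token and moves GCSPR_EL0 up by one slot."""
    return gcspr_el0 + GCS_ENTRY_SIZE
```

With this layout, the pointer returns to its original value once rt_sigreturn has consumed the token.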
Tests with GCS remain opt-in:
make -C compel/test/infect GCS_ENABLE=1 && make -C compel/test/infect run
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: cleanup fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Add basic prerequisites for Guarded Control Stack (GCS) state on AArch64.
This adds a gcs_context to the signal frame and extends user_fpregs_struct_t to
carry GCS metadata, preparing the groundwork for GCS in the parasite.
For now, the GCS fields are zeroed during compel_get_task_regs(), technically
ignoring GCS since it does not reach the control logic yet; that will be
introduced in the next commit.
The code path is gated and does not affect normal tests. It can be explicitly
enabled and tested via:
make -C infect GCS_ENABLE=1 && make -C infect run
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: clean up fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Introduce ARM64 Guarded Control Stack (GCS) constants and macros
in a new uapi header for use in both CRIU and compel.
Includes:
- NT_ARM_GCS type
- prctl(2) constants for GCS enable/write/push modes
- Capability token helpers (GCS_CAP, GCS_SIGNAL_CAP)
- HWCAP_GCS definition
These are based on upstream Linux definitions
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Refactor user_fpregs_struct_t to wrap user_fpsimd_state in a
dedicated struct, so that it can be extended in the future by simply
adding new members.
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
At least in tests running on Fedora rawhide, the following error could be
seen:
```
criu/tty.c: In function 'pts_fd_get_index':
criu/tty.c:262:21: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
262 | char *pos = strrchr(link->name, '/');
|
```
This fixes it.
Signed-off-by: Adrian Reber <areber@redhat.com>
Static code analysis reported:
1. criu/cr-restore.c:2438:2: var_decl: Declaring variable "end_vma"
without initializer.
4. criu/cr-restore.c:2451:5: assign: Assigning: "s_vma" = "&end_vma",
which points to uninitialized data.
7. criu/cr-restore.c:2449:4: uninit_use: Using uninitialized value
"s_vma->list.next".
This tries to fix it by initializing the variable.
Signed-off-by: Adrian Reber <areber@redhat.com>
Static code analysis reported:
criu/cr-dump.c:2328:2: identical_branches: The same code is executed
when the condition "ret" is true or false, because the code in the
if-then branch and after the if statement is identical. Should the if
statement be removed?
This is a fix for the warning.
Signed-off-by: Adrian Reber <areber@redhat.com>
The syntax of the inherit-fd functionality for unix socket and pipe
includes a colon.
Fixes: 0df3f79fc0 ("criu(8): fix --inherit-fd description")
Fixes: c37324b6d0 ("crtools: describe the inherit-fd option")
Signed-off-by: Mark Polyakov <mark@thundercompute.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
On AMD Instinct MI300 systems, restoring a large GPU application can
fail because the checkpoint size is too large and the maximum value of
an offset (with integer type) is insufficient. The problem occurs when
the total size of all buffer objects exceeds INT_MAX. No single buffer
needs to be too large for this: it can also happen with a large number
of small buffers.
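
The overflow can be reproduced with a small model. This sketch uses `ctypes` to emulate a 32-bit signed offset next to a 64-bit one; the function names and buffer sizes are made up for illustration, and the use of a 64-bit type as the remedy is an assumption rather than a description of the actual patch.

```python
import ctypes

INT_MAX = 2**31 - 1


def running_offset_int32(sizes):
    """Accumulate buffer sizes in a wrapping 32-bit signed integer."""
    total = 0
    for size in sizes:
        total = ctypes.c_int32(total + size).value
    return total


def running_offset_uint64(sizes):
    """Accumulate the same sizes in a 64-bit unsigned integer."""
    total = 0
    for size in sizes:
        total = ctypes.c_uint64(total + size).value
    return total


# 3000 buffers of 1 MiB each: every buffer is small, the sum is not.
sizes = [1 << 20] * 3000
```

Here `running_offset_int32(sizes)` wraps to a negative value, while the 64-bit accumulator keeps the true total.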
Fixes: #2812
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
These header files are copied directly from the Linux kernel and contain
typos. We skip these files in codespell to simplify maintenance.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch updates drm.h and amdgpu_drm.h kernel headers,
and adds drm_mode.h (included by drm.h) from the rocm-7.1.0
release tag.
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
amdgpu_plugin_drm.c:167:6: error: variable 'num_bos' set but not used [-Werror,-Wunused-but-set-variable]
167 | int num_bos = 0;
|
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Add a comment that explains the purpose of `retry_needed`.
Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Use `return 0` on success in `post_dump_dmabuf_check()` for consistency
with other functions.
Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The pr_info lines beginning with "CC3" and "TWI" were not meant to be
included in the patch.
Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
amdgpu libraries that use dmabuf fd to share GPU memory between
processes close the dmabuf fds immediately after using them.
However, it is possible that checkpoint of a process catches one
of the dmabuf fds open. In that case, the amdgpu plugin needs
to handle it.
The checkpoint of the dmabuf fd requires the device file
it was exported from to have already been dumped.
To identify which device this dmabuf fd was exported from, attempt
to import it on each device, then record the dmabuf handle
it imports as. This handle can be used to restore it.
Signed-off-by: David Francis <David.Francis@amd.com>
The amdgpu plugin was counting how many files were checkpointed
to determine when it should close the device files.
The number of device files is not consistent; a process may
have multiple copies of the drm device files open.
Instead of doing this counting, add a new callback after all
files are checkpointed, so plugins can clean up their
resources at an appropriate time.
Signed-off-by: David Francis <David.Francis@amd.com>
Buffer objects held by the amdgpu drm driver are checkpointed with
the new BO_INFO and MAPPING_INFO ioctls/ioctl options. Handling
is in amdgpu_plugin_drm.h.
Handling of imported buffer objects may require dmabuf fds to be
transferred between processes. These transfers occur over fdstore, with the
handle-to-fdstore-id relationships kept in shared memory. There is a
new plugin callback, RESTORE_INIT, to create the shared memory.
During checkpoint, track shared buffer objects, so that buffer objects
that are shared across processes can be identified.
During restore, track which buffer objects have been restored. Retry
restore of a drm file if a buffer object is imported and the
original has not been exported yet. Skip buffer objects that have
already been completed or cannot be completed in the current restore.
So that drm code can use sdma_copy_bo, that function no longer requires
kfd bo structs.
Update the protobuf messages with new amdgpu drm information.
Signed-off-by: David Francis <David.Francis@amd.com>
The amdgpu plugin usually calls drm ioctls through the libdrm
wrappers. However, amdgpu restore requires dealing with dmabufs
and gem handles directly, which means drm ioctls must be
called directly.
Add the drm.h header (from the kernel's uapi).
Signed-off-by: David Francis <David.Francis@amd.com>
For amdgpu plugin to call the new amdgpu drm CRIU ioctls, it needs
the amdgpu drm header file, copied from the kernel's includes.
Signed-off-by: David Francis <David.Francis@amd.com>
amdgpu dmabuf CRIU requires the ability of the amdgpu plugin
to retry.
Change files_ext.c to interpret a return value of 1 from a plugin restore
function as a request to retry.
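
One way to picture the convention is a loop that defers items whose restore function returns 1. This is a hypothetical model and not the files_ext.c logic: `restore_with_retry`, the `restore_fn` callback, and the no-progress check are all assumptions made for this sketch.

```python
RESTORE_RETRY = 1  # return value meaning "try again later", per the text above


def restore_with_retry(items, restore_fn):
    """Call restore_fn on every item, retrying those that return 1.

    Returns 0 on success, a negative value on error, and -1 when a
    full pass makes no progress (an assumed safeguard).
    """
    pending = list(items)
    while pending:
        deferred = []
        for item in pending:
            ret = restore_fn(item)
            if ret < 0:
                return ret  # hard failure, stop immediately
            if ret == RESTORE_RETRY:
                deferred.append(item)
        if len(deferred) == len(pending):
            return -1  # nothing completed in this pass
        pending = deferred
    return 0
```

For example, an imported buffer whose exporter has not been restored yet would return 1 on the first pass and succeed on a later one.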
Signed-off-by: David Francis <David.Francis@amd.com>
amdgpu represents allocated device memory as a memory mapping
of the device file. This is a non-standard VMA that must
be handled by the plugin, not the normal VMA code.
Ignore all VMAs on device files.
Signed-off-by: David Francis <David.Francis@amd.com>
This patch implements the entire logic to enable the offloading of
buffer object content restoration.
The goal of this patch is to offload the buffer object content
restoration to the main CRIU process so that this restoration can occur
in parallel with other restoration logic (mainly the restoration of
memory state in the restore blob, which is time-consuming) to speed up
the restore phase. The restoration of buffer object content usually
takes a significant amount of time for GPU applications, so
parallelizing it with other operations can reduce the overall restore
time.
It has three parts: the first replaces the restoration of buffer objects
in the target process by sending a parallel restore command to the main
CRIU process; the second implements the POST_FORKING hook in the amdgpu
plugin to enable buffer object content restoration in the main CRIU
process; the third stops the parallel thread in the RESUME_DEVICES_LATE
hook.
This optimization only targets the single-process situation (the common
case). In other scenarios, it falls back to the original method. This is
achieved with the new `parallel_disabled` flag.
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Currently, the restore of buffer objects consumes a significant amount of
time. However, this part has no logical dependencies on other restore
operations. This patch introduces some structures and helper
functions for the target process to offload this task to the main CRIU
process.
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
When enabling parallel restore, the target process and the main CRIU
process need an IPC interface to communicate and transfer restore
commands. This patch adds a Unix domain socket and stores this
socket in `fdstore`.
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>