The initialization of the struct timespec used as clockid input
parameter was removed in commit:
b4441d1bd8 ("restorer.c: rm unneded struct init")
This causes the build to fail on Alpine with clang version 21.1.2:
GEN criu/pie/parasite-blob.h
criu/pie/restorer.c:1230:39: error: variable 'ts' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
1230 | if (sys_clock_gettime(t->clockid, &ts)) {
| ^~
1 error generated.
make[2]: *** [/criu/scripts/nmk/scripts/build.mk:118: criu/pie/restorer.o] Error 1
make[1]: *** [criu/Makefile:59: pie] Error 2
make: *** [Makefile:278: criu] Error 2
To fix this, we remove the "const" from the declaration of
clock_gettime. Since the kernel writes the current time into
the struct timespec provided by the caller, the pointer must
be writable.
Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This commit finalizes AArch64 Guarded Control Stack (GCS)
support by wiring the full dump and restore flow.
The restore path adds the following steps:
- Define shared AArch64 GCS types and constants in a dedicated header
for both compel and CRIU inclusion
- compel: add get/set NT_ARM_GCS via ptrace, enabling user-space
GCS state save and restore.
- During restore, switch to the new GCS (via GCSSTR) to place the
capability token and the sa_restorer address
- arch_shstk_trampoline(): enable GCS in a trampoline that uses
prctl(PR_SET_SHADOW_STACK_STATUS, ...) via inline SVC. The trampoline
is needed because we can't RET without a valid GCS.
- restorer: map the recorded GCS VMA, populate contents top-down with
GCSSTR, write the signal capability at GCSPR_EL0 and the valid token at
GCSPR_EL0-8, then switch to the rebuilt GCS (GCSSS1)
- Save and restore registers via ptrace
- Extend restorer argument structures to carry GCS state
into post-restore execution
- Add shstk_set_restorer_stack(): sets tmp_gcs to temporary restorer
shadow stack start
- Add gcs_vma_restore implementation (required for mremap of the GCS VMA)
Tested with:
GCS_ENABLE=1 ./zdtm.py run -t zdtm/static/env00
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
When GCS is enabled, the kernel expects a capability token at GCSPR_EL0-8
and sa_restorer at GCSPR_EL0-16 on rt_sigreturn. The sigframe must be
consistent with the kernel's expectations, with GCSPR_EL0 moved down by 8
so that it points to the token on signal entry. On rt_sigreturn, the kernel
verifies the cap at GCSPR_EL0, invalidates it and increments GCSPR_EL0 by 8
at the end of gcs_restore_signal().
Implement parasite_setup_gcs() to:
- read NT_ARM_GCS via ptrace(PTRACE_GETREGSET)
- write (via ptrace) the computed capability token and restorer address
- update GCSPR_EL0 to point to the token's location
Call parasite_setup_gcs() from parasite_start_daemon() so the sigreturn
frame satisfies the kernel's expectations.
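The layout described above can be sketched as plain address arithmetic. This is an illustration only: the SKETCH_* names, the struct and compute_gcs_layout() are hypothetical, and the address mask (bits 63:12) is an assumption based on the upstream Linux definitions, not something stated in this commit:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the capability address mask covers bits 63:12. */
#define SKETCH_GCS_CAP_ADDR_MASK (~UINT64_C(0xfff))
#define SKETCH_GCS_SIGNAL_CAP(addr) ((uint64_t)(addr) & SKETCH_GCS_CAP_ADDR_MASK)

/* Hypothetical description of what has to be written via ptrace. */
struct gcs_sigreturn_layout {
	uint64_t token_addr;    /* where the capability token goes (GCSPR_EL0-8) */
	uint64_t token_value;   /* the signal capability token itself */
	uint64_t restorer_addr; /* where sa_restorer goes (GCSPR_EL0-16) */
	uint64_t new_gcspr;     /* GCSPR_EL0 to set: the token's location */
};

static struct gcs_sigreturn_layout compute_gcs_layout(uint64_t gcspr)
{
	struct gcs_sigreturn_layout l;

	/* GCS entries are 8 bytes wide; two slots below gcspr are used. */
	assert((gcspr & 7) == 0 && gcspr >= 16);

	l.token_addr = gcspr - 8;
	l.token_value = SKETCH_GCS_SIGNAL_CAP(l.token_addr);
	l.restorer_addr = gcspr - 16;
	l.new_gcspr = l.token_addr;
	return l;
}
```

For a GCSPR_EL0 of 0x70001000 this yields a token at 0x70000ff8 and the restorer slot at 0x70000ff0. The actual writes go through ptrace and are not modelled here.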
Tests with GCS remain opt‑in:
make -C compel/test/infect GCS_ENABLE=1 && make -C compel/test/infect run
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: cleanup fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Add basic prerequisites for Guarded Control Stack (GCS) state on AArch64.
This adds a gcs_context to the signal frame and extends user_fpregs_struct_t to
carry GCS metadata, preparing the groundwork for GCS in the parasite.
For now, the GCS fields are zeroed during compel_get_task_regs(), technically
ignoring GCS since it does not reach the control logic yet; that will be
introduced in the next commit.
The code path is gated and does not affect normal tests. It can be explicitly
enabled and tested via:
make -C infect GCS_ENABLE=1 && make -C infect run
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: clean up fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Introduce ARM64 Guarded Control Stack (GCS) constants and macros
in a new uapi header for use in both CRIU and compel.
Includes:
- NT_ARM_GCS type
- prctl(2) constants for GCS enable/write/push modes
- Capability token helpers (GCS_CAP, GCS_SIGNAL_CAP)
- HWCAP_GCS definition
These are based on upstream Linux definitions
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Refactor user_fpregs_struct_t to wrap user_fpsimd_state in a
dedicated struct, so that it can be extended later simply by
adding new members.
Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
Compilation on gentoo/arm64 (llvm+musl) fails with:
In file included from compel/include/uapi/compel/asm/sigframe.h:4,
from compel/plugins/std/infect.c:14:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
28 | struct sigcontext {
| ^~~~~~~~~~
In file included from criu/arch/aarch64/include/asm/restorer.h:4,
from criu/arch/aarch64/crtools.c:11:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
28 | struct sigcontext {
| ^~~~~~~~~~
This is happening because <asm/sigcontext.h> and <signal.h> are
mutually incompatible on Linux.
To fix, use <signal.h> instead of <asm/sigcontext.h> for arm64
(like all other arches do).
Fixes: #2766
Signed-off-by: Pepper Gray <hello@peppergray.xyz>
On MIPS platforms, shared libraries may use EI_ABIVERSION = 5 to indicate
support for .MIPS.xhash sections. The previous ELF header check in
handle_binary() strictly compared e_ident against a hardcoded value,
causing legitimate shared objects to be rejected.
This patch replaces the memcmp-based check with a structured validation
of ELF magic and class, and allows EI_ABIVERSION values other than 0.
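A minimal sketch of such a structured check, assuming the constants from <elf.h>. elf_ident_ok() is a hypothetical name and this is not the actual handle_binary() code:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: accept an ELF identification array if the
 * magic and the class match. EI_ABIVERSION is deliberately not
 * compared, so a shared object with EI_ABIVERSION = 5 passes.
 */
static bool elf_ident_ok(const unsigned char *e_ident, unsigned char expected_class)
{
	assert(e_ident != NULL);

	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0)
		return false;
	if (e_ident[EI_CLASS] != expected_class)
		return false;
	return true;
}
```

Comparing individual fields instead of the whole e_ident array keeps the check strict where it matters while tolerating bytes that legitimately vary.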
Fixes: #2745
Signed-off-by: dong sunchao <dongsunchao@gmail.com>
PAC stands for Pointer Authentication Code. Each process has 5 PAC keys
and a mask of enabled keys. All these properties have to be C/R-ed.
As they are per-process properties, we can save/restore them just for
one thread.
Signed-off-by: Andrei Vagin <avagin@google.com>
We need to calculate TASK_SIZE dynamically, depending
on the MMU of the RISC-V system. [We use an analogous
approach on aarch64/ppc64le.]
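To illustrate why the value depends on the MMU, here is a rough sketch that maps a paging mode to the size of the user address space. It assumes user space is the lower half of the virtual address space; the enum and sketch_task_size() are hypothetical, and how CRIU actually detects the limit at run time is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address widths of the RISC-V Sv39/Sv48/Sv57 paging modes. */
enum riscv_mmu { RISCV_SV39 = 39, RISCV_SV48 = 48, RISCV_SV57 = 57 };

/*
 * Assumption: the user part of the address space is the lower half
 * of the virtual address range, i.e. 2^(va_bits - 1) bytes.
 */
static uint64_t sketch_task_size(enum riscv_mmu mmu)
{
	assert(mmu == RISCV_SV39 || mmu == RISCV_SV48 || mmu == RISCV_SV57);
	return UINT64_C(1) << (mmu - 1);
}
```

Under that assumption an sv39 system ends up with a 256 GiB user address space, which is why a single compile-time constant cannot fit every system.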
This change was tested on a physical machine:
StarFive VisionFive 2
isa : rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb
mmu : sv39
uarch : sifive,u74-mc
mvendorid : 0x489
marchid : 0x8000000000000007
mimpid : 0x4210427
hart isa : rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
We don't need to have compel/arch/riscv64/plugins/std/syscalls/syscalls.S
tracked in git. It is autogenerated. We also need to update our .gitignore
to ignore autogenerated files with syscall tables.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
The restore of a task with shadow stack enabled adds these steps:
* switch from the default shadow stack to a temporary shadow stack
allocated in the premmapped area
* unmap CRIU mappings; nothing changed here, but it's important that
CRIU mappings can be removed only after switching to a temporary
shadow stack
* create shadow stack VMA with map_shadow_stack()
* restore shadow stack contents with wrss
* switch to "real" shadow stack
* lock shadow stack features
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
When calling sigreturn with CET enabled, the kernel verifies that the
shadow stack has proper address of sa_restorer and a "restore token".
Normally, they are pushed to the shadow stack when signal processing is
started.
Since compel calls sigreturn directly, the shadow stack should be
updated to match the kernel expectations for sigreturn invocation.
Add parasite_setup_shstk() that sets up the shadow stack with the
address of __export_parasite_head_start as sa_restorer and with the
required restore token.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
To support sigreturn with CET enabled, the parasite must rewind its stack
before calling sigreturn so that the shadow stack is compatible with
the actual calling sequence.
In addition, calling sigreturn from top level routine
(__export_parasite_head_start) will significantly simplify the shadow
stack manipulations required to execute sigreturn.
For x86 make fini_sigreturn() return the stack pointer for the signal
frame that will be used by sigreturn and propagate that return value up
to __export_parasite_head_start.
In non-daemon mode parasite_trap_cmd() returns a non-positive value,
which makes it possible to distinguish daemon from non-daemon mode and
to stop properly at int3 in non-daemon mode.
Architectures other than x86 remain unchanged and will still call
sigreturn from fini_sigreturn().
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
All architectures create an on-stack structure for the floating point save
area in compel_get_task_regs() if the caller passes NULL rather than a valid
pointer.
The only place that calls compel_get_task_regs() with NULL for the floating
point save area is parasite_start_daemon(), and it is simpler to define
this structure on the stack of parasite_start_daemon().
The availability of floating point save data is required in
parasite_start_daemon() to detect shadow stack presence early during
parasite infection and will be used in later patches.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
In compel/arch/arm/plugins/std/syscalls/syscall.def, the syscall number of bind on ARM64 should be 200 instead of 235.
Signed-off-by: Sally Kang <snapekang@gmail.com>
Power ISA 3.0 added a new syscall instruction. Kernel 5.9 added
corresponding support.
Add CRIU support to recognize the new instruction and kernel ABI changes
to properly dump and restore threads executing in syscalls. Without this
change threads executing in syscalls using the scv instruction will not
be restored to re-execute the syscall, they will be restored to execute
the following instruction and will return unexpected error codes
(ERESTARTSYS, etc) to user code.
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
The 288d6a61e2 change broke all the syscall numbers.
Reported-by: Michał Mirosław <emmir@google.com>
Fixes: (288d6a61e2 "loongarch64: reformat syscall_64.tbl for 8-wide tabs")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Note: Silently drops MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED as it's
not currently detectable. This is still better than silently dropping
all membarrier() registrations.
Signed-off-by: Michał Mirosław <emmir@google.com>
Apparently Skylake uses init-optimization when saving FPU state, and ptrace()
returns XSTATE_BV[0] = 0 meaning FPU was not used by a task (in init state).
Since CRIU restore uses sigreturn to restore registers, FPU state is always
restored. Fill the state with default values on dump to make restore happy.
Signed-off-by: Michał Mirosław <emmir@google.com>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than
before. Looking at older CPUs I see 2440 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00000988 (2440)
On newer CPUs (Sapphire Rapids) it grows to 11008 bytes.
# cpuid -1 -l 0xd -s 0
...
bytes required by XSAVE/XRSTOR area = 0x00002b00 (11008)
This increases the xsave area from one page to four pages.
Without this patch the fpu03 test fails, with this patch it works again.
Signed-off-by: Adrian Reber <areber@redhat.com>
The ppc64le ABI allows functions to store data in caller frames.
When initializing the stack pointer prior to executing parasite code
we need to pre-allocate the minimum-sized stack frame before
jumping to the parasite code.
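A minimal sketch of the stack pointer computation, assuming the ELFv2 ABI values of a 32-byte minimum frame and 16-byte stack alignment. The macro and function names are hypothetical and not taken from the CRIU sources:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions taken from the ELFv2 ABI, not from this patch. */
#define SKETCH_STACK_ALIGN    16u
#define SKETCH_MIN_FRAME_SIZE 32u

/*
 * Hypothetical helper: pick the initial stack pointer so that a
 * minimum-sized frame already exists above it. The callee may then
 * store into its "caller's frame" without running off the stack.
 */
static uint64_t initial_parasite_sp(uint64_t stack_top)
{
	uint64_t sp;

	assert(stack_top >= SKETCH_MIN_FRAME_SIZE + SKETCH_STACK_ALIGN);

	sp = stack_top & ~(uint64_t)(SKETCH_STACK_ALIGN - 1);
	return sp - SKETCH_MIN_FRAME_SIZE;
}
```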
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
The x86 implementation uses a hardware breakpoint to speed up syscall
tracing instead of `ptrace(PTRACE_SYSCALL)`. arm64 has the same
capability according to <<Learn the architecture: Armv8-A self-hosted
debug>>[[1]].
<<Arm Architecture Reference Manual for A-profile architecture>>[[2]]
describes the usage in detail:
- D2.8 Breakpoint Instruction exceptions
- D2.9 Breakpoint exceptions
- D13.3.2 DBGBCR<n>_EL1, Debug Breakpoint Control Registers, n
Note:
[1]: https://developer.arm.com/documentation/102120/0100
[2]: https://developer.arm.com/documentation/ddi0487/latest
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Error from:
./test/zdtm.py run -t zdtm/static/fpu00 --fault 134 -f h --norst
(00.003111) Dumping GP/FPU registers for 56
(00.003121) Error (compel/arch/x86/src/lib/infect.c:310): Corrupting fpuregs for 56, seed 1651766595
(00.003125) Error (compel/arch/x86/src/lib/infect.c:314): Can't set FPU registers for 56: Invalid argument
(00.003129) Error (compel/src/lib/infect.c:688): Can't obtain regs for thread 56
(00.003174) Error (criu/cr-dump.c:1564): Can't infect (pid: 56) with parasite
See also:
145e9e0d8c6 ("x86/fpu: Fail ptrace() requests that try to set invalid MXCSR values")
145e9e0d8c
We decided to move away from the mxcsr cleanup scheme and use the mxcsr mask
(0x0000ffbf) as the kernel does. Thanks to Dmitry Safonov for pointing this out.
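The masking can be sketched as follows. sanitize_mxcsr() is a hypothetical helper; only the mask value comes from the text above:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MXCSR_MASK UINT32_C(0x0000ffbf)

/*
 * Hypothetical helper: clear every bit outside the mask so that a
 * randomly corrupted value no longer trips a check of the form
 * "mxcsr & ~mask" in the kernel.
 */
static uint32_t sanitize_mxcsr(uint32_t random_value)
{
	uint32_t mxcsr = random_value & SKETCH_MXCSR_MASK;

	assert((mxcsr & ~SKETCH_MXCSR_MASK) == 0);
	return mxcsr;
}
```

Unlike zeroing the register, masking keeps the fault-injected randomness in the bits that are allowed to vary.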
Tested-on: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
Reported-by: Mr. Jenkins
Suggested-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Those are the typos for which codespell offers several variants:
./soccr/soccr.c:219: thise ==> these, this
./soccr/soccr.c:444: sence ==> sense, since
./criu/net.c:665: ot ==> to, of, or
./criu/net.c:775: ot ==> to, of, or
./criu/files.c:1244: wan't ==> want, wasn't
./criu/kerndat.c:1141: happend ==> happened, happens, happen
./criu/mount-v2.c:781: carefull ==> careful, carefully
./test/zdtm/static/socket_aio.c:54: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen4v6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/sk-unix-dgram-ghost.c:201: childs ==> children, child's
./test/zdtm/static/sk-unix-dgram-ghost.c:205: childs ==> children, child's
./compel/arch/x86/src/lib/infect.c:297: automatical ==> automatically, automatic, automated
While at it, do some other minor fixes in the same lines.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I am not sure if this is going to bring any compatibility issues.
If yes, we need to remove this patch and add "useable" to the list of
ignored words instead.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Will use this for cross mount namespace bindmounts.
Note: we don't need a separate kdat for mount-v2, as MOVE_MOUNT_SET_GROUP
was added much later than open_tree and all related fixups.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Mounts-v2 requires the new kernel feature MOVE_MOUNT_SET_GROUP to be able to
restore propagation between mounts correctly.
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7da7f9a17
Changes: define move_mount syscall, check mainstream kernel
MOVE_MOUNT_SET_GROUP feature, use our "linux/mount.h" to overcome
possible problems of non-existing header on older kernels.
v3: coverity CID 389201: check ret of umount2 and rmdir at cleanup stage
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
During error injection tests there are random values loaded in some of
the registers. The kernel, however, has the following check:
if (mxcsr[0] & ~mxcsr_feature_mask)
return -EINVAL;
So depending on the random values loaded mxcsr might have values that
the kernel rejects with EINVAL. Setting mxcsr to zero during the tests
lets the error injection test pass.
Signed-off-by: Adrian Reber <areber@redhat.com>
When PTRACE_GET_THREAD_AREA fails on kernels with
!CONFIG_IA32_EMULATION because of missing support (-EIO), compel should
ignore such errors in native mode.
However the check for error type uses return value of ptrace rather than
errno, which will always result in error propagation.
Use errno to detect type of error to fix this.
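The distinction can be sketched with a small, hypothetical predicate (not the actual compel code). The caller is assumed to save errno right after the failing ptrace() call:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper. A failing ptrace() request returns -1; the
 * reason is available only in errno, so comparing the return value
 * against -EIO can never match.
 */
static bool thread_area_error_is_fatal(long ret, int saved_errno, bool native_mode)
{
	if (ret == 0)
		return false; /* the request succeeded */

	assert(ret == -1);

	/* Missing IA32 support is harmless for a native task. */
	return !(native_mode && saved_errno == EIO);
}
```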
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Fixes: e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Since
e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")
we have been running a fault-injection test for C/R of thread register sets by
filling the tasks' xsave structures with garbage. But for some features that is
not safe. It leads to failures like the one described in #1635.
In this particular case we hit a problem with the PKRU feature: after
corrupting the pkru registers we may restrict access to some vma areas, so
the process with the parasite injected gets a segfault and crashes.
Let's explicitly specify which features are safe to fill with garbage by
keeping a proper XFEATURE_MASK_FAULTINJ mask value.
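A sketch of how such an allow-list could be applied. The feature bit positions follow the XSAVE layout (PKRU is bit 9), but the set chosen for SKETCH_FAULTINJ_MASK is purely illustrative and is not the real XFEATURE_MASK_FAULTINJ value:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_XFEATURE_FP   (UINT64_C(1) << 0)
#define SKETCH_XFEATURE_SSE  (UINT64_C(1) << 1)
#define SKETCH_XFEATURE_YMM  (UINT64_C(1) << 2)
#define SKETCH_XFEATURE_PKRU (UINT64_C(1) << 9)

/* Assumption: an illustrative allow-list, not the value used by CRIU. */
#define SKETCH_FAULTINJ_MASK \
	(SKETCH_XFEATURE_FP | SKETCH_XFEATURE_SSE | SKETCH_XFEATURE_YMM)

/*
 * Hypothetical helper: of the features present in the task, return
 * only those that may be filled with garbage.
 */
static uint64_t features_to_corrupt(uint64_t xstate_bv)
{
	uint64_t safe = xstate_bv & SKETCH_FAULTINJ_MASK;

	/* PKRU must never be selected: it can revoke memory access. */
	assert((safe & SKETCH_XFEATURE_PKRU) == 0);
	return safe;
}
```

An allow-list rather than a deny-list means that features added to the hardware later are skipped by default.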
Fixes: e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")
https://github.com/checkpoint-restore/criu/issues/1635
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
The mips64el-cross test target started to show following error:
error: listing the stack pointer register '$29' in a clobber list is deprecated [-Werror=deprecated]
This fixes it in three different places by removing '$29' from the
clobber list. This is only compile tested as we have no mips hardware
for testing.
Signed-off-by: Adrian Reber <areber@redhat.com>
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.
v2:
- check size written/read of val_a/val_b is correct
- return with error when val_a != val_b
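A rough sketch of such a feature probe, written against raw syscall numbers. It assumes a socketpair on the current process is enough to exercise pidfd_getfd; probe_pidfd_getfd() is a hypothetical name and this is not the code added by the patch:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/* Treat "no such syscall" and "not permitted" as "feature unavailable". */
static int unavailable(int err)
{
	return err == ENOSYS || err == EPERM;
}

/* Returns 1 if pidfd_getfd works, 0 if unavailable, -1 on error. */
static int probe_pidfd_getfd(void)
{
	int sk[2], pidfd, dupfd, ret = -1;
	int val_a = 1984, val_b = 0;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sk))
		return -1;

	pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
	if (pidfd < 0) {
		ret = unavailable(errno) ? 0 : -1;
		goto close_sk;
	}

	/* Duplicate our own sk[0] through the pidfd. */
	dupfd = (int)syscall(SYS_pidfd_getfd, pidfd, sk[0], 0);
	if (dupfd < 0) {
		ret = unavailable(errno) ? 0 : -1;
		goto close_pidfd;
	}

	/* Check the sizes written and read, then compare the values. */
	if (write(sk[1], &val_a, sizeof(val_a)) != (ssize_t)sizeof(val_a))
		goto close_dup;
	if (read(dupfd, &val_b, sizeof(val_b)) != (ssize_t)sizeof(val_b))
		goto close_dup;
	ret = (val_a == val_b) ? 1 : -1;

close_dup:
	close(dupfd);
close_pidfd:
	close(pidfd);
close_sk:
	close(sk[0]);
	close(sk[1]);
	return ret;
}
```

Reading the value back through the duplicated descriptor confirms that the new fd really refers to the same socket, not just that the syscall returned a number.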
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>