Compare commits

...

1856 commits

Author SHA1 Message Date
unichronic
9e5fbcd668 pycriu: Fix self-dump failure with explicit PID
When `opts.pid` is explicitly set to `os.getpid()`, `pycriu` fails to
daemonize the `criu` process. This causes `criu` to run as a child of
the dumped process, leading to the error "The criu itself is within
dumped tree".

This can be fixed by modifying `_send_req_and_recv_resp` to check if the
target PID matches the current process PID. If so, it enables daemon
mode, ensuring `criu` is detached and the dump succeeds.

Signed-off-by: unichronic <ishuvam.pal@gmail.com>
2026-01-21 00:25:29 +00:00
Pavel Tikhomirov
21a6758268 cr-restore/shstk: Make arch_shstk_unlock use correct pid
In a simple case where the parent process and the child one are in one
pid namespace we can safely use vpid(item) to prace the child. But, for
the cases where the child is a pid namespace init, or the child is put
into external pid namespace, the parent and the child have different pid
namespaces and using pid vpid(item) (which e.g. for init will always be
1 here) to ptrace the child process is inorrect.

Let's use the pid reported to us from clone as it's always the right pid
of the child from the parent's point of view.

Fixes: 7dd583002 ("restore: add infrastructure to enable shadow stack")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2026-01-20 00:08:19 +00:00
liqiang2020
07af3304fd restore/pie: check return value of sys_rseq on unregister
The return value of sys_rseq was previously ignored during
unregistration, under the assumption that it would not fail if the rseq
structure was properly registered.

However, if sys_rseq fails, the kernel retains the registration. If the
memory containing the rseq structure is subsequently unmapped or reused,
kernel updates to the rseq area can cause the process to crash (e.g.,
via SIGSEGV).

Check the return value of sys_rseq. If it fails, log the error code and
abort the restoration process. This makes rseq unregistration failures
fatal and explicit, aiding in debugging and preventing later obscure
crashes.

Signed-off-by: liqiang2020 <liqiang64@huawei.com>
2026-01-12 19:07:39 -08:00
Adrian Reber
fb59ae504e test: fix GCC 16 compile error
Fedora rawhide ships a pre-release of GCC 16 which produces following
error:

  uprobes.c:34:22: error: variable ‘dummy’ set but not used [-Werror=unused-but-set-variable=]
  34 |         volatile int dummy = 0;
  |                      ^~~~~

Marking this variable as "__maybe_unused" to fix the error.

Signed-off-by: Adrian Reber <areber@redhat.com>
2026-01-12 19:06:43 -08:00
Radostin Stoyanov
b208bec12d crit: show dead task_state
In some cases, CRIU can observe tasks that exit during checkpointing,
and sets the state of these tasks to COMPEL_TASK_DEAD.
This patch adds a string representation of this value that can be used
by CRIT when decoding the images.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-12 18:49:12 -08:00
Radostin Stoyanov
9885fb3c75 crit: fix incorrect task state decoding
CRIU defines the following constants for task state in compel/include/uapi/task-state.h

COMPEL_TASK_ALIVE = 0x01
COMPEL_TASK_STOPPED = 0x03
COMPEL_TASK_ZOMBIE = 0x06

Thus, we need to swap the values for "zombie" and "stopped" used in CRIT.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-12 18:49:12 -08:00
ImranullahKhann
71fe85ec90 ci: add iproute2 to the list of packages in apt-packages.sh
When running the command 'make docker-test', almost all zdtm tests fail,
logging 'ip: not found'. 'ip' command of the iproute2 package was missing.
So added the package to the list of dependencies in 'apt-packages.sh'. Now
tests run

Signed-off-by: ImranullahKhann <imranullahkhann2004@gmail.com>
2026-01-08 15:35:49 -08:00
Radostin Stoyanov
36f1e9d38c amdgpu: use fseeko with large-file support instead of fseeko64
As of Alpine Linux 3.19, musl libc no longer contains separate
fopen64(), fseeko64(), or ftello64() functions. This causes building
CRIU with amdgpu plugin to fail with the following error:

amdgpu_plugin.c: In function 'parallel_restore_bo_contents':
amdgpu_plugin.c:2286:17: error: implicit declaration of function 'fseeko64'; did you mean 'fseeko'? [-Wimplicit-function-declaration]
 2286 |                 fseeko64(bo_contents_fp, entry->read_offset + offset, SEEK_SET);
      |                 ^~~~~~~~
      |                 fseeko
make[2]: *** [Makefile:31: amdgpu_plugin.so] Error 1
make[1]: *** [Makefile:363: amdgpu_plugin] Error 2

To fix this, add the missing $(DEFINES) to plugin builds, and since we
always compile with _FILE_OFFSET_BITS=64, we don't need the 64 suffix.

Fixes: #2826

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-08 07:48:23 -08:00
Radostin Stoyanov
ddf7a170ff infect-types: fix user_gcs redefine error
In file included from compel/arch/aarch64/src/lib/infect.c:10:
compel/include/uapi/compel/asm/infect-types.h:24:8: error: redefinition of 'user_gcs'
   24 | struct user_gcs {
      |        ^
/usr/include/asm/ptrace.h:329:8: note: previous definition is here
  329 | struct user_gcs {
      |        ^
1 error generated.
make[1]: *** [/criu/scripts/nmk/scripts/build.mk:215: compel/arch/aarch64/src/lib/infect.o] Error 1

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-08 07:48:23 -08:00
Radostin Stoyanov
2dd66866e3 zdtm/cgroup_stray: fix uninitialized variable
51.04  DEP       cgroup_stray.d
51.07  CC        cgroup_stray.o
51.11 cgroup_stray.c:164:18: error: variable 'c' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
51.11   164 |                 if (write(sk, &c, 1) != 1) {
51.11       |                                ^
51.11 1 error generated.
51.12 make[1]: *** [../Makefile.inc:96: cgroup_stray.o] Error 1
51.12 make[1]: Leaving directory '/criu/test/zdtm/static'
51.12 make: *** [Makefile:7: static] Error 2

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-08 07:48:23 -08:00
Radostin Stoyanov
974c1bc898 zdtm/tempfs_subns: fix uninitialized variable
DEP       tempfs_subns.d
 CC        tempfs_subns.o
tempfs_subns.c:50:23: error: variable 'fd' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
   50 |                         if (write(fds[1], &fd, sizeof(fd)) != sizeof(fd)) {
      |                                            ^~
1 error generated.
make[1]: *** [../Makefile.inc:96: tempfs_subns.o] Error 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-08 07:48:23 -08:00
Radostin Stoyanov
b1a51489dd compel: fix sys_clock_gettime function signature
The initialization of the struct timespec used as clockid input
parameter was removed in commit:
b4441d1bd8 ("restorer.c: rm unneded struct init")

This causes the build to fail on Alpine with clang version 21.1.2:

  GEN      criu/pie/parasite-blob.h
  criu/pie/restorer.c:1230:39: error: variable 'ts' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
   1230 |                         if (sys_clock_gettime(t->clockid, &ts)) {
        |                                                            ^~
  1 error generated.
  make[2]: *** [/criu/scripts/nmk/scripts/build.mk:118: criu/pie/restorer.o] Error 1
  make[1]: *** [criu/Makefile:59: pie] Error 2
  make: *** [Makefile:278: criu] Error 2

To fix this, we remove the "const" from the declaration of
clock_gettime. Since the kernel writes the current time into
the struct timespec provided by the caller, the pointer must
be writable.

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2026-01-08 07:48:23 -08:00
Pavel Tikhomirov
fc1867c44d kerndat: Fix error handling for kerndat_has_timer_cr_ids() fail
After commit [1] we accidentally stopped reporting the errors from
kerndat_has_timer_cr_ids(), let's fix that.

Fixes: 1eaa870cc ("kerndat: check that hardware breakpoints work") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2026-01-02 12:51:53 -08:00
Pavel Tikhomirov
2e5f9facf9 util: Make close_safe() reset fd to -1 even on close() failure
The "man 2 close":"Dealing with error returns from close()" says:

  "Retrying the close() after a failure return is the wrong thing to do"

We should not leave the fd there, attempting to close it again on next
close()/close_safe() may lead to accidentally closing something else.

It confirms with the kernel code where sys_close() removes fd from
fdtable in this stack:

  +-> sys_close
    +-> file_close_fd
      +-> file_close_fd_locked
        +-> rcu_assign_pointer(fdt->fd[fd], NULL)

If there was an fd this stack is always reached and fd is always
removed.

Let's replace the fd with -1 after close no matter what.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-12-29 10:00:35 +00:00
Radostin Stoyanov
d4e8114130 readme: use a local copy of the CRIU logo
The README currently uses an external link to criu.org for the embedded
CRIU logo. Loading this URL when viewing the README on GitHub sometimes
fails with "Error Fetching Resource". Using a local copy of the logo
fixes this issue.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-12-17 08:43:50 -08:00
Adrian Reber
30acbabcdd ci: also exclude docker version 29
Docker version 28 broke container restore in combination with network
namespaces. The workaround in the CI script was excluding Docker version
28. Now that there is also Docker version 29, which is still broken,
this also excludes Docker version 29.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-12-14 17:28:58 +09:00
Radostin Stoyanov
f66e59ee5c cr-dump: fix error handling
Commit "plugin: Add DUMP_DEVICES_LATE callback" introduced a new plugin
callback that is invoked in cr_dump_tasks(). The return value of this
callback was assigned to the variable ret. However, this variable is later
used as the return value when goto err is triggered in subsequent
conditions. As a result, CRIU exits with "Dumping finished successfully" even
when some actions have failed and inventory.img has not been created.

To fix this, we replace ret with exit_code and use it only when it is
actually needed.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-12-09 12:23:23 -08:00
Igor Svilenkov Bozic
f78bea8d34 zdtm: gcs: add opt-in GCS test support for AArch64
Introduce an opt-in mode for building and running ZDTM static tests
with Guarded Control Stack (GCS) enabled on AArch64.

Changes:
 - Support `GCS_ENABLE=1` builds, adding `-mbranch-protection=standard`
   and `-z experimental-gcs=check` to CFLAGS/LDFLAGS.
 - Export required GLIBC_TUNABLES at runtime via `TEST_ENV`.
 - %.pid rules to prefix test binaries with `$(TEST_ENV)`
   so the tunables are set when running tests.
 - Makefile rules for selectively enabling GCS in tests

Usage:
    # Build and run with GCS enabled
    make -C zdtm/static GCS_ENABLE=1 posix_timers
    GCS_ENABLE=1 ./zdtm.py run --keep-img=always \
        -t zdtm/static/posix_timers

By default (`GCS_ENABLE` unset or 0), test builds and runs are
unchanged.

NOTE: This assumes that the test victim was compiled also using
      GCS_ENABLE=1 so that the proper GCS AArch64 ELF headers are present

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
d591e320e0 criu/restore: gcs: adds restore implementation for Guarded Control Stack
This commit finalizes AArch64 Guarded Control Stack (GCS)
support by wiring the full dump and restore flow.

The restore path adds the following steps:

 - Define shared AArch64 GCS types and constants in a dedicated header
   for both compel and CRIU inclusion
 - compel: add get/set NT_ARM_GCS via ptrace, enabling user-space
   GCS state save and restore.
 - During restore switch to the new GCS (via GCSSTR) to place capability
   token sa_restorer address
 - arch_shstk_trampoline() — We enable GCS in a trampoline that using
   prctl(PR_SET_SHADOW_STACK_STATUS, ...) via inline SVC. The trampoline
   ineeded because we can’t RET without a valid GCS.
 - restorer: map the recorded GCS VMA, populate contents top-down with
   GCSSTR, write the signal capability at GCSPR_EL0 and the valid token at
   GCSPR_EL0-8, then switch to the rebuilt GCS (GCSSS1)
 - Save and restore registers via ptrace
 - Extend restorer argument structures to carry GCS state
   into post-restore execution
 - Add shstk_set_restorer_stack(): sets tmp_gcs to temporary restorer
   shadow stack start
 - Add gcs_vma_restore implementation (required for mremap of the GCS VMA)

Tested with:
    GCS_ENABLE=1 ./zdtm.py run -t zdtm/static/env00

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
2429d49e67 criu/dump: gcs: save GCS state during dump
Add debug and info messages to log Guarded Control Stack state when
dumping AArch64 threads. This includes the following values:
  - gcspr_el0
  - features_enabled

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: cleanup fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
41ecb7ac71 images: aarch64: add user_aarch64_gcs_entry
- Define user_aarch64_gcs_entry in core-aarch64.proto to store
    Guarded Control Stack state (gcspr_el0, features_enabled).
  - Extend thread_info_aarch64 with an optional gcs field

Also extend thread_info_aarch64 with an optional gcs field

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
92e6e523b5 compel: gcs: add opt-in GCS test support for AArch64
Introduce an opt-in mode for building and running compel tests
with Guarded Control Stack (GCS) enabled on AArch64.

Changes:
 - Extend compel/test/infect to support `GCS_ENABLE=1` builds,
   adding `-mbranch-protection=standard` and
   `-z experimental-gcs=check` to CFLAGS/LDFLAGS.
 - Export required GLIBC_TUNABLES at runtime via `TEST_ENV`.

Usage:
    make -C compel/test/infect GCS_ENABLE=1
    make -C compel/test/infect GCS_ENABLE=1 run

By default (`GCS_ENABLE` unset or 0), builds and runs are unchanged.

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
2f676d20e4 compel: gcs: set up GCS token/restorer for rt_sigreturn
When GCS is enabled, the kernel expects a capability token at GCSPR_EL0-8
and sa_restorer at GCSPR_EL0-16 on rt_sigreturn. The sigframe must be
consistent with the kernel’s expectations, with GCSPR_EL0 advanced by -8
having it point to the token on signal entry. On rt_sigreturn, the kernel
verifies the cap at GCSPR_EL0, invalidates it and increments GCSPR_EL0 by 8
at the end of gcs_restore_signal() .

Implement parasite_setup_gcs() to:
- read NT_ARM_GCS via ptrace(PTRACE_GETREGSET)
- write (via ptrace) the computed capability token and restorer address
- update GCSPR_EL0 to point to the token's location

Call parasite_setup_gcs() into parasite_start_daemon() so the sigreturn
frame satisfies kernel's expectation

Tests with GCS remain opt‑in:
	make -C compel/test/infect GCS_ENABLE=1 && make -C compel/test/infect run

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: cleanup fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
6bb856b0af compel: gcs: initial GCS support for signal frames
Add basic prerequisites for Guarded Control Stack (GCS) state on AArch64.

This adds a gcs_context to the signal frame and extends user_fpregs_struct_t to
carry GCS metadata, preparing the groundwork for GCS in the parasite.

For now, the GCS fields are zeroed during compel_get_task_regs(), technically
ignoring GCS since it does not reach the control logic yet; that will be
introduced in the next commit.

The code path is gated and does not affect normal tests. Can be explicitly
enabled and tested via:

    make -C infect GCS_ENABLE=1 && make -C infect run

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: clean up fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
73ca071483 gcs: add GCS constants and helper macros
Introduce ARM64 Guarded Control Stack (GCS) constants and macros
in a new uapi header for use in both CRIU and compel.

Includes:
 - NT_ARM_GCS type
 - prctl(2) constants for GCS enable/write/push modes
 - Capability token helpers (GCS_CAP, GCS_SIGNAL_CAP)
 - HWCAP_GCS definition

These are based on upstream Linux definitions

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
2025-12-07 19:20:00 +01:00
Igor Svilenkov Bozic
501b714f76 compel/aarch64: refactor fpregs handling
Refactor user_fpregs_struct_t to wrap user_fpsimd_state in a
dedicated struct, preparing for future extending by just
adding new members

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
[ alex: fixes ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Mike Rapoport <rppt@kernel.org>
2025-12-07 19:20:00 +01:00
Adrian Reber
90300748ef tty: fix compiler error
At least on tests running on Fedora rawhide following error could be
seen:

```
  criu/tty.c: In function 'pts_fd_get_index':
  criu/tty.c:262:21: error: initialization discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
    262 |         char *pos = strrchr(link->name, '/');
        |
```

This fixes it.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-11-28 09:18:59 +00:00
Adrian Reber
09bb362664 restore: fix "Defect type: UNINIT"
Static code analysis reported:

  1. criu/cr-restore.c:2438:2: var_decl: Declaring variable "end_vma"
     without initializer.
  4. criu/cr-restore.c:2451:5: assign: Assigning: "s_vma" = "&end_vma",
     which points to uninitialized data.
  7. criu/cr-restore.c:2449:4: uninit_use: Using uninitialized value
     "s_vma->list.next".

This tries to fix it by initializing the variable.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-11-28 09:18:15 +00:00
Adrian Reber
bf82389de3 dump: fix "Defect type: IDENTICAL_BRANCHES"
Static code analysis reported:

 criu/cr-dump.c:2328:2: identical_branches: The same code is executed
 when the condition "ret" is true or false, because the code in the
 if-then branch and after the if statement is identical. Should the if
 statement be removed?

This is a fix for the warning.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-11-28 09:18:15 +00:00
Mark Polyakov
2cf8f13ca1 doc: update pipe/socket examples for --inherit-fd
The syntax of the inherit-fd functionality for unix socket and pipe
includes a colon.

Fixes: 0df3f79fc0 ("criu(8): fix --inherit-fd description")
Fixes: c37324b6d0 ("crtools: describe the inherit-fd option")
Signed-off-by: Mark Polyakov <mark@thundercompute.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-16 15:56:26 +00:00
Yanning Yang
62aadb22ab amdgpu: use 64-bit offsets for parallel restore
On AMD Instinct MI300 systems, restoring a large GPU application can
fail because the checkpoint size is too large and the maximum value of
an offset (with integer type) is insufficient. This problem occurs when
the total size of all buffer objects exceeds int max, not because any
single buffer is too large, but it can also happen with a large number
of small buffers.

Fixes: #2812

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-16 07:44:37 -08:00
Radostin Stoyanov
1db7eed69f amdgpu: use local kernel headers instead of libdrm
Use local copies of amdgpu and DRM headers for consistency.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
Radostin Stoyanov
29525f8cb3 codespell: skip amdgpu kernel headers
These header files are copied directly from the Linux kernel and contain
typos. We skip these files in codespell to simplify maintenance.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
Radostin Stoyanov
e4a5e164b4 plugins/amdgpu: update kernel headers
This patch updates drm.h and amdgpu_drm.h kernel headers,
and adds drm_mode.h (included by drm.h) from the rocm-7.1.0
release tag.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
Radostin Stoyanov
f56ccfd2d6 plugins/amdgpu: remove unused variable
amdgpu_plugin_drm.c:167:6: error: variable 'num_bos' set but not used [-Werror,-Wunused-but-set-variable]
  167 |         int num_bos = 0;
      |

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
David Francis
6ed49894c5 plugins/amdgpu: add a comment for retry_needed
Add a comment that explains the purpose of `retry_needed`.

Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
David Francis
77e6558ddb plugins/amdgpu: apply code-style fixes
Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
David Francis
690b610432 plugins/amdgpu: return 0 in post_dump_dmabuf_check
Use `return 0` on success in `post_dump_dmabuf_check()` for consistency
with other functions.

Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
David Francis
ff35a9126e plugins/amdgpu: remove excessive debug messages
These pr_info lines begin with "CC3" and "TWI" were not meant to be
included in the patch.

Co-authored-by: Andrei Vagin <avagin@google.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-14 18:31:37 +00:00
David Francis
9e404e2083 plugin/amdgpu: Support for checkpoint of dmabuf fds
amdgpu libraries that use dmabuf fd to share GPU memory between
processes close the dmabuf fds immediately after using them.
However, it is possible that checkpoint of a process catches one
of the dmabuf fds open. In that case, the amdgpu plugin needs
to handle it.

The checkpoint of the dmabuf fd does require the device file
it was exported from to have already been dumped

To identify which device this dmabuf fd was exprted from, attempt
to import it on each device, then record the dmabuf handle
it imports as. This handle can be used to restore it.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:37 +00:00
David Francis
d43217dadb plugin: Add DUMP_DEVICES_LATE callback
The amdgpu plugin was counting how many files were checkpointed
to determine when it should close the device files.

The number of device files is not consistent; a process may
have multiple copies of the drm device files open.

Instead of doing this counting, add a new callback after all
files are checkpointed, so plugins can clean up their
resources at an appropriate time.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:37 +00:00
David Francis
db0ec806d1 plugin/amdgpu: Add handling for amdgpu drm buffer objects
Buffer objects held by the amdgpu drm driver are checkpointed with
the new BO_INFO and MAPPING_INFO ioctls/ioctl options. Handling
is in amdgpu_plugin_drm.h

Handling of imported buffer objects may require dmabuf fds to be
transferred between processes. These occur over fdstore, with the
handle-fstore id relationships kept in shread memory. There is a
new plugin callback: RESTORE_INIT to create the shared memory.

During checkpoint, track shared buffer objects, so that buffer objects
that are shared across processes can be identified.

During restore, track which buffer objects have been restored. Retry
restore of a drm file if a buffer object is imported and the
original has not been exported yet. Skip buffer objects that have
already been completed or cannot be completed in the current restore.

So drm code can use sdma_copy_bo, that function no longer requires
kfd bo structs

Update the protobuf messages with new amdgpu drm information.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:36 +00:00
David Francis
5eb61e1b14 plugin/amdgpu: Add drm header
The amdgpu plugin usually calls drm ioctls through the libdrm
wrappers. However, amdgpu restore requires dealing with dmabufs
and gem handles directly, which means drm ioctls must be
called directly.

Add the drm.h header (from the kernel's uapi).

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:36 +00:00
David Francis
0b7ca29c19 plugin/amdgpu: Add amdgpu drm header
For amdgpu plugin to call the new amdgpu drm CRIU ioctls, it needs
the amdgpu drm header file, copied from the kernel's includes.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:36 +00:00
David Francis
fb02dbf685 files-ext: Allow plugin files to retry
amdgpu dmabuf CRIU requires the ability of the amdgpu plugin
to retry.

Change files_ext.c to read a response of 1 from a plugin restore
function to mean retry.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:36 +00:00
David Francis
7a4ee0ae8e restorer: Skip non-regular VMAs
amdgpu represents allocated device memory as a memory mapping
of the device file. This is a non-standard VMA that must
be handled by the plugin, not the normal VMA code.

Ignore all VMAs on device files.

Signed-off-by: David Francis <David.Francis@amd.com>
2025-11-14 18:31:36 +00:00
Yanning Yang
920437205c plugins/amdgpu: Update README.md and criu-amdgpu-plugin.txt
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-14 18:27:31 +00:00
Yanning Yang
4a3a695dfb plugins/amdgpu: Implement parallel restore
This patch implements the entire logic to enable the offloading of
buffer object content restoration.

The goal of this patch is to offload the buffer object content
restoration to the main CRIU process so that this restoration can occur
in parallel with other restoration logic (mainly the restoration of
memory state in the restore blob, which is time-consuming) to speed up
the restore phase. The restoration of buffer object content usually
takes a significant amount of time for GPU applications, so
parallelizing it with other operations can reduce the overall restore
time.

It has three parts: the first replaces the restoration of buffer objects
in the target process by sending a parallel restore command to the main
CRIU process; the second implements the POST_FORKING hook in the amdgpu
plugin to enable buffer object content restoration in the main CRIU
process; the third stops the parallel thread in the RESUME_DEVICES_LATE
hook.

This optimization only focuses on the single-process situation (common
case). In other scenarios, it will turn to the original method. This is
achieved with the new `parallel_disabled` flag.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-14 18:27:31 +00:00
Yanning Yang
33ed774c8d plugins/amdgpu: Add parallel restore command
Currently the restore of buffer object comsumes a significant amount of
time. However, this part has no logical dependencies with other restore
operations. This patch introduce some structures and some helper
functions for the target process to offload this task to the main CRIU
process.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-14 18:27:31 +00:00
Yanning Yang
6386140754 plugins/amdgpu: Add socket operations
When enabling parallel restore, the target process and the main CRIU
process need an IPC interface to communicate and transfer restore
commands. This patch adds a Unix domain TCP socket and stores this
socket in `fdstore`.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-14 18:27:31 +00:00
Pengda Yang
ddbb3dbd8d limit the field width of 'scanf'
Fixes: #2121

Signed-off-by: Pengda Yang <daz-3ux@proton.me>
2025-11-14 18:26:27 +00:00
Andrei Vagin
3c7d4fa013 criu: Version 4.2 (CRIUTIBILITY)
Major changes:
* plugins/amdgpu: Implement parallel restore
* Handle processes with uprobes vma
* Fix: getsockopt usage for SO_PASSCRED/SO_PASSSEC on Linux 6.16
* Relax ELF magic check to support MIPS libraries
* pagemap: prevent integer overflow in pagemap_len

This release's name is a nod to the growing challenge we face in
maintaining compatibility across the rapidly evolving Linux kernel
ecosystem.

The full changelog can be found here: https://criu.org/Download/criu/4.2.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-13 08:40:46 -08:00
Andrei Vagin
0a7e7d09dd log: use sizeof(*hdr) instead of sizeof(hdr)
Using sizeof(hdr) where hdr is a pointer gives the size of the pointer,
not the size of the structure it points to.

Reported-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-13 08:40:46 -08:00
Andrei Vagin
e689d902b3 criu/log: properly handle truncated length from vsnprintf
vsnprintf does not always return the number of bytes actually written to
the buffer.

If the output was truncated due to the buffer limit, the return value is
the total number of bytes which WOULD have been written to the final
string if enough space had been available.

This means we must cap the return value to the buffer size excluding the
terminating null byte to correctly calculate the log entry size.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-13 08:40:46 -08:00
Radostin Stoyanov
6344e8d71c cr-servce: move kerndat_init after log_init
kerndat_init() can generate a significant volume of logs. If called
before log_init(), all these messages will be saved in the
early_log_buffer, which has a limited capacity. Additionally, saving to
the early_log_buffer can introduce a performance penalty, especially
when verbose mode is not enabled.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-13 08:40:46 -08:00
Andrei Vagin
a525b3c32e test/vdso-proxy: handle merged vma-s
When we compare two list of vma-s, we need to take into account that
some of them could be merged.

Fixes #12286

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-13 08:40:46 -08:00
Andrei Vagin
ce680fc6c7 Revert "plugins/amdgpu: Implement parallel restore"
This functionality (#2527) is being reverted and excluded from this
release due to issue #2812.

It will be included in a subsequent release once all associated issues
are resolved.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-13 08:40:46 -08:00
Radostin Stoyanov
1d08ff8ca7 coredump: fix handling of num_pages
This patch fixes the following error:

$ sudo make -C test/others/criu-coredump run
...
Traceback (most recent call last):
  File "/home/circleci/criu/coredump/coredump", line 55, in <module>
    main()
  File "/home/circleci/criu/coredump/coredump", line 47, in main
    coredump(opts)
  File "/home/circleci/criu/coredump/coredump", line 14, in coredump
    cores = generator(os.path.realpath(opts['in']))
  File "/home/circleci/criu/coredump/criu_coredump/coredump.py", line 192, in __call__
    self.coredumps[pid] = self._gen_coredump(pid)
  File "/home/circleci/criu/coredump/criu_coredump/coredump.py", line 214, in _gen_coredump
    cd.vmas = self._gen_vmas(pid)
  File "/home/circleci/criu/coredump/criu_coredump/coredump.py", line 992, in _gen_vmas
    v.data = self._gen_mem_chunk(pid, vma, v.filesz)
  File "/home/circleci/criu/coredump/criu_coredump/coredump.py", line 879, in _gen_mem_chunk
    page_mem = self._get_page(pid, page_no)
  File "/home/circleci/criu/coredump/criu_coredump/coredump.py", line 797, in _get_page
    num_pages = m.get("nr_pages", m.compat_nr_pages)
AttributeError: 'dict' object has no attribute 'compat_nr_pages'
+ exit 1
make[1]: *** [Makefile:3: run] Error 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-13 08:40:46 -08:00
alam0rt
cb8e1da3f4 coredump: use compat_nr_pages as fallback
Use nr_pages when available, falling back to compat_nr_pages
for compatibility.

Signed-off-by: alam0rt <sam@samlockart.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:42:32 -08:00
Radostin Stoyanov
0fa6ff3d18 test/others: add tests for check() with pycriu
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
567f70ce19 test/others: add test for check() with libcriu
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
a1dc885027 test/rpc: update errno check
The --mntns-compat-mode option is no longer parsed with CHECK.
Use --log-file instead to test the error message.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
3c841af2cf pycriu: use explicit imports for __init__
_init__.py defines the public API for pycriu. It is important to use
explicit imports to avoid leaking every symbol from criu.py into the
pycriu namespace. This avoids import-time side effects, prevents name
collisions, and circular-import traps.

Fixes the following lint error:
  F403 `from .criu import *` used; unable to detect undefined names

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
f7ccb63bdd pycriu: set RPC opts for CHECK
This allows users to specify RPC options when
using the check() functionality.

Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
9371c4a789 cr-service: refactor RPC opts parsing for check()
The check() functionality is very different from dump, pre-dump,
and restore. It is used only to check if the kernel supports required
features, and does not need the majority of options set via RPC.

In particular, we don't need to open `image_dir` when running `check()`
because this functionality doesn't create or process image files. In
this case, `image_dir` is used as `work_dir`, only when the latter is
not specified and a log file is used.

This patch updates the RPC options parser so that it only handles the
logging options when check() is used. Logging to a file is required when
log_file is explicitly set or no log_to_stderr is used. In such case, we
also resolve images_dir and work_dir where the log file will be created.

Fixes: #2758

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
72ca94db4d cr-service: refactor logging setup
Move the logging initialization into a helper function that
can be reused.

No functional change intended.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:35 -08:00
Radostin Stoyanov
5966ffe8a7 cr-service: refactor images_dir path resolution
Move the images_dir selection logic from setup_opts_from_req() into a
new function: resolve_images_dir_path(). This improves readability and
allows the code to be reused. While at it, use snprintf() instead of
sprintf() for the /proc path and ensure NULL termination after strncpy().

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
60a731ab38 cr-service: drop images_dir from setproctitle
Commit 9089ce8 ("service: use setproctitle") extended cr-service to
get the full path of images_dir using readlink(). However, the RPC
API was later extended to allow setting a custom path (folder) to
be set instead of passing a file descriptor, which causes readlink()
to fail as the path is not a symbolic link.

It would be better to drop the code setting the images-dir path as a
string in the proctitle.

Fixes: #2794

Suggested-by: Andrei Vagin <avagin@google.com>
Co-authored-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
ee4100c09f cr-service: refactor images/workdir setup
Move the code that opens the images directory, resolves its absolute
path via readlink(), selects the work_dir, and chdir()s into it into a
new function: setup_images_and_workdir(). This reduces the size of
`setup_opts_from_req()`, improves its readability, and allows this
functionality to be reused.

While at it, change open_image_dir() to take a const char *dir
parameter, reflecting that the path is not modified by the function and
allowing callers to pass string literals without casts.

No functional changes are intended.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
71a637923f pycriu: set default value for sk_name
This change allows users to call criu.use_sk() without any
parameters to use the default socket name.

Co-authored-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
d2c46b92b0 pycriu: better socket error handling
[Errno 2] No such file or directory -> Socket file not found.
[Errno 111] Connection refused -> Service not running.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
7aad7317b4 lib/pycriu: changing the default behavior to use the system binary
Use system-installed CRIU binary instead of a local file

Thanks to @avagin for suggesting this solution.

Co-authored-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
3f97cfe876 test/libcriu: check setting of RPC config file
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
2878faa74c libcriu: enable setting of RPC config file
Container runtimes that use libcriu (e.g., crun) need to specify a CRIU
configuration file that allows to overwrite default options set via RPC.
This is particularly useful to set options such as `--tcp-established`
via `/etc/criu/runc.conf` in Kubernetes.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Kir Kolyshkin
07ad2473f2 Use command -v instead of which
Unlike "which", which is a separate executable not always installed by
default, "command -v" is a shell built-in available at least for bash,
dash, and busybox shell.

Unlike "which", "command -v" is also easier to grep for, and it is
already used in a few places here.

Inspired by commit 57251d811.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
afcfcd3bf6 ci: add which dependency in dnf packages
which is used in Makefiles to check for dependencies:

Example:
export USE_ASCIIDOCTOR ?= $(shell which asciidoctor 2>/dev/null)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
6860181474 ci: add wheel and setuptools in dnf packages
These dependencies are required to for `pip install`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
d3dfb663b1 make: don't install external dependencies
Don't install external pip dependencies when running `make install`.
As we are not really into developing a Python project, we should
not install additional packages. CRIU does that nowhere else.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
f74e68daf9 ci: verify call order of action-script hooks
The existing test collects all action-script hooks triggered during
`h`, `ns`, and `uns` runs with ZDTM into `actions_called.txt`, then
verifies that each hook appears at least once. However, the test does
not verify that hooks are invoked *exactly once* or in *correct order*.

This change updates the test to run ZDTM only with ns flavour as this
seems to cover all action-script hooks, and checks that all hooks are
called correctly.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
f824dc735b ci: consolidate action-script tests
This patch consolidates the action-script tests into
`test/others/action-script` to ensure all tests are executed
consistently and reduce duplication. Since we had two tests that appear
to do the same thing, we can remove the one that doesn't use zdtm.py.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
d5c81f8108 pycriu: prevent always appending "Unknown" to error messages
Regardless of the actual error message, "Unknown" was always appended
to the end of the string, resulting in messages like:
"DUMP failed: Error(3): No process with such pidUnknown".

Fixed by changing standalone if statements to else-if blocks so
"Unknown" is only added when no specific error condition matches.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
540c631dd0 pycriu: add missing protobuf dependency
pycriu depends on protobuf to function correctly. Currently,
it raises an error if protobuf is not installed. Adding
protobuf to the dependencies ensures it is available after
installing pycriu.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
2025-11-05 15:41:34 -08:00
Andrii Herheliuk
a5ae3c184b pycriu: set licence to LGPLv2.1
We use LGPL-v2.1 license for the libcriu and pycriu as they are
intended to be usable by both proprietary and open-source applications.

Signed-off-by: Andrii Herheliuk <andrii@herheliuk.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
697c31abe4 zdtm: shstk: add SHSTK_ENABLE test build option
* add SHSTK_ENABLE=1 toggle
* passes -mshstk to compiler and -z shstk to linker

Example:
  $ make -C test/zdtm/static clean
  $ make -C test/zdtm/static V=1 SHSTK_ENABLE=1 env00

  $ readelf --notes test/zdtm/static/env00 | grep SHSTK
      Properties: x86 feature: SHSTK

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
6fd71b9ee9 x86/criu: shstk: restore SHSTK via premap loops
* call shstk_vma_restore() for VMA_AREA_SHSTK in vma_remap()
* delete map/copy/unmap from shstk_restore() and keep token setup + finalize
* before the loop naturally stopped at cet->ssp-8, so a -8 nudge is required here

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
abf4a71d99 x86/criu: shstk: add shstk_vma_restore()
1. create shadow stack vma during vma_remap cycle
2. copy contents from a premapped non-shstk VMA into it
3. unmap premapped non-shstk VMA
4. Mark shstk VMA for remap into the final destination

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Co-Authored-By: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
[ alex: debugging, rework together with Andrei and code cleanup ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
02462c19c4 restorer: shstk: allocate restorer shadow stack
* reserve space for restorer shadow stack
* set tmp_shstk at mem, advance mem by PAGE_SIZE
* forget the extra PAGE_SIZE (shstk) for premapped VMAs

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
[ alex: small code cleanups ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Alexander Mikhalitsyn
b18c07d8a8 restorer: shstk: add shstk_min_mmap_addr()
* default: return whatever passed in
  eg. to be used as
     shtk_min_mmap_addr(kdat.mmap_min_addr)
* x86: ignore def and return 4G

On x86, CET shadow stack is required to be mapped above 4GiB
On the other hand forcing 4GiB globally would break 32-bit restores.

Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
f29cb750db x86/criu: shstk restorer memory accounting functions
* shstk_restorer_stack_size(): PAGE_SIZE
* shstk_set_restorer_stack(): set restorer temporary shadow stack start

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Igor Svilenkov Bozic
3365c7c025 restorer: shstk: add restorer shadow stack stubs
* shstk_restorer_stack_size() – restorer shadow stack size
* shstk_set_restorer_stack() – set restorer shadow stack start

Signed-off-by: Igor Svilenkov Bozic <svilenkov@gmail.com>
Co-Authored-By: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
bb9a7202a7 test/others/rpc: show logs on error
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
9d072222ef test/others/rpc: parse action-script via config
Extend the test for overwriting config options via RPC with
repeatable option (--action-script) and verify that the value
will not be silently duplicated.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
c03c08d1bc cr-service: refactor rpc config parsing
When an additional configuration file is specified via RPC, this file is
parsed twice: first at an early stage to load options such as --log-file,
--work-dir, and --images-dir; and again after all RPC options and
configuration files have been evaluated.

This allows users to overwrite options specified via RPC by the
container runtime (e.g., --tcp-established). However, processing
the RPC config file twice leads to silently duplicating the values
of repeatable options such as `--action-script`.

To address this problem, we adjust the order of options parsing so
that the RPC config file is evaluated only once. This change should
not introduce any functional changes. Note that this change does
not affect the logging functionality, as early log messages are
temporarily buffered and only written to the log file once it has
been initialized (see commit 1ff2333 "Printout early log messages").

Fixes #2727

Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Shashank Balaji
dcce9bd0e2 zdtm: add a test for --allow-uprobes option
Program flow:
- Parse the test's own executable to calculate the file offset of the uprobe
target function symbol
- Enable the uprobe at the target function
- Call the target function to trigger the uprobe, and hence the uprobes vma
creation
- C/R
- Call the target function again to check that no SIGTRAP is sent, since the
uprobe is still active

At least v1.7 of libtracefs is required because that's when
tracefs_instance_reset was introduced. The uprobes API was introduced in v1.4,
and the dynamic events API was introduced in v1.3.

Ubuntu Focal doesn't have libtracefs. Jammy has v1.2.5, and Noble has v1.7.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Shashank Balaji
f548d3af4a crtools: remove "consult documentation"
Most people know this, don't they? :)

Suggested-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Mahadasyam, Shashank (SGC)
aeec40bf02 docs: add documentation for --allow-uprobes
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Mahadasyam, Shashank (SGC)
bab72af9a5 vma: introduce --allow-uprobes option
This commit teaches criu to deal with processes which have a "[uprobes]" vma.

This vma is mapped by the kernel when execution hits a uprobe location. This
is done so as to execute the uprobe'd instruciton out-of-line in the special
vma. The uprobe'd location is replaced by a software breakpoint instruction,
which is int3 on x86. When execution reaches that location, control is
transferred over to the kernel, which then executes whatever handler code
it has to, for the uprobe, and then executed the replaced instruction out-of-line
in the special vma. For more details, refer to this commit:
d4b3b6384f

Reason for adding a new option
------------------------------

A new option is added instead of making the uprobes vma handling transparent
to the user, so that when a dump is attempted on a process tree in which a
process has the uprobes vma, criu will error, asking the user to use this option.
This gives the user a chance to check what uprobes are attached to the processes
being dumped, and try to ensure that those uprobes are active on restore as well.

Again, the same reason for requiring this option on restore as well. Because
if a process is dumped with an active uprobe, and on restore if the uprobe
is not active, then if execution reaches the uprobe location, then the process
will be sent a SIGTRAP, whose default behaviour will terminate and core dump
the process. This is because the code pages are dumped with the software
breakpoint instruction replacement at the uprobe'd locations. On restore, if
execution reaches these locations and the kernel sees no associated active
uprobes, then it'll send a SIGTRAP.

So, using this option is on dump and restore is an implicit guarantee on the
user's behalf that they'll take care of the active uprobes and that any future
SIGTRAPs because of this are not on us! :)

Handling uprobes vma on dump
----------------------------

We don't need to store any information about the uprobes vma because it's
completely handled by the kernel, transparent to userspace. So, when a uprobes
vma is detected, we check if the --allow-uprobes option was specified or not.
If so, then the allow_uprobes boolean in the inventory image is set (this is
used on restore). The uprobes vma is skipped from being added to the vma list.

Handling uprobes vma on restore
-------------------------------

If allow_uprobes is set in the inventory image, then check if --allow-uprobes
is specified or not. Restoring the vma is not required.

Fixes: checkpoint-restore#1961
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Shashank Balaji
74bf40feeb crit: add VMA_AREA_UPROBES flag
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Shashank Balaji
0ff2e0a66e criu-coredump: add VMA_AREA_UPROBES flag
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Shashank Balaji
7bf402f6b3 vma: introduce VMA_AREA_UPROBES flag
This flag will be used for a "[uprobes]" vma.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-05 15:41:34 -08:00
Radostin Stoyanov
520266d895 zdtm: add sk-unix-restore-fs-share test
Add a ZDTM test case where CRIU uses a helper process to restore
a non-empty process group with a terminated leader and a Unix
domain socket. This reproduces a corner case in which mount
namespace switching can fail during restore:

https://github.com/checkpoint-restore/criu/issues/2687

Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Kir Kolyshkin
790b3cf425 ci: run alpine tests on arm64
These tests reveal the following build error:

In file included from compel/include/uapi/compel/asm/sigframe.h:4,
                 from compel/plugins/std/infect.c:14:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
   28 | struct sigcontext {
      |        ^~~~~~~~~~

In file included from criu/arch/aarch64/include/asm/restorer.h:4,
                 from criu/arch/aarch64/crtools.c:11:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
   28 | struct sigcontext {
      |        ^~~~~~~~~~

Inspired by #2766 / #2767.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-05 15:41:34 -08:00
Pepper Gray
77553f07d3 make: prevent redefinition of 'struct sigcontext'
Compilation on gentoo/arm64 (llvm+musl) fails with:

In file included from compel/include/uapi/compel/asm/sigframe.h:4,
                 from compel/plugins/std/infect.c:14:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
   28 | struct sigcontext {
      |        ^~~~~~~~~~

In file included from criu/arch/aarch64/include/asm/restorer.h:4,
                 from criu/arch/aarch64/crtools.c:11:
/usr/include/asm/sigcontext.h:28:8: error: redefinition of 'struct sigcontext'
   28 | struct sigcontext {
      |        ^~~~~~~~~~

This is happening because <asm/sigcontext.h> and <signal.h> are
mutually incompatible on Linux.

To fix, use  <signal.h> instead of <asm/sigcontext.h> for arm64
(like all others arches do).

Fixes: #2766
Signed-off-by: Pepper Gray <hello@peppergray.xyz>
2025-11-05 15:40:55 -08:00
Radostin Stoyanov
3379c122e5 page-xfer: fix incompatible pointer type on armv7
page_pipe_read() expects an 'unsigned long *', but pi->nr_pages is u64.
On 32-bit platforms (e.g., armv7), passing &pi->nr_pages directly causes
a compiler error. To fix this we introduce a temporary variable and copy
the result back to pi->nr_pages.

Fixes: #2756

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:24 -08:00
Radostin Stoyanov
7a4b35a910 contributing: update links to mailing list
Our previous mailing list had some technical issues and we created
a new one that is hopefully more reliable.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:24 -08:00
Radostin Stoyanov
76394e93a8 ci: consolidate aarch64 tests on GitHub runners
Currently we run aarch64 tests on both Cirrus CI and GitHub runners.
However, Cirrus CI fails with "Monthly compute limit exceeded!". This
change removes the redundant tests to streamline our CI process.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:24 -08:00
Shashank Balaji
0a81dc8bbe ci/java: update base image from focal to jammy
Ubuntu Focal Fossa (20.04) reached its end-of-life on 31 May 2025. So, move
over to using Ubuntu Jammy (22.04) base images.

Also, focal repos do not have libtracefs, which the uprobes zdtm test needs.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
2025-11-02 07:48:24 -08:00
Radostin Stoyanov
b25ff1d336 Remove travis-ci leftovers
Travis CI stopped providing CI minutes for open-source projects
some time ago and we have migrated to GitHub actions.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Shashank Balaji
25f8be0f60 ci: use package-manager dependency install scripts
Currently, adding a package which is required either for development or testing
requires it to be added in multiple places due to many duplicated Dockerfiles
and installation scripts. This makes it difficult to ensure that all scripts
are updated appropriately and can lead to some places being missed.

This patch consolidates the list of dependencies and adds installation
scripts for each package-manager used in our CI (apk, apt, dnf, pacman).

This change also replaces the `debian/dev-packages.lst` as this subfolder
conflicts with the Ubuntu/Debian packing scripts used for CRIU:
https://github.com/rst0git/criu-deb-packages

This patch also removes the CentOS 8 build scripts as it is EOL
and the container registry is no longer available.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Andrei Vagin
67751bc11b docs: add developer overviews for AI assistants
This commit adds the document to provide high-level overviews of the
CRIU project for AI assistants like Claude and Gemini.

These documents are intended to be used as context for AI-powered
developer assistants to help them understand the project's goals,
architecture, and development process. This will allow them to provide
more accurate and helpful responses to developer questions.

The documents include:
- A brief introduction to CRIU
- A quick start guide for checkpointing and restoring a simple process
- An overview of the dump and restore process
- A description of the Compel subproject
- Information about the project's coding style, code layout, and tests

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
91758a68e9 zdtm: Remove junit_xml leftovers
The previous commit 4cd4a6b1ac ("zdtm: stop importing junit_xml")
removed the junit_xml library, but some variables related to it were
left in the code. This commit removes the unused `tc` variable and a
call to its `add_error_info` method.

Fixes: 4cd4a6b1ac ("zdtm: stop importing junit_xml")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
dong sunchao
2d2168fc9c vdso: relax EI_OSABI check to support linux in ELF header
On some ARM/aarch64 systems, the VDSO ELF header sets EI_OSABI to 3 (Linux),
while CRIU expects 0 (System V). This strict check causes restore to fail
with "ELF header magic mismatch"

This patch relaxes the check to accept both values, improving compatibility
with modern toolchains and kernels (e.g. Linux 6.12+)

Fixes: #2751
Signed-off-by: dong sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
2e26b36d44 pagemap: print page regions in the format start - end
During investigations, it’s much easier to read logs when regions are
printed in the start - end format rather than `start/size`.

In addition, all page counters and memory sizes are now printed in
hexadecimal, as they are hard to read in decimal form.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
7e0da4d975 pagemap: use unsigned long for page counts
Variables storing page counts were previously `unsigned int`, limiting
them to a maximum of 2^32 pages. With a 4k page size, this corresponds
to a 16TB memory mapping, which is insufficient for larger mappings.

This commit changes the type for these variables to `unsigned long` to
support larger memory mappings.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
afb2e6c3f9 pagemap: change PagemapEntry.nr_pages to uint64 to support huge mappings
Update the nr_pages field in PagemapEntry to uint64 to prepare for
checkpointing and restoring huge memory mappings.

Backward compatibility with older pagemap images is preserved.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
c7395f4cbe files: fork helpers without CLONE_FILES | CLONE_FS
On restore, CRIU needs to change mount namespaces to properly restore
files and unix sockets. However, the kernel prevents this if a process
is sharing its file system information (fs) with other processes.

Fixes #2687

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Filip Hejsek
a8c5e11715 lsm: use attr/apparmor/current to get apparmor label
On some kernels, attr/current can be intercepted by BPF LSM, causing
errors (#2033). Using attr/apparmor/current is preferable, because it
is guaranteed to return the apparmor label. attr/current will still be
used as a fallback for older kernels.

Fixes: #2033

Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
2025-11-02 07:48:23 -08:00
dong sunchao
80c280610e compel/mips: Relax ELF magic check to support MIPS libraries
On MIPS platforms, shared libraries may use EI_ABIVERSION = 5 to indicate
support for .MIPS.xhash sections. The previous ELF header check in
handle_binary() strictly compared e_ident against a hardcoded value,
causing legitimate shared objects to be rejected.

This patch replaces the memcmp-based check with a structured validation
of ELF magic and class, and allows EI_ABIVERSION values beside 0.

fixes: #2745
Signed-off-by: dong sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Lorenzo Fontana
053a22a23b pagemap: prevent integer overflow in pagemap_len
Fixes #2738

Original-patch-by: Andrey Vagin <avagin@google.com>
Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
a779417a3f zdtm: stop importing junit_xml
We are dropping support for generating JUnit XML reports in zdtm.py as we've
migrated testing infrastructure entirely to `GitHub Actions` and other
third-party test runners.

This package has been removed from some distribution repositories (e.g.,
Fedora), making it simpler to remove the dependency than to force installation
via pip.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
254ba3e8cc ci: avoid Docker 28 due to regression
This change modifies the CI script to avoid Docker version 28, which has
a known regression that breaks Checkpoint/Restore (C/R) functionality.
The issue is tracked in the moby/moby project as
https://github.com/moby/moby/issues/50750.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:23 -08:00
Dong Sunchao
4b73985955 criu/sockets: Restrict SO_PASSCRED and SO_PASSSEC to supported families
Linux 6.16+ restricts SO_PASSCRED and SO_PASSSEC to AF_UNIX, AF_NETLINK, and AF_BLUETOOTH
This patch updates CRIU to check the socket family before dumping these options

Fixes: #2705
Signed-off-by: Dong Sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Dong Sunchao
fa1b399064 zdtm/static/sock_opts00: use unix socket to test SO_PASSCRED and SO_PASSSEC
SO_PASSCRED and SO_PASSSEC are only valid for AF_UNIX and AF_NETLINK
This patch updates the test logic to use a unix socket for these options,
while preserving the original value consistency check

Fixes: #2705
Signed-off-by: Dong Sunchao <dongsunchao@gmail.com>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
2ba3430106 test/zdtm/static/maps12: fix pointer-to-int cast
The `offset` argument to `mmap()` was computed with a direct cast from
pointer to `off_t`:

`(off_t)addr_hint - (off_t)map_base`

This causes a build failure when compiling since pointers and `off_t`
may differ in size on some platforms.

maps12.c: In function 'mmap_pages':
maps12.c:114:50: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  114 |                    filemap ? fd : -1, filemap ? ((off_t)addr_hint - (off_t)map_base) : 0);
      |                                                  ^
maps12.c:114:69: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  114 |                    filemap ? fd : -1, filemap ? ((off_t)addr_hint - (off_t)map_base) : 0);

The fix in this patch is to cast both pointers to `intptr_t`,
perform the subtraction in that type, and then cast the result
back to `off_t`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Andrei Vagin
dcee5bd6ff make: Disable branch-protection for PIE code on ARM64
Branch protection uses PAC. It cryptographically "signs" a function's
return address before it is stored on the stack. Upon return, the address
is authenticated using a secret key. If the signature is invalid, the
program will fault.

The PIE code is used for the parasite and the restorer. In both cases, it
runs in a foreign process. The case of the restorer is even trickier
because it needs to restore the original PAC keys, which invalidates
all previously "signed" pointers within the restorer itself.

Fixes #2709

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
98f2bd525a ci/vagrant: install vanilla kernel for Fedora Rawhide test
We need at least 6.16 to test MADV_GUARD_INSTALL support, but
our current Fedora Rawhide test uses only Rawhide's user space,
while using Fedora 42 kernel. Let's start using a vanilla kernel.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
01265cfc69 test/zdtm/static/maps12: add madv guards test
Test for madvise(MADV_GUARD_INSTALL).

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
9c0f725a62 criu/mem: dump: note MADV_GUARD pages as VMA_AREA_GUARD VMAs
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
59b4d662ae criu/pie/restorer: add madvise(MADV_GUARD_INSTALL) restore logic
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
63c7029686 criu/{mem, vdso, cr-restore}: introduce VMA_AREA_GUARD fake VMAs
Introduce a new kind of VMA - VMA_AREA_GUARD. In fact, it is not
a real VMA as it is not represented as struct vm_area_struct in
the kernel.

We want to reuse an existing vma infrastructure in CRIU to dump
an information about MADV_GUARD_INSTALL-covered address space
ranges as VMAs. Then, on restore, we need to carefully skip
those fake VMAs everywhere we expect a normal VMAs to be processed.
And only in restorer we use these VMAs to get an information about
where to call MADV_GUARD_INSTALL.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
cc047d595f criu/mem: dump: skip MADV_GUARD pages content dump
1. get info about MADV_GUARD_INSTALL-protected pages with
help of pagemap by looking for PME_GUARD_REGION flag if /proc/<pid>/pagemap
is used or by looking for PAGE_IS_GUARD flag if ioctl(PAGEMAP_SCAN) is used
2. skip those pages

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
5843cbf975 criu/mem: refactor should_dump_page helper
Make should_dump_page to return int to indicate failure, also
return useful data back through the struct page_info structure
passed as a pointer.

Also, correspondingly convert all call sites.

No functional changes intended, except fixing a bug in
should_dump_page() as it could return (-1) when pmc_fill()
fails, while caller didn't expect that before.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
42580fcb16 criu/pagemap-cache: pagescan: look for PAGE_IS_GUARD pages
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
1873e8f502 cr-dump: warn if MADV_GUARD is supported but isn't shown in pagemap
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
4fc07a8a41 kerndat: add pagemap_scan_guard_pages feature check logic
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
2bb77daa92 kerndat: add madvise(MADV_GUARD_INSTALL) feature-detection
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Alexander Mikhalitsyn
fce491113b criu/include/mman: define MADV_GUARD_INSTALL
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
5f94dd71e7 CI: Consolidate arm64 tests on GitHub runners
The arm64 tests are currently being executed on both actuated and GitHub
runners. This change removes the actuated runner to avoid redundancy and
streamline our CI process.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Andrei Vagin
c6c6f6f231 zdtm/socket-tcp-closing: fill socket buffers effectivly
Send large chunks to fill socket buffers.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
d586b30c6b vagrant: fix tar including archive in itself
The tar command was failing with the following message:

  $ tar cf criu.tar ../../../criu
  tar: Removing leading `../../../' from member names
  tar: ../../../criu/scripts/ci/criu.tar: archive cannot contain itself; not dumped

In addition, the /vagrant no-longer exist in the new Fedora images.

  bash: line 1: cd: /vagrant: No such file or directory

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
2762b21e4a vagrant: update image to fedora 42
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Radostin Stoyanov
0d1e280d09 vagrant: fix 'qemu' install
Installing this package currently fails with the following message:

  Package qemu is not available, but is referred to by another package.
  This may mean that the package is missing, has been obsoleted, or
  is only available from another source

  E: Package 'qemu' has no installation candidate

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:23 -08:00
Ignacio Moreno Gonzalez
64276874d8 restore: flush caches during restore
See the previous commit for rationale and architecture-specific details.

[ avagin: tweak code comment ]

Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Ignacio Moreno Gonzalez
95d5e2e59b compel: flush caches after parasite injection
After the CRIU process saves the parasite code for the target thread in
the shared mmap, it is necessary to call __clear_cache before the target
thread executes the code.

Without this step, the target thread may not see the correct code to
execute, which can result in a SIGILL signal.

For the specific arm64 case. this is important so that the newly copied
code is flushed from d-cache to RAM, so that the target thread sees the
new code.

The change is based on commit 6be10a2 by @fu.lin and on input received
from @adrianreber.

[ avagin: tweak code comment ]

Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:23 -08:00
Kir Kolyshkin
22c83e3eba images/Makefile: use msg-gen
In general, we use "$(E)" instead of "$(Q) echo", but we also have
a msg-gen macro which can be used here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Kir Kolyshkin
066bf7bf3c Keep images/google/protobuf directory
Commit 68f92b551 removed images/google/protobuf directory, so it is
re-created each time during the build process.

This resulted in a weird behavior change. Previously, one could do
something like this:

	git clone $CRURL criu
	(cd criu && sudo make install-criu)
	rm -rf criu

This worked fine, including running rm -rf as a non-root user, since no
new directories were created under criu -- all directories were still
owned by the original user.

Since commit 68f92b551 the same sequence fails:

	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.c': Permission denied
	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.d': Permission denied
	rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.h': Permission denied

A workaround is to keep empty images/google/protobuf directory,
which is what this commit does.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Kir Kolyshkin
21c3b9c005 images/Makefile: fix using $(Q)
Commit 68f92b551 used `$$(Q)` instead of `$(Q)` in the Makefile target,
which resulted in the following error:

$(Q) echo "Generating descriptor.pb-c.c"
/bin/sh: 1: Q: not found
Generating descriptor.pb-c.c
$(Q) protoc --proto_path=/usr/include --proto_path=images/ --c_out=images/ /usr/include/google/protobuf/descriptor.proto
/bin/sh: 1: Q: not found

as well as:

$(Q) rm -rf images/google
/bin/sh: line 1: Q: command not found

Fix it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
7fbf7b2be4 images: remove symlink for descriptor.proto
Currently the build scripts create the following symlink:

  criu-4.1/images/google/protobuf/descriptor.proto -> /usr/include/google/protobuf/descriptor.proto

This symlink points to a system-wide absolute-path target. Also,
this symlink ends up in the release tarball. The tarball may later be
downloaded and unpacked by e.g. OS distributions. If unpacking is
done using Python 3.14+, it will fail.

This happens because Python 3.14 will switch the default behavior of
extractall() from "fully trusting the content of archive" to
"disallow common attack vectors while extracting the archive".
With this new behavior, extractall() raises an exception when at
least one file in the archive extracts or points to outside of the
extraction directory (these are called path traversal attacks and
zip slip attacks).

Reported-by: Dmitrii Kuvaiskii <dimakuv@amazon.de>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
455c677399 zdtm: Add ztatic/mnt_ext_file_bind_auto test
The test creates a file bindmount in criu mntns and binds it into test
mntns, this external file bindmount is autodetected and restored via
"--external mnt[]" criu option.

Note: In previous patch we fix the problem on this code path where file
bindmount restore fails as there is excess "/" in source path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Chuan Qiu
e31828ed8c mount: Fix trailing / when a file is bind-mounted
E.g. I have a /etc/hosts in workspace mounted from the host, and get the following message.

(00.141008)      1: mnt-v2: Create plain mountpoint /tmp/.criu.mntns.K1biY1/mnt-0000000938 for 938
(00.141546)      1: mnt-v2:     Mounting unsupported @938 (0)
(00.141887)      1: mnt-v2:     Bind /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/ to /tmp/.criu.mntns.K1biY1/mnt-0000000938
(00.142179)      1: Error (criu/mount-v2.c:319): mnt-v2: Failed to open_tree /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/: Not a directory
(00.143774) Error (criu/cr-restore.c:2320): Restoring FAILED.

Signed-off-by: Chuan Qiu <qiuc12@gmail.com>
2025-11-02 07:48:22 -08:00
समीर सिंह Sameer Singh
3dc865bc80 test: add static tests for ICMP socket
Add ZDTM static tests for IP4/ICMP and IP6/ICMP
socket feature.

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:22 -08:00
समीर सिंह Sameer Singh
a80c544845 sk-inet: Add support for checkpoint/restore of ICMP sockets
Currently there is no option to checkpoint/restore programs that use
ICMP sockets, such as `ping`. This patch adds support for the same.

Fixes #2557

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
2025-11-02 07:48:22 -08:00
Andrei Vagin
677a568919 zdtm/netns_sub_sysctl: skip unsupported sysctls
net/unix/max_dgram_qlen can't be tuned from non-root userns before:
v5.17-rc1~170^2~215 ("net: Enable max_dgram_qlen unix sysctl to be
configurable by non-init user namespaces")

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
87bd09a0d1 net/sysctl: make ipv4/ping_group_range work in user namespaces
We dump sysctls from criu user namespace, but restore from restored user
namespace. So group id values should be mapped to the restored user
namespace gid space to restore correctly.

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
45d09ae17e net/sysctl: fix broken ipv4_sysctls_op
We have ability to skip sysctl if there is no value, but we still give
n requests to sysctl_op, that is not correct and probably can segfault
on nullptr access. Fix it by adding ri to count non skipped requests.

To be on the safe side, let's add a check that ri == n on read, as we
should not do any skips there.

While on it lets fix bad error message prefix: s/unix/ipv4/.

Remove excess has_iarg set, and add sarg reset to NULL for the case
sysctl_op skipped it.

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
4f057a6aeb net/sysctl: fix missprint in an error message
Fixes: f38e58836 ("net/sysctl: c/r ipv4/ping_group_range value")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
4c7d42f67a ipc/sysctl: fix CTL_FLAGS_IPC_EACCES_SKIP by making it a flag
Having CTL_FLAGS_IPC_EACCES_SKIP == (CTL_FLAGS_OPTIONAL |
CTL_FLAGS_READ_EIO_SKIP) is probably not what we want. So let's make it
a real distinct flag.

Fixes: 840735aa0 ("ipc_sysctl: Prioritize restoring IPC variables using non usernsd approach")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Ivan Pravdin
922754dffd rpc/log: return first error always
Use shared first error buffer to return correct
first error in rpc.

Fixes: #338

Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
a79b33d0c5 cpuinfo: show error when image is missing
The `criu cpuinfo check` command calls cpu_validate_cpuinfo(), which
attempts to open the cpuinfo.img file using `open_image()`. If the
image file is not found, `open_image()` returns an "empty image"
object. As a result, `cpu_validate_cpuinfo()` tries to read from it
and fails with the following error:

(00.002473) Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)

This patch adds a check for an empty image and appropriate error message.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Andrei Vagin
99ba6db89b crtools: do a few minor cleanups
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:22 -08:00
Liana Koleva
fcbaac0598 crtools: simplify check for cpuinfo subcommands
The cpuinfo command requires a "dump" or "check" subcommand. Thus, we
replace `CR_CPUINFO` with `CR_CPUINFO_DUMP` and `CR_CPUINFO_CHECK`.
This allows us to remove unnecessary subcommand check in
`image_dir_mode()` and perform all parsing in `parse_criu_mode()`.

With this change the check for validating the cpuinfo subcommand is
now done only once with `CR_CPUINFO_DUMP` or `CR_CPUINFO_CHECK` enum.

Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Prajwal S N
fbfed312e0 feat: introduce Nix flake
CRIU currently requires a number of dependencies in order to build from
source. The package names vary across distributions and package
managers. A Nix flake allows developers to spin up a dev environment
with `nix develop`, eliminating the hassle of manual dependency
management. It also prevents polluting the global package set on the
machine.

Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
5f18ca1bbe test/zdtm/static: add maps11 test for MAP_DROPPABLE/MADV_WIPEONFORK
In this test we want to ensure that contents of droppable mappings
and mappings with MADV_WIPEONFORK is properly restored in
parent/child processes.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
dfa0ce1808 test/zdtm/static/maps02: add MAP_DROPPABLE testcase
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
4f9dcfb9c8 pycriu/images/pb2dict: add MAP_DROPPABLE flag
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
b90cfc1a80 criu/proc_parse: support MAP_DROPPABLE mappings
Support MAP_DROPPABLE [1] by detecting it from /proc/<pid>/smaps
and restoring it as a normal private mapping flag on vma with only
difference that instead of MAP_PRIVATE we should use MAP_DROPPABLE.

[1] 9651fcedf7

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
6476488a51 test/zdtm/static/maps02: add MADV_WIPEONFORK testcase
In addition to that I did small non-functional corrections.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Alexander Mikhalitsyn
af5412a433 criu/proc_parse: support MADV_WIPEONFORK/VM_WIPEONFORK
Support VM_WIPEONFORK [1] by detecting it from /proc/<pid>/smaps
and setting a corresponding MADV_WIPEONFORK flag on vma.

[1] d2cd9ede6e

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-11-02 07:48:22 -08:00
Andrei Vagin
2b8951a9cf image: use protoc instead of protoc-c
The new protoc 1.5.2 reports warnings:
`protoc-c` is deprecated. Please use `protoc` instead!

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
1fdff7c7a6 zdtm: fix check for criu binary
The opts['action'] contains actor function and not the action name, so
we should compare it with a function.

While on it let's also add a comment about --criu-bin option if CRIU
binary is missing.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Pavel Tikhomirov
ae1395de18 zdtm.py: add an option to change pycriu import path
By default zdtm expects that criu is built from source first and only
then you can run zdtm tests against it. But what if you really want to
run tests against a criu version installed on the system? Yes there is
already a nice option for zdtm to change the criu binary it uses
"--criu-bin", but it would still end up using the pycriu module from
source and you would still have to build everything beforehand.

Let's add an option to change the path where zdtm searches for pycriu
module "--pycriu-search-path". This way we can run zdtm tests on the
criu installed on the system directly without building criu from source,
e.g. on Fedora it works like:

test/zdtm.py run --criu-bin /usr/sbin/criu \
  --pycriu-search-path /usr/lib/python3.13/site-packages \
  -t zdtm/static/env00

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:22 -08:00
Yanning Yang
7a5b3d1f41 plugins/amdgpu: Update README.md and criu-amdgpu-plugin.txt
Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
a61116fd93 plugins/amdgpu: Implement parallel restore
This patch implements the entire logic to enable the offloading of
buffer object content restoration.

The goal of this patch is to offload the buffer object content
restoration to the main CRIU process so that this restoration can occur
in parallel with other restoration logic (mainly the restoration of
memory state in the restore blob, which is time-consuming) to speed up
the restore phase. The restoration of buffer object content usually
takes a significant amount of time for GPU applications, so
parallelizing it with other operations can reduce the overall restore
time.

It has three parts: the first replaces the restoration of buffer objects
in the target process by sending a parallel restore command to the main
CRIU process; the second implements the POST_FORKING hook in the amdgpu
plugin to enable buffer object content restoration in the main CRIU
process; the third stops the parallel thread in the RESUME_DEVICES_LATE
hook.

This optimization only focuses on the single-process situation (common
case). In other scenarios, it will turn to the original method. This is
achieved with the new `parallel_disabled` flag.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
e8ba7c103a plugins/amdgpu: Add parallel restore command
Currently the restore of buffer object comsumes a significant amount of
time. However, this part has no logical dependencies with other restore
operations. This patch introduce some structures and some helper
functions for the target process to offload this task to the main CRIU
process.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
1fd1b670c4 plugins/amdgpu: Add socket operations
When enabling parallel restore, the target process and the main CRIU
process need an IPC interface to communicate and transfer restore
commands. This patch adds a Unix domain TCP socket and stores this
socket in `fdstore`.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
e257d04974 pstree: Add has_children function
Currently, parallel restore only focuses on the single-process
situation. Therefore, it needs an interface to know if there is only one
process to restore. This patch adds a `has_children` function in
`pstree.h` and replaces some existing implementations with this
function.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
497109eb4e cr-restore: Move cr_plugin_init after fdstore_init
Currently, when CRIU calls `cr_plugin_init`, `fdstore` is not
initialized. However, during the plugin restore procedure, there may be
some common file operations used in multiple hooks. This patch moves
`cr_plugin_init` after `fdstore_init`, allowing `cr_plugin_init` to use
`fdstore` to place these file operations.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Yanning Yang
427c0dc27b criu: Introduce a new device plugin hook for restore
Currently, in the target process, device-related restore operations and
other restore operations almost run sequentially. When the target
process executes the corresponding CRIU hook functions, it can't perform
other restore operations.  However, for GPU applications, some device
restore operations have no logical dependencies on other common restore
operations and can be parallelized with other operations to speed up the
process.

Instead of launching a thread in child processes for parallelization,
this patch chooses to add a new hook, `POST_FORKING`, in the main CRIU
process to handle these restore operations. This is because the
restoration of memory state in the restore blob is one of the most
time-consuming parts of all restore logic. The main CRIU process can
easily parallelize these operations, whereas parallelizing in threads
within child processes is challenging.

- POST_FORKING

*POST_FORKING: Hook to enable the main CRIU process to perform some
restore operations of plugins.

Signed-off-by: Yanning Yang <yangyanning@sjtu.edu.cn>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
d57d40a5ad sk-inet: add MPTCP definition
Building CRIU on Ubuntu 20.04 fails with the following error:

criu/sk-inet.c: In function 'can_dump_ipproto':
criu/sk-inet.c:131:16: error: 'IPPROTO_MPTCP' undeclared (first use in this function); did you mean 'IPPROTO_MTP'?
  131 |   if (proto == IPPROTO_MPTCP)
      |                ^~~~~~~~~~~~~
      |                IPPROTO_MTP

Add definition for MPTCP to fix this error.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:22 -08:00
Radostin Stoyanov
fddca67cc6 seize: fix pause devices for frozen containers
The container checkpointing procedure in Kubernetes freezes running
containers to create a consistent snapshot of both the runtime state
and the rootfs of the container. However, when checkpointing a GPU
container, the container must be unfrozen before invoking the
cuda-checkpoint tool.

This is achieved in prepare_freezer_for_interrupt_only_mode(), which
needs to be called before the PAUSE_DEVICES hook. The patch introducing
this functionality fixes this problem for containers with multiple
processes. However, if the container has a single process,
prepare_freezer_for_interrupt_only_mode() must be invoked immediately
before the PAUSE_DEVICES hook.

Fixes: #2514

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:21 -08:00
Lorenzo Fontana
366d73a4c2 make: remove checks and warnings for bsd strlcat and strlcpy
In 0a7c5fd1bd we swapped the BSD
implementation of strlcat and strlcpy in favor of our own replacement.

The checks and the predefined macros are not needed anymore.

Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
2025-11-02 07:48:21 -08:00
Andrei Vagin
1eaa870cce kerndat: check that hardware breakpoints work
In some cases, they might not work in virtual machines if the hypervisor
doesn't virtualize them. For example, they don't work in AMD SEV virtual
machines if the Debug Virtualization extension isn't supported or isn't
enabled in SEV_FEATURES.

Fixes #2658

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:48:21 -08:00
Radostin Stoyanov
b458a5c1ad sk-inet: add message how to disable MPTCP in Go
With Go version 1.24, ListenConfig now uses MPTCP by default [1].
Checkpoint/restore for this protocol is not currently supported
and adding support requires kernel changes that are not trivial
to implement. As a result, checkpointing of many containers that
run Go programs is likely to fail with the following error [2]:

(00.026522) Error (criu/sk-inet.c:130): inet: Unsupported proto 262 for socket 2f9bc5

This patch adds a message with suggested workaround for this problem.

[1] https://go.dev/doc/go1.24#netpkgnet
[2] https://github.com/checkpoint-restore/criu/issues/2655

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:48:21 -08:00
Pavel Tikhomirov
5a725266ac zdtm: add mnt_ro_root test
It makes root mount readonly and checks that it is still readonly after
migration.

Make zdtm/static writable for logs via "bind" desc option.

v2: explain why we don't have explicit rw/ro flag check
v3: use new zdtm "bind" desc option

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:21 -08:00
Pavel Tikhomirov
6b3826a6fb zdtm/lib: add "bind" desc option
Add {'bind': 'path/to/bindmount'} zdtm descriptor option, so that in
test mount namespace a directory bindmount can be created before running
the test.

This is useful to leave test directory writable (e.g. for logs) while
the test makes root mount readonly. note: We create this bindmount early
so that all test files are opened on it initially and not on the below
mount. Will be used in mnt_ro_root test.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:21 -08:00
Pavel Tikhomirov
88cb552f69 mount: restore root mount flags
Mount flags belong to mount and mount namespace of the Container, so we
should preserve them, as Container user will not expect mounts switching
between ro and rw over c/r.

Fixes: #2632

v5: fix both mount-v1 and mount-v2

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-11-02 07:48:17 -08:00
Radostin Stoyanov
b6dca31162 aarch64/crtools: fix define for missing constants
Building CRIU package on Debian 11 aarch64 fails with

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:32:31: error: storage size of 'paca' isn't known
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c:33:31: error: storage size of 'pacg' isn't known
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:47:15: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
  if (hwcaps & HWCAP_PACA) {
               ^~~~~~~~~~
               HWCAP_FCMA
criu/arch/aarch64/crtools.c:47:15: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c:53:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:82:15: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
  if (hwcaps & HWCAP_PACG) {
               ^~~~~~~~~~
               HWCAP_AES
criu/arch/aarch64/crtools.c:88:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:33:31: error: unused variable 'pacg' [-Werror=unused-variable]
  struct user_pac_generic_keys pacg;
                               ^~~~
criu/arch/aarch64/crtools.c:32:31: error: unused variable 'paca' [-Werror=unused-variable]
  struct user_pac_address_keys paca;
                               ^~~~
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:227:31: error: storage size of 'upaca' isn't known
  struct user_pac_address_keys upaca;
                               ^~~~~
criu/arch/aarch64/crtools.c:228:31: error: storage size of 'upacg' isn't known
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:241:18: error: 'HWCAP_PACA' undeclared (first use in this function); did you mean 'HWCAP_FCMA'?
   if (!(hwcaps & HWCAP_PACA)) {
                  ^~~~~~~~~~
                  HWCAP_FCMA
criu/arch/aarch64/crtools.c:255:44: error: 'NT_ARM_PACA_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACA_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function)
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
criu/arch/aarch64/crtools.c:268:18: error: 'HWCAP_PACG' undeclared (first use in this function); did you mean 'HWCAP_AES'?
   if (!(hwcaps & HWCAP_PACG)) {
                  ^~~~~~~~~~
                  HWCAP_AES
criu/arch/aarch64/crtools.c:275:44: error: 'NT_ARM_PACG_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_SVE'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PACG_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~
                                            NT_ARM_SVE
criu/arch/aarch64/crtools.c:233:6: error: variable 'ret' set but not used [-Werror=unused-but-set-variable]
  int ret;
      ^~~
criu/arch/aarch64/crtools.c:228:31: error: unused variable 'upacg' [-Werror=unused-variable]
  struct user_pac_generic_keys upacg;
                               ^~~~~
criu/arch/aarch64/crtools.c:227:31: error: unused variable 'upaca' [-Werror=unused-variable]
  struct user_pac_address_keys upaca;
                               ^~~~~
This patch adds the missing constants and structs if undefined.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:42:55 -08:00
Andrei Vagin
5de61a721f net: nftables: avoid restore failure if the CRIU nft table already exist
CRIU locks the network during restore in an "empty" network namespace.
However, "empty" in this context means CRIU isn't restoring the
namespace. This network namespace can be the same namespace where
processes have been dumped and so the network is already locked in it.

Fixes #2650

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:42:55 -08:00
Younes Manton
b9da95b0b2 s390: Fix FP reg restore after parasite code runs
Currently we save FP regs before parasite code runs, and restore after
for --leave-running, --check-only, and in case of errors. In case of
errors the error may have happened before FP regs were saved, so we
should only restore them if they were actually saved.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2025-11-02 07:42:55 -08:00
Adrian Reber
74799ae023 aarch64: fix build with missing NT_ARM_PAC_ENABLED_KEYS
On a RHEL 8 based system building CRIU fails with:

criu/arch/aarch64/crtools.c: In function 'save_pac_keys':
criu/arch/aarch64/crtools.c:73:39: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   ret = ptrace(PTRACE_GETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov);
                                       ^~~~~~~~~~~~~~~~~~~~~~~
                                       NT_ARM_PACA_KEYS
criu/arch/aarch64/crtools.c:73:39: note: each undeclared identifier is reported only once for each function it appears in
criu/arch/aarch64/crtools.c: In function 'arch_ptrace_restore':
criu/arch/aarch64/crtools.c:261:44: error: 'NT_ARM_PAC_ENABLED_KEYS' undeclared (first use in this function); did you mean 'NT_ARM_PACA_KEYS'?
   if ((ret = ptrace(PTRACE_SETREGSET, pid, NT_ARM_PAC_ENABLED_KEYS, &iov))) {
                                            ^~~~~~~~~~~~~~~~~~~~~~~
                                            NT_ARM_PACA_KEYS

This adds the missing define if it is undefined.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-11-02 07:42:55 -08:00
Radostin Stoyanov
6805841660 cuda: remove redundant goto label
The `goto interrupt` label is unnecessary as the code directly
returns after `cuda_process_checkpoint_action()`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:42:55 -08:00
Radostin Stoyanov
e7aee3c5c7 cuda: use pr_perror for libc function errors
When handing errors for functions such as `ptrace()`, `pipe()`, and
`fork()` it would be better to use `pr_perror` instead of `pr_err`
as it would include a message describing the encountered error.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-11-02 07:42:55 -08:00
Andrei Vagin
5ff52326e1 restore: use the new kernel interface to restore timers
Thomas Gleixner introduced the new interface to create posix timers
with specifed timer IDs:
ec2d0c0462

Previously, CRIU recreated timers by repeatedly creating and deleting
them until the desired ID was reached. This approach isn't fast,
especially for timers with large IDs. For example, restoring two timers
with IDs 1000000 and 2000000 took approximately 1.5 seconds.

The new `prctl()` based interface allows direct creation of timers with
specified IDs, reducing the restoration time to around 3 microseconds
for the same example.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:42:55 -08:00
Andrei Vagin
9a1e979666 compel: fix the stack test
The stack test incorrectly assumed the page immediately
following the stack pointer could never be changed. This doesn't work,
because this page can be a part of another mapping.

This commit introduces a dedicated "stack redzone," a small guard region
directly after the stack. The stack test is modified to specifically
check for corruption within this redzone.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-11-02 07:42:55 -08:00
Yuanhong Peng
daa548bbfb criu: Do not print failed message when there is no late stage hook
This is highly confusing, and it seems that the ret variable
is not handled in the subsequent process.

Signed-off-by: Yuanhong Peng <yummypeng@linux.alibaba.com>
2025-11-02 07:42:55 -08:00
Adrian Reber
34226fd243 ci: try GitHub arm runners
Signed-off-by: Adrian Reber <areber@redhat.com>
2025-11-02 07:42:55 -08:00
Andrei Vagin
a44aa6d985 criu: Version 4.1.1
This release of CRIU (4.1.1) addresses a critical compatibility issue
introduced in the Linux kernel and back-ported to all stable releases.

The kernel commit (12f147ddd6de "do_change_type(): refuse to operate on
unmounted/not ours mounts") addressed the security issue introduced
almost 20 years ago. Unfortunately, this change inadvertently broke the
restore functionality of mount namespaces within CRIU. Users attempting
to restore a container on updated kernels would encounter the error:
"mnt-v2: Failed to make mount 476 slave: Invalid argument."

This release contains the necessary adjustments to CRIU, allowing it to
work seamlessly with kernels incorporating this security change.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-07-29 09:10:08 -07:00
Andrei Vagin
ced15c302b test/zdtm: remove unused compiler argument
Fixes a clang compile-time error:
"argument unused during compilation: '-c'".

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-07-29 09:10:08 -07:00
Andrei Vagin
570621a48a mount-v2: enter the mount namesapce to propagation properties
A kernel change (commit 12f147ddd6de, "do_change_type(): refuse to
operate on unmounted/not ours mounts") modified how mount propagation
properties can be changed. Previously, these properties could be changed
from any mount namespace. Now, they can only be modified from the
specific mount namespace where the target mount is actually mounted

This commit addresses this new restriction by ensuring that CRIU enters the
correct mount namespace before attempting to restore mount propagation
properties (MS_SLAVE or MS_SHARED) for a mount.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2025-07-29 09:10:08 -07:00
Andrei Vagin
b6059ff193 criu: Version 4.1 (CRISC-V)
Major changes:
* RISC-V Support
* PIDFD Support
* CUDA Enhancements
* Fixes here and there

The full changelog can be found here: https://criu.org/Download/criu/4.1.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-25 14:31:33 -07:00
Ivan Pravdin
bc14153173 criu: fix log_keep_err signal deadlock
When using pr_err in signal handler, locking is used
in an unsafe manner. If another signal happens while holding the
lock, deadlock can happen.

To fix this, we can introduce mutex_trylock similar to
pthread_mutex_trylock that returns immediately. Due to the fact
that lock is used only for writing first_err, this change garantees
that deadlock cannot happen.

Fixes: #358

Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
2025-03-25 14:31:33 -07:00
Bui Quang Minh
0f64709442 namespace: skip cleaning up the uid/gid map in error cases
free_userns_maps is called to clean up uid/gid map when the dump
finishes. If we try to clean up these maps in error cases, it can lead
to double free panic. So just skip cleaning up these maps and let
free_userns_maps do its job.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2025-03-25 14:31:33 -07:00
Adrian Reber
6826ac58ce ci: run tests on a nftables only system
Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
700a8c4b5e ci: do not run tests requiring iptables if it is missing
There are a couple of tests that require the iptables binary.

Instead of adding a checkskip script, which could also handle this,
this change now uses CRIU's feature detection to see if the CRIU
feature 'has_ipt_legacy' exists.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
f22330ff07 test: print out logs if tests fail
If the tests in others/rpc are failing no information about that error
can be seen in a CI run. This change displays the log files if the test
fails.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
29ccb5b625 test: others/rpc do not use nftables locking backend
The tests in others/rpc are running as non-root and
fail silently if the nftables network locking backend is used.

This switches those tests to skip the network locking.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
95729ec328 docs: mark make commands with same format as elsewhere
This uses the same formatting for the make command examples as seen in
README.md.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
2cd9d5ded8 docs: update INSTALL.md with a section about building CRIU
The building section also contains the information how to change the
network locking backend without source code changes.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
867c773031 make: allow setting the default network locking backend
As different Linux distributions are switching away from iptables
to nftables, this makes it easier to compile CRIU with a different
default network locking backend. Instead of changing the source
code it is now possible to select the nft backend like this:

    make NETWORK_LOCK_DEFAULT=NETWORK_LOCK_NFTABLES

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
720bf67e06 zdtm/vdso02: unmap vvar_vclock mappings
It is a part of vvar and this test intends to unmap vdso and all vvar
mappings.

Fixes #2622

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
62a4a5874b vdso: correct data types for ELF hash table sizes
Let's change the data types of `nbucket` and `nchain` to uint32.

This should fix the following compile-time error on arm32:
/criu/criu/pie/util-vdso.c:336: undefined reference to `__aeabi_uldivmod'

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
AV
b8553d19ed test/zdtm: check that PAC keys are C/R-ed
Add another variation of ptrhead00 compiled with enabled branch-protection.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
AV
8ae5db37bb arm64: C/R PAC keys
PAC stands for Pointer Authentication Code. Each process has 5 PAC keys
and a mask of enabled keys. All this properties have to be C/R-ed.

As they are per-process protperties, we can save/restore them just for
one thread.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Han-Wen Nienhuys
c5d46d86a8 restorer: Add a lock around cgroupd communication.
Threads are put into cgroups through the cgroupd thread, which
communicates with other threads using a socketpair.

Previously, each thread received a dup'd copy of the socket, and did
the following

    sendmsg(socket_dup_fd, my_cgroup_set);

    // wait for ack.
    while (1) {
        recvmsg(socket_dup_fd, &h, MSG_PEEK);
        if (h.pid != my_pid) continue;
        recvmsg(socket_dup_fd, &h, 0);
    }
    close(socket_dup_fd);

When restoring many threads, many threads would be spinning in the
above loop waiting for their PID to appear.

In my test-case, restoring a process with a 11.5G heap and 491 threads
could take anywhere between 10 seconds and 60 seconds to complete.

To avoid the spinning, we drop the loop and MSG_PEEK, and add a lock
around the above code. This does not decrease parallelism, as the
cgroupd daemon uses a single thread anyway.

With the lock in place, the same restore consistently takes around 10
seconds on my machine (Thinkpad P14s, AMD Ryzen 8840HS).

There is a similar "daemon" thread for user namespaces. That already
is protected with a similar userns_sync_lock in __userns_call().

Fixes #2614

Signed-off-by: Han-Wen Nienhuys <hanwen@engflow.com>
2025-03-21 12:40:31 -07:00
Han-Wen Nienhuys
7748b3fe73 pstree: print clone flags in error message
Signed-off-by: Han-Wen Nienhuys <hanwen@engflow.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
d855501575 vdso: Fixes in DT_GNU_HASH handling
* Hash buckets is an array of 32-bit words. While DT_HASH is 32-bit on
  most platforms except s390 (where it's 64-bit).
* The bloom filter word size differs between 32-bit and 64-bit ELF
  files. This commit adjusts the code to handle both cases.

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
ed6374b48c lsm: use the user provided lsm label
Currently CRIU has the possibility to specify a LSM label during
restore. Unfortunately the information is completely ignored in the case
of SELinux.

This change selects the lsm label from the user if it is provided and
else the label from the checkpoint image is used.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
d35808f5ee ci: update to latest actions for codeql CI job
Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
c298b51a69 scripts/uninstall_module: import signal module
With Python 3.13, the `subprocess` module now uses the
`posix_spawn()` function [1], which requires the `signal`
module to be imported.

Fixes: #2607

[1] https://docs.python.org/3/whatsnew/3.13.html#subprocess

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
समीर सिंह Sameer Singh
38b9807cd5 coredump: enable coredump generation on arm
Add relevant elf header constants and notes for the arm platform
to enable coredump generation.

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
2025-03-21 12:40:31 -07:00
समीर सिंह Sameer Singh
da90b33a42 coredump: enable coredump generation on aarch64
Add relevant elf header constants and notes for the aarch64 platform
to enable coredump generation.

Signed-off-by: समीर सिंह Sameer Singh <lumarzeli30@gmail.com>
2025-03-21 12:40:31 -07:00
dschervov
030fa4affd criu: fix internal representation of cgroups hierarchical structure
strstartswith() function is incorrect choice for finding parent
directory so i change it to issubpath() function

Signed-off-by: Dmitrii Chervov <dschervov1@yandex.ru>
2025-03-21 12:40:31 -07:00
Andrei Vagin
b7fa7d304c kerndat: run iptables with -n to not resolve service names
Resolving service names can be slow and it isn't needed here.

Fixes #2032

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
528c94c48b ci: install gawk for Fedora based tests
Currently Fedora rawhide based CI runs fail with:

 /bin/sh: line 1: awk: command not found

Let's install it.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Kir Kolyshkin
d66bc34995 Makefile: move codespell options to .codespellrc
This way,
 - Makefile is less cluttered;
 - one can run codespell from the command line.

Fixes: fd7e97fcf ("lint: exclude tags file from codespell")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
8a06ca27cc vdso: switch from DT_HASH to DT_GNU_HASH (aarch64)
Trying to run latest CRIU on CentOS Stream 10 or Ubuntu 24.04 (aarch64)
fails like this:

    # criu/criu check -v4
    [...]
    (00.096460) vdso: Parsing at ffffb2e2a000 ffffb2e2c000
    (00.096539) vdso: PT_LOAD p_vaddr: 0
    (00.096567) vdso: DT_STRTAB: 1d0
    (00.096592) vdso: DT_SYMTAB: 128
    (00.096616) vdso: DT_STRSZ: 8a
    (00.096640) vdso: DT_SYMENT: 18
    (00.096663) Error (criu/pie-util-vdso.c:193): vdso: Not all dynamic entries are present
    (00.096688) Error (criu/vdso.c:627): vdso: Failed to fill self vdso symtable
    (00.096713) Error (criu/kerndat.c:1906): kerndat_vdso_fill_symtable failed when initializing kerndat.
    (00.096812) Found mmap_min_addr 0x10000
    (00.096881) files stat: fs/nr_open 1073741816
    (00.096908) Error (criu/crtools.c:267): Could not initialize kernel features detection.

This seems to be related to the kernel (6.12.0-41.el10.aarch64). The
Ubuntu user-space is running in a container on the same kernel.

Looking at the kernel this seems to be related to:

    commit 48f6430505c0b0498ee9020ce3cf9558b1caaaeb
    Author: Fangrui Song <i@maskray.me>
    Date:   Thu Jul 18 10:34:23 2024 -0700

        arm64/vdso: Remove --hash-style=sysv

        glibc added support for .gnu.hash in 2006 and .hash has been obsoleted
        for more than one decade in many Linux distributions.  Using
        --hash-style=sysv might imply unaddressed issues and confuse readers.

        Just drop the option and rely on the linker default, which is likely
        "both", or "gnu" when the distribution really wants to eliminate sysv
        hash overhead.

        Similar to commit 6b7e26547fad ("x86/vdso: Emit a GNU hash").

The commit basically does:

    -ldflags-y := -shared -soname=linux-vdso.so.1 --hash-style=sysv \
    +ldflags-y := -shared -soname=linux-vdso.so.1 \

Which results in only a GNU hash being added to the ELF header. This
change has been merged with 6.11.

Looking at the referenced x86 commit:

    commit 6b7e26547fad7ace3dcb27a5babd2317fb9d1e12
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Thu Aug 6 14:45:45 2015 -0700

        x86/vdso: Emit a GNU hash

        Some dynamic loaders may be slightly faster if a GNU hash is
        available.  Strangely, this seems to have no effect at all on
        the vdso size.

        This is unlikely to have any measurable effect on the time it
        takes to resolve vdso symbols (since there are so few of them).
        In some contexts, it can be a win for a different reason: if
        every DSO has a GNU hash section, then libc can avoid
        calculating SysV hashes at all.  Both musl and glibc appear to
        have this optimization.

        It's plausible that this breaks some ancient glibc version.  If
        so, then, depending on what glibc versions break, we could
        either require COMPAT_VDSO for them or consider reverting.

Which is also a really simple change:

    -VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) \
    +VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \

The big difference here is that for x86 both hash sections are
generated. For aarch64 only the newer GNU hash is generated. That is why
we only see this error on kernel >= 6.11 and aarch64.

Changing from DT_HASH to DT_GNU_HASH seems to work on aarch64.  The test
suite runs without any errors.

Unfortunately I am not aware of all implication of this change and if a
successful test suite run means that it still works.

Looking at the kernel I see following hash styles for the VDSO:

aarch64: not specified (only GNU hash style)
arm: --hash-style=sysv
loongarch: --hash-style=sysv
mips: --hash-style=sysv
powerpc: --hash-style=both
riscv: --hash-style=both
s390: --hash-style=both
x86: --hash-style=both

Only aarch64 on kernels >= 6.11 is a problem right now, because all
other platforms provide the old style hashing.

Signed-off-by: Adrian Reber <areber@redhat.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Co-authored-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2025-03-21 12:40:31 -07:00
Pavel Tikhomirov
6710cfce10 zdtm/netns_sub_sysctl: add ipv4/ping_group_range sysctl check
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-03-21 12:40:31 -07:00
Pavel Tikhomirov
4ca74b9aff net/sysctl: c/r ipv4/ping_group_range value
It is per net namespace, we need it to allow creation of unprivileged
ICMP sockets.

Note: in case this sysctl was disabled after unprivileged ICMP
socket was created we still need to somehow handle it on restore.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-03-21 12:40:31 -07:00
Pavel Tikhomirov
9c40781c26 net/sysctl: put common multiplier outside the brackets
Also add an explanation of the logic behind this calculation.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
d226bd4f67 ci: handle results from latest codespell
CI pulls in a newer version of codespell. This fixes complaints from
that codespell version.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
e2dffcbc8e lib: do not set protobuf has_* field too early
For two cases libcriu was setting the RPC protobuf field `has_*` before
checking if the given parameter is valid. This can lead to situations,
if the caller doesn't check the return value, that we pass as RPC struct
to CRIU which has the `has_*` protobuf field set to true, but does not
have a verified value (or non at all) set for the actual RPC entry.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
82b03429b7 cuda: disable CUDA plugin for pre-dump
Temporarily disable CUDA plugin for `criu pre-dump`.

pre-dump currently fails with the following error:

Handling VMA with the following smaps entry: 1822c000-18da5000 rw-p 00000000 00:00 0                                  [heap]
Handling VMA with the following smaps entry: 200000000-200200000 ---p 00000000 00:00 0
Handling VMA with the following smaps entry: 200200000-200400000 rw-s 00000000 00:06 895                              /dev/nvidia0
Error (criu/proc_parse.c:116): handle_device_vma plugin failed: No such file or directory
Error (criu/proc_parse.c:632): Can't handle non-regular mapping on 705693's map 200200000
Error (criu/cr-dump.c:1486): Collect mappings (pid: 705693) failed with -1

We plan to enable support for pre-dump by skipping nvidia mappings
in a separate patch.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
7f0d107fe5 seize: use separate checkpoint_devices function
Move `run_plugins(CHECKPOINT_DEVICES)` out of `collect_pstree()` to
ensure that the function's sole responsibility is to use the cgroup
freezer for the process tree. This allows us to avoid a time-out
error when checkpointing applications with large GPU state.

v2: This patch calls `checkpoint_devices()` only for `criu dump`.
Support for GPU checkpointing with `pre-dump` will be introduced in
a separate patch.

Suggested-by: Andrei Vagin <avagin@google.com>
Suggested-by: Jesus Ramos <jeramos@nvidia.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
02056bf41a cuda: prevent task lockup on timeout error
When creating a checkpoint of large models, the `checkpoint` action of
`cuda-checkpoint` can exceed the CRIU timeout. This causes CRIU to fail
with the following error, leaving the CUDA task in a locked state:

	cuda_plugin: Checkpointing CUDA devices on pid 84145 restore_tid 84202
	Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 0
	Error (cuda_plugin.c:139): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call
	Error (cuda_plugin.c:396): cuda_plugin: CHECKPOINT_DEVICES failed with
	net: Unlock network
	cuda_plugin: finished cuda_plugin stage 0 err -1
	cuda_plugin: resuming devices on pid 84145
	cuda_plugin: Restore thread pid 84202 found for real pid 84145
	Unfreezing tasks into 1
		Unseizing 84145 into 1
	Error (criu/cr-dump.c:2111): Dumping FAILED.

To fix this, we set `task_info->checkpointed` before invoking
the `checkpoint` action to ensure that the CUDA task is resumed
even if CRIU times out.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Adrian Reber
f83931542a net: remember the name of the lock chain (nftables)
Using libnftables the chain to lock the network is composed of
("CRIU-%d", real_pid). This leads to around 40 zdtm tests failing
with errors like this:

Error: No such file or directory; did you mean table 'CRIU-62' in family inet?
delete table inet CRIU-86

The reason is that as soon as a process is running in a namespace the
real PID can be anything and only the PID in the namespace is restored
correctly. Relying on the real PID does not work for the chain name.

Using the PID of the innermost namespace would lead to the chain be
called 'CRIU-1' most of the time which is also not really unique.

With this commit the change is now named using the already existing CRIU
run ID. To be able to correctly restore the process and delete the
locking table, the CRIU run id during checkpointing is now stored in the
inventory as dump_criu_run_id.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
54795f174b criu: use libuuid for criu_run_id generation
criu_run_id will be used in upcoming changes to create and remove
network rules for network locking. Instead of trying to come up with
a way to create unique IDs, just use an existing library.

libuuid should be installed on most systems as it is indirectly required
by systemd (via libmount).

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
815ef68848 ci: two check-commits.yml changes
* Switch to v4 actions/checkout (from v3)
 * Use our apt wrapper to gracefully handle temporary repository errors

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Austin Kuo
061f4266e8 test/zdtm: add a new test to check non-periodic timers
It creates a few timers with log expiration intervals, waites for C/R
and check that timers are armed and their intervals have been restored.

Signed-off-by: Austin Kuo <hsuanchikuo@gmail.com>
2025-03-21 12:40:31 -07:00
Austin Kuo
09dc2e9584 timer: Refine itimer_armed logic and improve timer value handling
Right now, CRIU skips timers non-periodic timers. This change addresses
this issue.

Signed-off-by: Austin Kuo <hsuanchikuo@gmail.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
aad66a4f7c test: fix cmdlinenv00 on aarch64
On aarch64 the test cmdlinenv00 was failing with:

  FAIL: cmdlinenv00.c:120: auxv corrupted on restore (errno = 11 (Resource temporarily unavailable))

Starting with Linux kernel version 6.3 the size of AUXV was changed:

    commit 28c8e088427ad30b4260953f3b6f908972b77c2d
    Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Date:   Wed Jan 4 14:20:54 2023 -0500

        rseq: Increase AT_VECTOR_SIZE_BASE to match rseq auxvec entries

        Two new auxiliary vector entries are introduced for rseq without
        matching increment of the AT_VECTOR_SIZE_BASE, which causes failures
        with CONFIG_HARDENED_USERCOPY=y.

        Fixes: 317c8194e6ae ("rseq: Introduce feature size and alignment ELF auxiliary vector entries")

With this change AT_VECTOR_SIZE increases from 40 to 50 on aarch64. CRIU
uses AT_VECTOR_SIZE to read the content of /proc/PID/auxv

        auxv_t mm_saved_auxv[AT_VECTOR_SIZE];
        ret = read(fd, mm_saved_auxv, sizeof(mm_saved_auxv));

Now the tests works again on aarch64.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
2b74924805 files-reg: fix buffer overflow on aarch64
Running the zdtm/static/unlink_regular00 test on Ubuntu 24.04 on aarch64
results in following error:

    # ./zdtm.py run -t zdtm/static/unlink_regular00 -k always
    userns is supported
    === Run 1/1 ================ zdtm/static/unlink_regular00
    ==================== Run zdtm/static/unlink_regular00 in ns ====================
    Skipping rtc at root
    Start test
    Test is SUID
    ./unlink_regular00 --pidfile=unlink_regular00.pid --outfile=unlink_regular00.out --dirname=unlink_regular00.test
    Run criu dump
    *** buffer overflow detected ***: terminated
    ############# Test zdtm/static/unlink_regular00 FAIL at CRIU dump ##############
    Test output: ================================

     <<< ================================
    Send the 9 signal to  47
    Wait for zdtm/static/unlink_regular00(47) to die for 0.100000
    ##################################### FAIL #####################################

According to the backtrace:

    #0  __pthread_kill_implementation (threadid=281473158467616, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    #1  0x0000ffff93477690 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
    #2  0x0000ffff9342cb3c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
    #3  0x0000ffff93417e00 in __GI_abort () at ./stdlib/abort.c:79
    #4  0x0000ffff9346abf0 in __libc_message_impl (fmt=fmt@entry=0xffff93552a78 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132
    #5  0x0000ffff934e81a8 in __GI___fortify_fail (msg=msg@entry=0xffff93552a28 "buffer overflow detected") at ./debug/fortify_fail.c:24
    #6  0x0000ffff934e79e4 in __GI___chk_fail () at ./debug/chk_fail.c:28
    #7  0x0000ffff934e9070 in ___snprintf_chk (s=s@entry=0xffffc6ed04a3 "testfile", maxlen=maxlen@entry=4056, flag=flag@entry=2, slen=slen@entry=4053,
        format=format@entry=0xaaaacffe3888 "link_remap.%d") at ./debug/snprintf_chk.c:29
    #8  0x0000aaaacff4b8b8 in snprintf (__fmt=0xaaaacffe3888 "link_remap.%d", __n=4056, __s=0xffffc6ed04a3 "testfile")
        at /usr/include/aarch64-linux-gnu/bits/stdio2.h:54
    #9  create_link_remap (path=path@entry=0xffffc6ed2901 "/zdtm/static/unlink_regular00.test/subdir/testfile", len=len@entry=60, lfd=lfd@entry=20,
        idp=idp@entry=0xffffc6ed14ec, nsid=nsid@entry=0xaaaada2bac00, parms=parms@entry=0xffffc6ed2808, fallback=0xaaaacff4c6c0 <dump_linked_remap+96>,
        fallback@entry=0xffffc6ed2797) at criu/files-reg.c:1164
    #10 0x0000aaaacff4c6c0 in dump_linked_remap (path=path@entry=0xffffc6ed2901 "/zdtm/static/unlink_regular00.test/subdir/testfile", len=len@entry=60,
        parms=parms@entry=0xffffc6ed2808, lfd=lfd@entry=20, id=id@entry=12, nsid=nsid@entry=0xaaaada2bac00, fallback=fallback@entry=0xffffc6ed2797)
        at criu/files-reg.c:1198
    #11 0x0000aaaacff4d8b0 in check_path_remap (nsid=0xaaaada2bac00, id=12, lfd=20, parms=0xffffc6ed2808, link=<optimized out>) at criu/files-reg.c:1426
    #12 dump_one_reg_file (lfd=20, id=12, p=0xffffc6ed2808) at criu/files-reg.c:1827
    #13 0x0000aaaacff51078 in dump_one_file (pid=<optimized out>, fd=4, lfd=20, opts=opts@entry=0xaaaada2ba2c0, ctl=ctl@entry=0xaaaada2c4d50,
        e=e@entry=0xffffc6ed39c8, dfds=dfds@entry=0xaaaada2c3d40) at criu/files.c:581
    #14 0x0000aaaacff5176c in dump_task_files_seized (ctl=ctl@entry=0xaaaada2c4d50, item=item@entry=0xaaaada2b8f80, dfds=dfds@entry=0xaaaada2c3d40)
        at criu/files.c:657
    #15 0x0000aaaacff3d3c0 in dump_one_task (parent_ie=0x0, item=0xaaaada2b8f80) at criu/cr-dump.c:1679
    #16 cr_dump_tasks (pid=<optimized out>) at criu/cr-dump.c:2224
    #17 0x0000aaaacff163a0 in main (argc=<optimized out>, argv=0xffffc6ed40e8, envp=<optimized out>) at criu/crtools.c:293

This line is the problem:

    snprintf(tmp + 1, sizeof(link_name) - (size_t)(tmp - link_name - 1), "link_remap.%d", rfe.id);

The problem was that the `-1` was on the inside of the braces and not on
the outside. This way the destination size was increase by 1 instead of
being decreased by 1 which triggered the buffer overflow detection.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Yuanhong Peng
6fdac50818 seize: Adjust the position of the log message
Based on the code, the `ret` variable at this point does not
represent the task state, so this log message should be
moved to a position after the `compel_wait_task()` function.

Signed-off-by: Yuanhong Peng <yummypeng@linux.alibaba.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
97398068b1 net: redirect nftables stdout and stderr to CRIU's log file
When using the nftables network locking backend and restoring a process
a second time the network locking has already been deleted by the first
restore. The second restore will print out to the console text like:

Error: Could not process rule: No such file or directory
delete table inet CRIU-202621

With this change CRIU's log FD is used by libnftables stdout and stderr.

Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Adrian Reber
6dce80c533 util: added cleanup_file attribute.
Signed-off-by: Adrian Reber <areber@redhat.com>
2025-03-21 12:40:31 -07:00
Liu Chao
260c08418b zdtm: Check CapAmb is restored correctly after C/R
This test sets CapAmb according to CapPrm and CapInh and check CapAmb
after C/R.

Signed-off-by: Liu Chao <liuchao173@huawei.com>
2025-03-21 12:40:31 -07:00
Liu Chao
6f8efad304 cr: Task CapAmb support
Signed-off-by: Liu Chao <liuchao173@huawei.com>
2025-03-21 12:40:31 -07:00
Kir Kolyshkin
94b9b3c5da freeze_processes: implement kludges for cgroup v1
Cgroup v1 freezer has always been problematic, failing to freeze a
cgroup.

In runc, we have implemented a few kludges to increase the chance of
succeeding, but those are used when runc freezes a cgroup for its own
purposes (for "runc pause" and to modify device properties for cgroup
v1).

When criu is used, it fails to freeze a cgroup from time to time
(see [1], [2]). Let's try adding kludges similar to ones in runc.

Alas, I have absolutely no way to test this, so please review carefully.

[1]: https://github.com/opencontainers/runc/issues/4273
[2]: https://github.com/opencontainers/runc/issues/4457

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-21 12:40:31 -07:00
Kir Kolyshkin
82f4ecda69 freeze_processes: fix logic
There are a few issues with the freeze_processes logic:

1. Commit 9fae23fbe2 grossly (by 1000x) miscalculated the number of
   attempts required, as a result, we are seeing something like this:

> (00.000340) freezing processes: 100000 attempts with 100 ms steps
> (00.000351) freezer.state=THAWED
> (00.000358) freezer.state=FREEZING
> (00.100446) freezer.state=FREEZING
> ...close to 100 lines skipped...
> (09.915110) freezer.state=FREEZING
> (10.000432) Error (criu/cr-dump.c:1467): Timeout reached. Try to interrupt: 0
> (10.000563) freezer.state=FREEZING

   For 10s with 100ms steps we only need 100 attempts, not 100000.

2. When the timeout is hit, the "failed to freeze cgroup" error is not
   printed, and the log_unfrozen_stacks is not called either.

3. The nanosleep at the last iteration is useless (this was hidden by
   issue 1 above, as the timeout was hit first).

Fix all these.

While at it,

4. Amend the error message with the number of attempts, sleep duration,
   and timeout.

5. Modify the "freezing cgroup" debug message to be in sync with the
   above error.

   Was:

   > freezing processes: 100000 attempts with 100 ms steps

   Now:

   > freezing cgroup some/name: 100 x 100ms attempts, timeout: 10s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-21 12:40:31 -07:00
Kir Kolyshkin
99e1fbd8a2 criu/seize.c: clang-format it
Done using clang-format 19.1.5 with .clang-format obtained via
scripts/fetch-clang-format.sh.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
a8754905c0 test: run scm06 in the ns and uns flavors
The kernel releases a test socket asynchronously, so the restore can
fail if it is executed before the kernel actually destroys the socket.

Fixes #2537

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
15c81c1262 test/java: increate the ghost file limit
Right now, this test fails with this error:
Error (criu/files-reg.c:1031): Can't dump ghost file
  /criu/test/javaTests/omrvmem_000000626_Mlm48x of 2097152 size,
  increase limit

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Jesus Ramos
dc6cef0b4c cuda: Fix return value from CHECKPOINT_DEVICES hook so that dump's fail properly
cuda-checkpoint returns the positive CUDA error code when it runs into an issue
and passing that along as the return value would cause errors to get ignored

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
8ee2eba47c vdso: handle vvar_vclock vma-s
The vvar_vclock was introduced by [1]. Basically, the old vvar vma has
been splited on two parts. In term of C/R, these two vma-s can be still
treated as one.

[1] e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")

Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
ed560a3491 pidfd: add missing include
Fix for the following error when building CRIU on Rocky Linux 8

criu/pidfd.c: In function ‘pidfd_open’:
criu/pidfd.c:119:17: error: ‘__NR_pidfd_open’ undeclared (first use in this function); did you mean ‘pidfd_open’?
  return syscall(__NR_pidfd_open, pid, flags);
                 ^~~~~~~~~~~~~~~
                 pidfd_open
criu/pidfd.c:119:17: note: each undeclared identifier is reported only once for each function it appears in
criu/pidfd.c:120:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
criu/pidfd.c: At top level:
cc1: error: unrecognized command line option ‘-Wno-unknown-warning-option’ [-Werror]
cc1: error: unrecognized command line option ‘-Wno-dangling-pointer’ [-Werror]
cc1: all warnings being treated as errors

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Alexander Mikhalitsyn
40b7f04b7c compel/arch/riscv64: properly implement compel_task_size()
We need to dynamically calculate TASK_SIZE depending
on the MMU on RISC-V system. [We are using analogical
approach on aarch64/ppc64le.]

This change was tested on physical machine:
StarFive VisionFive 2
isa		: rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb
mmu		: sv39
uarch		: sifive,u74-mc
mvendorid	: 0x489
marchid		: 0x8000000000000007
mimpid		: 0x4210427
hart isa	: rv64imafdc_zicntr_zicsr_zifencei_zihpm_zca_zcd_zba_zbb

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Alexander Mikhalitsyn
399d7bdcbb compel: fix gitignore and remove autogenerated code
We don't need to have compel/arch/riscv64/plugins/std/syscalls/syscalls.S
tracked in git. It is autogenerated. We also need to update our .gitignore
to ignore autogenerated files with syscall tables.

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
21e5f4cfd5 test: add get-state to mocked cuda-checkpoint tool
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
28c2cb3fd6 cuda: enable checkpoint support for paused tasks
If a CUDA process is already in a "locked" or "checkpointed" state
during criu dump, the CUDA plugin currently fails with an error because
it attempts an unnecessary "lock" action using the cuda-checkpoint tool.

This patch extends the CUDA plugin to handle such cases by first
verifying the initial state of the CUDA processes and skipping
unnecessary "lock" and "checkpoint" actions when a process has been
locked or checkpointed before CRIU is invoked.

In particular, CUDA tasks may already be in a "locked" or "checkpointed"
state to ensure consistent checkpoint/restore for distributed workloads,
such as model training, where multiple containers run across different
cluster nodes.

Another use case for this functionality is optimizing resource
utilization, where CUDA tasks with low-priority are preempted
immediately to release GPU resources needed by high-priority
tasks, and the paused workloads are later resumed or migrated
to another node.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
498bcf2806 zdtm: Check many processes with common dead pidfd
We have multiple processes open a pidfd to a common dead process.
After C/R we check that the inode numbers for these pidfds are equal or
not.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Andrei Vagin
7125bfc695 pidfd: one process creates a helper and opens all fds to it
Currently, the `waitpid()` call on the tmp process can be made by a
process which is not its parent. This causes restore to fail.

This patch instead selects one process to create the tmp process and
open all the fds that point to it. These fds are sent to the correct
process(es).

Fixes: #2496

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
b1cac7a8e5 cuda: fix check for GPU device availability
The check for `/dev/nvidiactl` to determine if the CUDA plugin can be
used is unreliable because in some cases the default path for driver
installation is different [1]. This patch changes the logic to check
if a GPU device is available in `/proc/driver/nvidia/gpus/`. This
approach is similar to `torch.cuda.is_available()` and it is a more
accurate indicator.

The subsequent check for support of the `cuda-checkpoint --action`
option would confirm if the driver supports checkpoint/restore.

[1] https://github.com/NVIDIA/gpu-operator

Fixes: #2509

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
36a53fe23c ci: test interrupt-only mode with frozen cgroup
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
4196268eef seize: enable support for frozen containers
Container runtimes like CRI-O and containerd utilize the freezer cgroup
to create a consistent snapshot of container root filesystem (rootfs)
changes. In this case, the container is frozen before invoking CRIU.
After CRIU successfully completes, a copy of the container rootfs diff
is saved, and the container is then unfrozen.

However, the `cuda-checkpoint` tool is not able to perform a 'lock'
action on frozen threads.  To support GPU checkpointing with these
container runtimes, we need to unfreeze the cgroup and return it to its
original state once the checkpointing is complete.

To reflect this new behavior, the following changes are applied:
 - `dont_use_freeze_cgroup(void)` -> `set_compel_interrupt_only_mode(void)`
 - `bool freeze_cgroup_disabled` -> `bool compel_interrupt_only_mode`
 - `check_freezer_cgroup(void)` -> `prepare_freezer_for_interrupt_only_mode(void)`

Note that when `compel_interrupt_only_mode` is set to `true`,
`compel_interrupt_task()` is used instead of `freeze_processes()`
to prevent tasks from running during `criu dump`.

Fixes: #2508

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
ff9dbef902 seize: fix error handling for check_freezer_cgroup
When `check_freezer_cgroup()` has non-zero return value, `goto err` calls
`return ret`. However, the value of `ret` has been set to `0` in the lines
above and CRIU does not handle the error properly.

This problem is related to https://github.com/checkpoint-restore/criu/issues/2508

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Lorenzo Fontana
622b43392f criu: Initialize util before service worker starts
When restoring dumps in new mount + pid namespaces where multiple dumps
share the same network namespace, CRIU may fail due to conflicting
unix socket names. This happens because the service worker creates
sockets using a pattern that includes criu_run_id, but util_init()
is called after cr_service_work() starts.

The socket naming pattern "crtools-fd-%d-%d" uses the restore PID
and criu_run_id, however criu_run_id is always 0 when not initialized,
leading to conflicts when multiple restores run simultaneously either
in the same CRIU process or because of multiple CRIU processes
doing the same operation in different PID namespaces.

Fix this by:

- Moving util_init() before cr_service_work() starts
- Adding a second util_init() call in the service worker fork
to ensure unique IDs across multiple worker runs
- Making sure that dump and restore operations have util_init() called
early to generate unique socket names

With this fix, socket names always include the namespace ID, preventing
conflicts when multiple processes with the same pid share a network
namespace.

Fixes #2499

[ avagin: minore code changes ]

Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
2025-03-21 12:40:31 -07:00
Liu Hua
9052ef93c7 uffd: Disable image deduplication after fork
After a fork, both the child and parent processes may trigger a page fault (#PF)
at the same virtual address, referencing the same position in the page image.
If deduplication is enabled, the last process to trigger the page fault will fail.

Therefore, deduplication should be disabled after a fork to prevent this issue.

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2025-03-21 12:40:31 -07:00
Cryolitia PukNgae
2be958d22e include: don't use GCC's __builtin_ffs on riscv64
Link: e300da4db4

Signed-off-by: PukNgae Cryolitia <Cryolitia@gmail.com>
---
- cherry-picked
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
da6b1807ef ci: add workflow for riscv64
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
bb29067de9 zdtm: add riscv64 support
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
6d970ed047 criu: add riscv64 support to parasite and restorer
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
1d028ef44e images: add riscv64 core image
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
95359a62aa compel: add riscv64 support
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
---
- rebased
- added a membarrier() to syscall table (fix authored by Cryolitia PukNgae)
Signed-off-by: PukNgae Cryolitia <Cryolitia@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Haorong Lu
d8f93e7bac include: add common header files for riscv64
Co-authored-by: Yixue Zhao <felicitia2010@gmail.com>
Co-authored-by: stove <stove@rivosinc.com>
Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
---
- rebased
- imported a page_size() type fix (authored by Cryolitia PukNgae)
Signed-off-by: PukNgae Cryolitia <Cryolitia@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
c49eb18f9f pidfd: block SIGCHLD during tmp process creation
This patch blocks SIGCHLD during temporary process creation to prevent a
race condition between kill() and waitpid() where sigchld_handler()
causes `criu restore` to fail with an error.

Fixes: #2490

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
5ca4400699 zdtm: add inventory test plugins
This patch adds two test plugins to verify that CRIU plugins listed
in the inventory image are enabled, while those that are not listed
can be disabled.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
5335b35f72 images/inventory: add field for enabled plugins
This patch extends the inventory image with a `plugins` field that
contains an array of plugins which were used during checkpoint,
for example, to save GPU state. In particular, the CUDA and AMDGPU
plugins are added to this field only when the checkpoint contains
GPU state. This allows to disable unnecessary plugins during restore,
show appropriate error messages if required CRIU plugin are missing,
and migrate a process that does not use GPU from a GPU-enabled system
to CPU-only environment.

We use the `optional plugins_entry` for backwards compatibility. This
entry allows us to distinguish between *unset* and *missing* field:

- When the field is missing, it indicates that the checkpoint was
  created with a previous version of CRIU, and all plugins should be
  *enabled* during restore.

- When the field is empty, it indicates that no plugins were used during
  checkpointing. Thus, all plugins can be *disabled* during restore.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
b524dab32f pycriu: fix lint errors
This patch fixes the following errors reported by ruff:

lib/pycriu/images/pb2dict.py:307:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
305 |     elif field.type in _basic_cast:
306 |         cast = _basic_cast[field.type]
307 |         if pretty and (cast == int):
    |                        ^^^^^^^^^^^ E721
308 |             if is_hex:
309 |                 # Fields that have (criu).hex = true option set
    |

lib/pycriu/images/pb2dict.py:379:13: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
377 |     elif field.type in _basic_cast:
378 |         cast = _basic_cast[field.type]
379 |         if (cast == int) and is_string(value):
    |             ^^^^^^^^^^^ E721
380 |             if _marked_as_dev(field):
381 |                 return encode_dev(field, value)
    |

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
88aa7e2c10 make/lint: use 'ruff check <path>'
The command `ruff <path>` has been deprecated and removed:
https://astral.sh/blog/ruff-v0.5.0#removed-deprecated-features

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
f29e655df9 zdtm: Check pidfd for thread is valid after C/R
We open a pidfd to a thread using `PIDFD_THREAD` flag and after C/R
ensure that we can send signals using it with `PIDFD_SIGNAL_THREAD`.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
7a64004dc8 zdtm: Check fd from pidfd_getfd is C/Red correctly
We get the read end of a pipe using `pidfd_getfd` and check if we can
read from it after C/R.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
2e6f348458 zdtm: Check dead pidfd is restored correctly
After, C/R of pidfds that point to dead processes their inodes might
change. But if two pidfds point to same dead process they should
continue to do so after C/R.

This test ensures that this happens by calling `statx()` on pidfds after
C/R and then comparing their inode numbers.

Support for comparing pidfds by using `statx()` and inode numbers was
introduced alongside pidfs. So if `f_type` of pidfd is not equal to
`PID_FS_MAGIC` then we skip this test.

signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3f30ec0eda zdtm: Check pidfd can kill descendant processes
Validate that pidfds can been used to send signals to different
processes after C/R using the `pidfd_send_signal()` syscall.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
2899d46000 zdtm: Check pidfd can send signal after C/R
Ensure `pidfd_send_signal()` syscall works as expected after C/R.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3096df9ea3 zdtm: Check pidfd fdinfo entry is consistent
Ensures that entries in /proc/<pid>/fdinfo/<pidfd> are same.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
1ce408ffa4 criu: Support C/R of pidfds
Process file descriptors (pidfds) were introduced to provide a stable
handle on a process. They solve the problem of pid recycling.

For a detailed explanation, see https://lwn.net/Articles/801319/ and
http://www.corsix.org/content/what-is-a-pidfd

Before Linux 6.9, anonymous inodes were used for the implementation of
pidfds. So, we detect them in a fashion similiar to other fd types that
use anonymous inodes by calling `readlink()`.
After 6.9, pidfs (a file system for pidfds) was introduced.
In 6.9 `S_ISREG()` returned true for pidfds, but this again changed with
6.10.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pidfs.c?h=v6.11-rc2#n285)
After this change, pidfs inodes have no file type in st_mode in
userspace.
We use `PID_FS_MAGIC` to detect pidfds for kernel >= 6.9
Hence, check for pidfds occurs before the check for regular files.

For pidfds that refer to dead processes, we lose the pid of the process
as the Pid and NSpid fields in /proc/<pid>/fdinfo/<pidfd> change to -1.
So, we create a temporary process for each unique inode and open pidfds
that refer to this process. After all pidfds have been opened we kill
this temporary process.

This commit does not include support for pidfds that point to a specific
thread, i.e pidfds opened with `PIDFD_THREAD` flag.

Fixes: #2258

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Bhavik Sachdev
3322d1e94c images: Add protobuf definition for pidfd
We only use the last pid from the list in NSpid entry (from
/proc/<pid>/fdinfo/<pidfd>) while restoring pidfds.
The last pid refers to the pid of the process in the most deeply nested
pid namespace. Since CRIU does not currently support nested pid
namespaces, this entry is the one we want.

After Linux 6.9, inode numbers can be used to compare pidfds. pidfds
referring to the same process will have the same inode numbers. We use
inode numbers to restore pidfds that point to dead processes.

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
4f8f6f2883 Makefile.config: set CR_PLUGIN_DEFAULT variable
By default, CRIU uses the path "/usr/lib/criu" to install and load
plugins at runtime. This path is defined by the `PLUGINDIR` variable
in Makefile.install and `CR_PLUGIN_DEFAULT` in `criu/include/plugin.h`.
However, some distribution packages might install the CRIU plugins at
"/usr/lib64/criu" instead. This patch updates the makefile to align
the path defined by `CR_PLUGIN_DEFAULT` with the value of `PLUGINDIR`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Radostin Stoyanov
f1d465448f amdgpu: remove exec permissions on source files
This patch fixes the following warnings that appear
when building an RPM package:

+ /usr/lib/rpm/redhat/brp-mangle-shebangs
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.c is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/debug/criu-4.0-1.fc42.x86_64/plugins/amdgpu/amdgpu_plugin_util.h is executable but has no shebang, removing executable bit

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2025-03-21 12:40:31 -07:00
Andrei Vagin
c2b48ff423 criu: Version 4.0 (CRIUDA)
Major changes:
* CUDA plugin to support checkpointing and restoring NVIDIA CUDA applications.
* Shadow stack support
* Pagemap cache: Added support for PAGEMAP_SCAN ioctl

The full changelog can be found here: https://criu.org/Download/criu/4.0.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-21 16:34:14 -07:00
Andrei Vagin
a8cbe76d4f util: dump fsfd log messages
It should help to investigate errors of fsconfig, fsmount and etc.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
David Francis
096c1f7a4d plugins/amdgpu - Increase maximum parameter length
The topology parsing assumed that all parameter names were
30 characters or fewer, but

recommended_sdma_engine_id_mask

is 31 characters.

Make the maximum length a macro, and set it to 64.

Signed-off-by: David Francis <David.Francis@amd.com>
2024-09-19 15:23:42 -07:00
David Francis
60ee5ebd9d plugins/amdgpu: Zero ib_info on initialization
This struct was being used un-initialized, meaning it
was filled with random garbage.

Mea culpa.

Signed-off-by: David Francis <David.Francis@amd.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
6918998897 plugin/cuda: disable CUDA plugin if /dev/nvidiactl isn't present
The presence of /dev/nvidiactl indicates that the system has a
compatible NVIDIA GPU driver installed and that the GPU is accessible to
the operating system.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
e1331a4b60 fault: allow to check dont_use_freeze_cgroup
Adds a new "fault" to call dont_use_freeze_cgroup.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Andrei Vagin
651df375bd criu: Allow disabling freeze cgroups
Some plugins (e.g., CUDA) may not function correctly when processes are
frozen using cgroups. This change introduces a mechanism to disable the
use of freeze cgroups during process seizing, even if explicitly
requested via the --freeze-cgroup option.

The CUDA plugin is updated to utilize this new mechanism to ensure
compatibility.

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
59f49c6276 codespell: fix typos
This patch fixes the following typos reported by codespell:

./test/others/bers/bers.c:394: dependin ==> depending, depend in
./criu/kerndat.c:837: hitted ==> hit

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
edb6fbb820 scripts/uninstall_module: fix package discovery
The `uninstall_module.py` script is a wrapper for the `pip uninstall`
command that enables support for specifying installation prefix
(i.e., `--prefix`). When this functionality is used, we intentionally
set `sys.path` to include only search paths for the specified prefix
to avoid unintentional uninstallation of packages in system paths.

Since `importlib_metadata` version 8.1.0, the `Distribution.from_name()`
method has been modified [1] to perform additional pre-processing of
Distribution objects [2] that requires loading distribution metadata
and results in the following error:

  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 422, in <lambda>
    buckets = bucket(dists, lambda dist: bool(dist.metadata))
                                              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/__init__.py", line 454, in metadata
    from . import _adapters
  File "/usr/local/lib/python3.12/site-packages/importlib_metadata/_adapters.py", line 3, in <module>
    import email.message
  File "/usr/lib64/python3.12/email/message.py", line 11, in <module>
    import quopri
  ModuleNotFoundError: No module named 'quopri'

This error occurs because we have excluded system paths from the list
of search paths (`sys.path`).

However, this pre-processing is not required for our use case, as we
only use the discovery mechanism of importlib_metadata to resolve the
metadata directory path of the module being uninstalled.

To fix this problem, this patch updates `uninstall_module` to avoid the
`from_name()` method and use `discover(name=package_name)` directly.

[1] a65c29adc0
[2] a65c29ad/importlib_metadata/__init__.py (L391)

Fixes: #2468

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
b1b3c14b17 cuda: unlock on timeout error
When attempting to checkpoint a container with CUDA processes,
CRIU could fail with the following error:

	Error (criu/cr-dump.c:1791): Timeout reached. Try to interrupt: 1
	Error (cuda_plugin.c:143): cuda_plugin: Unable to read output of cuda-checkpoint: Interrupted system call
	Error (cuda_plugin.c:384): cuda_plugin: PAUSE_DEVICES failed with

In this situation, the target process is locked, but CRIU fails due to
a timeout and exits with an error. We need to make sure that the target
PID is unlocked in such case.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Adrian Reber
dbfa450246 ci: run aarch64 tests native via actuated
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
8beac656fc coredump: fail on unsupported architectures early
Currently coredump only works on x86_64. Fail early on any other
architecture.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
d44fc0de5a test: only run macvlan tests if macvlan devices can be created
Some test environments (Actuated runners for example) do not support
maclvan devices. Skip tests depending on it automatically.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
01c65732b6 test: better test for SELinux tools
Previously the check was just if /sys/fs/selinux is mounted. This
extends the check to see if all necessary tools are installed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Adrian Reber
615ccf98cf crit: do not crash on aarch64 doing 'crit x ./ rss'
Running 'crit x ./ rss' on aarch64 crashes with:

    File "/home/criu/crit/crit/__main__.py", line 331, in explore_rss
      while vmas[vmi]['start'] < pme:
            ~~~~^^^^^
  IndexError: list index out of range

This adds an additional check to the while loop to do access indexes out
of range.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
21ea718f9f plugins/amdgpu: fix printf format specifiers
Errors on aarch64:

	In file included from amdgpu_plugin_drm.h:10,
			 from amdgpu_plugin.c:33:
	amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file':
	amdgpu_plugin_util.h:24:20: error: format '%lld' expects argument of type 'long long int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info'
	 1236 |         pr_info("devices:%d bos:%d objects:%d priv_data:%lld\n", args.num_devices, args.num_bos, args.num_objects,
	      |         ^~~~~~~
	cc1: all warnings being treated as errors

Errors on ppc64:

	In file included from amdgpu_plugin_drm.h:10,
			 from amdgpu_plugin.c:33:
	amdgpu_plugin.c: In function 'amdgpu_plugin_dump_file':
	amdgpu_plugin_util.h:24:20: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin.c:1236:9: note: in expansion of macro 'pr_info'
	 1236 |         pr_info("devices:%u bos:%u objects:%u priv_data:%llu\n",
	      |         ^~~~~~~
	cc1: all warnings being treated as errors
	In file included from amdgpu_plugin_util.c:38:
	amdgpu_plugin_util.c: In function 'print_kfd_bo_stat':
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:196:17: note: in expansion of macro 'pr_info'
	  196 |                 pr_info("%s(), %d. KFD BO Addr: %llx \n", __func__, idx, bo->addr);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:197:17: note: in expansion of macro 'pr_info'
	  197 |                 pr_info("%s(), %d. KFD BO Size: %llx \n", __func__, idx, bo->size);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:198:17: note: in expansion of macro 'pr_info'
	  198 |                 pr_info("%s(), %d. KFD BO Offset: %llx \n", __func__, idx, bo->offset);
	      |                 ^~~~~~~
	amdgpu_plugin_util.h:24:20: error: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type '__u64' {aka 'long unsigned int'} [-Werror=format=]
	   24 | #define LOG_PREFIX "amdgpu_plugin: "
	      |                    ^~~~~~~~~~~~~~~~~
	../../criu/include/log.h:47:52: note: in expansion of macro 'LOG_PREFIX'
	   47 | #define pr_info(fmt, ...) print_on_level(LOG_INFO, LOG_PREFIX fmt, ##__VA_ARGS__)
	      |                                                    ^~~~~~~~~~
	amdgpu_plugin_util.c:199:17: note: in expansion of macro 'pr_info'
	  199 |                 pr_info("%s(), %d. KFD BO Restored Offset: %llx \n", __func__, idx, bo->restored_offset);
	      |                 ^~~~~~~
	cc1: all warnings being treated as errors

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
3e2ed18790 plugins/amdgpu: use C99-standard types
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
d68205e919 ci: enable cross compile testing for amdgpu-plugin
Skip cross-compilation on armv7 because, among many other errors,
it fails with the following:

	In file included from ../../include/common/lock.h:9,
			 from ../../criu/include/files.h:9,
			 from amdgpu_plugin.c:30:
	../../include/common/asm/atomic.h:60:2: error: #error ARM architecture version (CONFIG_ARMV*) not set or unsupported.
	   60 | #error ARM architecture version (CONFIG_ARMV*) not set or unsupported.
	      |  ^~~~~
	../../include/common/asm/atomic.h: In function 'atomic_add_return':
	../../include/common/asm/atomic.h:81:9: error: implicit declaration of function 'smp_mb' [-Werror=implicit-function-declaration]
	   81 |         smp_mb();
	      |         ^~~~~~

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Radostin Stoyanov
2ee5844411 plugins/amdgpu: fix cross-compilation
To enable cross-compile we need to use the CC definition from
criu/scripts/nmk/scripts/tools.mk:

CC := $(CROSS_COMPILE)$(HOSTCC)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-19 15:23:42 -07:00
Andrei Vagin
9a19cf34de scripts/ci: run tests with the mocked cuda-checkpoint tool
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
de31abb970 criu/plugin: don't call plugin device hooks for non-alive tasks
Dead tasks don't hold any resources.

Fixes: 2465
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
dea6305914 test/zdtm: allow to run tests with the mocked cuda-checkpoint tool
Here is an example how to run one test:
$ python test/zdtm.py run -t zdtm/static/env00 --ignore-taint --mocked-cuda-checkpoint

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
haozi007
67fe44e981 support user set remote mmap vma address
1. os auto assignment vma addr maybe conflict with vma in gpu living migrate scene;
2. so, we should give choice to user;

Signed-off-by: haozi007 <liuhao27@huawei.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
551cd92447 timer: fix printf specifiers for __suseconds64_t
New internal glibc types __timeval64 [1] and __suseconds64_t [2] have
been introduced as a solution for the Y2038 problem [3]. These 64-bit
types are used across all architectures. However, this change causes
the following build errors when cross-compiling on ARMv7 (armhf):

criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 5 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   49 |         pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
   50 |                 (int64_t)val->it_value.tv_sec, val->it_value.tv_usec,
      |                                                ~~~~~~~~~~~~~~~~~~~~~
      |                                                             |
      |                                                             __suseconds64_t {aka long long int}

criu/timer.c:49:17: error: format '%ld' expects argument of type 'long int', but argument 7 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   49 |         pr_info("Restored %s timer to %" PRId64 ".%ld -> %" PRId64 ".%ld\n", n,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
   50 |                 (int64_t)val->it_value.tv_sec, val->it_value.tv_usec,
   51 |                 (int64_t)val->it_interval.tv_sec, val->it_interval.tv_usec);
      |                                                   ~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                   |
      |                                                                   __suseconds64_t {aka long long int}

ns.c:234:48: error: format '%ld' expects argument of type 'long int', but argument 5 has type 'time_t' {aka 'long long int'} [-Werror=format=]
  234 |         len = snprintf(buf, sizeof(buf), "%d %ld 0", clk_id, offset);
      |                                              ~~^             ~~~~~~
      |                                                |             |
      |                                                long int      time_t {aka long long int}
      |                                              %lld

msg.c:58:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type '__suseconds64_t' {aka 'long long int'} [-Werror=format=]
   58 |         off += sprintf(buf + off, ".%.3ld: ", tv.tv_usec / 1000);
      |                                     ~~~~^     ~~~~~~~~~~~~~~~~~
      |                                         |                |
      |                                         long int         __suseconds64_t {aka long long int}
      |                                     %.3lld

../lib/zdtmtst.h:137:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  137 |                 test_msg("ERR: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~
pthread_timers_h.c:72:17: note: in expansion of macro 'pr_perror'
   72 |                 pr_perror("wrong interval: %ld:%ld", itimerspec.it_interval.tv_sec, itimerspec.it_interval.tv_nsec);
      |                 ^~~~~~~~~

vdso00.c:22:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
   22 |         test_msg("%d time: %10li\n", getpid(), tv.tv_sec);
      |                            ~~~~^               ~~~~~~~~~
      |                                |                 |
      |                                long int          __time64_t {aka long long int}
      |                            %10lli

vdso00.c:29:32: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
   29 |         test_msg("%d time: %10li\n", getpid(), tv.tv_sec);
      |                            ~~~~^               ~~~~~~~~~
      |                                |                 |
      |                                long int          __time64_t {aka long long int}
      |                            %10lli

vdso01.c:357:42: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  357 |         test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec);
      |                                        ~~^                                   ~~~~~~~~~~
      |                                          |                                      |
      |                                          long int                               __time64_t {aka long long int}
      |                                        %lli

vdso01.c:357:72: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  357 |         test_msg("gettimeofday: tv_sec %li vdso_gettimeofday: tv_sec %li\n", tv1.tv_sec, tv2.tv_sec);
      |                                                                      ~~^                 ~~~~~~~~~~
      |                                                                        |                    |
      |                                                                        long int             __time64_t {aka long long int}
      |

vdso01.c:328:43: error: format '%li' expects argument of type 'long int', but argument 2 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  328 |         test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec);
      |                                         ~~^                                    ~~~~~~~~~~
      |                                           |                                       |
      |                                           long int                                __time64_t {aka long long int}
      |                                         %lli

vdso01.c:328:74: error: format '%li' expects argument of type 'long int', but argument 3 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  328 |         test_msg("clock_gettime: tv_sec %li vdso_clock_gettime: tv_sec %li\n", ts1.tv_sec, ts2.tv_sec);
      |                                                                        ~~^                 ~~~~~~~~~~
      |                                                                          |                    |
      |                                                                          long int             __time64_t {aka long long int}
      |

../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=]
  144 |                 test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~~
mtime_mmap.c:80:17: note: in expansion of macro 'fail'
   80 |                 fail("mtime %ld wasn't updated on mmapped %s file", mtime_new, filename);
      |                 ^~~~

../lib/zdtmtst.h:144:26: error: format '%ld' expects argument of type 'long int', but argument 4 has type '__time64_t' {aka 'long long int'} [-Werror=format=]
  144 |                 test_msg("FAIL: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, \
      |                          ^~~~~~~~~~~~~~~
mtime_mmap.c:101:17: note: in expansion of macro 'fail'
  101 |                 fail("After migration, mtime changed to %ld", fst.st_mtime);
      |                 ^~~~

[1] https://sourceware.org/git/?p=glibc.git;h=504c98717062cb9bcbd4b3e59e932d04331ddca5
[2] https://sourceware.org/git/?p=glibc.git;h=3fced064f23562ec24f8312ffbc14950993969e6
[3] https://en.wikipedia.org/wiki/Year_2038_problem

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
a045c874cb ci: run tests with amdgpu and cuda plugins
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
2453ed69a2 zdtm: add option to run tests with criu plugins
By default, if the "CRIU_LIBS_DIR" environment variable is not set,
CRIU will load all plugins installed in `/usr/lib/criu`. This may
result in running the ZDTM tests with plugins for a different version
of CRIU (e.g., installed from a package).

This patch updates ZDTM to always set the "CRIU_LIBS_DIR" environment
variable and use a local "plugins" directory. This directory contains
copies of the plugin files built from source. In addition, this patch
adds the `--criu-plugin` option to the `zdtm.py run` command, allowing
tests to be run with specified CRIU plugins.

Example:

  - Run test only with AMDGPU plugin
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu

  - Run test only with CUDA plugin
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin cuda

  - Run test with both AMDGPU and CUDA plugins
    ./zdtm.py run -t zdtm/static/busyloop00 --criu-plugin amdgpu cuda

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
ad66c27a11 cuda: fix launch cuda-checkpoint
When the cuda-checkpoint tool is not installed, execvp() is expected to
fail and return -1. In this case, we need to call exit() to terminate
the child process that was created earlier with fork().

Since CRIU can be used with applications that do not use CUDA, even
when the CUDA plugin is installed, this patch also updates the log
messages to show debug and warning (instead of error) when the
cuda-checkpoint tool is not found in $PATH.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
fde0b7ac69 cuda: don't leak fds to cuda-checkpoint
Leaking open file descriptors to third-party tools can lead
to security risks.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4dde52a308 ci/podman: show mounts
Show information about mounts available on the host filesystem.
This is useful for debugging.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
9a85fb6382 ci/podman: show criu logs in case of error
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
liuchao173
8437663cc6 delete redundant include header files
restorer.h has been included in line 43.

Fixes: 22963d2827 ("Hide asm/restorer.h from sources")

Signed-off-by: liuchao173 <liuchao173@huawei.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
c42b58f4fb plugin: enable multiple plugins for the same hook
CRIU provides two plugins for checkpoint/restore of GPU applications:
amdgpu and cuda. Both plugins use the `RESUME_DEVICES_LATE` hook to
enable restore:

    CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, amdgpu_plugin_resume_devices_late)
    CR_PLUGIN_REGISTER_HOOK(CR_PLUGIN_HOOK__RESUME_DEVICES_LATE, cuda_plugin_resume_devices_late)

However, CRIU currently does not support running more than one plugin
for the same hook. As a result, when both plugins are installed, the
resume function for CUDA applications is not executed. To fix this,
we need to make sure that both `plugin_resume_devices_late()` functions
return `-ENOTSUP` when restore is not supported.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
85050be66b seize: fix pause-devices plugin hook
The plugin hook "PAUSE_DEVICES" was recently introduced in the following
commit. This hook was intended to execute the cuda-checkpoint tool
before the process tree is frozen. However, the run_plugins() call has
been placed immediately *after* freeze_processes(). This causes the
cuda-checkpoint tool to hang indefinitely during the checkpointing
of CUDA applications running in containers, eventually leading to its
termination by the timeout alarm.

a85f488595
criu/plugin: Introduce new plugin hooks PAUSE_DEVICES and CHECKPOINT_DEVICES to be used during pstree collection

This problem can be reproduced with the following example:

sudo podman run -d --rm \
        --device nvidia.com/gpu=all --security-opt=label=disable \
        quay.io/radostin/cuda-counter

sudo podman container checkpoint -l -e /tmp/checkpoint.tar

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
21108b40de test/zdtm: mount a new tmpfs to the zdtm root /dev
The current file system can be mounted with nodev.

Fixes #2441

Signed-off-by: Andrei Vagin <avagin@google.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
fcbadfbdbf plugins: set executable bit on .so files
For historical reasons, some tools like rpm [1] or ldd [2,3]
may expect the executable bit to be present for the correct
identification of shared libraries. The executable bit on .so
files is set by default by compilers (e.g., GCC). It is not
strictly necessary but primarily a convention.

[1] https://docs.fedoraproject.org/en-US/package-maintainers/CommonRpmlintIssues/#unstripped_binary_or_object
[2] https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/ldd.bash.in;h=d6b640df;hb=HEAD#l154

[3] $ sudo ldd /usr/lib/criu/*.so
/usr/lib/criu/amdgpu_plugin.so:
ldd: warning: you do not have execution permission for `/usr/lib/criu/amdgpu_plugin.so'
	linux-vdso.so.1 (0x00007fd0a2a3e000)
	libdrm.so.2 => /lib64/libdrm.so.2 (0x00007fd0a29eb000)
	libdrm_amdgpu.so.1 => /lib64/libdrm_amdgpu.so.1 (0x00007fd0a29de000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fd0a27fc000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd0a2a40000)
/usr/lib/criu/cuda_plugin.so:
ldd: warning: you do not have execution permission for `/usr/lib/criu/cuda_plugin.so'
	linux-vdso.so.1 (0x00007f1806e13000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f1806c08000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1806e15000)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
5783706d57 docs: update amdgpu-plugin man page
This patch updates the dependencies section of the AMDGPU plugin man
page to reflect that the plugin has been merged upstream and to fix a
formatting issue.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Florian Weimer
089345f77a Adjust to glibc __rseq_size semantic change
In commit 2e456ccf0c34a056e3ccafac4a0c7effef14d918 ("Linux: Make
__rseq_size useful for feature detection (bug 31965)") glibc 2.40
changed the meaning of __rseq_size slightly: it is now the size
of the active/feature area (20 bytes initially), and not the size
of the entire initially defined struct (32 bytes including padding).
The reason for the change is that the size including padding does not
allow detection of newly added features while previously unused
padding is consumed.

The prep_libc_rseq_info change in criu/cr-restore.c is not necessary
on kernels which have full ptrace support for obtaining rseq
information because the code is not used.  On older kernels, it is
a correctness fix because with size 20 (the new value), rseq
registeration would fail.

The two other changes are required to make rseq unregistration work
in tests.

Signed-off-by: Florian Weimer <fweimer@redhat.com>
2024-09-11 16:02:11 -07:00
Bui Quang Minh
b9081ca56b zdtm: make cgroup testcases run non-parallel
cgroup testcases live in the same cgroup root zdtmtst and
zdtmtst.defaultroot controller then create child subgroup for testing. This
can cause problems when cgroup testcases run in parallel. For example,
testcase A dumps the child subgroup of testcase B since it's in the cgroup
root but in the middle of restoring of testcase A, testcase B completes and
cleans up the subgroup directory. This causes error in testcase A restore.
This commit adds excl flag to all cgroup testcases description so that
these don't run parallel.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
4f45572fde util: use close_range when it's supported
close_range is faster than reading /proc/self/fd and closing descriptors
one by one.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
42b177da62 scripts/build: drop centos 7 targets
The CI tests with CentOS 7 have been disabled and removed [1,2].
This patch removes the obsolete Makefile targets for these tests.

[1] 24bc083653
[2] f8466ca798

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
1815838191 vdso: proxify the __vdso_clock_gettime64 function
It was added in v5.3-rc1~211^2~4^2~10.

Fixes #2390

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
ac22aaf576 apparmor: get_suspend_policy must return NULL in error cases
Before this fix, it could return MAP_FAILED which is ((void *) -1).

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
71999d8883 cgroupd: unblock SIGTERM to make stop_cgroupd actually work
Sometimes due to sigblockmask inheritance cgroupd can inherit SIGTERM
blocked. That will lead cgroupd ignoring SIGTERM from stop_cgroupd() and
CRIU will get stuck due to waiting for never-stopping cgroupd.

I see this happening in lxc-checkpoint, also saw this in OpenVZ jenkins
on cgroup_inotify00 test.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Liu Hua
daed6c3535 irmap: duplicate string in irmap_scan_path_add
Duplicate string in irmap_scan_path_add, otherwise it will free before
parsing next configuration input.

[ avagin: handle errors of xstrdup ]

Signed-off-by: Liu Hua <weldonliu@tencent.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
b169e3b63d plugins/cuda: fix crosscompilation
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Pratyush Yadav
ca971b7f8b compel: fix build on Amazon Linux 2 due to missing PTRACE_ARCH_PRCTL
Commit fc683cb01 ("compel: shstk: save CET state when CPU supports it")
started using PTRACE_ARCH_PRCTL to query shadow stack status. While
PTRACE_ARCH_PRCTL has existed in the kernel for a long time, it was only
added to glibc in version 2.27. Amazon Linux 2 (AL2) has glibc 2.26,
which does not have this definition. As a result, build on AL2 fails
with the below error:

    compel/arch/x86/src/lib/infect.c: In function ‘get_task_xsave’:
    compel/arch/x86/src/lib/infect.c:276:14: error: ‘PTRACE_ARCH_PRCTL’ undeclared (first use in this function)
    276 |   if (ptrace(PTRACE_ARCH_PRCTL, pid, (unsigned long)&features, ARCH_SHSTK_STATUS)) {
        |              ^~~~~~~~~~~~~~~~~

While the definition is present on the system via the kernel headers (in
asm/ptrace-abi.h) which can be reached by including linux/ptrace.h, the
comment in compel/include/uapi/ptrace.h says:

    We'd want to include both sys/ptrace.h and linux/ptrace.h, hoping
    that most definitions come from either one or another. Alas, on
    Alpine/musl both files declare struct ptrace_peeksiginfo_args, so
    there is no way they can be used together. Let's rely on libc one.

Since including linux/ptrace.h is not an option, define
PTRACE_ARCH_PRCTL if it doesn't already exist. An interesting point to
note is that in sys/ptrace.h, PTRACE_ARCH_PRCTL is an enum value so the
preprocessor doesn't know about it. PT_ARCH_PRCTL is the preprocessor
symbol that matches the value of PTRACE_ARCH_PRCTL. So look for
PT_ARCH_PRCTL to decide if PTRACE_ARCH_PRCTL is available or not.

Another interesting point to note is that AL2 ships with GCC 7 by
default, which does not support the -mshstk option, causing other build
failures. Luckily, it also ships GCC 10 which does have the option.
Using GCC 10 lets the build succeed.

Fixes: fc683cb01 ("compel: shstk: save CET state when CPU supports it")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
2024-09-11 16:02:11 -07:00
Jesus Ramos
bf417dd050 criu/plugin: Add NVIDIA CUDA plugin
Adding support for the NVIDIA cuda-checkpoint utility, requires the use of an
r555 or higher driver along with the cuda-checkpoint binary.

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Jesus Ramos
5f486d5aee criu/plugin: Introduce new plugin hooks PAUSE_DEVICES and CHECKPOINT_DEVICES to be used during pstree collection
PAUSE_DEVICES is called before a process is frozen and is used by the CUDA
plugin to place the process in a state that's ready to be checkpointed and
quiesce any pending work

CHECKPOINT_DEVICES is called after all processes in the tree have been frozen
and PAUSE'd and performs the actual checkpointing operation for CUDA
applications

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Jesus Ramos
1012e542e5 criu: Restore rseq_cs state slightly earlier in the restore sequence and run the plugin finalizer later in the dump sequence
Restore rseq_cs state before calling RESUME_DEVICES_LATE as the CUDA plugin will
temporarily unfreeze a thread during the plugin hook to assist with device
restore

Run the plugin finalizer later in the dump sequence since the finalizer is used
by the CUDA plugin to handle some process cleanup

Signed-off-by: Jesus Ramos <jeramos@nvidia.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
7ac4537069 readme: update link to FAQ page
The current link opens a page with the following text:

    The MediaWiki FAQ can be found at:
    https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4f15fe8c59 make: improve check for externally managed Python
Move PYTHON_EXTERNALLY_MANAGED and PIP_BREAK_SYSTEM_PACKAGES
into Makefile.install to avoid code duplication. In addition, add
PIPFLAGS variable to enable specifying pip options during installation.
This is particularly useful for packaging, where it is common for `pip install`
to run in an environment with pre-installed dependencies and without internet
access. In such environment, we need to specify the following options:

    --no-build-isolation --no-index --no-deps

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Adrian Reber
fdf546dbd5 ci: upgrade to Fedora 40 Vagrant images (38 is EOL)
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Bhavik Sachdev
f171649264 test/dump-crash: check code path when dump crashes
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2024-09-11 16:02:11 -07:00
Bhavik Sachdev
a252a240c3 zdtm: Distinguish between fail and crash of dump
Adds a exit_signal static method to criu_cli, criu_config and criu_rpc
used to detect a crash.

Fixes: #350

Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
6feb57a840 ci: remove CentOS Stream 8 test (EOL)
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
1da29f27f6 zdtm: add support for LD_PRELOAD tests
This commit adds a `--preload-libfault` option to ZDTM's run command.
This option runs CRIU with LD_PRELOAD to intercept libc functions
such as pread(). This method allows to simulate special cases,
for example, when a successful call to pread() transfers fewer
bytes than requested.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
e7276cf63b pagemap-cache: handle short reads
It is possible for pread() to return fewer number of bytes than
requested. In such case, we need to repeat the read operation
with appropriate offset.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
cc88b1e1ff net: Fix TOCTOU race condition in unix_conf_op
The unix_conf_op function reads the size of the sysctl entry array
twice. gcc thinks that it can lead to a time-of-check to time-of-use
(TOCTOU) race condition if the array size changes between the two reads.

Fixes #2398

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Alexander Mikhalitsyn
457bc6a8ff criu: use proper format-specified to accommodate time_t 64-bit change
See also:
https://wiki.debian.org/ReleaseGoals/64bit-time

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2024-09-11 16:02:11 -07:00
Arnav Bhatt
95f66d13db criu: move sigact dump/restore code into sigact.c
Seperate sigact dump/restore code from cr-restore.c and parasite-syscall.c into sigact.c

Signed-off-by: Arnav Bhatt <arnav@ghativega.in>
2024-09-11 16:02:11 -07:00
Adrian Reber
9c8a6927aa ci: update check for SELinux
The rawhide tests runs in a container. Containers always have SELinux
disabled from the inside. Somehow /sys/fs/selinux is now mounted. We
used the existence of that directory if SELinux is available. This seems
to be no longer true.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
b3c3422cd9 test/make: remove unused target
A fault-injection test was introduced in commit [1] and later removed in
commit [2]. This patch removes the obsolete Makefile target.

[1] b95407e264
    test: check, that parasite can rollback itself (v2)

[2] 2cb4532e26
    tests: remove zdtm.sh (v2)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
30aa8dbe4d mount: fix unbounded write
Replace sprintf() with snprintf() and specify maximum length of
characters to avoid potential overflow.

Reported-by: GitHub CodeQL (https://codeql.github.com/)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Juntong Deng
708f872a6d sk-tcp: Add test cases for TCP_CORK and TCP_NODELAY socket options
Currently there are no socket option test cases for TCP_CORK and
TCP_NODELAY, this commit adds related test cases.

The socket option test cases for TCP_KEEPCNT, TCP_KEEPIDLE, and
TCP_KEEPINTVL already exist in socket-tcp_keepalive.c, so they are
not included in this test case.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
Juntong Deng
9ba9aff77f sk-tcp: Move TCP socket options from SkOptsEntry to TcpOptsEntry
Currently some TCP socket option information is stored in SkOptsEntry,
which is a little confusing.

SkOptsEntry should only contain socket options that are common to
all sockets.

In this commit move the TCP-specific socket options from SkOptsEntry
to TcpOptsEntry.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
Juntong Deng
1cb75c0b1e sk-tcp: Move TCP socket options from TcpStreamEntry to TcpOptsEntry
Currently some of the TCP socket option information is stored in the
TcpStreamEntry, but the information in the TcpStreamEntry is only
restored after the TCP socket has established connection, which
results in these TCP socket options not being restored for
unconnected TCP sockets.

In this commit move the TCP socket options from TcpStreamEntry to
TcpOptsEntry and add dump_tcp_opts() and restore_tcp_opts() for TCP
socket options dump and restore.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
2024-09-11 16:02:11 -07:00
Kir Kolyshkin
13854a988c criu: fix a fatal failure if nft doesn't work
On some systems, nft binary might not be installed, or some kernel
options might be unconfigured, resulting in something like this:

	sudo unshare -n nft create table inet CRIU
	Error: Could not process rule: Operation not supported
	create table inet CRIU
	^^^^^^^^^^^^^^^^^^^^^^^

This is similar to what kerndat_has_nftables_concat() does, and if the
outcome is the same, it returns an error to kerndat_init(), and an error
from kerndat_init() is considered fatal.

Let's relax the check, returning mere "feature not working" instead of
a fatal error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
df178c7e53 sk-tcp: cleanup dump_tcp_conn_state error handling
1) In dump_tcp_conn_state, if return from libsoccr_save is >=0, we check
that sizeof(struct libsoccr_sk_data) returned from libsoccr_save is
equal to sizeof(struct libsoccr_sk_data) we see in dump_tcp_conn_state
(probably to check if we use the right library version). And if sizes
are different we go to err_r, which just returns ret, which can
teoretically be 0 (if size in library is zero) and that would lead
dump_one_tcp treat this as success though it is obvious error.

2) In case of dump_opt or open_image fails we don't explicitly set ret
and rely that sizeof(struct libsoccr_sk_data) previously set to ret is
not 0, I don't really like it, it makes reading code too complex.

3) We have a lot of err_* labels which do exactly the same thing, there
is no point in having all of them, also it is better to choose the name
of the label based on what it really does.

So let's refactor error handling to avoid these inconsistencies.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
4607b53566 mem: optimize debug logging of enqueued pages
During restore, CRIU prints "Enqueue page-read" messages for
each page-read request [1]. However, this message does not
provide useful information, increases performance overhead
during restore and the size of log file.

$ ./zdtm.py run -t zdtm/static/maps06 -f h -k always
$ grep 'Enqueue page-read' dump/zdtm/static/maps06/56/1/restore.log | wc -l
20493

This commit replaces these log messages with a single message
that shows the number of enqueued page-read requests.

$ grep 'enqueued' dump/zdtm/static/maps06/56/1/restore.log
(00.061449)     56: nr_enqueued:   20493

[1] 91388fc

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
f4290868bb ci/vdso01: fix typo
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
e68a06cfd1 ci: update actions/checkout to v4
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
5aaf450213 ci: update base OS to ubuntu 22.04
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Andrei Vagin
1c2a3d7faa check: verify ino and dev of overlayfs files in /proc/pid/maps
Check that the file device and inode shown in /proc/pid/maps match
values returned by stat(2).

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Kir Kolyshkin
e07ffa04b0 Makefile.config: fix/improve feature warnings.
1. Tell which RPMs or DEBs are required in all cases.

2. Use $(info ...) everywhere.

3. Drop extra nested $(info), instead use (a document) a simpler kludge.

4. Simplify and unify the language, add missing periods.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
af4058871e timer: fix wrapping allignment in function declaration
Currently we have tabs + spaces on the wrapped line but the wrapped part
is not alligned to the opening bracket.

Fixes: bbe26d1b7 ("timer: fix allignment in function definition")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
0fc83a79b1 ci: silence CircleCI warning about deprecated image
CircleCI currently prints out the following warning:

   This job is using a deprecated image 'ubuntu-2004:202010-01', please update to a newer image

According to https://discuss.circleci.com/t/linux-image-deprecations-and-eol-for-2024/
the recommended image name is: "image: default"

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
ccccrrrr
52623cca16 criu: move timers dump/restore code into separate file
Fixes: #335

Signed-off-by: ccccrrrr <zcr1006@gmail.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
231ba0cd29 zdtm/sched_policy00: use reset-on-fork flag
This patch extends the sched_policy00 test case to verify that
the SCHED_RESET_ON_FORK flag is restored correctly.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
75fed59ef6 Add support for reset-on-fork scheduling flag
This patch extends CRIU with support for SCHED_RESET_ON_FORK.
When the SCHED_RESET_ON_FORK flag is set, the following rules
apply for subsequently created children:

- If the calling thread has a scheduling policy of SCHED_FIFO or
SCHED_RR, the policy is reset to SCHED_OTHER in child processes.

- If the calling process has a negative nice value, the nice value
is reset to zero in child processes.

(See 'man 7 sched')

Fixes: #2359

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Artem Trushkin
8f0e200e66 mem: fix some VMAs being incorrectly mapped wtih PROT_WRITE
A memory interval is a half-open interval, so the condition
when pr->pe->vaddr == vma->e->end should not be interpreted
as an intersection and should cause vma to be marked with VMA_NO_PROT_WRITE.

Fixes: #2364

Signed-off-by: Artem Trushkin <at.120@ya.ru>
2024-09-11 16:02:11 -07:00
Adrian Reber
a2b018a188 ci: try to fix broken docker test
Upgrade to 22.04 base image and use the existing version of docker.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
a48aa33eaa restorer: shstk: implement shadow stack restore
The restore of a task with shadow stack enabled adds these steps:

* switch from the default shadow stack to a temporary shadow stack
  allocated in the premmaped area
* unmap CRIU mappings; nothing changed here, but it's important that
  CRIU mappings can be removed only after switching to a temporary
  shadow stack
* create shadow stack VMA with map_shadow_stack()
* restore shadow stack contents with wrss
* switch to "real" shadow stack
* lock shadow stack features

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
7dd5830023 restore: add infrastructure to enable shadow stack
There are several gotachs when restoring a task with shadow stack:
* depending on the compiler options, glibc version and glibc tunables
  CRIU can run with or without shadow stack.
* shadow stack VMAs are special, they must be created using a dedicated
  map_shadow_stack() system call and can be modified only by a special
  instruction (wrss) that is only available when shadow stack is
  enabled.
* once shadow stack is enabled, it is not writable even with wrss;
  writes to shadow stack can be only enabled with ptrace() and only when
  shadow stack is enabled in the tracee.
* if the shadow stack is enabled during restore rather than by glibc,
  calling retq after arch_prctl() that enables the shadow stack causes
  #CP, so the function that enables shadow stack can never return.

Add the infrastructure required to cope with all of those:

* modify the restore code to allow trampoline (arch_shstk_trampoline)
  that will enable shadow stack and call restore_task_with_children().
* add call to arch_shstk_unlock() right after the tasks are clone()ed;
  this will allow unlocking shadow stack features and making shadow
  stack writable.
* add stubs for architectures that do not support shadow stacks
* add implementation of arch_shstk_trampoline() and arch_shstk_unlock()
  for x86, but keep it disabled; it will be enabled along with addtion
  of the code that will restore shadow stack in the restorer blob

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
f47899c9ef criu: kerndat: add kdat_has_shstk()
Detect if CRIU runs with shadow stack enabled and store the result in
kerndat.

Unlike most kerndat knobs, kdat_has_shstk() does not check for
availability of the shadow stack in the kernel, but rather checks if
criu runs with shadow stack enabled.

This depends on hardware availabilty, kernel and glibc support, compiler
options and glibc tunables, so kdat_has_shstk() must be called every
time CRIU starts and its result cannot be cached.

The result will be used by the code that controls shadow stack
enablement in the next commit.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
2ebd1a4f0b criu: shstk: prepare shadow stack parameters for restorer blob
Shadow stacks must be populated using special WRSS instruction. This
instruction is only available when shadow stack is enabled, calling it
with disabled shadow stack causes #UD.

Moreover, shadow stack VMAs cannot be mremap()ed and they must be
created using map_shadow_stack() system call. This requires delaying the
restore of shadow stacks to restorer blob after the CRIU mappings are
cleared.

Introduce rst_shstk_info structure to hold shadow stack parameters
required in the restorer blob and populate this structure in
arch_prepare_shstk() method.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
4b6dda7ec0 criu: shstk: premap and prepopulate shadow stack VMAs
Shadow stack VMAs cannot be mmap()ed, they must be created using
map_shadow_stack() system call and populated using special wrss
instruction available only when shadow stack is enabled.

Premap them to reserve virtual address space and populate it to have
there contents available for later copying after enabling shadow stack.

Along with the space required by shadow stack VMAs also reserve an extra
page that will be later used as a temporary shadow stack.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
17eda3ce57 criu: shstk: add VMA_AREA_SHSTK flag
The shadow stack VMAs require special care because they can only be
created and populated using special system calls.

Add VMA_AREA_SHSTK flag and set it for VMAs that are marked as "ss" in
/proc/pid/smaps

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
0aba3dcfa1 compel: shstk: prepare shadow stack signal frame
When calling sigreturn with CET enabled, the kernel verifies that the
shadow stack has proper address of sa_restorer and a "restore token".
Normally, they pushed to the shadow stack when signal processing is
started.

Since compel calls sigreturn directly, the shadow stack should be
updated to match the kernel expectations for sigreturn invocation.

Add parasite_setup_shstk() that sets up the shadow stack with the
address of __export_parasite_head_start as sa_restorer and with the
required restore token.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
63a45e1c8a compel: infect: prepare parasite_service() for addition of CET support
To support sigreturn with CET enabled parasite must rewind its stack
before calling sigreturn so that shadow stack will be compatible with
actual calling sequence.

In addition, calling sigreturn from top level routine
(__export_parasite_head_start) will significantly simplify the shadow
stack manipulations required to execute sigreturn.

For x86 make fini_sigreturn() return the stack pointer for the signal
frame that will be used by sigreturn and propagate that return value up
to __export_parasite_head_start.

In non-daemon mode parasite_trap_cmd() returns non-positive value
which allows to distinguish daemon and non-daemon mode and properly stop
at int3 in non-daemon mode.

Architectures other than x86 remain unchanged and will still call
sigreturn from fini_sigreturn().

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
6e491a19a3 compel: shstk: save CET state when CPU supports it
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
17f4dd0959 compel: always pass user_fpregs_struct_t to compel_get_task_regs()
All architectures create on-stack structure for floating point save area
in compel_get_task_regs() if the caller passes NULL rather than a valid
pointer.

The only place that calls compel_get_task_regs() with NULL for floating
point save area is parasite_start_daemon() and it is simpler to define
this strucuture on stack of parasite_start_daemon().

The availability of floating point save data is required in
parasite_start_daemon() to detect shadow stack presence early during
parasite infection and will be used in later patches.

Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Mike Rapoport (IBM)
0b8c51eaad compiler: add ALIGN_DOWN macro
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-09-11 16:02:11 -07:00
Stepan Pieshkin
f590c2b638 zdtm/static: check that cgroup layout of threads is preserved
Co-developed-by: Stepan Pieshkin <stepanpieshkin@google.com>
Signed-off-by: Stepan Pieshkin <stepanpieshkin@google.com>
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Stepan Pieshkin
a0a6ec3dc0 cgroup: Add support for restoring a thread in a correct v1 cgroup
Currently we have checkpoint/restore support only of cgroup v2 threaded
controllers. Threads originating in cgroup v1 environments will be
restored to the main thread's cgroup. This change extends the support
for a cgroups v1.

Signed-off-by: Stepan Pieshkin <stepanpieshkin@google.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
835afb1b88 criu-ns: fix lint error
This patch fixes the following lint error:
scripts/criu-ns:219:16: E713 [*] Test for membership should be `not in`

The change in this patch is auto-generated with `ruff --fix`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
e0b74f558b make: replace flake8 with ruff
Ruff (https://github.com/astral-sh/ruff) is a Python linter
written in Rust, designed to replace Flake8. It is significantly
faster and actively maintained.

In addition to replacing flake8 with ruff, this patch also
creates separate makefile targets for ruff, shellcheck and
codespell, so that they can be tested independently.

RUFF_FLAGS can be used to specify options such as '--fix'.
Example:
	make lint
	make ruff RUFF_FLAGS=--fix

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
7fd4a15e68 pb2dict: fix flake8 error
This patch fixes the following flake8 error:
python3 -m flake8 --config=scripts/flake8.cfg lib/pycriu/images/pb2dict.py
lib/pycriu/images/pb2dict.py:361:43: E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
e0f91e66ee kerndat: check support for PAGE_IS_SOFT_DIRTY
The commit introducing PAGE_IS_SOFT_DIRTY has not been merged
in kernel v6.7.x.

fs/proc/task_mmu: report SOFT_DIRTY bits through the PAGEMAP_SCAN ioctl
e6a9a2cbc1

As a result, CRIU fails with the following error:

Error (criu/pagemap-cache.c:199): pagemap-cache: PAGEMAP_SCAN: Invalid argument'
Error (criu/pagemap-cache.c:225): pagemap-cache: Failed to fill cache for 63 (400000-402000)'

This patch updates check_pagemap() in kerndat to check if PAGE_IS_SOFT_DIRTY is supported.
Fixes: #2334

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
a808f09bea amdgpu_plugin: fix lint errors
$ make lint
 ...
 # Do not append \n to pr_perror, pr_pwarn or fail
 ! git --no-pager grep -E '^\s*\<(pr_perror|pr_pwarn|fail)\>.*\\n"'
 plugins/amdgpu/amdgpu_plugin.c:		pr_perror("%s(), Can't handle VMAs of input device\n", __func__);

 ! git --no-pager grep -En '^\s*\<pr_(err|warn|msg|info|debug)\>.*);$' | grep -v '\\n'
 plugins/amdgpu/amdgpu_plugin_drm.c:45:		pr_err("Error in getting stat for: %s", path);
 plugins/amdgpu/amdgpu_plugin_util.c:77:		pr_err("Unable to read file (read:%ld buf_len:%ld)", len_read, buf_len);
 plugins/amdgpu/amdgpu_plugin_util.c:89:		pr_err("Unable to write file (wrote:%ld buf_len:%ld)", len_write, buf_len);
 plugins/amdgpu/amdgpu_plugin_util.c:120:		pr_err("%s: Failed to open for %s", path, write ? "write" : "read");
 plugins/amdgpu/amdgpu_plugin_util.c:126:		pr_err("%s: Failed get pointer for %s", path, write ? "write" : "read");
 plugins/amdgpu/amdgpu_plugin_util.c:136:		pr_err("%s:Failed to access file size", path);
 plugins/amdgpu/amdgpu_plugin_util.c:152:		pr_err("Cannot fopen %s", file_path);

 make: *** [Makefile:470: lint] Error 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
bd17bd43e3 sk-inet: fix codding style in restore_ip_opts
Commit [1] introduced codding-style breackage, let's fix it.

Fixes: 66cab1f49 ("sk-inet: Added IP_TTL socket option") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
rahulk789
895a16c13c zdtm: Added tests for IP_TTL restore
Signed-off-by: rahulk789 <rahul.u.india@gmail.com>
2024-09-11 16:02:11 -07:00
rahulk789
71102e7f73 sk-inet: Added IP_TTL socket option
Signed-off-by: rahulk789 <rahul.u.india@gmail.com>
2024-09-11 16:02:11 -07:00
Ramesh Errabolu
0d5923c95e amdgpu_plugin: Refactor code used to implement Checkpoint
Refactor code used to Checkpoint DRM devices. Code is moved
into amdgpu_plugin_drm.c file which hosts various methods to
checkpoint and restore a workload.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
2024-09-11 16:02:11 -07:00
Ramesh Errabolu
733ef96315 amdgpu_plugin: Refactor code in preparation to support C&R for DRM devices
Add a new compilation unit to host symbols and methods that will be
needed to C&R DRM devices. Refactor code that indicates support for
C&R and checkpoints KFD and DRM devices

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
b689a6710c plugin/amdgpu: Also don't print 'plugin failed' in criu
We already don't treat it as error in the plugin itself, but after
returning -1 from RESUME_DEVICES_LATE hook we print debug message in
criu about failed plugin, let's return 0 instead.

While on it let's replace ret to exit_code.

Fixes: a9cbdad76 ("plugin/amdgpu: Don't print error for "No such process" during resume")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
David Francis
59599dacdd plugin/amdgpu: Don't print error for "No such process" during resume
During the late stages of restore, each process being resumed gets
an ioctl call to KFD_CRIU_OP_RESUME. If the process has no kfd
process info, this call with fail with -ESRCH. This is normal
behaviour, so we shouldn't print an error message for it.

Signed-off-by: David Francis <David.Francis@amd.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
92e8f9293b net: return bool with iptable_has_criu_jump_target
To improve readability, this patch changes the return type of
iptables_has_criu_jump_target() to a boolean, where 'true' indicates
that iptables has CRIU jump target and 'false' indicates otherwise.

Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
a62f827302 criu-log: remove unused declaration
This patch removes a leftover declaration for log_closedir()
which has been removed in the following commit:

dc80d6f125
log: get rid of LOG_DIR_FD_OFF and opening cwd in log_init()

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Pavel Tikhomirov
d2511707fa zdtm: socket-tcp-nft-nfconntrack: add a hook to the chain in nft case
Let's use hooked nft chain which actually affects packets.

Fixes: e5f4d8c6f ("test/nfconntrack: use nft or iptables-legacy")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
afc0efcf78 pagemap-cache: add an ability to run tests without PAGEMAP_SCAN
This change adds a new injectable fault (135) to disable PAGEMAP_SCAN and fault
back to read pagemap files.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
cb64d73ada page-cache: use the PAGEMAP_SCAN ioctl when it is available
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
20628bc8a1 kerndat: check the PAGEMAP_SCAN ioctl
PAGEMAP_SCAN is a new ioctl that allows to get page attributes in a more
effeciant way than reading pagemap files.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
842289c7eb net: add error messages for restore of nftables
Show appropriate error messages when restore of nftables fails.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
d94251df75 test/nfconntrack: use nft or iptables-legacy
nft does not support xtables compat expressions
https://git.netfilter.org/nftables/commit/?id=79195a8cc9e9d9cf2d17165bf07ac4cc9d55539f

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
0ab2f9e976 net: fix network unlock with iptables-nft
When iptables-nft is used as backend for iptables, the rules for
network locking are translated into the following nft rules:

```
$ iptables-restore-translate -f lock.txt
add table ip filter
add chain ip filter CRIU
insert rule ip filter INPUT counter jump CRIU
insert rule ip filter OUTPUT counter jump CRIU
add rule ip filter CRIU mark 0xc114 counter accept
add rule ip filter CRIU counter drop
```

These rules create the following chains:

```
table ip filter { # handle 1
	chain CRIU { # handle 1
		meta mark 0x0000c114 counter packets 16 bytes 890 accept # handle 6
		counter packets 1 bytes 60 drop # handle 7
		meta mark 0x0000c114 counter packets 0 bytes 0 accept # handle 8
		counter packets 0 bytes 0 drop # handle 9
	}

	chain INPUT { # handle 2
		type filter hook input priority filter; policy accept;
		counter packets 8 bytes 445 jump CRIU # handle 3
		counter packets 0 bytes 0 jump CRIU # handle 10
	}

	chain OUTPUT { # handle 4
		type filter hook output priority filter; policy accept;
		counter packets 9 bytes 505 jump CRIU # handle 5
		counter packets 0 bytes 0 jump CRIU # handle 11
	}
}
```

In order to delete the CRIU chain, we need to first delete all four
jump targets. Otherwise, `-X CRIU` would fail with the following error:

iptables-restore v1.8.10 (nf_tables):
line 5: CHAIN_DEL failed (Resource busy): chain CRIU

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
robert
d9c427d70c irmap: hardcode some more interesting paths
Signed-off-by: robert <robert.ayrapetyan@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
b419f3dfdc make: fix compilation on alpine
Starting with the musl v1.2.4~69, _GNU_SOURCE doesn't set _LARGEFILE64_SOURCE.

Fixes #2313
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
7b689b7a42 gitignore: remove historical left-over files
In commit [1] was introduced a mechanism to auto-generate the files:
sys-exec-tbl*.c, syscalls*.S, syscall-codes*.h, and syscall*.h.
This commit also updated the gitignore rules to ignore auto-generated
files. However, after commit [2], the path for these files has changed
and the patterns specified in gitignore are no longer needed.

[1] bbc2f133 (x86/build: generate syscalls-{64,32}.built-in.o)
[2] 19fadee9 (compel: plugins,std -- Implement syscalls in std plugin)

Reported-by: @felicitia
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Adrian Reber
2d1f4ec719 ci: disable non-root in user namespace test in container
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
fe8f5130c2 ci: fix centos-stream 9 ci errors
The image has a too old version of nettle which does not work with gnutls.
Just upgrade to the latest to make the error go away.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
6679d60ffd ci: do not use 'tail' for skip-file-rwx-check test
Newer versions of 'tail' rely on inotify and after a restore 'tail' is
unhappy with the state of inotify and just stops.

This replaces 'tail' with a minimal shell based test (thanks Andrei).

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Alexander Mikhalitsyn
f86f1b8491 tty: skip ioctl(TIOCSLCKTRMIOS) if possible
If ioctl(TIOCSLCKTRMIOS) fails with EPERM it means that a CRIU
process lacks of CAP_SYS_ADMIN capability. But we can use
ioctl(TIOCGLCKTRMIOS) to *read* current ->termios_locked
value from the kernel and if it's the same as we already have
we can skip failing ioctl(TIOCSLCKTRMIOS) safely.

Adrian has recently posted [1] a very good patch to allow ioctl(TIOCSLCKTRMIOS)
for processes that have CAP_CHECKPOINT_RESTORE (right now it requires CAP_SYS_ADMIN).

[1] https://lore.kernel.org/all/20231206134340.7093-1-areber@redhat.com/

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2024-09-11 16:02:11 -07:00
Ivan A. Melnikov
8a51639e3d Makefile: Use common warnings settings for loongarch64
WARNINGS variable should be amended, not redefined.
We still need, e.g.,  `-Wno-dangling-pointer` to build
criu on loongarch64 with gcc13.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
37d62fa475 docker-test: downgrade docker to v24.0.7
Checkpoint/restore with version 25.0.0-beta.1 fails
with the following error:

$ docker start --checkpoint=c1 cr
Error response from daemon: failed to create task for container: content digest fdb1054b00a8c07f08574ce52198c5501d1f552b6a5fb46105c688c70a9acb45: not found: unknown

Release notes:
https://github.com/moby/moby/discussions/46816

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Radostin Stoyanov
1004625fac docker-test: fix condition for max tries
Replace a recursive call with a loop.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2024-09-11 16:02:11 -07:00
Adrian Reber
088390ea89 ci: switch to permissive selinux mode during test
Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
900909d95e test: check for btrfs in the current directory
The old test was checking if '/' is btrfs but we should check if the
current directory is btrfs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
Adrian Reber
fc94b2d158 ci: fix rawhide netlink error
The rawhide netlink errors are fixed with a newer kernel than the
default 6.2 available in Fedora 38.

Signed-off-by: Adrian Reber <areber@redhat.com>
2024-09-11 16:02:11 -07:00
sally kang
9f9737c800 comple: correct the syscall number of bind on ARM64
In the compel/arch/arm/plugins/std/syscalls/syscall.def, the syscall number of bind on ARM64 should be 200 instead of 235

Signed-off-by: Sally Kang <snapekang@gmail.com>
2024-09-11 16:02:11 -07:00
Andrei Vagin
f8b14286b0 criu: Version 3.19 (Bronze Peacock)
Two major highlights of this release:
* LoongArch64 support
* A lot of fixes and improvments form the Google backlog.

The full changelog can be found here: https://criu.org/Download/criu/3.19.

This marks the final release of the 3.x series. The upcoming version
will be 4.0! Additionally, the naming pattern will be changed. Any ideas
are welcome.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-11-27 16:47:16 -08:00
Marcus Folkesson
f104b3d6d7 Makefile: introduce ARCHCFLAGS for arch specific cflags
Do not use $(USERCFLAGS) for anything other than what the user provide.

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
2023-11-27 16:47:16 -08:00
Radostin Stoyanov
0b62f4267a lib: use separate packages for pycriu and crit
Newer versions of pip use an isolated virtual environment when building
Python projects. However, when the source code of CRIT is copied into
the isolated environment, the symlink for `../lib/py` (pycriu) becomes
invalid. As a workaround, we used the `--no-build-isolation` option for
`pip install`. However, this functionality has issues in some versions
of PIP [1, 2]. To fix this problem, this patch adds separate packages
for pycriu and crit, and each package is installed independently.

[1] https://github.com/pypa/pip/pull/8221
[2] https://github.com/pypa/pip/issues/8165#issuecomment-625401463

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-11-27 16:47:16 -08:00
Michał Mirosław
97b8b659c9 zdtm: cgroup_ifpriomap: Improve skip check's robustness.
cgroup_ifpriomap test needs net_prio cgroup, which might not be
available. Make the .checkskip script check it.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-11-27 16:47:16 -08:00
Andrei Vagin
e076c11e22 ci: fix codespell errors
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-11-27 16:47:16 -08:00
Michal Clapinski
41938f14b6 zdtm/static: test the offset migration of ELF files
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Michal Clapinski
29026496d4 zdtm/lib: add missing signal.h header
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Michal Clapinski
a77185daee files-reg: don't change the file pos in get_build_id
At this point the correct position is already restored, so reading from
the fd results in the position being moved forward by 5 bytes.

Fixes: 9191f8728d ("criu/files-reg.c: add build-id validation functionality")
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Taemin Ha
5d4179ddb1 criu/proc_parse: refactor the eventpoll parser
Eventpollentry's fields are set only when ret == 3 or ret == 6. The
remaining cases can be grouped together to an error

Signed-off-by: Taemin Ha <taemin.ha@utexas.edu>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Taemin Ha
cfaacfb582 zdtm/thread_different_uid_gid: remove the redundant check
line 131 checks if (ret >= 0). line 133 could be replaced by a simple else statement

Signed-off-by: Taemin Ha <taeminha@cs.utexas.edu>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Taemin Ha
36a84751ed zdtm/cow00: fix typo
The condition meant to check fd2 instead of fd1, which is checked in
line 24.

Signed-off-by: Taemin Ha <taeminha@cs.utexas.edu>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Taemin Ha
48d6a59a22 arch/x86: remove the redundant check
The is_native field is a boolean. Therefore, else if() should can be
changed to a simple else{}.

Signed-off-by: Taemin Ha <taeminha@cs.utexas.edu>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Taemin Ha
45670b6556 apparmor: remove the redundant check
This check is redundant as line 201 checks for this condition.

Signed-off-by: Taemin Ha <taeminha@cs.utexas.edu>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
5e8d7dc94b tun: don't parse buffers that have not been filled with data
read_ns_sys_file() can return an error, but we are trying to parse a
buffer before checking a return code.

CID 417395 (#3 of 3): String not null terminated (STRING_NULL)
2. string_null: Passing unterminated string buf to strtol, which expects
   a null-terminated string.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
4c9d23d33d util: allow to run criu under strace
fork_and_ptrace_attach has to fork a child with CLONE_UNTRACED,
so that strace doesn't trace it.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Dmitry Safonov
c9fdd355f6 pie: Mark __export_*() functions as externally_visible
GCC's lto source:
> To avoid this problem the compiler must assume that it sees the
> whole program when doing link-time optimization.  Strictly
> speaking, the whole program is rarely visible even at link-time.
> Standard system libraries are usually linked dynamically or not
> provided with the link-time information.  In GCC, the whole
> program option (@option{-fwhole-program}) asserts that every
> function and variable defined in the current compilation
> unit is static, except for function @code{main} (note: at
> link time, the current unit is the union of all objects compiled
> with LTO).  Since some functions and variables need to
> be referenced externally, for example by another DSO or from an
> assembler file, GCC also provides the function and variable
> attribute @code{externally_visible} which can be used to disable
> the effect of @option{-fwhole-program} on a specific symbol.

As far as I read gcc's source, ipa_comdats() will avoid placing symbols
that are either already in a user-defined section or have
externally_visible attribute into new optimized gcc sections.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
5d6c8bc584 clang-format: disable column limit constraint
The "ColumnLimit: 120" is not only allowing lines to be longer than 80
characters but it also forces line wrapping at 120 characters. If total
expression length is more than 120 characters, clang-format will try to
wrap it as close to 120 as it can, it would not even allow to wrap at 80
characters if we really want it. But as we all know 80 characters is
Linux kernel coding style default and as far as our coding style is
based on it it is really strange to prohibit wrapping lines at 80
characters...

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
81a30c3206 zdtm/memfd04: check execveat on memfd that has memory mappings
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
e86e8dac0d memfd: don't reopen file descriptors for memory mappings
One memfd can be shared by a few restored files. Only of these files is
restored with a file created with memfd_open. Others are restored by reopening
memfd files via /proc/self/fd/.

It seems unnecessary for restoring memfd memory mappings. We can always use the
origin file.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
28e854d662 amdgpu: fix clang warnings
amdgpu_plugin.c:930:6: error: variable 'buffer' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
        if (ret) {
            ^~~
amdgpu_plugin.c:988:8: note: uninitialized use occurs here
        xfree(buffer);

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
ba168ab78c ci: enable build with amdgpu plugin
This patch adds the `libdrm-dev` package to the list of CRIU
dependencies installed in CI to build CRIU with amdgpu plugin.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Andrei Vagin
aa38a59899 amdgpu: print an error if the dup syscall fails
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
940a05c0ba amdgpu: don't leak fd on an error path in open_img_file
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
a68975c06d plugins: the UPDATE_VMA_MAP callback returns fd with the full control
It means CRIU has to close it when it is not needed.

It looks more logically correct and matches the behaviour of
the RESTORE_EXT_FILE callback.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Michal Clapinski
66f39adf12 criu: change the comment about magic numbers
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Vladislav Khmelevsky
9b4e8292cd vma: Add !VVAR condition to vma_entry_can_be_lazy
Currently most of the times we don't have problems with VVAR segment and
lazy restore because when VDSO is parked there is an munmap call that
calls UFFDIO_UNREGISTER on the destination address.
But we don't want to enable userfaultfd for VDSO and VVAR at the first
place.

Signed-off-by: Vladislav Khmelevsky <och95@yandex.ru>
2023-10-22 13:29:25 -07:00
Vladislav Khmelevsky
28adebefb7 Return page size as unsigned long
Currently page_size() returns unsigned int value that is after "bitwise
not" is promoted to unsigned long value e.g. in uffd.c
handle_page_fault. Since the value is unsigned promotion is done with 0
MSB that results in lost of MSB pagefault address bits. So make
page_size to return  unsigned long to avoid such situation.

Signed-off-by: Vladislav Khmelevsky <och95@yandex.ru>
2023-10-22 13:29:25 -07:00
Andrei Vagin
00f8a56b6e zdtm: check userns once
All test logs are flooded with the "userns is supported" messages...

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
ba48ceb575 zdtm: socket_udp_shutdown: Make the test fail instead of timing out.
When -- after restore -- sockets can't communicate, the test times out
while waiting on recvfrom(). Since the communication is local, send()
works instantaneously - so mark sockets with SOCK_NONBLOCK and report
failure if the message is not received immediately.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
f6e820bedb zdtm: Treat ESRCH from kill() as success.
This fixes a failure to clean up after a failed test, where CRIU didn't start properly.

```
===================== Run zdtm/transition/socket-tcp in h ======================
Start test
./socket-tcp --pidfile=socket-tcp.pid --outfile=socket-tcp.out
Traceback (most recent call last):
  File ".../zdtm_py.py", line 1906, in do_run_test
    cr(cr_api, t, opts)
  File ".../zdtm_py.py", line 1584, in cr
    cr_api.dump("dump")
  File ".../zdtm_py.py", line 1386, in dump
    self.__dump_process = self.__criu_act(action,
  File ".../zdtm_py.py", line 1224, in __criu_act
    raise test_fail_exc("CRIU %s" % action)
test_fail_exc: CRIU dump

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<embedded module '_launcher'>", line 182, in run_filename_from_loader_as_main
  File "<embedded module '_launcher'>", line 34, in _run_code_in_main
  File ".../zdtm_py.py", line 2790, in <module>
    fork_zdtm()
  File ".../zdtm_py.py", line 2782, in fork_zdtm
    do_run_test(tinfo[0], tinfo[1], tinfo[2], tinfo[3])
  File ".../zdtm_py.py", line 1922, in do_run_test
    t.kill()
  File ".../zdtm_py.py", line 509, in kill
    os.kill(int(self.__pid), sig)
ProcessLookupError: [Errno 3] No such process
```

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
c29c5a1df0 zdtm: cgroup04: Improve skip check's robustness.
cgroup04 test needs full control over mem and devices cgroup hierarchies.
Make the test's .checkskip script better at detecting if the cgroups are
available for use.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
131e464a05 zdtm: cgroup04: Improve error messages.
Make the errno values reported by cgroup04 always correct and showing
relevant parameters.
Constify constant strings, while at it.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
785d97a619 zdtm: If ignoring kernel taint, also ignore taint changes.
At least in Google's VM environment, the kernel taints are unrelated to CRIU
runs.  Don't fail tests if taints change, if kernel taints are ignored.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
5e544dc449 ci: stop testing ubuntu overlayfs
They break it with each kernel rebase. More details are here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257

Last time, it was fixed a few month ago and it has been broken again in
5.15.0-1046-azure.

Let's bind-mount the CRIU directory into a test container to make it
independent of a container file system.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
7a2910f897 py/cli: add --version option
This patch implements the '--version' for the crit tool.

$ crit --version
3.17

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
61d9cf6f90 crit/setup.py: use __version__ from pycriu
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
8adefc90d2 lib/pycriu: generate version.py
The version of CRIU is specified in the Makefile.versions file.
This patch generates '__varion__' value for the pycriu module.
This value can be used by crit to implement `--version`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Younes Manton
59fcfa80d8 compel: Add support for ppc64le scv syscalls
Power ISA 3.0 added a new syscall instruction. Kernel 5.9 added
corresponding support.

Add CRIU support to recognize the new instruction and kernel ABI changes
to properly dump and restore threads executing in syscalls. Without this
change threads executing in syscalls using the scv instruction will not
be restored to re-execute the syscall, they will be restored to execute
the following instruction and will return unexpected error codes
(ERESTARTSYS, etc) to user code.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-10-22 13:29:25 -07:00
David Francis
d06c9b5cda criu/plugin: Add environment variable to cap size of buffers.
The amdgpu plugin would create a memory buffer at the size
of the largest VRAM bo (buffer object). On some systems, VRAM
size exceeds RAM size, so the largest bo might be larger than
the available memory.

Add an environment variable KFD_MAX_BUFFER_SIZE, which caps the
size of this buffer. By default, it is set to 0, and has no
effect. When active, any bo larger than its value will be
saved to/restored from file in multiple passes.

Signed-off-by: David Francis <David.Francis@amd.com>
2023-10-22 13:29:25 -07:00
Michal Clapinski
8caf460b9c zdtm: test MEMBARRIER_CMD_GLOBAL_EXPEDITED migration
Check membarrier registration both ways:
1. By issuing membarrier commands and checking if they succeed.
2. By issuing MEMBARRIER_CMD_GET_REGISTRATIONS.

The first way is needed for older kernels. The second way is needed to test
MEMBARRIER_CMD_GLOBAL_EXPEDITED.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Michal Clapinski
2ba8727822 dump: use MEMBARRIER_CMD_GET_REGISTRATIONS when available
MEMBARRIER_CMD_GET_REGISTRATIONS can tell us whether or not the process used
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED unlike the old probing method.

Falls back to the old method when MEMBARRIER_CMD_GET_REGISTRATIONS is
unavailable.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
32241b00de vagrant: run tests with fedora 38
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
862cb5c1cb vagrant: update to version 2.3.7
This patch also updated the download URL format

from
    https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7_x86_64.deb

to
    https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7-1_amd64.deb

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
d2a0d1fa64 lint: don't fail workflow on indent fail
There are multiple cases where good human readable code block is
converted to an unreadable mess by clang-format, so we don't want to
rely on clang-format completely. Also there is no way, as far as I can
see, to make clang-format only fix what we want it to fix without
breaking something.

So let's just display hints inline where clang-format is unhappy. When
reviewer sees such a warning it's a good sign that something is broken
in coding-style around this warning.

We add special script which parses diff generated by indent and
generates warning for each hunk.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
08f286ed96 CONTRIBUTING.md: improve coding-style related sections
This is highlight that code readability is the real goal of all the
coding-style rules. We should not do coding-style just for coding-style,
e.g. when clang-format suggests crazy formating we should not follow it
if we feel it is bad.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
38bf7f42e5 CONTRIBUTING.md: don't mention ctags
Ctags is mentioned in the beginning of the "Edit the source code" which
is really confusing: Do you need ctags to edit CRIU code? - No. It is
just one helpful tool to browse the code, and we do not want to enforce
it. So, what is it doing in contribution guide? People who really need
it should be able to find it in Makefile or just write oneliner of their
own to collect tags...

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
b7a8bb0889 zdtm: test execveat(memfd)
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
c9b2633ca3 memfd: return original memfd fd for execveat()
If there is only a single RW opened fd for a memfd, it can be used
to pass it to execveat() with AT_EMPTY_PATH to have its contents
executed.  This currently works only for the original fd from
memfd_create().  For now we ignore processes that reopen the memfd's
rw and expect a particular executability trait of it.  (Note: for
security purposes recent kernels have SEAL_EXEC to make memfds
non-executable.)

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
4805654370 kerndat: check_pagemap: close(fd) on error path
Plug a fd leak when returning error from check_pagemap().
(Cosmetic, as the process will exit soon anyway.)

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
d5284313f5 kerndat: Make errors from clone3() check more precise.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
badf8060c6 proc_parse: Log smaps entry while dumping VMA.
Help debugging problems with restoring custom VMAs.

From: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
1800018bc1 test/other: add test for action-script
This commit is introducing a test for the action-script functionality
of CRIU to verify that pre-dump, post-dump, pre-restore, pre-resume,
post-restore, post-resume hooks are executed during dump/restore.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Michał Mirosław
ea05b06ac2 proc_parse: remove trivial goto from vma_get_mapfile_user()
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
f7d7dc9c08 compel/infect: include the relevant pid in "no-breakpoints restore" debug message
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
b56a9cef32 kerndat: Make pagemap check more robust against swapped out pages.
Fix test of whether the kernel exposes page frame numbers to cope with the
possibility that the top of the stack is swapped out, which was happening
in about one 1 out of 3 million runs.  This lead to a later failure when
trying to read the PFN of the zero page, after which criu would exit with
no error message.

Original-From: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
86ad52bc2d ci/loongarch64: compile tests before running zdtm.py
Otherwise tests fail by timeout.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
0085f992cb memfd: don't set fd attributes not needed for vma mapping
There is only one user of memfd_open() outside of memfd.c: open_filemap().
It is restoring a file-backed mapping and doesn't need nor expect to
update F_SETOWN nor the fd's position.  Check the inherited_fd() handling
in the callers to simplify the code.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
8c17535f3f loongarch64: fix syscall_64.tbl
The 288d6a61e2 change broke all the syscall numbers.

Reported-by: Michał Mirosław <emmir@google.com>
Fixes: (288d6a61e2 "loongarch64: reformat syscall_64.tbl for 8-wide tabs")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
6ea60d6ef7 github: auto-remove changes requested and awaiting reply labels
Labels are removed when new comments are posted.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
620e8c0a61 Put a cap on the size of single preadv in restore operation.
While each preadv() is followed by a fallocate() that removes the data
range from image files on tmpfs, temporarily (between preadv() and
fallocate()) the same data is in two places; this increases the memory
overhead of restore operation by the size of a single preadv.
Uncapped preadv() would read up to 2 GiB of data, thus we limit that to
a smaller block size (128 MiB).

Based-on-work-by: Paweł Stradomski <pstradomski@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
2547ac8ac1 zdtm: membarrier: test migration of membarrier() registration
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
e07155e194 dump+restore: Implement membarrier() registration c/r.
Note: Silently drops MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED as it's
not currently detectable. This is still better than silently dropping
all membarrier() registrations.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
5b790aa181 loongarch64: reformat syscall_64.tbl for 8-wide tabs
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
d1096e3b31 lib/py: add VMA_AREA_MEMFD constant
The VMA_AREA_MEMFD constant was introduced with commit

29a1a88bce
memfd: add memory mapping support

This patch extends the status map used in CRIT and coredump with the
value of this constant to recognize it.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Andrei Vagin
d3b955e578 ci/docker: install all required packages
This change fixes the issue:
```
The following packages have unmet dependencies:
 docker-ce : Depends: containerd.io (>= 1.6.4)
E: Unable to correct problems, you have held broken packages.
```

Signed-off-by: Andrei Vagin <avagin@google.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
9477354def scripts/apt: don't hide apt output
It is required to investigate issues.

Signed-off-by: Andrei Vagin <avagin@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
a4b49c46fe amdgpu_plugin: remove duplicated log prefix
The log prefix "amdgpu_plugin:" is defined with `LOG_PREFIX` in
`amdgpu_plugin.c`.  However, the prefix is also included in each
log message. As a result it appears duplicated in the log messages:

(00.044324) amdgpu_plugin: amdgpu_plugin: devices:1 bos:58 objects:148 priv_data:45696
(00.045376) amdgpu_plugin: amdgpu_plugin: Thread[0x5589] started
(00.167172) amdgpu_plugin: amdgpu_plugin: img_path = amdgpu-kfd-62.img
(00.083739) amdgpu_plugin: amdgpu_plugin : amdgpu_plugin_dump_file() called for fd = 235

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Michał Mirosław
69200bec76 irmap: scan user-provided paths in order
Make the scan use the order of paths that came from the user.

Fixes: 4f2e4ab3be ("irmap: add --irmap-scan-path option"; 2015-09-16)
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
2a131167bb page-xfer: Pull tcp_cork,nodelay().
Move tcp_cork() and tcp_nodelay() to the only user: page-xfer.c. While
at it, fix error messages (as they do not refer to restoring the sockopt
values) and demote them as they are not fatal to the page transfer.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Haorong Lu
6ed50ea49d apparmor: fix incorrect usage of sizeof on char ptr
In criu/apparmor.c: write_aa_policy(), the arg path is passed as a char
pointer. The original code used sizeof(path) to get the size of it,
which is incorrect as it always return the size of the char pointer
(typically 8 or 4), not the actual capacity of the char array.

Given that this function is only invoked with path declared as `char
path[PATH_MAX]`, replacing sizeof(path) with PATH_MAX should correctly
represent the maximum size of it.

Fixes: 8723e3f ("check: add a feature test for apparmor_stacking")

Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
3628589b51 zdtm/memfd00: test memfd file mode
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
dfa5410951 memfd: dump and restore permissions.
memfd is created by default with +x permissions set. This can be changed
by a process using fchmod() and expected to prevent using this fd for
exec(). Migrate the permissions.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
ac1219f4ee sk-inet: Extend 'TCP repair off' failure log.
Include the file descriptor and error code in the debug message to make
it more useful.

Fixes: e7ba90955c (2016-03-14 "cr-check: Inspect errno on syscall failures")
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
25d0330809 restore: Skip dropping BSET capability if irrelevant.
prctl(NO_NEW_PRIVS) when set prevents child processes gaining
capabilities not in permitted set. In this case, inability to
clear capability from BSET that is not in the permitted set is
harmless.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
fe4be19de4 prctl: test prctl(NO_NEW_PRIVS) setting
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
a605cc9f36 prctl: Migrate prctl(NO_NEW_PRIVS) setting.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
7df3f65957 restore: Fix capability migration requirements between different kernels.
When restoring on a kernel that has different number of supported
capabilities than checkpoint one, check that the extra caps are unset.

There are two directions to consider:

1) dump.cap_last_cap > restore.cap_last_cap
	- restoring might reduce the processes' capabilities if restored
	  kernel doesn't support checkpointed caps. Warn.

2) dump.cap_last_cap < restore.cap_last_cap
	- restoring will fill the extra caps with zeroes. No changes.

Note: `last_cap` might change without affecting `n_words`.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
7ab02639f6 restore: Skip setgroups() when already correct.
Skip calling setgroups() when the list of auxiliary groups already has
the values we want.  This allows restoring into an unprivileged user
namespace where setgroups() is disabled.

From: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
4998b724ef restore: Avoid need for CAP_SETPCAP if not changing uids.
When CRIU is run with the task's credentials on restore, don't set uids
and gids. This avoids the need to modify the SECURE_NO_SETUID_FIXUP flag
which requires CAP_SETPCAP.

From: Andy Tucker <agtucker@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
99188cfbe3 tty: Avoid EPERM for no-op chown().
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
113957270b memfd: Avoid EPERM for no-op chown().
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
96fa42b79d cgroup: Replace restore_perms() with cr_fchperm().
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
b074f92f99 files-reg: Avoid EPERM in ghost_apply_metadata() for no-op changes.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
f985d9f44b sk-unix: Avoid restore_file_perms() EPERM error for no-op changes.
Note: This removes the difference in calling convention of
restore_file_perms() returning -errno that was the only call that did
this in the caller.

From: Radosław Burny <rburny@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
21560270dd util: Implement fchown() and fchmod() wrappers.
Add generic wrappers for fchown() and fchmod() that skip the calls if
no changes are needed. This will allow to unify places where we can
avoid errors when no-op requests are not permitted.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
znley
e25a243b28 ci: add workflow for loongarch64
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
znley
788e1e92ef zdtm: add loongarch64 support
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
znley
ae08114757 criu: add loongarch64 support to parasite and restorer
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
znley
ec6dc2d5c0 images: add loongarch64 core image
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
znley
c9df09eeab compel: add loongarch64 support
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
znley
b304106e6b include: add common header files for loongarch64
Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
Prajwal S N
8a24d4872e ci: add workflow to ensure self-contained commits
Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
23313080aa kerndat: don't leak a socket file descriptor
kerndat_has_ipv6_freebind creates a socket but doesn't close it.

Signed-off-by: Andrei Vagin <avagin@google.com>
2023-10-22 13:29:25 -07:00
znley
b2d74fbfd4 zdtm: replace NR_fstat with NR_statx
NR_fstat is a deprecated syscall, some
modern architectures such as riscv and
loongarch64 no longer support this syscall.
It is usually replaced by NR_statx.

NR_statx is supported since linux 4.10.

Signed-off-by: znley <shanjiantao@loongson.cn>
2023-10-22 13:29:25 -07:00
Yan Evzman
8ee35bebb5 kerndat: bind ipv6-socket only if ipv6 is enabled
Fixes: #2222
Fixes: f1c8d38 ("kerndat: check if setsockopt IPV6_FREEBIND is supported")
Signed-off-by: Yan Evzman <yevzman@gmail.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
6b8107cd1b irmap: Reduce error log severity to warning.
These errors originate from the filesystem scanning in irmap.c and are mostly
benign. Nevertheless, if they do result in a failed irmap lookup, that failed
lookup is more interesting from an application perspective.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
4eb6cc3190 mount: Demote fsnotify logs for ignored failures.
Make logs about inaccessible mounts warnings, as the failures are
normally harmless (e.g. failure to read /dev/cgroup) and don't
make the CRIU run fail. (If it happens that the fsnotify can't
find a file, then to debug, full CRIU logs will be necessary anyway.)

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
3e428a1de7 log: Remove error logs for ignored or otherwise logged subprocess exits.
Errors in early restore.log for status=1 from a subprocess are confusing,
esp. that they don't show what command failed. Since the result is
either ignored or logged anyway, mark the calls as "can fail".

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
804c0ba820 soccr: Log name of socket queue that failed to restore.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
4018b78778 soccr: Log offset when failed to restore socket's queued data.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
1cb7916524 sk-unix: Log both peer names when failing on an external stream unix socket.
Make debugging dump failures resulting in "sk unix: Can't dump half
of stream unix connection" errors easier.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
5a723937a2 compel: Log the status word with "Task is still running" errors.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
664598dc74 files-reg: Debug "open file on overmounted mount" error.
Log the mount and file that were the cause of failing a dump.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
a5fe99d2c5 cgroup: Propagate error on cgroup mount failure.
This makes the error to mount cgroup hierarchy a bit less noisy:

Error (criu/cgroup.c:623): cg: Unable to mount cgroup2 : Invalid argument'

Instead of

Error (criu/cgroup.c:623): cg: Unable to mount cgroup2 : Invalid argument'
Error (criu/cgroup.c:715): cg: failed walking /proc/self/fd/-1/zdtmtst for empty cgroups: No such file or directory'

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
056cac474e zdtm: drop python 2 compatibility
This patch removes the code for Python 2 compatibility introduced
with commit e65c7b5 (zdtm: Replace imp module with importlib).

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
002f2372a9 lib/py: drop python 2 compatibility
This patch removes code introduced for compatibility with
Python 2 in commits:

  bf80fee (lib: correctly handle stdin/stdout (Python 3))

  b82f222 (lib: fix crit-recode fix for Python 2)

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
d388b91de2 test/others: drop setup_swrk() py2 compatibility
This patch removes the code introduced for compatibility
with Python 2 in commits:

4c1ee3e227
test/other: Resolve Py3 compatibility issues

6b615ca152
test/others: Reuse setup_swrk()

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
ee9983d4a9 test/criu-ns: drop python 2 compatibility
This patch is replacing the set_blocking() function with
os.set_blocking(). This function was introduced for compatibility with
Python 2 in commit 8094df8di (criu-ns: Add tests for criu-ns script).

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
ede018176c make: remove checks for python 2 binary
This commit removes the checks for the Python 2 binary in the makefile
and makes sure that ZDTM tests always use python3. Since support for
Python 2 has been dropped, these checks are no longer needed.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
642fd99bfd remove python-future dependency
This commit removes the dependency on the __future__ module, which was
used to enable Python 3 features in Python 2 code. With support for
Python 2 being dropped, it is no longer necessary to maintain backward
compatibility.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
376f3d1800 crit: add requirements.txt for pip>=20.1
When building with pip version 20.0.2 or older, the pip install
command creates a temporary directory and copies all files from
./crit. This results in the following error message:

    ModuleNotFoundError: No module named 'pycriu'

This error appears because the symlink 'pycriu' uses a relative path
that becomes invalid '../lib/py/'.

The '--no-build-isolation' option for pip install is needed to enable
the use of pre-installed dependencies (e.g., protobuf) during build.

The '--ignore-installed' option for pip is needed to avoid an error when
crit is already installed. For example, crit is installed in the GitHub
CI environment as part of the criu OBS package as a dependency for
podman.

Distributions such as Arch Linux have adopted an externally managed
python installation in compliance with PEP 668 [1] that prevents pip
from breaking the system by either installing packages to the system or
locally in the home folder. The '--break-system-packages' [2] option
allows pip to modify an externally managed Python installation.

[1] https://peps.python.org/pep-0668/
[2] https://pip.pypa.io/en/stable/cli/pip_uninstall/

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
f5d06571c5 crit: drop python 2 support
This patch reverts changes introduced with the following commits:

4feb07020d
crit: enable python2 or python3 based crit

b78c4e071a
test: fix crit test and extend it

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
7825f4ebfa coredump: drop python 2 support
This patch reverts changes introduced for Python 2 compatibility
in commits:

  1c866db (Add new files for running criu-coredump via python 2 or 3)

  3180d35 (Add support for python3 in criu-coredump).

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
05f0535de4 ci: clean up CentOS 7 related tweaks
We have disabled CentOS 7 tests in CI. This patch reverts the
changes introduced in the following commit:

24bc083653
ci: disable some tests on CentOS 7

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
362d8fa5c2 ci: disable CentOS 7 test in Cirrus CI
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Abhishek Guleri
94c9e47872 readme: refactor asciinema link for video playback
Instead of opening the image directly, the commit refactors the
asciinema image embedded link to redirect users to the corresponding
video.

Signed-off-by: Abhishek Guleri <abhishekguleri24@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
7bda5e656c zdtm: Update netns purpose comment in zdtm_ct.
With the parasite socket clash now guaranteed not to happen,
the comment becomes obsolete. netns is steel needed though, so
update the comment to point at the requirement.

Change-Id: I3cfb253cd5c53b91b955fcb001530b4aee5129f4
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
9ad59f58ff util: Make CRIU run_id machine-level unique.
Instead of relying on chance of CLOCK_MONOTONIC reading being unique,
use pid namespace ID that combined with the process ID will make it
unique on the machine level.

If pidns is not enabled on a kernel we'll get ENOENT, but then CRIU's
pid will already be unique. If there is some other error, log it but
continue, as the socket clash (if it happens) will result in a failed
run anyway.

Fixes: 45e048d77a (2022-03-31 "criu: generate unique socket names")
Fixes: 408a7d82d6 (2022-02-12 "util: add an unique ID of the current criu run")
Change-Id: I111c006e1b5b1db8932232684c976a84f4256e49
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
eecc53d05a kerndat: Don't fail on NETLINK/nsid support missing.
If not dumping netns nor connections, nsid support is not used. Don't
fail the run as if the support is needed, the dumping process will fail
later.

Change-Id: I39a086756f6d520c73bb6b21eaf6d9fb49a18879
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
bf8446ae5f kerndat: unexport kerndat_nsid()
kerndat_nsid() is not used outside kerndat.c. Make it static.

Change-Id: I52e518ecb7c627cc1866e373411b2be3f71a2c9d
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
f2011e1c76 util: Downgrade ignored errors to warnings.
If the error is ignored it is not important enough - make it a warning
instead.

From: Mian Luo <mianl@google.com>
Change-Id: If2641c3d4e0a4d57fdf04e4570c49be55f526535
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
2aa9cb9333 rpc: Support setting images_dir by path.
Google's RPC client process is in a different pidns and has more privileges --
CRIU can't open its /proc/<pid>/fd/<fd>.  For images_dir_fd to be useful here
it would need to refer to a passed or CRIU's fd.

From: Michał Cłapiński <mclapinski@google.com>
Change-Id: Icbfb5af6844b21939a15f6fbb5b02264c12341b1
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
439b522433 rpc: Support gathering external file list after freezing process tree.
New 'query-ext-files' action for `criu dump` is sent after
freezing the process tree. This allows to defer gathering
the external file list when the process tree is in a stable
state and avoids race with the process creating and deleting
files.

Change-Id: Iae32149dc3992dea086f513ada52cf6863beaa1f
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
82a0db036e docker/podman: test c/r with action-script
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
28d0056388 action-scripts: allow shell scripts in rpc mode
Container runtimes commonly use CRIU with RPC. However, this prevents
the use of action-scripts set in a CRIU configuration file due to the
explicit scripts mode introduced with the following commit:

  ac78f13bdf
  actions: Introduce explicit scripts mode

This patch enables container checkpoint/restore with action-scripts
specified via configuration file.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Michał Mirosław
3e0a8ffd6d build: libnfnetlink: Remove nla_get_s32().
nla_get_s32() was added to libnl 3.2.7 in 2015. Remove CRIU's definition
as it breaks build when statically linking the binary.

From: Uros Prestor <urosp@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
f043cb22af build: Guard against libbsd's version of __aligned.
When trying to build CRIU with libbsd enabled the compilation fails due
to duplicate definition of __aligned macro. Other such definitions are
already wrapped with #ifndef make __aligned definition consistent and
make it easier in the future to use the libbsd features if needed.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
c5142104a2 build: Use make-provided AR for building libzdtmtst.
Make $(AR) used also for libzdtmtst build.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
88f8fdda82 build: Fix LIBS vs LDFLAGS order.
$LDFLAGS can contain `-Ldir`s that are required by '-lib's in $LIBS.
Reverse the order so that `-L` options make effect.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
cbbe6c683d build: Debug system feature tests.
`make` without `-s` option will normally show the commands executed. In
the case of detecting build environment features current makefile will
cause detected features to be seen as 'echo #define' commands, but not
detected ones will be silent. Change it so that all tried features can
be seen (outside of make's silent mode) regardless of detection result.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
1a1fa439c7 build: Remove HAS_MEMFD test
The test for HAS_MEMFD is empty and noit used. Remove it.

Fixes: 5ee1ac1f28 ("criu: remove FEATURE_TEST_MEMFD")
Change-Id: I43b8f0cfd50ce9bdf93dafb647377318df1deae8
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
2bf10c8d27 restore: remove unused secbits field.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
e5c9bc4d08 kerndat: Make socket feature probing work on IPv6-only host.
Try IPv6 if IPv4 sockets are not supported.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
eea0d6edee pipes: Plug pipe fd leak in "Unable to set a pipe size" error case.
From: Piotr Figiel <figiel@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
ffa1e47fd8 sockets: Increase the size of sockets hashmap to 16K.
During dump, CRIU stores the structs representing sockets in a statically sized
hashmap of size 32. We have some (admittedly crazy) tasks that use tens of
thousands of sockets, and seem to spend most of the dump time iterating over
the linked lists of the map.

16K is chosen arbitrarily, so that it reduces the lengths of the chains to few
elements on average, while not introducing significant memory overhead.

From: Radosław Burny <rburny@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
304a309aed test/thp_disable: fix lint
The fail() macro provides a new line character at the end of the
message. This patch fixes the following lint check that currently
fails in CI:

$ git --no-pager grep -E '^\s*\<(pr_perror|fail)\>.*\\n"'
test/zdtm/static/thp_disable.c:		fail("prctl(GET_THP_DISABLE) returned unexpected value: %d != 1\n", ret);
test/zdtm/static/thp_disable.c:		fail("Flags changed %lx -> %lx\n", orig_flags, new_flags);
test/zdtm/static/thp_disable.c:		fail("Madvs changed %lx -> %lx\n", orig_madv, new_madv);
test/zdtm/static/thp_disable.c:		fail("post-migration prctl(GET_THP_DISABLE) returned unexpected value: %d != 1\n", ret);
test/zdtm/static/thp_disable.c:		fail("Flags changed %lx -> %lx\n", orig_flags, new_flags);
test/zdtm/static/thp_disable.c:		fail("Madvs changed %lx -> %lx\n", orig_madv, new_madv);

Fixes: #2193

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Michał Mirosław
c97cc6a6ce Allow skipping iptables/nftables invocation.
Make it possible to skip network lock to enable uses that break connections
anyway to work without iptables/nftables being present.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
d0ac547b3d zdtm: sock_opts00: Improve error messages.
Make it clear that the option numbers are indexes not the option
identifiers ("names"). Also show the value change that prompted test
failure.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
6bc00fcb84 zdtm: Implement test sharding.
Allow to split test suite into predictable sets to parallelize runs on
multiple machines or VMs.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
1fb5c410c8 zdtm: Allow --keep-going for single test.
We don't want test framework to change its behaviour on whether we
run a single or multiple tests in a run. When we shard the test suite
it can result in some shards having a single test to run and
unexpectedly change the test output format.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
ed88e3241c zdtm: Add timeouts for test commands.
Extend ability to limit time taken to all CRIU invocations.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
516fade932 zdtm: Allow overriding /tmp.
Use $TMPDIR for tests_root as the host's /tmp might not have enough
features or space.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Cłapiński
4b764a9dce Allow passing --log_to_stderr via RPC.
Signed-off-by: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Cłapiński
1e5ebec39d Allow passing --display_stats via RPC.
Signed-off-by: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
0e88a91ff0 Allow passing --leave_stopped by RPC.
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Haorong Lu
4455444eeb compel/test: Return 0 in case of error in fdspy
This commit revises the error handling in the fdspy test. Previously,
a failure case could have been incorrectly reported as successful because
of a specific check `pass != 0`, leading to potential false positives
when `check_pipe_ends()` returned `-1` due to a read/write pipe error.

To improve this, we've adjusted the error handling to return `0` in case
of any error. As such, the final success condition remains unchanged. This
approach will help accurately differentiate between successful and failed
cases, ensuring the output "All OK" is printed for success, and "Something
went WRONG" for any failure.

Fixes: 5364ca3 ("compel/test: Fix warn_unused_result")

Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
4c1409b8f6 Fill FPU init state if it's not provided by kernel.
Apparently Skylake uses init-optimization when saving FPU state, and ptrace()
returns XSTATE_BV[0] = 0 meaning FPU was not used by a task (in init state).
Since CRIU restore uses sigreturn to restore registers, FPU state is always
restored. Fill the state with default values on dump to make restore happy.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
93ad8d40de zdtm: thp_disable: Verify MADV_NOHUGEPAGE before migration
Add a sanity check for THP_DISABLE. This discovered a broken commit
in Google's kernel tree.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
01238d2706 zdtm: thp_disable: Verify prctl(THP_DISABLE) migration
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
a0c78a7902 zdtm: thp_disable: Output a single failure message
While at it, don't carry over stale errno to the fail() message.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
2364c963c6 Log if prctl(SET_THP_DISABLE) doesn't work as expected.
If prctl(SET_THP_DISABLE) is not used due to bad semantics, log it
for easier debugging.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
21ce76263b Restore THP_DISABLE prctl.
The original commit added saving THP_DISABLED flag value, but missed
restoring it. There is restoring code, but used only when --lazy_pages
mode is enabled. Restore the prctl flag always. While at it, rename the
`has_thp_enabled` -> `!thp_disabled` for consistency.

Fixes: bbbd597b41 (2017-06-28 "mem: add dump state of THP_DISABLED prctl")
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
9943dcde17 Fix mount(cgroup2) for older kernels.
Linux 4.15 doesn't like empty string for cgroup2 mount options.
Pass NULL then to satisfy the kernel check. Log the options for
easier debugging.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Michał Mirosław
0218b1e8f2 Fix dumping hugetlb-based memfd on kernels < 4.16.
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB.
Since seals are not possible in this case, fake F_GETSEALS result as if it
was queried for a non-sealing-enabled memfd.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-10-22 13:29:25 -07:00
Valeriy Vdovin
a638043a7f cgroup/restore: split prepare_task_cgroup code into two separate functions
This does cgroup namespace creation separately from joining task
cgroups. This makes the code more logical, because creating cgroup
namespace also involves joining cgroups but these cgroups can be
different to task's cgroups as they are cgroup namespace roots
(cgns_prefix), and mixing all of them together may lead to
misunderstanding.

Another positive thing is that we consolidate !item->parent checks in
one place in restore_task_with_children.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
2ac15e3ad0 action-scripts: Add pre-stream hook
This hook allows to start image streamer process from an action script.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
c6ac396aa3 timers: improve and fix posix timer id sequence checks
This is a patch proposed by Thomas here:
https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/

It removes (created id > desired id) "sanity" check and adds proper
checking that ids start at zero and increment by one each time when we
create/delete a posix timer.

First purpose of it is to fix infinite looping in create_posix_timers on
old pre 3.11 kernels.

Second purpose is to allow kernel interface of creating posix timers
with desired id change from iterating with predictable next id to just
setting next id directly. And at the same time removing predictable next
id so that criu with this patch would not get to infinite loop in
create_posix_timers if this happens.

Thanks a lot to Thomas!

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Dhanuka Warusadura
fc08fa9077 criu-ns: Update shebang line to python
CentOS 7 CI environment uses Python 2. To execute criu-ns
script in CentOS 7 changing the current shebang line to
python is required.

This reverse the changes made in a15a63fce0

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
2023-10-22 13:29:25 -07:00
Dhanuka Warusadura
9cd09f5860 criu-ns: Install Python pathlib module in CentOS 7
These changes fix the `ImportError: No module named pathlib`
error when executing criu-ns tests located at criu/test/others/criu-ns

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
2023-10-22 13:29:25 -07:00
Dhanuka Warusadura
9c9e8ea3f2 criu-ns: Add tests for criu-ns script
These changes add test implementations for criu-ns script.

Fixes: #1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
2023-10-22 13:29:25 -07:00
Dhanuka Warusadura
e4b6fb2d1f criu-ns: Add support for older Python version in CI
These changes remove and update the changes introduced in
7177938e60 in favor of the
Python version in CI.

os.waitstatus_to_exitcode() function appeared in Python 3.9

Related to: #1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
2023-10-22 13:29:25 -07:00
Dhanuka Warusadura
733f165512 criu-ns: Add --criu-binary argument to run_criu()
--criu-binary argument provides a way to supply the CRIU binary
location to run_criu().

Related to: #1909

Signed-off-by: Dhanuka Warusadura <csx@tuta.io>
2023-10-22 13:29:25 -07:00
Adrian Reber
36709536e5 lib/c: add empty_ns interfaces to libcriu
crun wants to set empty_ns and this interface is missing from the
library. This adds it to libcriu.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-10-22 13:29:25 -07:00
Radostin Stoyanov
b665dce3c7 docs: rename amdgpu_plugin.txt to criu-amdgpu-plugin.txt
By default, the file name 'amdgpu_plugin.txt' is used also as the name
for the corresponding man page (`man amdgpu_plugin`). However, when
this man page is installed system-wide it would be more appropriate
to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`).

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-10-22 13:29:25 -07:00
Pavel Tikhomirov
cc607f8103 criu-ns: make --pidfile option show pid in caller pidns
Using the fact that we know criu_pid and criu is a parent of restored
process we can create pidfile with pid on caller pidns level.

We need to move mount namespace creation to child so that criu-ns can
see caller pidns proc.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-10-22 13:29:25 -07:00
Adrian Reber
50e17a1cf3 scripts: make newer versions of shellcheck happy
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-10-22 13:29:25 -07:00
Adrian Reber
df7b897a22 ci: fix new codespell errors
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-10-22 13:29:25 -07:00
Adrian Reber
727d796505 compel: support XSAVE on newer Intel CPUs
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than
before. Looking at older CPUs I see 2440 bytes.

    # cpuid -1 -l 0xd -s 0
    ...
        bytes required by XSAVE/XRSTOR area     = 0x00000988 (2440)

On newer CPUs (Sapphire Rapids) it grows to 11008 bytes.

    # cpuid -1 -l 0xd -s 0
    ...
        bytes required by XSAVE/XRSTOR area     = 0x00002b00 (11008)

This increase the xsave area from one page to four pages.

Without this patch the fpu03 test fails, with this patch it works again.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-10-22 13:29:25 -07:00
hdzhoujie
fa6af25e75 dump: increase fcntl call failure judgment
The pipe_size type is unsigned int, when the fcntl call fails and
return -1, it will cause a negative rollover problem.

Signed-off-by: zhoujie <zhoujie133@huawei.com>
2023-10-22 13:29:25 -07:00
Suraj Shirvankar
1c0f8787b2 zdtm: Add tests for ip tos restore
Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
2023-10-22 13:29:25 -07:00
Suraj Shirvankar
04cdbd6106 sk-inet: Add IP TOS socket option
The TOS(type of service) field in the ip header allows you specify the
priority of the socket data.

Signed-off-by: Suraj Shirvankar <surajshirvankar@gmail.com>
2023-10-22 13:29:25 -07:00
Andrei Vagin
4c1a2ac41b criu: Version 3.18 (Silver Sandpiper)
The highlight feature of this release is the ability to use CRIU for
non-root users. Adrian Reber implemented the kernel part and created the
initial version of CRIU changes. Then Younes Manton joined the effort
and pushed it to the finish line.

The full change log is here: https://criu.org/Download/criu/3.18

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-22 20:47:09 -07:00
Pavel Tikhomirov
b689bcc354 cr-check: remove excess kerndat_has_nspid from check_ns_pid
We do kerndat_has_nspid in kerndat_init already and save result to
kerndat cache, we don't need to recheck it each time.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
9b3496043d log: fix timestamp logging when tv_sec>=100
Previously when tv_sec>=100, the line would look like this:
(269.189615 Error [...]
Now the last char is overwritten with ')'.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
94ac9ee3cc proc_parse: fix while condition in parse_pid_status
In parse_pid_status there are 13 places where we do done++, so when
"done" is 13 it means that we have matched each of those 13 places and
we are ready to stop. In next lines we are not going to find anything.

So the right condition for the while loop is (done < 13).

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
hdzhoujie
45e4a6b274 netlink: fix netlink fd flags dump/restore failed
During the restore process, netlink fd uses the flags in the
NetlinkSkEntry structure to restore the file state, but during
the dump process, the flags values is not saved to the structure.

Signed-off-by: zhoujie <zhoujie133@huawei.com>
Signed-off-by: hejingxian <hejingxian@huawei.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
6c728df1dc zdtm: modify rseq01 to include a thread
Testing only the thread group leader is not enough and can hide bugs.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
f8da250bb3 cr-dump: properly apply rseq fixup for all threads
Previously fixup was done before threads' registers were dumped so it
didn't actually work. This commit splits rseq fixup into thread leader
fixup and other threads fixup and applies them after the entities are
seized.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
78c4e2c0f7 cr-dump: move rseq functions before dump_task_thread
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
85e46c44d6 dump: extend parasite_thread_ctl lifetime to dump_task_thread
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
9683097f27 zdtm: don't ignore rseq_cs mismatch in rseq01 test
Kernel shouldn't clean up rseq_cs inside a critical section.
If rseq_cs has been cleaned up, it means there is a bug in migration.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
0c52399322 ci: cancel preceding workflows run
This patch adds concurrency groups to the CI workflows to automatically
cancel any in-progress workflows when a pull request has been updated.
A `concurrency` group allows to ensure that a single job or workflow
will run at a time. For example, when a pull request is updated with
a force-push, the GiHub CI workflows currently in-progress will be
automatically cancelled, and the CI would run only with the updated
commits.

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
5c8cdceec2 sk-unix: rework unix_resolve_name
- use exit_code instead of returning ret
- replace -errno return with -1
- move fallback to if (!kdat.sk_unix_file)
- fix readlinkat error checking (ret < 0 && ret >= PATH_MAX) by using
  read_fd_link helper

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
de39bd2bd1 sk-unix: simplify error handling in unix_resolve_name_old
As we now don't have any calls to free in this function we can replace
all lables with explicit returns.

While on it: Replace useless -errno and 1 returns with -1 as from the
very first implementation of unix_resolve_name (it changed name to _old
later) in [1] any non-zero return was treated as error.

6d785e6cd ("unix: resolve a socket file when a socket descriptor is
available") [1]

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
d93409cf11 sk-unix: remove bogus xfree from unix_resolve_name_old
It is strange to free a pointer which is already in unix_sk_desc, either
on error path or on skip as we leave freed pointer in desc and it can
probably be used after free later and lead to some corruption. So I
would prefer not to free it as we don't have full controll over it here.

Fixes: 6d785e6cd ("unix: resolve a socket file when a socket descriptor is available")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Yuriy Vasiliev
ccc790d540 zdtm/lib: fix cwd path freeing
Fix cwd freeing on error path in get_cwd_check_perm and
on non-error-path in unix_fill_sock_name.

v2: use cleanup_free attribute in unix_fill_sock_name

Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Cyrill Gorcunov
8e6fa9c3b9 net: Add net log prefix
For better logging.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
7f0f07599a crit: fix compatibility with Python 3.12
Python 3.12 includes a few breaking changes, such as the removal of the
distutils module [1] and the deprecation of `setup.py install` in
favour of pip install [2]. This patch updates the installation script
for crit to reflect these changes by replacing the use of
`setup.py install` with `pip install` and `distutils` with
`setuptools`. In addition, a minimal pyproject.toml file has
been added as it is required by the new version of pip [3].

It is worth noting that with this change we are switching from the egg
packaging format to wheel [4] and add pip as a build dependency.

[1] https://www.python.org/downloads/release/python-3120a2/
[2] https://github.com/pypa/setuptools/pull/2824
[3] https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/
[4] https://packaging.python.org/en/latest/discussions/wheel-vs-egg/

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
bd0f209c2b pstree: improve id intersection detection in prepare_pstree_for_shell_job
First, let's move lookup_create_item-s to the end so that on pgid
replacement we don't have false positive pstree_pid_by_virt check
founding item created by sid replacement. (note: we need those
lookup_create_item-s for the sake of free pid selection mechanism)

Second, let's add checks for sid/pgid in images intersecting with
current_sid/pgid, as this would also bring problems on restore.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
66fd45d51d sk-unix: add some missed error printing
In Virtuozzo tests we have seen uninformative errors:

(26.575039) 151187 fdinfo 6: pos: 0 flags: 2/0
(26.575076) sockets: Searching for socket 0x346d1 family 1
(666.230281 ----------------------------------------
(666.230586 Error (criu/cr-dump.c:1850): Dump files (pid: 151187) failed
with -1

So let's add some error messages to this stack.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
a0158e6927 zdtm: add MNTNS_ZDTM macro to fix initialization
With this macro we can easily declare struct mntns_zdtm variables with
all lists properly initiallized. Let's use it in mount_complex_sharing
as without it we can have segfault on error path when accessing
uninitialized list pointers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
12423abdb5 mount: allow bindmounts for external fuse mounts
Currently we only allow external fuse mount itself, let's allow
bindmount for it too. Other mount code is ready for this change and will
be able to bindmount it from corresponding external mount.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
65407616e0 ci/archlinux: initialize machine ID
When installing packages within Archlinux container, pacman fails with
the following errors:

(3/7) Creating temporary files...
/usr/lib/tmpfiles.d/journal-nocow.conf:26: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:23: Failed to replace specifiers in '/run/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:25: Failed to replace specifiers in '/run/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:26: Failed to replace specifiers in '/run/log/journal/%m/*.journal*': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:29: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:30: Failed to replace specifiers in '/var/log/journal/%m/system.journal': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:32: Failed to replace specifiers in '/var/log/journal/%m': No such file or directory
/usr/lib/tmpfiles.d/systemd.conf:33: Failed to replace specifiers in '/var/log/journal/%m/system.journal': No such file or directory

To solve this problem we need to initialize the machine ID.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
KKrypt
34e2b02219 Optimized shell code with <'s (instead of cat + |)
This patch optimizes shell code as reading a single file as input using a 'cat' command to a program.

It is considered to be a Useless Use of Cat (UUOC).

It's more efficient to simply use redirection.

However, in some cases, even using the redirection operator '<' seems unnecessary.

Signed-off-by: KKrypt <sankalpacharya1211@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
7cae16e971 mount: do collect_mntinfo of external mount namespace with no for_dump
When we collect external mount namespace we don't want to dump mounts in
it, so lets remove this flag. This way we can e.g. use for_dump in
->parse() callbacks to separate in-container mounts from others.

This only affects rare case of `--ext-mount-map auto` but to be
absolutely correct let's fix it anyway.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
69befdde18 cgroup-v2: make new field cg_set optional
The new field cg_set is currently marked as required which causes backward
compatibility problem when using newer CRIU version to restore dumped image
from older version. This commit makes this field optional and reworks the
logic to fallback to use cg_set from task_core when it is not in
thread_core.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
529f298913 cgroup-v2: make new field is_threaded optional
The new field is_threaded is currently marked as required which causes
backward compatibility problem when using newer CRIU version to restore
dumped image from older version. This commit makes this field optional and
reworks the logic the skip fixing up threaded cgroup controllers if there
is no information in dumped image.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
6e681afb69 net: fail restore if nftables isn't supported but image is present
Fixes: e1c4871 ("net: add nftables c/r")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
156c8da33c make: disable '-Wdangling-pointer' warning with gcc 12
The patch is similar to what has been done in linux kernel, as this
warning effectively prevents us from adding list elements to local list
head. See 49beadbd47

Else we have:

    CC       criu/mount.o
  In file included from criu/include/cr_options.h:7,
                   from criu/mount.c:13:
  In function '__list_add',
      inlined from 'list_add' at include/common/list.h:41:2,
      inlined from 'mnt_tree_for_each' at criu/mount.c:1977:2:
  include/common/list.h:35:19: error: storing the address of local variable 'postpone' in
    '((struct list_head *)((char *)start + 8))[24].prev' [-Werror=dangling-pointer=]
     35 |         new->prev = prev;
        |         ~~~~~~~~~~^~~~~~
  criu/mount.c: In function 'mnt_tree_for_each':
  criu/mount.c:1972:19: note: 'postpone' declared here
   1972 |         LIST_HEAD(postpone);
        |                   ^~~~~~~~

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Dmitry Safonov
4930c98020 x86/xsave: Set only used XFEATURE_* in xstate_bv
Setting all supported by CPU features in xstate_bv may bring it into
dirty-upper-state as documented in specs, resulting in lower
performance. Let's not do this and set only those have been used by
dumpee.

P.S.
Off course it has to be a one-liner!

Fixes: #1171

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
3f8e3220ba CONTRIBUTING.md: document make lint / indent
This patch documents how do we use `make lint` and `make indent` and
adds a note about their integration with CI.

Co-authored-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
42c4be2a92 net: use get_legacy_iptables_bin also on restore
Without this we might try to restore iptables legacy images with
iptables nft.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Cyrill Gorcunov
2982867185 pie/restorer: Fix fd leaking on error path
Nothing serious since OS will close it anyway but still to be precise.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
1d4777e452 test: add long command-line to coredump test
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
edaec5d762 coredump: report missing files without a backtrace
New message:

 ERROR: Required file /usr/lib64/libcrypto.so.3.0.1 not found.
 Exiting

Old message:

   File "/home/criu/coredump/criu_coredump/coredump.py", line 693, in _gen_mem_chunk
     f = open(fname, 'rb')
 FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib64/libcrypto.so.3.0.1'

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
3ca979f9a1 coredump: handle long command-lines
This fixes errors with long command-lines:

  File "/home/criu/coredump/criu_coredump/coredump.py", line 320, in _gen_prpsinfo
    prpsinfo.pr_psargs = self._gen_cmdline(pid)
    ^^^^^^^^^^^^^^^^^^
ValueError: bytes too long (88, maximum length 80)

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Kouame Behouba Manasse
a0cc95c03e lib/py: reduce code duplication
Refactor lib/py/images/images.py to reduce code duplication
by extracting repetitive code into helper functions and
private methods. This improves code readability and maintainability,
as well as reducing the risk of bugs caused by duplicated code.
Additionally, in Makefile, lib/py/images/images.py is added to the
list of files to run by flake8 during CI.

Fixes: #340

Signed-off-by: Kouame Behouba Manasse <behouba@gmail.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
85b5c1e451 ci/podman-test: drop crun installation script
In a previous commit, we set the default runtime to runc and
"manage-cgroups" to ignore. We remove the installation script
for crun as it is not used with this test.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
91ff24b47e ci: disable c/r of cgroups with podman
This patch disables the checkpoint/restore of cgroups for
the tests using Podman as a temporary workaround for
https://github.com/checkpoint-restore/criu/issues/2091

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
e7ab6fe635 restore: don't miss futex abort in restore_task_with_children
Fixes: 37b99ebe5 ("files: Do setup_newborn_fds() later")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
676b4579f9 zdtm/transition/epoll: don't rely on errno in case of zero return
Checking errno in case read succeeded is undefined behaviour.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
7b80353448 mailmap: update my email
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
1ae9bac548 dump: improve error printing and readability of task_comm_info
This addresses Andrei comments from
https://github.com/checkpoint-restore/criu/pull/2064

- Add comment about '\n' fixing
- Replace ret with more self explainting is_read
- Print warings if we failed to print comm for some reason

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
11c71656bd ci: add test for crit info
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
9b919ab748 crit: fix empty string comparison
In Python 3 b'' == '' is False. This causes the info action to fail with

  File "/usr/lib/python3.11/site-packages/crit-3.17-py3.11.egg/pycriu/images/images.py", line 178, in count
    size, = struct.unpack('i', buf)
            ^^^^^^^^^^^^^^^^^^^^^^^
  struct.error: unpack requires a buffer of 4 bytes

Reported-by: Sankalp Acharya (@sankalp-12)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Cyrill Gorcunov
fa4af04302 dump: Show task comm early
When error happens on file dumping stage the only information about the
task we dumping is its PID. For debug purpose show task's @comm early.

It proves useful when trying to understand which of dumped applications
is "guilty" in brokern dump when pid is not there anymore.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
fd7e97fcfd lint: exclude tags file from codespell
If we build tags for our repo:

[criu]$ make tags
  GEN      tags

And then run codespell, we get an error:

[criu]$ codespell
./tags:3755: struc ==> struct

Let's exclude tags file from codespell search, this would add usability
to `make lint`.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
50e42c9ddc Add documentation for --ghost-fiemap
The --ghost-fiemap option was introduced with #1963.

It enables an optimized algorithm based on fiemap ioctl that can reduce
the number of syscalls used to checkpoint highly sparse ghost files. This
option is enabled by default. It can be disabled with --no-ghost-fiemap
when using SEEK_HOLE/SEEK_DATA is preferred. In addition, an automatic
fallback to SEEK_HOLE/SEEK_DATA is used for filesystems that do not
supporting fiemap.

Co-authored-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
4a8c02d636 zdtm: Add tests for IP_PKTINFO and IP_FREEBIND sock options
Just creates ipv4/ipv6 raw/dgram sockets with IP_PKTINFO and IP_FREEBIND
socket options enabled/disabled and checks that these options persist.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
bd9b66c8c0 sk-inet: support IP_PKTINFO and IPV6_RECVPKTINFO options
We see systemd-resolved relying on these options, and after migration
the options are lost and systemd-resolved stops serving dns requests.

The socket options make kernel add cmsg with destination address to
packets, see more how systemd-resolved uses them:

00a60eaf5f/src/resolve/resolved-manager.c (L826)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
7d4d4915af sk-inet: save IP_FREEBIND option for SOCK_RAW sockets also
The IP_FREEBIND option is supported for RAW sockets, why not save it
while we do this for other ip sockets anyway?

One difference is that for SOCK_RAW there is no fallback between
IP_FREEBIND and IPV6_FREEBIND, see:

ef4d3ea405/net/ipv6/ipv6_sockglue.c (L1497)

So let's have explicit IPV6_FREEBIND for ipv6.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
f1c8d386b4 kerndat: check if setsockopt IPV6_FREEBIND is supported
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Younes Manton
14e8836564 proc_parse: Handle btrfs files when map_files is not accessible
If we can't access a map_files entry directly and instead have to follow
the link and access the file via a filesystem path we need to properly
deal with files on btrfs subvolumes.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
d7da4a69af ci: Add maps00 test in unprivileged mode in user namespace
CAP_CHECKPOINT_RESTORE does not give access to /proc/$pid/map_files in
user namespaces. In order to test that CRIU in unprivileged mode can
dump and restore anonymous shared memory pages we will run the maps00
tests in a user namespace.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
d2abc9817f shmem: Close fd when VMA is copied from /proc/$pid/mem
If we don't have access to map_files and instead have to get the data
from /proc/$pid/mem we can close and reset the fd before passing it to
do_dump_one_shmem() which can then check it before trying to seek past
holes, eliminating the need for a separate seek_data_supported boolean.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
144b467a05 shmem: pr_err -> pr_perror
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
6d7c0d007e compel/mips: fix parasite with GCC 12
This patch applies the '-ffreestanding' flag that was introduced
with https://github.com/checkpoint-restore/criu/pull/1726 to MIPS.

Fixes: #1725

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
7280e96a79 clang-format: use IndentGotoLabels to get rid of goto label indentation
This is done to follow 'Linux kernel coding style', same change was
added to .clang-format in linux kernel source recently:
d7f6604341

We don't change it in current code base but let's follow it in all
future uses.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
fcdb753ed5 namespaces: cleanup switch_mnt_ns and restore_mnt_ns
Simplify code a bit: make exit codes of those functions more
transparent, rename ret to exit_code.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
63159c14c0 mount: simplify code around mount_cr_time_mount
Checking errno in outer function is really strange, also saving errno of
mount syscall after calling pr_perror is completely wrong. So let's try
to simplify things.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Michał Mirosław
43fa4e76d2 remap: refactor goto jumping to a while loop
Make the code a bit more readable by uncovering a while loop from
a if() goto sequence.

Signed-off-by: Michał Mirosław <emmir@google.com>
2023-04-15 21:17:21 -07:00
Michał Mirosław
757a2b46ce remap: Fix typo
Fixes: 237bd26982 ("remap: Rename global lock", 2017-05-18)
Signed-off-by: Michał Mirosław <emmir@google.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
3837d31b5b ci/lint: make push action have at least too commits depth
We see that when lint is called for push action git has only one last
commit which makes make indent with git-clang-format fail to operate.

Fix it by increasing fetch depth to one more commit.

Fixes: #2066
Fixes: d6db3333a ("clang-format: rework make indent to check specific commits")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Younes Manton
cec43025ac criu(8): Add info about unprivileged mode limitations
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
80528dbf72 proc_parse: Don't bail out on is_memfd() VMAs
Co-authored-by: Ivanq <imachug@yandex.ru>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
95e590b512 shmem: Fall back to /proc/$pid/mem if no map_files
If trying to open /proc/$pid/map_files/x-x for a given VMA fails with
EPERM (can happen in unprivileged mode when running in a non-init user
ns), fall back to reading the content from /proc/$pid/mem.

Co-authored-by: Ivanq <imachug@yandex.ru>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
951c56917a proc_parse: Set VMA_AREA_REGULAR where needed
This patch sets VMA_AREA_REGULAR on hugetlb and anon shmem VMAs since
they can be handled the same way as other kinds of regular memory.

Co-authored-by: Ivanq <imachug@yandex.ru>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
a92dfb61ff string: define wrapers __setproctitle and __setproctitle_init to hide bsd headers
We see that libbsd redefines __has_include to be always true, which
breaks such checks for rseq. The idea behind this patch is to put all
uses of libbsd functions to separate c files and only export wrapper
functions for them.

Using __setproctitle and __setproctitle_init everywhere in existing
code:

git grep --files-with-matches "setproctitle" | xargs sed -i 's/setproctitle/__setproctitle/g'
git grep --files-with-matches "setproctitle_init" | xargs sed -i 's/setproctitle_init/__setproctitle_init/g'

Fixes: #2036
Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
0a7c5fd1bd string: use our own __strlcpy and __strlcat to remove bsd headers
We see that libbsd redefines __has_include to be always true, which
breaks such checks for rseq. The idea behind this patch is remove the
use of libbsd functions and always export our replacement functions.

Using __strlcat and __strlcpy everywhere in existing code:
git grep --files-with-matches "strlcat" | xargs sed -i 's/strlcat/__strlcat/g'
git grep --files-with-matches "strlcpy" | xargs sed -i 's/strlcpy/__strlcpy/g'

Fixes: #2036
Suggested-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
8cfda2748c log: remove all uses of %m specifier in pr_* functions
As our pr_* functions are complex and can call different system calls
inside before actual printing (e.g. gettimeofday for timestamps) actual
errno at the time of printing may be changed.

Let's just use %s + strerror(errno) instead of %m with pr_* functions to
be explicit that errno to string transformation happens before calling
anything else.

Note: tcp_repair_off is called from pie with no pr_perror defined due to
CR_NOGLIBC set and if I use errno variable there I get "Unexpected
undefined symbol: `__errno_location'. External symbol in PIE?", so it
seems there is no way to print errno there, so let's just skip it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
7459d02043 lint: prohibit to use %m specifier in pr_* functions
As our pr_* functions are complex and can call different system calls
inside before actual printing (e.g. gettimeofday for timestamps) actual
errno at the time of printing may be changed.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
f73ba77269 ci: switch from lgtm to codeql
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
0bddecead0 restorer: add logging on prctl PR_SET_MM_MAP failure
This kernel feature contained some bugs initially. Those logs are useful in identifing what the
underlaying issue is and which kernel patch to backport.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
fb66727a25 zdtm: add mntns_compare check to mount_complex_sharing
This way we can check that mount tree topology (including sharing
groups) is the same before and after c/r.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
ba09fad391 zdtm: add mountinfo topology compare to test lib
Now we can compare mount tree and sharing group tree topology before and
after c/r with mntns_compare() helper.

Algorithm here is:

1) build mount tree based on mnt_id and parent_mnt_id from mountinfo
2) sort mount tree children based on path comparison
3) at the same time set topology_id for mounts by DFS order and order
   mounts in list accordingly
4) build shared groups tree based on sharing_id and master_id
5) at the same time set topology_id for sharings as smallest topology_id
   of its mounts, also sharings are put in their list in order of
   their topology_id
6) walk sorted mounts lists for both namespaces simultaneously each
   pair of moutns should have matching ids and parent ids
7) walk sorted sharings lists for both namespaces simultaneously each
   pair of sharings should have matching ids and parent ids

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
2837a13ef9 zdtm: add mountinfo parsing to test lib
For mount testing it is nice to be able to parse mountinfo from zdtm
test itself, for instance to be able to compare mountinfo topology
before and after c/r, or for anything else. So let's add a helper
mntns_parse_mountinfo() which parses current mount namespace mountinfo.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
543501d5f8 zdtm/lib: copy xmalloc.h
Need to use xzalloc in zdtm lib.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
d800ef6588 zdtm/lib: copy list.h
Need it to use linux lists in zdtm.

Also copy container_of from comiler.h to zdtmtst.h like we already do
for e.g. __stack_aligned__ macro.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
be61624f45 clang-format: rework make indent to check specific commits
Previousely "make indent" checked all files in criu source directory for
codding style flaws. We have several problems with it:

- clang-format default format sometimes changes in new versions of the
package and we need to reformat all our code base each time it happens
- on different systems we may have different versions of clang-format
and on latest criu-dev "make indent" may be still unhappy on your system
- when we want to update clang-format rules ourselves we need to update
all our code base each time
- sometimes clang-format rules are not fitting all our cases, (e.g.: an
option IndentGotoLabels works nice for simple C code, but is a no go for
assembler and C macros) and putting "clang-format off" everywhere is a
mess
- sometimes we intentionally want to break clang-format rules (e.g.:
we want to put function arguments on a new line separating them
"logically" not "mechanically" following 120-char rule like clang-format
does).

This adds a BASE option for "make indent" where all commits in range
BASE..HEAD would be checked with git-clang-format for codding style
flaws. For instance when developing on top of criu-dev, one can use
"make BASE=origin/criu-dev indent" to check all their commits for
compliance with the clang-format rules. Default base is HEAD~1 to make
last commit checked when "make indent" is called. The closest thing to
the old behaviour would then be "make indent BASE=init", note that only
commited files would be checked.

Extra options to git-clang-format may be passed through OPTS variable.

Also reuse "make indent" in github lint workflow.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
a918093ceb scripts/ci: use Fedora 37 for vagrant based tests
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
1bb84f96f5 tty: fix codding-style around for_each_bit call
Wraping "{" to next line after for-each macros is wrong.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
abfe0b5d24 clang-format: add for_each_bit macros to ForEachMacros
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
c8b4fb9ba5 autofs: fix a frankenstein auto-created by clang-format
Fixes: 93dd984ca ("Run 'make indent' on all C files")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Cyrill Gorcunov
aab709b602 log: Write more details in write_pidfile
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
7c6eb0b85c asm: fix for_each_bit macro
find_next_bit operates on a bit instead of byte positions/sizes.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
bb3f7bef66 crtools: fix help message alignment for --network-lock
Fixes: 2e30db5c3 ("criu: add --network-lock option to allow nftables alternative")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
a302b36940 zdtm: fix 'zdtm.py list' command
The command ./zdtm.py list currently fails with

    if opts['rootless']:
       ~~~~^^^^^^^^^^^^
    KeyError: 'rootless'

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Andrei Vagin
21f5be91a9 cgroups: ignore EOPNOTSUPP on setting memory.kmem.limit_in_byte
memory.kmem.limit_in_bytes has been deprecated. Look at e7c4184164f7
("memcg, kmem: further deprecate kmem.limit_in_bytes") for more details.

Signed-off-by: Andrei Vagin <avagin@google.com>
2023-04-15 21:17:21 -07:00
Andrei Vagin
9686693aa6 test/javaTests: update org.testng:testng (Maven)
TestNG is vulnerable to Path Traversal

Fixes https://github.com/checkpoint-restore/criu/security/dependabot/1.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Andrei Vagin
5c60d35be4 sockets: tiny style fix
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Younes Manton
5a19c34322 non-root: Don't dump socket option SO_MARK if 0
Restoring SO_MARK requires root or CAP_NET_ADMIN. If the value
is 0 we will avoid dumping it so that we don't need to do a
privileged call on restore.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
2180e03b90 non-root: Rework socket bufs for unprivileged mode
SO_SNDBUFFORCE/SO_RCVBUFFORCE require root or CAP_NET_ADMIN.
We can use SO_SNDBUF/SO_RCVBUF in some cases and avoid
needing elevated privileges.

This patch renames sk_setbufs() to sk_setbufs_ns() and
makes sk_setbufs() a general helper that sets socket
send and receive buffer sizes. The helper tries to use
SO_SNDBUFFORCE/SO_RCVBUFFORCE first and falls back to
SO_SNDBUF/SO_RCVBUF if we're in unprivileged mode.

The existing sk_setbufs_ns() which takes a pid parameter
and is intended to be called via userns_call() is rewritten
to call sk_setbufs().

Existing code that sets buffer sizes via setsockopt() is
modified to call sk_setbufs() instead.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Shubham Verma
e5ccfbb240 Fix typo in comment
Signed-off-by: Shubham Verma <shubhamv.sv@gmail.com>
2023-04-15 21:17:21 -07:00
Liang-Chun Chen
bdbccc315a zdtm: add two tests for highly sparse ghost file
ghost_multi_hole00 and ghost_multi_hole01 are tests which create a ghost file
with a lot of holes, there are 4K data and 4K hole inside every 8K length.

The only difference between them is ghost-fiemap option, 01 is a
test for the fiemap dumping algorithm, and we want to test the
behavior of EXTENT_MAX_COUNT part, so the file size should be 8M, thus there
will be 1024 chunks in the ghost file.

In some file system, such as xfs, we somehow can not easily create highly sparse
file as in ext4 or btrfs, therefore we need `fallocate` to forcibly create holes.

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
2023-04-15 21:17:21 -07:00
Liang-Chun Chen
f3fdce81a6 files-reg.c: fiemap algorithm for ghost file
In order to reduce the frequency of using system call, based on
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/create_inode.c#n519,
I created a new algorithm of dumping chunk via fiemap.(copy_file_to_chunks_fiemap)

Also, I added another BOOL_OPT for users to determine which algorithm they
want to use. Moreover, for those filesystem not supporting fiemap, criu
will fall back to the original algorithm(SEEK_HOLE/SEEK_DATA).

v2: don't call copy_chunk_from_file on outstanding extent; rearange
headers to workaround "redeclaration of ‘enum fsconfig_command’" problem

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
14b9ec195f ci: fix make indent
This patch fixes applies the changes required by clang-format v15.0.5
for `make indent`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
d0c64b7b34 ci/alpine: remove symlink for /usr/bin/python
The python3 package in Alpine has recently been updated to install
symbolic link for /usr/bin/python.

https://git.alpinelinux.org/aports/commit/main/python3?id=d91da210b1614eb75517d59b7f348fee01699f35

This causes the following error in CI:

  Step 10/11 : RUN ln -s /usr/bin/python3 /usr/bin/python
   ---> Running in a5a94be9dc93
  ln: failed to create symbolic link '/usr/bin/python': File exists
  The command '/bin/sh -c ln -s /usr/bin/python3 /usr/bin/python' returned a non-zero code: 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
095b3e84b7 ci/lint: install ShellCheck with dnf
The way ShellCheck is installed was changed in commit c056f99
(ci/gha/lint: install a recent shellcheck) to use the latest version
v0.8.0 and remove some of the "shellcheck disable=..." annotations.
Since then, Fedora 37 has been released and the ShellCheck package
has been updated to v0.8.0.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Drew Wock
c48b5290dc Fix warnings from -Wstrict-prototypes in clang 16.0.0
While building on a machine that has a HOL clang compiler,
I ran into warnings regarding the changed line.  It appears
this warning is on by default because of anticipated changes
to the C standard.

Signed-off-by: Drew Wock <ajwock@gmail.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
fa2c585c22 amdgpu: define __nmk_dir if missing
This patch adds a missing definition for `__nmk_dir` in the Makefile
for the amdgpu plugin. This definition is required, for example, when
building the `test_topology_remap` target:

	make -C plugins/amdgpu/ test_topology_remap

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Mathias Gibbens
c7211f52db Remove execute bit from source file
Signed-off-by: Mathias Gibbens <mathias@calenhad.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
f5e0f641a8 cgroup: Remove redundant code that handles zombie tasks
Zombie tasks are dumped in dump_zombies() so it is redundant to handle them
in dump_one_task().

Deprecate cg_set in task_core_entry as this field must be per thread now.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
c1ae880eb4 kerndat: Mark memfd_create(MFD_HUGETLB) unavailable when ENOSYS is returned
Some users on Raspberry Pi report that the kerndat checking for
memfd_create(MFD_HUGETLB) support returns ENOSYS even when memfd_create
syscall is available. We currently treat this error as unexpected and
return error. This commit marks the memfd_create(MFD_HUGETLB) as
unavailable when ENOSYS is returned.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
153614cb1d ci: move cgroup unmounting to run-ci-tests.sh
A previous commit added a cgroup cpuset unmounting to
scripts/ci/Makefile. We are sometimes running in a container without the
necessary privileges to unmount certain cgroups.

This commit moves the cgroup unmounting to a place in run-ci-tests.sh
which already requires privileged access and does not break unprivileged
build-only CI runs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
516ebc4f58 ci: Do not fail if latest epel repository definition is already installed
Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
2ebce92333 ci: Make cpuset move to cgroup-v2 hierarchy
As cgroupv2_00, cgroupv2_01 need cpuset in cgroup-v2 hierarchy to check CRIU
handle cgroup-v2 properly, umount cpuset in cgroup-v1 to make it move to
cgroup-v2.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
07d538cefc zdtm: Check threads are restored into correct threaded controllers
This test creates a process with 2 threads in different threaded controllers and
check if CRIU restores these threads' cgroup controllers properly.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
20ea8a0647 cgroup-v2: Restore threads in a process into correct threaded controllers
As threads in a process may be in different threaded controllers, we need to
move thoses threads to the correct controllers.

Because the threads of a process are restored in later stage in restorer.c, we
need to create a cgroupd service to help to move those threads into correct
controllers when they are restored. We cannot use usernsd as the code in
restorer does not know the address of outside function to pass to userns_call.
However, this cgroupd service still reuses a lot of code from usernsd.

The main logic is that restored threads receive the cg_set number they belong to
before restorer stage in case their cg_set are different from main thread. When
these threads are restored, they send the cg_set number and their thread ids
through unix socket to cgroupd. cgroupd receives the cg_set number and thread
ids and moves those threads into correct controllers. Thread ids are sent
through SCM_CREDENTIALS of unix socket so they are translated into correct
thread ids in the receiving end.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
17d1d8810e cgroup-v2: Dump cgroup controllers of every threads in a process
Currently, we assume all threads in process are in the same cgroup controllers.
However, with threaded controllers, threads in a process may be in different
controllers. So we need to dump cgroup controllers of every threads in process
and fixup the procfs cgroup parsing to parse from self/task/<tid>/cgroup.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
ad3936e81e zdtm: Add test to check global properties of cgroup-v2 are preserved
Check that CRIU can checkpoint/restore global properties in cgroup-v2 properly.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
d7e8746598 zdtm: Add write_value/read_value helpers into zdtm library
Add write_value/read_value helpers to write/read buffer to/from files into zdmt
library.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
167cfd366e cgroup-v2: Checkpoint and restore some global properties
This commit supports checkpoint/restore some new global properties in cgroup-v2

	cgroup.subtree_control
	cgroup.max.descendants
	cgroup.max.depth
	cgroup.freeze
	cgroup.type

Only cgroup.subtree_control, cgroup.type need some more code to handle.
cgroup.subtree_control value needs to be set with "+", "-" prefix and
cgroup.type can only be written with value "threaded" if we want to make this
controller threaded. cgroup.type is a special property because this property
must be restored before any processes can move into this controller.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
8a336ab226 Switch aarch64 builds to Cirrus CI
It seems like drone.io no longer provides free aarch64/armhf CI runs.

This switches the aarch64 CI runs to Cirrus CI. armhf CI runs have been
dropped for now as they are not directly supported.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
840735aa08 ipc_sysctl: Prioritize restoring IPC variables using non usernsd approach
Since commit 5563cabdde, user with
enough capability can open IPC sysctl files and write to them. Therefore, we
don't need to use usernsd process in the outside user namespace to help with
that anymore. Furthermore, some later commits:
1f5c135ee5,
0889f44e28 bind the IPC namespace to
the opened file descriptor of IPC sysctl at the open() time, the changed value
does not depend on the IPC namespace of write() time anymore. This breaks the
current usernsd approach.

So, we prioritize opening/writing IPC sysctl files in the context of restored
process directly without usernsd help. This approach succeeds in the newer
kernel since the restored process has enough capabilities at this restore stage.
With older kernel, the open() fails and we fallback to the usernsd approach.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
3db8d1a6c6 cgroup: add a comment to restore_cgroup_prop about path argument requirements
In Virtuozzo we've faced out-of-bound access when calling this function
on short path string, which corrupted other memory and lead to
segmentation fault. So it may be useful to have this comment in code to
avoid such a missuse of this function in future.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
1cba559da4 non-root: add non-root test case to cirrus runs
Run env00 and pthread00 test as non-root as initial proof of concept.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
6743d608cf non-root: extend zdtm.py to be able to run tests as non-root
These are the minimal changes to make zdtm.py successfully run the
env00 and pthread test case as non-root using the '--rootless' zdtm option.

Co-authored-by: Younes Manton <ymanton@ca.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
251939992a Documentation: add details about --unprivileged
This adds the non-root section and information about the parameter
--unprivileged to the man page.

Co-authored-by: Anna Singleton <annabeths111@gmail.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Anna Singleton <annabeths111@gmail.com>
2023-04-15 21:17:21 -07:00
Younes Manton
47b07d0110 non-root: Introduce unprivileged mode to kerndat
This patch modifies how kerndat is handled in unprivileged mode.

Initialization and functionality that can only be done as root is
made separate from common code. The kerndat file's location is
defined as $XDG_RUNTIME_DIR/criu.kdat in unprivileged mode. Since
we expect that directory to be on tmpfs we maintain the same behavior
as the root-mode kerndat which lives in /run.

Co-authored-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
6a30c7d1ed non-root: enable non-root checkpoint/restore
This commit enables checkpointing and restoring of applications as
non-root.

First goal was to enable checkpoint and restore of the env00 and
pthread00 test case.

This uses the information from opts.unprivileged and opts.cap_eff to
skip certain code paths which do not work as non-root.

Co-authored-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
ce01f70d94 non-root: add functions to work with capabilities
This adds the function check_caps() which checks if CRIU is running
with at least CAP_CHECKPOINT_RESTORE. That is the minimum capability
CRIU needs to do a minimal checkpoint and restore from it.

In addition helper functions are added to easily query for other
capability for enhanced checkpoint/restore support.

Co-authored-by: Younes Manton <ymanton@ca.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
4b4bf0421b non-root: add infrastructure to run as non-root
The idea behind the rootless CRIU code is, that CRIU reads out its
effective capabilities and stores that in the global opts structure.

Different parts of CRIU can then, based on the existing capabilities,
automatically enable or disable certain code paths.

Currently at least CAP_CHECKPOINT_RESTORE is required. CRIU will not
start without this capability.

Signed-off-by: Adrian Reber <areber@redhat.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
4c7f91afff ci: enable EPEL for CentOS 7
python2-future, python2-junit_xml, python-flake8 and libbsd-devel are
now provided from EPEL.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Younes Manton
a39d416568 compel: Fix ppc64le parasite stack layout
The ppc64le ABI allows functions to store data in caller frames.
When initializing the stack pointer prior to executing parasite code
we need to pre-allocating the minimum sized stack frame before
jumping to the parasite code.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
17ec539132 compel: Add test to check parasite stack setup
Some ABIs allow functions to store data in caller frame, which
means that we have to allocate an initial stack frame before
executing code on the parasite stack.

This test saves the contents of writable memory that follows the stack
after the victim has been infected but before we start using the
parasite stack. It later checks that the saved data matches the
current contents of the two memory areas. This is done while the
victim is halted so we expect a match unless executing parasite code
caused memory corruption. The test doesn't detect cases where we
corrupted memory by writing the same value.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
556ab0deaf compel: Fix infect test to not override failures
Signed-off-by: Younes Manton <ymanton@ca.ibm.com>

return zero on chk success

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Co-authored-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Younes Manton
461fa72715 compel: Add APIs to facilitate testing
Starting the daemon is the first time we run code in the victim
using the parasite stack.

It's useful for testing to be able to infect the victim without starting
the daemon so that we can inspect the victim's state, set up stack
guards, and so on before stack-related corruption can happen.

Add compel_infect_no_daemon() to infect the victim but not start the
daemon and compel_start_daemon() to start the daemon after the victim
is infected.

Add compel_get_stack() to get the victim's main and thread parasite
stacks.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Liu Hua
debc9c16cc seize: do not overwrite exit code from failpath
Signed-off-by: Liu Hua <weldonliu@tencent.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
16f1c147c8 test/others/crit/test.sh: use bash array
In fact an array (aptly named array) is already used in run_test2,
so let's just make it an array right from the start.

While at it, remove ls invocation.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
0a872ccf16 scripts/protobuf-gen.sh: fix (not ignore) shellcheck warnings
This basically replaces

	for x in $(sed ...); do

with

	sed ... | while IFS= read -r x; do

The only caveat is, sed program was amended to remove empty lines
(there was one right above the PB_AUTOGEN_STOP).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
75b859f23f scripts/ci: rm shellcheck disable annotations
Those are no longer needed with shellcheck 0.8.0.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
aeb6961f3d scripts/ci/run-ci-tests: use bash arrays
This is a preferred way of fixing SC2086 shellcheck warning.

Note that since ZDTM_OPTS is passed as a string (via make or docker),
we are converting it to an array using read -a.

Remove all "shellcheck disable=SC2086" annotations.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
b1fb9f2f0b Fix, not ignore, shellcheck SC1091 warnings
This is easy to fix (but we have to specify -x).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
9d2948b239 scripts/ci/asan.sh: fix, not ignore, shellcheck warning
We can use globstar bash feature instead of find in this case.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
968eec0d59 scripts/ci/apt-install: fix (not ignore) shellcheck warning
It is ok to quote $@, as it expands to "$1" "$2" ...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
86ac0f05ea ci/gha/lint: install a recent shellcheck
Instead of using shellcheck v0.7.2 from fedora repo,
let's install the latest version (v0.8.0).

This allows to remove some "shellcheck disable=..." annotations,
and (I hope) better checking quality overall.

While at it, remove findutils from dnf install as this package is
already installed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
2039d73200 files-reg: skip failed mount lookup for shell-job's tty
When we restore a shell-job we would inherit tty-s, so even if we don't
have a right mount for it in container on dump, on restore it should
just be right.

Else when dumping second time via criu-ns we get:

(00.005678) Error (criu/files-reg.c:1710): Can't lookup mount=29 for fd=0 path=/dev/pts/20

Fixes: #1893
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
9e91e62a7c criu-ns: capture controlling tty
When we are restoring in new pidns we specifically do setsid() from
criu-ns init so that sids of restored tasks are non-zero in this pidns
and on next dump CRIU would not have problems with zero sids, see [1].

But after this CRIU tries to inherit and setup a tty for the restored
process, and it fails to set it's process group via TIOCSPGRP to be a
foreground group for it's tty, because tty already is a controlling tty
for other session (which we had before setsid).

So to make it restore we need to reset tty to be a controlling tty of
criu-ns init via TIOCSCTTY before calling criu.

Else when restoring first time via criu-ns (from criu-ns dump) we get:

Error (criu/tty.c:689): tty: Failed to set group 40816 on 0: Inappropriate ioctl for device

https://github.com/checkpoint-restore/criu/issues/232 [1]

v2: add why and what comment in code, set controlling tty only for
--shell-job and fail if stdin is not a tty.

Fixes: #1893
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
40e1aaf563 mount: add definition for FSOPEN_CLOEXEC
A recent change in glibc introduced `enum fsconfig_command` [1] and as a
result the compilation of criu fails with the following errors

In file included from criu/pie/util.c:3:
/usr/include/sys/mount.h:240:6: error: redeclaration of 'enum fsconfig_command'
  240 | enum fsconfig_command
      |      ^~~~~~~~~~~~~~~~
In file included from /usr/include/sys/mount.h:32:
criu/include/linux/mount.h:11:6: note: originally defined here
   11 | enum fsconfig_command {
      |      ^~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:242:3: error: redeclaration of enumerator 'FSCONFIG_SET_FLAG'
  242 |   FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
      |   ^~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:12:9: note: previous definition of 'FSCONFIG_SET_FLAG' with type 'enum fsconfig_command'
   12 |         FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
      |         ^~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:244:3: error: redeclaration of enumerator 'FSCONFIG_SET_STRING'
  244 |   FSCONFIG_SET_STRING     = 1,    /* Set parameter, supplying a string value */
      |   ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:14:9: note: previous definition of 'FSCONFIG_SET_STRING' with type 'enum fsconfig_command'
   14 |         FSCONFIG_SET_STRING = 1, /* Set parameter, supplying a string value */
      |         ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:246:3: error: redeclaration of enumerator 'FSCONFIG_SET_BINARY'
  246 |   FSCONFIG_SET_BINARY     = 2,    /* Set parameter, supplying a binary blob value */
      |   ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:16:9: note: previous definition of 'FSCONFIG_SET_BINARY' with type 'enum fsconfig_command'
   16 |         FSCONFIG_SET_BINARY = 2, /* Set parameter, supplying a binary blob value */
      |         ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:248:3: error: redeclaration of enumerator 'FSCONFIG_SET_PATH'
  248 |   FSCONFIG_SET_PATH       = 3,    /* Set parameter, supplying an object by path */
      |   ^~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:18:9: note: previous definition of 'FSCONFIG_SET_PATH' with type 'enum fsconfig_command'
   18 |         FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */
      |         ^~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:250:3: error: redeclaration of enumerator 'FSCONFIG_SET_PATH_EMPTY'
  250 |   FSCONFIG_SET_PATH_EMPTY = 4,    /* Set parameter, supplying an object by (empty) path */
      |   ^~~~~~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:20:9: note: previous definition of 'FSCONFIG_SET_PATH_EMPTY' with type 'enum fsconfig_command'
   20 |         FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */
      |         ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:252:3: error: redeclaration of enumerator 'FSCONFIG_SET_FD'
  252 |   FSCONFIG_SET_FD         = 5,    /* Set parameter, supplying an object by fd */
      |   ^~~~~~~~~~~~~~~
criu/include/linux/mount.h:22:9: note: previous definition of 'FSCONFIG_SET_FD' with type 'enum fsconfig_command'
   22 |         FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */
      |         ^~~~~~~~~~~~~~~
/usr/include/sys/mount.h:254:3: error: redeclaration of enumerator 'FSCONFIG_CMD_CREATE'
  254 |   FSCONFIG_CMD_CREATE     = 6,    /* Invoke superblock creation */
      |   ^~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:24:9: note: previous definition of 'FSCONFIG_CMD_CREATE' with type 'enum fsconfig_command'
   24 |         FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */
      |         ^~~~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:256:3: error: redeclaration of enumerator 'FSCONFIG_CMD_RECONFIGURE'
  256 |   FSCONFIG_CMD_RECONFIGURE = 7,   /* Invoke superblock reconfiguration */
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
criu/include/linux/mount.h:26:9: note: previous definition of 'FSCONFIG_CMD_RECONFIGURE' with type 'enum fsconfig_command'
   26 |         FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */

This patch adds definition for FSOPEN_CLOEXEC to solve this problem. In particular,
sys/mount.h includes ifndef check for FSOPEN_CLOEXEC surrounding `enum fsconfig_command`.

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=7eae6a91e9b1670330c9f15730082c91c0b1d570

Reported-by: Younes Manton (@ymanton)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Younes Manton
7bc24688d6 ci: Clean up and improve Java testing
This patch changes top-level OpenJ9 filename and data references to Java
to make them generic and launches tests against both HotSpot and OpenJ9
JVMs.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
0178f2f990 ci: Add Dockerfile for openj9 on Ubuntu
Semeru builds (which use OpenJ9 instead of HotSpot) are the successors
of AdoptOpenJDK's OpenJ9 builds.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
39b3de60b6 ci: Rename openj9 Dockerfiles to hotspot
We used to pull AdoptOpenJDK's OpenJ9 builds but switched to
Eclipse Temurin, which uses the HotSpot VM instead of OpenJ9.
Rename the corresponding Dockerfiles to hotspot.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
f78d3d821c gitignore: Ignore top-evel build dir only
The entry "build/" will ignore any directory named "build" at any level
of the source tree, including our scripts/build directory. We only want
to ignore the top-level build directory created by `make install`.

As the git manpage suggests, entries with slashes at the start or in the
middle will only match at the same level as the .gitignore, hence use
build/** instead.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Andrei Vagin
aeaff64452 test/unix: check C/R of unix listen queues
Check that CRIU handles non-empty listen queues properly.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
[mclapinski@google.com: update test_doc and test_author]
Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
Andrei Vagin
83c606e023 zdtm: return 1 from pr_err, pr_perror, fail
This allows to make test code more compact:
if (ret == -1) {
	pr_perror("XXX");
	return 1;
}
vs
if (ret == -1)
	return pr_perror("XXX");

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Michal Clapinski
dca55d281a criu: fail migration if data was sent to an in-flight socket
Before this change, CRIU would just lose that data upon migration. So
it's better to fail migration in this case.

To reproduce the bug one can:
1. Create an AF_UNIX socket and call listen on it.
2. Create a second AF_UNIX socket and call connect to the first one.
3. Send the data to the second socket.
4. Migrate.
5. Call accept on the first socket and then read. There would be no data
   available.

It should be even possible to close the second socket before migration.
This would cause accept to hang because CRIU totally misses a closed
in-flight socket.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
2023-04-15 21:17:21 -07:00
fu.lin
dfe9d006ad breakpoint: enable breakpoints by default on amd64 and arm64
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
fu.lin
bb73e1cf5a breakpoint: implement hw breakpoint for arm64 platform
The x86 implement hardware breakpoint to accelerate the tracing syscall
procedure instead of `ptrace(PTRACE_SYSCALL)`. The arm64 has the same
capability according to <<Learn the architecture: Armv8-A self-hosted
debug>>[[1]].

<<Arm Architecture Reference Manual for A-profile architecture>[[2]]
illustrates the usage detailly:
- D2.8 Breakpoint Instruction exceptions
- D2.9 Breakpoint exceptions
- D13.3.2 DBGBCR<n>_EL1, Debug Breakpoint Control Registers, n

Note:
[1]: https://developer.arm.com/documentation/102120/0100
[2]: https://developer.arm.com/documentation/ddi0487/latest

Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
fu.lin
b7953c6c7f compel: switch breakpoint functions to non-inline at arm64 platform
Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Andrei Vagin
719fea2fc9 compel: clear a breakpoint right after it's been triggered
Breakpoints are used to stop as close as possible to a target system call.

First, we don't need it after this point.
Second, PTRACE_CONT can't pass through a breakpoint on arm64.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Andrei Vagin
d7477dac03 compel: set TRACESYSGOOD to distinguish breakpoints from syscalls
When delivering system call traps, set bit 7 in the  signal  number  (i.e.,
deliver SIGTRAP|0x80).  This makes it easy for the tracer  to  distinguish
normal traps from those caused by a system call.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
c089159a46 ci/cirrus: centos 8 job nits
1. Rename CentOS 8 to CentOS Stream 8 (which it is).

2. Install junit_xml from the repo rather than via pip.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Kir Kolyshkin
a202ec271d ci/cirrus: add CentOS Stream 9
Mostly a copy-paste from the CentOS 8 task, with a few differences:
 - Use dnf instead of yum
 - Enable crb instead of powertools
 - Different way of installing EPEL
 - No need to switch to python3 as this is the default
 - junit_xml is now available as an rpm

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
2642b657da docker-test: handle race condition error
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
https://github.com/moby/moby/issues/42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Andrei Vagin
49319cd579 Add Alexander Mikhalitsyn to maintainers
Alex implemented a few complex features and maintain our CI system.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
f7972a3f04 cr-restore: rseq: use glibc-specific way to unregister only as fallback
Let's use dynamic approach to detect built-in *libc rseq in all cases,
and "old" static approach as a fallback path if the user kernel
lacks support of ptrace_get_rseq_conf feature.

Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
cacddf19da cr-restore: rseq: dynamically handle *libc with rseq
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like #1935 it's not always true.

The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.

This will prevent issues like this:
https://github.com/checkpoint-restore/criu/issues/1935

Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
1f9bd82a55 cr-check: optimize check for apparmor stacking
The result of check_aa_ns_dumping() is stored in kdat. Instead of doing
the same check twice - once on kerndat_init(), and again in
check_apparmor_stacking(), we can check the stored value.

Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
a1262f55fb cr-check: fix check for apparmor stacking
The feature check for AppArmor stacking was introduced in
commit:
	8723e3f998
	check: add a feature test for apparmor_stacking

However, on systems that don't support AppArmour, this check always
fails. As a result, `criu check --all` shows the following message:

	Looks good but some kernel features are missing
	which, depending on your process tree, may cause
	dump or restore failure.

Reported-by: André Rösti (@andrej)
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
f9bc0a750a docker-test: use containerd installed from package
In commits [1, 2] the version of containerd installed by default in the
GitHub CI virtual environment was replaced with the latest release from
GitHub as a workaround to a bug in containerd.  This bug has been fixed
sometime ago and the current default version of containerd (1.6.6) does
not require this workaround. However, with the latest release, the
containerd binaries uploaded on GitHub have been built for Ubuntu 22.04
[3]. Our tests are still running on Ubuntu 20.04 and this results in the
following error:

/usr/bin/containerd: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /usr/bin/containerd)
/usr/bin/containerd: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /usr/bin/containerd)

[1] 046cad8
[2] 81a68ad
[3] 6b2dc9a37

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
750acec25f Revert "ci: Switch to non overlaysfs tests"
This reverts commit 8bb05e3bf3.

The following bug has been fixed:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1967924

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
e8a6765d1e criu: fix conflicting headers
There are several changes in glibc 2.36 that make sys/mount.h header
incompatible with kernel headers:

https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

This patch removes conflicting includes for `<linux/mount.h>` and
updates the content of `criu/include/linux/mount.h` to match
`/usr/include/sys/mount.h`. In addition, inline definitions sys_*()
functions have been moved from "linux/mount.h" to "syscall.h" to
avoid conflicts with `uapi/compel/plugins/std/syscall.h` and
`<unistd.h>`. The include for `<linux/aio_abi.h>` has been replaced
with local include to avoid conflicts with `<sys/mount.h>`.

Fixes: #1949

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
eb4ecb3cfd ci: unset XDG_RUNTIME_DIR when invoking podman
We need to pass environment variables from the CI environment to
distinguish between CI environments. However, when `sudo -E` is
used to run Podman it results in the XDG_RUNTIME_DIR environment
variable being set incorrectly that prevents Podman from running.

This patch fixes the following error in the GitHub Action virtual
environment:

	error running container: error from /usr/bin/crun creating
	container for [/bin/sh -c /bin/prepare-for-fedora-rawhide.sh]:
	sd-bus call: Connection reset by peer

Fixes: #1942

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
da4803beae MAINTAINERS: Add Radostin (myself) to maintainers
I've been contributing to CRIU for sometime and I'm hoping that my
familiarity with the project would be sufficient to self-nominate as a
maintainer. I would like to help with code reviews, submitting patches,
implementing new features, and maintaining the project in general.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Liang-Chun Chen
e62d541bde zdtm: add two tests for large ghost sparse file
ghost_holes_large00 is a test which creates a large ghost sparse file with 1GiB
hole(pwrite can only handle 2GiB maximum on 32-bit system) and 8KiB data, criu
should be able to handle this kind of situation.

ghost_holes_large01 is a test which creates a large ghost sparse file with 1GiB
hole and 2MiB data, since 2MiB is larger than the default ghost_limit(1MiB),
criu should fail on this test.

v2: fix overflow on 32-bit arch.

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
2023-04-15 21:17:21 -07:00
Liang-Chun Chen
2d34b56024 unlink_largefile.desc: remove crfail, since criu now can support
unlink_largefile test

In the past, the unlink_largefile test should be fail on large ghost file.
However, it used sparse file, it will pass in current criu, since the large
ghost sparse file issue was fixed.

So the crfail flag of this test should be removed.

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
2023-04-15 21:17:21 -07:00
Liang-Chun Chen
fbded79788 files-reg.c: modify the check of ghost_limit to support large sparse files
files-reg.c checks whether the file size is larger than ghost_limit with st_size
(in dump_ghost_remap), which can not deal with large ghost sparse file, since
its actual file size is not the same as what st_size shows.

Therefore, in this commit, I replace st_size with st_blocks, which shows the
actual file size. (1 block = 512B), thus criu can deal with large ghost sparse
file.

Signed-off-by: Liang-Chun Chen <featherclc@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
01b8d40ced zdtm/mnt_root_ext: don't allow propagation from test mntns to criu mntns
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.

But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.

To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.

Fixes: #1941

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
d12e2364c4 zdtm: make root mount private in criu mntns
If root mount in criu mntns is slave, it would be slave of host mount
where criu is stored, so if someone mounts something in subdir of
{criu-dir}/test/ on host while tests are running this mount can
influence the test as it appears on top of root mount in criu mntns.

1) With mount-compat this mount can get into restored test mntns, which
means wrong restore, as this mount was not there on dump.
2) With mount-v2 this mount would just fail container restore, as root
container mount is mounted non-recursively to protect from unexpected
mounts appear after restore.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
70a9cd6fbf vdso-compat: Increase the reserved buffer for compat vdso
On Arch Linux with 5.18.3-zen1-1-zen kernel, the vdso's size is 3 pages which
exceeds the current 2-page reserved buffer. This commit simply increases the
reserved buffer size to 4 pages.

Fixes: https://github.com/checkpoint-restore/criu/issues/1916

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
b30f3ee3d3 zdtm: Remove permission part check for skipping vsyscall vma
Normally, vsyscall vma has VM_READ, VM_EXEC permission. However, when
CONFIG_LEGACY_VSYSCALL_XONLY=y, that vma only has VM_EXEC. This commit removes
the permission part when checking to skip vsyscall vma in x32 tests.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
c502d480f9 x86/compel/fault-inject: fixup mxcsr for PTRACE_SETFPREGS
Error from:
./test/zdtm.py run -t zdtm/static/fpu00 --fault 134 -f h --norst

(00.003111) Dumping GP/FPU registers for 56
(00.003121) Error (compel/arch/x86/src/lib/infect.c:310): Corrupting fpuregs for 56, seed 1651766595
(00.003125) Error (compel/arch/x86/src/lib/infect.c:314): Can't set FPU registers for 56: Invalid argument
(00.003129) Error (compel/src/lib/infect.c:688): Can't obtain regs for thread 56
(00.003174) Error (criu/cr-dump.c:1564): Can't infect (pid: 56) with parasite

See also:
145e9e0d8c6 ("x86/fpu: Fail ptrace() requests that try to set invalid MXCSR values")
145e9e0d8c

We decided to move from mxcsr cleaning up scheme and use mxcsr mask
(0x0000ffbf) as kernel does. Thanks to Dmitry Safonov for pointing out.

Tested-on: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz

Reported-by: Mr. Jenkins
Suggested-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
e30d18f435 rseq: fix headers conflict on Mariner GNU/Linux
1. For some reason, Marier distribution headers
not correctly define __GLIBC_HAVE_KERNEL_RSEQ
compile-time constant. It remains undefined,
but in fact header files provides corresponding
rseq types declaration which leads to conflict.

2. Another issue, is that they use uint*_t types
instead of __u* types as in original rseq.h.

This leads to compile time issues like this:
format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'uint64_t' {aka 'long unsigned int'}

and we can't even replace %llx to %PRIx64 because it will break
compilation on other distros (like Fedora) with analogical error:

error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘__u64’ {aka ‘long long unsigned int’}

Let's use our-own struct rseq copy fully equal to the kernel one,
it's safe because this structure is a part of Linux Kernel ABI.

Fixes #1934

Reported-by: Nikola Bojanic
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Younes Manton
ad58553d90 Add --skip-file-rwx-check opt test
Add a simple test using tail to check that processes can't be restored
by default when the r/w/x mode of an open file changes, unless
--skip-file-rwx-check is used.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Younes Manton
18fba41255 config/files-reg: Add opt to skip file r/w/x check on restore
A file's r/w/x changing between checkpoint and restore does
not necessarily imply that something is wrong. For example,
if a process opens a file having perms rw- for reading and
we change the perms to r--, the process can be restored and
will function as expected.

Therefore, this patch adds an option

--skip-file-rwx-check

to disable this check on restore. File validation is unaffected
and should still function as expected with respect to the content
of files.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2023-04-15 21:17:21 -07:00
Yuriy Vasiliev
6cef6e726a zdtm: add tests for SIGTSTP
stopped03 check that stopped by SIGTSTP tasks are restored correctly.
stopped04 check that stopped by SIGSTOP tasks which have blocked SIGTSTP and
have SIGTSTP pending are restored correctly.

Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@openvz.org>
2023-04-15 21:17:21 -07:00
Yuriy Vasiliev
c7858ba42b infect: add SIGTSTP support
Add SIGTSTP signal dump and restore. Add a corresponding field
in the image, save it only if a task is in the stopped state.

Restore task state by sending desired stop signal if it is present
in the image. Fallback to SIGSTOP if it's absent.

Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@openvz.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
49caf85b20 config: fail on --track-mem option if dirty tracking is not available
Else we trigger BUG in task_reset_dirty_track():
  Error (criu/mem.c:45): BUG at criu/mem.c:45

The check in kerndat_get_dirty_track() does not work right.

https://github.com/checkpoint-restore/criu/issues/1917

Reported-by: @mrc1119
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
91e971c4d9 hugetlb: don't dump anonymous private hugetlb mapping using memfd approach
Currently, the content of anonymous private hugetlb mapping is dumped in 2
different images: memfd approach and normal private mapping dumping. In memfd
approach, we dump the content of the backing pseudo file (/anon_hugepage). This
is incorrect and redundant since the mapping is private, the content of backing
file may differ from the content of the mapping. With this commit, we remove the
redundant memfd approach dump and only do the normal private mapping dump on
anonymous hugetlb mapping.

Run zdtm.py run -f h --keep-img always -t zdtm/static/maps09, du -h in the
dumped image directory

Before this commit
	13M     test/dump/zdtm/static/maps09/55/1
After this commit
	8.5M    test/dump/zdtm/static/maps09/55/1

The reduction in size is approximately 4MB which is the size of anonymous
private hugetlb mapping in the test.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Radostin Stoyanov
dd0217976c amdgpu: Add gitignore
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
b117b211ab zdtm/scm: add scm09 test with closed sender fd
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
5cd7092fda sk-unix: make add_fake_unix_queuers earier and rework find_queuer_for
Before this patch, if we had a unixsk with incomming scm packets (with
fds) and with the sender side fd closed, we got an error:

Error (criu/sk-unix.c:1125): unix: Can't find sender for 0x1e

First part of the problem is that unix_note_scm_rights() expects to see
a "queuer" which would send scm packets to the unixsk, and there is no
as the sender side is closed.

Second part of the problem is that we already have "fake" queuers
feature so that it already creates a unix socket pair and leaves other
end open for later queuing packets. But function add_fake_unix_queuers()
is called after unix_note_scm_rights() thus there is no chance to find
queuer at the point of failure.

Third part is that when we look for a queuer in find_queuer_for() we
actually look for a socket for which we are a queuer and not for the
socket which is a queuer for us, which is opposite to the name. For
cases where both ends are alive both are queuers for each other so this
was not important, but for our closed sender case it breaks.

So let's reorder add_fake_unix_queuers() before unix_note_scm_rights()
and make find_queuer_for() actually do what it's name implies.

This situation is started to reproduce on Virtuozzo start/stop tests
with the unixsk belonging to systemd, we suppose that this state where
the sender fd side is closed happens rarely only on systemd start/stop,
so we don't see it in regular suspend resume of long-living containers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Ashutosh Mehra
28358db13b Fix the check for mnt namespace in criu-ns
criu-ns script incorrectly compares the pidns fd with mntns fd.
Also reversed the condition in is_my_namespace function to align it
with the function name.

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
295dc85ca0 github: use git-clang-format instead of make indent
This allows us to only detect bad formating in PR changes but not all
the CRIU codebase.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Alexander Mikhalitsyn
ced4ab4b0a zdtm: skip zdtm/static/shm-hugetlb when hugetlb is not supported
Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
c830643d86 Revert "ci: skip new hugetlb maps09/maps10 tests for pre-dump"
This reverts commit 37ea8c5fcf.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Bui Quang Minh
b26e1fdbf7 mem: Skip pre-dumping on hugetlb mappings
As private hugetlb mappings are not pre-mapped, the content of them is restored
in the the restorer which cannot use page_read->read_pages. As a result, we
cannot recursively read the content of pre-dumped image in the parent directory
and use preadv to read the content from the last dumped image only. Therefore,
it may freeze while restoring when the content of mapping is in pre-dumped image
in parent directory.

We need to skip pre-dumping on hugetlb mappings to resolve the issue.

Suggested-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2023-04-15 21:17:21 -07:00
Pavel Tikhomirov
9066f87417 cr-dump: do not report success to logs if post-dump script failed
It can be confusing to see error from post-dump action script and non
zero return from criu though at the same time see "Dumping finished
successfully" in log. I believe it is logical to consider post-dump
action script as a part of "dump" process so fail in it means that the
whole dump failed.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2023-04-15 21:17:21 -07:00
Adrian Reber
d46f40f4ff criu: Version 3.17.1
* Fixes for pre-dump read mode
 * Fixes for mount-v2
 * amdgpu plugin build and installation fixes
 * Some minor CI related fixes

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-06-23 14:53:25 -07:00
Radostin Stoyanov
46ec6749fa ci: Fix code indent
This patch contains auto-generated changes from `make indent`

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
f29d51560e zdtm: add mnt_root_ext test
This test has one external mount [criumntns] /zdtm_root_ext.tmp ->
[testmntns] /mnt_root_ext.test, and it specifically gives '--external
mnt[MNT]:.zdtm_root_ext.tmp' option on restore without '/' to make
dirname on it return static '.' path (see glibc dirname() code) and
reproduce a segfault in resolve_mountpoint().

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
8a18faea09 util/mount-v2: fix resolve_mountpoint() to always return freeable pointer
Else we have a Segmentation fault in __move_mount_set_group() on
xfree(source_mp) if resolve_mountpoint() returned statically allocated
path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
4cc8a18f3b zdtm: test multiple ext bindmounts with no common root and same master
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
229c5df5ce mount-v2: workaround for multiple external bindmounts with no common root
It's a problem when while restoring sharing group we need to copy
sharing between two mounts with non-intersecting roots, because kernel
does not allow it.

We have a case https://github.com/opencontainers/runc/pull/3442, where
runc adds different devtmpfs file-bindmounts to container and there is
no fsroot mount in container for this devtmpfs, thus mount-v2 faces the
above problem.

Luckily for the case of external mounts which are in one sharing group
and which have non-intersecting roots, these mounts likely only have
external master with no sharing, so we can just copy sharing from
external source and make it slave as a workaround.

https://github.com/checkpoint-restore/criu/issues/1886

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Pavel Tikhomirov
f8c9e07e4f mount-v2: split out restore_one_sharing helper
This helper restores master_id and shared_id of first mount in the
sharing group. It first copies sharing from either external source or
internal parent sharing group and makes master_id from shared_id. Next
it creates new shared_id when needed.

All other mounts except first are just copied from the first one.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
a90a1d4827 amdgpu: Set PLUGINDIR to /usr/lib/criu
Building the criu packages for Ubuntu/Debian fails with:

	mkdir: cannot create directory '/var/lib/criu': Permission denied

This patch updates PLUGINDIR with the value /usr/lib/criu

Fixes: #1877

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
e6f292cb38 amdgpu/Makefile: Fix include path
When building packages for CRIU the source directory might have a
name different than 'criu'.

Fixes: #1877

Reported-by: @siris
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Andrei Vagin
6507ae5331 ci: test the read mode of pre-dump
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
f43dae720a page-xfer: refactoring analyze_iov and fill_userbuf
* handle unexpected errors of process_vm_readv
* adjust riovs in analyze_iov
* call handle_faulty_iov only if process_vm_readv returns EFAULT.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
efeedf3912 pre-dump: call vmsplice with SPLICE_F_GIFT
In this case, vmplice attaches pages without coping them.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
b2bfb7745d page-xfer: adjust a buffer to a pipe size
Due to side effects of F_SETPIPE_SZ, the actual pipe size can be greater
than PIPE_MAX_SIZE.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
0df0a7dace page-xfer: use negative values for error codes
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Andrei Vagin
51533d98ac page-pipe: fix limiting a pipe size
But actually, 5a92f100b8 probably has to be reverted as a whole.
PIPE_MAX_SIZE is the hard limit to avoid PAGE_ALLOC_COSTLY_ORDER
allocations in the kernel. But F_SETPIPE_SZ rounds up a requested pipe
size to a power-of-2 pages. It means that when we request PIPE_MAX_SIZE
that isn't a power-of-2 number, we actually request a pipe size greater
than PIPE_MAX_SIZE.

Fixes: 5a92f100b8 ("page-pipe: Resize up to PIPE_MAX_SIZE")

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
ff92731690 crit: Use same version as criu
Name collision with an abandoned project named 'crit' in pypi causes pip
to show crit (CRiu Image Tool) as outdated.  This patch updates crit to
use the same version and license as criu.

Fixes #1878

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Radostin Stoyanov
f522adec4a ci: Fix unsafe repository error
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-06-22 10:20:33 -07:00
Adrian Reber
4f8f295e57 criu: Version 3.17
Amongst a huge number of fixes all over the place this release introduces:

* mount-v2 engine
* support for MAP_HUGETLB mappings
* support for Linux Restartable Sequences
* support for SOCK_SEQPACKET unix sockets
* CRIU AMD GPU plugin
* setsockopt(SO_BUF_LOCK) support for tcp sockets

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
991f27c841 ci: skip new hugetlb maps09/maps10 tests for pre-dump
This commit has to be reverted once we fix the issue.

Issue: https://github.com/checkpoint-restore/criu/issues/1868

Reported-by: Mr. Jenkins
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
0c1f0256ff kerndat: handle the case when hugetlb isn't supported
Currently we check memfd_hugetlb by doing memfd_create("", MFD_HUGETLB).
If we see EINVAL we report that it's not supported, but we can also
get ENOENT error in such case in hugetlb_file_setup() while trying
to find proper hugetlbfs mount.

Reference:
06fb4ecfea/fs/hugetlbfs/inode.c (L1465)

Fixes: 4245e6b02f ("check: Add a check for using memfd with hugetlb")

Reported-by: Mr. Jenkins (ppc64le)
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
17a19676cd zdtm: handle the case when hugetlb isn't supported
Fixes: e2e02bc83e ("zdtm: Add MAP_HUGETLB memory mapping test")

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
c1380c077a ci: workaround race between sit module loading and bridge test
https://github.com/checkpoint-restore/criu/issues/1866

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Alexander Mikhalitsyn
550eafc5d8 ci: print kernel modules list
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-05-05 12:42:14 -07:00
Adrian Reber
f635b61f49 test: install criu in /usr
GitHub Actions comes with pre-installed criu in /usr. configure scripts
looking for CRIU will pickup the pre-installed version in /usr if we do
not install CI criu also in /usr.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-05-05 12:42:14 -07:00
Radostin Stoyanov
2f0f128396 readme: Add badge links to workflows
This commit adds a link to the workflow runs for each badge.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Andrey Zhadchenko
d14dbb8c74 sk-unix: rework bind_on_deleted() return codes
bind_on_delete() return code is only used for setting errno for pr_perror()
This is mostly useless since a lot of syscalls already set it. All of
non-syscall errors already have prints in case of failure.
Fix bind_on_deleted() always returning 0 and simplify error juggling to
returning -1 in case of errors.

Fixes: #1771
Fixes: d0308e5ecc ("sk-unix: make criu respect existing files while restoring ghost unix socket fd")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
5b872c7183 proc_parse: Fix parsing bpf map_extra
The map_extra field has been introduced in Linux Kernel release 5.16
and does not exist in older kernel versions. The current parsing
implementation fails when map_extra is missing.

In particular, it tries to parse the `memlock` field as `map_extra` and
fails but it does not exit with an error because map_extra is marked as
"optional". It then tries to parse the `map_id` field as `memlock` and
fails with an error because map_id is not optional:

Error (criu/proc_parse.c:2161): parse_fdinfo_pid_s: error parsing [map_type:\t2] for 19: Success'

To correctly handle this, we should try to parse again the next field
when parsing of `map_extra` fails, without reading the next line from
the bpfmap.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
d40b332cef bpf: update deprecated API
bpf_create_map_xattr() has been replaced with bpf_map_create()
6cfb97c

DECLARE_LIBBPF_OPTS has been renamed to LIBBPF_OPTS
ea6c242

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f641e0c4ba ci: print mountinfo instead of mount cmd output
mountinfo contains more info than just "mount" output

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
5c0b4fbcda ci: criu-fault: skip inotify_irmap fault-injection on btrfs
It looks like we've got broken fhandles from fdinfo
for inotifies/fanotifies for btrfs. I will look into that.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
7ac85cab86 scripts/ci: fix ZDTM_OPTS variable passing
We have a separate target for alpine in script/ci/Makefile
which defines some extra opts for zdtm using ZDTM_OPTIONS
variable. But really it doesn't work. First of all, variable
should be named as ZDTM_OPTS and also we have to specify
it directly in the CONTAINER_RUNTIME cmdline to make it work.

I've also changed variable value just to make it consistent
with docker.env value which was really used.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
ead227994b zdtm: temporary disable rseq02 test
That's strange but rseq02 test fails with:
09:06:16.222:    51: exit 555f52082120 555f52082120
09:06:16.282:    51: exit 555f52082120 555f52082120
09:06:16.340:    51: exit 555f52082120 555f52082120
09:06:16.397:    51: exit 555f52082120 555f52082120
09:06:16.503:    51: exit 0 555f52082120
09:06:16.503:    51: FAIL: rseq02.c:235: Failed to increment per-cpu counter (errno = 2 (No such file or directory))
09:06:16.503:    51: FAIL: rseq02.c:246:  (errno = 16 (Device or resource busy))

It means that rseq_cs pointer was cleaned up by the kernel despite of
NO_RESTART* flags. That's a hardly reproducible and I will investigate that.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
db9ec13616 zdtm: add rseq02 transition test with NO_RESTART CS flag
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
1e0bed3d69 rseq: handle rseq/rseq_cs flags properly
Userspace may configure rseq cs abort policy by
setting RSEQ_CS_FLAG_NO_RESTART_ON_* flags.

In ("cr-dump: fixup thread IP when inside rseq cs") we have supported
the case when process was caught by CRIU during rseq cs execution by
fixing up IP to abort_ip. Thats a common case, but there is special flag
called RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL, in this case we have to leave
process IP as it was before CRIU seized it. Unfortunately, that's not
all that we need here. We also must preserve (struct rseq)->rseq_cs field.

You may ask like "why we need to preserve it by hands? CRIU is dumping
all process memory and restores it". That's true. But not so easy. The problem
here is that the kernel performs this field cleanup when it realized that
the process gets out of rseq cs. But during dump/restore procedures we are
executing parasite/restorer from the process context. It means that process
will get out of rseq cs in any case and (struct rseq)->rseq_cs will be cleared
by the kernel. So we need to restore this field by hands at the *last* stage
of restore just before releasing processes.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
13338dee5c Revert "test: disable rseq also on Archlinux"
This reverts commit f008f74041.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
064e9925a0 zdtm: add transition/rseq01 test for amd64
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
2d3354e7b6 cr-dump: fixup thread IP when inside rseq cs
If we caught the process when it's inside rseq
critical section we have to handle it properly.

From the kernel side of view, if the process
is executing inside the rseq cs and gets a signal,
rseq critical section execution will be interrupted
and after signal handler execution, we will proceed
to rseq cs abort handler instead of continuing normal
rseq cs execution (if RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
isn't set).

When CRIU seizes processes that's the same thing as
getting signal from the rseq point of view. So we need
to fixup instruction pointer to rseq cs abort handler
address.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
4c7ece0bb7 compel: add helpers to get/set instruction pointer
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
441310c260 zdtm/static/rseq00: fix rseq test when linking with a fresh Glibc
Fresh Glibc does rseq() register by default. We need to unregister
rseq before registering our own.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f70ddab24e pie/restorer: unregister (g)libc rseq before memory restoration
Fresh glibc does rseq registration by default during start_thread().
[ see https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=95e114a0919d844d8fe07839cb6538b7f5ee920e ]

This cause process crashes during memory restore procedure, because
memory which corresponds to the struct rseq will be unmapped and overriden
in __export_restore_task.

Let's perform rseq unregistration just before unmap_old_vmas(). To achieve
that we need to determine (struct rseq) address at first while we are in Glibc
(we do that in prep_libc_rseq_info using Glibc exported symbols).

See also
("nptl: Add public rseq symbols and <sys/rseq.h>")
https://sourceware.org/git?p=glibc.git;a=commit;h=c901c3e764d7c7079f006b4e21e877d5036eb4f5
("nptl: Add <thread_pointer.h> for defining __thread_pointer")
https://sourceware.org/git?p=glibc.git;a=commit;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5

TODO: do the same for musl-libc if it will start to register rseq by default

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
e1799e5305 include: add thread_pointer.h from Glibc
Let's take thread_pointer() implementation from Glibc.
It will be useful in the further because Glibc stores
struct rseq on the TLS. Absolute address can be calculated
as __criu_thread_pointer() + __rseq_offset.
__rseq_offset is an exported symbol from Glibc itself.

We need to have an ability to determine where struct
rseq is stored to unregister it in CRIU during the restore
stage.

For different libc like musl-libc we will have to handle
rseq separately depends on how struct rseq is stored.

Right now that's not a problem because musl-libc has no
rseq support, so we don't need to unregister it.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=8dbeb0561eeb876f557ac9eef5721912ec074ea5
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=cb976fba4c51ede7bf8cee5035888527c308dfbc

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
267c1fdade ci: add Fedora Rawhide based test on Cirrus
We have ability to use nested virtualization on
Cirrus, and already have "Vagrant Fedora based test (no VDSO)"
test, let's do analogical for Fedora Rawhide to get fresh kernel.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
03aff7e823 Revert "ci: disable glibc rseq support"
Let's see how rseq() C/R feature works

This reverts commit d99def7dcf.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
51e0d3e29f zdtm: add basic static/rseq00 test for rseq C/R
Here we just want to check that if rseq was registered before C/R
it remains registered after it.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
c5162cef52 rseq: fail dump if rseq is used but host doesn't support get_rseq_conf feature
A lot of kernel versions lacks support for ptrace(PTRACE_GET_RSEQ_CONFIGURATION).
But the userspace may be fresh (for instance containers with fresh Fedora runs
on CentOS 7 host). Consider two scenarious:

- kernel has no ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support

1. there is a process which use rseq => fail dump
2. there is no process which use rseq => we can dump without any problems

But how to determine if process use rseq or not without get_rseq_conf feature?
Let's just try to do rseq registration from the parasite. If rseq is already
registered then we'll got EBUSY error. If not we'll success in registration.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f81e3062ca rseq: initial support
Support basic rseq C/R scenario. Assume that:
- there are no processes with IP inside the rseq critical section (CS)
- kernel has ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support

On dump:
1. use ptrace(PTRACE_GET_RSEQ_CONFIGURATION) to get
struct rseq pointer, rseq size and signature from the kernel.
2. save to the image

On restore:
1. get rseq ptr, size, signature from the image
2. register it back using rseq() from the restorer parasite

Fixes: #1696

Reported-by: Radostin Stoyanov <radostin@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
bd9ee32554 cr-check: Add ptrace rseq conf dump feature
Add "get_rseq_conf" feature corresponding to the
ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support.

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
ca54dfcac9 util: move fork_and_ptrace_attach helper from cr-check
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
8b3a76b640 kerndat: check for rseq syscall support
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
de03eb4350 compel: add rseq syscall into compel std plugin syscall tables
Add rseq syscall numbers for:
arm/aarch64, mips64, ppc64le, s390, x86_64/x86

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
4adec8e8ef cgroup: test for --manage-cgroups=ignore
Test to ensure that --manage-cgroups=ignore works correctly.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
2b6901707c cgroup: fix --manage-cgroups=ignore
Using '--manage-cgroups=ignore' fails during restore. This fixes the use
of '--manage-cgroups=ignore'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
c71d4a54a3 cgroup: fix "unified" path
The code expected that the cgroup directory ends with a ',' and
unconditionally removes the last character. For the "unified" case this
resulted in the last 'd' being remove instead of the non existing comma.

This just adds a comma after "unified" so that the last removed
character is not the 'd'.

Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
8ddd7f4837 ci: add codespell to lint target
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
e7b1c85791 Fix remaining codespell warnings
Those that codespell have a few variants for:

./soccr/soccr.c:219: thise ==> these, this
./soccr/soccr.c:444: sence ==> sense, since
./criu/net.c:665: ot ==> to, of, or
./criu/net.c:775: ot ==> to, of, or
./criu/files.c:1244: wan't ==> want, wasn't
./criu/kerndat.c:1141: happend ==> happened, happens, happen
./criu/mount-v2.c:781: carefull ==> careful, carefully
./test/zdtm/static/socket_aio.c:54: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen.c:73: Chiled ==> Child, chilled
./test/zdtm/static/socket_listen4v6.c:73: Chiled ==> Child, chilled
./test/zdtm/static/sk-unix-dgram-ghost.c:201: childs ==> children, child's
./test/zdtm/static/sk-unix-dgram-ghost.c:205: childs ==> children, child's
./compel/arch/x86/src/lib/infect.c:297: automatical ==> automatically, automatic, automated

While at it, do some other minor fixes in the same lines.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
0194ed392f Fix some codespell warnings
Brought to you by

	codespell -w

(using codespell v2.1.0).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
3f18004778 Add .codespellrc
List words and directories to be ignored.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
f16976c033 test/zdtm.py: rename a var
Codespell thinks that pres is a misspelled pres.

Rename to pre.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
fab46c3100 test/exhaustive/unix.py: rename a var
Codespell thinks that froms is misspelled forms. Indeed it looks ugly.

Rename to from_set.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
2a60b4974c Rename useable to usable
I am not sure if this is going to bring any compatibility issues.
If yes, we need to remove this patch and add "useable" to the list of
ignored words instead.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
c4bdde2138 criu/mount.c: separate \t
Codespell thinks that tThe is a typo. Fix it by separating "\t"
which also includes readability (a bit).

[v2: run via make indent]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
51837a65ed criu/files.c: some renames
Codespell thinks that fo is misspelled of (or for),
and flem is a misspelled phlegm. Rename both.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
bd3a21e0b1 test/javaTests: rename ser to s
Codespell thinks it is a misspelled "set", and it looks that way.

Use s.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
777ad1966d Nit: rename sie to se
Codespell thinks this is a misspelled size or sigh.
Rename to se.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
716e56f37e Typo: mmaped -> mmapped
It is mapped, not maped. Same applies for mmap I guess.

Found by codespell, except it wants to change it to mapped,
which will make it less specific.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
d9411c948d test/zdtm/static: s/NODEL/NO_DEL/
Codespell thinks that NODEL is a misspelled MODEL. Indeed it looks that
way. Add an underscore.

Do the same for the file names.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
58d76cb160 test/zdtm/static/inotify_system.c: s/inot/infd/
Codespell thinks that "inot" is a misspelled "into".

Rename to infd ("inotify fd") to make it happy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
0cb8b9c044 test/zdtm/static: use param not parm
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
58b120b069 criu/pie/restorer.c: use param not parm
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Kir Kolyshkin
747ec75d9f criu/arch/s390/include/asm/restorer.h: fix comments
Use "Parameter" instead of "Parm" as in other places.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
8bb05e3bf3 ci: Switch to non overlaysfs tests
Switch to non overlaysfs tests for Podman and Docker.
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1967924

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Andrei Vagin
45e048d77a criu: generate unique socket names
CRIU has a few places where it creates unix sockets and their names have to be
unique for each criu run.

Fixes: #1798
Signed-off-by: Andrei Vagin <avagin@google.com>
2022-04-28 17:53:52 -07:00
Fangrui Song
75064b7424 mount: fix -Wunused-but-set-variable for Clang 15
Since https://reviews.llvm.org/D122271, Clang -Wset-but-unused-variable
gets smarter to warn about unused post-increments.

Signed-off-by: Fangrui Song <maskray@google.com>
2022-04-28 17:53:52 -07:00
jiang wei
46e4773c3b style: delete some redundant code
There is some redundant in compel/src/main.c, making it better

Signed-off-by: jiang wei <jwcesign@gmail.com>
2022-04-28 17:53:52 -07:00
Fangrui Song
5109fccf8c apparmor: Fix -Wfortify-source for Clang
```
criu/apparmor.c:679:26: error: 'fscanf' may overflow; destination buffer in argument 3 has size 48, but the corresponding specifier may require size 49 [-Werror,-Wfortify-source]
        ret = fscanf(f, "%48s", contents);
```
The buffer size should be at least one larger than the fscanf maximum
field width.

Fixes: 8d992a680e ("lsm: support checkpoint/restore of stacked apparmor profiles")
Signed-off-by: Fangrui Song <maskray@google.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
791651f1b6 criu-ns: add a helper to hold a pid namespace
The init process can exit if it doesn't have any child processes and its
pidns is destroyed in this case. CRIU dump is running in the target pid
namespace and it kills dumped processes at the end. We need to create a
holder process to be sure that the pid namespace will not be destroy
before criu exits.

Fixes: #1775

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
805559c1de scripts/ci: mount test cgroups once
zdtm.py mounts two named controllers for tests. In CI, we run zdtm.py a few
times, so we can mount (create) these controllers once to avoid any unwanted
effects.

Signed-off-by: Andrei Vagin <avagin@google.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
ab6191ccd3 zdtm: use unique holder for cgroups
The idea that each zdtm.py should have own helder, so that two zdtm.py that are
running on the same host don't effect each other.

Fixes: #1774
Signed-off-by: Andrei Vagin <avagin@google.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
73a783ac17 mount: make error messages differ in different places
We have three of "Can't mount at %s", let's distinguish simple mount
from bind-mount and re-mount to make log reading easier.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
165d5a2cd5 mount-v2: make mount engine fallback messages loglevel debug
On pre v5.15 kernel we don't have MOVE_MOUNT_SET_GROUP support and thus
all our ci logs are filled with "fallback" messages. Let's decrease log
level to debug, so that we don't see it in ci logs.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
8867840c82 zdtm/mount-v2: disable pty-console test
[root@fedora criu]# ./test/zdtm.py run -t zdtm/static/pty-console --iters 2 --keep-going --ignore-taint
  [WARNING] Option --keep-going is more useful when running multiple tests
  userns is supported
  === Run 1/1 ================ zdtm/static/pty-console
  ====================== Run zdtm/static/pty-console in uns ======================
  Start test
  Test is SUID
  ./pty-console --pidfile=pty-console.pid --outfile=pty-console.out
  Run criu dump
  Run criu restore
  Run criu dump
  =[log]=> dump/zdtm/static/pty-console/62/2/dump.log
  ------------------------ grep Error ------------------------
  b'(00.009325) 101 fdinfo 3: pos:                0 flags:           100000/0'
  b'(00.009332) Dumping path for 3 fd via self 19 [/zdtm/static]'
  b'(00.009345) 101 fdinfo 4: pos:                0 flags:           100002/0'
  b'(00.009352) tty: Dumping tty 20 with id 0xc'
  b"(00.009358) Error (criu/files-reg.c:1710): Can't lookup mount=1647 for fd=4 path=/ptmx"
  b'(00.009361) ----------------------------------------'
  b'(00.009369) Error (criu/cr-dump.c:1368): Dump files (pid: 101) failed with -1'
  b'(00.009696) Running network-unlock scripts'
  b'(00.012401) Unfreezing tasks into 1'
  b'(00.012410) \tUnseizing 86 into 1'
  b'(00.012415) \tUnseizing 101 into 1'
  b'(00.012428) Error (criu/cr-dump.c:1788): Dumping FAILED.'
  ------------------------ ERROR OVER ------------------------
  ################ Test zdtm/static/pty-console FAIL at CRIU dump ################
  Test output: ================================

   <<< ================================
  Send the 9 signal to  86
  Wait for zdtm/static/pty-console(86) to die for 0.100000
  ##################################### FAIL #####################################

Restore on second iteration with mount-v2 fails, that is because
devpts_restore which is called from do_new_mount_v2 via fstype->restore
opens ptmx file in service mntns and saves it to fdstore for later use.
So after first c/r open ptmx fd changes mnt_id in fdinfo to a detached
mount. Let's just disable mount-v2 for this test for now.

FIXME: We should create separate fstype hook to do_mount_in_right_mntns,
so that we can open files from this hook in actual restored mntns.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
c8121ed746 test/jenkins: test for old mount engine
Let's run zdtm in jenkins with --mntns-compat-mode option and same for
device-external mount test from others.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3c0e99ccfa ci: make others/mnt_ext_dev also run for old mount engine
Now when we switched to mount-v2 by default to check old mount engine we
need to explicitly run with --mntns-compat-mode option.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
642abd133e zdtm/mount-v2: disable mnt_tracefs test
We can have tracefs separate mount from debugfs and that's why the
/sys/kernel/debug external mount now has children and this thing is not
supported to be bind in container with children, because we don't wan't
external mounts to introduce some unexpected extra external mounts so we
bind them without MS_REC in mount-v2 unlike in old mount engine.

We can either bind without MS_REC when constructing test or provide all
children mount as separate external mounts to criu, let's just disable
for now.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/87875c023

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
f736d88c99 zdtm: add propagation group with mount flags to mount_complex_sharing
Before mounts-v2 we have seen mounts loosing their mount readonly flags
when they were in a propagation group, because CRIU "forgot" to set
them, with new mount engine it should work now as all propagations are
now created on the same path there all other normal mounts are created,
and all mount flags are restored.

This test actually mounts only one mount, other three are propagations,
lets set mount ro flag for half of them.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/22584993d

FIXME: need to check options restored right as we don't have
--check-mounts to do this job for us.

Reviewed-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
ef53df471d zdtm: add mount_complex_sharing test
Mounts-v2 engine should fix multiple problems of old engine relative to
sharing options, lets add a test for such problems.

Add all four types of shared groups: 1) private, 2) shared, 3) slave
and 4) slave+shared for mounts. Propagate them into sharing and after
propagation change sharing with four ways: 1) don't change, 2) make
private, 3) make slave and 4) make private + make shared.

This brings 16 cases of different sharing options for mount propagation,
lets check that they all are restored fine.

Lets create mounts from description to make it easier to improve this
test in future.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8bcd0034d

FIXME: need to check options restored right as we don't have
--check-mounts to do this job for us.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
486e1fd854 zdtm: add new mnt_ext_sharing test for mount-v2
These test simply checks that sharing between two mounts in container:
1) external mount and 2) it's bind persists (case when bind has the same
mountpoint).

Note: on old mount engine mounts inside container become also shared
with mount in criu mount namespace (outside container) after c/r which
is not right.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/76a09e850

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3db949d821 ci: run tests for old mount engine
Now when we switched to mount-v2 by default to check old mount engine we
need to explicitly run with --mntns-compat-mode option.

Note that if the feature move_mount_set_group is not supported then
regular run will just fallback to old mount engine and then we don't
need separate run with --mntns-compat-mode.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
8d6e2d0442 zdtm: enable mounts compat mode on restore with --mntns-compat-mode option
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e4a430e1f

Changes: prepend --mntns-compat-mode to r_opts in zdtm.py so that we
can disable this option with --no-mntns-compat-mode from test desc
files.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
b35c842d0f mount: add new mounts-v2 engine
Design of mounts-v2:

  As a preparation step we classify mounts in groups by (shared_id,
  master_id) in new resolve_shared_mounts_v2 (just after reading images).

  New function prepare_mnt_ns_v2 is our main entry point when switching
  from old mount engine to new one actually happens.

  First we pre-create each mount namespace nearly empty, only with root
  yard in place (pre_create_mount_namespaces).

  We walk the mount tree and mount each mount similar to old mount
  engine but not in mount tree but as a sub-directory of root yard
  (plain mountpoint) in service (criu) mount namespace. Also we
  bind this mount from service mntns to real mntns just after creation.
  (do_mount_in_right_mntns)

  Note: this way we initially have the final mount which would be
  visible to restored container user with right mnt_id for the sake of
  e.g. creating unix sockets on it (for unix socket bindmounts), and
  both have copy of the mount in service mntns so that old code which
  accesses files on mounts through service mntns still can acces them.

  New can_mount_now_v2 is now free from heuristics we had for restoring
  shared groups, we will restore them later via MOVE_MOUNT_SET_GROUP,
  for now everything is private.

  Now when all plain mount are created in real mount namespaces, we can
  move them to the tree for each namespace. Also we open fds on the
  mountpoint: one mp_fd_id before moving and another mnt_fd_id after,
  so that we can access each file later from final mntns via those fds.
  (assemble_mount_namespaces)

  New restore_mount_sharing_options walks each root sharing group and
  their descendants with dfs tree walk. It creates sharing for the first
  mount in the sharing group and then sets the same sharing on all other
  mounts in this group.

  Sharing creation for fist mount is two step:

  a) If mount has master_id we either copy shared_id from parent sharing
  group or from external source and then make mount slave thus
  converting it to right master_id.
  b) Next if mount has shared_id we just make us shared, creating right
  shared_id.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/596651d02

Changes:
- Split all "exporting" to separate preparational patches
- Rework cr_time
- Switch to MOVE_MOUNT_SET_GROUP
- Use resolve_mountpoint for external mounts (for MOVE_MOUNT_SET_GROUP)
- Mounting plain mounts both in service and in restored-final mntns
- Call MOVE_MOUNT_SET_GROUP from usernsd
- Rework can_mount_now_v2 to handle bind of both root and external.
- Use sys_move_mount for mount assembling.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
c29675c9a5 mount: export global variables for mount-v2
Export root_yard_mp and it's mntns_roots.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
972a598628 mount: export several functions for mount-v2
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3229e7f582 mount: export common defines for mount-v2
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
0723d0cd96 mount: remove double ns_id declaration
Fixes: d0d117986 ("mount: move functions about mounts from proc_parse.h")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
1f4a9a531d files-reg: export parent dirs helpers for mount-v2
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
f032741cd6 mount: add plain mountpoints
This is a preparation of mounts-v2 new algorithm for mount restore, we
add an alternative mountpoints to each mount, so that if we mount mounts
in these mountpoints they will be "plain": each mount in separate
sub-directory of root_yard, mounts will be mounted without tree. Tree
reconstruction will be done in separate step.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/5e6de171a

Changes: improve get_plain_mountpoint().

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
f2d1c7fab8 config/rpc: add new option --mntns-compat-mode for old mount engine
We plan to switch to Mounts-v2 engine for restoring mounts by default,
this options is to allow switching to old engine. This patch only adds
an option, no engine behind it yet.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/503f9ad2c

Changes: allow --mntns-compat-mode option only on restore and only if
MOVE_MOUNT_SET_GROUP is supported (this also requires change in
unittest/mock.c), change id in rpc criu_opts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
f6b52c711e crtools: move check_options after kerndat_init and log_init
This can be useful to check options which depend on some kernel features
listed in kdat.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
6a25420d35 util: add resolve_mountpoint helper
This helper would be useful to get mountpoints of source path of
external mounts without parsing host mountinfo. When we restore
mountpoint-external mount and we need to copy sharing from source via
MOVE_MOUNT_SET_GROUP, it would require from us to give it real
mountpoint of source path to be able to copy sharing group.

This uses openat2 RESOLVE_NO_XDEV feature which detects crossing
mountpoint boundary instead of potentially slow mountinfo parsing.

v3: coverity CID 389209: close fd only when it was opened

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
cef8366f52 kerndat: check whether the openat2 syscall is supported
Will use openat2 + RESOLVE_NO_XDEV to detect mountpoints.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
387f4652b3 compel: add open_tree syscall
Will use this for cross mount namespace bindmounts.

Note: don't need separate kdat for mount-v2, as MOVE_MOUNT_SET_GROUP
were added much later than open_tree and all related fixups.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
a946b946e8 kerndat: Check for MOVE_MOUNT_SET_GROUP availability
Mounts-v2 requires new kernel feature MOVE_MOUNT_SET_GROUP to be able to
restore propagation between mounts right.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7da7f9a17

Changes: define move_mount syscall, check mainstream kernel
MOVE_MOUNT_SET_GROUP feature, use our "linux/mount.h" to overcome
possible problems of non-existing header on older kernels.

v3: coverity CID 389201: check ret of umount2 and rmdir at cleanup stage

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
0ca89b99bb files-reg: teach clean_one_remap to work with mount-v2
While mounts-v2 mounts all mounts plain without tree in service mntns we can't
just use path relative to mntns to find remap. Make it mount related, it is
also compatible with mounts-v1.

Also we don't need openat and unlinkat here as we've opened rmntns_root
just before that, lets switch to "non-at" variants.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/dc9ac0c80

Changes: rework to skip vz-specific hunks.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
9a0918497d files-reg: teach create_ghost to work with mount-v2
While mounts-v2 would mount all mounts plain without tree in service
mntns we can't just use path relative to mntns to find remap. Make it
mount related, it is also compatible with current mount engine.

Also handle no-mntns case separately in nomntns_create_ghost.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/9cdf0b3e4

Changes: make gf->remap.rpath always relative else we get:
Error (criu/files-reg.c:779): Couldn't unlink remap
/tmp/.criu.mntns.BCurDL/13-0000000000 /zdtm/static/cwd02.test:
No such file or directory

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Stanislav Kinsburskiy
169f95c394 files-reg: split create_ghost_dentry out of create_ghost
Will use it to make create_ghost work with mount-v2.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/156fa4877
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/069bba0ad

Changes: merge fixup.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
9fb3984a7e mount: add service_mountpoint getter for ->mountpoint
This getter should be used when we wan't to access the mount on the filesystem.
In next patches we want to be able to change the location of the mount on
restore in service mount namespace, while not changing ->mountpoint string.
All places where we don't want to access the mount but instead want to
determine relations between mounts in the initial mount tree or just print path
should use ns_mountpoint.

This change effectively brings no change of behaviour everything is the same
for now.

Still leave ->mountpoint references for remap, cr_time and initialization which
need to work with exact variable.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/235c761e0

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
65967a84b9 mount: use ns_mountpoint instead of mountpoint where possible
On dump ->mountpoint and ->ns_mountpoint is the same, but on restore
->mountpoint can be changed by mount tree yard setup and remap (and who
knows what else =) ). It is not good to use ->mountpoint for path
comparison between mounts if we are not explictly need to compare
"changed" paths. Imagine the remap change will make two mounts have
different prefixes in ->mountpoint and we won't be able so understand
that those mounts originally were subpaths.

This patch handles 2 simple cases:

a) These functions called ONLY ON DUMP so for them there is no effective
change: fixup_overlayfs, fusectl_dump, check_one_mark, __lookup_overlayfs,
mount_resolve_path, try_resolve_ext_mount, validate_mounts (first and third),
resolve_external_mounts, get_clean_mnt, __umount_children_overmounts,
__umount_overmounts, ns_open_mountpoint, open_mountpoint, dump_one_fs,
dump_one_mountpoint, clean_cr_time_mounts, collect_unix_bindmounts.

b) In these functions ONLY LOGS changed, so no algorithm change:
always_fail, mnt_build_ids_tree, mnt_tree_show, unsupported_nfs_bindmounts,
unsupported_nfs_mount, unsupported_mount, validate_mounts (second),
__search_bindmounts, resolve_shared_mounts, mnt_tree_for_each, resolve_source,
propagate_siblings, propagate_mount, do_mount_one, get_mp_root,
collect_mnt_from_image, merge_mount_trees, ns_remount_writable,
__remount_readonly_mounts, parse_mountinfo.

All complex cases are handled in separate patches.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/4972888dd

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
eedbc6f478 mount: use ns_mountpoint in mnt_depth
Function mnt_depth is only used on real mounts when building mount tree for
single namespace, thats why we can compare those mounts with ns_mountpoint
safely.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/2be0ff276

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
ae0b218c30 mount: use ns_mountpoint in aufs_parse
At this point ns_mountpoint is equal to mountpoint.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/c70bd7de0

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
7b968ceeab mount: use ns_mountpoint in collect_mntinfo
At this point ns_mountpoint is equal to mountpoint.

More over let's use robust is_same_path helper in should_skip_mount so
that we don't need to rely on ->mountpoint + 1 hacks.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d4c4271a0

Changes: use is_same_path helper.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
f2bf6597ca path: simplify mnt_get_sibling_path via get_relative_path
Previous code did:

1) get rpath: mount's mountpoint relative to it's parent mountpoint
2) get cut_root: parent's root relative to parent's slave root or vice
versa (will be "-" if parents root is wider of "+" if thicker)
3) return parent's slave mountpoint +/- cut_root + rpath

It can be done more robust with get_relative_path:

1) get rpath: mount's mountpoint relative to it's parent mountpoint
2) get fsrpath: add rpath to parent's root (path relative to fs root)
3) get rpath: fsrpath relative to parent's slave root
4) return parent's slave mountpoint + rpath

In the latter approach we do not need to open code workarounds for
consequent slashes in paths (get_relative_path would do this for us),
and we also do not need to have complex logic with +/-.

While on it let's also switch ->mountpoint to ->ns_mountpoint where
possible, as mountpoint can have unexpected prefixes.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/0fd09f8571

Changes: rework mnt_get_sibling_path more.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
abbc70adc9 mount: use ns_mountpoint for children-overmount check
We need to skip root_yard_mp parent as it has no ns_mountpoint, it also
has no children overmounts so we are safe, all others can be compared by
ns_mountpoints.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e5665c976

Changes: add mi->parent pre-check, reword commit message.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
c17695cb11 mount: use ns_mountpoint in root_path_from_parent
Fail root_path_from_parent if parent is root_yard, we want to only
lookup root path in real parent mounts.

Now it is safe to use ns_mountpoint instead of mountpoint as both
children and parent have it and they are relative.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e58a91883

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
010295b8fa mount: use ns_mountpoint in validate_children_collision
Function validate_children_collision is both called on dump and on
restore. On dump mountpoint and ns_mountpoint are the same. On restore
as we never call validate_children_collision on helper mounts
(root_yard_mp and cr_time are not in mntinfo list), for all other mounts
strcmp results would be the same with mountpoint and ns_mountpoint.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8f4fda5ac

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
07eb01593d mount: skip root yard children from mnt_needs_remap check
There is no point of remaping ns root mounts they can't overmount anybody.

This also allows us to switch mnt_needs_remap from ->mountpoint to
->ns_mountpoint for mount comparison in overmount detection.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/9475bf843

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
e8de10a4fb mount: use ns_mountpoint in mnt_is_overmounted
Let's use ->ns_mountpoint in comparison as ->mountpoint can change (e.g.
see how we add ns root in get_mp_mountpoint and in do_remap_mount we can
change it again). We plan to get rid of ->mountpoint everywhere where we
can use unchanged ->ns_mountpoint.

Cherry-picked hunks from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
b954e51360 autofs: use ns_mountpoint in autofs_create_dentries
Replace ->mountpoint with ->ns_mountpoint for determining relations
between mounts.

Also let's use get_relative_path in autofs_create_dentries as it is more
robust, before that we've missed the case where mountpoint of child of
autofs mount is multilevel subdirectory of parent mountpoint, and always
created them as single level subdirectory.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/5d5462202

Changes: skip children overmount as it does not need a subdirectory.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
7a67949e55 mount: make general place for shared variables on mount-info on restore
Put remounted_rw to it. This allows us to easily add some more of such
variables without allocating each one of them separately.

Due to existance of shfree_last shmalloc'ed region can be inherited from
the previous caller so it needs to be explicitly zero initialized.

Fixes: 0a2d380e6 ("ghost/mount: allocate remounted_rw in shmem to get
info from other processes")

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/6750e5793

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
0c41c1187b mount: fix broken remounted_rw check
Expression (x && REMOUNTED_RW) is always same as just (x).

It should've been (x & REMOUNTED_RW) to check if mount is marked as
temporary remounted writable and requires to be switched back.

By fixing this check we eliminate excess readonly remounts.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/167f8ac67

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
7182470454 mount: move root yard tree merge as early as possible
Let's merge mount trees under root_yard just after reading from image.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8e8ecdfdc

Changes: split only root yard part as a separate patch, and put root
yard alloc into merge_mount_trees.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
770cdbfb9f mount: prepare is_overmounted as early as possible
Function mnt_is_overmounted is designed to detect if mount is overmounted in
current tree using comparison of mountpoints of neighbour mounts for detection.
We want to get actual overmounts in dumped tree, we don't expect that helper
mounts we add or merging will introduce new overmounts. So let's do overmount
detection earlier before adding helpers.

Set is_overmounted = false for root yard and binfmt helper mounts.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e98e1456d

Changes: rename set_is_overmounted to prepare_is_overmounted, move it
just after collecting mounts from images to mount tree, handle helper
mounts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
83bbf1b051 mount: add helper mnt_get_external_bind_nodev
Will use it to find shared mount we can bind from and also can inherit
external slavery. Device-external can't give us external slavery.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/dcd952c4c

Changes: switch to mnt_bind_pick helper.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
0fd0e03a21 mount: do not override master_id to -1 for root binds
There is no point to lose this information, having -1 everywhere in
mount images instead of acutall master id can be confusing.

Note that now need_master is true for bindmounts of root mounts with
same master_id as root mount, so now they are handled with a common
code, we've added can_receive_master_from_root check specially to handle
this case right. Also note that in propagate_mount we no more set ->bind
for this case, this is handled by mnt_ext_slave list related code.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/b3c9dc05e

Stripped only master_id relative part of original patch, add
preparational patches before this one.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
4f156f32ba mount: put external slavery mounts to separate mnt_ext_slave list
We need to put mounts which need to inherit master_id from external
mounts or from root mount into separate list, so that we can set ->bind
on them right in propagate_siblings.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/ea592cf6e

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
ef79912c1d mount: add can_receive_master_from_root helper
If mount has external master_id it can inherit it as a bind of external
mount, but also it can inherit it as a bind of container root mount, so
let's add similar condition to allow such mounts.

Note: need_master is false for binds of root mount which can inherit
master_id from root mounts yet, this would change in next patch.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
b52fcb284a mount: replace CRTIME_MNT_ID with HELPER_MNT_ID
Root yard mount also has mnt_id == 0 so it will look better with a new
name. Let's explicitly initialize root yard mnt_id to HELPER_MNT_ID
for the sake of code readability.

Also in near future we might want to create additional mount helpers to support
mounts in CT with no fsroot mounted.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee

Changes: split umount hunk to previous patch, set HELPER_MNT_ID for root
yard.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
4736a7240e mount/restore: leave ns_mountpoint NULL for aux binfmt_misc mount
On dump, yes, mountpoint and ns_mountpoint are the same, but on restore
they don't and puting something like "<root_yard>/binfmt_misc" to
ns_mountpoint is wrong, let's leave ns_mountpoint NULL, this mount
should not be compared by ns_mountpoint with other mounts anyway.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
16085b5e67 mount/restore: create auxiliary binfmt_misc mount in the root yard
Put our auxiliary binfmt_misc mount in "<root_yard>/binfmt_misc" instead
of "<root_yard>/<mntns>/proc/sys/fs/binfmt_misc". Thus we can restore
binfmt_misc without altering actual mount tree, which looks much more
safe.

For that we need to remove "fake top mount_info" handling from
add_cr_time_mount as now we intentionally add binfmt_misc mount as a
child of ("fake") root yard. On dump this does not change anything.
Also we need to create mountpoint for binfmt_misc in root yard.

As now mount is out of restored mount tree we don't need to umount it,
so remove corresponding CRTIME_MNT_ID umount hunk in do_new_mount.

Note: to make binfmt_misc c/r work criu should be compiled with
CONFIG_BINFMT_MISC_VIRTUALIZED and binfmt_misc should be actually
virtualized and this is only done in Virtuozzo kernel per ve.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/2eb535843
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d79c7f441
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/34002bef4
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/45bf6f0ee

Changes: merge all fixups together to one consistent patch.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
a379d4d945 zdtm: add mntns_pivot_root_ro test
This checks that superblock readonly flag is applied to nested mntns
roots on restore.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2a3d2bc284 mount: apply superblock flags to nested ns roots
Before this change we didn't apply sb-flags if we mount the root mount of
non-root mntns. There is no point in it, if we got to do_new_mount this root
mount is not external bind, so we won't change sb-flags on host if we change it
for this mount. So we just loose sb-flags on some regular container mount for
no reason. Fix it.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e7ffe4c60

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
77f67973f2 zdtm: add mntns_pivot_root test
This creates nested mntns and does pivot_root to tmpfs mount, so that
roots of original test mntns and in nested mntns are different.

Before allowing nested mntnses with different roots in previous patch
this would fail.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2fdb4993a0 mount: allow nested mount namespaces with different roots
Only root in root-mntns is special (see rst_mnt_is_root) all other
mounts are mounted regulary there is no difference between ns root and
any other mount or bind-mount.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/f41e41dd5

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
cf6fe2d48b mount: add mnt_is_root_bind helper
Helper mnt_is_root_bind indicates that mount can be bind-mounted from
the root mount (which in it's turn from opts.root).

Use it in validate_mounts: we should skip unsupported mount from fsroot check
if we know it will be bindmounted from root mount, is_ns_root check was wrong.

Also fix root mount check in dump_one_fs, root mounts in non root mntns should
be dumped normally if they are not bind-mounts of root mount.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/25d078971

Changes: switch to mnt_bind_pick helper, export to mount.h, also add
mnt_get_root_bind helper for future use in mount-v2, remove excess root
yard hunk.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
e50abbd3b3 zdtm: add mnt_ext_collision test
This test creates two mount namespaces, one "root" with external mount
at /mnt_ext_collision.test/dst and one "nested" with different internal
mount at /mnt_ext_collision.test/dst instead.

This case is important for nested containers, if we dump a container
with some external mount in /mnt we should not also replace mounts in
/mnt for nested containers with the external one. (One example is docker
containers inside Virtuozzo containers.)

Without previous patch which restricts external mounts resolution to
only root mntns of container this test fails as internal mount is
replaced by external one after migration.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
a963ceb770 mount: restrict mp-external mount map to init container mntns only
We resolve mountpoint-external mounts on dump by mountpoint comparison,
so if we have other mount (other superblock e.g. in nested mntns) with
same mountpoint we would also resolve this mount as external and restore
it as external: replacing it completely with different mount... That's
wrong, so to make this interface more robust let's only resolve
mountpoint-external mounts in root mntns of container, not in all
mntnses as it was before.

Note: if actual external mount (bind of external) gets to nested mntns
it's ok not to resolve it as external as criu would bind it from the
resolved mount in root mntns. So external mounts in nested mntns are
still supported after this patch.

Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/034498b28

Changes: apply mntns check only to mountpoint-external mounts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
007501f985 zdtm: add new mnt_ext_root test
This test simply creates a) root external mount and b) "deeper"
bindmount for it (deeper in terms of mnt_depth). Our mount restore code
tries to mount (b) first and fails (without previous patch ordering
external mounts before their binds).

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/d31954669

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
4f94149346 mount: mount external mount before mounting it's binds
The problem when we don't order these mounts we can get to mounting
non-external bind first via do_new_mount and fail c/r. For instance for
tmpfs we would fail on no image to get contents from. See the test
mnt_ext_root for more info.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/baf3f8db8

Changes: switch to mnt_bind_pick helper, export to mount.h, make check
in can_mount_now skip mounts with ->bind set.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
d5cb7764e4 mount: show more info about why we can't mount
Currently if we have mount deadlock it is hard to understand which
mounts lock each other, these makes it easier.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8ba8499e2
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/48d044ae11

Changes: merge newline fixup.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
685a53eeca mount: rework skipping external mounts in dump_one_mountpoint
Function dump_one_fs already has mnt_is_external_bind check inside, so
there is no point to check pm->external one more time.

Function check_bindmount is intended to check devpts bindmount's master
was opened in right mount namespace, but if bindmount is external mount
there is no point to check this. Let's also skip check for bindmounts of
external mounts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3b2b808128 mount: split mnt_is_external(_bind) and can_receive_master_from_external
We use mnt_is_external():

1) In validate_mounts() to skip fsroot existence check for mounts which
will be bind-mounted from external mounts.

2) In resolve_shared_mounts() to skip error on slave mounts without
master mount, if they can receive these master_id through external
mount.

3) In dump_one_fs to skip dump of mounts which will be bind-mounted from
external mounts.

Cases (1) and (3) are the same, but case (2) is quiet different. Lets
split these cases thus making things simplier.

Effectively these patch does not change criu's behaviour at all. While
I can't say that old mnt_is_external was wrong, it was too complex and
hard for understanding, so it's worth to switch to lookup across
bindmounts list via general mnt_bind_pick() helper. And now when it is
obvious that mnt_is_external looks for external bindmount, let's also
change it's name to mnt_is_external_bind.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/494b52ba8

Changes: use mnt_bind_pick helper, use is_sub_path helper to be more
robust, rename mnt_is_external to mnt_is_external_bind, fix
clang-format, export to mount.h, use mnt_is_nodev_external as we can not
inherit master from device-external mounts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
c09bd89419 mount: add mnt_bind_pick helper to pick the desired bind
Adding different pick functions we would be able to search different
things like mounted bind with wider root, or external bind, or external
bind with same sharing group and so on and so forth.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
9d1f39f28a unittest: add some tests for get_relative_path helper
v2: let's also mock kerndat_s.sysctl_nr_open field, to make aarch64
clang ci happy

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
97bd9511ca util: add get_relative_path helper
This is a smart way of getting relative paths:

1) Always returns relative path, no unexpected starting '/';
2) Detects subpath even if path formats are different, only real directory
and file names matter;
3) No path modiffication/allocation, returns shifted pointer to the
orignal path.

We have many places where we need to cut subpath from path. Different code
blocks doing this job spread widely across the codebase for instance see:
cut_root_for_bind and root_path_from_parent. But those implementations rely on
the fact that subpath's and path's formats are the same.

When we modify or concatenate paths we can accidentally get strange
path formats, paths given by user can have strange format, and the job
to manually maintain all paths in "simple" format everywhere is too
hard. So let's just add a tool to compare "strange" paths.

E.g.:

get_relative_path("./a////.///./b//././c", "///./a/b") == "c"

Note: ".." in path is not supported, and we just can't support it right
without full filesystem tree information to resolve paths like
"../../a", so we just treat ".." as a directory name which should work
in simple cases.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/73a771348

Changes: add other useful robust path comparison helpers is_sub_path and
is_same_path based on get_relative_path, fix clang-format.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
261b7a8fd4 mount: setup mnt_bind list before using it in mnt_is_external
Before this patch mnt_is_external() used non-populated mnt_bind list
when called from resolve_shared_mounts(), thus it could work not as
intended.

Let's add separate helper search_bindmounts() for populating mnt_bind
list, and add mnt_bind_is_populated to differentiate between
non-populated list and just empty populated list. This way we can add a
BUG_ON to mnt_is_external to catch such order problems in future.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/e464c1c6d
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/8b22b30d5
Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/ca9de41e3

Changes: simplify commit message, merge fixups: search bindmounts
earlier so that we have bindmounts info as early as possible, rename
mnt_no_bind to mnt_bind_is_populated and simplify it's logic a bit.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
30261a7515 mount: skip fstype and source checks for external mounts in mounts_sb_equal
Fstype and source fields can be changed by resolve_external_mounts() or
by try_resolve_ext_mount() for external mounts, but we can have other
mounts from same superblock which are not detected as external, for
instance bind of subdirectory from device-external or bind of
mountpoint-external mount to other mountpoint. So we need to still be
able to find bindmounts between mounts with changed fstype or source and
unchanged mounts.

So let's make fstype/source checks in mounts_sb_equal ignored for
external mounts. Leave only fstype->sb_equal checks if have them.

Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/fadc38d84
Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/f9700cb12

Changes: merge two commits in one and rework, remove ":)", reword
commit-message to make patch self-sufficient.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
8d5300aa9b mount: mark mounts of external devices external
Previously only autodetected and mountpoint external mounts had
mount_info->external field set, let's fix this injustice so that we can
operate all external mounts in a similar manner.

Also:

Print info message when device external mount is detected similar to
mountpoint external mounts detection.

Add helper mnt_is_nodev_external to let do_mount_one, can_mount_now and
do_bind_mount handle device external mounts separately as it was before.

Handle device external mount right in get_mp_root to set ->external on
restore. (note: calling ext_mount_lookup is only meaningfull for
mountpoint external mounts)

Add helper mnt_is_dev_external to use in resolve_source to make it more
clear that it is a device external mount restore path.

All other "if (mi->external)" checks now also handle device external
mounts, but they all look safe to do so and could've done it initially,
here is a list: fusectl_dump, mnt_is_external, dump_one_mountpoint,
propagate_mount.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/afd899539

Changes: cleanup commit message, add some helpers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
e17c1cc128 mount: do not detect non-fsroot mounts as device-external
Device-external mounts are restored via do_new_mount(), but function
do_new_mount only allows creating mounts with root "/", as it does
simple mount (not bind) without any later root change. Restoring
non-root mounts via do_new_mount is just imposible.

So let's detect mounts as device-external only when they have fsroot
root, all other non-fsroot binds of this device would be restored as
bindmounts of fsroot ones.

This is a cosmetic change as though non-root mounts were detected as
device-external before this patch they anyway would not be created with
do_new_mount() because of fsroot/bind check in can_mount_now orders them
to be restored as binds.

Cherry-picked one hunk from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/afd899539

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
eda1e5fdbd mount: add mntinfo_add_list_before helper for adding to mntinfo list
Use this helper everywhere instead of manually adding mounts to the head
of the list, this way it is much easier to track all places where we do
add to mntinfo list.

Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/7bca9397b

Changes: skip hunk adding root_yard_mp to the list because root yard has
not fully initialized mountinfo structure (can break code which uses
mntinfo fallback in lookup_nsid_by_mnt_id), let's only have real mounts
in mntinfo list. Also skip cr_time mount from mntinfo list for the same
reason.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
9649356e3a zdtm: fix mnt_ext_master test to correspond to it's name
Before these change the on-host-"zdtm_auto_ext_mnt" mount with
mountpoint "/tmp/zdtm_ext_auto.XXXXXX" was private/shared depending on
it's parent mount "/tmp". And e.g. on my setup the parent mount on
"/tmp" is private and our "host" mount becomes private too. So
in-container-"zdtm_auto_ext_mnt" external mount is also private but test
name hints it should be slave.

E.g. If I ran mnt_ext_master before this patch, in mnt_ext_master
process mntns we see that our "external" mount is private but not slave:

[root@fedora criu]# grep zdtm_auto_ext_mnt /proc/167077/mountinfo
1239 1238 0:138 /test /ext_mounts rw,relatime - tmpfs zdtm_auto_ext_mnt rw,seclabel,inode64

After this patch:

[root@fedora criu]# grep zdtm_auto_ext_mnt /proc/166385/mountinfo
1239 1238 0:138 /test /ext_mounts rw,relatime master:413 - tmpfs zdtm_auto_ext_mnt rw,seclabel,inode64
                                              ^^^^^^^^^^

So we just explicitly make on-host-"zdtm_auto_ext_mnt" shared, and this
makes in-container-"zdtm_auto_ext_mnt" external mount slave.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/a1a221fe9

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
5a8fd343f5 uffd: fix __u64 print format specifier
coverity CID 389197:

CID 389197 (#1 of 1): Invalid printf format string (PRINTF_ARGS)
format_error: Length modifier L not applicable to conversion specifier in %Lu. [show details]
284 pr_err("Incompatible uffd API: expected %Lu, got %Lu\n", UFFD_API, uffdio_api.api);

Looking on C11 standard it seems that "%Lu" is undefined, we better not
use this, see:

"L Specifies that a following a, A, e, E, f, F, g, or G conversion
specifier applies to a long double argument."
http://port70.net/~nsz/c/c11/n1570.html#7.21.6.1p7

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
9e74735160 sk-unix: fix e_str leak in unix_sk_id_add
coverity CID 389191:

int unix_sk_id_add(unsigned int ino)
2327{
2328        char *e_str;
2329
    1. alloc_fn: Storage is returned from allocation function malloc.
    2. var_assign: Assigning: ___p = storage returned from malloc(20UL).
    3. Condition !___p, taking false branch.
    4. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
    5. var_assign: Assigning: e_str = ({...; ___p;}).
2330        e_str = xmalloc(20);
    6. Condition !e_str, taking false branch.
2331        if (!e_str)
2332                return -1;
    7. noescape: Resource e_str is not freed or pointed-to in snprintf.
2333        snprintf(e_str, 20, "unix[%u]", ino);
    8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details]
    CID 389191 (#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to.
2334        return add_external(e_str);
2335}

We should free e_str string after we finish it's use in unix_sk_id_add,
easiest way to do it is to use cleanup_free attribute.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
87d3735145 criu/plugin: Add support for criu image streamer
Modifications to support criu image streamer when using amdgpu_plugin.
When running with criu image streamer, fseek/lseek is not available so
we store the file size in the first 8-bytes of the actual file.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
55370b720e criu/plugin: Store BO contents directly to file
Store BO contents directly to file (1 per GPU) instead of using
protobuf.

Bug Fix:
Fixes an issue where we could not handle BOs bigger than 4GB because
protobuf has an internal limit of 4GB for the Bytes structure.

Performance Improvements:
This significantly reduces CR duration on multi-GPU systems as it allows
reading and writing to disk in parallel. During checkpoint, instead of
waiting for all the BO contents to be read from the one protobuf file,
we can now start writing the BO contents as soon as the first BO is read
from disk. During restore, we can start writing BO contents to disk
after the first BO from VRAM. This also reduces the peak amount of
system memory used as we only need to keep 1 BO content in memory per
GPU at a time instead of all the BO contents.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
Felix Kuehling
ecdf740fa3 criu/plugin: Add whitepaper document
Adding whitepaper document

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
99a2380fc0 criu/plugin: Dockerfile for amdgpu_plugin
This sets up the pytorch environment for BERT Transformers and also sets
up CRIU along with all its dependencies including amdgpu plugin for
supporting CR with AMDGPUs.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
2095de9f03 criu/plugin: Fix for FDs not allowed to mmap
On newer kernel's (> 5.13), KFD & DRM drivers will only allow the
/dev/renderD* file descriptors that were used during the CRIU_RESTORE
ioctl when calling mmap for the vma's.
During restore, after opening /dev/renderD*, amdgpu_plugin keeps the
FDs opened and instead returns a copy of the FDs to CRIU. The same FDs
are then returned during the UPDATE_VMAMAP hooks so that they can be
used by CRIU to call mmap. Duplicated FDs created using dup are
references to the same struct file inside the kernel so they are also
allowed to mmap.
To prevent the opened FDs inside amdgpu_plugin from conflicting with
FDs used by the target restore application, we make sure that the
lowest-numbered FD that amdgpu_plugin will use is greater than the
highest-numbered FD that is used by the target application.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
bd83330095 criu/plugin: Implement sDMA based buffer access
AMD Radeon GPUs have special sDMA (system dma engines) IPs that can be
used to speed up the read write operations from the VRAM and GTT memory.

Depends on:

* The kernel mode driver (kfd) creating the dmabuf objects for the kfd
  BOs in both checkpoint and restore operation.
* libdrm and libdrm_amdgpu libraries

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
6d79266229 criu/plugin: Restore libhsakmt shared memory files
Libhsakmt(thunk) uses a shared memory file in /dev/shm/hsakmt_shared_mem
and its semaphore in /dev/shm/hsakmt_shared_mem. Adding a check during
checkpoint to see if these two files exist. If they exist then the
plugin will try to restore them during restore.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
a218fe0baa criu/plugin: Read and write BO contents in parallel
Implement multi-threaded code to read and write contents of each GPU
VRAM BOs in parallel in order to speed up dumping process when using
multiple GPUs.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
ba9c62df24 criu/plugin: Add unit tests for GPU remapping
Adding unit tests for GPU remapping code when checkpointing and
restoring on different nodes with different topologies.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
4856e0d4d0 criu/plugin: Add parameters to override mapping
Add optional parameters to override default behavior during restore.
These parameters are passed in as environment variables before executing
CRIU.

List of parameters:

KFD_FW_VER_CHECK - disable firmware version check
KFD_SDMA_FW_VER_CHECK - disable SDMA firmware version check
KFD_CACHES_COUNT_CHECK - disable caches count check
KFD_NUM_GWS_CHECK - disable num_gws check
KFD_VRAM_SIZE_CHECK - disable VRAM size check
KFD_NUMA_CHECK - preserve NUMA regions
KFD_CAPABILITY_CHECK - disable capability check

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
72905c9c9b criu/plugin: Remap GPUs on checkpoint restore
The device topology on the restore node can be different from the
topology on the checkpointed node. The GPUs on the restore node may
have different gpu_ids, minor number. or some GPUs may have different
properties as checkpointed node. During restore, the CRIU plugin
determines the target GPUs to avoid restore failures caused by trying
to restore a process on a gpu that is different.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
6e99fea2fa criu/plugin: Implement system topology parsing
Parse local system topology in /sys/class/kfd/kfd/topology/nodes/ and
store properties for each gpu in the CRIU image files. The gpu
properties can then be used later during restore to make the process is
restored on gpu's with similar properties.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
c4e3ac7fef criu/plugin: Adding check for kernel IOCTL version
Adding check for minimum kernel IOCTL version before attempting to
checkpoint.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
55a5993bc7 criu/plugin: Support AMD ROCm Checkpoint Restore with KFD
To support Checkpoint Restore with AMDGPUs for ROCm workloads, introduce
a new plugin to assist CRIU with the help of AMD KFD kernel driver. This
initial commit just provides the basic framework to build up further
capabilities. Like CRIU, the amdgpu plugin also uses protobuf to
serialize
and save the amdkfd data which is mostly VRAM contents with some
metadata.
We generate a data file "amdgpu-kfd-<id>.img" during the dump stage. On restore
this file is read and extracted to re-create various types of buffer
objects that belonged to the previously checkpointed process. Upon
restore the mmap page offset within a device file might change so we use
the new hook to update and adjust the mmap offsets for newly created
target process. This is needed for sys_mmap call in pie restorer phase.
Support for queues and events is added in future patches of this series.

With the current implementation (amdgpu_plugin), we support:
     - Only compute workloads such (Non Gfx) are supported
     - GPU visible inside a container
     - AMD GPU Gfx 9 Family
     - Pytorch Benchmarks such as BERT Base

amdgpu plugin dependes on libdrm and libdrm_amdgpu which are typically
installed with libdrm-dev package. We build amdgpu_plugin only when the
dependencies are met on the target system and when user intends to
install the amdgpu plugin and not by default with criu build.

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Co-authored-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
71ff9cc045 criu/plugin: Initialize AMD KFD header
kfd_ioctl.h contains the definitions for the APIs and required arguments
to call the ioctls so simply copy the header as is for amdgpu plugin.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
91157315b2 criu/plugin: Skip plugin vmas during premap
During premap phase, skip vmas that are handled by external plugins as
their offsets may change when the plugin restores them. This change is
needed when running with criu image streamer.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
63e127fc80 criu/plugin: Add dedicated flag for plugins
Adding a dedicated flag for vma's that are handled by an external plugin
as previously used VMA_UNSUPP flag depends on vma not having
VMA_FILE_SHARED flag.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
e04db02411 criu/files: Add function to return unused FD by pid
Add a new global function to return unused FD based on the pid. This
function can be used in situations where we need a FD that will not
conflict with FDs used by target restore process, but
struct pstree_item is not available (e.g plugins)

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
David Yat Sin
653eefea0c criu/plugin: Do not reopen vma fd for plugins
Some device drivers (e.g DRM) only allow the file descriptor that was
used to create the vma to be used when calling mmap.
In this case, instead of opening a new FD, the plugin will return a
valid FD that can be used for mmap later. The plugin needs to close the
returned FD later. Copies of the returned FD that are created using dup
or fnctl(..,F_DUPFD,..) are references to the same struct file inside
kernel so they are also allowed to mmap.
The plugin does not need to update the path anymore as the plugin can
return a FD for the correct path.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
5b0a639a55 files: fix inh leak in inherit_fd_add
coverity CID 389190:

1538int inherit_fd_add(int fd, char *key)
1539{
1540        struct inherit_fd *inh;
...
    2. alloc_fn: Storage is returned from allocation function malloc.
    3. var_assign: Assigning: ___p = storage returned from malloc(32UL).
    4. Condition !___p, taking false branch.
    5. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
    6. var_assign: Assigning: inh = ({...; ___p;}).
1548        inh = xmalloc(sizeof *inh);
    7. Condition inh == NULL, taking false branch.
1549        if (inh == NULL)
1550                return -1;
1551
...
    9. Condition !___p, taking true branch.
1555        inh->inh_id = xstrdup(key);
    10. Condition inh->inh_id == NULL, taking true branch.
1556        if (inh->inh_id == NULL)
    CID 389190 (#1 of 1): Resource leak (RESOURCE_LEAK)11. leaked_storage: Variable inh going out of scope leaks the storage it points to.
1557                return -1;

We should free inh on inh_id allocation error path in inherit_fd_add.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
0605670421 net: fix e_str leak in veth_pair_add
coverity CID 389187:

3193int veth_pair_add(char *in, char *out)
3194{
3195        char *e_str;
3196
    1. alloc_fn: Storage is returned from allocation function malloc.
    2. var_assign: Assigning: ___p = storage returned from malloc(200UL).
    3. Condition !___p, taking false branch.
    4. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
    5. var_assign: Assigning: e_str = ({...; ___p;}).
3197        e_str = xmalloc(200); /* For 3 IFNAMSIZ + 8 service characters */
    6. Condition !e_str, taking false branch.
3198        if (!e_str)
3199                return -1;
    7. noescape: Resource e_str is not freed or pointed-to in snprintf.
3200        snprintf(e_str, 200, "veth[%s]:%s", in, out);
    8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details]
    CID 389187 (#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to.
3201        return add_external(e_str);
3202}

We should free e_str string after we finish it's use in veth_pair_add,
easiest way to do it is to use cleanup_free attribute.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2856d06e37 config: fix ns leak in parse_join_ns
coverity CID 389192:

550static int parse_join_ns(const char *ptr)
551{
...
553        char *ns;
554
   1. alloc_fn: Storage is returned from allocation function strdup.
   2. var_assign: Assigning: ___p = storage returned from strdup(ptr).
   3. Condition !___p, taking false branch.
   4. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
   5. var_assign: Assigning: ns = ({...; ___p;}).
555        ns = xstrdup(ptr);
   6. Condition ns == NULL, taking false branch.
556        if (ns == NULL)
557                return -1;
558
   7. noescape: Resource ns is not freed or pointed-to in strchr.
559        aux = strchr(ns, ':');
   8. Condition aux == NULL, taking true branch.
560        if (aux == NULL)
   CID 389192 (#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable ns going out of scope leaks the storage it points to.
561                return -1;

We should free ns string after we finish it's use in parse_join_ns,
easiest way to do it is to use cleanup_free attribute.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
a8dd7d2909 ci: run criu-config tests
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
1c54c45fc5 zdtm: drop redundant config_inotify_irmap test
The config_inotify_irmap test duplicates inotify_irmap with slight
change to add the --force-irmap and --irmap-scan-path options in
a configuration file.

The --criu-config option of ZDTM provides more general solution
for testing CRIU options provided in configuration files.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
d2073cd4d7 zdtm: add --criu-config option
The --criu-config option allows to run test with CRIU options provided
via configuration files instead of command-line arguments.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
fc38a01e5e zdtm: use long form cli options
Using long-form command-line options would allows us to provide
them via config file to CRIU.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
0734fc807f zdtm: sort import lines
These changes have been auto-generated with:

    pip3 install isort autoflake
    isort -rc -sl zdtm.py
    autoflake --remove-all-unused-imports -i zdtm.py
    isort -rc -m 3 zdtm.py

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
0b79653973 zdtm: refactor main
This patch improves the readability of zdtm by refactoring the top-level
code into a main function.

https://docs.python.org/3/library/__main__.html

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
1b4a9df9c0 sk-unix: fix uint32_t id variable printf format specifier
coverity CID 389193:
CID 389193 (#1 of 1): Printf format string issue (PW.BAD_PRINTF_FORMAT_STRING)
1. bad_printf_format_string: invalid format string conversion
598 pr_warn("Can't stat socket %#x(%s), skipping: %m (err %d)\n", id, rpath, errno);

Specifier "%#x" is wrong for id as it is of type uint32_t, let's change
it to "%#" PRIx32 "" to fix the problem.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
09fa32a755 tun: fix tun_link leak in dump_tun_link
coverity CID 389205:

452int dump_tun_link(NetDeviceEntry *nde, struct cr_imgset *fds, struct nlattr **info)
453{
...
458        struct tun_link *tl;
...
   2. alloc_fn: Storage is returned from allocation function get_tun_link_fd. [show details]
   3. var_assign: Assigning: tl = storage returned from get_tun_link_fd(nde->name, nde->peer_nsid, tle.flags).
475        tl = get_tun_link_fd(nde->name, nde->peer_nsid, tle.flags);
   4. Condition !tl, taking false branch.
476        if (!tl)
477                return ret;
478
479        tle.vnethdr = tl->dmp.vnethdr;
480        tle.sndbuf = tl->dmp.sndbuf;
481
482        nde->tun = &tle;
   CID 389205 (#1 of 1): Resource leak (RESOURCE_LEAK)5. leaked_storage: Variable tl going out of scope leaks the storage it points to.
483        return write_netdev_img(nde, fds, info);
484}

Function get_tun_link_fd() can both return tun_link entry from tun_links
list and a newly allocated one. So we should not free entry if it is
from list and should free it when it is a new one to fix leak.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
7e9a9dc342 cr-dump: fix cr_imgset leak in dump_one_task
coverity CID 389194:

1238static int dump_one_task(struct pstree_item *item, InventoryEntry *parent_ie)
1239{
...
1245        struct cr_imgset *cr_imgset = NULL;
...
    11. alloc_fn: Storage is returned from allocation function cr_task_imgset_open. [show details]
    12. var_assign: Assigning: cr_imgset = storage returned from cr_task_imgset_open(vpid(item), 577).
1355        cr_imgset = cr_task_imgset_open(vpid(item), O_DUMP);
    13. Condition !cr_imgset, taking false branch.
1356        if (!cr_imgset)
1357                goto err_cure;
1358
...
    25. Condition opts.lazy_pages, taking false branch.
1427        if (opts.lazy_pages)
1428                ret = compel_cure_remote(parasite_ctl);
1429        else
1430                ret = compel_cure(parasite_ctl);
    26. Condition ret, taking true branch.
1431        if (ret) {
1432                pr_err("Can't cure (pid: %d) from parasite\n", pid);
    27. Jumping to label err.
1433                goto err;
1434        }
...
1448        close_cr_imgset(&cr_imgset);
1449        exit_code = 0;
1450err:
1451        close_pid_proc();
1452        free_mappings(&vmas);
1453        xfree(dfds);
    CID 389194 (#1 of 1): Resource leak (RESOURCE_LEAK)28. leaked_storage: Variable cr_imgset going out of scope leaks the storage it points to.
1454        return exit_code;
1455
1456err_cure:
1457        close_cr_imgset(&cr_imgset);
1458err_cure_imgset:
1459        ret = compel_cure(parasite_ctl);
1460        if (ret)
1461                pr_err("Can't cure (pid: %d) from parasite\n", pid);
1462        goto err;
1463}

On compel_cure() error path we do not do close_cr_imgset() thich leads
to leaked cr_imgset, let's move corresponding close_cr_imgset below err
label. Also now we can merge remove close_cr_imgset() in err_cure label
as it goes to err label later anyway. Separate err_cure_imgset label is
not needed as close_cr_imgset() is ready for cr_imgset == NULL.

v2: remove excess close_cr_imgset() in label err_cure (@adrianreber)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2747bb2a7c mount: fix e_str leak in ext_mount_add
coverity CID 389202:
54int ext_mount_add(char *key, char *val)
 55{
 56        char *e_str;
 57
   1. alloc_fn: Storage is returned from allocation function malloc.
   2. var_assign: Assigning: ___p = storage returned from malloc(strlen(key) + strlen(val) + 8UL).
   3. Condition !___p, taking false branch.
   4. leaked_storage: Variable ___p going out of scope leaks the storage it points to.
   5. var_assign: Assigning: e_str = ({...; ___p;}).
 58        e_str = xmalloc(strlen(key) + strlen(val) + 8);
   6. Condition !e_str, taking false branch.
 59        if (!e_str)
 60                return -1;
...
   7. noescape: Resource e_str is not freed or pointed-to in sprintf.
 73        sprintf(e_str, "mnt[%s]:%s", key, val);
   8. noescape: Resource e_str is not freed or pointed-to in add_external. [show details]
   CID 389202 (#1 of 1): Resource leak (RESOURCE_LEAK)9. leaked_storage: Variable e_str going out of scope leaks the storage it points to.
 74        return add_external(e_str);
 75}

We need to free e_str after add_external used it.

v2: use cleanup_free attribute (@adrianreber)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
anatasluo
be78b853da proc_smaps: remove useless nonlinear check
nonlinear information has been removed since commit
1da4b35b001481df99 in kernel.

Signed-off-by: anatasluo <luolongjuna@gmail.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
97a9985673 compel: set mxcsr during error injection to zero
During error injection tests there are random values loaded in some of
the registers. The kernel, however, has the following check:

    if (mxcsr[0] & ~mxcsr_feature_mask)
        return -EINVAL;

So depending on the random values loaded mxcsr might have values that
the kernel rejects with EINVAL. Setting mxcsr to zero during the tests
lets the error injection test pass.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
ef98a71b16 zdtm: fix missplacement of err=True
There is no 'err' argument for print(), it should be in grep_errors() in
line below.

Fixes: bed670f62 ("zdtm: print tails of all logs if a test has failed")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
6b842635bd test: disable rseq also on Archlinux
Seems like Archlinux also uses rseq now and that breaks CRIU.
Also disable rseq on Archlinux.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
51099d2bb6 test: remove test for LOCK_MAND flock
Linux Kernel release 5.16 removed support for LOCK_MAND flock and so the
test to verify if LOCK_MAND works started to fail with 5.16.

The kernel also logs following message:

 Attempt to set a LOCK_MAND lock via flock(2). This support has been removed and the request ignored.

This fixes CRIU CI using Fedora with 5.16.

See Linux Kernel commit 90f7d7a0d0d68623b5f7df5621a8d54d9518fcc4
 "locks: remove LOCK_MAND flock lock support"

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
247cdc90db bpfmap: handle new field in fdinfo
Starting with Linux Kernel release 5.16 the fdinfo proc entry contains
a map_extra field which breaks CRIU parsing of bpfmap entries.

This commit adds the map_extra as a possible field to CRIU. The value of
map_extra is not passed to the kernel on restore as it does not seem to
be evaluated in the code paths CRIU restore is using for BPF.

This fixes CRIU CI using Fedora with 5.16.

See Linux commit 9330986c03006ab1d33d243b7cfe598a7a3c1baa
 "bpf: Add bloom filter map implementation"

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
56df8aeeb5 ci: skip MAP_HUGETLB tests in stream test
Currently, hugetlb mappings is not premapped so in the restore content phase, we
skip page read these pages, enqueue the iovec for later reading in restorer and
eventually close the page read. However, image-streamer expects the whole image
to be read and the image is not re-opened, sent twice. These MAP_HUGETLB test
cases will result in EPIPE error. Temporarily disable these test cases for now.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
2dc6d146b9 zdtm: Add MAP_HUGETLB mappings test for parent-child relationship processes
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
87a5694b4a zdtm: Add shm hugetlb test
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
d22e472cf2 zdtm: Add memfd hugetlb test
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
ffa2688969 zdtm: Add MAP_HUGETLB memory mapping test
This commit add a test for checkpoint/restore MAP_HUGETLB memory mappings.
A new zdtm helper get_mapping_dev() is added to get the device number of
the memory mapping.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
a26b692c42 uffd: Skip lazy-mode restore on hugetlb mappings
As hugetlb mappings are not premapped, they are not registered to uffd service
in restorer code. We must not mark these mappings as PPB_LAZY in generate_iovs()
otherwise when restoring content of these mappings, we will keep looking for in
uffd and get ENOENT because they are not registered.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
456e50b59b mem: Skip premapping hugetlb mapping
As we cannot use mremap() to move the hugetlb mapping around until Linux kernel
version 5.16, we need to skip premapping hugetlb mapping.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
8941b63a4b proc_parse, files: Add support for hugetlb memory mapping
When memfd can be used with hugetlb, we use memfd for checkpoint/restore
anonymous shared memory. Otherwise, map_files symlinks is used for
checkpoint/restore anonymous shared memory.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
e4fb1dd5f5 memfd, shmem: Add support for checkpoint/restore memfd and anon shared memory
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
4d77b19eb3 ipc: Add support for checkpoint/restore hugetlb System V shared memory
Attach the System V shared memory segments to the address space via shmat() to
determine if they are backed by hugetlb and their page size. Use these
information for setting the correct flags on restore.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
f69c365916 kerndat: Collect hugetlb device numbers
These numbers are used to determine whether a memory mapping is backed by
hugetlb and its page size.

As the hugepage can be allocated more after the first time we collect kerndat,
we need to collect the missing device numbers every time we load the kerndat
cache.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
9c7bbfa698 check: Add a check for using memfd with hugetlb
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
e8087fcff6 files: generate unique transport socket names
Transport socket names have to be unique for each criu run.

Fixes #1735 #1720

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
408a7d82d6 util: add an unique ID of the current criu run
This ID will be used to generate resource ID-s that can conflicts with
other CRIU processes.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Mike Rapoport
b13b95e522 compel: fix how PTRACE_GET_THREAD_AREA errors are handled
When PTRACE_GET_THREAD_AREA errors on kernels with
!CONFIG_IA32_EMULATION beacuse of missing support (-EIO), compel should
ignore uch errors in native mode.

However the check for error type uses return value of ptrace rather than
errno, which will always result in error propagation.

Use errno to detect type of error to fix this.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
b2ba14a15d restorer: Fix sys_mmap's returned value check
As we call mmap syscall directly, the returned value in error case is the error
number not -1 like in libc wrapper. Use IS_ERR for correct checking in error
case.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
7177938e60 criu-ns: use os.waitstatus_to_exitcode()
os.WEXITSTATUS() returns the process exit status and it should be used
only if WIFEXITED() is true, i.e., the process terminated normally.

os.waitstatus_to_exitcode() does the same as os.WEXITSTATUS() but it
also handles the case when the process has been terminated by a signal.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
bb1b1681ab criu-ns: fix exit code o for criu dump
Fixes: #1739

Reported-by: @PavloMykhailyshyn
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
fdf4fda200 pstree: when updating sid for shell job also update matching pgid
If we replace old_sid with current_sid we should also do same
replacement for matching pgid (=old_sid).

Reported in CRIU gitter by Younes Manton (@ymanton)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
89267dbcc8 ci: install libbsd dependency
The libbsd dependency is used to enable support for `setproctitle()`
and `strlcpy()`.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Ashutosh Mehra
48d53b6994 Fix formatting in criu documentation
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
73d6a2c0ee test/autofs: fix use-after-free
autofs.c:66:17: error: pointer 'str' may be used after 'realloc' [-Werror=use-after-free]

autofs.c: In function 'check_automount':
../lib/zdtmtst.h:131:9: error: pointer 'mountpoint' may be used after 'free' [-Werror=use-after-free]
  131 |         test_msg("ERR: %s:%d: " format " (errno = %d (%s))\n", __FILE__, __LINE__, ##arg, errno, strerror(errno))
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
autofs.c:277:17: note: in expansion of macro 'pr_perror'
  277 |                 pr_perror("%s: failed to close fd %d", mountpoint, p->fd);
      |                 ^~~~~~~~~
autofs.c:268:9: note: call to 'free' here
  268 |         free(mountpoint);
      |         ^~~~~~~~~~~~~~~~

Fixes: #1731

v2: (@Snorch) always update `str` after successful realloc()

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
4d31105c76 ci: set continue-on-error for cross-compile
Running cross compile tests with Debian unstable sometimes
fails due to missing or outdated packages.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Adrian Reber
0568889ee3 compel: fix parasite with GCC 12
Parasite creation started to fail with GCC 12:

On x86_64 with:
 ./compel/compel-host hgen -f criu/pie/restorer.built-in.o -o criu/pie/restorer-blob.h
 Error (compel/src/lib/handle-elf-host.c:337): Unexpected undefined symbol: `strlen'. External symbol in PIE?

On aarch64 with:
 ld: criu/pie/restorer.o: in function `lsm_set_label':
 /drone/src/criu/pie/restorer.c:174: undefined reference to `strlen'

Line 174 is: "for (len = 0; label[len]; len++)"

Adding '-ffreestanding' to parasite compilation fixes these errors
because, according to GCC developers:

"strlen is a standard C function, so I don't see any bug in that being used
unless you do a freestanding compilation (-nostdlib isn't that)."

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
db352ca488 criu: fix configuration file scanner with GCC 12
This fixes:

criu/config.c: In function ‘parse_statement’:
criu/config.c:232:43: error: the comparison will always evaluate as ‘true’ for the pointer operand in ‘*(configuration + (sizetype)((long unsigned int)i * 8)) + ((sizetype)offset + 1)’ must not be NULL [-Werror=address]
  232 |         if (configuration[i] + offset + 1 != 0 && strchr(configuration[i] + offset, ' ')) {
      |                                           ^~

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
bf6975c3e5 compel: fix GCC 12 failure (out of bounds)
This is a confusing change as it seems the original code was just wrong.
GCC 12 complains with:

In function ‘__conv_val’,
    inlined from ‘std_strtoul’ at compel/plugins/std/string.c:202:7:
compel/plugins/std/string.c:154:24: error: array subscript 97 is above array bounds of ‘const char[37]’ [-Werror=array-bounds]
  154 |                 return &conv_tab[__tolower(c)] - conv_tab;
      |                        ^~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/string.c: In function ‘std_strtoul’:
compel/plugins/std/string.c:10:19: note: while referencing ‘conv_tab’
   10 | static const char conv_tab[] = "0123456789abcdefghijklmnopqrstuvwxyz";
      |                   ^~~~~~~~
cc1: all warnings being treated as errors

Which sounds correct. The array conv_tab has just 37 elements.

If I understand the code correctly we are trying to convert anything
that is character between a-z and A-Z to a number for cases where
the base is larger than 10. For a base 11 conversion b|B should return 11.
For a base 35 conversion z|Z should return 35. This is all for a strtoul()
implementation.

The original code was:

  static const char conv_tab[] = "0123456789abcdefghijklmnopqrstuvwxyz";

  return &conv_tab[__tolower(c)] - conv_tab;

and that seems wrong. If conv_tab would have been some kind of hash it could
have worked, but '__tolower()' will always return something larger than
97 ('a') which will always overflow the array.

But maybe I just don't get that part of the code.

I replaced it with

  return __tolower(c) - 'a' + 10;

which does the right thing: 'A' = 10, 'B' = 11 ... 'Z' = 35

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
fu.lin
6be10a232d zdtm: fix zdtm/static/maps00 case in arm64
This case sometimes will cause SIGILL signal in arm64 platform.

<<ARM Coretex-A series Programmer's Guide for ARMv8-A>> notes:
  The ARM architecture does not require the hardware to ensure coherency
  between instruction caches and memory, even for locations of shared
  memory.

Therefore, we need flush dcache and icache for self-modifying code.

- https://developer.arm.com/documentation/den0024/a/Caches/Point-of-coherency-and-unification

Signed-off-by: fu.lin <fulin10@huawei.com>
2022-04-28 17:53:52 -07:00
Liu Hua
6cfad77f00 pagemap: tiny fix on truncating memory image
When requested iovs are huge, criu needs to invoke more then one
preadv()s. In this situation criu truncates memory image with
offset of first preadv() and length of last one, which leads
to leakage of memory image. This patch fixs truncating with right
offset and length.

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
908e5dd95d lib: added tests for feature check in libcriu
Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
b00b61f0e6 lib: introduce feature check in libcriu
This commit adds feature check support to libcriu. It already exists in
the CLI and RPC and this just extends it to libcriu.

This commit provides one function to do all possible feature checks in
one call. The parameter to the feature check function is a structure and
the user can enable which features should be checked.

Using a structure makes the function extensible without the need to
break the API/ABI in the future.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
4c4b2159de ci: added .lgtm.yml file
A couple of months (or years) ago I looked into lgtm.com for CRIU. Today
on a pull request I saw result from lgtm.com for the first time and it
failed. Not sure what triggered the lgtm.com message into the CRIU
repository, but with the .lgtm.yml file in this commit lgtm.com can
actually build CRIU.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
7f4265dc0b ci: update to latest Vagrant and Fedora images
Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
7400d91f83 contributing: remove old badges and logo
CI badges and logo are already present in the readme file.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
29e221bb7a readme: add docker test badge
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Nicolas Viennot
6f9d62eb38 ci: test criu-image-streamer with all tests
All the bugs that were in the way got fixed. We can enable all tests.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
8ec214d3cb mount/btrfs: make check_mountpoint_fd fallback to get_sdev_from_fd
We face that btrfs returns anonymous device in stat instead of real
superblock dev for volumes, thus all btrfs volume mounts does not pass
check_mountpoint_fd due to dev missmatch between stat and mountinfo. We
can use special helper get_sdev_from_fd instead of stat to try to get
real dev of fd for btrfs.

We move check_mountpoint_fd from open_mountpoint into get_clean_fd and
ns_open_mountpoint to the point where temporary mount we open fd to is
still in mountinfo, thus get_sdev_from_fd would be able to find tmp
mount in mountinfo.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
bbf5f642de proc_parse: add helper to resolve sdev from fd
New get_sdev_from_fd helper first gets mnt_id from fd using fdinfo and
then converts mnt_id to sdev using mountinfo.

By default mnt_id to sdev conversion only works for mounts in mntinfo.

If parse_mountinfo argument is true, will also parse current process
mountinfo when looking for mount sdev, this should be used only with
temporary mounts just created by criu in current mntns.

v3: add argument to parse self mountinfo for auxiliary mounts

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
15c42696cf mount: remove mnt_fd argument of __open_mountpoint
Only place where we used __open_mountpoint with non -1 mnt_fd is
open_mountpoint. Let's use check_mountpoint_fd for this case, so that we
now can remove mnt_id argument. Also now __open_mountpoint actually
always does open.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
1e7c62047a mount: split check_mountpoint_fd from __open_mountpoint
Now we can reuse "check" part separately in other places.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
1b912802dd zdtm/static/uffd-events: add more log messages
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
ebd03383f2 zdtm: print tails of all logs if a test has failed
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
f650987469 test: log testname.out.inprogress if a test has failed
This is required if the test failed by timeout.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
8775cf3a50 ci: reenable the lazy-thp test in the lazy-remote mode
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
c59abfa81a page-xfer: stop waiting for a new command after a close command
There is no reason to do that and in case of tls, __recv returns EAGAIN
instead of 0.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
13b726ebc3 tls: allow to terminate connections synchronously
GNUTLS_SHUT_RDWR sends an alert containing a close request and waits for
the peer to reply with the same message.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
73d1d07696 uffd: call disconnect_from_page_server to shutdown a page-server connection
We need to be sure that page-server doesn't wait for a new command when we
call gnutls_bye() that sends an alert containing a close request.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
4fdf3db31e tls: add more comments
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
5a2250b1af tls: use ssize_t for return value
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
89e8e8e697 tls: fix typo
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
0da88b6dad zdtm: Add SOCK_SEQPACKET variants to unix socket tests
This commit simply makes copies of SOCK_STREAM unix socket tests and uses
SOCK_SEQPACKET instead.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
530ad9c89d sk-unix: Add support for SOCK_SEQPACKET unix sockets
Adjust some SOCK_STREAM cases to handle SOCK_SEQPACKET too.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3d618d0f4d crtools: check that cpuinfo command has sub-command
This fixes segfault on empty sub-command for cpuinfo.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
233f1f1d0c crtools: use new opts.mode in image_dir_mode
Also while on it there is no "cpuinfo restore", let's remove it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3fa85bcdcc crtools/rpc: export current criu mode to opts.mode
We have multiple options which are valid only on restore or only on dump
or in any other specific criu mode, so it would be useful to have info
about current mode in opts so that we can validate other options against
current mode.

Plan is to use it for mount-v2 option as it is only valid on restore,
and this would make handling of different types mountpoints much easier.

Realization is a bit different for general code and rpc:

- When criu mode is set from main() we just parse mode from argv[optind]
just after parse_options() found optind of the command. Note that
opts.mode is available before check_options().

- For rpc service we reset opts.mode to CR_SWRK each time we restart
cr_service_work(), in the original service process we still have
CR_SERVICE to differentiate between them, and each request handling
function which does setup_opts_from_req sets opts.mode in accordance
with the processed request type. And it is also available before
check_options().

Now in check_options we can add filters on one mode only options.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
1b015df9ba crtools: remove excess always true condition
Several lines above if (optind >= argc) we go to usage label and fail,
thus we don't need to check (optind < argc) here as it is always true.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
be092e25a9 zdtm: remove mntns-deleted-dst test leftover from git
Looks like in commit [1] we've non-intentionally added this tmp file to
git, let's remove it.

Fixes: 01ee29702 ("s390:zdtm: Enable zdtm for s390") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
f92c7f1af8 zdtm: zdtm_ct fix compilation error with strict-prototypes on
zdtm_ct.c:44:12: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
   44 | static int create_timens()

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
e62e05c2d2 zdtm.py: clean up MAKEFLAGS env variable before running make instance
In most cases we run tests as:
./test/zdtm.py run -a

But it's also possible to run tests from root makefile:
make test

In this case, if criu tree have no ./test/umount2 binary
built we get the error like:
make[3]: *** No rule to make target 'umount2'. Stop.

It's worth to mention this "3". That's because we have
build process tree like this:
make -> make -> make -> zdtm.py -> make umount2
and also we have MAKEFLAGS variable set to:
build=-r -R -f ...

And that's bad because "-r" option means no builtin
rules and -R means no builtin variables. That makes
`make umount2` not working. Let's just cleanup this
variable to make things work properly.

Fixes: #1699
https://github.com/checkpoint-restore/criu/issues/1699

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
af4b265193 tests: added test for single pre-dump support
Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
51a1adbc03 libcriu: add single pre-dump support
In contrast to the CLI it is not possible to do a single pre-dump via
RPC and thus libcriu. In cr-service.c pre-dump always goes into a
pre-dump loop followed by a final dump. runc already works around this
to only do a single pre-dump by killing the CRIU process waiting for the
message for the final dump.

Trying to implement pre-dump in crun via libcriu it is not as easy to
work around CRIU's pre-dump loop expectations as with runc that directly
talks to CRIU via RPC.

We know that LXC/LXD also does single pre-dumps using the CLI and runc
also only does single pre-dumps by misusing the pre-dump loop interface.

With this commit it is possible to trigger a single pre-dump via RPC and
libcriu without misusing the interface provided via cr-service.c. So
this commit basically updates CRIU to the existing use cases.

The existing pre-dump loop still sounds like a very good idea, but so
far most tools have decided to implement the pre-dump loop themselves.

With this change we can implement pre-dump in crun to match what is
currently implemented in runc.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
119a798856 ci: disable glibc rseq support
This patch sets the glibc.pthread.rseq tunable [1] to disable rseq
support in glibc as a temporary solution for the problem described in
[2]. This would allow us to run CI tests until CRIU has rseq support.

This commit also disables the rpc tests as they fail even
when GLIBC_TUNABLES is set.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=e3e589829d16af9f7e73c7b70f74f3c5d5003e45
[2] https://github.com/checkpoint-restore/criu/issues/1696

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
9fd000c58d ci: use unstable release for cross-compile
We added cross-compile tests with testing debian release to be able to
replicate the error reported in #1653, however, installing build
dependencies in this release currently fails with the following error:

libc6-dev:armhf : Breaks: libc6-dev-armhf-cross (< 2.33~) but 2.32-1cross4 is to be installed

This is not something we can fix, therefore using the debian unstable
release (instead of testing) could be more reliable option for our CI.
This would still replicate the problem reported in #1653.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2022-04-28 17:53:52 -07:00
Adrian Reber
0e04a3c6a6 libcriu: add setting lsm-mount-context to libcriu
Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
af298353dc usernsd: UNS_FDOUT should not require an input descriptor
UNS_FDOUT means only that a userns call will return a file descriptor.

Signed-off-by: Andrei Vagin <avagin@google.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
efe5d9a12d Add documentation for --timeout option
The --timeout option was introduced in [1] to prevent criu dump from
being able to hang indefinitely and allow users to adjust the time limit
in seconds for collecting tasks during the dump operation.

[1] d0ff730

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
583e8ca055 ci: enable x86 xsave fault injection tests back
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
1ba443982a x86/compel/fault-inject: print the initial seed
Fixes: e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Alexander Mikhalitsyn
fc1eb01ff3 x86/compel/fault-inject: bound xsave features set
Since
e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")
we doing fault-injection test for C/R of threads register set by filling tasks
xsave structures with the garbage. But there are some features for which that's not
safe. It leads to failures like described in #1635

In this particular case we meet the problem with PKRU feature, the problem
that after corrupting pkru registers we may restrict access to some vma areas,
so, after that process with the parasite injected get's segfault and crashes.

Let's manually specify which features is save to fill with the garbage by
keeping proper XFEATURE_MASK_FAULTINJ mask value.

Fixes: e2e8be37 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")

https://github.com/checkpoint-restore/criu/issues/1635

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
6186bfa0c7 test: another try to correctly fix the kernel version
We try to disable time namespace based testing for kernels older than
5.11. But we fail to come up with the correct if condition.

This changes (major <= 5) to (major < 5). There are no kernels with
major > 5 so currently the time namespace based are never run. This
should finally change it to run time namespace based tests on kernel
versions newer than 5.10.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
d79d73e3a0 ci: install procps in Alpine
The version of ps in Alpine image by default is very limited.
It is based on the one from busybox and doesn't support options
such as '-p'.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
3eba68089e ci: Enable disabled unix socket related tests
As the unix socket broken tests have been fixed in the pull request

https://github.com/checkpoint-restore/criu/pull/1680

We re-enable these tests.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
94111596f8 sk-unix: Fix TCP_ESTABLISHED checks in unix sockets
Since commit 83301b5367a98 ("af_unix: Set TCP_ESTABLISHED for datagram sockets
too") in Linux kernel, SOCK_DGRAM unix sockets can have TCP_ESTABLISHED state
when connected. So we need to fix checks that assume SOCK_DRAM sockets cannot
have TCP_ESTABLISHED state.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
6930d6a32d util: make page-server IPv6 safe
The function run_tcp_server() was the last place CRIU was still using
the IPv4 only function inet_ntoa(). It was only used during a print, so
that it did not really break anything, but with this commit the output
is now no longer:

 Accepted connection from 0.0.0.0:58396

but correctly displaying the IPv6 address

 Accepted connection from ::1:58398

if connecting via IPv6.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
ianlang
d57f27bc94 files-reg: try dump_ghost_remap if link-remap failed with error ENOENT
An issue with dumping deleted reg files in overlayfs:

After deleting a file originated from lower layer in merged dir,
fstat() on the /proc/$pid/map_files symlink returns st_nlink=1, while
linkat() fails with errno ENOENT.

Signed-off-by: langyenan <ianlang@tencent.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
14075baf72 test: do not use --keep-going for single zdtm tests
Looking at CI logs there are often messages like:

 "[WARNING] Option --keep-going is more useful when running multiple tests"

This commit removes '--keep-going' from single zdtm test runs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
a52185ffe3 ci: disable broken tests until fixed
Broken tests are being tracked at

 * https://github.com/checkpoint-restore/criu/issues/1669
 * https://github.com/checkpoint-restore/criu/issues/1635

This also enables previously disabled BPF related tests:

 * https://github.com/checkpoint-restore/criu/issues/1354

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Salvatore Bonaccorso
4ab2facb24 make: Explicitly enable FPU on ARMv7 builds
Starting with gcc-11, Debian's armhf compiler no longer builds with
a default -mfpu= option. Instead it enables the FPU via an extension
to the -march flag (--with-arch=armv7-a+fp). criu's Makefile explicitly
passes its own -march=armv7-a setting, which overrides the +fp default,
so we end up with no FPU:

    cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
d514bacb40 ci: Run cross compile with debian testing
Debian testing has newer compiler version and running
cross compilation tests would allow us to catch any compilation
errors early.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
4c1330bb0c ci: Run cross compile on debian stable
The current debian stable release is Bullseye, not Buster. However, we
can use the 'stable' release instead. This would allow the CI to
automatically pick up updates in the future.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
718eb06be3 clang-format: disable wrong struct pointer declaration format
When we declare struct and at the same time declare variable pointer of
this struct type, it looks like clang-format threats "*" as a
multiplication operator instead of indirection (pointer declaration)
operator and puts spaces on both sides, which looks wrong.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
8580024838 zdtm: add ro-mount check after c/r to mntns_ghost01
This is a test for "ghost/mount: allocate remounted_rw in shmem to get
info from other processes" patch, without the patch test fails with:

  ############# Test zdtm/static/mntns_ghost01 FAIL at result check ##############
  Test output: ================================
  16:15:19.607:     5: ERR: mntns_ghost01.c:95: open for write on rofs -> 7 (errno = 11 (Resource temporarily unavailable))
  16:15:19.607:     4: FAIL: mntns_ghost01.c:121: Test died (errno = 11 (Resource temporarily unavailable))

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
17357d67fe files-reg: temporary remount writable the mount we do unlink on
Previousely I din't mention this case because we had bad error handling
in ghost cleanup path.

Without these patch but with proper error handling for unlink we have an
error in mntns_ghost01 test:

Error (criu/files-reg.c:2269): Failed to unlink the remap file:
Read-only file system

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/151c859e1

Changes: check lookup_mnt_id return for NULL

Fixes: fd0a3cd9ef ("mount: remount ro mounts writable before
ghost-file restore")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
bd219b69a6 ghost/mount: allocate remounted_rw in shmem to get info from other processes
Previousely remounted_rw was not shared between all processes on
restore, thus cleanup didn't got this info from rfi_remap and these
mounts were wrongly left writable after restore.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/3a1a592e7

Fixes: fd0a3cd9ef ("mount: remount ro mounts writable before
ghost-file restore")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
cfed6f35e1 files-reg: fix error handling of rm_parent_dirs
If unlinkat fails it means that fs is in "corrupted" state - spoiled
with non-unlinked auxiliary directories.

While on it add fixme note as this function can be racy and BUG_ON if
path contains double slashes.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/b7b4e69fd

Changes: simplify while loop condition, remove confusing FIXME, remove
excess !count check in favour of while loop condition check

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
5a0943c900 files-reg: fix error handling in open_path
1) On error paths need to close fd and unlock mutex.
2) Make rfi_remap return special return code to identify EEXIST from
   linkat_hard, all other errors should be reported up.
3) Report unlinkat error as criu should not corrupt fs.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/fe1d0be14

Changes: use close_safe(), fix order in "Fake %s -> %s link" error
message.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Nicolas Viennot
64b58b5141 check: cleanup child processes
Always wait() for forked child processes. It avoid zombie processes in
containers that don't have an init process reaping orphans.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
156cce78c5 ci: switch to centos-stream-8
CentOS 8 goes EOL at the end of 2021. This switches our CentOS 8 based
tests to CentOS Stream 8 which should be supported until 2024.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Liu Hua
c2fd819039 crtools: ignore SIGPIPE in swrk mode
Criu ignores SIGPIPE in most cases except swrk mode. And in the
following situtation criu get killed by SIGPIPE and have no chance
to do cleanup: Connection to page server is lost when we do disk-less
migration, criu send PS_IOV_FLUSH via a broken connction in
disconnect_from_page_server.

This patch let criu ignore SIGPIPE in all paths .

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
a491706cce ci: Use latest Fedora for lint ci runs again
Now when we fixed clang-format complains in zdtm, let's switch to lates
clang-format available. This is effectively a revert of commit 07a2f0265
("ci: use Fedora 34 for lint CI runs").

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Liu Hua
eb0dee4082 seize: restore cgroup freezer to right state
The new freezer_state is a complete equivalent of old freezer_thawed
except for the initial value. If old freezer_thawed was not initialized
it was 0 and in freezer_restore_state were threated as if we need to
freeze cgroup "back", thus before this patch if criu dump failed before
freezing dumpee, criu always freeze dumpee in cr_dump_finish which is
wrong. Switching to freezer_state initialized with FREEZER_ERROR fixes
the problem.

v2: improve description, rename to origin_freezer_state

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
781676f102 clang-format/zdtm: fix clang complains about strange elseifs
Clang-format v13 on my Fedora 35 complains about these hunks, more over
reading the formating we had before is a pain:

} else /* comment */
	if (smth) {
	fail("")
	return -1;
}

Let's make explicit {} braces for else, this way it looks much better.

Fixes: 93dd984ca ("Run 'make indent' on all C files")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Nicolas Viennot
d2b6faf8fd tests: improve the deterministic behavior of the test suite
Various I/O objects are unclosed when the object falls out of scope.
This can lead to non-deterministic behavior.

Also fixed a few missing list(). It doesn't play way with python3.
e.g., `random.shuffle(filter(...))` doesn't work.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
94092ce002 zdtm.py: make tests with --link_remap exclusive
We see that tests mntns_ghost01 and unlink_fstat03 can run
simultaneousely and thus the former sees leftover link_remap.* files in
the test directory created by the latter, and the latter is still
running so it's ok to have link_remap.* at this point.

Let's implicitly make all --link-remap tests exclusive (not running in
parallel).

Fixes: #1633

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
1f9e2c4204 ci: disable socket-raw test on centos8
We see error in centos8 ci on restore of socket-raw test:

  inet: \tRestore: family AF_INET type SOCK_RAW proto 66
  port 66 state TCP_CLOSE src_addr 0.0.0.0
  Error (criu/sk-inet.c:834): inet: Can't create inet socket:
  Protocol not supported

Centos 8 kernel replaces IPPROTO_MPTCP(262) with "in-kernel" value
IPPROTO_MPTCP_KERN(66) on inet_create(), but later shows this inkernel
value to criu when listing sockets info. Same code in inet_create()
returns EPROTONOSUPPORT on the attempr to create socket with
IPPROTO_MPTCP_KERN. So this ci error is completely rh8 kernel related.
Kernel should not show "in-kernel" value to userspace. But anyway this
is already changed in Centos 9 kernel, so we can just skip socket-raw
test on Centos 8.

v2: use cirrus.yml

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
a9d9fb8aa5 clang-format: make x86_ins_capability_mask human-readable
There is no option in clang not to merge as much binary operands as it
fits in column limit, but here we need each bit on new line to make it
readable, so let's disable clang-format for x86_ins_capability_masks.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Liu Hua
4ff252656d cr-dump: fail dumping when zombie process with sid 0
A zombie process with 0 sid has a session leader in
outer pidns and has ignored SIGHUP. Criu has no idea
to restore this type of process, so fail the dumpping.

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
26db7adbbb clang-format: do automatic comment fixups
Result of `make indent` after enabling AlignTrailingComments.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2064793225 clang-format: do several manual comment fixups
Automatic AlignTrailingComments fails to make those comments look right,
so let's do it manually, so that they both satisfy AlignTrailingComments
and also are human-readable.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
bbfd9031a3 clang-format: enable AlignTrailingComments
Code becomes much more human-readable after enabling it.

Example 1:

Before:
```
struct file_desc {
	u32 id; /* File id, unique */
	struct hlist_node hash; /* Descriptor hashing and lookup */
	struct list_head fd_info_head; /* Chain of fdinfo_list_entry-s with same ID and type but different pids */
	struct file_desc_ops *ops; /* Associated operations */
	struct list_head fake_master_list; /* To chain in the list of file_desc, which don't
						    have a fle in a task, that having permissions */
};
```
After:
```
struct file_desc {
	u32 id;                            /* File id, unique */
	struct hlist_node hash;            /* Descriptor hashing and lookup */
	struct list_head fd_info_head;     /* Chain of fdinfo_list_entry-s with same ID and type but different pids */
	struct file_desc_ops *ops;         /* Associated operations */
	struct list_head fake_master_list; /* To chain in the list of file_desc, which don't
					    * have a fle in a task, that having permissions */
};
```

Example 2:

Before:
```
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
	FSCONFIG_SET_STRING = 1, /* Set parameter, supplying a string value */
	FSCONFIG_SET_BINARY = 2, /* Set parameter, supplying a binary blob value */
	FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */
	FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */
	FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */
	FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */
	FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */
};
```
After:
```
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 0,        /* Set parameter, supplying no value */
	FSCONFIG_SET_STRING = 1,      /* Set parameter, supplying a string value */
	FSCONFIG_SET_BINARY = 2,      /* Set parameter, supplying a binary blob value */
	FSCONFIG_SET_PATH = 3,        /* Set parameter, supplying an object by path */
	FSCONFIG_SET_PATH_EMPTY = 4,  /* Set parameter, supplying an object by (empty) path */
	FSCONFIG_SET_FD = 5,          /* Set parameter, supplying an object by fd */
	FSCONFIG_CMD_CREATE = 6,      /* Invoke superblock creation */
	FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */
};
```

Example 3:

Before:
```
	ret = libnet_build_tcp(ntohs(sk->dst_addr->v4.sin_port), /* source port */
			       ntohs(sk->src_addr->v4.sin_port), /* destination port */
			       data->inq_seq, /* sequence number */
			       data->outq_seq - data->outq_len, /* acknowledgement num */
			       flags, /* control flags */
			       data->rcv_wnd, /* window size */
			       0, /* checksum */
			       10, /* urgent pointer */
			       LIBNET_TCP_H + 20, /* TCP packet size */
			       NULL, /* payload */
			       0, /* payload size */
			       l, /* libnet handle */
			       0); /* libnet id */
```
After:
```
	ret = libnet_build_tcp(ntohs(sk->dst_addr->v4.sin_port), /* source port */
			       ntohs(sk->src_addr->v4.sin_port), /* destination port */
			       data->inq_seq,                    /* sequence number */
			       data->outq_seq - data->outq_len,  /* acknowledgement num */
			       flags,                            /* control flags */
			       data->rcv_wnd,                    /* window size */
			       0,                                /* checksum */
			       10,                               /* urgent pointer */
			       LIBNET_TCP_H + 20,                /* TCP packet size */
			       NULL,                             /* payload */
			       0,                                /* payload size */
			       l,                                /* libnet handle */
			       0);                               /* libnet id */
```

Example 4:

Before:
```
static struct testcase __testcases[] = {
	{ 2, 1, 2, 1, 2, 1 }, /* session00                      */
	{ 4, 2, 4, 2, 4, 1 }, /*  |\_session00          */
	{ 15, 4, 4, 4, 15, 1 }, /*  |  |\_session00             */
	{ 16, 4, 4, 4, 15, 1 }, /*  |   \_session00             */
	{ 17, 4, 4, 4, 17, 0 }, /*  |  |\_session00             */
	{ 18, 4, 4, 4, 17, 1 }, /*  |   \_session00             */
	{ 5, 2, 2, 2, 2, 1 }, /*  |\_session00          */
	{ 8, 2, 8, 2, 8, 1 }, /*  |\_session00          */
	{ 9, 8, 2, 2, 2, 1 }, /*  |   \_session00               */
	{ 10, 2, 10, 2, 10, 1 }, /*  |\_session00               */
	{ 11, 10, 11, 2, 11, 1 }, /*  |    \_session00          */
	{ 12, 11, 2, 2, 2, 1 }, /*  |        \_session00        */
	{ 13, 2, 2, 2, 2, 0 }, /*   \_session00         */
	{ 3, 13, 2, 2, 2, 1 }, /* session00                     */
	{ 6, 2, 6, 2, 6, 0 }, /*   \_session00          */
	{ 14, 6, 6, 6, 6, 1 }, /* session00                     */
};
```
After:
```
static struct testcase __testcases[] = {
	{ 2, 1, 2, 1, 2, 1 },     /* session00                  */
	{ 4, 2, 4, 2, 4, 1 },     /*  |\_session00              */
	{ 15, 4, 4, 4, 15, 1 },   /*  |  |\_session00           */
	{ 16, 4, 4, 4, 15, 1 },   /*  |   \_session00           */
	{ 17, 4, 4, 4, 17, 0 },   /*  |  |\_session00           */
	{ 18, 4, 4, 4, 17, 1 },   /*  |   \_session00           */
	{ 5, 2, 2, 2, 2, 1 },     /*  |\_session00              */
	{ 8, 2, 8, 2, 8, 1 },     /*  |\_session00              */
	{ 9, 8, 2, 2, 2, 1 },     /*  |   \_session00           */
	{ 10, 2, 10, 2, 10, 1 },  /*  |\_session00              */
	{ 11, 10, 11, 2, 11, 1 }, /*  |    \_session00          */
	{ 12, 11, 2, 2, 2, 1 },   /*  |        \_session00      */
	{ 13, 2, 2, 2, 2, 0 },    /*   \_session00              */
	{ 3, 13, 2, 2, 2, 1 },    /* session00                  */
	{ 6, 2, 6, 2, 6, 0 },     /*   \_session00              */
	{ 14, 6, 6, 6, 6, 1 },    /* session00                  */
};
```

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
718f4cae2c zdtm: make sock_opts02 also check lock change by SO_*BUF*
Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
3a875cc4c7 zdtm: add test for socket buffer size locks
Just set all possible values 0-3 and chack if it persists.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
e69be16db7 sockets: c/r bufer size locks
When one sets socket buffer sizes with setsockopt(SO_{SND,RCV}BUF*),
kernel sets coresponding SOCK_SNDBUF_LOCK or SOCK_RCVBUF_LOCK flags on
struct sock. It means that such a socket with explicitly changed buffer
size can not be auto-adjusted by kernel (e.g. if there is free memory
kernel can auto-increase default socket buffers to improve perfomance).
(see tcp_fixup_rcvbuf() and tcp_sndbuf_expand())

CRIU is always changing buf sizes on restore, that means that all
sockets receive lock flags on struct sock and become non-auto-adjusted
after migration. In some cases it can decrease perfomance of network
connections quite a lot.

So let's c/r socket buf locks (SO_BUF_LOCKS), so that sockets for which
auto-adjustment is available does not lose it.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
ce5ce285a8 kerndat: check for set/getsockopt SO_BUF_LOCK availability
This is a new kernel feature to let criu restore sockets with kernel
auto-adjusted buffer sizes.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
2bd709664f sockets: don't call sk_setbufs asyncronously
We want to also c/r socket buf locks (SO_BUF_LOCKS) which are also
implicitly set by setsockopt(SO_{SND,RCV}BUF*), so we need to order
these two properly. That's why we need to wait for sk_setbufs to finish.
And there is no much point in seting buffer sizes asyncronously anyway.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Nicolas Viennot
37a8090d8c tests: improve the image streamer process control
When exceptions are raised during testing, the image streamer process
should be terminated as opposed to being left hanging.
This could lead to the whole test suite to be left hanging as it waits
for all child processes to exit.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
dae0704b61 ci: use Fedora 34 for lint CI runs
Fedora 35 comes with clang 13 which provides different results for
clang-format than clang 12 in Fedora 34.

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
f7bc3bdc91 ci: fix userfaultfd test failures
Newer kernels (5.11) require echo 1 > /proc/sys/vm/unprivileged_userfaultfd

Without the 'echo 1' the kernel prints a message like this:

 uffd: Set unprivileged_userfaultfd sysctl knob to 1 if kernel faults must be handled without obtaining CAP_SYS_PTRACE capability

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Adrian Reber
d17eb325c6 ci: replace deprecated codecov bash uploader
Replace deprecated codecov bash uploader with new version:

https://about.codecov.io/blog/introducing-codecovs-new-uploader/

Signed-off-by: Adrian Reber <areber@redhat.com>
2022-04-28 17:53:52 -07:00
Nicolas Viennot
c1659c386d net: optimize restore_rule() to not open the CR_FD_RULE image file twice
Previously, `open_image(CR_FD_RULE, O_RSTR, pid)` was called twice.
Opening an image file twice is not allowed when streaming the image.
This commit optimizes the code to only open the image file once.

Also improved the error path in restore_ip_dump().

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
e3a853ab99 criu-ns: make pidns init first do setsid
We see that on criu-ns dump/restore/dump of the process which initially
was not a session leader (with --shell-job option) we see sid == 0 for
it and fail with something like:

Error (criu/cr-dump.c:1333): A session leader of 41585(41585) is outside of its pid namespace

Note: We should not dump processes with sid 0 (even with --shell-job) as
on restore we can can put such processes from multiple sessions into
one, which is wrong.

Fixes: #232

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
c750e62cac util: use nftw in rmrf helper
This simplifies the code by removing excess recursion and reusing
standard function to walk over file-tree instead of opencoding it.

This addresses problem mentioned in my review comment:
https://github.com/checkpoint-restore/criu/pull/1495#discussion_r677554523

Fixes: 0db135ac4 ("util: add rm -rf function")

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
fu.lin
485a83c110 tty: fix the null pointer of get_tty_driver
v2: split error checking from index variable initialization
v3: use PRIx64 for printing dev_t

Signed-off-by: fu.lin <fulin10@huawei.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Pavel Tikhomirov
7ba4d3bf1d pie/restorer: remove excess hash printf specifier
We use here "%#x" printf specifier in pie code, but sbuf_printf core pie
printing function knows nothing about '#' specifier. More over simple
"%x" in pie does same as "%#x" in stdio printf, see print_hex* functions
add "0x" before hex numbers.

We've got this error on vzt-cpt runs in Virtuozzo:

(04.750271) pie: 158: Adjust id
Error: Unknown printf format %#

So to fix it we can just remove '#'.

Fixes: ecd432fe2 ("timerfd: Implement c/r procedure")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
bffaa7d072 ci: enable coredump tests
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
bf8382a800 make: enable lint for coredump
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
8aa7694558 test/coredump: fix shellcheck errors
ShellCheck reports the following problems:

SC2086: Double quote to prevent globbing and word splitting.
SC2035: Use ./*glob* or -- *glob* so names with dashes won't become options.
SC1091: Not following: ../env.sh was not specified as input (see shellcheck -x).

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
0b3cf5c9e5 coredump: lint fix visually indented line
Continuation line over-indented for visual indent
https://www.flake8rules.com/rules/E127.html

Visually indented line with same indent as next logical line
https://www.flake8rules.com/rules/E129.html

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
3a689ed9a6 coredump: fix comparison to true
Comparison to true should be 'if cond is true:' or 'if cond:'
https://www.flake8rules.com/rules/E712.html

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
c1eab7d06a coredump: fix too many blank lines
https://www.flake8rules.com/rules/E303.html

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
baad88d5ec coredump: fix missing whitespace around operator
Missing whitespace around arithmetic operator
https://www.flake8rules.com/rules/E226.html

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
579066633b coredump: lint fix for block comments
Block comment should start with '# '
https://www.flake8rules.com/rules/E265.html

Inline comment should start with '# '
https://www.flake8rules.com/rules/E262.html

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
06306c8b1e coredump: drop exec permission
The shebang line in this file was removed in a previous commit and the
file should be non-executable.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
1b368238b5 coredump: drop unused variable
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
a92a7887a6 python: replace equality with identity test
PEP8 recommends for comparisons to singletons like None to always be
done with 'is' or 'is not', never the equality operators.

https://python.org/dev/peps/pep-0008/#programming-recommendations

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
c71a81a6bd coredump: convert indentation to spaces
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
bf8a3c9f62 coredump: sort imports
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
a0b738cb8f coredump: remove unused import
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
AndreyVV-100
1c866dbb51 Add new files for running criu-coredump via python 2 or 3
Previous commit added support for python3 in criu-coredump. For convenience,
add two files (coredump-python2 and coredump-python3) that start
criu-coredump with respective python version. Edit env.sh accordingly.

Signed-off-by: Andrey Vyazovtsev <viazovtsev.av@phystech.edu>
2022-04-28 17:53:52 -07:00
Andrey Vyazovtsev
3180d35fa4 Add support for python3 in criu-coredump
Resolve the following python3 portability issues:

1) Python 3 needs explicit relative import path.

2) Coredumps are binary data, not unicode strings. Use byte strings
(b"" instead of "") and open files in binary format.

3) Some functions (for example: filter) return a list in python 2,
but an iterator in python 3. Port code to a common subset of python 2
and python 3 using itertool.

4) Division operator / changed meaning in Python 3. Use explicit
integer division (//) where appropriate.

Signed-off-by: Andrey Vyazovtsev <viazovtsev.av@phystech.edu>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
f24360658f criu(8): Add more detailed description about --tcp-close dump option
The expected behavior of --tcp-close option when dumpping is to close
all established tcp connections including connection that is once
established but now closed. This adds an explicit description about
that behavior.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
abf6b15c14 zdtm: Dumping/restoring with --tcp-close on TCP_CLOSE socket
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Bui Quang Minh
7959730551 tcp: Skip restoring TCP state when dumping with --tcp-close
Since commit e42f5e0 ("tcp: allow to specify --tcp-close on dump"),
--tcp-close option can be used when checkpointing. This option skips
checkpointing established socket's state (including once established
but now closed socket). However, when restoring, we still try to
restore closed socket's state. As a result, a non-existent protobuf
image is opened.

This commit skips TCP_CLOSE socket when restoring established TCP
connection and removes the redundant check for TCP_LISTEN socket as
TCP_LISTEN socket cannot reach this function.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Radostin Stoyanov <radostin@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
74d1233b59 criu/files: Don't cache fd ids for device files
Restore operation fails when we perform CR operation of multiple
independent proceses that have device files  because criu caches
the ids for the device files with same mnt_ids, inode pair. This
change ensures that even in case of a cached id found for a device, a
unique subid is generated and returned which is used for dumping.

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
7b6239b6dd criu/plugin: Implement dummy amdgpu plugin hooks
This is just a placeholder dummy plugin and will be replaced by a proper
plugin that implements support for AMD GPU devices. This just
facilitates the initial pull request and CI build test trigger for early
code review of CRIU specific changes. Future PRs will bring in more
support for amdgpu_plugin to enable CRIU with AMD ROCm.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
2022-04-28 17:53:52 -07:00
Rajneesh Bhardwaj
17e2a8c709 criu: Introduce new device file plugin hooks
Currently CRIU cannot handle Checkpoint Restore operations when a device
file is involved in a process, however, CRIU allows flexible extensions
via special plugins but still, for certain complex devices such as a GPU,
the existing hooks are not sufficient. This introduces few new hooks
that will be used to support Checkpoint Restore operation with AMD GPU
devices and potentially to other similar devices too.

 - HANDLE_DEVICE_VMA
 - UPDATE_VMA_MAP
 - RESUME_DEVICES_LATE

 *HANDLE_DEVICE_VMA:
	Hook to detect a suitable plugin to handle device file VMA with
	PF | IO mappings.

 *UPDATE_VMA_MAP:
	Hook to handle VMAs during a device file restore.

	When restoring VMAs for the device files, criu runs sys_mmap in
	the pie restore context but the offsets and file path within a
	device file may change during restore operation so it needs to be
	adjusted properly.

 *RESUME_DEVICES_LATE:
	Hook to do some special handling in late restore phase.

	During criu restore phase when a device is getting restored with
	the help of a plugin, some device specific operations might need
	to be delayed until criu finalizes the VMA placements in address
	space of the target process. But by the time criu finalizes this,
	its too late since pie phase is over and control is back to criu
	master process. This hook allows an external trigger to each
	resuming task to check whether it has a device specific operation
	pending such as issuing an ioctl call? Since this is called from
	criu master process context, supply the pid of the target process
	and give a chance to each plugin registered to run device
	specific operation if the target pid is valid.

A future patch will add consumers for these plugin hooks to support AMD
GPUs.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
2022-04-28 17:53:52 -07:00
Radostin Stoyanov
dd46e79196 criu(8): add --external net option
Support for external net namespaces has been introduced with
commit c2b21fbf (criu: add support for external net namespaces).

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2022-04-28 17:53:52 -07:00
Andrei Vagin
be239109a8 github: update the stale version
https://github.com/actions/stale/issues/708
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2022-04-06 10:57:20 -07:00
Adrian Reber
4a1731891e criu: Version 3.16.1
* Switch criu-ns from unversioned 'python' to 'python3'
   for easier distribution packaging
 * Add '--join-ns' interface to libcriu to allow joining
   namespaces via libcriu like CLI and RPC already allow

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-10-13 22:44:30 -07:00
Radostin Stoyanov
62b3779574 Makefile: add shellcheck test/others/libcriu/*.sh
Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
59d0dfba96 test/libcriu: print logs on fail
run_test was trying to read criu logs on build failure
instead of runtime error.

This patch also removes the unnecessary subfolder with name "i"
and resolves some of issues reported by shellcheck.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
53bf82bcfc test/libcriu: add test case for join-ns
This test case aims to verify that CRIU correctly
restores a process in IPC, UTS and Time namespaces
with criu_join_ns_add() libcriu API.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
a8c5efe4c1 libcriu: define log level constants
Replace magic numbers used to set log level in libcriu
with constants.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
5ec2a6aaad libcriu: add join_ns API
In runc we use the join-ns RPC API to enable checkpoint/restore of
containers with shared namespaces. Shared namespaces are often used
when containers run inside Kubernetes Pod.

In crun we use libcriu to interface with CRIU, however it currently
doesn't provide an API for join-ns. This patch adds the necessary
libcriu API to enable checkpoint/restore of containers with shared
namespaces with crun.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
f2cdb062aa Makefile: install criu-ns only with python3
Python 2 has been deprecated since January 1, 2020 and linux distributions
already support Python 3. Thus, to simplify maintenance and packaging
we could support criu-ns as Python 3 only.

v2: Add a message for criu-ns installation

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Radostin Stoyanov
a15a63fce0 criu-ns: change python shebang to python3
PEP 394 recommends changing python shebangs to python3 when Python 3.x
is supported. This is similar to `crit-python3`.

https://www.python.org/dev/peps/pep-0394/

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-10-12 12:58:43 -07:00
Adrian Reber
000ea82669 criu: Version 3.16
Amongst a huge number of fixes all over the place and the move away from
Travis this release introduces:

* better support for restoring containers into existing pods
* pidfd based pid reuse detection for RPC clients
* allow restoring of precreated veth devices
* license change for all files in the images/ directory to MIT
* criu-ns helper script
* use clang-format for automatic code indentation
* support checkpoint/restore of stacked apparmor profiles
* [GSoC] Add nftables based network locking/unlocking (Zeyad Yasser)

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-22 08:22:46 -07:00
Radostin Stoyanov
8567a09524 ci: Update openj9 container images
The AdoptOpenJDK docker images have been deprecated in favor of
eclipse-temurin.

https://github.com/AdoptOpenJDK/openjdk-docker/pull/604

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-09-22 08:21:06 -07:00
Nicolas Viennot
0b2a7223bb mount: fix double-dump file system bug
In the function dump_one_fs(), there's a comment that says "mnt_bind is
a cycled list, so list_for_each can't be used here." That means that the
list head of the list is also a node of the list.

The subsequent list_for_each_entry() marks all the mount info nodes as
dumped, except it skips the list head, which is also a mount info.
This is the bug we fix.

This bug made CRIU dump a file system twice.
See https://github.com/checkpoint-restore/criu-image-streamer/issues/8

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-17 10:42:21 -07:00
Radostin Stoyanov
bea9580e3c gitignore: add build directory
The build directory is generated when running make install.

build/ was initially added to gitignore with commit 967797a (Add build
directory to gitignore) and it was accidentally removed.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-09-17 10:42:21 -07:00
Radostin Stoyanov
4db8ef15ce podman-test: use crun from git repository
This patch allows to test the integration of libcriu
with the upstream crun.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-09-17 10:42:21 -07:00
Radostin Stoyanov
6a15dbdefa lib: install images/rpc.pb-c.h
Since commit 1c25914 compiling crun with libcriu also requires
/usr/include/criu/rpc.pb-c.h

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
2021-09-17 10:42:21 -07:00
Pavel Tikhomirov
c6b5e7d923 sk-unix: fix prep_unix_sk_cwd root and cwd restoring
If we save root and cwd after we've already switched to mntns we save
different root and cwd from what we had before prep_unix_sk_cwd, we just
save default root/cwd for new mntns... Let's fix it by proper order.

Also while on it lets fix ns_fd leak on switch_ns_by_fd error path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-17 10:42:21 -07:00
Pavel Tikhomirov
f0e968ffed binfmt_misc: restore current work directory after restoring mnt ns
Else criu would change cwd while binfmt_misc leftover mount cleanup.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-17 10:42:21 -07:00
Pavel Tikhomirov
776f3cff7c autofs: restore current work directory after restoring mnt ns
Else criu would change cwd while dumping autofs mounts.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-17 10:42:21 -07:00
Liu Hua
45409c35d6 mount: use swich_mnt_ns/restore_mnt_ns helpers to simplify code
Signed-off-by: Liu Hua <weldonliu@tencent.com>
2021-09-17 10:42:21 -07:00
Liu Hua
f79d15c44d binfmt_misc: restore current work directory after restoring mnt ns
Restore current work directory after restoring mnt ns, otherwise
"stats-dump" will been written to /

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2021-09-17 10:42:21 -07:00
Liu Hua
eea63587e6 namespaces: add helpers to switch/restore mnt ns
When restoring mnt, we should restore current work directory.

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2021-09-17 10:42:21 -07:00
liuchao173
41f4489682 remove tls parameter description if without GnuTLS support
Signed-off-by: SuperSix173 <liuchao173@huawei.com>
2021-09-17 10:42:21 -07:00
Zeyad Yasser
d879220996 kerndat: create separate netns for has_nftables_concat check
There was a race window in kerndat_has_nftables_concat(void) when
two or more instances of CRIU are running in parallel.
One instance would create the table normally, and another would
fail when trying to create a table with the same name and wrongly
set kdat.has_nftables_concat = false, and will save it to kerndat
cache.

A separate netns is created to avoid table collisions.

v2: use call_in_child_process helper instead of fork

Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
aa772bf286 zdtm: fix network lock tests when run with --norst
In test/jenkins/{crit.sh,criu-dump}, ZDTM is run with --norst,
Causing tests to only go through dump wihtout restoring.

The network locking tests are highly dependant on dump/restore hooks
causing them to hang when run with --norst.

We just add a reqrst flag to all network lock tests.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
9838d34ded criu: use unique table names for nftables based locking
During network locking CRIU creates an nftables table to
add needed rules. If more than one instance of CRIU run
in parallel, those tables' names would conflict.

Solution is to append root task pid to the nftables table
name as a postfix (e.g. inet CRIU-3231).

We also need to use `create table` instead of `add table`
because using `create` returns an error in case table name
already exists so we could detect conflicts if they happen.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
ca3e3c50be inventory: save network lock method to reuse in restore
When the network is locked using a specific method like iptables
or nftables there is no need to require passing the same method
during restore.

We save the lock method during dump in the inventory image and
use that in restore.

This always overwrites the restore --network-lock option.

v2: store opts.network_lock_method directly to avoid dependency
    on rpc.proto's 'enum criu_network_lock_method'.
v3: fall back to iptables if image is generated with an older
    version of CRIU.
v4: remove --network-lock from netns_lock_* from restore

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
cd1570b15e zdtm: add ipv6 variants of net_lock_socket_* tests
v2: remove unnecessary elif and else after return in
    wait_server_addr()
v3: use IOError instead of FileNotFoundError for python2
    compatibility

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
212db1d9a6 zdtm: add nftables per-socket locking test
This is just a symlink to the original static/net_lock_socket_iptables
test with the right options passed to use nftables instead.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
826d3d7407 criu: add nftables connection locking/unlocking
This adds nftables based connection locking as an alernative
for iptables. This avoid the external dependency of
iptables-restore.

It works by creating a 'connection set', which is a set of
connection identifying tuples. Rules are added to drop packets that
match the connection tuples in the set. Locking is now reduced to
just adding the connection identifying tuple to the set.

Unlocking is just as simple as deleteing the CRIU table.

v2: split ip string conversion into two if conditions
v3: add better message when CRIU is build without libnftables support
v4: fix indentation in nftables_lock_connection_raw()
v5: move 'ret = -1' below err: lable to avoid redundancy
v6: add better error message on lock failure
v7: run make indent

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
6e59b2bd77 zdtm: add iptables per-socket locking test
When criu dumps a process with --tcp-established opt it locks
the open tcp connections so that no packets from peer enters
the stack, otherwise RST will be sent by a kernel causing the
connection to fail.

Post-start hook creates a connection with the test server and
creates a background thread that stays alive for the duration
of the test. This background thread sends data to the test
server at three stages:
- Pre-dump: Should send normally
- Pre-restore:
	If connection is locked properly, packets will be dropped
	and TCP will just retry, which will eventually be sent when
	the process is restored and the network is unlocked.
- Post-restore: Should send normally

Data sent at the three stages is then checked at the server's side.

v2:
	- remove unused imports and constants
	- delete sync file in wait_sync_file() instead of --clean
v3:
	- add comments

Co-developed-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
c15327656a zdtm: add nftables network namespace locking test
This is just a symlink to the original static/netns_lock test with
the right options passed to use nftables instead.

v2:
	- make static/netns_lock test iptables explicitly
	- prevent netns_lock tests from running in parallel because
	  netns & sync files creation were conflicting in both tests.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
19cc0bfa65 criu: add nftables netns-wide locking/unlocking
This adds nftables based internal network locking as an
alernative for iptables. This avoid the external dependency
of iptables-restore.

v2: fix indentation & rename 'free' lable to 'out'
v3: add better message when CRIU is build without libnftables support
v4:
	- move 'ret = -1' below err: lable to avoid redundancy
	- fix nft ctx memory leak in case of success in
	  nftables_network_unlock()
v5: add better error message on lock failure

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
f246ca56c0 criu: rename iptables network locking/unlocking functions
Related to the new --network-lock option, other methods for network
locking/unlocking will be added as an alternative to iptables like
nftables.

This option is used in the core network locking/unlocking hooks to
decide which method should be used, making it easier to add new
methods later smoothly.
i.e.
	- network_lock_internal
	- network_unlock_internal
	- lock_connection (renamed from nf_lock_connection)
	- unlock_connection (renamed from nf_unlock_connection)
	- unlock_connection_info (renamed from unlock_connection_info)

nf_* functions are renamed to iptables_* to avoid confusion with
other netfilter methods in the future like nftables.

v2: run make indent
v3: make error messages more descriptive

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
e9d24a2ba3 cr-check: add check for nftables based network locking
Nftables based network locking/unlocking will be added later.

Nftables sets will be used to load the connection tuples that
will be locked, to be able to store those tuples we need to
check "Set Concatenations" support.

https://wiki.nftables.org/wiki-nftables/index.php/Concatenations

v2: fix 'has_nftables_concat=true' when adding CRIU table fails
v3: add better message when CRIU is build without libnftables support
v4: run make indent

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
b85fad797c cr-service: add network_lock option to RPC and libcriu
v2: run make indent

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
2e30db5c3d criu: add --network-lock option to allow nftables alternative
This adds the option to choose the networking locking method.

CRIU currently uses iptables-restore cli for network locking/unlocking
but nftables support will be added later.

There have been reports from users that iptables-restore fails in some
way and an nftables based approach using libnftables could avoid this
external dependency.

v2: remove dependency details in man page for --network-lock.
v3: remove --network-lock from restore section in docs because it is
    automatically detected from the inventory image now.
v4: add message that --network-lock will be ignored during restore
    and value from dump will be used.
v5: run make indent

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
ef7af1dd15 Run 'make indent' on criu/include/plugin.h
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
cf2b67375a workflows/lint: show changes
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
03cdbc4c02 criu/config: fix use-after-free in parse_join_ns
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
546a6dfd0e configs: fix used after free cases
We have some code written with the assumption that argv is never
destroyed, but when we handle configs, we construct an argv-like array
for each config, parse it and release it.

I am still not sure that we need to release memory of per-config argv
arrays... The current scheme is going to be a source of used-after-free
bugs. When we will add the non-privileged mode, all these bugs will be
serious security issues.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
399a53a43f lsm: do not print a warning if no LSM has been detected
Without any LSM detected CRIU would print a warning for every run:

 Warn  (criu/lsm.c:328): don't know how to suspend LSM 0

Which clutters up the CI logs.

Change the message to a debug message.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
960f26f907 files-reg: do not print a warning if a file has no build_id
On systems where there is no build_id we get a warning for each run

 Warn  (criu/files-reg.c:1458): Couldn't find the build-id note for file with fd 15

Change the log level to debug for this message as the file just does not have
a build_id and printing a warning clutters the CI logs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
90e175d52f zdtm/pthread_timers: make sure glibc allocated SIGEV_THREAD's stack
On Virtuozzo7 jenkins we see a fail of criu-dev zdtm:

  ===================== Run zdtm/static/pthread_timers in ns =====================
  Start test
  ./pthread_timers --pidfile=pthread_timers.pid --outfile=pthread_timers.out
  Run criu dump
  =[log]=> dump/zdtm/static/pthread_timers/112/1/dump.log
  ------------------------ grep Error ------------------------
  (00.004817) netlink: Collect netlink sock 0x1cad6e21
  (00.004821) netlink: Collect netlink sock 0x1cad6e22
  (00.004831) Collecting pidns 9/112
  (00.004886) No parent images directory provided
  (00.004903) Warn  (criu/lsm.c:328): don't know how to suspend LSM 0
  ------------------------ ERROR OVER ------------------------
  Run criu restore
  4: Old maps lost: set([])
  4: New maps appeared: set([u'7fe4c54ca000-7fe4c54cb000 ---p', u'7fe4c0000000-7fe4c0021000 rw-p', u'7fe4c0021000-7fe4c4000000 ---p', u'7fe4c54cb000-7fe4c5ccb000 rw-p'])
  ############# Test zdtm/static/pthread_timers FAIL at maps compare #############

https://ci.openvz.org/job/CRIU/job/CRIU-virtuozzo/job/criu-dev/8032/consoleFull

First thing to mention is that this is not related to criu. I can manage
to reproduce it with "--nocr", problem is that some mapping appears a
bit later when we do pre-cr get_visible_state().

By debugging SIGEV_THREAD thread with gdb I can see that addresses from
this unexpectedly appearing mapping are used by glibc here as "struct
pthread *pd":

 clone()
  start_thread()
   timer_helper_thread()
    __pthread_create_2_1()

So the mapping looks allocated by allocate_stack(), and it is only
gets done after first timer trigger (we have glibc-2.17 on vz7):

https://github.com/bminor/glibc/blob/release/2.17/master/nptl/sysdeps/unix/sysv/linux/timer_routines.c#L92

So let's wait at least 1 timer trigger so that memory outfit of the test
become permanent and our check_visible_state zdtm check would not be
false negative.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
dd0e66149f ci: fix 'crit.sh: 3: source: not found'
Jenkins test runs are failing with:

 ./test/jenkins/run_ct ./test/jenkins/crit.sh
 ./test/jenkins/crit.sh: 3: source: not found

Switch to bash which has 'source'.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e936a0f8ad docker-test: refactor test scenario
The following error occurs when creating a checkpoint of
a container immediately after the container has been restored
from another checkpoint.

Error response from daemon: Cannot checkpoint container cr: content
sha256:12c69b7a9d25695dd5f9d37d4e858e2f7c3f9da738ccf86f8d3042f6973af1df:
already exists

In this patch we add a healthcheck to the test container and update the
test to perform a checkpoint only when the container is in a 'healthy'
state. In addition, this patch adds a scenario to test the
checkpoint/restore of multiple containers.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Andrei Vagin
78eb0dabf9 dump: suspend/resume lsm on pre-dump
Otherwise, criu pre-dump can fail with errors like this:
lib/infect.c:650: Unable to connect a transport socket: Permission denied'

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Christian Brauner
5dc373385d util: add run_command()
Make it easy to run commands and capture the output in a user provided
buffer.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Christian Brauner
9422383b6b zdtm/apparmor_stacking: don't include optional AppArmor namespace separator
AppArmor namespaces are officially colon-separated. The double-slash
syntax is just convenience:

"The trailing : separates the namespace name from the profile name and
the optional / and // separators are provided as a convenience for those
familiar with ssh and protocol urls." (see [1])

[1]: https://gitlab.com/apparmor/apparmor/-/wikis/AppArmorNamespaces
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Christian Brauner
dc4c3cd48b apparmor: actually enable suspend for AppArmor
The original patches didn't pass down the "suspend" boolean into
write_aa_policy() and so suspend never really happened. Pass it down.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Christian Brauner
ea1c891476 lsm: handle SELinux LSM correctly
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Tycho Andersen
06b5d2fa8d tests: add a test for apparmor_stacking
v2: use a profile that doesn't have "unix" to test the suspend feature too
v3: use "/" in the profile names to make sure this works

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Tycho Andersen
8723e3f998 check: add a feature test for apparmor_stacking
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Tycho Andersen
8d992a680e lsm: support checkpoint/restore of stacked apparmor profiles
Support for apparmor namespaces and stacking is coming to Ubuntu kernels in
16.10, and should hopefully be upstreamed Soon (TM) :).

The basic idea is similar to how cgroups are done: we can restore the
apparmor namespace and profile blobs independently of the tasks, and then
at the end we can just set the task's label appropriately. This means the
code that moves tasks under a label stays the same, and the only new code
is the stuff that dumps and restores the policy blobs that are in the
namespace that were loaded by the container.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Tycho Andersen
0db135ac4f util: add rm -rf function
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Tycho Andersen
6085c37ba2 lsm: change when LSM profiles are collected
Instead of collecting profiles right at thread dump time, let's collect
them all at a single point. This way, when we add support for suspending
LSMs, we know what profiles if any we need to suspend.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
e2a45d7867 ci: extend lint run to run 'make indent'
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
70833bcf29 Run 'make indent' on header files
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
93dd984ca0 Run 'make indent' on all C files
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1e26f170ca criu: introduce clang-format to format source code
This is another attempt to introduce a tool to format CRIU's source
code. This time it is based on clang-format.

The .clang-format file is taken from the linux kernel git tree (5.13).

I removed all comments from lines which state that it requires at least
clang-format 4 or 5. For this resulting file at least clang-format 11
is required. See scripts/fetch-clang-format.sh for all the changes
done to the Linux kernel .clang-format file.

Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
cc2317ea48 zdtm: fix indentation in Makefile wait_stop target
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
d62e747e91 ci: fix Fedora Rawhide
Fedora Rawhide updated to a glibc using clone3(). clone3() is, however,
not yet part of the seccomp filter. Unfortunately 'docker build' does
not allow dropping seccomp but luckily 'podman build' does.

This switches the Fedora Rawhide test to use Podman. Podman is part of
GitHub Actions and no additional packages need to be installed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
b32c8c6fe5 posix-timers: fix getoverrun error handling
If retcode of dump_posix_timers is not zero it is treated as an error in
compel_rpc_sync. And currently we can return positive overrun of last
timer (if we are lucky and it is not zero) as retcode of function
dump_posix_timers, let's fix it. Also I don't see any point in putting
negative value into .overrun on error path, fix it too.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
01fa34f1eb ci: use pre-installed Podman
GitHub Actions already has Podman installed. No need to install it a
second time.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
918901439f zdtm/pthread_timers: require ns_pid feature and add non-ns test
Resolving real pid to vpid for notify thread ids requires NSpid feature
supported by kernel, though in simple non-pid-ns case we can deal
without it, so add a requirement and split out the host test without the
requirement.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
e1b1547c80 posix-timers: fallback notify thread id encoding for non-pidns and non-nspid
1) If all dumped processes are in host pidns we can skip pid conversion
   logic and just use real pid.

2) If we have pidns to dump we should also have kernel NSpid feature,
   else we should fail to dump notify thread id, as it's not possible to
   properly convert rpid to vpid.

While on it let's put the code to encode_notify_thread_id helper to
improve code readability.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
91d7203b80 proc_parse: make nspid field optional
On Centos7 we don't have NSpid field in /proc/[pid]/status so for
compatibility let's skip it.

(cherry-pick one hunk from Virtuozzo commit
https://src.openvz.org/projects/OVZ/repos/criu/commits/c6d0ee567c)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Kirill Tkhai
a692a0d0af kerndat: Check that "/proc/[pid]/status" file has NS{pid, ..} lines
If there is nested pid_ns, we need to be able to get pid in
the whole pid hierarhy. This may be taken from "/proc/[pid]/status"
file only. Check, that kernel has support for it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

NSpid is not (yet) supported on Centos7 thus we need this check for
compatibility.

(cherry-picked from Virtuozzo criu commit
https://src.openvz.org/projects/OVZ/repos/criu/commits/94f4653f20)

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
64f0012e44 zdtm: add a test for SIGEV_THREAD timers
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
7eab5a7dc7 timers: save tid from a task pid namespace
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
61e1334ab0 proc_parse: get a thread ID in a thread pidns from /proc/pid/status
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Liu Chao
80079fbb0d criu: dump and restore notify_thread_id of posix timer
When sigev_notify_thread_id is not set, get_pid will return a NULL
pointer and do_timer_create will return -EINVAL in kernel. So criu
will failed to create posix timer:

(09.806760) pie: 41301: Error (criu/pie/restorer.c:1998): Can't restore posix timers -22
(09.806824) pie: 41301: Error (criu/pie/restorer.c:2133): Restorer fail 41301
(09.891880) Error (criu/cr-restore.c:2596): Restoring FAILED.

Signed-off-by: Liu Chao <liuchao173@huawei.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6be9345fb1 criu-ns: add support for 'check' action
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
868bffba4d criu-ns: add top-level conditional execution
Execute actions only if run as a script.
https://docs.python.org/3/library/__main__.html

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
f70605ef1e criu-ns: update script name in help message
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
f472e2590e Documentation: Add man page for criu-ns
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
8891e51cd4 make: install criu-ns
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
4a9bcd8844 zdtm: prioritize /lib/* dependencies in some tests
Prioritize /lib/* because iptables fails to search /usr/lib64/*
first on archlinux.

This change of 'deps' order prioritizes the default library location.

This affects:
	- zdtm/static/netns-nf
	- zdtm/static/netns-nft-ipt
	- zdtm/static/socket-tcp-closed-last-ack
	- zdtm/static/socket-tcp-reseted
	- zdtm/static/socket-tcp-syn-sent

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
00ca2b519e scripts/build: add a docker file for archlinux
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
694eafa1f6 protobuf: remove leading underscores from protobuf structs
Fixes: #1560

The latest protobuf-c compiler breaks CRIU because they removed
leading underscores from structs in 1.4.0.

This replaces those definitions with the standard generated structs.

v2: remove struct _VmaEntry, struct _CredsEntry and struct _CoreEntry

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
efb9fccd41 cgroup: cgroup_contains has to update the mask for cgroupv2
Fixes: #1524
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ac27562f09 ci: add msgque test case to crit-recode
crit-recode on msgque tests was broken in Jenkins. Run it also in GitHub
CI.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
7e86519fe3 lib: fix crit-recode msgque errors in Jenkins
b'test/dump/zdtm/static/msgque/63/1/ipcns-msg-11.img'  encode fails: expected bytes-like object, not str
b'test/dump/zdtm/static/msgque/63/1/ipcns-msg-11.img' pretty  encode fails: expected bytes-like object, not str
FAIL

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
5034885974 ci/openj9: run mrproper before make
Make sure to remove all files created from previous local build
before compiling in the container.

Reported-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
7ff785e1d4 zdtm: make --sbs also stop on each pre-dump/snap iteration
This is useful to investigate problems on pre-dump iterations. After
this patch test output with "--pre=2 --sbs" would have new usefull stop
points.

While on it let's remove confusion in sbs stop point naming. "Pause at
pre-dump" actually has nothing to do with pre-dump, let's better use
"before " instead of "at pre-", similar let's use "after " instead of
"at post-".

Result would look like:

========================== Run zdtm/static/env00 in h ==========================
Start test
./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
Pause before pre-dump 0. Press Enter to continue.
Run criu pre-dump
Pause before pre-dump 1. Press Enter to continue.
Run criu pre-dump
Pause before dump. Press Enter to continue.
Run criu dump
Pause before restore. Press Enter to continue.
Run criu restore
Pause after restore. Press Enter to continue.

v2: improve sbs step naming; rename "iter" to more meaningfull
"pre-dump"/"snap".

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Liu Hua
07316d15a6 restore: cleanup cgroup properly in error path
Cgroup yard is setup in crtools_prepare_shared. But it is not cleaned up
properly in some code path, which leads cgroup mountpoint leaking.

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
8f2b8c7be0 scripts: run lint also on criu-ns
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
bd648cc8d9 ci: also test tcp stream crit recoding
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
fa9acb9dc7 lib: fix broken crit-recode test
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
0ca36c95ee ci: combine cross compile container definitions
The cross compile container definitions for each architecture were
almost the same files except for the architecture.

This moves the architecture to variables so that all cross compile
setups can use the same container definition.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
2ebb1c7419 crit: fix error on memfd files parsing
Else we get error:

[root@fedora criu]# crit/crit x test/dump/zdtm/static/memfd00/56/1/ mems
...
Traceback (most recent call last):
  File "/home/snorch/devel/ms/criu/crit/crit", line 6, in <module>
    cli.main()
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 430, in main
    opts["func"](opts)
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 361, in explore
    explorers[opts['what']](opts)
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 283, in explore_mems
    fn = ' ' + get_file_str(opts, {
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 214, in get_file_str
    f = ft['get'](opts, ft, fd['id'])
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 165, in ftype_reg
    rf = ftype_find_in_image(opts, ft, fid, 'reg-files.img')
  File "/home/snorch/devel/ms/criu/crit/pycriu/cli.py", line 154, in ftype_find_in_image
    return f[ft['field']]
KeyError: 'reg'

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
f57e45df56 cr-service: move pidfd_store initialization to cr-service
Moved pidfd store hashtable initialization to cr-service.c since
pidfd_store is only relevant in RPC mode.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
f7cd254004 pidfd_store: tidy up interface and hide unneeded details
There is no need to expose the internals of pidfd_store like
hash table entries and how to use them for pid reuse detection.

v2: fixup some spacing issues
v3: fix hash memory leak after xmalloc in init_pidfd_store_hash()

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
083f0822ed pidfd_store: move pidfd_store to a separate file
Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
d55f34ed78 test/ci: sync netns_lock test and its --post-start hook
The --post-start hook creates a netns which the test should enter
at the beginning of the test.

The test randomly failed in CI tests, it is most likely caused by
a race condition.

I suspect this flow is root cause:
	1. --post-start hook starts just after the test (in parallel)
	2. --post-start hook calls ip netns add to create the test netns
	3. ip creates the netns file
	4. netns_lock test opens that file and uses it in setns
	5. ip mounts the netns to the file

Of course test fails at step 4 because the netns is not yet mounted
to the file.

I made the test wait for SYNCFILE to be created by the --post-start
hook before it tries to open the netns file and call setns.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
b290df9a65 test/jenkins: fix netns_lock test multiple iterations failure
netns_lock is highly dependent on the order of the hooks, and
iterations causes the --pre-dump hook to be called multiple
times which expectedly caused the test to fail.

The server loop accommodates for multiple iterations.

https://ci.kernoops.org/job/CRIU/job/CRIU-iter/job/criu-dev/431/testReport/(root)/criu/zdtm_static_netns_lock/

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
75feb9635e ci: fix mips64el-cross test
The mips64el-cross test target started to show following error:

error: listing the stack pointer register '$29' in a clobber list is deprecated [-Werror=deprecated]

This fixes it in three different places by removing $29' from the
clobber list. This is only compile tested as we have no mips hardware
for testing.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Younes Manton
f3cb156604 Keep inherit-fd strings alive until task restore
If inherit-fd is read from a config file its buffer will be freed
after the config file is parsed but before task restore, which is
when we need to use the mapping. Therefore, when adding an
inherit-fd mapping to the opts list, copy the key string to a new
buffer.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
2021-09-03 10:31:00 -07:00
fu.lin
d3ce492cc2 pycrit: fix the broken of cli the crit show xxx.img
It will broken when the cli `crit show ipcns-shm-9.img` is executed, msg:
  {
      "magic": "IPCNS_SHM",
      "entries": [
          {
              "desc": {
                  "key": 0,
                  "uid": 0,
                  "gid": 0,
                  "cuid": 0,
                  "cgid": 0,
                  "mode": 438,
                  "id": 0
              },
              "size": 1048576,
              "in_pagemaps": true,
              "extra": Traceback (most recent call last):
    File "/usr/bin/crit", line 6, in <module>
      cli.main()
    File "/usr/lib/python3/dist-packages/pycriu/cli.py", line 412, in main
      opts["func"](opts)
    File "/usr/lib/python3/dist-packages/pycriu/cli.py", line 45, in decode
      json.dump(img, f, indent=indent)
    File "/usr/lib/python3.9/json/__init__.py", line 179, in dump
      for chunk in iterable:
    File "/usr/lib/python3.9/json/encoder.py", line 431, in _iterencode
      yield from _iterencode_dict(o, _current_indent_level)
    File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
      yield from chunks
    File "/usr/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
      yield from chunks
    File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
      yield from chunks
    File "/usr/lib/python3.9/json/encoder.py", line 438, in _iterencode
      o = _default(o)
    File "/usr/lib/python3.9/json/encoder.py", line 179, in default
      raise TypeError(f'Object of type {o.__class__.__name__} '
  TypeError: Object of type bytes is not JSON serializable

This is caused by `img['magic'][0]['extra']` which is bytes. I find
other load condtions, fix them at the same time.

Signed-off-by: fu.lin <fulin10@huawei.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
093fb0c878 Add test for new --lsm-mount-context option
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
64dd64e504 Enable changing of mount context on restore
This change is motivated by checkpointing and restoring container in
Pods.

When restoring a container into a new Pod the SELinux label of the
existing Pod needs to be used and not the SELinux label saved during
checkpointing.

The option --lsm-profile already enables changing of process SELinux
labels on restore. If there are, however, tmpfs checkpointed they
will be mounted during restore with the same context as during
checkpointing. This can look like the following example:

 context="system_u:object_r:container_file_t:s0:c82,c137"

On restore we want to change this context to match the mount label of
the Pod this container is restored into. Changing of the mount label
is now possible with the new option --mount-context:

 criu restore --mount-context "system_u:object_r:container_file_t:s0:c204,c495"

This will lead to mount options being changed to

 context="system_u:object_r:container_file_t:s0:c204,c495"

Now the restored container can access all the files in the container
again.

This has been tested in combination with runc and CRI-O.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
5be71273f6 Remove unnecessary whitespace
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
fc7705a13f zdtm: add network namespace locking test
When criu dumps a process in a network namespace it locks
the network so that no packets from peer enters the stack,
otherwise RST will be sent by a kernel causing the connection
to fail.

In netns_lock.c we try to enter the netns created by post-start
hook so that criu locks the network namespace between dump and
restore.

A TCP server is started in post-start hook inside the test netns
and runs in the background detached from its parent so that
it stays alive for the duration of the test.

Other hooks (pre-dump, pre-restore, post-restore) try to
connect to the server.

Pre-dump and post-restore hooks should be able to connect
successfully.

Pre-restore hook client with SOCCR_MARK should also connect
successfully.

Pre-restore hook client without SOCCR_MARK should not be able
to connect but also should not get connection refused as all
packets are dropped in the namespace so the kernel shouldn't
send an RST packet as a result. Instead we check that the
connect operation causes a timeout.

This test would be useful when testing that the network is
locked using different ways (using iptables currently and
other methods later).

v2:
	- check that packets with SOCCR_MARK are allowed to
	  pass when the netns is locked.

v3:
	- fix pre-restore hook skipping non SOCCR_MARK
	  connection test due to early exit in SOCCR_MARK
	  variant.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
0cf79a3608 test: remove exec test
criu exec is deprecated for some time now and criu just exits with an
error if running 'criu exec'. This removes the test for that non-working
subcommand.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1a197d4d86 criu: add unit testing for config file parser
This tries to add a unit test for the configuration file parser.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
45bde968a2 test: add tests for configuration file parsing
This adds a test run to ensure known (but fixed) configuration file
parser errors are not crashing CRIU anymore.

Based on missing test code coverage this script also tests code paths of
the option handling which have not been tested until now.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
f695e6e107 config: make configuration file parser more robust
Trying to see how robust the configuration parser I was able to crash
CRIU pretty quickly. This fixes a few crashes in the existing
configuration file parser.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
381d2e88fb criu: add cleanup_free attribute
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Nicolas Viennot
031a8d7905 bfd: loop through read()/write() when the action is incomplete
The callers of bread() and bwrite() assume the operation reads/writes
the complete length of the passed buffer.
We must loop when invoking the read()/write() APIs.

Fixes #1504

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
24bc083653 ci: disable some tests on CentOS 7
Now that we are running CI on an actual CentOS 7 kernel different
tests are no longer working as they require newer kernels.

This commit disables a few tests only on CentOS 7.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
63ca464bcb ci: remove old workarounds
This commit removes a couple of workaround for old kernels and
distributions which we no longer use in CI.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
6ef01d3e6a ci: switch CentOS 7 test to Cirrus CI
On Cirrus CI we can run tests on the orignal CentOS 7 kernel.

The kernel is rather old, but on GitHub Actions we a 5.8 kernel
and a containerized CentOS 7 user space not much is working
correctly anymore. With this commit CentOS 7 based tests are
no longer running on GitHub Actions but on Cirrus CI.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1fbe876242 ci: disable -x during print_env()
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b4c7267b0e zdtm: allow ignore taint via environment variable
With this change tainted kernels can be ignored with setting
ZDTM_IGNORE_TAINT=1. This is just to simplify the CI script to not
require to change every call of zdtm. Setting the variable once should
be enough.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
a92833818b scripts/vagrant: Use vagrant 2.2.16
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
eda3ac2ff3 scripts/vagrant: Use Fedora 34
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Mike Frysinger
87ea13f6b7 add PKG_CONFIG default in a few more places
These files use $PKG_CONFIG before they include the common files that
setup a default, so set early defaults in them too.

Signed-off-by: Mike Frysinger <vapier@chromium.org>
2021-09-03 10:31:00 -07:00
Valery Ivanov
6db0f95dbf crtools: improve error handling on signal setting
Signed-off-by: Valery Ivanov <ivalery111@gmail.com>
2021-09-03 10:31:00 -07:00
Mike Frysinger
2967bed64e build: respect $PKG_CONFIG settings
The build needs to respect $PKG_CONFIG env var like other standard
build systems and the the upstream pkg-config project itself.  This
allows the package builder to point it to the right tool when doing
a cross-compile build.  Otherwise the host pkg-config tool is used
which won't have access to the packages in the cross sysroot.

Signed-off-by: Mike Frysinger <vapier@chromium.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
81a68ad3b2 docker-test: use latest containerd release
This patch improves the changes from 19be9ced9.

To use the newer version of containerd, we need to make sure that the
containerd service has been restarted after install. Instead of
hard-coding a version number, we can use github API to get the latest
release. In addition, the tar file contains all binary files in a
'./bin' sub-folder. Thus, it should be extracted in '/usr'.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
638e53c950 zdtm/tun_ns: add per-test dependencies
The tun_ns test was introduced with [1] and [2], however, these commits
didn't add per-test dependencies required for the test.

Per-test dependencies are listed in the .desc file as 'deps': [<list>]

These dependencies are made available inside the test namespace and without
the ip dependency, the tests fails on Fedora 34 with

   Error: ipv4: FIB table does not exist.

[1] 7e355e7
[2] 3ba0893

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Adrian Reber
9d9ec73dd7 test: skip time namespaced tests on <= 5
Although CentOS 8 comes with 4.18 kernel it has time namespace patches
backported but not all the required once. This disables time namespaced
tests on everything older than 5.11.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e42083aa8b ci: update docker test matrix
Ubuntu 18.04 still has a problem with overlayfs and breaks CRIU
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Christian Brauner
ebc74668ff cr_options: handle the case where __dest == __src in SET_CHAR_OPTS
The SET_CHAR_OPT(__dest, __src) macro is essentially:

free(opts.__dest);
opts.__dest = xstrdup(__src);

So if __dest == __src the string that get's copied is freed. This e.g.
is the case in criu/lsm.c

int lsm_check_opts(void)
{
	char *aux;

	if (!opts.lsm_supplied)
		return 0;

	aux = strchr(opts.lsm_profile, ':');
	if (aux == NULL) {
		pr_err("invalid argument %s for --lsm-profile\n", opts.lsm_profile);
		return -1;
	}

	*aux = '\0';
	aux++;

	if (strcmp(opts.lsm_profile, "apparmor") == 0) {
		if (kdat.lsm != LSMTYPE__APPARMOR) {
			pr_err("apparmor LSM specified but apparmor not supported by kernel\n");
			return -1;
		}

		SET_CHAR_OPTS(lsm_profile, aux);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	} else if (strcmp(opts.lsm_profile, "selinux") == 0) {
		if (kdat.lsm != LSMTYPE__SELINUX) {
			pr_err("selinux LSM specified but selinux not supported by kernel\n");
			return -1;
		}

		SET_CHAR_OPTS(lsm_profile, aux);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	} else if (strcmp(opts.lsm_profile, "none") == 0) {
		xfree(opts.lsm_profile);
		opts.lsm_profile = NULL;
	} else {
		pr_err("unknown lsm %s\n", opts.lsm_profile);
		return -1;
	}

	return 0;
}

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
d0511319e5 github: Add templates for new issues and pull requests
This way users would be able to create more meaningfull pull-requests
and issues. And we would not need to ask them to provide basic
information each time.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
3c10d3335b criu(8): document --join-ns option
The --join-ns option was introduced with commits:

2cf17cd
790ec46

Closes: #1054

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
80ee4f8aec kdat: make uffd_open return errno from syscall separately
Previousely kerndat_uffd could not differentiate -EPERM and -1 returned
from uffd_open(). That way "Failed to get uffd API" and "Incompatible
uffd API ..." errors were just ignored, which is probably not what we
want.

v2: rework with extra argument of uffd_open for errno, rename err
label in uffd_open for readability

Fixes: cfdeac4a4 ("kerndat: Handle non-root mode when checking uffd")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
a8525c07d4 ci: no longer avoid overlayfs
Now that the Ubuntu kernel is no longer broken with regards to
overlayfs, let's switch back to overlayfs instead of devicemapper and
vfs graphdrivers.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
2aa4185a6c test/others: refactor loop process
There are several problems with the loop.sh script. First, the code is
duplicated across tests in the so-called 'othres' category. Second, we
need to run it with the 'setsid' utility to make sure that it runs in
a new session. Third, we have to redirect the standard file descriptors
and use the '&' operator to make it run in the background. Finally,
obtaining the PID of the 'loop.sh' process resulted in race condition.

In this patch we replace the loop.sh script with a program that would
address all problems mentioned above. The requirements for this program
are as follows.
- It must be reusable across tests
- It must start a process that is detached from the current shell
- It must wait for the process to start and output its PID

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
2b78d95e6b test/others: drop '_exit' function
The function name '_exit' is misleading as this function doesn't
actually exit when the status of the previous command is zero.
In addition, the behaviour of this function is not really needed.

This patch removes the '_exit' function and applies the correct
behaviour to stop the test on failure.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Andrei Vagin
34410b9e75 test: add a test to check that sigtrap handlers are restored
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
b310fbd31f ksigset: fix a typo in ksigdelset
Fixes: 8063eb8fe6 ("parasite: don't block SIGTRAP")
Reported-by: zl-wang <zlwang@ca.ibm.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
c1b2d194e9 mem/pidfd: fix poll retry error checking
One should never rely on errno if libc syscall is successful. We can
either see an errno set from some previous failed syscall or even errno
set by a this successful libc syscall. So lets check ret first.

Fixes: 1ccdaf47 ("criu: add pidfd based pid reuse detection for RPC
clients")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
1c08709cdb zdtm: add pidfd store based pid reuse test
This is just a symlink to the original transition/pid_reuse test with
the right options passed to trigger the pidfd store based pid reuse
detection code path.

Pidfd store based detection is supported only in RPC mode which
requires passing a unix socket fd to be used as pidfd store and
the kernel should support pidfd_open and pidfd_getfd syscalls
{'feature': 'pidfd_store'} for this test to work.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
ea0dc7807a zdtm: add --pidfd-store option in RPC mode
When testing pid reuse using pidfd_store feature in RPC mode we need
to pass a unix socket fd used to CRIU in the RPC option
pidfd_store_sk to store the pidfds between predump/dump iterations.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
e79131e8c3 criu: add pidfd based pid reuse detection for RPC clients
Closes: #717

This increases the reliability of pid reuse detection using pidfds,
currently through RPC migration tools like P.Haul.

A connectionless unix socket is passed to criu in RPC mode through
the RPC option pidfd_store_sk.
If this option is set, the socket is initialized in
init_pidfd_store_sk() to be used as a queue for task pidfds.
criu then sends tasks pidfds to this socket in send_pidfd_entry()
and receives them in the next pre-dump/dump iteration to build
the pidfds hashtable in init_pidfd_store_hash().
These pidfds will be used later in detect_pid_reuse().

How it should be used in migration tools like P.Haul:
	- Open a connectionless unix socket
	- Pass the socket fd in the RPC option pidfd_store_sk when
	  doing a pre-dump or dump

This will fail if the kernel does not support pidfd_open or
pidfd_getfd syscalls, so pidfd_store_sk should not be set if the
kernel does not support pidfd_open.
This could be checked with:
	CLI: criu check --feature pidfd_store
	RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
	     true in the "features" field of the request

v2:
	- add reasonable polling restart limit in check_pidfd_entry_state
	  to avoid getting stuck
	- avoid leaking pidfd in send_pidfd_entry when entry is NULL,
	  otherwise pidfds are freed in free_pidfd_store

v3:
	- check that the passed pidfd store is not empty after
	  the first iteration (i.e. --prev-images-dir option set).

v4:
	- clear pidfd_hash heads
	- check entry allocation error in init_pidfd_store_hash()

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
ba882893c3 cr-check: add ability to check if pidfd_store feature is supported
pidfd_store which will be used for reliable pidfd based pid reuse
detection for RPC clients requires two recent syscalls (pidfd_open
and pidfd_getfd).

We allow checking if pidfd_store is supported using:
	1. CLI: criu check --feature pidfd_store
	2. RPC: CRIU_REQ_TYPE__FEATURE_CHECK and set pidfd_store to
	   true in the "features" field of the request

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
e3c9c3429a cr-service: add pidfd_store_sk option to rpc.proto
pidfd_store_sk option will be used later to store tasks pidfds
between predumps to detect pid reuse reliably.
pidfd_store_sk should be a fd of a connectionless unix socket.

init_pidfd_store_sk() steals the socket from the RPC client using
pidfd_getfd, checks that it is a connectionless unix socket and
checks if it is not initialized before (i.e. unnamed socket).
If not initialized the socket is first bound to an abstract name
(combination of the real pid/fd to avoid overlap), then it is
connected to itself hence allowing us to store the pidfds in the
receive queue of the socket (this is similar to how fdstore_init()
works).

v2:
	- avoid close(pidfd) overriding errno of SYS_pidfd_open in
	  init_pidfd_store_sk()
	- close pidfd_store_sk because we might have leftover from
	  previous iterations

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
a9508c9864 criu: check if pidfd_getfd syscall is supported
pidfd_getfd syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.

v2:
	- check size written/read of val_a/val_b is correct
	- return with error when val_a != val_b

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
30e8d8cadf criu: check if pidfd_open syscall is supported
pidfd_open syscall will be needed later to send pidfds between
pre-dump/dump iterations for pid reuse detection.

v2:
	- make kerndat_has_pidfd_open void since 0 is always returned
	- fix missing tabs in syscall tables

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
nithin-jaikar
5d08f975a1 kerndat: Handle non-root mode when checking uffd
When criu is run as user it fails and exits because of kerndat_uffd() returning -1.
This, in turn, happens after uffd = syscall(SYS_userfaultfd, flags); which only works
for root.

In the change it ignores the permission error and proceeds further just like it's done
for e.g. pagemap checking.

Signed-off-by: Nithin Jaikar J <jaikar006@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
8c303d1a64 test/others/crit: add test for 'x'
This commit extends the CRIT tests to cover the 'x' command, which is
used to explore an image directory.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e393001095 lib/cli.py: Open explore file as a binary
Fixes #1484

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Andrei Vagin
c8973d426b test/zdtm: check that a penging SIGTRAP handled properly
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
61c7cc5a92 parasite: don't block SIGTRAP
This is the workaround for #1429.

The parasite code contains instructions that trigger SIGTRAP to stop at
certain points. In such cases, the kernel sends a force SIGTRAP that
can't be ignore and if it is blocked, the kernel resets its signal
handler to a default one and unblocks it. It means that if we want to
save the origin signal handle

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ed58fb2214 test: create new tls certificates
The certificates expired again. This replaces the expired
certificates with test certificates which are valid for ever:

  echo -ne "ca\ncert_signing_key\nexpiration_days = -1\n" > temp
  certtool --generate-privkey > cakey.pem
  certtool --generate-self-signed \
           --template temp \
           --load-privkey cakey.pem \
           --outfile cacert.pem
  echo -ne "cn=$HOSTNAME\nencryption_key\nsigning_key\nexpiration_days = -1\n" > temp
  certtool --generate-privkey > key.pem
  certtool --generate-certificate \
           --template temp \
           --load-privkey key.pem \
           --load-ca-certificate cacert.pem \
           --load-ca-privkey cakey.pem \
           --outfile cert.pem
  rm cakey.pem temp

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
6beeabcd42 zdtm: add sk-unix-dgram-ghost test case
This testcase reproduces deadlock in "wait_fds_event" futex in open_fdinfos()
function (files subsystem).

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
2609e98ee7 sk-unix: ghost: fix deadlock between peer_fle->stage and fds wake up
This patch fixes deadlock that appears on ghost DGRAM unix sockets.
Problem is that wake_connected_sockets() function *should* be called
strictly after fle->stage >= FLE_OPEN.

Explanation:
Consider situation, we have ghost unix DGRAM socket (peer socket),
and also have several sockets that connected to this peer socket.

How restore of that picture works?
In files subsystem we have open_fdinfos(pstree_item*) function that calls open_fd()
function for *every* fd of task. open_fd() function calls appropriate
file descriptor "open" handler that may return "1" which means "try again later".
This retcode means, that some additional resources is needed to fully restore file
descriptor. For *ghost* UNIX sockets, for instance, we need to have peer socket
file descriptor *before* we can open and restore client sockets. Here is the main problem.
open_fdinfos() called from separate tasks simultaneously, so, when we get "1" retcode
we stay on futex (wait_fds_event() function) and waiting for someone another task
restore some resource and notify us that we can retry opening of file descriptor.

With *ghost* UNIX socket I've managed to caught next behaviour.
1. From one task (that holds client socket) open_fdinfos() called open_fd() that called
open_unixsk_standalone(). In open_unixsk_standalone we have check that means
"if socket have peer and that peer is GHOST and that peer fle->stage < FLE_OPEN"
return "try again". Ok. So, this task will stay on wait_fds_event().

2. Second task, that holds *peer* tried to open peer socket fd. So,
it also calls open_fd() -> open_unixsk_standalone() -> opening socket
-> bind_unix_sk() -> in bind_unix_sk we have call to wake_connected_sockets().
So, after that call we will "wake up" task from first point and it may proceed
fd restoring. Yes? No. In first point we need to "peer_fle->stage >= FLE_OPEN"
but fle->stage of our peer socket will become FLE_OPEN in open_fd(). After we
return from open_unixsk_standalone we proceed to setup_and_serve_out() where we have
appropriate stage change.
Between call of wake_connected_sockets and moment when we set stage to FLE_OPEN
should pass very small amount of time. But it may happen, so we "wake up"
tasks that holds client sockets but did not have enough time to change fle->stage
to FLE_OPEN. Exactly that case I've managed to reproduce.
(Really, ossec-hids application managed to reproduce this problem at first %) )

v1: file_desc_ops->on_stage_change callback was introduced,
sk-unix ghost code reworked so that to call wake_connected_sockets() strictly
after changing fle->stage to FLE_OPEN.
v2: implementation replaced with short and more practical patch by Andrei

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
655610e09a ci: remove hack for netns-nft zdtm test
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
ddefbbff16 zdtm: add combined nftables/iptables netns-nft-ipt test
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
4696e61edb zdtm: skip static/netns-nft test if nftables feature isn't supported
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Alexander Mikhalitsyn
d8821d9a8f net: skip iptables dump if it has nft backend and nft dump is supported
On modern Linux distributions iptables binaries using new nftables
API. We dump iptables rules using "iptables-save", and nftables
rules using libnftables API. This breaks network unlock on modern systems
because technically, we dump rules (including network lock rules) two times.

There is another problem - on host we can have modern distribution, but
in Docker container we can use iptables with netfilter (legacy) API.
So, in this case this legacy rules will be skipped.

This patch handles all of that cases. It tries to find iptables legacy and
dump legacy rules by using appropriate iptables binaries, dump nftables
rules by using libnftables.

Fixes #1435

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
e26949cfed lsm: handle half initialized SELinux setups
CRIU used to check for the existence of /sys/fs/selinux to see if
SELinux is enabled on a system. We have seen systems with SELinux kind
of enabled but reading out the labels gives does not return real labels.

To work around this, this commit adds a check during LSM detection
if SELinux labels are in the right format. For CRIU this check means to
see if there are at least 3 ':' in a label. If not CRIU switches to no
LSM mode.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e2c352e4f8 tools.mk: Use Python 3 by default
As of January 1st, 2020 Python 2 is no longer supported and
many distributions no longer provide packages for Python 2
dependencies.

This patch allows CRIU to use Python 3 by default when both
major versions are available on the system.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
177e4b4bad mips: remove empty gitignore
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
22142eedf0 mips: coding style fixes
CRIU follows Linux kernel coding style. This patch updates the
architecture-specific code for MIPS to use tab indentation,
add whitespace between closing parenthesis and open bracket,
and changes the mode of source files from 755 to 644.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
zl-wang
99a6a17c2f Allow systemcfg proc file to be dumped
Currently, it cannot be check-pointed, because that type of file is on UNSUPP list.

Signed-off-by: zl-wang <zlwang@ca.ibm.com>
2021-09-03 10:31:00 -07:00
Nicolas Viennot
731cafa85c logging: pr_perror() -> pr_msg() when execvp fails in action scripts and others
When invoking an action-script, all file descriptors >= 3 are closed.
If execvp() fails, we can only log the error on stderr. pr_msg() outputs
on stderr, so we use this as opposed to pr_perror().

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-03 10:31:00 -07:00
Nicolas Viennot
24bdfa72df net: add a #define for increased compatiblity with old distributions
Debian 9 doesn't have IFLA_NEW_IFINDEX defined

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-03 10:31:00 -07:00
Nicolas Viennot
29c34386b0 restore: fix error message when fork fails
`last_pid_buf` is not where the last_pid string is, but it is in `s`.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
f10425e053 criu: end pr_(err|warn|msg|info|debug) with \n
Unlike pr_perror, pr_err and other macros do not append \n
to the message being printed, so the caller needs to take care of it.

Sometimes it was not done, so let's add this manually.

To make sure it won't happen again, add a line to Makefile under the
linter target to check for such missing \n. NOTE this check is only
done for part of such cases (where the pr_* statement fits in one line
and there's no comment after), but it's better than nothing.

Add comments after pr_msg and pr_info statements where we deliberately
don't add \n, so that the above check ignores them.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
96b7178bab Whitespace at EOL cleanup and check
My editor (vim) auto-removes whitespace at EOL for *.c and *.h files,
and I think it makes sense to have a separate commit for this, rather
than littering other commits with such changes.

To make sure this won't pile up again, add a line to Makefile under
the linter target to check for such things (so CI will fail).

This is all whitespace except an addition to Makefile.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
7ea20e8f5a criu: make sure to use pr_perror to show errno
In cases where we need to print errno, it is better to use pr_perror.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
10c619adb9 test/zdtm: pr_err / pr_perror fixes
1. Use pr_perror where errno needs to be shown.

2. Use pr_err in cases where errno is not set
   by the previous failed call.

3. Make sure pr_err's first argument do not have \n.

4. While at it, fix some error messages.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
dca0eb5b4a test/others/bers: use pr_perror
When errno is set, it makes sense to use pr_perror.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
e326889c06 criu/mount.c: fix \n in pr_debug
Funny but it used incorrect slash.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
2166d47482 scripts: fix shellcheck warnings
On my system (shellcheck v0.7.1) make lint shows a few warnings about
needing to quote variables.

Fix those.

PS I am not sure why those are not shown by GHA CI, I assume there is
different shellcheck version used. Add shellcheck -- version to the
appropriate Makefile target to avoid confusion.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
5f3631916a Makefile: amend lint with pr_perror/fail checks
In many cases developers forget that pr_perror and fail macros
are a bit special, in particular:

1. they already show errno;
2. they already append \n to the message.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
4cd23083be test/zdtm: don't pass errno to fail()
Macro fail() already prints the value of errno, so there's no need to
explicitly add it.

Found by git grep '^\s*\<fail\>.*errno'

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
12a2bd0edd test/zdtm: don't use %m with fail
Macro fail already appends errno and strerror(errno) to the error
message, so there's no need to do it explicitly.

Brought to you by

	for f in $(git grep -l fail test/zdtm); do
		test -f $f || continue
		echo $f
		sed -i '\|^[[:space:]]*fail(.*[ (]%m)*"|s/:*[ (]*%m)*//' $f
	done

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
b20694835d test/zdtm: don't use \n with fail()
Macro fail already appends \n to the message, so there's no need to do
it explicitly.

Brought to you by

	for f in $(git grep -l fail test/zdtm); do
		test -f $f || continue
		echo $f
		sed -i '\%^[[:space:]]*fail(.*\\n"%s/\\n"/"/' $f
	done

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
9cbcaaed39 test/zdtm: don't use errno for pr_perror
Macro pr_perror already adds errno and its string representation to the
error message, so there's no need to explicitly do it.

While at it, fix some error messages.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
865a5e9511 test/zdtm: don't use pr_perror where errno is unset
pr_perror should be used for cases where the failed operation sets
errno. For cases where errno is not set, pr_err is preferable.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
d55a65e939 criu: don't use errno for pr_error
There is no need to, as pr_error already adds strerror(errno)
to the error message.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Kir Kolyshkin
f3be776ccc Drop \n from pr_perror
Another pr_perror spring cleaning time!

As pr_perror adds a semicolon, an strerror(errno), and a newline,
there's no need to add one manually.

Brought to you by

	for f in $(git grep -l pr_perror); do
		test -f $f || continue
		echo $f
		sed -i '\%^[[:space:]]*pr_perror(.*\\n"%s/\\n//' $f
	done

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
5e3b07b95d test/zdtm: check that restore can handle precreated veth devices
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
f60f24bfbe kerndat: check whether IFLA_NEW_IFINDEX is supported
Send an RTM_SETLINK request with a negative IFLA_NEW_IFINDEX. If
IFLA_NEW_IFINDEX is supported, the kernel will return ERANGE.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
3ca09f5c9f ci: exclude lazy-thp for remote pages over tls
Temporarily disable this test until the #1380 is resolved.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6c77d72265 Makefile: docker-test don't use interactive tty
Running zdtm tests does not require input and therefore it is not
necessary to use -it. This change also allows to run the test in CI
where it currently fails with:

the input device is not a TTY
make: *** [Makefile:388: docker-test] Error 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
27b9ed53ea Makefile: update excluded tests for docker-test
All zdtm tests pass on Fedora 33 for `make docker-build && make docker-test`
with devicemapper storage driver.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
5d8ecee0ac docker-test: use host cgroup & network ns
The test/zdtm_mount_cgroups script fails with 'permission denied'
when running tests with private cgroup namespace.

Using the host network namespace allows us to test criu as if
it is running on the host, sharing iptables rules etc.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e3c0fa7011 Dockerfile: add missing test dependencies
This patch adds missing dependencies required to run
the zdtm tests.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
3074b6d5a2 Dockerfile: re-build criu after clean
In order to be able to run the zdtm tests inside a container,
we have to make sure that all protobuf sources have been compiled.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
f432186e73 Dockerfile: use 'git clean' before build
The 'make docker-build' command creates a copy of all files from the
in local CRIU clone inside a container.

Then it runs 'make mrproper' inside the container, followed by
compilation of criu, followed by another 'make mrproper'.

After the last mrproper command, it attempts to check if
the clean was successful by running 'git clean'.

However, this check fails when the local repository contains
files that are not part of the repository.

For example, the vscode editor creates the folder '.vscode/'
which would be copied inside the docker container and cause
'make docker-build' to fail.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Liu Hua
264b4a8d24 tiny fix on function dump_empty_fs
return value of is_empty_dir:
*  < 0 : open directory stream failed
*    0 : directory is not empty
*    1 : directory is empty

Signed-off-by: Liu Hua <weldonliu@tencent.com>
2021-09-03 10:31:00 -07:00
Christian Brauner
cdb0d42702 net: allow restoring of precreated veth devices
This introduces a new option

--external netdev[IFNAME]:ORIGNAME

which informs CRIU that this is pre-created network device that it is
supposed to move into a target network namespace. The "netdev" name was
chosen to make it flexible enough to e.g. also cover physical devices if
that is desirable at some point. For example:

--external netdev[eth0]:vethA23adf3

would instruct CRIU to move the network device with the name
"vethA23adf3" into a target network namespace renaming it to "eth0"
while doing so.

In order to restore ip addresses and additional data correctly CRIU
needs to move the network device into the target netns with the recorded
ifindex. This requires a kernel patch as discussed in [1]. The patch has
been merged into net-next and is expected to show up in the v5.13
release (cf. [2])

The motivating use-case can be found in [1]. But I'm repeating it mostly
verbatim here:

Assume a container with a standard veth tunnel for an unprivileged container:
<veth-host> <-> <bridge-host> <-> <veth-container>

When LXD starts a container it will create the veth pair in the host
namespaces with random names, let's assume:
<veth-host> := vethHOST
<veth-container> := vethCONT

The LXD generates a config for the container and tells the container to
use vethCONT as network device and usually also tells it to rename that
device to something more sensible like eth3. The container will then use
netlink to move and rename the vethCONT device into it's network
namespace as eth3 during startup.

Users may use lxc snapshot --stateful to create a CRIU dump.
And they can restore via
lxc restore --stateful <container-name> <stateful-snapshot-name>

And this is where things get hairy currently. LXD's network models
requires it to always be in control of all network devices and so
similar to regular startup it will precreate the two veth devices
vethHOST and vethCONT and tell LXC about it.

What we would like CRIU to be able to do is to add a commandline option
to tell CRIU to not bother creating the veth device but instead to
simply assume that someone else will create, move, and rename it and
instead  just restore routes, iptables, addresses and so on.

With this kernel patch applied I can successfully dump and restore a
LXD containers:

ubuntu@f2-vm:~/src/bin$ lxc launch images:alpine/edge alp1
Creating alp1
Starting alp1

ubuntu@f2-vm:~/src/bin$ lxc snapshot --stateful alp1

ubuntu@f2-vm:~/src/bin$ lxc list
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| alp1 | RUNNING | 10.47.211.144 (eth0) | fd42:8722:277d:69cf:216:3eff:fe69:9b8b (eth0) | CONTAINER | 1         |
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+

ubuntu@f2-vm:~/src/bin$ lxc restore --stateful alp1 snap0

ubuntu@f2-vm:~/src/bin$ lxc list
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| alp1 | RUNNING | 10.47.211.144 (eth0) | fd42:8722:277d:69cf:216:3eff:fe69:9b8b (eth0) | CONTAINER | 1         |
+------+---------+----------------------+-----------------------------------------------+-----------+-----------+

ubuntu@f2-vm:~/src/bin$ lxc exec alp1 -- sh
~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
15: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:16:3e:69:9b:8b brd ff:ff:ff:ff:ff:ff
    inet 10.47.211.144/24 brd 10.47.211.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd42:8722:277d:69cf:216:3eff:fe69:9b8b/64 scope global dynamic flags 100
       valid_lft 86355sec preferred_lft 86355sec
    inet6 fe80::216:3eff:fe69:9b8b/64 scope link
       valid_lft forever preferred_lft forever

[1]: https://github.com/checkpoint-restore/criu/issues/1421
[2]: https://patchwork.kernel.org/project/netdevbpf/patch/20210406075448.203816-1-avagin@gmail.com/
Fixes: #1421
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
e3b694392d scripts/build: drop obsolete ENV1 variable
The ENV1 variable was first introduced with commit
7290de5 (travis: enable ccache for docker/qemu builds)
and it is not used anymore.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Adrian Reber
eb5726c44a images: re-license as Expat license (so-called MIT)
This changes the license of all files in the images/ directory from
GPLv2 to the Expat license (so-called MIT).

According to git the files have been authored by:

   Abhishek Dubey
   Adrian Reber
   Alexander Mikhalitsyn
   Alice Frosi
   Andrei Vagin (Andrew Vagin, Andrey Vagin)
   Cyrill Gorcunov
   Dengguangxing
   Dmitry Safonov
   Guoyun Sun
   Kirill Tkhai
   Kir Kolyshkin
   Laurent Dufour
   Michael Holzheu
   Michał Cłapiński
   Mike Rapoport
   Nicolas Viennot
   Nikita Spiridonov
   Pavel Emelianov (Pavel Emelyanov)
   Pavel Tikhomirov
   Radostin Stoyanov
   rbruno@gsd.inesc-id.pt
   Sebastian Pipping
   Stanislav Kinsburskiy
   Tycho Andersen
   Valeriy Vdovin

The Expat license (so-called MIT) can be found here:
https://opensource.org/licenses/MIT

According to that link the correct SPDX short identifier is 'MIT'.

https://spdx.org/licenses/MIT.html

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
9c18c63d2a ci: enable crit tests in CI
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b78c4e071a test: fix crit test and extend it
This fixes the others/crit test to work again and extends it to make
sure all possible input and output options are correctly handled by
crit.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
13e6e68998 lib: also handle extra pipe data correctly
CI sometimes errors out encoding/decoding extra pipe data.

This should fix extra pipe data for Python 3 and still keep it working
on Python 2.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
bf9e502c6f lib: print nice error if crit gets wrong input
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
bf80fee4f4 lib: correctly handle stdin/stdout (Python 3)
This changes stdin to be opened as binary if the input is not a tty.

This changes stdout to be opened as binary if encoding or if the output
is not a tty.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Bui Quang Minh
9635d6496e criu: Replace faccessat with fstatat when using AT_SYMLINK_NOFOLLOW flag
Currently, alpine musl libc library returns Invalid argument error (EINVAL)
when calling faccessat with AT_SYMLINK_NOFOLLOW flag.

Fix this by using fstatat instead.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2021-09-03 10:31:00 -07:00
Bui Quang Minh
96c1351d8a criu: Throw error when parent path is provided but invalid
In image dump directory, there are 2 parent symlink error cases:
- Parent symlink does not exist
- Parent symlink exists but points to invalid target

At the moment, 2 cases are handled exactly the same (do full dump). However, while
the first case happen when parent path is not provided, the second one is likely
user's mistake when provides invalid parent path.

So we throw an error in the latter case instead of performing the full dump.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
8dc7ce3e77 cr-service: fix CRIU_REQ_TYPE__FEATURE_CHECK RPC request
Closes: #1408

CRIU_REQ_TYPE__FEATURE_CHECK was failing, this was caused by two
things in handle_feature_check():
	1. setup_opts_from_req() was used and it could be NULL
	   (kerndat_init() is enough for feature checking)
	2. resp.success was always set to false

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b82f222d6b lib: fix crit-recode fix for Python 2
The recent fix to make Jenkins run crit-recode again broke
Python 2 support (because Python 2 based CI was not running).

This should fix the Python 2 based test run.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
228e510d21 ci: move CentOS 8 based test to Cirrus
The kernel on GitHub Actions has a bug

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1919472

which breaks our CI. It works on Cirrus. Let's move it there.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
fedor
069d92e513 Use a real VM instead of a privileged container 2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
90e03b1a11 pstree: don't change sid/gid-s if current sid/gid is the same
Previously we only skipped replacing sid for shell job if root_item was
session leader, but there is other case where root_item sid is the same
as current_sid we can safely skip replacing for this case. Same applies
to gid-s.

Now after we have pid collision check we not only "can" but should skip
pid collision checks for the latter case. It is quite obvious that
there are tasks in tree with sid==current_sid if current_sid==old_sid.

Fixes: #1400
Fixes: 77968d43c ("pstree: check for pid collision before switching to
new sid/gid")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
248b773676 lib: correctly handle padding of dump images
With the switch to Python3 and binary output it is not possible to use
code like: 'f.write('\0' * (rounded - size))'. Switching to binary
helps.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
abe3405b20 lib: fromstring() and tostring() are deprecated
fromstring() and tostring() are deprecated since Python 3.2 and have
been removed in 3.9. Both functions were just aliases and this patch
changes images.py to directly call fromybytes() and tobytes().

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
c10aae8f6e criu-ns: Merge comparisons with 'in'
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
5f59a7cc35 criu-ns: Add unsupported msg for restore-sibling
Currently criu-ns does not support the --restore-sibling option.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
7974229867 criu-ns: Handle restore-detached option
The criu-ns script creates a new PID namespace where criu is the "init"
process. When using the --restore-detached option with criu-ns, users
expect criu-ns to exit without killing the restored process tree.

Thus, criu-ns should not pass the --restore-detached to criu to prevent
it from terminating, and it should exit instead of waiting for criu's
exit status.

Resolves #1278

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6b375ed755 criu-ns: Pass arguments to run_criu()
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
55a0557db1 criu-ns: Close namespace fd before raise
It is a good practice to close open file descriptors.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
0e024bfce1 criu-ns: Extract set namespace functions
This change extracts some of the duplicated code from
set_pidns() and set_mntns() functions.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
a80f08c2e7 criu-ns: Remove unused _umount
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6fd59abc8f criu-ns: Use documentation strings
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
f8556f947f criu-ns: Extract wait for process into a function
Reduce duplication of code.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
a08aa44064 criu-ns: Extract mount new /proc into a function
By extracting this code into a function the main code becomes
smaller and more obvious.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
a0a02c73e7 criu-ns: Remove space before/after bracket
Avoid extraneous whitespace.
https://python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
8f69a58e03 criu-ns: Convert indentation to spaces
Spaces are the preferred indentation method.
https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Zeyad Yasser
f3d071461f ci: run zdtm/transition/pid_reuse with pre-dumps in ci tests
This test should be run with at least 1 pre-dump to trigger the problem as mentioned in commit 4d9bf608b5.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
288adfc591 ci: remove ccache setup
ccache was set up in Travis to speed up compilation by re-using the
.ccache directory from previous CI runs. As we are no longer using
Travis we can remove all CI related ccache setup.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
2e0107ead8 ci: run recode tests on more input files
We were running crit-recode in CI only on the output of
zdtm/static/env00.

This adds zdtm/transition/fork and zdtm/static/ghost_holes00
to run through crit-recode as the image files from those test
triggered errors in Jenkins we did not see in CI.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
71013465b8 lib: fix recode errors seen in Jenkins
Although we are running crit-recode.py also in all CI runs we never seen
following error except in Jenkins:

Traceback (most recent call last):
  File "/usr/lib/python3.8/base64.py", line 510, in _input_type_check
    m = memoryview(s)
TypeError: memoryview: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./test/crit-recode.py", line 25, in recode_and_check
    r_img = pycriu.images.dumps(pb)
  File "/var/lib/jenkins/workspace/Q/test/pycriu/images/images.py", line 635, in dumps
    dump(img, f)
  File "/var/lib/jenkins/workspace/Q/test/pycriu/images/images.py", line 626, in dump
    handler.dump(img['entries'], f)
  File "/var/lib/jenkins/workspace/Q/test/pycriu/images/images.py", line 289, in dump
    f.write(base64.decodebytes(item['extra']))
  File "/usr/lib/python3.8/base64.py", line 545, in decodebytes
    _input_type_check(s)
  File "/usr/lib/python3.8/base64.py", line 513, in _input_type_check
    raise TypeError(msg) from err
TypeError: expected bytes-like object, not str

This commit fixes this by encoding the string to bytes.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
c84dddf2f2 ci: remove '-Wl,-z,now' workaround
This removes extending LDFLAGS with '-Wl,-z,now'. This was added as
workaround but never really worked. It is correctly fixed with
pull request #1379

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
ed0f4608f4 lib/cli.py: Open out file as a binary
python3 fails to encode image with the following:

> [dima@Mindolluin criu]$ ./crit/crit encode -i tmp -o tmp.1
> Traceback (most recent call last):
>   File "/home/dima/src/criu/./crit/crit", line 6, in <module>
>     cli.main()
>   File "/home/dima/src/criu/crit/pycriu/cli.py", line 410, in main
>     opts["func"](opts)
>   File "/home/dima/src/criu/crit/pycriu/cli.py", line 50, in encode
>     pycriu.images.dump(img, outf(opts))
>   File "/home/dima/src/criu/crit/pycriu/images/images.py", line 617, in dump
>     f.write(struct.pack('i', magic.by_name['IMG_COMMON']))
> TypeError: write() argument must be str, not bytes

Opening the output file as binary seems to help.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
a433943a7f docker-test: set log file path
By default docker writes logs in a run-time directory unique for each
container. To be able to read this file, we can specify the path in
CRIU's configuration file for runc.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
046cad8bf0 docker-test: use containerd v1.5.0-beta.0
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
25f3780830 ci: move Travis CI Docker tests to GitHub Actions
Travis CI is no longer providing CI minutes for open source projects.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
7e6a1a7011 pstree: check for pid collision before switching to new sid/gid
Without this check we can hit the BUG in lookup_create_item just a few
steps later (if one thread in images has same pid with new sid/gid). And
also this check saves us from different sorts of unexpected errors on
restore (if one non-thread task in images has same pid/sid/gid already).

Fixes: #1332
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
bb5bad5326 test: move vt test to minor 65 on s390x
Our Jenkins s390x vt test fails with

./vt --pidfile=vt.pid --outfile=vt.out --filename=vt.test
make: *** [Makefile:432: vt.pid] Error 1
 Test zdtm/static/vt FAIL at ['make', '--no-print-directory', '-C', 'zdtm/static', 'vt.pid']
Test output: ================================
08:08:15.556:    54: ERR: vt.c:38: Open virtual terminal vt.test failed (errno = 6 (No such device or address))
08:08:15.556:    53: ERR: test.c:316: Test exited unexpectedly with code 1

Because the host has no ttyS0 as used previously. This changes the test
to use 'ttysclp0'. That seems to exist on multiple s390x we checked.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
c66ca3aa23 zdtm/fpu03: Add .desc file to omit running on !x86
I managed to forget .desc file and ppc on Jenkins now fails with:

> === Run 177/394 =======--------- zdtm/static/fpu03
> timens isn't supported on 5.8.18-100.fc31.ppc64le
> ========================== Run zdtm/static/fpu03 in h ==========================
> Start test
> ./fpu03 --pidfile=fpu03.pid --outfile=fpu03.out
> make: *** [Makefile:430: fpu03.pid] Error 1
>  Test zdtm/static/fpu03 FAIL at ['make', '--no-print-directory', '-C', 'zdtm/static', 'fpu03.pid']
> Test output: ================================
> 08:56:48.325:    56: SKIP: fpu03.c:116: Unsupported arch
> 08:56:48.325:    55: ERR: test.c:316: Test exited unexpectedly with code 0
>
>  <<< ================================

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
a87c61fe8e Revert "compel: add -ffreestanding to force gcc not to use builtin memcpy, memset"
This reverts commit c98af78c58.

Now FPU/SSE/MMX/etc can be used inside parasite.
Let's have compiler optimizations back.

Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
1bac3a64b9 s390: Purge stale comment
Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
39b7252c6f fault-injection: Run fpu corruption tests
For the thread leader and for subthreads too.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
21e3c53073 compel: Provide compel_set_task_ext_regs()
Arch-dependend way to restore extended registers set.
Use it straight-away to restore per-thread registers.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
3613b6f15f compel: Store extended registers set in the thread context
Extended registers set for task is restored with rt_sigreturn() through
prepared sigframe. For threads it's currently lost.
Preserve it inside thread context to restore on thread curing.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
7af06af10d zdtm/fpu03: Add a test to check fpu C/R in a thread
CRIU dumps main thread and sub-threads differently, so there needed
a test to check if fpu is preserved across C/R in sub-threads.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
6c879c3c83 zdtm/fpu00: Simplify ifdeffery
..to introduce a new one! ;-D

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
e2e8be37f2 x86/compel/fault-inject: Add a fault-injection for corrupting extended regset
With pseudo-random garbage, the seed is printed with pr_err().
get_task_regs() is called during seizing the task and also for each
thread.
At this moment only for x86.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
327e14933d namespaces: properly handle errors of snprintf
If snprintf was truncated we should probably know about it instead of
continuing to increase off, as snprintf returns number of characters
which would have been written and not the number which was actually
written.

Normally we check snprintf only for overflow not for error, some modern
compilers print warnings if truncation was not checked.

Probably it was the case why we implemented [1], the truncation happened
and on the next iteration of for loop we've hit negative size for
snprintf and got -1.

Fixes: 90f043dea ("namespaces: handle errors of snprintf") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
ffb848e6d9 x86: Use PTRACE_GET_THREAD_AREA instead of sys_get_thread_area()
To minimize things done in parasite, PTRACE_GET_THREAD_AREA can be
used to get remote tls. That also removes an additional compat stack
(de)allocation in the parasite (also asm-coded syscall).

In order to use PTRACE_GET_THREAD_AREA, the dumpee should be stopped.
So, let's move this from criu to compel to non-seized state and put tls
into thread info on x86.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
72dc328502 ci/compat: Check if tests are 32-bit ELFs
To be sure that we don't lose COMPAT_TEST=y on the way to make.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
10fe08c37f github/stale: separate labels with commas without following spaces
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ff38944b98 ci: fix Fedora rawhide CI failures
It seems the Fedora rawhide /tmp is no longer 1777 but 755.

Change it back to 1777 to make our CI runs successful again.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
79b3893ecf plugin: check for plugin path truncation
New compilators print warnings if snprintf return value is not checked
for truncation. Let's make them happy.

Fixes: #1372
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
8782235600 sk-unix: check whether a socket name is NULL before printing it
criu/include/log.h:72:2: error: '%s' directive argument is null [-Werror=format-overflow=]
   72 |  print_on_level(LOG_DEBUG,     \
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   73 |          LOG_PREFIX fmt, ##__VA_ARGS__)
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu/sk-unix.c:133:2: note: in expansion of macro 'pr_debug'
  133 |  pr_debug("\t%s: ino %u peer_ino %u family %4d type %4d state %2d name %s\n",
      |  ^~~~~~~~
  CC       criu/stats.o

Fixes: #1373
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
9582a44ce1 bug: add __builtin_unreachable in BUG_ON_HANDLER
This will surpress false gcc warnings like this:
criu/stats.c:85:10: error: array subscript 4 is above array bounds
of 'struct timing[2]' [-Werror=array-bounds]
   85 |   return &rstats->timings[t];
      |          ^~~~~~~~~~~~~~~~~~~
criu/stats.c:25:16: note: while referencing 'timings'
   25 |  struct timing timings[RESTORE_TIME_NS_STATS];
      |                ^~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
4eb43dc4de test: fix test compilation on rawhide
The latest glibc has redefined SIGSTKSZ as 'sysconf (_SC_SIGSTKSZ)' and
this breaks a static char[] definition.

Hardcoding TESTSIGSTKSZ to 16384 in the test. This fixes:

 sigaltstack.c:17:13: error: variably modified 'stack_thread' at file scope
   17 | static char stack_thread[SIGSTKSZ + TEST_MSG_BUFFER_SIZE] __stack_aligned__;
      |             ^~~~~~~~~~~~
sigaltstack.c:18:13: error: variably modified 'stack_main' at file scope
   18 | static char stack_main[SIGSTKSZ + TEST_MSG_BUFFER_SIZE] __stack_aligned__;
      |             ^~~~~~~~~~

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6f8e671351 zdtm: Add javaTests output to .gitignore
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
7b3eb03ab2 test: Reduce verbosity of mvn output
The -q option will only show errors

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
ae143161b8 javaTests: Add --file-locks option
Resolves #1370

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
56d7dbd7cd file-lock: Add space in error message
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Adrian Reber
950805bf16 ci: use runc instead of crun for podman tests
The latest podman pulls in crun instead of runc. Unfortunately crun is
not built against libcriu and does not support checkpoint/restore.

Switch back to runc.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Nicolas Viennot
719e42fe1c seccomp: initialize seccomp_mode in all cases
In parse_pid_status(), it is assumed that the seccomp field can be
missing from /proc/pid/status. When the field is missing, it is not
properly initialized, leading to bad behavior.

We initialize seccomp_mode to SECCOMP_MODE_DISABLED by default,
similarly to what is done in compel/src/lib/infect.c:parse_pid_status.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
2dc65a6360 zdtm: add second fifo_upon_unix test
This differs from the previous one by
1. using relative path instead of absolute
2. chdir() after setup

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
1f2e10771a zdtm: add fifo upon unix socket test case
Create unix socket and unlink it. Make fifo in this place.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
7c5c813661 sk-unix: rework unix_resolve_name
Use SIOCUNIXFILE ioctl approach to get socket fd opened with O_PATH. Utilise it
for detecting deletion and resolving relative name. Preserve old method as
fallback if this new IOCTL fails.

Also remove overmount_sock crfail in zdtm. With the unix_resolve_name
reworked to use SIOCUNIXFILE criu can now pass this test.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
d0308e5ecc sk-unix: make criu respect existing files while restoring ghost unix socket fd
If there are any file in place of ghost unix socket, criu will delete it at
restore while trying to recreate ghost one.

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
49889baa20 files-reg: rework strip_deleted
Moved strip_deleted to util and reworked it so other parts of criu can use
it without the need of files-reg.h

Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
129cc7fbc4 files: Don't forget on stripping deleted postfix on linked files
Otherwise we gonna accumulated "(deleted)" postfix generated by
kernel on every c/r iteration eventually overflowing PATH_MAX
which will make container undumpable.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
3a4bffc143 ci: move coverage run to github
This also connects the coverage run to codecov.io.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
6be56e92c8 test/zdtm: check that locks are not dumped if --file-locks isn't set
If criu finds a file lock and the --file-locks option isn't set, it
stops dumping processes, resumes them and exits with an error.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
7b5e7166ec dump: dump has to fail if there is locks and --file-locks isn't set
If criu finds a file lock and the --file-locks option isn't set, it
stops dumping processes, resumes them and exits with an error.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
37c09f8904 ci: move compat tests to Github Actions
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
246c37ad3a README.md: remove unused badges; add a few new badges
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
fad9f805cf README.md: remove trailing whitespaces
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
67ce4e46c0 ci: move asan and image streamer test to github
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
anatasluo
f983a55e68 vdso: fix segmentation fault caused by char pointer array
When I compile criu with "make DEBUG=1" and run it to restore my
program, it produces a segmentation fault.

In aarch64, with compile flag "-O0", when criu executes the code in pie,
it is unable to visit the content of ARCH_VDSO_SYMBOLS. So I put these
variables into the stack.

Signed-off-by: anatasluo <luolongjuna@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
909ce55d8c Tell podman to use vfs as storage-driver
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
f4c5937cac ci: move Fedora Rawhide based tests away from Travis
This moves Fedora Rawhide based tests away from Travis. To Github
Actions for x86_64 and to Drone for aarch64.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ed7cefe217 ci: factor out Fedora Rawhide CI setup
To run Fedora Rawhide based aarch64 containers on Drone CI our current
Dockerfile setup does not work.

This moves the package installation out of the Dockerfile into
scripts/ci/prepare-for-fedora-rawhide.sh to be usable in the Dockerfile
environment and in the Drone CI environment.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
95c4a8b400 ci: skip bpf tests on vagrant
See: https://github.com/checkpoint-restore/criu/issues/1354

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
bb2078f368 ci: upgrade vagrant and Fedora version
The updates to the latest Vagrant version and from Fedora 32 to 33.

Also using --no-tty instead of > /dev/null for vagrant up.

Also run 'dnf upgrade -y' in out vagrant VM to get the latest kernel.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
da2c83d871 ci: fix syntax error in stale.yml
The commas need to be inside of the quotes. Not on the outside.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
fc5ba7de72 zdtm: handle a case when a test vma is merged with another one
Fixes: #1346
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
d74353d771 util: zero the events pointer to avoid its double free
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
540141c7c5 namespaces: handle errors of snprintf
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b83a1dd956 ci: also use clang for compel-host-bin
Running in an environment with clang and without gcc even installed
does not work as compel-host-bin uses HOSTCC which defaults to gcc.

If CLANG=1 is set this also sets HOSTCC to clang to actually build
compel-host-bin with clang and not with gcc.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
baad84efb2 ci: run aarch64 compile tests on Drone
Besides Travis CI Drone CI seems to be only service providing ARM based
builds. This switches the aarch64 and arm32 builds to drone.io.

Because Drone CI is running in a Docker container we cannot use 'setarch
linux32' as it requires the blocked syscall 'personality(2)'.

But Drone CI provides an 'arch: arm' which gives the same architecture
as 'setarch linux32' on Travis aarch64: armv8l

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
95df2524cc zdtm: cleanup thread-bomb test error handling and printing
1) Let's do test_init earlier so that max_nr test_msg is now visible in
thread-bomb.out.

2) Also lets check errors from malloc and pthread_...  functions, print
messages about their errors and do proper deallocation at least.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
50a96e9faf ci: move vagrant test to cirrus ci
With Travis dramatically reducing the minutes available for CI, CRIU
needs a new place to run tests. This moves the Vagrant based Fedora 32
no VDSO test cases to Cirrus CI. Cirrus CI seems to be one of the very
few free CI services allowing access to /dev/kvm.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
f04e8517ca workflows/stale: Don't close issue that has labels 'new feature' or 'enhancement' 2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
2721d865f1 fsnotify: rework redundant code
open_handle and first part of alloc_openable do the same work. Both these
function are called from check_open_handle. Rework check_open_handle to call
only alloc_openable.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
c4f176b1e6 mount: adjust log level for mnt_is_dir
mnt_is_dir is used when looking up for suitable mount point. In some cases
that function may fail several times. Error level seems to strict for this
cases.
Added error message to lookup_mnt_sdev in case all mnt_is_dir failed.
As for open_handle and alloc_openable which are calling mnt_is_dir, they are
used in check_open_handle, which will call error afterwards.

Adjusted log level for __open_mountpoint result in open_handle since it is
allowed to fail several times. open_handle caller get_mark_path expect
possible failure and will print error in case.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Andrey Zhadchenko
3fd3a376fd mount: adjust log level for get_clean_mnt
In case get_clean_mnt fails open_mountpoint is still able to resolve mounts
by helper process or print error in the worst case. Using pr_warn instead of
pr_perror.

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Valeriy Vdovin
8c53627dd8 dump: at exit do not call timing_stop if stats are not initialized
Stats often call 'get_timing' function which has a BUG() assertion to
catch cases when stats failed to initialize correctly.
If stats haven't initialized yet assertion will also be triggered.
We dont want the trigger to happen in a case when criu fails at early
steps before initializing stats, but this can happen in the following
case:
  - at cr_dump_tasks criu can catch error before the call to init_stats.
  - it then decides to gracefully quit with error and
    calls cr_dump_finish.
  - cr_dump_finish will call timing_stop -> get_timing
    and BUG() gets triggered

But because criu is already quitting gracefully, BUG() is not needed.
In this code path we can call timing_stop under proper condition.

[avagin: rebase on top of criu-dev and a few minor changes]

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
c405a01166 coverity: get_service_fd() is passed to a parameter that cannot be negative
criu/fdstore.c:110: negative_return_fn: Function "get_service_fd(FDSTORE_SK_OFF)" returns a negative number.
criu/fdstore.c:110: assign: Assigning: "sk" = "get_service_fd(FDSTORE_SK_OFF)".
criu/fdstore.c:114: negative_returns: "sk" is passed to a parameter that cannot be negative.

criu/namespaces.c:1366: negative_return_fn: Function "get_service_fd(USERNSD_SK)" returns a negative number.
criu/namespaces.c:1366: assign: Assigning: "sk" = "get_service_fd(USERNSD_SK)".
criu/namespaces.c:1389: negative_returns: "sk" is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
6ff51fd8d3 restore: warning: Value stored to 'ret' is never read
criu/cr-restore.c:3230:3: note: Value stored to 'ret' is never read
                 ret = false;
                 ^     ~~~~~
  3228|   	if (n == -1) {
  3229|   		pr_perror("Failed to get number of supplementary groups");
  3230|-> 		ret = false;
  3231|   	}
  3232|   	if (n != n_groups)

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
0bb3d85863 memfd: use PROC_SELF instead of getpid in __open_proc
This looks better for me, should be no functional change.

Another implication of this is nested pid namespaces, when we will
support them "__open_proc(getpid()...)" will try to open file of the
process which has the same pid but in NS_ROOT pidns, which is bad.

See also aa2d92082 ("files: use PROC_SELF when a process accesses its
/proc/PID") for a similar change.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
34024dfdcb util: move open_proc_self_fd to service_fd
We need this to avoid conflicts with file descriptors which has to be
restored. Currently open_proc PROC_SELF is not used during restoring
file descriptors, but we are going to use it for memfd restore.

Note: in get_proc_fd let's not close service fd if we detect service fd
is not ours, it will be replaced in open_pid_proc anyway, and e.g. in
protected sfd context we can't close or open sfd, but can replace it
without any problems.

While on it also add FIXME because the check in get_proc_fd is error
prone in some rare cases (nested pidns is not supported yet).

We need to populate this new SELF service fd in populate_pid_proc, so
that it is later available in protected context. Also don't close
/proc/self service fd in prep_unix_sk_cwd as it can be called from
protected context and there is no much point of closing it anyway.

Close proc self servicefd in close_old_fds, because first we don't wan't
to reuse it from some ancestor, second there can be some junk fd as we
are yet only in the beginning of close_old_fds. This junk fd can come
from service fds of other tasks from parent's shared fdtable, and this
fd would not allow us to do opendir_proc(PROC_SELF).

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
068672f39d servicefd: don't move service fds in case they remain in the same place
Improve the check to skip moving service fds in clone_service_fd because
we don't need to move anything if old service fds resulting offset is
the same as new service fds resulting offset. It saves us from excess
calls to dup/fcntl(F_DUPFD).

Currently we check that base is the same and shared fdt ids are the
same, but there also can be situations where different bases with
different shared fdt ids still give the same offset sum (running zdtm in
Virtuozzo CT we've seen such a case where service_fd_base=512,
new_base=128, service_fd_id=24, id=0, SERVICE_FD_MAX=16).

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
5364ca3da4 compel/test: Fix warn_unused_result
gcc -O2 -g -Wall -Werror -I ../../../compel/include/uapi -o spy spy.c ../../../compel/libcompel.a
spy.c: In function ‘check_pipe_ends’:
spy.c:107:2: error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result]
  107 |  write(wfd, "1", 2);
      |  ^~~~~~~~~~~~~~~~~~
spy.c:108:2: error: ignoring return value of ‘read’, declared with attribute warn_unused_result [-Werror=unused-result]
  108 |  read(rfd, aux, sizeof(aux));
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
8aba7ae9fa compel: Fix missing loff_t in Alpine
musl defines 'loff_t' in fcntl.h as 'off_t'.
This patch resolves the following error when running the compel tests
on Alpine Linux:

gcc -O2 -g -Wall -Werror -c -Wstrict-prototypes -fno-stack-protector -nostdlib -fomit-frame-pointer -ffreestanding -fpie -I ../../../compel/include/uapi -o parasite.o parasite.c
In file included from ../../../compel/include/uapi/compel/plugins/std/syscall.h:8,
                 from ../../../compel/include/uapi/compel/plugins/std.h:5,
                 from parasite.c:3:
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:19:66: error: unknown type name 'loff_t'; did you mean 'off_t'?
   19 | extern long sys_pread (unsigned int fd, char *buf, size_t count, loff_t pos) ;
      |                                                                  ^~~~~~
      |                                                                  off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:46: error: unknown type name 'loff_t'; did you mean 'off_t'?
   96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
      |                                              ^~~~~~
      |                                              off_t
../../../compel/include/uapi/compel/plugins/std/syscall-64.h:96:61: error: unknown type name 'loff_t'; did you mean 'off_t'?
   96 | extern long sys_fallocate (int fd, int mode, loff_t offset, loff_t len) ;
      |                                                             ^~~~~~
      |                                                             off_t
make[1]: *** [Makefile:32: parasite.o] Error 1

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
cffbeffed6 ci: Enable compel testing
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
fbb21b4041 compel/test: Add main makefile
These changes enable running all compel tests with a single
command from the root path of the repository:

	# sudo make -C compel/test

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
ae686848b2 compel/test: Resolve missing includes
Resolves #1333

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Andrei Vagin
c7544894fe dump/ppc64,arm,mips: sanitize the ERESTART_RESTARTBLOCK -> EINTR transition
1. The -ERESTART_RESTARTBLOCK case in get_task_regs() depends on kernel
   internals too much, and for no reason. We shouldn't rely on fact that
   a) we are going to do sigreturn() and b) restore_sigcontext() always
   sets restart_block->fn = do_no_restart_syscall which returns -EINTR.

   Just change this code to enforce -EINTR after restore, this is what
   we actually want until we teach criu to handle ERESTART_RESTARTBLOCK.

2. Add pr_warn() to make the potential bug-reports more understandable,
   a sane application should handle -EINTR correctly but this is not
   always the case.

Fixes: #1325
Report-by: Mr Travis
Inspired-by: dd71cca58a ("dump/x86: sanitize the ERESTART_RESTARTBLOCK -> EINTR transition")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
0cbfba7780 github: auto-close stale issues and pull requests
This is a copy of stale.yml from the Podman repository to automatically
close tickets after 365 days. After 30 days tickets and issues are
marked as stale and after 365 tickets and issues are finally closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
fabd5be384 zdtm: look up iptables in /sbin and /usr/sbin
On Ubuntu 20.04, iptables is in /usr/sbin/.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Andrei Vagin
797f41e8aa test/zdtm_ct: Run zdtm.py in the host time namespace
Before the 5.11 kernel, there is a known issue.

start_time in /proc/pid/stat is printed in the host time namespace,
but /proc/uptime is shown in a current namespace, but criu compares them
to detect when a new task has reused one of old pids.

Fixes: #1266

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
f736b8750e ci: Alpine's busybox based free does not understand -h
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
d2ed60b60a namespaces: don't set rst on error in switch_ns_by_fd
This is just a small cleanup, no behaviour change.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
94fb7c36a1 ci: move alpine based tests to github actions
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
a28947bb82 ci: give an overview of the current CI environment
As CRIU is using multiple different CI systems this adds a printout to
each CI run about the CI environment for easier debugging of possible
errors.

Also use V=1 to build CRIU and the tests to easily see which compiler
and which options are used.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
70088b66c1 ci: add Circle CI definition
Circle CI provides bare metal test systems which are a very good
environment for the CRIU test cases. This adds two CI runs on Circle CI.

On Circle CI it is necessary to tell clang to use '-Wl,-z,now', because
gcc has it hard-coded in Ubuntu and clang does not.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
a719a2f49f CONTRIBUTING.md: add component prefix to the subject example
If one will do "git log --oneline", it is quite easy to see that we
don't begin subject lines with a capital letter. We start subjects with
component prefixes where components (mostly) are lowercase. So let's fix
it in contribution guide not to mislead newcomers.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
adfec67c08 .gitignore: Remove qemu-user-static
Not a thing since commit fe668075ad ("travis: switch pcp64le and s390x
to real hardware").

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
82bddc4b20 scripts/Docerfile.centos8: Use 'powertools' repo name
See https://bugs.centos.org/view.php?id=17920

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
anatasluo
898329b302 x86/asm: fix compile error in bitops.h
Build on Ubuntu 18.04 amd64 with command "make DEBUG=1" produces the following error:

include/common/asm/bitops.h: Assembler messages:
include/common/asm/bitops.h:71: Error: incorrect register `%edx' used with `q' suffix

Signed-off-by: anatasluo <luolongjuna@gmail.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
371d9c83d7 others/ns_ext: restore a process out of PID namespaces into the host PID namespace
It is quiet a common case to move the process from one pidns to another
existing pidns with criu (only restriction is pids should not
intersect). Let's check it works.

v2: - use pipe-s to synchronize processes
    - run the test in a pid namespace to avoid pid conflicts in the host
      pid namespace
    - grep errors in restore.log
    - add some more comments
v3: use 1000 for ns_last_pid so that test works on systems with low
pid_max; remove excess ';'s.

Co-Developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
eb9ed1aafa cr-restore: setup external pidns only for root task
All other tasks will inherit, let's remove excess steps. While on it
also add some info message about external pidns used.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

---
v2: add new external_pidns variable to sinchronize all uses of external
pidns case
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
c5064eda12 namespaces: make root_ns_mask more consistent
I) Make root_ns_mask always display namespaces of root task which are
different from criu ones. All this play with temporary unsetting it
makes this variable hard to understand (more over it is not in shared
memory).

II) Disable "INIT_PID + pidns is dumped" check for external pidns
explicitly.

III) On dump we should check that pidns of root task is external, not
just any pidns is external (in case in future we would support nested
pidns-es it would be wrong). That also allows us to use regular
lookup_ns_by_id search.

IV) On error when killing tasks we should kill only root task if it is
an init of pidns. Previousely we had CLONE_NEWPID set in root_ns_mask
for external pidns but root task was not init and we killed only root
task on error cleanup.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
c629525cad cr-restore: make CLONE_NEWPID flag in clone_flags more consistent
I) Let's make CLONE_NEWPID in rsti(item)->clone_flags always only mean
that we need to clone with CLONE_NEWPID to create this task. Only pidns
reaper would have this flag because only they can be restored via clone
with CLONE_NEWPID flag. If we have non pidns reaper root task but its
pidns is different from criu this can be only restored into external
pidns.

II) Let's remove clone_flags variable from fork_with_pid as it does not
actually needed now:

clone_flags was introduced to be able to restore into external pidns

rsti(item)->clone_flags is determined in prepare_pstree_kobj_ids and
before (I) it means 1) parent has different namespace from item or 2)
item is root task and criu has different namespace from it. We don't
support nested pid namespaces so (1) is always false. And for (2) we
have two cases a) pid == INIT_PID - when it is not possible to restore
into external pidns b) pid != INIT_PID - when it is only possible to
restore into external pidns.

For (b) we previousely had CLONE_NEWPID flag in rsti(item)->clone_flags
and to workaround it we've added this extra clone_flags variable, but I
think it is not needed because we can simply remove CLONE_NEWPID from
non-reaper processes initially.

Also the code with removing CLONE_NEWPID from clone_flags and adding the
same flag to rsti(item)->clone_flags is super strange because I don't
see any other place where we later can use rsti(item)->clone_flags.

III) Also don't print differen flags in "Forking task with ..." from
which we actually use in clone.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
98fbb766d6 compel/handle-elf: override unexpected precalculated addresses
We've seen addresses in parasite.built-in.o precalculated by linker but
in some unexpected manner:

readelf -WS criu/pie/parasite.built-in.o
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 1] .text             PROGBITS        0000000000000000 000040 00400a 00  AX  0   0 16
  [87] .data             PROGBITS        0000000000000000 005000 000068 00  WA  0   0 4096
  [88] .rodata           PROGBITS        0000000000000080 005080 001016 00   A  0   0 32

(Notes: All other sections does not have SHF_ALLOC or are of size 0, so I
skip them. Need to add "-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1" to
CFLAGS to reproduce.)

Section 88 has address 0x80 in elf file but in our "consequent"
addresses precalculation algorithm it should be at 0x5080:

  addr(.text) == 0x0
  addr(.data) == 0x400a + (0x1000 - 0x400a % 0x1000) + 0x68 == 0x5068
  addr(.rodata) == 0x5068 + (0x20 - 0x5068 % 0x20) == 0x5080

Probably the linker advises us to move 4096 aligned section to the
beginning to save some space, but it's just a guess.

So probably we should be ready to "non-consequent" alignments
precalculated and just override them.

https://github.com/checkpoint-restore/criu/issues/1301

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Tim Gates
6a7bb0b9f6 docs: fix simple typo, clietn -> client
There is a small typo in test/zdtm/static/socket_aio.c, test/zdtm/static/socket_listen.c, test/zdtm/static/socket_listen4v6.c, test/zdtm/static/socket_listen6.c, test/zdtm/static/socket_udp-corked.c, test/zdtm/static/socket_udp.c, test/zdtm/static/socket_udplite.c.

Should read `client` rather than `clietn`.

Signed-off-by: Tim Gates <tim.gates@iress.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
b023f0ab5a vim: remove wrong 8-space tabs indent from python files
Probably all vim users can setup their desired indent in their vimrc by
themselfs.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
2c89954cc6 zdtm: on fail with no error also print the tail of the log
We see strange cases there page-server or lazy-pages are exiting with
non-zero but print no errors, probably the tail of the log can help us
to understand what happened.

There are some other uses of grep_errors but let's only change cases
where we explicitly through an exeption on bad ret. For others I'm not
sure if we need extra output, e.g. for validly failing fault injections.

To debug #1280

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Mike Rapoport
9bdae79d0a uffd: check for exited task when reading uffd_msg
Sometimes there are uffd messages in a queue of a dying task and by the
time these messages are processed in handle_request, the uffd is no
longer valid and reading from it causes errors.

Add processing of EBADF in handle_uffd_event() to gracefully handle such
situation.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2021-09-03 10:31:00 -07:00
Pavel Tikhomirov
3b22021513 uffd: cleanup read error handling in handle_uffd_event
We can't use errno in case read returned >=0 according to man.

Found in scope of #1277

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
8ca4d6e5b0 cr-restore: Properly inspect status in sigchld_process()
Currently the code checks for SIGSTOP only if (!current).
Let's provide better status checks for debug-ability.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
00bd72f325 ci: remove special handling for mips
For the schedule daily special definitions were needed for MIPS as it is
not part of the release branch. Now that the release branch contains
MIPS, it is no longer necessary to have separate files for MIPS.

This also changes to make the scheduled runs actually daily and not
hourly.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
2d68627dc9 CI: remove centos7 from Travis
It is running on GitHub Actions

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
5bb4406e94 ci: use graviton2 for arm64 tests on Travis
Using travis-ci.com instead of travis-ci.org offers access to bare metal
aarch64 based systems and thus enabling us to run the full CRIU CI test
suite.

Switch arm64 based tests to arm64-graviton2 for tests.

This is the first non x86_64 architecture running tests and not just
compile in Travis.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
fb21643b28 tls: Add logging within send/recv callbacks
Log messages showing the send/recv errno value would
help us to debug issues such as #1280.

Example:
    Error (criu/tls.c:321): tls: Pull callback recv failed: Connection reset by peer'
    Error (criu/tls.c:147): tls: Failed receiving data: Error in the pull function.'
    Error (criu/page-xfer.c:1225): page-xfer: Can't read pagemap from socket: I/O error"

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
b28eb7b2d1 compel/log: Provide %u specifier parsing
%u is quite common and I remember there were workarounds to print
(unsigned long) as long or whatever.
Just support it from now - it's not hard and not much code.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
c39ed518f0 compel/log: Stop parsing at unknown format specifier
Currently if the specifier can't be parsed - error message is printed
and parsing of the format string continues. That's wrong as the argument
for the specifier will be used for the next specifier. I.e:
  pr_info("[%zu]`%s`\n", 0UL, "")
will crash PIE because %u is not known and the argument (0UL) will be
used for dereferencing string for %s.

Stop parsing printf position arguments at an unknown specifier.
Make this string visible so that `grep Error` in zdtm.py will catch it:

=[log]=> dump/zdtm/static/busyloop00/52/1/restore.log
------------------------ grep Error ------------------------
b'(00.001847) pie: 52: vdso: ['
b'Error: Unknown printf format %u'
------------------------ ERROR OVER ------------------------
Send the 15 signal to  52
Wait for zdtm/static/busyloop00(52) to die for 0.100000
======================= Test zdtm/static/busyloop00 PASS =======================

Reported-by: @ashwani29
Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Dmitry Safonov
b93fe2b2d6 vdso: Let zero-terminator in vdso_symbol_length
When vdso symbol is copied, it should be zero-terminated.
The logging code wants to print vdso names that differ
between vdso from images and vdso that's provided by kernel:

: pr_info("[%zu]`%s` offset differs: %lx != %lx (rt)\n",
:		i, sym_name, sym_offset, rt_sym_offset);

In unlikely event when vdso function name is longer than 32
(not any currently), null-terminator is missing.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
528ce25987 uffd: handle xrealloc() failure
In the case, that xrealloc() fails do not overwrite the original pointer
to be able to free the original pointer on exit.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
56a70ff99b uffd: fix 'double free detected in tcache 2'
One of the previous static code analyzer fixes added a xfree() at the
end of cr_lazy_pages(). It can, however, happen that during
complete_forks() the memory location for events is moved by xrealloc()
and the final xfree() will be done on the wrong address.

Passing &events to handle_requests() enables the xfree() to free the
correct and changed memory location.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
7db0c7c02b ci: add CentOS 8 based CI run
Our CentOS based CI run is based on CentOS 7. CentOS 8 exists already
for some time and CentOS 7 will probably go end of life at some point.

This adds a CentOS 8 based CI run to be prepared for the time CentOS 7
goes away.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b0676302fb ci: switch centos7 to github actions
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
247523c0cf travis: rename centos test to centos7
Because it is actually running on CentOS 7 and to easier distinguish it
from the new CentOS 8 test.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
b6e4dae22e criu-ns: Remove unreachable statement
Raising an exception breaks out of the normal
flow of control of a code block. When an exception
is not handled, the interpreter terminates execution
of the program, or returns to its interactive main loop.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Mike Rapoport
ebea8f560f ci: fix lazy-pages test selection
The special characters in the test selection regexp should no be esaped
for the regexp to work properly.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
20a83e77c9 ci: 'fix' lazy tests
Most (all?) lazy tests are not being executed if "$KERN_MAJ" -ge "4" and
"$KERN_MIN" -ge "18". Currently most CI systems are running on something
with 5.4.x which means $KERN_MAJ is greater than 4 but $KERN_MIN is less
than 18 and so we are not running any lazy tests.

This commit removes the complete lazy test kernel version detection as
kernels on the CI systems are new enough to always have all required
features.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1ecaee67a5 namespaces: fix 'Declaring variable "path" without initializer'
criu/namespaces.c:529: var_decl: Declaring variable "path" without initializer.
criu/namespaces.c:602: uninit_use_in_call: Using uninitialized value "*path" as argument to "%s" when calling "print_on_level".

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
097c931ed3 coverity: img_raw_fd() returns a negative number
criu/pagemap.c:245: negative_return_fn: Function "img_raw_fd(pr->pi)" returns a negative number.
criu/pagemap.c:245: assign: Assigning: "fd" = "img_raw_fd(pr->pi)".
criu/pagemap.c:258: negative_returns: "fd" is passed to a parameter that cannot be negative.

criu/ipc_ns.c:762: negative_return_fn: Function "img_raw_fd(img)" returns a negative number.
criu/ipc_ns.c:762: assign: Assigning: "ifd" = "img_raw_fd(img)".
criu/ipc_ns.c:768: negative_returns: "ifd" is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
04d7b71570 sk-unix: ignore coverity chroot() warning
criu/sk-unix.c:1173: chroot_call: Calling chroot: "chroot(".")".
criu/sk-unix.c:1175: chroot: Calling function "close_safe" after chroot() but before calling chdir("/").

criu/sk-unix.c:1251: chroot_call: Calling chroot: "chroot(".")".
criu/sk-unix.c:1263: chroot: Calling function "print_on_level" after chroot() but before calling chdir("/").

Coverity also says:

175312, 175313 Insecure chroot

If a call to chroot is not followed by a call to chdir("/") the chroot jail confinement can be violated.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
cfeb9c10ff cr-dump: get_service_fd() is passed to a parameter that cannot be negative
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ed905a002a util: fix double_close false positive
Coverity does not understand how close_fds() works.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
b47cb05391 dump: Potential leak of memory pointed to by 'si'
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
def84b8ef5 coverity: fix parameter_hidden: declaration hides parameter
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
c98eb0384b restore: Value stored to 'ret' is never read
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
8e5acdd2d0 cr-dump: Potential leak of memory pointed to by 'si'
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
cf4fe1fa1c vdso-compat: let coverity know that the function does not return
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
cfcc0b14a6 coverity: ignore CHECKED_RETURN
Ignore coverity errors about CHECKED_RETURN.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
d0db532979 autofs: Potential leak of memory pointed to by 'token'
Using strsep() moves the pointer of the original string and this
introduces a copy of the malloc()ed memory to be able to free() it
later.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
9b1921fb70 sk-unix: do not overwrite function parameter
The function collect_one_unixsk() has a parameter 'i' and at the same
time has a variable, in a loop, with the name 'i'.

This is no real error or problem, because the function parameter 'i' is
never used in the whole function.

Just trying to reduce confusion and making a code checker happy.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1d403eb18a Use 'is None' instead of '== None'
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
820525fe8d bfd: remove unused line
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
a029868048 coredump: remove two unused variables
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
1543527bf9 lib/py: remove unused variable
Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
7db0bb69e7 infect: initialize struct to avoid CLANG_WARNING
Using scan-build there is a warning about

 infect.c:231:17: warning: The left operand of '!=' is a garbage value
                 if (ss->state != 'Z') {

which is a false positive as every process will have a 'Status' field,
but initializing the structure makes the clang analyzer silent.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Adrian Reber
ee048e1489 lock: disable clang_analyzer for the LOCK_BUG_ON() macro
The clang analyzer, scan-build, cannot correctly handle the
LOCK_BUG_ON() macro. At multiple places there is the following warning:

  Error: CLANG_WARNING:
    criu/pie/restorer.c:1221:4: warning: Dereference of null pointer

  include/common/lock.h:14:35: note: expanded from macro 'LOCK_BUG_ON'
               *(volatile unsigned long *)NULL = 0xdead0000 + __LINE__
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~

This just disable the clang analyzer for the LOCK_BUG_ON() macro.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-09-03 10:31:00 -07:00
Michał Cłapiński
70c8c12c64 compel: don't mmap parasite as RWX
Some kernels have W^X mitigation, which means they won't execute memory
blocks if that memory block is also writable or ever was writable. This
patch enables CRIU to run on such kernels.

1. Align .data section to a page.
2. mmap a memory block for parasite as RX.
3. mprotect everything after .text as RW.

Signed-off-by: Michał Cłapiński <mclapinski@google.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
6edcef7406 cr-restore: Wait child & reap zombies if PID=1
When criu restore runs as PID=1 it has an additional responsibility to
reap zombie processes.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
4381043a7f criu-ns: Use PID 1 on restore
criu-ns performs double fork, which results in criu restore
using PID=2. Thus, if a user is trying to restore a process
with that PID, the restore will fail.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
b2232f7f7a criu-ns: Convert c_char_p strings to bytes object
class ctypes.c_char_p
    Represents the C char * datatype when it points to a zero-
    terminated string. For a general character pointer that may
    also point to binary data, POINTER(c_char) must be used.
    The constructor accepts an integer address, or a bytes object.

https://docs.python.org/3/library/ctypes.html#ctypes.c_char_p

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
d16033658f criu-ns: Print usage info when no args provided
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
26371e56f0 criu-ns: Convert to python3 style print() syntax
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2021-09-03 10:31:00 -07:00
Radostin Stoyanov
72ca9673de python: Replace xrange with range
In Py2 `range` returns a list and `xrange` creates a sequence object
that evaluates lazily. In Py3 `range` is equivalent to `xrange` in Py2.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-09-03 10:31:00 -07:00
Pavel Emelyanov
2598f64fa9 crns.py: New attempt to have --unshare option
So, here's the enhanced version of the first try.

Changes are:

1. The wrapper name is criu-ns instead of crns.py
2. The CLI is absolutely the same as for criu, since the script
   re-execl-s criu binary. E.g.
	   scripts/criu-ns dump -t 1234 ...
   just works
3. Caller doesn't need to care about substituting CLI options,
   instead, the scripts analyzes the command line and
   a) replaces -t|--tree argument with virtual pid __if__ the
      target task lives in another pidns
   b) keeps the current cwd (and root) __if__ switches to another
      mntns. A limitation applies here -- cwd path should be the
      same in target ns, no "smart path mapping" is performed. So
      this script is for now only useful for mntns clones (which
      is our main goal at the moment).

Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Looks-good-to: Andrey Vagin <avagin@openvz.org>
2021-09-03 10:31:00 -07:00
Adrian Reber
0d691acbae CI: distribute CI jobs between CI systems
Move podman, openj9, x86_64 tests from Travis to GitHub Actions.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-11-08 14:04:12 -08:00
Adrian Reber
e7cbeddff3 CI: rename 'travis' to 'ci'
CRIU is already using multiple CI systems and not just Travis. This
renames all Travis related things to 'ci' to show it is actually
independent of Travis.

Just a simple rename.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-11-08 14:04:12 -08:00
Andrei Vagin
f68da4a86f criu: Version 3.15
This is yet another big release with many new features in it:

* Introduced criu-image-streamer
* Added MIPS support.
* Allow checkpointing out of existing PID namespace and
  restoring into existing PID namespace.
* Added additional file validation mechanisms
* Added support to checkpoint and restore BPF hash maps and array maps.
* Initial cgroup2 support

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-11-03 08:31:28 -08:00
Adrian Reber
5a655e890a travis: install gzip and redhat-rpm-config for Fedora Rawhide based tests
Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
1062cc4fed x86/asm: update test_bit() and test_and_set_bit()
Build on Fedora Core 33 produces the following warnings:

include/common/asm/bitops.h: Assembler messages:
include/common/asm/bitops.h:37: Warning: no instruction mnemonic suffix given and no register operands; using default for `bt'
include/common/asm/bitops.h: Assembler messages:
include/common/asm/bitops.h:63: Warning: no instruction mnemonic suffix given and no register operands; using default for `bts'

Update test_bit() and test_and_set_bit() implementation with recent
version from the Linux kernel to fix the warning.

Fixes #1217
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Andrey Zhadchenko
c7726b7f35 zdtm: add alternative socket filter
A little rework of sock_filter test to be able to use it with different
filters

Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Andrey Zhadchenko
5c4cc46fdc sockets: fix incorrect malloc size
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Wojciech Marczenko
749eb33a92 compel: Calculate sh_addr if not provided by linker
GNU ld precalculates this information but lld does not. With this
change, handle-elf.c calculates those addresses on its own.

When calculating addresses sections with SHF_ALLOC bit are put one after
another, respecting their alignment requirements. This matches the way
how the blob is constructed by copying section contents.

Signed-off-by: Wojciech Marczenko <marczenko@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
867dd27c96 util: Improper use of negative value (NEGATIVE_RETURNS)
CID 73358 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
16aea4a7c9 mount: Explicit null dereferenced (FORWARD_NULL)
CID 181217 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
 Passing null pointer mntns to mntns_get_root_fd, which dereferences it.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
5f0674075e util: Improper use of negative value (NEGATIVE_RETURNS)
CID 192968 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 dup(fd) is passed to a parameter that cannot be negative. [show details]

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
ca7a832b5e page-xfer: Argument cannot be negative (NEGATIVE_RETURNS)
CID 73358 (#2 of 2): Argument cannot be negative (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
f0e48be482 sk-netlink: Argument cannot be negative (NEGATIVE_RETURNS)
CID 73378 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
4e42278715 kerndat: Argument cannot be negative (NEGATIVE_RETURNS)
CID 92720 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 pfd is passed to a parameter that cannot be negative.

CID 92747 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 pfd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
50dbcadf03 net: Argument cannot be negative (NEGATIVE_RETURNS)
CID 178391 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sk is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
4a77e0456c net: Argument cannot be negative (NEGATIVE_RETURNS)
CID 192961 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 sockfd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
5631e9dca7 action-scripts: Improper use of negative value (NEGATIVE_RETURNS)
CID 192963 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
 dup(sk) is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
0f4b053c18 cr-dump: Resource leak (RESOURCE_LEAK)
CID 226477 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable fd_dir going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
7e4f50e747 irmap: Double close (USE_AFTER_FREE)
CID 226478 (#1 of 2): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

CID 226478 (#2 of 2): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
2ed16451b0 proc_parse: Copy into fixed size buffer (STRING_OVERFLOW)
CID 226480 (#1 of 1): Copy into fixed size buffer (STRING_OVERFLOW)
 You might overrun the 4096-character fixed-size string root_link.name by copying new->root without checking the length.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
8645648235 autofs: Resource leak (RESOURCE_LEAK)
CID 226482 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable path going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
90369437f9 cgroup-props: Resource leak (RESOURCE_LEAK)
CID 226483 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable p going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
2a4c4bf2fc filesystem: Double close (USE_AFTER_FREE)
CID 226484 (#1 of 1): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle fd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
38246bf554 uffd: Resource leak (RESOURCE_LEAK)
CID 226485 (#1 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

CID 226485 (#2 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

CID 226485 (#3 of 3): Resource leak (RESOURCE_LEAK)
 Variable events going out of scope leaks the storage it points to

Also changed epoll_prepare() to check return value of epoll_create()
against '< 0' instead if '== -1' to make coverity happy.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
03d66390b0 mount: Resource leak (RESOURCE_LEAK)
CID 226486 (#1 of 2): Resource leak (RESOURCE_LEAK)
 Variable mi going out of scope leaks the storage it points to.

CID 226486 (#2 of 2): Resource leak (RESOURCE_LEAK)
 Variable mi going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
c1ab1a19e1 pagemap: Logically dead code (DEADCODE)
CID 302711 (#1 of 1): Logically dead code (DEADCODE)
 Execution cannot reach the expression pr->io_complete inside this statement: if (ret == 0 && pr->io_comp....

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
4e7e26b702 files-reg: Resource leak (RESOURCE_LEAK)
CID 302712 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable build_id going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
da5a4d6e57 cgroup: Resource leak (RESOURCE_LEAK)
CID 302714 (#1 of 1): Resource leak (RESOURCE_LEAK)
 Variable dirnew going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
af569ac017 pagemap: Argument cannot be negative (NEGATIVE_RETURNS)
CID 302715 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)
 fd is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
19365c1e68 cgroup: Resource leak (RESOURCE_LEAK)
CID 302717 (#2 of 2): Resource leak (RESOURCE_LEAK)
 Variable dirnew going out of scope leaks the storage it points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
b893034335 img-streamer: Double close (USE_AFTER_FREE)
CID 302718 (#1 of 1): Double close (USE_AFTER_FREE)
 Calling close(int) closes handle sockfd which has already been closed.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
59010ad6d5 net: Argument cannot be negative (NEGATIVE_RETURNS)
CID 302719 (#1 of 1): Argument cannot be negative (NEGATIVE_RETURNS)

 img_raw_fd(img) is passed to a parameter that cannot be negative.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
82cd3bb0d0 zdtm: update and refactor tests for BPF array and hash maps
This commit achieves the following:
a) Refactors ZDTM tests bpf_array.c and bpf_hash.c to make use of the
BPF ZDTM library functions. In addition, these tests now verify whether
information obtained from both procfs and BPF_OBJ_GET_INFO_BY_FD are
the same before and after c/r.
b) Updates ZDTM tests bpf_array.c and bpf_hash.c to include a BPF map's
name and also to freeze maps

Source files modified:

* zdtm/static/bpf_array.c
* zdtm/static/bpf_hash.c

Source files added:

* zdtm/static/bpf_array.desc
* zdtm/static/bpf_hash.desc

Note: ${test_name}.desc files have the 'suid' flag set because
BPF_MAP_FREEZE requires the global (root-userns) CAP_SYS_ADMIN or
CAP_BPF. Hence, only test flavors 'h' and 'ns' are executed ('uns'
is skipped) because BPF_MAP_FREEZE can't be used from non-root user
namespaces.

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
8301c7e012 criu: adding support for BPF map name, ifindex and freeze
This commit achieves the following:
a) C/R of a BPF map's name as well as ifindex (index of the network
interface to which the map is attached). This information is not
available from procfs and therefore has to be obtained using the
bpf() system call with BPF_OBJ_GET_INFO_BY_FD.
b) Adds support for frozen maps - during the restore operation, CRIU
now freezes a BPF map that was frozen during checkpoint.

Source files modified:

* bpfmap.c

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
02f7e3434d images: adding support for BPF map file name and ifindex
This commit adds a BPF map's name and ifindex to its protobuf image.
ifindex is the index of the network interface to which the BPF map is
attached and can be specified via a parameter while creating the BPF
map (BPF_MAP_CREATE). This commit also provides a default value of
false to the field 'frozen'.

Source files modified:

* images/bpfmap-file.proto

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
97ab725db1 zdtm: adding BPF helper functions as a new library
This commit adds BPF helper functions needed by tests in a new library.
It defines new functions that allow verifying BPF map meta-data from
the procfs as well as using the bpf() system call with
BPF_OBJ_GET_INFO_BY_FD. It is necessary to verify from procfs and using
BPF_OBJ_GET_INFO_BY_FD because the information available from both
these places is disjoint (for example, checking whether a map is frozen
cannot be performed with BPF_OBJ_GET_INFO_BY_FD).

Source files modified:

* test/zdtm/lib/Makefile - Generating build artifacts

Source files added:

* test/zdtm/lib/bpfmap_zdtm.c - Provides definitions for 3 new
functions:
    (a) parse_bpfmap_fdinfo() - Parses information about the BPF map
    from procfs
    (b) cmp_bpf_map_info() - Compares the attributes of a BPF map file
    obtained from BPF_OBJ_GET_INFO_BY_FD. This function is typically
    used to verify that the attributes of a BPF map remain the same
    before checkpoint and after restore
    (c) cmp_bpfmap_fdinfo() - Compares the attributes of a BPF map file
    obtained from procfs. This function is typically used to verify
    that the attributes of a BPF map remain the same before checkpoint
    and after restore

* test/zdtm/lib/bpfmap_zdtm.h - Structure and function declarations.
Declares struct bpfmap_fdinfo_obj, which stores information about BPF
maps parsed from procfs

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
f7bd705735 servicefd: close temporary fd on error path
We can have tmp != sfd if fcntl(F_DUPFD) sees that sfd is already used,
but tmp is left open on this error path, lets close it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
3ef2c1ff87 criu: check matching the tcp-close option on restore only
If the tcp-close has been set for dump, it has to set for restore too.

But we don't need to require matching of tcp-close for criu lazy-pages.

Reported-by: Mr Travis
2020-10-20 00:18:24 -07:00
Andrei Vagin
9acca8df9b tcp: add a separate test for listen sockets
In addition, it checks that an unconnected socket can be connected after
C/R.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
9ba9d6706f tcp: dump shutdown state for unconnected sockets
There is no direct way to get this state but we can get polling events
for a socket and guess its shutdown state.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
fd7b6e73d6 CI: run cross compile on all branches
The github action based cross compile tests are only running when
pushing to master or criu-dev. This changes this to have it run on all
branches. Useful to have all CI tests running on personal CRIU checkouts
on branches with other names.

If I prepare a branch to create a new pull request, the cross compile
tests have not been running if my branch has another name than criu-dev
or master. With this change these tests will run on all branches.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
32b162831d CI: add Travis test script to 'lint'
Running 'make lint' will now also check our travis-tests script with
shellcheck.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
025ef090d2 CI: switch to loop based apt-get
The previously introduced apt_install loop function to make package
install more robust against network errors is now moved to its own
script used in multiple places.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
0def9bc1ff tests: only run 'make lint' once in CI
Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
84215e0355 scripts: run shellcheck on the scripts folder
Shellcheck (https://github.com/koalaman/shellcheck) can identify common
errors in shell scripts. This initial integration of shellcheck only
checks the scripts in the 'scripts/' folder. This commit fixes (or
disables) all reports of shellcheck to ensure this part starts error
free. I am not convinced this is really necessary as most changes do not
seem to be necessary for their circumstances. On the other hand it
probably does not hurt to use a checker to avoid unnecessary errors.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
e2101abf26 crtools: Fix --help output line width
After commit cd479dc3 Travis fails with:

++ ./criu/criu --help
++ wc --max-line-length
+ WIDTH=87
+ '[' 87 -gt 80 ']'
+ echo 'criu --help output does not obey 80 characters line width!'

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2020-10-20 00:18:24 -07:00
Andrei Vagin
4a80dfab89 doc: update documentations for the tcp-close option
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
e42f5e032e tcp: allow to specify --tcp-close on dump
In this case, states of established tcp connections will not be dumped
and they will not be blocked. This will be useful in case of snapshots,
when we don't need to restore tcp connections.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
4f7c480413 test/zdtm: write in a tcp socket has to fail if tcp-close was set
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
5b5f4b717b socket/tcp: shutdown tcp sockets if the tcp-close option is set
A user process can try to write in a tcp socket and it has to get a
error in this case.

Fixes #1134

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Sebastiaan van Stijn
3957d9533b Switch to python 3 variants of dependencies on debian-based builds
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-10-20 00:18:24 -07:00
Guoyun Sun
a6214c360d mips64: implement vdso_redirect_calls()
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
80672c9f34 zdtm: Add test for SO_LINDER
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
5bb5890cb4 socket: c/r support for SO_LINGER
The SO_LINGER option allows to control how a TCP connection is closed.
The default behavior is to return immediately when close() is called,
and any unsent data is not guaranteed to be delivered. When SO_LINGER
is enabled, the close() call would block until all final data is
delivered to the remote end, for a specified time interval. When the
time interval is set to zero, the connection is aborted and any pending
data is immediately discarded upon close().

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
0aeddba7cc socket: c/r support for SO_OOBINLINE
This patch enables checkpoint/restore of the SO_OOBINLINE socket option.
When the SO_OOBINLINE option is used, out-of-band data is placed in the
normal input queue as it is received. This permits it to be read using
read or recv without specifying the MSG_OOB flag.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
5c11b0e440 zdtm: check that unbindable mount flag does not affect restore
Make two mounts, binds of the same file system, the first has root "/"
and the second has a cut root "/auxiliary". This insures that on restore
criu will mount the first mount first and latter bind the second from
it. After set the first mount unbindable to check if restoring these
flag does not interfere with restoring mounts.

Before the fix in these series we had the error:

(00.031286)      1: mnt: 	Mounting tmpfs @/tmp/.criu.mntns.wGroPU/12-0000000000/zdtm/static/unbindable.test/bind_of_unbindable (0)
(00.031298)      1: mnt: 	Bind /tmp/.criu.mntns.wGroPU/12-0000000000/zdtm/static/unbindable.test/unbindable/auxiliary to /tmp/.criu.mntns.wGroPU/12-0000000000/zdtm/static/unbindable.test/bind_of_unbindable
(00.031329)      1: Error (criu/mount.c:2298): mnt: Can't mount at /tmp/.criu.mntns.wGroPU/12-0000000000/zdtm/static/unbindable.test/bind_of_unbindable: Invalid argument

https://bugs.openvz.org/browse/OVZ-7116

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
b554eacb51 mount: replace mountpoint to mnt_id in error messages
Replace in restore_shared_options "->mountpoint" in messages with more
descriptive "->mnt_id". Mountpoints concide a lot, and mnt_id is unique.
These also makes a message shorter. (We already print a mapping from
mnt_id to mountpoint in mnt_tree_show, so we can easily find mountpoint
if we want).

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
f3b18865f9 mount: delay setting unbindable flag where possible
Setting mounts unbindable just after mounting them and before other
mounts of the same superblock are mounted can break the restore of
mounts, because other mounts will try to bind from it and obviously will
fail. See the test "unbindable" for more info.

Currently we can't delay if the mount is overmounted, if we will
need these in future we would likely set it through open fd on the
mountpoint.

https://bugs.openvz.org/browse/OVZ-7116

v2: simplify coderead, print message on setting unbindable error, add
some more comments about unbindable.
v3: add a list to optimise a walk over delayed unbindable mounts, don't
call set_unbindable on error path.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Hello1024
dbf798811f sk-netlink: Handle case of in-use netlink peer ids
Netlink sockets normally kernel-assign a peer id based on the
requesting pid at bind time. CRIU saves this peer ID and tries
to restore it. At restore time, it's possible for the id to
already be in use (e.g., assigned to another process, perhaps
in another pid namespace).

This patch tries to use the original id, but if that fails then
allows the kernel to auto-allocate a new peer id. We make a
warning message, because it is possible (although rather unlikely)
the application cares what its peer id is.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
c0f3653108 images: kindly ask not to use fields with id 18 in unix_sk_entry
The field id 18 is used in Virtuozzo criu in multiple releases, so that
we can't change the id easily. So we can at least kindly ask not to use
this field in mainstream criu to decrease the pain of Virtuozzo criu
rebases.

Reference to related patch in Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/58e61a20c22c#images/sk-unix.proto

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
ed74c9e058 zdtm: add new epoll01 test
This adds three epoll targets on tfd 702 and then adds two epoll targets
on tfd 701. This test is for off calculation in dump_one_eventpoll, the
reverse order makes qsort to actually work.

v2: update test_doc with longer explanation, remove unused DUPFDNO

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
0dda60f518 eventpoll: fix toff off calculation
Kcmp expect off to be a number of inclusion of target tfd. So we should
set off=0 to a first inclusion and off++ for all next inclusions.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
dd082fad10 eventpoll: resort toff back by idx
In dump_one_eventpoll we call find_tfd_bsearch with arguments
"e->tfd[i]->tfd, toff[i].off". By this we want to check if
"toff[i].off"-th inclusion of target "e->tfd[i]->tfd" in an eventpoll
coresponds to our fd "e->tfd[i]->tfd" in terms of kcmp.

But because of toff was sorted to calculate off, indexes in e->tfd and
toff does not address the same eventpoll target at this point.

Let's sort toff back to original state to make indexes consistent.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
596276a9af travis: use Fedora rawhide images from Fedora
The docker hub container registry is not updated as fast as Fedora's
registry at registry.fedoraproject.org. Fedora's registry gets a new
image whenever there is a new version of rawhide, docker hub's rawhide
image can take a couple of weeks because the process is not automated.

Especially when Fedora branches of a new release we see lot's of errors
in CRIU's Fedora rawhide based Travis runs. Switch to Fedora's registry
to always have the newest rawhide images for our tests.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
df0c793d52 travis: restore lazy-pages tests for uns flavor
Since commit cdd08cdff ("uffd: use userns_call() to execute
ioctl(UFFDIO_API)") UFFD_API ioctl() is wrapped with userns_call() and this
allows runing lazy-pages tests on recent kernels in uns.

Restore testing of lazy-pages in uns in travis.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
1391f84d85 criu: don't use the deprecated security_context_t (SELinux)
This change fixes the error:
error: 'security_context_t' is deprecated
[-Werror=deprecated-declarations]

Source files modified:

* lsm.c
* net.c

Please refer to:
9eb9c9327

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
20d87bbfac scripts: adding libbpf for Travis tests
Source files modified:

* travis/vagrant.sh - Adding libbpf-devel

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
372782d8ac zdtm: adding tests for BPF maps
This commit adds ZDTM tests for c/r of processes with BPF maps as open
files

Source files added:

* zdtm/static/bpf_hash.c - Tests for c/r of the data and meta-data of
BPF map type BPF_MAP_TYPE_HASH

* zdtm/static/bpf_array.c - Tests for c/r of the data and meta-data
of BPF map type BPF_MAP_TYPE_ARRAY

Source files modified:

* zdtm/static/Makefile - Generating build artifacts for BPF tests

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
14b1cc26de criu: restoring BPF map data
This commit restores the data of BPF maps. A hash table (indexed by
the map's id) is used to store data objects for multiple BPF map
files that a process may have opened. Collisions are resolved with
chaining using a linked list.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect the protobuf image containing BPF map data
    (b) read the BPF map's data from the image and store it in the
    hash table
    (c) restore the map's data using bpf_map_update_batch()

* include/bpfmap.h
    - Defines the size of the hash table and maks to be used while
	indexing into it
	- Structure and function declarations that are used while restoring
	BPF map data

* cr-restore.c - Collects the protobuf image containing BPF map data
during the restoration phase

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
4d073a75e2 criu: restoring BPF maps (without data)
This commit enables CRIU to restore a process' BPF map file
descriptors.

Source files modified:

* bpfmap.c - Structure and function definitions needed to:
    (a) collect a BPF map's information from its protobuf image
    (b) create and open a BPF map with the same parameters as when
    it was dumped
    (c) add the newly opened BPF map to the process' file descriptor
    list

* include/bpfmap.h - Structure declarations for restoring BPF maps

* files.c - Collects a BPF map's file entry during the restoration
phase

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
4b8186cb6e crit: add BPF map data decoding
This commit enables CRIT to decode the contents of a protobuf image
that stores information related to BPF map

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
b924394ccd criu: dumping BPF map data
This commit enables CRIU to dump data(key-value) pairs stored in BPF
maps

Source files modified:

* bpfmap.c

    - Function dump_one_bpfmap_data() reads the map's keys and
    values into two buffers using bpf_map_lookup_batch() and then
    writes them out to a protobuf image along with the number of
    key-value pairs read

    - Function dump_one_bpfmap() now dumps the data as well before
    returning

* include/bpfmap.h - Includes headers and declares functions needed to
dump BPF map data

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
5ff0e70586 criu: dumping meta-data about BPF map files
This commit enables CRIU to dump meta-data about BPF maps files by
prividing the structures and functions needed by other parts of the
code-base.

Source files added:

* bpfmap.c - defines new structures and functions:

    (a) struct fdtype_ops bpfmap_dump_ops:
            sets up the function handler to dump BPF maps

    (b) is_bpfmap_link():
            checks whether an anonymous inode is a BPF map file

    (c) dump_one_bpfmap():
            parses information for a BPF map file from procfs and
			dumps it

* include/bpfmap.h - structure and function declarations

Source files modified:

* Makefile.crtools - generates build artifacts for bpfmap.c

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
b85709797c criu: parse information about BPF maps from procfs
This commit enables CRIU to:
(a) identify an anonymous inode as being a BPF map
(b) parse information about BPF maps from procfs

Source files modified:

* files.c - Checks anonymous inodes to see whether they are BPF maps.
If so, sets struct fdtype_ops *ops to a structure that knows how to
dump BPF maps

* proc_parse.c - Function parse_fdinfo_pid_s() now checks whether the
current file being processed is a BPF map. If so, it calls a newly
defined function parse_bpfmap() which knows how to parse information
about BPF maps from procfs

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
aa59dfc6df Set Makefile config variables for libbpf
Source files modified:

* Makefile.config - Checks whether libbpf is installed on the system.
If so, we add -lbpf to LIBS_FEATURES, -DCONFIG_HAS_LIBBPF to
FEATURE_DEFINES and set CONFIG_HAS_LIBBPF. This allows us to check for
the presence of libbpf before compiling or executing BPF c/r code and
ZDTM tests.

* Makefile - Set CONFIG_HAS_LIBBPF to clean all files.

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
95b7d06cbc criu: define constants for c/r of BPF maps
This commit defines constants and includes necessary headers to c/r
BPF maps

Source files modified:

* magic.h - Defining BPFMAP_FILE_MAGIC and BPFMAP_DATA_MAGIC

* image-desc.h - Defining CR_FD_BPFMAP_FILE and CR_FD_BPFMAP_DATA

* image-desc.c - Create new entries for bpfmap-file and bpfmap-data
in CRIU's file descriptor set

* protobuf-desc.h - Defining PB_BPFMAP_FILE and PB_BPFMAP_DATA

* protobuf-desc.c - Including headers for BPF map protobuf images

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Abhishek Vijeev
c26cd1395f images: protobuf definitions for BPF map meta-data and data
This commit adds protobuf definitions needed to checkpoint and
restore BPF map files along with the data they contain

Source files added:

* bpfmap-file.proto - Stores the meta-data about BPF maps

* bpfmap-data.proto - Stores the data (key-value pairs) contained
in BPF maps

Source files modified:

* fdinfo.proto - Added BPF map as a new kind of file descriptor.
'message file_entry' can now hold information about BPF map file
descriptors

* Makefile - Now generates build artifacts for bpfmap-file.proto
and bpfmap-data.proto

Signed-off-by: Abhishek Vijeev <abhishek.vijeev@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
d6735616a9 travis: add a focal based test run
It seems travis supports now Ubuntu 20.04. Let's run at least one test
also on 20.04 (focal).

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
ca360ce30b travis: switch travis to Python 3
Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
8063fbb47d contrib: Add python-future to Debian packages
This one contains builtins module from which zdtm.py imports.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Mike Rapoport
52eff52e67 github: disable cross-compule for mips on master branch
Master branch does not have mips support yet, so automated builds for
mips on the master branch fail.

Temporarily split mips cross-build into a separate files until mips
support will be mergded into the master branch.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
5b751fbaf8 criu: the type of a socket inode has to be "unsigned int"
(00.015271) unix: 	Add a peer: ino 2203289876 peer_ino 2203289875 family    1 type    1 state  1 name /mnt/test/zdtm/static/sockets03.test
(00.015277) Warn  (criu/sk-unix.c:475): unix: Shutdown mismatch -2091677421:1 -> -2091677420:0

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
a6b00b1a7a actions: create file for daily rebuild
This adds a definition to do a daily rebuild of all cross-compile tests
on the master and criu-dev branch.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
18f42b3021 travis: loop over apt-get to recover from errors
One of the most common CI errors we see is that package install fails
due to some hashsum mismatch or some DNS errors.

This adds a loop around each apt-get install call to do a clean, update
and install and if one of the steps fails it repeats it up to 10 times.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
3d67e8a0d2 Makefiles: do not re-generate magic.py every time
I always wondered why re-running make on a criu checkout always prints
out

  GEN      magic.py

even if no file has changed. It seems the Makefile was looking for the
file in the wrong location. Providing the full path to the file will now
only rebuild magic.py if something actually changed that requires a
rebuild.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Otto Bittner
9ce4ed0935 python: Handle byte strings when converting protobuf to dict
Fixes #1165
Traceback (most recent call last):
  File "../criu/crit/crit-python3", line 6, in <module>
    cli.main()
  File "/home/xcv/repos/criu/crit/pycriu/cli.py", line 410, in main
    opts["func"](opts)
  File "/home/xcv/repos/criu/crit/pycriu/cli.py", line 43, in decode
    json.dump(img, f, indent=indent)
  File "/usr/lib/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

Co-authored-by: Julian <jb@futureplay.de>
Signed-off-by: Otto Bittner <otto-bittner@gmx.de>
2020-10-20 00:18:24 -07:00
Mike Rapoport
320c88e922 CONTRIBUTING.md: clarify placement of Fixes: tags
The description of the Fixes: tags could be misleading and may be
understood as if "Fixes: " should be the commit summary.

Add a sentence with explicit description of Fixme: tags placement.

Reported-by: Otto Bittner <otto-bittner@gmx.de>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
83be11f1f4 lib/c: extend receive to handle incoming FDs
When using libcriu with the notify callback functionality CRIU transmits
an FD during 'orphan-pts-master' back to libcriu user. This is message
is sent via sendmsg() to transmit the FD and not via write() as all
other protobuf messages.

libcriu was using recv() and to be able to receive the FD this needs to
be changed to recvmsg() and if an FD is attached to it (currently only
for 'orphan-pts-master' this FD is stored in a variable which can be
retrieved with the function criu_get_orphan_pts_master_fd().

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
20a24c11ea log: Ask developers not to use print_on_level directly
The decision whether a log message is info/warning/error should
be made by the place in code where it's shown, not by any other
expression. This makes it pointless to use the print_on_level
directly, as in each particular place the needed pr_foo() helper
can be chosen.

However, we cannot (easily) make this function static, so keep
it in header, but ask people to think twise (or more times) before
calling it directly.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
ebc0d205a1 log: Hide vprint_on_level in log.c
This one is not used outside of log.c.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
4780724745 util: Use pr_info in vma printing
Same as previous patch -- all users are coded as pr_info-s
and are naturally such.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
1955d49077 ipc: Use pr_info() instead of print_on_level(PR_INFO...)
All the cases here are naturally and de-facto pr_info-s.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
99fc76d8b5 proc_parse: Do not feed loglevel into restore_loginuid
If a helper routine doesn't know whether its failure leads
to the error, then it should just emit a warning and return
-1. It's the caller who should print (or not) the error.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
8564bc49e4 check: Use pr_foo macros
Instead of directly calling the print_on_level.
The pr_msg/pr_warn seems to be better choise for all those cases.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
7646deed61 vagrant: Update to Fedora 32
Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
2020-10-20 00:18:24 -07:00
Ajay Bharadwaj
cee36af386 criu/files-reg.c: build-id size bug fix
This addresses a bug where the RegFileEntry n_build_id field would be
populated with incorrect data, causing dump to crash at times.

Signed-off-by: Ajay Bharadwaj <ajayrbharadwaj@gmail.com>
2020-10-20 00:18:24 -07:00
Ajay Bharadwaj
aeeaa30a56 criu/files-reg.c: build-id from multiple headers fix
This addresses a bug when the ELF file contains multiple PT_NOTE program
headers but only the first header is checked for the build-id.

Signed-off-by: Ajay Bharadwaj <ajayrbharadwaj@gmail.com>
2020-10-20 00:18:24 -07:00
Dmitry Safonov
9c0b904a02 compel/infect: Don't adjust stack/args alignment
Instead, fail to infect task and let compel cure it.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
70ecbbcc86 compel: allocate the GOT table to avoid memory corruption
Previously, the GOT table was using the same memory location as the
args region, leading to difficult to debug memory corruption bugs.

We allocate the GOT table between the parasite blob and the args region.
The reason this is a good placement is:
1) Putting it after the args region is possible but a bit combersome as
the args region has a variable size
2) The cr-restore.c code maps the parasite code without the args region,
as it does not do RPC.

Another option is to rely on the linker to generate a GOT section, but I
failed to do so despite my best attempts.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
fee517b3da compel: remove x86/prologue.S
This file has been ignored since commit 19fadee9 from 2016.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
8d8dd5a799 compel: x86 parasite_service entry point simplification
We don't need to push 0 on the stack. This seems to be a remnant of the
initial commit of 2011 that had a `pushq $0`.

The 16 bytes %rsp alignment was added with commit 2a0cea29 in 2012.
This is no longer necessary as we already guarantee that %rsp is 16
bytes aligned. A BUG_ON() is added to enforce this guarantee.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
b8c1d9d939 compel: rewrite parasite cmd and args manipulation from assembly to C
Previously, __export_parasite_cmd was located in parasite-head.S, and
__export_parasite_args located exactly at the end of the parasite blob.
This is not ideal for various reasons:
1) These two variables work together. It would be preferrable to have
them in the same location
2) This prevent us from allocating another section betweeen the parasite
blob and the args area. We'll need this to allocate a GOT table

This commit changes the allocation of these symbols from assembly/linker
script to a C file.

Moreover, the assembly entry points that invoke parasite_service()
prepares arguments with hand crafted assembly. This is unecessary.
This commit rewrite this logic with regular C code.

Note: if it wasn't for the x86 compat mode, we could remove all
parasite-head.S files and directly jump to parasite_service() via
ptrace.  An int3 architecture specific equivalent could be called at the
end of parasite_service() with an inline asm statement.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
391da74647 compel: stop rounding up the parasite blob size and args region to PAGE_SIZE
It is unnecessary and potentially confusing for understanding the memory
layout requirement of the parasite blob.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
d99fc1e553 compel: remove "addr_" from offset variable names
It removes the potential confusion when it comes to virtual address vs
offsets. Further, doing so makes naming more consistent with the rest of
the parasite_blob_desc struct.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
a531f9a8bc compel: pass the parasite_blob_desc to compel_relocs_apply()
compel_relocs_apply() was taking arguments mostly from the struct
parasite_blob_desc. Instead of passing all the arguments, we pass a
pointer to the struct itself.

This makes the code safer, as cr-restore.c calls compel_relocs_apply().
It previously needed to poke into what can be considered private
variables of the restorer-pie.h file.

To allow the parasite_blob_desc struct to be populated without a
parasite_ctl struct, we expand the compel API to export a
parasite_setup_c_header_desc() in the generated pie.h.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
8ac0fa6aaf compel: add error message for COMMON symbols
COMMON symbols are emitted for global variable that are not initialized
in shared libraries.
They typically end up in the .bss section when linking an executable.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
f92948ccaf zdtm: make cgroup_yard to be aware of cgroup2
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
f96cd8c749 tests: skip cgroup04 and cgroup_ifpriomap on pure cgroup2 systems
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
5f160811aa zdtm.py: add the cgroup2 freezer support
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
98e9165f0c cgroup: Add the initial support for cgroup2
This change is very straightforward. We don't skip the cgroup2
controller and dump it as any other controllers.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
a93df9eb9b pidns: fixup
I pushed the wrong branch to the pidns PR

https://github.com/checkpoint-restore/criu/pull/1056

which resulted in the wrong patches getting merged.

This is the actual result from the review.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Ajay Bharadwaj
bb555b3720 criu/config.c: added cli option for build-id
file_validation_method field added to cr_options structure in
"criu/include/cr_options.h" along with the constants:
FILE_VALIDATION_FILE_SIZE
FILE_VALIDATION_BUILD_ID
FILE_VALIDATION_DEFAULT (Equal to FILE_VALIDATION_BUILD_ID)

Usage and description information is yet to be added

Usage:
--file-validation="filesize" (To use only the file size check)
--file-validation="buildid" (To try and use only the build-id check)

Signed-off-by: Ajay Bharadwaj <ajayrbharadwaj@gmail.com>
2020-10-20 00:18:24 -07:00
Ajay Bharadwaj
9191f8728d criu/files-reg.c: add build-id validation functionality
efi.h: Required for accessing the build-id of .efi files

This adds functions to find, store and compare with the stored build-id.
get_build_id() calls 32-bit or 64-bit helper functions depending on
the bitness of the ELF file after first ensuring that it is actually
an ELF file by checking for the magic number.

The number of iterations while searching the elf file for the
build-id before giving up (500 while searching the note section)
are limited.

Signed-off-by: Ajay Bharadwaj <ajayrbharadwaj@gmail.com>
2020-10-20 00:18:24 -07:00
Ajay Bharadwaj
7b18c13c19 images/regfile.proto: adds additional fields to RegFileEntry
This adds build-id, checksum, checksum-config and checksum-parameter fields
to RegFileEntry to store metadata used for file verification.

build_id: Holds the build-id if it could be obtained

checksum: Holds the checksum if it could be obtained

checksum_config: Holds the configuration of bytes for which checksum has
been calculated (The entire file, first N bytes or every Nth byte)

checksum_parameter: Specifies the value of 'N', if required, for the
configuration of bytes

Signed-off-by: Ajay Bharadwaj <ajayrbharadwaj@gmail.com>
2020-10-20 00:18:24 -07:00
Angie Ni
8354b526c4 restore: skip unnecessary setgroups calls
When groups already have desired values, we skip calling setgroups.
Furthermore, this check allows us to restore when in an
unprivileged user namespace that cannot use setgroups.

Signed-off-by: Angie Ni <avtni@google.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
0d8d7f2329 tests: criu-image-streamer change dev branch to master branch
Now that the new CRIU to criu-image-streamer protocol is in, we can use
the master branch of criu-image-streamer.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Andrei Vagin
62d70bd482 test/zdtm/autofs: use sigaction instead of the deprecated siginterrupt
Reported-by: Mr Travis
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Angie Ni
539183fadd Add error logging to config + crtools init
CRIU sometimes returns 1 from main() with no explanation.
Changes made add more logging in the case of
initialization errors in config and crtools.

Signed-off-by: Angie Ni <avtni@google.com>
2020-10-20 00:18:24 -07:00
Angie Ni
9a4b933f24 Add error logging to kerndat init
CRIU sometimes returns 1 from main() with no explanation.
These changes add additional logging for initialization errors in kerndat.

Signed-off-by: Angie Ni <avtni@google.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
3490d997be test: test external namespace support
Adapt netns_ext tests to also work with pid namespaces and move it from
test/others/netns_ext/ to test/others/ns_ext/.

Also enable ns_ext tests in Travis runs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
f3ebdeebe2 pidns: add external pidns to man-page
Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
9dd1ab00e7 pidns: support external PID namespaces
This allows CRIU to restore a process into an existing PID namespace.
During checkpointing the PID namespace can be marked as external using:

   --external pid[<inode>]:<label>

This can be the host PID namespace or any other PID namespace.

During restore a process can be restored into an existing PID namespace
using:

  --inherit-fd fd[<FD>]:<label>

The <label> has to be the same for checkpoint and restore. CRIU uses
the <label> to know which resource the user means.

A process can start in the host PID namespace and can be moved to
another namespace or the other way around. Any PID namespace can be
used.

This is necessary to checkpoint containers in a POD which share certain
namespaces and this code to support external PID namespaces is the first
step towards checkpointing and restoring containers which belong to a
POD.

This is not using the --join-ns functionality as it is not possible to
move existing process to a PID namespace using setns(). Only child
process will be created in the PID namespace and that is why setns()
needs to be called before clone().

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
f1e6b10369 pidns: write and read pidns information
This loads and stores the key for an external PID namespace if specified
by the user using: --external pid[<inode>]:<label>

Preparation for restoring into existing PID namespaces.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
4e7ec3c88b pidns: add pidns image file definition
TODO: create correct magic

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
99c8487837 zdtm: add zombie_leader test
Create a session leader and it's child - session member, make leader
zombie. To restore this criu will need to create a helper task a child
of our zombie so that member can inherit session. Before fixes in this
patchset we segfault on empty ids and fail to restore cgroups because of
empty cg_set

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Pavel Tikhomirov
f0438f47f2 cgroup: make prepare_task_cgroup lookup current cgset in ancestors
In case if our parent is a dead task (zombie) or a helper which in it's
turn has zombie parent, and parent thus has zero cg_set we should look
for current cgset deeper.

Fixes: #1066

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Cyrill Gorcunov
d38046b003 mount: restore_task_mnt_ns - Lookup for mount namespace conditionally
In case if our parent is a dead task (zombie) we should lookup
for parent ids which will be inherited on restore. Otherwise
parent->ids may be nil and SIGSEGV produced.

Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>

Rework and port from vzcriu:
87b320964 ("vz7: mount: restore_task_mnt_ns - Lookup for mount namespace
conditionally")

Fixes: #1066

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
41b535d312 test: skip vdso test on non-vdso systems
Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Angie Ni
ce22e0f37d uffd: uffd_open prints info, caller prints error
When uffd_open is called from kerndat_uffd, userfaultfd failure is not
considered an error, so the goal is to suppress the error message --
instead, we print this message as info.

If the function fails, it is the responsibility of the caller
to print the error message.

Signed-off-by: Angie Ni <avtni@google.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
6815aa958d CONTRIBUTING.md: add pull request guidelines
Following the discussion at [1] describe best practices for pull request
creation.

[1] https://github.com/checkpoint-restore/criu/pull/1096

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
35f8c056ac CONTRIBUTING.md: add sections about patch description and splitting
Shamelessly stolen from the Linux kernel [1], shortened a bit and
relaxed to match CRIU.

[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
2e5805878b CONTRIBUTING.md: minor formatting fixes
* Mark lowcase criu as code in the environment section
* Add missing brace around the reference to https://criu.org/Secrity
* Fixup an admolition block that GitHub cannot render
* Spelling fixups
* s/github/GitHub/g

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
d0fcb01d47 CONTRIBUTING.md: import "How to submit patches" from criu.org
Import "How to submit patches" article from CRIU wiki and update its
format to match GitHub markdown.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Mike Rapoport
808684c99e Add CONTRIBUTING.md
Move the existing contribution guidelines to a dedicated file for future
extensions.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-10-20 00:18:24 -07:00
Cyrill Gorcunov
6ee4b72382 arch/x86: Fix calculation of xstate_size
The layout of xsave frame in a standart format is predefined by the hardware.
Lets make sure we're increasing in frame offsets and use latest offset where
appropriate.

https://github.com/checkpoint-restore/criu/issues/1042

Reported-by: Ashutosh Mehra <mehra.ashutosh@ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
1d9438aefb criu swrk: fix usage, allow common options
TL;DR: this makes possible -v with criu swrk, and removes showing usage
which is useless in swrk mode.

1. Since criu swrk command is not described in usage, there is no sense
   in showing it. Instead, show a one-line hint about how to use it.

2. In case some global options (like -v) are used, argv[1] might not
   point to "swrk". Use optind to point to a correct non-option
   argument.

3. While at it, also error out in case we have extra arguments.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
cbf099400a Travis: use Vagrant to run VMs
This adds the minimal configuration to run Fedora 31 based VMs on
Travis.

This can be used to test cgroupv2 based tests, tests with vdso=off and
probably much more which requires booting a newer kernel.

As an example this builds CRIU on Fedora 31 and reconfigures it to boot
without VDSO support and runs one single test.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
d72428b7c4 Also report clone3() errors correctly
Without clone3() CRIU was able to detect a process with a wrong PID only
in the already created child process. With clone3() this error can
happen before the process is created.

In the case of EEXIST this error will now be correctly forwarded to an
RPC client.

This was detected by running test/others/libcriu on a clone3() system.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
047ecd3a15 test/others/libcriu: test version library calls
This adds the previously added libcriu version functions to the libcriu
tests.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
55f71b8667 lib/c: add criu_get_version()
Although the CRIU version is exported in macros in version.h it only
contains the CRIU version of libcriu during build time.

As it is possible that CRIU is upgraded since the last time something
was built against libcriu, this adds functions to query the actual CRIU
binary about its version.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
ZeyadYasser
e57e74a18d criu: optimize find_unix_sk_by_ino()
Fixes: #339
Replaced the linear search with a hashtable lookup.

Signed-off-by: Zeyad Yasser <zeyady98@gmail.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
62c03530c9 swrk: send notification instead of using status fd
When we use swrk, we have a mechanism to send notifications over RPC.
It is cleaner and more straightforward than sending \0 to status fd.

For now, both mechanisms are supported, although status fd request
option is now deprecated, so a warning is logged in case it's used.

Guess we can remove it in a few years.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
faf6dbf33e close_service_fd: rename to status_ready
The name close_service_fd() is misleading, as it not just closes the
status_fd, but also writes to it. On a high level, though, it signals
the other side that we are ready, so rename to status_ready.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
e34f5dd3a3 clang: Branch condition evaluates to a garbage value
criu-3.14/criu/namespaces.c:692:7: warning: Branch condition evaluates to a garbage value

criu-3.14/criu/namespaces.c:690:3: note: 'supported' declared without an initial value
              protobuf_c_boolean supported;
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu-3.14/criu/namespaces.c:691:8: note: Calling 'get_ns_id'
              id = get_ns_id(pid, &time_for_children_ns_desc, &supported);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu-3.14/criu/namespaces.c:479:9: note: Calling '__get_ns_id'
      return __get_ns_id(pid, nd, supported, NULL);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu-3.14/criu/namespaces.c:454:6: note: Assuming 'proc_dir' is < 0
      if (proc_dir < 0)
          ^~~~~~~~~~~~
criu-3.14/criu/namespaces.c:454:2: note: Taking true branch
      if (proc_dir < 0)
      ^
criu-3.14/criu/namespaces.c:455:3: note: Returning without writing to '*supported'
              return 0;
              ^
criu-3.14/criu/namespaces.c:479:9: note: Returning from '__get_ns_id'
      return __get_ns_id(pid, nd, supported, NULL);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu-3.14/criu/namespaces.c:479:2: note: Returning without writing to '*supported'
      return __get_ns_id(pid, nd, supported, NULL);
      ^
criu-3.14/criu/namespaces.c:691:8: note: Returning from 'get_ns_id'
              id = get_ns_id(pid, &time_for_children_ns_desc, &supported);
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
criu-3.14/criu/namespaces.c:692:7: note: Branch condition evaluates to a garbage value
              if (!supported || !id) {
                  ^~~~~~~~~~
690|   		protobuf_c_boolean supported;
691|   		id = get_ns_id(pid, &time_for_children_ns_desc, &supported);
692|-> 		if (!supported || !id) {
693|   			pr_err("Can't make timens id\n");
694|

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
b4c51ea492 coverity: fix FORWARD_NULL in criu/proc_parse.c: 1481
8. criu-3.14/criu/proc_parse.c:1511: var_deref_model: Passing null pointer "f" to "fclose", which dereferences it.
  1509|   	exit_code = 0;
  1510|   out:
  1511|-> 	fclose(f);
  1512|   	return exit_code;
  1513|   }

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
64347398c1 coverity: fix RESOURCE_LEAK criu/timens.c: 67
7. criu-3.14/criu/timens.c:67: leaked_storage: Variable "img" going out of scope leaks the storage it points to.
    65|   	if (id == 0 && empty_image(img)) {
    66|   		pr_warn("Clocks values have not been dumped\n");
    67|-> 		return 0;
    68|   	}

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
f334102520 libcriu: Add space between 'if' and parenthesis
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
4ac9a3c904 libcriu: Use spaces around '='
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
ae4fd07ca5 libcriu: Add orphan pts master
The orphan pts master option was introduced with commit [1]
to enable checkpoint/restore of containers with a pty pair
used as a console.

[1] 6afe523d97

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
f6d1b498dc cr-service: spell out an error
While working on runc checkpointing, I incorrectly closed status_fd
prematurely, and received an error from CRIU, but it was
non-descriptive.

Do print the error from open().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
00a44031e2 cr-service: fix wording in debug messages
The message "Overwriting RPC settings with values from <filename>" is
misleading, giving the impression that file is being read and consumed.
It really puzzled me, since <filename> didn't exist.

What it needs to say is "Would overwrite", i.e. if a file with such name
is present, it would be used.

Also, add actual "Parsing file ..." so it will be clear which files are
being used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
00b8257d9f tests: move cross compilation to github actions
This moves the cross compilation tests to github actions, to slightly
reduce the number of Travis tests and run them in parallel on github
actions.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
8452be93cf travis: use bionic almost everywhere
A few tests were still running on xenial because at some point they were
hanging. This switches now all tests to bionic except one docker test
which still uses xenial to test with overlayfs.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Kir Kolyshkin
5bd776da38 Remove dupe of "deprecated stuff on" msg
A similar one is already printed in check_options().

Before this patch:
> $ ./criu/criu -vvvvvv --deprecated --log-file=/dev/stdout xxx
> (00.000000) Turn deprecated stuff ON
> ...
> (00.029680) DEPRECATED ON
> (00.029687) Error (criu/crtools.c:284): unknown command: xxx

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-10-20 00:18:24 -07:00
Josh Abraham
8364b09407 soccr/test: Fix error logging in libsoccr tcp-test
Signed-off-by: Joshua Abraham <sinisterpatrician@gmail.com>
2020-10-20 00:18:24 -07:00
Guoyun Sun
277b0b69fa mips: fix fail when run zdtm test pthread01.c
k_rtsigset_t is 16Bytes in mips architecture but not 8Bytes.
so blk_sigset_extended be added in TaskCoreEntry and ThreadCoreEntry for dumping
extern 8Bytes data in parasite-syscall.c, restore extern 8Bytes data in cr-restore.c

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
be13941221 mips: impliment arch_shmat()
On MIPS CPUs with VIPT caches also has aliasing issues, just like ARMv6.
To overcome this issue, page coloring 0x40000 align for shared mappings was introduced (SHMLBA) in kernel.
    https://github.com/torvalds/linux/blob/master/arch/mips/include/asm/shmparam.h

Related to this, zdtm test suites ipc.c shm.c shm-unaligned.c and shm-mp.c are passed.

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Andrei Vagin
d38851c9bd test/jenkins: use bash to run shell scripts
We permanently have issues like this:
./test/jenkins/criu-iter.sh: 3: source: not found

It looks like a good idea to use one shell to run our jenkins scripts.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
40169b950e style: fix typos
Oddly, one of the test had a typo which should be fatal.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Guoyun Sun
b5c34c74c5 mips:support docker-cross compile
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
afe90627e2 mips:criu: Enable mips in criu
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
d325b7b775 mips:criu/arch/mips: Add mips parts to criu
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
158e8f8fe6 mips:proto: Add mips to protocol buffer files
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
e7d13b368d mips:compel: Enable mips in compel/
Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Guoyun Sun
ba0d6dbac1 mips:compel/arch/mips: Add architecture support to compel tool and libraries
This patch only adds the support but does not enable it for building.

Signed-off-by: Guoyun Sun <sunguoyun@loongson.cn>
2020-10-20 00:18:24 -07:00
Adrian Reber
8be1d457d7 net: fix coverity RESOURCE_LEAK
criu-3.12/criu/net.c:2043: overwrite_var: Overwriting "img" in "img =
open_image_at(-1, CR_FD_IP6TABLES, 0UL, pid)" leaks the storage that
"img" points to.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
eb732bcf0d util: Remove deprecated print_data() routine
The print_data() function was part of the deprecated (and removed)
'show' action, and it was moved in util.c with the following commit:

	a501b4804b
	The 'show' action has been deprecated since 1.6, let's finally drop it.

	The print_data() routine is kept for yet another (to be deprecated too)
	feature called 'criu exec'.

The criu exec feature was removed with:

	909590a355
	Remove criu exec code

	It's now obsoleted by compel library.
	Maybe-TODO: Add compel tool exec action?

Therefore, now we can drop print_data() as well.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-10-20 00:18:24 -07:00
Pavel Emelyanov
8c538ca10d page-read: Warn about async read w/o completion cb
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
27ab533cbe tests: run tests with criu-image-streamer with --stream
One can pass --stream to zdtm.py for testing criu with image streaming.
criu-image-streamer should be installed in ../criu-image-streamer
relative to the criu project directory. But any path will do providing
that criu-image-streamer can be found in the PATH env.

Added a few tests to run on travis-ci to make sure streaming works.
We run test that are likely to fail. However, it would be good to once
in a while run all tests with `--stream -a`.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
7d79a58f4d img-streamer: introduction of criu-image-streamer
This adds the ability to stream images with criu-image-streamer

The workflow is the following:
1) criu-image-streamer is started, and starts listening on a UNIX
   socket.
2) CRIU is started. img_streamer_init() is invoked, which connects to the
   socket. During dump/restore operations, instead of using local disk to
   open an image file, img_streamer_open() is called to provide a UNIX pipe
   that is sent over the UNIX socket.
3) Once the operation is done, img_streamer_finish() is called, and the
   UNIX socket is disconnected.

criu-image-streamer can be found at:
https://github.com/checkpoint-restore/criu-image-streamer

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Nicolas Viennot
51c3f8a908 pipes: loop over splice() when dumping a pipe's data
Instead of erroring, we should loop until we get the desired number of
bytes written, like regular I/O loops.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
0708cbd883 remote: Use tmp file buffer when restore ip dump
When CRIU calls the ip tool on restore, it passes the fd of remote
socket by replacing the STDIN before execvp. The stdin is used by the
ip tool to receive input. However, the ip tool calls ftell(stdin)
which fails with "Illegal seek" since UNIX sockets do not support file
positioning operations. To resolve this issue, read the received
content from the UNIX socket and store it into temporary file, then
replace STDIN with the fd of this tmp file.

 # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns
 === Run 1/1 ================ zdtm/static/env00

 ========================= Run zdtm/static/env00 in ns ==========================
 Start test
 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
 Adding image cache
 Adding image proxy
 Run criu dump
 Run criu restore
 =[log]=> dump/zdtm/static/env00/31/1/restore.log
 ------------------------ grep Error ------------------------
 RTNETLINK answers: File exists
 (00.229895)      1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1
 (00.230316)      1: 	Running ip route restore
 Failed to restore: ftell: Illegal seek
 (00.232757)      1: Error (criu/util.c:712): exited, status=255
 (00.232777)      1: Error (criu/net.c:1479): IP tool failed on route restore
 (00.232803)      1: Error (criu/net.c:2153): Can't create net_ns
 (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed
 (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory
 (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED.
 ------------------------ ERROR OVER ------------------------
 ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
 ##################################### FAIL #####################################

Fixes #311

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2020-10-20 00:18:24 -07:00
Radostin Stoyanov
01cab14dfa util: Fix addr casting for IPv4/IPv6 in autobind
When saddr.ss_family is AF_INET6 we should cast &saddr to
(struct sockaddr_in6 *).

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-10-20 00:18:24 -07:00
Adrian Reber
be2ded15ee test: fix flake8 errors
The newest version of flake reports errors that variable names like 'l'
should not be used, because they are hard to read.

This changes 'l' to 'line' to make flake8 happy.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-06 11:46:14 -07:00
Adrian Reber
d23d1fc0f9 travis: fix alpine builds
With the latest version of the alpine container image it seems that
alpine changed a few package names. This adapts the alpine container
to solve the travis failures.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-06 11:45:21 -07:00
Adrian Reber
f2edc1e199 Update certificates for failing tls based tests
When using zdtm.py with --tls it started to fail as the certificates
seem to have expired. Following commands have been used to re-generate
the certificate:

            # Generate CA key and certificate
            echo -ne "ca\ncert_signing_key" > temp
            certtool --generate-privkey > cakey.pem
            certtool --generate-self-signed \
                --template temp \
                --load-privkey cakey.pem \
                --outfile cacert.pem

            # Generate server key and certificate
            echo -ne "cn=$HOSTNAME\nencryption_key\nsigning_key" > temp
            certtool --generate-privkey > key.pem
            certtool --generate-certificate \
                --template temp \
                --load-privkey key.pem \
                --load-ca-certificate cacert.pem \
                --load-ca-privkey cakey.pem \
                --outfile cert.pem
            rm temp cakey.pem

Without this tests will fail in Travis.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-06-05 11:37:50 -07:00
Pavel Emelyanov
95ead14874 criu: Version π
The long-tempting release with lots of new features on board.
We have finally the time namespace support, great improvment of
the pre-dump memory consumption, new clone3 support and many
more.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-04-29 16:31:49 +03:00
Kir Kolyshkin
5c5e7695a5 get_clean_mount: demote an error to a warning
When testing runc checkpointing, I frequently see the following error:

> Error (criu/mount.c:1107): mnt: Can't create a temporary directory: Read-only file system

This happens because container root is read-only mount.

The error here is not actually fatal since it is handled later
in ns_open_mountpoint() (at least since [1] is fixed), but it is shown
as error in runc integration tests.

Since it is not fatal, let's demote it to a warning to avoid confusion.

[1] https://github.com/checkpoint-restore/criu/issues/520

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
c83a0aae2c proc: parse clock symbolic names in /proc/pid/timens_offsets
Clock IDs in this file has been replaced by clock symbolic names.

Now it looks like this:
    $ cat /proc/774/timens_offsets
    monotonic      864000         0
    boottime      1728000         0

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
7dc89376b8 pstree: improve error handling in read_pstree_image
First don't free pstree_item as they are allocated with shmalloc on
restore. Second always pstree_entry__free_unpacked PstreeEntry. Third
remove all breaks replacing them with implict goto err, so that it would
be easier to understand that we are on error path. Forth split out
code for reading one pstree item in separate function.

Sadly there is no much use in xfree-ing pi->threads because in case of
an error we still have ->threads unfreed from previous entries anyway.

But at least some cleanup can be done here.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
42b5700b72 kerndat remove duplicate call to kerndat_nsid()
Func kerndat_nsid() is called twice.

v2: leave kerndat_nsid call near kerndat_link_nsid

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
2c2fdd3334 parasite-msg: %u is not implemented for parasite code
Changed all the %u into %d.

Ideally, we should implement the %u format for parasite code.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
ef7ef9cfa0 kerndat: remove duplicate call to kerndat_socket_netns()
kerndat_socket_netns() is called twice. We keep the latter to avoid
changing the behavior.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
62088c721f criu: put statement continuation on the same line as the closing bracket
We should follow Linux Kernel Codding Style:

... the closing brace is empty on a line of its own, except in the cases
where it is followed by a continuation of the same statement, ie ... an
else in an if-statement ...

https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Automaticly fixing with:

:!git grep --files-with-matches "^\s*else[^{]*{" | xargs
:argadd <files>
:argdo :%s/}\s*\n\s*\(else[^{]*{\)/} \1/g | update

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-25 00:43:23 -07:00
Alexander Mikhalitsyn
d1fa1734ee autofs: fix integer overflow in mount options parsing
In real life cases pipe_ino param could be larger that INT_MAX,
but in autofs_parse() function we using atoi function, that uses
4 byte integers. It's a bug.

Example of mount info from real case:
(00.508286) 	type autofs source /etc/auto.misc mnt_id 2824 s_dev 0x4b9 / @
./misc flags 0x300000 options fd=5,pipe_ino=3480845226,pgrp=95929,timeout=300,
minproto=5,maxproto=5,indirect

3480845226 > 2147483647 (32-bit wide signed int max value) => we have a problem

It causes a error:
(03.195915) Error (criu/pipes.c:529): The packetized mode for pipes is not supported yet

Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
6b9faabf39 mem: avoid re-opening CR_FD_PAGES when not needed
This commit introduces an optimization when rsti(t)->vma_io is empty.
This optimization allows streaming a non-seekable image as CR_FD_PAGES
is not reopened.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
4d34f84bb6 img: rellocate a PATH_MAX buffer from the bss section to the stack
Reducing our memory footprint by 4K.

Improved-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Nicolas Viennot
bb0b4219ef img: fix image_name() when image is empty
When an image is opened but errored with a ENOENT error, the image is
still valid. Later on, do_pb_read_one() can fail and will invoke
image_name(). The image fd is EMPTY_IMG_FD (-404). read_fd_link fails.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
067a20c815 zdtm: fail if test with the crfail tag passes
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
698f3a4dbd zdtm: limit the line length for ps by 160 symbols
By default, this limit is 80 symbols and this isn't enough:
 4730 pts/0    S+     0:00          \_ ./zdtm_ct zdtm.py
7535 4731 pts/0    S+     0:00          |   \_ python zdtm.py
7536 4839 pts/0    S+     0:00          |       \_ python zdtm.p
7537 4861 pts/0    S+     0:00          |           \_ make --no
7538 4882 pts/0    S+     0:00          |               \_ ./mnt
7539 4883 ?        Ss     0:00          |                   \_ .

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
eab1a30748 timens: restore processes in a new timens to restore clocks
After restoring processes, we have to be sure that monotonic and
boottime clocks will not go backward. For this, we can restore processes
in a new time namespace and set proper offsets for the clocks.

In this patch, criu dumps clocks values event when processes are running
in this host time namespace and on restore, criu creates a new time
namespace, sets dumped clock values and restores processes.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
73438d34bb test: check that C/R of nested time namespaces fails
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
0d8c0562f9 zdtm_ct: run each test in a new time namespace
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
f1655fd540 zdtm: add a new test to check c/r of time namespaces
This test checks that monotonic and boottime don't jump after C/R.

In ns and uns flavors, the test is started in a separate time namespace
with big offsets, so if criu will restore a time namespace incorrectly
the test will detect the big delta of clocks values before and after C/R.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
3fd0fa4bdc zdtm: add support for time namespaces
For ns and uns flavors, tests run in separate time namespaces.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
ddba4af608 namespace: fail if ns/time_for_children isn't equal to ns/time
This case isn't supported right now.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Andrei Vagin
4127ef4ab7 criu: Add support for time namespaces
The time namespace allows for per-namespace offsets to the system
monotonic and boot-time clocks.

C/R of time namespaces are very straightforward. On dump, criu enters a
target time namespace and dumps currents clocks values, then on restore,
criu creates a new namespace and restores clocks values.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-04-25 00:43:23 -07:00
Pavel Tikhomirov
0e9b42acf9 MAINTAINERS: Add Pavel (myself) to maintainers
Hope I have enough experience in the project to be nominated. I want to
help with review and will try to do my best in it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-04-06 23:19:57 -07:00
Pavel Tikhomirov
e3fb52e375 remove header include statements duplicates
Revert "util: introduce the mount_detached_fs helper"

This reverts commit 5dbc24b206.

Revert "criu: Make use strlcpy() to copy into allocated strings"

This reverts commit bc49927bbc.

Fixes for https://github.com/checkpoint-restore/criu/pull/1003

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-30 19:43:32 -07:00
Pavel Emelyanov
fcb23dbfcf
Merge pull request #1003 from avagin/v3.14-part2
Prepare v3.14 (part 2)
2020-03-30 13:50:56 +03:00
Andrei Vagin
8c36865c84 memfd: split the struct memfd_inode
The struct memfd_inode has a union for dump and restore parts.
The only common parts are the list_head node, and the inode id.

Suggested-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
e3a5d09752 memfd: save all memfd inodes in one image
Per-object image is acceptable if we expect to have 1-3 objects
per-container. If we expect to have more objects, it is better to save
them all into one image. There are a number of reasons for this:
* We need fewer system calls to read all objects from one image.
* It is faster to save or move one image.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Byeonggon Lee
967797a867 Add build directory to gitignore
After running make install, build directory is generated but not ignored
in gitignore. So this commit add build directory to gitignore.

Signed-off-by: Byeonggon Lee <gonny952@gmail.com>
2020-03-27 19:36:20 +03:00
Pavel Tikhomirov
cc362b432e namespaces: fix error handling in dump_user_ns
Fix n_xid_map leaks on error path and remove useless exit_code.

Fixes: 6e1726f8 ("userns: set uid and gid before entering into userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
1ad8657ddb config/nftables: include string.h for strlen
Fixes: 9433b7b9db ("make: use cflags/ldflags for config.h detection mechanism")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
5f28b692a0 test/fifo_loop: change sizes of all fifo-s to fit a test buffer
This test doesn't expect that the write operation will block.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
1ad209b9c2 test/pipe03: check that pipe size is restored
Create two pipes with and without queued data.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
2b376168ef pipe: restore pipe size even if a pipe is empty
Without this patch, pipe size is restored only if a pipe has
queued data.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Valeriy Vdovin
fa705e418b zdtm: Use safe helper function to initialize unix socket sockaddr structure
The helper function removes code duplication from tests that want to
initialize unix socket address to an absolute file path, derived from
current working directory of the test + relative filename of a resulting
socket. Because the former code used cwd = get_current_dir_name() as
part of absolute filename generation, the resulting filepath could later
cause failure of bind systcall due to unchecked permissions and
introduce confusing permission errors.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Valeriy Vdovin
691b4a4e7e zdtm: Implemented get_current_dir_name wrapper that checks for 'x' permissions
Any filesystem syscall, that needs to navigate to inode by it's
absolute path performs successive lookup operations for each part of the
path. Lookup operation includes access rights check.
Usually but not always zdtm tests processes fall under 'other' access
category. Also, usually directories don't have 'x' bit set for other.
In case when bit 'x' is not set and user-ID and group-ID of a process
relate it to 'other', test's will not succeed in performing these
syscalls which are most of filesystem api, that has const char *path
as part of it arguments (open, openat, mkdir, bind, etc).
The observable behavior of that is that zdtm tests fail at file
creation ops on one system and pass on the other. The above is not
immediately clear to the developer by just looking at failed test's logs.
Investigation of that is also not quick for a developer due to the
complex structure of zdtm runtime where nested clones with
NAMESPACE flags take place alongside with bind-mounts.

As an additional note: 'get_current_dir_name' is documented as returning
EACCESS in case when some part of the path lacks read/list permissions.
But in fact it's not always so. Practice shows, that test processes can
get false success on this operation only to fail on later call to
something like mkdir/mknod/bind with a given path in arguments.

'get_cwd_check_perm' is a wrapper around 'get_current_dir_name'. It also
checks for permissions on the given filepath and logs the error. This
directs the developer towards the right investigation path or even
eliminates the need for investigation completely.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
c40c09cbbf test/zdtmp: add a test to C/R shared memory file descriptors
Any shared memory region can be openned via /proc/self/map_files.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
10b1d46f67 mem/vma: set VMA_FILE_{PRIVATE,SHARED} if a vma file is borrowed
Here is a fast path when two consequent vma-s share the same file.

But one of these vma-s can map a file with MAP_SHARED, but another one
can map it with MAP_PRIVATE and we need to take this into account.
2020-03-27 19:36:20 +03:00
Andrei Vagin
fb65ab2b1a mem: dump shared memory file descriptors
Any shared memroy mapping can be opened via /proc/self/maps_files/.
Such file descriptors look like memfd file descriptors, so
they can be dumped by the same way.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
f42ae70c75 make: use cflags/ldflags for config.h detection mechanism
The config.h detection scripts should use the provided CFLAGS/LDFLAGS
as it tries to link libnl, libnet, and others.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
d0d6f1ad10 mailmap: update my email
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Mike Rapoport
c3ad4942d4 travis: add ppc64-cross test on amd64
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-03-27 19:36:20 +03:00
Alexander Mikhalitsyn
b9c8e957d8 crit-recode: skip (not try to parse) nftables raw image
We should ignore (not parse) images that has non-crtool format,
that images has no magic number (RAW_IMAGE_MAGIC equals 0).

nftables images has format compatible with `nft -f /proc/self/fd/0`
input format.

Reported-by: Mr Jenkins
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
1f74f8d770 travis: Use debian/buster as base for cross build tests
Jessie is called 'oldoldstable', migrate to Buster.

Suggested-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
18ac1540c4 travis: Add aarch64-cross test on amd64
Fixes: #924
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
327554ee64 compel: Remove compel.h
The file only includes other headers (which may be not needed).
If we aim for one-include-for-compel, we could instead paste all
subheaders into "compel.h".
Rather, I think it's worth to migrate to more fine-grained compel
headers than follow the strategy 'one header to rule them all'.

Further, the header creates problems for cross-compilation: it's
included in files, those are used by host-compel. Which rightfully
confuses compiler/linker as host's definitions for fpu regs/other
platform details get drained into host's compel.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
62ad2f6095 criu: Remove compel.h includes
The plan is to remove "compel.h". That file only includes other headers
(which may be not needed). If we aim for one-include-for-compel, we
could instead paste all subheaders into "compel.h".
Rather, I think it's worth to migrate to more fine-grained compel
headers than follow the strategy 'one header to rule them all'.

Further, the header creates problems for cross-compilation: it's
included in files, those are used by host-compel. Which rightfully
confuses compiler/linker as host's definitions for fpu regs/other
platform details get drained into host's compel.

As a first step - stop including "compel.h" in criu.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
065ff6f415 zdtm/fifo_loop: don't try to write more than pipe size
... otherwise write() can block.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Pavel Tikhomirov
73e0ed3b8a zdtm: add a test on open symlink migration
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Co-Developed-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-03-27 19:36:20 +03:00
Alexander Mikhalitsyn
1936608ce4 files: allow dumping opened symlinks
To really open symlink file and not the regular file below it, one needs
to do open with O_PATH|O_NOFOLLOW flags. Looks like systemd started to
open /etc/localtime symlink this way sometimes, and before that nobody
actually used this and thus we never supported this in CRIU.

Error (criu/files-ext.c:96): Can't dump file 11 of that type [120777]
(unknown /etc/localtime)

Looks like it is quiet easy to support, as c/r of symlink file is almost
the same as c/r of regular one. We need to only make fstatat not
following links in check_path_remap.

Also we need to take into account support of ghost symlinks.

Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Pavel Tikhomirov
8b9c1f4c5b zdtm: add a test for files opened with O_PATH
On these test without the patch ("fown: Don't fail on dumping files opened
wit O_PATH") we trigger these errors:

Error (criu/pie/parasite.c:340): fcntl(4, F_GETOWN_EX) -> -9
Error (criu/files.c:403): Can't get owner signum on 18: Bad file descriptor
Error (criu/files-reg.c:1887): Can't restore file pos: Bad file descriptor

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-03-27 19:36:20 +03:00
Cyrill Gorcunov
f167d1f4e9 fown: Don't fail on dumping files opened with O_PATH
O_PATH opened files are special: they have empty
file operations in kernel space, so there not that
much we can do with them, even setting position is
not allowed. Same applies to a signal number for
owner settings.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Co-developed-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn (Virtuozzo) <alexander@mihalicyn.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
58fd63042c zdtm/inhfd: force python to read new data from a file
python 2.7 doesn't call the read system call if it's read file to the
end once. The next seek allows to workaround this problem.

inhfd/memfd.py hangs due to this issue.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
fce196d88d memfd: don't corrupt a state of the dumped fd
Right now, criu uses a dumped fd to dump content of a memfd "file".

Here are two reasons why we should not do this:
* a state of a dumped fd doesn't have to be changed, but now criu calls
  lseek on it. This can be workarounded by using pread.
* a dumped descriptor can be write-only.

Reported-by: Mr Jenkins
Cc: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
ffe0896ed0 fs: use __open_proc instead of open("/proc/...", ... )
Processes can run in a mount namespace without /proc.

Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Adrian Reber
4129d3262a cgroup2: add minimal cgroup2 support
The runc test cases are (sometimes) mounting a cgroup inside of the
container. For these tests to succeed, let CRIU know that cgroup2 exists
and how to restore such a mount.

This does not fix any specific cgroup2 settings, it just enables CRIU to
mount cgroup2 in the restored container.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-03-27 19:36:20 +03:00
Adrian Reber
10416bcbcb seize: support cgroup v2 freezer
This adds support to checkpoint processes using the cgroup v2 freezer.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-03-27 19:36:20 +03:00
Adrian Reber
9f902e0c6b seize: factor out opening and writing the freezer state
More preparations for cgroupv2 freezer. Factor our the freezer state
opening and writing to have one location where to handle v1 and v2
differences.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-03-27 19:36:20 +03:00
Adrian Reber
563c5e5e76 seize: prepare for cgroupv2 freezer
The cgroupv2 freezer does not return the same strings as v1. Instead of
THAWED and FROZEN v2 returns 0 and 1 (strings). This prepares the seize
code to use 0 and 1 everywhere and THAWED and FROZEN only for v1
specific code paths.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-03-27 19:36:20 +03:00
Radostin Stoyanov
bb032cc3e2 criu(8): Convert tabs to spaces
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-03-27 19:36:20 +03:00
Radostin Stoyanov
48f3b6516b criu(8): Add documentation for --enable-fs
This option was introduced with:

e2c38245c6

v2: (comment from Pavel Tikhomirov) --enable-fs does not fit with
--external dev[]:, see try_resolve_ext_mount, external dev mounts
only determined for FSTYPE__UNSUPPORTED.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-03-27 19:36:20 +03:00
Mike Rapoport
cdd08cdff8 uffd: use userns_call() to execute ioctl(UFFDIO_API)
In the recent kernels the userfaultfd support for FORK events is limited to
CAP_SYS_PTRACE. That causes the followong error when the ioctl(UFFDIO_API)
is executed from non-privilieged userns:

Error (criu/uffd.c:273): uffd: Failed to get uffd API: Operation not permitted

Wrapping the call to ioctl(UFFDIO_API) in userns_call() resolves the issue.

Fixes: #964
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-03-27 19:36:20 +03:00
Pavel Tikhomirov
38793699e7 test/jenkins: remove empty line at the end of file
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
3eab205bae python: sort imports
202 Additional newline in a group of imports.
I100 Import statements are in the wrong order.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
bc49927bbc criu: Make use strlcpy() to copy into allocated strings
strncpy() with n == strlen(src) won't put NULL-terminator in dst.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Dmitry Safonov
ec11644954 criu: Use strlcpy() instead of strncpy()
gcc8 in Fedora Rawhide has a new useful warning:

> criu/img-remote.c: In function 'push_snapshot_id':
> criu/img-remote.c:1099:2: error: 'strncpy' specified bound 4096 equals destination size [-Werror=stringop-truncation]
>  1099 |  strncpy(rn.snapshot_id, snapshot_id, PATH_MAX);
>       |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From man 3 strncpy:
> Warning:  If there is no null byte among the first n bytes of src,
> the string placed in dest will not be null-terminated.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
2dd105b8df memfd: add tests
Testing for all the memfd features, namely support for CR of:
* the same fd shared by multiple processes
* the same file shared by multiple processes
* the memfd content
* file flags and fd flags
* mmaps, MAP_SHARED and MAP_PRIVATE
* seals, excluding F_SEAL_FUTURE_WRITE because this feature only exists
  in recent kernels (5.1 and up)
* inherited fd

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
b133c375ad inhfd_test: add support for non-pair files
File pairs naturally block on read() until the write() happen (or the
writer is closed). This is not the case for regular files, so we
take extra precaution for these.

Also cleaned-up an extra my_file.close()

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
56d8e2455f memfd: add seals support
See "man fcntl" for more information about seals.

memfd are the only files that can be sealed, currently. For this
reason, we dump the seal values in the MEMFD_INODE image.

Restoring seals must be done carefully as the seal F_SEAL_FUTURE_WRITE
prevents future write access. This means that any memory mapping with
write access must be restored before restoring the seals.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
29a1a88bce memfd: add memory mapping support
* During checkpoint, we add a vma flags: VMA_AREA_MEMFD to denote memfd
  regions.
* Even though memfd is backed by the shmem device, we use the file
  semantics of memfd (via /proc/map_files/<vma>) which we already have
  support for.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
b25684e24a memfd: add --inherit-fd support
Upon file restore, inherited_fd() is called to check for a user-defined
inerit-fd override. Note that the MEMFD_INODE image is read at each
invocation (memfd name is not cached).

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
875ac4d03f files: increase path buffer size in inherited_fd()
Prepare memfd to use inherited_fd(), needing long path names support.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Nicolas Viennot
c1e72aa936 memfd: add file support
See "man memfd_create" for more information of what memfd is.

This adds support for memfd open files, that are not not memory mapped.

* We add a new kind of file: MEMFD.
* We add two image types MEMFD_FILE, and MEMFD_INODE.
  MEMFD_FILE contains usual file information (e.g., position).
  MEMFD_INODE contains the memfd name, and a shmid identifier
  referring to the content.
* We reuse the shmem facilities for dumping memfd content as it
  would be easier to support incremental checkpoints in the future.

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-03-27 19:36:20 +03:00
Andrei Vagin
5dbc24b206 util: introduce the mount_detached_fs helper
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-27 19:36:14 +03:00
Pavel Emelyanov
8bd634c1af
Merge pull request #940 from avagin/v3.14-part1
Preparing the v3.14 release.
2020-03-06 16:43:19 +03:00
Andrei Vagin
e19f4cf3b1 MAINTAINERS: Add Dima and Adrian to maintainers
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-03-03 11:48:47 -08:00
Mike Rapoport
42db2c1563 MAINTAINERS: add Mike
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-02-29 10:00:04 -08:00
Pavel Emelyanov
872b795a56
Maintainers: Suggest the maintainers codex (#932)
The guide is based on the one from the RunC project, but
has some criu-related specifics.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
2020-02-21 18:48:41 +03:00
Andrei Vagin
ff756cbb28 python: sort imports
202 Additional newline in a group of imports.
I100 Import statements are in the wrong order.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-16 09:15:31 -08:00
Andrei Vagin
d68a68b8f4 test/zdtm/inhfd: update dump options one each iteration
This allows to run inhfd tests with many iterations of C/R.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-16 09:15:15 -08:00
Adrian Reber
f5181b2767 Travis: fix podman test case
Podman changed the output of 'podman ps'. For the test only running
containers are interesting. Adding the filter '-f status=running' only
returns running containers as previously.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-16 09:15:15 -08:00
Radostin Stoyanov
3a4c33c502 zdtm: mntns_rw_ro_rw update error msg
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-16 09:15:15 -08:00
Dmitry Safonov
9cb4067e13 vdso: Don't page-align vvar
It's always page-aligned (as any VMA).

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-16 09:15:15 -08:00
Dmitry Safonov
a96a7ed87f vdso: Repair !CONFIG_VDSO
Apparently, C/R is broken when CONFIG_VDSO is not set.
Probably, I've broken it while adding arm vdso support.
Or maybe some commits after.

Repair it by adding checks into vdso_init_dump(), vdso_init_restore().
Also, don't try handling vDSO in restorer if it wasn't present in
parent. And prevent summing VDSO_BAD_SIZE to {vdso,vvar}_rt_size.

Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-16 09:15:15 -08:00
Dmitry Safonov
0022c28468 vdso: Add vdso_is_present() helper
Use it in kerndat to check if the kernel provides vDSO.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-16 09:15:15 -08:00
Dmitry Safonov
99346a2824 zdtm: Make test_{doc,author} weak variables
Allows to override them in every test, optionally.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-16 09:15:15 -08:00
Dmitry Safonov
72ff290708 criu: Make use strlcpy() to copy into allocated strings
strncpy() with n == strlen(src) won't put NULL-terminator in dst.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-16 09:15:14 -08:00
Nicolas Viennot
0f438ceeed typo: fix missing space in error message
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-02-16 09:14:37 -08:00
Andrei Vagin
9bc9366c94 vdso: use correct offsets to remap vdso and vvar mappings
In the current version, the offsets of remapping vvar and vdso regions
are mixed up.

If vdso is before vvar, vvar has to be mapped with the vdso_size offset.
if vvar is before vdso, vdso has to be mapped with the vvar_size offset.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-16 09:14:37 -08:00
Andrei Vagin
f1714ccce7 test/vdso: check the code path when here is no API to map vDSO
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-16 09:14:37 -08:00
Sergei Trofimovich
56258da176 criu: fix build failure against gcc-10
On gcc-10 (and gcc-9 -fno-common) build fails as:

```
ld: criu/arch/x86/crtools.o:criu/include/cr_options.h:159:
  multiple definition of `rpc_cfg_file'; criu/arch/x86/cpu.o:criu/include/cr_options.h:159: first defined here
make[2]: *** [scripts/nmk/scripts/build.mk:164: criu/arch/x86/crtools.built-in.o] Error 1
```

gcc-10 will change the default from -fcommon to fno-common:
https://gcc.gnu.org/PR85678.

The error also happens if CFLAGS=-fno-common passed explicitly.

Reported-by: Toralf Förster
Bug: https://bugs.gentoo.org/707942
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2020-02-04 12:39:44 -08:00
Kir Kolyshkin
23374b7798 criu(8): fix for asciidoctor
Commit 0493724c8e added support for using asciidoctor
(instead of asciidoc + xmlto) to generate man pages.

For some reason, asciidoctor does not deal well with some
complex formatting that we use for options such as --external,
leading to literal ’ and ' appearing in the man page instead
of italic formatting. For example:

> --inherit-fd fd[’N']:’resource'

(here both N and resource should be in italic).

Asciidoctor documentation (asciidoctor --help syntax) tells:

> == Text Formatting
>
> .Constrained (applied at word boundaries)
> *strong importance* (aka bold)
> _stress emphasis_ (aka italic)
> `monospaced` (aka typewriter text)
> "`double`" and '`single`' typographic quotes
> +passthrough text+ (substitutions disabled)
> `+literal text+` (monospaced with substitutions disabled)
>
> .Unconstrained (applied anywhere)
> **C**reate+**R**ead+**U**pdate+**D**elete
> fan__freakin__tastic
> ``mono``culture

so I had to carefully replace *bold* with **bold** and
'italic' with __italic__ to make it all work.

Tested with both terminal and postscript output, with both
asciidoctor and asciidoc+xmlto.

TODO: figure out how to fix examples (literal multi-line text),
since asciidoctor does not display it in monospaced font (this
is only true for postscript/pdf output so low priority).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-02-04 12:39:44 -08:00
Kir Kolyshkin
a15426a111 criu(8): some minor rewording
1. Add a/the articles where I see them missing

2. s/Forbid/disable/

3. s/crit/crit(1)/ as we're referring to a man page

4. Simplify some descriptions

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-02-04 12:39:44 -08:00
Kir Kolyshkin
8477875dc2 doc/Makefile: don't hide xmlto stderr
In case asciidoc is installed and xmlto is not, make returns an error
but there's no diagnostics shown, since "xmlto: command not found"
goes to /dev/null.

Remove the redirect.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-02-04 12:39:44 -08:00
Alexander Mikhalitsyn
c98af78c58 compel: add -ffreestanding to force gcc not to use builtin memcpy, memset
This patch fixes the problem with SSE (xmm) registers corruption on amd64
architecture. The problem was that gcc generates parasite blob that uses
xmm registers, but we don't preserve this registers in CRIU when injecting
parasite. Also, gcc, even with -nostdlib option uses builtin memcpy,
memset functions that optimized for amd64 and involves SSE registers.

It seems, that optimal solution is to use -ffreestanding gcc option
to compile parasite. This option implies -fno-builtin and also it designed
for OS kernels compilation/another code that suited to work on non-hosted
environments and could prevent future sumilar bugs.

To check that you amd64 CRIU build affected by this problem you could simply
objdump -dS criu/pie/parasite.o | grep xmm
Output should be empty.

Reported-by: Diyu Zhou <zhoudiyupku at gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
f6de8d4ea9 travis: fix warning and errors from validation
This fixes the validation errors from Travis:

 Build config validation
 root: deprecated key sudo (The key `sudo` has no effect anymore.)
 root: missing os, using the default linux
 root: key matrix is an alias for jobs, using jobs

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
0e291d26c9 arm: use clone3() if it exists
This is the last architecture specific change to make CRIU use clone3()
with set_tid if available. Just as on all other architectures this adds
a clone3() based assembler wrapper to be used in the restorer code.

Tested on Fedora 31 with the same 5.5.0-rc6 kernel as on the other
architectures.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
f991f23506 aarch64: use clone3() if possible
This adds the parasite clone3() with set_tid wrapper for aarch64.

Tested on Fedora 31 with 5.5.0-rc6.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
3dabd38a82 clone3: handle clone3() with CLONE_PARENT
clone3() explicitly blocks setting an exit_signal if CLONE_PARENT is
specified. With clone() it also did not work, but there was no error
message. The exit signal from the thread group leader is taken.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
f6469493dd ppc64le: use clone3() if possible
This adds the parasite clone3() with set_tid wrapper for ppc64le.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
55c8ec62a5 arm: remove stack pointer from clobber list
Just like on all other supported architectures gcc complains about the
stack pointer register being part of the clobber list. This removes the
stack pointer from the clobber list.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
cbadd201cb s390x: use clone3() if possible
This adds the parasite clone3() with set_tid wrapper for s390x.

In contrast to the x86_64 implementation the thread start address and
arguments are not put on the thread stack but passed via r4 and r5. As
those registers are caller-saved they still contain the correct value
(thread start address and arguments) after returning from the syscall.

Tested on 5.5.0-rc6.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
4c4f67a56b s390x: remove stack pointer from clobber list
Just like on all other supported architectures gcc complains about the
stack pointer register being part of the clobber list:

error: listing the stack pointer register ‘15’ in a clobber list is deprecated [-Werror=deprecated]

This removes the stack pointer from the clobber list.

'zdtm.py run -a' still runs without any errors after this change.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
a1ea8deb4c Use clone3() with set_tid to create processes
With the in Linux Kernel 5.4 introduced clone3() with set_tid it is no
longer necessary to write to to /proc/../ns_last_pid to influence the
next PID number. clone3() can directly select a PID for the newly
created process/thread.

After checking for the availability of clone3() with set_tid and adding
the assembler wrapper for clone3() in previous patches, this extends
criu/pie/restorer.c and criu/clone-noasan.c to use the newly added
assembler clone3() wrapper to create processes with a certain PID.

This is a RFC and WIP, but I wanted to share it and run it through CI
for feedback. As the CI will probably not use a 5.4 based kernel it
should just keep on working as before.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
97c03b97d0 Add assembler wrapper for clone3()
To create a new process/thread with a certain PID based on clone3() a
new assembler wrapper is necessary as there is not glibc wrapper (yet).

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
ca02c47075 kerndat: detect if system support clone3() with set_tid
Linux kernel 5.4 extends clone3() with set_tid to allow processes to
specify the PID of a newly created process. This introduces detection
of the clone3() syscall and if set_tid is supported.

This first implementation is X86_64 only.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
8fea2647b6 travis: reduce the number of podman tests
We are running each podman test loop 50 times. This takes more than 20
minutes in Travis. Reduce both test loops to only run 20 times.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Valeriy Vdovin
4232b270b8 image: core -- Reserve start_time field
To ensure consistency of runtime environment processes within a
container need to see same start time values over suspend/resume
cycles. We introduce new field to the core image structure to
store start time of a dumped process. Later same value would be
restored to a newly created task. In future the feature is likely
to be pulled here, so we reserve field id in protobuf descriptor.

Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
f1abc9aa26 ppc64le: remove register '1' from clobber list
Compiling 'criu-dev' on Fedora 31 gives two errors about wrong clobber
lists:

compel/include/uapi/compel/asm/sigframe.h:47:9: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]

criu/arch/ppc64/include/asm/restore.h:14:2: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]

There was also a bug report from Debian that CRIU does not build because
of this.

Each of these errors comes with the following note:

note: the value of the stack pointer after an ‘asm’ statement must be the same as it was before the statement

As far as I understand it this should not be a problem in this cases as
the code never returns anyway.

Running zdtm very seldom fails during 'zdtm/static/cgroup_ifpriomap'
with a double free or corruption. This happens not very often and I
cannot verify if it happens without this patch. As CRIU does not build
without the patch.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Nidhi Gupta
0c218746d5 Switch open-j9 alpine tests to python3
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:44 -08:00
Nidhi Gupta
1e9ff2aa03 Add Socket-based Java Functional Tests
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:44 -08:00
Adrian Reber
8b5dea33f6 travis: switch alpine to python3
Now that Python 2 has officially reached its end of life also switch the
Alpine based test to Python 3.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:44 -08:00
Nicolas Viennot
75a7442380 files: Add FD_CLOEXEC test 2020-02-04 12:39:44 -08:00
Nicolas Viennot
8255caf27b files: Remove O_CLOEXEC from file flags
The kernel artificially adds the O_CLOEXEC flag when reading from the
/proc/fdinfo/fd interface if FD_CLOEXEC is set on the file descriptor
used to access the file.

This commit removes the O_CLOEXEC flag in our file flags.

To restore the proper FD_CLOEXEC value in each of the file descriptors,
CRIU uses fcntl(F_GETFD) to retrieve the FD_CLOEXEC status, and restore
it later with fcntl(F_SETFD). This is necessary because multiple file
descriptors may point to the same open file.
2020-02-04 12:39:44 -08:00
Nicolas Viennot
2ac43cd426 python: Improve decoding of file flags
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-02-04 12:39:44 -08:00
Nicolas Viennot
7622b7a70e files: fix ghost file error path
Signed-off-by: Nicolas Viennot <nicolas.viennot@twosigma.com>
2020-02-04 12:39:44 -08:00
Alexander Mikhalitsyn
acb42456dc zdtm: nft tables preservation test
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
[Added test_author to zdtm test]
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:43 -08:00
Alexander Mikhalitsyn
e1c4871759 net: add nftables c/r
After Centos-8 nft used instead of iptables. But we had never supported nft rules in
CRIU, and after c/r all rules are flushed.

Co-developed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:43 -08:00
Nicolas Viennot
17c4a8b245 style: Enforce kernel style -Wstrict-prototypes
Include warnings that the kernel uses during compilation:
-Wstrict-prototypes: enforces full declaration of functions.
Previously, when declaring extern void func(), one can call func(123)
and have no compilation error. This is dangerous. The correct declaration
is extern void func(void).

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
[Generated a commit message from the pull request]
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:42 -08:00
Nicolas Viennot
8bb3c17a0f style: Enforce kernel style -Wdeclaration-after-statement
Include warnings that the kernel uses during compilation:
-Wdeclaration-after-statement: enforces having variables declared at the top of scopes

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
[Generated a commit message from the pull request]
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:27 -08:00
Adrian Reber
79559bef92 Fix tests on Ubuntu
It seems like Ubuntu introduced a overlayfs change which breaks CRIU:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257

This disables overlayfs (unfortunately) in most tests by switching to
devicemapper or vfs.

Upstream kernels do not seem to have this problem.

This also adds the 'docker-test' for xenial which still has a working
overlayfs from CRIU's point of view.

Also adjust Podman Ubuntu package location

Podman Ubuntu packages are now available via OBS and no longer via PPA.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:05 -08:00
Radostin Stoyanov
8b467dd944 zdtm: Add test for SO_KEEPALIVE
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:05 -08:00
Radostin Stoyanov
d4e6fc2a0d socket: c/r support for SO_KEEPALIVE
TCP keepalive packets can be used to determine if a connection
is still valid. When the SO_KEEPALIVE option is set, TCP packets
are periodically sent to keep the connection alive.

This patch implements checkpoint/restore support for SO_KEEPALIVE,
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:05 -08:00
Radostin Stoyanov
0980617e24 sockets: Remove duplicate variable assignment
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:05 -08:00
Radostin Stoyanov
00bb068785 scripts: alpine: Install py2 packages with pip
The py-future package has been renamed to py3-future [1] and py2 package
for yaml has been dropped [2].

[1] https://git.alpinelinux.org/aports/commit/main?id=316d44abaed13964e97eb43c095cd1b64e3943ad
[2] https://git.alpinelinux.org/aports/commit/main?id=e369c1fd7707a73f2c3e2b11b613198d9a4106de

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:05 -08:00
Nicolas Viennot
2e656222d7 crit: fix python3 encoding issues
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
2020-02-04 12:39:05 -08:00
Pavel Tikhomirov
4c46cbc4d8 x86/cpu: cleanup and improve xfeatures_mask check
Make xfeatures_mask check explicit. We were relying on our guess about
hardware "backward compatibility" and used ">" check here for a long
time. But it looks better to explicitly check that all xfeature bits
available on the source are also available on the destination.

For xsave_size we need to have smaller size on destination than on
source, because xsave operation on small allocated buffer may corrupt
the nearby data. So split up comments about xfeatures_mask and
xsave_size, as having single comment for quiet a different cases is less
understandable.

v2: improve comments, remove extra else-ifs, remove extra typecast

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:05 -08:00
Alexander Mikhalitsyn
55f7a571f2 zdtm: sysctl net.unix.max_dgram_qlen value preservation test
Test checks that if the /proc/sys/net/unix/max_dgram_qlen value has
been changed in process net namespace, then it is saved after c/r.

Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2020-02-04 12:39:05 -08:00
Cyrill Gorcunov
ebe3b52353 unix: sysctl -- Preserve max_dgram_qlen value
The /proc/sys/net/unix/max_dgram_qlen is a per-net variable and
we already noticed that systemd inside a container may change its value
(for example it sets it to 512 by now instead of kernel's default
value 10), thus we need keep it inside image and restore then.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
2020-02-04 12:39:05 -08:00
Cyrill Gorcunov
1d23dc4a30 mount: Order call_helper_process calls
When we do clone threads in a later stage of restore procedure
it may race with helpers which do call clone_noasan by self.

Thus we need to walk over each clone_noasan call and figure
out if calling it without last_pid lock is safe.

 - open_mountpoint: called by fusectl_dump, dump_empty_fs,
   binfmt_misc_dump, tmpfs_dump -- they all are processing
   dump stage, thus safe

 - call_helper_process: try_remount_writable -- called from
   various places in reg-files.c, in particular open_reg_by_id
   called in parallel with other threads, needs a lock
   remount_readonly_mounts -- called from sigreturn_restore,
   so in parallel, needs a lock

 - call_in_child_process: prepare_net_namespaces -- called
   from prepare_namespace which runs before we start forking,
   no need for lock

Thus call_helper_process should use lock_last_pid and
unlock_last_pid helpers and wait for subprocess to finish.

Same time put a warning text into clone_noasan comment
so next time we need to use it we would recall the pitfalls.

v2:
 - fix unitialized ret variable
v3:
 - use exit_code instead of ret

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
2020-02-04 12:39:05 -08:00
Pavel Tikhomirov
2237666ac1 restorer/inotify: reorder inotify cleanup after waiting helpers and zombies
We've seen ppoll interrupted with signal in VZ7 CT migration tests, that
is because in the beggining of CR_STATE_RESTORE_SIGCHLD zombies and
helpers die, and that can trigger SIGCHILDs sent to their parents.

Adding additional debug (printing "Task..." for zombies and helpers) in
sigchld_handler I see:

  (15.644339) pie: 1: Task 10718 exited, status= 0
  (15.644349) pie: 1: Cleaning inotify events from 29
  (15.644359) pie: 1: Cleaning inotify events from 19
  (15.644367) pie: 1: Cleaning inotify events from 10

And previousely we had:

  (05.718449) pie: 104: Cleaning inotify events from 5
  (05.718835) pie: 330: Cleaning inotify events from 3
  (05.719046) pie: 1: Cleaning inotify events from 23
  (05.719164) pie: 80: Cleaning inotify events from 7
  (05.719185) pie: 1: Error (criu/pie/restorer.c:1287): Failed to poll from inotify fd: -4
  (05.719202) pie: 95: Cleaning inotify events from 6
  (05.719269) pie: 1: Error (criu/pie/restorer.c:1890): Restorer fail 1

So reordering cleanup and wait should fix it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
af7e5f994b readme: github pull-requests is the preferred way to contribute
We will continue accepting patches.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
be43c3b840 cgroup: use new mount API to open the cgroup file system
It doesn't require to create a temporary directory and mount the proc
file system in it.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
76e4d31a3f net: use new mount API to open the sysfs file system
It doesn't require to create a temporary directory and mount the proc
file system in it.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
1a2d8ad7e1 mount: use new mount API to open the proc file system
It doesn't require to create a temporary directory and mount the proc
file system in it.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
4997a096e4 util: introduce the mount_detached_fs helper
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
b5b1c4ec45 kerndat: check whether the new mount API is supported of not
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Andrei Vagin
3ca09b1914 travis: ignore fails of podman-test
until it will not be fixed.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:05 -08:00
Nidhi Gupta
37220b3c41 Add File-based Java Functional Tests
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:05 -08:00
Dmitry Safonov
6ab2bdd940 zdtm/socket-tcp-fin-wait1: Use array index fro TEST_MSG
Fixes the following compile-error:
>  CC        socket-tcp-fin-wait1.o
> socket-tcp-fin-wait1.c:144:26: error: adding 'int' to a string does not append to the string [-Werror,-Wstring-plus-int]
>                 if (write(fd, TEST_MSG + 2, sizeof(TEST_MSG) - 2) != sizeof(TEST_MSG) - 2) {
>                               ~~~~~~~~~^~~
> socket-tcp-fin-wait1.c:144:26: note: use array indexing to silence this warning
>                 if (write(fd, TEST_MSG + 2, sizeof(TEST_MSG) - 2) != sizeof(TEST_MSG) - 2) {
>                                        ^
>                               &        [  ]
> 1 error generated.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
434e6b92db Documentation: Add a hint about docker build
The original/old guide probably doesn't work anymore:
- the patch isn't accessible;
- criu now depends on more libraries not only protobuf

Still, keep it as it might be helpful for someone.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
1dbc835954 travis: Add armv7-cross as cross-compile test
Fixes: #455
Based-on-patch-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
bffa6e0ad0 build/zdtm: Use pkg-config to find includes/libs
Helps to cross-compile zdtm tests in case somebody needs it.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
3b24574b6d build/zdtm: Makefile hack for travis aarch64/armv8l
The very same hack to build aarch32 zdtm tests on armv8 Travis-CI
as in the commit dfa0a1edcb ("Makefile hack for travis
aarch64/armv8l")

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
70fae12509 build/zdtm: Support cross-build
Maybe not that useful, but only little change needed.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
1463c41119 build: Use SUBARCH
Instead of doing additional `uname -m` - use provided $(SUBARCH) to detect
what architecture flavour the build should produce the result for.

Fixes two things:
- zdtm make now correctly supplies $(USERCFLAGS)
- subtly fixes cross compilation by providing a way to specify $(SUBARCH)

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
df66aa99b6 build/nmk: Provide proper SUBARCH
It's always equal ARCH and not very useful (so nothing actually uses it).
Time for a change: SUBARCH now is meaningful and gives a way to detect
what kind of ARCH flavor build is dealing with.

Also, for cross-compiling sake don't set SUBARCH if the user supplied it.
(and don't call useless uname during cross compilation)

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
a4fa4162d4 build/nmk: Remove SRCARCH
It's not used anywhere now.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
25f6d4f72f build: Remove SRCARCH
SRCARCH is always equal ARCH. There are no rules when to use one or
another and architectures may forget to set one of them up.

No need for a second variable meaning the same and confusing people.
Remove it completely.

Self-correction [after some debug]: SRCARCH was different in one place:
zdtm Makefile by some unintentional mistake:
> ifeq ($(ARCH),arm64)
>         ARCH		?= aarch64
>         SRCARCH	?= aarch64
> endif

That meant to be "ARCH := aarch64" because "?=" would never work inside
that ifeq. Fix up this part of mess too.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
60bb5c7310 zdtm: Set --root path to 0700 on restore
Update zdtm tests to verify that CRIU does not require the --root
path to be accessible to the unprivileged user being restored when
restoring user namespace.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
90cbeadb66 zdtm: Replace if->continue with if->elif->else
Replacing the if->continue pattern with if->elif->else
reduces the number of lines while preserving the logic.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
9a50fbce72 man: Describe --root option requirements
These requirements have been described in

b133feae/libcontainer/container_linux.go (L1265)

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
8ab3e40e3e restore: Create temp proc in /tmp
When restoring a container with user namespace, CRIU fails to create
a temporary directory for proc. The is because the unprivileged user
that has been just restored does not have permissions to access the
working directory used by CRIU.

Resolves #828

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
d99ee9753e mount: Bind-mount root via userns_call
When restoring a runc container with enabled user namespace CRIU fails
to mount the specified root directory because the path is under
/run/runc which is inaccessible to unprivileged users.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
b50b6ea09e mount: Add error messages
Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Mike Rapoport
75fcec0ecb travis: exclude uns tests for lazy-pages on newer kernels
Kernels 5.4 and higher will restrict availability of UFFD_EVENT_FORK only
for users with SYS_CAP_PTRACE. This prevents running --lazy-pages tests
with 'uns' flavor.

Disable 'uns' for lazy pages testing in travis for newer kernels.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-02-04 12:39:04 -08:00
Mike Rapoport
8f45330d16 travis: group lazy-pages options
The amount of lazy-pages options keeps growing, let's put the common ones
into a variable.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2020-02-04 12:39:04 -08:00
Michał Cłapiński
dc4677123b Checkpoint only specified controllers
Before this change CRIU would checkpoint all controllers, even the ones
not specified in --cgroup-dump-controller. That becomes a problem if
there's a cgroup controller on the checkpointing machine that doesn't
exist on the restoring machine even if CRIU is instructed not to dump
that controller. After that change everything works as expected.

Signed-off-by: Michał Cłapiński <mclapinski@google.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
1c0716924b compel/criu: Add __must_check
All those compel functions can fail by various reasons.
It may be status of the system, interruption by user or anything else.
It's really desired to handle as many PIE related errors as possible
otherwise it's hard to analyze statuses of parasite/restorer
and the C/R process.

At least warning for logs should be produced or even C/R stopped.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
56bc4189e4 criu: Kill tasks even when the network is unlocked
Currently if anything fails after network has been unlocked tasks aren't
killed. Which doesn't work anyway: any stage sets `ret` and nothing
later gets called. Which means the tasks aren't resumed properly.
Furthermore, functions like catch_tasks() and compel_stop_on_syscall()
return failure on the first error.

Let's do the cleanup even when the network is unlocked.
If we want to keep the mess and ignore failures - a cli option should be
introduced for that (and existing code should be reworked with decisions
what is critical and what can be ignored).

Move "Restore finished successfully" message accordingly where
everything is evidently good.

While at here, any late failure will result not only in cleanup but in
criu returning error code.

Which in result makes tests to fail in such case:
> ======================= Run zdtm/static/inotify04 in ns ========================
> Start test
> ./inotify04 --pidfile=inotify04.pid --outfile=inotify04.out --dirname=inotify04.test
> Run criu dump
> =[log]=> dump/zdtm/static/inotify04/84/1/dump.log
> ------------------------ grep Error ------------------------
> (00.119763) fsnotify: 			openable (inode match) as zdtm/static/inotify04.test/inotify-testfile
> (00.119766) fsnotify: 	Dumping /zdtm/static/inotify04.test/inotify-testfile as path for handle
> (00.119769) fsnotify: id 0x00000b flags 0x000800
> (00.119787) 88 fdinfo 5: pos:                0 flags:             4000/0
> (00.119796) Warn  (criu/fsnotify.c:336): fsnotify: The 0x00000c inotify events will be dropped
> ------------------------ ERROR OVER ------------------------
> Run criu restore
> =[log]=> dump/zdtm/static/inotify04/84/1/restore.log
> ------------------------ grep Error ------------------------
> (00.391582) 123 was stopped
> (00.391667) 106 was trapped
> (00.391674) 106 (native) is going to execute the syscall 11, required is 11
> (00.391697) 106 was stopped
> (00.391720) Error (compel/src/lib/infect.c:1439): Task 123 is in unexpected state: b7f
> (00.391736) Error (compel/src/lib/infect.c:1447): Task stopped with 11: Segmentation fault
> ------------------------ ERROR OVER ------------------------
> 5: Old maps lost: set([])
> 5: New maps appeared: set([u'10000-1a000 rwxp', u'1a000-24000 rw-p'])
> ############### Test zdtm/static/inotify04 FAIL at maps compare ################
> Send the 9 signal to  106
> Wait for zdtm/static/inotify04(106) to die for 0.100000
> ======================= Test zdtm/static/inotify04 PASS ========================

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
c21c0aea1b compel/infect: Detach but fail compel_resume_task()
Unknown state means that the task in the end may be not in wanted state.
Return err code.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
bd17ee8588 parasite-syscall: Log if can't cure on failed infection
Maybe expected, hopefully never happens - let's warn in any case.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
abe48f8c36 cr-restore: Warn if restorer can't be unmapped
Too late to stop restore: it's already printed that restore was
successful. Oh, well warn aloud about infection.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
1038a0ae44 cr-dump: Warn if unmapping local memfd failed
Probably, not the worst that could happen, but still unexpected.
Preparing the ground to make compel_cure*() functions __must_check.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
b5a83623b0 cr-dump: Try to cure remote on err-pathes
On daemon stop or threads dump failures it's still desired to remove
parasite from the remote (if possible). Try best and keep hopeing.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
7173856578 lib/infect: Check if compel succeed in executing munmap
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
ee449e27c6 compel: Mark compat argument of __NR() as used
And remove __maybe_unused work-around.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
c8f16bfacb compel/infect: Warn if close() failed on memfd
As a preparation for __must_check on compel_syscall(), check it on
close() too - maybe not as useful as with other syscalls, but why not.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
a93117ede1 lib/ptrace: Be more elaborate about failures
Also, don't use the magic -2 => return errno on failure.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
ef277068de lib/ptrace: Allow PTRACE_PEEKDATA with errno != 0
>From man ptrace:
> On error, all requests return -1, and errno is set appropriately.
> Since the value returned by a successful PTRACE_PEEK* request may be
> -1, the caller must clear errno before the call, and then check
> it afterward to determine whether or not an error occurred.

FWIW: if ptrace_peek_area() is called with (errno != 0) it may
false-fail if the data is (-1).

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
ea018e9a9c travis: remove group from .travis.yml
Tests are successful even after removing 'group:' from .travis.yml.
Apparently it is not necessary.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
fe668075ad travis: switch pcp64le and s390x to real hardware
Now that Travis also supports ppc64le and s390x we can remove all qemu
based docker emulation from our test setup. This now runs ppc64le and
s390x tests on real hardware (LXD containers).

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
eab8cf0775 travis: switch all arm related tests to real hardware
This switches all arm related tests (32bit and 64bit) to the aarch64
systems Travis provides. For arm32 we are running in a armv7hf container
on aarch64 with 'setarch linux32'.

The main changes are that docker on Travis aarch64 cannot use
'--privileged' as Travis is using unprivileged LXD containers to setup
the testing environment.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
075f1beaf7 Makefile hack for travis aarch64/armv8l
For CRIU's compile only tests for armv7hf on Travis we are using
'setarch linux32' which returns armv8l on Travis aarch64.

This adds a path in the Makefile to treat armv8l just as armv7hf during
compile. This enables us to run armv7hf compile tests on Travis aarch64
hardware. Much faster. Maybe not entirely correct, but probably good
enough for compile testing in an armv7hf container.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
6be414bb2b travis: Do not run privileged containers in LXD
Travis uses unprivileged containers for aarch64 in LXD. Docker with
'--privileged' fails in such situation. This changes the travis setup
to only start docker with '--privileged' if running on x86_64.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
62953d4334 travis: fix copy paste error from previous commit
In my previous commit I copied a line with a return into the main script
body. bash can only return from functions. This changes return to exit.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Nidhi Gupta
2b4e653361 Run java functional tests on travis
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
f3cca97d80 mount: make mnt_resort_siblings nonrecursive and reuse friendly
Add mnt_subtree_next DFS-next search to remove recursion.

v5: add these patch, remove recursion from sorting helpers
v6: rip out butifull yet unused step-part of nfs-next algorithm

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
35adc08598 mount: rework mount tree build step on restore
Build each mntns mount tree alone just after reading mounts for it from
image. These additional step before merging everything to a single mount
tree allows us to have pointers to each mntns root mount at hand, also
it allows us to remove extra complication from mnt_build_tree.

Teach collect_mnt_from_image return a tail pointer, so we can merge
lists together later after building each tree.

Add separate merge_mount_trees helper to create joint mount tree for all
mntns'es and simplify mnt_build_ids_tree.

I don't see any place where we use mntinfo_tree on restore, so save the
real root of mntns mounts tree in it, instead of root_yard_mp, will need
it in next patches for checking restore of these trees.

v2: prepend children to the root_yard in merge_mount_trees so that the
order in merged tree persists

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
7be7260261 ns/restore/image: do not read namespace images for non-namespaced case
Images for mount and net namespaces are empty if ns does not belong to
us, thus we don't need to collect on restore.

By adding these checks we will eliminate suspicious messages in logs
about lack of images:

./test/zdtm.py run -k always -f h -t zdtm/static/env00

env00/54/2/restore.log:(00.000332) No mountpoints-5.img image
env00/54/2/restore.log:(00.000342) No netns-2.img image

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
71dff54aa4 ns: make rst_new_ns_id static
It's never used outside of namespaces.c

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
d804f70a68 mount: remove useless check in populate_mnt_ns
The path:

restore_root_task
 prepare_namespace_before_tasks
  mntns_maybe_create_roots

is always called before the path below:

retore_root_task
 fork_with_pid
  restore_task_with_children
   prepare_namespace
    prepare_mnt_ns
     populate_mnt_ns

So (!!mnt_roots) == (root_ns_mask & CLONE_NEWNS) in populate_mnt_ns, but
in prepare_mnt_ns we've already checked that it is true, so there is no
need in these check - remove it.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
9325339e64 travis: Disallow failures on ia32
It seems pretty stable and hasn't add many false-positives during last
months. While can reveal some issues for compatible C/R code.

Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:04 -08:00
Nidhi Gupta
389bcfef3e test/java: Add FileRead Tests
Signed-off-by: Nidhi Gupta <itsnidhi16@gmail.com>
2020-02-04 12:39:04 -08:00
Vitaly Ostrosablin
c4006c0034 test/static:conntracks: Support nftables
Update test to support both iptables and nft to create conntrack rules.

Signed-off-by: Vitaly Ostrosablin <vostrosablin@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Adrian Reber
a7c625938e travis: start to use aarch64 hardware
With the newly introduced aarch64 at Travis it is possible for the CRIU
test-cases to switch to aarch64.

Travis uses unprivileged LXD containers on aarch64 which blocks many of
the kernel interfaces CRIU needs. So for now this only tests building
CRIU natively on aarch64 instead of using the Docker+QEMU combination.

All tests based on Docker are not working on aarch64 is there currently
seems to be a problem with Docker on aarch64. Maybe because of the
nesting of Docker in LXD.

Signed-off-by: Adrian Reber <areber@redhat.com>
2020-02-04 12:39:04 -08:00
Sergey Bronnikov
3861b334b2 Fix broken web-links 2020-02-04 12:39:04 -08:00
Nicolas Viennot
1a28dee52b Action scripts should be invoked with normal signal behavior
Signal masks propagate through execve, so we need to clear them before
invoking the action scripts as it may want to handle SIGCHLD, or SIGSEGV.

Signed-off-by: Nicolas Viennot <nicolas.viennot@twosigma.com>
2020-02-04 12:39:04 -08:00
Dmitry Safonov
19a24df53c early-log: Print warnings only if the buffer is full
I don't see many issues with early-log, so we probably don't
need the warning when it was used. Note that after
commit 74731d9 ("zdtm: make grep_errors also grep warnings")
also warnings are grepped by zdtm.py (and I believe that was
an improvement) which prints some bothering lines:

> =[log]=> dump/zdtm/static/inotify00/38/1/dump.log
> ------------------------ grep Error ------------------------
> (00.000000) Will allow link remaps on FS
> (00.000034) Warn  (criu/log.c:203): The early log isn't empty
> ------------------------ ERROR OVER ------------------------

Instead of decreasing loglevel of the message, improve it by
reporting a real issue.

Cc: Adrian Reber <adrian@lisas.de>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
00ce121fd5 Add criu to PATH env variable in libcriu tests
PATH is pointing to incorrect location for `criu` executable
causing libcriu tests to fail when running in travis.
Also added statements to display log file contents on failure
to help in debugging.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
321f826621 Enable libcriu testing in travis jobs
Updated scripts/travis/travis-tests to run libcriu test.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Ashutosh Mehra
f8125b8bef Couple of fixes to build and run libcriu tests
libcriu tests are currently broken. This patch fixes couple of
issues to allow the building and running libcriu tests.

1. lib/c/criu.h got updated to include version.h which is present
at "criu/include", but the command to compile libcriu tests is not
specifying "criu/include" in the path to be searched for header
files. This resulted in compilation error.
This can be fixed by adding "-I ../../../../../criu/criu/include"
however it causes more problems as "criu/include/fcntl.h" would now
hide system defined fcntl.h
Solution is to use "-iquote ../../../../../criu/criu/include"
which applies only to the quote form of include directive.

2. Secondly, libcriu.so major version got updated to 2 but
libcriu/run.sh still assumes verion 1. Instead of just updating the
version in libcriu/run.sh to 2, this patch updates the libcriu/Makefile
to use "CRIU_SO_VERSION_MAJOR" so that future changes to major version
of libcriu won't cause same problem again.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
477c3a4b0b service: Use space on stack for msg buffer
RPC messages are have fairly small size and using space on the stack
might be a better option. This change follows the pattern used with
do_pb_read_one() and pb_write_one().

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
e56401ed3c image-desc: Remove CR_FD_FILE_LOCKS_PID
The support for per-pid images with locks has been dropped with
commit d040219 ("locks: Drop support for per-pid images with locks")
and CR_FD_FILE_LOCKS_PID is not used.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Pavel Tikhomirov
f65b17e976 cgroup: fix cg_yard leak on error path in prepare_cgroup_sfd
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:39:04 -08:00
Radostin Stoyanov
5a92f100b8 page-pipe: Resize up to PIPE_MAX_SIZE
When performing pre-dump we continuously increase the page-pipe size to
fit the max amount memory pages in the pipe's buffer. However, we never
actually set the pipe's buffer size to max. By doing so, we can reduce
the number of pipe-s necessary for pre-dump and improve the performance
as shown in the example below.

For example, let's consider the following process:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	void main(void)
	{
		int i = 0;
		void *cache = calloc(1, 1024 * 1024 * 1024);
		while(1) {
			printf("%d\n", i++);
			sleep(1);
		}
	}

stats-dump before this change:

	frozen_time: 123538
	memdump_time: 95344
	memwrite_time: 11980078
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 513
	page_pipe_bufs: 519

stats-dump after this change:

	frozen_time: 83287
	memdump_time: 54587
	memwrite_time: 12547466
	pages_scanned: 262721
	pages_written: 262169
	page_pipes: 257
	page_pipe_bufs: 263

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:39:04 -08:00
Nicolas Viennot
71c2a9dc73 Guard against empty file lock status
The lock status string may be empty. This can happen when the owner of
the lock is invisible from our PID namespace. This unfortunate behavior
is fixed in kernels v4.19 and up (see commit 1cf8e5de40)

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
3efe44382f image: avoid name conflicts in image files
Conflict register for file "sk-opts.proto": READ is already defined in
file "rpc.proto". Please fix the conflict by adding package name on the
proto file, or use different name for the duplication.  Note: enum
values appear as siblings of the enum type instead of children of it.

https://github.com/checkpoint-restore/criu/issues/815
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
6b264f591f criu: use atomic_add instead of atomic_sub
atomic_sub isn't defined for all platforms.

Reported-by: Mr Jenkins
Cc: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
7c97cc7eb2 lib/c: fix a compile time error
lib/c/criu.c:343:30: error: implicit conversion from enumeration type
'enum criu_pre_dump_mode' to different enumeration type 'CriuPreDumpMode'
(aka 'enum _CriuPreDumpMode') [-Werror,-Wenum-conversion

                opts->rpc->pre_dump_mode = mode;
                                         ~ ^~~~

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Andrei Vagin
d305576996 zdtm: handle --pre-dump-mode in the rpc mode
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
befbbd9bba Refactor time accounting macros
refactoring time macros as per read mode
pre-dump design.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
98608b90de read mode pre-dump implementation
Pre-dump using the process_vm_readv syscall.
During frozen state, only iovecs will be
generated and draining of memory happens
after the task is unfrozen. Pre-dumping of
shared memory remains unmodified.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
4c774afc18 Adding cnt_sub for stats manipulation
adding cnt_sub function (complement of cnt_add).
cnt_sub is utilized to decrement stats counter
according to skipped page count during "read"
mode pre-dump.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
29b63e9a72 Skip adding PROT_READ to non-PROT_READ mappings
"read" mode pre-dump may fail even after
adding PROT_READ flag. Adding PROT_READ
works when dumping statically. See added
comment for details.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
e0ea21ad5e Handling iov generation for non-PROT_READ regions
Skip iov-generation for regions not having
PROT_READ, since process_vm_readv syscall
can't process them during "read" pre-dump.
Handle random order of "read" & "splice"
pre-dumps.

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:04 -08:00
Abhishek Dubey
20d4920a8b Adding --pre-dump-mode option
Two modes of pre-dump algorithm:
    1) splicing memory by parasite
        --pre-dump-mode=splice (default)
    2) using process_vm_readv syscall
        --pre-dump-mode=read

Signed-off-by: Abhishek Dubey <dubeyabhishek777@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:39:02 -08:00
Dmitry Safonov
576a99f492 restorer/inotify: Don't overflow PIE stack
PATH_MAX == 4096; PATH_MAX*8 == 32k; RESTORE_STACK_SIZE == 32k.

Fixes: a3cdf94869 ("inotify: cleanup auxiliary events from queue")
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Andrei Vagin <avagin@gmail.com>
Co-debugged-with: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Nicolas Viennot
578597299a Cleanup do_full_int80()
1) Instead of tampering with the nr argument, do_full_int80() returns
the value of the system call. It also avoids copying all registers back
into the syscall_args32 argument after the syscall.

2) Additionally, the registers r12-r15 were added in the list of
clobbers as kernels older than v4.4 do not preserve these.

3) Further, GCC uses a 128-byte red-zone as defined in the x86_64 ABI
optimizing away the correct position of the %rsp register in
leaf-functions. We now avoid tampering with the red-zone, fixing a
SIGSEGV when running mmap_bug_test() in debug mode (DEBUG=1).

Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
b84f481b55 unix: print inode numbers as unsigned int
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
3f1c4a17ad pipe: print pipe_id as unsigned to generate an external pipe name
Reported-by: Mr Jenkins
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Pavel Tikhomirov
b47ef26eac cgroup: fixup nits
1) s/\s*$//
2) fix snprintf out of bound access

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
f44939317f zdtm/cgroup_yard: create a test cgroup yard from the post-start hook
Right now, it is created from the pre-dump hook, but
if the --snap option is set, the test fails:
$ python test/zdtm.py run -t zdtm/static/cgroup_yard -f h --snap --iter 3
...
Running zdtm/static/cgroup_yard.hook(--pre-dump)
Traceback (most recent call last):
  File zdtm/static/cgroup_yard.hook, line 14, in <module>
    os.mkdir(yard)
OSError: [Errno 17] File exists: 'external_yard'

Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Andrei Vagin
db40ef5be6 test/cgroup_yard: always clean up a test cgroup yard
Right now it is cleaned up from a post-restore hook,
but zdtm.py can be executed with the norst option:
$ zdtm.py run -t zdtm/static/cgroup_yard --norst
...
OSError: [Errno 17] File exists: 'external_yard'

Cc: Michał Cłapiński <mclapinski@google.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:38:24 -08:00
Radostin Stoyanov
813bfbeb4f Convert pr_msg() error messages to pr_err()
Print error messages to stderr (instead of stdout).

Suggested-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:38:23 -08:00
Radostin Stoyanov
a9f974b495 Introduce flush_early_log_to_stderr destructor
Prior log initialisation CRIU preserves all (early) log messages in a
buffer. In case of error the content of the content of this buffer
needs to be printed out (flushed).

Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Andrei Vagin
8bdc60d50e arch/x86: fpu_state->fpu_state_ia32.xsave hast to be 64-byte aligned
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave has to be
64-byte aligned. But staring with the 5.2 kernel, the same is required
for pu_state->fpu_state_ia32.xsave.

The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:37:37 -08:00
Radostin Stoyanov
4f24786b36 travis: Install missing diffutils dependency
The following tests fail in Fedora rawhide because /usr/bin/diff
is missing.

 * zdtm/static/bridge(ns)
 * zdtm/static/cr_veth(uns)
 * zdtm/static/macvlan(ns)
 * zdtm/static/netns(uns)
 * zdtm/static/netns-nf(ns)
 * zdtm/static/sit(ns)

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Michał Cłapiński
cf0080505a test: implement test for new --cgroup-yard option
Signed-off-by: Michał Cłapiński <mclapinski@google.com>
2020-02-04 12:37:37 -08:00
Michał Cłapiński
2f337652ad Add new command line option: --cgroup-yard
Instead of creating cgroup yard in CRIU, now we can create it externally
and pass it to CRIU. Useful if somebody doesn't want to grant
CAP_SYS_ADMIN to CRIU.

Signed-off-by: Michał Cłapiński <mclapinski@google.com>
2020-02-04 12:37:37 -08:00
Radostin Stoyanov
ad7e82a30f scripts: Drop Fedora 28/rawhide fix
This change was introduced with c75cb2b and it is no longer necessary.

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Dmitry Safonov
3e9dc1c7f5 compel/x86: Don't use pushq for a label
`pushq` sign-extends the value. Which is a bummer as the label's address
may be higher that 2Gb, which means that the sign-bit will be set.

As it long-jumps with ia32 selector, %r11 can be scratched.
Use %r11 register as a temporary to push the 32-bit address.

Complements: a9a760278c ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:37:37 -08:00
Andrei Vagin
0d8e2477e9 arch/x86: push correct eip on the stack before lretq
Right now we use pushq, but it pushes sign-extended value, so if the
parasite code is placed higher that 2Gb, we will see something like
this:

   0xf7efd5b0:	pushq  $0x23
   0xf7efd5b2:	pushq  $0xfffffffff7efd5b9
=> 0xf7efd5b7:	lretq

Actually we want to push 0xf7efd5b9 instead of 0xfffffffff7efd5b9.

Fixes: #398

Cc: Dmitry Safonov <dima@arista.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-04 12:37:37 -08:00
Radostin Stoyanov
8ea953f18b cr-dump: Remove redundant if-statement
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Radostin Stoyanov
3eed47223b files-reg: Drop clear_ghost_files() prototype
The function clear_ghost_files() has been removed in commit
b11eeea "restore: auto-unlink for ghost files (v2)".

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2020-02-04 12:37:37 -08:00
Radostin Stoyanov
08f3b57ab3 py: Manual fixlets of code formatting
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
2020-02-04 12:37:37 -08:00
1664 changed files with 86903 additions and 23732 deletions

27
.circleci/config.yml Normal file
View file

@ -0,0 +1,27 @@
version: 2.1
jobs:
test-local-gcc:
machine:
image: default
working_directory: ~/criu
steps:
- checkout
- run:
name: "Test local with GCC"
command: sudo -E make -C scripts/ci local
test-local-clang:
machine:
image: default
working_directory: ~/criu
steps:
- checkout
- run:
name: "Test local with CLANG"
command: sudo -E make -C scripts/ci local CLANG=1
workflows:
version: 2
builds:
jobs:
- test-local-gcc
- test-local-clang

101
.cirrus.yml Normal file
View file

@ -0,0 +1,101 @@
task:
name: Vagrant Fedora based test (no VDSO)
environment:
HOME: "/root"
CIRRUS_WORKING_DIR: "/tmp/criu"
compute_engine_instance:
image_project: cirrus-images
image: family/docker-kvm
platform: linux
cpu: 4
memory: 16G
nested_virtualization: true
setup_script: |
contrib/apt-install make gcc pkg-config git perl-modules iproute2 kmod wget cpu-checker
sudo kvm-ok
build_script: |
make -C scripts/ci vagrant-fedora-no-vdso
task:
name: CentOS Stream 9 based test
environment:
HOME: "/root"
CIRRUS_WORKING_DIR: "/tmp/criu"
compute_engine_instance:
image_project: centos-cloud
image: family/centos-stream-9
platform: linux
cpu: 4
memory: 8G
setup_script: |
dnf config-manager --set-enabled crb # Same as CentOS 8 powertools
dnf -y install epel-release epel-next-release
contrib/dependencies/dnf-packages.sh
# The image has a too old version of nettle which does not work with gnutls.
# Just upgrade to the latest to make the error go away.
dnf -y upgrade nettle nettle-devel
systemctl stop sssd
# Even with selinux in permissive mode the selinux tests will be executed.
# The Cirrus CI user runs as a service from selinux point of view and is
# much more restricted than a normal shell (system_u:system_r:unconfined_service_t:s0).
# The test case above (vagrant-fedora-no-vdso) should run selinux tests in enforcing mode.
setenforce 0
build_script: |
make -C scripts/ci local SKIP_CI_PREP=1 CC=gcc CD_TO_TOP=1 ZDTM_OPTS="-x zdtm/static/socket-raw"
task:
name: Vagrant Fedora Rawhide based test
environment:
HOME: "/root"
CIRRUS_WORKING_DIR: "/tmp/criu"
compute_engine_instance:
image_project: cirrus-images
image: family/docker-kvm
platform: linux
cpu: 4
memory: 16G
nested_virtualization: true
setup_script: |
contrib/apt-install make gcc pkg-config git perl-modules iproute2 kmod wget cpu-checker
sudo kvm-ok
build_script: |
make -C scripts/ci vagrant-fedora-rawhide
task:
name: Vagrant Fedora based test (non-root)
environment:
HOME: "/root"
CIRRUS_WORKING_DIR: "/tmp/criu"
compute_engine_instance:
image_project: cirrus-images
image: family/docker-kvm
platform: linux
cpu: 4
memory: 16G
nested_virtualization: true
setup_script: |
contrib/apt-install make gcc pkg-config git perl-modules iproute2 kmod wget cpu-checker
sudo kvm-ok
build_script: |
make -C scripts/ci vagrant-fedora-non-root
task:
name: aarch64 Fedora Rawhide
arm_container:
image: registry.fedoraproject.org/fedora:rawhide
cpu: 4
memory: 4G
script: uname -a
build_script: |
scripts/ci/prepare-for-fedora-rawhide.sh
make -C scripts/ci/ local CC=gcc SKIP_CI_PREP=1 SKIP_CI_TEST=1 CD_TO_TOP=1
make -C test/zdtm -j 4

565
.clang-format Normal file
View file

@ -0,0 +1,565 @@
# SPDX-License-Identifier: GPL-2.0
#
# clang-format configuration file. Intended for clang-format >= 11.
#
# For more information, see:
#
# Documentation/process/clang-format.rst
# https://clang.llvm.org/docs/ClangFormat.html
# https://clang.llvm.org/docs/ClangFormatStyleOptions.html
#
---
AccessModifierOffset: -4
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlines: Left # Unknown to clang-format-4.0
AlignOperands: true
AlignTrailingComments: true
AlignConsecutiveMacros: true
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: false
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
AfterExternBlock: false # Unknown to clang-format-5.0
BeforeCatch: false
BeforeElse: false
IndentBraces: false
SplitEmptyFunction: true # Unknown to clang-format-4.0
SplitEmptyRecord: true # Unknown to clang-format-4.0
SplitEmptyNamespace: true # Unknown to clang-format-4.0
BreakBeforeBinaryOperators: None
BreakBeforeBraces: Custom
BreakBeforeInheritanceComma: false # Unknown to clang-format-4.0
BreakBeforeTernaryOperators: false
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeComma # Unknown to clang-format-4.0
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 0
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false # Unknown to clang-format-4.0
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 8
ContinuationIndentWidth: 8
Cpp11BracedListStyle: false
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: false # Unknown to clang-format-4.0
# Taken from:
# git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ \
# | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \
# | sort | uniq
ForEachMacros:
- 'for_each_pstree_item'
- 'for_each_bit'
- 'apei_estatus_for_each_section'
- 'ata_for_each_dev'
- 'ata_for_each_link'
- '__ata_qc_for_each'
- 'ata_qc_for_each'
- 'ata_qc_for_each_raw'
- 'ata_qc_for_each_with_internal'
- 'ax25_for_each'
- 'ax25_uid_for_each'
- '__bio_for_each_bvec'
- 'bio_for_each_bvec'
- 'bio_for_each_bvec_all'
- 'bio_for_each_integrity_vec'
- '__bio_for_each_segment'
- 'bio_for_each_segment'
- 'bio_for_each_segment_all'
- 'bio_list_for_each'
- 'bip_for_each_vec'
- 'bitmap_for_each_clear_region'
- 'bitmap_for_each_set_region'
- 'blkg_for_each_descendant_post'
- 'blkg_for_each_descendant_pre'
- 'blk_queue_for_each_rl'
- 'bond_for_each_slave'
- 'bond_for_each_slave_rcu'
- 'bpf_for_each_spilled_reg'
- 'btree_for_each_safe128'
- 'btree_for_each_safe32'
- 'btree_for_each_safe64'
- 'btree_for_each_safel'
- 'card_for_each_dev'
- 'cgroup_taskset_for_each'
- 'cgroup_taskset_for_each_leader'
- 'cpufreq_for_each_entry'
- 'cpufreq_for_each_entry_idx'
- 'cpufreq_for_each_valid_entry'
- 'cpufreq_for_each_valid_entry_idx'
- 'css_for_each_child'
- 'css_for_each_descendant_post'
- 'css_for_each_descendant_pre'
- 'device_for_each_child_node'
- 'displayid_iter_for_each'
- 'dma_fence_chain_for_each'
- 'do_for_each_ftrace_op'
- 'drm_atomic_crtc_for_each_plane'
- 'drm_atomic_crtc_state_for_each_plane'
- 'drm_atomic_crtc_state_for_each_plane_state'
- 'drm_atomic_for_each_plane_damage'
- 'drm_client_for_each_connector_iter'
- 'drm_client_for_each_modeset'
- 'drm_connector_for_each_possible_encoder'
- 'drm_for_each_bridge_in_chain'
- 'drm_for_each_connector_iter'
- 'drm_for_each_crtc'
- 'drm_for_each_crtc_reverse'
- 'drm_for_each_encoder'
- 'drm_for_each_encoder_mask'
- 'drm_for_each_fb'
- 'drm_for_each_legacy_plane'
- 'drm_for_each_plane'
- 'drm_for_each_plane_mask'
- 'drm_for_each_privobj'
- 'drm_mm_for_each_hole'
- 'drm_mm_for_each_node'
- 'drm_mm_for_each_node_in_range'
- 'drm_mm_for_each_node_safe'
- 'flow_action_for_each'
- 'for_each_acpi_dev_match'
- 'for_each_active_dev_scope'
- 'for_each_active_drhd_unit'
- 'for_each_active_iommu'
- 'for_each_aggr_pgid'
- 'for_each_available_child_of_node'
- 'for_each_bio'
- 'for_each_board_func_rsrc'
- 'for_each_bvec'
- 'for_each_card_auxs'
- 'for_each_card_auxs_safe'
- 'for_each_card_components'
- 'for_each_card_dapms'
- 'for_each_card_pre_auxs'
- 'for_each_card_prelinks'
- 'for_each_card_rtds'
- 'for_each_card_rtds_safe'
- 'for_each_card_widgets'
- 'for_each_card_widgets_safe'
- 'for_each_cgroup_storage_type'
- 'for_each_child_of_node'
- 'for_each_clear_bit'
- 'for_each_clear_bit_from'
- 'for_each_cmsghdr'
- 'for_each_compatible_node'
- 'for_each_component_dais'
- 'for_each_component_dais_safe'
- 'for_each_comp_order'
- 'for_each_console'
- 'for_each_cpu'
- 'for_each_cpu_and'
- 'for_each_cpu_not'
- 'for_each_cpu_wrap'
- 'for_each_dapm_widgets'
- 'for_each_dev_addr'
- 'for_each_dev_scope'
- 'for_each_dma_cap_mask'
- 'for_each_dpcm_be'
- 'for_each_dpcm_be_rollback'
- 'for_each_dpcm_be_safe'
- 'for_each_dpcm_fe'
- 'for_each_drhd_unit'
- 'for_each_dss_dev'
- 'for_each_dtpm_table'
- 'for_each_efi_memory_desc'
- 'for_each_efi_memory_desc_in_map'
- 'for_each_element'
- 'for_each_element_extid'
- 'for_each_element_id'
- 'for_each_endpoint_of_node'
- 'for_each_evictable_lru'
- 'for_each_fib6_node_rt_rcu'
- 'for_each_fib6_walker_rt'
- 'for_each_free_mem_pfn_range_in_zone'
- 'for_each_free_mem_pfn_range_in_zone_from'
- 'for_each_free_mem_range'
- 'for_each_free_mem_range_reverse'
- 'for_each_func_rsrc'
- 'for_each_hstate'
- 'for_each_if'
- 'for_each_iommu'
- 'for_each_ip_tunnel_rcu'
- 'for_each_irq_nr'
- 'for_each_link_codecs'
- 'for_each_link_cpus'
- 'for_each_link_platforms'
- 'for_each_lru'
- 'for_each_matching_node'
- 'for_each_matching_node_and_match'
- 'for_each_member'
- 'for_each_memcg_cache_index'
- 'for_each_mem_pfn_range'
- '__for_each_mem_range'
- 'for_each_mem_range'
- '__for_each_mem_range_rev'
- 'for_each_mem_range_rev'
- 'for_each_mem_region'
- 'for_each_migratetype_order'
- 'for_each_msi_entry'
- 'for_each_msi_entry_safe'
- 'for_each_msi_vector'
- 'for_each_net'
- 'for_each_net_continue_reverse'
- 'for_each_netdev'
- 'for_each_netdev_continue'
- 'for_each_netdev_continue_rcu'
- 'for_each_netdev_continue_reverse'
- 'for_each_netdev_feature'
- 'for_each_netdev_in_bond_rcu'
- 'for_each_netdev_rcu'
- 'for_each_netdev_reverse'
- 'for_each_netdev_safe'
- 'for_each_net_rcu'
- 'for_each_new_connector_in_state'
- 'for_each_new_crtc_in_state'
- 'for_each_new_mst_mgr_in_state'
- 'for_each_new_plane_in_state'
- 'for_each_new_private_obj_in_state'
- 'for_each_node'
- 'for_each_node_by_name'
- 'for_each_node_by_type'
- 'for_each_node_mask'
- 'for_each_node_state'
- 'for_each_node_with_cpus'
- 'for_each_node_with_property'
- 'for_each_nonreserved_multicast_dest_pgid'
- 'for_each_of_allnodes'
- 'for_each_of_allnodes_from'
- 'for_each_of_cpu_node'
- 'for_each_of_pci_range'
- 'for_each_old_connector_in_state'
- 'for_each_old_crtc_in_state'
- 'for_each_old_mst_mgr_in_state'
- 'for_each_oldnew_connector_in_state'
- 'for_each_oldnew_crtc_in_state'
- 'for_each_oldnew_mst_mgr_in_state'
- 'for_each_oldnew_plane_in_state'
- 'for_each_oldnew_plane_in_state_reverse'
- 'for_each_oldnew_private_obj_in_state'
- 'for_each_old_plane_in_state'
- 'for_each_old_private_obj_in_state'
- 'for_each_online_cpu'
- 'for_each_online_node'
- 'for_each_online_pgdat'
- 'for_each_pci_bridge'
- 'for_each_pci_dev'
- 'for_each_pci_msi_entry'
- 'for_each_pcm_streams'
- 'for_each_physmem_range'
- 'for_each_populated_zone'
- 'for_each_possible_cpu'
- 'for_each_present_cpu'
- 'for_each_prime_number'
- 'for_each_prime_number_from'
- 'for_each_process'
- 'for_each_process_thread'
- 'for_each_prop_codec_conf'
- 'for_each_prop_dai_codec'
- 'for_each_prop_dai_cpu'
- 'for_each_prop_dlc_codecs'
- 'for_each_prop_dlc_cpus'
- 'for_each_prop_dlc_platforms'
- 'for_each_property_of_node'
- 'for_each_registered_fb'
- 'for_each_requested_gpio'
- 'for_each_requested_gpio_in_range'
- 'for_each_reserved_mem_range'
- 'for_each_reserved_mem_region'
- 'for_each_rtd_codec_dais'
- 'for_each_rtd_components'
- 'for_each_rtd_cpu_dais'
- 'for_each_rtd_dais'
- 'for_each_set_bit'
- 'for_each_set_bit_from'
- 'for_each_set_clump8'
- 'for_each_sg'
- 'for_each_sg_dma_page'
- 'for_each_sg_page'
- 'for_each_sgtable_dma_page'
- 'for_each_sgtable_dma_sg'
- 'for_each_sgtable_page'
- 'for_each_sgtable_sg'
- 'for_each_sibling_event'
- 'for_each_subelement'
- 'for_each_subelement_extid'
- 'for_each_subelement_id'
- '__for_each_thread'
- 'for_each_thread'
- 'for_each_unicast_dest_pgid'
- 'for_each_vsi'
- 'for_each_wakeup_source'
- 'for_each_zone'
- 'for_each_zone_zonelist'
- 'for_each_zone_zonelist_nodemask'
- 'fwnode_for_each_available_child_node'
- 'fwnode_for_each_child_node'
- 'fwnode_graph_for_each_endpoint'
- 'gadget_for_each_ep'
- 'genradix_for_each'
- 'genradix_for_each_from'
- 'hash_for_each'
- 'hash_for_each_possible'
- 'hash_for_each_possible_rcu'
- 'hash_for_each_possible_rcu_notrace'
- 'hash_for_each_possible_safe'
- 'hash_for_each_rcu'
- 'hash_for_each_safe'
- 'hctx_for_each_ctx'
- 'hlist_bl_for_each_entry'
- 'hlist_bl_for_each_entry_rcu'
- 'hlist_bl_for_each_entry_safe'
- 'hlist_for_each'
- 'hlist_for_each_entry'
- 'hlist_for_each_entry_continue'
- 'hlist_for_each_entry_continue_rcu'
- 'hlist_for_each_entry_continue_rcu_bh'
- 'hlist_for_each_entry_from'
- 'hlist_for_each_entry_from_rcu'
- 'hlist_for_each_entry_rcu'
- 'hlist_for_each_entry_rcu_bh'
- 'hlist_for_each_entry_rcu_notrace'
- 'hlist_for_each_entry_safe'
- 'hlist_for_each_entry_srcu'
- '__hlist_for_each_rcu'
- 'hlist_for_each_safe'
- 'hlist_nulls_for_each_entry'
- 'hlist_nulls_for_each_entry_from'
- 'hlist_nulls_for_each_entry_rcu'
- 'hlist_nulls_for_each_entry_safe'
- 'i3c_bus_for_each_i2cdev'
- 'i3c_bus_for_each_i3cdev'
- 'ide_host_for_each_port'
- 'ide_port_for_each_dev'
- 'ide_port_for_each_present_dev'
- 'idr_for_each_entry'
- 'idr_for_each_entry_continue'
- 'idr_for_each_entry_continue_ul'
- 'idr_for_each_entry_ul'
- 'in_dev_for_each_ifa_rcu'
- 'in_dev_for_each_ifa_rtnl'
- 'inet_bind_bucket_for_each'
- 'inet_lhash2_for_each_icsk_rcu'
- 'key_for_each'
- 'key_for_each_safe'
- 'klp_for_each_func'
- 'klp_for_each_func_safe'
- 'klp_for_each_func_static'
- 'klp_for_each_object'
- 'klp_for_each_object_safe'
- 'klp_for_each_object_static'
- 'kunit_suite_for_each_test_case'
- 'kvm_for_each_memslot'
- 'kvm_for_each_vcpu'
- 'list_for_each'
- 'list_for_each_codec'
- 'list_for_each_codec_safe'
- 'list_for_each_continue'
- 'list_for_each_entry'
- 'list_for_each_entry_continue'
- 'list_for_each_entry_continue_rcu'
- 'list_for_each_entry_continue_reverse'
- 'list_for_each_entry_from'
- 'list_for_each_entry_from_rcu'
- 'list_for_each_entry_from_reverse'
- 'list_for_each_entry_lockless'
- 'list_for_each_entry_rcu'
- 'list_for_each_entry_reverse'
- 'list_for_each_entry_safe'
- 'list_for_each_entry_safe_continue'
- 'list_for_each_entry_safe_from'
- 'list_for_each_entry_safe_reverse'
- 'list_for_each_entry_srcu'
- 'list_for_each_prev'
- 'list_for_each_prev_safe'
- 'list_for_each_safe'
- 'llist_for_each'
- 'llist_for_each_entry'
- 'llist_for_each_entry_safe'
- 'llist_for_each_safe'
- 'mci_for_each_dimm'
- 'media_device_for_each_entity'
- 'media_device_for_each_intf'
- 'media_device_for_each_link'
- 'media_device_for_each_pad'
- 'nanddev_io_for_each_page'
- 'netdev_for_each_lower_dev'
- 'netdev_for_each_lower_private'
- 'netdev_for_each_lower_private_rcu'
- 'netdev_for_each_mc_addr'
- 'netdev_for_each_uc_addr'
- 'netdev_for_each_upper_dev_rcu'
- 'netdev_hw_addr_list_for_each'
- 'nft_rule_for_each_expr'
- 'nla_for_each_attr'
- 'nla_for_each_nested'
- 'nlmsg_for_each_attr'
- 'nlmsg_for_each_msg'
- 'nr_neigh_for_each'
- 'nr_neigh_for_each_safe'
- 'nr_node_for_each'
- 'nr_node_for_each_safe'
- 'of_for_each_phandle'
- 'of_property_for_each_string'
- 'of_property_for_each_u32'
- 'pci_bus_for_each_resource'
- 'pcl_for_each_chunk'
- 'pcl_for_each_segment'
- 'pcm_for_each_format'
- 'ping_portaddr_for_each_entry'
- 'plist_for_each'
- 'plist_for_each_continue'
- 'plist_for_each_entry'
- 'plist_for_each_entry_continue'
- 'plist_for_each_entry_safe'
- 'plist_for_each_safe'
- 'pnp_for_each_card'
- 'pnp_for_each_dev'
- 'protocol_for_each_card'
- 'protocol_for_each_dev'
- 'queue_for_each_hw_ctx'
- 'radix_tree_for_each_slot'
- 'radix_tree_for_each_tagged'
- 'rb_for_each'
- 'rbtree_postorder_for_each_entry_safe'
- 'rdma_for_each_block'
- 'rdma_for_each_port'
- 'rdma_umem_for_each_dma_block'
- 'resource_list_for_each_entry'
- 'resource_list_for_each_entry_safe'
- 'rhl_for_each_entry_rcu'
- 'rhl_for_each_rcu'
- 'rht_for_each'
- 'rht_for_each_entry'
- 'rht_for_each_entry_from'
- 'rht_for_each_entry_rcu'
- 'rht_for_each_entry_rcu_from'
- 'rht_for_each_entry_safe'
- 'rht_for_each_from'
- 'rht_for_each_rcu'
- 'rht_for_each_rcu_from'
- '__rq_for_each_bio'
- 'rq_for_each_bvec'
- 'rq_for_each_segment'
- 'scsi_for_each_prot_sg'
- 'scsi_for_each_sg'
- 'sctp_for_each_hentry'
- 'sctp_skb_for_each'
- 'shdma_for_each_chan'
- '__shost_for_each_device'
- 'shost_for_each_device'
- 'sk_for_each'
- 'sk_for_each_bound'
- 'sk_for_each_entry_offset_rcu'
- 'sk_for_each_from'
- 'sk_for_each_rcu'
- 'sk_for_each_safe'
- 'sk_nulls_for_each'
- 'sk_nulls_for_each_from'
- 'sk_nulls_for_each_rcu'
- 'snd_array_for_each'
- 'snd_pcm_group_for_each_entry'
- 'snd_soc_dapm_widget_for_each_path'
- 'snd_soc_dapm_widget_for_each_path_safe'
- 'snd_soc_dapm_widget_for_each_sink_path'
- 'snd_soc_dapm_widget_for_each_source_path'
- 'tb_property_for_each'
- 'tcf_exts_for_each_action'
- 'udp_portaddr_for_each_entry'
- 'udp_portaddr_for_each_entry_rcu'
- 'usb_hub_for_each_child'
- 'v4l2_device_for_each_subdev'
- 'v4l2_m2m_for_each_dst_buf'
- 'v4l2_m2m_for_each_dst_buf_safe'
- 'v4l2_m2m_for_each_src_buf'
- 'v4l2_m2m_for_each_src_buf_safe'
- 'virtio_device_for_each_vq'
- 'while_for_each_ftrace_op'
- 'xa_for_each'
- 'xa_for_each_marked'
- 'xa_for_each_range'
- 'xa_for_each_start'
- 'xas_for_each'
- 'xas_for_each_conflict'
- 'xas_for_each_marked'
- 'xbc_array_for_each_value'
- 'xbc_for_each_key_value'
- 'xbc_node_for_each_array_value'
- 'xbc_node_for_each_child'
- 'xbc_node_for_each_key_value'
- 'zorro_for_each_dev'
IncludeBlocks: Preserve # Unknown to clang-format-5.0
IncludeCategories:
- Regex: '.*'
Priority: 1
IncludeIsMainRegex: '(Test)?$'
IndentCaseLabels: false
IndentGotoLabels: false
IndentPPDirectives: None # Unknown to clang-format-5.0
IndentWidth: 8
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto # Unknown to clang-format-5.0
ObjCBlockIndentWidth: 8
ObjCSpaceAfterProperty: true
ObjCSpaceBeforeProtocolList: true
# Taken from git's rules
PenaltyBreakAssignment: 10 # Unknown to clang-format-4.0
PenaltyBreakBeforeFirstCallParameter: 30
PenaltyBreakComment: 10
PenaltyBreakFirstLessLess: 0
PenaltyBreakString: 10
PenaltyExcessCharacter: 100
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
ReflowComments: false
SortIncludes: false
SortUsingDeclarations: false # Unknown to clang-format-4.0
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCtorInitializerColon: true # Unknown to clang-format-5.0
SpaceBeforeInheritanceColon: true # Unknown to clang-format-5.0
SpaceBeforeParens: ControlStatementsExceptForEachMacros
SpaceBeforeRangeBasedForLoopColon: true # Unknown to clang-format-5.0
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInContainerLiterals: false
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp03
TabWidth: 8
UseTab: Always
...

3
.codespellrc Normal file
View file

@ -0,0 +1,3 @@
[codespell]
skip = ./.git,./test/pki,./tags,./plugins/amdgpu/amdgpu_drm.h,./plugins/amdgpu/drm.h,./plugins/amdgpu/drm_mode.h
ignore-words-list = creat,fpr,fle,ue,bord,parms,nd,te,testng,inh,wronly,renderd,bui,clen,sems

63
.github/ISSUE_TEMPLATE.md vendored Normal file
View file

@ -0,0 +1,63 @@
<!--
Before reporting a new issue, please make sure that it's not a duplicate.
If you suspect your issue is a bug, please provide information as shown below. If your issue is a feature request, this information is not always necessary.
-->
**Description**
<!--
Briefly describe the problem you are having in a few paragraphs.
-->
**Steps to reproduce the issue:**
1.
2.
3.
**Describe the results you received:**
**Describe the results you expected:**
**Additional information you deem important (e.g. issue happens only occasionally):**
**CRIU logs and information:**
<!--
You can either attach logs as files to the issue or put them under details
-->
<details><summary>CRIU full dump/restore logs:</summary>
<p>
```
(paste your output here)
```
</p>
</details>
<details><summary>Output of `criu --version`:</summary>
<p>
```
(paste your output here)
```
</p>
</details>
<details><summary>Output of `criu check --all`:</summary>
<p>
```
(paste your output here)
```
</p>
</details>
**Additional environment details:**

18
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View file

@ -0,0 +1,18 @@
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/checkpoint-restore/criu/blob/criu-dev/CONTRIBUTING.md
In short you need to:
- Describe What you do and How you do it;
- Separate each logical change into a separate commit;
- Add a "Signed-off-by:" line identifying that you certify your work with DCO;
- If you fix some specific bug or commit, please add "Fixes: ..." line;
- Review fixes should be made by amending the original commits. For example:
a) fix the code (e.g. this fixes commit with hash aaa1111)
b) git commit -a --fixup aaa1111
c) git rebase --interactive --autosquash aaa1111^
- Pull request integration tests should generally be passing;
- If you change something non-obvious, please consider adding a ZDTM test for it;
-->

34
.github/workflows/aarch64-test.yaml vendored Normal file
View file

@ -0,0 +1,34 @@
name: aarch64 test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: aarch64-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
strategy:
matrix:
os: [ubuntu-24.04-arm, ubuntu-22.04-arm]
target: [GCC=1, CLANG=1]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- name: Run Tests ${{ matrix.target }} on ${{ matrix.os }}
# Following tests are failing on the VMs:
# ./change_mnt_context --pidfile=change_mnt_context.pid --outfile=change_mnt_context.out
# 45: ERR: change_mnt_context.c:23: mount (errno = 22 (Invalid argument))
#
# In combination with '--remote-lazy-pages' following error occurs:
# 138: FAIL: maps05.c:84: Data corrupted at page 1639 (errno = 11 (Resource temporarily unavailable))
run: |
# The 'sched_policy00' needs the following:
sudo sysctl -w kernel.sched_rt_runtime_us=-1
# etc/hosts entry is needed for netns_lock_iptables
echo "127.0.0.1 localhost" | sudo tee -a /etc/hosts
sudo -E make -C scripts/ci local ${{ matrix.target }} RUN_TESTS=1 \
ZDTM_OPTS="-x zdtm/static/change_mnt_context -x zdtm/static/maps05"

21
.github/workflows/alpine-test.yml vendored Normal file
View file

@ -0,0 +1,21 @@
name: Alpine Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: alpine-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
strategy:
matrix:
os: [ubuntu-22.04, ubuntu-22.04-arm]
target: [GCC=1, CLANG=1]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- name: Run Alpine ${{ matrix.target }} Test
run: sudo -E make -C scripts/ci alpine ${{ matrix.target }}

16
.github/workflows/archlinux-test.yml vendored Normal file
View file

@ -0,0 +1,16 @@
name: Arch Linux Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: archlinux-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Arch Linux Test
run: sudo -E make -C scripts/ci archlinux

30
.github/workflows/check-commits.yml vendored Normal file
View file

@ -0,0 +1,30 @@
name: Verify self-contained commits
on: pull_request
# Cancel any preceding run on the pull request
concurrency:
group: commit-test-${{ github.event.pull_request.number }}
jobs:
build:
runs-on: ubuntu-latest
# Check if pull request does not have label "not-selfcontained-ok"
if: "!contains(github.event.pull_request.labels.*.name, 'not-selfcontained-ok')"
steps:
- uses: actions/checkout@v4
with:
# Needed to rebase against the base branch
fetch-depth: 0
# Checkout pull request HEAD commit instead of merge commit
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: sudo contrib/apt-install libprotobuf-dev libprotobuf-c-dev protobuf-c-compiler protobuf-compiler python3-protobuf libnl-3-dev libnet-dev libcap-dev uuid-dev
- name: Configure git user details
run: |
git config --global user.email "checkpoint-restore@users.noreply.github.com"
git config --global user.name "checkpoint-restore"
- name: Configure base branch without switching current branch
run: git fetch origin ${{ github.base_ref }}:${{ github.base_ref }}
- name: Build each commit
run: git rebase ${{ github.base_ref }} -x "make -C scripts/ci check-commit"

50
.github/workflows/codeql.yml vendored Normal file
View file

@ -0,0 +1,50 @@
name: "CodeQL"
on:
push:
branches: [ "criu-dev", "master" ]
pull_request:
branches: [ "criu-dev" ]
schedule:
- cron: "11 6 * * 3"
# Cancel any preceding run on the pull request.
concurrency:
group: codeql-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ python, cpp ]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Packages (cpp)
if: ${{ matrix.language == 'cpp' }}
run: |
sudo contrib/apt-install protobuf-c-compiler libprotobuf-c-dev libprotobuf-dev build-essential libprotobuf-dev libprotobuf-c-dev protobuf-c-compiler protobuf-compiler python3-protobuf libnet-dev pkg-config libnl-3-dev libbsd0 libbsd-dev iproute2 libcap-dev libaio-dev libbsd-dev python3-yaml libnl-route-3-dev gnutls-dev
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
queries: +security-and-quality
- name: Autobuild
uses: github/codeql-action/autobuild@v3
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
with:
category: "/language:${{ matrix.language }}"

21
.github/workflows/compat-test.yml vendored Normal file
View file

@ -0,0 +1,21 @@
name: Compat Tests
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: compat-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
strategy:
matrix:
target: [GCC, CLANG]
steps:
- uses: actions/checkout@v4
- name: Run Compat Tests (${{ matrix.target }})
run: sudo -E make -C scripts/ci local COMPAT_TEST=y ${{ matrix.target }}=1

View file

@ -0,0 +1,22 @@
name: Daily Cross Compile Tests
on:
schedule:
- cron: '30 12 * * *'
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
target: [armv7-stable-cross, aarch64-stable-cross, ppc64-stable-cross, mips64el-stable-cross, riscv64-stable-cross]
branches: [criu-dev, master]
steps:
- uses: actions/checkout@v4
with:
ref: ${{ matrix.branches }}
- name: Run Cross Compilation Targets
run: >
sudo make -C scripts/ci ${{ matrix.target }}

40
.github/workflows/cross-compile.yml vendored Normal file
View file

@ -0,0 +1,40 @@
name: Cross Compile Tests
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: cross-compile-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-latest
continue-on-error: ${{ matrix.experimental }}
strategy:
fail-fast: false
matrix:
experimental: [false]
target: [
armv7-stable-cross,
aarch64-stable-cross,
ppc64-stable-cross,
mips64el-stable-cross,
riscv64-stable-cross,
]
include:
- experimental: true
target: armv7-unstable-cross
- experimental: true
target: aarch64-unstable-cross
- experimental: true
target: ppc64-unstable-cross
- experimental: true
target: mips64el-unstable-cross
steps:
- uses: actions/checkout@v4
- name: Run Cross Compilation Targets
run: >
sudo make -C scripts/ci ${{ matrix.target }}

19
.github/workflows/docker-test.yml vendored Normal file
View file

@ -0,0 +1,19 @@
name: Docker Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: docker-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-22.04]
steps:
- uses: actions/checkout@v4
- name: Run Docker Test (${{ matrix.os }})
run: sudo make -C scripts/ci docker-test

17
.github/workflows/fedora-asan-test.yml vendored Normal file
View file

@ -0,0 +1,17 @@
name: Fedora ASAN Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: fedora-asan-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Fedora ASAN Test
run: sudo -E make -C scripts/ci fedora-asan

View file

@ -0,0 +1,21 @@
name: Fedora Rawhide Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: fedora-rawhide-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Fedora Rawhide Test
# We need to pass environment variables from the CI environment to
# distinguish between CI environments. However, we need to make sure that
# XDG_RUNTIME_DIR environment variable is not set due to a bug in Podman.
# FIXME: https://github.com/containers/podman/issues/14920
run: sudo -E XDG_RUNTIME_DIR= make -C scripts/ci fedora-rawhide CONTAINER_RUNTIME=podman BUILD_OPTIONS="--security-opt seccomp=unconfined"

21
.github/workflows/gcov-test.yml vendored Normal file
View file

@ -0,0 +1,21 @@
name: Coverage Tests
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: gcov-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Coverage Tests
run: sudo -E make -C scripts/ci local GCOV=1
- name: Run gcov
run: sudo -E find . -name '*gcda' -type f -print0 | sudo -E xargs --null --max-args 128 --max-procs 4 gcov
- name: Run Coverage Analysis
run: sudo -E make codecov

16
.github/workflows/java-test.yml vendored Normal file
View file

@ -0,0 +1,16 @@
name: Java Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: java-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Java Test
run: sudo make -C scripts/ci java-test

40
.github/workflows/lint.yml vendored Normal file
View file

@ -0,0 +1,40 @@
name: Run code linter
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: lint-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-latest
container:
image: registry.fedoraproject.org/fedora:latest
steps:
- name: Install tools
run: sudo dnf -y install git make ruff xz clang-tools-extra codespell git-clang-format ShellCheck
- uses: actions/checkout@v4
- name: Set git safe directory
# https://github.com/actions/checkout/issues/760
run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
- name: Run make lint
run: make lint
- name: Run make indent
continue-on-error: true
run: |
if [ -z "${{github.base_ref}}" ]; then
git fetch --deepen=1
make indent
else
git fetch origin ${{github.base_ref}}
make indent BASE=origin/${{github.base_ref}}
fi
- name: Raise in-line make indent warnings
run: |
git diff | ./scripts/github-indent-warnings.py

View file

@ -0,0 +1,15 @@
name: LoongArch64 Qemu Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: loongarch64-qemu-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- run: sudo make -C scripts/ci loongarch64-qemu-test

14
.github/workflows/manage-labels.yml vendored Normal file
View file

@ -0,0 +1,14 @@
name: Remove labels
on: [issue_comment, pull_request_review_comment]
jobs:
remove-labels-on-comments:
name: Remove labels on comments
if: github.event_name == 'issue_comment'
runs-on: ubuntu-latest
steps:
- uses: mondeja/remove-labels-gh-action@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}
labels: |
changes requested
awaiting reply

24
.github/workflows/nftables-test.yml vendored Normal file
View file

@ -0,0 +1,24 @@
name: Nftables bases testing
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: nftables-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
- name: Remove iptables
run: sudo apt remove -y iptables
- name: Install libnftables-dev
run: sudo contrib/apt-install libnftables-dev
- name: chmod 755 /home/runner
# CRIU's tests are sometimes running as some random user and need
# to be able to access the test files.
run: sudo chmod 755 /home/runner
- name: Build with nftables network locking backend
run: sudo make -C scripts/ci local COMPILE_FLAGS="NETWORK_LOCK_DEFAULT=NETWORK_LOCK_NFTABLES"

16
.github/workflows/podman-test.yml vendored Normal file
View file

@ -0,0 +1,16 @@
name: Podman Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: podman-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run Podman Test
run: sudo make -C scripts/ci podman-test

27
.github/workflows/stale.yml vendored Normal file
View file

@ -0,0 +1,27 @@
name: Mark stale issues and pull requests
# Please refer to https://github.com/actions/stale/blob/master/action.yml
# to see all config knobs of the stale action.
on:
schedule:
- cron: "0 0 * * *"
jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v5
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-issue-message: 'A friendly reminder that this issue had no activity for 30 days.'
stale-pr-message: 'A friendly reminder that this PR had no activity for 30 days.'
stale-issue-label: 'stale-issue'
stale-pr-label: 'stale-pr'
days-before-stale: 30
days-before-close: 365
remove-stale-when-updated: true
exempt-pr-labels: 'no-auto-close'
exempt-issue-labels: 'no-auto-close,new feature,enhancement'

17
.github/workflows/stream-test.yml vendored Normal file
View file

@ -0,0 +1,17 @@
name: CRIU Image Streamer Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: stream-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run CRIU Image Streamer Test
run: sudo -E make -C scripts/ci local STREAM_TEST=1

16
.github/workflows/x86-64-clang-test.yml vendored Normal file
View file

@ -0,0 +1,16 @@
name: X86_64 CLANG Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: clang-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run X86_64 CLANG Test
run: sudo make -C scripts/ci x86_64 CLANG=1

16
.github/workflows/x86-64-gcc-test.yml vendored Normal file
View file

@ -0,0 +1,16 @@
name: X86_64 GCC Test
on: [push, pull_request]
# Cancel any preceding run on the pull request.
concurrency:
group: gcc-test-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/criu-dev' }}
jobs:
build:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Run X86_64 GCC Test
run: sudo make -C scripts/ci x86_64

13
.gitignore vendored
View file

@ -20,25 +20,16 @@ compel/compel
compel/compel-host-bin
images/*.c
images/*.h
images/google/protobuf/*.c
images/google/protobuf/*.h
.gitid
criu/criu
crit/crit
criu/arch/*/sys-exec-tbl*.c
# x86 syscalls-table is not generated
!criu/arch/x86/sys-exec-tbl.c
criu/arch/*/syscalls*.S
criu/include/syscall-codes*.h
criu/include/syscall*.h
criu/unittest/unittest
criu/include/version.h
criu/pie/restorer-blob.h
criu/pie/parasite-blob.h
criu/protobuf-desc-gen.h
lib/build/
lib/c/criu.pc
scripts/build/qemu-user-static/*
lib/.crit-setup.files
compel/include/asm
include/common/asm
include/common/config.h
build/**

25
.lgtm.yml Normal file
View file

@ -0,0 +1,25 @@
extraction:
cpp:
prepare:
packages:
- "protobuf-c-compiler"
- "libprotobuf-c-dev"
- "libprotobuf-dev"
- "build-essential"
- "libprotobuf-dev"
- "libprotobuf-c-dev"
- "protobuf-c-compiler"
- "protobuf-compiler"
- "python3-protobuf"
- "libnet-dev"
- "pkg-config"
- "libnl-3-dev"
- "libbsd0"
- "libbsd-dev"
- "iproute2"
- "libcap-dev"
- "libaio-dev"
- "libbsd-dev"
- "python3-yaml"
- "libnl-route-3-dev"
- "gnutls-dev"

View file

@ -1,6 +1,10 @@
Stanislav Kinsbursky <skinsbursky@parallels.com> <skinsbursky@openvz.org>
Pavel Emelyanov <xemul@parallels.com> <xemul@openvz.org>
Andrey Vagin <avagin@parallels.com> <avagin@openvz.org>
Andrey Vagin <avagin@parallels.com> <avagin@gmail.com>
Andrey Vagin <avagin@parallels.com> Andrew Vagin <avagin@parallels.com>
Andrei Vagin <avagin@gmail.com> <avagin@openvz.org>
Andrei Vagin <avagin@gmail.com> <avagin@parallels.com>
Andrei Vagin <avagin@gmail.com> <avagin@virtuozzo.com>
Andrei Vagin <avagin@gmail.com> <avagin@odin.com>
Andrei Vagin <avagin@gmail.com> <avagin@google.com>
Cyrill Gorcunov <gorcunov@openvz.org> <gorcunov@gmail.com>
Alexander Mikhalitsyn <alexander@mihalicyn.com> <alexander.mikhalitsyn@virtuozzo.com>
Alexander Mikhalitsyn <alexander@mihalicyn.com> <aleksandr.mikhalitsyn@canonical.com>

View file

@ -1,43 +0,0 @@
language: c
sudo: required
dist: xenial
cache: ccache
services:
- docker
env:
- TR_ARCH=local
- TR_ARCH=local CLANG=1
- TR_ARCH=local COMPAT_TEST=y
- TR_ARCH=local CLANG=1 COMPAT_TEST=y
- TR_ARCH=alpine
- TR_ARCH=fedora-asan
- TR_ARCH=x86_64
- TR_ARCH=x86_64 CLANG=1
- TR_ARCH=armv7hf
- TR_ARCH=aarch64
- TR_ARCH=ppc64le
- TR_ARCH=s390x
- TR_ARCH=armv7hf CLANG=1
- TR_ARCH=aarch64 CLANG=1
- TR_ARCH=ppc64le CLANG=1
- TR_ARCH=alpine CLANG=1
- TR_ARCH=docker-test
- TR_ARCH=fedora-rawhide
- TR_ARCH=fedora-rawhide-aarch64
- TR_ARCH=centos
- TR_ARCH=podman-test
matrix:
allow_failures:
- env: TR_ARCH=docker-test
- env: TR_ARCH=fedora-rawhide
- env: TR_ARCH=fedora-rawhide-aarch64
- env: TR_ARCH=s390x
- env: TR_ARCH=local GCOV=1
- env: TR_ARCH=local COMPAT_TEST=y
- env: TR_ARCH=local CLANG=1 COMPAT_TEST=y
script:
- sudo make CCACHE=1 -C scripts/travis $TR_ARCH
after_success:
- ccache -s
- make -C scripts/travis after_success
group: deprecated-2017Q2

1
CLAUDE.md Symbolic link
View file

@ -0,0 +1 @@
GEMINI.md

417
CONTRIBUTING.md Normal file
View file

@ -0,0 +1,417 @@
## How to contribute to CRIU
CRIU project is (almost) the never-ending story, because we have to always keep up with the
Linux kernel supporting checkpoint and restore for all the features it provides. Thus we're
looking for contributors of all kinds -- feedback, bug reports, testing, coding, writing, etc.
Here are some useful hints to get involved.
* We have both -- [very simple](https://github.com/checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3Aenhancement) and [more sophisticated](https://github.com/checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3A%22new+feature%22) coding tasks;
* CRIU does need [extensive testing](https://github.com/checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3Atesting);
* Documentation is always hard, we have [some information](https://criu.org/Category:Empty_articles) that is to be extracted from people's heads into wiki pages as well as [some texts](https://criu.org/Category:Editor_help_needed) that all need to be converted into useful articles;
* Feedback is expected on the GitHub issues page and on the [mailing list](https://lore.kernel.org/criu);
* We accept GitHub pull requests and this is the preferred way to contribute to CRIU. If you prefer to send patches by email, you are welcome to send them to [CRIU development mailing list](https://lore.kernel.org/criu).
Below we describe in more detail recommend practices for CRIU development.
* Spread the word about CRIU in [social networks](http://criu.org/Contacts);
* If you're giving a talk about CRIU -- let us know, we'll mention it on the [wiki main page](https://criu.org/News/events);
### Setting up the development environment
Although `criu` could be run as non-root (see [Security](https://criu.org/Security)), development is better to be done as root. For example, some tests require root. So, it would be a good idea to set up some recent Linux distro on a virtual machine.
### Get the source code
The CRIU sources are tracked by Git. Official CRIU repo is at https://github.com/checkpoint-restore/criu.
The repository may contain multiple branches. Development happens in the **criu-dev** branch.
To clone CRIU repo and switch to the proper branch, run:
```
git clone https://github.com/checkpoint-restore/criu criu
cd criu
git checkout criu-dev
```
### Building from source
Follow these steps to compile CRIU from source code.
#### Installing build dependencies
First, you need to install the required build dependencies. We provide scripts to simplify this process for several Linux distributions in [contrib/dependencies](contrib/dependencies). For a complete list of dependencies, please refer to the [installation guide](https://criu.org/Installation).
##### On Ubuntu/Debian-based systems:
```
./contrib/dependencies/apt-packages.sh
```
##### On Fedora/CentOS-based systems:
```
./contrib/dependencies/dnf-packages.sh
```
##### Using Nix:
```
nix develop
```
#### Compiling CRIU
Once the dependencies are installed, you can compile CRIU by running the `make` command from the root of the source directory:
```
make
```
This should create the `./criu/criu` executable.
## Edit the source code
When you change the source code, please keep in mind the following code conventions:
* code is written to be read, so the code readability is the most important thing you need to have in mind when preparing patches
* we prefer tabs and indentations to be 8 characters width
* we prefer line length of 80 characters or less, more is allowed if it helps with code readability
* CRIU mostly follows [Linux kernel coding style](https://www.kernel.org/doc/Documentation/process/coding-style.rst), but we are less strict than the kernel community
Other conventions can be learned from the source code itself. In short, make sure your new code looks similar to what is already there.
## Automatic tools to fix coding-style
Important: These tools are there to advise you, but should not be considered as a "source of truth", as tools also make nasty mistakes from time to time which can completely break code readability.
The following command can be used to automatically run a code linter for Python files (ruff), Shell scripts (shellcheck),
text spelling (codespell), and a number of CRIU-specific checks (usage of print macros and EOL whitespace for C files).
```
make lint
```
In addition, we have adopted a [clang-format configuration file](https://www.kernel.org/doc/Documentation/process/clang-format.rst)
based on the kernel source tree. However, compliance with the clang-format autoformat rules is optional. If the automatic code formatting
results in decreased readability, we may choose to ignore these errors.
Run the following command to check if your changes are compliant with the clang-format rules:
```
make indent
```
This command is built upon the `git-clang-format` tool and supports two options `BASE` and `OPTS`. The `BASE` option allows you to
specify a range of commits to check for coding style issues. By default, it is set to `HEAD~1`, so that only the last commit is checked.
If you are developing on top of the criu-dev branch and want to check all your commits for compliance with the clang-format rules, you
can use `BASE=origin/criu-dev`. The `OPTS` option can be used to pass additional options to `git-clang-format`. For example, if you want
to check the last *N* commits for formatting errors, without applying the changes to the codebase you can use the following command.
```
make indent OPTS=--diff BASE=HEAD~N
```
Note that for pull requests, the "Run code linter" workflow runs these checks for all commits. If a clang-format error is detected
we need to review the suggested changes and decide if they should be fixed before merging.
Here are some bad examples of clang-format-ing:
* if clang-format tries to force 120 characters and breaks readability - it is wrong:
```
@@ -58,8 +59,7 @@ static int register_membarriers(void)
}
if (!all_ok) {
- fail("can't register membarrier()s - tried %#x, kernel %#x",
- barriers_registered, barriers_supported);
+ fail("can't register membarrier()s - tried %#x, kernel %#x", barriers_registered, barriers_supported);
return -1;
}
```
* if clang-format breaks your beautiful readability friendly alignment in structures, comments or defines - it is wrong:
```
--- a/test/zdtm/static/membarrier.c
+++ b/test/zdtm/static/membarrier.c
@@ -27,9 +27,10 @@ static const struct {
int register_cmd;
int execute_cmd;
} membarrier_cmds[] = {
- { "", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, MEMBARRIER_CMD_PRIVATE_EXPEDITED },
- { "_SYNC_CORE", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE },
- { "_RSEQ", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ },
+ { "", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, MEMBARRIER_CMD_PRIVATE_EXPEDITED },
+ { "_SYNC_CORE", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE,
+ MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE },
+ { "_RSEQ", MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ },
};
```
## Test your changes
CRIU comes with an extensive test suite. To check whether your changes introduce any regressions, run
```
make test
```
The command runs [ZDTM Test Suite](https://criu.org/ZDTM_Test_Suite). Check for any error messages produced by it.
## Describe your changes
Describe your problem. Whether your change is a one-line bug fix or
5000 lines of a new feature, there must be an underlying problem that
motivated you to do this work. Convince the reviewer that there is a
problem worth fixing and that it makes sense for them to read past the
first paragraph.
Once the problem is established, describe what you are actually doing
about it in technical detail. It's important to describe the change
in plain English for the reviewer to verify that the code is behaving
as you intend it to.
Solve only one problem per commit. If your description starts to get
long, that's a sign that you probably need to split up your commit.
See [Separate your changes](#separate-your-changes).
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This commit] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.
If your change fixes a bug in a specific commit, e.g. you found an issue using
`git bisect`, please use the `Fixes:` tag with the abbreviation of
the SHA-1 ID, and the one line summary. For example:
```
Fixes: 9433b7b9db3e ("make: use cflags/ldflags for config.h detection mechanism")
```
The following `git config` settings can be used to add a pretty format for
outputting the above style in the `git log` or `git show` commands:
```
[pretty]
fixes = Fixes: %h (\"%s\")
```
If your change address an issue listed in GitHub, please use `Fixes:` tag with the number of the issue. For instance:
```
Fixes: #339
```
The `Fixes:` tags should be put at the end of the detailed description.
Please add a prefix to your commit subject line describing the part of the
project your change is related to. This can be either the name of the file or
directory you changed, or just a general word. If your patch is touching
multiple components you may separate prefixes with "/"-es. Here are some good
examples of subject lines from git log:
```
criu-ns: Convert to python3 style print() syntax
compel: Calculate sh_addr if not provided by linker
style: Enforce kernel style -Wstrict-prototypes
rpc/libcriu: Add lsm-profile option
```
You may refer to [How to Write a Git Commit
Message](https://chris.beams.io/posts/git-commit/) article for
recommendations for good commit message.
## Separate your changes
Separate each **logical change** into a separate commit.
For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more commits. If your changes include an API update, and a new
driver which uses that new API, separate those into two commits.
On the other hand, if you make a single change to numerous files,
group those changes into a single commit. Thus a single logical change
is contained within a single commit.
The point to remember is that each commit should make an easily understood
change that can be verified by reviewers. Each commit should be justifiable
on its own merits.
When dividing your change into a series of commits, take special care to
ensure that CRIU builds and runs properly after each commit in the
series. Developers using `git bisect` to track down a problem can end up
splitting your patch series at any point; they will not thank you if you
introduce bugs in the middle.
## Sign your work
To improve tracking of who did what, we ask you to sign off the commits in
your fork of CRIU or the patches that are to be emailed.
The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below:
### Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
then you just add a line saying
```
Signed-off-by: Random J Developer <random at developer.example.org>
```
using your real name (please, no pseudonyms or anonymous contributions if
it possible).
Hint: you can use `git commit -s` to add Signed-off-by line to your
commit message. To append such line to a commit you already made, use
`git commit --amend -s`.
```
From: Random J Developer <random at developer.example.org>
Subject: [PATCH] component: Short patch description
Long patch description (could be skipped if patch
is trivial enough)
Signed-off-by: Random J Developer <random at developer.example.org>
---
Patch body here
```
## Submit your work upstream
We accept GitHub pull requests and this is the preferred way to contribute to CRIU.
For that you should push your work to your fork of CRIU at [GitHub](https://github.com) and create a [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)
### Pull request guidelines
Pull request comment should contain description of the problem your changes
solve and a brief outline of the changes included in the pull request.
Please avoid pushing fixup commits to an existent pull request. Each commit
should be self contained and there should not be fixup commits in a patch
series. Pull requests that contain one commit which breaks something
and another commit which fixes it, will be rejected.
Please merge the fixup commits into the commits that has introduced the
problem before creating a pull request.
It may happen that the reviewers were not completely happy with your
changes and requested changes to your patches. After you updated your
changes please close the old pull request and create a new one that
contains the following:
* Description of the problem your changes solve and a brief outline of the
changes
* Link to the previous version of the pull request
* Brief description of the changes between old and new versions of the pull
request. If there were more than one previous pull request, all the
revisions should be listed. For example:
```
v3: rebase on the current criu-dev
v2: add commit to foo() and update bar() coding style
```
If there are only minor updates to the commits in a pull request, it is
possible to force-push them into an existing pull request. This only applies
to small changes and should be used with care. If you update an existing
pull request, remember to add the description of the changes from the
previous version.
### Mailing list submission
Historically, CRIU worked with mailing lists and patches so if you still prefer this way continue reading till the end of this section.
### Make a patch
To create a patch, run
```
git format-patch --signoff origin/criu-dev
```
You might need to read GIT documentation on how to prepare patches
for mail submission. Take a look at http://book.git-scm.com/ and/or
http://git-scm.com/documentation for details. It should not be hard
at all.
We recommend to post patches using `git send-email`
```
git send-email --cover-letter --no-chain-reply-to --annotate \
--confirm=always --to=criu@lists.linux.dev criu-dev
```
Note that the `git send-email` subcommand may not be in
the main git package and using it may require installation of a
separate package, for example the "git-email" package in Fedora and
Debian.
If this is your first time using git send-email, you might need to
configure it to point it to your SMTP server with something like:
```
git config --global sendemail.smtpServer stmp.example.net
```
If you get tired of typing `--to=criu@lists.linux.dev` all the time,
you can configure that to be automatically handled as well:
```
git config sendemail.to criu@lists.linux.dev
```
If a developer is sending another version of the patch (e.g. to address
review comments), they are advised to note differences to previous versions
after the `---` line in the patch so that it helps reviewers but
doesn't become part of git history. Moreover, such patch needs to be prefixed
correctly with `--subject-prefix=PATCHv2` appended to
`git send-email` (substitute `v2` with the correct
version if needed though).
### Mail patches
The patches should be sent to CRIU development mailing list, `criu AT lists.linux.dev`. Note that you need to be subscribed first in order to post. The list web interface is available at https://lore.kernel.org/criu; you can also use standard mailman aliases to work with it.
Please make sure the email client you're using doesn't screw your patch (line wrapping and so on).
> **Note:** When sending a patch set that consists of more than one patch, please, push your changes in your local repo and provide the URL of the branch in the cover-letter
### Wait for response
Be patient. Most CRIU developers are pretty busy people so if
there is no immediate response on your patch — don't be surprised,
sometimes a patch may fly around a week before it gets reviewed.
## Continuous integration
Wiki article: [Continuous integration](https://criu.org/Continuous_integration)
CRIU tests are run for each series sent to the mailing list. If you get a message from our patchwork that patches failed to pass the tests, you have to investigate what is wrong.

View file

@ -1,4 +1,10 @@
This HOWTO explains how to cross-compile CRIU on x86
How to cross-compile CRIU on x86:
Use the Dockerfile provided:
scripts/build/Dockerfile.armv7-cross
Historical guide how-to do it without docker container:
[Unsupported, may not work anymore!]
1. Download the protobuf sources.
2. Apply the patch http://16918.selcdn.ru/crtools/aarch64/0001-protobuf-added-the-support-for-the-acrchitecture-AAr.patch

View file

@ -12,7 +12,9 @@ endif
FOOTER := footer.txt
SRC1 += crit.txt
SRC1 += criu-ns.txt
SRC1 += compel.txt
SRC1 += criu-amdgpu-plugin.txt
SRC8 += criu.txt
SRC := $(SRC1) $(SRC8)
XMLS := $(patsubst %.txt,%.xml,$(SRC))
@ -54,7 +56,7 @@ ifneq ($(USE_ASCIIDOCTOR),)
$(Q) $(ASCIIDOC) -b manpage -d manpage -o $@ $<
else
$(Q) $(ASCIIDOC) -b docbook -d manpage -o $(patsubst %.1,%.xml,$@) $<
$(Q) $(XMLTO) man -m custom.xsl $(patsubst %.1,%.xml,$@) 2>/dev/null
$(Q) $(XMLTO) man -m custom.xsl $(patsubst %.1,%.xml,$@)
endif
%.8: %.txt $(FOOTER) custom.xsl
@ -63,7 +65,7 @@ ifneq ($(USE_ASCIIDOCTOR),)
$(Q) $(ASCIIDOC) -b manpage -d manpage -o $@ $<
else
$(Q) $(ASCIIDOC) -b docbook -d manpage -o $(patsubst %.8,%.xml,$@) $<
$(Q) $(XMLTO) man -m custom.xsl $(patsubst %.8,%.xml,$@) 2>/dev/null
$(Q) $(XMLTO) man -m custom.xsl $(patsubst %.8,%.xml,$@)
endif
%.ps: %.1

View file

@ -86,18 +86,21 @@ Infecting code
~~~~~~~~~~~~~~
The parasitic code is compiled and converted to a header using *compel*, and included here.
*#include <compel/compel.h>*
*#include <compel/infect.h>*
*#include "parasite.h"*
Following steps are perfomed to infect the victim process:
Following steps are performed to infect the victim process:
- stop the task: *int compel_stop_task(int pid);*
- prepare infection handler: *struct parasite_ctl *compel_prepare(int pid);*
- execute system call: *int compel_syscall(ctl, int syscall_nr, long *ret, int arg ...);*
- infect victim: *int compel_infect(ctl, nr_thread, size_of_args_area);*
- cure the victim: *int compel_cure(ctl);* //ctl pointer is freed by this call
- Resume victim: *int compel_resume_task(pid, orig_state, state);*
- Resume victim: *int compel_resume_task(pid, orig_state, state)* or
*int compel_resume_task_sig(pid, orig_state, state, stop_signo).*
//compel_resume_task_sig() could be used in case when victim is in stopped state.
stop_signo could be read by calling compel_parse_stop_signo().
*ctl* must be configured with blob information by calling *PREFIX_setup_c_header()*, with ctl as its argument.
*PREFIX* is the argument given to *-p* when calling hgen, else it is deduced from file name.

View file

@ -0,0 +1,114 @@
ROCM Support(1)
===============
NAME
----
criu-amdgpu-plugin - A plugin extension to CRIU to support checkpoint/restore in
userspace for AMD GPUs.
CURRENT SUPPORT
---------------
Single and Multi GPU systems (Gfx9)
Checkpoint / Restore on different system
Checkpoint / Restore inside a docker container
Pytorch
Tensorflow
Using CRIU Image Streamer
Parallel Restore
DESCRIPTION
-----------
Though *criu* is a great tool for checkpointing and restoring running
applications, it has certain limitations such as it cannot handle
applications that have device files open. In order to support *ROCm* based
workloads with *criu* we need to augment criu's core functionality with a
plugin based extension mechanism. *criu-amdgpu-plugin* provides the necessary support
to criu to allow Checkpoint / Restore with ROCm.
Dependencies
------------
*amdkfd support*::
In order to snapshot the *VRAM* and other *GPU* device states, we require
an updated version of amdkfd(amdgpu) driver.
OPTIONS
-------
Optional parameters can be passed in as environment variables before
executing criu command.
*KFD_FW_VER_CHECK*::
Enable or disable firmware version check.
If enabled, firmware version on restored gpu needs to be greater than or
equal firmware version on checkpointed GPU. Default:Enabled
E.g:
KFD_FW_VER_CHECK=0
*KFD_SDMA_FW_VER_CHECK*::
Enable or disable SDMA firmware version check.
If enabled, SDMA firmware version on restored gpu needs to be greater than or
equal firmware version on checkpointed GPU. Default:Enabled
E.g:
KFD_SDMA_FW_VER_CHECK=0
*KFD_CACHES_COUNT_CHECK*::
Enable or disable caches count check. If enabled, the caches count on
restored GPU needs to be greater than or equal caches count on checkpointed
GPU. Default:Enabled
E.g:
KFD_CACHES_COUNT_CHECK=0
*KFD_NUM_GWS_CHECK*::
Enable or disable num_gws check. If enabled, the num_gws on
restored GPU needs to be greater than or equal num_gws on checkpointed
GPU. Default:Enabled
E.g:
KFD_NUM_GWS_CHECK=0
*KFD_VRAM_SIZE_CHECK*::
Enable or disable VRAM size check. If enabled, the VRAM size on
restored GPU needs to be greater than or equal VRAM size on checkpointed
GPU. Default:Enabled
E.g:
KFD_VRAM_SIZE_CHECK=0
*KFD_NUMA_CHECK*::
Enable or disable NUMA CPU region check. If enabled, the plugin will restore
GPUs that belong to one CPU NUMA region to the same CPU NUMA region.
Default:Enabled
E.g:
KFD_NUMA_CHECK=1
*KFD_CAPABILITY_CHECK*::
Enable or disable capability check. If enabled, the capability on
restored GPU needs to be equal to the capability on the checkpointed GPU.
Default:Enabled
E.g:
KFD_CAPABILITY_CHECK=1
*KFD_MAX_BUFFER_SIZE*::
On some systems, VRAM sizes may exceed RAM sizes, and so buffers for dumping
and restoring VRAM may be unable to fit. Set to a nonzero value (in bytes)
to set a limit on the plugin's memory usage.
Default:0 (Disabled)
E.g:
KFD_MAX_BUFFER_SIZE="2G"
AUTHOR
------
The AMDKFD team.
COPYRIGHT
---------
Copyright \(C) 2020-2021, Advanced Micro Devices, Inc. (AMD)

32
Documentation/criu-ns.txt Normal file
View file

@ -0,0 +1,32 @@
CRIU-NS(1)
==========
include::footer.txt[]
NAME
----
criu-ns - run criu in different namespaces
SYNOPSIS
--------
*criu-ns* 'dump' -t PID [<options>]
*criu-ns* 'pre-dump' -t PID [<options>]
*criu-ns* 'restore' [<options>]
*criu-ns* 'check' [<options>]
DESCRIPTION
-----------
The *criu-ns* command executes 'criu' in a new PID and mount namespace.
The purpose of this wrapper script is to enable restoring a process tree
that might require a specific PID that is already used on the system;
so called "PID mismatch" problem.
SEE ALSO
--------
nsenter(1) namespaces(7) criu(8)
AUTHOR
------
The CRIU team

View file

@ -24,8 +24,8 @@ on a different system, or both.
OPTIONS
-------
Most of the true / false long options (the ones without arguments) can be
prefixed with *--no-* to negate the option (example: *--display-stats*
Most of the long flags can be
prefixed with *no-* to negate the option (example: *--display-stats*
and *--no-display-stats*).
Common options
@ -33,12 +33,11 @@ Common options
Common options are applicable to any 'command'.
*-v*[*v*...], *--verbosity*::
Increase verbosity up from the default level. Multiple *v* can be used,
each increasing verbosity by one level. Using long option without argument
increases verbosity by one level.
Increase verbosity up from the default level. In case of short option,
multiple *v* can be used, each increasing verbosity by one.
*-v*'num', *--verbosity*='num'::
Set verbosity level to 'num'. The higher the level, the more output
**-v**__num__, **--verbosity=**__num__::
Set verbosity level to _num_. The higher the level, the more output
is produced.
+
The following levels are available:
@ -57,26 +56,31 @@ The following levels are available:
Pass a specific configuration file to criu.
*--no-default-config*::
Forbid parsing of default configuration files.
Disable parsing of default configuration files.
*--pidfile* 'file'::
Write root task, service or page-server pid into a 'file'.
*-o*, *--log-file* 'file'::
Write logging messages to 'file'.
Write logging messages to a 'file'.
*--display-stats*::
During dump as well as during restore *criu* collects information
like the time required to dump or restore the process or the
During dump, as well as during restore, *criu* collects some statistics,
like the time required to dump or restore the process, or the
number of pages dumped or restored. This information is always
written to the files 'stats-dump' and 'stats-restore' and can
be easily displayed using *crit*. The option *--display-stats*
additionally prints out this information on the console at the end
of a dump or a restore.
saved to the *stats-dump* and *stats-restore* files, and can
be shown using *crit*(1). The option *--display-stats*
prints out this information on the console at the end
of a dump or restore operation.
*-D*, *--images-dir* 'path'::
Use 'path' as a base directory where to look for sets of image files.
*--stream*::
dump/restore images using criu-image-streamer.
See https://github.com/checkpoint-restore/criu-image-streamer for detailed
usage.
*--prev-images-dir* 'path'::
Use 'path' as a parent directory where to look for sets of image files.
This option makes sense in case of incremental dumps.
@ -91,6 +95,19 @@ The following levels are available:
*-L*, *--libdir* 'path'::
Path to plugins directory.
*--enable-fs* ['fs'[,'fs'...]]::
Specify a comma-separated list of filesystem names that should
be auto-detected. The value 'all' enables auto-detection for
all filesystems.
+
Note: This option is not safe, use at your own risk.
Auto-detecting a filesystem mount assumes that the mountpoint can
be restored with *mount(src, mountpoint, flags, options)*. When used,
*dump* is expected to always succeed if a mountpoint is to be
auto-detected, however *restore* may fail (or do something wrong)
if the assumption for restore logic is incorrect. This option is
not compatible with *--external* *dev*.
*--action-script* 'script'::
Add an external action script to be executed at certain stages.
The environment variable *CRTOOLS_SCRIPT_ACTION* is available
@ -138,6 +155,17 @@ The following levels are available:
notification message contains a file descriptor for
the master pty
*query-ext-files*:::
called after the process tree is stopped and network is locked.
This hook is used only in the RPC mode. The notification reply
contains file ids to be added to external file list (may be empty).
*--unprivileged*::
This option tells *criu* to accept the limitations when running
as non-root. Running as non-root requires *criu* at least to have
*CAP_SYS_ADMIN* or *CAP_CHECKPOINT_RESTORE*. For details about running
*criu* as non-root please consult the *NON-ROOT* section.
*-V*, *--version*::
Print program version and exit.
@ -156,6 +184,12 @@ In addition, *page-server* options may be specified.
Turn on memory changes tracker in the kernel. If the option is
not passed the memory tracker get turned on implicitly.
*--pre-dump-mode*='mode'::
There are two 'mode' to operate pre-dump algorithm. The 'splice' mode
is parasite based, whereas 'read' mode is based on process_vm_readv
syscall. The 'read' mode incurs reduced frozen time and reduced
memory pressure as compared to 'splice' mode. Default is 'splice' mode.
*dump*
~~~~~~
Performs a checkpoint procedure.
@ -179,7 +213,7 @@ In other words, do not use it unless really needed.
*-s*, *--leave-stopped*::
Leave tasks in stopped state after checkpoint, instead of killing.
*--external* 'type'*[*'id'*]:*'value'::
*--external* __type__**[**__id__**]:**__value__::
Dump an instance of an external resource. The generic syntax is
'type' of resource, followed by resource 'id' (enclosed in literal
square brackets), and optional 'value' (prepended by a literal colon).
@ -188,35 +222,48 @@ In other words, do not use it unless really needed.
Note to restore external resources, either *--external* or *--inherit-fd*
is used, depending on resource type.
*--external mnt[*'mountpoint'*]:*'name'::
*--external* **mnt[**__mountpoint__**]:**__name__::
Dump an external bind mount referenced by 'mountpoint', saving it
to image under the identifier 'name'.
*--external mnt[]:*'flags'::
*--external* **mnt[]:**__flags__::
Dump all external bind mounts, autodetecting those. Optional 'flags'
can contain *m* to also dump external master mounts, *s* to also
dump external shared mounts (default behavior is to abort dumping
if such mounts are found). If 'flags' are not provided, colon
is optional.
*--external dev[*'major'*/*'minor'*]:*'name'::
*--external* **dev[**__major__**/**__minor__**]:**__name__::
Allow to dump a mount namespace having a real block device mounted.
A block device is identified by its 'major' and 'minor' numbers,
and *criu* saves its information to image under the identifier 'name'.
*--external file[*'mnt_id'*:*'inode'*]*::
*--external* **file[**__mnt_id__**:**__inode__**]**::
Dump an external file, i.e. an opened file that is can not be resolved
from the current mount namespace, which can not be dumped without using
this option. The file is identified by 'mnt_id' (a field obtained from
*/proc/*'pid'*/fdinfo/*'N') and 'inode' (as returned by *stat*(2)).
**/proc/**__pid__**/fdinfo/**__N__) and 'inode' (as returned by
*stat*(2)).
*--external tty[*'rdev'*:*'dev'*]*::
*--external* **tty[**__rdev__**:**__dev__**]**::
Dump an external TTY, identified by *st_rdev* and *st_dev* fields
returned by *stat*(2).
*--external unix[*'id'*]*::
*--external* **unix[**__id__**]**::
Tell *criu* that one end of a pair of UNIX sockets (created by
*socketpair*(2)) with 'id' is OK to be disconnected.
*socketpair*(2)) with the given _id_ is OK to be disconnected.
*--external* **net[**__inode__**]:**__name__::
Mark a network namespace as external and do not include it in the
checkpoint. The label 'name' can be used with *--inherit-fd* during
restore to specify a file descriptor to a preconfigured network
namespace.
*--external* **pid[**__inode__**]:**__name__::
Mark a PID namespace as external. This can be later used to restore
a process into an existing PID namespace. The label 'name' can be
used to assign another PID namespace during restore with the help
of *--inherit-fd*.
*--freeze-cgroup*::
Use cgroup freezer to collect processes.
@ -266,14 +313,42 @@ For example, the command line for the above example should look like this:
discovered automatically (usually via */proc*). This option is
useful when one needs *criu* to skip some controllers.
*--cgroup-props-ignore-default*::
When combined with *--cgroup-props*, makes *criu* substitute
a predefined controller property with the new one shipped. If the option
is not used, the predefined properties are merged with the provided ones.
*--cgroup-yard* 'path'::
Instead of trying to mount cgroups in CRIU, provide a path to a directory
with already created cgroup yard. Useful if you don't want to grant
CAP_SYS_ADMIN to CRIU. For every cgroup mount there should be exactly one
directory. If there is only one controller in this mount, the dir's name
should be just the name of the controller. If there are multiple controllers
comounted, the directory name should have them be separated by a comma.
+
For example, if */proc/cgroups* looks like this:
+
----------
#subsys_name hierarchy num_cgroups enabled
cpu 1 1 1
devices 2 2 1
freezer 2 2 1
----------
+
then you can create the cgroup yard by the following commands:
+
----------
mkdir private_yard
cd private_yard
mkdir cpu
mount -t cgroup -o cpu none cpu
mkdir devices,freezer
mount -t cgroup -o devices,freezer none devices,freezer
----------
*--tcp-established*::
Checkpoint established TCP connections.
*--tcp-close*::
Don't dump the state of, or block, established tcp connections
(including the connection is once established but now closed).
This is useful when tcp connections are not going to be restored.
*--skip-in-flight*::
This option skips in-flight TCP connections. If any TCP connections
that are not yet completely established are found, *criu* ignores
@ -303,6 +378,10 @@ For example, the command line for the above example should look like this:
Allows to link unlinked files back, if possible (modifies filesystem
during *restore*).
*--timeout* 'number'::
Set a time limit in seconds for collecting tasks during the
dump operation. The timeout is 10 seconds by default.
*--ghost-limit* 'size'::
Set the maximum size of deleted file to be carried inside image.
By default, up to 1M file is allowed. Using this
@ -310,6 +389,13 @@ For example, the command line for the above example should look like this:
'size' may be postfixed with a *K*, *M* or *G*, which stands for kilo-,
mega, and gigabytes, accordingly.
*--ghost-fiemap*::
Enable an optimization based on fiemap ioctl that can reduce the
number of system calls used when checkpointing highly sparse ghost
files. This option is enabled by default, and it can be disabled
with *--no-ghost-fiemap*. An automatic fallback to SEEK_HOLE/SEEK_DATA
is used when fiemap is not supported.
*-j*, *--shell-job*::
Allow one to dump shell jobs. This implies the restored task will
inherit session and process group ID from the *criu* itself.
@ -347,22 +433,78 @@ By default the option is set to *fpu* and *ins*.
option is intended for post-copy (lazy) migration and should be
used in conjunction with *restore* with appropriate options.
*--file-validation* ['mode']::
Set the method to be used to validate open files. Validation is done
to ensure that the version of the file being restored is the same
version when it was dumped.
+
The 'mode' may be one of the following:
*filesize*:::
To explicitly use only the file size check all the time.
This is the fastest and least intensive check.
*buildid*:::
To validate ELF files with their build-ID. If the
build-ID cannot be obtained, 'chksm-first' method will be
used. This is the default if mode is unspecified.
*--network-lock* ['mode']::
Set the method to be used for network locking/unlocking. Locking is done
to ensure that tcp packets are dropped between dump and restore. This is
done to avoid the kernel sending RST when a packet arrives destined for
the dumped process.
+
The 'mode' may be one of the following:
*iptables*::: Use iptables rules to drop the packets.
This is the default if 'mode' is not specified.
*nftables*::: Use nftables rules to drop the packets.
*skip*::: Don't lock the network. If *--tcp-close* is not used, the network
must be locked externally to allow CRIU to dump TCP connections.
*--allow-uprobes*::
Allow dumping when uprobes vma is present. When used on dump, this option is
required on restore as well.
A uprobes vma is automatically created by the kernel once a uprobe is
triggered. This mapping is not removed even once the uprobe is deleted. So,
even if a process once had uprobes attached to it, and they're removed by
the time the process is dumped, this option is still required because criu
has no way of knowing whether there are active uprobes or not.
When using this option on restore, make sure the uprobes (if any) active on
the dumped processes are still active. Otherwise, when execution reaches
a uprobe'd location in any of the restored processes, that process will be
sent a SIGTRAP.
As an example, say a uprobe is set at function foo in the executable of the
process p_bar. Whenever execution in p_bar reaches function foo, the uprobe
is triggered. If the uprobe has been triggered at least once, then the kernel
will have created the uprobes vma. To dump p_bar, this option is
necessary. After dumping, say the uprobe is deleted. Now, on restoring with
this option, once execution reaches function foo, SIGTRAP will be sent to
the restored p_bar. Unless it has a signal handler installed for SIGTRAP,
it will be terminated and core dumped.
*restore*
~~~~~~~~~
Restores previously checkpointed processes.
*--inherit-fd* *fd[*'N'*]:*'resource'::
*--inherit-fd* **fd[**__N__**]:**__resource__::
Inherit a file descriptor. This option lets *criu* use an already opened
file descriptor 'N' for restoring a file identified by 'resource'.
This option can be used to restore an external resource dumped
with the help of *--external* *file*, *tty*, and *unix* options.
with the help of *--external* *file*, *tty*, *pid* and *unix* options.
+
The 'resource' argument can be one of the following:
+
- *tty[*'rdev'*:*'dev'*]*
- *pipe[*'inode'*]*
- *socket[*'inode'*]*
- *file[*'mnt_id'*:*'inode'*]*
- **tty[**__rdev__**:**__dev__**]**
- **pipe:[**__inode__**]**
- **socket:[**__inode__*]*
- **file[**__mnt_id__**:**__inode__**]**
- 'path/to/file'
+
@ -385,8 +527,10 @@ usually need to be escaped from shell.
*-r*, *--root* 'path'::
Change the root filesystem to 'path' (when run in a mount namespace).
This option is required to restore a mount namespace. The directory
'path' must be a mount point and its parent must not be overmounted.
*--external* 'type'*[*'id'*]:*'value'::
*--external* __type__**[**__id__**]:**__value__::
Restore an instance of an external resource. The generic syntax is
'type' of resource, followed by resource 'id' (enclosed in literal
square brackets), and optional 'value' (prepended by a literal colon).
@ -396,7 +540,7 @@ usually need to be escaped from shell.
the help of *--external* *file*, *tty*, and *unix* options), option
*--inherit-fd* should be used.
*--external mnt[*'name'*]:*'mountpoint'::
*--external* **mnt[**__name__**]:**__mountpoint__::
Restore an external bind mount referenced in the image by 'name',
bind-mounting it from the host 'mountpoint' to a proper mount point.
@ -404,26 +548,36 @@ usually need to be escaped from shell.
Restore all external bind mounts (dumped with the help of
*--external mnt[]* auto-detection).
*--external dev[*'name'*]:*'/dev/path'::
*--external* **dev[**__name__**]:**__/dev/path__::
Restore an external mount device, identified in the image by 'name',
using the existing block device '/dev/path'.
*--external veth[*'inner_dev'*]:*'outer_dev'*@*'bridge'::
*--external* **veth[**__inner_dev__**]:**__outer_dev__**@**__bridge__::
Set the outer VETH device name (corresponding to 'inner_dev' being
restored) to 'outer_dev'. If optional *@*'bridge' is specified,
restored) to 'outer_dev'. If optional **@**_bridge_ is specified,
'outer_dev' is added to that bridge. If the option is not used,
'outer_dev' will be autogenerated by the kernel.
*--external macvlan[*'inner_dev'*]:*'outer_dev'::
*--external* **macvlan[**__inner_dev__**]:**__outer_dev__::
When restoring an image that have a MacVLAN device in it, this option
must be used to specify to which 'outer_dev' (an existing network device
in CRIU namespace) the restored 'inner_dev' should be bound to.
*-J*, *--join-ns* **NS**:{**PID**|**NS_FILE**}[,**EXTRA_OPTS**]::
Restore process tree inside an existing namespace. The namespace can
be specified in 'PID' or 'NS_FILE' path format (example:
*--join-ns net:12345* or *--join-ns net:/foo/bar*). Currently supported
values for **NS** are: *ipc*, *net*, *time*, *user*, and *uts*.
This option doesn't support joining a PID namespace, however, this is
possible using *--external* and *--inheritfd*. 'EXTRA_OPTS' is optional
and can be used to specify UID and GID for user namespace (e.g.,
*--join-ns user:PID,UID,GID*).
*--manage-cgroups* ['mode']::
Restore cgroups configuration associated with a task from the image.
Controllers are always restored in an optimistic way -- if already present
in system, *criu* reuses it, otherwise it will be created.
+
The 'mode' may be one of the following:
*none*::: Do not restore cgroup properties but require cgroup to
@ -433,7 +587,7 @@ The 'mode' may be one of the following:
*soft*::: Restore cgroup properties if only cgroup has been created
by *criu*, otherwise do not restore properties. This is the
default if mode is unspecified.
default if mode is unspecified.
*full*::: Always restore all cgroups and their properties.
@ -442,6 +596,11 @@ The 'mode' may be one of the following:
*ignore*::: Don't deal with cgroups and pretend that they don't exist.
*--cgroup-yard* 'path'::
Instead of trying to mount cgroups in CRIU, provide a path to a directory
with already created cgroup yard. For more information look in the *dump*
section.
*--cgroup-root* ['controller'*:*]/'newroot'::
Change the root cgroup the controller will be installed into. No controller
means that root is the default for all controllers not specified.
@ -454,16 +613,38 @@ The 'mode' may be one of the following:
*--tcp-close*::
Restore connected TCP sockets in closed state.
*--veth-pair* 'IN'*=*'OUT'::
*--veth-pair* __IN__**=**__OUT__::
Correspondence between outside and inside names of veth devices.
*-l*, *--file-locks*::
Restore file locks from the image.
*--lsm-profile* 'type'*:*'name'::
Specify an LSM profile to be used during restore. The `type` can be
*--lsm-profile* __type__**:**__name__::
Specify an LSM profile to be used during restore. The _type_ can be
either *apparmor* or *selinux*.
*--lsm-mount-context* 'context'::
Specify a new mount context to be used during restore.
+
This option will only replace existing mount context information
with the one specified with this option. Mounts without the
'context=' option will not be changed.
+
If a mountpoint has been checkpointed with an option like
context="system_u:object_r:container_file_t:s0:c82,c137"
+
it is possible to change this option using
--lsm-mount-context "system_u:object_r:container_file_t:s0:c204,c495"
+
which will result that the mountpoint will be restored
with the new 'context='.
+
This option is useful if using *selinux* and if the *selinux*
labels need to be changed on restore like if a container is
restored into an existing Pod.
*--auto-dedup*::
As soon as a page is restored it get punched out from image.
@ -516,6 +697,29 @@ are not adequate, but this can be suppressed by using *--cpu-cap=none*.
restored process.
This option requires running *lazy-pages* daemon.
*--file-validation* ['mode']::
Set the method to be used to validate open files. Validation is done
to ensure that the version of the file being restored is the same
version when it was dumped.
+
The 'mode' may be one of the following:
*filesize*:::
To explicitly use only the file size check all the time.
This is the fastest and least intensive check.
*buildid*:::
To validate ELF files with their build-ID. If the
build-ID cannot be obtained, 'chksm-first' method will be
used. This is the default if mode is unspecified.
*--skip-file-rwx-check*::
Skip checking file permissions (r/w/x for u/g/o) on restore.
*--allow-uprobes*::
Required when dumped with this option. Refer to this option in the section
on dumping for more details.
*check*
~~~~~~~
Checks whether the kernel supports the features needed by *criu* to
@ -526,17 +730,17 @@ check* always checks Category 1 features unless *--feature* is specified
which only checks a specified feature.
*Category 1*::: Absolutely required. These are features like support for
*/proc/PID/map_files*, *NETLINK_SOCK_DIAG* socket
monitoring, */proc/sys/kernel/ns_last_pid* etc.
*/proc/PID/map_files*, *NETLINK_SOCK_DIAG* socket
monitoring, */proc/sys/kernel/ns_last_pid* etc.
*Category 2*::: Required only for specific cases. These are features
like AIO remap, */dev/net/tun* and others that are only
required if a process being dumped or restored
is using those.
like AIO remap, */dev/net/tun* and others that are only
required if a process being dumped or restored
is using those.
*Category 3*::: Experimental. These are features like *task-diag* that
are used for experimental purposes (mostly
during development).
are used for experimental purposes (mostly
during development).
If there are no errors or warnings, *criu* prints "Looks good." and its
exit code is 0.
@ -722,6 +926,42 @@ configuration file will overwrite all other configuration file settings
or RPC options. *This can lead to undesired behavior of criu and
should only be used carefully.*
NON-ROOT
--------
*criu* can be used as non-root with either the *CAP_SYS_ADMIN* capability
or with the *CAP_CHECKPOINT_RESTORE* capability introduces in Linux kernel 5.9.
*CAP_CHECKPOINT_RESTORE* is the minimum that is required.
*criu* also needs either *CAP_SYS_PTRACE* or a value of 0 in
*/proc/sys/kernel/yama/ptrace_scope* (see *ptrace*(2)) to be able to interrupt
the process for dumping.
Running *criu* as non-root has many limitations and depending on the process
to checkpoint and restore it may not be possible.
In addition to *CAP_CHECKPOINT_RESTORE* it is possible to give *criu* additional
capabilities to enable additional features in non-root mode.
Currently *criu* can benefit from the following additional capabilities:
- *CAP_NET_ADMIN*
- *CAP_SYS_CHROOT*
- *CAP_SETUID*
- *CAP_SYS_RESOURCE*
Note that for some operations, having a capability in a namespace other than
the init namespace (i.e. the default/root namespace) is not sufficient. For
example, in order to read symlinks in proc/[pid]/map_files CRIU requires
CAP_CHECKPOINT_RESTORE in the init namespace; having CAP_CHECKPOINT_RESTORE
while running in another user namespace (e.g. in a container) does not allow
CRIU to read symlinks in /proc/[pid]/map_files.
Without access to /proc/[pid]/map_files checkpointing/restoring processes
that have mapped deleted files may not be possible.
Independent of the capabilities it is always necessary to use "*--unprivileged*" to
accept *criu*'s limitation in non-root mode.
EXAMPLES
--------
To checkpoint a program with pid of *1234* and write all image files into

136
Documentation/logo.svg Normal file
View file

@ -0,0 +1,136 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 16.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
width="560px" height="560px" viewBox="0 0 560 560" enable-background="new 0 0 560 560" xml:space="preserve">
<path opacity="0.3" fill="#990000" d="M315.137,360.271c-18.771-7.159-41.548-8.85-68.479-8.85c-16.661,0-46.255,2.939-74.654,3.38
c11.209-4.884,20.734-10.265,24.842-16.87c14.531-23.346,17.645-65.893,17.645-65.893l-20.758,3.114c0,0-2.591,35.8-16.085,47.733
c-5.35,4.736-15.96,7.834-27.916,10.856c2.447-26.071,29.477-57.552,29.477-57.552l-14.874-3.966l-5.88-7.448
c0,0-3.011,1.761-7.588,5.315c-18.298,4.208-75.946,20.443-75.946,57.983c0,15.292,5.77,26.308,14.768,34.244
c-22.858,26.966-20.755,61.618-20.755,61.618s-8.945,16.61-8.021,31.254c2.083,32.973,34.931,25.097,44.313,26.374
c9.644,1.313,34.313-4.18,34.313-4.18s-16.276-2.639-15.329-18.562c0.5-8.369-0.947-27.628-21.404-37.307
c-1.13-10.066,2.111-18.309,6.379-28.015c18.452,45.263,92.601,53.97,92.601,53.97c0.393-0.097-10.269,20.047,0.221,35.632
c4.652,6.915,18.284,10.019,22.436,19.356c4.151,9.341,2.199,30.354,2.199,30.354s21.267-16.864,27.239-30.18
c3.334-7.432,25.989,0.926,25.989-34.047c0-14.077-12.26-26.841-13.675-29.815c-20.858-20.334-5.427-4.743,2.677-8.236
c12.758-5.499,35.412,11.657,35.412,11.657s-10.402-20.119-11.437-31.013c-0.795-8.335-4.537-16.816-16.624-30.042
c7.166-0.752,20.362,2.327,20.362,2.327s-5.202,11.251-0.879,25.515c3.588,11.84,7.193,7.193,14.736,14.737
c6.599,6.598,3.146,26.284,3.146,26.284s4.674-4.513,18.081-18.235c9.072-9.29,23.645-16.717,23.645-47.86
C355.312,365.969,334.97,360.979,315.137,360.271z M134.108,285.901c-11.5,13.048-23.667,32.329-28.23,58.293
c-4.821-3.519-7.613-8.1-7.613-14.043C98.265,309.699,117.078,295.016,134.108,285.901z"/>
<path fill="#990000" d="M382.184,115.435c3.654,1.208,7.327,2.37,10.968,3.444c14.16,4.183,26.745-9.798,26.745-9.798
s-8.785-2.243-17.857-3.497c12.173-2.653,21.085-18.66,21.085-18.66s-17.366,4.819-27.224,5.087
c-2.042,0.057-4.107,0.118-6.189,0.186c2.464-0.37,4.925-0.847,7.361-1.485c14.201-3.714,21.505-23.382,21.505-23.382
s-15.411,6.743-24.951,9.239c-2.694,0.703-5.438,1.437-8.197,2.185c3.038-1.071,6.008-2.306,8.815-3.82
c12.922-6.965,12.241-29.347,12.241-29.347s-10.162,11.926-18.844,16.605c-3.557,1.916-7.199,3.904-10.846,5.911
c3.798-2.277,7.45-4.743,10.596-7.569c10.918-9.814,7.722-29.605,7.722-29.605s-9.801,12.54-17.135,19.131
c-8.939,8.037-18.775,14.104-27.014,21.81c-6.427,6.011-25.14,35.236-36.812,46.283c-11.671,11.047-18.301,12.476-19.159,14.388
c-0.863,1.913,1.006,30.46-14.078,39.145c-16.476-21.583-50.565-44.007-53.101-72.033c-2.079-22.959,5.209-34.055,19.149-35.316
c14.994-1.359,15.998,24.507,15.998,24.507s-1.379,1.064-1.708,6.391c-0.097,0.629-0.145,1.272-0.083,1.934
c0.004,0.031,0.008,0.06,0.011,0.091c-0.014,1.674,0.065,3.664,0.278,6.039c1.131,12.474,4.53,14.574,4.53,14.574l2.075-0.722
c0,0-2.24-4.079-2.554-7.529c-0.172-1.917-0.187-3.556-0.079-4.977c0.45,0.067,0.949,0.081,1.506,0.031
c4.398-0.399,6.049-4.141,5.65-8.539c-0.042-0.45-0.069-0.885-0.094-1.316c2.485-26.032-1.756-29.637,4.788-41.391
c9.032-16.218,17.279-16.015,17.279-16.015l1.402-8.155c0,0-6.817,2.462-14.819,13.652c-8.833,12.354-8.983,26.229-9.066,47.958
c-0.188-0.761-0.502-1.37-1.017-1.784c-2.457-11.192-9.087-32.13-24.112-30.77c-16.72,1.514-29.419,14.974-26.773,44.171
c3.609,39.832,26.186,52.701,29.829,80.84c-13.47-2.349-23.883-10.656-30.866-20.282c-7.803-10.749-7.297-22.949-8.324-24.779
c-1.027-1.829-7.761-2.662-20.367-12.627c-12.605-9.965-33.845-37.41-40.78-42.824c-8.895-6.942-19.229-12.111-28.848-19.32
c-7.892-5.915-18.769-17.531-18.769-17.531s-1.419,19.995,10.323,28.8c3.386,2.536,7.246,4.665,11.229,6.597
c-3.808-1.674-7.616-3.33-11.327-4.925c-9.062-3.887-20.246-14.861-20.246-14.861s1.31,22.353,14.803,28.143
c2.931,1.257,6,2.223,9.12,3.019c-2.818-0.5-5.615-0.985-8.357-1.447c-9.728-1.636-25.677-6.981-25.677-6.981
s9.025,18.94,23.5,21.376c2.485,0.417,4.975,0.674,7.466,0.822c-2.08,0.118-4.148,0.242-6.183,0.368
c-9.843,0.61-27.566-2.645-27.566-2.645S85.667,120.333,110,120c-8.922,2.057-25.678,6.008-25.678,6.008s13.778,12.806,27.508,7.38
c3.533-1.394,7.087-2.876,10.62-4.404c-3.726,1.804-7.424,3.581-11.005,5.273c-8.963,4.243-19.428,10.176-19.428,10.176
s15.069,9.759,27.305,1.497c0.558-0.378,3.121-1.76,3.678-2.143c-7.904,5.808-19.754,14.937-19.754,14.937
s15.802,6.027,27.092-3.354c4.663-3.875,8.104-7.185,12.238-11.618c-3.773,4.55-6.699,8.018-10.634,12.106
c-6.839,7.104-13.06,19.791-13.06,19.791s15.597,0.39,24.359-11.388c4.488-6.035,7.482-11.633,10.974-18.191
c-3.113,6.479-5.468,11.95-8.911,17.788c-5.018,8.49-7.574,22.624-7.574,22.624s15.342-3.655,21.07-17.17
c2.231-5.266,2.107-9.783,3.694-15.291c-1.257,5.272-0.666,9.475-2.24,14.319c-3.045,9.379,0.011,25.554,0.011,25.554
s9.713-5.855,10.359-20.52c0.006-0.153,0.5-8.47,0.5-8.625L171,171.496c0,9.917,6.295,23.276,6.295,23.276
s11.459-10.649,9.369-25.266c-0.188-1.31-0.1-2.627-0.305-3.947c0.408,1.507,0.998,3.016,1.493,4.524
c3.075,9.429,3.5,15.957,3.5,15.957s6.483,1.251,8.73-1.594c0.764,5.625-0.843,10.2-0.843,10.2s5.471-1.1,8.893-3.756
c0.705,5.331,0.155,8.789,0.155,8.789s5.106-1.603,8.419-4.323c0.611,4.642,1.764,7.542,1.764,7.542s6.398-0.88,9.021-5.393
c0.199,0.038,0.395,0.079,0.59,0.117c2.269,4.875,1.438,8.517,1.438,8.517s7.492-2.14,9.492-6.14c0.003,0,0.007,0,0.01,0
c1.798,4,2.727,6.102,2.727,6.102s4.853-2.349,7.093-6.064c0.189,0.009,0.364-0.093,0.547-0.086
c-4.702,19.629-23.62,29.658-42.207,42.764c-1.392,0.981-2.712,1.925-3.97,2.884c-2.891,1.512-6.788,3.495-11.311,5.724
c-9.829,3.363-23.7,6.057-41.038,4.084c-9.798-1.115-21.037,10.02-21.037,10.02s6.87,4.843,16.565,5.028
c-8.819,3.621-17.438,12.632-17.438,12.632s0.045,0.019,0.069,0.029c-27.096,11.688-51.621,29.917-47.651,57.105
c2.375,16.27,14.692,25.475,31.704,30.254c-17.81,14.742-32.921,36.129-30.707,60.59c0.134,1.487,0.309,2.916,0.508,4.311
c-2.209,5.6-3.288,17.842-2.674,24.886c0.949,10.838,13.686,8.662,18.219,6.729c14.139,12.202,32.258,10.252,32.258,10.252
s-17.301,1.211-30.306-11.156c5.551-2.659,6.424-3.925,6.788-11.579c0.36-7.61-9.104-20.759-20.57-21.966
c-1.25-20.07,9.861-43.32,30.603-60.203c0.02,0.249,0.023,0.491,0.048,0.742c4.248,46.957,30.584,54.634,81.148,63.26
c12.603,2.15,22.04,5.821,29.042,10.457c-3.844,5.388-5.706,21.559-2.895,32.325c3.045,11.655,12.647,14.53,19.429,14.955
c-3.304,16.035-11.235,29.024-11.235,29.024s10.015-11.628,15.04-29.016c0.48-0.031,0.928-0.069,1.319-0.114
c10.922-1.262,16.17-11.338,14.743-23.071c-1.195-9.826-13.974-24.54-28.598-25.992c-33.117-21.52-109.104-9.05-113.877-61.769
c-0.341-3.746-0.517-7.367-0.571-10.888c5.709,1.111,11.782,1.844,18.104,2.244c14.111,28.517,62.158,22.269,95.818,20.694
c1.764,3.09,7.043,7.064,13.929,9.779c11.751,4.633,14.889,3.742,18.869,1.502c1.484-0.835,2.828-1.92,3.979-3.155
c10.822,10.456,25.37,30.251,25.37,30.251s-12.29-22.284-22.733-33.97c2.601-4.923,2.433-10.619-2.559-13.297
c-6.956-3.732-31.321,1.581-36.316,4.981c-30.811,1.668-71.853,6.551-89.576-16.474c41.005,1.192,88.786-9.133,102.385-10.365
c21.726-1.966,47.319,1.367,64.887,8.228c-0.783,5.681,1.867,18.47,4.641,25.318c3.316,8.197,11.561,5.887,16.562,3.028
c-0.588,13.3-4.495,22.638-4.495,22.638s7.86-14.125,9.117-26.183c4.354-4.041,4.774-5.562,2.904-12.887
c-1.849-7.24-14.317-16.821-25.47-15.096c-21.855-8.906-54.594-11.087-75.74-9.175c-18.253,1.653-61.404,10.802-97.611,10.237
c-1.895-3.338-3.402-7.122-4.412-11.479c5.113-2.364,10.551-4.388,16.307-5.975c30.999-8.551,40.97-29.258,42.943-48.579
c1.127,1.303,1.938,2.069,1.938,2.069s7.087-12.679,5.522-27.275c-0.264-2.469-0.429-4.737-0.553-6.911
c2.499,6.741,7.778,13.001,7.778,13.001s16.438-20.208,5.846-27.268c-11.583-7.714-6.836-13.283-4.31-15.299
c3.354-1.984,6.973-3.94,10.859-5.817c26.561-12.817,59.903-20.002,64.443-40.039c0.265-1.172,0.388-2.34,0.443-3.507
c3.701,2.396,9.165,2.053,9.165,2.053s-0.367-2.88-0.601-7.556c3.747,2.081,8.874,1.758,8.874,1.758s-0.986-2.319-1.255-7.689
c3.846,1.998,8.434,2.278,8.434,2.278s-0.725-2.246-1.24-5.573c3.788,0.719,8.84,0.419,8.84,0.419s-3.543-7.302-1.316-16.965
c0.357-1.547,0.666-3.09,0.938-4.626c-0.087,1.332-0.169,2.662-0.238,3.985c-0.783,14.742,10.85,24.47,10.85,24.47
S337,172.178,337,162.303c0-0.021,0-0.042,0-0.061c0,0.153-0.804,0.309-0.782,0.46c1.951,14.548,13.499,20.839,13.499,20.839
s2.388-16.471-1.478-25.542c-1.998-4.686-3.966-9.742-5.688-14.881c2.068,5.344,4.374,10.673,7.067,15.72
c6.909,12.952,20.498,15.406,20.498,15.406s-1.832-14.029-7.581-22.041c-3.952-5.505-7.874-11.654-11.551-17.83
c4.059,6.22,8.622,12.438,13.631,18.048c9.774,10.953,25.27,9.178,25.27,9.178s-7.323-12.085-14.767-18.552
c-4.283-3.722-8.589-7.824-12.754-12.019c4.513,4.047,9.319,7.944,14.31,11.39c12.077,8.341,27.281,0.931,27.281,0.931
s-10.533-7.219-18.926-12.302c0.595,0.332,1.186,0.662,1.777,0.988c12.922,7.14,28.146-3.013,28.146-3.013
s-12.036-5.887-21.343-9.313C389.896,118.341,386.055,116.903,382.184,115.435z M116.917,367.418
c-0.172,0.131-0.344,0.268-0.516,0.398c-17.301-3.899-29.646-12.415-31.124-28.752c-2.244-24.777,21.669-42.631,47.562-54.59
c3.553,1,9.203,1.919,15.541,0.503c-4.694,4.817-7.998,9.859-7.998,9.859s2.076,0.564,5.3,0.733
C133.582,308.673,115.917,333.715,116.917,367.418z M146.295,295.598c1.834,0.062,3.979-0.014,6.326-0.386
c-0.141,0.365-0.274,0.72-0.401,1.069c-10.511,14.57-18.745,34.363-17.404,59.912c-4.522,2.267-9.248,5.074-13.939,8.343
C122.237,330.3,136.218,307.613,146.295,295.598z M121.776,368.86c4.131-2.979,8.589-5.697,13.361-8.115
c0.358,3.527,1.032,6.741,2.025,9.634C131.805,370.131,126.629,369.657,121.776,368.86z M150.478,350.278
c-3.791,0.864-8.16,2.403-12.812,4.546c-0.062-0.425-0.168-0.803-0.224-1.236c-2.557-19.875,3.873-37.276,13.005-51.347
c0,0.005-0.007,0.032-0.007,0.032s13.533-3.395,23.088-14.017c-1.715,7.205,0.158,14.79,0.158,14.79s9.774-5.185,16.654-15.216
c-0.131,5.548,2.84,10.803,5.451,14.331C193.303,321.731,182.711,342.934,150.478,350.278z M259.516,275.357
c0.846-4.127,1.649-8.135,2.42-12.012c2.199-4.002,5.203-6.524,9.011-7.55c3.808-1.04,7.78-1.559,11.919-1.559l1.739-17.042
c-5.942,0.378-11.657,1.419-17.144,3.105c-5.492,1.672-10.946,3.611-16.369,5.8c-4.526,4.131-7.915,8.875-10.169,14.237
c-2.262,5.359-3.755,11.051-4.655,17.055c-0.906,6.007-1.268,12.17-1.268,18.489v18.209c0,3.23,0.201,6.368,0.779,9.393
c0.584,3.045,1.728,5.66,3.543,7.85c3.614,2.588,7.203,3.85,10.822,3.771c3.619-0.066,7.224-0.712,10.842-1.925
c3.611-1.23,7.162-2.757,10.647-4.558c3.484-1.811,6.904-3.293,10.266-4.457l7.159-14.521c-2.066,0.505-4.2,1.23-6.394,2.127
c-2.199,0.9-4.453,1.643-6.777,2.224c-2.322,0.585-4.649,0.773-6.977,0.585c-2.322-0.189-4.649-1.2-6.976-2.994
c-2.063-3.626-3.355-7.475-3.87-11.541c-0.519-4.065-0.612-8.165-0.289-12.296C258.1,283.619,258.674,279.488,259.516,275.357z
M367.6,320.582c-0.196-3.025-1.001-5.908-2.42-8.623c-1.031-3.608-2.649-6.588-4.846-8.905c-2.193-2.333-4.682-4.162-7.458-5.516
c-2.773-1.358-5.712-2.364-8.812-3.014c-3.098-0.643-6.004-1.056-8.717-1.259c-2.711-0.188-5.101-0.285-7.166-0.285
s-3.419-0.062-4.064-0.189c0.25-1.037,0.449-2.302,0.574-3.783c0.133-1.481,0.322-2.866,0.584-4.162
c0.258-1.419,0.512-2.977,0.773-4.65c6.326,0,12.073-0.581,17.242-1.749c5.165-1.148,9.688-3.059,13.558-5.705
c3.876-2.646,7.135-6.131,9.781-10.469c2.649-4.318,4.558-9.583,5.715-15.776c-5.684,0-11.596,0.029-17.727,0.093
s-12.328,0.158-18.593,0.284c-6.266,0.143-12.431,0.332-18.5,0.583c-6.066,0.27-11.812,0.584-17.236,0.979
c0.128,0,0.221,1.387,0.293,4.161c0.062,2.775,0.062,6.465,0,11.035c-0.072,4.588-0.2,9.788-0.386,15.589
c-0.199,5.819-0.49,11.73-0.875,17.734c-0.386,6.007-0.878,11.901-1.451,17.72c-0.584,5.815-1.262,10.908-2.035,15.304
c5.552-0.268,11.432-0.488,17.624-0.677c2.162-0.065,4.33-0.127,6.503-0.176l1.247-5.547c0.385-2.192,0.708-4.776,0.969-7.739
c0.259-2.979,0.513-5.754,0.773-8.338c0.259-3.093,0.386-6.196,0.386-9.286c0.646-0.127,1.677-0.206,3.103-0.206
c1.547,0,3.225,0.269,5.039,0.773c1.804,0.519,3.68,1.292,5.612,2.334c1.938,1.041,3.615,2.522,5.034,4.46
c1.42,1.925,2.45,4.352,3.104,7.252c0.638,2.914,0.638,6.495,0,10.75l0.631,5.39c1.609,0.033,3.207,0.079,4.796,0.144
c6.068,0.189,11.812,0.471,17.234,0.866C367.891,326.747,367.795,323.609,367.6,320.582z M327.506,263.345
c0.707-4.397,1.323-8.133,1.835-11.238c1.168-0.521,2.522-0.835,4.069-0.962c1.549-0.125,3.103-0.205,4.65-0.205
c1.677,0,3.291,0.031,4.845,0.112c1.547,0.062,2.901,0.093,4.069,0.093c0,1.151-0.041,2.586-0.103,4.256
c-0.066,1.688-0.189,3.42-0.389,5.232c-0.189,1.815-0.512,3.578-0.97,5.331c-0.446,1.732-1.127,3.182-2.034,4.347
c-0.896,0.918-2.128,1.657-3.681,2.224c-1.543,0.584-3.159,1.042-4.84,1.357c-1.677,0.33-3.291,0.55-4.838,0.677
c-1.555,0.141-2.78,0.207-3.682,0.207C326.439,271.542,326.798,267.727,327.506,263.345z M393.035,246.385
c-2.517,0.33-4.84,0.584-6.97,0.773c-2.135,0.205-3.781,0.172-4.939-0.096l3.678,2.711c0.899,5.423,1.356,11.051,1.356,16.851
c0,5.818-0.195,11.695-0.584,17.642c-0.385,5.941-0.872,11.805-1.45,17.624c-0.581,5.801-1,11.427-1.261,16.85
c-0.907,4.522-1.519,9.238-1.835,14.139c-0.331,4.901-0.843,9.713-1.554,14.425c-0.708,4.712-1.812,9.3-3.297,13.761
c-1.48,4.443-3.773,8.481-6.869,12.107l-2.908,1.543c0.513,0.52,1.323,0.993,2.42,1.45c1.093,0.457,1.842,0.678,2.23,0.678
c2.708-3.23,4.712-6.558,6.004-9.978c1.286-3.419,2.64-6.746,4.069-9.963c1.544-2.711,2.969-5.626,4.261-8.716
c1.286-3.107,2.774-6.008,4.455-8.719c1.671-2.708,3.681-5.045,6.008-6.984c2.322-1.938,5.285-3.15,8.903-3.67
c0.386-6.319,0.836-13.114,1.354-20.335c0.517-7.235,1.001-14.534,1.451-21.896c0.457-7.361,0.846-14.596,1.168-21.689
c0.323-7.111,0.482-13.684,0.482-19.769c-2.713,0-5.458,0.143-8.229,0.394C398.196,245.785,395.553,246.07,393.035,246.385z
M483.002,245c0,4-0.061,5.618-0.188,7.038c-0.135,1.419-0.323,3.525-0.581,5.259c-0.261,1.751-0.584,4.166-0.972,6.752
c-0.386,2.584-0.843,6.388-1.354,11.165c-0.519,4.791-1.135,11.551-1.839,19.167c-0.715,7.612-1.519,18.619-2.427,29.619h-32.15
c0-15,1.065-26.686,3.192-39.535c2.138-12.847,4.101-25.911,5.911-38.695c-5.034,0.52-9.85,1.042-14.427,1.812
c-4.589,0.773-9.136,0.898-13.662,0.52c-0.513,13.682-1.543,27.507-3.097,41.521c-1.553,13.998-3.23,27.586-5.038,40.749
c4.52,0,9.396-0.166,14.631-0.496c5.224-0.316,10.292-0.479,15.2-0.479c0.649,1.152,1.285,2.776,1.942,4.838
c0.638,2.065,1.22,4.318,1.738,6.779c0.517,2.457,0.997,5.027,1.454,7.753c0.447,2.715,0.873,5.424,1.258,8.135
c0.9,6.32,1.681,13.102,2.327,20.336c2.192-6.196,4.454-12.28,6.777-18.209c1.938-5.045,4.004-10.262,6.196-15.699
c2.199-5.423,4.327-10.073,6.393-13.936c2.323,0.254,4.649,0.316,6.974,0.188c2.326-0.124,4.681-0.25,7.071-0.392
c2.386-0.127,4.775-0.127,7.163,0c2.389,0.142,4.681,0.52,6.88,1.165c-0.257-6.716-0.164-13.619,0.293-20.728
c0.449-7.093,1.096-14.204,1.932-21.297c0.841-7.111,1.707-15.14,2.615-22.062c0.907-6.901,1.742-13.27,2.522-21.27H483.002z"/>
</svg>

After

Width:  |  Height:  |  Size: 15 KiB

136
GEMINI.md Normal file
View file

@ -0,0 +1,136 @@
# CRIU (Checkpoint/Restore In User-space)
CRIU is a tool for saving the state of a running application to a set of files
(checkpointing) and restoring it back to a live state. It is primarily used for
live migration of containers, in-place updates, and fast application startup.
It is implemented as a command-line tool called `criu`. The two primary commands
are `dump` and `restore`.
- `dump`: Saves a process tree and all its related resources (file
descriptors, IPC, sockets, namespaces, etc.) into a collection of image
files.
- `restore`: Restores processes from image files to the same state they were
in before the dump.
## Quick Start
To get a feel for `criu`, you can try checkpointing and restoring a simple
process.
1. **Run a simple process:**
Open a terminal and run a command that will run for a while. Find its PID.
```bash
sleep 1000 &
[1] 12345
```
2. **Dump the process:**
As root, use `criu dump` with the process ID (`-t`) and a directory for the
image files (`-D`).
```bash
sudo criu dump -t 12345 -D /tmp/sleep_images -v4 --shell-job
```
The `sleep` process will no longer be running.
3. **Restore the process:**
Use `criu restore` to bring the process back to life from the images.
```bash
sudo criu restore -D /tmp/sleep_images -v4 --shell-job
```
The `sleep` process will be running again as if nothing happened.
# For Developers and Contributors
This section contains more technical details about CRIU's internals and
development process.
## Dump Process
On dump, CRIU uses available kernel interfaces to collect information about
processes. For properties that can only be retrieved from within the process
itself, CRIU injects a binary blob (called a "parasite") into the process's
address space and executes it in the context of one of the process's threads.
This injection is handled by a subproject called **Compel**.
## Restore Process
On restore, CRIU reads the image files to reconstruct the processes. The goal is
to restore them to the exact state they were in before the dump. The restore
process is divided into several stages (defined as `CR_STATE_*` in
`./criu/include/restorer.h`).
The main `criu` process acts as a coordinator. It first restores resources with
inter-process dependencies (file descriptors, sockets, shared memory,
namespaces, etc.). It then forks the process tree and sets up namespaces.
Finally, it restores process-specific resources like file descriptors and memory
mappings.
A key step involves a small, self-contained binary called the "restorer". All
restored processes switch to executing this code, which unmaps the CRIU-specific
memory and restores the application's original memory mappings. On the final
step, the restorer calls `sigreturn` on a prepared signal frame to resume the
process with the state it had at the moment of the dump.
## Compel
Compel is a subproject responsible for generating the binary blobs used for the
parasite code (for dumping) and the restorer code (for restoring). It provides a
library for injecting and executing this code within the target process's
address space. It is a separate project because the logic for generating and
injecting Position-Independent Executable (PIE) code is complex and
self-contained.
## Coding Style
The C code in the CRIU project follows the
[Linux Kernel Coding Style](https://www.kernel.org/doc/html/latest/process/coding-style.html).
Here are some of the main points:
- **Indentation**: Use tabs, which are set to 8 characters.
- **Line Length**: The preferred line limit is 80 characters, but it can be
extended to 120 if it improves code readability.
- **Braces**:
- The opening brace for a function goes on a new line.
- The opening brace for a block (like `if`, `for`, `while`, `switch`) goes
on the same line.
- **Spaces**: Use spaces around operators (`+`, `-`, `*`, `/`, `%`, `<`, `>`,
`=`, etc.).
- **Naming**: Use descriptive names for functions and variables.
- **Comments**: Use C-style comments (`/* ... */`). For multi-line comments,
the preferred format is:
```c
/*
* This is a multi-line
* comment.
*/
```
## Code Layout
The code is organized into the following directories:
- `./compel`: The Compel sub-project.
- `./criu`: The main `criu` tool source code.
- `./images`: Protobuf descriptions for the image files.
- `./test`: All tests.
- `./test/zdtm`: The Zero-Downtime Migration (ZDTM) test suite.
- `./test/zdtm.py`: The executor script for ZDTM tests.
- `./scripts`: Helper scripts.
- `./scripts/build`: Docker image files used for CI and cross-compilation
checks.
- `./crit`: A tool to inspect and manipulate CRIU image files.
- `./soccr`: A library for TCP socket checkpoint/restore.
## Tests
The main test suite is ZDTM. Here is an example of how to run a single test:
```bash
sudo ./test/zdtm.py run -t zdtm/static/env00
```
Each ZDTM test has three stages: preparation, C/R, and results checks. During
the test, a process calls `test_daemon()` to signal it is ready for C/R, then
calls `test_waitsig()` to wait for the C/R stage to complete. After being
restored, the test checks that all its resources are still in a valid state.

View file

@ -1,11 +1,31 @@
## Building CRIU from source code
First, you need to install compile-time dependencies. Check [Installation dependencies](https://criu.org/Installation#Dependencies) for more info.
To compile CRIU, run:
```
make
```
This should create the `./criu/criu` executable.
To change the default behaviour of CRIU, the following variables can be passed
to the make command:
* **NETWORK_LOCK_DEFAULT**, can be set to one of the following
values: `NETWORK_LOCK_IPTABLES`, `NETWORK_LOCK_NFTABLES`,
`NETWORK_LOCK_SKIP`. CRIU defaults to `NETWORK_LOCK_IPTABLES`
if nothing is specified. If another network locking backend is
needed, `make` can be called like this:
`make NETWORK_LOCK_DEFAULT=NETWORK_LOCK_NFTABLES`
## Installing CRIU from source code
Once CRIU is built one can easily setup the complete CRIU package
(which includes executable itself, CRIT tool, libraries, manual
and etc) simply typing
make install
```
make install
```
this command accepts the following variables:
* **DESTDIR**, to specify global root where all components will be placed under (empty by default);
@ -16,17 +36,17 @@ this command accepts the following variables:
* **LIBDIR**, to specify directory where to put libraries (guess the correct path by default).
Thus one can type
make DESTDIR=/some/new/place install
```
make DESTDIR=/some/new/place install
```
and get everything installed under `/some/new/place`.
## Uninstalling CRIU
To clean up previously installed CRIU instance one can type
make uninstall
```
make uninstall
```
and everything should be removed. Note though that if some variable (**DESTDIR**, **BINDIR**
and such) has been used during installation procedure, the same *must* be passed with
uninstall action.

8
MAINTAINERS Normal file
View file

@ -0,0 +1,8 @@
Pavel Emelyanov <ovzxemul@gmail.com> (chief)
Andrey Vagin <avagin@gmail.com>
Mike Rapoport <rppt@kernel.org>
Dmitry Safonov <0x7f454c46@gmail.com>
Adrian Reber <areber@redhat.com>
Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Radostin Stoyanov <rstoyanov@fedoraproject.org>
Alexander Mikhalitsyn <alexander@mihalicyn.com>

136
MAINTAINERS_GUIDE.md Normal file
View file

@ -0,0 +1,136 @@
## Introduction
Dear maintainer. Thank you for investing the time and energy to help
make CRIU as useful as possible. Maintaining a project is difficult,
sometimes unrewarding work. Sure, you will contribute cool features
to the project, but most of your time will be spent reviewing patches,
cleaning things up, documenting, answering questions, justifying design
decisions - while everyone else will just have fun! But remember -- the
quality of the maintainers work is what distinguishes the good projects
from the great. So please be proud of your work, even the unglamorous
parts, and encourage a culture of appreciation and respect for *every*
aspect of improving the project -- not just the hot new features.
Being a maintainer is a time consuming commitment and should not be
taken lightly. This document is a manual for maintainers old and new.
It explains what is expected of maintainers, how they should work, and
what tools are available to them.
This is a living document - if you see something out of date or missing,
speak up!
## What are a maintainer's responsibility?
Part of a healthy project is to have active maintainers to support the
community in contributions and perform tasks to keep the project running.
It is every maintainer's responsibility to:
* Keep the community a friendly place
* Deliver prompt feedback and decisions on pull requests and mailing
list threads
* Encourage other members to help each other, especially in cases the
maintainer is overloaded or feels the lack of needed expertise
* Make sure the changes made respects the philosophy, design and
roadmap of the project
## How are decisions made?
CRIU is an open-source project with an open design philosophy. This
means that the repository is the source of truth for EVERY aspect of the
project. *If it's part of the project, it's in the repo. It's in the
repo, it's part of the project.*
All decisions affecting CRIU, big and small, follow the same 3 steps:
* Submit a change. Anyone can do this
* Discuss it. Anyone can and is encouraged to do this
* Accept or decline it. Only maintainers do this
*I'm a maintainer, should I make pull requests / send patches too?*
Yes. Nobody should ever push to the repository directly. All changes
should be made through submitting (and accepting) the change.
### Two-steps decision making ###
Since CRIU is extremely complex piece of software we try double hard
not to make mistakes, that would be hard to fix in the future. In order
to facilitate this, the "final" decision is made in two stages:
* We definitely want to try something out
* We think that the attempt was successful
Respectively, new features get accepted first into the *criu-dev* branch and
after they have been validated they are merged into the *master* branch. Yet,
urgent bug fixes may land directly in the master branch. If a change in
the criu-dev branch is considered to be bad (whatever it means), then it
can be reverted without propagation to the master branch. Reverting from
the master branch is expected not to happen at all, but if such an
extraordinary case occurs, the impact of this step, especially the question
of backward compatibility, should be considered in the most careful manner.
## Who decides what?
All decisions can be expressed as changes to the repository (either in the
form of pull requests, or patches sent to the mailing list), and maintainers
make decisions by merging or rejecting them. Review and approval or
disagreement can be done by anyone and is denoted by adding a respective
comment in the pull request. However, merging the change into either branch
only happens after approvals from maintainers.
In order for a patch to be merged into the criu-dev branch at least two
maintainers should accept it. In order for a patch to be merged into the
master branch the majority of maintainers should decide that (then prepare
a pull request, submit it, etc.).
Overall the maintainer system works because of mutual respect across the
maintainers of the project. The maintainers trust one another to make
decisions in the best interests of the project. Sometimes maintainers
can disagree and this is part of a healthy project to represent the point
of views of various people. In the case where maintainers cannot find
agreement on a specific change the role of a Chief Maintainer comes into
play.
### Chief maintainer
The chief maintainer for the project is responsible for overall architecture
of the project to maintain conceptual integrity. Large decisions and
architecture changes should be reviewed by the chief maintainer.
Also the chief maintainer has the veto power on any change submitted
to any branch. Naturally, a change in the criu-dev branch can be reverted
after a chief maintainer veto, a change in the master branch must be
carefully reviewed by the chief maintainer and vetoed in advance.
### How are maintainers added (and removed)?
The best maintainers have a vested interest in the project. Maintainers
are first and foremost contributors that have shown they are committed to
the long term success of the project. Contributors wanting to become
maintainers are expected to be deeply involved in contributing code,
patches review, and paying needed attention to the issues in the project.
Just contributing does not make you a maintainer, it is about building trust
with the current maintainers of the project and being a person that they can
rely on and trust to make decisions in the best interest of the project.
When a contributor wants to become a maintainer or nominate someone as a
maintainer, one can submit a "nomination", which technically is the
respective modification to the `MAINTAINERS` file. When a maintainer feels
they is unable to perform the required duties, or someone else wants to draw
the community attention to this fact, one can submit a "(self-)removing"
change.
The final vote to add or to remove a maintainer is to be approved by the
majority of current maintainers (with the chief maintainer having veto power
on that too).
One might have noticed, that the chief maintainer (re-)assignment is not
regulated by this document. That's true :) However, this can be done. If
the community decides that the chief maintainer needs to be changed the
respective "decision making rules" are to be prepared, submitted and
accepted into this file first.
Good luck!

193
Makefile
View file

@ -17,34 +17,41 @@ ifeq ($(origin HOSTCFLAGS), undefined)
HOSTCFLAGS := $(CFLAGS) $(USERCFLAGS)
endif
UNAME-M := $(shell uname -m)
#
# Supported Architectures
ifneq ($(filter-out x86 arm aarch64 ppc64 s390,$(ARCH)),)
ifneq ($(filter-out x86 arm aarch64 ppc64 s390 mips loongarch64 riscv64,$(ARCH)),)
$(error "The architecture $(ARCH) isn't supported")
endif
# The PowerPC 64 bits architecture could be big or little endian.
# They are handled in the same way.
ifeq ($(UNAME-M),ppc64)
ifeq ($(SUBARCH),ppc64)
error := $(error ppc64 big endian is not yet supported)
endif
#
# Architecture specific options.
ifeq ($(ARCH),arm)
ARMV := $(shell echo $(UNAME-M) | sed -nr 's/armv([[:digit:]]).*/\1/p; t; i7')
DEFINES := -DCONFIG_ARMV$(ARMV) -DCONFIG_VDSO_32
ARMV := $(shell echo $(SUBARCH) | sed -nr 's/armv([[:digit:]]).*/\1/p; t; i7')
ifeq ($(ARMV),6)
USERCFLAGS += -march=armv6
ARCHCFLAGS += -march=armv6
endif
ifeq ($(ARMV),7)
USERCFLAGS += -march=armv7-a
ARCHCFLAGS += -march=armv7-a+fp
endif
ifeq ($(ARMV),8)
# Running 'setarch linux32 uname -m' returns armv8l on aarch64.
# This tells CRIU to handle armv8l just as armv7hf. Right now this is
# only used for compile testing. No further verification of armv8l exists.
ARCHCFLAGS += -march=armv7-a
ARMV := 7
endif
DEFINES := -DCONFIG_ARMV$(ARMV) -DCONFIG_VDSO_32
PROTOUFIX := y
# For simplicity - compile code in Arm mode without interwork.
# We could choose Thumb mode as default instead - but a dirty
@ -57,6 +64,8 @@ endif
ifeq ($(ARCH),aarch64)
DEFINES := -DCONFIG_AARCH64
CC_MBRANCH_PROT := $(shell $(CC) -c -x c /dev/null -mbranch-protection=none -o /dev/null >/dev/null 2>&1 && echo "-mbranch-protection=none")
CFLAGS_PIE := $(CC_MBRANCH_PROT)
endif
ifeq ($(ARCH),ppc64)
@ -69,6 +78,18 @@ ifeq ($(ARCH),x86)
DEFINES := -DCONFIG_X86_64
endif
ifeq ($(ARCH),mips)
DEFINES := -DCONFIG_MIPS
endif
ifeq ($(ARCH),loongarch64)
DEFINES := -DCONFIG_LOONGARCH64
endif
ifeq ($(ARCH),riscv64)
DEFINES := -DCONFIG_RISCV64
endif
#
# CFLAGS_PIE:
#
@ -77,7 +98,6 @@ endif
# commit "S/390: Fix 64 bit sibcall".
ifeq ($(ARCH),s390)
ARCH := s390
SRCARCH := s390
DEFINES := -DCONFIG_S390
CFLAGS_PIE := -fno-optimize-sibling-calls
endif
@ -85,25 +105,47 @@ endif
CFLAGS_PIE += -DCR_NOGLIBC
export CFLAGS_PIE
LDARCH ?= $(SRCARCH)
LDARCH ?= $(ARCH)
export LDARCH
export PROTOUFIX DEFINES
#
# Independent options for all tools.
DEFINES += -D_FILE_OFFSET_BITS=64
DEFINES += -D_LARGEFILE64_SOURCE
DEFINES += -D_GNU_SOURCE
WARNINGS := -Wall -Wformat-security
WARNINGS := -Wall -Wformat-security -Wdeclaration-after-statement -Wstrict-prototypes
# -Wdangling-pointer results in false warning when we add a list element to
# local list head variable. It is false positive because before leaving the
# function we always check that local list head variable is empty, thus
# insuring that pointer to it is not dangling anywhere, but gcc can't
# understand it.
# Note: There is similar problem with kernel list, where this warning is also
# disabled: https://github.com/torvalds/linux/commit/49beadbd47c2
WARNINGS += -Wno-dangling-pointer -Wno-unknown-warning-option
CFLAGS-GCOV := --coverage -fno-exceptions -fno-inline -fprofile-update=atomic
export CFLAGS-GCOV
ifeq ($(ARCH),mips)
WARNINGS := -rdynamic
endif
ifeq ($(ARCH),loongarch64)
WARNINGS += -Wno-implicit-function-declaration
endif
ifneq ($(GCOV),)
LDFLAGS += -lgcov
CFLAGS += $(CFLAGS-GCOV)
endif
ifneq ($(NETWORK_LOCK_DEFAULT),)
CFLAGS += -DNETWORK_LOCK_DEFAULT=$(NETWORK_LOCK_DEFAULT)
endif
ifeq ($(ASAN),1)
CFLAGS-ASAN := -fsanitize=address
export CFLAGS-ASAN
@ -128,12 +170,12 @@ export GMON GMONLDOPT
endif
AFLAGS += -D__ASSEMBLY__
CFLAGS += $(USERCFLAGS) $(WARNINGS) $(DEFINES) -iquote include/
CFLAGS += $(USERCFLAGS) $(ARCHCFLAGS) $(WARNINGS) $(DEFINES) -iquote include/
HOSTCFLAGS += $(WARNINGS) $(DEFINES) -iquote include/
export AFLAGS CFLAGS USERCLFAGS HOSTCFLAGS
# Default target
all: criu lib crit
all: criu lib crit cuda_plugin
.PHONY: all
#
@ -188,12 +230,13 @@ criu-deps += include/common/asm
#
# Configure variables.
export CONFIG_HEADER := include/common/config.h
ifeq ($(filter tags etags cscope clean mrproper,$(MAKECMDGOALS)),)
ifeq ($(filter tags etags cscope clean lint indent fetch-clang-format help mrproper,$(MAKECMDGOALS)),)
include Makefile.config
else
# To clean all files, enable make/build options here
export CONFIG_COMPAT := y
export CONFIG_GNUTLS := y
export CONFIG_HAS_LIBBPF := y
endif
#
@ -235,22 +278,19 @@ criu: $(criu-deps)
$(Q) $(MAKE) $(build)=criu all
.PHONY: criu
crit/Makefile: ;
crit/%: criu .FORCE
$(Q) $(MAKE) $(build)=crit $@
crit: criu
$(Q) $(MAKE) $(build)=crit all
.PHONY: crit
unittest: $(criu-deps)
$(Q) $(MAKE) $(build)=criu unittest
.PHONY: unittest
#
# Libraries next once crit it ready
# Libraries next once criu is ready
# (we might generate headers and such
# when building criu itself).
lib/Makefile: ;
lib/%: crit .FORCE
lib/%: criu .FORCE
$(Q) $(MAKE) $(build)=lib $@
lib: crit
lib: criu
$(Q) $(MAKE) $(build)=lib all
.PHONY: lib
@ -259,21 +299,28 @@ clean mrproper:
$(Q) $(MAKE) $(build)=criu $@
$(Q) $(MAKE) $(build)=soccr $@
$(Q) $(MAKE) $(build)=lib $@
$(Q) $(MAKE) $(build)=crit $@
$(Q) $(MAKE) $(build)=compel $@
$(Q) $(MAKE) $(build)=compel/plugins $@
$(Q) $(MAKE) $(build)=lib $@
$(Q) $(MAKE) $(build)=crit $@
.PHONY: clean mrproper
clean-amdgpu_plugin:
$(Q) $(MAKE) -C plugins/amdgpu clean
.PHONY: clean-amdgpu_plugin
clean-cuda_plugin:
$(Q) $(MAKE) -C plugins/cuda clean
.PHONY: clean-cuda_plugin
clean-top:
$(Q) $(MAKE) -C Documentation clean
$(Q) $(MAKE) $(build)=test/compel clean
$(Q) $(RM) .gitid
.PHONY: clean-top
clean: clean-top
clean: clean-top clean-amdgpu_plugin clean-cuda_plugin
mrproper-top: clean-top
mrproper-top: clean-top clean-amdgpu_plugin clean-cuda_plugin
$(Q) $(RM) $(CONFIG_HEADER)
$(Q) $(RM) $(VERSION_HEADER)
$(Q) $(RM) $(COMPEL_VERSION_HEADER)
@ -301,6 +348,18 @@ test: zdtm
$(Q) $(MAKE) -C test
.PHONY: test
amdgpu_plugin: criu
$(Q) $(MAKE) -C plugins/amdgpu all
.PHONY: amdgpu_plugin
cuda_plugin: criu
$(Q) $(MAKE) -C plugins/cuda all
.PHONY: cuda_plugin
crit: lib
$(Q) $(MAKE) -C crit
.PHONY: crit
#
# Generating tar requires tag matched CRIU_VERSION.
# If not found then simply use GIT's describe with
@ -354,17 +413,19 @@ gcov:
.PHONY: gcov
docker-build:
$(MAKE) -C scripts/build/ x86_64
$(MAKE) -C scripts/build/ x86_64
.PHONY: docker-build
docker-test:
docker run --rm -it --privileged criu-x86_64 ./test/zdtm.py run -a -x tcp6 -x tcpbuf6 -x static/rtc -x cgroup
docker run --rm --privileged -v /lib/modules:/lib/modules --network=host --cgroupns=host criu-x86_64 \
./test/zdtm.py run -a --keep-going --ignore-taint
.PHONY: docker-test
help:
@echo ' Targets:'
@echo ' all - Build all [*] targets'
@echo ' * criu - Build criu'
@echo ' * crit - Build crit'
@echo ' zdtm - Build zdtm test-suite'
@echo ' docs - Build documentation'
@echo ' install - Install CRIU (see INSTALL.md)'
@ -377,14 +438,76 @@ help:
@echo ' cscope - Generate cscope database'
@echo ' test - Run zdtm test-suite'
@echo ' gcov - Make code coverage report'
@echo ' unittest - Run unit tests'
@echo ' lint - Run code linters'
@echo ' indent - Indent C code'
@echo ' amdgpu_plugin - Make AMD GPU plugin'
@echo ' cuda_plugin - Make NVIDIA CUDA plugin'
.PHONY: help
lint:
flake8 --version
flake8 --config=scripts/flake8.cfg test/zdtm.py
flake8 --config=scripts/flake8.cfg test/inhfd/*.py
flake8 --config=scripts/flake8.cfg test/others/rpc/config_file.py
flake8 --config=scripts/flake8.cfg lib/py/images/pb2dict.py
ruff:
@ruff --version
ruff check ${RUFF_FLAGS} --config=scripts/ruff.toml \
test/zdtm.py \
test/inhfd/*.py \
test/others/rpc/config_file.py \
test/others/action-script/check_actions.py \
test/others/pycriu/*.py \
lib/pycriu/criu.py \
lib/pycriu/__init__.py \
lib/pycriu/images/pb2dict.py \
lib/pycriu/images/images.py \
scripts/criu-ns \
test/others/criu-ns/run.py \
crit/*.py \
crit/crit/*.py \
scripts/uninstall_module.py \
coredump/ coredump/coredump \
scripts/github-indent-warnings.py
shellcheck:
shellcheck --version
shellcheck scripts/*.sh
shellcheck scripts/ci/*.sh
shellcheck contrib/apt-install contrib/dependencies/*.sh
shellcheck -x test/others/crit/*.sh
shellcheck -x test/others/libcriu/*.sh
shellcheck -x test/others/crit/*.sh test/others/criu-coredump/*.sh
shellcheck -x test/others/config-file/*.sh
shellcheck -x test/others/action-script/*.sh
codespell:
codespell
lint: ruff shellcheck codespell
# Do not append \n to pr_perror, pr_pwarn or fail
! git --no-pager grep -E '^\s*\<(pr_perror|pr_pwarn|fail)\>.*\\n"'
# Do not use %m with pr_* or fail
! git --no-pager grep -E '^\s*\<(pr_(err|perror|warn|pwarn|debug|info|msg)|fail)\>.*%m'
# Do not use errno with pr_perror, pr_pwarn or fail
! git --no-pager grep -E '^\s*\<(pr_perror|pr_pwarn|fail)\>\(".*".*errno'
# End pr_(err|warn|msg|info|debug) with \n
! git --no-pager grep -En '^\s*\<pr_(err|warn|msg|info|debug)\>.*);$$' | grep -v '\\n'
# No EOL whitespace for C files
! git --no-pager grep -E '\s+$$' \*.c \*.h
.PHONY: lint ruff shellcheck codespell
codecov: SHELL := $(shell command -v bash)
codecov:
curl -Os https://uploader.codecov.io/latest/linux/codecov
chmod +x codecov
./codecov
.PHONY: codecov
fetch-clang-format: .FORCE
$(E) ".clang-format"
$(Q) scripts/fetch-clang-format.sh
BASE ?= "HEAD~1"
OPTS ?= "--quiet"
indent:
git clang-format --style file --extensions c,h $(OPTS) $(BASE)
.PHONY: indent
include Makefile.install

View file

@ -50,8 +50,8 @@ compel/plugins/%: $(compel-deps) .FORCE
#
# GNU make 4.x supports targets matching via wide
# match targeting, where GNU make 3.x series (used on
# Travis) is not, so we have to write them here explicitly.
# match targeting, where GNU make 3.x series is not,
# so we have to write them here explicitly.
compel/plugins/std.lib.a: $(compel-deps) .FORCE
$(Q) $(MAKE) $(build)=compel/plugins $@

View file

@ -2,12 +2,15 @@ include $(__nmk_dir)utils.mk
include $(__nmk_dir)msg.mk
include scripts/feature-tests.mak
# This is a kludge for $(info ...) to not eat spaces.
S :=
ifeq ($(call try-cc,$(FEATURE_TEST_LIBBSD_DEV),-lbsd),true)
LIBS_FEATURES += -lbsd
FEATURE_DEFINES += -DCONFIG_HAS_LIBBSD
else
$(info Note: Building without setproctitle() and strlcpy() support.)
$(info $(info) To enable these features, please install libbsd-devel (RPM) / libbsd-dev (DEB).)
$(info Note: Building without setproctitle() support.)
$(info $S Install libbsd-devel (RPM) / libbsd-dev (DEB) to fix.)
endif
ifeq ($(call pkg-config-check,libselinux),y)
@ -15,45 +18,82 @@ ifeq ($(call pkg-config-check,libselinux),y)
FEATURE_DEFINES += -DCONFIG_HAS_SELINUX
endif
ifeq ($(call pkg-config-check,libbpf),y)
LIBS_FEATURES += -lbpf
FEATURE_DEFINES += -DCONFIG_HAS_LIBBPF
export CONFIG_HAS_LIBBPF := y
endif
ifeq ($(call pkg-config-check,libdrm),y)
export CONFIG_AMDGPU := y
$(info Note: Building with amdgpu_plugin.)
else
$(info Note: Building without amdgpu_plugin.)
$(info $S Install libdrm-devel (RPM) or libdrm-dev (DEB) to fix.)
endif
ifeq ($(NO_GNUTLS)x$(call pkg-config-check,gnutls),xy)
LIBS_FEATURES += -lgnutls
export CONFIG_GNUTLS := y
FEATURE_DEFINES += -DCONFIG_GNUTLS
else
$(info Note: Building without GnuTLS support)
$(info Note: Building without GnuTLS support.)
$(info $S Install gnutls-devel (RPM) or gnutls-dev (DEB) to fix.)
endif
ifeq ($(call pkg-config-check,libnftables),y)
LIB_NFTABLES := $(shell $(PKG_CONFIG) --libs libnftables)
ifeq ($(call try-cc,$(FEATURE_TEST_NFTABLES_LIB_API_0),$(LIB_NFTABLES)),true)
LIBS_FEATURES += $(LIB_NFTABLES)
FEATURE_DEFINES += -DCONFIG_HAS_NFTABLES_LIB_API_0
else ifeq ($(call try-cc,$(FEATURE_TEST_NFTABLES_LIB_API_1),$(LIB_NFTABLES)),true)
LIBS_FEATURES += $(LIB_NFTABLES)
FEATURE_DEFINES += -DCONFIG_HAS_NFTABLES_LIB_API_1
else
$(info Warn: Building without nftables support (incompatible API version).)
endif
else
$(info Warn: Building without nftables support.)
$(info $S Install nftables-devel (RPM) or libnftables-dev (DEB) to fix.)
endif
export LIBS += $(LIBS_FEATURES)
ifneq ($(PLUGINDIR),)
FEATURE_DEFINES += -DCR_PLUGIN_DEFAULT="\"$(PLUGINDIR)\""
endif
CONFIG_FILE = .config
$(CONFIG_FILE):
touch $(CONFIG_FILE)
ifeq ($(SRCARCH),x86)
ifeq ($(ARCH),x86)
# CONFIG_COMPAT is only for x86 now, no need for compile-test other archs
ifeq ($(call try-asm,$(FEATURE_TEST_X86_COMPAT)),true)
export CONFIG_COMPAT := y
FEATURE_DEFINES += -DCONFIG_COMPAT
else
$(info Note: Building without ia32 C/R, missed ia32 support in gcc)
$(info $(info) That may be related to missing gcc-multilib in your)
$(info $(info) distribution or you may have Debian with buggy toolchain)
$(info $(info) (issue https://github.com/checkpoint-restore/criu/issues/315))
$(info Note: Building without ia32 C/R, missing ia32 support in gcc.)
$(info $S It may be related to missing gcc-multilib in your)
$(info $S distribution, or you may have Debian with buggy toolchain.)
$(info $S See https://github.com/checkpoint-restore/criu/issues/315.)
endif
endif
export DEFINES += $(FEATURE_DEFINES)
export CFLAGS += $(FEATURE_DEFINES)
FEATURES_LIST := TCP_REPAIR STRLCPY STRLCAT PTRACE_PEEKSIGINFO \
SETPROCTITLE_INIT MEMFD TCP_REPAIR_WINDOW
FEATURES_LIST := TCP_REPAIR PTRACE_PEEKSIGINFO \
SETPROCTITLE_INIT TCP_REPAIR_WINDOW MEMFD_CREATE \
OPENAT2 NO_LIBC_RSEQ_DEFS
# $1 - config name
define gen-feature-test
ifeq ($$(call try-cc,$$(FEATURE_TEST_$(1)),$$(LIBS_FEATURES),$$(DEFINES)),true)
$(Q) echo '#define CONFIG_HAS_$(1)' >> $$@
$(Q) echo '' >> $$@
else
$(Q) echo '// CONFIG_HAS_$(1) is not set' >> $$@
endif
endef

View file

@ -7,6 +7,7 @@ MANDIR ?= $(PREFIX)/share/man
INCLUDEDIR ?= $(PREFIX)/include
LIBEXECDIR ?= $(PREFIX)/libexec
RUNDIR ?= /run
PLUGINDIR ?= $(PREFIX)/lib/criu
#
# For recent Debian/Ubuntu with multiarch support.
@ -26,7 +27,34 @@ endif
LIBDIR ?= $(PREFIX)/lib
export PREFIX BINDIR SBINDIR MANDIR RUNDIR
export LIBDIR INCLUDEDIR LIBEXECDIR
export LIBDIR INCLUDEDIR LIBEXECDIR PLUGINDIR
# Detect externally managed Python environment (PEP 668).
PYTHON_EXTERNALLY_MANAGED := $(shell $(PYTHON) -c 'import os, sysconfig; print(int(os.path.isfile(os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED"))))')
PIP_BREAK_SYSTEM_PACKAGES ?= 0
# If Python environment is externally managed and PIP_BREAK_SYSTEM_PACKAGES is not set, skip pip install.
SKIP_PIP_INSTALL := 0
ifeq ($(PYTHON_EXTERNALLY_MANAGED),1)
ifeq ($(PIP_BREAK_SYSTEM_PACKAGES),0)
SKIP_PIP_INSTALL := 1
$(info Warn: Externally managed python environment)
$(info Consider using PIP_BREAK_SYSTEM_PACKAGES=1)
endif
endif
# Default flags for pip install:
# --ignore-installed: Overwrite already installed pycriu/crit packages
# --no-build-isolation: Use current Python environment to build pycriu/crit packages
# --no-deps: Don't install any dependencies
# --no-index: Don't use PyPI index to find packages
# --progress-bar: Cleaner output
# --upgrade: Treat the install as an upgrade when replacing the installed version
PIPFLAGS ?= --ignore-installed --no-build-isolation --no-deps --no-index --progress-bar off --upgrade
export SKIP_PIP_INSTALL PIPFLAGS
install-man:
$(Q) $(MAKE) -C Documentation install
@ -36,22 +64,37 @@ install-lib: lib
$(Q) $(MAKE) $(build)=lib install
.PHONY: install-lib
install-crit: lib
$(Q) $(MAKE) $(build)=crit install
.PHONY: install-crit
install-criu: criu
$(Q) $(MAKE) $(build)=criu install
.PHONY: install-criu
install-amdgpu_plugin: amdgpu_plugin
$(Q) $(MAKE) -C plugins/amdgpu install
.PHONY: install-amdgpu_plugin
install-cuda_plugin: cuda_plugin
$(Q) $(MAKE) -C plugins/cuda install
.PHONY: install-cuda_plugin
install-compel: $(compel-install-targets)
$(Q) $(MAKE) $(build)=compel install
$(Q) $(MAKE) $(build)=compel/plugins install
.PHONY: install-compel
install: install-man install-lib install-criu install-compel ;
install: install-man install-lib install-crit install-criu install-compel install-amdgpu_plugin install-cuda_plugin ;
.PHONY: install
uninstall:
$(Q) $(MAKE) -C Documentation $@
$(Q) $(MAKE) $(build)=lib $@
$(Q) $(MAKE) $(build)=crit $@
$(Q) $(MAKE) $(build)=criu $@
$(Q) $(MAKE) $(build)=compel $@
$(Q) $(MAKE) $(build)=compel/plugins $@
$(Q) $(MAKE) -C plugins/amdgpu $@
$(Q) $(MAKE) -C plugins/cuda $@
.PHONY: uninstall

View file

@ -1,10 +1,10 @@
#
# CRIU version.
CRIU_VERSION_MAJOR := 3
CRIU_VERSION_MINOR := 13
CRIU_VERSION_MAJOR := 4
CRIU_VERSION_MINOR := 2
CRIU_VERSION_SUBLEVEL :=
CRIU_VERSION_EXTRA :=
CRIU_VERSION_NAME := Silicon Willet
CRIU_VERSION_NAME := CRIUTIBILITY
CRIU_VERSION := $(CRIU_VERSION_MAJOR)$(if $(CRIU_VERSION_MINOR),.$(CRIU_VERSION_MINOR))$(if $(CRIU_VERSION_SUBLEVEL),.$(CRIU_VERSION_SUBLEVEL))$(if $(CRIU_VERSION_EXTRA),.$(CRIU_VERSION_EXTRA))
export CRIU_VERSION_MAJOR CRIU_VERSION_MINOR CRIU_VERSION_SUBLEVEL

View file

@ -1,22 +1,33 @@
[![master](https://travis-ci.org/checkpoint-restore/criu.svg?branch=master)](https://travis-ci.org/checkpoint-restore/criu)
[![development](https://travis-ci.org/checkpoint-restore/criu.svg?branch=criu-dev)](https://travis-ci.org/checkpoint-restore/criu)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/55251ec7db28421da4481fc7c1cb0cee)](https://www.codacy.com/app/xemul/criu?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=xemul/criu&amp;utm_campaign=Badge_Grade)
<p align="center"><img src="https://criu.org/w/images/1/1c/CRIU.svg" width="256px"/></p>
[![X86_64 GCC Test](https://github.com/checkpoint-restore/criu/workflows/X86_64%20GCC%20Test/badge.svg)](
https://github.com/checkpoint-restore/criu/actions/workflows/x86-64-gcc-test.yml)
[![Docker Test](https://github.com/checkpoint-restore/criu/actions/workflows/docker-test.yml/badge.svg)](
https://github.com/checkpoint-restore/criu/actions/workflows/docker-test.yml)
[![Podman Test](https://github.com/checkpoint-restore/criu/actions/workflows/podman-test.yml/badge.svg)](
https://github.com/checkpoint-restore/criu/actions/workflows/podman-test.yml)
[![CircleCI](https://circleci.com/gh/checkpoint-restore/criu.svg?style=svg)](
https://circleci.com/gh/checkpoint-restore/criu)
<p align="center"><img src="Documentation/logo.svg" width="256px"/></p>
## CRIU -- A project to implement checkpoint/restore functionality for Linux
CRIU (stands for Checkpoint and Restore in Userspace) is a utility to checkpoint/restore Linux tasks.
Using this tool, you can freeze a running application (or part of it) and checkpoint
Using this tool, you can freeze a running application (or part of it) and checkpoint
it to a hard drive as a collection of files. You can then use the files to restore and run the
application from the point it was frozen at. The distinctive feature of the CRIU
project is that it is mainly implemented in user space. There are some more projects
doing C/R for Linux, and so far CRIU [appears to be](https://criu.org/Comparison_to_other_CR_projects)
doing C/R for Linux, and so far CRIU [appears to be](https://criu.org/Comparison_to_other_CR_projects)
the most feature-rich and up-to-date with the kernel.
CRIU project is (almost) the never-ending story, because we have to always keep up with the
Linux kernel supporting checkpoint and restore for all the features it provides. Thus we're
looking for contributors of all kinds -- feedback, bug reports, testing, coding, writing, etc.
Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) if you would like to get involved.
The project [started](https://criu.org/History) as the way to do live migration for OpenVZ
Linux containers, but later grew to more sophisticated and flexible tool. It is currently
used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software, project gets tremendous
Linux containers, but later grew to more sophisticated and flexible tool. It is currently
used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software, project gets tremendous
help from the community, and its packages are included into many Linux distributions.
The project home is at http://criu.org. This wiki contains all the knowledge base for CRIU we have.
@ -24,15 +35,15 @@ Pages worth starting with are:
- [Installation instructions](http://criu.org/Installation)
- [A simple example of usage](http://criu.org/Simple_loop)
- [Examples of more advanced usage](https://criu.org/Category:HOWTO)
- Troubleshooting can be hard, some help can be found [here](https://criu.org/When_C/R_fails), [here](https://criu.org/What_cannot_be_checkpointed) and [here](https://criu.org/FAQ)
- Troubleshooting can be hard, some help can be found [here](https://criu.org/When_C/R_fails), [here](https://criu.org/What_cannot_be_checkpointed) and [here](https://criu.org/index.php?title=FAQ)
### Checkpoint and restore of simple loop process
[<p align="center"><img src="https://asciinema.org/a/232445.png" width="572px" height="412px"/></p>](https://asciinema.org/a/232445)
### Checkpoint and restore of simple loop process
<p align="center"><a href="https://asciinema.org/a/232445"><img src="https://asciinema.org/a/232445.png" width="572px" height="412px"/></a></p>
## Advanced features
As main usage for CRIU is live migration, there's a library for it called P.Haul. Also the
project exposes two cool core features as standalone libraries. These are libcompel for parasite code
project exposes two cool core features as standalone libraries. These are libcompel for parasite code
injection and libsoccr for TCP connections checkpoint-restore.
### Live migration
@ -56,21 +67,9 @@ One of the CRIU features is the ability to save and restore state of a TCP socke
without breaking the connection. This functionality is considered to be useful by
itself, and we have it available as the [libsoccr library](https://criu.org/Libsoccr).
## How to contribute
CRIU project is (almost) the never-ending story, because we have to always keep up with the
Linux kernel supporting checkpoint and restore for all the features it provides. Thus we're
looking for contributors of all kinds -- feedback, bug reports, testing, coding, writing, etc.
Here are some useful hints to get involved.
* We have both -- [very simple](https://checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3Aenhancement) and [more sophisticated](https://checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3A%22new+feature%22) coding tasks;
* CRIU does need [extensive testing](https://checkpoint-restore/criu/issues?q=is%3Aissue+is%3Aopen+label%3Atesting);
* Documentation is always hard, we have [some information](https://criu.org/Category:Empty_articles) that is to be extracted from people's heads into wiki pages as well as [some texts](https://criu.org/Category:Editor_help_needed) that all need to be converted into useful articles;
* Feedback is expected on the github issues page and on the [mailing list](https://lists.openvz.org/mailman/listinfo/criu);
* For historical reasons we do not accept PRs, instead [patches are welcome](http://criu.org/How_to_submit_patches);
* Spread the word about CRIU in [social networks](http://criu.org/Contacts);
* If you're giving a talk about CRIU -- let us know, we'll mention it on the [wiki main page](https://criu.org/News/events);
## Licence
The project is licensed under GPLv2 (though files sitting in the lib/ directory are LGPLv2.1).
All files in the images/ directory are licensed under the Expat license (so-called MIT).
See the images/LICENSE file.

3
compel/.gitignore vendored
View file

@ -4,6 +4,9 @@ arch/arm/plugins/std/syscalls/syscalls.S
arch/aarch64/plugins/std/syscalls/syscalls.S
arch/s390/plugins/std/syscalls/syscalls.S
arch/ppc64/plugins/std/syscalls/syscalls.S
arch/mips/plugins/std/syscalls/syscalls-64.S
arch/loongarch64/plugins/std/syscalls/syscalls-64.S
arch/riscv64/plugins/std/syscalls/syscalls.S
include/version.h
plugins/include/uapi/std/asm/syscall-types.h
plugins/include/uapi/std/syscall-64.h

View file

@ -28,8 +28,12 @@ lib-y += src/lib/infect-util.o
lib-y += src/lib/infect.o
lib-y += src/lib/ptrace.o
# handle_elf() has no support of ELF relocations on ARM (yet?)
ifneq ($(filter arm aarch64,$(ARCH)),)
ifeq ($(ARCH),x86)
lib-y += arch/$(ARCH)/src/lib/thread_area.o
endif
# handle_elf() has no support of ELF relocations on ARM and RISCV64 (yet?)
ifneq ($(filter arm aarch64 loongarch64 riscv64,$(ARCH)),)
CFLAGS += -DNO_RELOCS
HOSTCFLAGS += -DNO_RELOCS
endif

View file

@ -1,7 +1,7 @@
#ifndef COMPEL_ARCH_SYSCALL_TYPES_H__
#define COMPEL_ARCH_SYSCALL_TYPES_H__
#define SA_RESTORER 0x04000000
#define SA_RESTORER 0x04000000
typedef void rt_signalfn_t(int, siginfo_t *, void *);
typedef rt_signalfn_t *rt_sighandler_t;
@ -9,20 +9,20 @@ typedef rt_signalfn_t *rt_sighandler_t;
typedef void rt_restorefn_t(void);
typedef rt_restorefn_t *rt_sigrestore_t;
#define _KNSIG 64
#define _NSIG_BPW 64
#define _KNSIG 64
#define _NSIG_BPW 64
#define _KNSIG_WORDS (_KNSIG / _NSIG_BPW)
#define _KNSIG_WORDS (_KNSIG / _NSIG_BPW)
typedef struct {
unsigned long sig[_KNSIG_WORDS];
} k_rtsigset_t;
typedef struct {
rt_sighandler_t rt_sa_handler;
unsigned long rt_sa_flags;
rt_sigrestore_t rt_sa_restorer;
k_rtsigset_t rt_sa_mask;
rt_sighandler_t rt_sa_handler;
unsigned long rt_sa_flags;
rt_sigrestore_t rt_sa_restorer;
k_rtsigset_t rt_sa_mask;
} rt_sigaction_t;
#endif /* COMPEL_ARCH_SYSCALL_TYPES_H__ */

View file

@ -2,19 +2,6 @@
.section .head.text, "ax"
ENTRY(__export_parasite_head_start)
adr x2, __export_parasite_head_start // get the address of this instruction
ldr x0, __export_parasite_cmd
ldr x1, parasite_args_ptr
add x1, x1, x2 // fixup __export_parasite_args
bl parasite_service
brk #0 // the instruction BRK #0 generates the signal SIGTRAP in Linux
parasite_args_ptr:
.quad __export_parasite_args
__export_parasite_cmd:
.quad 0
END(__export_parasite_head_start)

View file

@ -1,3 +1,3 @@
#ifndef __NR_openat
# define __NR_openat 56
#define __NR_openat 56
#endif

View file

@ -29,8 +29,4 @@ SECTIONS
*(.eh_frame*)
*(*)
}
/* Parasite args should have 4 bytes align, as we have futex inside. */
. = ALIGN(4);
__export_parasite_args = .;
}

View file

@ -7,7 +7,7 @@
#include "log.h"
#undef LOG_PREFIX
#undef LOG_PREFIX
#define LOG_PREFIX "cpu: "
static compel_cpuinfo_t rt_info;
@ -22,11 +22,24 @@ static void fetch_rt_cpuinfo(void)
}
}
void compel_set_cpu_cap(compel_cpuinfo_t *info, unsigned int feature) { }
void compel_clear_cpu_cap(compel_cpuinfo_t *info, unsigned int feature) { }
int compel_test_cpu_cap(compel_cpuinfo_t *info, unsigned int feature) { return 0; }
int compel_test_fpu_cap(compel_cpuinfo_t *info, unsigned int feature) { return 0; }
int compel_cpuid(compel_cpuinfo_t *info) { return 0; }
void compel_set_cpu_cap(compel_cpuinfo_t *info, unsigned int feature)
{
}
void compel_clear_cpu_cap(compel_cpuinfo_t *info, unsigned int feature)
{
}
int compel_test_cpu_cap(compel_cpuinfo_t *info, unsigned int feature)
{
return 0;
}
int compel_test_fpu_cap(compel_cpuinfo_t *info, unsigned int feature)
{
return 0;
}
int compel_cpuid(compel_cpuinfo_t *info)
{
return 0;
}
bool compel_cpu_has_feature(unsigned int feature)
{

View file

@ -1,20 +1,17 @@
#include <string.h>
#include "uapi/compel.h"
#include <errno.h>
#include "handle-elf.h"
#include "piegen.h"
#include "log.h"
static const unsigned char __maybe_unused
elf_ident_64_le[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
static const unsigned char __maybe_unused elf_ident_64_le[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, /* clang-format */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
static const unsigned char __maybe_unused
elf_ident_64_be[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x02, 0x01, 0x00,
static const unsigned char __maybe_unused elf_ident_64_be[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x02, 0x01, 0x00, /* clang-format */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

View file

@ -3,8 +3,8 @@
#include "elf64-types.h"
#define __handle_elf handle_elf_aarch64
#define arch_is_machine_supported(e_machine) (e_machine == EM_AARCH64)
#define __handle_elf handle_elf_aarch64
#define arch_is_machine_supported(e_machine) (e_machine == EM_AARCH64)
extern int handle_elf_aarch64(void *mem, size_t size);

View file

@ -1,4 +1,8 @@
#ifndef __COMPEL_SYSCALL_H__
#define __COMPEL_SYSCALL_H__
#define __NR(syscall, compat) __NR_##syscall
#define __NR(syscall, compat) \
({ \
(void)compat; \
__NR_##syscall; \
})
#endif

View file

@ -2,14 +2,41 @@
#define __COMPEL_BREAKPOINTS_H__
#define ARCH_SI_TRAP TRAP_BRKPT
static inline int ptrace_set_breakpoint(pid_t pid, void *addr)
{
return 0;
}
#include <sys/types.h>
#include <stdbool.h>
static inline int ptrace_flush_breakpoints(pid_t pid)
{
return 0;
}
struct hwbp_cap {
char arch;
char bp_count;
};
/* copied from `linux/arch/arm64/include/asm/hw_breakpoint.h` */
/* Lengths */
#define ARM_BREAKPOINT_LEN_1 0x1
#define ARM_BREAKPOINT_LEN_2 0x3
#define ARM_BREAKPOINT_LEN_3 0x7
#define ARM_BREAKPOINT_LEN_4 0xf
#define ARM_BREAKPOINT_LEN_5 0x1f
#define ARM_BREAKPOINT_LEN_6 0x3f
#define ARM_BREAKPOINT_LEN_7 0x7f
#define ARM_BREAKPOINT_LEN_8 0xff
/* Privilege Levels */
#define AARCH64_BREAKPOINT_EL1 1
#define AARCH64_BREAKPOINT_EL0 2
/* Breakpoint */
#define ARM_BREAKPOINT_EXECUTE 0
/* Watchpoints */
#define ARM_BREAKPOINT_LOAD 1
#define ARM_BREAKPOINT_STORE 2
#define AARCH64_ESR_ACCESS_MASK (1 << 6)
#define DISABLE_HBP 0
#define ENABLE_HBP 1
int ptrace_set_breakpoint(pid_t pid, void *addr);
int ptrace_flush_breakpoints(pid_t pid);
#endif

View file

@ -1,6 +1,7 @@
#ifndef UAPI_COMPEL_ASM_CPU_H__
#define UAPI_COMPEL_ASM_CPU_H__
typedef struct { } compel_cpuinfo_t;
typedef struct {
} compel_cpuinfo_t;
#endif /* UAPI_COMPEL_ASM_CPU_H__ */

View file

@ -0,0 +1,47 @@
#ifndef __UAPI_ASM_GCS_TYPES_H__
#define __UAPI_ASM_GCS_TYPES_H__
#ifndef NT_ARM_GCS
#define NT_ARM_GCS 0x410 /* ARM GCS state */
#endif
/* Shadow Stack/Guarded Control Stack interface */
#define PR_GET_SHADOW_STACK_STATUS 74
#define PR_SET_SHADOW_STACK_STATUS 75
#define PR_LOCK_SHADOW_STACK_STATUS 76
/* When set PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack */
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif
/* Allows explicit GCS stores (eg. using GCSSTR) */
#ifndef PR_SHADOW_STACK_WRITE
#define PR_SHADOW_STACK_WRITE (1UL << 1)
#endif
/* Allows explicit GCS pushes (eg. using GCSPUSHM) */
#ifndef PR_SHADOW_STACK_PUSH
#define PR_SHADOW_STACK_PUSH (1UL << 2)
#endif
#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN 0x1 /* Set up a restore token in the shadow stack */
#endif
#define PR_SHADOW_STACK_ALL_MODES \
PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH
/* copied from: arch/arm64/include/asm/sysreg.h */
#define GCS_CAP_VALID_TOKEN 0x1
#define GCS_CAP_ADDR_MASK 0xFFFFFFFFFFFFF000ULL
#define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | GCS_CAP_VALID_TOKEN)
#define GCS_SIGNAL_CAP(addr) (((unsigned long)addr) & GCS_CAP_ADDR_MASK)
#include <asm/hwcap.h>
#ifndef HWCAP_GCS
#define HWCAP_GCS (1UL << 32)
#endif
#endif /* __UAPI_ASM_GCS_TYPES_H__ */

View file

@ -2,12 +2,13 @@
#define UAPI_COMPEL_ASM_TYPES_H__
#include <stdint.h>
#include <stdbool.h>
#include <signal.h>
#include <sys/mman.h>
#include <asm/ptrace.h>
#define SIGMAX 64
#define SIGMAX_OLD 31
#define SIGMAX 64
#define SIGMAX_OLD 31
/*
* Copied from the Linux kernel header arch/arm64/include/uapi/asm/ptrace.h
@ -15,18 +16,53 @@
* A thread ARM CPU context
*/
typedef struct user_pt_regs user_regs_struct_t;
typedef struct user_fpsimd_state user_fpregs_struct_t;
typedef struct user_pt_regs user_regs_struct_t;
#define REG_RES(r) ((uint64_t)(r).regs[0])
#define REG_IP(r) ((uint64_t)(r).pc)
#define REG_SP(r) ((uint64_t)((r).sp))
#define REG_SYSCALL_NR(r) ((uint64_t)(r).regs[8])
/*
* GCS (Guarded Control Stack)
*
* This mirrors the kernel definition but renamed to cr_user_gcs
* to avoid conflict with kernel headers (/usr/include/asm/ptrace.h).
*/
struct cr_user_gcs {
__u64 features_enabled;
__u64 features_locked;
__u64 gcspr_el0;
};
#define user_regs_native(pregs) true
struct user_fpregs_struct {
struct user_fpsimd_state fpstate;
struct cr_user_gcs gcs;
};
typedef struct user_fpregs_struct user_fpregs_struct_t;
#define ARCH_SI_TRAP TRAP_BRKPT
#define __compel_arch_fetch_thread_area(tid, th) 0
#define compel_arch_fetch_thread_area(tctl) 0
#define compel_arch_get_tls_task(ctl, tls)
#define compel_arch_get_tls_thread(tctl, tls)
#define __NR(syscall, compat) __NR_##syscall
#define REG_RES(r) ((uint64_t)(r).regs[0])
#define REG_IP(r) ((uint64_t)(r).pc)
#define SET_REG_IP(r, val) ((r).pc = (val))
#define REG_SP(r) ((uint64_t)((r).sp))
#define REG_SYSCALL_NR(r) ((uint64_t)(r).regs[8])
#define user_regs_native(pregs) true
#define ARCH_SI_TRAP TRAP_BRKPT
#define __NR(syscall, compat) \
({ \
(void)compat; \
__NR_##syscall; \
})
extern bool __compel_host_supports_gcs(void);
#define compel_host_supports_gcs __compel_host_supports_gcs
struct parasite_ctl;
extern int __parasite_setup_shstk(struct parasite_ctl *ctl,
user_fpregs_struct_t *ext_regs);
#define parasite_setup_shstk __parasite_setup_shstk
#endif /* UAPI_COMPEL_ASM_TYPES_H__ */

View file

@ -1,37 +1,48 @@
#ifndef UAPI_COMPEL_ASM_SIGFRAME_H__
#define UAPI_COMPEL_ASM_SIGFRAME_H__
#include <asm/sigcontext.h>
#include <signal.h>
#include <sys/ucontext.h>
#include <stdint.h>
#include <asm/types.h>
/* Copied from the kernel header arch/arm64/include/uapi/asm/sigcontext.h */
#define FPSIMD_MAGIC 0x46508001
#define FPSIMD_MAGIC 0x46508001
#define GCS_MAGIC 0x47435300
typedef struct fpsimd_context fpu_state_t;
typedef struct fpsimd_context fpu_state_t;
struct aux_context {
struct fpsimd_context fpsimd;
/* additional context to be added before "end" */
struct _aarch64_ctx end;
struct gcs_context {
struct _aarch64_ctx head;
__u64 gcspr;
__u64 features_enabled;
__u64 reserved;
};
// XXX: the idetifier rt_sigcontext is expected to be struct by the CRIU code
#define rt_sigcontext sigcontext
struct aux_context {
struct fpsimd_context fpsimd;
struct gcs_context gcs;
/* additional context to be added before "end" */
struct _aarch64_ctx end;
};
// XXX: the identifier rt_sigcontext is expected to be struct by the CRIU code
#define rt_sigcontext sigcontext
#include <compel/sigframe-common.h>
/* Copied from the kernel source arch/arm64/kernel/signal.c */
struct rt_sigframe {
siginfo_t info;
ucontext_t uc;
uint64_t fp;
uint64_t lr;
siginfo_t info;
ucontext_t uc;
uint64_t fp;
uint64_t lr;
};
/* clang-format off */
#define ARCH_RT_SIGRETURN(new_sp, rt_sigframe) \
asm volatile( \
"mov sp, %0 \n" \
@ -40,30 +51,30 @@ struct rt_sigframe {
: \
: "r"(new_sp) \
: "x8", "memory")
/* clang-format on */
/* cr_sigcontext is copied from arch/arm64/include/uapi/asm/sigcontext.h */
struct cr_sigcontext {
__u64 fault_address;
/* AArch64 registers */
__u64 regs[31];
__u64 sp;
__u64 pc;
__u64 pstate;
/* 4K reserved for FP/SIMD state and future expansion */
__u8 __reserved[4096] __attribute__((__aligned__(16)));
__u64 fault_address;
/* AArch64 registers */
__u64 regs[31];
__u64 sp;
__u64 pc;
__u64 pstate;
/* 4K reserved for FP/SIMD state and future expansion */
__u8 __reserved[4096] __attribute__((__aligned__(16)));
};
#define RT_SIGFRAME_UC(rt_sigframe) (&rt_sigframe->uc)
#define RT_SIGFRAME_REGIP(rt_sigframe) ((long unsigned int)(rt_sigframe)->uc.uc_mcontext.pc)
#define RT_SIGFRAME_HAS_FPU(rt_sigframe) (1)
#define RT_SIGFRAME_SIGCONTEXT(rt_sigframe) ((struct cr_sigcontext *)&(rt_sigframe)->uc.uc_mcontext)
#define RT_SIGFRAME_AUX_CONTEXT(rt_sigframe) ((struct aux_context*)&(RT_SIGFRAME_SIGCONTEXT(rt_sigframe)->__reserved))
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_CONTEXT(rt_sigframe)->fpsimd)
#define RT_SIGFRAME_OFFSET(rt_sigframe) 0
#define RT_SIGFRAME_UC(rt_sigframe) (&rt_sigframe->uc)
#define RT_SIGFRAME_REGIP(rt_sigframe) ((long unsigned int)(rt_sigframe)->uc.uc_mcontext.pc)
#define RT_SIGFRAME_HAS_FPU(rt_sigframe) (1)
#define RT_SIGFRAME_SIGCONTEXT(rt_sigframe) ((struct cr_sigcontext *)&(rt_sigframe)->uc.uc_mcontext)
#define RT_SIGFRAME_AUX_CONTEXT(rt_sigframe) ((struct aux_context *)&(RT_SIGFRAME_SIGCONTEXT(rt_sigframe)->__reserved))
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_CONTEXT(rt_sigframe)->fpsimd)
#define RT_SIGFRAME_OFFSET(rt_sigframe) 0
#define RT_SIGFRAME_GCS(rt_sigframe) (&RT_SIGFRAME_AUX_CONTEXT(rt_sigframe)->gcs)
#define rt_sigframe_erase_sigset(sigframe) \
memset(&sigframe->uc.uc_sigmask, 0, sizeof(k_rtsigset_t))
#define rt_sigframe_copy_sigset(sigframe, from) \
memcpy(&sigframe->uc.uc_sigmask, from, sizeof(k_rtsigset_t))
#define rt_sigframe_erase_sigset(sigframe) memset(&sigframe->uc.uc_sigmask, 0, sizeof(k_rtsigset_t))
#define rt_sigframe_copy_sigset(sigframe, from) memcpy(&sigframe->uc.uc_sigmask, from, sizeof(k_rtsigset_t))
#endif /* UAPI_COMPEL_ASM_SIGFRAME_H__ */

View file

@ -2,7 +2,9 @@
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/elf.h>
#include <sys/auxv.h>
#include <asm/ptrace.h>
#include <compel/plugins/std/syscall-codes.h>
#include "common/page.h"
#include "uapi/compel/asm/infect-types.h"
@ -10,6 +12,9 @@
#include "errno.h"
#include "infect.h"
#include "infect-priv.h"
#include "asm/breakpoints.h"
#include "asm/gcs-types.h"
#include <linux/prctl.h>
unsigned __page_size = 0;
unsigned __page_shift = 0;
@ -18,12 +23,11 @@ unsigned __page_shift = 0;
* Injected syscall instruction
*/
const char code_syscall[] = {
0x01, 0x00, 0x00, 0xd4, /* SVC #0 */
0x00, 0x00, 0x20, 0xd4 /* BRK #0 */
0x01, 0x00, 0x00, 0xd4, /* SVC #0 */
0x00, 0x00, 0x20, 0xd4 /* BRK #0 */
};
static const int
code_syscall_aligned = round_up(sizeof(code_syscall), sizeof(long));
static const int code_syscall_aligned = round_up(sizeof(code_syscall), sizeof(long));
static inline void __always_unused __check_code_syscall(void)
{
@ -31,40 +35,66 @@ static inline void __always_unused __check_code_syscall(void)
BUILD_BUG_ON(!is_log2(sizeof(code_syscall)));
}
int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe,
user_regs_struct_t *regs,
user_fpregs_struct_t *fpregs)
bool __compel_host_supports_gcs(void)
{
unsigned long hwcap = getauxval(AT_HWCAP);
return (hwcap & HWCAP_GCS) != 0;
}
static bool __compel_gcs_enabled(struct cr_user_gcs *gcs)
{
if (!compel_host_supports_gcs())
return false;
return gcs && (gcs->features_enabled & PR_SHADOW_STACK_ENABLE) != 0;
}
int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe, user_regs_struct_t *regs, user_fpregs_struct_t *fpregs)
{
struct fpsimd_context *fpsimd = RT_SIGFRAME_FPU(sigframe);
struct gcs_context *gcs = RT_SIGFRAME_GCS(sigframe);
memcpy(sigframe->uc.uc_mcontext.regs, regs->regs, sizeof(regs->regs));
sigframe->uc.uc_mcontext.sp = regs->sp;
sigframe->uc.uc_mcontext.pc = regs->pc;
sigframe->uc.uc_mcontext.pstate = regs->pstate;
pr_debug("sigreturn_prep_regs_plain: sp %lx pc %lx\n", (long)regs->sp, (long)regs->pc);
memcpy(fpsimd->vregs, fpregs->vregs, 32 * sizeof(__uint128_t));
sigframe->uc.uc_mcontext.sp = regs->sp;
sigframe->uc.uc_mcontext.pc = regs->pc;
sigframe->uc.uc_mcontext.pstate = regs->pstate;
fpsimd->fpsr = fpregs->fpsr;
fpsimd->fpcr = fpregs->fpcr;
memcpy(fpsimd->vregs, fpregs->fpstate.vregs, 32 * sizeof(__uint128_t));
fpsimd->fpsr = fpregs->fpstate.fpsr;
fpsimd->fpcr = fpregs->fpstate.fpcr;
fpsimd->head.magic = FPSIMD_MAGIC;
fpsimd->head.size = sizeof(*fpsimd);
if (__compel_gcs_enabled(&fpregs->gcs)) {
gcs->head.magic = GCS_MAGIC;
gcs->head.size = sizeof(*gcs);
gcs->reserved = 0;
gcs->gcspr = fpregs->gcs.gcspr_el0 - 8;
gcs->features_enabled = fpregs->gcs.features_enabled;
pr_debug("sigframe gcspr=%llx features_enabled=%llx\n", fpregs->gcs.gcspr_el0 - 8, fpregs->gcs.features_enabled);
} else {
pr_debug("sigframe gcspr=[disabled]\n");
memset(gcs, 0, sizeof(*gcs));
}
return 0;
}
int sigreturn_prep_fpu_frame_plain(struct rt_sigframe *sigframe,
struct rt_sigframe *rsigframe)
int sigreturn_prep_fpu_frame_plain(struct rt_sigframe *sigframe, struct rt_sigframe *rsigframe)
{
return 0;
}
int get_task_regs(pid_t pid, user_regs_struct_t *regs, save_regs_t save,
void *arg, __maybe_unused unsigned long flags)
int compel_get_task_regs(pid_t pid, user_regs_struct_t *regs, user_fpregs_struct_t *ext_regs, save_regs_t save,
void *arg, __maybe_unused unsigned long flags)
{
struct iovec iov;
user_fpregs_struct_t fpsimd;
int ret;
pr_info("Dumping GP/FPU registers for %d\n", pid);
@ -76,25 +106,79 @@ int get_task_regs(pid_t pid, user_regs_struct_t *regs, save_regs_t save,
goto err;
}
iov.iov_base = &fpsimd;
iov.iov_len = sizeof(fpsimd);
iov.iov_base = &ext_regs->fpstate;
iov.iov_len = sizeof(ext_regs->fpstate);
if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov))) {
pr_perror("Failed to obtain FPU registers for %d", pid);
goto err;
}
ret = save(arg, regs, &fpsimd);
memset(&ext_regs->gcs, 0, sizeof(ext_regs->gcs));
iov.iov_base = &ext_regs->gcs;
iov.iov_len = sizeof(ext_regs->gcs);
if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_GCS, &iov) == 0) {
pr_info("gcs: GCSPR_EL0 for %d: 0x%llx, features: 0x%llx\n",
pid, ext_regs->gcs.gcspr_el0, ext_regs->gcs.features_enabled);
if (!__compel_gcs_enabled(&ext_regs->gcs))
pr_info("gcs: GCS is NOT enabled\n");
} else {
pr_info("gcs: GCS state not available for %d\n", pid);
}
ret = save(pid, arg, regs, ext_regs);
err:
return ret;
}
int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret,
unsigned long arg1,
unsigned long arg2,
unsigned long arg3,
unsigned long arg4,
unsigned long arg5,
unsigned long arg6)
int compel_set_task_ext_regs(pid_t pid, user_fpregs_struct_t *ext_regs)
{
struct iovec iov;
struct cr_user_gcs gcs;
struct iovec gcs_iov = { .iov_base = &gcs, .iov_len = sizeof(gcs) };
pr_info("Restoring GP/FPU registers for %d\n", pid);
iov.iov_base = &ext_regs->fpstate;
iov.iov_len = sizeof(ext_regs->fpstate);
if (ptrace(PTRACE_SETREGSET, pid, NT_PRFPREG, &iov)) {
pr_perror("Failed to set FPU registers for %d", pid);
return -1;
}
if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_GCS, &gcs_iov) < 0) {
pr_warn("gcs: Failed to get GCS for %d\n", pid);
} else {
ext_regs->gcs = gcs;
compel_set_task_gcs_regs(pid, ext_regs);
}
return 0;
}
int compel_set_task_gcs_regs(pid_t pid, user_fpregs_struct_t *ext_regs)
{
struct iovec iov;
pr_info("gcs: restoring GCS registers for %d\n", pid);
pr_info("gcs: restoring GCS: gcspr=%llx features=%llx\n",
ext_regs->gcs.gcspr_el0, ext_regs->gcs.features_enabled);
iov.iov_base = &ext_regs->gcs;
iov.iov_len = sizeof(ext_regs->gcs);
if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_GCS, &iov)) {
pr_perror("gcs: Failed to set GCS registers for %d", pid);
return -1;
}
return 0;
}
int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret, unsigned long arg1, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5, unsigned long arg6)
{
user_regs_struct_t regs = ctl->orig.regs;
int err;
@ -115,15 +199,12 @@ int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret,
return err;
}
void *remote_mmap(struct parasite_ctl *ctl,
void *addr, size_t length, int prot,
int flags, int fd, off_t offset)
void *remote_mmap(struct parasite_ctl *ctl, void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
long map;
int err;
err = compel_syscall(ctl, __NR_mmap, &map,
(unsigned long)addr, length, prot, flags, fd, offset);
err = compel_syscall(ctl, __NR_mmap, &map, (unsigned long)addr, length, prot, flags, fd, offset);
if (err < 0 || (long)map < 0)
map = 0;
@ -150,9 +231,7 @@ int arch_fetch_sas(struct parasite_ctl *ctl, struct rt_sigframe *s)
long ret;
int err;
err = compel_syscall(ctl, __NR_sigaltstack,
&ret, 0, (unsigned long)&s->uc.uc_stack,
0, 0, 0, 0);
err = compel_syscall(ctl, __NR_sigaltstack, &ret, 0, (unsigned long)&s->uc.uc_stack, 0, 0, 0, 0);
return err ? err : ret;
}
@ -176,3 +255,175 @@ unsigned long compel_task_size(void)
return task_size;
}
static struct hwbp_cap *ptrace_get_hwbp_cap(pid_t pid)
{
static struct hwbp_cap info;
static int available = -1;
if (available == -1) {
unsigned int val;
struct iovec iovec = {
.iov_base = &val,
.iov_len = sizeof(val),
};
if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_HW_BREAK, &iovec) < 0)
available = 0;
else {
info.arch = (char)((val >> 8) & 0xff);
info.bp_count = (char)(val & 0xff);
available = (info.arch != 0);
}
}
return available == 1 ? &info : NULL;
}
int ptrace_set_breakpoint(pid_t pid, void *addr)
{
k_rtsigset_t block;
struct hwbp_cap *info = ptrace_get_hwbp_cap(pid);
struct user_hwdebug_state regs = {};
unsigned int ctrl = 0;
struct iovec iovec;
if (info == NULL || info->bp_count == 0)
return 0;
/*
* The struct is copied from `arch/arm64/include/asm/hw_breakpoint.h` in
* linux kernel:
* struct arch_hw_breakpoint_ctrl {
* __u32 __reserved : 19,
* len : 8,
* type : 2,
* privilege : 2,
* enabled : 1;
* };
*
* The part of `struct arch_hw_breakpoint_ctrl` bits meaning is defined
* in <<ARM Architecture Reference Manual for A-profile architecture>>,
* D13.3.2 DBGBCR<n>_EL1, Debug Breakpoint Control Registers.
*/
ctrl = ARM_BREAKPOINT_LEN_4;
ctrl = (ctrl << 2) | ARM_BREAKPOINT_EXECUTE;
ctrl = (ctrl << 2) | AARCH64_BREAKPOINT_EL0;
ctrl = (ctrl << 1) | ENABLE_HBP;
regs.dbg_regs[0].addr = (__u64)addr;
regs.dbg_regs[0].ctrl = ctrl;
iovec.iov_base = &regs;
iovec.iov_len = (offsetof(struct user_hwdebug_state, dbg_regs) + sizeof(regs.dbg_regs[0]));
if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_BREAK, &iovec))
return -1;
/*
* FIXME(issues/1429): SIGTRAP can't be blocked, otherwise its handler
* will be reset to the default one.
*/
ksigfillset(&block);
ksigdelset(&block, SIGTRAP);
if (ptrace(PTRACE_SETSIGMASK, pid, sizeof(k_rtsigset_t), &block)) {
pr_perror("Can't block signals for %d", pid);
return -1;
}
if (ptrace(PTRACE_CONT, pid, NULL, NULL) != 0) {
pr_perror("Unable to restart the stopped tracee process %d", pid);
return -1;
}
return 1;
}
int ptrace_flush_breakpoints(pid_t pid)
{
struct hwbp_cap *info = ptrace_get_hwbp_cap(pid);
struct user_hwdebug_state regs = {};
unsigned int ctrl = 0;
struct iovec iovec;
if (info == NULL || info->bp_count == 0)
return 0;
ctrl = ARM_BREAKPOINT_LEN_4;
ctrl = (ctrl << 2) | ARM_BREAKPOINT_EXECUTE;
ctrl = (ctrl << 2) | AARCH64_BREAKPOINT_EL0;
ctrl = (ctrl << 1) | DISABLE_HBP;
regs.dbg_regs[0].addr = 0ul;
regs.dbg_regs[0].ctrl = ctrl;
iovec.iov_base = &regs;
iovec.iov_len = (offsetof(struct user_hwdebug_state, dbg_regs) + sizeof(regs.dbg_regs[0]));
if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_BREAK, &iovec))
return -1;
return 0;
}
int inject_gcs_cap_token(struct parasite_ctl *ctl, pid_t pid, struct cr_user_gcs *gcs)
{
struct iovec gcs_iov = { .iov_base = gcs, .iov_len = sizeof(*gcs) };
uint64_t token_addr = gcs->gcspr_el0 - 8;
uint64_t sigtramp_addr = gcs->gcspr_el0 - 16;
uint64_t cap_token = ALIGN_DOWN(GCS_SIGNAL_CAP(token_addr), 8);
unsigned long restorer_addr;
pr_info("gcs: (setup) CAP token: 0x%lx at addr: 0x%lx\n", cap_token, token_addr);
/* Inject capability token at gcspr_el0 - 8 */
if (ptrace(PTRACE_POKEDATA, pid, (void *)token_addr, cap_token)) {
pr_perror("gcs: (setup) Inject GCS cap token failed");
return -1;
}
/* Inject restorer trampoline address (gcspr_el0 - 16) */
restorer_addr = ctl->parasite_ip;
if (ptrace(PTRACE_POKEDATA, pid, (void *)sigtramp_addr, restorer_addr)) {
pr_perror("gcs: (setup) Inject GCS restorer failed");
return -1;
}
/* Update GCSPR_EL0 */
gcs->gcspr_el0 = token_addr;
if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_GCS, &gcs_iov)) {
pr_perror("gcs: PTRACE_SETREGS FAILED");
return -1;
}
pr_debug("gcs: parasite_ip=%#lx sp=%#llx gcspr_el0=%#llx\n",
ctl->parasite_ip, ctl->orig.regs.sp, gcs->gcspr_el0);
return 0;
}
int parasite_setup_shstk(struct parasite_ctl *ctl, user_fpregs_struct_t *ext_regs)
{
struct cr_user_gcs gcs;
struct iovec gcs_iov = { .iov_base = &gcs, .iov_len = sizeof(gcs) };
pid_t pid = ctl->rpid;
if(!__compel_host_supports_gcs())
return 0;
if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_GCS, &gcs_iov) != 0) {
pr_perror("GCS state not available for %d", pid);
return -1;
}
if (!__compel_gcs_enabled(&gcs))
return 0;
if (inject_gcs_cap_token(ctl, pid, &gcs)) {
pr_perror("Failed to inject GCS cap token for %d", pid);
return -1;
}
pr_info("gcs: GCS enabled for %d\n", pid);
return 0;
}

View file

@ -1,7 +1,7 @@
#ifndef COMPEL_ARCH_SYSCALL_TYPES_H__
#define COMPEL_ARCH_SYSCALL_TYPES_H__
#define SA_RESTORER 0x04000000
#define SA_RESTORER 0x04000000
typedef void rt_signalfn_t(int, siginfo_t *, void *);
typedef rt_signalfn_t *rt_sighandler_t;
@ -9,20 +9,20 @@ typedef rt_signalfn_t *rt_sighandler_t;
typedef void rt_restorefn_t(void);
typedef rt_restorefn_t *rt_sigrestore_t;
#define _KNSIG 64
#define _NSIG_BPW 32
#define _KNSIG 64
#define _NSIG_BPW 32
#define _KNSIG_WORDS (_KNSIG / _NSIG_BPW)
#define _KNSIG_WORDS (_KNSIG / _NSIG_BPW)
typedef struct {
unsigned long sig[_KNSIG_WORDS];
} k_rtsigset_t;
typedef struct {
rt_sighandler_t rt_sa_handler;
unsigned long rt_sa_flags;
rt_sigrestore_t rt_sa_restorer;
k_rtsigset_t rt_sa_mask;
rt_sighandler_t rt_sa_handler;
unsigned long rt_sa_flags;
rt_sigrestore_t rt_sa_restorer;
k_rtsigset_t rt_sa_mask;
} rt_sigaction_t;
#endif /* COMPEL_ARCH_SYSCALL_TYPES_H__ */

View file

@ -2,21 +2,7 @@
.section .head.text, "ax"
ENTRY(__export_parasite_head_start)
sub r2, pc, #8 @ get the address of this instruction
adr r0, __export_parasite_cmd
ldr r0, [r0]
adr r1, parasite_args_ptr
ldr r1, [r1]
add r1, r1, r2 @ fixup __export_parasite_args
bl parasite_service
.byte 0xf0, 0x01, 0xf0, 0xe7 @ the instruction UDF #32 generates the signal SIGTRAP in Linux
parasite_args_ptr:
.long __export_parasite_args
__export_parasite_cmd:
.long 0
END(__export_parasite_head_start)

View file

@ -1,27 +1,27 @@
#ifndef __NR_mmap2
# define __NR_mmap2 192
#define __NR_mmap2 192
#endif
#ifndef __ARM_NR_BASE
# define __ARM_NR_BASE 0x0f0000
#define __ARM_NR_BASE 0x0f0000
#endif
#ifndef __ARM_NR_breakpoint
# define __ARM_NR_breakpoint (__ARM_NR_BASE+1)
#define __ARM_NR_breakpoint (__ARM_NR_BASE + 1)
#endif
#ifndef __ARM_NR_cacheflush
# define __ARM_NR_cacheflush (__ARM_NR_BASE+2)
#define __ARM_NR_cacheflush (__ARM_NR_BASE + 2)
#endif
#ifndef __ARM_NR_usr26
# define __ARM_NR_usr26 (__ARM_NR_BASE+3)
#define __ARM_NR_usr26 (__ARM_NR_BASE + 3)
#endif
#ifndef __ARM_NR_usr32
# define __ARM_NR_usr32 (__ARM_NR_BASE+4)
#define __ARM_NR_usr32 (__ARM_NR_BASE + 4)
#endif
#ifndef __ARM_NR_set_tls
# define __ARM_NR_set_tls (__ARM_NR_BASE+5)
#define __ARM_NR_set_tls (__ARM_NR_BASE + 5)
#endif

View file

@ -39,7 +39,7 @@ recvfrom 207 292 (int sockfd, void *ubuf, size_t size, unsigned int flags, str
sendmsg 211 296 (int sockfd, const struct msghdr *msg, int flags)
recvmsg 212 297 (int sockfd, struct msghdr *msg, int flags)
shutdown 210 293 (int sockfd, int how)
bind 235 282 (int sockfd, const struct sockaddr *addr, int addrlen)
bind 200 282 (int sockfd, const struct sockaddr *addr, int addrlen)
setsockopt 208 294 (int sockfd, int level, int optname, const void *optval, socklen_t optlen)
getsockopt 209 295 (int sockfd, int level, int optname, const void *optval, socklen_t *optlen)
clone 220 120 (unsigned long flags, void *child_stack, void *parent_tid, unsigned long newtls, void *child_tid)
@ -85,7 +85,7 @@ timer_settime 110 258 (kernel_timer_t timer_id, int flags, const struct itimer
timer_gettime 108 259 (int timer_id, const struct itimerspec *setting)
timer_getoverrun 109 260 (int timer_id)
timer_delete 111 261 (kernel_timer_t timer_id)
clock_gettime 113 263 (const clockid_t which_clock, const struct timespec *tp)
clock_gettime 113 263 (clockid_t which_clock, struct timespec *tp)
exit_group 94 248 (int error_code)
set_robust_list 99 338 (struct robust_list_head *head, size_t len)
get_robust_list 100 339 (int pid, struct robust_list_head **head_ptr, size_t *len_ptr)
@ -112,3 +112,16 @@ userfaultfd 282 388 (int flags)
fallocate 47 352 (int fd, int mode, loff_t offset, loff_t len)
cacheflush ! 983042 (void *start, void *end, int flags)
ppoll 73 336 (struct pollfd *fds, unsigned int nfds, const struct timespec *tmo, const sigset_t *sigmask, size_t sigsetsize)
open_tree 428 428 (int dirfd, const char *pathname, unsigned int flags)
move_mount 429 429 (int from_dfd, const char *from_pathname, int to_dfd, const char *to_pathname, int flags)
fsopen 430 430 (char *fsname, unsigned int flags)
fsconfig 431 431 (int fd, unsigned int cmd, const char *key, const char *value, int aux)
fsmount 432 432 (int fd, unsigned int flags, unsigned int attr_flags)
clone3 435 435 (struct clone_args *uargs, size_t size)
close_range 436 436 (unsigned int fd, unsigned int max_fd, unsigned int flags)
pidfd_open 434 434 (pid_t pid, unsigned int flags)
openat2 437 437 (int dirfd, char *pathname, struct open_how *how, size_t size)
pidfd_getfd 438 438 (int pidfd, int targetfd, unsigned int flags)
rseq 293 398 (void *rseq, uint32_t rseq_len, int flags, uint32_t sig)
membarrier 283 389 (int cmd, unsigned int flags, int cpu_id)
map_shadow_stack 453 ! (unsigned long addr, unsigned long size, unsigned int flags)

View file

@ -29,8 +29,4 @@ SECTIONS
*(.eh_frame*)
*(*)
}
/* Parasite args should have 4 bytes align, as we have futex inside. */
. = ALIGN(4);
__export_parasite_args = .;
}

View file

@ -1,14 +1,12 @@
#include <string.h>
#include "uapi/compel.h"
#include <errno.h>
#include "handle-elf.h"
#include "piegen.h"
#include "log.h"
static const unsigned char __maybe_unused
elf_ident_32[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00,
static const unsigned char __maybe_unused elf_ident_32[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00, /* clang-format */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

View file

@ -3,8 +3,8 @@
#include "elf32-types.h"
#define __handle_elf handle_elf_arm
#define arch_is_machine_supported(e_machine) (e_machine == EM_ARM)
#define __handle_elf handle_elf_arm
#define arch_is_machine_supported(e_machine) (e_machine == EM_ARM)
extern int handle_elf_arm(void *mem, size_t size);

View file

@ -1,4 +1,8 @@
#ifndef __COMPEL_SYSCALL_H__
#define __COMPEL_SYSCALL_H__
#define __NR(syscall, compat) __NR_##syscall
#define __NR(syscall, compat) \
({ \
(void)compat; \
__NR_##syscall; \
})
#endif

View file

@ -1,6 +1,7 @@
#ifndef UAPI_COMPEL_ASM_CPU_H__
#define UAPI_COMPEL_ASM_CPU_H__
typedef struct { } compel_cpuinfo_t;
typedef struct {
} compel_cpuinfo_t;
#endif /* UAPI_COMPEL_ASM_CPU_H__ */

View file

@ -4,8 +4,8 @@
#include <stdint.h>
#include <sys/mman.h>
#define SIGMAX 64
#define SIGMAX_OLD 31
#define SIGMAX 64
#define SIGMAX_OLD 31
/*
* Copied from the Linux kernel header arch/arm/include/asm/ptrace.h
@ -14,53 +14,62 @@
*/
typedef struct {
long uregs[18];
long uregs[18];
} user_regs_struct_t;
typedef struct user_vfp user_fpregs_struct_t;
#define __compel_arch_fetch_thread_area(tid, th) 0
#define compel_arch_fetch_thread_area(tctl) 0
#define compel_arch_get_tls_task(ctl, tls)
#define compel_arch_get_tls_thread(tctl, tls)
#define ARM_cpsr uregs[16]
#define ARM_pc uregs[15]
#define ARM_lr uregs[14]
#define ARM_sp uregs[13]
#define ARM_ip uregs[12]
#define ARM_fp uregs[11]
#define ARM_r10 uregs[10]
#define ARM_r9 uregs[9]
#define ARM_r8 uregs[8]
#define ARM_r7 uregs[7]
#define ARM_r6 uregs[6]
#define ARM_r5 uregs[5]
#define ARM_r4 uregs[4]
#define ARM_r3 uregs[3]
#define ARM_r2 uregs[2]
#define ARM_r1 uregs[1]
#define ARM_r0 uregs[0]
#define ARM_ORIG_r0 uregs[17]
typedef struct user_vfp user_fpregs_struct_t;
#define ARM_cpsr uregs[16]
#define ARM_pc uregs[15]
#define ARM_lr uregs[14]
#define ARM_sp uregs[13]
#define ARM_ip uregs[12]
#define ARM_fp uregs[11]
#define ARM_r10 uregs[10]
#define ARM_r9 uregs[9]
#define ARM_r8 uregs[8]
#define ARM_r7 uregs[7]
#define ARM_r6 uregs[6]
#define ARM_r5 uregs[5]
#define ARM_r4 uregs[4]
#define ARM_r3 uregs[3]
#define ARM_r2 uregs[2]
#define ARM_r1 uregs[1]
#define ARM_r0 uregs[0]
#define ARM_ORIG_r0 uregs[17]
/* Copied from arch/arm/include/asm/user.h */
struct user_vfp {
unsigned long long fpregs[32];
unsigned long fpscr;
unsigned long long fpregs[32];
unsigned long fpscr;
};
struct user_vfp_exc {
unsigned long fpexc;
unsigned long fpinst;
unsigned long fpinst2;
unsigned long fpexc;
unsigned long fpinst;
unsigned long fpinst2;
};
#define REG_RES(regs) ((regs).ARM_r0)
#define REG_IP(regs) ((regs).ARM_pc)
#define REG_SP(regs) ((regs).ARM_sp)
#define REG_SYSCALL_NR(regs) ((regs).ARM_r7)
#define REG_RES(regs) ((regs).ARM_r0)
#define REG_IP(regs) ((regs).ARM_pc)
#define SET_REG_IP(regs, val) ((regs).ARM_pc = (val))
#define REG_SP(regs) ((regs).ARM_sp)
#define REG_SYSCALL_NR(regs) ((regs).ARM_r7)
#define user_regs_native(pregs) true
#define user_regs_native(pregs) true
#define ARCH_SI_TRAP TRAP_BRKPT
#define ARCH_SI_TRAP TRAP_BRKPT
#define __NR(syscall, compat) __NR_##syscall
#define __NR(syscall, compat) \
({ \
(void)compat; \
__NR_##syscall; \
})
#endif /* UAPI_COMPEL_ASM_TYPES_H__ */

View file

@ -6,37 +6,37 @@
/*
* PSR bits
*/
#define USR26_MODE 0x00000000
#define FIQ26_MODE 0x00000001
#define IRQ26_MODE 0x00000002
#define SVC26_MODE 0x00000003
#define USR_MODE 0x00000010
#define FIQ_MODE 0x00000011
#define IRQ_MODE 0x00000012
#define SVC_MODE 0x00000013
#define ABT_MODE 0x00000017
#define UND_MODE 0x0000001b
#define SYSTEM_MODE 0x0000001f
#define MODE32_BIT 0x00000010
#define MODE_MASK 0x0000001f
#define PSR_T_BIT 0x00000020
#define PSR_F_BIT 0x00000040
#define PSR_I_BIT 0x00000080
#define PSR_A_BIT 0x00000100
#define PSR_E_BIT 0x00000200
#define PSR_J_BIT 0x01000000
#define PSR_Q_BIT 0x08000000
#define PSR_V_BIT 0x10000000
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
#define PSR_N_BIT 0x80000000
#define USR26_MODE 0x00000000
#define FIQ26_MODE 0x00000001
#define IRQ26_MODE 0x00000002
#define SVC26_MODE 0x00000003
#define USR_MODE 0x00000010
#define FIQ_MODE 0x00000011
#define IRQ_MODE 0x00000012
#define SVC_MODE 0x00000013
#define ABT_MODE 0x00000017
#define UND_MODE 0x0000001b
#define SYSTEM_MODE 0x0000001f
#define MODE32_BIT 0x00000010
#define MODE_MASK 0x0000001f
#define PSR_T_BIT 0x00000020
#define PSR_F_BIT 0x00000040
#define PSR_I_BIT 0x00000080
#define PSR_A_BIT 0x00000100
#define PSR_E_BIT 0x00000200
#define PSR_J_BIT 0x01000000
#define PSR_Q_BIT 0x08000000
#define PSR_V_BIT 0x10000000
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
#define PSR_N_BIT 0x80000000
/*
* Groups of PSR bits
*/
#define PSR_f 0xff000000 /* Flags */
#define PSR_s 0x00ff0000 /* Status */
#define PSR_x 0x0000ff00 /* Extension */
#define PSR_c 0x000000ff /* Control */
#define PSR_f 0xff000000 /* Flags */
#define PSR_s 0x00ff0000 /* Status */
#define PSR_x 0x0000ff00 /* Extension */
#define PSR_c 0x000000ff /* Control */
#endif

View file

@ -6,42 +6,42 @@
/* Copied from the Linux kernel header arch/arm/include/asm/sigcontext.h */
struct rt_sigcontext {
unsigned long trap_no;
unsigned long error_code;
unsigned long oldmask;
unsigned long arm_r0;
unsigned long arm_r1;
unsigned long arm_r2;
unsigned long arm_r3;
unsigned long arm_r4;
unsigned long arm_r5;
unsigned long arm_r6;
unsigned long arm_r7;
unsigned long arm_r8;
unsigned long arm_r9;
unsigned long arm_r10;
unsigned long arm_fp;
unsigned long arm_ip;
unsigned long arm_sp;
unsigned long arm_lr;
unsigned long arm_pc;
unsigned long arm_cpsr;
unsigned long fault_address;
unsigned long trap_no;
unsigned long error_code;
unsigned long oldmask;
unsigned long arm_r0;
unsigned long arm_r1;
unsigned long arm_r2;
unsigned long arm_r3;
unsigned long arm_r4;
unsigned long arm_r5;
unsigned long arm_r6;
unsigned long arm_r7;
unsigned long arm_r8;
unsigned long arm_r9;
unsigned long arm_r10;
unsigned long arm_fp;
unsigned long arm_ip;
unsigned long arm_sp;
unsigned long arm_lr;
unsigned long arm_pc;
unsigned long arm_cpsr;
unsigned long fault_address;
};
/* Copied from the Linux kernel header arch/arm/include/asm/ucontext.h */
#define VFP_MAGIC 0x56465001
#define VFP_STORAGE_SIZE sizeof(struct vfp_sigframe)
#define VFP_MAGIC 0x56465001
#define VFP_STORAGE_SIZE sizeof(struct vfp_sigframe)
struct vfp_sigframe {
unsigned long magic;
unsigned long size;
struct user_vfp ufp;
struct user_vfp_exc ufp_exc;
unsigned long magic;
unsigned long size;
struct user_vfp ufp;
struct user_vfp_exc ufp_exc;
};
typedef struct vfp_sigframe fpu_state_t;
typedef struct vfp_sigframe fpu_state_t;
struct aux_sigframe {
/*
@ -49,23 +49,23 @@ struct aux_sigframe {
struct iwmmxt_sigframe iwmmxt;
*/
struct vfp_sigframe vfp;
unsigned long end_magic;
struct vfp_sigframe vfp;
unsigned long end_magic;
} __attribute__((aligned(8)));
#include <compel/sigframe-common.h>
struct sigframe {
struct rt_ucontext uc;
unsigned long retcode[2];
struct rt_ucontext uc;
unsigned long retcode[2];
};
struct rt_sigframe {
struct rt_siginfo info;
struct sigframe sig;
struct rt_siginfo info;
struct sigframe sig;
};
/* clang-format off */
#define ARCH_RT_SIGRETURN(new_sp, rt_sigframe) \
asm volatile( \
"mov sp, %0 \n" \
@ -74,17 +74,16 @@ struct rt_sigframe {
: \
: "r"(new_sp) \
: "memory")
/* clang-format on */
#define RT_SIGFRAME_UC(rt_sigframe) (&rt_sigframe->sig.uc)
#define RT_SIGFRAME_REGIP(rt_sigframe) (rt_sigframe)->sig.uc.uc_mcontext.arm_ip
#define RT_SIGFRAME_HAS_FPU(rt_sigframe) 1
#define RT_SIGFRAME_AUX_SIGFRAME(rt_sigframe) ((struct aux_sigframe *)&(rt_sigframe)->sig.uc.uc_regspace)
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_SIGFRAME(rt_sigframe)->vfp)
#define RT_SIGFRAME_OFFSET(rt_sigframe) 0
#define RT_SIGFRAME_UC(rt_sigframe) (&rt_sigframe->sig.uc)
#define RT_SIGFRAME_REGIP(rt_sigframe) (rt_sigframe)->sig.uc.uc_mcontext.arm_ip
#define RT_SIGFRAME_HAS_FPU(rt_sigframe) 1
#define RT_SIGFRAME_AUX_SIGFRAME(rt_sigframe) ((struct aux_sigframe *)&(rt_sigframe)->sig.uc.uc_regspace)
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_SIGFRAME(rt_sigframe)->vfp)
#define RT_SIGFRAME_OFFSET(rt_sigframe) 0
#define rt_sigframe_erase_sigset(sigframe) \
memset(&sigframe->sig.uc.uc_sigmask, 0, sizeof(k_rtsigset_t))
#define rt_sigframe_copy_sigset(sigframe, from) \
memcpy(&sigframe->sig.uc.uc_sigmask, from, sizeof(k_rtsigset_t))
#define rt_sigframe_erase_sigset(sigframe) memset(&sigframe->sig.uc.uc_sigmask, 0, sizeof(k_rtsigset_t))
#define rt_sigframe_copy_sigset(sigframe, from) memcpy(&sigframe->sig.uc.uc_sigmask, from, sizeof(k_rtsigset_t))
#endif /* UAPI_COMPEL_ASM_SIGFRAME_H__ */

View file

@ -1,8 +1,11 @@
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <string.h>
#include <compel/plugins/std/syscall-codes.h>
#include <compel/asm/processor-flags.h>
#include <errno.h>
#include "common/page.h"
#include "uapi/compel/asm/infect-types.h"
#include "log.h"
@ -14,12 +17,11 @@
* Injected syscall instruction
*/
const char code_syscall[] = {
0x00, 0x00, 0x00, 0xef, /* SVC #0 */
0xf0, 0x01, 0xf0, 0xe7 /* UDF #32 */
0x00, 0x00, 0x00, 0xef, /* SVC #0 */
0xf0, 0x01, 0xf0, 0xe7 /* UDF #32 */
};
static const int
code_syscall_aligned = round_up(sizeof(code_syscall), sizeof(long));
static const int code_syscall_aligned = round_up(sizeof(code_syscall), sizeof(long));
static inline __always_unused void __check_code_syscall(void)
{
@ -27,9 +29,7 @@ static inline __always_unused void __check_code_syscall(void)
BUILD_BUG_ON(!is_log2(sizeof(code_syscall)));
}
int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe,
user_regs_struct_t *regs,
user_fpregs_struct_t *fpregs)
int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe, user_regs_struct_t *regs, user_fpregs_struct_t *fpregs)
{
struct aux_sigframe *aux = (struct aux_sigframe *)(void *)&sigframe->sig.uc.uc_regspace;
@ -59,22 +59,20 @@ int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe,
return 0;
}
int sigreturn_prep_fpu_frame_plain(struct rt_sigframe *sigframe,
struct rt_sigframe *rsigframe)
int sigreturn_prep_fpu_frame_plain(struct rt_sigframe *sigframe, struct rt_sigframe *rsigframe)
{
return 0;
}
#define PTRACE_GETVFPREGS 27
int get_task_regs(pid_t pid, user_regs_struct_t *regs, save_regs_t save,
void *arg, __maybe_unused unsigned long flags)
int compel_get_task_regs(pid_t pid, user_regs_struct_t *regs, user_fpregs_struct_t *vfp, save_regs_t save,
void *arg, __maybe_unused unsigned long flags)
{
user_fpregs_struct_t vfp;
int ret = -1;
pr_info("Dumping GP/FPU registers for %d\n", pid);
if (ptrace(PTRACE_GETVFPREGS, pid, NULL, &vfp)) {
if (ptrace(PTRACE_GETVFPREGS, pid, NULL, vfp)) {
pr_perror("Can't obtain FPU registers for %d", pid);
goto err;
}
@ -90,24 +88,30 @@ int get_task_regs(pid_t pid, user_regs_struct_t *regs, save_regs_t save,
regs->ARM_pc -= 4;
break;
case -ERESTART_RESTARTBLOCK:
regs->ARM_r0 = __NR_restart_syscall;
regs->ARM_pc -= 4;
pr_warn("Will restore %d with interrupted system call\n", pid);
regs->ARM_r0 = -EINTR;
break;
}
}
ret = save(arg, regs, &vfp);
ret = save(pid, arg, regs, vfp);
err:
return ret;
}
int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret,
unsigned long arg1,
unsigned long arg2,
unsigned long arg3,
unsigned long arg4,
unsigned long arg5,
unsigned long arg6)
int compel_set_task_ext_regs(pid_t pid, user_fpregs_struct_t *ext_regs)
{
pr_info("Restoring GP/FPU registers for %d\n", pid);
if (ptrace(PTRACE_SETVFPREGS, pid, NULL, ext_regs)) {
pr_perror("Can't set FPU registers for %d", pid);
return -1;
}
return 0;
}
int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret, unsigned long arg1, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5, unsigned long arg6)
{
user_regs_struct_t regs = ctl->orig.regs;
int err;
@ -126,9 +130,7 @@ int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret,
return err;
}
void *remote_mmap(struct parasite_ctl *ctl,
void *addr, size_t length, int prot,
int flags, int fd, off_t offset)
void *remote_mmap(struct parasite_ctl *ctl, void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
long map;
int err;
@ -136,8 +138,7 @@ void *remote_mmap(struct parasite_ctl *ctl,
if (offset & ~PAGE_MASK)
return 0;
err = compel_syscall(ctl, __NR_mmap2, &map,
(unsigned long)addr, length, prot, flags, fd, offset >> 12);
err = compel_syscall(ctl, __NR_mmap2, &map, (unsigned long)addr, length, prot, flags, fd, offset >> 12);
if (err < 0 || map > ctl->ictx.task_size)
map = 0;
@ -167,9 +168,7 @@ int arch_fetch_sas(struct parasite_ctl *ctl, struct rt_sigframe *s)
long ret;
int err;
err = compel_syscall(ctl, __NR_sigaltstack,
&ret, 0, (unsigned long)&s->sig.uc.uc_stack,
0, 0, 0, 0);
err = compel_syscall(ctl, __NR_sigaltstack, &ret, 0, (unsigned long)&s->sig.uc.uc_stack, 0, 0, 0, 0);
return err ? err : ret;
}
@ -178,9 +177,9 @@ int arch_fetch_sas(struct parasite_ctl *ctl, struct rt_sigframe *s)
* arch/arm/include/asm/memory.h
* arch/arm/Kconfig (PAGE_OFFSET values in Memory split section)
*/
#define TASK_SIZE_MIN 0x3f000000
#define TASK_SIZE_MAX 0xbf000000
#define SZ_1G 0x40000000
#define TASK_SIZE_MIN 0x3f000000
#define TASK_SIZE_MAX 0xbf000000
#define SZ_1G 0x40000000
unsigned long compel_task_size(void)
{
@ -192,4 +191,3 @@ unsigned long compel_task_size(void)
return task_size;
}

View file

@ -0,0 +1,35 @@
#ifndef __ASM_PROLOGUE_H__
#define __ASM_PROLOGUE_H__
#ifndef __ASSEMBLY__
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#define sys_recv(sockfd, ubuf, size, flags) sys_recvfrom(sockfd, ubuf, size, flags, NULL, NULL)
typedef struct prologue_init_args {
struct sockaddr_un ctl_sock_addr;
unsigned int ctl_sock_addr_len;
unsigned int arg_s;
void *arg_p;
void *sigframe;
} prologue_init_args_t;
#endif /* __ASSEMBLY__ */
/*
* Reserve enough space for sigframe.
*
* FIXME It is rather should be taken from sigframe header.
*/
#define PROLOGUE_SGFRAME_SIZE 4096
#define PROLOGUE_INIT_ARGS_SIZE 1024
#endif /* __ASM_PROLOGUE_H__ */

View file

@ -0,0 +1,30 @@
#ifndef COMPEL_ARCH_SYSCALL_TYPES_H__
#define COMPEL_ARCH_SYSCALL_TYPES_H__
#include <common/asm/bitsperlong.h>
/* Types for sigaction, sigprocmask syscalls */
typedef void rt_signalfn_t(int, siginfo_t *, void *);
typedef rt_signalfn_t *rt_sighandler_t;
typedef void rt_restorefn_t(void);
typedef rt_restorefn_t *rt_sigrestore_t;
/* refer to arch/loongarch/include/uapi/asm/signal.h */
#define _KNSIG 64
#define _NSIG_BPW BITS_PER_LONG
#define _KNSIG_WORDS (_KNSIG / _NSIG_BPW)
typedef struct {
uint64_t sig[_KNSIG_WORDS];
} k_rtsigset_t;
typedef struct {
rt_sighandler_t rt_sa_handler;
unsigned long rt_sa_flags;
rt_sigrestore_t rt_sa_restorer;
k_rtsigset_t rt_sa_mask;
} rt_sigaction_t;
#define SA_RESTORER 0x04000000
#endif /* COMPEL_ARCH_SYSCALL_TYPES_H__ */

View file

@ -0,0 +1,4 @@
#ifndef __COMPEL_ARCH_FEATURES_H
#define __COMPEL_ARCH_FEATURES_H
#endif /* __COMPEL_ARCH_FEATURES_H */

View file

@ -0,0 +1,9 @@
#include "common/asm/linkage.h"
.section .head.text, "ax"
ENTRY(__export_parasite_head_start)
bl parasite_service;
break 0;
END(__export_parasite_head_start)

View file

@ -0,0 +1,117 @@
std-lib-y += ./$(PLUGIN_ARCH_DIR)/std/syscalls-64.o
sys-proto-types := $(obj)/include/uapi/std/syscall-types.h
sys-proto-generic := $(obj)/include/uapi/std/syscall.h
sys-codes-generic := $(obj)/include/uapi/std/syscall-codes.h
sys-codes = $(obj)/include/uapi/std/syscall-codes-$(1).h
sys-proto = $(obj)/include/uapi/std/syscall-$(1).h
sys-def = $(PLUGIN_ARCH_DIR)/std/syscalls/syscall_$(1).tbl
sys-asm = $(PLUGIN_ARCH_DIR)/std/syscalls-$(1).S
sys-asm-common-name = std/syscalls/syscall-common-loongarch-$(1).S
sys-asm-common = $(PLUGIN_ARCH_DIR)/$(sys-asm-common-name)
sys-asm-types := $(obj)/include/uapi/std/asm/syscall-types.h
sys-exec-tbl = $(PLUGIN_ARCH_DIR)/std/sys-exec-tbl-$(1).c
sys-bits := 64
AV := $$$$
define gen-rule-sys-codes
$(sys-codes): $(sys-def) $(sys-proto-types)
$(call msg-gen, $$@)
$(Q) echo "/* Autogenerated, don't edit */" > $$@
$(Q) echo "#ifndef ASM_SYSCALL_CODES_H_$(1)__" >> $$@
$(Q) echo "#define ASM_SYSCALL_CODES_H_$(1)__" >> $$@
$(Q) cat $$< | awk '/^__NR/{SYSN=$(AV)1; \
sub("^__NR", "SYS", SYSN); \
print "\n#ifndef ", $(AV)1; \
print "#define", $(AV)1, $(AV)2; \
print "#endif"; \
print "\n#ifndef ", SYSN; \
print "#define ", SYSN, $(AV)1; \
print "#endif";}' >> $$@
$(Q) echo "#endif /* ASM_SYSCALL_CODES_H_$(1)__ */" >> $$@
endef
define gen-rule-sys-proto
$(sys-proto): $(sys-def) $(sys-proto-types)
$(call msg-gen, $$@)
$(Q) echo "/* Autogenerated, don't edit */" > $$@
$(Q) echo "#ifndef ASM_SYSCALL_PROTO_H_$(1)__" >> $$@
$(Q) echo "#define ASM_SYSCALL_PROTO_H_$(1)__" >> $$@
$(Q) echo '#include <compel/plugins/std/syscall-codes-$(1).h>' >> $$@
$(Q) echo '#include <compel/plugins/std/syscall-types.h>' >> $$@
ifeq ($(1),32)
$(Q) echo '#include "asm/syscall32.h"' >> $$@
endif
$(Q) cat $$< | awk '/^__NR/{print "extern long", $(AV)3, \
substr($(AV)0, index($(AV)0,$(AV)4)), ";"}' >> $$@
$(Q) echo "#endif /* ASM_SYSCALL_PROTO_H_$(1)__ */" >> $$@
endef
define gen-rule-sys-asm
$(sys-asm): $(sys-def) $(sys-asm-common) $(sys-codes) $(sys-proto) $(sys-proto-types)
$(call msg-gen, $$@)
$(Q) echo "/* Autogenerated, don't edit */" > $$@
$(Q) echo '#include <compel/plugins/std/syscall-codes-$(1).h>' >> $$@
$(Q) echo '#include "$(sys-asm-common-name)"' >> $$@
$(Q) cat $$< | awk '/^__NR/{print "SYSCALL(", $(AV)3, ",", $(AV)2, ")"}' >> $$@
endef
define gen-rule-sys-exec-tbl
$(sys-exec-tbl): $(sys-def) $(sys-codes) $(sys-proto) $(sys-proto-generic) $(sys-proto-types)
$(call msg-gen, $$@)
$(Q) echo "/* Autogenerated, don't edit */" > $$@
$(Q) cat $$< | awk '/^__NR/{print \
"SYSCALL(", substr($(AV)3, 5), ",", $(AV)2, ")"}' >> $$@
endef
$(sys-codes-generic): $(sys-proto-types)
$(call msg-gen, $@)
$(Q) echo "/* Autogenerated, don't edit */" > $@
$(Q) echo "#ifndef __ASM_CR_SYSCALL_CODES_H__" >> $@
$(Q) echo "#define __ASM_CR_SYSCALL_CODES_H__" >> $@
$(Q) echo '#include <compel/plugins/std/syscall-codes-64.h>' >> $@
$(Q) cat $< | awk '/^__NR/{NR32=$$1; \
sub("^__NR", "__NR32", NR32); \
print "\n#ifndef ", NR32; \
print "#define ", NR32, $$2; \
print "#endif";}' >> $@
$(Q) echo "#endif /* __ASM_CR_SYSCALL_CODES_H__ */" >> $@
mrproper-y += $(sys-codes-generic)
$(sys-proto-generic): $(strip $(call map,sys-proto,$(sys-bits))) $(sys-proto-types)
$(call msg-gen, $@)
$(Q) echo "/* Autogenerated, don't edit */" > $@
$(Q) echo "#ifndef __ASM_CR_SYSCALL_PROTO_H__" >> $@
$(Q) echo "#define __ASM_CR_SYSCALL_PROTO_H__" >> $@
$(Q) echo "" >> $@
$(Q) echo '#include <compel/plugins/std/syscall-64.h>' >> $@
$(Q) echo "" >> $@
$(Q) echo "#endif /* __ASM_CR_SYSCALL_PROTO_H__ */" >> $@
mrproper-y += $(sys-proto-generic)
define gen-rule-sys-exec-tbl
$(sys-exec-tbl): $(sys-def) $(sys-codes) $(sys-proto) $(sys-proto-generic)
$(call msg-gen, $$@)
$(Q) echo "/* Autogenerated, don't edit */" > $$@
$(Q) cat $$< | awk '/^__NR/{print \
"SYSCALL(", substr($(AV)3, 5), ",", $(AV)2, ")"}' >> $$@
endef
$(eval $(call map,gen-rule-sys-codes,$(sys-bits)))
$(eval $(call map,gen-rule-sys-proto,$(sys-bits)))
$(eval $(call map,gen-rule-sys-asm,$(sys-bits)))
$(eval $(call map,gen-rule-sys-exec-tbl,$(sys-bits)))
$(sys-asm-types): $(PLUGIN_ARCH_DIR)/include/asm/syscall-types.h
$(call msg-gen, $@)
$(Q) ln -s ../../../../../../$(PLUGIN_ARCH_DIR)/include/asm/syscall-types.h $(sys-asm-types)
std-headers-deps += $(call sys-codes,$(sys-bits))
std-headers-deps += $(call sys-proto,$(sys-bits))
std-headers-deps += $(call sys-asm,$(sys-bits))
std-headers-deps += $(call sys-exec-tbl,$(sys-bits))
std-headers-deps += $(sys-codes-generic)
std-headers-deps += $(sys-proto-generic)
std-headers-deps += $(sys-asm-types)
mrproper-y += $(std-headers-deps)

View file

@ -0,0 +1,44 @@
#include "common/asm/linkage.h"
#define SYSCALL(name, opcode) \
ENTRY(name); \
addi.d $a7, $zero, opcode; \
syscall 0; \
jirl $r0, $r1, 0; \
END(name)
#ifndef AT_FDCWD
#define AT_FDCWD -100
#endif
#ifndef AT_REMOVEDIR
#define AT_REMOVEDIR 0x200
#endif
ENTRY(sys_open)
or $a3, $zero, $a2
or $a2, $zero, $a1
or $a1, $zero, $a0
addi.d $a0, $zero, AT_FDCWD
b sys_openat
END(sys_open)
ENTRY(sys_mkdir)
or $a3, $zero, $a2
or $a2, $zero, $a1
or $a1, $zero, $a0
addi.d $a0, $zero, AT_FDCWD
b sys_mkdirat
END(sys_mkdir)
ENTRY(sys_rmdir)
addi.d $a2, $zero, AT_REMOVEDIR
or $a1, $zero, $a0
addi.d $a0, $zero, AT_FDCWD
b sys_unlinkat
END(sys_rmdir)
ENTRY(__cr_restore_rt)
addi.d $a7, $zero, __NR_rt_sigreturn
syscall 0
END(__cr_restore_rt)

View file

@ -0,0 +1,122 @@
#
# System calls table, please make sure the table consist only the syscalls
# really used somewhere in project.
# from kernel/linux-3.10.84/arch/mips/include/uapi/asm/unistd.h Linux 64-bit syscalls are in the range from 5000 to 5999.
#
# __NR_name code name arguments
# -------------------------------------------------------------------------------------------------------------------------------------------------------------
__NR_io_setup 0 sys_io_setup (unsigned nr_events, aio_context_t *ctx)
__NR_io_submit 2 sys_io_submit (aio_context_t ctx, long nr, struct iocb **iocbpp)
__NR_io_getevents 4 sys_io_getevents (aio_context_t ctx, long min_nr, long nr, struct io_event *evs, struct timespec *tmo)
__NR_fcntl 25 sys_fcntl (int fd, int type, long arg)
__NR_ioctl 29 sys_ioctl (unsigned int fd, unsigned int cmd, unsigned long arg)
__NR_flock 32 sys_flock (int fd, unsigned long cmd)
__NR_mkdirat 34 sys_mkdirat (int dfd, const char *pathname, int flag)
__NR_unlinkat 35 sys_unlinkat (int dfd, const char *pathname, int flag)
__NR_umount2 39 sys_umount2 (char *name, int flags)
__NR_mount 40 sys_mount (char *dev_nmae, char *dir_name, char *type, unsigned long flags, void *data)
__NR_fallocate 47 sys_fallocate (int fd, int mode, loff_t offset, loff_t len)
__NR_close 57 sys_close (int fd)
__NR_openat 56 sys_openat (int dfd, const char *filename, int flags, int mode)
__NR_lseek 62 sys_lseek (int fd, unsigned long offset, unsigned long origin)
__NR_read 63 sys_read (int fd, void *buf, unsigned long count)
__NR_write 64 sys_write (int fd, const void *buf, unsigned long count)
__NR_pread64 67 sys_pread (unsigned int fd, char *buf, size_t count, loff_t pos)
__NR_preadv 69 sys_preadv_raw (int fd, struct iovec *iov, unsigned long nr, unsigned long pos_l, unsigned long pos_h)
__NR_ppoll 73 sys_ppoll (struct pollfd *fds, unsigned int nfds, const struct timespec *tmo, const sigset_t *sigmask, size_t sigsetsize)
__NR_signalfd4 74 sys_signalfd (int fd, k_rtsigset_t *mask, size_t sizemask, int flags)
__NR_vmsplice 75 sys_vmsplice (int fd, const struct iovec *iov, unsigned long nr_segs, unsigned int flags)
__NR_readlinkat 78 sys_readlinkat (int fd, const char *path, char *buf, int bufsize)
__NR_timerfd_settime 86 sys_timerfd_settime (int ufd, int flags, const struct itimerspec *utmr, struct itimerspec *otmr)
__NR_capget 90 sys_capget (struct cap_header *h, struct cap_data *d)
__NR_capset 91 sys_capset (struct cap_header *h, struct cap_data *d)
__NR_personality 92 sys_personality (unsigned int personality)
__NR_exit 93 sys_exit (unsigned long error_code)
__NR_exit_group 94 sys_exit_group (int error_code)
__NR_waitid 95 sys_waitid (int which, pid_t pid, struct siginfo *infop, int options, struct rusage *ru)
__NR_set_tid_address 96 sys_set_tid_address (int *tid_addr)
__NR_futex 98 sys_futex (uint32_t *uaddr, int op, uint32_t val, struct timespec *utime, uint32_t *uaddr2, uint32_t val3)
__NR_set_robust_list 99 sys_set_robust_list (struct robust_list_head *head, size_t len)
__NR_get_robust_list 100 sys_get_robust_list (int pid, struct robust_list_head **head_ptr, size_t *len_ptr)
__NR_nanosleep 101 sys_nanosleep (struct timespec *req, struct timespec *rem)
__NR_getitimer 102 sys_getitimer (int which, const struct itimerval *val)
__NR_setitimer 103 sys_setitimer (int which, const struct itimerval *val, struct itimerval *old)
__NR_sys_timer_create 107 sys_timer_create (clockid_t which_clock, struct sigevent *timer_event_spec, kernel_timer_t *created_timer_id)
__NR_sys_timer_gettime 108 sys_timer_gettime (int timer_id, const struct itimerspec *setting)
__NR_sys_timer_getoverrun 109 sys_timer_getoverrun (int timer_id)
__NR_sys_timer_settime 110 sys_timer_settime (kernel_timer_t timer_id, int flags, const struct itimerspec *new_setting, struct itimerspec *old_setting)
__NR_sys_timer_delete 111 sys_timer_delete (kernel_timer_t timer_id)
__NR_clock_gettime 113 sys_clock_gettime (clockid_t which_clock, struct timespec *tp)
__NR_sched_setscheduler 119 sys_sched_setscheduler (int pid, int policy, struct sched_param *p)
__NR_restart_syscall 128 sys_restart_syscall (void)
__NR_kill 129 sys_kill (long pid, int sig)
__NR_sigaltstack 132 sys_sigaltstack (const void *uss, void *uoss)
__NR_rt_sigaction 134 sys_sigaction (int signum, const rt_sigaction_t *act, rt_sigaction_t *oldact, size_t sigsetsize)
__NR_rt_sigprocmask 135 sys_sigprocmask (int how, k_rtsigset_t *set, k_rtsigset_t *old, size_t sigsetsize)
__NR_rt_sigqueueinfo 138 sys_rt_sigqueueinfo (pid_t pid, int sig, siginfo_t *info)
__NR_rt_sigreturn 139 sys_rt_sigreturn (void)
__NR_setpriority 140 sys_setpriority (int which, int who, int nice)
__NR_setresuid 147 sys_setresuid (int uid, int euid, int suid)
__NR_getresuid 148 sys_getresuid (int *uid, int *euid, int *suid)
__NR_setresgid 149 sys_setresgid (int gid, int egid, int sgid)
__NR_getresgid 150 sys_getresgid (int *gid, int *egid, int *sgid)
__NR_getpgid 155 sys_getpgid (pid_t pid)
__NR_setfsuid 151 sys_setfsuid (int fsuid)
__NR_setfsgid 152 sys_setfsgid (int fsgid)
__NR_getsid 156 sys_getsid (void)
__NR_getgroups 158 sys_getgroups (int gsize, unsigned int *groups)
__NR_setgroups 159 sys_setgroups (int gsize, unsigned int *groups)
__NR_setrlimit 164 sys_setrlimit (int resource, struct krlimit *rlim)
__NR_umask 166 sys_umask (int mask)
__NR_prctl 167 sys_prctl (int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5)
__NR_gettimeofday 169 sys_gettimeofday (struct timeval *tv, struct timezone *tz)
__NR_getpid 172 sys_getpid (void)
__NR_ptrace 177 sys_ptrace (long request, pid_t pid, void *addr, void *data)
__NR_gettid 178 sys_gettid (void)
__NR_shmat 196 sys_shmat (int shmid, void *shmaddr, int shmflag)
__NR_socket 198 sys_socket (int domain, int type, int protocol)
__NR_bind 200 sys_bind (int sockfd, const struct sockaddr *addr, int addrlen)
__NR_connect 203 sys_connect (int sockfd, struct sockaddr *addr, int addrlen)
__NR_sendto 206 sys_sendto (int sockfd, void *buff, size_t len, unsigned int flags, struct sockaddr *addr, int addr_len)
__NR_recvfrom 207 sys_recvfrom (int sockfd, void *ubuf, size_t size, unsigned int flags, struct sockaddr *addr, int *addr_len)
__NR_setsockopt 208 sys_setsockopt (int sockfd, int level, int optname, const void *optval, socklen_t optlen)
__NR_getsockopt 209 sys_getsockopt (int sockfd, int level, int optname, const void *optval, socklen_t *optlen)
__NR_shutdown 210 sys_shutdown (int sockfd, int how)
__NR_sendmsg 211 sys_sendmsg (int sockfd, const struct msghdr *msg, int flags)
__NR_recvmsg 212 sys_recvmsg (int sockfd, struct msghdr *msg, int flags)
__NR_brk 214 sys_brk (void *addr)
__NR_munmap 215 sys_munmap (void *addr, unsigned long len)
__NR_mremap 216 sys_mremap (unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags, unsigned long new_addr)
__NR_clone 220 sys_clone (unsigned long flags, void *child_stack, void *parent_tid, unsigned long newtls, void *child_tid)
__NR_mmap 222 sys_mmap (void *addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long offset)
__NR_mprotect 226 sys_mprotect (const void *addr, unsigned long len, unsigned long prot)
__NR_mincore 232 sys_mincore (void *addr, unsigned long size, unsigned char *vec)
__NR_madvise 233 sys_madvise (unsigned long start, size_t len, int behavior)
__NR_rt_tgsigqueueinfo 240 sys_rt_tgsigqueueinfo (pid_t tgid, pid_t pid, int sig, siginfo_t *info)
__NR_wait4 260 sys_wait4 (int pid, int *status, int options, struct rusage *ru)
__NR_fanotify_init 262 sys_fanotify_init (unsigned int flags, unsigned int event_f_flags)
__NR_fanotify_mark 263 sys_fanotify_mark (int fanotify_fd, unsigned int flags, uint64_t mask, int dfd, const char *pathname)
__NR_open_by_handle_at 265 sys_open_by_handle_at (int mountdirfd, struct file_handle *handle, int flags)
__NR_setns 268 sys_setns (int fd, int nstype)
__NR_kcmp 272 sys_kcmp (pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2)
__NR_seccomp 277 sys_seccomp (unsigned int op, unsigned int flags, const char *uargs)
__NR_memfd_create 279 sys_memfd_create (const char *name, unsigned int flags)
__NR_userfaultfd 282 sys_userfaultfd (int flags)
__NR_membarrier 283 sys_membarrier (int cmd, unsigned int flags, int cpu_id)
__NR_rseq 293 sys_rseq (void *rseq, uint32_t rseq_len, int flags, uint32_t sig)
__NR_open_tree 428 sys_open_tree (int dirfd, const char *pathname, unsigned int flags)
__NR_move_mount 429 sys_move_mount (int from_dfd, const char *from_pathname, int to_dfd, const char *to_pathname, int flags)
__NR_fsopen 430 sys_fsopen (char *fsname, unsigned int flags)
__NR_fsconfig 431 sys_fsconfig (int fd, unsigned int cmd, const char *key, const char *value, int aux)
__NR_fsmount 432 sys_fsmount (int fd, unsigned int flags, unsigned int attr_flags)
__NR_pidfd_open 434 sys_pidfd_open (pid_t pid, unsigned int flags)
__NR_clone3 435 sys_clone3 (struct clone_args *uargs, size_t size)
__NR_openat2 437 sys_openat2 (int dirfd, char *pathname, struct open_how *how, size_t size)
__NR_pidfd_getfd 438 sys_pidfd_getfd (int pidfd, int targetfd, unsigned int flags)
#__NR_dup2 ! sys_dup2 (int oldfd, int newfd)
#__NR_rmdir ! sys_rmdir (const char *name)
#__NR_unlink ! sys_unlink (char *pathname)
#__NR_cacheflush ! sys_cacheflush (char *addr, int nbytes, int cache)
#__NR_set_thread_area ! sys_set_thread_area (unsigned long *addr)
#__NR_mkdir ! sys_mkdir (const char *name, int mode)
#__NR_open ! sys_open (const char *filename, unsigned long flags, unsigned long mode)

View file

@ -0,0 +1,32 @@
OUTPUT_ARCH(loongarch)
EXTERN(__export_parasite_head_start)
SECTIONS
{
.crblob 0x0 : {
*(.head.text)
ASSERT(DEFINED(__export_parasite_head_start),
"Symbol __export_parasite_head_start is missing");
*(.text*)
. = ALIGN(32);
*(.data*)
. = ALIGN(32);
*(.rodata*)
. = ALIGN(32);
*(.bss*)
. = ALIGN(32);
*(.got*)
. = ALIGN(32);
*(.toc*)
. = ALIGN(32);
} =0x00000000,
/DISCARD/ : {
*(.debug*)
*(.comment*)
*(.note*)
*(.group*)
*(.eh_frame*)
*(*)
}
}

View file

@ -0,0 +1,41 @@
#include <string.h>
#include <stdbool.h>
#include "compel-cpu.h"
#include "common/bitops.h"
#include "common/compiler.h"
#include "log.h"
#undef LOG_PREFIX
#define LOG_PREFIX "cpu: "
static compel_cpuinfo_t rt_info;
static bool rt_info_done = false;
void compel_set_cpu_cap(compel_cpuinfo_t *c, unsigned int feature)
{
}
void compel_clear_cpu_cap(compel_cpuinfo_t *c, unsigned int feature)
{
}
int compel_test_cpu_cap(compel_cpuinfo_t *c, unsigned int feature)
{
return 0;
}
int compel_cpuid(compel_cpuinfo_t *c)
{
return 0;
}
bool compel_cpu_has_feature(unsigned int feature)
{
if (!rt_info_done) {
compel_cpuid(&rt_info);
rt_info_done = true;
}
return compel_test_cpu_cap(&rt_info, feature);
}

View file

@ -0,0 +1,22 @@
#include <string.h>
#include <errno.h>
#include "handle-elf.h"
#include "piegen.h"
#include "log.h"
static const unsigned char __maybe_unused elf_ident_64_le[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, /* clang-format */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
extern int __handle_elf(void *mem, size_t size);
int handle_binary(void *mem, size_t size)
{
if (memcmp(mem, elf_ident_64_le, sizeof(elf_ident_64_le)) == 0)
return __handle_elf(mem, size);
pr_err("Unsupported Elf format detected\n");
return -EINVAL;
}

View file

@ -0,0 +1,22 @@
#include <string.h>
#include <errno.h>
#include "handle-elf.h"
#include "piegen.h"
#include "log.h"
static const unsigned char __maybe_unused elf_ident_64_le[EI_NIDENT] = {
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, /* clang-format */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
extern int __handle_elf(void *mem, size_t size);
int handle_binary(void *mem, size_t size)
{
if (memcmp(mem, elf_ident_64_le, sizeof(elf_ident_64_le)) == 0)
return __handle_elf(mem, size);
pr_err("Unsupported Elf format detected\n");
return -EINVAL;
}

View file

@ -0,0 +1,8 @@
#ifndef COMPEL_HANDLE_ELF_H__
#define COMPEL_HANDLE_ELF_H__
#include "elf64-types.h"
#define arch_is_machine_supported(e_machine) (e_machine == EM_LOONGARCH)
#endif /* COMPEL_HANDLE_ELF_H__ */

View file

@ -0,0 +1,8 @@
#ifndef __COMPEL_SYSCALL_H__
#define __COMPEL_SYSCALL_H__
#ifndef SIGSTKFLT
#define SIGSTKFLT 16
#endif
#endif

View file

@ -0,0 +1,6 @@
#ifndef __COMPEL_BREAKPOINTS_H__
#define __COMPEL_BREAKPOINTS_H__
#define ARCH_SI_TRAP TRAP_BRKPT
extern int ptrace_set_breakpoint(pid_t pid, void *addr);
extern int ptrace_flush_breakpoints(pid_t pid);
#endif

View file

@ -0,0 +1,6 @@
#ifndef __CR_ASM_CPU_H__
#define __CR_ASM_CPU_H__
typedef struct {
} compel_cpuinfo_t;
#endif /* __CR_ASM_CPU_H__ */

View file

@ -0,0 +1,4 @@
#ifndef __CR_ASM_FPU_H__
#define __CR_ASM_FPU_H__
#endif /* __CR_ASM_FPU_H__ */

View file

@ -0,0 +1,67 @@
#ifndef UAPI_COMPEL_ASM_TYPES_H__
#define UAPI_COMPEL_ASM_TYPES_H__
#include <stdint.h>
#define SIGMAX 64
#define SIGMAX_OLD 31
/*
* From the Linux kernel header arch/loongarch/include/uapi/asm/ptrace.h
*
* A thread LoongArch CPU context
*
* struct user_fp_state {
* uint64_t fpr[32];
* uint64_t fcc;
* uint32_t fcsr;
* };
*
* struct user_pt_regs {
* unsigned long regs[32];
* unsigned long csr_era;
* unsigned long csr_badv;
* unsigned long reserved[11];
* };
*/
struct user_gp_regs {
uint64_t regs[32];
uint64_t orig_a0;
uint64_t pc;
uint64_t csr_badv;
uint64_t reserved[10];
} __attribute__((aligned(8)));
struct user_fp_regs {
uint64_t regs[32];
uint64_t fcc;
uint32_t fcsr;
};
typedef struct user_gp_regs user_regs_struct_t;
typedef struct user_fp_regs user_fpregs_struct_t;
#define user_regs_native(regs) true
#define __compel_arch_fetch_thread_area(tid, th) 0
#define compel_arch_fetch_thread_area(tctl) 0
#define compel_arch_get_tls_task(ctl, tls)
#define compel_arch_get_tls_thread(tctl, tls)
#define REG_RES(r) ((uint64_t)(r).regs[4])
#define REG_IP(r) ((uint64_t)(r).pc)
#define REG_SP(r) ((uint64_t)(r).regs[3])
#define REG_SYSCALL_NR(r) ((uint64_t)(r).regs[11])
#define SET_REG_IP(r, val) ((r).pc = (val))
#define GPR_NUM 32
#define FPR_NUM 32
#define __NR(syscall, compat) \
({ \
(void)compat; \
__NR_##syscall; \
})
#endif /* UAPI_COMPEL_ASM_TYPES_H__ */

View file

@ -0,0 +1,86 @@
#ifndef UAPI_COMPEL_ASM_SIGFRAME_H__
#define UAPI_COMPEL_ASM_SIGFRAME_H__
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <compel/asm/fpu.h>
#include <compel/plugins/std/syscall-codes.h>
#include <asm/types.h>
#define rt_sigcontext sigcontext
/* sigcontext defined in usr/include/uapi/asm/sigcontext.h*/
#include <compel/sigframe-common.h>
typedef __u32 u32;
typedef struct sigcontext_t {
__u64 pc;
__u64 regs[32];
__u32 flags;
__u64 extcontext[0] __attribute__((__aligned__(16)));
} sigcontext_t;
typedef struct context_info_t {
__u32 magic;
__u32 size;
__u64 padding;
} context_info_t;
#define FPU_CTX_MAGIC 0x46505501
#define FPU_CTX_ALIGN 8
typedef struct fpu_context_t {
__u64 regs[32];
__u64 fcc;
__u64 fcsr;
} fpu_context_t;
typedef struct ucontext {
unsigned long uc_flags;
struct ucontext *uc_link;
stack_t uc_stack;
sigset_t uc_sigmask;
__u8 __unused[1024 / 8 - sizeof(sigset_t)];
sigcontext_t uc_mcontext;
} ucontext;
/* Copy from the kernel source arch/loongarch/kernel/signal.c */
struct rt_sigframe {
rt_siginfo_t rs_info;
ucontext rs_uc;
};
#define RT_SIGFRAME_UC(rt_sigframe) (&(rt_sigframe->rs_uc))
#define RT_SIGFRAME_SIGMASK(rt_sigframe) ((k_rtsigset_t *)&RT_SIGFRAME_UC(rt_sigframe)->uc_sigmask)
#define RT_SIGFRAME_SIGCTX(rt_sigframe) (&(RT_SIGFRAME_UC(rt_sigframe)->uc_mcontext))
#define RT_SIGFRAME_REGIP(rt_sigframe) ((long unsigned int)(RT_SIGFRAME_SIGCTX(rt_sigframe)->pc))
#define RT_SIGFRAME_HAS_FPU(rt_sigframe) (1)
#define RT_SIGFRAME_FPU(rt_sigframe) \
({ \
context_info_t *ctx = (context_info_t *)RT_SIGFRAME_SIGCTX(rt_sigframe)->extcontext; \
ctx->magic = FPU_CTX_MAGIC; \
ctx->size = sizeof(context_info_t) + sizeof(fpu_context_t); \
(fpu_context_t *)((char *)ctx + sizeof(context_info_t)); \
})
#define RT_SIGFRAME_OFFSET(rt_sigframe) 0
/* clang-format off */
#define ARCH_RT_SIGRETURN(new_sp, rt_sigframe) \
asm volatile( \
"addi.d $sp, %0, 0 \n" \
"addi.d $a7, $zero, "__stringify(__NR_rt_sigreturn)" \n" \
"syscall 0" \
: \
:"r"(new_sp) \
: "$a7", "memory")
/* clang-format on */
int sigreturn_prep_fpu_frame(struct rt_sigframe *sigframe, struct rt_sigframe *rsigframe);
#define rt_sigframe_erase_sigset(sigframe) memset(RT_SIGFRAME_SIGMASK(sigframe), 0, sizeof(k_rtsigset_t))
#define rt_sigframe_copy_sigset(sigframe, from) memcpy(RT_SIGFRAME_SIGMASK(sigframe), from, sizeof(k_rtsigset_t))
#endif /* UAPI_COMPEL_ASM_SIGFRAME_H__ */

View file

@ -0,0 +1,204 @@
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <errno.h>
#include <compel/asm/fpu.h>
#include <compel/cpu.h>
#include "errno.h"
#include <compel/plugins/std/syscall-codes.h>
#include <compel/plugins/std/syscall.h>
#include "common/err.h"
#include "common/page.h"
#include "asm/infect-types.h"
#include "ptrace.h"
#include "infect.h"
#include "infect-priv.h"
#include "log.h"
#include "common/bug.h"
/*
* Injected syscall instruction
* loongarch64 is Little Endian
*/
const char code_syscall[] = {
0x00, 0x00, 0x2b, 0x00, /* syscall */
0x00, 0x00, 0x2a, 0x00 /* break */
};
int sigreturn_prep_regs_plain(struct rt_sigframe *sigframe, user_regs_struct_t *regs, user_fpregs_struct_t *fpregs)
{
sigcontext_t *sc;
fpu_context_t *fpu;
sc = RT_SIGFRAME_SIGCTX(sigframe);
memcpy(sc->regs, regs->regs, sizeof(regs->regs));
sc->pc = regs->pc;
fpu = RT_SIGFRAME_FPU(sigframe);
memcpy(fpu->regs, fpregs->regs, sizeof(fpregs->regs));
fpu->fcc = fpregs->fcc;
fpu->fcsr = fpregs->fcsr;
return 0;
}
int sigreturn_prep_fpu_frame_plain(struct rt_sigframe *sigframe, struct rt_sigframe *rsigframe)
{
return 0;
}
int compel_get_task_regs(pid_t pid, user_regs_struct_t *regs, user_fpregs_struct_t *ext_regs, save_regs_t save,
void *arg, __maybe_unused unsigned long flags)
{
user_fpregs_struct_t tmp, *fpregs = ext_regs ? ext_regs : &tmp;
struct iovec iov;
int ret;
pr_info("Dumping GP/FPU registers for %d\n", pid);
iov.iov_base = regs;
iov.iov_len = sizeof(user_regs_struct_t);
if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))) {
pr_perror("Failed to obtain CPU registers for %d", pid);
goto err;
}
/*
* Refer to Linux kernel arch/loongarch/kernel/signal.c
*/
if (regs->regs[0]) {
switch (regs->regs[4]) {
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:
regs->regs[4] = regs->orig_a0;
regs->pc -= 4;
break;
case -ERESTART_RESTARTBLOCK:
regs->regs[4] = regs->orig_a0;
regs->regs[11] = __NR_restart_syscall;
regs->pc -= 4;
break;
}
regs->regs[0] = 0; /* Don't deal with this again. */
}
iov.iov_base = fpregs;
iov.iov_len = sizeof(user_fpregs_struct_t);
if ((ret = ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov))) {
pr_perror("Failed to obtain FPU registers for %d", pid);
goto err;
}
ret = save(pid, arg, regs, fpregs);
err:
return 0;
}
int compel_set_task_ext_regs(pid_t pid, user_fpregs_struct_t *ext_regs)
{
struct iovec iov;
pr_info("Restoring GP/FPU registers for %d\n", pid);
iov.iov_base = ext_regs;
iov.iov_len = sizeof(*ext_regs);
if (ptrace(PTRACE_SETREGSET, pid, NT_PRFPREG, &iov)) {
pr_perror("Failed to set FPU registers for %d", pid);
return -1;
}
return 0;
}
/*
* Registers $4 ~ $11 represents arguments a0 ~ a7, especially a7 is
* used as syscall number.
*/
int compel_syscall(struct parasite_ctl *ctl, int nr, long *ret, unsigned long arg1, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5, unsigned long arg6)
{
int err;
user_regs_struct_t regs = ctl->orig.regs;
regs.regs[11] = (unsigned long)nr;
regs.regs[4] = arg1;
regs.regs[5] = arg2;
regs.regs[6] = arg3;
regs.regs[7] = arg4;
regs.regs[8] = arg5;
regs.regs[9] = arg6;
err = compel_execute_syscall(ctl, &regs, code_syscall);
*ret = regs.regs[4];
return err;
}
void *remote_mmap(struct parasite_ctl *ctl, void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
long map;
int err;
err = compel_syscall(ctl, __NR_mmap, &map, (unsigned long)addr, length, prot, flags, fd, offset >> PAGE_SHIFT);
if (err < 0 || IS_ERR_VALUE(map)) {
pr_err("remote mmap() failed: %s\n", strerror(-map));
return NULL;
}
return (void *)map;
}
/*
* regs must be inited when calling this function from original context
*/
void parasite_setup_regs(unsigned long new_ip, void *stack, user_regs_struct_t *regs)
{
regs->pc = new_ip;
if (stack)
regs->regs[4] = (unsigned long)stack;
}
bool arch_can_dump_task(struct parasite_ctl *ctl)
{
return true;
}
int arch_fetch_sas(struct parasite_ctl *ctl, struct rt_sigframe *s)
{
long ret;
int err;
err = compel_syscall(ctl, __NR_sigaltstack, &ret, 0, (unsigned long)&s->rs_uc.uc_stack, 0, 0, 0, 0);
return err ? err : ret;
}
/*
* TODO: add feature
*/
int ptrace_set_breakpoint(pid_t pid, void *addr)
{
return 0;
}
int ptrace_flush_breakpoints(pid_t pid)
{
return 0;
}
/*
* Refer to Linux kernel arch/loongarch/include/asm/processor.h
*/
#define TASK_SIZE32 (1UL) << 31
#define TASK_SIZE64_MIN (1UL) << 40
#define TASK_SIZE64_MAX (1UL) << 48
unsigned long compel_task_size(void)
{
unsigned long task_size;
for (task_size = TASK_SIZE64_MIN; task_size < TASK_SIZE64_MAX; task_size <<= 1)
if (munmap((void *)task_size, page_size()))
break;
return task_size;
}

View file

@ -0,0 +1,35 @@
#ifndef __ASM_PROLOGUE_H__
#define __ASM_PROLOGUE_H__
#ifndef __ASSEMBLY__
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#define sys_recv(sockfd, ubuf, size, flags) sys_recvfrom(sockfd, ubuf, size, flags, NULL, NULL)
typedef struct prologue_init_args {
struct sockaddr_un ctl_sock_addr;
unsigned int ctl_sock_addr_len;
unsigned int arg_s;
void *arg_p;
void *sigframe;
} prologue_init_args_t;
#endif /* __ASSEMBLY__ */
/*
* Reserve enough space for sigframe.
*
* FIXME It is rather should be taken from sigframe header.
*/
#define PROLOGUE_SGFRAME_SIZE 4096
#define PROLOGUE_INIT_ARGS_SIZE 1024
#endif /* __ASM_PROLOGUE_H__ */

Some files were not shown because too many files have changed in this diff Show more