

CRIU -- A project to implement checkpoint/restore functionality for Linux

CRIU (stands for Checkpoint and Restore in Userspace) is a utility to checkpoint/restore Linux tasks.

Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. You can then use the files to restore and run the application from the point at which it was frozen. The distinctive feature of the CRIU project is that it is mainly implemented in user space. There are other projects doing C/R for Linux, and so far CRIU appears to be the most feature-rich and up-to-date with the kernel.
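To make the freeze, dump and restore cycle concrete, here is a minimal Python sketch that drives the `criu` command-line tool. It assumes `criu` is installed and run with root privileges. The helper names (`dump_cmd`, `restore_cmd`, `checkpoint_and_restore`) are hypothetical; `--tree`, `--images-dir`, `--shell-job`, `--leave-running` and `--restore-detached` are existing `criu` options.

```python
import os
import subprocess
from typing import List


def dump_cmd(pid: int, images_dir: str, shell_job: bool = True,
             leave_running: bool = False) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        # Allow dumping a shell job, i.e. a task started from an interactive shell.
        cmd.append("--shell-job")
    if leave_running:
        # By default the dumped tasks are killed; keep them alive instead.
        cmd.append("--leave-running")
    return cmd


def restore_cmd(images_dir: str, shell_job: bool = True,
                detached: bool = False) -> List[str]:
    """Build a `criu restore` command for the images stored in `images_dir`."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    if detached:
        # Let criu exit once the tasks are restored instead of waiting for them.
        cmd.append("--restore-detached")
    return cmd


def checkpoint_and_restore(pid: int, images_dir: str) -> None:
    """Freeze `pid` into image files, then bring it back from them.

    Hypothetical helper: needs root privileges and a `criu` binary on PATH.
    """
    os.makedirs(images_dir, exist_ok=True)
    subprocess.run(dump_cmd(pid, images_dir), check=True)
    subprocess.run(restore_cmd(images_dir, detached=True), check=True)
```

Calling `checkpoint_and_restore(pid, "/tmp/ckpt")` on the PID of a simple looping program should make it disappear after the dump and continue from where it was frozen after the restore. Building the argument lists separately keeps the command construction easy to inspect and test.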

The CRIU project is (almost) a never-ending story, because we always have to keep up with the Linux kernel, supporting checkpoint and restore for all the features it provides. Thus we're looking for contributors of all kinds: feedback, bug reports, testing, coding, writing, etc. Please refer to CONTRIBUTING.md if you would like to get involved.

The project started as a way to do live migration for OpenVZ Linux containers, but later grew into a more sophisticated and flexible tool. It is currently used by (integrated into) OpenVZ, LXC/LXD, Docker, and other software; the project gets tremendous help from the community, and its packages are included in many Linux distributions.

The project home is at http://criu.org. This wiki contains the whole CRIU knowledge base we have. Pages worth starting with are:

- Checkpoint and restore of simple loop process
- Advanced features

As the main use of CRIU is live migration, there is a library for it called P.Haul. The project also exposes two cool core features as standalone libraries: libcompel for parasite code injection and libsoccr for TCP connection checkpoint/restore.

Live migration

True live migration using CRIU is possible, but doing all the steps by hand can be complicated. The phaul sub-project provides a Go library that encapsulates most of the complexity. This library and the Go bindings for CRIU are stored in the go-criu repository.
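One technique such tooling automates is iterative (pre-copy) migration: the application's memory is dumped in several rounds, with the task frozen only briefly for each pre-dump, and each round writes only the pages changed since the previous one, so the final freeze is short. phaul itself is a Go library; the Python sketch below only builds the sequence of `criu` invocations on the source host, using the existing `pre-dump` action and the `--track-mem` and `--prev-images-dir` options (memory tracking needs kernel soft-dirty support). The helper name and the numbered directory layout are hypothetical, and copying the images to the destination and running `criu restore` there are left out.

```python
from typing import List


def precopy_migration_plan(pid: int, base_dir: str, rounds: int) -> List[List[str]]:
    """Return `rounds` pre-dump commands followed by one final dump command.

    Hypothetical layout: round i writes to base_dir/i, and each round points
    at the previous one so only memory changed since then is written.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    cmds: List[List[str]] = []
    for i in range(1, rounds + 1):
        cmd = ["criu", "pre-dump", "--tree", str(pid),
               "--images-dir", f"{base_dir}/{i}", "--track-mem"]
        if i > 1:
            # The path is relative to this round's images directory.
            cmd += ["--prev-images-dir", f"../{i - 1}"]
        cmds.append(cmd)
    # The final dump freezes the task and writes the remaining state.
    final = ["criu", "dump", "--tree", str(pid),
             "--images-dir", f"{base_dir}/{rounds + 1}", "--track-mem"]
    if rounds > 0:
        final += ["--prev-images-dir", f"../{rounds}"]
    cmds.append(final)
    return cmds
```

With `rounds=0` the plan degenerates to a single ordinary dump. Choosing the number of rounds is a trade-off: more rounds shrink the final dump but add total migration time, which is one of the decisions a library like phaul makes for you.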

Parasite code injection

To get the state of a running process, CRIU needs to make the process execute some code that fetches the required information. To do this without killing the application itself, CRIU uses the parasite code injection technique, which is also available as a standalone library called libcompel.

TCP sockets checkpoint-restore

One of CRIU's features is the ability to save and restore the state of a TCP socket without breaking the connection. This functionality is considered useful by itself, and we have it available as the libsoccr library.
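The kernel interface underneath is TCP repair mode (the `TCP_REPAIR` socket option), which CRIU uses for TCP checkpoint/restore. The sketch below is not libsoccr's API; it is a minimal Python illustration, assuming Linux and the CAP_NET_ADMIN capability, of one step of a checkpoint: reading the send and receive sequence numbers of an established connection. The numeric constants are copied from the Linux UAPI header `<linux/tcp.h>`; the function name is hypothetical.

```python
import socket
import struct
from typing import Dict

# Values from <linux/tcp.h>; Python's socket module may not export them.
TCP_REPAIR = 19
TCP_REPAIR_QUEUE = 20
TCP_QUEUE_SEQ = 21
TCP_RECV_QUEUE = 1
TCP_SEND_QUEUE = 2


def read_tcp_sequence_numbers(sock: socket.socket) -> Dict[str, int]:
    """Read the send/receive sequence numbers of an established TCP socket.

    Hypothetical helper. Entering repair mode needs CAP_NET_ADMIN; without
    it the first setsockopt raises PermissionError and nothing is changed.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)
    try:
        seqs: Dict[str, int] = {}
        for name, queue in (("send", TCP_SEND_QUEUE), ("recv", TCP_RECV_QUEUE)):
            # Select which queue the following TCP_QUEUE_SEQ query refers to.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR_QUEUE, queue)
            raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_QUEUE_SEQ, 4)
            seqs[name] = struct.unpack("I", raw)[0]  # unsigned 32-bit value
        return seqs
    finally:
        # Leave repair mode so the connection keeps working normally.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 0)
```

A complete checkpoint also has to save the data still sitting in the queues and the negotiated TCP options, and a restore replays all of it in repair mode on a fresh socket; packaging those steps is the kind of work libsoccr does.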

Licence

The project is licensed under GPLv2 (though files sitting in the lib/ directory are LGPLv2.1).

All files in the images/ directory are licensed under the Expat license (so-called MIT). See the images/LICENSE file.