Note: Silently drops MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED as it's
not currently detectable. This is still better than silently dropping
all membarrier() registrations.
Signed-off-by: Michał Mirosław <emmir@google.com>
The new field cg_set is currently marked as required, which causes a
backward compatibility problem when a newer CRIU version restores an image
dumped by an older version. This commit makes the field optional and
reworks the logic to fall back to the cg_set from task_core when it is not
present in thread_core.
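
The fallback described above can be sketched as follows. This is only an
illustration in Python, not the CRIU implementation: the function name
effective_cg_set and its arguments are hypothetical, and None stands in for
"field absent from the image".

```python
from typing import Optional


def effective_cg_set(thread_cg_set: Optional[int],
                     task_cg_set: Optional[int]) -> int:
    """Pick the cgroup set id for a thread (hypothetical helper).

    Prefer the per-thread value; fall back to the per-task value that
    older images carry. None means the field is absent.
    """
    # Compare against None explicitly: 0 could be a legitimate id.
    if thread_cg_set is not None:
        return thread_cg_set
    if task_cg_set is not None:
        return task_cg_set
    raise ValueError("no cg_set in thread_core or task_core")
```

With this shape, an old image (only the task-level value set) and a new
image (per-thread value set) both resolve to a usable id.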
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Zombie tasks are dumped in dump_zombies() so it is redundant to handle them
in dump_one_task().
Deprecate cg_set in task_core_entry as this field must be per thread now.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Currently, we assume all threads in a process are in the same cgroup
controllers. However, with threaded controllers, threads in a process may
be in different controllers. So we need to dump the cgroup controllers of
every thread in the process and fix up the procfs cgroup parsing to parse
from self/task/<tid>/cgroup.
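
As a rough illustration of the per-thread parsing, the sketch below splits
the contents of a procfs cgroup file into its three colon-separated fields
(hierarchy id, controller list, path). It is written in Python for brevity;
the helper names are hypothetical and do not correspond to CRIU functions.

```python
def thread_cgroup_file(tid: int) -> str:
    """Path of the per-thread cgroup file (hypothetical helper)."""
    return f"/proc/self/task/{tid}/cgroup"


def parse_cgroup_text(text: str):
    """Parse lines of the form 'hierarchy-id:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split at most twice: the cgroup path itself may contain ':'.
        hier_id, controllers, path = line.split(":", 2)
        # The controller list is empty for the cgroup v2 hierarchy.
        names = controllers.split(",") if controllers else []
        entries.append((int(hier_id), names, path))
    return entries
```

Reading thread_cgroup_file(tid) for every thread and comparing the parsed
entries shows whether threads of one process sit in different cgroups.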
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Add SIGTSTP signal dump and restore. Add a corresponding field
in the image and save it only if a task is in the stopped state.
Restore the task state by sending the desired stop signal if it is
present in the image. Fall back to SIGSTOP if it is absent.
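
The choice of signal on restore can be sketched as below. This is a
minimal Python illustration; choose_stop_signal is a hypothetical name and
None models the field being absent from the image.

```python
import signal
from typing import Optional


def choose_stop_signal(image_stop_signo: Optional[int]) -> int:
    """Return the signal used to put a restored task into stopped state."""
    if image_stop_signo is None:
        # Older images carry no stop signal: use plain SIGSTOP.
        return int(signal.SIGSTOP)
    return image_stop_signo
```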
Signed-off-by: Yuriy Vasiliev <yuriy.vasiliev@openvz.org>
Support basic rseq C/R scenario. Assume that:
- there are no processes with IP inside the rseq critical section (CS)
- kernel has ptrace(PTRACE_GET_RSEQ_CONFIGURATION) support
On dump:
1. use ptrace(PTRACE_GET_RSEQ_CONFIGURATION) to get
struct rseq pointer, rseq size and signature from the kernel.
2. save to the image
On restore:
1. get rseq ptr, size, signature from the image
2. register it back using rseq() from the restorer parasite
Fixes: #1696
Reported-by: Radostin Stoyanov <radostin@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
This changes the license of all files in the images/ directory from
GPLv2 to the Expat license (so-called MIT).
According to git the files have been authored by:
Abhishek Dubey
Adrian Reber
Alexander Mikhalitsyn
Alice Frosi
Andrei Vagin (Andrew Vagin, Andrey Vagin)
Cyrill Gorcunov
Dengguangxing
Dmitry Safonov
Guoyun Sun
Kirill Tkhai
Kir Kolyshkin
Laurent Dufour
Michael Holzheu
Michał Cłapiński
Mike Rapoport
Nicolas Viennot
Nikita Spiridonov
Pavel Emelianov (Pavel Emelyanov)
Pavel Tikhomirov
Radostin Stoyanov
rbruno@gsd.inesc-id.pt
Sebastian Pipping
Stanislav Kinsburskiy
Tycho Andersen
Valeriy Vdovin
The Expat license (so-called MIT) can be found here:
https://opensource.org/licenses/MIT
According to that link the correct SPDX short identifier is 'MIT'.
https://spdx.org/licenses/MIT.html
Signed-off-by: Adrian Reber <areber@redhat.com>
The time namespace allows for per-namespace offsets to the system
monotonic and boot-time clocks.
C/R of time namespaces is very straightforward. On dump, criu enters the
target time namespace and dumps the current clock values; then, on restore,
criu creates a new namespace and restores the clock values.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
To ensure consistency of the runtime environment, processes within a
container need to see the same start time values across suspend/resume
cycles. We introduce a new field in the core image structure to
store the start time of a dumped process. Later the same value will be
restored to a newly created task. In the future the feature is likely
to be pulled here, so we reserve the field id in the protobuf descriptor.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin@virtuozzo.com>
1. Checkpoint it via parasite.
2. Restore it after forking.
Signed-off-by: Michał Cłapiński <mclapinski@google.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Currently we pretend that all threads share seccomp chains,
and at checkpoint time we test the seccomp modes to make sure
this assumption is valid, refusing to dump otherwise.
Still, the kernel tracks seccomp filter chains per thread,
and we have now encountered applications (such as java) where per-thread
chains are actively used. Thus we need to support handling
filters on a per-thread basis.
In this somewhat intrusive patch the restore engine is reworked
to treat each thread separately. Here is what is done:
- The image core file is modified to keep seccomp filters
inside thread_core_entry. For backward compatibility the
former seccomp_mode and seccomp_filter members in
task_core_entry are renamed to have the old_ prefix, and
on restore we test whether we're dealing with old images.
Since per-thread dump is not yet implemented, the
dumping procedure continues to operate with the old_ members.
- In the pie restorer code, the memory containing filters is addressed
from inside the thread_restore_args structure, which now
contains the seccomp mode itself and the chain attributes
(number of filters, etc.).
Reading of per-thread data is done in the seccomp_prepare_threads
helper -- we take one pstree_item and walk over every thread
inside to allocate pie memory and pin the data there.
Because of PIE specifics, before jumping into pie code
we have to relocate this memory to a new place;
seccomp_rst_reloc serves this purpose.
In the restorer itself we check whether thread_restore_args provides
an enabled seccomp mode (strict or filter) and call
restore_seccomp_filter if needed.
- To unify names we start using the seccomp_ prefix for all related
code involved in this change (prepare_seccomp_filters is renamed
to seccomp_read_image because it only reads the image and nothing
more; the image handler is renamed to seccomp_img_entry instead
of the too-short 'se').
With this change we can now start collecting and
dumping seccomp filters per thread, which will be
done in the next patch.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Right now they all sit in a separate file. Since we
don't support CLONE_SIGHAND (and don't plan to), it's
much better to have them in core, all the more so because
by the time we dump/restore sigacts, the core entry
is already at hand.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are several places in image files where we store
integers, but these numbers actually stand for some string,
e.g. socket families, states and types, and task states.
So here's the (criu).dict option for such fields, which
helps to convert the numbers into strings and back.
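
The number/string conversion can be pictured with a two-way lookup table.
The sketch below is only an illustration in Python: the table contents and
the names FAMILY_NAMES, to_name and to_number are assumptions made for the
example and are not the dictionaries shipped with CRIU.

```python
import socket

# Hypothetical dictionary for a socket family field.
FAMILY_NAMES = {
    int(socket.AF_UNIX): "UNIX",
    int(socket.AF_INET): "INET",
}
FAMILY_NUMBERS = {name: num for num, name in FAMILY_NAMES.items()}


def to_name(value: int):
    """Number -> string; unknown numbers are passed through unchanged."""
    return FAMILY_NAMES.get(value, value)


def to_number(value) -> int:
    """String -> number; plain integers are accepted as-is."""
    if isinstance(value, int):
        return value
    return FAMILY_NUMBERS[value]
```

Passing unknown values through unchanged keeps the conversion reversible
even for numbers the table does not cover.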
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
cgroup namespaces are about to be merged into the kernel (indeed, they
went into and out of 4.5 for minor issues), and will be carried as a
patchset in the Ubuntu 16.04 kernel. Here's an attempt at c/r.
There are essentially three key steps:
* on dump, in parse_task_cgroup, we should ask the task what cgroups it
thinks it is in (unless it has the same cgroup ns id as its parent, then we
should just take the prefixes from the parent's set), and set the prefix on
the cg set
* add a new restore step, prepare_cgroup_namespace(), which happens in
prepare_task_cgroup() that does an unshare() if necessary
* when restoring, in move_in_cgroup, if we're going to restore via usernsd,
leave the full path. If not, use (cgset->path + len(cgset->cgns_prefix)) as
the path, since we will have already moved into the cgns_prefix and unshared.
Another observation here is that we can support nesting, since these are
restored hierarchically by nature.
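
The path selection in the third step can be sketched as follows. This is
a Python illustration under stated assumptions: restore_path is a
hypothetical name, and the prefix is passed as a length, as in the v2
image format noted in the changelog of this commit.

```python
def restore_path(path: str, cgns_prefix_len: int, via_usernsd: bool) -> str:
    """Path to move the task into on restore (hypothetical helper)."""
    if via_usernsd:
        # usernsd acts from outside the cgroup namespace: keep the full path.
        return path
    # We have already moved into the prefix and unshared, so only the
    # remainder of the path is visible from inside the namespace.
    return path[cgns_prefix_len:]
```

For example, with path "/lxc/c1/app" and a prefix length covering
"/lxc/c1", the task is moved into "/app" unless usernsd is used.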
v2: * store cgns prefix length instead of full prefix in images
* set has_cgroup_ns_id conditionally
* drop unused argument to move_in_cgroup
* add extra comments about what is happening when unsharing() on
restore
* add extra comments about what is happening when computing the actual
cgns prefix
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
But keep @protobuf as a symlink: we have
this path encoded in sources. It will be
removed over time.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>