This patch implements checkpoint/restore functionality
for binfmt_misc mounts. Both the magic and extension entry types,
as well as the "disabled" state, are supported.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This commit adds basic support for dumping and restoring seccomp filters
via the new ptrace interface. There are currently two known limitations
with this approach:
1. This approach doesn't support restoring tasks that first call seccomp()
and then setuid(); the test elaborates on this. I don't think it is
hard to do, but it is not done yet.
2. Filters are compared via memcmp(), so two tasks which have the same
parent task and install byte-identical filters will have those
filters considered to be the "same". Since we force all tasks to have
the same creds (including seccomp filters) right now, this isn't a
problem.
The approach used here is very similar to the cgroup approach: the actual
filters are stored in a seccomp.img, and each task has an id that points to
the part of the filter tree it needs to restore. This keeps us from dumping
the same filter multiple times, since filters are inherited on fork.
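A minimal sketch of the deduplication idea described above, written in
Python rather than CRIU's C. The names FilterTree and add_chain are
hypothetical, and byte equality stands in for the memcmp() comparison; this
is an illustration under those assumptions, not the code from this commit.

```python
class FilterTree:
    """Stores each distinct (parent id, filter bytes) pair once.

    Hypothetical stand-in for the filter tree kept in seccomp.img.
    """

    def __init__(self):
        self.entries = []   # list of (parent_id, filter_bytes); index is the id
        self.index = {}     # (parent_id, filter_bytes) -> id

    def add_chain(self, chain):
        """chain lists a task's filters from oldest to newest.

        Returns the id of the newest filter, which is the id a task would
        record, or None if the task has no filters.
        """
        parent = None
        for blob in chain:
            key = (parent, bytes(blob))   # byte equality plays the role of memcmp()
            if key not in self.index:
                self.index[key] = len(self.entries)
                self.entries.append(key)
            parent = self.index[key]
        return parent


tree = FilterTree()
parent_id = tree.add_chain([b"\x06\x00"])
child_id = tree.add_chain([b"\x06\x00", b"\x15\x00"])
sibling_id = tree.add_chain([b"\x06\x00", b"\x15\x00"])
```

Here child_id and sibling_id come out equal, which mirrors limitation 2:
identical filters installed under the same parent collapse into one entry.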
v2:
* remove unused seccomp_filters field from struct rst_info
* rework memory layout for passing filters to restorer blob
* add a sanity check when finding inherited filters
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
"ip route dump" dumps only ipv4 routes.
Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These images have a common magic in front of the per-image one. With
this we have 3 "types" of images -- inventory (head), other
images, and service files. The latter are stats (not an image,
it just happens to be in PB format) and the irmap cache (not an image
either, just an auxiliary thing which is in PB for convenience).
Since the inventory file is the first one we read on restore, it's
OK to set the global "new images" flag there. Dump (write) is
always in the new format.
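A rough sketch of the two-level magic check, in Python for brevity. The
magic values, the little-endian 32-bit layout and the helper names are
assumptions made for illustration only; they are not taken from CRIU.

```python
import io
import struct

# Hypothetical values, NOT CRIU's real magic numbers.
COMMON_MAGIC = 0x11111111
INVENTORY_MAGIC = 0x22222222


def write_header(f, image_magic):
    """Dump always uses the new layout: common magic, then per-image magic."""
    f.write(struct.pack("<II", COMMON_MAGIC, image_magic))


def read_header(f, image_magic):
    """Return True for the new layout, False for the old one.

    The old layout is assumed to start directly with the per-image magic.
    """
    (first,) = struct.unpack("<I", f.read(4))
    if first == image_magic:
        return False                      # old image: no common magic in front
    if first != COMMON_MAGIC:
        raise ValueError("unknown magic 0x%08x" % first)
    (second,) = struct.unpack("<I", f.read(4))
    if second != image_magic:
        raise ValueError("image type mismatch")
    return True


new_img = io.BytesIO()
write_header(new_img, INVENTORY_MAGIC)
new_img.seek(0)
new_images = read_header(new_img, INVENTORY_MAGIC)  # would set the global flag
```

Checking the inventory first means the result can be remembered once and
reused for every other image read during restore.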
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
At the moment only x86 is covered; ARM needs its own handler.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A file system can be mounted several times, so mnt_id isn't unique for it.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The exact structure of the image will be revealed in the
next patch(es). What is important here is that the cgroup
image is somewhat new.
It will likely contain arrays of objects of different types,
so I introduce the "header" object that will link these
arrays using pb repeated fields. This will help us avoid
many image files for different cgroup objects and will bring
the number of write()-s required down to 1.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Nothing interesting here. If a file can't be dumped by criu,
plugins are called. If one of the plugins knows how to dump the file,
the file entry is marked as need_callback. On restore, if we see
this mark, we execute plugins to restore the file.
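The flow above can be sketched roughly as follows. This is Python, not the
C plugin API; dump_file, restore_file, EchoPlugin and the dict-based entry
are hypothetical names chosen for illustration, and only the need_callback
mark comes from the description.

```python
def dump_file(file_info, native_dump, plugins):
    """Dump one file, falling back to plugins when the native path gives up."""
    entry = native_dump(file_info)
    if entry is not None:
        return entry
    for plugin in plugins:
        payload = plugin.dump(file_info)
        if payload is not None:
            # Mark the entry so that restore knows to call plugins.
            return {"need_callback": True, "payload": payload}
    raise RuntimeError("no plugin can dump %r" % (file_info,))


def restore_file(entry, native_restore, plugins):
    """Restore one file; entries marked need_callback go to the plugins."""
    if not entry.get("need_callback"):
        return native_restore(entry)
    for plugin in plugins:
        fd = plugin.restore(entry["payload"])
        if fd is not None:
            return fd
    raise RuntimeError("no plugin can restore this entry")


class EchoPlugin:
    """Hypothetical plugin that only understands names starting with 'ext:'."""

    def dump(self, file_info):
        return file_info if file_info.startswith("ext:") else None

    def restore(self, payload):
        return 42  # pretend this is a freshly opened file descriptor


plugins = [EchoPlugin()]
entry = dump_file("ext:dev", lambda f: None, plugins)
fd = restore_file(entry, lambda e: -1, plugins)
```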
v2: Callbacks are called for all files that are not supported by CRIU.
v3: Call plugins for a file instead of a file descriptor. Several file
descriptors can be associated with one file.
v4: A file descriptor is opened in a callback. It's required for
restoring anon vmas.
v5: Add a separate type for unsupported files
v6: define FD_TYPES__UNSUPP
v7: s/unsupp/ext (external)
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
By default, just use the iptables-save and iptables-restore commands.
The user may define the CR_IPTABLES variable, in which case
"sh -c $CR_IPTABLES" is called instead.
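A small sketch of the command selection, in Python for illustration. The
helper name iptables_argv is hypothetical, and how the override command
tells dump from restore is not stated here, so the sketch leaves that out.

```python
import os


def iptables_argv(default_tool, environ=None):
    """Pick the argv to run: the CR_IPTABLES override if set, else the default.

    default_tool is "iptables-save" on dump or "iptables-restore" on restore.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("CR_IPTABLES")
    if override:
        # Equivalent of running: sh -c "$CR_IPTABLES"
        return ["sh", "-c", override]
    return [default_tool]


default_cmd = iptables_argv("iptables-save", {})
custom_cmd = iptables_argv("iptables-restore", {"CR_IPTABLES": "my-tool --flag"})
```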
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have a generic do_pb_show() call and tons of show_foo
routines that just call it with the proper args. Compact
the code by putting the args into an array and calling
do_pb_show() in one place.
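The refactoring is the usual table-driven dispatch. A sketch in Python: the
do_pb_show() below is a simplified stand-in with an assumed signature, and
the table contents are made up for illustration.

```python
def do_pb_show(name, fmt, records):
    """Simplified stand-in for the generic printer (assumed signature)."""
    return ["%s: %s" % (name, fmt % rec) for rec in records]


# One row of args per image type replaces one show_foo() wrapper each.
SHOW_TABLE = {
    "inventory": ("inventory", "version=%d"),
    "fdinfo": ("fdinfo", "fd=%d type=%s"),
}


def show(image_type, records):
    """Single call site: look the args up and hand them to do_pb_show()."""
    name, fmt = SHOW_TABLE[image_type]
    return do_pb_show(name, fmt, records)


lines = show("fdinfo", [(3, "reg"), (4, "pipe")])
```

Adding a new image type then means adding one table row instead of a new
wrapper function.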
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There will be two entities handled:
1. tun file -- an opened char device with misc major and tun minor
that can be attached to item #2
2. tun netdevice -- another type of link
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll have one more "image" file generated by dump and (surprisingly)
restore commands -- the stats one. It will contain all the statistics
collected by dump/restore in a single pb object.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After reworking the way pagemap is stored, backward compatibility
was not preserved, to keep the patches simple. Time to bring it back.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since we now drain pages out of the parasite, we can invent any format
for page dumps. Let it be ... a protobuf one! :)
Another thing to keep in mind is that we're about to use splices and
implement iterative migration, so it's better to have the actual pages
be page-aligned in the image.
And -- backward compatibility. That said, the new format is:
1. pagemap-... file which contains a header (currently with the ID of
the image with pages, see below) and an array of <nr_pages:vaddr>
pairs. The first value means "how many pages to take from the
file with pages (see below)" and the second -- where in the task
address space to put them. Simple.
2. pages-... file which contains only pages, one by one (thus aligned
as we want).
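How the two files fit together on restore can be sketched like this. It is
Python for illustration only: PAGE_SIZE of 4096, the function name and the
dict used as a fake address space are assumptions, and the pagemap header
with the pages image ID is left out.

```python
PAGE_SIZE = 4096  # assumed page size


def restore_pages(pagemap, pages, memory):
    """Walk <nr_pages:vaddr> pairs and place pages into a fake address space.

    pagemap: list of (nr_pages, vaddr) pairs from the pagemap-... file
    pages:   raw bytes of the pages-... file, pages stored back to back
    memory:  dict mapping vaddr -> page bytes, standing in for the task
    Returns the number of bytes consumed from the pages file.
    """
    offset = 0
    for nr_pages, vaddr in pagemap:
        for i in range(nr_pages):
            memory[vaddr + i * PAGE_SIZE] = pages[offset:offset + PAGE_SIZE]
            offset += PAGE_SIZE
    return offset


pages = b"A" * PAGE_SIZE + b"B" * PAGE_SIZE + b"C" * PAGE_SIZE
pagemap = [(2, 0x1000), (1, 0x8000)]
memory = {}
consumed = restore_pages(pagemap, pages, memory)
```

Because the pages file holds nothing but pages, every page starts at a
multiple of the page size, which is what splice-based transfer wants.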
This patch breaks backward compatibility (old images with pages will
be restored and then crash). This needs to be done before the v0.5 release.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows reusing magic numbers outside of the crtools code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>