Reading stops after an EOF or a specified charecter.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Contrary to a popular opinion, there is no need to check
an argument for being non-NULL before calling free().
>From free(3) man page:
> > If ptr is NULL, no operation is performed.
Let's change xfree macro to be a synonym for free().
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch(set) is inspired by similar from Andrey Vagin sent
sime time earlier.
The major idea is to artificially fail criu dump or restore at
specific places and let zdtm tests check whether failed dump
or restore resulted in anything bad.
This particular patch introduces the ability to tell criu "fail
at X point". Each point is specified with a integer constant
and with the next patches there will appear places over the
code checking for specific fail code being set and failing.
Two points are introduced -- early on dump, right after loading
the parasite and right after creation of the root task.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The kernel prior 4.3 is exporting FS_EVENT_ON_CHILD
bit via procfs fdinfo interface. This bit is kernel's
internal and should not be passed in inotify_add_watch
call. Thus simply filter it out when obtain from old
images for backward compatibility reason.
More details here https://lkml.org/lkml/2015/9/21/680
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows the user to perform actions before dumping or restoration
occurs.
Signed-off-by: Matthew Krafczyk <krafczyk.matthew@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In commit c2271198, Laurent Dufour kindly reunified the VDSO code
that had become duplicated between architectures. Unfortunately
this introduced a regression in AArch64 where apparently due to
the scope of vdso_symbols array of pointers to characters changing
from local to global, load-time relocations became necessary.
The following thread on the GCC mailing list discusses why
load-time relocations can be necessary when pointers are used,
although it doesn't mention the potential for locally scoped
arrays to be handled differently:
https://gcc.gnu.org/ml/gcc/2004-05/msg01016.html
Because the alternatives, such as porting piegen to AArch64, are
far more involved, simply revert the change in scope.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the recent VDSO code reunification, some types were changed but
a pair of necessary corresponding changes was omitted. Fix that so
the AArch64 build succeeds without type-related
warnings-turned-errors. Also move the definition to the
AArch64-specific header since it's not currently being used by any
other architectures.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When in a userns, tasks can't write to certain sysctl files:
(00.009653) 1: Error (sysctl.c:142): Can't open sysctl kernel/hostname: Permission denied
See inline comments for details on affected namespaces.
Mostly for my own education in what is required to port something to be
userns restorable, I ported the sysctl stuff. A potential concern for this
patch is that copying structures with pointers around is kind of gory. I
did it ad-hoc here, but it may be worth inventing some mechanisms to make
it easier, although I'm not sure what exactly that would look like
(potentially re-using some of the protobuf bits; I'll investigate this more
if it looks helpful when doing the cgroup user namespaces port?).
Another issue is that there is not a great way to return non-fd stuff in
memory right now from userns_call; one of the little hacks in this code
would be "simplified" if we invented a way to do this.
v2: coalesce the individual struct sysctl_req requests into one big
sysctl_userns_req that is in a contiguous region of memory so that we
can pass it via userns_call. Hopefully nobody finds my little ascii
diagram too offensive :)
v3: use the fork/setns trick to change the syctl values in the right ns for
IPC/UTS nses; see inline comment for details
v4: only use sysctl_userns_req when actually doing a userns_call.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
"ip route dump" dumps only ipv4 routes.
Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
v2: use struct irmap directly in irmap_path_opt
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Issue #18. When restore fails ghost files remain there. And
to remove them we have to know their list, paths to original
files (to construct the ghost name) and the namespace ghost
lives in.
For the latter we keep the restore task namespace at hands
till the final stage and setns into it to kill ghosts.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Info about ghosts presence and paths will be needed to
remove the ghosts itself and thus are needed in criu.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
First -- avoid two memory copies by printing ns root directly, and
second -- remove extra argument from create_ghost, the mnt_id value
we need there can be found on the ghost_file object.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
So here it is. If root task dies on restore the roots yard
dir remains unrmdired :( Since we already know its name, we
can remove one from criu. By the time we get to this place
the sub mount namespace(s) are already dead and yard dir
is empty. But umounting should be done by tasks after
successfull restore, so keep depopulation there.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Same thing as in previous patch -- we have too many generic
clean_ and fini_ prefixes over the code. And we need more (see
next patch), so let's specify what exactly we clean or fini.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In case root task restore failure we'll have to remove the
roots yard dir from criu, so we have to create one by
criu to at least have the dit name.
It's OK to do it in criu, since the yards is created in
the opts.root which is the same for any mnt ns we deal
with on restore.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There's already two things we do in criu namespaces before
forking the init task (start unsd and keep netnsfd for back
reference). Next patches will introduce the 3rd action for
mount namespaces, so have a special pre-call for all this
stuff.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Actually make use of the ns->type field and remove all getpid()'s
and other strange/inconsistent checks.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We (may) have 3 types of namespace objects in criu -- criu's one,
root task's one and others. All of them sometimes make sense and
we differentiate them in a weird way -- by checking the ns->pid
field against getpid() or by comparing with root_item's.
The proposal is to mark ns_id objects explicitly with type field.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this in the next patch to correctly write sysctls.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this size in the next patch to avoid having to do some dynamic
allocation.
v2: call it MAX_UNSFD_MSG_SIZE instead
v3: fix all uses of MAX_MSG_SIZE :)
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
They both can container the MS_READONLY flag. And in one case it will be
read-only bind-mount and in another case it will be read-only
super-block.
v2: set mnt and sb for one call of mount() when it's posiable
v3: return a comment which was deleted by mistake
v4: Fix the sentense about restoring mnt flags
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This option allows users to specify their own irmap paths to scan in the event
that they don't have a path in one of the hard coded hints.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some controllers can be disabled in kernel options. In this case they
are shown in /proc/cgroups, but they could not be mounted.
All enabled controllers can be collected from /proc/self/cgroup.
https://github.com/xemul/criu/issues/28
v2: ',' is used to separate controllers
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There were multiple copy of the same code spread over the different
architectures handling the vDSO.
This patch is merging the duplicated code in arch/*/vdso-pie.c and
arch/*/include/asm/vdso.h in the common files and let only the architecture
specific part in the arch/*/* files.
The file are now organized this way:
include/asm-generic/vdso.h
contains basic definition which could be overwritten by
architectures.
arch/*/include/asm/vdso.h
contains per architecture definitions.
It may includes include/asm-generic/vdso.h
pie/util-vdso.c
include/util-vdso.h
These files contains code and definitions common to both criu and
the parasite code.
The file include/util-vdso.h includes arch/*/include/asm/vdso.h.
pie/parsite-vdso.c
include/parasite-vdso.h
contains code and definition specific to the parasite code handling
the vDSO.
The file include/parasite-vdso.h includes include/util-vdso.h.
arch/*/vdso-pie.c
contains the architecture specific code installing the vDSO
trampoline.
vdso.c
include/vdso.h
contains code and definition specific to the criu code handling the
vDSO.
The file include/vdso.h includes include/util-vdso.h.
CC: Christopher Covington <cov@codeaurora.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To handle deleted bindmounts we simply create
the former directory bindmount lived at, mount
the target and remove the directory back.
For this sake we add @deleted entry into the image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make it handle both postfixes and return
non-zero code if stipping happened.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Useless, at least in the form present
now it's unreadable anyway. So stop
welling out the logs.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's quite unclean while this structure lives
in proc_parse.h, which only have to fill this
structure on procfs read, but real handling
is inside mount.c. Move it as appropriate.
Same time ext_mount structure should be moved
into a header as well with sane @list name
used instead of @l.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For example we hit a case where systemd carries journal
file with 4M in size.
https://jira.sw.ru/browse/PSBM-38571
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Without using a freezer cgroup, we need to do a few iterations to catch
all tasks, because a new tasks can be born. If new tasks appear faster
than criu collects them, criu fails. The freezer cgroup allows to
solve this problem.
We freeze the freezer group, then attaches to tasks with ptrace and thaw
the freezer cgroup. We suppose that all tasks which are going to be
dumped in a specified freezer group.
v2: fix comments from Christopher
Reviewed-by: Christopher Covington <cov@codeaurora.org>
v3: refactor task_seize
v4: fix comments from Pavel
Cc: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's required for dumping tmpfs, where we use tar to save content.
If we need to execute tar from a proper userns to get right uid-s and
gid-s for files.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is here only to support the Linux Kernel between versions
3.18 and 4.2. After that, this workaround is not needed anymore,
but it will work properly on both a kernel with and without the bug.
The bug is that when a process has a file open in an OverlayFS directory,
the information in /proc/<pid>/fd/<fd> and /proc/<pid>/fdinfo/<fd>
is wrong, so we grab that information from the mountinfo table instead.
This is done every time fill_fdlink is called.
We first check to see if the mnt_id and st_dev numbers currently match
some entry in the mountinfo table. If so, we already have the correct mnt_id
and no fixup is needed.
Then we proceed to see if there are any overlayFS mounted directories
in the mountinfo table. If so, we concatenate the mountpoint with the
name of the file, and stat the resulting path to check if we found the
correct device id and node number. If that is the case, we update the
mount id and link variables with the correct values.
Signed-off-by: Gabriel Guimaraes <gabriellimaguimaraes@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It's preparation to use a freezer cgroup for freezing tasks.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. Since
vma_area_is_private() is used by both restorer blob code and non
restorer blob code, which must use different variables for recording
the task size, make task_size a function argument and modify the call
sites accordingly. This fixes the following error on AArch64 kernels
with CONFIG_ARM64_64K_PAGES=y.
pie: Error (pie/restorer.c:929): Can't restore 0x3ffb7e70000 mapping w>
pie: ith 0xfffffffffffffff7
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded.
This fixes the following error on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.
pie: Error (pie/restorer.c:772): Unable to unmap (-): -1211695104
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently each task subtracts number of zombies from
task_entries->nr_threads without locks, so if two tasks will do this
operation concurrently, the result may be unpredictable.
https://github.com/xemul/criu/issues/13
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>