This commit adds basic support for dumping and restoring seccomp filters
via the new ptrace interface. There are two current known limitations with
this approach:
1. This approach doesn't support restoring tasks who first do a seccomp()
and then a setuid(); the test elaborates on this and I don't think it is
tough to do, but it is not done yet.
2. Filters are compared via memcmp(), so two tasks which have the same
parent task and install identical (via memory) filters will have those
filters considered to be the "same". Since we force all tasks to have
the same creds (including seccomp filters) right now, this isn't a
problem.
The approach used here is very similar to the cgroup approach: the actual
filters are stored in a seccomp.img, and each task has an id that points to
the part of the filter tree it needs to restore. This keeps us from dumping
the same filter multiple times, since filters are inherited on fork.
v2:
* remove unused seccomp_filters field from struct rst_info
* rework memory layout for passing filters to restorer blob
* add a sanity check when finding inherited filters
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In order to restore seccomp filters, we need to have access to dynamically
allocated memory from the restorer blob, so we should unmap this memory
afterwards. In order to do this, we need to suspend seccomp earlier, right
after we attach to the tasks instead of just before we do the unmap of the
restorer blob itself.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The restorer prints pointer addresses in error codes
Reported-by: Artem Kuzmitskiy <artem.kuzmitskiy@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch change the build chain to not use pie objects in the crtools
executable.
This done by building the shared source files twice:
1. for parasite/restorer as '<file>-pie-build.o'
2. for crtools as '<file>.o'
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In commit c2271198, Laurent Dufour kindly reunified the VDSO code
that had become duplicated between architectures. Unfortunately
this introduced a regression in AArch64 where apparently due to
the scope of vdso_symbols array of pointers to characters changing
from local to global, load-time relocations became necessary.
The following thread on the GCC mailing list discusses why
load-time relocations can be necessary when pointers are used,
although it doesn't mention the potential for locally scoped
arrays to be handled differently:
https://gcc.gnu.org/ml/gcc/2004-05/msg01016.html
Because the alternatives, such as porting piegen to AArch64, are
far more involved, simply revert the change in scope.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There were multiple copy of the same code spread over the different
architectures handling the vDSO.
This patch is merging the duplicated code in arch/*/vdso-pie.c and
arch/*/include/asm/vdso.h in the common files and let only the architecture
specific part in the arch/*/* files.
The file are now organized this way:
include/asm-generic/vdso.h
contains basic definition which could be overwritten by
architectures.
arch/*/include/asm/vdso.h
contains per architecture definitions.
It may includes include/asm-generic/vdso.h
pie/util-vdso.c
include/util-vdso.h
These files contains code and definitions common to both criu and
the parasite code.
The file include/util-vdso.h includes arch/*/include/asm/vdso.h.
pie/parsite-vdso.c
include/parasite-vdso.h
contains code and definition specific to the parasite code handling
the vDSO.
The file include/parasite-vdso.h includes include/util-vdso.h.
arch/*/vdso-pie.c
contains the architecture specific code installing the vDSO
trampoline.
vdso.c
include/vdso.h
contains code and definition specific to the criu code handling the
vDSO.
The file include/vdso.h includes include/util-vdso.h.
CC: Christopher Covington <cov@codeaurora.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When building without the vDSO support on PowerPC (which is not a good
idea), the build is failing because few files are not included in the
build.
This fix moves those files inclusion outside of the vDSO directive.
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. Since
vma_area_is_private() is used by both restorer blob code and non
restorer blob code, which must use different variables for recording
the task size, make task_size a function argument and modify the call
sites accordingly. This fixes the following error on AArch64 kernels
with CONFIG_ARM64_64K_PAGES=y.
pie: Error (pie/restorer.c:929): Can't restore 0x3ffb7e70000 mapping w>
pie: ith 0xfffffffffffffff7
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded.
This fixes the following error on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.
pie: Error (pie/restorer.c:772): Unable to unmap (-): -1211695104
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Currently each task subtracts number of zombies from
task_entries->nr_threads without locks, so if two tasks will do this
operation concurrently, the result may be unpredictable.
https://github.com/xemul/criu/issues/13
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Also remove the cast of a pointer-to-void variable to the type
it already is.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This fixes the following error for CRIU on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y.
Error (cr-restore.c:2828): Can't mmap section for restore code
This occurred because the address being requested (0x16000 in
one case) was not page aligned.
Also change the capitalization of the pie_size() macro to make it
clear that the value is not necessarily a build-time constant.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When a TASK_HELPER would exit just before a zombie, sometimes the signal
would get coalesced, and we would miss the zombie exit, causing us to block
forever waiting for the zombie to complete. Let's use an entirely different
strategy for waiting on zombies: explicitly wait on them with waitid, and
use WNOWAIT to prevent their data from actually being reaped.
v2: don't decrement nr_{tasks,threads} in the loop
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
| CID 96750 (#1 of 1): Resource leak (RESOURCE_LEAK)
| 163. leaked_storage: Variable sec_hdrs going out of scope leaks the storage it points to.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
linux/seccomp.h may not be available, and the seccomp mode might not be
listed in /proc/pid/status, so let's not assume those two things are
present.
v2: add a seccomp.h with all the constants we use from linux/seccomp.h
v3: don't do a compile time check for PTRACE_O_SUSPEND_SECCOMP, just let
ptrace return EINVAL for it; also add a checkskip to skip the
seccomp_strict test if PTRACE_O_SUSPEND_SECCOMP or linux/seccomp.h
aren't present.
v4: use criu check --feature instead of checkskip to check whether the
kernel supports seccomp_suspend
Reported-by: Mr. Jenkins
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Unfortunately, SECCOMP_MODE_FILTER is not currently exposed to userspace,
so we can't checkpoint that. In any case, this is what we need to do for
SECCOMP_MODE_STRICT, so let's do it.
This patch works by first disabling seccomp for any processes who are going
to have seccomp filters restored, then restoring the process (including the
seccomp filters), and finally resuming the seccomp filters before detaching
from the process.
v2 changes:
* update for kernel patch v2
* use protobuf enum for seccomp type
* don't parse /proc/pid/status twice
v3 changes:
* get rid of extra CR_STAGE_SECCOMP_SUSPEND stage
* only suspend seccomp in finalize_restore(), just before the unmap
* restore the (same) seccomp state in threads too; also add a note about
how this is slightly wrong, and that we should at least check for a
mismatch
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For testing purpose we need to disable using of
piegen utility. So lets add PIEGEN make option
thus one can "PIEGEN=no make" to build criu
without piegen at all.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of keeping around multiple fds that point to various places in
/proc, let's just use /proc and openat() things relative to it.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This is a little tricky, since the threads are forked in the restorer blob, we
can't open their attr/curent files to pass into the restorer blob. So, we pass
in an fd for /proc that the restorer blob can use to access the attr/current
files once they exist.
N.B. this is still incorrect in that it restores the same credentials for all
threads in the group; however, it matches the behavior of the current creds
restore code, which also restores the same creds for all threads in the group.
v2: use simple_sprintf() instead of pie_strcat()
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this in the next patch for printing paths to LSM files in /proc.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Otherwise getting
| parasite-syscall.c: In function ‘parasite_infect_seized’:
| parasite-syscall.c:1222:5: error: ‘elf_relocs’ undeclared (first use in this function)
Simply wrap the @elf_relocs_apply with macros.
Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
For example some linkers generate @__export_parasite_args
as symbol which won't relocate. Handle such case properly.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The error I got was:
CC pie/piegen/elf-x86-64.o
In file included from pie/piegen/elf-x86-32.c:16:0:
pie/piegen/elf.c: In function ‘handle_elf_x86_32’:
pie/piegen/elf.c:476:3: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘Elf32_Word’ [-Werror=format=]
pr_debug("Copying section '%s'\n" \
^
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add prefix to the piegen's error and debug output to avoid confusion and
fix few debug lines.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since opts is defined as extern in piegen.h, there is no need to pass it as
argument.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Introduce a new -o argument to piegen to specify generate file name.
Send the debug stream to stdout and force it to /dev/null in the makefile
if V=1 is not specify.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When building the blob in the generated header file, we may
shrink the output blobk and only copy the sections with the SHF_ALLOC
bit set, the other ones are not needed at runtime.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This cleans the assembly code, removing no more needed trick with the
register 2 (TOC pointer). As a consequence, the __export_restore_task_trampoline()
and __export_unmap_trampoline() are no more needed.
Thus, the changes introduced by the commit de9df91002 ("Per architecture restorer
trampolines") in cr-restore.c are no more used but are not impacting
runtime code anyway.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
PowerPC linking requires the TOC to be in its own section
and to be aligned.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Move relocs application into a separate file which get
compiled as a regular C file in criu (pie/pie-relocs.[ch])
- Move types used by piegen into pie/piegen/uapi/types.h
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We should use provided @nr_relocs instead of ARRAY_SIZE here.
Didn't spot it earlier simply because at moment on x86-64
there is no relocs at all.
Also when we apply relocation they are to be computed from
virtual base of parasite address, not from local memory
map address, so add @vbase parameter. And fix typo on
addend in gotpcrel.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There is no rip addressing in 32bit code but PIE code
require GOT tables and friend which we better escape
for performance sake. So lets use pc relocations it
should do the trick.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since at the moment we're running only x86-64 not 32 bit tasks,
and our code is not carrying any big statically defined structures
we can use relocatable files directly with all relocation applied
during building.
This gonna be changed soon once we start supporting 32 bit tasks.
IOW even currently we need (which is not yet done but it's safe)
- check for gotpcrel relocations
- apply relocations with generated elf_apply_relocs helper
Currently overall scheme looks this way
- our object files are linked together into parasite.built-in.bin.o file
- then pie/piegen/piegen tool is called which parses this file and generates
C source code file with bytestream and all information needed to rellocate
this bytestream into a new place (see elf_apply_relocs helper)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Here we simply build piegen tool which gonna be used
to generate parasite code safe to rellocate. The tool
is taking object file as an argument, parses it and
generates C file with rellocations encoded in form
suitable for fast appliance.
Currently only x86-32 x86-64 is supported.
v2 (by ldufour@):
- Filter PIEGEN
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
On restore we have several arrays of objects that get remapped
into pie area and their number is also passed. Clean and shorten
the remapping code a bit and bing their naming to common format.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of belonging to the common C memcmp() function, belong on the
optimized one stolen from the kernel.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of belonging to the common C memcpy function, belong on the
optimized one stolen from the kernel.
Cc: Anton Blanchard <anton@au.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch adds support for checkpoint and restore of two linux security
modules (apparmor and selinux). The actual checkpoint or restore code isn't
that interesting, other than that we have to do the LSM restore in the restorer
blob since it may block any number of things that we want to do as part of the
restore process.
I tried originally to get this to work using libraries in the restorer blob,
but I could _not_ get things to work correctly (I assume I was doing something
wrong with all the static linking, you can see my draft attempts here:
https://github.com/tych0/criu/commits/apparmor-using-libraries ). I can try to
resurrect this if it makes more sense, to do it that way, though.
v2: lsm_profile lives in creds.proto instead of the task core, look in a more
canonical place for selinuxfs and don't try to special case any selinux
profile names.
v3: only allow unconfined selinux profiles
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch initiates the ppc64le architecture support in CRIU.
Note that ppc64 (Big Endian) architecture is not yet supported since there
are still several issues to address with this architecture. However, in the
long term, the two architectures should be addressed using the almost the
same code, so sharing the ppc64 directory.
Major ppc64 issues:
Loader is not involved when the parasite code is loaded. So no relocation
is done for the parasite code. As a consequence r2 must be set manually
when entering the parasite code, and GOT is not filled.
Furthermore, the r2 fixup code at the services's global address which has
not been fixed by the loader should not be run. Branching at local address,
as the assembly code does is jumping over it.
On the long term, relocation should be done when loading the parasite code.
We are introducing 2 trampolines for the 2 entry points of the restorer
blob. These entry points are dealing with r2. These ppc64 specific entry
points are overwritting the standard one in sigreturn_restore() from
cr-restore.c. Instead of using #ifdef, we may introduce a per arch wrapper
here.
CRIU needs 2 kernel patches to be run powerpc which are not yet upstream:
- Tracking the vDSO remapping
- Enabling the kcmp system call on powerpc
Feature not yet supported:
- Altivec registers C/R
- VSX registers C/R
- TM support
- all lot of things I missed..
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We need a custom flags to build 32bit varian of criu
on 64bit host system, lets pass @ldflags-y here for
that.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>