MAP_ANON has been deprecated, use MAP_ANONYMOUS instead.
https://lkml.org/lkml/2007/2/3/55
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This reverts commit 6b0292e126.
handle_binary() is needed for preparing PIE header during compilation.
If we need to provide API for building another compel binary, probably
it should be a separate library from libcompel.{a,so}
IOW, there is no sense to have handle_binary() included into CRIU.
For CRIU it's needed only during the build to generate parasite header.
Move handle_elf() to compel binary from library, as it was previously
corrected in commit 64bb75a859 ("compel: Drop off handle-elf routines
from library").
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Libcompel supplies to the parasite PAGE_SIZE during infection.
On ppc64/aarch64 PAGE_SIZE can be different with Large pages,
under the cover it returns __page_size.
At this moment __page_size is located in criu binary, which
prevents using libcompel as .so library in other applications
than criu on ppc64/aarch64.
Export __page_size and __page_shift in libcompel.
Fixes: #572
Fixes: 2d965008d3 ("ppc64/aarch64: Dynamically define PAGE_SIZE")
Cc: Adrian Reber <areber@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Alexey Shabalin
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
for i in $(find . -name "*.[ch]" -type f); do
sed -i 's/\(pr_warn(".*[^n]\)\("[),]\)/\1\\n\2/' $i
done
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to know what are stack pointers of every thread to ensure that the
current stack page will not be treated as lazy.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
They were occasionally kept in obj-y.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
arm's cpu code is the same as aarch64, so use
the symlink.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
- externd compel_cpuinfo_t to keep all fpu information
neded for xsaves mode
- fetch xsaves data in compel_cpuid
All this will allow us to extend criu to support
avx-512 intructions.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This reduces memory usage if image files are stored on tmpfs.
Signed-off-by: Pawel Stradomski <pstradomski@google.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Will need them to mask some of the features from
command line options.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Tracking cpuid features is easier when sync'ed with kernel
source code. Note though that while in kernel feature bits
are not part of ABI, we're saving bits into an image so
as result make sure they are posted in proper place together
with keeping in mind the backward compatibility issue.
Here we also start using v2 of cpuinfo image with more
feature bits.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Commit 37e4c7bfc264 fixed arm, ppc, x86 (32bit),
while it made wrong definition of x86_64. Fix that.
Also, add commentary to raw fork() implementation.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On fedora rawhide seccomp_metadata for some
reason is not defined (while in kernel it introduced
together with PTRACE_SECCOMP_GET_METADATA). So
lets do a trick for a while -- define own alias.
Once system headers get settled down we might find
more suitable solution. Because it's a part of kernel
API we're on the safe side.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We will use it to figure out if filter log target is used.
Metadata associated with seccomp filter is relatively new
feature which allows userspace to get and set it back.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
To be close to the kernel code.
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
For architectures like aarch64/ppc64 it's needed to propagate the size
of page inside PIEs. For the parasite page size will be defined during
seizing, and for restorer during early initialization.
Afterward we can use PAGE_SIZE in PIEs like we did before.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
There are a lot of lines, which are longer than 79:
(04.331172)pie: 1: Error (criu/pie/restorer.c:460): seccomp: Unexpected tid ->
(04.331172)pie: 1: 1 != 1
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This function is an analogue to vsprintf(), and is used in very much the
same way. The caller expects the modified string pointer to be pointing to
a null-terminated string.
Signed-off-by: Joel Nider <joeln@il.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As it's aligned to 16, all structures that contain it should be
also aligned to 16. In the kernel there is no such align as
there two separate definitions of i387_fxsave_struct:
one for ia32 and another for x86_64.
Fixes newly introduced align warning in gcc-8.1:
In file included from compel/include/uapi/compel/asm/sigframe.h:7,
from compel/plugins/std/infect.c:13:
compel/include/uapi/compel/asm/fpu.h:89:1: error: alignment 1 of 'struct xsave_struct_ia32' is less than 16 [-Werror=packed-not-aligned]
} __packed;
^
It doesn't change the current align of the struct, as containing
structures are __packed and it aligned already *by fact*.
It only affects the function users of the struct's local variables:
now they lay aligned on a stack.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Ugh, I've spent 25 mins at 4 A.M. to figure out why the tests are failing.
And the reason is stupied me, who defined a new flag after 0x8
as 0x16, not as 0x10. Simplify those definitions for such simple-minded
living creatures like Dima.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On Skylake processors and kernel older than v4.14
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
may return not full xstate, ommiting FP part (that is XFEATURE_MASK_FP).
There is a patch which describes this bug:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1318800.html
Anyway, it's fixed in v4.14 kernel by (what we believe with Andrey) this:
https://patchwork.kernel.org/patch/9567939/
As we still support kernels from v3.10 and newer, we need to have a
workaround for this kernel bug on Skylake CPUs.
Big thanks to Shlomi for the reports, the effort and for providing an
Amazon VM to test this. I wish more bug reporters were like you.
Reported-by: Shlomi Matichin <shlomi@binaris.com>
Provided-test-env: Shlomi Matichin <shlomi@binaris.com>
Investigated-with: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Mere cleanup. For Skylake workaround I'll call one after another,
so it's better separate it in a small helpers.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
get_task_regs() needs to know if it needs to use workaround
for a Skylake ptrace() bug. The next patch will introduce a
new flag for that.
I also thought about making 3 versions of get_task_regs() and
adding them to ictx->get_task_regs() depending on the flags..
But get_task_regs() is a private function and infect_ctx is
a uapi.. So, let's just pass context flags to get_task_regs().
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As we anyway define save_regs_t for other purposes,
use it in the function declaration.
To unify infect_ctx style, add make_sigframe_t.
Mere cleanup.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It has two arguments "pos_l and "pos_h" instead of one "off". It is used
to handle 64-bit offsets on 32-bit kernels.
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
https://github.com/checkpoint-restore/criu/issues/424
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Except for several false positives done by:
find -type f -name "*.c" -not -path "./test/*" -exec sed -i
's/\(\<pr_err.*[^\][^n]\)\("[,)]\)/\1\\n\2/g' {} \;
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we use the "map_files/%p-%p" format, but actually it should
be "map_files/%lx-%lx".
The kernel could handle both formats, but recently Alexey Dobriyan fixed
the kernel and it accept only the second format.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Regs are present in unsigned format so convert them
into signed first to provide results.
In particular if memfd_create syscall failed we won't
notice -ENOMEM error but rather treat it as unsigned
hex value
| (05.303002) Putting parasite blob into 0x7f1c6ffe0000->0xfffffff4
| (05.303234) Putting tsock into pid 42773
Signed-off-by: Cyrill Gorcunov <gorcunov@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Its declaration was changed in glibc headers and we don't want
to handle two version of itю
In file included from compel/include/uapi/compel/infect.h:6:0,
from compel/include/uapi/compel/compel.h:12,
from compel/include/log.h:4,
from compel/arch/aarch64/src/lib/infect.c:8:
compel/arch/aarch64/src/lib/infect.c: In function 'sigreturn_prep_regs_plain':
compel/include/uapi/compel/asm/sigframe.h:47:99: error: 'mcontext_t {aka struct <anonymous>}' has no member named '__reserved'; did you mean '__glibc_reserved1'?
#define RT_SIGFRAME_AUX_CONTEXT(rt_sigframe) ((struct aux_context*)&(rt_sigframe)->uc.uc_mcontext.__reserved)
^
compel/include/uapi/compel/asm/sigframe.h:48:41: note: in expansion of macro 'RT_SIGFRAME_AUX_CONTEXT'
#define RT_SIGFRAME_FPU(rt_sigframe) (&RT_SIGFRAME_AUX_CONTEXT(rt_sigframe)->fpsimd)
^~~~~~~~~~~~~~~~~~~~~~~
compel/arch/aarch64/src/lib/infect.c:34:34: note: in expansion of macro 'RT_SIGFRAME_FPU'
struct fpsimd_context *fpsimd = RT_SIGFRAME_FPU(sigframe);
^~~~~~~~~~~~~~~
make[1]: *** [/builddir/build/BUILD/criu-3.5/scripts/nmk/scripts/build.mk:209: compel/arch/aarch64/src/lib/infect.o] Error 1
make: *** [Makefile.compel:36: compel/libcompel.a] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.IExY9K (%build)
This seems to be related to the following glibc changes:
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=4fa9b3bfe6759c82beb4b043a54a3598ca467289
Reported-by: Adrian Reber <adrian@lisas.de>
Tested-by: Adrian Reber <adrian@lisas.de>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add handeling of R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.
They are not that old, so I provided ifdef-guards for them.
According to x86-64 ABI specification paper, they should be
generated instead of R_X86_64_GOTPCREL for cases when relaxation
is possible.
At this moment we can handle them the same way like R_X86_64_GOTPCREL.
[0] https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdfFixes: #397
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Dump and restore process with RI control block. Runtime instrumentation
allows to collect information about program execution as CPU data.
The RI control block is dumped and restored for all threads.
Ptrace kernel interface is provided by:
c122bc239b13 ("s390/ptrace: add runtime instrumention register get/set")
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Dump and restore tasks with GS control blocks. Guarded-storage is a new
s390 feature to improve garbage collecting languages like Java.
There are two control blocks in the CPU:
- GS control block
- GS broadcast control block
Both control blocks have to be dumped and restored for all threads.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This is what we have:
> compel/src/lib/infect.c:1145:38: error: taking address of packed member
> 'uc_sigmask' of class or structure 'ucontext_ia32' may result in an
> unaligned pointer value [-Werror,-Waddress-of-packed-member]
> blk_sigset = RT_SIGFRAME_UC_SIGMASK(f);
> ~~~~~~~~~~~~~~~~~~~~~~~^~
> compel/include/uapi/asm/sigframe.h:133:4: note: expanded from macro
> 'RT_SIGFRAME_UC_SIGMASK'
> (&rt_sigframe->compat.uc.uc_sigmask))
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
Indeed this results in an unaligned pointer, but as this is intended and
well known (see commit dd6736bd "compel/x86/compat: pack ucontext_ia32"),
we need to silence the warning here.
For more details, see https://reviews.llvm.org/D20561
Originally found by Travis on Alpine Linux, reproduced on Ubuntu 17.10.
[v2: fix for non-x86]
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The right order for all of our 4 archs is:
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
See Linux kernel for the details.
Note, this is just a fix, and it's not connected with the second patch.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The recent changes to user address space limits in the Linux kernel break
the assumption that TASK_SIZE is 128TB. For now, the maximal task size on
ppc64le is 512TB and we need to detect it in runtime for compatibility with
older kernels.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Add linux/userfaultfd.h to criu sources. This header is a part
of the kernel API and I see nothing wrong to have in the repo.
Why we want to do this:
* to check that criu works correctly if a kernel doesn't
support userfaultfd.
* to check compilation of the userfaultfd part in travis-ci.
v2: remove UFFD from FEATURES_LIST
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Adrian Reber <areber@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
For dumping threads we execute parasite code before we collect the
floating point registers with get_task_regs().
The following describes the code path were this is done:
dump_task_threads()
for (i = 0; i < item->nr_threads; i++)
dump_task_thread()
parasite_dump_thread_seized()
compel_prepare_thread()
prepare_thread()
### Get general purpose registers ###
ptrace_get_regs(ctx->regs)
### parasite code is executed in thread ###
compel_run_in_thread(tctl, PARASITE_CMD_DUMP_THREAD)
compel_get_thread_regs()
### Get FP and VX registers ###
get_task_regs()
Since on s390 gcc uses floating point registers to cache general
purpose without the -msoft-float option the floating point
registers would have been clobbered after compel_run_in_thread().
With this patch we first save all thread registers with task_get_regs() and
then call the parasite code.
We introduce a new function arch_set_thread_regs() that restores the saved
registers for all threads. This is necessary for criu dump --leave-running,
--check-only, and error handling.
Pre-dump does not require to save the registers for the threads because
because only the main thread (task) is used for parsite code execution.
The above changes allow us to use all register sets in the parasite
code - therefore we can remove -msoft-float on s390.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>