GCC now assumes by default that the stack is aligned to a 16-byte boundary.
It's very unlikely that the parasite head's first call will contain
an SSE instruction that segfaults, but to be pedantically correct
we will lose an additional 8 bytes.
See also:
http://sourceforge.net/p/fbc/bugs/659/
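As a rough illustration of the alignment discussed above, here is a minimal sketch that rounds a stack pointer down to a 16-byte boundary. The helper name is hypothetical, and the actual patch may perform the adjustment differently (for instance directly in assembly).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): round a stack pointer down
 * to the nearest 16-byte boundary. At most 15 bytes of stack are lost.
 */
static inline uintptr_t sketch_align_stack(uintptr_t sp)
{
	return sp & ~(uintptr_t)15;
}
```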
Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We will need it for cr-check.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this when restoring eBPF programs in FILTER mode.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Task size depends on the VM_SPLIT_* kernel option and cannot be hard coded.
This patch is based on c0c0546c31 from
Christopher Covington.
Signed-off-by: Artem Kuzmitskiy <artem.kuzmitskiy@lge.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Since crtools is no longer linked with objects built for the
parasite/restorer blob, it no longer needs the builtin memory services.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In places where we have errno value set, such as after calling ptrace(),
it makes sense to use pr_perror as it appends the errno string. This
also fixes missing '\n' at the end (as pr_perror() adds it).
In places where we keep using pr_err(), don't forget to have '\n'.
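The difference between the two helpers can be sketched as follows. These are hypothetical stand-ins that format into a buffer, not CRIU's real macros; they only illustrate that the perror-style variant appends the errno string and the trailing newline itself, while the plain variant leaves the newline to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pr_err(): the caller must supply the '\n'. */
static int sketch_err(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "Error: %s", msg);
}

/* Hypothetical stand-in for pr_perror(): appends strerror(errno) and '\n'. */
static int sketch_perror(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "Error: %s: %s\n", msg, strerror(errno));
}
```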
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
As ptrace() sets errno, it makes sense to use pr_perror().
This also fixes the bug of missing '\n'.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Acked-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Commit 871da9a111 ("pie: Give VDSO symbol table local scope")
moved the definition of vdso_symbols from a global to a local variable
in vdso_fill_symtable(). This makes sense since this variable is only
used in this function. However, this raises a build issue on PowerPC,
where an undefined memcpy symbol is detected during the first
relocation phase of the parasite code:
parasite_blob: Error (pie/piegen/elf.c:258): Unexpected undefined
symbol:memcpy
This memcpy symbol is pulled in by compiler-generated code that
optimizes the stack initialization when entering
vdso_fill_symtable(). The optimization copies the
initialized data to the stack using memcpy. But when building the
parasite code, the C library is not linked and there is no memcpy
symbol. However, there is builtin_memcpy(), which does the same.
Ideally, builtin_memcpy should be named memcpy() to replace the C
library one, and it should only be built for the parasite/restorer
code. But the way CRIU is built, the same vdso-util.o file is used
twice: by criu, which is linked with the C library, and by the
parasite/restorer code. Thus naming builtin_memcpy memcpy leads to
relying on builtin_memcpy even when the C library is in the picture,
which is not the best option (assuming the C library memory operations are
more efficient).
Beyond the memcpy symbol issue, this shows that the same objects are used
both in CRIU and the parasite/restorer code. This should not be the
case since parasite/restorer are built in pie form and criu's objects
are not. The shared code should be built twice, once in pie form for the
parasite/restorer code, once *normally* for the criu binary.
Addressing the build issue implies more work than expected.
For the moment, this patch defines a memcpy service when building
the parasite code to fix the build issue on ppc64.
Once the build issue is addressed, builtin_memcpy should be renamed
memcpy and only be used for parasite/restorer code, and this
definition removed.
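For illustration, a freestanding copy routine of the kind described can be as small as the sketch below. The name sketch_memcpy is hypothetical (the patch provides a symbol named memcpy), and this byte-wise loop is an assumption about the shape of such a service, not the code from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical byte-wise copy that does not depend on the C library.
 * Like memcpy, it assumes the two regions do not overlap.
 */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}
```

Note that an optimizing compiler may recognize such a loop and turn it back into a memcpy call, so a real freestanding build may need extra care.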
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In the recent VDSO code reunification, some types were changed but
a pair of necessary corresponding changes was omitted. Fix that so
the AArch64 build succeeds without type-related
warnings-turned-errors. Also move the definition to the
AArch64-specific header since it's not currently being used by any
other architectures.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Commit ba74350954, which introduced support for Altivec and VSX,
broke checkpoint with --leave-running.
The root cause is that the address of the Altivec and VSX registers in the
signal frame should be computed for the stack in the context of the
checkpointed process.
This patch fixes the issue through sigreturn_prep_fpu_frame, which is
designed to update the signal frame based on the remote address.
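The idea of computing an address "in the context of the checkpointed process" can be sketched as a simple rebasing step. The helper below is hypothetical and is not the body of sigreturn_prep_fpu_frame; it only shows how a pointer into a locally built frame maps to the address the same field has once the frame lives at a remote address.

```c
#include <stdint.h>

/*
 * Hypothetical helper: translate the address of a field inside a locally
 * built frame into the address it will have once the frame is placed at
 * remote_frame in the checkpointed process.
 */
static uintptr_t sketch_to_remote(uintptr_t local_field, uintptr_t local_frame,
				  uintptr_t remote_frame)
{
	return remote_frame + (local_field - local_frame);
}
```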
Fixes: ba74350954 ("ppc64: Add Altivec and VSX support")
Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There was a \n missing.
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
There were multiple copies of the same code spread over the different
architectures handling the vDSO.
This patch merges the duplicated code in arch/*/vdso-pie.c and
arch/*/include/asm/vdso.h into the common files and leaves only the architecture
specific part in the arch/*/* files.
The files are now organized this way:
include/asm-generic/vdso.h
    contains basic definitions which could be overridden by
    architectures.
arch/*/include/asm/vdso.h
    contains per architecture definitions.
    It may include include/asm-generic/vdso.h.
pie/util-vdso.c
include/util-vdso.h
    These files contain code and definitions common to both criu and
    the parasite code.
    The file include/util-vdso.h includes arch/*/include/asm/vdso.h.
pie/parasite-vdso.c
include/parasite-vdso.h
    contain code and definitions specific to the parasite code handling
    the vDSO.
    The file include/parasite-vdso.h includes include/util-vdso.h.
arch/*/vdso-pie.c
    contains the architecture specific code installing the vDSO
    trampoline.
vdso.c
include/vdso.h
    contain code and definitions specific to the criu code handling the
    vDSO.
    The file include/vdso.h includes include/util-vdso.h.
CC: Christopher Covington <cov@codeaurora.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Merge arch/*/vdso.c files into vdso.c.
The merge has been done by copying arch/x86/vdso.c to vdso.c since it
contains patch a67e9f7bb9 ("vdso: don't play with a function exit code").
The commit f9ae6d9dd4 ("Replace remaining hard-coded TASK_SIZE use")
pushed for aarch64 has been replicated since it should be cross
platform.
CC: Christopher Covington <cov@codeaurora.org>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Commit 69d008d567 ("Use run-time page_size() for mremap")
introduced the use of a dynamic page size in rst-malloc.c.
The commit also added the include of unistd.h in
arch/aarch64/include/asm/page.h to allow the build to succeed on this
architecture. Since ppc64 deals with page size the same way,
the same include is required in arch/ppc64/include/asm/page.h.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
CC: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded. While
trivial applications successfully checkpoint and restore on AArch64
kernels with CONFIG_ARM64_64K_PAGES=y without this patch, replacing
the remaining use of the hard-coded value seems like the best way to
guard against failures that more complex process trees and future uses
may expose.
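One possible way to avoid a hard-coded value is to discover the limit at run time by probing. The sketch below is an assumption for illustration, not necessarily what this patch does: the probe callback, the helper names and the doubling search are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical probe: returns nonzero if addr lies beyond the task's address space. */
typedef int (*sketch_probe_fn)(uintptr_t addr);

/* Double the candidate size from min until the probe says it is out of range. */
static uintptr_t sketch_task_size(uintptr_t min, uintptr_t max, sketch_probe_fn beyond)
{
	uintptr_t size;

	for (size = min; size < max; size <<= 1)
		if (beyond(size))
			break;
	return size;
}

/* Fake probe for demonstration only: pretends the limit is 1 GiB. */
static int sketch_fake_probe(uintptr_t addr)
{
	return addr >= ((uintptr_t)1 << 30);
}
```

A real probe would have to ask the kernel, for example by attempting an operation on the candidate address and checking whether it is rejected.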
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
If we want one CRIU binary to work across all AArch64 kernel
configurations, a single task size value cannot be hard coded.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make sure we don't pass any trash value here, because
the kernel does copy it explicitly. We allocate the
memory for the frame zero filled, but the stack segment
is special and zero is not acceptable (we've had
a discussion on LKML about whether we need special handling
for zero ss, but ended up agreeing that new kernels need a new CRIU
version). Finally, in
commit 296bbf7e3 I managed to hit exactly
this problem :)
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The old and new address parameters passed to the mremap system
call must be page size aligned. On AArch64, the page size can
only be correctly determined at run time. This fixes the following
errors for CRIU on AArch64 kernels with CONFIG_ARM64_64K_PAGES=y.
call mremap(0x3ffb7d50000, 8192, 8192, MAYMOVE | FIXED, 0x2a000)
Error (rst-malloc.c:201): Can't mremap rst mem: Invalid argument
call mremap(0x3ffb7d90000, 8192, 8192, MAYMOVE | FIXED, 0x32000)
Error (rst-malloc.c:201): Can't mremap rst mem: Invalid argument
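The alignment requirement can be checked with a small sketch. The helper name is hypothetical; the addresses are taken from the log above. With a 4K page the target 0x2a000 is aligned, but with a 64K page it is not, which is consistent with the EINVAL seen (assuming the misaligned target address is what the kernel rejected).

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of page (page must be a power of two). */
static int sketch_page_aligned(uint64_t addr, uint64_t page)
{
	return (addr & (page - 1)) == 0;
}
```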
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add checkpoint and restore of the Altivec and VSX registers.
Currently we rely on the return value of ptrace to detect whether the CPU
supports these features. In the future, we should rely on the
AT_HWCAP vector and check the features at restart time.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This will help to support compat mode in the future.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll use this in the next patch for collecting the zombies without
actually waiting on them.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The initial support of the SYS V shared memory on ppc64 is broken. The call
to shmat done in the restore blob has no chance to work correctly.
This patch fixes the sys_shmat call.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We'll need this for use in the restorer blob for restoring LSMs. It looks like
arm already has openat, so I think it's just x86 and ppc that need it. In any
case, please double check this, as I've only tested it on x86.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The checkpoint and restore of the PowerPC floating point registers is
buggy.
The issue is that the signal frame context is defined to store double values
while the protocol buffer handles unsigned 64-bit integer values. A
silent cast done by the compiler was modifying the restored value behind our
back.
This fix changes the type used when manipulating the FP register values to
be consistent between checkpoint and restart.
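The effect of such a silent cast can be illustrated with the sketch below, which assumes IEEE 754 doubles. The helper names are hypothetical and the memcpy-based copy is only one way to preserve the bits; the patch itself fixes the problem by changing the types involved.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: 1.5 becomes the integer 1, so the register bits are lost. */
static uint64_t sketch_by_value(double d)
{
	return (uint64_t)d;
}

/* Bit-preserving copy: the 64 raw bits are carried over unchanged. */
static uint64_t sketch_by_bits(double d)
{
	uint64_t u;

	memcpy(&u, &d, sizeof(u));
	return u;
}

/* Inverse of sketch_by_bits(): rebuild the double from its raw bits. */
static double sketch_from_bits(uint64_t u)
{
	double d;

	memcpy(&d, &u, sizeof(d));
	return d;
}
```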
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This cleans the assembly code, removing the no longer needed trick with
register 2 (TOC pointer). As a consequence, __export_restore_task_trampoline()
and __export_unmap_trampoline() are no longer needed.
Thus, the changes introduced by commit de9df91002 ("Per architecture restorer
trampolines") in cr-restore.c are no longer used, but they do not impact
runtime code anyway.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
After this patch one can run ARCH="ia32" make to build
a 32-bit version of CRIU on a 64-bit host. Note that only
the build procedure is tuned up; CRIU itself is not
yet ready to make a checkpoint/restore cycle. A lot
of additional code is needed, and here we simply put
stubs to make the build procedure run.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
To support x86-32 mode we will need our own syscall table.
Here it is. Note that CRIU itself doesn't support such
a mode yet.
Meanwhile, put the syscall table here so that anyone
adding a new syscall updates the 32-bit variant
as well.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Introduce optimized bit operations for PowerPC.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of relying on the common C memcmp() function, rely on the
optimized one stolen from the kernel.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Instead of relying on the common C memcpy function, rely on the
optimized one stolen from the kernel.
Cc: Anton Blanchard <anton@au.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Add various register definitions to clean up the assembly code.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
The page size may vary, so it should be read through sysconf.
Suggested-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fix a comment suggesting that 32-bit tasks are supported, which is
not the case.
For the record, ppc64le does not support 32-bit tasks, while ppc64 (the Big
Endian architecture) has an option to support 32-bit tasks, but CRIU doesn't
yet run on ppc64.
Reported-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Fix the following error:
> > LINK arch/x86/syscalls.built-in.o
> > arch/x86/crtools.c:36:20: error: unused function '__check_code_syscall'
> > [-Werror,-Wunused-function]
> > static inline void __check_code_syscall(void)
As the function consists of a few BUILD_BUG_ONs, it gets optimized out.
Let's add __attribute__((__unused__)) so clang stops complaining.
[v2: s/used/unused/, fix all the arches, whitespace cleanup]
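The pattern can be sketched as follows. The macro and function names are hypothetical stand-ins for BUILD_BUG_ON and __check_code_syscall, and the checks themselves are placeholders.

```c
/* Hypothetical compile-time check in the spirit of BUILD_BUG_ON. */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/*
 * Never called at run time; it exists only so the checks get compiled.
 * The attribute silences -Wunused-function.
 */
static inline void __attribute__((__unused__)) sketch_check_sizes(void)
{
	SKETCH_BUILD_BUG_ON(sizeof(long) < sizeof(int));
	SKETCH_BUILD_BUG_ON(sizeof(char) != 1);
}
```

If a condition is true, the array size becomes negative and the build fails; otherwise the function compiles down to nothing.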
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Some architectures like ppc64 require a trampoline to be called prior to
the standard restorer services.
This patch introduces 3 trampolines which can be overridden by
architectures in arch/x/include/asm/restore.h:
- arch_export_restore_thread
- arch_export_restore_task
- arch_export_unmap
An architecture that doesn't need to override them has nothing to do.
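One common way to make such hooks overridable is a default macro guarded by #ifndef. The sketch below assumes that mechanism for illustration; the patch may implement the override differently, and sketch_restore_task is a hypothetical stand-in for the generic entry point.

```c
/* A generic restorer entry point (hypothetical stand-in). */
static int sketch_restore_task(void)
{
	return 42;
}

/*
 * An architecture header may define this macro first to point at its own
 * trampoline; otherwise the generic entry point is used unchanged.
 */
#ifndef arch_export_restore_task
#define arch_export_restore_task sketch_restore_task
#endif
```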
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This patch initiates the ppc64le architecture support in CRIU.
Note that the ppc64 (Big Endian) architecture is not yet supported since there
are still several issues to address with this architecture. However, in the
long term, the two architectures should be addressed using almost the
same code, hence sharing the ppc64 directory.
Major ppc64 issues:
The loader is not involved when the parasite code is loaded, so no relocation
is done for the parasite code. As a consequence, r2 must be set manually
when entering the parasite code, and the GOT is not filled.
Furthermore, the r2 fixup code at the services' global address, which has
not been fixed by the loader, should not be run. Branching to the local address,
as the assembly code does, jumps over it.
In the long term, relocation should be done when loading the parasite code.
We are introducing 2 trampolines for the 2 entry points of the restorer
blob. These entry points deal with r2. These ppc64 specific entry
points override the standard ones in sigreturn_restore() from
cr-restore.c. Instead of using #ifdef, we may introduce a per arch wrapper
here.
CRIU needs 2 kernel patches to run on powerpc which are not yet upstream:
- Tracking the vDSO remapping
- Enabling the kcmp system call on powerpc
Features not yet supported:
- Altivec registers C/R
- VSX registers C/R
- TM support
- a lot of things I missed..
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
So there won't be a warning on x86-32 (I don't like PRI conversion,
it's ugly as hell, plain long is enough here).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
- Get rid of shell script, we can do everything via make itself in parallel mode
- Collect syscall related data into the syscalls subdirectory (we're going to
implement 32 bit mode soon)
- We can't drop off __NR_ constants because we're using them in parasite code
(when we inject dumper and for "criu exec" mode)
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
In AArch64, pages may be 4K or 64K depending on kernel configuration.
The GNU C Library documentation suggests [1], "the correct interface
to query about the page size is sysconf". Introduce one new
architecture-specific function-like macro, page_size(), that on x86
and AArch32 remains a constant so as to minimally affect performance,
but on AArch64 is sysconf(_SC_PAGESIZE) for correctness.
1. https://www.gnu.org/software/libc/manual/html_node/Query-Memory-Parameters.html
To minimize churn, the PAGE_SIZE macro is left as a build-time
estimation of what the run-time page size might be.
This fixes the following errors for CRIU on AArch64 kernels with
CONFIG_ARM64_64K_PAGES=y, allowing dump of
`setsid sleep < /dev/null &> /dev/null` to succeed.
Error (kerndat.c:48): Can't stat self map_files: No such file or directory
Error (util.c:668): Can't read pme for pid 90: No such file or directory
Error (parasite-syscall.c:1135): Can't open 89/map_files/0x3ffb7da0000-0x3ffb7dac000 on procfs: No such file or directory
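The split between a build-time estimate and a run-time query can be sketched as below. The SKETCH_ names are hypothetical and the 4096 constant is an assumption for non-AArch64 targets; this is not the macro text from the patch.

```c
#include <unistd.h>

/* Hypothetical build-time estimate, mirroring the role PAGE_SIZE plays. */
#define SKETCH_PAGE_SIZE 4096UL

#if defined(__aarch64__)
/* 4K or 64K depending on the kernel configuration, so ask at run time. */
#define sketch_page_size() ((unsigned long)sysconf(_SC_PAGESIZE))
#else
/* Assumed constant here, so the call costs nothing. */
#define sketch_page_size() SKETCH_PAGE_SIZE
#endif
```

Callers use sketch_page_size() everywhere and only pay for the sysconf() call on architectures where the answer can differ between kernels.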
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We should not have a chance to exit with a wrong code on error
paths.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
SELinux can deny mmap(PROT_WRITE | PROT_EXEC), and in this case it is
not clear why CRIU fails; "Can't allocate memory for parasite blob"
doesn't tell much. Add a pr_warn() hint for the user.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Tested under Gentoo Linux with sys-kernel/linux-headers-3.19 installed;
"struct user_pt_regs" is defined in /usr/include/asm/ptrace.h.
Signed-off-by: Yixun Lan <yixun.lan@gmail.com>
Reviewed-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>