We don't need to push 0 on the stack. This seems to be a remnant of the
initial commit of 2011 that had a `pushq $0`.
The 16 bytes %rsp alignment was added with commit 2a0cea29 in 2012.
This is no longer necessary as we already guarantee that %rsp is 16
bytes aligned. A BUG_ON() is added to enforce this guarantee.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Previously, __export_parasite_cmd was located in parasite-head.S, and
__export_parasite_args located exactly at the end of the parasite blob.
This is not ideal for various reasons:
1) These two variables work together. It would be preferrable to have
them in the same location
2) This prevent us from allocating another section betweeen the parasite
blob and the args area. We'll need this to allocate a GOT table
This commit changes the allocation of these symbols from assembly/linker
script to a C file.
Moreover, the assembly entry points that invoke parasite_service()
prepares arguments with hand crafted assembly. This is unecessary.
This commit rewrite this logic with regular C code.
Note: if it wasn't for the x86 compat mode, we could remove all
parasite-head.S files and directly jump to parasite_service() via
ptrace. An int3 architecture specific equivalent could be called at the
end of parasite_service() with an inline asm statement.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
It is unnecessary and potentially confusing for understanding the memory
layout requirement of the parasite blob.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
It removes the potential confusion when it comes to virtual address vs
offsets. Further, doing so makes naming more consistent with the rest of
the parasite_blob_desc struct.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
compel_relocs_apply() was taking arguments mostly from the struct
parasite_blob_desc. Instead of passing all the arguments, we pass a
pointer to the struct itself.
This makes the code safer, as cr-restore.c calls compel_relocs_apply().
It previously needed to poke into what can be considered private
variables of the restorer-pie.h file.
To allow the parasite_blob_desc struct to be populated without a
parasite_ctl struct, we expand the compel API to export a
parasite_setup_c_header_desc() in the generated pie.h.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
COMMON symbols are emitted for global variable that are not initialized
in shared libraries.
They typically end up in the .bss section when linking an executable.
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
We should follow Linux Kernel Codding Style:
... the closing brace is empty on a line of its own, except in the cases
where it is followed by a continuation of the same statement, ie ... an
else in an if-statement ...
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces
Automaticly fixing with:
:!git grep --files-with-matches "^\s*else[^{]*{" | xargs
:argadd <files>
:argdo :%s/}\s*\n\s*\(else[^{]*{\)/} \1/g | update
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The file only includes other headers (which may be not needed).
If we aim for one-include-for-compel, we could instead paste all
subheaders into "compel.h".
Rather, I think it's worth to migrate to more fine-grained compel
headers than follow the strategy 'one header to rule them all'.
Further, the header creates problems for cross-compilation: it's
included in files, those are used by host-compel. Which rightfully
confuses compiler/linker as host's definitions for fpu regs/other
platform details get drained into host's compel.
Signed-off-by: Dmitry Safonov <dima@arista.com>
This patch fixes the problem with SSE (xmm) registers corruption on amd64
architecture. The problem was that gcc generates parasite blob that uses
xmm registers, but we don't preserve this registers in CRIU when injecting
parasite. Also, gcc, even with -nostdlib option uses builtin memcpy,
memset functions that optimized for amd64 and involves SSE registers.
It seems, that optimal solution is to use -ffreestanding gcc option
to compile parasite. This option implies -fno-builtin and also it designed
for OS kernels compilation/another code that suited to work on non-hosted
environments and could prevent future sumilar bugs.
To check that you amd64 CRIU build affected by this problem you could simply
objdump -dS criu/pie/parasite.o | grep xmm
Output should be empty.
Reported-by: Diyu Zhou <zhoudiyupku at gmail.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Just like on all other supported architectures gcc complains about the
stack pointer register being part of the clobber list:
error: listing the stack pointer register ‘15’ in a clobber list is deprecated [-Werror=deprecated]
This removes the stack pointer from the clobber list.
'zdtm.py run -a' still runs without any errors after this change.
Signed-off-by: Adrian Reber <areber@redhat.com>
Linux kernel 5.4 extends clone3() with set_tid to allow processes to
specify the PID of a newly created process. This introduces detection
of the clone3() syscall and if set_tid is supported.
This first implementation is X86_64 only.
Signed-off-by: Adrian Reber <areber@redhat.com>
Compiling 'criu-dev' on Fedora 31 gives two errors about wrong clobber
lists:
compel/include/uapi/compel/asm/sigframe.h:47:9: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]
criu/arch/ppc64/include/asm/restore.h:14:2: error: listing the stack pointer register ‘1’ in a clobber list is deprecated [-Werror=deprecated]
There was also a bug report from Debian that CRIU does not build because
of this.
Each of these errors comes with the following note:
note: the value of the stack pointer after an ‘asm’ statement must be the same as it was before the statement
As far as I understand it this should not be a problem in this cases as
the code never returns anyway.
Running zdtm very seldom fails during 'zdtm/static/cgroup_ifpriomap'
with a double free or corruption. This happens not very often and I
cannot verify if it happens without this patch. As CRIU does not build
without the patch.
Signed-off-by: Adrian Reber <areber@redhat.com>
SRCARCH is always equal ARCH. There are no rules when to use one or
another and architectures may forget to set one of them up.
No need for a second variable meaning the same and confusing people.
Remove it completely.
Self-correction [after some debug]: SRCARCH was different in one place:
zdtm Makefile by some unintentional mistake:
> ifeq ($(ARCH),arm64)
> ARCH ?= aarch64
> SRCARCH ?= aarch64
> endif
That meant to be "ARCH := aarch64" because "?=" would never work inside
that ifeq. Fix up this part of mess too.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
All those compel functions can fail by various reasons.
It may be status of the system, interruption by user or anything else.
It's really desired to handle as many PIE related errors as possible
otherwise it's hard to analyze statuses of parasite/restorer
and the C/R process.
At least warning for logs should be produced or even C/R stopped.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Unknown state means that the task in the end may be not in wanted state.
Return err code.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
As a preparation for __must_check on compel_syscall(), check it on
close() too - maybe not as useful as with other syscalls, but why not.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Also, don't use the magic -2 => return errno on failure.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
>From man ptrace:
> On error, all requests return -1, and errno is set appropriately.
> Since the value returned by a successful PTRACE_PEEK* request may be
> -1, the caller must clear errno before the call, and then check
> it afterward to determine whether or not an error occurred.
FWIW: if ptrace_peek_area() is called with (errno != 0) it may
false-fail if the data is (-1).
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Before the 5.2 kernel, only fpu_state->fpu_state_64.xsave has to be
64-byte aligned. But staring with the 5.2 kernel, the same is required
for pu_state->fpu_state_ia32.xsave.
The behavior was changed in:
c2ff9e9a3d9d ("x86/fpu: Merge the two code paths in __fpu__restore_sig()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
`pushq` sign-extends the value. Which is a bummer as the label's address
may be higher that 2Gb, which means that the sign-bit will be set.
As it long-jumps with ia32 selector, %r11 can be scratched.
Use %r11 register as a temporary to push the 32-bit address.
Complements: a9a760278c ("arch/x86: push correct eip on the stack
before lretq")
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Right now we use pushq, but it pushes sign-extended value, so if the
parasite code is placed higher that 2Gb, we will see something like
this:
0xf7efd5b0: pushq $0x23
0xf7efd5b2: pushq $0xfffffffff7efd5b9
=> 0xf7efd5b7: lretq
Actually we want to push 0xf7efd5b9 instead of 0xfffffffff7efd5b9.
Fixes: #398
Cc: Dmitry Safonov <dima@arista.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
I've mentioned the problem that after c/r each inotify receives one or
more unexpected events.
This happens because our algorithm mixes setting up an inotify watch on
the file with opening and closing it.
We mix inotify creation and watched file open/close because we need to
create the inotify watch on the file from another mntns (generally). And
we do a trick opening the file so that it can be referenced in current
mntns by /proc/<pid>/fd/<id> path.
Moreover if we have several inotifies on the same file, than queue gets
even more events than just one which happens in a simple case.
note: For now we don't have a way to c/r events in queue but we need to
at least leave the queue clean from events generated by our own.
These, still, looks harder to rewrite wd creation without this proc-fd
trick than to remove unexpected events from queues.
So just cleanup these events for each fdt-restorer process, for each of
its inotify fds _after_ restore stage (at CR_STATE_RESTORE_SIGCHLD).
These is a closest place where for an _alive_ process we know that all
prepare_fds() are done by all processes. These means we need to do the
cleanup in PIE code, so need to add sys_ppoll definitions for PIE and
divide process in two phases: first collect and transfer fds, second do
real cleanup.
note: We still do prepare_fds() for zombies. But zombies have no fds in
/proc/pid/fd so we will collect no in collect_fds() and therefore we
have no in prepare_fds(), thus there is no need to cleanup inotifies for
zombies.
v2: adopt to multiple unexpected events
v3: do not cleanup from fdt-receivers, done from fdt-restorer
v4: do without additional fds restore stage
v5: replace sys_poll with sys_ppoll and fix minor nits
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
use ppoll always and remove poll
Provide a way to set gettimeofday() function for an infected task.
CRIU's parasite & restorer are very voluble as more logs are better
than lesser in terms of bug investigations.
In all modern kernels there is a way to get time without entering
kernel: vdso. So, add a way to reduce the cost of logging without making
it less valuable.
[I'm not particularly fond of std_log_set_gettimeofday() name, so
if someone can come with a better naming - I'm up for a change]
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Doesn't change uapi, but makes it a bit more friendly and documented
which loglevel means what for foreign user.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
After patching code - we need to flush CPU cache, it's done with
__builtin___clear_cache(). As we don't link to libgcc, provide a helper
that wraps ARM-specific syscall.
Fixes:
LINK criu/pie/restorer.built-in.o
ld: ./criu/arch/arm/vdso-pie.o: in function `insert_trampoline':
/root/criu/criu/arch/arm/vdso-pie.c:32: undefined reference to `__clear_cache'
Signed-off-by: Dmitry Safonov <dima@arista.com>
It's needed for PIEs, but not for the library.
It comes earlier than commit 61e6c01d09, but I don't see the point.
Regardles, I'm a bit afraid to break s390, hopefully testing covers the
platform.
This and the next one "make: Move CR_NOGLIBC into CFLAGS_PIE" should be
reverted/dropped from criu-dev if they turn to be breaking something.
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
__ASSEMBLY__ is used to guard C-related code in headers from
asm-compatible defines. We actually want every .S file to be assembled
with -D__ASSEMBLY__ not to burst with C in asm file.
Move __ASSEMBLY__ from all local asflags to top Makefile's AFLAGS.
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
In a function with return type bool, returning a non-zero value is
interpreted as returning true. In the error paths we want to return
false to indicate failure. Change -1 to false to fix this.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
sizeof(sizeof(x)) is the size of size_t. Instead use the size of the
array to ensure the entire array is zeroed.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
to enable Hygon Dhyana, which can reuse most AMD CPU support codes.
Signed-off-by: hygonsoc <hygonsoc@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
There are a few places where spaces have been used instead of tabs for
indentation. This patch converts the spaces to tabs for consistency
with the rest of the code base.
Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
It has a different alignment - rework ugly macro RT_SIGFRAME_UC_SIGMASK
into helpers.
Fixes: #666
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
with Android P's Clang versoin: 6.0.2, and Android NDK's Clang version 8.0.2
Clang will report below error:
criu/compel/include/uapi/compel/sigframe-common.h:55:34: error: expected member name or ';' after declaration specifiers
int __unused[32 - (sizeof (k_rtsigset_t) / sizeof (int))];
~~~ ^
it takes __unused as an attribute, not a varible, chang to _unused, pass compile.
Cc: Chen Hu <hu1.chen@intel.com>
Signed-off-by: Zhang Ning <ning.a.zhang@intel.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
in Android NDK, <elf.h> doesn't has define for:
NT_X86_XSTATE
NT_PRSTATUS
so add these defines to pass compile.
NOTE: add <linux/elf.h> will have more build errors
Cc: Chen Hu <hu1.chen@intel.com>
Signed-off-by: Zhang Ning <ning.a.zhang@intel.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Removed return value assignment statements as they are not referenced or used
anywhere after the assignment is done.
Fixes#334: Removing Unneeded Assignments
Signed-off-by: Mitul Karnik <mitulkarnik.92@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
I received this patch from Jeff Law as a fix for build failures with the
upcoming GCC 9. The following is Jeff Law's description of the patch:
Attached you'll find the fix for criu. You'll see it's just a matter
of dropping the sp/esp clobber from the relevant asm statements. THe
details:
criu has a macro which defines an asm which appears to want to set a new
stack pointer, then directly issue a sigreturn call to the kernel. Some
variants clobber sp (aarch64, arm, x86), others do not (ppc, s390)
While the asm does indeed set a new stack pointer, we never return from
a sigreturn syscall -- at least not in the normal way. We actually
return back to the point where the process was interrupted by the
signal. So describing the affect of the asm on the stack pointer is
pedantically correct, it actually has no real effect and can just be
dropped to avoid the hard error from gcc-9.
Suggested-by: Jeff Law <law@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
This avoids installing .gitignore file to compel includes.
While at it, describe why this .gitignore is needed.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>