We've slowed down a little, but still new features and bugfixes appear.
This time we've improved lazy migration, completed SCM messages support,
added nesting netns support (now we have two of them -- net and mnt)
and did many small fixes here and there.
Signed-off-by: Pavel Emelyanov <xemul@virtuozzo.com>
Child may see close() result before it receives signal,
while it shouldn't see it. Instead of games with later
close(), just stop do it. sys_exit() after program finish
will close them all.
Reported-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently we save only attributes with non-zero values. For example,
a default value for IFLA_IPTUN_PROTO is IPPROTO_IPV6 (41), so we have to
save even attributes with zero values.
https://github.com/checkpoint-restore/criu/issues/445
Fixes: 4a044e6af9 ("net: Dump regular sit device")
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We open ns_fd via ioctl(SIOCGSKNS), but never close. Fix that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
x86's kerndat section in crtools.c has grown too much.
Let's make it more readable and *looking at cleared include-list*,
it'll better parallelize build.
Maybe we should turn __weak function into 0-defines.
Or clean 0-defines with ifdefs in generic file.
I have no strong opinion on that.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
The --lazy-migrate option allows testing of lazy migration when running ns
or uns flavor.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
CID 85039 (#1 of 1): Unchecked return value (CHECKED_RETURN)
6. check_return: Calling ptrace without checking return value (as is done elsewhere 44 out of 49 times).
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CID 155804 (#1 of 1): Unchecked return value (CHECKED_RETURN)
2. check_return: Calling umount2 without checking return value (as is done elsewhere 8 out of 9 times).
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CID 172198 (#1 of 1): Resource leak (RESOURCE_LEAK)
9. leaked_storage: Variable sk going out of scope leaks the storage it points to.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CID 154076 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
1. check_return: Calling setsockopt(sk, 6, 1, &val, 4U) without checking return value. This library function may fail and return an error code.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
CID 161693 (#1 of 1): Resource leak (RESOURCE_LEAK)
5. leaked_storage: Variable new going out of scope leaks the storage it points to.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
*** CID 179043: (USE_AFTER_FREE)
close bfd fd safe so that we won't have double close
*** CID 179041: Resource leaks (RESOURCE_LEAK)
don't forget to close fd on error
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
*** CID 185302: Null pointer dereferences (NULL_RETURNS)
/test/zdtm/static/cgroup_ifpriomap.c: 107 in read_one_priomap()
>>> Dereferencing a pointer that might be null "out->ifname" when calling "strncpy".
There is also a warning about using rand(), but..
Not sure that we need to entangle everything just for pleasing Coverity:
>>> CID 185301: Security best practices violations (DC.WEAK_CRYPTO)
>>> "rand" should not be used for security related applications, as linear congruential algorithms are too easy to break.
Leaving that as-is and marking in Coverity as WONTFIX.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
I've also dropped `noauto' in this patch, reverting the
commit be98273cf1 ("zdtm: mark static/cgroup_ifpriomap as noauto")
Don't see any sense to separate it as another patch.
Fixes: #383
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Currently, if pipe is shared between lazy and non-lazy PPBs lazy migration
fails because data that should be transfered on demand is spliced into the
images. Preventing pipe sharing between PPBs of different type resolves
this issue.
In order to still minimize pipe fragmentation, we track the last pipe that
was used for certain PPB type and re-use it for the PPB of the same type.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As it's aligned to 16, all structures that contain it should be
also aligned to 16. In the kernel there is no such align as
there two separate definitions of i387_fxsave_struct:
one for ia32 and another for x86_64.
Fixes newly introduced align warning in gcc-8.1:
In file included from compel/include/uapi/compel/asm/sigframe.h:7,
from compel/plugins/std/infect.c:13:
compel/include/uapi/compel/asm/fpu.h:89:1: error: alignment 1 of 'struct xsave_struct_ia32' is less than 16 [-Werror=packed-not-aligned]
} __packed;
^
It doesn't change the current align of the struct, as containing
structures are __packed and it aligned already *by fact*.
It only affects the function users of the struct's local variables:
now they lay aligned on a stack.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
It helped a bit to debug Skylake ptrace() bug, let's put it in.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Ugh, I've spent 25 mins at 4 A.M. to figure out why the tests are failing.
And the reason is stupied me, who defined a new flag after 0x8
as 0x16, not as 0x10. Simplify those definitions for such simple-minded
living creatures like Dima.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
On Skylake processors and kernel older than v4.14
ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
may return not full xstate, ommiting FP part (that is XFEATURE_MASK_FP).
There is a patch which describes this bug:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1318800.html
Anyway, it's fixed in v4.14 kernel by (what we believe with Andrey) this:
https://patchwork.kernel.org/patch/9567939/
As we still support kernels from v3.10 and newer, we need to have a
workaround for this kernel bug on Skylake CPUs.
Big thanks to Shlomi for the reports, the effort and for providing an
Amazon VM to test this. I wish more bug reporters were like you.
Reported-by: Shlomi Matichin <shlomi@binaris.com>
Provided-test-env: Shlomi Matichin <shlomi@binaris.com>
Investigated-with: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Mere cleanup. For Skylake workaround I'll call one after another,
so it's better separate it in a small helpers.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
get_task_regs() needs to know if it needs to use workaround
for a Skylake ptrace() bug. The next patch will introduce a
new flag for that.
I also thought about making 3 versions of get_task_regs() and
adding them to ictx->get_task_regs() depending on the flags..
But get_task_regs() is a private function and infect_ctx is
a uapi.. So, let's just pass context flags to get_task_regs().
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
As we anyway define save_regs_t for other purposes,
use it in the function declaration.
To unify infect_ctx style, add make_sigframe_t.
Mere cleanup.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We need to know if ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, iov)
returns xsave without FP state part.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Error-path for failed fork().
Looks originally forgotten, oops!
Also print a message on failed fork().
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We ignore restore_one_*notify() error code, while we mustn't.
Make open function fail when we can't restore them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Imagine, we have to restore inotify with watch descriptor 0x34d71d6.
Then we have:
1.235021 5578: fsnotify: Watch got 0x1 but 0x34d71d6 expected
...
...
527.378042 5578: fsnotify: Watch got 0x34d71d3 but 0x34d71d6 expected
527.378042 5578: fsnotify: Watch got 0x34d71d4 but 0x34d71d6 expected
527.378042 5578: fsnotify: Watch got 0x34d71d5 but 0x34d71d6 expected
Stop doing this and stop generating GBs of debug messages.
We already have print message before restore_one_inotify().
Let's add just one more after it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This test makes looped unix sockets queues and tries
to iterate over them after the restore.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Everything is ready. Message queue restores are in
the second stage of open for all types of unix sockets.
We just need to make scm wait before restore_unix_queue()
and allow to dump such scm context.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
When we allow unix sockets sent over unix sockets,
dump_sk_queue() may dump and resolve some peers.
So, we need run it firstly and avoid linking our
peer_node to peer's peer_list.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Similar to previous patch, this makes the second end
of dgram socketpair to be open till post open. This
allows to delay restore of message queue.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This makes the second end of socketpair to live till post_open.
We need it alive if we want to restore message queue later.
Otherwise, we do not have a queuer, which fd is used to actually
write messages.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Extract the functionality, which makes socket memory initialization.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This function will be used to allocate id for fake files
(don't confuse with fake fds, e.g. fles).
Suggested-by: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
restore_sk_common() may shutdown a socket, and queuer
won't be able to connect to it. So, this action must
be postponed.
We have this problem since long ago, but we are lucky
we haven't bumped in it.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Actually, there is no functional changes. We just postpone
restore of the queues. This will be used in the further
patches to restore unix sockets sent over unix sockets.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
After previous patch, master and slave ends of socketpair
are owned by the only task. So, we may avoid using
of send_desc_to_peer() of the second end, and just
reopen it with right pid.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
We're going to split interconnected pair restore
on two stages. Since we need the second end
to restore message queue in (future) post open,
we add it to the process, who is owner of the first
end.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since new file engine was introduced, we don't care
which particular pid should be master or slave.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This adds a new argument and changes debug print
(it will be used for any fle, not only for scm).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This functional allows to make a fle a master on
the time of collection. We will use it to add fake
files when we need to do this after add_fake_fds_masters().
This will be used to add second end of socketpair as
a fake fle (as the first end is placed in the right
place, we will force add the second end there).
See next patches.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since this function is used by standalone sockets only,
we move it to appropriate place. No functional changes.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Since epoll restore is split in two parts,
epoll_create() does not depend on another
files state. Since epoll is created, it
can be sent to everywhere. So, there is
no circular dependences, and we allow epolls
sent over unix socket.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>