1725 commits

Author SHA1 Message Date
Andrew Vagin
6280625aac crtools: add ability to set list of external resources
This option is used to mark external resources on dump.

Currently it's going to be used to handle external tty-s,
but in the future it can be used for any type of resource.

There can be a few ways to restore external resources, and
we will have separate options to say how to restore each type.

For example, we can use --inherit-fd to restore external
file descriptors.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:47:56 +03:00
Andrew Vagin
2245a43393 tty: use a pair of dev and rdev to identify a terminal
We can't use only the terminal device number, because then we cannot
distinguish two pty-s from different devpts mounts, for example:

$ mount -t devpts -o newinstance xxx pts1
$ mount -t devpts -o newinstance xxx pts2
$ stat pts1/0
Device: 27h/39d	Inode: 3           Links: 1     Device type: 88,0
$ stat pts2/0
Device: 28h/40d	Inode: 3           Links: 1     Device type: 88,0

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-29 14:47:46 +03:00
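A minimal sketch of the idea, assuming the key is taken straight from stat(2): st_dev identifies the devpts instance that holds the node and st_rdev the terminal itself, so the pts1/0 and pts2/0 nodes above differ in st_dev (39 vs 40) while sharing st_rdev 88,0. struct tty_key, tty_key_of() and tty_key_equal() are hypothetical helpers.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical terminal key: st_rdev alone is ambiguous across devpts instances. */
struct tty_key {
	dev_t dev;   /* device of the filesystem holding the node (devpts instance) */
	dev_t rdev;  /* device number the node refers to, e.g. 88,0 */
};

static int tty_key_of(const char *path, struct tty_key *key)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	key->dev = st.st_dev;
	key->rdev = st.st_rdev;
	return 0;
}

/* Two nodes name the same terminal only if both numbers match. */
static bool tty_key_equal(const struct tty_key *a, const struct tty_key *b)
{
	return a->dev == b->dev && a->rdev == b->rdev;
}
```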
Andrew Vagin
20592acef7 syscall: use a correct type for timer_t
timer_t is (void *) in glibc, but (int) in the kernel.
When we call system calls directly, we need to use the kernel's timer_t.

https://github.com/xemul/criu/issues/98
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-28 13:11:04 +03:00
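To illustrate the mismatch, the sketch below calls timer_create and timer_delete through syscall(2) with an int-sized timer ID, as the kernel ABI expects, instead of glibc's timer_t. The kernel_timer_t typedef and the raw_timer_*() wrappers are hypothetical names; passing a NULL sigevent requests the default SIGALRM notification described in timer_create(2).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* The kernel hands out timer IDs as int, unlike glibc's (void *) timer_t. */
typedef int kernel_timer_t;

static int raw_timer_create(clockid_t clock, kernel_timer_t *id)
{
	/* NULL sigevent: default notification (SIGALRM), see timer_create(2). */
	return syscall(SYS_timer_create, clock, NULL, id);
}

static int raw_timer_delete(kernel_timer_t id)
{
	return syscall(SYS_timer_delete, id);
}
```

Passing a glibc timer_t here would hand the kernel a pointer-sized value where it expects an int.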
Dmitry Safonov
1137c3f80a kerndat: add has_loginuid to kerndat_s
This value has a different meaning on checkpoint and restore:
  - on checkpoint it means that it's possible to dump loginuid values;
  - on restore it means that it's possible to unset loginuid and write
    the saved value to the unset loginuid.

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-28 13:09:01 +03:00
Cyrill Gorcunov
d1db4faf9b vdso: Don't fail if pagemap is not accessbile
We use the page frame number to detect a vDSO which has been remapped
in-place from the runtime vDSO during restore. In this case, if the
kernel is older than 3.16, the "[vdso]" mark won't be reported
in procfs output.

Still, to address recently reported CVEs and to be able to run CRIU
in unprivileged mode, we need to handle vDSO without pagemap access.
Here is the deal: when we find a VMA which "looks like" vDSO, we
scan it for vDSO symbols, and if it matches, we restore its status
without PFN access.

Here are some details on the in-kernel history of @pagemap access:

 - @pagemap was introduced in commit 85863e475e59, where anyone
   who can attach to a task via ptrace is allowed to read
   data from @pagemap (Feb 4 2008, v2.6.25-rc1)

 - in commit 006ebb40d3d65 the ptrace attach rule was changed
   to ptrace read permission (May 19 2008, v2.6.27-rc1)

 - in commit ab676b7d6fbf4 opening @pagemap became guarded
   with CAP_SYS_ADMIN because it leaked physical addresses
   into userspace (Mar 9 2015, v4.0-rc5)

 - in commit 1c90308e7a77a opening @pagemap became available
   to regular users again (with ptrace read permission), but
   physical addresses of pages are hidden from non-privileged
   users (Sep 8 2015, v4.3-rc1)

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 14:40:05 +03:00
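The "looks like vDSO" step could start with a cheap check of the ELF header at the start of the VMA before any symbol lookup. The sketch below shows only that first filter, assuming a 64-bit target; vma_looks_like_vdso() and runtime_vdso() are hypothetical helpers, and getauxval(AT_SYSINFO_EHDR) is used here only to obtain a real vDSO to test against. A full check would go on to walk the dynamic symbol table, which is not shown.

```c
#include <elf.h>
#include <stdbool.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * First-stage filter only: does the mapping start with a 64-bit
 * shared-object ELF header? Symbol matching would follow (not shown).
 */
static bool vma_looks_like_vdso(const void *start)
{
	const Elf64_Ehdr *eh = start;   /* assumes a 64-bit target */

	if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
		return false;
	if (eh->e_ident[EI_CLASS] != ELFCLASS64)
		return false;
	return eh->e_type == ET_DYN;
}

/* The kernel reports the running vDSO address in the auxiliary vector. */
static const void *runtime_vdso(void)
{
	return (const void *)getauxval(AT_SYSINFO_EHDR);
}
```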
Pavel Emelyanov
df1729e3ec util: Ability to ignore errno when opening proc
When run as a regular user, criu will get EACCES/EPERM when
opening proc files, but in some situations criu knows how to
deal with this. So this patch makes it possible not to print
an error message in the logs for such cases.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
2015-12-24 14:40:02 +03:00
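A sketch of the pattern, assuming the caller passes a flag saying that permission errors are expected. open_proc_quiet() is a hypothetical helper, not the actual criu function; it leaves errno intact so the caller can still decide how to handle the failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Open a file under /proc; when ignore_perm is set, EACCES/EPERM are
 * treated as expected outcomes and not logged as errors.
 */
static int open_proc_quiet(const char *path, int flags, bool ignore_perm)
{
	int fd = open(path, flags);

	if (fd < 0) {
		int err = errno;

		if (!(ignore_perm && (err == EACCES || err == EPERM)))
			fprintf(stderr, "Error: can't open %s: %s\n", path, strerror(err));
		errno = err;   /* fprintf() may clobber errno; restore it */
	}
	return fd;
}
```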
Pavel Emelyanov
6c22bfe216 criu: Remove security
We no longer support root-mode service and suid binaries, so
any artificial restrictions no longer make sense.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Looks-good-to-me: Andrew Vagin <avagin@virtuozzo.com>
2015-12-24 14:39:58 +03:00
Cyrill Gorcunov
e9fc593cde creds: dump -- Implement per-thread dump of credentials
Like restore, this requires several steps to reach per-thread
support at the dump stage:

 - the @creds area to be fetched from the parasite is embedded into
   parasite_dump_structure

 - when testing whether a task is dumpable we no longer compare caps,
   because we now allow them to be different (and I renamed
   proc_status_creds_eq to proc_status_creds_dumpable for this
   reason)

 - dump_thread_common has to be extended to support dumping of
   creds (we call dump_thread_common in several places; in
   particular, when we need to fetch misc params we don't
   need creds, and this is where the @creds option comes into play)

 - after this patch no creds-X.img file is generated anymore;
   I guess we might drop it from the descriptors over time

https://jira.sw.ru/browse/PSBM-41416

v2:
 - In dump_task_creds() don't mangle the calls to parasite_dump_creds
   and collect_lsm_profile
 - PARASITE_MAX_GROUPS takes parasite_dump_thread into account because
   dump_thread_common now serves two cases: fetching plain misc parameters
   and fetching creds as well (depending on the context)
 - when testing for dumpable we still require the seccomp filters
   to match; they can be different and we need to support such a
   configuration too, but not in this series

v3:
 - Rip off dump_task_creds completely, together with PARASITE_CMD_DUMP_CREDS,
   we dump creds unconditionally in dump_thread_common
 - the group leader thread data is fetched via new
   parasite_dump_thread_leader_seized helper

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:21:50 +03:00
Cyrill Gorcunov
eb5d84428e creds: restore -- Implement per-thread restore of credentials
Because the creds parameters are to be passed into the pie/restorer
code but are read before the thread_restore_args and task_restore_args
structures are allocated, we need a small trick and prepare
creds in several stages:

 - collect all creds data into separate private memory blobs
 - once all memory needed for the restorer is allocated, we relocate
   pointers in these blobs and set
   thread_restore_args::thread_creds_args to the appropriate
   address
 - the restorer works as usual and sets up creds parameters as before

v2:
 - fix addressing when positioning rst_ memory (I had accidentally
   zapped pointers and forgot to merge the changes back when sending
   the patches, so although my local series successfully restored
   containers with different creds, the series as sent would not
   have worked if merged. So all changes are now merged as appropriate)

 - drop the module-global @cap_last_cap from pie/restorer.c

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:20:58 +03:00
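The two-stage trick can be sketched with a blob that holds an internal pointer: it is built in private memory first, then copied into the final area, and the pointer is rebased by its offset. struct creds_blob and both functions are hypothetical, and the layout (a header followed by a groups array) is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread creds blob: header followed by a groups array. */
struct creds_blob {
	unsigned int ngroups;
	unsigned int *groups;   /* points inside the same blob */
};

/* Stage 1: build the blob in private memory before the final area exists. */
static struct creds_blob *creds_blob_build(const unsigned int *groups,
					  unsigned int n, size_t *size)
{
	struct creds_blob *b;

	*size = sizeof(*b) + n * sizeof(*groups);
	b = malloc(*size);
	if (!b)
		return NULL;
	b->ngroups = n;
	b->groups = (unsigned int *)(b + 1);
	memcpy(b->groups, groups, n * sizeof(*groups));
	return b;
}

/* Stage 2: copy into the final area and rebase the internal pointer. */
static struct creds_blob *creds_blob_relocate(void *dst,
					     const struct creds_blob *src,
					     size_t size)
{
	struct creds_blob *b = dst;
	ptrdiff_t off = (const char *)src->groups - (const char *)src;

	memcpy(dst, src, size);
	b->groups = (unsigned int *)((char *)dst + off);
	return b;
}
```

Keeping pointers as offsets from the blob start is what makes the later move safe; the destination must be suitably aligned for the header.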
Cyrill Gorcunov
212e210552 creds: Move proc_status_creds::cap_X at the end of structure
For easier comparison, which is going to be addressed in the next patch.

https://jira.sw.ru/PSBM-41416

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:18:39 +03:00
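Putting the capability fields last lets a single memcmp() over the leading part of the structure serve as a comparison that ignores caps. The layout below is a hypothetical stand-in for proc_status_creds, not its real definition; the trick only works because the compared prefix contains no padding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: fields that must match first, capabilities last. */
struct status_creds {
	unsigned int uids[4];
	unsigned int gids[4];
	/* everything from here on may differ between threads */
	unsigned int cap_inh[2];
	unsigned int cap_prm[2];
	unsigned int cap_eff[2];
	unsigned int cap_bnd[2];
};

/* One memcmp over the prefix that precedes the first capability field. */
static bool creds_dumpable(const struct status_creds *a,
			   const struct status_creds *b)
{
	return memcmp(a, b, offsetof(struct status_creds, cap_inh)) == 0;
}
```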
Cyrill Gorcunov
767e3e994e xmalloc: Add xmemdup helper
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-24 13:18:05 +03:00
Kirill Tkhai
36c4cba986 binfmt_misc: Skip dumping if it's not virtual
Similar to devtmpfs and devpts, skip binfmt_misc
mount if it's not virtual.

Signed-off-by: Kirill Tkhai <ktkhai@odin.com>
Acked-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-23 15:44:45 +03:00
Andrey Ryabinin
d0ff73077d dump: add timeout for collecting processes
Currently criu dump may hang indefinitely, e.g. while waiting for a
task that is blocked in vfork(), or for a task that is in D state
for some other reason. This patch adds a time limit on collecting
tasks during the dump operation. If collecting processes takes too
long, the dump process will be terminated. The timeout is 5 seconds
by default, but it can be changed via a parameter.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-21 12:00:49 +03:00
Stanislav Kinsburskiy
16fd19895c util: new string helpers introduced
This patch adds the add_to_string() and construct_string() helpers.
They build a string from a variable number of parameters in sprintf()
manner, but also allocate the string (and reallocate it if necessary).

v2:
1) Helpers were renamed to xstrcat() and xsprintf() respectively.
2) Added printf attributes to force compiler format checks

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-21 11:57:00 +03:00
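A sketch of the two helpers, assuming xstrcat() appends formatted text to an existing heap string and xsprintf() allocates a new one; the signatures and the internal xvstrcat() are assumptions. Measuring with vsnprintf(NULL, 0, ...) first gives the exact size to realloc(), and the printf format attribute lets the compiler check arguments as described in v2.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *xstrcat(char **str, const char *fmt, ...)
	__attribute__((format(printf, 2, 3)));
static char *xsprintf(const char *fmt, ...)
	__attribute__((format(printf, 1, 2)));

/* Append formatted text to *str (NULL means empty), growing it as needed. */
static char *xvstrcat(char **str, const char *fmt, va_list args)
{
	size_t old = *str ? strlen(*str) : 0;
	va_list tmp;
	char *buf;
	int len;

	va_copy(tmp, args);
	len = vsnprintf(NULL, 0, fmt, tmp);   /* measure first */
	va_end(tmp);
	if (len < 0)
		return NULL;

	buf = realloc(*str, old + (size_t)len + 1);
	if (!buf)
		return NULL;                  /* *str is left untouched */
	vsnprintf(buf + old, (size_t)len + 1, fmt, args);
	*str = buf;
	return buf;
}

static char *xstrcat(char **str, const char *fmt, ...)
{
	va_list args;
	char *ret;

	va_start(args, fmt);
	ret = xvstrcat(str, fmt, args);
	va_end(args);
	return ret;
}

static char *xsprintf(const char *fmt, ...)
{
	va_list args;
	char *str = NULL;

	va_start(args, fmt);
	xvstrcat(&str, fmt, args);           /* str stays NULL on failure */
	va_end(args);
	return str;
}
```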
Evgeniy Akimov
8b04551c48 restore: restore freezer cgroup state
This patch restores the freezer cgroup state between the finalize_restore stages.
It should be done after the first stage, because we cannot unmap the restorer blob
from a frozen process, and before the second stage, because we must freeze processes
before they continue to run.
We also need to move fini_cgroup between these stages to give the freezer
cgroup state restorer access to the cgroup mount directories.
The error handlers contain fini_cgroup, so we are sure that the fini_cgroup call
won't be missed.

The patch restores the state of only one freezer cgroup, the one given by the
--freeze-cgroup option, not the states of the whole hierarchy, because CRIU
supports checkpointing a freezer cgroup hierarchy only in the THAWED state,
except for the root cgroup given by the --freeze-cgroup option.

Signed-off-by: Evgeniy Akimov <geka666@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:17:04 +03:00
Evgeniy Akimov
34662a68c8 cgroups: save freezer state during dump
CRIU sets freezer.state to "THAWED" during process tree dumping. That's why
we can't simply save the freezer.state file contents to the cgroups image. The
new function get_real_freezer_state() returns the freezer cgroup state
observed before CRIU started dumping, and the patch puts its return value into
the dump file.

Signed-off-by: Evgeniy Akimov <geka666@gmail.com>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Acked-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:17:01 +03:00
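Reading the state itself is simple; what matters in this patch is reading it before CRIU thaws the group. The sketch below parses one freezer.state file into an enum; the enum and parse_freezer_state() are hypothetical, while THAWED, FREEZING and FROZEN are the values reported by the cgroup v1 freezer.

```c
#include <stdio.h>
#include <string.h>

enum freezer_state { FREEZER_ERROR = -1, THAWED, FROZEN, FREEZING };

/* Parse the contents of a freezer.state file, e.g. "FROZEN\n". */
static enum freezer_state parse_freezer_state(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return FREEZER_ERROR;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return FREEZER_ERROR;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */

	if (!strcmp(buf, "THAWED"))
		return THAWED;
	if (!strcmp(buf, "FROZEN"))
		return FROZEN;
	if (!strcmp(buf, "FREEZING"))
		return FREEZING;
	return FREEZER_ERROR;
}
```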
Dmitry Safonov
e5c99983a4 criu: dump loginuid & oom_score_adj values
https://jira.sw.ru/browse/PSBM-41993

Signed-off-by: Dmitry Safonov <dsafonov@odin.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-16 18:08:58 +03:00
Rodrigo Bruno
f993926f5b Rename cr_opts.ps_port into port
Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-15 14:00:09 +03:00
Rodrigo Bruno
91b689a3a4 Introduce the read_into_buffer helper
This will be required for the page-cache and page-proxy patch set.

Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-15 13:58:03 +03:00
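The commit gives only the helper's name, so the signature below is an assumption: a read loop that keeps going after short reads and EINTR until the buffer is full or EOF is reached.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Fill buf with up to size bytes from fd, retrying short reads and EINTR.
 * Returns the number of bytes read (less than size only at EOF), or -1.
 */
static ssize_t read_into_buffer(int fd, char *buf, size_t size)
{
	size_t done = 0;

	while (done < size) {
		ssize_t n = read(fd, buf + done, size - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;   /* EOF */
		done += n;
	}
	return done;
}
```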
Andrew Vagin
391efc9d7e seize: don't wory if a cgroup contains some extra tasks (v3)
A freezer cgroup can contain tasks which will not be dumped.
criu unfreezes the group, so we need to freeze all extra
tasks with ptrace as we do for the target tasks.

Currently we attach to these tasks and send them interrupt signals,
but we don't call waitpid() for them, so later waitpid(-1, ...)
returns these tasks where we don't expect to see them.

v2: execute freezer_detach() only if opts.freeze_cgroup is set
    calculate extra tasks in a freezer cgroup correctly
v3: s/frozen_processes/processes_to_wait/

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 15:55:30 +03:00
Stanislav Kinsburskiy
8e863a94c7 fstype: "mount" callback introduced
It will be used to mount AutoFS, because context creation is required in
addition to the actual mount operation.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 14:04:32 +03:00
Stanislav Kinsburskiy
1617579a27 pstree: more pstree-related helpers
This patch introduces three helpers:
1) pstree_item_by_real() - search for pstree item by real pid.
2) pstree_item_by_virt() - search for pstree item by virtual pid.
3) pid_to_virt() - return a virtual pid by a real one.

Note: pstree_item_by_virt() and pid_to_virt() will be used to migrate AutoFS.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-14 14:04:21 +03:00
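A sketch of the three lookups over a flattened list. The real pstree is a tree and the commit does not show the actual signatures, so struct pstree_item here and the explicit head argument are assumptions made to keep the example self-contained.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, flattened view of a process-tree entry. */
struct pstree_item {
	pid_t real;                /* pid as seen from CRIU's namespace */
	pid_t virt;                /* pid inside the task's pid namespace */
	struct pstree_item *next;
};

static struct pstree_item *pstree_item_by_real(struct pstree_item *head, pid_t pid)
{
	for (; head; head = head->next)
		if (head->real == pid)
			return head;
	return NULL;
}

static struct pstree_item *pstree_item_by_virt(struct pstree_item *head, pid_t pid)
{
	for (; head; head = head->next)
		if (head->virt == pid)
			return head;
	return NULL;
}

/* Translate a real pid into a virtual one; -1 when the task is unknown. */
static pid_t pid_to_virt(struct pstree_item *head, pid_t real)
{
	struct pstree_item *item = pstree_item_by_real(head, real);

	return item ? item->virt : -1;
}
```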
Andrew Vagin
af55c059fb mount: fix a race between restoring namespaces and file mappings (v2)
Currently we wait until a namespace is restored to get its root.
We need to open a namespace root in order to open a file to restore a memory mapping.

A process restores its mappings and only then forks children, so we can have
a situation where we need to open a file from a namespace which will be
"restored" by one of our children.

The root task restores all mount namespaces and opens a file descriptor
for each of them. In this patch we open the root of each mntns in the root
task.

If we need to get the root of a namespace which isn't populated yet, we can get
it from the root task. After the CR_STATE_FORKING stage, the root task
closes all namespace descriptors, and we know that all namespaces are
populated at this moment.

v2: don't close root_fd for root ns, because it was not opened
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-10 14:58:59 +03:00
Fyodor
78a521163f pagemap-cache: add const-qualifier to pmc's vma
We need to perform dirty page tracking when dumping shmem, but there
we have only const vmas, so we need pmc to work with them. Also, the pmc concept
implies that it won't change its vmas, so it is natural to declare
them as const.

Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru>
Signed-off-by: Eugene Batalov <eabatalov89@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:45:44 +03:00
Tycho Andersen
6af96c8404 lsm: add a --lsm-profile flag
In LXD, we use the container name in the LSM profile. If the container name
is changed on migrate (on the host side), we want to use a different LSM
profile name (similar to --cgroup-root). This flag adds that support.

v2: remove unused field, add comment about double detection in
    kerndat_lsm()

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 15:07:26 +03:00
Cyrill Gorcunov
6217a84ae3 mnt: Carry run-time device ID in mount_info
When we're restoring fsnotify watchees, we need to resolve
a path to a handle at some mountpoint referred to by the @s_dev
member (device ID) saved in the image. This
ID may actually change on every mount (say, when
one restores a container after a machine reboot) or when
a container is migrated.

Thus the test for overmounting in __open_mountpoint
fails and we get an error.

Let's do a trick: introduce an @s_dev_rt member which
carries the run-time device ID. When dumping,
this member is simply equal to the traditional @s_dev fetched
from procfs, but when restoring we fetch it with a
stat call once the mountpoint becomes alive.

https://jira.sw.ru/browse/PSBM-41610

v2:
 - predefine MOUNT_INVALID_DEV
 - use fetch_rt_stat instead of assigning device in restore_shared_options
 - copy @s_dev_rt in propagate_siblings and propagate_mount

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:58:32 +03:00
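The idea can be sketched as follows: keep the saved @s_dev for matching against the image, and compare live stat() results only against @s_dev_rt once it has been refreshed. The struct, fetch_rt_dev() and is_same_mount() are hypothetical, and the value of MOUNT_INVALID_DEV is an assumption, since the commit does not state it.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MOUNT_INVALID_DEV ((dev_t)0)   /* assumed sentinel value */

struct mount_info_sketch {
	const char *mountpoint;
	dev_t s_dev;      /* device ID saved in the image at dump time */
	dev_t s_dev_rt;   /* device ID of the live mount at restore time */
};

/* On restore, refresh the run-time ID once the mountpoint exists. */
static int fetch_rt_dev(struct mount_info_sketch *mi)
{
	struct stat st;

	if (stat(mi->mountpoint, &st) < 0)
		return -1;
	mi->s_dev_rt = st.st_dev;
	return 0;
}

/* Overmount checks compare against the run-time ID, not the saved one. */
static bool is_same_mount(const struct mount_info_sketch *mi, const struct stat *st)
{
	return mi->s_dev_rt != MOUNT_INVALID_DEV && st->st_dev == mi->s_dev_rt;
}
```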
Kirill Tkhai
de8fd000d0 fs: Add binfmt_misc support
This patch implements checkpoint/restore functionality
for binfmt_misc mounts. Both magic and extension types
and "disabled" state are supported.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:52:26 +03:00
Pavel Emelyanov
a90d01a078 service: Remove systemd startup mode
For security reasons the systemd-spawn mode is no longer
supported in service.

Also fix the default binding address to be in the local cwd, so as
not to start a global service by chance.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-07 11:28:49 +03:00
Andrew Vagin
fde1116fee proc: parse sigpnd and shdpnd separatly
We found that we need to know whether SIGSTOP is queued
in both queues or in only one of them.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-26 09:05:12 +03:00
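In /proc/<pid>/status both queues appear as hexadecimal masks in which bit n-1 stands for signal n, so checking each queue for SIGSTOP separately can be sketched as below. struct pending, parse_pending() and sig_in_mask() are hypothetical helpers that operate on file contents already read into memory.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Pending-signal masks from /proc/<pid>/status (hex, bit n-1 = signal n). */
struct pending {
	unsigned long long sigpnd;   /* per-thread queue */
	unsigned long long shdpnd;   /* shared, process-wide queue */
};

static int parse_pending(const char *status, struct pending *p)
{
	const char *s = strstr(status, "SigPnd:");
	const char *h = strstr(status, "ShdPnd:");

	if (!s || !h)
		return -1;
	/* the space in the format matches the tab after the colon */
	if (sscanf(s, "SigPnd: %llx", &p->sigpnd) != 1 ||
	    sscanf(h, "ShdPnd: %llx", &p->shdpnd) != 1)
		return -1;
	return 0;
}

static bool sig_in_mask(unsigned long long mask, int sig)
{
	return mask & (1ULL << (sig - 1));
}
```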
Tycho Andersen
e6a3aef43e remap: don't allocate dead pids in wrong context
Closes #87

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-23 11:47:29 +03:00
Tycho Andersen
cc9587ffc5 seccomp: is optional when parsing /proc/pid/status
Also define some constants for people who don't have them in their headers.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-23 11:44:50 +03:00
Andrew Vagin
028998c588 proc_parse: parse pending signals
It's required to check the SIGSTOP signal, which can't be blocked.

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-20 21:13:31 +03:00
Cyrill Gorcunov
7de345d6b7 net: Move node's net fd reference into service fd
So we keep it and don't close it inside the close_old_fds()
helper, but pass it into veth creation so the kernel
can fetch the net namespace of the veth peer.

v2 (by avagin@):
 - don't forget to close opened descriptor

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-19 16:46:36 +03:00
Andrew Vagin
1e8a0594db net: dump iptables for ipv6 (v2)
v2: don't dump iptables if ipv6 isn't supported
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-19 15:19:01 +03:00
Andrew Vagin
1648db970c kerndat: check whether ipv6 is supported or not (v2)
v2: use a cached value to dump ipv6 interface addresses
    call get_ipv6() from kerndat_init_rst too

Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-19 15:18:08 +03:00
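The commit does not say how get_ipv6() probes the kernel; one common approach, assumed here, is to try creating an AF_INET6 socket and treat EAFNOSUPPORT as "no IPv6", caching the answer so later callers do not probe again. kerndat_has_ipv6() and its cache variable are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Cached answer: -1 not probed yet, 0 no IPv6, 1 IPv6 available. */
static int ipv6_cached = -1;

/* Returns 0 and sets *has on success, -1 on an unexpected error. */
static int kerndat_has_ipv6(bool *has)
{
	if (ipv6_cached < 0) {
		int sk = socket(AF_INET6, SOCK_DGRAM, 0);

		if (sk < 0) {
			if (errno != EAFNOSUPPORT)
				return -1;   /* a real failure, not "no IPv6" */
			ipv6_cached = 0;
		} else {
			close(sk);
			ipv6_cached = 1;
		}
	}
	*has = ipv6_cached;
	return 0;
}
```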
Andrew Vagin
a2780c6131 lock: futex() with timeout isn't restarted after signals (v2)
It returns EINTR, so we need to handle it.

$ bash test/zdtm.sh --restore-sibling ns/static/env00
...
futex(0x7fc20ec92010, FUTEX_WAIT, 1, {120, 0}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-19 15:15:39 +03:00
Andrew Vagin
4c00ac2908 lock: print a message if a futex is locked for more than 120 second
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-17 10:52:22 +03:00
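Combining both changes, a wait loop might look like the sketch below: it waits with a relative timeout, retries on EINTR because a timed FUTEX_WAIT is not restarted after a signal handler runs, and prints a message each time the timeout expires. futex_wait_while_eq() is a hypothetical name, and the timeout is a parameter rather than the fixed 120 seconds from the commit.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Block while *addr == val; returns 0 once it changes, -1 on other errors. */
static int futex_wait_while_eq(int *addr, int val, time_t timeout_sec)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) == val) {
		/* relative timeout, re-armed on every iteration */
		struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
		long ret = syscall(SYS_futex, addr, FUTEX_WAIT, val, &ts, NULL, 0);

		if (ret == 0)
			continue;   /* woken up: re-check the value */
		if (errno == EAGAIN)
			break;      /* value already differs */
		if (errno == EINTR)
			continue;   /* interrupted by a handler: wait again */
		if (errno == ETIMEDOUT) {
			fprintf(stderr, "futex %p still locked after %lld s\n",
				(void *)addr, (long long)timeout_sec);
			continue;
		}
		return -1;
	}
	return 0;
}
```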
Tycho Andersen
221af18ea0 seccomp: add support for SECCOMP_MODE_FILTER
This commit adds basic support for dumping and restoring seccomp filters
via the new ptrace interface. There are currently two known limitations of
this approach:

1. This approach doesn't support restoring tasks which first do a seccomp()
   and then a setuid(); the test elaborates on this. I don't think it is
   hard to do, but it is not done yet.

2. Filters are compared via memcmp(), so two tasks which have the same
   parent task and install identical (via memory) filters will have those
   filters considered to be the "same". Since we force all tasks to have
   the same creds (including seccomp filters) right now, this isn't a
   problem.

The approach used here is very similar to the cgroup approach: the actual
filters are stored in a seccomp.img, and each task has an id that points to
the part of the filter tree it needs to restore. This keeps us from dumping
the same filter multiple times, since filters are inherited on fork.

v2:
 * remove unused seccomp_filters field from struct rst_info
 * rework memory layout for passing filters to restorer blob
 * add a sanity check when finding inherited filters

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-17 10:51:20 +03:00
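The deduplication described above can be sketched as a table keyed on (parent id, filter bytes): a filter inherited across fork() is found again instead of being stored twice, and a task only needs the id of its innermost filter. The table layout, MAX_FILTERS and filter_id() are hypothetical; identical bytes under the same parent collapse into one entry, which is the memcmp() limitation from point 2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical image entry: one filter plus the index of its parent. */
struct filter_entry {
	int parent;                 /* -1 for the first filter in a chain */
	size_t len;                 /* program length in bytes */
	const unsigned char *prog;  /* raw BPF instructions */
};

#define MAX_FILTERS 64

static struct filter_entry table[MAX_FILTERS];
static int nr_filters;

/* Return the id of (parent, prog), adding it if it has not been seen yet. */
static int filter_id(int parent, const unsigned char *prog, size_t len)
{
	for (int i = 0; i < nr_filters; i++) {
		const struct filter_entry *f = &table[i];

		if (f->parent == parent && f->len == len &&
		    memcmp(f->prog, prog, len) == 0)
			return i;
	}
	if (nr_filters == MAX_FILTERS)
		return -1;
	table[nr_filters] = (struct filter_entry){ parent, len, prog };
	return nr_filters++;
}
```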
Andrew Vagin
b78af1923b mount: wait when mntns will be created to get its root (v2)
v2: add comments and rename ns_created to ns_populated.

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-17 10:46:00 +03:00
Andrew Vagin
7017181849 mount: don't inherit mount namespace descriptors to each process
close_old_fds() knows nothing about more than one set of service file
descriptors, so it's better to call it before forking children, as it was
before 9d60724eca ("restore: restore mntns before creating private vma-s")

The root task restores all mount namespaces and pins them with file descriptors,
then a task restores a mount namespace by opening the file descriptor of
the root task via /proc/pid/fd/X.

Reported-by: Mr Jenkins
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-17 10:45:09 +03:00
Andrew Vagin
9d60724eca restore: restore mntns before creating private vma-s (v3)
We need to open a file to restore a file mapping, and this file
can be from the current mntns.

v2: All namespaces are restored from the root task, and then
other tasks call setns() to set the proper mntns.

v3: fix comments from Pavel
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-14 09:53:47 +03:00
Pavel Emelyanov
dc00fea333 net: Dont print error in rule save
This feature is new and can be absent from the ip tool, which is OK
and is handled by the net.c code itself.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 16:31:21 +03:00
Pavel Emelyanov
18d9170858 util: Add flags to cr_system
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 16:31:19 +03:00
Cyrill Gorcunov
ee2409ec37 compiler: Grab min_t, max_t from the kernel
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 14:57:00 +03:00
Cyrill Gorcunov
ba475b8dcf bitmap -- Add few helpers for bits manipulations
Grabbed from the kernel. It is probably worth gathering
all bit manipulators here in the future.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 11:15:02 +03:00
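The commit does not list which helpers were added, so the set below is illustrative: kernel-style set_bit(), clear_bit() and test_bit() over an array of unsigned long. Unlike the kernel's set_bit() and clear_bit(), these are not atomic; they correspond to the kernel's __set_bit() and __clear_bit().

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG  (sizeof(long) * CHAR_BIT)
#define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)           /* which long */
#define BIT_MASK(nr)   (1UL << ((nr) % BITS_PER_LONG))  /* which bit in it */

/* Non-atomic helpers operating on a bitmap stored as unsigned longs. */
static inline void set_bit(unsigned long nr, unsigned long *addr)
{
	addr[BIT_WORD(nr)] |= BIT_MASK(nr);
}

static inline void clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[BIT_WORD(nr)] &= ~BIT_MASK(nr);
}

static inline bool test_bit(unsigned long nr, const unsigned long *addr)
{
	return addr[BIT_WORD(nr)] & BIT_MASK(nr);
}
```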
Pavel Emelyanov
780d699401 page-read: Teach page-read to read multiple pages at once
This is a preparatory patch; the problem to solve is described in
the next one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 11:14:43 +03:00
Tycho Andersen
8a95be0679 net: allow c/r of empty bridges in the container
Implementing c/r of bridges with slaves shouldn't be too hard (see the
comment), but this is all I need for right now.

v2: remove extra debug statement
v3: * remember to close fd in dump_bridge
    * use "known" buffer length and snprintf for spath in dump_bridge
    * change brace style

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-12 10:31:58 +03:00
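A sketch of the two v3 fixes, assuming dump_bridge() inspects /sys/class/net/<dev>/brif, the directory where the kernel lists a bridge's ports: snprintf() into a fixed-size buffer with a truncation check, and a directory handle that is always closed. bridge_brif_path() and bridge_port_count() are hypothetical names.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Build the brif path; snprintf() returns the full length, so truncation shows. */
static int bridge_brif_path(const char *dev, char *buf, size_t size)
{
	int n = snprintf(buf, size, "/sys/class/net/%s/brif", dev);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Count bridge ports; 0 means an empty bridge, -1 an unreadable directory. */
static int bridge_port_count(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	int n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL)
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			n++;
	closedir(d);
	return n;
}
```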
Pavel Emelyanov
a20ed3c6f0 page-server: Fine grained corking control (v3)
When live migrating a container with a large number of processes
inside, a page-server-ed dump may be up to 10 times slower than
a local dump.

The delay is always introduced in open_page_server_xfer()
when criu negotiates the has_parent bit on the 2nd task. This
likely happens because of Nagle's algorithm: after the write()
of the OPEN2 command, the kernel delays sending this command
while waiting for more data.

v2:
Fix this by turning on the CORK option on memory transfer sockets
on the send side, and the NODELAY one once on urgent data. The receive
side is always NODELAY-ed. According to Alexey Kuznetsov, this
is the best mode for this type of transfer.

v3:
Push packets in pre-dump's check_parent_server_xfer too.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@odin.com>
2015-11-10 16:00:25 +03:00
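On Linux the two socket options can be set as below; sk_cork() and sk_push_now() are hypothetical wrappers. Per tcp(7), setting TCP_NODELAY forces an explicit flush of pending output even while TCP_CORK is set, which matches the "NODELAY once on urgent data" step.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sender side: hold back partial frames while streaming page data. */
static int sk_cork(int sk, int on)
{
	return setsockopt(sk, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Urgent data: TCP_NODELAY flushes pending output even while corked. */
static int sk_push_now(int sk)
{
	int one = 1;

	return setsockopt(sk, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```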
Pavel Emelyanov
d6d06c9dfc Open proc links with O_PATH
These three are like the map_files one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2015-11-10 15:58:36 +03:00
Cyrill Gorcunov
049a7c828a userns: Wrap call with a macro fore readability
Pass the function name into the helper instead of a pointer,
which doesn't provide much useful info.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-05 15:29:04 +03:00
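A sketch of the readability trick: a macro stringifies its function argument with #fn so the helper can log a name instead of a raw pointer. userns_call(), userns_call_named() and double_it() are hypothetical; the real helper's signature is not shown in the commit.

```c
#include <stdio.h>

typedef int (*userns_fn)(void *arg);

/* Hypothetical helper: receives the callee's name purely for logging. */
static int userns_call_named(const char *name, userns_fn fn, void *arg)
{
	fprintf(stderr, "userns: calling %s\n", name);
	return fn(arg);
}

/* #fn turns the identifier into a string literal, e.g. "double_it". */
#define userns_call(fn, arg) userns_call_named(#fn, fn, arg)

/* Example callee. */
static int double_it(void *arg)
{
	return *(int *)arg * 2;
}
```

With this, userns_call(double_it, &v) logs "userns: calling double_it" rather than an address.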