Mirrors/criu

mirror of https://github.com/checkpoint-restore/criu.git synced 2026-01-23 10:16:41 +00:00

Author	SHA1	Message	Date
Dmitry Safonov	e5c99983a4	criu: dump loginuid & oom_score_adj values https://jira.sw.ru/browse/PSBM-41993 Signed-off-by: Dmitry Safonov <dsafonov@odin.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-16 18:08:58 +03:00
Rodrigo Bruno	f993926f5b	Rename cr_opts.ps_port into port Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-15 14:00:09 +03:00
Rodrigo Bruno	91b689a3a4	Introduce the read_into_buffer helper This will be required for page-cache and page-proxy set. Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-15 13:58:03 +03:00
Andrew Vagin	391efc9d7e	seize: don't worry if a cgroup contains some extra tasks (v3) A freezer cgroup can contain tasks which will not be dumped, criu unfreezes the group, so we need to freeze all extra tasks with ptrace like we do for target tasks. Currently we attach and send interrupt signals to these tasks, but we don't call waitpid() for them, so then waitpid(-1, ...) returns these tasks where we don't expect to see them. v2: execute freezer_detach() only if opts.freeze_cgroup is set calculate extra tasks in a freezer cgroup correctly v3: s/frozen_processes/processes_to_wait/ Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-14 15:55:30 +03:00
Stanislav Kinsburskiy	8e863a94c7	fstype: "mount" callback introduced It will be used to mount AutoFS, because context creation is required in addition to actual mount operation. Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-14 14:04:32 +03:00
Stanislav Kinsburskiy	1617579a27	pstree: more pstree-related helpers This patch introduces three helpers: 1) pstree_item_by_real() - search for pstree item by real pid. 2) pstree_item_by_virt() - search for pstree item by virtual pid. 3) pid_to_virt() - return virtual pid by real one. Note: pstree_item_by_virt() and pid_to_virt() will be used to migrate AutoFS. Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-14 14:04:21 +03:00
Andrew Vagin	af55c059fb	mount: fix a race between restoring namespaces and file mappings (v2) Currently we wait when a namespace will be restored to get its root. We need to open a namespace root to open a file to restore a memory mapping. A process restores mappings and only then forks children. So we can have a situation, when we need to open a file from a namespace, which will be "restored" by one of our children. The root task restores all mount namespaces and opens a file descriptor for each of them. In this patch we open root for each mntns in the root task. If we need to get root of a namespace which isn't populated, we can get it from the root task. After the CR_STATE_FORKING stage, the root task closes all namespace descriptors and we know that all namespaces are populated at this moment. v2: don't close root_fd for root ns, because it was not opened Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-10 14:58:59 +03:00
Fyodor	78a521163f	pagemap-cache: add const-qualifier to pmc's vma We need to perform dirty page tracking when dumping shmem but there we have only const vmas so we need pmc to work with them. Also pmc concept implies that it won't change its vmas so it would be natural to declare them as const. Signed-off-by: Fyodor Bocharov <fbocharov@yandex.ru> Signed-off-by: Eugene Batalov <eabatalov89@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-08 15:45:44 +03:00
Tycho Andersen	6af96c8404	lsm: add a --lsm-profile flag In LXD, we use the container name in the LSM profile. If the container name is changed on migrate (on the host side), we want to use a different LSM profile name (a. la. --cgroup-root). This flag adds that support. v2: remove unused field, add comment about double detection in kerndat_lsm() Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-08 15:07:26 +03:00
Cyrill Gorcunov	6217a84ae3	mnt: Carry run-time device ID in mount_info When we're restoring fsnotify watchees we need to resolve path to a handle at some mountpoint referred by @s_dev member (device ID) which is saved inside image. This ID actually may be changed at the every mount (say one restores container after machine reboot) or in case of container's migration. Thus the test for overmounting in __open_mountpoint will fail and we get an error. Lets do a trick: introduce @s_dev_rt member which is supposed to carry run-time device ID. When dumping this member simply equal to traditional @s_dev fetched from the procfs, but when restoring we fetch it from stat call once mountpoint become alive. https://jira.sw.ru/browse/PSBM-41610 v2: - predefine MOUNT_INVALID_DEV - use fetch_rt_stat instead of assigning device in restore_shared_options - copy @s_dev_rt in propagate_siblings and propagate_mount Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-08 14:58:32 +03:00
Kirill Tkhai	de8fd000d0	fs: Add binfmt_misc support This patch implements checkpoint/restore functionality for binfmt_misc mounts. Both magic and extension types and "disabled" state are supported. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-08 14:52:26 +03:00
Pavel Emelyanov	a90d01a078	service: Remove systemd startup mode Due to security reasons the systemd-spawn mode is no longer supported in service. Also fix the default binding address to be in local cwd not to start global service by chance. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-12-07 11:28:49 +03:00
Andrew Vagin	fde1116fee	proc: parse sigpnd and shdpnd separately We found that we want to know whether SIGSTOP is queued in both or in only one of these queues. Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-26 09:05:12 +03:00
Tycho Andersen	e6a3aef43e	remap: don't allocate dead pids in wrong context Closes #87 Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> CC: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-23 11:47:29 +03:00
Tycho Andersen	cc9587ffc5	seccomp: is optional when parsing /proc/pid/status Also define some constants for people who don't have them in their headers. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-23 11:44:50 +03:00
Andrew Vagin	028998c588	proc_parse: parse pending signals It's required to check the SIGSTOP signal, which can't be blocked. Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-20 21:13:31 +03:00
Cyrill Gorcunov	7de345d6b7	net: Move node's net fd reference into service fd So we keep it and dont close inside close_old_fds() helper but pass into veth creation so the kernel can fetch the net namespace of the veth peer. v2 (by avagin@): - don't forget to close opened descriptor Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 16:46:36 +03:00
Andrew Vagin	1e8a0594db	net: dump iptables for ipv6 (v2) v2: don't dump iptables if ipv6 isn't supported Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:19:01 +03:00
Andrew Vagin	1648db970c	kerndat: check whether ipv6 is supported or not (v2) v2: use a cached value to dump ipv6 interface addresses call get_ipv6() from kerndat_init_rst too Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:18:08 +03:00
Andrew Vagin	a2780c6131	lock: futex() with timeout isn't restarted after signals (v2) It returns EINTR, so we need to handle it. $ bash test/zdtm.sh --restore-sibling ns/static/env00 ... futex(0x7fc20ec92010, FUTEX_WAIT, 1, {120, 0}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-19 15:15:39 +03:00
Andrew Vagin	4c00ac2908	lock: print a message if a futex is locked for more than 120 seconds Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:52:22 +03:00
Tycho Andersen	221af18ea0	seccomp: add support for SECCOMP_MODE_FILTER This commit adds basic support for dumping and restoring seccomp filters via the new ptrace interface. There are two current known limitations with this approach: 1. This approach doesn't support restoring tasks who first do a seccomp() and then a setuid(); the test elaborates on this and I don't think it is tough to do, but it is not done yet. 2. Filters are compared via memcmp(), so two tasks which have the same parent task and install identical (via memory) filters will have those filters considered to be the "same". Since we force all tasks to have the same creds (including seccomp filters) right now, this isn't a problem. The approach used here is very similar to the cgroup approach: the actual filters are stored in a seccomp.img, and each task has an id that points to the part of the filter tree it needs to restore. This keeps us from dumping the same filter multiple times, since filters are inherited on fork. v2: * remove unused seccomp_filters field from struct rst_info * rework memory layout for passing filters to restorer blob * add a sanity check when finding inherited filters Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:51:20 +03:00
Andrew Vagin	b78af1923b	mount: wait when mntns will be created to get its root (v2) v2: add comments and rename ns_created to ns_populated. Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:46:00 +03:00
Andrew Vagin	7017181849	mount: don't inherit mount namespace descriptors to each process close_olds_fds() knows nothing about more than one set of service file descriptors, so it's better to call it before forking children as it was before `9d60724eca` ("restore: restore mntns before creating private vma-s") The root task restores all processes and pin them with file descriptors, then a task restores a mount namespace by opening the file descriptor of the root task via /proc/pid/fd/X. Reported-by: Mr Jenkins Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-17 10:45:09 +03:00
Andrew Vagin	9d60724eca	restore: restore mntns before creating private vma-s (v3) We need to open a file to restore a file mapping and this file can be from a current mntns. v2: All namespaces are restored from the root task and then other tasks call setns() to set a proper mntns. v3: fix comments from Pavel Signed-off-by: Andrew Vagin <avagin@virtuozzo.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-14 09:53:47 +03:00
Pavel Emelyanov	dc00fea333	net: Dont print error in rule save This thing is new and can be absent in ip tool, which is OK and is handled by net.c code itself. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 16:31:21 +03:00
Pavel Emelyanov	18d9170858	util: Add flags to cr_system Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 16:31:19 +03:00
Cyrill Gorcunov	ee2409ec37	compiler: Grab min_t, max_t from the kernel Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 14:57:00 +03:00
Cyrill Gorcunov	ba475b8dcf	bitmap -- Add few helpers for bits manipulations Grabbed from kernel. Probably worth to gather all bits manipulators here in future. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 11:15:02 +03:00
Pavel Emelyanov	780d699401	page-read: Teach page-read to read multiple pages at once This is preparatory patch, the problem to solve is described in the next one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 11:14:43 +03:00
Tycho Andersen	8a95be0679	net: allow c/r of empty bridges in the container Implementing c/r of bridges with slaves shouldn't be too hard (viz. the comment), but this is all I need to for right now. v2: remove extra debug statement v3: * remember to close fd in dump_bridge * use "known" buffer length and snprintf for spath in dump_bridge * change brace style Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-12 10:31:58 +03:00
Pavel Emelyanov	a20ed3c6f0	page-server: Fine grained corking control (v3) When live migrating a container with large amount of processes inside the time to do page-server-ed dump may be up to 10 times slower than for the local dump. The delay is always introduced in the open_page_server_xfer() when criu negotiates the has_parent bit on the 2nd task. This likely happens because of the Nagle algorithm taking place -- after the write() of the OPEN2 command happened kernel delays this command sending waiting for more data. v2: Fix this by turning on CORK option on memory transfer sockets on send side, and NODELAY one once on urgent data. Receive side is always NODELAY-ed. According to Alexey Kuznetsov this is the best mode ever for such type of transfers. v3: Push packets in pre-dump's check_parent_server_xfer too. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@odin.com>	2015-11-10 16:00:25 +03:00
Pavel Emelyanov	d6d06c9dfc	Open proc links with O_PATH These three are like map_files one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>	2015-11-10 15:58:36 +03:00
Cyrill Gorcunov	049a7c828a	userns: Wrap call with a macro for readability Pass function name into a helper instead of pointer which doesn't provide much useful info. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-11-05 15:29:04 +03:00
Kirill Tkhai	c9afd17ad6	net: Add ip rule save/restore Add support for save and restore of ip rules. It uses new functionality of iproute which is already in iproute git: http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=2f4e171f7df22107b38fddcffa56c1ecb5e73359 v2: Use xstrdup() instead of strdup(). v3: Use open/close instead of helper. v4: Return -1 on empty dump. Signed-off-by: Kirill Tkhai <ktkhai@odin.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-27 22:56:33 +03:00
Andrew Vagin	1d8fcb6b94	bfd: add breadchr Reading stops after an EOF or a specified character. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-27 22:51:09 +03:00
Cyrill Gorcunov	7a99e699ce	mnt: Export __open_mountpoint We gonna need it for inotify handle testing. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-21 15:08:03 +03:00
Kir Kolyshkin	5940e3d14c	xfree(): simplify Contrary to a popular opinion, there is no need to check an argument for being non-NULL before calling free(). From the free(3) man page: "If ptr is NULL, no operation is performed." Let's change xfree macro to be a synonym for free(). Signed-off-by: Kir Kolyshkin <kir@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-21 14:58:39 +03:00
Pavel Emelyanov	68baf8e77d	criu: Fault injection core This patch(set) is inspired by similar from Andrey Vagin sent some time earlier. The major idea is to artificially fail criu dump or restore at specific places and let zdtm tests check whether failed dump or restore resulted in anything bad. This particular patch introduces the ability to tell criu "fail at X point". Each point is specified with an integer constant and with the next patches there will appear places over the code checking for specific fail code being set and failing. Two points are introduced -- early on dump, right after loading the parasite and right after creation of the root task. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-19 12:42:29 +03:00
Cyrill Gorcunov	61859d1176	fsnotify: Filter out internal inotify bits when restoring marks The kernel prior 4.3 is exporting FS_EVENT_ON_CHILD bit via procfs fdinfo interface. This bit is kernel's internal and should not be passed in inotify_add_watch call. Thus simply filter it out when obtain from old images for backward compatibility reason. More details here https://lkml.org/lkml/2015/9/21/680 Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-14 15:51:55 +03:00
Matthew Krafczyk	29c08d8672	Add pre-dump and pre-restore action scripts This allows the user to perform actions before dumping or restoration occurs. Signed-off-by: Matthew Krafczyk <krafczyk.matthew@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-09 18:23:41 +03:00
Christopher Covington	871da9a111	pie: Give VDSO symbol table local scope In commit c2271198, Laurent Dufour kindly reunified the VDSO code that had become duplicated between architectures. Unfortunately this introduced a regression in AArch64 where apparently due to the scope of vdso_symbols array of pointers to characters changing from local to global, load-time relocations became necessary. The following thread on the GCC mailing list discusses why load-time relocations can be necessary when pointers are used, although it doesn't mention the potential for locally scoped arrays to be handled differently: https://gcc.gnu.org/ml/gcc/2004-05/msg01016.html Because the alternatives, such as porting piegen to AArch64, are far more involved, simply revert the change in scope. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:21:16 +03:00
Christopher Covington	627f9a9e5f	aarch64: Fix write_intraprocedure_branch types In the recent VDSO code reunification, some types were changed but a pair of necessary corresponding changes was omitted. Fix that so the AArch64 build succeeds without type-related warnings-turned-errors. Also move the definition to the AArch64-specific header since it's not currently being used by any other architectures. Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:20:01 +03:00
Tycho Andersen	f79f4546cf	sysctl: move sysctl calls to usernsd When in a userns, tasks can't write to certain sysctl files: (00.009653) 1: Error (sysctl.c:142): Can't open sysctl kernel/hostname: Permission denied See inline comments for details on affected namespaces. Mostly for my own education in what is required to port something to be userns restorable, I ported the sysctl stuff. A potential concern for this patch is that copying structures with pointers around is kind of gory. I did it ad-hoc here, but it may be worth inventing some mechanisms to make it easier, although I'm not sure what exactly that would look like (potentially re-using some of the protobuf bits; I'll investigate this more if it looks helpful when doing the cgroup user namespaces port?). Another issue is that there is not a great way to return non-fd stuff in memory right now from userns_call; one of the little hacks in this code would be "simplified" if we invented a way to do this. v2: coalesce the individual struct sysctl_req requests into one big sysctl_userns_req that is in a contiguous region of memory so that we can pass it via userns_call. Hopefully nobody finds my little ascii diagram too offensive :) v3: use the fork/setns trick to change the sysctl values in the right ns for IPC/UTS nses; see inline comment for details v4: only use sysctl_userns_req when actually doing a userns_call. Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:16:14 +03:00
Andrew Vagin	a973e6fcb3	net: dump ipv6 routes "ip route dump" dumps only ipv4 routes. Reported-by: Ross Boucher <boucher@gmail.com> Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-10-05 13:11:31 +03:00
Tycho Andersen	97cb181cbc	irmap: don't leak irmap objects in --irmap-scan-path v2: use struct irmap directly in irmap_path_opt Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:02:51 +03:00
Pavel Emelyanov	efa7dcf7c2	ghost: Remove ghost files if restore fails Issue #18. When restore fails ghost files remain there. And to remove them we have to know their list, paths to original files (to construct the ghost name) and the namespace ghost lives in. For the latter we keep the restore task namespace at hands till the final stage and setns into it to kill ghosts. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:37 +03:00
Pavel Emelyanov	a7c9f3011d	mnt: Read mount images early Mappings from mount id to namespace will be required to remove ghosts on restore failure. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:36 +03:00
Pavel Emelyanov	b0e23c3d4f	files: Collect ghosts and regfiles early Info about ghosts presence and paths will be needed to remove the ghosts themselves and thus are needed in criu. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 22:00:35 +03:00
Pavel Emelyanov	152222a6b7	remap: Sanitize ghost file path printing First -- avoid two memory copies by printing ns root directly, and second -- remove extra argument from create_ghost, the mnt_id value we need there can be found on the ghost_file object. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-09-28 21:59:45 +03:00


1609 commits