Mirrors/criu

mirror of https://github.com/checkpoint-restore/criu.git synced 2026-01-23 02:14:37 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	8ce37e676a	img: Don't create empty images Currently on dump we generate too many image files, effectively all the stuff from the GLOB set is created. The thing is that sometimes some of the created images can be empty (just contain the magic number at the head). Those images are useless and just waste space. When applied after the "empty images" set, this introduces the lazy images -- when we call open_image() the actual file is only created (and the magic number is written into it) when the very first object goes into it. For example, for the simplest test we have, the static/env00 one, the created image files are core-7290.img creds-7290.img fdinfo-2.img fs-7290.img ids-7290.img inventory.img mm-7290.img pagemap-7290.img pages-1.img pstree.img reg-files.img sigacts-7290.img Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-16 15:58:32 +03:00
Pavel Emelyanov	7ede4697cf	bfd: Don't leak image-open flags into bfdopen Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-16 15:58:14 +03:00
Pavel Emelyanov	f7f76d6ba6	img: Introduce empty images When an image of a certain type is not found, CRIU sometimes fails, sometimes ignores this fact. I propose to ignore this fact always and treat absent images as those containing no objects inside (i.e. empty). If the later code flow will _need_ objects, then criu will fail later. Why would an object be explicitly required? For example, due to restoring code reading the image with pb_read_one, w/o the _eof suffix, thus requiring the object to be in the image. Another example is object dependencies. E.g. fdinfo objects require various file objects. So missing image files will result in non-resolved searches later. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-13 14:42:54 +03:00
Pavel Emelyanov	45a0cc4234	page-read: Explicitly mark ENOENT with return code When page-read fails to open the pagemap image it reports error. One place (stacked page-reads) need to handle the absent images case gracefully, so fix the return codes to make this check work. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-13 14:42:11 +03:00
Pavel Emelyanov	e29c9daec2	img: Remove O_OPT and COLLECT_OPTIONAL Current code doesn't make any difference between OPT and no-OPT except for whether the message is printed in open_image(). So this particular change changes nothing but the availability of this message. In the next patches I will introduce "empty images" to deal with the ENOENT situation in a more graceful manner. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-13 14:42:01 +03:00
Cyrill Gorcunov	19948472d9	tty: Rename tty_type to tty_driver There are too many "type" in code. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-10 21:16:22 +03:00
Cyrill Gorcunov	652fbf3bd1	tty: Drop redundant constants Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-10 21:16:10 +03:00
Pavel Emelyanov	f32f4ffa76	img: Open images for dump in O_WRONLY mode Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-09 22:21:15 +03:00
Pavel Emelyanov	618c17b6f8	img: Simplify the open_image() macro Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-09 22:21:08 +03:00
Pavel Emelyanov	dceb6633c7	page-read: Introduce custom flags for opening Instead of open flags and boolean is_shmem argument. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-04 17:50:32 +03:00
Cyrill Gorcunov	3bd6d9d7b0	image: Add comments about VMA_AREA constants and drop FORCE_READ flag Force-read came from very first dev version of CRIU (even before 1.0 release) and never been used actually in image. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-04 17:48:47 +03:00
Pavel Emelyanov	057f00ce92	tty: Make tty type be object rather than integer The plan is to replace tons of if (type == TTY_TYPE_FOO) checks with type->something dereferences. To do this, start with replacing int type with struct tty_type * in relevant places and fixing compilation. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-04 17:47:04 +03:00
Pavel Emelyanov	a7601d6a50	tty: Move tty_type() and is_pty() to tty.c Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-03-04 17:46:16 +03:00
Cyrill Gorcunov	bec5a023d1	tty: Fix mistyping of /dev/tty /dev/tty stands for current terminal which we don't yet implemented a support for. This is a bugfix for upcoming stable version, the proper support of /dev/tty is gonna be implemented separately. Reported-by: Saied Kazemi <saied@google.com> CC: Andrew Vagin <avagin@parallels.com> CC: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-20 00:11:38 +03:00
Saied Kazemi	1b4e9058e8	Do not call listen() when SO_REUSEADDR is off For an established TCP connection, the send queue is restored in two steps: in step (1), we retransmit the data that was sent before but not yet acknowledged, and in step (2), we transmit the data that was never sent outside before. The TCP_REPAIR option is disabled before step (2) and re-enabled after step (2) (without this patch). If the amount of data to be sent in step (2) is large, the TCP_REPAIR flag on the socket can remain off for some time (O(milliseconds)). If a listen() is called on another socket bound to the same port during this time window, it fails. This is because -- turning TCP_REPAIR off clears the SO_REUSEADDR flag on the socket. This patch adds a mutex (reuseaddr_lock) per port number, so that a listen() on a port number does not happen while SO_REUSEADDR for another socket on the same port is off. Thanks to Amey Deshpande <ameyd@google.com> for debugging. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-16 13:18:32 +03:00
Andrey Vagin	3f23bde548	criu: print correct errno messages from pr_perror() "%m" can't be used to print strerror(errno), because print_on_level() calls gettimeofday() which can overwrite errno. For example: 13486 connect(4, {sa_family=AF_INET, sin_port=htons(8880), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENETUNREACH (Network is unreachable) 13486 gettimeofday({1423756664, 717423}, NULL) = 0 13486 open("/etc/localtime", O_RDONLY\|O_CLOEXEC) = -1 EACCES (Permission denied) 13486 write(2, "15:57:44.717: 4: ERR: socket_udp.c:73: Can't connect (errno = 101 (Permission denied))\n", 91) = 91 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-13 15:14:44 +03:00
Pavel Emelyanov	9a392dff3a	reg-files: Do not try to linkat with wrong user We link files to each other at restore time to restore unlinked paths. Kernel has strange security restrictions about the linkat we use. If the fsuid of the caller doesn't equal the uid of the file and the file is not a "safe" one, then only global CAP_CHOWN will be allowed to link(). This brings problems in user namespaces -- uns root is not allowed to linkat any file, unlike global root. Fortunately, we can change the fsuid temporarily and still linkat the file we want. Hopefully this hack will go away some day soon, when the kernel will have saner checks for linkat capabilities. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2015-02-13 16:11:38 +04:00
Pavel Emelyanov	b8556e8084	usernsd: The way to restore privileged stuff in userns We have collected a good set of calls that cannot be done inside user namespaces, but we need to [1]. Some of them have already been addressed, like prctl mm bits restore, but some have not. I'm pretty sceptical about the ability to relax the security checks on quite a lot of them (e.g. open-by-handle is indeed a very dangerous operation if allowed to an unprivileged user), so we need some way to call those things even in user namespaces. The good news is that all the calls I've found operate on file descriptors one way or another. So if we had a process that lived outside of the user namespace, we could ask it to do the high priority operation we need and exchange the affected file descriptor via unix socket. So the usernsd is the one doing exactly this. It starts before we create the user namespace and accepts requests via unix socket. Clients (the processes we restore) send it the functions they want to call, the descriptor they want to operate on and the arguments blob. Optionally, they can request some file descriptor back after the call. In the non-usernamespace case the daemon is not started and the calls are done right in the requestor's process environment. In the next patch there's an example of how to use this daemon to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on a socket. [1] http://criu.org/UserNamespace Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@openvz.org>	2015-02-13 16:11:38 +04:00
Ruslan Kuprieiev	09c3f5d0c7	security: add cr_fchown Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-10 16:54:31 +03:00
Ruslan Kuprieiev	df301b7eb7	security: create separate security.h header Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-02-10 16:53:54 +03:00
Pavel Emelyanov	1bbc994ccf	sysctl: Remove dead CTL_PRINT\|_SHOW code Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-27 16:18:27 +03:00
Andrey Vagin	4dbc3f093a	sockets: define NETLINK_SOCK_DIAG in sockets.h sockets.c: In function ‘preload_socket_modules’: sockets.c:153:36: error: ‘NETLINK_SOCK_DIAG’ undeclared (first use in this function) sockets.c:153:36: note: each undeclared identifier is reported only once for each function it appears in Reported-by: Mr Travis Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-23 15:40:02 +03:00
Pavel Emelyanov	0749ef23e9	check/zdtm: Introduce fine-grained feature testing Right now we state that CRIU works on 3.11 and above kernels and, at the same time, have support for a couple of new features like aio, tun, timerfd etc. available in later kernels. Since these new features do not break generic operations we do not require them in the kernel strictly. However, in the zdtm tests it's very important to know exactly what can and what cannot be tested. Right now this is done in a tough manner -- if the kernel is not 3.11 or criu check fails for _any_ reason we treat the kernel as being "bad" and throw out a set of tests. I propose to test some individual features and form the list of tests in a more fine-grained manner. This patch only fixes the AIO, mnt_id, tun and posix-timers tests. Next I will add checks and fixes for user-namespaces tests. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2015-01-22 18:55:34 +03:00
Pavel Emelyanov	674df19a34	nlk: Add error callback to do_rtnl_req In the next patch we will need to care about the exact error reported by the kernel, so add the error callback for this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-22 18:54:37 +03:00
Saied Kazemi	296129295a	Allow the veth-pair option to specify a bridge When restoring a pair of veth devices that had one end inside a namespace or container and the other end outside, CRIU creates a new veth pair, puts one end in the namespace/container, and names the other end from what's specified in the --veth-pair IN=OUT command line option. This patch allows for appending a bridge name to the OUT string in the form of OUT@<BRIDGE-NAME> in order for CRIU to move the outside veth to the named bridge. For example, --veth-pair eth0=veth1@br0 tells CRIU to name the peer of eth0 veth1 and move it to bridge br0. This is a simple and handy extension of the --veth-pair option that obviates the need for an action script although one can still do the same (and possibly more) if they prefer to use action scripts. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-12 14:54:18 +03:00
Pavel Emelyanov	a1b1959dd1	shmem: Turn shmem-info into shared objects from shremap ones We have a nasty issue with it. Current code allocates these entries in shremap area one by one. We do NOT allocate any OTHER entries in this region, but if we will this array will be spoiled. Fortunately we no longer need shmem-infos as plain array, neither we need one in restorer. So just turn this into plain shared objects and collect them in a list. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-12 14:47:24 +03:00
Pavel Emelyanov	b246ccb181	shmem: Move some code to shmem.c file The struct and find routine used to be used by restorer code. Now the former fully uses vmas and fd opened, so we can move the code into .c file not to spoil global namespace. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-12 14:47:17 +03:00
Pavel Emelyanov	455f9b564e	fd: Factor out inheriting FDs code We have two places where we lookup the inherited-fd list by name and dup() the descriptor found. I propose to factor out this piece in a single inherited_fd() call. When we will want to support inheritance for sockets or any other files we'll simply add the inherited_fd() call there. I'm also thinking about moving the call to inherited_fd into generic level, but the open_path() routine doesn't allow to do it in a simple manner. Also we have not yet finished issue with files-vs-inodes mapping. Keeping all the logic in one function should make the solution simpler. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-12 14:46:51 +03:00
Pavel Emelyanov	8f691c40d5	fd: Mark inherit_fd_lookup_fd static Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2015-01-12 14:46:42 +03:00
Cyrill Gorcunov	fd07bc7791	cpu: Add 'ins' mode to --cpu-cap option In this mode we test if target cpu has all features present in image file but do not require bit to bit match: target cpu may be a new one with more features present. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:15:46 +03:00
Pavel Emelyanov	2694a74a00	aio: Restore AIO contexts Restoring AIO is quite simple. Once all VMAs are put in their places we can call io_setup() to let kernel create the context back and then move the ring into proper place. Another thing we should "restore" is the context ID. But the thing is, upon ring creation kernel reports the ring start address as this ID. And there's a patch in the -next tree that changes the ID when we remap the ring. That said after AIO context creation and ring remap we need to check that the new ID is seen by the kernel. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:40 +03:00
Pavel Emelyanov	08c204820f	aio: Dump AIO rings When AIO context is set up kernel does two things: 1. creates an in-kernel aioctx object 2. maps a ring into process memory The 2nd thing gives us all the needed information about how the AIO was set up. So, in order to dump one we need to pick the ring in memory and get all the information we need from it. One thing to note -- we cannot dump tasks if there are any AIO requests pending. So we also need to go to parasite and check the ring to be empty. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:36 +03:00
Pavel Emelyanov	80cf042695	x86: Add io syscalls Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-26 18:13:33 +03:00
Pavel Emelyanov	6a6cdb8d4a	proc: Drop always true last argument of parse_smaps() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-12-22 13:52:03 +03:00
Ruslan Kuprieiev	b30940eee2	cr_errno: move cr_err helpers into cr_errno.h Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-22 13:50:45 +03:00
Ruslan Kuprieiev	1ace257022	tty: add vt support, v2 /dev/ttyN are the virtual terminals which are provided by the system with major 4 and minor 1..63. You can run some program on ttyN by pressing alt+ctrl+FN and running it manually or by using open(openvt nowadays). This patch also allows us to run all our tests from a vt. v2, style fix + using linux/vt.h for constants Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-22 13:48:31 +03:00
Ruslan Kuprieiev	8eaf0142ab	cr-service: set cr_errno to EBADRQC if set_opts_from_req fails Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-19 18:59:28 +03:00
Ruslan Kuprieiev	e76749b790	cr-restore: set cr_error to EEXIST if such pid already exists, v3 This is a very common error when using criu. The problem here is that we need to somehow transfer cr_errno from one process to another. I suggest using pipe to give one end to children and read cr_errno on other after restore is finished. v2, Pavel suggested putting errno into shared task_entries. v3. and he also suggested using cmpxchg Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-19 18:59:17 +03:00
Ruslan Kuprieiev	b09a88b5f9	util: set cr_errno to ESRCH if no PID dir in proc Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-19 18:59:14 +03:00
Ruslan Kuprieiev	ef283e505c	cr-errno: initial commit Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-19 18:58:46 +03:00
Saied Kazemi	0412152fc5	Add inherit fd support There are cases where a process's file descriptor cannot be restored from the checkpoint images. For example, a pipe file descriptor with one end in the checkpointed process and the other end in a separate process (that was not part of the checkpointed process tree) cannot be restored because after checkpoint the pipe will be broken. There are also cases where the user wants to use a new file during restore instead of the original file at checkpoint time. For example, the user wants to change the log file of a process from /path/to/oldlog to /path/to/newlog. In these cases, criu's caller should set up a new file descriptor to be inherited by the restored process and specify the file descriptor with the --inherit-fd command line option. The argument of --inherit-fd has the format fd[%d]:%s, where %d tells criu which of its own file descriptors to use for restoring the file identified by %s. As a debugging aid, if the argument has the format debug[%d]:%s, it tells criu to write out the string after colon to the file descriptor %d. This can be used, for example, as an easy way to leave a "restore marker" in the output stream of the process. It's important to note that inherit fd support breaks applications that depend on the state of the file descriptor being inherited. So, consider inherit fd only for specific use cases that you know for sure won't break the application. For examples please visit http://criu.org/Category:HOWTO. v2: Added a check in send_fd_to_self() to avoid closing an inherit fd. Also, as an extra measure of caution, added checks in the inherit fd look up functions to make sure that the inherit fd hasn't been reused. The patch also includes minor cosmetic changes. Signed-off-by: Saied Kazemi <saied@google.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-10 12:48:30 +03:00
Andrey Vagin	4bca68ba49	tcp: don't split packets for restoring a send queue The kernel can do it better. The problem exists only for recv queues. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-12-08 15:46:44 +03:00
Andrey Vagin	71a0b5dc31	mem: check existence of parent images before dumping pages (v2) When we are doing pre-dump, we splice pages in pipes and only then open images and dump pages. But when we are splicing pages, we need to know about existence of parent images. This patch adds a new call to determine existence of parent images. In addition this patch fixes the following issue: CID 83244 (#1 of 1): Uninitialized pointer read (UNINIT) 14. uninit_use: Using uninitialized value xfer.parent. v2: initialize unused field of struct page_server_iov, because it is sent over the network. CID 83451 (#1 of 1): Uninitialized scalar variable (UNINIT) 2. uninit_use_in_call: Using uninitialized value pi. Field pi.nr_pages is uninitialized when calling write. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-29 19:32:40 +03:00
Pavel Emelyanov	69bffe26d3	kerndat: Make fs-virtualized check report yes/no Right now it returns the whole struct stat which is excessive. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:15:09 +04:00
Pavel Emelyanov	19a76494a9	kerndat: Collect all global variables on one struct Not to spoil the global namespace and unify the kerndat data names. Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:14:53 +04:00
Pavel Emelyanov	f33908a897	ns: Rename "created" futex and comment what it is Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:11:58 +04:00
Pavel Emelyanov	ee2e8e5bb9	parasite: Cleanup args size fetching Right now we push all the auxiliary arguments to parasite_infect_seized while 2 of them are only required to calculate the size of args area. Let's better keep track of required args size and get rid of excessive arguments to parasite_infect_seized(). Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-11 20:11:34 +04:00
Pavel Emelyanov	1cad9b1049	util: Fix the ispathsub corner case ispathsub("/foo", "/") reports false. This is a corner case, as 2nd argument is not expected to end with /. Fix this and add comment about ispathsub() arguments assumptions. Reported-by: Andrey Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-09 23:26:56 +04:00
Pavel Emelyanov	32f58742ca	mnt: Introduce and use issubpath helper When we validate the mount tree not to have overmounts we need to check one path to be the sub-path of another. Here's a helper for this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Andrew Vagin <avagin@parallels.com>	2014-11-07 17:39:23 +04:00
Andrey Vagin	cb2f9223a0	dump: dump user namespaces (v2) For that we need to save per-namespace mappings of user and group IDs. And all id-s for tasks and files are saved from the target user namespace. v2: move code into collect_namespaces() Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>	2014-11-07 17:16:16 +04:00
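
The corner case described in commit 1cad9b1049 (a sub-path check reporting false for "/foo" against "/") can be illustrated with a small sketch. This is not CRIU's code: the function name `is_path_under` and its argument names are hypothetical, and it assumes both arguments are absolute, already-normalized paths. It shows one possible way to make a base that ends with "/" behave the same as one that does not.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from CRIU): returns true when `path`
 * is `base` itself or lies somewhere underneath it.
 * Assumption: both arguments are absolute, normalized paths.
 */
static bool is_path_under(const char *path, const char *base)
{
	size_t len = strlen(base);

	/* Ignore trailing slashes so "/" and "/foo/" act like "" and "/foo". */
	while (len > 0 && base[len - 1] == '/')
		len--;

	/* The first `len` bytes of `path` must match the trimmed base. */
	if (strncmp(path, base, len) != 0)
		return false;

	/* The match has to end on a path component boundary. */
	return path[len] == '\0' || path[len] == '/';
}
```

With this sketch, `is_path_under("/foo", "/")` is true, while `is_path_under("/foobar", "/foo")` is false because the match does not end on a component boundary.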

1725 commits