Commit graph

60 commits

Author SHA1 Message Date
Cyrill Gorcunov
6217a84ae3 mnt: Carry run-time device ID in mount_info
When we're restoring fsnotify watchees we need to resolve
path to a handle at some mountpoint referred by @s_dev
member (device ID) which is saved inside image. This
ID actually may be changed at the every mount (say
one restores container after machine reboot) or in
case of container's migration.

Thus the test for overmounting in __open_mountpoint
will fail and we get an error.

Lets do a trick: introduce @s_dev_rt member which
is supposed to carry run-time device ID. When dumping
this member simply equal to traditional @s_dev fetched
from the procfs, but when restoring we fetch it from
stat call once mountpoint become alive.

https://jira.sw.ru/browse/PSBM-41610

v2:
 - predefine MOUNT_INVALID_DEV
 - use fetch_rt_stat instead of assigning device in restore_shared_options
 - copy @s_dev_rt in propagate_siblings and propagate_mount

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-12-08 14:58:32 +03:00
Andrew Vagin
7017181849 mount: don't inherit mount namespace descriptors to each process
close_olds_fds() knows nothing about more than one set of service file
descriptros, so it's better to call it before forking children as it was
bedore 9d60724eca ("restore: restore mntns before creating private vma-s")

The root task restores all processes and pin them with file descriptors,
then a task restores a mount namespace by opening the file descriptor of
the root task via /proc/pid/fd/X.

Reported-by: Mr Jenkins
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-11-17 10:45:09 +03:00
Cyrill Gorcunov
7a99e699ce mnt: Export __open_mountpoint
We gonna need it for inotify handle testing.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-10-21 15:08:03 +03:00
Pavel Emelyanov
a7c9f3011d mnt: Read mount images early
Mappings from mount id to namespace will be required to
remove ghosts on restore failure.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 22:00:36 +03:00
Pavel Emelyanov
152222a6b7 remap: Sanitize ghost file path printing
First -- avoid two memory copies by printing ns root directly, and
second -- remove extra argument from create_ghost, the mnt_id value
we need there can be found on the ghost_file object.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:59:45 +03:00
Pavel Emelyanov
7ca6cc1eb2 mnt: Clean roots yard from criu process
So here it is. If root task dies on restore the roots yard
dir remains unrmdired :( Since we already know its name, we
can remove one from criu. By the time we get to this place
the sub mount namespace(s) are already dead and yard dir
is empty. But umounting should be done by tasks after
successfull restore, so keep depopulation there.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:57:35 +03:00
Pavel Emelyanov
3e7c92ed02 mnt: Renames around roots yard
Same thing as in previous patch -- we have too many generic
clean_ and fini_ prefixes over the code. And we need more (see
next patch), so let's specify what exactly we clean or fini.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:57:21 +03:00
Pavel Emelyanov
c5c65fe17a mnt: Create roots in criu context
In case root task restore failure we'll have to remove the
roots yard dir from criu, so we have to create one by
criu to at least have the dit name.

It's OK to do it in criu, since the yards is created in
the opts.root which is the same for any mnt ns we deal
with on restore.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-28 21:56:51 +03:00
Andrey Vagin
1174a2ad0f mount: handle mnt_flags and sb_flags separatly (v4)
They both can container the MS_READONLY flag. And in one case it will be
read-only bind-mount and in another case it will be read-only
super-block.

v2: set mnt and sb for one call of mount() when it's posiable
v3: return a comment which was deleted by mistake
v4: Fix the sentense about restoring mnt flags
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-09-21 11:55:17 +03:00
Cyrill Gorcunov
80ef8fd2fb mount: Handle deleted bindmounts
To handle deleted bindmounts we simply create
the former directory bindmount lived at, mount
the target and remove the directory back.

For this sake we add @deleted entry into the image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-21 21:26:17 +03:00
Cyrill Gorcunov
4f8f97e0bd mount: mount_info -- Drop unused @is_file
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tycho Andersen <tycho.andersen@canonical.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-21 21:25:14 +03:00
Cyrill Gorcunov
3ae67d7b3a mount: Move mount_info and ext_mount to mount.h
It's quite unclean while this structure lives
in proc_parse.h, which only have to fill this
structure on procfs read, but real handling
is inside mount.c. Move it as appropriate.

Same time ext_mount structure should be moved
into a header as well with sane @list name
used instead of @l.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-13 17:16:47 +03:00
Cyrill Gorcunov
291e632e60 mount: Reorder declarations
- gather structs on top
 - then extern vars

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-13 17:16:05 +03:00
Gabriel Guimaraes
dbaab31f31 Workaround for the OverlayFS bug present before Kernel 4.2
This is here only to support the Linux Kernel between versions
3.18 and 4.2. After that, this workaround is not needed anymore,
but it will work properly on both a kernel with and without the bug.

The bug is that when a process has a file open in an OverlayFS directory,
the information in /proc/<pid>/fd/<fd> and /proc/<pid>/fdinfo/<fd>
is wrong, so we grab that information from the mountinfo table instead.

This is done every time fill_fdlink is called.
We first check to see if the mnt_id and st_dev numbers currently match
some entry in the mountinfo table. If so, we already have the correct mnt_id
and no fixup is needed.

Then we proceed to see if there are any overlayFS mounted directories
in the mountinfo table. If so, we concatenate the mountpoint with the
name of the file, and stat the resulting path to check if we found the
correct device id and node number. If that is the case, we update the
mount id and link variables with the correct values.

Signed-off-by: Gabriel Guimaraes <gabriellimaguimaraes@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-08-07 14:30:41 +03:00
Oleg Nesterov
e2c38245c6 introduce --enable-fs cli option
Finally add --enable-fs option to specify the comma separated list of
filesystem names which should be treated as FSTYPE_AUTO.

Note: obviously this option is not safe, use at your own risk. "dump"
will always succeed if the mntpoint is auto, but "restore" can fail or
do something wrong if mount(src, mountpoint, flags, options) can not
actually "just work" as FSTYPE_AUTO logic expects.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-10 17:35:43 +03:00
Oleg Nesterov
9fee3dc817 pass "bool for_dump" argument down to collect_mntinfo() and parse_mountinfo()
Preparation.

1. Add the new "bool for_dump" arg to collect/parse_mntinfo().

2. Introduce "struct collect_mntns_arg" to pass the additional
   "bool for_dump" field to collect_mntinfo() and change it to
   pass this boolean to collect_mntinfo()->parse_mountinfo() path.

3. Change other callers of collect_mntinfo() to pass "false".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2015-04-03 17:55:18 +03:00
Pavel Emelyanov
5f2a7ac27b img: Rename fdset -> imgset
Since we're going to switch from int-fd-s to class-image
soon the fdset name will not fit into the new terminology.

This patch is

 sed -e 's/fdset/imgset/g' -i *
 sed -e 's/imgset_fd/img_from_set/g' -i *
 git mv include/fdset.h include/imgset.h

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
2014-09-30 21:48:10 +04:00
Pavel Emelyanov
9fd793e565 stat: Pass namespace into phys_stat_resolve_dev, not mnt tree
This makes the API simpler.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 10:57:27 +04:00
Pavel Emelyanov
090587e1a1 stat: Pass namespace into phys_stat_dev_match, not mnt tree
This makes the API simpler.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-06 10:57:25 +04:00
Andrey Vagin
967dba606a mount: add helper mntns_get_root_by_mnt_id
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-08-05 16:38:19 +04:00
Pavel Emelyanov
c7e0042946 crtools: Introduce the --ext-mount-map option (v3)
On dump one uses one or more --ext-mount-map option with A:B arguments.
A denotes a mountpoint (as seen from the target mount namespace) criu
dumps and B is the string that will be written into the image file
instead of the mountpoint's root.

On restore one uses the same --ext-mount-map option(s) with similar
A:B arguments, but this time criu treats A as string from the image's
root field (foobar in the example above) and B as the path in criu's
mount namespace the should be bind mounted into the mountpoint.

v3:
* Added documentation
* Added RPC bits
* Changed option name into --ext-mount-map
* Use colon as key and value separator

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-06-17 10:36:30 +04:00
Pavel Emelyanov
68e2841a9b mnt: Turn mntns_get_root_fd into accepting mnt ns_id
The only exception (for now) is the irmap -- it should
operate on ns as well.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 02:31:16 +04:00
Pavel Emelyanov
1435617c40 mnt: Rename _collect_root into _get_root_fd
Nowadays this routine is mainly used for getting an
fd, rather than keeping one for future reference.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-23 01:38:58 +04:00
Pavel Emelyanov
79f3e90856 rst: Less arguments to restore_task_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:46 +04:00
Pavel Emelyanov
8550f52017 mnt: Move local mntns collecting on restore into prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:43 +04:00
Pavel Emelyanov
f4b7a6fedd mnt: Mark rst_collect_local_mntns as void
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:38 +04:00
Pavel Emelyanov
88eef43e41 mnt: Mark dump_mnt_ns as static
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:33 +04:00
Pavel Emelyanov
4ffa79695d mnt: Remove unneeded argument from prepare_mnt_ns
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-22 23:48:23 +04:00
Andrey Vagin
2f4be997b6 mount: use per-namespace mntinfo_tree (v2)
This patch removes the global mntinfo_tree and collect_mount_info where
it was constructed. The mntinfo list is filled from dump_mnt_ns,
rst_collect_local_mntns, collect_mnt_namespaces and read_mnt_ns_img.

A mountinfo entry contains a reference on a proper ns_id entry, so
we cau use mnt_id to look up a proper mount namespace.

v2: remove trash after rebasing.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:19 +04:00
Andrey Vagin
fb3ce0fbeb mount: prepare to work without mnt_id
Kernels before 3.15 doesn't show mnt_id and mnt_id isn't saved in
images, if mntns isn't dumped.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:10 +04:00
Andrey Vagin
b6d3314c54 check: collect mounts of the current mntns
They are used for collecting unix sockets

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:40:04 +04:00
Andrey Vagin
26a0dc91dd mount: add a function to get a temporary root for mntns
On restore all mount namespaces are restored in the root mntns and
sub-namecpeaces are restored in temorary places.

This function allows to get paths to these places.

It will be used in open_remap_ghost(), because it's called in the root
task, when other tasks are not forked yet.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:59 +04:00
Andrey Vagin
5418938ec3 resotre: collect mounts of current mntns
It's required for restoring in the current mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:46 +04:00
Andrey Vagin
e827a695f3 mount: separate collect_mnt_ns from dump_mnt_ns
We are going to support nested mntns, so the global mntinfo_tree
variable are useless and information about tree should be connected
to a proper namespace.

But when we don't dump mntns, we need to collect mounts for the current
mntns.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:41 +04:00
Andrey Vagin
cc1fd5760a mount: save mount tree for each namespace
We are going to support nested mount namespaces and each NS has own
tree. The mount tree is used for checking that a file is reachable.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:39:34 +04:00
Andrey Vagin
3a291e33ff crtools: restore nested mount namespaces (v2)
Known issue:
* currently only namespaces with the same root is supported
* nested namespaces can be dumped and restored only if the root task
  has own mount namespace.

All nested namespaces are restored in a root namespace in temporary
directories. All mount points restored in one tree and then they are
divided into namesaces.
The task with minimal pid for each namespaces unshared mntns and
then it makes pivot_root in a proper temporary directory. All other
tasks makes setns to enter into a mount namespace of the task with
minimal pid.

v2: clean up

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:17 +04:00
Andrey Vagin
e7e9c2ee6e mounts: create a temporary directory for restoring non-root mntns (v2)
All non-root namespaces will be restored as sub-trees of the root tree.

This patch adds helpers to create a temporary directory and mount tmpfs
in it, then create directories for each non-root mount namespace.

tmpfs is quite useful here to simplify destroying this construction,
we don't need to unmount each namespace separately.

v2: add a comment why MNT_DETACH is not dangerous here
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-21 22:38:12 +04:00
Andrey Vagin
87a49bdfaf servicefd: add a service fd for current root
It's already used for dumping files and it will be used for restoring,
so it should be service fd to avoid intersection with restored
descriptors.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2014-04-17 12:03:11 +04:00
Pavel Emelyanov
ae98ef6ae0 mount: Factor out mount tree build for NEWNS and non-NS cases
We anyway build the tree, in the NS case -- few calls later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-12 16:19:48 +04:00
Cyrill Gorcunov
1ba08ca664 mount: Extend phys_stat_dev_match to use path resolving instead of btrfs engine
Instead of scanning btrfs subvolumes (which can be even unaccessbile
if mount point lays on directory instead of subvolume itself) we use
path resolving feature here -- once we need to figure out if some
device number need to be altered up to mount point (as we know stat()
called on subvolume returns st_dev for subvolume itself, but not
one that associated with a superblock and shown in /proc/self/mountinfo
output).

This as well implies that we need to check if device number for ghost
files are to be updated to match mountinfo, thus we use phys_stat_resolve_dev
helper here.

After this patch the previously merged btrfs engine is no longer needed
(at least it seems so) and can be dropped.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-11 16:05:22 +04:00
Cyrill Gorcunov
5372e3910c mount: btrfs -- Introduce phys_stat_resolve_dev helper (v2)
This routine is aimed to find a mount point on which
the path passed as argument is laying on. We walk over
all mount points and see which one is matching.

Once found (in worst case it will be a root mount point
so function is never failing) we're checking if this is
btrfs and then return subvolume0 device id.

See commit 921cf873f3
for details what the hell we're doing here.

v2: rewrite mount_resolve_path w/o recursion

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-11 16:05:21 +04:00
Cyrill Gorcunov
cf1ce5f817 mount: Build mount tree on dump restore early, if needed
For paths resolution we will need mount tree to be parsed
and built, but it's not that simple -- the current code
implies that once parsed the tree must not be re-parsed
again, so we pass @parse argument from a caller: if a task
we're restoring do not use mount namespace, we should parse
mount tree early, otherwise defer this action until mount
tree is read from the image.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-11 16:05:19 +04:00
Cyrill Gorcunov
e54ad19a06 mount: Add phys_stat_dev_match helper
This helper serves to hide fs specifics (in particular
btrfs) thus the caller won't need the details.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-12-04 19:23:32 +04:00
Cyrill Gorcunov
291aa3f6d6 headers: Add extern specificator to functions
We really have a mess of extern/non-extern declaration
of functions in our headers. Always use extern for
unification purpose.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-11-15 17:00:58 +04:00
Andrey Vagin
b895c73c82 mntns: don't use global fdset for dumping namespace
We are going to replace pid on id in names of image files. The id is
uniq for each namespace, so it's more convient, if image files are
opened per namespace.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-10-01 12:11:09 +04:00
Pavel Emelyanov
b18fb09eb9 show: Replace one-line show_foo calls with args array
We have generic do_pb_show() call and tons of show_foo
routines, that just call one with proper args. Compact
the code by putting the args into array and calling
the do_pb_show() in one place.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-08-24 04:00:32 +04:00
Pavel Emelyanov
6bf22f8c75 crtools: Get rid of on-stack cr_options
We have global instance of them, that's enough.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-28 21:11:13 +04:00
Pavel Emelyanov
add21b75c9 show: Remove options args from ->show callback
This thing is global, we can address one explicitly.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-05-08 00:23:42 +04:00
Andrey Vagin
13a8e60bca cr-dump: collect mount points in the target namespace
Information about mount points is used for dumping fanotify.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-04-11 22:06:48 +04:00
Pavel Emelyanov
3a1c7d1d76 ns: Introduce ns descriptors
These are structs that (now) tie together ns string
and the CLONE_ flag. It's nice to have one (some code
becomes simpler) and will help us with auto-namespaces
detection.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
2013-01-15 23:24:01 +04:00