This patch initiates the ppc64le architecture support in CRIU.
Note that ppc64 (Big Endian) architecture is not yet supported since there
are still several issues to address with this architecture. However, in the
long term, the two architectures should be addressed using the almost the
same code, so sharing the ppc64 directory.
Major ppc64 issues:
Loader is not involved when the parasite code is loaded. So no relocation
is done for the parasite code. As a consequence r2 must be set manually
when entering the parasite code, and GOT is not filled.
Furthermore, the r2 fixup code at the services's global address which has
not been fixed by the loader should not be run. Branching at local address,
as the assembly code does is jumping over it.
On the long term, relocation should be done when loading the parasite code.
We are introducing 2 trampolines for the 2 entry points of the restorer
blob. These entry points are dealing with r2. These ppc64 specific entry
points are overwritting the standard one in sigreturn_restore() from
cr-restore.c. Instead of using #ifdef, we may introduce a per arch wrapper
here.
CRIU needs 2 kernel patches to be run powerpc which are not yet upstream:
- Tracking the vDSO remapping
- Enabling the kcmp system call on powerpc
Feature not yet supported:
- Altivec registers C/R
- VSX registers C/R
- TM support
- all lot of things I missed..
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
These images have common magic in front of per-image one. With
this we have 3 "types" of images -- inventory (head), other
images, service files. The latter would be stats (not an image,
just happen to be in PB format) and irmap cache (not an image
again, just auxiliary thing which is in PB for convenience).
Since inventory file is the first one we read on restore it's
OK to set the global "new images" flag there. Dump (write) is
always in new format.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Ruslan Kuprieiev <rkuprieiev@cloudlinux.com>
Acked-by: Andrew Vagin <avagin@odin.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Currently on dump we generate too many image files, effectively
all the stuff from the GLOB set is created. The thing is that
sometimes some of created images can be empty (just contain the
magic number at the head). Thos images are useless and just
waste the space.
When applied after the "empty images" set, this introduces the
lazy images -- when we call open_image() the actual file is
only created (and the magic number is written into it) when the
very first object goes into it.
For example for the simplest test we have, then static/env00
one, the created image files are
core-7290.img
creds-7290.img
fdinfo-2.img
fs-7290.img
ids-7290.img
inventory.img
mm-7290.img
pagemap-7290.img
pages-1.img
pstree.img
reg-files.img
sigacts-7290.img
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When an image of a certian type is not found, CRIU sometimes
fails, sometimes ignores this fact. I propose to ignore this
fact always and treat absent images and those containing no
objects inside (i.e. -- empty). If the latter code flow will
_need_ objects, then criu will fail later.
Why object will be explicitly required? For example, due to
restoring code reading the image with pb_read_one, w/o the
_eof suffix thus required the object to be in the image.
Another example is objects dependencies. E.g. fdinfo objects
require various files objects. So missing image files will
result in non-resolved searches later.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Current code doesn't make any difference between OPT and no-OPT
except for the message is printed or not in the open_image().
So this particular change changes nothing but the availability of
this message.
In the next patches I wil introduce "empty images" to deal with
the ENOENT situation in a more graceful manner.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Force-read came from very first dev version of CRIU (even before 1.0 release)
and never been used actually in image.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When AIO context is set up kernel does two things:
1. creates an in-kernel aioctx object
2. maps a ring into process memory
The 2nd thing gives us all the needed information
about how the AIO was set up. So, in order to dump
one we need to pick the ring in memory and get all
the information we need from it.
One thing to note -- we cannot dump tasks if there
are any AIO requests pending. So we also need to
go to parasite and check the ring to be empty.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We have some images that store raw data together with
the pb objects (and one that just stores raw data) and
use custom access to this. E.g. pipe-data images splice
data into them and sk-queue one lseeks the image for
queue packets.
For those using buffered mode mixed with raw may lead
to troubles. Explicitly mark such images, so that the
buffering (next patches) handle such images carefully.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
We want to have buffered images to speed up dump and,
slightly, restore. Right now we use plan file descriptors
to write and read images to/from. Making them buffered
cannot be gracefully done on plain fds, so introduce
a new class.
This will also help if (when?) we will want to do more
complex changes with images, e.g. store them all in one
file or send them directly to the network.
For now the cr_img just contains one int _fd variable.
This patch chages the prototype of open_image() to
return struct cr_img *, pb_(read|write)* to accept one
and fixes the compilation of the rest of the code :)
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
This is to simplify the change from int fd to more
generic image class data-type.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Will need it to handle vvar zones in a special way.
Because VMA_UNSUPP never goes into the image file
lets reuse bit 12 for VVAR.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
vmsplice doesn't work for such VMA-s.
This flags is set in a kernel function remap_pfn_range()
(remap kernel memory to userspace), which is widely used by device
drivers to provide direct access to a device memory.
Reported-by: J F <jgmb45@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
No all distros (Rpi) provide O_PATH definition,
so include fcntl.h here thus we don't hit compilation
problem like
| CC image.o
| image.c: In function ‘open_image_at’:
| image.c:187:29: error: ‘O_PATH’ undeclared (first use in this function)
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows us to distinguish the situation where image
to be opened is missing but optional, thus no error message
should be printed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We want to write into empty image files, so we
unlink them before dumping into. Let's O_TRUNC
it instead.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
We don't know a state behind an external socket. It depends on logic
of the program, which handles this socket.
This patch adds ability to load a library with callbacks for dumping
and restoring external sockets.
This patch introduces two callbacks cr_plugin_dump_unix_sk and
cr_plugin_restore_unix_sk. If a callback can not handle a socket, it
must return -ENOTSUP.
The main questions, what kind of information should be tranfered in
these callbacks. Pls, think a few minutes about that and send me
your opinion.
v2: Use uflags instread of adding a new field
v3: clean up
v4: Unsuitable callbacks return -ENOTSUP.
v5: set USK_CALLBACK, if a socket was dumped by callback.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
We really have a mess of extern/non-extern declaration
of functions in our headers. Always use extern for
unification purpose.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
crtools.h is too heavy to be included in many sources
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Service connection is actually an 'external' one from unix sockets engine POV,
but we don't want to dump it as such. Thus, we explicitly find one and dump it
as half-closed connection. On restore we push an artificial message into it
to report to the program that the dump-request was served, but the program is
restored.
Signed-off-by: Ruslan Kuprieiev <kupruser@gmail.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Move image object descriptors to own image-desc
file(s). This allow to reuse the code in other tools.
I had to move show declarations to cr-show.h as well.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This header consists of a number of constants
and a few macro helper. No need to make it
asm/types.h dependant. Instead only include
<stdbool.h> for bool definition as required
by forward fdinfo_per_id (I think this declaration
should live in somewhere else place, not sure
where yet, so I left it untouched).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
This allows to reuse magic numbers outside of crtools code.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
It is read together with pstree items for checking what kind of
resources should be shared. Core is too big for reading it in
this place.
v2: fix check_core
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Dump the with "new" prlimit syscall that works on arbitrary pid.
Restore is done in restorer _after_ mappings mixup and _before_
caps drop to make it set any max value.
The RLIM_INFINITY is handled explicitly to help future 64<->32
bits migration.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
* The following files goes into the directory arch/x86/include/asm unmodified:
- include/atomic.h,
- include/linkage.h,
- include/memcpy_64.h,
- include/types.h,
- include/bitops.h,
- pie/parasite-head-x86-64.S,
- include/processor-flags.h,
- include/syscall-x86-64.def.
* Changed include directives in the source files that include the headers
listed above.
* Modified build scripts to reflect the source moves.
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Make them look like __CR_<smth>_H__ with
sed -e '1,2s/#\(ifndef\|define\) _\?_\?\(CR_\)\?/#\1 __CR_/' -e '1,2s/_H_\?_\?.*$/_H__/'
on every header file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
A private vma will be mapped in a temporary place and an address will
be saved in vma->shmid, because nobody was used this field for private
vma-s before. For easy reading vma_premmapped_addr will be used for
getting the temporary address.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Three parts.
Proc: open of map_files' link doesn't work on sockets. We fstatat
it and check that it's a socket (it will be packet), then save
the socket inode on vma_area.
Dump: we resolve socket inode to socket id and save it on vma.
We use id, not inode, since on restore we'll have to mmap some
opened file, not just abstract socket with inode.
Restore: when reading vma-s we just need to find out on what fd
the respective packet socket is opened (i.e. -- no map-and-close
sockets supported by now) and dup() it to let restorer mmap it
back.
All this make it possible to c/r the tcpdump tool!
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Usually the PTYs represent a pair of links -- master peer and slave
peer. Master peer must be opened before slave. Internally, when kernel
creates master peer it also generates a slave interface in a form of
/dev/pts/N, where N is that named pty "index". Master/slave connection
unambiguously identified by this index.
Still, one master can carry multiple slaves -- for example a user opens
one master via /dev/ptmx and appropriate /dev/pts/N in sequence.
The result will be the following
master
`- slave 1
`- slave 2
both slave will have same master index but different file descriptors.
Still inside the kernel pty parameters are same for both slaves. Thus
only one slave parameters should be restored, there is no need to carry
all parameters for every slave peer we've found.
Not yet addressed problems:
- At moment of restore the master peer might be already closed for
any reason so to resolve such problem we need to open a fake master
peer with proper index and hook a slave on it, then we close
master peer.
- Need to figure out how to deal with ttys which have some
data in buffers not yet flushed, at moment this data will
be simply lost during c/r
- Need to restore control terminals
- Need to fetch tty flags such as exclusive/packet-mode,
this can't be done without kernel patching
[ avagin@:
- ideas on contol terminals restore
- overall code redesign and simplification
]
v4:
- drop redundant pid from dump_chrdev
- make sure optional fown is passed on regular ptys
- add a comments about zeroifying termios
- get rid of redundant empty line in files.c
v5 (by avagin@):
- complete rework of tty image format, now we have
two files -- tty.img and tty-info.img. The idea
behind to reduce data being stored.
v6 (by xemul@):
- packet mode should be set to true in image,
until properly fetched from the kernel
- verify image data on retrieval
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
When dumping a tmpfs mount we need to take its contents with us.
So, use tar for it and put it into the image dir.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>