Checkpointing is in


Kip Macy’s checkpointing code has been committed; I’m pasting Matt Dillon’s post about it, as there are a lot of issues to consider.

For those of you late to the party, checkpointing allows you to “freeze” a copy of an application so that, in theory, you can restore the program to that running state at a later point in time. This is useful, for instance, if you have a program that takes a long time to complete and you don’t want to restart from the beginning if there’s an interruption.
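
To make the idea concrete, here is a rough application-level analogy in Python. It is not how the kernel module works (that one snapshots the whole process from outside, with no cooperation from the program); it only shows the save-progress-then-resume pattern. The function name run_job, the stop_after parameter, and the job.ckpt file name are all made up for this sketch.

```python
import os
import pickle
import tempfile


def run_job(total_steps, ckpt_path, stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    stop_after simulates an interruption: the function returns None once
    that many steps have been executed in this call.
    """
    # Resume from the checkpoint file if one exists, else start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # "interrupted"; the checkpoint file stays on disk
        state["total"] += state["step"]
        state["step"] += 1
        executed += 1
        # Write to a temp file and rename it so a crash never leaves a
        # half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, ckpt_path)

    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    print(run_job(10, path, stop_after=4))  # interrupted run
    print(run_job(10, path))                # resumed run finishes the sum
```

The second call picks up at the step where the first one stopped instead of starting over. A kernel-level checkpoint aims for the same result without the program having to be written this way.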

“Kip’s checkpointing code is now in the tree. Basically you use it
by kldload’ing the checkpt.ko module, which should now be built
automatically. You then ^E the program you want to checkpoint,
and use the ‘checkpt’ utility in /usr/bin to resume it from the checkpoint
file. The program is NOT killed by this signal, it continues to run
after the checkpoint file(s) have been generated.

The checkpoint program is currently designed to work only with simple
programs… it will save the signal state, descriptors referencing regular
files, the VM state (anonymous memory), as well as any nominal
file mappings, but it does not save sockets, pipes, or device descriptors,
so while you can checkpoint a pipe sequence you can’t really restore it.

Please note that there are SEVERE security issues with this module.
The module is not loaded into the kernel by default and, when loaded,
can only be used by users in the wheel group. You can change the group
requirements with a sysctl (see the manual page for checkpt). The
security issues relate to the restoration of signals and file descriptors
(in particular, the restoration system call will convert file handles
into file descriptors which could potentially allow any file in the system
to be accessed). I’ve put in some basic security checks but they are not
meant to be all encompassing!

It is going into the tree now because Kip and I have done enough work on
it that anyone else interested in working on it can theoretically dig in.
Significant debugging is still in place. We’ve left it as a module to
facilitate debugging.

It should be useable for scientific applications now though I am not
entirely certain that FP registers are saved and restored (maybe someone
can play with that!). It should already work considerably better than
the Linux equivalent what with the regular file descriptor save/restore
capability.

Any developer who wishes to work on the checkpointing module and related
code is welcome to!”
