
s6 and s6-rc-based init system


An s6 and s6-rc-based init system is an init system built using components from the s6 and s6-rc packages, following a general design supported by the s6-linux-init-maker program from package s6-linux-init. It can be used as an alternative to sysvinit (sys-apps/sysvinit) + OpenRC, or to systemd.

General setup

Warning
While Gentoo does offer s6, s6-rc and s6-linux-init packages in its official repository, it does not completely support using them to make an init system. Gentoo users wanting to do that might need to use alternative ebuild repositories and/or do some local tweaking.

The general setup of an s6 and s6-rc based init system is as follows:

  1. When the machine boots, all initialization tasks needed to bring it to its stable, normal 'up and running' state are split into a stage1 init and a stage2 init. The stage1 init is invoked by the kernel, runs as process 1, and replaces itself with the s6-svscan program from s6 when its work is done. The stage2 init is invoked by the stage1 init, runs as a child of process 1, blocks until s6-svscan starts to execute, and exits when its work is done.
  2. During most of the machine's uptime, s6-svscan runs as process 1 with signal diversion turned on, and there is an s6 supervision tree rooted in process 1, that is launched as soon as s6-svscan starts to execute.
  3. A supervised catch-all logger is started as part of the supervision tree. The catch-all logger logs messages sent by supervision tree processes to s6-svscan's standard output and error.
  4. The stage2 init initializes the s6-rc service manager and starts a subset of the services defined in the compiled service database it was initialized with. Some of these s6-rc-managed services might carry out part of the machine's initialization tasks.
  5. While s6-svscan is running as process 1, services are normally managed using s6-rc tools.
  6. The administrator initiates the machine's shutdown sequence by running a program that sends a signal to process 1. The BusyBox (sys-apps/busybox) halt, poweroff and reboot applets, or the s6-halt, s6-poweroff and s6-reboot programs from s6-linux-init, can be used for this.
  7. s6-svscan then executes the appropriate diverted signal handler as a child process, which performs some of the tasks needed to shut the machine down and stops all s6-rc-managed services.
  8. When the diverted signal handler's work is done, it invokes the s6-svscanctl program, which makes s6-svscan perform its finish procedure, and results in execution of the .s6-svscan/finish file in process 1's scan directory.
  9. The finish file becomes the stage3 init: it runs as process 1, makes the catch-all logger exit cleanly, if it didn't when the supervision tree was brought down by s6-svscan's finish procedure, and then performs all remaining tasks needed to shut the machine down.
  10. When the stage3 init's work is done, it halts, powers off or reboots the machine, as requested by the administrator.

The boot sequence

The stage1 init

When the machine starts booting (or, if an initramfs is being used, after it passes control to the 'main' init), a stage1 init executes as process 1. Therefore, to use an s6 and s6-rc-based init system, if the stage1 init is, for example, named s6-gentoo-init and placed in /sbin, an init=/sbin/s6-gentoo-init argument can be added to the kernel's command line using the bootloader's available mechanisms (e.g. the linux command of a 'Gentoo with s6 + s6-rc' menu entry for GRUB2). It is possible to go back to sysvinit + OpenRC, or to any other init system, at any time by reverting the change.
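
For illustration, such a GRUB2 menu entry might look like the following sketch; the kernel image name and root device are placeholders for whatever the machine actually uses:

CODE Example GRUB2 menu entry (illustrative values)
menuentry 'Gentoo with s6 + s6-rc' {
    linux /vmlinuz-x.y.z-gentoo root=/dev/sda2 ro init=/sbin/s6-gentoo-init
}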

The stage1 init runs with its standard input, output and error redirected to the machine's console. It must do all necessary setup for s6-svscan to be able to run. This includes setting up its scan directory, and because at that point the root filesystem might be the only mounted filesystem, and possibly read-only, the stage1 init must also mount a read-write filesystem to hold s6-svscan and s6-supervise control files that need to be written to. The customary setup of an s6 and s6-rc-based init system uses a run image containing the initial scan directory, which is copied to a tmpfs that the stage1 init mounts read-write, normally on /run. When s6-svscan starts running as process 1, it uses as its scan directory the copy in the tmpfs. The run image can be in a read-only filesystem.

Also, all special files that might be needed by s6-svscan and the stage1 and stage2 inits, such as the /dev/null and /dev/console device nodes, must be made available by the stage1 init before they are needed. Because of this, and because of the requirements of programs and libc functions that might be used for machine initialization, the Linux /dev and /proc filesystems will likely have to be mounted by the stage1 init.

Because the stage1 init runs as process 1, if it exits or is killed, there will be a kernel panic and the machine will hang. Therefore, it must be kept simple and must not fail, because recovery at this stage of initialization is almost impossible. This is why s6 and s6-rc-based init systems split initialization into a stage1 init and a stage2 init. The stage2 init is spawned as a child process by the stage1 init; the latter, as soon as it finishes its own work, replaces itself with s6-svscan using a POSIX exec...() call.

The author of s6 has designed the execline package (dev-lang/execline) so that the stage1 init can be an execline script. The general structure of an execline stage1 script is as follows, or a variation thereof:

CODE Execline stage1 script
#!/bin/execlineb -S0
# 'execlineb -S0' allows the script to use arguments supplied by the kernel as $1, $2, etc.
# If no arguments are used, '-P' can be specified instead of '-S0'.

# Adjust the environment set up by the kernel:
# /bin/s6-envdir -I -- ${stage1_envdir}
# Or at least set a suitable PATH environment variable:
# /bin/export PATH xxx

cd /
s6-setsid -qb
# Set umask:
# umask xxx

ifelse -nX { 
# Initialization.
# ...

# This includes mounting a read-write tmpfs.
# Using mount from util-linux; s6-mount from s6-linux-utils works too:
# if { mount -t tmpfs -o rw,xxx tmpfs ${tmpfsdir} }

# This also includes copying the run image to the tmpfs.
# Using cp from GNU Coreutils; s6-hiercopy from s6-portable-utils works too:
# if { cp -a -- ${run_image} ${tmpfsdir} }
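
# Mounting the kernel's /proc and /dev filesystems (see above) is typically also done here.
# Using mount from util-linux; s6-mount from s6-linux-utils works too (options are illustrative):
# if { mount -t proc proc /proc }
# if { mount -t devtmpfs -o mode=0755 dev /dev }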
}
{
# Do something if anything in the ifelse block failed, e.g. call sulogin(8) or sh(1).
# ...
}

# Can be done here for both s6-svscan and the stage2 init, or later:
# redirfd -r 0 /dev/null
redirfd -wnb 1 ${logger_fifo}
background
{
   s6-setsid
   redirfd -w 1 ${logger_fifo}
   # stdin: /dev/null or /dev/console
   # stdout: the catch-all logger's FIFO
   # stderr: /dev/console
   # Further file descriptor adjustments can be done here with execline's fdmove,
   # or left to the stage2 init to do it.
   ${stage2_init}
}

emptyenv -p
# Set up the supervision tree's environment if desired:
# s6-envdir -I -- ${s6_svscan_envdir}

# If it hasn't been done yet:
# redirfd -r 0 /dev/null
fdmove -c 2 1
# stdin: /dev/null
# stdout: the catch-all logger's FIFO
# stderr: the catch-all logger's FIFO
s6-svscan -st 0 -- ${tmpfsdir}/${scandir}

Where:

  • ${stage1_envdir} is a placeholder for the absolute pathname of an environment directory to be used by the stage1 and stage2 init (e.g. /lib/s6-init/env).
  • ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the read-write tmpfs will be mounted (normally /run).
  • ${run_image} is a placeholder for the absolute pathname of the directory where the run image is stored (e.g. /lib/s6-init/run-image in the rootfs).
  • ${logger_fifo} is a placeholder for the absolute pathname of the catch-all logger's FIFO (e.g. ${tmpfsdir}/${scandir}/s6-svscan-log/fifo).
  • ${stage2_init} is a placeholder for the name (if PATH search would find it) or absolute pathname of the stage2 init (e.g. /lib/s6-init/init-stage2).
  • ${s6_svscan_envdir} is a placeholder for the absolute pathname of an environment directory used to set up the supervision tree's initial environment (e.g. /etc/s6-svscan/env).
  • ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g. s6/service, making the scan directory's absolute pathname /run/s6/service).

Gentoo's official repository does not supply any package with a stage1 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init-maker program from s6-linux-init can create a minimal execline stage1 script with the aforementioned structure, which can be used as a basis for writing a custom or more elaborate one, if so desired.

The stage2 init

The stage2 init is spawned by the stage1 init as a child process, and is blocked from running until the latter replaces itself with s6-svscan. To achieve this, the child process of the stage1 init opens the catch-all logger's FIFO for writing using the POSIX open() call. The call will block until some other process opens the FIFO for reading. The catch-all logger is a supervised process, so it starts executing when s6-svscan does, and opens the FIFO for reading, thereby unblocking the process, which then replaces itself with the stage2 init.

The stage2 init executes with s6-svscan as process 1, and performs all remaining initialization tasks needed to bring the machine to its stable, normal 'up and running' state. It can execute with a few vital supervised long-lived processes already running, started as part of process 1's supervision tree, including the catch-all logger.

When the stage2 init finishes its work, it just exits and gets reaped by s6-svscan. The stage2 init can be, and normally is, an execline or shell script. Gentoo's official repository does not supply any package with a stage2 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init package contains an example execline stage2 script: the examples/rc.init file in the package's /usr/share/doc subdirectory.

s6-rc initialization

The s6-rc service manager needs to be initialized, which must be done when s6-svscan is already running. Therefore, initialization is performed by having the stage2 init invoke the s6-rc-init program. This program takes the pathname of a compiled service database as an argument (or defaults it to /etc/s6-rc/compiled), as well as the pathname of process 1's scan directory. So a suitable service database must exist and be available at least in a read-only filesystem. This is the boot-time service database. The live state directory must be in a read-write filesystem, and the customary setup of an s6 and s6-rc-based init system has s6-rc-init create it in the read-write tmpfs mounted by the stage1 init.

The initial state of all s6-rc services, as set by s6-rc-init, is down. So the stage2 init must also start all atomic services (oneshots and longruns) that are needed to complete the machine's initialization, if any, as well as all longruns that are wanted up at the end of the boot sequence. This is performed by defining a service bundle in the boot-time service database that groups these atomic services, and having the stage2 init start them with an s6-rc -u change command naming the bundle, as shown in the sketch below. This bundle would be the s6-rc counterpart to OpenRC's sysinit + boot + default runlevels, systemd's default.target unit, or nosh's normal target bundle directory.
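
As an illustration, the tail end of an execline stage2 init could look like the following sketch; the database, live directory and scan directory pathnames match the examples used elsewhere on this page, and the 'default' bundle name is hypothetical:

CODE Execline stage2 init fragment (illustrative pathnames and bundle name)
# Initialize s6-rc with the boot-time compiled database and process 1's scan directory:
if { s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/s6/service }
# Bring up every atomic service grouped in the 'default' bundle:
s6-rc -v 2 -u change default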

The catch-all logger

In the context of an s6 and s6-rc-based init system, the catch-all logger is a supervised long-lived process that logs messages sent by supervision tree processes to s6-svscan's standard output and error, normally in an automatically rotated logging directory. In a logging chain arrangement, the leaf processes of a supervision tree normally have dedicated loggers that collect and store messages sent to the process' standard output and error in per-service logs. Messages from s6-svscan, s6-supervise processes, logger processes themselves, and leaf processes that exceptionally don't have a logger, are printed on process 1's standard output or error, which, at the beginning of the boot sequence, are redirected to the machine's console. It is possible to redirect them later so that the messages are delivered to the catch-all logger, using a setup that involves a FIFO. Only the catch-all logger's standard error remains redirected to the machine's console, as a last resort.

The logging directory is owned by the catch-all logger's effective user after dropping privileges, and normally has permissions 2700 (i.e. the output of ls -l displays drwx--S---). Because it is possible to have a setup where a read-only rootfs is the only filesystem available, the logging directory is also normally placed in the read-write tmpfs mounted by the stage1 init, unless a different read-write filesystem can be guaranteed to exist before s6-svscan starts executing as process 1 (e.g. /var/log/s6-svscan is used, but /var is guaranteed to be in the rootfs, and either the kernel mounts the rootfs read-write or the stage1 init remounts it read-write, or /var is a filesystem mounted read-write by the stage1 init or the initramfs, etc.). If the logging directory is in the aforementioned tmpfs, it must be created with appropriate owner and permissions by the code of the catch-all logger's run file, or be present as an empty directory with appropriate owner and permissions in the run image copied to the tmpfs.

Gentoo's official repository does not supply any package with a catch-all logger service directory for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init-maker program from s6-linux-init can create a catch-all logger service directory named s6-svscan-log, which can be used as a basis for writing a custom or more elaborate one, if so desired.

The catch-all logger's FIFO

An s6 and s6-rc-based init system has a FIFO some place in the filesystem, reserved for the catch-all logger. The FIFO is owned by root and has permissions 0600 (i.e. the output of ls -l displays prw-------). The run image that is copied to the read-write tmpfs mounted by the stage1 init contains s6-svscan's initial scan directory, with at least a service directory for the catch-all logger already present, and possibly an additional service directory for an agetty process or similar also present. The former, so that the catch-all logger is started as soon as s6-svscan begins execution as process 1, and the latter, so that it is possible to log in to the machine if the supervision tree starts successfully, even if something else fails (e.g. s6-rc's setup). The code of the catch-all logger's run file opens the FIFO for reading, redirects its standard input to it, its standard error to /dev/console, drops privileges (e.g. by invoking s6-setuidgid or s6-applyuidgid if it is a script) and calls the logger program, which is normally s6-log.
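
A run file following this description could look like the execline sketch below. The dedicated 'catchlog' user and the /run/uncaught-logs logging directory are assumptions for the example, the FIFO is assumed to sit in the service directory itself, and the logging directory is assumed to already exist with the right owner and permissions (e.g. because it is part of the run image):

FILE ${tmpfsdir}/${scandir}/s6-svscan-log/run Example catch-all logger run file
#!/bin/execlineb -P
# Open the FIFO in the service directory for reading on stdin:
redirfd -r 0 fifo
# Keep stderr on the machine's console as a last resort:
redirfd -w 2 /dev/console
# Drop privileges to a dedicated logging user (name is hypothetical):
s6-setuidgid catchlog
# Log to an automatically rotated logging directory in the tmpfs;
# -b: block rather than lose logs under pressure, -p: ignore SIGTERM, t: prepend a timestamp.
s6-log -bp t /run/uncaught-logs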

The stage1 init redirects its standard output and error to the catch-all logger's FIFO before replacing itself with s6-svscan. However, opening a FIFO for writing is an operation that blocks until some other process opens it for reading, and a POSIX non-blocking open() call fails with an error status if it specifies the 'open for writing only' flag (O_WRONLY) and there is no reader. Execline's redirfd program specifically addresses this problem: it is a chain loading program that, if invoked with options -w, -n and -b, will execute the next program in the chain with the specified file descriptor open for writing and without blocking, even if the specified pathname corresponds to a FIFO and there is no reader.

The s6-linux-init-maker catch-all logger has its FIFO located in the logger's service directory.

Stopping the catch-all logger

The s6-log program supports a -p option that makes it ignore the SIGTERM signal, so that it can't get killed that way. If s6-log is being used as the catch-all logger program and, to minimize the risk of losing logs, was invoked with this option, a special procedure is used by the code of process 1's finish file to make it exit cleanly. When the parent s6-supervise process receives a SIGTERM signal while the supervision tree is being brought down by s6-svscan's finish procedure, it sends s6-log a SIGTERM signal followed by a SIGCONT signal. But s6-supervise doesn't exit until its supervised process does, and s6-log ignores SIGTERM and keeps running. For this reason, the s6-svc program supports a special option, -X (uppercase 'x'), that works like -x (lowercase 'x') but also makes s6-supervise redirect its standard input, output and error to /dev/null.

The stage3 init's code can use an s6-svc -X command with the catch-all logger's service directory as the argument; this would leave the catch-all logger's FIFO with no writers, because s6-svscan and all other s6-supervise processes would normally have exited by then, causing s6-log to detect end-of-file on its standard input and exit.

Shutdown and reboot

The s6-svscan diverted signal handlers

An s6 and s6-rc-based init system is asked to initiate the shutdown sequence by sending signals to process 1. Because the program running as process 1 is s6-svscan with signal diversion turned on, the signals must be chosen from the set it can divert. The BusyBox halt, poweroff and reboot applets, and the s6-halt, s6-poweroff and s6-reboot programs from s6-linux-init, are capable of sending suitable signals to process 1:

Operation   BusyBox signal   s6-linux-init signal
Halt        SIGUSR1          SIGUSR2
Poweroff    SIGUSR2          SIGUSR1
Reboot      SIGTERM          SIGINT

When process 1 receives such a signal, the corresponding diverted signal handler is executed as a child process. The handler then performs part of the tasks needed to shut the machine down and, when it finishes its work, invokes the s6-svscanctl program with the option for the action associated with that signal.

Generally speaking, the handlers undo what the stage2 init has done at boot time. Because most of this work is the same for all diverted signal handlers, they usually execute a common file, named the shutdown file, and wait for it to finish before invoking s6-svscanctl. The shutdown file's code can use s6 tools and s6-rc services to do its work, because s6-svscan is still running. However, all s6-rc-managed services have to be stopped (normally with an s6-rc -da change command) before s6-svscanctl is invoked, because s6-svscan will stop running after that, and s6-rc does not work without an s6 supervision tree. The s6-svscan diverted signal handlers and the shutdown file can be, and normally are, execline or shell scripts.
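
For illustration, a minimal execline shutdown file could look like the following sketch; the pathname and the commented-out hardware clock save are examples standing for whatever teardown work a particular machine needs, and the final s6-rc -da change command is the essential part:

FILE /lib/s6-init/shutdown Example execline shutdown file
#!/bin/execlineb -P
# Work that still needs running services can go first, e.g.:
# foreground { hwclock --systohc }
# Then bring down every active s6-rc-managed service:
s6-rc -v 2 -da change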

The general structure of an execline handler script is as follows, or a variation thereof:

FILE ${tmpfsdir}/${scandir}/.s6-svscan/SIGxxx Execline diverted signal handler script
#!/bin/execlineb -P
foreground { ${shutdown_file} }
s6-svscanctl ${option} .

Where:

  • ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally /run).
  • ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g. s6/service, making the scan directory's absolute pathname /run/s6/service).
  • ${shutdown_file} is a placeholder for the name (if PATH search would find it) or absolute pathname of the shutdown file (e.g. /lib/s6-init/shutdown).
  • ${option} is the s6-svscanctl option for the action that corresponds to the signal:
    • -0 or -st for halt.
    • -7 or -pt for poweroff.
    • -6 or -rt for reboot.

Gentoo's official repository does not supply any package with s6-svscan diverted signal handlers or a shutdown file for s6 and s6-rc-based init systems. Users must create them from scratch or take them from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init-maker program from s6-linux-init can create execline handler scripts for all s6-svscan diverted signals, compatible with s6-halt, s6-poweroff and s6-reboot. They can also be used, unmodified, with the BusyBox halt, poweroff and reboot applets by swapping the SIGUSR1 and SIGUSR2 handlers. The s6-linux-init package also contains an example execline shutdown script: the examples/rc.shutdown file in the package's /usr/share/doc subdirectory.

This means that s6-svscan is not directly compatible with sysvinit's telinit, halt, poweroff, reboot, and shutdown commands. However, many programs (e.g. those from desktop environments) expect to be able to call programs with those names during operation, so if such a thing is needed, it is possible to use compatibility execline scripts:

FILE shutdown
#!/bin/execlineb -P
# For BusyBox:
# busybox poweroff
# For s6-linux-init:
# s6-poweroff
FILE reboot
#!/bin/execlineb -P
# For BusyBox:
# busybox reboot
# For s6-linux-init:
# s6-reboot

The stage3 init

When an s6-svscan diverted signal handler invokes the s6-svscanctl program, s6-svscan performs its finish procedure, executing the finish file in the .s6-svscan control subdirectory of its scan directory, using the POSIX execve() call, and passing a halt, poweroff or reboot argument to it. Therefore, it replaces s6-svscan as process 1 and becomes the stage3 init.

The stage3 init redirects its standard output and error to /dev/console, uses an s6-svc -X command to make the catch-all logger exit cleanly, and performs all remaining tasks needed to shut the machine down. It must also kill all other processes that are still running at that point, after a grace period to allow them to exit on their own, so that filesystems can be synced and unmounted, or remounted read-only. This can be done with a POSIX kill() call specifying -1 as the process ID argument, usually by first sending a SIGTERM signal followed by a SIGCONT signal, waiting for a short period of time, and then sending a SIGKILL signal. Because the stage3 init runs as process 1, and process 1 does not get killed by a kill(-1, SIGKILL) call, it continues executing after that; sending a SIGKILL signal to all processes from a non-PID 1 process that is expected to continue running is much harder.

The stage3 init can be, and normally is, an execline or shell script. The kill program provided by the GNU Core Utilities package (sys-apps/coreutils), the util-linux package (sys-apps/util-linux) or the procps package (sys-process/procps) can be used in such a script as kill -TERM -1, kill -CONT -1 and kill -KILL -1 (the last form also kills the kill process itself, but not the stage3 init). The s6-nuke program from the s6-portable-utils package can also be used in such a script, as s6-nuke -t (SIGTERM + SIGCONT) and s6-nuke -k (SIGKILL). A shell stage3 script that uses the shell's builtin kill utility works too; in that case, process 1 will be a shell process that sends the signals itself. The wait command can be used in an execline stage3 script to reap all resulting zombie processes.

When the stage3 init finishes its work, it performs the halt, poweroff or reboot operation with a Linux reboot() call. If it is a script, it can use the BusyBox halt, poweroff and reboot applets, or the s6-halt, s6-poweroff and s6-reboot programs from s6-linux-init, passing them an -f (force) option and the argument supplied by s6-svscan.

The general structure of an execline stage3 script is as follows, or a variation thereof:

FILE ${tmpfsdir}/${scandir}/.s6-svscan/finish Execline stage3 script
#!/bin/execlineb -S0
cd /
redirfd -w 2 /dev/console
fdmove -c 1 2
foreground { s6-svc -X -- ${tmpfsdir}/${scandir}/${logger_servicedir} }
unexport ?
wait -r { }

# Shutdown tasks
# ...

# This includes killing all processes so that filesystems can be unmounted / remounted read-only.
# Using kill from procps and sleep from GNU Coreutils; s6-nuke and s6-sleep from s6-portable-utils work too:
# foreground { kill -TERM -1 }
# foreground { kill -CONT -1 }
# foreground { sleep 2 }
# foreground { kill -KILL -1 }
# wait { }

# ...

# Final action if using BusyBox:
# busybox $1 -f
# If using s6-linux-init:
# s6-$1 -f

Where:

  • ${tmpfsdir} is a placeholder for the absolute pathname of the directory where the stage1 init mounted the read-write tmpfs (normally /run).
  • ${scandir} is a placeholder for the pathname, relative to ${tmpfsdir}, of process 1's scan directory (e.g. s6/service, making the scan directory's absolute pathname /run/s6/service).
  • ${logger_servicedir} is a placeholder for the name of the catch-all logger's service directory (e.g. s6-svscan-log, making the service directory's absolute pathname /run/s6/service/s6-svscan-log).

Gentoo's official repository does not supply any package with a stage3 init for s6 and s6-rc-based init systems. Users must create one from scratch or take it from somewhere else (e.g. alternative ebuild repositories). The s6-linux-init-maker program from s6-linux-init can create an execline stage3 script with the aforementioned structure.

Service management

On an s6 and s6-rc-based init system, the s6-rc package is used for service management. In particular, the administrator can replace the init system's compiled service database with a new one using the s6-rc-update program, and can create a new boot-time service database, to be used next time the machine boots, with the s6-rc-compile program and a set of service definitions in the program's supported source format. It is best to have the s6-rc-init invocation in the stage2 init use a symbolic link as the compiled service database pathname, so that the boot-time database can be changed by modifying the symlink instead of the stage2 init code, e.g. by having an /etc/s6-rc/db directory for storing one or more compiled databases, making /etc/s6-rc/boot a symbolic link to one of those databases, and using the symlink in the s6-rc-init invocation.
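
As an illustration, with the symlink arrangement just described, compiling and activating a new database could be done along the lines of the following sketch (shown as an execline script for consistency with the rest of the page; the pathnames and the 'new' database name are examples):

CODE Compiling and switching to a new service database (illustrative)
#!/bin/execlineb -P
# Compile a new database from the source definition directories:
if { s6-rc-compile /etc/s6-rc/db/new /etc/s6-rc/source }
# Make the running s6-rc instance switch to the new database:
if { s6-rc-update /etc/s6-rc/db/new }
# Point the boot-time symlink at it for the next boot:
ln -sfn db/new /etc/s6-rc/boot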

It is possible to have long-lived processes not managed by s6-rc but supervised by process 1, by directly managing s6 service directories, placing them (or symbolic links to them) in process 1's scan directory, and using s6-svscanctl -a, s6-svscanctl -n or s6-svscanctl -N commands as needed. It is also possible to use s6-svscan as process 1 and just s6 tools, without s6-rc, but then the init system becomes more like runit. In that case, executing s6-svscan with signal diversion turned on is not necessary.
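
For example, a service directory can be linked into process 1's scan directory and picked up immediately, as in the following sketch (the mydaemon service directory is hypothetical; the scan directory pathname matches the examples used elsewhere on this page):

CODE Adding a supervised service outside s6-rc (illustrative)
#!/bin/execlineb -P
# Make the service directory visible in process 1's scan directory:
if { ln -s /etc/s6/sv/mydaemon /run/s6/service/mydaemon }
# Tell s6-svscan to rescan its scan directory immediately:
s6-svscanctl -a /run/s6/service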

s6 service directories and s6-rc service definitions for anything not supplied in packages from Gentoo's official repository must be created by the administrator, either from scratch or taken from somewhere else (e.g. alternative ebuild repositories).

External resources

  • lh-bootstrap, a set of scripts that build a disk image for a virtual machine such as QEMU. The image contains a Linux kernel and a collection of small user-space tools such as BusyBox and dropbear (sys-apps/busybox, net-misc/dropbear), all statically linked to musl (sys-libs/musl), and an s6 and s6-rc-based init system.
  • Obarun, an Arch derivative with an s6 and s6-rc-based init system.
  • Slew, a project that provides stage1, stage2, stage3 inits and s6-svscan diverted signal handlers, as well as s6-rc service definition directories in s6-rc-compile's source format for several services and other supporting scripts, to make an s6 and s6-rc-based init system. Most scripts require Byron Rakitzis's implementation of the Plan 9 shell, rc, for Unix (app-shells/rc).