This is Gentoo's testing wiki. It is a non-operational environment and its textual content is outdated.

Please visit our production wiki at https://wiki.gentoo.org

s6

From Gentoo Wiki (test)

s6 is a package that provides a daemontools-inspired process supervision suite, a notification framework, a UNIX domain super-server, and tools for file descriptor holding and suidless privilege gain. It can be used as an init system component, and also as a helper for supervising OpenRC services. A high-level overview of s6 is available here. The package's documentation is provided in HTML format, and can be read in a text user interface using, for example, www-client/links.

Installation

USE flags

USE flags for sys-apps/s6 skarnet.org's small and secure supervision software suite

execline enable support for dev-lang/execline

Emerge

root #emerge --ask sys-apps/s6
Important
The above command will install s6 version 2.1.3.0 for systems on the stable branch, which doesn't support up-to-date service readiness notification tools, and can't be used in conjunction with s6-rc. Users who want a more recent version will need to add dev-libs/skalibs, dev-lang/execline and sys-apps/s6 to /etc/portage/package.accept_keywords (if using Portage). While it is generally not advised to mix packages of stable and testing branches, the skarnet.org software stack only depends on the libc, so in this case it should be safe.
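For Portage users, the accept_keywords entries mentioned above could look like the following sketch (the file name is arbitrary, and ~amd64 is an assumption; use the keyword matching the system's architecture):

```
# /etc/portage/package.accept_keywords/s6 (sketch; adjust ~amd64 to the system's arch)
dev-libs/skalibs ~amd64
dev-lang/execline ~amd64
sys-apps/s6 ~amd64
```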

Configuration

Environment variables

  • UID - The process' user ID set by s6-applyuidgid when invoked with the -U option.
  • GID - The process' group ID set by s6-applyuidgid when invoked with the -U option.
  • GIDLIST - The process' supplementary group list set by s6-applyuidgid when invoked with the -U option. Must be a comma-separated list of numeric group IDs, without spaces.
  • PROTO - Set by s6-ipcclient and s6-ipcserverd to the value 'IPC', as per the IPC UCSPI specification, and used by s6-ipcserver-access and s6-connlimit to construct the names of other environment variables.
  • IPCLOCALPATH - Set by s6-ipcclient to the pathname associated with the local UNIX domain socket it is using for the connection, as per the IPC UCSPI specification. Also set by s6-ipcserver-access (if the value of PROTO is 'IPC') to the pathname associated with the local UNIX domain socket (the one its standard input and output read from and write to, respectively), as reported by the POSIX getsockname() call.
  • IPCREMOTEPATH - Set by s6-ipcserverd to the pathname associated with the remote UNIX domain socket (on Gentoo, as contained in the sun_path field of the struct sockaddr_un object filled by the Linux accept4() call), if any, as per the IPC UCSPI specification. Be aware that it may contain arbitrary characters.
  • IPCREMOTEEUID - Set by s6-ipcserverd to the effective user ID of the client, as per the IPC UCSPI specification, unless credentials lookups have been disabled. Read by s6-ipcserver-access (if the value of PROTO is 'IPC') to decide whether to allow or refuse access to the server.
  • IPCREMOTEEGID - Set by s6-ipcserverd to the effective group ID of the client, as per the IPC UCSPI specification, unless credentials lookups have been disabled. Read by s6-ipcserver-access (if the value of PROTO is 'IPC') to decide whether to allow or refuse access to the server.
  • IPCCONNNUM - Set by s6-ipcserverd to the number of connections originating from the same user (i.e. same user ID), and read by s6-connlimit (if the value of PROTO is 'IPC') to decide whether the number of connections originating from the same user exceeds the maximum allowed.
  • IPCCONNMAX - Maximum number of connections originating from the same client allowed by s6-connlimit (if the value of PROTO is 'IPC').
  • S6_FD# - Number of file descriptors transferred to or from an s6-fdholderd process by s6-fdholder-setdumpc or s6-fdholder-getdumpc.
  • S6_FD_0, S6_FD_1, ... - File descriptors transferred to or from an s6-fdholderd process by s6-fdholder-setdumpc or s6-fdholder-getdumpc.
  • S6_FDID_0, S6_FDID_1, ... - Identifiers of the file descriptors transferred to or from an s6-fdholderd process by s6-fdholder-setdumpc or s6-fdholder-getdumpc.
  • S6_FDLIMIT_0, S6_FDLIMIT_1, ... - Expiration dates, or remaining time until expiration, of the file descriptors transferred to or from an s6-fdholderd process by s6-fdholder-setdumpc or s6-fdholder-getdumpc, as a timestamp in external TAI64N format.

Files

  • /run/openrc/s6-scan - s6-svscan's scan directory when using OpenRC's s6 integration feature.
  • /var/svc.d - Service directory repository searched by OpenRC when using the s6 integration feature.

Service

OpenRC

See here.

Usage

Process supervision

For more in-depth information about the process supervision aspects of s6, see daemontools-encore. A summary follows.

s6 program   daemontools program with similar functionality
s6-log       multilog
s6-setsid    pgrphack


Other s6 programs that have a functionality similar to a daemontools program have the daemontools name prefixed with s6-.

The program implementing the supervisor features in s6 is s6-supervise, and just like daemontools' supervise, it takes the pathname (absolute or relative to the working directory) of a service directory (or servicedir) as an argument. An s6 service directory must contain at least an executable file named run, and can contain an optional, regular file named down, and an optional subdirectory or symbolic link to directory named log, all of which work like their daemontools counterparts. Like runit service directories, it can also contain an optional, executable file named finish, which can be used to perform cleanup actions each time the supervised process stops, possibly depending on its exit status information. s6-supervise calls finish with two arguments: the first one is the supervised process' exit code, or 256 if it was killed by a signal, and the second one is the signal number if the supervised process was killed by a signal, or an undefined number otherwise.

Unlike runit's runsv, s6-supervise sends finish a SIGKILL signal if it runs for too long. If using s6 version 2.2.0.0 or later, there can be an optional, regular file in the service directory, named timeout-finish, containing an unsigned integer value that specifies how much time (in milliseconds) finish is allowed to run before being killed. If that file is absent, a default value of 5 seconds is used.

Like daemontools-encore, s6-supervise makes its child process the leader of a new session using the POSIX setsid() call, unless the servicedir contains a regular file named nosetsid (daemontools-encore's is named no-setsid, though); in that case, the child process will run in s6-supervise's session instead. s6-supervise waits for a minimum of 1 second between two run spawns, so that it does not loop too quickly if the supervised process exits immediately.
If s6-supervise receives a SIGTERM signal, it behaves as if an s6-svc -dx command naming the corresponding service directory had been used (see later), and if it receives a SIGHUP signal, it behaves as if an s6-svc -x command naming the corresponding service directory had been used.
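The servicedir layout described above can be sketched from a shell; my-daemon is a hypothetical program name, and plain sh is used for run since execline is not required:

```shell
# Sketch: lay out a minimal s6 service directory named test-service.
# "my-daemon" is a placeholder for a real long-running program.
mkdir -p test-service

cat > test-service/run <<'EOF'
#!/bin/sh
# Merge stderr into stdout so a logger would capture both streams.
exec 2>&1
exec my-daemon
EOF
chmod +x test-service/run

# Optional: keep the service down until explicitly started with s6-svc -u.
touch test-service/down

# Optional (s6 >= 2.2.0.0): allow a finish script up to 10 seconds to run.
echo 10000 > test-service/timeout-finish
```

Running s6-supervise test-service on such a directory would then start supervising the daemon.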

Just like daemontools' supervise, s6-supervise keeps control files in a subdirectory of the servicedir, named supervise, and if it finds a symbolic link to directory with that name, s6-supervise will follow it and use the linked-to directory for its control files. Unlike other process supervision suites, s6-supervise also uses a subdirectory in the servicedir, named event, for notifications about the supervised process' state changes (see the notification framework). event is a fifodir, and if it doesn't exist, s6-supervise will create it as a fifodir restricted to members of its effective group. If event exists, s6-supervise will use it as-is, and if event is a symbolic link to directory, s6-supervise will follow it. Complete information about the service directory structure is available here, and for further information about s6-supervise please consult the HTML documentation in the package's /usr/share/doc subdirectory.

The author of s6 is also the author of the execline package (dev-lang/execline), which implements the execline language, a scripting language built around chain loading[1]. Execline aims to help produce lightweight and efficient scripts, among other things, by reducing the time involved in spawning and initializing a big command interpreter (like e.g. the sh program for shell scripts), and by simplifying parsing and doing it only once, when the script is read by the interpreter[2]. The s6 package depends on execline because some of its programs call execline programs or use the execline library, libexecline. However, it is not required that run or finish files in service directories be execline scripts. Just like with daemontools, any file format that the kernel knows how to execute is acceptable, and, in particular, they can be shell scripts if so desired.
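For illustration, a run file for a hypothetical daemon could be written as an execline script; fdmove is an execline program that moves file descriptors before chain-loading into the next program:

```
#!/bin/execlineb -P
# Sketch: merge stderr into stdout, then chain-load into the daemon.
# "my-daemon" is a placeholder for a real long-running program.
fdmove -c 2 1
my-daemon
```

Unlike in a shell script, the lines here are not separate commands: the whole file is parsed once into a single chain, each program executing the next.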

The s6-svscan program allows supervising a set of processes running in parallel using a scan directory (or scandir), just like daemontools' svscan, so it will be the supervision tree's root. s6-svscan from package version 2.3.0.0 or later does not perform the periodic scans that other process supervision suites do, unless it is passed a -t option with a scan period (as an unsigned integer value in milliseconds). Earlier versions had a default scan period of 5 seconds (equivalent to a -t 5000 argument) for compatibility with daemontools, which could be turned off with a -t 0 argument. s6-svscan can be forced to perform a scan by sending it a SIGALRM signal, or by using s6-svscanctl (see later). When s6-svscan performs a scan, it checks the scan directory and launches an s6-supervise child process for each new servicedir it finds, or old servicedir for which it finds its s6-supervise process has exited. All services with a corresponding servicedir are considered active. s6-supervise children for which s6-svscan finds that their corresponding servicedir is no longer present are not stopped, but their service is considered inactive.

s6-svscan keeps control files in a subdirectory of the scandir, named .s6-svscan. If this subdirectory or any of its files doesn't exist when s6-svscan is invoked, they will be created. s6-svscan can be controlled by sending it signals, or by using the s6-svscanctl program. s6-svscanctl communicates with s6-svscan using a FIFO in the .s6-svscan subdirectory, and accepts a scan directory pathname, and options that specify what to do. Some of s6-svscanctl's options are:

  • s6-svscanctl -a (alarm): make s6-svscan perform a scan. Equivalent to sending s6-svscan a SIGALRM signal.
  • s6-svscanctl -n (nuke): make s6-svscan stop s6-supervise child processes corresponding to inactive services, by sending each of them a SIGTERM signal, or a SIGHUP signal if they are running on the log subdirectory of a service directory.
  • s6-svscanctl -N (really nuke): make s6-svscan stop s6-supervise child processes corresponding to inactive services, by sending each of them a SIGTERM signal, even if they are running on the log subdirectory of a service directory.
  • s6-svscanctl -t (terminate): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGTERM signal, or a SIGHUP signal if they are running on the log subdirectory of a service directory, and then make s6-svscan start its finish procedure. Equivalent to sending s6-svscan a SIGTERM signal, unless signal diversion is turned on.
  • s6-svscanctl -q (quit): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGTERM signal, even if they are running on the log subdirectory of a service directory, and then make s6-svscan start its finish procedure. Equivalent to sending s6-svscan a SIGQUIT signal, unless signal diversion is turned on.
  • s6-svscanctl -h (hangup): make s6-svscan stop all s6-supervise child processes by sending each of them a SIGHUP signal, and then make s6-svscan start its finish procedure. Equivalent to sending s6-svscan a SIGHUP signal, unless signal diversion is turned on.
  • s6-svscanctl -b (abort): make s6-svscan start its finish procedure without stopping its s6-supervise child processes. Equivalent to sending s6-svscan a SIGABRT signal.

Other s6-svscanctl options are used by s6-svscan's finish procedure. For further information about s6-svscan, and the full description of s6-svscanctl's functionality, please consult the HTML documentation in the package's /usr/share/doc subdirectory.

s6-log is the logger program provided by the s6 package. Just like daemontools' multilog program, it treats its arguments as a logging script, composed of a sequence of directives that specify what to do with text lines read from its standard input. Directives starting with . or / (daemontools-style automatically rotated logging directories or logdirs) behave like their daemontools' multilog counterparts, and so do directives starting with s, n, !, t, + and -, except that patterns in + and - directives are POSIX extended regular expressions (like those of the grep -E command), and the processor specified in an !arguments directive is invoked as execlineb -Pc arguments, so that arguments can use execline syntax. A T directive prepends each logged line with an ISO 8601 timestamp for combined date and time representing local time according to the system's timezone, with a space (not a 'T') between the date and the time and two spaces after the time. For s6-log, t and T directives can appear in any place of the logging script; directives appearing before them apply to read lines without the timestamp, and directives appearing after them apply to lines with the prepended timestamp. s6-log can be forced to perform a rotation on a logdir by sending it a SIGALRM signal. For the full description of s6-log's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.
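As a sketch, a log subdirectory's run script could invoke s6-log with a small logging script built from the directives described above; the logdir path /var/log/test-service is hypothetical:

```
#!/bin/execlineb -P
# Sketch: prepend ISO 8601 timestamps (T), rotate when the current file
# reaches about 1 MB (s1000000), keep at most 10 archived files (n10).
s6-log T s1000000 n10 /var/log/test-service
```

The logdir must exist and be writable by the user s6-log runs as.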

s6 also provides chain loading programs that can be used to modify a supervised process' execution state. s6-envdir, s6-envuidgid, s6-setlock, s6-setuidgid, s6-softlimit and s6-setsid are similar to daemontools' envdir, envuidgid, setlock, setuidgid, softlimit and pgrphack, respectively. s6-envuidgid also sets environment variable GIDLIST to the supplementary group list (as a comma-separated list of group IDs) of its effective user, obtained using the POSIX getgrent() call. s6-setuidgid can also accept an argument of the form uid:gid with a numeric user and group ID as an alternative to an account database username. s6-setlock can also take a shared lock on a file (calling Linux flock() with a LOCK_SH operation on Gentoo) instead of an exclusive lock by invoking it with an -r option, and can take a timed lock (using a helper program, s6lockd-helper) by invoking it with a -t option followed by a time value in milliseconds specifying the timeout. s6-setsid can also make the process the leader of a new (background) process group without creating a new session (using POSIX setpgid()) by invoking it with a -b option, and can make the process the leader of a new process group in the same session and then attach the session's controlling terminal to the process group to make it the foreground group (using POSIX tcsetpgrp()) by invoking it with an -f or -g option. In the latter case, the process will ignore the resulting SIGTTOU signal, so that it doesn't get stopped.
There is also a generalized version of s6-setuidgid, named s6-applyuidgid: s6-applyuidgid -u uid sets the effective user ID of the process to uid, s6-applyuidgid -g gid sets the effective group ID of the process to gid, s6-applyuidgid -G gidlist sets the supplementary group list of the process to gidlist (using Linux setgroups() on Gentoo), which must be a comma-separated list of numeric group IDs, without spaces, and s6-applyuidgid -U sets the effective user ID, group ID and supplementary group list of the process to the values of environment variables UID, GID and GIDLIST, respectively. For the full description of all these programs' functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.
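The typical privilege-dropping pattern these programs enable can be sketched as a chain in a run script; the account name daemonuser and the program my-daemon are hypothetical:

```
#!/bin/execlineb -P
# Sketch: s6-envuidgid sets UID, GID and GIDLIST from the account database
# entry for "daemonuser"; s6-applyuidgid -U then applies those values
# before chain-loading into the daemon.
s6-envuidgid daemonuser
s6-applyuidgid -U
my-daemon
```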

s6-svc is s6's program for controlling supervised processes, and s6-svstat, the program for querying status information about them. s6-svc accepts a service directory pathname and options that specify what to do; unlike daemontools' svc, any pathname after the first one will be ignored. s6-svc -u, s6-svc -d, s6-svc -o and s6-svc -x commands behave like daemontools' svc -u, svc -d, svc -o and svc -x commands, respectively. s6-svc -o is actually defined as the equivalent of s6-svc -uO, and s6-svc -O (capital 'o') behaves like daemontools' svc -o, except that if the supervised process is not running, it won't be started. Other s6-svc options allow reliably sending signals to a supervised process, and interacting with s6-supervise's notification features. In particular, s6-svc -a can be used to send a SIGALRM signal to a supervised s6-log process to force it to perform a rotation. If using s6 version 2.5.0.0 or later, and if a service directory contains a finish file that exits with an exit code of 125 (indicating permanent failure), the supervised process won't be started, as if an s6-svc -O command naming the corresponding service directory had been used while the process was running. If using s6 version 2.5.1.0 or later, there can be an optional, regular file in a service directory, named timeout-kill, containing a time value in milliseconds (as an unsigned integer); if the value is nonzero, and the supervised process is still running after the specified time has elapsed from the moment an s6-svc -d command naming the corresponding service directory sent it the SIGTERM signal followed by the SIGCONT signal, then s6-supervise sends it a SIGKILL signal to finally kill it.

s6-svstat accepts a service directory pathname and options; unlike daemontools' svstat, any pathname after the first one will be ignored. Without options, or with only the -n option, s6-svstat prints a human-readable summary of all the available information on the service. In this case, it displays whether the supervised process is running ('run') or not ('down'), whether it is transitioning to the desired state or already there ('want up' or 'want down'), how long it has been in the current state, and whether its current up or down status matches the presence or absence of a down file in the servicedir ('normally up' or 'normally down'). It also shows if the supervised process is paused (because of a SIGSTOP signal). If the process is up, s6-svstat prints its process ID (PID), and, if it supports readiness notification, whether s6-supervise has already been notified ('ready') or not, and how much time has passed since the notification was received. If the process is down, s6-svstat prints its exit status (signal name, or signal number if the -n option was given, if the supervised process was killed by a signal, or exit code otherwise), whether the really down event has happened or not, and how much time has passed since the event happened. When s6-svstat is invoked with options other than -n, it outputs programmatically parsable information instead, as a series of space-separated values, one value per requested field.

For the full description of s6-svc's and s6-svstat's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory. s6 also provides an s6-svok program similar to daemontools' svok, that checks whether a s6-supervise process is currently running on a service directory specified as an argument. Its exit status is 0 if there is one, and 1 if there isn't.
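Since s6-svok reports only through its exit status, it fits naturally in shell conditionals; a sketch (test-service is a hypothetical servicedir path):

```
# Sketch: only issue a control command if a supervisor is actually
# running on the service directory.
if s6-svok test-service; then
    s6-svc -u test-service
else
    echo "no s6-supervise process on test-service" >&2
fi
```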

Example s6 scan directory with down, finish, and timeout-kill files, as well as a symbolic link to a supervise directory elsewhere, and execline scripts:

user $ls -l *
test-service1:
total 8
-rwxr-xr-x 1 user user 32 Jul 16 12:00 run
lrwxrwxrwx 1 user user 24 Jul 16 12:00 supervise -> ../../external-supervise

test-service2:
total 20
-rw-r--r-- 1 user user  0 Jul 16 12:00 down
-rwxr-xr-x 1 user user 99 Jul 16 12:00 finish
-rwxr-xr-x 1 user user 76 Jul 16 12:00 run
-rw-r--r-- 1 user user  6 Jul 16 12:00 timeout-finish

test-service3:
total 20
-rw-r--r-- 1 user user  0 Jul 16 12:00 down
-rwxr-xr-x 1 user user 75 Jul 16 12:00 finish
-rwxr-xr-x 1 user user 43 Jul 16 12:00 run
-rw-r--r-- 1 user user  6 Jul 16 12:00 timeout-kill
FILE test-service1/run
#!/bin/execlineb -P
test-daemon
FILE test-service2/run
#!/bin/execlineb -P
foreground { echo Starting test-service2/run }
sleep 10
FILE test-service2/finish
#!/bin/execlineb -S0
foreground { echo Executing test-service2/finish with arguments $@ }
sleep 10
FILE test-service2/timeout-finish
20000
FILE test-service3/run
#!/bin/execlineb -P
test-daemon-ignoreterm
FILE test-service3/finish
#!/bin/execlineb -S0
echo Executing test-service3/finish with arguments $@
FILE test-service3/timeout-kill
10000

It is assumed test-daemon-ignoreterm is a program that ignores the SIGTERM signal. Since the test-service2/finish script runs for more than 5 seconds, the timeout-finish file specifying a 20-second timeout will prevent it from being killed by s6-supervise before it completes its execution.

Resulting supervision tree when s6-svscan is run on this scandir as a background process in an interactive shell, assuming it is a subdirectory named scan in the working directory (i.e. launched with s6-svscan scan &):

user $ps xf -o pid,ppid,pgrp,euser,args
 PID  PPID  PGRP EUSER    COMMAND
...
1833  1820  1833 user     -bash
2201  1833  2201 user      \_ s6-svscan scan
2202  2201  2201 user          \_ s6-supervise test-service3
2203  2201  2201 user          \_ s6-supervise test-service1
2205  2203  2205 user          |   \_ test-daemon
2204  2201  2201 user          \_ s6-supervise test-service2
...
Important
Since processes in a supervision tree are created using the POSIX fork() call, all of them will inherit s6-svscan's environment, which, in the context of this example, is the user's login shell environment. If s6-svscan is launched in some other way (see later), the environment will likely be completely different. This must be taken into account when trying to debug a supervision tree with an interactive shell.

supervise subdirectory contents:

user $ls -l */supervise
lrwxrwxrwx 1 user user 24 Jul 16 12:00 test-service1/supervise -> ../../external-supervise

test-service2/supervise:
total 4
prw------- 1 user user  0 Jul 16 12:10 control
-rw-r--r-- 1 user user  0 Jul 16 12:10 lock
-rw-r--r-- 1 user user 35 Jul 16 12:10 status

test-service3/supervise:
total 4
prw------- 1 user user  0 Jul 16 12:10 control
-rw-r--r-- 1 user user  0 Jul 16 12:10 lock
-rw-r--r-- 1 user user 35 Jul 16 12:10 status
user $ls -l ../external-supervise
total 4
prw------- 1 user user  0 Jul 16 12:10 control
-rw-r--r-- 1 user user  0 Jul 16 12:10 lock
-rw-r--r-- 1 user user 35 Jul 16 12:10 status

Messages sent by test-service2/run to s6-svscan's standard output when manually started:

user $s6-svc -u test-service2
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
...
user $for i in *; do printf "$i: %s\n" "$(s6-svstat $i)"; done
test-service1: up (pid 2205) 126 seconds
test-service2: up (pid 2237) 5 seconds, normally down
test-service3: down (exitcode 0) 126 seconds, ready 126 seconds

After enough seconds have elapsed:

user $for i in *; do printf "$i: %s\n" "$(s6-svstat $i)"; done
test-service1: up (pid 2205) 137 seconds
test-service2: down (exitcode 0) 6 seconds, want up
test-service3: down (exitcode 0) 137 seconds, ready 137 seconds

The output of s6-svstat and test-service2/finish shows that test-service2/run exits each time with an exit code of 0. Reliably sending a SIGSTOP signal, and later a SIGTERM signal, to test-service2/run:

user $s6-svc -p test-service2
user $s6-svc -t test-service2
user $s6-svstat test-service2
up (pid 2312) 18 seconds, normally down, paused

The output of s6-svstat shows that test-service2/run is indeed stopped ("paused"), so SIGTERM doesn't have any effect yet. To resume the process a SIGCONT signal is needed:

user $s6-svc -c test-service2
Executing test-service2/finish with arguments 256 15
Starting test-service2/run
Executing test-service2/finish with arguments 0 0
Starting test-service2/run
...

The output of test-service2/finish shows that after resuming execution, test-service2/run was killed by the SIGTERM signal that was awaiting delivery (signal 15), and since the process is supervised, s6-supervise restarts test-service2/run after test-service2/finish exits.

Messages sent by test-service2/run to s6-svscan's standard output when manually stopped:

user $s6-svc -d test-service2
Executing test-service2/finish with arguments 256 15

As shown by test-service2/finish, s6-supervise stopped test-service2/run by killing it with a SIGTERM signal (signal 15).

Manually starting test-daemon-ignoreterm:

user $s6-svc -u test-service3
user $s6-svstat test-service3
up (pid 2390) 16 seconds, normally down

The resulting supervision tree:

user $ps xf -o pid,ppid,pgrp,euser,args
 PID  PPID  PGRP EUSER    COMMAND
...
1833  1820  1833 user     -bash
2201  1833  2201 user      \_ s6-svscan scan
2202  2201  2201 user          \_ s6-supervise test-service3
2390  2202  2390 user          |   \_ test-daemon-ignoreterm
2203  2201  2201 user          \_ s6-supervise test-service1
2205  2203  2205 user          |   \_ test-daemon
2204  2201  2201 user          \_ s6-supervise test-service2
...

Manually stopping test-daemon-ignoreterm:

user $s6-svc -d test-service3
user $for i in *; do printf "$i: %s\n" "$(s6-svstat $i)"; done
test-service1: up (pid 2205) 382 seconds
test-service2: down (signal SIGTERM) 69 seconds, ready 59 seconds
test-service3: up (pid 2390) 40 seconds, normally down, want down
Executing test-service3/finish with arguments 256 9

The output of s6-svstat confirms that test-service2/run was killed by a SIGTERM signal, and shows that test-daemon-ignoreterm could not be stopped ("up" but also "want down") because it ignores SIGTERM. The service directory contains a timeout-kill file, so after waiting the specified 10 seconds, s6-supervise killed test-daemon-ignoreterm with a SIGKILL signal (signal 9), as shown by the message test-service3/finish sent to s6-svscan's standard output.

user $s6-svstat test-service3
down (signal SIGKILL) 14 seconds, ready 14 seconds

The output of s6-svstat confirms that test-daemon-ignoreterm was killed by a SIGKILL signal.

s6-svscan's finish procedure

When s6-svscan is asked to exit using s6-svscanctl, it tries to execute a file named finish, expected to be in the .s6-svscan control subdirectory of the scan directory. The program does this using the POSIX execve() call, so no new process will be created, and .s6-svscan/finish will have the same process ID as s6-svscan.

.s6-svscan/finish is invoked with a single argument that depends on how s6-svscanctl is invoked:

  • If s6-svscanctl is invoked with the -s option, .s6-svscan/finish will be invoked with a halt argument.
  • If s6-svscanctl is invoked with the -p option, .s6-svscan/finish will be invoked with a poweroff argument.
  • If s6-svscanctl is invoked with the -r option, .s6-svscan/finish will be invoked with a reboot argument.

This behaviour supports running s6-svscan as process 1. Just as run or finish files in a service directory, .s6-svscan/finish can have any file format that the kernel knows how to execute, but is usually an execline script. If s6-svscan is not running as process 1, the argument supplied to .s6-svscan/finish is usually meaningless and can be ignored. The file can be used just for cleanup in that case, and if no special cleanup is needed, it can be this minimal do-nothing execline script:

FILE .s6-svscan/finishMinimal execline finish script
#!/bin/execlineb -P
exit

If no -s, -p or -r option is passed to s6-svscanctl, or if s6-svscan receives a SIGABRT signal, or if s6-svscan receives a SIGTERM, SIGHUP or SIGQUIT signal while signal diversion is turned off, .s6-svscan/finish will be invoked with a 'reboot' argument.

If s6-svscan encounters an error situation it cannot handle, or if it is asked to exit and there is no .s6-svscan/finish file, it will try to execute a file named crash, also expected to be in the .s6-svscan control subdirectory. This is also done using execve(), so no new process will be created, and .s6-svscan/crash will have the same process ID as s6-svscan. If there is no .s6-svscan/crash file, s6-svscan will give up and exit with an exit code of 111.

s6-svscanctl can also be invoked in these abbreviated forms:

  • s6-svscanctl -0 (halt) is equivalent to s6-svscanctl -st.
  • s6-svscanctl -6 (reboot) is equivalent to s6-svscanctl -rt.
  • s6-svscanctl -7 (poweroff) is equivalent to s6-svscanctl -pt.
  • s6-svscanctl -8 (other) is equivalent to s6-svscanctl -0, but .s6-svscan/finish will be invoked with an 'other' argument instead of a 'halt' argument.
  • s6-svscanctl -i (interrupt) is equivalent to s6-svscanctl -6, and equivalent to sending s6-svscan a SIGINT signal, unless signal diversion is turned on.

Contents of the .s6-svscan subdirectory with example finish and crash files, once s6-svscan is running:

user $ls -l .s6-svscan
total 8
prw------- 1 user user  0 Jul 19 12:00 control
-rwxr-xr-x 1 user user 53 Jul 19 12:00 crash
-rwxr-xr-x 1 user user 72 Jul 19 12:00 finish
-rw-r--r-- 1 user user  0 Jul 19 12:00 lock
FILE .s6-svscan/finish
#!/bin/execlineb -S0
echo Executing .s6-svscan/finish with arguments $@
FILE .s6-svscan/crash
#!/bin/execlineb -S0
echo Executing .s6-svscan/crash

Messages sent by .s6-svscan/finish to s6-svscan's standard output as a result of different s6-svscanctl invocations:

user $s6-svscanctl -t .
Executing .s6-svscan/finish with arguments reboot
user $s6-svscanctl -st .
Executing .s6-svscan/finish with arguments halt
user $s6-svscanctl -7 .
Executing .s6-svscan/finish with arguments poweroff
user $s6-svscanctl -8 .
Executing .s6-svscan/finish with arguments other

Messages printed by s6-svscan on its standard error, and sent by .s6-svscan/crash to s6-svscan's standard output, as a result of invoking s6-svscanctl after deleting .s6-svscan/finish:

user $rm .s6-svscan/finish
user $s6-svscanctl -t .
s6-svscan: warning: unable to exec finish script .s6-svscan/finish: No such file or directory
s6-svscan: warning: executing into .s6-svscan/crash
Executing .s6-svscan/crash

s6-svscan's signal diversion feature

When s6-svscan is invoked with an -S option, or with neither an -s nor an -S option, and it receives a SIGINT, SIGHUP, SIGTERM or SIGQUIT signal, it behaves as if s6-svscanctl had been invoked with its scan directory pathname and an option that depends on the signal.

When s6-svscan is invoked with an -s option, signal diversion is turned on: if it receives any of the aforementioned signals, a SIGUSR1 signal, or a SIGUSR2 signal, s6-svscan tries to execute a file with the same name as the received signal, expected to be in the .s6-svscan control subdirectory of the scan directory (e.g. .s6-svscan/SIGTERM, .s6-svscan/SIGHUP, etc.). These files will be called diverted signal handlers, and are executed as a child process of s6-svscan. Just as run or finish files in a service directory, they can have any file format that the kernel knows how to execute, but are usually execline scripts. If the diverted signal handler corresponding to a received signal does not exist, the signal will have no effect. When signal diversion is turned on, s6-svscan can still be controlled using s6-svscanctl.

The best known use of this feature is to support the s6-rc service manager as an init system component when s6-svscan is running as process 1; see s6 and s6-rc-based init system.

Example .s6-svscan subdirectory with diverted signal handlers for SIGHUP, SIGTERM and SIGUSR1:

user $ls -l .s6-svscan
total 16
-rwxr-xr-x 1 user user 53 Jul 19 12:00 crash
-rwxr-xr-x 1 user user 72 Jul 19 12:00 finish
-rwxr-xr-x 1 user user 51 Jul 19 12:00 SIGHUP
-rwxr-xr-x 1 user user 52 Jul 19 12:00 SIGTERM
-rwxr-xr-x 1 user user 52 Jul 19 12:00 SIGUSR1
FILE .s6-svscan/SIGHUP
#!/bin/execlineb -P
echo s6-svscan received SIGHUP
FILE .s6-svscan/SIGTERM
#!/bin/execlineb -P
echo s6-svscan received SIGTERM
FILE .s6-svscan/SIGUSR1
#!/bin/execlineb -P
echo s6-svscan received SIGUSR1

Output of ps showing s6-svscan's process ID and arguments:

user $ps -o pid,args
 PID COMMAND
...
2047 s6-svscan -s
...

Messages printed to s6-svscan's standard output as a result of sending signals with the kill utility:

user $kill 2047
s6-svscan received SIGTERM
user $kill -HUP 2047
s6-svscan received SIGHUP
user $kill -USR1 2047
s6-svscan received SIGUSR1

Starting the supervision tree

From OpenRC

As of version 0.16, OpenRC provides a service script that can launch s6-svscan, also named s6-svscan. On Gentoo, the scan directory will be /run/openrc/s6-scan. This script exists to support the OpenRC-s6 integration feature, but can be used to just launch an s6 supervision tree when the machine boots by adding it to an OpenRC runlevel:

root #rc-update add s6-svscan default

Or it can also be started manually:

root #rc-service s6-svscan start
Note
The service script launches s6-svscan using OpenRC's start-stop-daemon program, so it will run unsupervised, and have its standard input, output and error redirected to /dev/null.

Because /run is a tmpfs, and therefore volatile, servicedir symlinks must be created in the scan directory each time the machine boots, before s6-svscan starts. The tmpfiles.d interface, which is supported by OpenRC using package opentmpfiles (sys-apps/opentmpfiles), can be used for this:

FILE /etc/tmpfiles.d/s6-svscan.conf
#Type Path Mode UID GID Age Argument
d /run/openrc/s6-scan
L /run/openrc/s6-scan/service1 - - - - /path/to/servicedir1
L /run/openrc/s6-scan/service2 - - - - /path/to/servicedir2
L /run/openrc/s6-scan/service3 - - - - /path/to/servicedir3

As an alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's 'default' runlevel, by placing '.start' and '.stop' files in /etc/local.d (please read /etc/local.d/README for more details) that perform actions similar to those of the s6-svscan service script:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscan -- -S /path/to/scandir
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

The -S option will explicitly disable signal diversion so that the SIGTERM signal that start-stop-daemon sends to s6-svscan will make it act as if an s6-svscanctl -rt command had been used.

As another alternative, OpenRC's local service could be used to start the supervision tree when entering OpenRC's 'default' runlevel, with /service as the scan directory, using a '.start' file that calls the s6-svscanboot script provided as an example (see starting the supervision tree from sysvinit), instead of s6-svscan directly. This allows setting up a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, provided a service directory for the logger exists in /service:

FILE /etc/local.d/s6-svscan.start
#!/bin/execlineb -P
# Remember to add --user if you don't want to run as root
# Remember to symlink /command to /bin
start-stop-daemon --start --background --make-pidfile
   --pidfile /run/s6-svscan.pid
   --exec /bin/s6-svscanboot
FILE /etc/local.d/s6-svscan.stop
#!/bin/execlineb -P
start-stop-daemon --stop --retry 5 --pidfile /run/s6-svscan.pid

From sysvinit

The s6 package provides a script called s6-svscanboot that can be launched and supervised by sysvinit (sys-apps/sysvinit) by adding a 'respawn' line for it in /etc/inittab[3]. It is an execline script that launches an s6-svscan process, with its standard output and error redirected to /service/s6-svscan-log/fifo. This allows setting up a FIFO and a logger program to log messages sent by supervision tree processes to s6-svscan's standard output and error, with the same technique used by s6 and s6-rc-based init systems. s6-svscan's standard input will be redirected to /dev/null. The environment will be emptied and then set according to the contents of environment directory /service/.s6-svscan/env, if it exists, with an s6-envdir invocation. The scan directory will be /service.

s6-svscanboot is provided as an example; it is the examples/s6-svscanboot file in the package's /usr/share/doc subdirectory. Users that want this setup will need to copy (and possibly uncompress) the script to /bin, manually edit /etc/inittab, and then call telinit:

FILE /etc/inittab
SV:12345:respawn:/bin/s6-svscanboot
root #telinit q

This will make sysvinit launch and supervise s6-svscan when entering runlevels 1 to 5. Because the s6 and execline programs that the script invokes using absolute pathnames are assumed to be in directory /command, a symlink to the correct path for Gentoo must be created:

root #ln -s bin /command

An s6 service directory for the s6-svscan logger can be created with the s6-linux-init-maker program from package s6-linux-init (sys-apps/s6-linux-init):

root #s6-envuidgid user s6-linux-init-maker -l /service -U temp
root #cp -a temp/run-image/{service/s6-svscan-log,uncaught-logs} /service

The logger will be an s6-log process that logs to directory /service/uncaught-logs, prepending messages with a timestamp in external TAI64N format. Username user should be replaced by a valid account's username, to allow s6-log to run as an unprivileged process, and temp will be a temporary directory created by s6-linux-init-maker on the working directory, that can be removed once the necessary subdirectories are copied to /service.

The logging chain

A supervision tree where all leaf processes have a logger can be arranged into what the software package's author calls the logging chain[4], which he considers to be technically superior to the traditional syslog-based centralized approach[5].

Since processes in a supervision tree are created using the POSIX fork() call, each of them will inherit s6-svscan's standard input, output and error. A logging chain arrangement is as follows:

  • Leaf processes should normally have a logger, so their standard output and error connect to their logger's standard input. Therefore, all their messages are collected and stored in dedicated, per-service logs by their logger. Some programs might need to be invoked with special options to make them send messages to their standard error, and redirection of standard error to standard output (i.e. 2>&1 in a shell script or fdmove -c 2 1 in an execline script) must be performed in the servicedir's run file.
  • Leaf processes with a controlling terminal are an exception: their standard input, output and error connect to the terminal.
  • s6-supervise, the loggers, and leaf processes that exceptionally don't have a logger for some reason, inherit their standard input, output and error from s6-svscan, so their messages are sent wherever the ones from s6-svscan are.
  • Leaf processes that still unavoidably report their messages using syslog() have them collected and logged by a (possibly supervised) syslog server.

s6 and s6-rc-based init systems are arranged in such a way that s6-svscan's messages are collected by a catch-all logger, and that logger's standard error is redirected to /dev/console.

The notification framework

Notification is a mechanism by which a process can become instantly aware that a certain event has happened, as opposed to the process actively and periodically checking whether it happened (which is called polling)[6]. The s6 package provides a general notification framework that doesn't rely on a long-lived process (e.g. a bus daemon), so that it can be integrated with its supervision suite. The notification framework is based instead on FIFO directories.

FIFO directories and related tools

A FIFO directory (or fifodir) is a directory in the filesystem associated with a notifier, a process in charge of notifying other processes about some set of events. As the name implies, the directory contains FIFOs, each of them associated with a listener, a process that wants to be notified about one or more events. A listener creates a FIFO in the fifodir and opens it for reading; this is called subscribing to the fifodir. When a certain event happens, the notifier writes to each FIFO in the fifodir. Written data is conventionally a single character encoding the identity of the event. Listeners wait for notifications using some blocking I/O call on the FIFO; unblocking and successfully reading data from it is their notification. A listener that no longer wants to receive notifications removes its FIFO from the fifodir; this is called unsubscribing.

FIFOs and FIFO directories need a special ownership and permission setup to work. The owner of a fifodir must be the notifier's effective user. A publicly accessible fifodir can be subscribed to by any user, and its permissions must be 1733 (i.e. the fifodir shows up as 'drwx-wx-wt' in the output of ls -l). A restricted fifodir can be subscribed to only by members of the fifodir's group, and its permissions must be 3730 (i.e. the fifodir shows up as 'drwx-ws--T' in the output of ls -l). The owner of a FIFO in the fifodir must be the corresponding listener's effective user, and its permissions must be 0622 (i.e. the FIFO shows up as 'prw--w--w-' in the output of ls -l). Complete information about the FIFO directory internals is available here.

s6 provides an s6-mkfifodir program that creates a FIFO directory with correct ownership and permissions. It accepts the pathname of the fifodir. A restricted fifodir is created by specifying the -g option followed by a numeric group ID, which s6-mkfifodir's effective user must be a member of. s6-mkfifodir without a -g option creates a publicly accessible fifodir. A fifodir can be removed with an rm -r command. There is also an s6-cleanfifodir program that accepts the pathname of a fifodir and removes all FIFOs in it that don't have an active listener. Its effective user must be a member of the fifodir's group. In the normal case FIFOs are removed when the corresponding listener unsubscribes, so s6-cleanfifodir is a cleanup tool for cases when this fails (e.g. the listener was killed by a signal). For further information about s6-mkfifodir and s6-cleanfifodir please consult the HTML documentation in the package's /usr/share/doc subdirectory.

The s6-ftrig-notify program allows notifying all subscribers of a fifodir, so it can be used to create a notifier program. It accepts the pathname of a fifodir and a message that is written as-is to all FIFOs in the fifodir. Each character in the message is assumed to encode an event, and the character sequence should reflect the events sequence. The s6-ftrig-wait program allows subscription to a fifodir and waiting for a notification, so it can be used to create a listener program. It accepts the pathname of a fifodir and a POSIX extended regular expression (like those of the grep -E command), creates a FIFO in the fifodir with correct ownership and permissions, and waits until it reads a sequence of characters that match the regular expression. Then it unsubscribes from the fifodir by removing the FIFO, prints the last character read from it to its standard output, and exits. For further information about s6-ftrig-notify and s6-ftrig-wait please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Because performing an action that might trigger an event recognized by a notifier, and then subscribing to its fifodir to be notified of the event, is susceptible to races that might lead to missing the notification, s6 provides two additional programs, s6-ftrig-listen and s6-ftrig-listen1. s6-ftrig-listen is a program that accepts options, a set of fifodir pathname and extended regular expression pairs, a program name and its arguments. It subscribes to each specified fifodir, runs the program as a child process with the supplied arguments, and waits for notifications. It makes sure that the program is executed only after there are listeners reading from their FIFOs.

s6-ftrig-listen expects its arguments to be in the format execline's execlineb program generates when parsing the block syntax, so the forward compatible way to use it is in an execline script or execlineb -c command: the invocation can be written using the syntax s6-ftrig-listen { f1 re1 f2 re2 ... } prog args, where f1, f2, ... are the fifodir pathnames, re1, re2, ... are the regular expressions corresponding to f1, f2, ..., respectively, prog is the program name and args, the program's arguments. If s6-ftrig-listen is invoked with an -o option (or), it will unsubscribe from all fifodirs and exit when it reads a matching sequence of characters from any of the created FIFOs. If s6-ftrig-listen is invoked without an -o option, or with an explicit -a option (and), it will wait until it reads a matching sequence from every FIFO. The s6-ftrig-listen1 program is a single fifodir and regular expression version of s6-ftrig-listen that doesn't need execlineb-encoded arguments, and that prints the last character read from the created FIFO to its standard output. For further information about s6-ftrig-listen and s6-ftrig-listen1 please consult the HTML documentation in the package's /usr/share/doc subdirectory.

A timeout can be set for s6-ftrig-wait, s6-ftrig-listen and s6-ftrig-listen1 by specifying a -t option followed by a time value in milliseconds. The programs exit with an error status if they haven't been notified about the desired events after the specified time.

The fifodir and notification management code are implemented in the s6 package's library, libs6, and an internal helper program, s6-ftrigrd. The library exposes a public C language API that can be used by programs; for details about the API for notifiers see here, and for details about the API for listeners see here. s6-ftrigrd is launched by the library code.

Creating a publicly accessible fifodir named fifodir1 and a fifodir restricted to members of group user (assumed to have group ID 1000) named fifodir2:

user $s6-mkfifodir fifodir1
user $s6-mkfifodir -g 1000 fifodir2
user $ls -ld fifodir*
drwx-wx-wt 2 user user 4096 Aug  2 12:00 fifodir1
drwx-ws--T 2 user user 4096 Aug  2 12:00 fifodir2

Creating listeners that subscribe to fifodir1 and wait for event sequences 'message1' and 'message2', respectively, as background processes:

user $s6-ftrig-wait fifodir1 message1 &
user $s6-ftrig-wait -t 20000 fifodir1 message2 &
user $ls -l fifodir1
total 0
prw--w--w- 1 user user 0 Aug  2 21:44 ftrig1:@40000000598272220ea9fa39:-KnFNSkhmW1pQPY0
prw--w--w- 1 user user 0 Aug  2 21:46 ftrig1:@400000005982728b3a8d09c2:_UjWhNPn3Z0Q_VFQ

This shows that a FIFO has been created in the fifodir for each s6-ftrig-wait process, with names starting with 'ftrig1:'.

user $ps f -o pid,ppid,args
 PID  PPID COMMAND
...
2026  2023 \_ bash
2043  2026     \_ s6-ftrig-wait fifodir1 message1
2044  2043     |   \_ s6-ftrigrd
2051  2026     \_ s6-ftrig-wait -t 20000 fifodir1 message2
2052  2051         \_ s6-ftrigrd
...
s6-ftrig-wait: fatal: unable to match regexp on message2: Connection timed out

The output of ps shows that each s6-ftrig-wait process has spawned a child s6-ftrigrd helper, and because the one waiting for event sequence 'message2' has a timeout of 20 seconds ("-t 20000"), after that time has elapsed without getting the expected notifications it unsubscribes, and exits with an error status that is printed on the shell's terminal ("Connection timed out").

user $ls -l fifodir1
total 0
prw--w--w- 1 user user 0 Aug  2 21:44 ftrig1:@40000000598272220ea9fa39:-KnFNSkhmW1pQPY0

This shows that the s6-ftrig-wait process without a timeout is still running, and its FIFO is still there. Notifying all fifodir1 listeners about event sequence 'message1':

user $s6-ftrig-notify fifodir1 message1
1

The '1' printed on the shell's terminal after the s6-ftrig-notify invocation is the last event the s6-ftrig-wait process was notified about (i.e. the last character in string 'message1'), which then exits because the notifications have matched its regular expression.

user $ls -l fifodir1
total 0

This shows that since all listeners have unsubscribed, the fifodir is empty.

FILE test-script (Example execline script for testing s6-ftrig-listen)
#!/bin/execlineb -P
foreground {
   s6-ftrig-listen -o { fifodir1 message fifodir2 message }
   foreground { ls -l fifodir1 fifodir2 }
   foreground { ps f -o pid,ppid,args }
   s6-ftrig-notify fifodir1 message
}
echo s6-ftrig-listen exited

Executing the example script:

user $./test-script
fifodir1:
total 0
prw--w--w- 1 user user 0 Aug  2 22:28 ftrig1:@4000000059827c60124f916d:51Xhg7STswW-yFst

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:28 ftrig1:@4000000059827c601250c752:oXikN3Vko3JipuvU
 PID  PPID COMMAND
...
2176  2026 \_ foreground  s6-ftrig-listen ...
2177  2176     \_ s6-ftrig-listen -o  fifodir1 ...
2178  2177         \_ s6-ftrigrd
2179  2177         \_ foreground  ps ...
2181  2179             \_ ps f -o pid,ppid,args
...
s6-ftrig-listen exited

The output of ls shows that two listeners were created, one subscribed to fifodir1 and the other to fifodir2, and the output of ps shows that both are implemented by a single s6-ftrigrd process that is a child of s6-ftrig-listen. It also shows that s6-ftrig-listen has another child process, executing (at that time) the execline foreground program, which in turn has spawned the ps process. After that, foreground replaces itself with s6-ftrig-notify, which notifies all fifodir1 listeners about event sequence 'message'. Because s6-ftrig-listen was invoked with an -o option, and the fifodir1 listener got notifications that match its regular expression, s6-ftrig-listen exits at that point ("s6-ftrig-listen exited").

user $ls fifodir*
fifodir1:
total 0

fifodir2:
total 0

This shows that the listener subscribed to fifodir2 has unsubscribed and exited, even though it didn't get the expected notifications.

Modifying the test script to invoke s6-ftrig-listen with the -a option instead (i.e. as s6-ftrig-listen -a { fifodir1 message fifodir2 message }) and reexecuting it in the background:

user $./test-script &
fifodir1:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e4210384d5:wikPBCD-Aw5Erijp

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e42104bc57:Yop6JbMNBJo1r-uI
 PID  PPID COMMAND
...

The output of the script does not include an "s6-ftrig-listen exited" message, so it is still running:

user $ls -l fifodir*
fifodir1:
total 0

fifodir2:
total 0
prw--w--w- 1 user user 0 Aug  2 22:56 ftrig1:@40000000598282e42104bc57:Yop6JbMNBJo1r-uI

This confirms that the listener subscribed to fifodir2 is still running, waiting for events. Notifying all fifodir2 listeners about event sequence 'message':

user $s6-ftrig-notify fifodir2 message
s6-ftrig-listen exited

This shows that once the remaining listener has gotten notifications that match its regular expression, s6-ftrig-listen exits.

The process supervision suite's use of notification

The event subdirectory of an s6 service directory is a fifodir used by s6-supervise to notify interested listeners about its supervised process' state changes. That is, s6-supervise acts as the notifier associated with the event fifodir, and writes a single character to each FIFO in it when there is a state change:

  • At program startup, after creating event if it doesn't exist, s6-supervise writes an s character (start event).
  • Each time s6-supervise spawns a child process executing the run file, it writes a u character (up event).
  • If the supervised process supports readiness notification, s6-supervise writes a U character (up and ready event) when the child process notifies its readiness.
  • If the service directory contains a finish file that, when executed, exits with exit code 125 (permanent failure), s6-supervise writes an O character (once event; the character is a capital 'o').
  • Each time the supervised process stops running, s6-supervise writes a d character (down event).
  • If the service directory contains a finish file, s6-supervise writes a D character (really down event) each time finish exits or is killed. Otherwise, s6-supervise writes the character right after the down event notification.
  • When s6-supervise is about to exit normally, it writes an x character (exit event) after the supervised process stops and it has notified listeners about the really down event.

s6 provides an s6-svwait program, which is a process supervision-specific notification tool. It accepts service directory pathnames and options that specify an event to wait for. At program startup, for each specified servicedir it checks the status file in its supervise control subdirectory to see if the corresponding supervised process is already in the state implied by the specified event, and if not, it subscribes to the event fifodir and waits for notifications from the corresponding s6-supervise process. A -u option specifies an up event; a -U option, an up and ready event; a -d option, a down event; and a -D option, a really down event. Options -a and -o work as for s6-ftrig-listen.

There is also an s6-svlisten program, which is a process supervision-specific version of s6-ftrig-listen. It accepts servicedir pathnames in the format execline's execlineb program generates when parsing the block syntax, a program name and its arguments, and options that specify an event to wait for. Therefore, the forward compatible way to use it is in an execline script or execlineb -c command: the invocation can be written using the syntax s6-svlisten { s1 s2 ... } prog args, where s1, s2, ... are the servicedir pathnames, prog is the program name and args are the program's arguments. Options -u, -U, -d and -D work as for s6-svwait. Options -a and -o work as for s6-ftrig-listen. s6-svlisten also accepts an -r option (restart event) that makes it wait for a down event followed by an up event, and a -R option (restart and ready event) that makes it wait for a down event followed by an up and ready event. The s6-svlisten1 program is a single servicedir version of s6-svlisten that doesn't need execlineb-encoded arguments.

s6-svwait, s6-svlisten and s6-svlisten1 accept a -t option to specify a timeout in the same way as s6-ftrig-wait. For further information about these programs please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Finally, the s6-svc program accepts a -w option that makes it wait for notifications from the s6-supervise process corresponding to the service directory specified as argument, after asking it to perform an action on its child process. An s6-svc -wu, s6-svc -wU, s6-svc -wd, s6-svc -wD, s6-svc -wr or s6-svc -wR command is equivalent to an s6-svlisten1 -u, s6-svlisten1 -U, s6-svlisten1 -d, s6-svlisten1 -D, s6-svlisten1 -r or s6-svlisten1 -R command, respectively, specifying the same servicedir, and s6-svc with the same arguments except the -w option as the spawned program. s6-svc also accepts a timeout specified with a -T option, that is translated to the s6-svlisten1 -t option.

See the service readiness notification section for usage examples.

Service readiness notification

When a process is supervised, it transitions to the 'up' state when its supervisor has successfully spawned a child process executing the run file. s6-supervise considers this an up event, and notifies all listeners subscribed to the corresponding event fifodir about it. But when the supervised process is executing a server program for example, it might not be ready to provide its service immediately after startup. Programs might do initialization work that could take some noticeable time before they are actually ready to serve, but it is impossible for the supervisor to know exactly how much. Because of this, and because the kind of initialization to do is program-specific, some sort of collaboration from the supervised process is needed to help the supervisor know when it is ready[7]. This is called readiness notification.

systemd has the concept of readiness notification, called start-up completion notification in its documentation. To support readiness notification under systemd, a program implements the $NOTIFY_SOCKET protocol, based on message passing over a datagram mode UNIX domain socket, bound to a pathname specified as the value of the NOTIFY_SOCKET environment variable. The protocol is implemented by libsystemd's sd_..._notify...() family of functions, but it is also covered by systemd's interface stability promise, so it is possible to have alternative implementations of it. The program can perform start-up completion notification by linking to libsystemd and calling one of those functions. systemd uses start-up completion notification when a service unit file contains a 'Type=notify' directive.

To support readiness notification under s6, a program implements the s6 readiness notification protocol, which works like this:

  1. At program startup, the program expects to have a file descriptor open for writing, associated with a notification channel. The program chooses the file descriptor. For example, it can be specified as a program argument, or be a fixed, program-specific well-known number specified in the program's documentation.
  2. When all initialization work necessary to reach the program's definition of 'service ready state' has been completed, it writes a newline character to the notification channel.
  3. The program closes the notification channel after writing to it.

Therefore, a typical code snippet in the C language that implements the last two steps would be as follows:

CODE
/* notification_fd is an int object storing the notification channel's file descriptor */
write(notification_fd, "\n", 1);
close(notification_fd);

The code only relies on POSIX calls, so the program doesn't need to link to any specific library other than the libc to implement the readiness protocol. s6 uses readiness notification when a regular file named notification-fd is present in a service directory, containing an integer that specifies the program's chosen notification channel file descriptor. s6-supervise implements the notification channel as a pipe between the supervised process and itself; when it receives a newline character signalling the service's readiness, it considers that an up and ready event and notifies all listeners subscribed to the event fifodir about it. After that, s6-supervise no longer reads from the notification pipe, so it can be safely closed by the child process.

Example s6 scan directory containing services that support readiness notification:

user $s6-mkfifodir test-service1/event
user $ls -l *
test-service1:
total 12
-rw-r--r-- 1 user user    0 Jul 30 12:00 down
drwx-wx-wt 2 user user 4096 Jul 30 12:00 event
-rwxr-xr-x 1 user user   29 Jul 30 12:00 finish
-rwxr-xr-x 1 user user   32 Jul 30 12:00 run

test-service2:
total 8
-rw-r--r-- 1 user user  0 Jul 30 12:00 down
-rw-r--r-- 1 user user  2 Jul 30 12:00 notification-fd
-rwxr-xr-x 1 user user 39 Jul 30 12:00 run

test-service3:
total 16
-rw-r--r-- 1 user user  0 Jul 30 12:00 down
-rwxr-xr-x 1 user user 29 Jul 30 12:00 finish
-rw-r--r-- 1 user user  2 Jul 30 12:00 notification-fd
-rwxr-xr-x 1 user user 39 Jul 30 12:00 run
-rw-r--r-- 1 user user  6 Jul 30 12:00 timeout-finish
FILE test-service1/run
#!/bin/execlineb -P
test-daemon
FILE test-service1/finish
#!/bin/execlineb -P
exit 125
FILE test-service2/run
#!/bin/execlineb -P
test-daemon --s6=5
FILE test-service2/notification-fd
5
FILE test-service3/run
#!/bin/execlineb -P
test-daemon --s6=5
FILE test-service3/notification-fd
5
FILE test-service3/finish
#!/bin/execlineb -P
sleep 10
FILE test-service3/timeout-finish
20000

It is assumed that test-daemon is a program that supports an --s6 option to turn readiness notification on, specifying the notification channel's file descriptor (5), which is also stored in a notification-fd file. test-service1/finish exits with an exit code of 125, so that if the corresponding test-daemon process stops, it won't be restarted. The s6-mkfifodir invocation creates test-service1/event as a publicly accessible fifodir. Using s6-ftrig-listen1 on it to start the supervision tree and verify that s6-supervise notifies listeners about the start event:

user $s6-ftrig-listen1 test-service1/event s s6-svscan
s
user $ls -ld */event
drwx-wx-wt 2 user user 4096 Jul 30 12:22 test-service1/event
drwx-ws--T 2 user user 4096 Jul 30 12:22 test-service2/event
drwx-ws--T 2 user user 4096 Jul 30 12:22 test-service3/event

This shows that s6-supervise has created all missing event directories as restricted fifodirs, but uses the publicly accessible one created by s6-mkfifodir.

FILE test-script (Example execline script for testing s6-svwait)
#!/bin/execlineb -P
foreground { s6-svwait -u test-service1 }
echo s6-svwait exited

Executing the example script:

user $../test-script &
user $ps xf -o pid,ppid,args
 PID  PPID COMMAND
...
2166  2039 \_bash
2387  2166    \_ foreground  s6-svwait ...
2388  2387        \_ s6-svwait -u test-service1
2389  2388            \_ s6-ftrigrd
...
user $ls -l test-service1/event
total 0
prw--w--w- 1 user user 0 Jul 30 12:22 ftrig1:@40000000597df9d12c8328da:v84Zc_E_LyaqxlDh

This shows that the s6-svwait process has spawned a child s6-ftrigrd helper, and created a FIFO in test-service1/event so that it can be notified about the up event. Manually starting test-service1/run:

user $s6-svc -u test-service1
s6-svwait exited

The message printed by the test script to its standard output shows that the s6-svwait process got the expected notification, so it exited.

FILE test-script (Example execline script for testing up and ready event notifications)
#!/bin/execlineb -P
define -s services "test-service2 test-service3"
foreground {
   s6-svlisten -U { $services }
   foreground {
      forx svc { $services }
         importas svc svc
         foreground { s6-svc -wu -u $svc }
         pipeline { echo s6-svc -wu -u $svc exited } s6-tai64n
   }
   ps xf -o pid,ppid,args
}
pipeline { echo s6-svlisten -U exited } s6-tai64n

The script calls s6-svlisten to subscribe to fifodirs test-service2/event and test-service3/event and wait for up and ready events. Then it uses an s6-svc -wu -u command to manually start test-service2/run and test-service3/run, and wait for up events. Both run scripts invoke test-daemon with readiness notification on. A message timestamped using s6-tai64n is printed to the standard output when the listeners get their expected notifications. Executing the example script:

user $../test-script | s6-tai64nlocal
2017-07-30 19:45:38.458536857 s6-svc -wu -u test-service2 exited
2017-07-30 19:45:38.467353962 s6-svc -wu -u test-service3 exited
 PID  PPID COMMAND
2379  2378 \_ foreground  s6-svlisten  -U ...
2381  2379     \_ s6-svlisten -U ...
2382  2381         \_ s6-ftrigrd
2383  2381         \_ ps xf -o pid,ppid,args
2017-07-30 19:45:48.472237201 s6-svlisten -U exited

This shows that the s6-svc processes waiting for up events are notified first, so they exit, and that the s6-svlisten process waiting for up and ready events is notified 10 seconds later. The output of ps shows that when the s6-svc processes exited, the s6-svlisten process and its s6-ftrigrd child were still running.

user $for i in *; do printf "$i: %s\n" "$(s6-svstat $i)"; done
test-service1: up (pid 2124) 42 seconds, normally down
test-service2: up (pid 2332) 29 seconds, normally down, ready 19 seconds
test-service3: up (pid 2338) 29 seconds, normally down, ready 19 seconds

This confirms that both test-daemon processes have notified readiness to their s6-supervise parent ("ready 19 seconds") 10 seconds after being started. Using s6-ftrig-listen1 on fifodir test-service1/event to verify that s6-supervise notifies listeners about a once event when test-daemon is killed with a SIGTERM, because of test-service1/finish's exit code:

user $s6-ftrig-listen1 test-service1/event O s6-svc -t test-service1
O
FILE test-scriptExample execline script for testing really down event notifications
#!/bin/execlineb -P
define -s services "test-service2 test-service3"
foreground {
   s6-svlisten -d { $services }
   forx svc { $services }
      importas svc svc
      foreground { s6-svc -wD -d $svc }
      pipeline { echo s6-svc -wD -d $svc exited } s6-tai64n
}
foreground {
   pipeline { echo s6-svlisten -d exited } s6-tai64n
}
ps xf -o pid,ppid,args

The script calls s6-svlisten to subscribe to fifodirs test-service2/event and test-service3/event and wait for down events. Then it uses s6-svc -wD -d commands to manually stop the test-daemon processes corresponding to test-service2 and test-service3, and wait for really down events. test-service3 has a finish script that sleeps for 10 seconds, so test-service2/event listeners should be notified earlier than test-service3/event listeners. A message timestamped using s6-tai64n is printed to the standard output when the listeners get their expected notifications. Executing the example script:

user $../test-script | s6-tai64nlocal
2017-07-30 22:23:17.063815232 s6-svc -wD -d test-service2 exited
2017-07-30 22:23:17.071855769 s6-svlisten -d exited
 PID  PPID COMMAND
2326     1 forx svc  test-service2  test-service3 ...
2333  2326  \_ foreground  s6-svc  -wD  ...
2334  2333      \_ s6-svlisten1 -D -- test-service3 s6-svc -d -- test-service3
2335  2334          \_ s6-ftrigrd
2017-07-30 22:23:27.078874158 s6-svc -wD -d test-service3 exited

This shows that the s6-svlisten process waiting for down events and the s6-svc process subscribed to test-service2/event and waiting for a really down event are notified first with almost no delay between them, so they exit, and that the s6-svc process subscribed to test-service3/event and waiting for a really down event is notified 10 seconds later. The output of ps shows that when the s6-svlisten process exited, an s6-svc process that had replaced itself with s6-svlisten1 (because of the -w option) and its s6-ftrigrd child were still running.

user $for i in *; do printf "$i: %s\n" "$(s6-svstat $i)"; done
test-service1: down (signal SIGTERM) 83 seconds, ready 83 seconds
test-service2: down (exitcode 0) 31 seconds, ready 31 seconds
test-service3: down (exitcode 0) 31 seconds, ready 21 seconds

This confirms that the test-daemon process corresponding to test-service1 hasn't been restarted after test-service1/finish exited (83 seconds in down state and no 'wanted up'), and that the down and ready events for the test-daemon processes corresponding to test-service2 and test-service3 have a 10-second delay between them ("ready 21 seconds" compared to "ready 31 seconds"). Using s6-ftrig-listen1 on fifodir test-service2/event to stop the supervision tree and verify that s6-supervise notifies listeners about the exit event:

user $s6-ftrig-listen1 test-service2/event x s6-svscanctl -t .
x

The UNIX domain super-server and related tools

s6 provides two programs, s6-ipcserver-socketbinder and s6-ipcserverd, that together implement a UNIX domain super-server. A UNIX domain super-server creates a listening stream mode UNIX domain socket (i.e. a SOCK_STREAM socket for address family AF_UNIX) and spawns a server program to handle each incoming connection after accepting it, so that from there on, the client communicates over the connection with the spawned server. This is a UNIX domain equivalent of a TCP/IP super-server, such as xinetd (sys-apps/xinetd) or ipsvd (net-misc/ipsvd).

More specifically, what s6-ipcserver-socketbinder and s6-ipcserverd implement together is an IPC UCSPI super-server, i.e. a super-server that adheres to the server side of Daniel J. Bernstein's UNIX Client-Server Program Interface (UCSPI) and supports the IPC UCSPI protocol[8]. The UCSPI defines an interface for client-server communications tools; UCSPI tools are executable programs that accept options, a protocol-specific address, an application name and its arguments. Tools can be either clients or servers; clients communicate with servers using a connection. If the UCSPI tool is a server, the application is invoked with the supplied arguments each time there is an incoming connection to the specified address, with file descriptor 0 opened for reading from the connection, file descriptor 1 opened for writing to the connection, and environment variables set to defined values. If the UCSPI tool is a client, a connection is made to the specified address and, if successful, the application is invoked with the supplied arguments, with file descriptor 6 opened for reading from the connection, file descriptor 7 opened for writing to the connection, and environment variables set to defined values. One of the application environment variables set by both UCSPI clients and servers is PROTO, with the name of the supported protocol as its value. The protocol implemented by the s6 programs is the IPC UCSPI protocol, for which the address specified to UCSPI tools is defined to be a UNIX domain socket pathname, and the value of PROTO is defined to be 'IPC'.

s6-ipcserver-socketbinder is a chain loading program that accepts options and a pathname. It creates a UNIX domain socket, binds it to the specified pathname and prepares it to accept connections with the POSIX listen() call. The next program in the chain will have its standard input connected to the listening socket, which will be non-blocking (O_NONBLOCK). The number of backlog connections, i.e. the number of outstanding connections in the socket's listen queue, can be set with the -b option; additional connection attempts will be rejected by the kernel. If s6-ipcserver-socketbinder is invoked with a -b 0 argument, the socket will be created but won't be listening, i.e. listen() won't be called. If it is invoked with the -m option, the created socket will be a datagram mode socket (SOCK_DGRAM), which also requires specifying a -b 0 argument. If it is invoked without the -m option, or with the -M option, the created socket will be a stream mode socket (SOCK_STREAM).

s6-ipcserverd is a program that must have its standard input redirected to a bound and listening stream mode UNIX domain socket, and accepts a program name and its arguments. For each connection made to the socket, s6-ipcserverd executes the program with the supplied arguments as a child process that has its file descriptors 0 and 1 redirected, on Gentoo, to the socket returned by a Linux accept4() call with the listening socket's file descriptor as an argument, and the following environment variables set to the appropriate values (see environment variables): PROTO, IPCREMOTEPATH, IPCREMOTEEUID, IPCREMOTEEGID and IPCCONNNUM. On Gentoo, the IPCREMOTEEUID and IPCREMOTEEGID variables are set with information obtained from a POSIX getsockopt() call with the Linux SO_PEERCRED option; this is called credentials lookup. If s6-ipcserverd is invoked with a -P option, credentials lookup will be disabled, and IPCREMOTEEUID and IPCREMOTEEGID will be unset. If it is invoked without a -P option, or with a -p option, credentials lookup will be enabled. s6-ipcserverd supports the s6 readiness protocol, and if it is invoked with a -1 option, it will turn readiness notification on, with file descriptor 1 (i.e. its standard output) as the notification channel's file descriptor. If it receives a SIGTERM signal, it will exit, but its children will continue running. If it receives a SIGQUIT signal, it will send its children a SIGTERM signal followed by a SIGCONT signal and then exit, and if it receives a SIGABRT signal, it will send its children a SIGKILL signal and then exit. It is possible to make s6-ipcserverd kill its children without exiting (with a SIGTERM signal followed by a SIGCONT signal) by sending it a SIGHUP signal.

s6-ipcserver is a helper program that accepts options, a UNIX domain socket pathname, a program name and its arguments, and invokes s6-ipcserver-socketbinder chained to s6-ipcserverd, or s6-ipcserver-socketbinder chained to s6-applyuidgid, chained to s6-ipcserverd, depending on the options. The socket pathname is passed to s6-ipcserver-socketbinder, and the program name and its arguments, to s6-ipcserverd. s6-ipcserver options specify corresponding s6-ipcserver-socketbinder, s6-applyuidgid and s6-ipcserverd options. The created socket is a stream mode socket. For further information about s6-ipcserver, s6-ipcserver-socketbinder or s6-ipcserverd please consult the HTML documentation in the package's /usr/share/doc subdirectory.

s6 also provides an IPC UCSPI client, s6-ipcclient, and an access control tool for UNIX domain sockets, s6-ipcserver-access. s6-ipcclient is a chain loading program that accepts options and a UNIX domain socket pathname. It creates a stream mode socket, makes a connection to the socket specified by the supplied pathname, and executes the next program in the chain with file descriptors 6 and 7 redirected to the local connected socket, and the environment variables PROTO and IPCLOCALPATH set to the appropriate values (see environment variables). If s6-ipcclient is invoked with a -p pathname argument, it will bind the created socket to pathname (using the POSIX bind() call) before initiating the connection to the remote socket. If it is invoked with a -l value argument, it will set IPCLOCALPATH to value instead of setting it with information obtained from a POSIX getsockname() call. s6-ipcserver-access is a chain loading program that must be spawned by a UCSPI server (like s6-ipcserverd) that appropriately sets the PROTO, ${PROTO}REMOTEEUID and ${PROTO}REMOTEEGID environment variables, where ${PROTO} is the value of the PROTO variable. It decides whether or not to execute the next program in the chain, or to execute a completely different program instead, based on either a rules directory or a rules file. The -i option specifies the pathname of a rules directory and the -x option specifies the pathname of a rules file. A rules file is a constant database (CDB) file created from a rules directory using the s6-accessrules-cdb-from-fs program.

When a rules directory R is specified to s6-ipcserver-access:

  • On program invocation, s6-ipcserver-access searches the R/uid directory for a subdirectory with a name that matches the value of the ${PROTO}REMOTEEUID variable, which must be a numeric user ID. If it finds a matching subdirectory, and it contains a file named allow, s6-ipcserver-access executes the next program in the chain. If the subdirectory does not contain a file named allow, but contains a file named deny, or does not contain any of those files, it exits with exit code 1, and the next program in the chain is not executed.
  • If R/uid does not contain a subdirectory with a matching name, s6-ipcserver-access will then search the R/gid directory for a subdirectory with a name that matches the value of the ${PROTO}REMOTEEGID variable, which must be a numeric group ID. If it finds a matching subdirectory, it follows the procedure described for the R/uid directory.
  • If R/gid does not contain a subdirectory with a matching name, s6-ipcserver-access will finally search for an R/uid/default directory. If the directory exists, it follows the procedure described for the R/uid directory. If it doesn't, it exits with exit code 1, and the next program in the chain is not executed.
  • If the next program in the chain is executed, and if the matching directory M contains a subdirectory named env, s6-ipcserver-access will modify its environment as if an s6-envdir M/env command had been used.
  • If the next program in the chain is executed, s6-ipcserver-access will also set the IPCLOCALPATH environment variable (see environment variables), unless it was invoked with an -E option. If s6-ipcserver-access is invoked without a -E option, or with a -e option, IPCLOCALPATH will be set.
  • If the next program in the chain is executed, and s6-ipcserver-access was invoked with an -E option, it will also unset environment variables PROTO, ${PROTO}REMOTEPATH, ${PROTO}REMOTEEUID, ${PROTO}REMOTEEGID and ${PROTO}CONNNUM.
  • If the next program in the chain would otherwise be executed, but the matching directory M contains a regular file named exec, s6-ipcserver-access will execute a different program instead, as if an execlineb -c contents command had been used, where contents is the contents of the M/exec file (e.g. the name of a program that can be found via PATH search plus its arguments).

If s6-ipcserver-access is invoked with neither the -i option nor the -x option, it will execute the next program in the chain, i.e. it will unconditionally grant access.

A rules directory can be re-created from a rules CDB file using the s6-accessrules-fs-from-cdb program. For the full description of s6-ipcclient's, s6-accessrules-cdb-from-fs's, s6-accessrules-fs-from-cdb's and s6-ipcserver-access's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Finally, s6 also provides the s6-connlimit program, a chain loading program that limits connections from the same client to an UCSPI server based on the PROTO, ${PROTO}CONNNUM, ${PROTO}CONNMAX environment variables (see environment variables), where ${PROTO} is the value of the PROTO variable, and s6-ioconnect, a program that performs data transmission from file descriptor 0 to file descriptor 7, and from file descriptor 6 to file descriptor 1, all of them assumed to be open at program startup. That is, s6-ioconnect performs full-duplex data transmission. For the full description of s6-connlimit's and s6-ioconnect's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

FILE test-server.cxxExample C++ application to be executed by an IPC UCSPI server
#include <cerrno>
#include <cinttypes>
#include <cstring>
#include <iostream>
#include <pwd.h>
#include <sstream>
#include <string>
#include <unistd.h>

const int ucspi_server_read = 0;
const int ucspi_server_write = 1;

extern "C" void ignore_sigpipe();

void read_from_socket(char *buffer, const size_t buffer_size) {
   size_t n = 0;
   while (ssize_t r = read(ucspi_server_read, buffer + n, buffer_size - n)) {
      if (r < 0) throw errno;
      n += r;
      if (n == buffer_size) break;
   }
   buffer[n] = 0;
}

void write_to_socket() {
   std::ostringstream out;
   const char *env_uid = getenv("IPCREMOTEEUID");
   const passwd *acct = env_uid? getpwuid(static_cast<uid_t>(std::strtoimax(env_uid, 0, 10))): nullptr;
   out << "Server process created with PID " << getpid() <<
      ", client is \"" << (acct? acct->pw_name: "&lt;unavailable&gt;") << "\"\n";
   if (write(ucspi_server_write, out.str().data(), out.str().length()) < 0) throw errno;
}

int main(){
try {
   ignore_sigpipe();
   char buffer[] = "Hello!";
   const std::string greeting(buffer);
   read_from_socket(buffer, greeting.length());
   if (greeting == buffer) write_to_socket();
   sleep(10);
   return 0;
}
catch (int err) {
   std::cerr << "test-server: fatal: " << std::strerror(err) << '\n';
   return 1;
}
catch (...) {
   return 1;
}
}

The application reads from the open file descriptor supplied by the UCSPI server, expecting to receive a 'Hello!' message from the client, and if it does, it sends a response that contains the application's process ID and the account database username corresponding to the client's user ID, supplied by the UCSPI server via the IPCREMOTEEUID environment variable. The application then waits for 10 seconds, and finally exits. Because a write() call on a socket file descriptor can fail with a 'broken pipe' error (errno == EPIPE), and because the kernel will send a SIGPIPE signal to the process if it does, an external ignore_sigpipe() function, assumed to be available to the program, is called to make the process ignore the signal using the POSIX sigaction() call.

FILE test-client.cxxExample C++ application to be executed by an IPC UCSPI client
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>
#include <unistd.h>

const int ucspi_client_read = 6;
const int ucspi_client_write = 7;

extern "C" void ignore_sigpipe();

void read_from_socket(std::string &in) {
   const int buffer_size(100);
   char buffer[buffer_size];
   while (ssize_t r = read(ucspi_client_read, buffer, buffer_size)) {
      if (r < 0) throw errno;
      char *p = static_cast<char *>(std::memchr(buffer, '\n', r));  // search only the bytes actually read
      in.append(buffer, p? p + 1: buffer + r);
      if (p) break;
   }
}

inline void write_to_socket(const char *greeting) {
   if (write(ucspi_client_write, greeting, std::strlen(greeting)) < 0) throw errno;
}

int main() {
try {
   ignore_sigpipe();
   std::cout << "Connecting to server...\n";
   std::cout.flush();
   const char greeting[] = "Hello!";
   write_to_socket(greeting);
   std::string in;
   read_from_socket(in);
   if (!in.empty()) std::cout << in;
   return 0;
}
catch (int err) {
   std::cerr << "test-client: fatal: " << std::strerror(err) << '\n';
   return 1;
}
catch (...) {
   return 1;
}
}

The application prints 'Connecting to server...' to its standard output, sends a 'Hello!' message to the server using the open file descriptor supplied by the UCSPI client, and waits for a reply, which is printed to its standard output. The application then exits.

Starting the UCSPI super-server:

user1 $s6-ipcserver test-socket ./test-server &
user1 $ls -l test-socket
srwxrwxrwx 1 user1 user1 0 Aug  4 12:00 test-socket

This shows that a socket named test-socket has been created in the current working directory. Starting a UCSPI client to connect to the socket three times:

user1 $s6-ipcclient test-socket ./test-client
Connecting to server...
Server process created with PID 1992, client is "user1"
user1 $s6-ipcclient test-socket ./test-client
Connecting to server...
Server process created with PID 1994, client is "user1"
user1 $s6-ipcclient test-socket ./test-client
Connecting to server...
Server process created with PID 1996, client is "user1"
user1 $ps f -o pid,ppid,args
 PID  PPID COMMAND
...
1977  1974 \_ bash
1985  1977     \_ s6-ipcserverd -- ./test-server
1992  1985         \_ ./test-server
1994  1985         \_ ./test-server
1996  1985         \_ ./test-server
...

This shows that the super-server spawned a test-server process to handle each of the three connections, and set the IPCREMOTEEUID environment variable to user1's user ID. s6-ipcserver test-socket ./test-server is equivalent to s6-ipcserver-socketbinder test-socket s6-ipcserverd ./test-server, but shorter. Starting a UCSPI client with effective user user2:

user2 $s6-ipcclient test-socket ./test-client
Connecting to server...
Server process created with PID 2009, client is "user2"

This shows that the super-server set the IPCREMOTEEUID environment variable to user2's user ID. Starting the super-server with the -P option, and a client to connect to test-socket:

user1 $s6-ipcserver -P test-socket ./test-server
user1 $s6-ipcclient test-socket ./test-client
Connecting to server...
Server process created with PID 2021, client is "<unavailable>"

s6-ipcserver -P test-socket ./test-server is equivalent to s6-ipcserver-socketbinder test-socket s6-ipcserverd -P ./test-server, but shorter. This shows that since credentials lookup was disabled, the IPCREMOTEEUID environment variable is unset, and test-server displays '<unavailable>' in place of a username.

See the suidless privilege gain section for s6-ipcserver-access usage examples.

Suidless privilege gain tools

s6 provides two programs, s6-sudoc and s6-sudod, that can be used to implement controlled privilege gains without setuid programs. This is achieved by having s6-sudod run as a long-lived process with an effective user that has the required privileges, bound to a stream mode UNIX domain socket, and having s6-sudoc, which can run with an unprivileged effective user, ask the s6-sudod process over a connection to its socket to perform an action on its behalf.

s6-sudod is a program that must be spawned by a UCSPI server (like s6-ipcserverd) and accepts options and an argument sequence s1, s2, ... that can be empty. s6-sudoc is a program that must be spawned by a UCSPI client and accepts options and an argument sequence c1, c2, ... that can also be empty. s6-sudoc transmits the argument sequence over the connection to the server, which must be an s6-sudod process, along with its environment variables, unless it is invoked with an -e option. s6-sudod concatenates its argument sequence with the one received from the client, and passes it to a POSIX execve() call, which results in a program invocation. s6-sudoc also transmits its standard input, output and error file descriptors to s6-sudod using SCM_RIGHTS control messages (i.e. fd-passing, see the file descriptor holder and related tools), so that the invoked program will run as a child process of s6-sudod, with s6-sudod's effective user, but its standard input, output and error descriptors will be a copy of s6-sudoc's. The program's environment will be s6-sudod's environment, except that every variable that is defined but has an empty value will be set to the value it has in s6-sudoc's environment, if it is also set there. s6-sudoc waits until s6-sudod's child process exits. If it is invoked with a -T option followed by a time value in milliseconds, it will close the connection and exit after the specified time has passed if s6-sudod's child is still running.

s6-sudo is a helper program that accepts options, a UNIX domain socket pathname and an s6-sudoc argument sequence, and invokes s6-ipcclient chained to s6-sudoc. The socket pathname is passed to s6-ipcclient, and the argument sequence, to s6-sudoc. s6-sudo options specify corresponding s6-ipcclient and s6-sudoc options. For the full description of s6-sudo's, s6-sudoc's and s6-sudod's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Standard permissions settings on s6-sudod's listening socket can be used to implement some access control, and credentials passing over a UNIX domain socket also allows finer-grained control. The s6-ipcserver-access program can be used to take advantage of credentials passing.

Important
If s6-sudoc is killed, or exits while s6-sudod's child process is still running, s6-sudod will send a SIGTERM followed by a SIGCONT signal to its child, and then exit 1. However, sending a SIGTERM to the child does not guarantee that it will die, and if it keeps running, it might still read from the file descriptor that was s6-sudoc's standard input, or write to the file descriptors that were s6-sudoc's standard output or error. This is a potential security risk. Administrators should audit their server programs to make sure this does not happen. More generally, anything using signals or terminals will not be handled transparently by the s6-sudoc + s6-sudod mechanism. The mechanism was designed to allow programs to gain privileges in specific situations: short-lived, simple, noninteractive processes. It was not designed to emulate the full suid functionality and will not go out of its way to do so. Also, s6-sudoc's argument sequence may be empty. In that case, the client is in complete control of the program executed as s6-sudod's child. This setup is permitted but very dangerous, and extreme attention should be paid to access control.
FILE test-scriptExample execline script to be executed by s6-sudod
#!/bin/execlineb -S0
pipeline { id -u } withstdinas -n localuser
importas localuser localuser
importas -D unavailable IPCREMOTEEUID IPCREMOTEEUID
importas -D unset VAR1 VAR1
importas -D unset VAR2 VAR2
importas -D unset VAR3 VAR3
foreground { echo Script run with effective user ID $localuser and arguments $@ }
echo IPCREMOTEEUID=$IPCREMOTEEUID VAR1=$VAR1 VAR2=$VAR2 VAR3=$VAR3

Testing the script by executing it directly:

user1 $VAR1="s6-sudoc value" VAR2="ignored variable" ./test-script arg1 arg2
Script run with effective user ID 1000 and arguments arg1 arg2
IPCREMOTEEUID=unavailable VAR1=s6-sudoc value VAR2=ignored variable VAR3=unset

The script is executed with effective user user1 (UID 1000), IPCREMOTEEUID and VAR3 are unset, and VAR1 and VAR2 are set to the specified values.

FILE s6-sudod-wrapperExample execline script to launch an s6-sudod process with access control
s6-ipcserver run-test-script
s6-ipcserver-access -v 2 -i rules
s6-sudod ./test-script arg1 arg2

s6-ipcserver-access's -v 2 argument increments its verbosity level. Contents of rules directory rules:

user1 $ls -l rules/*/*
rules/uid/1002:
total 4
-rw-r--r-- 1 user1 user1    0 Aug  4 12:00 allow
drwxr-xr-x 2 user1 user1 4096 Aug  4 12:00 env

rules/uid/default:
total 0
-rw-r--r-- 1 user1 user1 0 Aug  4 12:00 deny
user1 $ls -1 rules/uid/1002/env
VAR1
VAR3
FILE rules/uid/1002/env/VAR3
s6-sudod value

File rules/uid/1002/env/VAR1 contains an empty line, so the corresponding environment variable will be set, but empty. Launching the s6-sudod process:

user1 $execlineb -P s6-sudod-wrapper &
user1 $ls -l run-test-script
srwxrwxrwx 1 user1 user1 0 Aug  4 12:10 run-test-script

This shows that a UNIX domain socket named run-test-script was created in the working directory. Running s6-sudo with effective user user2 (UID 1001):

user2 $VAR1="s6-sudoc value" VAR2="ignored variable" s6-sudo run-test-script arg3 arg4
s6-ipcserver-access: info: deny pid 2125 uid 1001 gid 1001: Permission denied
s6-sudoc: fatal: connect to the s6-sudod server - check that you have appropriate permissions

s6-sudo run-test-script arg3 arg4 is equivalent to s6-ipcclient run-test-script s6-sudoc arg3 arg4, but shorter. This shows that the rules directory setup denied execution of test-script to user2 (UID 1001); it only allows it to the user with UID 1002. Modifying rules:

user1 $mv rules/uid/100{2,1}
user1 $ls -1 rules/*/*
rules/uid/1001:
allow
env

rules/uid/default:
deny

Retrying s6-sudo:

user2 $VAR1="s6-sudoc value" VAR2="ignored variable" s6-sudo run-test-script arg3 arg4
s6-ipcserver-access: info: allow pid 2148 uid 1001 gid 1001
Script run with effective user ID 1000 and arguments arg1 arg2 arg3 arg4
IPCREMOTEEUID=1001 VAR1=s6-sudoc value VAR2=unset VAR3=s6-sudod value

Comparing to the output of the script when run directly by user1, this shows that test-script's arguments are the concatenation of the ones supplied to s6-sudod in script s6-sudod-wrapper, arg1 and arg2, and the ones specified in the s6-sudo invocation, arg3 and arg4. Also, test-script's environment has s6-sudod's variables: IPCREMOTEEUID, inherited from s6-ipcserverd, and VAR3, inherited from s6-ipcserver-access, which in turn sets it based on environment directory rules/uid/1001/env. Because variable VAR1 is set by s6-ipcserver-access but empty, s6-sudod sets it to the value it has in s6-sudoc's environment. And because variable VAR2 is set in s6-sudoc's environment but not in s6-sudod's, it is also unset in test-script's environment.

The file descriptor holder and related tools

The Linux kernel allows one process to send a copy of its open file descriptors to a different process. This is done by transmitting SCM_RIGHTS control messages with an array of file descriptors from one process to the other over a UNIX domain socket as ancillary data (i.e. over a socket for address family AF_UNIX, as the object pointed to by the msg_control field of a struct msghdr object) using the POSIX sendmsg() and recvmsg() calls. This works like POSIX dup2() does for a single process, and in s6's documentation, this is called fd-passing. A file descriptor holder (or fd-holder) is a process that receives file descriptors from others via fd-passing and holds them, for later retrieval either by the same process or by a different one. The fd-holder doesn't do anything with the file descriptors; it only keeps them open. For some possible uses of this feature, see later.

s6 provides a program implementing fd-holder functionality, named s6-fdholderd. It must have its standard input redirected to a bound and listening stream mode UNIX domain socket (i.e. a SOCK_STREAM socket), and accepts a set of options that control its behaviour. It has builtin access control features, and all operations requested to it must be explicitly granted to the requesting client. For this purpose, it accepts either an -i option specifying the pathname of a rules directory, or an -x option specifying the pathname of a rules file, just like s6-ipcserver-access. s6-fdholderd will exit with an error status if neither of these options is supplied. The environment specified via appropriate env subdirectories in a rules directory controls which operations supported by s6-fdholderd are allowed to which clients. s6-fdholderd ignores exec files in a rules directory.

s6-fdholderd runs until told to exit with a SIGTERM signal; after that, it can keep running for a limited time to allow connected clients to finish their ongoing operations. This is called the lame duck timeout, and it can be specified with a -T option followed by a time value in milliseconds. If s6-fdholderd is invoked with a -T 0 argument or no -T option, the lame duck timeout is infinite: after receiving the signal, it will wait until all clients disconnect before exiting. s6-fdholderd supports the s6 readiness protocol, and if it is invoked with a -1 option, it will turn readiness notification on, with file descriptor 1 (i.e. its standard output) as the notification channel's file descriptor. If s6-fdholderd was invoked with a rules file (-x) and receives a SIGHUP signal, it will re-read it. If s6-fdholderd was invoked with a rules directory (-i), changes are picked up automatically, so SIGHUP isn't needed.

s6-fdholder-daemon is a helper program that accepts options and a UNIX domain socket pathname, and invokes s6-ipcserver-socketbinder chained to s6-fdholderd, or s6-ipcserver-socketbinder chained to s6-applyuidgid, chained to s6-fdholderd, depending on the options. The socket pathname is passed to s6-ipcserver-socketbinder. s6-fdholder-daemon options specify corresponding s6-ipcserver-socketbinder, s6-applyuidgid and s6-fdholderd options. For further information about s6-fdholder-daemon or s6-fdholderd please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Operations supported by s6-fdholderd are store, retrieve, list, delete, get dump and set dump. s6 provides a client program for each fd-holder operation, that invokes, using a POSIX execve() call, s6-ipcclient chained to an operation-specific program that handles communications with s6-fdholderd, using the connection set up by s6-ipcclient. s6-fdholderd uses identifiers to refer to held file descriptors in store, retrieve and delete operations. An identifier is a character string containing 1 to 255 characters; any non-null character can be used in an identifier, however it is recommended to only use reasonable characters. For file descriptor identifiers associated with UNIX domain sockets, it is conventional to adhere to the 'unix:$path' format, where $path is the socket's absolute pathname, and for file descriptor identifiers associated with TCP or UDP sockets, it is conventional to adhere to the '$protocol:$address:$port' or '$protocol:$host:$port' format, where $protocol is 'tcp' or 'udp', $address is an IPv4 or IPv6 address, $host is a domain name, and $port is the TCP or UDP port number. If an identifier is currently in use by s6-fdholderd, it cannot be reused to store a new file descriptor until the file descriptor currently using it is deleted or has expired.

  • A store operation transfers a file descriptor from a client to the fd-holder, specifying an identifier for it, and optionally an expiration time. When a file descriptor has been held for a period of time equal to the expiration time, the fd-holder closes it and frees its identifier. To allow store operations, a file named S6_FDHOLDER_STORE_REGEX must exist in the appropriate env subdirectory of s6-fdholderd's rules directory, or rules file created from a rules directory, containing a POSIX extended regular expression (like those of the grep -E command). A store operation must specify an identifier that matches the regular expression.
    • The client program that performs store operations is s6-fdholder-store. It accepts options, a UNIX domain socket pathname and a file descriptor identifier, and invokes s6-ipcclient chained to the s6-fdholder-storec program. The socket pathname is passed to s6-ipcclient, and the identifier, to s6-fdholder-storec. s6-fdholder-store accepts a -d option followed by an unsigned integer value, that specifies the file descriptor to store. If it is invoked without a -d option, it will store its standard input (file descriptor 0). All other supplied options are passed to s6-fdholder-storec. s6-fdholder-storec makes the store request, transferring the specified identifier to the server over the connection, and file descriptor 0 via fd-passing. If it is invoked with a -T option followed by a time value in milliseconds, the file descriptor's expiration time is set to that value in the fd-holder. If it is invoked with a -T 0 argument or no -T option, the held file descriptor does not expire. s6-fdholder-store copies the file descriptor specified by the -d option to its standard input before replacing itself with s6-ipcclient.
  • A retrieve operation transfers a file descriptor from the fd-holder to a client, specifying its corresponding identifier. To allow retrieve operations, a file named S6_FDHOLDER_RETRIEVE_REGEX must exist in the appropriate env subdirectory of s6-fdholderd's rules directory, or rules file created from a rules directory, containing a POSIX extended regular expression. A retrieve operation must specify an identifier that matches the regular expression.
    • The client program that performs retrieve operations is s6-fdholder-retrieve. It is a chain loading program that accepts options, a UNIX domain socket pathname and a file descriptor identifier, and invokes s6-ipcclient chained to the s6-fdholder-retrievec program, chained to execline's fdclose program. The socket pathname is passed to s6-ipcclient, and the options and identifier, to s6-fdholder-retrievec. s6-fdholder-retrievec makes the retrieve request, transferring the specified identifier over the connection to the server, and receiving the corresponding file descriptor via fd-passing. fdclose closes UCSPI socket file descriptors 6 and 7 inherited from s6-ipcclient; its standard input will be a copy of the descriptor received from s6-fdholderd, which will be passed to the next program in the chain. If s6-fdholder-retrievec is invoked with a -D option, it will also request a delete operation for the file descriptor after retrieving it.
  • A delete operation requests the fd-holder to close a currently held file descriptor, specifying its corresponding identifier, which is then freed. A delete operation is allowed if a store operation specifying the same identifier would be allowed.
    • The s6 client that performs delete operations is s6-fdholder-delete. It accepts options, a UNIX domain socket pathname and a file descriptor identifier, and invokes s6-ipcclient chained to the s6-fdholder-deletec program. The socket pathname is passed to s6-ipcclient, and the options and identifier, to s6-fdholder-deletec. s6-fdholder-deletec makes the delete request, transferring the specified identifier over the connection to the server.
  • A list operation requests the fd-holder a list of the identifiers of all currently held file descriptors. To allow list operations, a nonempty file named S6_FDHOLDER_LIST must exist in the appropriate env subdirectory of s6-fdholderd's rules directory, or rules file created from a rules directory.
    • The client program that performs list operations is s6-fdholder-list. It accepts options and a UNIX domain socket pathname, and invokes s6-ipcclient chained to the s6-fdholder-listc program. The socket pathname is passed to s6-ipcclient, and the options, to s6-fdholder-listc. s6-fdholder-listc makes the list request over the connection to the server, and prints the identifier list received from s6-fdholderd to its standard output.
  • A get dump operation requests the fd-holder to transfer to a client all currently held file descriptors. The file descriptors are not deleted. To allow get dump operations, a nonempty file named S6_FDHOLDER_GETDUMP must exist in the appropriate env subdirectory of s6-fdholderd's rules directory, or rules file created from a rules directory.
    • The client program that performs get dump operations is s6-fdholder-getdump. It is a chain loading program that accepts options and a UNIX domain socket pathname, and invokes s6-ipcclient chained to the s6-fdholder-getdumpc program, chained to execline's fdclose program. The socket pathname is passed to s6-ipcclient, and the options, to s6-fdholder-getdumpc. s6-fdholder-getdumpc makes the get dump request over the connection to the server, receiving all file descriptors held by s6-fdholderd via fd-passing, and sets environment variables. fdclose closes UCSPI socket file descriptors 6 and 7 inherited from s6-ipcclient, and passes the received file descriptors and environment variables set by s6-fdholder-getdumpc to the next program in the chain.
  • A set dump operation transfers a subset of a client's file descriptors to the fd-holder, under the control of environment variables. To allow set dump operations, a nonempty file named S6_FDHOLDER_SETDUMP must exist in the appropriate env subdirectory of s6-fdholderd's rules directory, or rules file created from a rules directory.
    • The client program that performs set dump operations is s6-fdholder-setdump. It accepts options and a UNIX domain socket pathname, and invokes s6-ipcclient chained to the s6-fdholder-setdumpc program. The socket pathname is passed to s6-ipcclient, and the options, to s6-fdholder-setdumpc. s6-fdholder-setdumpc makes the set dump request over the connection to the server, transferring a subset of its file descriptors via fd-passing under the control of environment variables.

The value of the S6_FD# environment variable is set by s6-fdholder-getdumpc to the number of file descriptors received via fd-passing, and used by s6-fdholder-setdumpc to construct the names of other environment variables. The values of environment variables S6_FD_0, S6_FD_1, ..., S6_FD_${N}, where ${N} is one less than the value of S6_FD#, are set by s6-fdholder-getdumpc to the file descriptors received via fd-passing, and used by s6-fdholder-setdumpc to select the file descriptors that will be transferred via fd-passing. The values of environment variables S6_FDID_0, S6_FDID_1, ..., S6_FDID_${N} are set by s6-fdholder-getdumpc to the corresponding file descriptor identifiers, and used by s6-fdholder-setdumpc to specify those identifiers to the fd-holder. The values of environment variables S6_FDLIMIT_0, S6_FDLIMIT_1, ..., S6_FDLIMIT_${N} are set by s6-fdholder-getdumpc to the time remaining before expiration, for those file descriptors that have an expiration time set, and, if set, are used by s6-fdholder-setdumpc to specify to the fd-holder an expiration time for the corresponding file descriptors. For the full description of all s6 fd-holder client programs' functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.
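A rough sh sketch of how a program chained after s6-fdholder-getdumpc could enumerate these variables. The values are set by hand here purely for illustration, and S6_FD_COUNT stands in for S6_FD#, whose real name cannot be spelled as an sh variable:

```shell
# Illustrative values only: in reality s6-fdholder-getdumpc sets these from
# the dump it receives. S6_FD_COUNT stands in for S6_FD#.
S6_FD_COUNT=2
S6_FD_0=7  S6_FDID_0='unix:/run/a-socket'
S6_FD_1=8  S6_FDID_1='fifo:/run/a-fifo'
i=0
while [ "$i" -lt "$S6_FD_COUNT" ]; do
    eval "fd=\$S6_FD_$i id=\$S6_FDID_$i"
    echo "descriptor $fd is held as identifier $id"
    i=$((i + 1))
done
```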

Finally, s6 provides an s6-fdholder-transferdumpc program, and a helper program, s6-fdholder-transferdump, that allow transferring all currently held file descriptors in one fd-holder process to another one. The transferred set of file descriptors is added in the destination process to the currently held ones. This is implemented by performing a get dump operation on the source process and a set dump operation on the destination process, so they must be allowed by each s6-fdholderd's access control policies. s6-fdholder-transferdumpc handles communications with the fd-holders to make the get dump and set dump requests; its standard input must be redirected to the source s6-fdholderd process' socket, and its standard output, to the destination s6-fdholderd process' socket. s6-fdholder-transferdump accepts options and the socket pathnames of the source and destination fd-holders, and invokes, using a POSIX execve() call, s6-ipcclient chained to execline's fdmove and fdclose programs, chained to s6-fdholder-transferdumpc. The socket pathnames are passed to s6-ipcclient, and the options are used to construct s6-fdholder-transferdumpc options. fdmove and fdclose perform all file descriptor manipulations necessary to move UCSPI socket file descriptors inherited from s6-ipcclient to s6-fdholder-transferdumpc's standard input and output. For the full description of s6-fdholder-transferdump's and s6-fdholder-transferdumpc's functionality please consult the HTML documentation in the package's /usr/share/doc subdirectory.

Example s6-rc service definitions in s6-rc-compile source format for holding the reading end of a FIFO:

user $ls -l *
fifo-reader:
total 8
-rwxr-xr-x 1 user user 117 Aug  1 12:00 run
-rw-r--r-- 1 user user   8 Aug  1 12:00 type

fifo-reader-heldfd:
total 12
-rw-r--r-- 1 user user  18 Aug  1 12:00 dependencies
-rwxr-xr-x 1 user user 157 Aug  1 12:00 run
-rw-r--r-- 1 user user   8 Aug  1 12:00 type

fifo-reader-setup:
total 16
-rw-r--r-- 1 user user  14 Aug  1 12:00 dependencies
-rw-r--r-- 1 user user  90 Aug  1 12:00 down
-rw-r--r-- 1 user user   8 Aug  1 12:00 type
-rw-r--r-- 1 user user 143 Aug  1 12:00 up

fifo-writer:
total 8
-rwxr-xr-x 1 user user 84 Aug  1 12:00 run
-rw-r--r-- 1 user user  8 Aug  1 12:00 type

test-fdholder:
total 16
drwxr-xr-x 2 user user 4096 Aug  1 12:00 data
-rw-r--r-- 1 user user    2 Aug  1 12:00 notification-fd
-rwxr-xr-x 1 user user  101 Aug  1 12:00 run
-rw-r--r-- 1 user user    8 Aug  1 12:00 type
FILE fifo-writer/type
longrun
FILE fifo-writer/run
#!/bin/execlineb -P
redirfd -w 1 /home/user/test-fifo
test-daemon

test-daemon is assumed to be a program that prints to its standard output a message of the form 'Message #n', with an incrementing number n between 0 and 9, each time it receives a SIGHUP signal. Service fifo-writer, a longrun, is a test-daemon process with its standard output redirected to the /home/user/test-fifo FIFO.
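test-daemon itself is hypothetical; a minimal sh stand-in with the same SIGHUP-driven behaviour might look like the following sketch (capped at three messages, and driven from within the same script, so that it terminates on its own):

```shell
dir=$(mktemp -d)
# Hypothetical stand-in for test-daemon: prints 'Message #n' on each SIGHUP.
daemon() {
    n=1
    trap 'echo "Message #$n"; n=$((n + 1))' HUP
    # sleep in a child so a trapped signal interrupts the wait immediately
    while [ "$n" -le 3 ]; do sleep 1 & wait $!; done
}
daemon > "$dir/out" &
pid=$!
sleep 1                                   # let the trap get installed
for i in 1 2 3; do kill -HUP "$pid"; sleep 1; done
wait "$pid" || :                          # its status reflects the interrupted wait
cat "$dir/out"
```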

FILE fifo-reader/type
longrun
FILE fifo-reader/run
#!/bin/execlineb -P
redirfd -r 0 /home/user/test-fifo
s6-log t /home/user/logdir

Service fifo-reader, a longrun, is an s6-log process that reads from the /home/user/test-fifo FIFO and logs to the /home/user/logdir logging directory. Creating the FIFO:

user $mkfifo -m ug=rw,o= test-fifo
user $ls -l test-fifo
prw-rw---- 1 user user 0 Aug  5 22:00 test-fifo

Starting both services, assuming that s6-rc-compile has been called on the service definitions set to create a compiled services database, that s6-rc-init has been called after s6-svscan, and that the s6 scan directory and the s6-rc live state directory (named live) are both in the same directory:

user $s6-rc -l ../live -v 2 -u change fifo-writer fifo-reader
s6-rc: info: processing service fifo-writer: starting
s6-rc: info: processing service fifo-reader: starting
s6-rc: info: service fifo-writer started successfully
s6-rc: info: service fifo-reader started successfully

s6-rc's -v 2 argument increments its verbosity level. Sending three SIGHUP signals to test-daemon:

user $for i in 1 2 3; do s6-svc -h fifo-writer; done
user $cat ../logdir/current | s6-tai64nlocal
2017-08-05 22:06:37.556779100 Message #1
2017-08-05 22:06:37.557858265 Message #2
2017-08-05 22:06:37.558856806 Message #3

This shows that test-daemon's messages were sent over the FIFO to s6-log and logged. Momentarily stopping the fifo-reader service:

user $s6-rc -l ../live -v 2 -d change fifo-reader
s6-rc: info: processing service fifo-reader: stopping
s6-rc: info: service fifo-reader stopped successfully

Sending three more SIGHUP signals to test-daemon:

user $for i in 4 5 6; do s6-svc -h fifo-writer; done
test-daemon: warning: Got SIGPIPE
test-daemon: warning: Got SIGPIPE
test-daemon: warning: Got SIGPIPE

Since the FIFO no longer has any reader, writing to it makes the kernel send a SIGPIPE signal to the test-daemon process. Restarting the fifo-reader service and sending three final SIGHUP signals to test-daemon:

user $s6-rc -l ../live -u change fifo-reader
user $for i in 7 8 9; do s6-svc -h fifo-writer; done
user $cat ../logdir/current | s6-tai64nlocal
2017-08-05 22:06:37.556779100 Message #1
2017-08-05 22:06:37.557858265 Message #2
2017-08-05 22:06:37.558856806 Message #3
2017-08-05 22:10:58.036594559 Message #7
2017-08-05 22:10:58.037747723 Message #8
2017-08-05 22:10:58.038749083 Message #9

This shows that the messages sent by test-daemon when s6-log was not running are effectively lost. This can be avoided by modifying the services so that the reading end of the FIFO is held by an fd-holder process.
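The principle can be demonstrated with plain sh and a throwaway FIFO: a write fails with SIGPIPE once the last reader is gone, but succeeds, and the data survives, while some process keeps a reading file descriptor open, which is what s6-fdholderd does for the service. A read-write descriptor is used here so that opening the FIFO never blocks; the paths and exit status (141 = 128 + SIGPIPE, on Linux) are illustrative:

```shell
dir=$(mktemp -d)
mkfifo "$dir/fifo"

# 1. No reader: hold a read-write descriptor just long enough to open a
#    writer without blocking, then drop it, leaving the writer alone.
exec 4<>"$dir/fifo"
exec 3>"$dir/fifo"
exec 4<&-
st=0
( echo lost >&3 ) 2>/dev/null || st=$?    # the subshell is killed by SIGPIPE
echo "write with no reader: exit status $st"
exec 3>&-

# 2. Reading end held open: the write succeeds, and the data waits in the
#    FIFO until a reader consumes it.
exec 4<>"$dir/fifo"
echo "Message #4" > "$dir/fifo"
read -r line <&4
echo "held descriptor: read back '$line'"
exec 4<&-
rm -r "$dir"
```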

FILE test-fdholder/type
longrun
FILE test-fdholder/run
#!/bin/execlineb -P
s6-fdholder-daemon -1 -x data/rules /home/user/fdholder-socket
FILE test-fdholder/notification-fd
1

Service test-fdholder, a longrun, is an s6-fdholderd process bound to socket /home/user/fdholder-socket, with readiness notification enabled (-1), and using the rules file named rules in the data subdirectory of its compiled s6 service directory. s6-fdholder-daemon -1 -x data/rules /home/user/fdholder-socket is equivalent to s6-ipcserver-socketbinder /home/user/fdholder-socket s6-fdholderd -1 -x data/rules, but shorter. The rules file is created from the following rules directory using s6-accessrules-cdb-from-fs test-fdholder/data/rules ../rules.d:

user $ls -l ../rules.d/*/*
../rules.d/uid/1000:
total 4
-rw-r--r-- 1 user user    0 Aug  1 12:00 allow
drwxr-xr-x 2 user user 4096 Aug  1 12:00 env

../rules.d/uid/default:
total 0
-rw-r--r-- 1 user user 0 Aug  1 12:00 deny
user $ls -l ../rules.d/uid/1000/env
total 8
-rw-r--r-- 1 user user  1 Aug  1 12:00 S6_FDHOLDER_LIST
lrwxrwxrwx 1 user user 23 Aug  1 12:00 S6_FDHOLDER_RETRIEVE_REGEX -> S6_FDHOLDER_STORE_REGEX
-rw-r--r-- 1 user user 32 Aug  1 12:00 S6_FDHOLDER_STORE_REGEX
FILE S6_FDHOLDER_STORE_REGEX
^(unix|fifo):/home/user/test-

This rules directory allows user (assumed to have user ID 1000) to perform store (and therefore delete), retrieve and list operations. The S6_FDHOLDER_LIST file contains a single empty line.
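For reference, the rules directory shown above can be recreated from scratch as follows (under a temporary directory here, instead of the real location):

```shell
# Rebuild the example rules directory: uid 1000 is allowed, everyone else
# is denied; the env subdirectory grants list, store and retrieve rights.
dir=$(mktemp -d)
mkdir -p "$dir/rules.d/uid/1000/env" "$dir/rules.d/uid/default"
touch "$dir/rules.d/uid/1000/allow"
touch "$dir/rules.d/uid/default/deny"
printf '\n' > "$dir/rules.d/uid/1000/env/S6_FDHOLDER_LIST"   # single empty line
printf '^(unix|fifo):/home/user/test-' \
    > "$dir/rules.d/uid/1000/env/S6_FDHOLDER_STORE_REGEX"
ln -s S6_FDHOLDER_STORE_REGEX \
    "$dir/rules.d/uid/1000/env/S6_FDHOLDER_RETRIEVE_REGEX"
ls "$dir/rules.d/uid/1000/env"
```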

FILE fifo-reader-setup/type
oneshot
FILE fifo-reader-setup/dependencies
test-fdholder
FILE fifo-reader-setup/up
redirfd -rnb 0 /home/user/test-fifo
s6-fdholder-store /home/user/fdholder-socket fifo:/home/user/test-fifo
FILE fifo-reader-setup/down
s6-fdholder-delete /home/user/fdholder-socket fifo:/home/user/test-fifo

Service fifo-reader-setup, a oneshot, opens FIFO /home/user/test-fifo for reading (in non-blocking mode that is changed to blocking afterwards, using execline's redirfd program with options -n and -b) and stores the corresponding file descriptor in the fd-holder using identifier fifo:/home/user/test-fifo. s6-fdholder-store /home/user/fdholder-socket fifo:/home/user/test-fifo and s6-fdholder-delete /home/user/fdholder-socket fifo:/home/user/test-fifo are equivalent to s6-ipcclient -l 0 /home/user/fdholder-socket s6-fdholder-storec fifo:/home/user/test-fifo and s6-ipcclient -l 0 /home/user/fdholder-socket s6-fdholder-deletec fifo:/home/user/test-fifo, respectively, but shorter. The dependency on service test-fdholder ensures that s6-rc -u change will start s6-fdholderd before trying to store to it. When fifo-reader-setup transitions to the down state, it will delete the held file descriptor.

FILE fifo-reader-heldfd/type
longrun
FILE fifo-reader-heldfd/dependencies
fifo-reader-setup
FILE fifo-reader-heldfd/run
#!/bin/execlineb -P
s6-fdholder-retrieve /home/user/fdholder-socket fifo:/home/user/test-fifo
s6-log t /home/user/logdir

Service fifo-reader-heldfd, a longrun, is a modified version of fifo-reader that retrieves the FIFO's reading end from the fd-holder. s6-fdholder-retrieve /home/user/fdholder-socket fifo:/home/user/test-fifo is equivalent to s6-ipcclient -l 0 /home/user/fdholder-socket s6-fdholder-retrievec fifo:/home/user/test-fifo fdclose 6 fdclose 7, but shorter. The dependency on service fifo-reader-setup ensures that s6-rc -u change will have stored the FIFO's reading end first in the fd-holder with the appropriate identifier. Starting the fifo-writer and fifo-reader-heldfd services:

user $s6-rc -l ../live -v 2 -u change fifo-writer fifo-reader-heldfd
s6-rc: info: processing service fifo-writer: starting
s6-rc: info: processing service test-fdholder: starting
s6-rc: info: processing service s6rc-oneshot-runner: starting
s6-rc: info: service fifo-writer started successfully
s6-rc: info: service test-fdholder started successfully
s6-rc: info: service s6rc-oneshot-runner started successfully
s6-rc: info: processing service fifo-reader-setup: starting
s6-rc: info: service fifo-reader-setup started successfully
s6-rc: info: processing service fifo-reader-heldfd: starting
s6-rc: info: service fifo-reader-heldfd started successfully

Sending three SIGHUP signals to test-daemon, stopping momentarily the fifo-reader-heldfd service, sending three more SIGHUP signals to test-daemon:

user $for i in 1 2 3; do s6-svc -h fifo-writer; done
user $s6-rc -l ../live -v 2 -d change fifo-reader-heldfd
s6-rc: info: processing service fifo-reader-heldfd: stopping
s6-rc: info: service fifo-reader-heldfd stopped successfully
user $for i in 4 5 6; do s6-svc -h fifo-writer; done
user $cat ../logdir/current | s6-tai64nlocal
2017-08-05 22:29:45.042414497 Message #1
2017-08-05 22:29:45.043441185 Message #2
2017-08-05 22:29:45.044699928 Message #3
user $s6-fdholder-list ../fdholder-socket
fifo:/home/user/test-fifo

This shows that, as a consequence, no more messages are being logged, but because the FIFO's reading end is still held open by s6-fdholderd, there is no SIGPIPE signal this time. s6-fdholder-list /home/user/fdholder-socket is equivalent to s6-ipcclient -l 0 /home/user/fdholder-socket s6-fdholder-listc, but shorter. Restarting the fifo-reader-heldfd service:

user $s6-rc -l ../live -u change fifo-reader-heldfd
user $cat ../logdir/current | s6-tai64nlocal
2017-08-05 22:29:45.042414497 Message #1
2017-08-05 22:29:45.043441185 Message #2
2017-08-05 22:29:45.044699928 Message #3
2017-08-05 22:33:07.907057008 Message #4
2017-08-05 22:33:07.907113023 Message #5
2017-08-05 22:33:07.907114734 Message #6

This shows that once the file descriptor is retrieved for s6-log, all pending messages are delivered and logged. Sending three final SIGHUP signals to test-daemon:

user $for i in 7 8 9; do s6-svc -h fifo-writer; done
user $cat ../logdir/current | s6-tai64nlocal
2017-08-05 22:29:45.042414497 Message #1
2017-08-05 22:29:45.043441185 Message #2
2017-08-05 22:29:45.044699928 Message #3
2017-08-05 22:33:07.907057008 Message #4
2017-08-05 22:33:07.907113023 Message #5
2017-08-05 22:33:07.907114734 Message #6
2017-08-05 22:33:43.821315726 Message #7
2017-08-05 22:33:43.822098686 Message #8
2017-08-05 22:33:43.823644164 Message #9

This shows that no messages are lost even if the FIFO reader momentarily stops.

Pre-opening sockets before their servers are running

systemd supports a mechanism it calls socket activation, that makes a systemd process pre-open UNIX domain, TCP/IP or Netlink sockets, FIFOs and other special files, and pass them to a child process when a connection is made, a writer opens the FIFO, new data is available for reading, etc. Socket activation is performed when a socket unit file is provided with an accompanying service unit file. This mechanism combines super-server, fd-passing and fd-holding functionality implemented in process 1, with programs written to communicate using pre-opened file descriptors. Similar behaviour can be achieved, for services that want it, by a combination of individual s6 programs like s6-ipcserver-socketbinder, s6-ipcserverd, s6-fdholderd and the fd-holder client programs, without involving process 1. The software package's author notes however that since read operations, and write operations when buffers are full, on a file descriptor held by an fd-holder will block, speed gains might not be that significant, and that, depending on the scenario (e.g. logging infrastructure), communicating with a service assuming it is ready when it actually isn't might hurt reliability[9].

Example s6-rc service definitions in s6-rc-compile source format for pre-opening a UNIX domain socket and passing it to a program written to be executed by an IPC UCSPI server:

user $ls -l *
test-fdholder:
total 16
drwxr-xr-x 2 user user 4096 Aug  1 12:00 data
-rw-r--r-- 1 user user    2 Aug  1 12:00 notification-fd
-rwxr-xr-x 1 user user  101 Aug  1 12:00 run
-rw-r--r-- 1 user user    8 Aug  1 12:00 type

test-server-heldfd:
total 12
-rw-r--r-- 1 user user  18 Aug  1 12:00 dependencies
-rwxr-xr-x 1 user user 140 Aug  1 12:00 run
-rw-r--r-- 1 user user   8 Aug  1 12:00 type

test-server-setup:
total 16
-rw-r--r-- 1 user user  14 Aug  1 12:00 dependencies
-rw-r--r-- 1 user user  92 Aug  1 12:00 down
-rw-r--r-- 1 user user   8 Aug  1 12:00 type
-rw-r--r-- 1 user user 158 Aug  1 12:00 up

Service test-fdholder is the same as in the previous FIFO example.

FILE test-server-setup/type
oneshot
FILE test-server-setup/dependencies
test-fdholder
FILE test-server-setup/up
s6-ipcserver-socketbinder /home/user/test-socket
s6-fdholder-store /home/user/fdholder-socket unix:/home/user/test-socket
FILE test-server-setup/down
s6-fdholder-delete /home/user/fdholder-socket unix:/home/user/test-socket

Service test-server-setup, a oneshot, pre-opens a listening UNIX domain socket bound to /home/user/test-socket using s6-ipcserver-socketbinder, and stores the corresponding file descriptor in the fd-holder using identifier unix:/home/user/test-socket. The dependency on service test-fdholder ensures that s6-rc -u change will start s6-fdholderd before trying to store to it. When test-server-setup transitions to the down state, it will delete the held file descriptor.

FILE test-server-heldfd/type
longrun
FILE test-server-heldfd/dependencies
test-server-setup
FILE test-server-heldfd/run
#!/bin/execlineb -P
s6-fdholder-retrieve /home/user/fdholder-socket unix:/home/user/test-socket
s6-ipcserverd test-server

Service test-server-heldfd, a longrun, retrieves the listening socket's file descriptor from the fd-holder, and invokes super-server s6-ipcserverd to handle incoming connections, spawning a test-server process for each one. test-server is the same program used in the example contained in section "The UNIX domain super-server and related tools". The dependency on service test-server-setup ensures that s6-rc -u change will have stored the listening socket's file descriptor first in the fd-holder with the appropriate identifier. Starting the test-server-setup service:

user $s6-rc -l ../live -v 2 -u change test-server-setup
s6-rc: info: processing service test-fdholder: starting
s6-rc: info: processing service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner started successfully
s6-rc: info: service test-fdholder started successfully
s6-rc: info: processing service test-server-setup: starting
s6-rc: info: service test-server-setup started successfully

s6-rc's -v 2 argument increments its verbosity level. Starting a UCSPI client in the background to connect to the server:

user $s6-ipcclient ../test-socket test-client &
Connecting to server...

test-client is the same program used in the example contained in section "The UNIX domain super-server and related tools". This shows that test-client has been launched with a connection to the server's listening socket, but because its file descriptor is held by s6-fdholderd and no server program is currently running, it blocks on an I/O operation.

user $s6-fdholder-list ../fdholder-socket
unix:/home/user/test-socket

This shows that s6-fdholderd is holding the listening socket's file descriptor. Starting the test-server-heldfd service:

user $s6-rc -l ../live -v 2 -u change test-server-heldfd
s6-rc: info: processing service test-fdholder: already up
s6-rc: info: processing service s6rc-oneshot-runner: already up
s6-rc: info: processing service test-server-setup: already up
s6-rc: info: processing service test-server-heldfd: starting
s6-rc: info: service test-server-heldfd started successfully
Server process created with PID 2377, client is "user"

This shows that once s6-ipcserverd has started and retrieved the listening socket's file descriptor from s6-fdholderd, it accepts the connection and spawns test-server to handle it.

s6-svscan as process 1

The s6-svscan program was also written to be robust and to go out of its way to stay alive, even in dire situations, so it is suitable for running as process 1 during most of a machine's uptime. However, the duties of process 1 vary widely between the machine's boot sequence, its normal, stable 'up and running' state, and its shutdown sequence, and in the first and third cases they are heavily system-dependent, so they cannot all be handled by a single program designed to be as portable as possible[10]. Because of that, auxiliary and system-dependent programs, named the stage1 init and the stage3 init, are used to run as process 1 during the boot sequence and the shutdown sequence, respectively, and s6-svscan is used the rest of the time. For details, see s6 and s6-rc-based init system.

To support its role as process 1, s6-svscan performs a reaper routine each time it receives a SIGCHLD signal, i.e. it uses a POSIX waitpid() call for each child process that becomes a zombie, both the ones it has spawned itself and the ones that were reparented to process 1 by the kernel because their parent process died. An s6-svscanctl -z command naming its scan directory can be used to force s6-svscan to perform its reaper routine.

OpenRC's s6 integration feature

Starting with version 0.16, OpenRC can launch supervised long-lived processes using the s6 package as a helper [11]. This is an alternative to 'classic' unsupervised long-lived processes launched using the start-stop-daemon program. It should be noted that service scripts that don't contain start() and stop() functions implicitly use start-stop-daemon.

OpenRC services that want to use s6 supervision need both a service script in /etc/init.d and an s6 service directory. The service script must contain a supervisor=s6 variable assignment to turn the feature on, and must have a 'need' dependency on the s6-svscan service in its depend() function, to make sure the s6-svscan program is launched. It must contain neither a start() function, nor a stop() function (though their _pre() and _post() variants are allowed), nor a status() function:

  • OpenRC internally invokes s6-svc with a -u option when the service script is called with a 'start' argument, and can also call s6-svwait after s6-svc to wait for an event, by assigning s6-svwait options to the s6_svwait_options_start variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d). For example, if the service supports readiness notification, s6_svwait_options_start="-U -t 5000" could be used to make OpenRC wait for the up and ready event with a 5 seconds timeout.
  • OpenRC internally invokes s6-svc with -d, -wD and -T options when the service script is called with a 'stop' argument, so it will wait for a really down event with a default timeout of 10 seconds. The timeout can be changed by assigning a time value in milliseconds to the s6_service_timeout_stop variable (e.g. in the service script or the service-specific configuration file in /etc/conf.d).
  • OpenRC internally invokes s6-svstat when the service script is called with a 'status' argument.

The s6 service directory can be placed anywhere in the filesystem, and have any name, as long as the service script (or the service-specific configuration file in /etc/conf.d) assigns the servicedir's absolute path to the s6_service_path variable. If s6_service_path is not assigned to, the s6 servicedir must have the same name as the OpenRC service script, and will be searched for in /var/svc.d. The scan directory when using this feature is /run/openrc/s6-scan, and OpenRC will create a symlink to the service directory when the service is started.

Warning
OpenRC does not integrate as expected when s6-svscan is running as process 1, since OpenRC will launch another s6-svscan process with /run/openrc/s6-scan as its scan directory, so the result will be two independent supervision trees.

Example setup for a hypothetical supervised test-daemon process with a dedicated logger:

FILE /etc/init.d/test-serviceOpenRC service script
#!/sbin/openrc-run
description="A supervised test service with a logger"
supervisor=s6
s6_service_path=/home/user/test/svc-repo/test-service

depend() {
   need s6-svscan
}
FILE /etc/conf.d/test-serviceOpenRC service-specific configuration file
s6_svwait_options_start=-U
user $/sbin/rc-service test-service describe
* A supervised test service with a logger
* cgroup_cleanup: Kill all processes in the cgroup

The service directory:

user $ls -l /home/user/test/svc-repo/test-service /home/user/test/svc-repo/test-service/log
/home/user/test/svc-repo/test-service:
total 12
drwxr-xr-x 2 user user 4096 Aug  8 12:00 log
-rw-r--r-- 1 user user    2 Aug  8 12:00 notification-fd
-rwxr-xr-x 1 user user   86 Aug  8 12:00 run

/home/user/test/svc-repo/test-service/log:
total 4
-rwxr-xr-x 1 user user 65 Aug  8 12:00 run
FILE /home/user/test/svc-repo/test-service/run
#!/bin/execlineb -P
s6-softlimit -o 5
s6-setuidgid daemon
fdmove -c 2 1
/home/user/test/test-daemon --s6=5
FILE /home/user/test/svc-repo/test-service/notification-fd
5

This launches test-daemon with effective user daemon and the maximum number of open file descriptors set to 5. This is the same as if test-daemon performed a setrlimit(RLIMIT_NOFILE, &rl) call itself with rl.rlim_cur set to 5, provided that value does not exceed the corresponding hard limit. The program supports an --s6 option to turn readiness notification on, specifying the notification file descriptor (5), and also periodically prints to its standard error a message of the form 'Logged message #n', with an incrementing number n between 0 and 9. The redirection of test-daemon's standard error to standard output, using execline's fdmove program with the -c (copy) option, allows logging its messages using s6-log:
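The shell's ulimit builtin wraps the same setrlimit() mechanism as s6-softlimit, so the effect can be sketched in plain sh. The limit value 64 is used here instead of 5, to leave room for the shell's own descriptors, and the subshell keeps the calling shell unaffected:

```shell
# Lower the open-files soft limit in a subshell only, then read it back;
# analogous to what s6-softlimit -o does for the program it chains into.
lim=$( (ulimit -n 64; ulimit -n) )
echo "limit inside the subshell: $lim"
```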

FILE /home/user/test/svc-repo/test-service/log/run
#!/bin/execlineb -P
s6-setuidgid user
s6-log t /home/user/test/logdir

An automatically rotated logging directory named logdir will be used, and messages will have a timestamp in external TAI64N format prepended to them.
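An external TAI64N timestamp is an '@' followed by 16 hexadecimal digits of seconds, offset by 2^62, and 8 hexadecimal digits of nanoseconds. A rough sh decode of one of the rotated file names shown below, treating the second count directly as Unix seconds (strictly speaking it is a TAI count):

```shell
# Decode an external TAI64N label by hand.
stamp='@40000000598a67ec2d5d7180'
hex=${stamp#@}
secs_hex=${hex%????????}            # first 16 hex digits: 2^62 + seconds
nano_hex=${hex#????????????????}    # last 8 hex digits: nanoseconds
secs=$(( 0x$secs_hex - 0x4000000000000000 ))
nanos=$(( 0x$nano_hex ))
echo "seconds since the epoch: $secs"
echo "nanoseconds: $nanos"
```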

Manually starting test-service:

root #time rc-service test-service start
* Creating s6 scan directory
* /run/openrc/s6-scan: creating directory
* Starting s6-svscan ...                    [ ok ]
* Starting test-service ...                 [ ok ]

real	0m11.681s
user	0m0.039s
sys	0m0.034s
root #rc-service test-service status
up (pid 2279) 33 seconds, ready 23 seconds

This shows that test-daemon took about 10 seconds to notify readiness to s6-supervise, and that the rc-service start command waited until the up and ready event, because of the s6-svwait -U option passed via s6_svwait_options_start in /etc/conf.d/test-service.

user $rc-status
Runlevel: default
...
Dynamic Runlevel: needed/wanted
...
s6-svscan                                   [  started  ]
...
Dynamic Runlevel: manual
test-service                                [  started  ]

The scan directory:

user $ls -la /run/openrc/s6-scan
total 0
drwxr-xr-x  3 root root  80 Aug  8 22:38 .
drwxrwxr-x 15 root root 360 Aug  8 22:38 ..
drwx------  2 root root  80 Aug  8 22:38 .s6-svscan
lrwxrwxrwx  1 root root  46 Aug  8 22:38 test-service -> /home/user/test/svc-repo/test-service

The supervision tree:

user $ps axf -o pid,ppid,pgrp,euser,args
 PID  PPID  PGRP EUSER    COMMAND
...
2517     1  2517 root     /bin/s6-svscan /run/openrc/s6-scan
2519  2517  2517 root      \_ s6-supervise test-service/log
2523  2519  2523 user      |   \_ s6-log t /home/user/test/logdir
2520  2517  2517 root      \_ s6-supervise test-service
2522  2520  2522 daemon        \_ /home/user/test/test-daemon --s6=5
...
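Both s6-supervise processes are children of s6-svscan, which also holds open the pipe connecting test-daemon's standard output to the logger's standard input, so log lines are not lost if either side is restarted. A rough plain-shell emulation of that arrangement (illustrative only; s6-svscan actually uses an anonymous pipe, not a FIFO, and s6-log adds real TAI64N stamps):

```shell
# Rough emulation of the daemon -> logger pipe between test-service
# and test-service/log (illustrative only).
mkfifo svc-pipe
# The "daemon": writes its messages into the pipe.
( printf 'Logged message #%d\n' 1 2 > svc-pipe ) &
# The "logger": reads the pipe and prepends a timestamp-like prefix.
sed 's/^/@stamp /' < svc-pipe
wait
rm svc-pipe
```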

Messages from the test-daemon process go to the logging directory:

user $ls -l /home/user/test/logdir
total 12
-rwxr--r-- 1 user user 352 Aug  8 22:39 @40000000598a67ec2d5d7180.s
-rwxr--r-- 1 user user 397 Aug  8 22:40 @40000000598a681919d6e581.s
-rwxr--r-- 1 user user 397 Aug  8 22:40 current
-rw-r--r-- 1 user user   0 Aug  8 22:38 lock
-rw-r--r-- 1 user user   0 Aug  8 22:38 state
user $cat /home/user/test/logdir/current | s6-tai64nlocal
2017-08-08 22:40:20.562745759 Logged message #1
2017-08-08 22:40:25.565816199 Logged message #2
2017-08-08 22:40:30.570600144 Logged message #3
2017-08-08 22:40:35.578765601 Logged message #4
2017-08-08 22:40:40.585146120 Logged message #5
2017-08-08 22:40:45.591282433 Logged message #6
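The @4000… archive names and the stamps prepended to each line are TAI64N labels: '@' followed by 16 hexadecimal digits of TAI seconds (offset by 2^62) and 8 hexadecimal digits of nanoseconds. A shell sketch of decoding the first archive name above (converting TAI to UTC additionally requires the leap-second offset, 37 seconds in 2017, which is what s6-tai64nlocal handles for you):

```shell
# Decode a TAI64N label: '@' + 16 hex digits of TAI seconds (offset by
# 2^62) + 8 hex digits of nanoseconds.
label=@40000000598a67ec2d5d7180
hex=${label#@}                                       # strip the '@'
secs=$(( 0x${hex%????????} - 0x4000000000000000 ))   # TAI seconds
nsec=$(( 0x${hex#????????????????} ))                # nanoseconds
echo "$secs.$nsec"    # prints 1502242796.761098624
```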

Removal

Unmerge

root #emerge --ask --depclean sys-apps/s6

All scan directories, service directories, the /command symlink, etc. must be manually deleted if no longer wanted after removing the package. All modifications to sysvinit's /etc/inittab must also be manually reverted: delete the lines for s6-svscanboot, then run telinit q. Finally, if s6-svscan is running as process 1, an alternative init system must be installed in parallel and the machine rebooted into it (which may require reconfiguring the bootloader) before the package is removed; otherwise the machine will become unbootable.

See also

External resources
