 .TH MINIJAIL0 "5" "July 2011" "Chromium OS" "User Commands"
 .SH NAME
 minijail0 \- sandbox a process
 .SH DESCRIPTION
 .PP
 Runs PROGRAM inside a sandbox. See \fBminijail0\fR(1) for details.
 .SH EXAMPLES

 Safely switch from root to nobody while dropping all capabilities and
 inheriting any groups from nobody:

   # minijail0 -c 0 -G -u nobody /usr/bin/whoami
   nobody

 Run in a PID and VFS namespace without superuser capabilities (but still
 as root) and with a private view of /proc:

   # minijail0 -p -v -r -c 0 /bin/ps
     PID TTY           TIME CMD
       1 pts/0     00:00:00 minijail0
       2 pts/0     00:00:00 ps

 Run a process with a seccomp filter policy at reduced privileges:

   # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\
               /bin/cat /proc/self/seccomp_filter
   ...

 .SH SECCOMP_FILTER POLICY
 The policy file supplied to the \fB-S\fR argument supports the following syntax:

   \fB<syscall_name>\fR:\fB<ftrace filter policy>\fR
   \fB<syscall_number>\fR:\fB<ftrace filter policy>\fR
   \fB<empty line>\fR
   \fB# any single line comment\fR

 Long lines may be broken up using \\ at the end.
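
 As a rough illustration of the line-level syntax above, the following Python
 sketch splits policy text into (syscall, expression) pairs.  It is not
 minijail's parser: the name parse_policy_lines is hypothetical, and the
 handling of continuation lines is an assumption based only on the description
 above.

```python
BACKSLASH = chr(92)  # the line-continuation character


def parse_policy_lines(text):
    """Return (syscall, expression) pairs found in policy text."""
    rules = []
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith(BACKSLASH):
            # Assumed: a trailing backslash joins this line with the next.
            pending = line[:-1].rstrip() + " "
            continue
        if not line or line.startswith("#"):
            continue  # empty line or single-line comment
        # Everything before the first colon names the syscall.
        name, _, expr = line.partition(":")
        rules.append((name.strip(), expr.strip()))
    return rules
```

 Splitting on the first colon only keeps the expression intact even if it
 later contains punctuation of its own.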

 A policy that emulates \fBseccomp\fR(2) in mode 1 may look like:

   read: 1
   write: 1
   sigreturn: 1
   exit: 1

 The "1" acts as a wildcard and allows any use of the mentioned system
 call.  More advanced filtering is possible if your kernel supports
 CONFIG_FTRACE_SYSCALLS.  For example, we can allow a process to open any
 file read only and mmap PROT_READ only:

   # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
   open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
   mmap2: arg2 == 0x1
   munmap: 1
   close: 1
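
 The numeric values in the open rule above are combinations of flag bits.
 The Python sketch below reproduces them, assuming the 32-bit x86 values
 O_RDONLY = 0, O_NONBLOCK = 04000, and O_LARGEFILE = 0100000.  These values
 are hard-coded for illustration only and differ on other architectures; the
 name flag_combinations is hypothetical.

```python
from itertools import combinations

# Assumed 32-bit x86 flag values; other architectures differ.
FLAGS = {"O_RDONLY": 0, "O_NONBLOCK": 0o4000, "O_LARGEFILE": 0o100000}


def flag_combinations(flags):
    """Return every value obtainable by OR-ing a subset of the flags."""
    values = set()
    names = list(flags)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            value = 0
            for name in subset:
                value |= flags[name]
            values.add(value)
    return sorted(values)


print(flag_combinations(FLAGS))  # prints [0, 2048, 32768, 34816]
```

 Enumerating the values this way shows why numeric rules grow quickly as more
 flags are permitted.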

 The supported arguments may be found by reviewing the system call
 prototypes in the Linux kernel source code.  Be aware that any
 non-numeric comparison may be subject to time-of-check-time-of-use
 attacks and cannot be considered safe.

 \fBexecve\fR may only be used when invoking with CAP_SYS_ADMIN privileges.

 In order to promote reusability, policy files can include other policy files
 using the following syntax:

   \fB@include /absolute/path/to/file.policy\fR
   \fB@include ./path/relative/to/CWD/file.policy\fR

 Inclusion is limited to a single level (i.e. files that are \fB@include\fRd
 cannot themselves \fB@include\fR more files), since nested inclusion makes
 policies harder to understand.
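
 The single-level rule can be sketched in Python as follows.  This is an
 illustration, not minijail's implementation: expand_includes is a
 hypothetical name, the files mapping stands in for the filesystem, and
 raising an error on a nested include is just one way to enforce the limit.

```python
def expand_includes(path, files):
    """Expand one level of @include statements.

    files is a hypothetical mapping of path -> policy text that stands
    in for the filesystem.
    """
    out = []
    for line in files[path].splitlines():
        if not line.startswith("@include "):
            out.append(line)
            continue
        target = line.split(None, 1)[1].strip()
        for inner in files[target].splitlines():
            if inner.startswith("@include"):
                # Included files may not include further files.
                raise ValueError("nested @include in " + target)
            out.append(inner)
    return out
```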

 .SH SECCOMP_FILTER SYNTAX
 More formally, the expression after the colon can be an expression in
 Disjunctive Normal Form (DNF): a disjunction ("or", \fI||\fR) of
 conjunctions ("and", \fI&&\fR) of atoms.
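
 To make the DNF structure concrete, here is a minimal Python sketch that
 splits an expression into conjunctions of atoms and evaluates it.  The names
 parse_dnf and evaluate_dnf are hypothetical, and the atoms are left as
 uninterpreted strings; the caller supplies a function that decides whether
 each atom holds.

```python
def parse_dnf(expression):
    """Split an expression into a list of conjunctions, each a list of atoms."""
    return [
        [atom.strip() for atom in conjunction.split("&&")]
        for conjunction in expression.split("||")
    ]


def evaluate_dnf(clauses, atom_is_true):
    """Return True if any conjunction has all of its atoms true."""
    return any(all(atom_is_true(atom) for atom in clause) for clause in clauses)
```

 For example, parse_dnf("arg0 == 1 && arg1 == 2 || arg0 == 3") yields two
 conjunctions, the first with two atoms and the second with one.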

 .SS "Atom Syntax"
 Atoms are of the form \fIarg{DNUM} {OP} {VAL}\fR where:
 .IP
 \[bu] \fIDNUM\fR is a decimal number

 \[bu] \fIOP\fR is an unsigned comparison operator:
 \fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, \fI>=\fR, \fI&\fR (flags set),
 or \fIin\fR (inclusion)

 \[bu] \fIVAL\fR is a constant expression.  It can be a named constant (like
 \fBO_RDONLY\fR), a number (octal, decimal, or hexadecimal), a mask of constants
 separated by \fI|\fR, or a parenthesized constant expression. Constant
 expressions can also be prefixed with the bitwise complement operator \fI~\fR
 to produce their complement.
 .RE
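
 Because the comparison operators are unsigned, a value that a C program would
 treat as -1 compares as a very large number.  The Python sketch below
 illustrates this; the 64-bit argument width and the name compare_unsigned are
 assumptions made for the example.

```python
import operator

MASK64 = (1 << 64) - 1  # assumed argument width for this sketch

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def compare_unsigned(arg, op, val):
    """Apply a comparison operator after reducing both sides to unsigned."""
    return COMPARISONS[op](arg & MASK64, val & MASK64)
```

 Here compare_unsigned(-1, ">", 0) is true, so a rule such as "arg0 < 10"
 would not match an argument of -1.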

 \fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, and \fI>=\fR are
 self-explanatory.

 \fI&\fR will test for a flag being set, for example, O_NONBLOCK for
 .BR open (2):

   open: arg1 & O_NONBLOCK

 Minijail supports most common named constants, like O_RDONLY.
 It's preferable to use named constants rather than numeric values as not all
 architectures use the same numeric value.

 When the possible combinations of allowed flags grow, specifying them all can
 be cumbersome.
 This is where the \fIin\fR operator comes in handy.
 The system call will be allowed iff the flags set in the argument are included
 (as a set) in the flags in the policy:

   mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS

 This will allow \fBmmap\fR(2) as long as \fIarg3\fR (flags) has any combination
 of MAP_PRIVATE and MAP_ANONYMOUS, but nothing else.  One common use of this is
 to restrict \fBmmap\fR(2) / \fBmprotect\fR(2) to only allow write^exec
 mappings:

   mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
   mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
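
 The set-inclusion test behind \fIin\fR, and the write^exec rule built from it,
 can be sketched in Python as follows.  The function names are hypothetical,
 the 64-bit width used for the complement is an assumption, and the PROT_*
 values are the usual Linux ones hard-coded for illustration.

```python
MASK64 = (1 << 64) - 1  # assumed argument width for this sketch

# Usual Linux values, hard-coded here for illustration.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def flags_in(arg, allowed):
    """Return True if every bit set in arg is also set in allowed."""
    return (arg & ~allowed & MASK64) == 0


def complement(value):
    """Bitwise complement of value within the assumed 64-bit width."""
    return ~value & MASK64


def write_xor_exec(prot):
    """Mirror the rule: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE."""
    return (flags_in(prot, complement(PROT_EXEC))
            or flags_in(prot, complement(PROT_WRITE)))
```

 A protection value passes if it lacks PROT_EXEC or lacks PROT_WRITE, so only
 the combination of both is rejected.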

 .SS "Return Values"

 By default, blocked syscalls cause the process to be killed.
 The \fIreturn {NUM}\fR syntax can be used to force a specific errno to be
 returned instead.

   read: return EBADF

 This expression will block the \fBread\fR(2) syscall, make it return -1, and set
 \fBerrno\fR to EBADF (9 on x86 platforms).

 An expression can also include an optional \fIreturn <errno>\fR clause,
 separated by a semicolon:

   read: arg0 == 0; return EBADF

 That is, if the first argument to read is 0, then allow the syscall;
 else, block the syscall, return -1, and set \fBerrno\fR to EBADF.
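
 The three possible outcomes described in this section (allow, kill, or fail
 with an errno) can be summarized in a short Python sketch.  The names decide
 and read_rule and the tuple representation are hypothetical; only the
 behaviour described above is modelled.

```python
import errno


def decide(matched, errno_value=None):
    """Return the action for a syscall given whether its expression matched.

    errno_value is the value named in an optional return clause.
    """
    if matched:
        return ("allow", None)
    if errno_value is None:
        return ("kill", None)  # default for blocked syscalls
    return ("errno", errno_value)


def read_rule(arg0):
    """Mirror the rule: read: arg0 == 0; return EBADF."""
    return decide(arg0 == 0, errno.EBADF)
```

 With this model, read_rule(0) allows the call, while any other first argument
 yields the errno action carrying EBADF.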

 .SH SECCOMP_FILTER POLICY WRITING

 Determining policy for seccomp_filter can be time consuming.  System
 calls are often named in arch-specific or legacy-tainted ways, e.g.
 geteuid versus geteuid32.  On process death due to a seccomp filter
 rule, the offending system call number will be supplied with a best
 guess of the ABI defined name.  This information may be used to produce
 working baseline policies.  However, if the process being contained has
 a fairly tight working domain, using \fBtools/generate_seccomp_policy.py\fR
 with the output of \fBstrace -f -e raw=all <program>\fR can generate the list
 of system calls that are needed.  Note that when using libminijail or minijail
 with preloading, supporting initial process setup calls will not be required.
 Be conservative.
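
 The idea of deriving a baseline from a trace can be sketched in Python as
 follows.  This is not \fBtools/generate_seccomp_policy.py\fR: the function
 names are hypothetical, and the assumed input is plain strace output with
 one call per line, optionally prefixed with "[pid N]" as produced by \fB-f\fR.

```python
def syscalls_from_strace(lines):
    """Collect the syscall names that appear in strace output lines."""
    names = set()
    for line in lines:
        line = line.strip()
        if line.startswith("[pid"):
            # With -f, strace prefixes each line with "[pid N] ".
            line = line.partition("]")[2].strip()
        # A call looks like name(args) = result; skip signal and exit lines.
        name = line.partition("(")[0]
        if "(" in line and name.isidentifier():
            names.add(name)
    return sorted(names)


def baseline_policy(names):
    """Emit one allow-everything rule per syscall name."""
    return [name + ": 1" for name in names]
```

 The resulting rules use the "1" wildcard throughout, so they are only a
 starting point to be tightened by hand.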

 It's also possible to analyze the binary, checking all non-dead
 functions to determine whether any of them issue system calls.  There is
 no active implementation for this, but something like
 code.google.com/p/seccompsandbox is one possible runtime variant.

 .SH AUTHOR
 The Chromium OS Authors <chromiumos-dev@chromium.org>
 .SH COPYRIGHT
 Copyright \(co 2011 The Chromium OS Authors
 License BSD-like.
 .SH "SEE ALSO"
 .BR minijail0 (1)