# Having fun with seccomp profiles on the edge

The [Security Profiles Operator (SPO)][spo] is a feature-rich
[operator][operator] for Kubernetes to make managing seccomp, SELinux and
AppArmor profiles easier than ever. Recording those profiles from scratch is one
of the key features of this operator, and it usually involves integration into
large CI/CD systems. Testing the recording capabilities of the operator in edge
cases is one of the recent development efforts of the SPO, and it makes playing
around with seccomp profiles excitingly easy.

[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
[operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator

## Recording seccomp profiles with __spoc record__ 

The [v0.8.0][spo-latest] release of the Security Profiles Operator shipped a new
command line interface called __spoc__, a little helper tool for recording and
replaying seccomp profiles, among various other things that are out of scope for
this blog post.

[spo-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/v0.8.0

Recording a seccomp profile requires a binary to execute. This can be a simple
Go application that just calls [__uname(2)__][uname]:

```go
package main

import (
	"syscall"
)

func main() {
	utsname := syscall.Utsname{}
	if err := syscall.Uname(&utsname); err != nil {
		panic(err)
	}
}
```

[uname]: https://man7.org/linux/man-pages/man2/uname.2.html

The code can be built into a statically linked binary with:

```console
> go build -o main main.go
> ldd ./main
        not a dynamic executable
```

Now it's possible to download the latest binary of [__spoc__ from
GitHub][spoc-latest] and run the application on Linux with it:

[spoc-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/download/v0.8.0/spoc.amd64

```console
> sudo ./spoc record ./main
10:08:25.591945 Loading bpf module
10:08:25.591958 Using system btf file
libbpf: loading object 'recorder.bpf.o' from buffer
…
libbpf: prog 'sys_enter': relo #3: patched insn #22 (ALU/ALU64) imm 16 -> 16
10:08:25.610767 Getting bpf program sys_enter
10:08:25.610778 Attaching bpf tracepoint
10:08:25.611574 Getting syscalls map
10:08:25.611582 Getting pid_mntns map
10:08:25.613097 Module successfully loaded
10:08:25.613311 Processing events
10:08:25.613693 Running command with PID: 336007
10:08:25.613835 Received event: pid: 336007, mntns: 4026531841
10:08:25.613951 No container ID found for PID (pid=336007, mntns=4026531841, err=unable to find container ID in cgroup path)
10:08:25.614856 Processing recorded data
10:08:25.614975 Found process mntns 4026531841 in bpf map
10:08:25.615110 Got syscalls: read, close, mmap, rt_sigaction, rt_sigprocmask, madvise, nanosleep, clone, uname, sigaltstack, arch_prctl, gettid, futex, sched_getaffinity, exit_group, openat
10:08:25.615195 Adding base syscalls: access, brk, capget, capset, chdir, chmod, chown, close_range, dup2, dup3, epoll_create1, epoll_ctl, epoll_pwait, execve, faccessat2, fchdir, fchmodat, fchown, fchownat, fcntl, fstat, fstatfs, getdents64, getegid, geteuid, getgid, getpid, getppid, getuid, ioctl, keyctl, lseek, mkdirat, mknodat, mount, mprotect, munmap, newfstatat, openat2, pipe2, pivot_root, prctl, pread64, pselect6, readlink, readlinkat, rt_sigreturn, sched_yield, seccomp, set_robust_list, set_tid_address, setgid, setgroups, sethostname, setns, setresgid, setresuid, setsid, setuid, statfs, statx, symlinkat, tgkill, umask, umount2, unlinkat, unshare, write
10:08:25.616293 Wrote seccomp profile to: /tmp/profile.yaml
10:08:25.616298 Unloading bpf module
```

I have to run __spoc__ as root because it internally runs an [eBPF][ebpf]
program, reusing the same code as the Security Profiles Operator itself. The
log shows that the BPF module was loaded successfully and that __spoc__
attached the required tracepoint. It then tracks the main application by its
[mount namespace][mntns] and processes the recorded syscall data. eBPF programs
see the whole context of the kernel, which means that __spoc__ observes all
syscalls on the system but does not interfere with their execution.

[ebpf]: https://ebpf.io
[mntns]: https://man7.org/linux/man-pages/man7/mount_namespaces.7.html

The logs indicate that __spoc__ found the syscalls __read__, __close__,
__mmap__ and so on, including __uname__. All syscalls other than __uname__ come
from the Go runtime and its garbage collection, which already adds overhead
to a basic application like the one in our demo. I can also see from the log
line __Adding base syscalls: …__ that __spoc__ adds a bunch of base syscalls to
the resulting profile. Those are used by the OCI runtime (like [runc][runc] or
[crun][crun]) in order to be able to run a container. This means that __spoc__
can be used to record seccomp profiles which can then be used for containers
directly. This behavior can be disabled in __spoc__ by using the
__--no-base-syscalls__ / __-n__ flag or customized via the
__--base-syscalls__ / __-b__ flag. This can be helpful when OCI runtimes other
than crun and runc are used, or if I just want to record the seccomp profile
for the application and stack it with another [base profile][base].

[runc]: https://github.com/opencontainers/runc
[crun]: https://github.com/containers/crun
[base]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#base-syscalls-for-a-container-runtime
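Conceptually, adding base syscalls amounts to taking the union of two lists. The
following Go program is a minimal sketch of that idea, not the actual __spoc__
implementation: the helper name __mergeSyscalls__ is hypothetical, and the two
input lists are shortened excerpts of the log output above.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeSyscalls returns the sorted union of the recorded and the base
// syscalls with duplicates removed. The name is hypothetical and only
// illustrates the idea; it is not taken from the spoc source code.
func mergeSyscalls(recorded, base []string) []string {
	seen := make(map[string]struct{}, len(recorded)+len(base))
	merged := make([]string, 0, len(recorded)+len(base))

	for _, list := range [][]string{recorded, base} {
		for _, name := range list {
			if _, ok := seen[name]; ok {
				continue // already part of the result
			}
			seen[name] = struct{}{}
			merged = append(merged, name)
		}
	}

	sort.Strings(merged)
	return merged
}

func main() {
	// Shortened excerpts of the "Got syscalls" and "Adding base syscalls"
	// log lines, used as sample input only.
	recorded := []string{"read", "close", "mmap", "uname", "openat"}
	base := []string{"access", "brk", "close_range", "execve", "write"}

	fmt.Println(mergeSyscalls(recorded, base))
}
```

Recording with the base syscalls disabled corresponds to passing an empty base
list in this sketch, which leaves only what the application itself used.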

The resulting profile is now available in __/tmp/profile.yaml__, but the default
location can be changed using the __--output-file__ / __-o__ flag:

```console
> cat /tmp/profile.yaml
```

```yaml
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  creationTimestamp: null
  name: main
spec:
  architectures:
    - SCMP_ARCH_X86_64
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:
        - access
        - arch_prctl
        - brk
        - …
        - uname
        - …
status: {}
```

The recorded seccomp profile is a custom resource based on the SeccompProfile
Custom Resource Definition (CRD), so it can be used directly with the Security
Profiles Operator to manage it within Kubernetes. __spoc__ can also produce raw
seccomp profiles (as JSON) by using the __--type__ / __-t__ flag with the value
__raw-seccomp__:

```console
> sudo ./spoc record --type raw-seccomp ./main
…
52.628827 Wrote seccomp profile to: /tmp/profile.json
```

```console
> jq . /tmp/profile.json
```

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["access", "…", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```
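To reason about such a raw profile programmatically, it can be decoded with the
Go standard library. The sketch below is an assumption-laden illustration: the
__Profile__ and __Syscall__ types only mirror the fields visible in the JSON
above, the helpers __parseProfile__ and __actionFor__ are hypothetical, and the
embedded sample uses a shortened syscall list instead of the real file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile mirrors only the fields visible in the JSON output above.
// It is not the type used by the Security Profiles Operator.
type Profile struct {
	DefaultAction string    `json:"defaultAction"`
	Architectures []string  `json:"architectures"`
	Syscalls      []Syscall `json:"syscalls"`
}

// Syscall is a single rule: a list of syscall names sharing one action.
type Syscall struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// parseProfile decodes a raw seccomp profile from JSON.
func parseProfile(data []byte) (Profile, error) {
	var p Profile
	err := json.Unmarshal(data, &p)
	return p, err
}

// actionFor returns the action of the first rule that lists the syscall
// name and falls back to the default action of the profile otherwise.
func (p Profile) actionFor(name string) string {
	for _, rule := range p.Syscalls {
		for _, n := range rule.Names {
			if n == name {
				return rule.Action
			}
		}
	}
	return p.DefaultAction
}

func main() {
	// Shortened sample profile for illustration, not the recorded file.
	raw := []byte(`{
	  "defaultAction": "SCMP_ACT_ERRNO",
	  "architectures": ["SCMP_ARCH_X86_64"],
	  "syscalls": [
	    {"names": ["access", "uname", "write"], "action": "SCMP_ACT_ALLOW"}
	  ]
	}`)

	profile, err := parseProfile(raw)
	if err != nil {
		panic(err)
	}

	for _, name := range []string{"uname", "mount"} {
		fmt.Printf("%s: %s\n", name, profile.actionFor(name))
	}
}
```

In this sample, __uname__ resolves to the allow rule, while __mount__ is not
listed and therefore falls back to the default action.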

__spoc record__ allows us to record complex seccomp profiles directly from
binary invocations on any Linux system that is capable of running the eBPF code
within the kernel. But it can do more: how about modifying the seccomp profile
and then testing it with __spoc run__?

## Running seccomp profiles with __spoc run__ 

__spoc__ is also able to run binaries with applied seccomp profiles, making it
easy to test any modification to them. To do that, just run:

```console
> sudo ./spoc run ./main
10:29:58.153263 Reading file /tmp/profile.yaml
10:29:58.153311 Assuming YAML profile
10:29:58.154138 Setting up seccomp
10:29:58.154178 Load seccomp profile
10:29:58.154189 Starting audit log enricher
10:29:58.154224 Enricher reading from file /var/log/audit/audit.log
10:29:58.155356 Running command with PID: 437880
>
```

It looks like the application exited successfully, which is expected because I
have not modified the previously recorded profile yet. I can also specify a
custom location for the profile by using the __--profile__ / __-p__ flag, but
this was not necessary because I did not change the default output location of
the recording. __spoc__ automatically determines whether it is a raw (JSON) or
CRD (YAML) based seccomp profile and then applies it to the process.

The Security Profiles Operator supports a [log enricher feature][enricher],
which provides additional seccomp-related information by parsing the audit logs.
__spoc run__ uses the enricher in the same way to provide more data to end
users when debugging seccomp profiles.

[enricher]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#using-the-log-enricher

Now I have to modify the profile to see anything valuable in the output. For
example, I could remove the allowed __uname__ syscall:

```console
> jq 'del(.syscalls[0].names[] | select(. == "uname"))' /tmp/profile.json > /tmp/no-uname-profile.json
```

And then try to run it again with the new profile __/tmp/no-uname-profile.json__:

```console
> sudo ./spoc run -p /tmp/no-uname-profile.json ./main
10:39:12.707798 Reading file /tmp/no-uname-profile.json
10:39:12.707892 Setting up seccomp
10:39:12.707920 Load seccomp profile
10:39:12.707982 Starting audit log enricher
10:39:12.707998 Enricher reading from file /var/log/audit/audit.log
10:39:12.709164 Running command with PID: 480512
panic: operation not permitted

goroutine 1 [running]:
main.main()
        /path/to/main.go:10 +0x85
10:39:12.713035 Unable to run: launch runner: wait for command: exit status 2
```

Alright, that was expected! The applied seccomp profile blocks the __uname__
syscall, which results in an "operation not permitted" error. This error is
pretty generic and gives no hint about what was blocked by seccomp. It is
generally extremely difficult to predict how applications behave if single
syscalls are forbidden by seccomp. The application may terminate, as in our
simple demo, but it may also misbehave in strange ways and not stop at all.

If I now change the default seccomp action of the profile from __SCMP_ACT_ERRNO__ 
to __SCMP_ACT_LOG__ like this:

```console
> jq '.defaultAction = "SCMP_ACT_LOG"' /tmp/no-uname-profile.json > /tmp/no-uname-profile-log.json
```

Then the log enricher will give us a hint that the __uname__ syscall is not
allowed by the profile when using __spoc run__:

```console
> sudo ./spoc run -p /tmp/no-uname-profile-log.json ./main
10:48:07.470126 Reading file /tmp/no-uname-profile-log.json
10:48:07.470234 Setting up seccomp
10:48:07.470245 Load seccomp profile
10:48:07.470302 Starting audit log enricher
10:48:07.470339 Enricher reading from file /var/log/audit/audit.log
10:48:07.470889 Running command with PID: 522268
10:48:07.472007 Seccomp: uname (63)
```

The application no longer panics, but seccomp logs the behavior to
__/var/log/audit/audit.log__ and __spoc__ parses the data to correlate it
directly with our program. Writing log messages to the audit subsystem comes
with a large performance overhead and should be handled with care in production
systems. It also comes with a security risk when running untrusted apps in
audit mode in production environments, because the logged syscalls are not
blocked.
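To illustrate what correlating audit data with a process can look like, the
sketch below extracts the PID and syscall number from a single
__type=SECCOMP__ audit record. It is not the enricher's actual implementation:
the regular expression, the __parseSeccompLine__ helper, the one-entry lookup
table and the sample record (with made-up timestamp and field values) are all
assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// seccompLine matches the pid and syscall fields of a type=SECCOMP record.
var seccompLine = regexp.MustCompile(`type=SECCOMP .*\bpid=(\d+) .*\bsyscall=(\d+)`)

// syscallNames is a tiny hypothetical lookup table for x86_64. It contains
// only the entry visible in the output above (uname is syscall 63).
var syscallNames = map[int]string{63: "uname"}

// event holds the fields extracted from one audit record.
type event struct {
	PID     int
	Syscall int
}

// parseSeccompLine extracts PID and syscall number from an audit log line.
// The boolean result is false if the line is not a matching SECCOMP record.
func parseSeccompLine(line string) (event, bool) {
	m := seccompLine.FindStringSubmatch(line)
	if m == nil {
		return event{}, false
	}
	pid, err := strconv.Atoi(m[1])
	if err != nil {
		return event{}, false
	}
	nr, err := strconv.Atoi(m[2])
	if err != nil {
		return event{}, false
	}
	return event{PID: pid, Syscall: nr}, true
}

func main() {
	// Illustrative record, not copied from a real audit log.
	sample := `type=SECCOMP msg=audit(1700000000.000:100): auid=1000 uid=0 gid=0 ses=1 pid=522268 comm="main" exe="/path/to/main" sig=0 arch=c000003e syscall=63 compat=0 ip=0x0 code=0x7ffc0000`

	if ev, ok := parseSeccompLine(sample); ok {
		fmt.Printf("pid %d: %s (%d)\n", ev.PID, syscallNames[ev.Syscall], ev.Syscall)
	}
}
```

Matching the extracted PID against the PID of the started command is what
allows a record to be attributed to one specific program run.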

This demo should give you an impression of how to debug seccomp profile issues
with applications, for example by using our shiny new helper tool powered by the
features of the Security Profiles Operator. __spoc__ is a flexible and portable
binary suitable for edge cases where resources are limited and even Kubernetes
itself may not be available with its full capabilities.

Thank you for reading this blog post! If you're interested in learning more,
providing feedback, or asking for help, then feel free to get in touch with us
directly via [Slack (#security-profiles-operator)][slack] or the
[mailing list][mail].

[slack]: https://kubernetes.slack.com/messages/security-profiles-operator
[mail]: https://groups.google.com/forum/#!forum/kubernetes-dev