
Computers exist for doing work, usually useful, often not. In rare instances, it's useful to make a program do nothing at all. My primary use case for this is Docker containers, where it's useful to have the container do nothing, so it can be exec-d into as part of another process (eg VSCode dev containers). For that to work, the container needs a CMD which isn't the application itself, but is at least something to keep the container alive.

One of the most popular ways this is implemented is using tail -f /dev/null. Reading from a file which will never return any content will hang forever, and be pretty efficient doing it. It also has the benefit that because tail is part of coreutils, it's installed by default everywhere - no extra dependencies needed.

However, as with many popular techniques, it's not always the best. To understand why, we need to look at what tail is actually doing, using strace. This is far from the place to go into the specifics, but strace lets us see what syscalls a process makes when it talks to the kernel, whether that's reading files, loading shared libraries, or running subprocesses.

If we run tail -f /dev/null through strace, we get this:

Bash Session
$ strace tail -f /dev/null
execve("/usr/bin/tail", ["tail", "-f", "/dev/null"], 0x7ffc0bda28d0 /* 74 vars */) = 0
brk(NULL)                               = 0x55bdddfcf000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffca9262430) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=251875, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 251875, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe9d0e76000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20:\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1961632, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe9d0e74000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2006640, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fe9d0c8a000
mmap(0x7fe9d0cac000, 1429504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7fe9d0cac000
mmap(0x7fe9d0e09000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17f000) = 0x7fe9d0e09000
mmap(0x7fe9d0e61000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d6000) = 0x7fe9d0e61000
mmap(0x7fe9d0e67000, 52848, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fe9d0e67000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe9d0c87000
arch_prctl(ARCH_SET_FS, 0x7fe9d0c87740) = 0
set_tid_address(0x7fe9d0c87a10)         = 7628
set_robust_list(0x7fe9d0c87a20, 24)     = 0
rseq(0x7fe9d0c88060, 0x20, 0, 0x53053053) = 0
mprotect(0x7fe9d0e61000, 16384, PROT_READ) = 0
mprotect(0x55bddce41000, 4096, PROT_READ) = 0
mprotect(0x7fe9d0ee5000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7fe9d0e76000, 251875)          = 0
getrandom("\x34\xac\xfa\x57\xe0\x4d\x2f\x21", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x55bdddfcf000
brk(0x55bdddff0000)                     = 0x55bdddff0000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3057440, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 3057440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe9d0800000
close(3)                                = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY) = 3
newfstatat(3, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
read(3, "", 8192)                       = 0
newfstatat(3, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
fstatfs(3, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=1999709, f_bfree=1999709, f_bavail=1999709, f_files=1999709, f_ffree=1999121, f_fsid={val=[0x176964cd, 0x94f003ee]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_RELATIME}) = 0
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}, AT_EMPTY_PATH) = 0
newfstatat(AT_FDCWD, "/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_SYMLINK_NOFOLLOW) = 0
read(3, "", 8192)                       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192)                       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192)                       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192) 
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192) 
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192) 
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192)
...

This is quite a lot to read, and I don't entirely understand it all - but that's ok, because we don't need to. The important lines are the repeated lines at the bottom:

clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffca92622b0) = 0
read(3, "", 8192)

What we expect is that tail never receives content from /dev/null, and thus never does anything. As it turns out, this is completely false. What appears to actually be happening is tail tries to read the file, gets nothing back, waits for a second, then tries again.

The reason for this is the default behaviour of tail. When you pass -f, tail first reads the entire content of the file as-is. Once it's run out of content, it falls back to polling the file every second for changes, and displaying those. You can control the interval with -s. This is what we're seeing in strace. /dev/null isn't actually a file which never yields data, it's just an empty file when you read from it (the magic comes when you write to it).
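The read-then-sleep pattern from the trace can be imitated in a few lines of Python. This is only a sketch of the pattern, not tail's actual implementation: the function name `follow_by_polling` and its parameters are made up for illustration, and the interval is shortened so the demo finishes quickly.

```python
import os
import time


def follow_by_polling(path, polls, interval=0.01):
    """Read `path` repeatedly, sleeping `interval` seconds after each empty read.

    Returns the number of empty reads, i.e. wakeups that achieved nothing.
    """
    empty_reads = 0
    # buffering=0 gives raw reads, so each f.read() maps to one read() call.
    with open(path, "rb", buffering=0) as f:
        for _ in range(polls):
            chunk = f.read(8192)
            if chunk:
                continue  # a real tail would print the new content here
            empty_reads += 1
            time.sleep(interval)  # nothing new, so wait and try again
    return empty_reads


# Reading /dev/null always returns an empty result straight away,
# so every single poll is wasted work.
print(follow_by_polling(os.devnull, polls=5))  # prints 5
```

Every one of those wakeups is a trip in and out of the kernel that achieves nothing, which is exactly what the strace output shows.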

Whilst we might think our command is doing nothing, it's actually doing something. Sure, that something is very very little, but what if we want to do better, to achieve as close to nothing as is physically possible?

As much of a surprise as it may be, Windows does this surprisingly well. If you're writing a batch script, you could use the pause command to output the famous "Press any key to continue..." text, and wait for user input before continuing. It's not a perfect replacement for doing absolutely nothing, but it's pretty close. My knowledge of Windows kernel internals is pretty limited, so I'm not sure what exactly it's doing under the hood, but I'd hope it's not much. The bad news is, there's no pause command on Linux, but what there is is a pause syscall.

The pause syscall does exactly what it says on the tin: it pauses. But it does so in a very efficient way. pause suspends the process until it receives a signal. Because all of this happens in the kernel's process scheduler, the process itself does exactly nothing until that signal arrives, making it incredibly efficient.
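To see "does nothing until it receives a signal" in action, here's a small sketch using Python's signal.pause, which wraps the same call on Unix-like systems. The helper name `pause_until_alarm` is made up for this example, and it arms a timer first purely so the demo wakes up again rather than blocking forever.

```python
import signal
import time


def pause_until_alarm(delay):
    """Block in pause() until a SIGALRM arrives roughly `delay` seconds from now."""
    received = []

    def on_alarm(signum, frame):
        received.append(signum)

    # Install a handler first: the default action for SIGALRM kills the process.
    previous = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    try:
        # Ask the kernel to deliver SIGALRM after `delay` seconds...
        signal.setitimer(signal.ITIMER_REAL, delay)
        # ...then give up the CPU entirely until a signal is delivered.
        signal.pause()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer if still set
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return received, time.monotonic() - start


signals, elapsed = pause_until_alarm(0.2)
print(signals, round(elapsed, 1))
```

Between the call to pause and the signal arriving, the process isn't scheduled at all: there are no polls and no periodic wakeups.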

#How people normally wait

If you're developing a program of some kind, it's quite common that you need to wait a set amount of time for something to happen. Python has time.sleep, Rust has std::thread::sleep, and in the shell there's the sleep command. Pass it a time of some type, and sleep will stop executing the program until that amount of time has passed (or as close as the kernel can get). Because the kernel is aware of sleeping processes, there's no need to context switch back to the application, fractionally reducing system load.

If you want to sleep "forever", you could just pass a very big number to sleep, like 31557600 (1 year in seconds). 1 year is a long way from "forever", but it's also kinda close, so it might be "good enough" for your use case. "good enough" isn't good enough for me though, not this time - I still want the perfect solution.
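For the curious, that number works out if you take a year to be 365.25 days (my reading of where it comes from; any similarly large value would do just as well):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
DAYS_PER_YEAR = 365.25          # averages the leap day over four years

SECONDS_PER_YEAR = int(DAYS_PER_YEAR * SECONDS_PER_DAY)
print(SECONDS_PER_YEAR)  # 31557600
```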

We can get closer, but to do so, we need to talk about everyone's favourite: Floating Point.

#Floating Point

Floating point is a way for computers to represent both large and precise numbers relatively efficiently. I'm not going to try and explain it further than that, I'll let someone else:

A side effect of floating point is that it's possible to represent a few more interesting values. Most people are well aware of the number 0, but to computers, there are 2: -0 and +0. And, to make matters more interesting, especially for our needs, there's -Infinity and +Infinity.
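These values are easy to poke at from Python, whose float is a double-precision floating point number. A quick sketch:

```python
import math

pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)          # True: the two zeros compare equal...
print(math.copysign(1.0, neg_zero))  # -1.0: ...but the sign is still stored

inf = float("inf")
print(inf > 1e308)     # True: larger than any finite double
print(inf + 1 == inf)  # True: adding to infinity changes nothing
print(-inf < -1e308)   # True: the negative counterpart
```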

#To infinity, and beyond

Armed with this knowledge of floating point, can it be applied to sleep? Sleep's man page says it only takes numbers. But if we can represent infinity numerically, can we pass it in somehow? Because there's support in the underlying libraries, we can, but for completely undocumented reasons:

Bash Session
$ sleep infinity
...

It works, but contradicts the man pages. How?

Bash Session
$ strace sleep infinity
execve("/usr/bin/sleep", ["sleep", "infinity"], 0x7ffdad6c62e8 /* 74 vars */) = 0
brk(NULL)                               = 0x557949f43000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcd6850470) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=251775, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 251775, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f37e9547000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20:\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1961632, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37e9545000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2006640, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f37e935b000
mmap(0x7f37e937d000, 1429504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f37e937d000
mmap(0x7f37e94da000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17f000) = 0x7f37e94da000
mmap(0x7f37e9532000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d6000) = 0x7f37e9532000
mmap(0x7f37e9538000, 52848, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f37e9538000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37e9358000
arch_prctl(ARCH_SET_FS, 0x7f37e9358740) = 0
set_tid_address(0x7f37e9358a10)         = 6255
set_robust_list(0x7f37e9358a20, 24)     = 0
rseq(0x7f37e9359060, 0x20, 0, 0x53053053) = 0
mprotect(0x7f37e9532000, 16384, PROT_READ) = 0
mprotect(0x557949e3e000, 4096, PROT_READ) = 0
mprotect(0x7f37e95b6000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f37e9547000, 251775)          = 0
getrandom("\x4e\x4d\x8e\x33\x89\x0f\x1a\xeb", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x557949f43000
brk(0x557949f64000)                     = 0x557949f64000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3057440, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 3057440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f37e9000000
close(3)                                = 0
pause(

It seems the coreutils (where the sleep command comes from) developers are one step ahead of us. You might think it'd just pass an incredibly large number to clock_nanosleep, but instead, it does better. sleep infinity is smart enough to know "infinity" will never come, and simply use pause instead - a kind of infinite sleep when you think about it. This is the perfect solution we've been looking for.

#How to sleep for infinity

A little tangent into the depths of coreutils and something I rarely enjoy: reading C.

<tangent>

Sent from my Python-powered web server, through Nginx, and originally written in an Electron app on a linux laptop.

</tangent>

The sleep command takes a single argument, "The duration, in seconds", according to its man page. Because humans hate large numbers, suffixes are also supported (eg 6h).

Under the hood, sleep calls xstrtod, an alternative to strtod - both of which convert a string to a double. The exact implementation isn't important, but there are a few special cases, specifically when the input string is "Infinity" (case-insensitive). In this case, the returned value is Infinity, represented as a double. This value is then passed to xnanosleep, which, in environments that support pause, calls it when the requested number of seconds is far too large.
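The flow described above can be sketched in Python. To be clear, this is not the coreutils code: `parse_duration` and `choose_wait` are hypothetical names, the suffix handling is simplified, and the "far too large" threshold is an assumption (the maximum of a 64-bit signed time_t) rather than something taken from xnanosleep.

```python
import math

# Suffix multipliers: seconds, minutes, hours, days.
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Assumption: a 64-bit signed time_t. The real check lives in xnanosleep.
TIME_T_MAX = 2**63 - 1


def parse_duration(arg):
    """Turn '90', '6h' or 'Infinity' into a number of seconds (a float)."""
    multiplier = 1
    if arg[-1:] in SUFFIX_SECONDS:
        multiplier = SUFFIX_SECONDS[arg[-1]]
        arg = arg[:-1]
    # Like strtod, float() accepts "inf" and "infinity" in any letter case.
    seconds = float(arg) * multiplier
    if math.isnan(seconds) or seconds < 0:
        raise ValueError("invalid time interval")
    return seconds


def choose_wait(seconds):
    """Pick the syscall this sketch would use for the given duration."""
    return "pause" if seconds >= TIME_T_MAX else "clock_nanosleep"


for arg in ("90", "6h", "Infinity"):
    seconds = parse_duration(arg)
    print(arg, seconds, choose_wait(seconds))
```

The interesting part is that nothing here needs to special-case the word "infinity": the string-to-double conversion produces Infinity on its own, and the "is this too big to sleep for?" check then catches it.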

Diving further to find out how pause actually works requires passing through libc and on into the kernel's process scheduler, 2 places I'd rather not go - it's scary down there!

#Pause in other places

pause itself is a syscall, so it can be called from many different languages, so long as they have an interface.

If you need to call pause from other languages, there are likely options, such as in Python and Rust. However, if you're building an application, there are likely alternatives to consider, or some other kind of processing or error checking your process should be doing instead. Using pause isn't necessarily perfect, and has some disadvantages.
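Even without a ready-made wrapper, anything that can call into libc can reach pause. As a sketch of that, here it is called from Python through ctypes instead of the signal module. The helper name `libc_pause_until_alarm` is made up, loading libc with `CDLL(None)` is an assumption that holds on typical Linux systems, and as before a timer is armed so the demo returns.

```python
import ctypes
import errno
import signal


def libc_pause_until_alarm(delay):
    """Call libc's pause() through ctypes and report how it returned."""
    # CDLL(None) exposes the symbols already loaded into the interpreter,
    # which on Linux includes libc. use_errno lets us read errno afterwards.
    libc = ctypes.CDLL(None, use_errno=True)

    # A do-nothing handler, so SIGALRM interrupts pause() instead of killing us.
    previous = signal.signal(signal.SIGALRM, lambda signum, frame: None)
    try:
        signal.setitimer(signal.ITIMER_REAL, delay)
        result = libc.pause()  # blocks here until a signal has been caught
        err = ctypes.get_errno()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
    return result, err


result, err = libc_pause_until_alarm(0.2)
print(result, errno.errorcode.get(err))
```

According to its man page, pause only ever returns once a signal has been caught, and when it does it returns -1 with errno set to EINTR, so "failure" is the only outcome it has.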

In a shell script, as I mentioned, there's no pause binary like there is on Windows, so you'd need to run sleep infinity yourself.

#GNU's Not Universal

I've been talking about Linux and GNU's coreutils like it's everything - but it's not. For example, there's BSD.

BSD, namely FreeBSD and OpenBSD, both have their own implementations of libc, so have pause methods to influence the kernel's scheduler. Similarly, both ship with implementations of sleep. However, both are strict and sensible enough to follow their own man pages. Neither supports sleep infinity. If you want to pause from a bash script, you'll need to find another way.
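If you're not sure which flavour of sleep a system has, one option is to simply try it. The sketch below is a heuristic, not an established technique: `supports_sleep_infinity` is a made-up helper, and it assumes that a sleep which rejects "infinity" exits quickly, while one that accepts it is still running when the probe times out.

```python
import subprocess


def supports_sleep_infinity(probe_seconds=0.5):
    """Guess whether the local `sleep` accepts the argument "infinity"."""
    try:
        subprocess.run(
            ["sleep", "infinity"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=probe_seconds,
        )
    except subprocess.TimeoutExpired:
        # Still sleeping when the probe ended, so the argument was accepted.
        # subprocess.run() kills the child before raising this exception.
        return True
    except FileNotFoundError:
        return False  # there's no `sleep` on PATH at all
    # The command exited on its own, so it can't have slept forever.
    return False


print(supports_sleep_infinity())
```

Where this reports False, the pause call itself is still there, so a tiny program in almost any language can fill the gap.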

#Application

Now we know it exists, we can stop using tail -f /dev/null, and use sleep infinity, for the smallest possible micro optimisation.

If this change makes any difference to your workflow, you're probably doing something else wrong. If you just want to open a few "technically correct" Pull Requests, then now you have something to link to!
