If you’ve ever tried to use ssh, and similarly ffmpeg or mplayer, in a while read loop, you’ll have stumbled upon a surprising interaction: the loop mysteriously aborts after the first iteration!
The solution, using ssh -n or ssh < /dev/null, is quickly spoiled by ShellCheck, but why stop there? Let’s take a deep dive into the technical details surrounding this issue.
Note that all numbers given here are highly tool- and platform-specific. They apply to GNU coreutils 8.26, Linux 4.9 and OpenSSH 7.4, as found on Debian in July 2017. If you use a different platform, or even just sufficiently newer versions, and wish to repeat the experiments, you may have to follow the reasoning and update the numbers accordingly.
Anyway, to first demonstrate the problem, here’s a while read loop that runs a command for each line in a file:
while IFS= read -r host
do
  echo ssh "$host" uptime
done < hostlist.txt
It works perfectly and iterates over all lines in the file:
ssh localhost uptime
ssh 10.0.0.4 uptime
ssh 10.0.0.7 uptime
However, if we remove the echo and actually run ssh, it will stop after the first iteration with no warnings or errors:
16:12:41 up 21 days, 4:24, 12 users, load average: 0.00, 0.00, 0.00
Running plain uptime in the loop works fine, but ssh localhost uptime stops after the first iteration, even though it runs the same command on the same machine.
Of course, applying the aforementioned fix, ssh -n or ssh < /dev/null, solves the problem and gives the expected result:
16:14:11 up 21 days, 4:24, 12 users, load average: 0.00, 0.00, 0.00
16:14:11 up 14 days, 6:59, 15 users, load average: 0.00, 0.00, 0.00
01:14:13 up 73 days, 13:17, 8 users, load average: 0.08, 0.15, 0.11
If we merely wanted to fix the problem though, we'd just have followed ShellCheck's advice from the start. Let's keep digging.
You see similar behavior if you try to use ffmpeg to convert media or mplayer to play it. However, now it's even worse: not only does it stop after one iteration, it may abort in the middle of the first one!
Most other commands work fine -- even other converters, players and ssh-based commands like sox, vlc and scp. Why do certain commands fail?
The root of the problem is that when you pipe or redirect to a while read loop, you're not just redirecting to read but to the entire loop body. Everything in both condition and body will share the same file descriptor for standard input. Consider this loop:
while IFS= read -r line
do
  cat > rest
done < file.txt
First, read successfully reads a line and starts the first iteration. Then cat reads from the same input source, where read left off. It reads until EOF and exits, and the loop iterates. read again tries to read from the same input, which remains at EOF. This terminates the loop. In effect, the loop only iterated once.
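The same effect can be reproduced without any network access. The following is a minimal sketch, assuming a throwaway three-line file created with mktemp (the file and all variable names are made up for illustration). It counts iterations when the body shares stdin with read, when the body's stdin is redirected from /dev/null, and when read is given its own file descriptor so the list never sits on stdin at all:

```bash
#!/usr/bin/env bash
# Throwaway three-line input file (hypothetical, for illustration only).
input=$(mktemp)
printf '%s\n' one two three > "$input"

# 1. Body shares stdin with read: cat drains the remaining lines.
shared=0
while IFS= read -r line
do
  cat > /dev/null
  shared=$((shared + 1))
done < "$input"

# 2. Body's stdin comes from /dev/null: there is nothing for it to steal.
redirected=0
while IFS= read -r line
do
  cat > /dev/null < /dev/null
  redirected=$((redirected + 1))
done < "$input"

# 3. read takes the list from FD 3, so the body's stdin is unrelated to it.
#    The here-document stands in for whatever stdin the script happens to have.
separate=0
while IFS= read -r line <&3
do
  cat > /dev/null
  separate=$((separate + 1))
done 3< "$input" <<EOF
unrelated input
EOF

echo "shared stdin:   $shared iteration(s)"
echo "from /dev/null: $redirected iteration(s)"
echo "list on FD 3:   $separate iteration(s)"
rm -f "$input"
```

With this input, the first loop runs once and the other two run three times. The FD 3 variant is worth considering when the body genuinely needs to read from the terminal or from the script's own stdin.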
The question remains, though: why do our three commands in particular drain stdin?
ffmpeg and mplayer are simple cases. They both accept keyboard controls from stdin.
While ffmpeg encodes a video, you can use '+' to make the process more verbose or 'c' to input filter commands. While mplayer plays a video, you can use 'space' to pause or 'm' to mute. The commands drain stdin while processing these keystrokes.
They both also share a shortcut to quit: they will stop abruptly if any of the input they read is a "q".
But why ssh? Shouldn't it mirror the behavior of the remote command? If uptime doesn't read anything, why should ssh localhost uptime?
The Unix process model has no good way to detect when a process wants input. Instead, ssh has to preemptively read data, send it over the wire, and have sshd offer it on a pipe to the process. If the process doesn't want it, there's no way to return the data to the FD from whence it came.
We get a toy version of the same problem with cat | uptime. Output in this case is the same as when using ssh localhost uptime:
16:25:51 up 21 days, 4:34, 12 users, load average: 0.16, 0.03, 0.01
In this case, cat will read from stdin and write to the pipe until the pipe's buffer is full, at which point it'll block until something reads. Using strace, we can see that GNU cat from coreutils 8.26 uses a 128KiB buffer -- more than Linux's current 64KiB pipe buffer -- so one 128KiB buffer is the amount of data we can expect to lose.
This implies that the loop doesn't actually abort. It will continue if there is still data left after 128KiB has been read from it. Let's try that:
{
  echo first
  for ((i=0; i < 16384; i++)); do echo "garbage"; done
  echo "second"
} > file
while IFS= read -r line
do
  echo "Read $line"
  cat | uptime > /dev/null
done < file
Here, we write 16386 lines to the file. "first", 16384 lines of "garbage", followed by "second". "garbage" + linefeed is 8 bytes, so 16384 of them make up exactly 128KiB. The file prevents any race conditions between producer and consumer.
Here's what we get:
Read first
Read second
If we add a single additional line of "garbage", we'll see that instead of "second". If we write one fewer, "second" disappears. In other words, the expected 128KiB of data were lost between iterations.
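Since the 128KiB figure depends on the cat and kernel versions, it can be useful to measure the loss directly rather than assume it. The following is a rough sketch under a few assumptions: the 4 MiB size is arbitrary, the file comes from mktemp, and sleep 1 stands in for a command that never reads its input. It lets cat run into a full pipe and then counts how much of the file is still unread:

```bash
#!/usr/bin/env bash
# Throwaway 4 MiB file of zeros (size chosen arbitrarily for this sketch).
size=$((4 * 1024 * 1024))
input=$(mktemp)
head -c "$size" /dev/zero > "$input"

{
  # cat reads ahead, fills the pipe, and is killed once sleep exits
  # without having read anything.
  cat | sleep 1
  # Whatever cat did not consume is still waiting on the shared stdin.
  remaining=$(cat | wc -c)
} < "$input"

consumed=$((size - remaining))
echo "cat consumed $consumed bytes that nobody used"
rm -f "$input"
```

On the versions described in this article the result should match the 128KiB buffer, but other platforms may well report a different number.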
ssh has to do the same thing as cat, but more: it has to read input, encrypt it, and transmit it over the wire. On the other side, sshd receives it, decrypts it, and feeds it into the pipe. Both sides work asynchronously in duplex mode, and one side can shut down the channel at any time.
If we use ssh localhost uptime we're racing to see how much data we can push before sshd notifies us that the command has already exited. The faster the computer and the longer the round-trip time, the more we can write. To avoid this and ensure deterministic results, we'll use sleep 5 instead of uptime from now on.
Here's one way of measuring how much data we write:
$ tee >(wc -c >&2) < /dev/zero | { sleep 5; }
65536
Of course, this shows how much tee writes, not how much sleep reads: the 65536 bytes here is the Linux pipe buffer size.
This is also not a general way to get exact measurements because it relies on buffers aligning perfectly. If nothing is reading from the pipe, you can successfully write two blocks of 32768 bytes, but only one block of 32769.
Fortunately, GNU tee currently uses a buffer size of 8192, so given 8 full reads, it will perfectly fill the 65536 byte pipe buffer. strace also reveals that ssh (OpenSSH 7.4) uses a buffer size of 16384, which is exactly 2x that of tee and 1/4x that of the pipe buffer, so they will all align nicely and give an accurate count.
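The alignment argument can be checked with a little arithmetic. This sketch uses the simplified model described above, in which a writer only gets credit for blocks that fit into the pipe buffer in full; the counted helper is hypothetical and exists only for this illustration:

```bash
#!/usr/bin/env bash
pipe_buf=65536   # Linux pipe buffer size quoted above

# Bytes counted when only complete blocks of size $1 fit into the pipe.
counted() {
  echo $(( pipe_buf / $1 * $1 ))
}

for block in 8192 16384 32768 32769
do
  echo "block size $block: $(counted "$block") bytes counted"
done
```

The first three block sizes account for all 65536 bytes, while 32769 accounts for only 32769, which is why the measurement is only exact when the buffer sizes divide evenly.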
Let's try the same measurement with ssh:
$ tee >(wc -c >&2) < /dev/zero | ssh localhost sleep 5
2228224
As discussed, we'll subtract the pipe buffer, so we can surmise that 2162688 bytes have been read by ssh. We can verify this manually with strace if we want. But why 2162688?
On the other side, sshd has to feed this data into sleep through a pipe with no readers. That's another 65536. We're now left with 2097152 bytes. How can we account for these?
This number is in fact the OpenSSH transport layer's default window size for non-interactive channels!
Here's an excerpt from channels.h in the OpenSSH source code:
/* default window/packet sizes for tcp/x11-fwd-channel */
#define CHAN_SES_PACKET_DEFAULT (32*1024)
#define CHAN_SES_WINDOW_DEFAULT (64*CHAN_SES_PACKET_DEFAULT)
There it is: 64*32*1024 = 2097152.
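For readers who want to retrace the accounting, here it is as shell arithmetic. The variable names are made up for this sketch; the numbers are the ones measured and quoted above:

```bash
#!/usr/bin/env bash
total=2228224                 # bytes reported by wc -c in the ssh measurement
pipe_buf=65536                # Linux pipe buffer size
window=$((64 * 32 * 1024))    # CHAN_SES_WINDOW_DEFAULT from channels.h

read_by_ssh=$((total - pipe_buf))        # minus the local pipe between tee and ssh
unaccounted=$((read_by_ssh - pipe_buf))  # minus the remote pipe between sshd and sleep

echo "read by ssh:      $read_by_ssh"
echo "left to explain:  $unaccounted"
echo "channel window:   $window"
```

The last two lines print the same value, 2097152, so the two pipe buffers plus one channel window cover the whole measurement.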
If we adapt our earlier cat | uptime example to use ssh localhost sleep 5 and write "garbage" (64*32*1024+65536)/8 = 270336 times, we can again game the buffers and cause our iterator to get exactly the lines we want:
{
  echo first
  for ((i=0; i < $(( (64*32*1024 + 65536) / 8)); i++)); do echo "garbage"; done
  echo "second"
} > file
while IFS= read -r line
do
  echo "Read $line"
  ssh localhost sleep 5
done < file
Again, this results in:
Read first
Read second
An entirely useless experiment of course, but pretty nifty!