Now that everyone and their grandmother has at least two cores, you can roughly double your throughput by distributing the workload. However, multithreading support in pure shell scripts is terrible, even though you often do things that can take a while, like encoding a bunch of chiptunes to Ogg Vorbis:
mkdir ogg
for file in *.mod
do
    xmp -d wav -o - "$file" | oggenc -q 3 -o "ogg/$file.ogg" -
done
This is exactly the kind of operation that is conceptually trivial to parallelize, but not obvious to implement in a shell script. Sure, you could run them all in the background and wait for them, but that will give you a load average equal to the number of files. Not fun when there are hundreds of files.
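A minimal sketch of that naive approach, with a stand-in encode function in place of the real xmp | oggenc pipeline (the pattern is the same for any command):

```shell
# Naive approach: fork one background job per file, then wait for all
# of them. With hundreds of files, hundreds of jobs run at once.
# "encode" is a stand-in for the xmp | oggenc pipeline.
encode() { echo "encoded: $1"; }

for file in a.mod b.mod c.mod; do
    encode "$file" &     # every job starts immediately
done
wait                     # blocks until all background jobs have exited
```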
You can run two (or however many) in the background, wait, and then start two more, but that'll give terrible performance when the jobs aren't of roughly equal length, since at the end, the longest-running job will be blocking the other eager cores.
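That lockstep version looks something like this (again with a stand-in for the real encoding command); the wait between batches is where time gets wasted:

```shell
# Run jobs two at a time, in lockstep batches.
# "encode" is a stand-in for the real pipeline; real jobs vary in
# length, so one core idles whenever its job finishes before the other.
encode() { echo "encoded: $1"; }

set -- a.mod b.mod c.mod d.mod   # assume an even number of files
while [ "$#" -gt 0 ]; do
    encode "$1" &
    encode "$2" &
    wait        # blocks until BOTH finish, even if one was much quicker
    shift 2
done
```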
Instead of listing ways that won't work, I'll get to the point: GNU (and FreeBSD) xargs has a -P option for specifying the number of jobs to run in parallel!
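You can see -P in action with something harmless like echo; the four invocations below run up to two at a time:

```shell
# Run echo on one argument at a time (-n 1), two processes at once
# (-P 2). With -P the completion order is not guaranteed, so sort
# the output to make it predictable.
printf '%s\n' one two three four | xargs -n 1 -P 2 echo | sort
```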
Let's rewrite that conversion loop to parallelize it:
mod2ogg() {
    for arg; do
        xmp -d wav -o - "$arg" | oggenc -q 3 -o "ogg/$arg.ogg" -
    done
}
export -f mod2ogg
find . -name '*.mod' -print0 | xargs -0 -n 1 -P 2 bash -c 'mod2ogg "$@"' --
And if we already had a mod2ogg script, similar to the function just defined, it would have been simpler:
find . -name '*.mod' -print0 | xargs -0 -n 1 -P 2 mod2ogg
Voilà. Twice as fast, and you can just increase the -P value with fancier hardware.
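Rather than hard-coding the number, you can match it to the machine. Assuming GNU coreutils' nproc is available (on the BSDs, sysctl -n hw.ncpu gives the same number):

```shell
# One job per available CPU. -r (--no-run-if-empty, a GNU/BSD
# extension) skips running the command entirely when find matches
# nothing, instead of invoking it once with no arguments.
find . -name '*.mod' -print0 | xargs -0 -r -n 1 -P "$(nproc)" mod2ogg
```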
I also added -n 1 to xargs here, to ensure an even distribution of work. If the work units are so small that the cost of starting the command becomes a sizable portion of them, you can increase -n to make xargs run mod2ogg with more files at a time (which is why the function loops over its arguments).
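The effect of a larger -n is easy to see with echo standing in for mod2ogg; here xargs passes up to two arguments per invocation:

```shell
# -n 2: each echo invocation receives up to two arguments. Without -P
# the invocations run sequentially, so the order is deterministic:
# prints "a b", "c d", "e" on three lines.
printf '%s\n' a b c d e | xargs -n 2 echo
```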