RAM-loadable Linux on a stick

I wanted to play some SNES games with a friend on one of a dozen public Windows boxes, but I didn’t want to start downloading ROMs and installing ZSNES on them. The simple solution was to make a bootable USB memory stick with Ubuntu and boot from that on whichever box was available at the time.

The boxes turned out to have more horsepower than I assumed, and conveniently came with Linux-friendly Intel GPUs, so I wanted to try out OpenArena. Of course, then you need multiple Windows boxes, and I just had one memory stick. Time to make it load and run entirely from memory, so the memory stick can be unplugged and used to boot other boxes.

Thanks to the fantastic initramfs mechanism, the best Linux feature since UUID partition selection (initrd wasn’t nearly as sweet), this is very easy to do, even when the distro doesn’t support it. Here are some hints on how to do it:

  1. Install on a memory stick. These days, you can conveniently do this in a VM and still expect it to boot on a PC (here /dev/sdb is the memory stick): kvm -hda /dev/sdb -cdrom somedistro.iso -boot d -m 2200 -net nic -net user -k en-us. A minimal install is preferable; loading GNOME from a slow memory stick just to cover it with OpenArena is a waste.
  2. Ubuntu installs GRUB with UUID partitions, but Debian does not. Since the stick won’t be /dev/hda on the real machines, in that case you have to update menu.lst: replace root=/dev/hda1 with root=UUID=<uuid from tune2fs -l here>
  3. Debian has a fancy system for adding initramfs hooks (see /etc/initramfs-tools) that will survive kernel upgrades, but for generality (and not laziness at all, no siree), we’ll do it the hacked-up manual way: make a new directory, cd into it, and unpack the initramfs: gzip -d < /boot/initrd.img-2.6.26-2-686 | cpio -id
  4. Edit init (vim init). Find the place where the root fs has just been mounted (initramfs-tools keeps the mount point in $rootmnt), and add some code to mount --move it out of the way, mount a tmpfs big enough to hold all the files in its place, copy all the files from the real root, and then unmount it:

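    # Give the user 3 seconds to opt out of loading into RAM.
    # read -t/-n are not POSIX; this assumes the initramfs shell
    # (e.g. BusyBox ash) supports them.
    echo "Press any key within 3 seconds to skip loading into RAM"
    if ! read -t 3 -n 1 k
    then
        realroot=/tmp/realroot

        # Move the real root aside and put a tmpfs in its place
        mkdir -p "$realroot"
        mount --move "$rootmnt" "$realroot"
        mount -t tmpfs -o size=90% none "$rootmnt"
        echo
        echo "Copying files, wait..."
        cp -a "$realroot"/* "$rootmnt"
        # Everything is in RAM now, so the stick can be let go
        umount "$realroot"
        echo "Done"
    fi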
    

    Exercise for the reader: add a progress meter to make the 1-2 minute load time more bearable.

  5. From the same directory, pack the initramfs back up: find . | cpio -o -H newc | gzip -9 > /boot/initrd.img-2.6.26-2-686
  6. Boot (still in the VM, if you want) and hit a key when prompted so you're running straight from the stick, install all the packages you want, and configure them the way you want them. In my case, I made the stick boot straight into X, running fluxbox and iDesk to make a big shiny Exit icon that would reboot the box (returning it to Windows), just in case any laymen wandered in on it.
  7. Very important: apt-get clean. I had 500MB of cached packages the first time around, which is half a gig of lost memory and an additional minute of load time.
  8. Try booting it from RAM. When configuring, always keep track of whether you’re running from RAM or from the stick: changes made while running from RAM are lost on reboot.
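To help with step 8, here is a minimal sketch for checking whether the running system was loaded into RAM. It assumes that, after the init hack, / shows up as a tmpfs in /proc/mounts (where the last entry for a mount point is the one on top); the names fstype_at and running_from_ram are made up for this example.

```shell
# Print the filesystem type mounted at mount point $1, reading a table in
# /proc/mounts format ($2, default /proc/mounts). If several things are
# mounted on the same point, the last (topmost) entry wins.
fstype_at() {
    local mnt=$1 table=${2:-/proc/mounts}
    local dev dir fstype rest found=
    while read -r dev dir fstype rest; do
        [[ $dir == "$mnt" ]] && found=$fstype
    done < "$table"
    [[ -n $found ]] && printf '%s\n' "$found"
}

# Succeeds if / is a tmpfs, i.e. we (presumably) booted into RAM.
running_from_ram() {
    [[ $(fstype_at / "$@") == tmpfs ]]
}
```

Dropped into a login script, something like running_from_ram && echo "Running from RAM: changes will be lost on reboot" makes it hard to forget which mode you are in.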

Debian required some kludges in the checkroot.sh init script to make it not die when the root fs wasn't on disk and thus failed to check, but Ubuntu was very smooth about it. Still, no big deal.
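For the progress meter exercise in step 4, one possible sketch: run the copy in the background and compare du -sk of the destination against the source about once a second. It is written for a plain /bin/sh, assuming local, du -sk and cp -a are available (as in BusyBox); the name copy_with_progress is hypothetical, and the percentage is only approximate, since du reports disk usage and that differs between the stick’s filesystem and tmpfs.

```shell
# Copy the contents of directory $1 into directory $2 in the background,
# printing a rough percentage to stderr about once a second.
# (local is not POSIX, but bash, dash and BusyBox ash all support it.)
copy_with_progress() {
    local src=$1 dst=$2 total pid copied pct status
    total=$(du -sk "$src" | cut -f1)
    [ "$total" -gt 0 ] || total=1              # avoid dividing by zero
    cp -a "$src"/. "$dst" &                    # "/." also copies dotfiles
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do       # cp still running?
        copied=$(du -sk "$dst" | cut -f1)
        pct=$((copied * 100 / total))
        [ "$pct" -le 100 ] || pct=100          # du sizes differ per filesystem
        printf '\rCopying files... %3d%%' "$pct" >&2
        sleep 1
    done
    wait "$pid"                                # collect cp's exit status
    status=$?
    printf '\rCopying files... done\n' >&2
    return "$status"
}
```

In the init snippet, copy_with_progress "$realroot" "$rootmnt" would take the place of the cp -a line, provided the function is defined earlier in init. Note that "$src"/. picks up dotfiles in the top directory, which "$realroot"/* skips.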

In the end, I had a 1000MB installation that could easily turn a dull park of Windows web-browsing boxes into a LAN party with no headaches for the administrator. Game on.

Pattern matching with Bash (not grep)

Pattern matching, either on file names or variable contents, is something Bash can do faster and more accurately by itself than with grep. This post tersely describes some cases where bash’s own pattern matching can help, by being faster, easier or better.

Simple substring search on variables

# Check if a variable contains 'foo'. Just to warm up.

# Works
if echo "$var" | grep -q foo
if [[ "$(echo "$var" | grep foo)" != "" ]]

# Easier and faster 
if [[ $var == *foo* ]] 

The latter runs several hundred times faster by saving two forks (good to know when looping), and the code is cleaner and clearer.
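Wrapped in a function, the substring test is easy to reuse in loops. A small sketch (the name contains is just for illustration); note that quoting "$2" inside the pattern makes bash treat the needle literally, so a * or ? in it is not a wildcard:

```shell
# Succeeds if string $1 contains string $2 as a literal substring.
# The quotes around $2 stop glob characters in it from being special.
contains() {
    [[ $1 == *"$2"* ]]
}
```

For example, contains "$var" foo && echo "found it" does the same as the [[ $var == *foo* ]] test above, without forking.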

Mixed pattern/fixed string search on variables

This is a less common but more interesting case.

# Check if /usr/bin overrides our install dir

# Mostly works (Can fail if $installdir contains 
# regex characters like . * [ ] etc)
if echo "$PATH" | grep -q "/usr/bin:.*:$installdir"

# Quoted strings will not be interpreted as globs
if [[ $PATH == */usr/bin:*:"$installdir" ]] 

We want parts of our input to be interpreted as regex, and parts to be literal, so neither grep nor fgrep entirely fits. Manually trying to escape regex chars is icky at best. We end up chancing that people won’t use the script anywhere near weirdly named files (like, in their torrent directory). With globs, bash doesn’t have to reparse the variable contents as part of the pattern, and just knows to take quoted strings literally.

Of course, both of the above fail to account for cases like /usr/bin:$installdir, where the two entries are adjacent (and the glob version also requires $installdir to be the last entry in PATH). This is not something you can easily express in traditional globs, but bash does regex too, and the semantics of quotes remain the same (since 3.2 or so at least):

# Quoted strings will not be interpreted as regex either
if [[ $PATH =~ (^|.*:)/usr/bin(:|:.*:)"$installdir"(:.*|$) ]]
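Here is that regex check wrapped in a function, as a sketch; the name usr_bin_overrides is hypothetical, and it assumes bash 3.2 or newer so that the quoted "$dir" is matched literally rather than as a regex:

```shell
# Succeeds if /usr/bin comes before directory $2 in the colon-separated
# list $1. The quoted "$dir" is literal, so a "." in it is not a wildcard.
usr_bin_overrides() {
    local path=$1 dir=$2
    [[ $path =~ (^|.*:)/usr/bin(:|:.*:)"$dir"(:.*|$) ]]
}
```

Typical use: usr_bin_overrides "$PATH" "$installdir" && echo "Warning: /usr/bin shadows $installdir" >&2. The (:|:.*:) group covers both the adjacent case and entries in between, and (:.*|$) lets $installdir sit anywhere after /usr/bin without matching a longer path such as $installdir/sub.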

Matching file names

I’ll skip the trivial examples for things like `ls | grep .avi$`. Here is a case where traditional globs don’t cut it:

# Copy non-BBC .avi files, and fail on half a dozen different cases
cp $(ls *.avi | grep -v BBC) /stuff

Bash has another form of regular expressions, extglobs (enable with shopt -s extglob). These are mathematically regular, but don’t follow the typical Unix regex syntax:

 
# Copy non-BBC .avi files without making a mess 
# when files have spaces or other weird characters
cp !(*BBC*).avi /stuff

man bash covers extglob well enough, so I’d just like to point out one thing. Treating foo and bar as glob patterns (write *foo* to match a substring the way grep does), grep -v foo can be replaced by !(foo), which strives to reject “foo” (unlike [^f][^o][^o] and similar attempts, which strive to accept). egrep "foo|bar" can be replaced by @(foo|bar) to match either pattern. But how about grep foo | grep bar, to match both?

That’s our old friend De Morgan: !(@(!(foo)|!(bar))). Don’t you just love that guy?
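To see both extglob tricks in action, here is a small sketch (the function names are made up). It enables extglob before the functions are defined, because bash needs the option on when it parses the patterns, and it uses nullglob so that a directory without matching files produces no output instead of the literal pattern:

```shell
shopt -s extglob   # must be on before the patterns below are parsed

# Succeeds if $1 contains both "foo" and "bar", in any order:
# NOT (lacks foo OR lacks bar), courtesy of De Morgan.
has_foo_and_bar() {
    [[ $1 == !(@(!(*foo*)|!(*bar*))) ]]
}

# List the .avi files in directory $1 whose names don't contain "BBC",
# one per line. The body is a subshell, so cd and nullglob don't leak out.
non_bbc_avis() (
    shopt -s nullglob          # no matches: the loop simply doesn't run
    cd -- "$1" || exit
    for f in !(*BBC*).avi; do
        printf '%s\n' "$f"
    done
)
```

File names with spaces come through intact, since the glob results are never word-split.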

PS: If you don’t already use parameter expansion for simple trimming and replacement on variables, now is a good time to look it up; it will probably save you a lot of sed too.
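For reference, a few parameter expansions that commonly replace basename, dirname and simple sed substitutions (the variable names are only illustrative). The patterns are globs, not regexes:

```shell
file="/home/user/Videos/holiday.avi"

name=${file##*/}          # strip the longest */ prefix   -> holiday.avi
dir=${file%/*}            # strip the shortest /* suffix  -> /home/user/Videos
mkv="${name%.avi}.mkv"    # swap the extension            -> holiday.mkv

title="foo bar foo"
first=${title/foo/baz}    # replace the first match       -> baz bar foo
all=${title//foo/baz}     # replace every match           -> baz bar baz

printf '%s\n' "$name" "$dir" "$mkv" "$first" "$all"
```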

What’s up with directory hard link counts?

Ever considered the hard link count that ls -l shows for directories?

 
vidar@kelvin ~/src $ ls -l
total 108
drwxr-xr-x  4 vidar vidar  4096 2009-11-22 12:52 aml-lsb
drwxr-xr-x 13 vidar vidar  4096 2009-12-13 16:00 delta3d_REL-2.4.0
drwxr-xr-x 23 vidar vidar  4096 2010-02-02 18:22 linux-2.6.32.7
...

For files, this is the number of hard links. You can use find / -samefile filename to find all the names that refer to the same inode.

So what does this number mean for directories? Exactly the same thing.

Users, including root, are blocked from creating directory hard links out of the kernel’s mortal fear of cyclical directory trees (or should I say directory graphs?). The kernel still creates them though, specifically in the form of the “.” entry in the directory itself, and “..” in each subdirectory.

An empty directory /foo/bar will have two links: /foo/bar itself, and /foo/bar/. (its “.” entry). Creating a subdirectory /foo/bar/baz adds a third hard link, /foo/bar/baz/.. (the “..” entry in baz). In other words, the hard link count is the number of subdirectories plus two.

Here’s a party trick for listing directory hard links in bash:

vidar@kelvin ~/src $ ls -ld aml-lsb/{,.,*/..}
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/.
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/bin/..
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/lib/..
vidar@kelvin ~/src $ 

Clearly, each of them refers to the same thing, and the numbers add up (if they don’t, try shopt -s dotglob so that hidden subdirectories are included).
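To check the subdirectories-plus-two rule on a real directory, here is a sketch that compares stat’s link count with a count of immediate subdirectories. It assumes GNU stat and find, and the name link_report is hypothetical. Not every filesystem follows the convention (btrfs, for example, reports 1 for directories), so there the two numbers will differ:

```shell
# Print a directory's hard link count next to "subdirectories + 2".
link_report() {
    local dir=$1 links subdirs
    links=$(stat -c %h -- "$dir") || return
    # Count immediate subdirectories, hidden ones included. Symlinks to
    # directories add no ".." link, and -type d doesn't follow them.
    subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -printf . | wc -c)
    echo "links=$links expected=$((subdirs + 2))"
}
```

On ext2/3/4, link_report aml-lsb would be expected to print matching numbers, e.g. links=4 expected=4 for the directory listed above.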

As a side note, you can use mount --rbind to fake a directory hard link. This mounts a directory, along with all its submounts, at some other location, but without creating cycles.

You can also use mount --bind to mount a directory elsewhere without its submounts. This is useful when you want to copy the contents of a directory that has another file system mounted over it. The most common case is /dev, which udev mounts over early in the boot process. Many people don’t realize that they have an entire /dev they’ve never seen!
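For example, to peek at the /dev hidden under udev’s mount, bind-mount the root filesystem somewhere else and look there. A minimal sketch, assuming root privileges and a directory that lives on the root filesystem; the function name peek_under is hypothetical:

```shell
# Show the part of the root filesystem hidden under a mount point such as
# /dev. Needs root. A plain --bind (unlike --rbind) leaves submounts
# behind, so the bind-mounted copy shows what is stored on the root fs.
peek_under() {
    local sub=${1:-dev} tmp
    sub=${sub#/}                                  # accept "dev" or "/dev"
    tmp=$(mktemp -d) || return
    mount --bind / "$tmp" || { rmdir "$tmp"; return 1; }
    ls -la "$tmp/$sub"
    umount "$tmp"
    rmdir "$tmp"
}
```

Run peek_under /dev as root; replacing the ls -la line with a cp -a would copy the hidden contents out instead.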

Simple ad-hoc file sharing

There is a distinct lack of simple, ad-hoc file sharing mechanisms for one-off file transfers between platforms. Maintaining an FTP or HTTP server securely and granting users access to files is cumbersome. An SSH guest account opens more than you’d like, and still requires you to somehow grant a user access to a certain file and then revoke it. IRC requires that the file is on the box you run the client on (which is often not your local box), and MSN requires that you add people to your contact list, assuming you don’t use it through BitlBee anyway.

Here is a little script I have lying around; I call it wwwshare:

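#!/bin/bash
# wwwshare: serve a single file over HTTP for one download, then exit.
# Assumes traditional netcat (nc -l -p PORT -q 0); other nc variants differ.

die() { echo "$*" >&2; exit 1; }

[[ $# != 0 ]] || die "Usage: $0 filename"
[[ -f $1 ]]   || die "No such file: $1"

file="$1"
# Look up our public IP address via an external service
ip=$(curl -s 'http://checkip.dyndns.com/' | sed 's/.* \([0-9.]*\).*/\1/')
port=$((8000 + RANDOM % 1000))   # random port in 8000-8999

echo "http://$ip:$port/$file"

# Send a minimal HTTP response header followed by the file contents;
# the request itself is ignored, and nc quits once everything is sent.
cat - "$file" << EOF | nc -l -p "$port" -q 0
HTTP/1.1 200 Ok
Content-Type: application/octet-stream
Content-Length: $(stat -c %s "$file")

EOF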

Just run wwwshare filename, and it’ll print a URL and start a wannabe HTTP server on a random port (8000-8999) for a single session. When the file has been downloaded, it exits. No setup or cleanup required.
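Strictly speaking, HTTP header lines end in CRLF, and the script above sends bare LF, which most clients tolerate. If one doesn’t, the header could be produced with printf instead. A sketch (http_header is a made-up name; the extra Connection: close header tells the client that the connection ends after the body):

```shell
# Print a minimal HTTP response header for file $1, with CRLF line endings.
http_header() {
    local file=$1 size
    size=$(stat -c %s -- "$file") || return   # GNU stat, as in wwwshare
    printf 'HTTP/1.1 200 OK\r\n'
    printf 'Content-Type: application/octet-stream\r\n'
    printf 'Content-Length: %s\r\n' "$size"
    printf 'Connection: close\r\n'
    printf '\r\n'                             # blank line ends the header
}
```

In wwwshare, the heredoc would then become { http_header "$file"; cat -- "$file"; } | nc -l -p "$port" -q 0.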

Default package selection

There has been some recent drama over Canonical’s decision to not include GIMP in the default Ubuntu installation. The importance of this decision has been blown way out of proportion, and relates to one of the most overrated issues in modern distro wars: default package selection.

Any distro will let you choose which packages to use, both before and after installation. If your dislike of the default set in any way affects your experience, it’s entirely self-inflicted and easily mended. Allow me to illustrate:

Rejecting Ubuntu because it uses GNOME is like rejecting OS X because it's purple

The default package set is like the serving suggestion on the cracker box. It’s there for inspiration – not to limit you.

Using SSH keys from untrusted clients

We all know and love OpenSSH’s scriptability. For example:

# Burn file.iso from 'host' locally without using disk space
ssh host cat file.iso | cdrecord driveropts=burnfree speed=4 - 

# Create an uptime high score list
for host in hostone hosttwo hostthree hostfour
do 
    echo "$(ssh -o BatchMode=yes $host "cut -d\  -f 1 /proc/uptime" \
                 || echo "0 host is unavailable: ") $host"
done | sort -rn 

The former is something you’d just do from your own box, since you need to be physically present to insert the CD anyway. But what if you want to automate the latter (commands that repeatedly poll or invoke something) from a potentially untrustworthy box?

Preferably, you’d use something other than ssh. Perhaps an entry in inetd that invokes the app, or maybe a CGI script (potentially over SSL and with password protection). But let’s say that, for whatever reason (firewalls, available utilities, application interfaces), you do want to use ssh.

In those cases, you won’t be there to type in a password or unlock your ssh keys, and you don’t want someone to just copy the passwordless key and run their own commands.

OpenSSH has a lot of nice features, and some of them relate to limiting what a user can do with a key. If you generate a passwordless key pair with ssh-keygen, you can add the following to .ssh/authorized_keys on the remote host:

command="uptime" ssh-rsa AAAASsd+olg4(rest of public key follows)

Select the key to use with ssh -i key host. This makes sure that anyone authenticated with this key pair will only be able to run “uptime”, and not any other commands (including scp/sftp). This seems clever enough, but we’re not entirely out of the woods yet: SSH supports more than running commands.

Someone might use your key to forward spam via local port forwarding, or they could open a bunch of ports on your remote host and spoof services with remote port forwarding.

Some less well-documented authorized_keys options will help:

# This is really just one line (no spaces after the commas):
command="uptime",
from="192.168.1.*",
no-port-forwarding,
no-x11-forwarding,
no-pty ssh-rsa AAAASsd+olg4(rest of public key follows)

Now we’ve disabled port forwarding (including SOCKS), X11 forwarding (shouldn’t matter, but hey) and PTY allocation (to limit DoS). And for laughs, we’ve limited the allowed clients to a subnet of IPs.

Clients can still hammer the service, and depending on the command, that could cause DoS. However, we’ve drastically reduced the risks of handing out copies of the key.
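Since the options have to end up on a single line, it can help to generate the entry. Here is a sketch of a hypothetical helper that prefixes a public key file with the restrictions above; it assumes the forced command and the from pattern contain no double quotes:

```shell
# Print an authorized_keys line restricting the key in public key file $1
# to running command $2, from client addresses matching pattern $3.
restricted_key_line() {
    local pubkey_file=$1 cmd=$2 from=$3 key
    # A .pub file holds one line: "type base64-key comment"
    IFS= read -r key < "$pubkey_file" || [[ -n $key ]] || return
    printf 'command="%s",from="%s",no-port-forwarding,no-x11-forwarding,no-pty %s\n' \
        "$cmd" "$from" "$key"
}
```

For example, generate the pair with ssh-keygen -t rsa -N '' -f uptime_key, run restricted_key_line uptime_key.pub uptime '192.168.1.*', and append the output to ~/.ssh/authorized_keys on the remote host. Clients then connect with ssh -i uptime_key host.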