Pattern matching with Bash (not grep)

Pattern matching, either on file names or variable contents, is something Bash can do faster and more accurately by itself than by calling out to grep. This post tersely describes some cases where Bash's own pattern matching can help, by being faster, easier or better.

Simple substring search on variables

# Check if a variable contains 'foo'. Just to warm up.

# Works
if echo "$var" | grep -q foo
if [[ "$(echo $var | grep foo))" == "" ]]

# Easier and faster 
if [[ $var == *foo* ]] 

The latter runs several hundred times faster by saving two forks (good to know when looping), and the code is cleaner and clearer.
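
If you want to see the difference for yourself, here is a rough timing sketch (the loop count and test string are arbitrary):

# Compare the piped grep against the plain glob match (numbers will vary)
var="some string with foo in it"
time for i in {1..1000}; do echo "$var" | grep -q foo; done
time for i in {1..1000}; do [[ $var == *foo* ]]; done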

Mixed pattern/fixed string search on variables

This is a less common but more interesting case.

#Check if /usr/bin overrides our install dir

# Mostly works (Can fail if $installdir contains 
# regex characters like . * [ ] etc)
if echo "$PATH" | grep -q "/usr/bin:.*:$installdir"

# Quoted strings will not be interpreted as globs
if [[ $PATH == */usr/bin:*:"$installdir" ]] 

We want parts of our input to be interpreted as regex, and parts to be literal, so neither grep nor fgrep entirely fits. Manually trying to escape regex chars is icky at best. We end up chancing that people won’t use the script anywhere near weirdly named files (like, in their torrent directory). With globs, bash doesn’t have to reparse the variable contents as part of the pattern, and just knows to take quoted strings literally.

Of course, you see how both of the above fail to account for cases like /usr/bin:$installdir. This is not something you can easily express in traditional globs, but bash does regex too, and the semantics of quotes remain the same (since 3.2 or so, at least):

# Quoted strings will not be interpreted as regex either
if [[ $PATH =~ (^|.*:)/usr/bin(:|:.*:)"$installdir"(:.*|$) ]]
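
A quick sanity check with made-up paths (both values below are just examples):

# Hypothetical demo values
installdir=/opt/myapp/bin
path=/usr/local/bin:/usr/bin:/opt/myapp/bin

if [[ $path =~ (^|.*:)/usr/bin(:|:.*:)"$installdir"(:.*|$) ]]
then
    echo "/usr/bin comes before $installdir"
fi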

Matching file names

I’ll skip the trivial examples for things like `ls | grep .avi$`. Here is a case where traditional globs don’t cut it:

# Copy non-BBC .avi files, and fail on half a dozen different cases
cp $(ls *.avi | grep -v BBC) /stuff

Bash has another form of regular expressions, extglobs (enable with shopt -s extglob). These are mathematically regular, but don’t follow the typical unix regex syntax:

 
# Copy non-BBC .avi files without making a mess 
# when files have spaces or other weird characters
cp !(*BBC*).avi /stuff

man bash contains enough on extglob, so I’d just like to point out one thing. grep -v foo can be replaced by !(foo), which strives to reject “foo” (unlike [^f][^o][^o] and similar attempts which strive to accept). egrep "foo|bar" can be replaced by @(foo|bar) to match one of the patterns. But how about grep foo | grep bar to match both?

That’s our old friend De Morgan: !(@(!(foo)|!(bar))). Don’t you just love that guy?
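
In glob context, that could look something like this (file names and destination are made up, continuing the .avi example):

shopt -s extglob

# Copy .avi files whose names contain both "foo" and "bar"
cp !(@(!(*foo*)|!(*bar*))).avi /stuff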

PS: If you don’t already use parameter expansion to do simple trimming and replacement on variables, now could be a good time to look up that and probably save yourself a lot of sed too.
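
A few common ones, just as a teaser:

file="/downloads/Show.S01E01.avi"
echo "${file##*/}"     # Show.S01E01.avi  -- strip the longest */ prefix, like basename
echo "${file%.avi}"    # /downloads/Show.S01E01  -- trim a suffix
echo "${file//./_}"    # /downloads/Show_S01E01_avi  -- replace every . with _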

What’s up with directory hard link counts?

Ever considered the hard link count from ls on directories?

 
vidar@kelvin ~/src $ ls -l
total 108
drwxr-xr-x  4 vidar vidar  4096 2009-11-22 12:52 aml-lsb
drwxr-xr-x 13 vidar vidar  4096 2009-12-13 16:00 delta3d_REL-2.4.0
drwxr-xr-x 23 vidar vidar  4096 2010-02-02 18:22 linux-2.6.32.7
...

For files, this is the number of hard links. You can use find / -samefile filename to find all files that point to the same file inode.

So what does this number mean for directories? Exactly the same thing.

Users, including root, are blocked from creating directory hard links out of the kernel’s mortal fear of cyclical directory trees (or should I say directory graphs?). The kernel still creates them though, specifically in the form of the “.” entry in the directory itself, and “..” in each subdirectory.

An empty directory /foo/bar will have two links: the /foo/bar entry itself, and its own "." entry (/foo/bar/.). When creating a subdirectory /foo/bar/baz, you get the additional hard link /foo/bar/baz/.. (baz's ".." entry). In other words, the hard link count is the number of subdirectories plus two.

Here’s a party trick for listing directory hard links in bash:

vidar@kelvin ~/src $ ls -ld aml-lsb/{,.,*/..}
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/.
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/bin/..
drwxr-xr-x 4 vidar vidar 4096 2009-11-22 12:52 aml-lsb/lib/..
vidar@kelvin ~/src $ 

Clearly, each of them refers to the same thing, and the numbers add up (if they don't, the directory has hidden subdirectories; shopt -s dotglob makes the glob include those as well).
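
This also means you can count subdirectories without listing anything, for example with stat (GNU coreutils) and its %h hard link count:

# Subdirectory count = hard link count minus 2
echo $(( $(stat -c %h aml-lsb) - 2 ))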

As a side note, you can use mount --rbind to fake a directory hard link. This will remount a directory and all submounts on some other directory, but will prevent cycles.

You can also use mount --bind to remount without submounts. This can be useful for when you want to copy the contents of a directory that has another file system mounted over it. This is most commonly /dev, which is over-mounted with udev early in the boot process. Many people don’t realize that they have an entire /dev they’ve never seen!
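
A rough sketch of how you might take a peek (requires root; the mount point is arbitrary):

mkdir -p /mnt/rootview
mount --bind / /mnt/rootview
ls /mnt/rootview/dev    # the on-disk /dev, without udev mounted over it
umount /mnt/rootview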

Simple ad-hoc file sharing

There is a distinct lack of simple, ad-hoc file sharing mechanisms for one-off file transfers between platforms. Maintaining an ftp or http server securely and granting users access to files is cumbersome. An ssh guest account opens up more than you'd like, and still requires you to somehow grant a user access to a certain file and then revoke it again. IRC requires that the file be on the box you run the client on (which is often not your local box), and MSN requires that you add people to your contact list, assuming you don't use it through bitlbee anyways.

Here is a little script I have lying around, I call it wwwshare:

#!/bin/bash

die() { echo "$*" >&2; exit 1; }

[[ $# != 0 ]] || die "Usage: $0 filename"
[[ -f $1 ]]   || die "No such file: $1"

file="$1"
ip=$(curl -s 'http://checkip.dyndns.com/' | sed 's/.* \([0-9.]*\).*/\1/')
port=$((8000 + RANDOM%1000))

echo "http://$ip:$port/$file"

cat - "$file" << EOF | nc -l -p $port -q 0
HTTP/1.1 200 Ok
Content-Type: application/octet-stream
Content-Length: $(stat -c %s "$file")

EOF

Just run wwwshare filename, and it'll print a URL and start a wannabe http server on a random port (8000-8999) for a single session. When the file is downloaded, it exits. No setup or cleanup required.
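
A hypothetical session (the IP, port and file name are placeholders):

wwwshare video.avi
# prints something like: http://203.0.113.7:8442/video.avi

# On the receiving end, any http client will do:
wget http://203.0.113.7:8442/video.avi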

Default package selection

There has been some recent drama over Canonical’s decision to not include Gimp in the default Ubuntu installation. The importance of this decision has been blown way out of proportion, and relates to one of the most overrated issues in modern distro wars: default package selection.

Any distro will let you choose which packages to use, both before and after installation. If your dislike of the default set in any way affects your experience, it’s entirely self-inflicted and easily mended. Allow me to illustrate:

Rejecting Ubuntu because it uses Gnome is like rejecting OS X because it's purple

The default package set is like the serving suggestion on the cracker box. It’s there for inspiration – not to limit you.

Using SSH keys from untrusted clients

We all know and love OpenSSH’s scriptability. For example:

# Burn file.iso from 'host' locally without using disk space
ssh host cat file.iso | cdrecord driveropts=burnfree speed=4 - 

# Create an uptime high score list
for host in hostone hosttwo hostthree hostfour
do 
    echo "$(ssh -o BatchMode=yes $host "cut -d\  -f 1 /proc/uptime" \
                 || echo "0 host is unavailable: ") $host"
done | sort -rn 

The former is something you'd just do from your own box, since you need to be physically present to insert the CD anyways. But what if you want to automate the latter (commands that repeatedly poll or invoke something) from a potentially untrustworthy box?

Preferably, you'd use something other than ssh. Perhaps an entry in inetd that invokes the app, or maybe a cgi script (potentially over SSL and with password protection). But let's say that, for whatever reason (firewalls, available utilities, application interfaces), you do want to use ssh.

In those cases, you won’t be there to type in a password or unlock your ssh keys, and you don’t want someone to just copy the passwordless key and run their own commands.

OpenSSH has a lot of nice features, and some of them relate to limiting what a user can do with a key. If you generate a passwordless key pair with ssh-keygen, you can add the following to .ssh/authorized_keys:

command="uptime" ssh-rsa AAAASsd+olg4(rest of public key follows)

Select the key to use with ssh -i key .... This will make sure that anyone authenticated with this key pair will only be able to run “uptime” and not any other commands (including scp/sftp). This seems clever enough, but we’re not entirely out of the woods yet. SSH supports more than running commands.

Someone might use your key to forward spam via local port forwarding, or they could open a bunch of ports on your remote host and spoof services with remote port forwarding.

Some less well documented authorized_keys options will help:

# This is really just one line:
command="uptime",
from="192.168.1.*",
no-port-forwarding,
no-x11-forwarding,
no-pty ssh-rsa AAAASsd+olg4(rest of public key follows)

Now we've disabled port forwarding (including SOCKS), X11 forwarding (shouldn't matter, but hey) and PTY allocation (a DoS vector). And for laughs, we've limited the allowed clients to a subnet of IPs.

Clients can still hammer the service, and depending on the command, that could cause DoS. However, we’ve drastically reduced the risks of handing out copies of the key.
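
For completeness, a rough sketch of the client side (key path, user and host are placeholders):

# On the untrusted box: generate a passwordless key pair
ssh-keygen -t rsa -N "" -f ~/.ssh/uptime_key

# Add ~/.ssh/uptime_key.pub, prefixed with the options above, to the
# server's ~/.ssh/authorized_keys. Then, no matter what is requested,
# the server only ever runs the forced "uptime":
ssh -T -i ~/.ssh/uptime_key -o BatchMode=yes monitor@host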

Two classic ways of getting owned

There are two classic ways that Linux newbies open themselves up for pranks and shenanigans (or worse): double-su and startx. The double-su will not cause any holes that a crafty conman couldn’t already have arranged, but the startx trick can actually be a serious back door.

The double-su is when you su twice from some other user’s shell. Imagine, if you will, that Vidar just called over the admin of the company’s server, pointed to top where a process is running un-niced at 99% and has racked up hours and hours of cpu time. Vidar makes a big fuss about this, so the admin says “fine, move over”, and does the following at Vidar’s terminal:


vidar@kelvin ~ $ su
Password:
root@kelvin:/home/vidar# renice 19 3156
3156: old priority 0, new priority 19
root@kelvin:/home/vidar# su vidar
vidar@kelvin ~ $

He then scampers off to lunch. Spotted the problem? "su" doesn't switch the current shell over to another account; a running shell can't simply change owners, password or not. Instead, su starts another shell, as the new user, on top of the old one. The admin then ran su again, creating a third shell on top of the other two. Now, when Vidar exits that third shell, he finds himself back at the second one, with full root access:


vidar@kelvin ~ $ exit
exit
root@kelvin:/home/vidar# echo "Want to buy: Baggy pants and a more suitable job. Love, your admin" >> /etc/issue
root@kelvin:/home/vidar# exit
exit
vidar@kelvin ~ $

The admin clearly should have ended his su session with exit rather than su originaluser. Of course, the real issue here is using "su" on untrusted hardware and software.

If Vidar was evil, he could just as easily have set up a software or hardware keylogger, a spoofed su or simply used strace. This is the reason why the double-su is more of a prank opportunity than an exploit.
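
For the record, the prank-free version of the admin's session simply ends with exit:

vidar@kelvin ~ $ su
Password:
root@kelvin:/home/vidar# renice 19 3156
3156: old priority 0, new priority 19
root@kelvin:/home/vidar# exit
exit
vidar@kelvin ~ $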

Now, startx, on the other hand…! Some users, mostly for leetness, like to log in in text mode and then run "startx" to start X, instead of using a graphical login. What most of them don't consider is that both the shell and startx are still running on the virtual console they were started on.

If the user dutifully locks the screen before attending wetware chores, you can hit Ctrl-Alt-F1 to get to this shell, then Ctrl-Z and bg to push startx into the background. You now have a shell running as this user. If that isn't enough, you can killall xscreensaver and hit Ctrl-Alt-F7. You now have an unlocked X session:


vidar@kelvin ~ $ startx
^Z
[1]+ Stopped startx
vidar@kelvin ~ $ bg
[1]+ startx &
vidar@kelvin ~ $ killall xscreensaver
vidar@kelvin ~ $ clear; exit;

This user should at least have used startx & exit to log off the virtual console when X started.
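
That is, something along these lines (exec startx is a common alternative with the same effect):

# Detach X and log out of the console...
startx & exit

# ...or replace the console shell with startx entirely
exec startx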

So how serious is this hole? It depends on how far you’re willing to go. Sure, with physical access you can try all sorts of things, like rebooting with a livecd. If you know there’s a bios password you can’t clear, you can take the disk out. If the disk is encrypted, you can try a cold boot attack. But surely by then, the user’s back and is trying to figure out why you’re pouring liquid nitrogen into his hardware.

It might have been easier to hit him over the head before he locked the screen in the first place.

More seriously, proper startx usage turns getting at your stuff from a trivial act of stealthy espionage into a violent crime, or an invasive and time-consuming thousand-euro procedure. Don't underestimate that.

If you can think of any other classical security no-nos being reinvented by every new generation of Linux users, do comment!