tl;dr: dd works for reading and writing disks, but it has no "low level I/O" capabilities that make it more suited for this than any other shell utility. Like cat, you should use it where it makes sense, e.g. to take advantage of its wide array of options, rather than try to ensure that all disk related commands begin and end with dd out of fear and superstition.
If you’ve ever used dd, you’ve probably used it to read or write disk images:
# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M
Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.
This belief adds unnecessary complexity to simple commands. How do you combine dd with gzip? How do you use pv if the source is a raw device? How do you dd over ssh?
People cleverly find ways to insert dd at the front and end of pipelines. dd if=/dev/sda | gzip > image.gz, they say. dd if=/dev/sda | pv | dd of=/dev/sdb.
In both these cases, dd serves no real purpose. It’s purely a superstitious charm trying to ensure safe passage of the data. You can see how silly this is when you replace dd with the functionally equivalent cat: cat /dev/sda | pv | cat > /dev/sdb
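To make the point concrete, here is a minimal sketch that runs the dd-wrapped gzip pipeline and the plain-redirection version side by side and compares checksums of the round-tripped data. It assumes an ordinary temporary file as a stand-in for /dev/sda, so that nothing real is read or overwritten; the variable names are made up for illustration.

```shell
# Stand-in for a raw device: a hypothetical 1 MiB temp file, so nothing
# real gets read or overwritten when trying this out.
src=$(mktemp)
head -c 1048576 /dev/urandom > "$src"

# The dd-wrapped pipeline from the text, round-tripped through gzip.
with_dd=$(dd if="$src" 2>/dev/null | gzip | gzip -d | cksum)

# The same pipeline with plain redirection and no dd at all.
without_dd=$(gzip < "$src" | gzip -d | cksum)

echo "with dd:    $with_dd"
echo "without dd: $without_dd"

rm -f "$src"
```

Both lines should show the same checksum and byte count, since gzip receives the same byte stream either way.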
The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever.
dd just reads and writes files.
On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can be used to copy files, dd can be used to copy raw disks.
But do you know what else can read and write files? Everything:
# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb
# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso
# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz
dd uses the same interface these commands do, and is not any safer or more reliable.
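A quick way to check this for yourself, sketched with a temporary file standing in for a disk (an assumption, so the example is safe to run anywhere), is to copy the same source with cp, cat and dd and compare the results byte for byte. The file and variable names are hypothetical.

```shell
# Hypothetical stand-in files; a real run would name a device instead.
src=$(mktemp); via_cp=$(mktemp); via_cat=$(mktemp); via_dd=$(mktemp)
head -c 65536 /dev/urandom > "$src"

# Three tools, one job: read bytes from one file, write them to another.
cp "$src" "$via_cp"
cat "$src" > "$via_cat"
dd if="$src" of="$via_dd" 2>/dev/null

# cmp -s is silent and exits 0 only when the two files are identical.
if cmp -s "$src" "$via_cp" && cmp -s "$src" "$via_cat" && cmp -s "$src" "$via_dd"
then
  identical=yes
else
  identical=no
fi
echo "all copies identical: $identical"

rm -f "$src" "$via_cp" "$via_cat" "$via_dd"
```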
dd can even end up doing a worse job. By specification, its default 512-byte block size has had to remain unchanged for decades. Today, this tiny size makes it CPU bound by default. A script that doesn’t specify a block size is very inefficient, and any script that picks the current optimal value may slowly become obsolete, or start obsolete if it’s copied from
Meanwhile, cat is free to choose whichever buffer size best serves a modern system, and the GNU cat buffer size has grown steadily over the years from 512 bytes in 1991 to 131072 bytes in 2014. src/ioblksize.h in the coreutils source code has benchmarks backing up this decision.
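The cost of the 512-byte default is visible in dd's own record counts. The sketch below assumes a 1 MiB temporary file as the source and reads it once with the default block size and once with bs=1048576; each record corresponds to roughly one read and one write call. LC_ALL=C is set only so that the status line is not translated.

```shell
# Hypothetical 1 MiB source file standing in for a disk image.
src=$(mktemp)
dd if=/dev/zero of="$src" bs=1048576 count=1 2>/dev/null

# dd prints its record counts on stderr; keep only the "records in" line.
default_bs=$(LC_ALL=C dd if="$src" of=/dev/null 2>&1 | head -n 1)
large_bs=$(LC_ALL=C dd if="$src" of=/dev/null bs=1048576 2>&1 | head -n 1)

echo "default: $default_bs"   # 2048+0 records in
echo "bs=1MiB: $large_bs"     # 1+0 records in

rm -f "$src"
```

The same megabyte takes 2048 records at the default size and a single record at 1 MiB, which is where the per-call overhead comes from.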
However, this does not mean that dd should be categorically shunned! The reason why people started using it in the first place is that it does exactly what it’s told: no more and no less.
If an alias specifies -a, cp might try to create a new block device instead of a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.
dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second-guess your intent, based on trailing slashes or types of files.
However, when this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.
This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!
dd is the swiss army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate an lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three-byte pixels from a PPM file? Use dd!
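As a small illustration of the PPM case, here is a sketch that pulls one pixel out of a tiny binary PPM. It assumes a hypothetical 2x2 image whose pixel bytes are printable letters so the result is easy to see, and a fixed 11-byte header. The first dd consumes exactly the header from the shared stdin, and the second one seeks and reads in 3-byte units from wherever the first left the file offset.

```shell
# Hypothetical 2x2 binary PPM. The pixel bytes are printable letters so
# the result is easy to see: AAA, BBB, CCC, DDD.
ppm=$(mktemp)
printf 'P6\n2 2\n255\nAAABBBCCCDDD' > "$ppm"

header_len=11   # "P6\n" + "2 2\n" + "255\n"
index=1         # zero-based pixel to fetch

pixel=$(
  {
    # Read exactly the header and throw it away; the offset of the shared
    # stdin now points at the first pixel.
    dd bs="$header_len" count=1 > /dev/null 2>&1
    # Skip $index three-byte blocks from that offset, then read one block.
    dd bs=3 skip="$index" count=1 2> /dev/null
  } < "$ppm"
)
echo "pixel $index: $pixel"   # prints "pixel 1: BBB"

rm -f "$ppm"
```

This only works because dd reads exactly the number of bytes it is told to; a tool that buffers ahead could leave the file offset somewhere past the header.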
It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to, and inappropriately hailed for, its most generic and least interesting capability: simply copying a file from start to finish.
* dd actually has two jobs: Convert and Copy. A post on comp.unix.misc (incorrectly) claimed that the intended name “cc” was taken by the C compiler, so the letters were shifted in the same way we ended up with a Window system called X. A more likely explanation is given in that thread, as pointed out by Paweł and Bruce in the comments: the name, syntax and purpose are almost identical to the JCL “Dataset Definition” command found in 1960s IBM mainframes.