Monday, May 14, 2007

Fun with find, sed and xargs

[See below for update 20070807]
Common tasks for Unix system administrators often require working with all of the files in a directory tree and selectively doing something with some of them: copying, deleting, renaming, or moving them, or simply getting a list of files matching certain characteristics.

Sometimes we want to do something from the Unix command line with files that have spaces in their names. Let's see what we can do with our friends find(1), sed(1), and xargs(1).

Find looks for entries in some directory matching its arguments, typically sending a list of them to the standard output. Sed is the Stream EDitor, and applies a series of commands to transform its input into its output. Xargs reads items from its standard input and passes them as arguments on the command line of another program.

First, let's set up our little foobox:
$ cd /tmp
$ mkdir foo
$ touch 'foo/file with spaces'
$ touch 'foo/bar'
$ touch 'foo/another file with spaces'
$ ls -1 foo
another file with spaces
bar
file with spaces

Now let's do a simple find:

/tmp $ find foo -type f
foo/file with spaces
foo/another file with spaces

Now let's do something with those files. Let's just list them:

/tmp $ find foo -type f | xargs ls
foo/file: No such file or directory
spaces: No such file or directory
foo/another: No such file or directory
file: No such file or directory
with: No such file or directory
spaces: No such file or directory

What happened? Xargs splits its input on whitespace, so each word of each filename reached the command line of ls(1) as a separate argument, and ls went looking for files named "foo/file", "with", "spaces", and so on. We need to escape the spaces inside the names, but leave alone the line breaks that separate one filename from the next. That's just the sort of thing sed likes to do:

/tmp $ find foo -type f | sed 's, ,\\&,g'| xargs ls -ltr
-rw-r--r-- 1 user group 0 May 11 12:12 foo/file with spaces
-rw-r--r-- 1 user group 0 May 11 12:12 foo/bar
-rw-r--r-- 1 user group 0 May 11 12:12 foo/another file with spaces

In the dorky sed command between the single quotes, the "s" means substitute: replace the text matched by the pattern between the first and second delimiters with the text between the second and third delimiters. I like to use commas as delimiters instead of slashes, though almost any character will do. Slashes often appear in path names, and by habitually using commas I avoid the errors that creep in when I forget to escape a slash.
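To make the point about delimiters concrete, here is a small sketch that runs the same substitution twice, once with slashes and once with commas. The path /tmp/foo/bar and the replacement /var/foo are made up for illustration only.

```shell
# Hypothetical path used only for illustration.
path='/tmp/foo/bar'

# With slashes as delimiters, every slash in the pattern and the
# replacement has to be escaped with a backslash.
with_slashes=$(printf '%s\n' "$path" | sed 's/\/tmp\/foo/\/var\/foo/')

# With commas as delimiters, the same substitution needs no escaping.
with_commas=$(printf '%s\n' "$path" | sed 's,/tmp/foo,/var/foo,')

printf '%s\n%s\n' "$with_slashes" "$with_commas"
```

Both commands should produce the same result; the comma version is simply easier to read and harder to get wrong.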

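The pattern, called a regular expression, in this case matches a single space. The replacement is a backslash followed by the text we just matched (that's the "&"), and the trailing "g" applies the substitution to every space on the line rather than only the first. This is sed-ese for "prepend a backslash".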
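Here is a minimal sketch of what the escaping buys us, using a single sample name rather than real files. It counts how many arguments xargs produces with and without the sed step; "xargs -n 1 echo" is used here only as a convenient way to print one argument per line.

```shell
# Hypothetical sample name; any path containing spaces would do.
name='foo/file with spaces'

# Without escaping, xargs splits the line on whitespace into several arguments.
unescaped_count=$(printf '%s\n' "$name" | xargs -n 1 echo | wc -l | tr -d ' ')

# Prepend a backslash to every space, as in the pipeline above.
escaped=$(printf '%s\n' "$name" | sed 's, ,\\&,g')

# With escaping, xargs passes the whole line along as one argument.
escaped_count=$(printf '%s\n' "$escaped" | xargs -n 1 echo | wc -l | tr -d ' ')

printf 'escaped form: %s\n' "$escaped"
printf 'arguments without escaping: %s\n' "$unescaped_count"
printf 'arguments with escaping: %s\n' "$escaped_count"
```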

A slightly more general approach is to wrap each filename with single quotes. You still run into a problem with filenames which have single quotes in them, but you shouldn't put quotes in filenames:

$ find foo -type f | sed -e "s,^,'," -e "s,\$,',"

'foo/file with spaces'
'foo/another file with spaces'

$ find foo -type f | \
sed -e "s,^,'," \
-e "s,\$,'," | \
xargs ls

foo/another file with spaces
foo/file with spaces
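As a quick check of the quoting idea on its own, the sketch below wraps one sample name in single quotes and feeds it back through xargs. It anchors the opening quote at the start of the line with "^" and the closing quote at the end with "$"; the sample path, with its leading "./", is an assumption chosen to look like the output of "find . -type f".

```shell
# Hypothetical sample path, shaped like the output of "find . -type f".
name='./foo/file with spaces'

# Put a single quote at the start of the line (^) and another at the end ($).
wrapped=$(printf '%s\n' "$name" | sed -e "s,^,'," -e "s,\$,',")

# xargs strips the quotes and passes the whole name as a single argument.
roundtrip=$(printf '%s\n' "$wrapped" | xargs -n 1 echo)

printf 'wrapped:   %s\n' "$wrapped"
printf 'roundtrip: %s\n' "$roundtrip"
```

If the name itself contained a single quote, the wrapped form would be unbalanced and xargs would be expected to complain, which is the limitation mentioned above.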

Sharp reader Nic Ivy has noted a far simpler way to deal with spaces in filenames for find(1) and xargs(1), which also deals with other special characters like quotes and greater-than or less-than symbols:

$ find foo -type f -print0 | xargs -0 ls

foo/another file with spaces
foo/file with spaces
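To see that the null-terminated approach really does cope with quotes as well as spaces, here is a small sketch that builds a scratch directory of awkward names and lists them back. The filenames are invented for the test, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do; some minimal systems do not).

```shell
# Scratch directory so the example does not touch real files.
scratch=$(mktemp -d)

# Hypothetical awkward names: spaces, a single quote, double quotes.
touch "$scratch/file with spaces" "$scratch/it's here" "$scratch/a \"quoted\" name"

# Each name reaches basename as exactly one argument, so we get one line per file.
listed=$(find "$scratch" -type f -print0 | xargs -0 -n 1 basename | LC_ALL=C sort)
count=$(printf '%s\n' "$listed" | wc -l | tr -d ' ')

printf '%s\n' "$listed"
printf 'files seen: %s\n' "$count"

# Clean up the scratch directory.
rm -rf "$scratch"
```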

From the Unixhelp xargs(1) man page:

--null, -0
Input items are terminated by a null character instead of by
whitespace, and the quotes and backslash are not special (every
character is taken literally). Disables the end of file string,
which is treated like any other argument. Useful when input
items might contain white space, quote marks, or backslashes.
The GNU find -print0 option produces input suitable for this mode.

Man pages courtesy UnixHelp.


Nic Ivy said...

A simpler approach using null termination:

find foo -type f -print0 | xargs -0 ls

Anonymous said...

Thanks! That helped me a lot with an old BusyBox that doesn't support find -print0 and xargs -0