How to Save Only Some Files in a Directory Tree and Delete Strange Filenames

Sometimes a bunch of unwanted files end up somewhere they don't belong. Maybe a cp gone wrong or some other command-line typo created a slew of oddly named files, but mixed in with the bad ones are a number of files you'd really like to keep. The following should be able to handle things:

find /tmp/disasterpiece -type f -print0 | grep -zxvf SAVE_THESE | xargs -0 -L1 rm -v
# or if you want to remove links and other non-directories
find /tmp/disasterpiece ! -type d -print0 | grep -zxvf SAVE_THESE | xargs -0 -L1 unlink

The find lists all the regular files (not directories, links, etc.), or in the second command's case everything that isn't a directory, and separates the names with nulls (-print0). That list is passed to grep, which has been given the following options:

  1. -z delimit lines based on nulls rather than the normal newlines
  2. -x only match if the pattern matches the whole line (e.g. foo doesn't match food)
  3. -v invert the selection (i.e. only print files that aren't in SAVE_THESE)
  4. -f read patterns out of the specified file
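
To see how these options combine, here is a small sketch that runs the same grep flags over hand-made input instead of real find output. It assumes GNU grep (the -z option is a GNU extension); the names keep_me, keep_me_too and junk, and the scratch directory, are made up for illustration.

```shell
# Sketch only: assumes GNU grep; all names below are made up.
work=$(mktemp -d)
printf '%s\n' 'keep_me' > "$work/SAVE_THESE"

# Feed grep three NUL-terminated names, as find -print0 would.
# tr turns the NULs into newlines afterwards purely for display.
filtered=$(printf '%s\0' 'keep_me' 'keep_me_too' 'junk' \
  | grep -zxvf "$work/SAVE_THESE" \
  | tr '\0' '\n')

printf '%s\n' "$filtered"
rm -r "$work"
```

keep_me is dropped because it is in the list, while keep_me_too survives the filter because -x demands a whole-name match.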

Without the -v, "SAVE_THESE" becomes "REMOVE_THESE". If you're nervous, that may be the better way to go: pointed at the wrong location, the command above will find a lot of files that aren't in "SAVE_THESE" and remove every one of them.
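
One caveat applies whichever way the list is used: grep reads each line of the file as a regular expression unless -F is added. The sketch below, assuming GNU grep and using the made-up names a.c and abc, compares the two modes on a REMOVE_THESE-style list (no -v).

```shell
# Sketch only: assumes GNU grep; a.c and abc are made-up names.
work=$(mktemp -d)
printf '%s\n' 'a.c' > "$work/REMOVE_THESE"

# As a regular expression, "a.c" also selects "abc" (the dot matches any character).
as_regex=$(printf '%s\0' 'a.c' 'abc' | grep -zxf "$work/REMOVE_THESE" | tr '\0' '\n')

# With -F the line is a literal string, so only "a.c" is selected.
as_fixed=$(printf '%s\0' 'a.c' 'abc' | grep -zFxf "$work/REMOVE_THESE" | tr '\0' '\n')

printf 'regex: %s\n' "$as_regex"
printf 'fixed: %s\n' "$as_fixed"
rm -r "$work"
```

If the names in your list contain characters such as . or [, adding -F to the original command is probably the safer choice.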

Making SAVE_THESE is easy using the find command and editing the output:

find /tmp/disasterpiece -type f > SAVE_THESE
# then edit SAVE_THESE so that it lists only the files you want to keep
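
The clean-up can be done in any editor. As an alternative, when the files worth keeping can be picked out by name, find can write the finished list directly. This is only a sketch: the scratch directory stands in for /tmp/disasterpiece, and the file names are examples.

```shell
# Sketch only: a scratch directory stands in for /tmp/disasterpiece.
dir=$(mktemp -d)
list=$(mktemp)
: > "$dir/good_file"; : > "$dir/not_bad"; : > "$dir/s p a c e d"

# List only the files to keep, selected by name, one full path per line.
find "$dir" -type f \( -name 'good_file' -o -name 'not_bad' \) | sort > "$list"

saved=$(cat "$list")
printf '%s\n' "$saved"
rm -r "$dir" "$list"
```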

Finally, giving xargs the -0 option in the original command makes it split its input on the nulls, which avoids problems with unusual file names. The -L1 sends only one file at a time to rm or unlink.
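
The effect of -0 is easy to check without deleting anything. In this sketch, sh -c 'echo called' stands in for rm, and -n 1 is used to pass one item per invocation (it should behave like the -L1 above in this pipeline); the names are examples only.

```shell
# Sketch only: sh -c 'echo called' stands in for rm, so nothing is deleted.
# With -0, each NUL-terminated name reaches the command as one argument,
# even when it contains spaces or a newline.
with_nulls=$(printf '%s\0' 's p a c e d' 'line
feed' '*' | xargs -0 -n 1 sh -c 'echo called' sh | wc -l)

# Without -0, xargs splits on whitespace, so one name becomes six arguments.
without_nulls=$(printf '%s\n' 's p a c e d' | xargs -n 1 sh -c 'echo called' sh | wc -l)

echo "with -0: $with_nulls calls, without -0: $without_nulls calls"
```

Three names produce three calls with -0, while the single spaced name alone produces six calls without it.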

And this is what it should look like:

mkdir /tmp/disasterpiece
cd /tmp/disasterpiece
# going to use output redirection to create zero-length files with odd names
> '!!!'; > "line
feed"; > "s p a c e d"; > '"quoted"'; > -opt; > --longopt; > '*'; > '???'; > $(echo -e "be\\007ll"); > good_file; > not_bad

Now you have these files:
!!!, ???, *, be?ll, good_file, line?feed, --longopt, not_bad, -opt, "quoted" and s p a c e d.

The question marks are due to unprintable characters, in this case a line feed (\n) and a bell (\a). To get a better idea of what these filenames actually are, use ls -b, which lists the files this way:

!!!, ???, *, be\all, good_file, line\nfeed, --longopt, not_bad, -opt, "quoted" and s\ p\ a\ c\ e\ d.
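
The embedded line feed is also the reason the pipeline uses nulls: any line-oriented tool sees that one name as two lines. A minimal sketch of the difference, using a scratch directory in place of /tmp/disasterpiece:

```shell
# Sketch only: a scratch directory stands in for /tmp/disasterpiece.
dir=$(mktemp -d)
: > "$dir/good_file"
: > "$dir/line
feed"

# Counting lines over-counts, because one name spans two lines.
by_lines=$(find "$dir" -type f | wc -l)

# Counting NUL terminators gives exactly one per file.
by_nulls=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)

echo "lines: $by_lines, files: $by_nulls"
rm -r "$dir"
```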

Time to create SAVE_THESE:

cd /tmp/
find /tmp/disasterpiece -type f > SAVE_THESE
# perform a little housecleaning: edit SAVE_THESE down to the good_file and not_bad lines

If you want a dry run, put an echo in front of the rm command. Otherwise, let's clean things up:

find /tmp/disasterpiece -type f -print0 | grep -zxvf SAVE_THESE | xargs -0 -L1 rm -v
removed ‘/tmp/disasterpiece/!!!’
removed ‘/tmp/disasterpiece/be\\all’
removed ‘/tmp/disasterpiece/???’
removed ‘/tmp/disasterpiece/*’
removed ‘/tmp/disasterpiece/--longopt’
removed ‘/tmp/disasterpiece/-opt’
removed ‘/tmp/disasterpiece/"quoted"’
removed ‘/tmp/disasterpiece/s p a c e d’
removed ‘/tmp/disasterpiece/line\\nfeed’
# all clean
cd /tmp/disasterpiece
# let's see
ls

The only files you'll find are good_file and not_bad.
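
If you would rather rehearse the whole procedure before touching real data, something along these lines may help. It is a sketch under a few assumptions: GNU grep is available, a throwaway directory from mktemp stands in for /tmp/disasterpiece, the keep list is written with printf instead of being edited by hand, and -n 1 replaces -L1. The dry run here counts the files that would be removed, which reads more cleanly than echoing names that contain line feeds.

```shell
# Sketch only: runs the article's pipeline inside a throwaway directory.
dir=$(mktemp -d)
list=$(mktemp)
for name in good_file not_bad 's p a c e d' '*' '-opt' 'line
feed'; do
  : > "$dir/$name"
done

# Keep list: one full path per line, as an edited find listing would be.
printf '%s\n' "$dir/good_file" "$dir/not_bad" > "$list"

# Dry run: count what would be removed instead of removing it.
doomed=$(find "$dir" -type f -print0 | grep -zxvf "$list" | tr -cd '\0' | wc -c)
echo "would remove $doomed files"

# Real run: -n 1 passes one name per rm call, -- guards against option-like names.
find "$dir" -type f -print0 | grep -zxvf "$list" | xargs -0 -n 1 rm --

remaining=$(ls "$dir")
printf '%s\n' "$remaining"
rm -r "$dir" "$list"
```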