[HOWTO] Find disk usage by file type

gatorparrots

~departed~
Say you want to find out how big your MP3 collection is. It's relatively easy to find all the .mp3 files on a drive and calculate the total amount of disk space they collectively occupy, all from one command.

We want to run a find command, then pipe the output to du via xargs. The -0 flag for xargs, in conjunction with the -print0 flag for find, handles spaces in filenames. From the xargs man page:
Code:
-0          Use NUL (``\0'') instead of whitespace as the argument separator.
            This can be used in conjunction with the -print0 option of find(1).
Here is the command:
sudo find / -iname "*.mp3" -print0 | xargs -0 du -cks
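
One wrinkle with the command above: when the list of matches is long, xargs splits it across several du invocations, and du -c then prints a separate total for each batch. Here is a minimal sketch of one way to get a single figure by letting awk add up the first column of du -k instead. The name total_kb and its two arguments (start directory, name pattern) are made up for this example. It also assumes at least one file matches, since some xargs versions still run du once on empty input.

```shell
# total_kb DIR PATTERN
# Hypothetical helper: sums the kilobyte column of du -k across all matches,
# so the result is one number even if xargs runs du several times.
total_kb() {
    find "$1" -type f -iname "$2" -print0 |
        xargs -0 du -k |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Example (prefix find with sudo inside the function to search protected dirs):
#   total_kb "$HOME/Music" '*.mp3'
```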

To make it more fun, throw an alias into your .tcshrc (or .cshrc) file like so (the syntax is for tcsh/csh):
alias diskspc 'sudo find / -iname "*\!:1*" -print0 | xargs -0 du -cks'

This gives you the ability to search for any file type. For example, issuing diskspc .mp3 returns the size of every MP3 file plus a total, regardless of whether the extension is .MP3 or .mp3, thanks to the case-insensitive -iname flag for find.
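
The \!:1 argument substitution in that alias is csh/tcsh syntax. For bash or zsh users, a rough counterpart could be written as a function, since aliases in those shells do not take arguments. This is only a sketch: the optional second argument (a start directory) is an addition for illustration, and sudo is left out so the function can be tried safely on a small directory first.

```shell
# diskspc PATTERN [START_DIR]
# Rough bash/zsh counterpart of the tcsh alias above.
# START_DIR is an assumption added here; it defaults to / like the original.
diskspc() {
    find "${2:-/}" -iname "*$1*" -print0 | xargs -0 du -ck
}

# Example:
#   diskspc .mp3 "$HOME/Music"
```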
 
Code:
find / -type f -name "*.mp3" -ls | awk '{ sum += $7 } END { print "total size: " sum }'
This exploits the -ls output of find and lets us sidestep the problem of worrying about filenames with spaces: field seven is the size of the file in bytes, and the awk snippet sums up all the matching lines.

When I run this on my disk, I see a result of total size: 24009565121

Divide by 1024^3 and you'll see that, correctly, I have about 22.4 GB of music files online.
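
For anyone who would rather not do that division by hand, awk can do it too. A small sketch, taking 1 GB as 1024^3 bytes; bytes_to_gb is just a name picked for this example:

```shell
# bytes_to_gb BYTES
# Hypothetical helper: converts a byte count to gigabytes (1 GB = 1024^3 bytes).
bytes_to_gb() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f GB\n", bytes / (1024 * 1024 * 1024) }'
}

bytes_to_gb 24009565121   # prints 22.36 GB
```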
 
With either method, the command should be run as root if you wish to avoid access errors and ensure a more complete search.

d1taylor's method took about 1.75 minutes to scan 160 GB of data and locate 156 MB total of MP3 files on my three hard disks. My method took just over 1 minute to complete. (Testing was done informally, without a stopwatch or multiple runs. Drive caches and other factors may have influenced the results, but overall the method I posted seems more efficient.)

Another advantage of using my method and setting up the alias is that you can find ALL instances of a particular file type by extension, regardless of its case. For example, to find all TIFF files you would issue diskspc .tif, and because the alias wraps the argument in wildcards, it returns the disk space for all files with .tif, .TIF, .TIFF, or .tiff extensions.
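
The flip side of the wildcard on both ends is that a name like scan.tif.bak matches as well. If that matters, one option is to spell out the extensions with find's -o (or) operator. This is a sketch only; tiff_usage is a hypothetical name and the start directory is passed in as an argument:

```shell
# tiff_usage START_DIR
# Hypothetical helper: matches only names ending in .tif or .tiff (any case),
# then reports per-file sizes and a total in kilobytes.
tiff_usage() {
    find "$1" -type f \( -iname '*.tif' -o -iname '*.tiff' \) -print0 |
        xargs -0 du -ck
}

# Example:
#   tiff_usage "$HOME/Pictures"
```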

A slight revision
If you have fileutils installed, you can change the flag on du from -k (kilobytes) to -h (human readable), so it reports sizes in human-readable units such as megabytes instead of raw kilobytes.

The revised code is then:
sudo find / -iname "*.mp3" -print0 | xargs -0 du -ch

The alias line:
alias diskspc 'sudo find / -iname "*\!:1*" -print0 | xargs -0 du -ch'
 