Drew Lustro  ·  bootleg hypertext

Parallel Convert RAW to JPEG with ImageMagick

Above: A JPEG image from the Philly workshop batch, converted from RAW by the method detailed below.

The Task

Convert an entire folder of RAW images, typically produced by a DSLR camera at 300dpi, to web-ready 72dpi JPEGs in a single bash command, utilizing all eight threads in parallel on a quad-core i5 CPU.

OS X Prerequisites

Install ImageMagick with RAW support via Homebrew.

If you don't already have Homebrew, shame on you. Grab it like so:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
brew update # update to latest package listing
brew upgrade
brew doctor # make sure everything is cool with homebrew

Get some basic libs everyone should have

brew install pkg-config cmake glib zlib libtool

RAW Support

RAW support is achieved via the ufraw package, which has its own set of library dependencies. Install these first:

brew install libpng jpeg libtiff dcraw little-cms exiv2 # ufraw dependencies + exiv2 support
brew install ufraw --with-exiv2 # RAW image support for imagemagick

Get ImageMagick with cool extras

ImageMagick offers the convert command we will use to do the heavy lifting. While we're at it, let's grab a handful of other libs to make our ImageMagick install robust for any other future needs.

brew install freetype little-cms2 webp # necessary ImageMagick libs + extras
brew install imagemagick --with-webp

Make sure everything installed correctly

If successful, you should have the convert and ufraw-batch commands available in your terminal:

which convert && which ufraw-batch
# You should see:

If you're all set, let's take a look at the one-liner solution and then I'll explain each part of the pipeline.

But before that, make sure your current working directory has some RAW files to play with, like so:

cd ~/raw-pix && ls -la
total 3.0G
-rwxr-xr-x   1 drew staff  21M Jul 12 04:13 Speakers-Rev01-100.CR2*
-rwxr-xr-x   1 drew staff  21M Jul 12 04:14 Speakers-Rev01-101.CR2*
-rwxr-xr-x   1 drew staff  23M Jul 12 04:17 Speakers-Rev01-102.CR2*
-rwxr-xr-x   1 drew staff  22M Jul 12 04:17 Speakers-Rev01-103.CR2*
# ... many more files ...

The “One-Liner” Solution

Just take a gander at this for a moment and try to decipher what is going on.

find . -type f -iname '*.CR2' -print0 | xargs -0 -n 1 -P 8 -I {} convert -verbose -units PixelsPerInch {} -colorspace sRGB -resize 2560x2560 -set filename:new '%t-%wx%h' -density 72 -format JPG -quality 80 '%[filename:new].jpg'

Kinda crazy looking, right? I don't recommend executing this to try it out quite yet; it will crush your machine's CPU.

Pipeline I: find

This command looks, case-insensitively, for all .CR2 files in the current working directory (including any subdirectories) and prints them to stdout.

find . -type f -iname '*.CR2' -print
# Output:

Pipeline II: xargs + echo

Because things start to get crazy here with the introduction of xargs, I'll modify the pipeline slightly to explain what's going on. Instead of calling convert at the conclusion of xargs, I'll simply echo the output.

xargs is an incredibly versatile command that can take input files and perform operations on them one by one, or even in parallel on machines with multi-core CPUs.

find . -type f -iname '*.CR2' -print0 | xargs -0 -n 1 -P 1 -I {} echo The file is '{}'! # the 'suffix command' for xargs
### Output (the shell strips the single quotes, and find prefixes each path with ./):
The file is ./Speakers-Rev01-100.CR2!
The file is ./Speakers-Rev01-101.CR2!
The file is ./Speakers-Rev01-102.CR2!
The file is ./Speakers-Rev01-103.CR2!
The file is ./Speakers-Rev01-104.CR2!
The file is ./Speakers-Rev01-105.CR2!

NOTE the new -print0 flag added to our call to find, which is explained below.

Options detail for xargs

  • -0 means “separate arguments by the null terminator character instead of newlines and spaces”. We use this to support filenames that happen to have spaces in them; otherwise xargs would split one file into two or more arguments and bork out. Consequently, our find command has also been modified to use the -print0 flag, which null-terminates all the files it finds, anticipating this exact use-case with xargs!
  • -n 1 means “use one file at a time coming in from find and hand each to its own echo command”. Otherwise, xargs would append all the files to a single call.
  • -P 1 means “run up to 1 process at a time”. At the moment, it may be obvious that this option is useless. After all, 1 is the default value for -P anyway. We'll revisit this very soon.
  • -I {} defines the special replacement sequence {}. This option means that, in the upcoming command, any occurrence of {} is replaced with the filename coming into xargs. So every time we use {} after this, it will be replaced with something like Speakers-Rev01-102.CR2.
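
To see why the -print0 / -0 pairing matters, here's a minimal sketch you can run anywhere. It assumes nothing about your RAW folder: it makes a scratch directory with mktemp and two made-up filenames (one with a space, one with a lowercase extension to show off -iname), then counts how many arguments xargs ends up with each way.

```shell
#!/usr/bin/env bash
# Scratch directory with two hypothetical files; one name contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.CR2" "$demo_dir/with space.cr2"

# Newline-delimited: xargs splits "with space.cr2" into two arguments.
broken_count=$(cd "$demo_dir" && find . -type f -iname '*.CR2' -print |
  xargs -n 1 echo | wc -l | tr -d ' ')

# Null-delimited: each filename arrives as exactly one argument.
safe_count=$(cd "$demo_dir" && find . -type f -iname '*.CR2' -print0 |
  xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "without -0: $broken_count arguments; with -0: $safe_count arguments"
# prints: without -0: 3 arguments; with -0: 2 arguments

rm -rf "$demo_dir"
```

Two files should mean two arguments. The newline-delimited run produces three because the space in the second filename is treated as a separator.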

The suffix command for xargs

Immediately after the -I {} option comes the command that xargs will run once for each file it is fed by the preceding find pipeline:

echo The file is '{}'!

This can be any command! For demonstrative purposes, we're just echo'ing a string with the filename back to the terminal.

Pipeline III: xargs + convert

Instead of echo, we use ImageMagick's convert command. This executable is incredibly robust and has so many capabilities and options that it'll make your head spin.

The simplest usage of convert to change a PNG to a JPEG and reduce the size by half is:

convert input.png -resize 50% output.jpg

Seems easy enough, right? The -resize flag handles the half-size reduction, and the file extension of output.jpg hints to ImageMagick that it should convert to JPEG.

Now, let's combine it all together again:

find . -type f -iname '*.CR2' -print0 | xargs -0 -n 1 -P 1 -I {} convert -verbose -units PixelsPerInch {} -colorspace sRGB -resize 2560x2560 -set filename:new '%t-%wx%h' -density 72 -format JPG -quality 80 '%[filename:new].jpg'

Options detail for convert

  • -verbose barf out extra debug output so we can see what's going on behind the scenes.
  • -units PixelsPerInch set the input file's units to be measured in PPI.
  • {} the input filename. Remember, this will be replaced by xargs with something relevant!
  • -colorspace sRGB set the output file's color profile to the industry-standard sRGB
  • -resize 2560x2560 fit the image within a 2560px box: the larger of width and height is scaled to 2560px while maintaining aspect ratio.
  • -set filename:new '%t-%wx%h' set a special variable called filename:new, which will be used momentarily, to a string built from special escape sequences.
    • %t is the original filename without the .CR2 extension.
    • - is simply a hyphen verbatim, nothing special.
    • %wx%h is the width and height in pixels with an "x" in between. Example output from this would be 800x600
    • Put it all together and we get our filename:new variable ready for use ;)
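  • -density 72 tags the output file as 72dpi. This only sets the resolution metadata; the pixel dimensions are determined by -resize.
  • -format JPG isn't necessary, but is left in as an explicit reminder that we want a JPEG file as output. convert takes the actual output format from the .jpg extension of the output filename.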
  • -quality 80 sets the JPEG compression quality to a number between 0-100. For high quality images, I'd recommend anything above 70. The higher this number, the greater the output filesize.
  • %[filename:new].jpg is a special macro that replaces what is within a percent and brackets %[] with the variable name we set earlier. Lastly, we add the new .jpg file extension.
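
To make the -resize and filename:new steps concrete, here's a rough sketch in plain bash arithmetic that predicts the output filename without running ImageMagick at all. The helper name predict_output_name and the 5184x3456 source dimensions are my own assumptions for illustration. It also assumes %w and %h reflect the dimensions after -resize (as the option order suggests) and that rounding goes to the nearest pixel, so treat the result as an estimate rather than a guarantee of what convert will write.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the name built from '%t-%wx%h' after the
# image is fitted inside a max_dim x max_dim box.
# Usage: predict_output_name FILE WIDTH HEIGHT MAX_DIM
predict_output_name() {
  local file=$1 width=$2 height=$3 max_dim=$4
  local base=${file##*/}   # drop any directory part
  base=${base%.*}          # drop the extension, like %t

  local new_w new_h
  if [ "$width" -ge "$height" ]; then
    # Landscape or square: width becomes max_dim, height scales with it.
    new_w=$max_dim
    new_h=$(( (height * max_dim + width / 2) / width ))   # round to nearest
  else
    # Portrait: height becomes max_dim, width scales with it.
    new_h=$max_dim
    new_w=$(( (width * max_dim + height / 2) / height ))
  fi
  echo "${base}-${new_w}x${new_h}.jpg"
}

# Assumed source dimensions; substitute whatever your camera produces.
predict_output_name ./Speakers-Rev01-102.CR2 5184 3456 2560
# prints: Speakers-Rev01-102-2560x1707.jpg
predict_output_name ./Speakers-Rev01-103.CR2 3456 5184 2560
# prints: Speakers-Rev01-103-1707x2560.jpg
```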

Jesus, that was a mountain of a headache, right? I wasted almost two hours trying to set an additional special variable with -set before I realized that it is unsupported, but ImageMagick also doesn't complain if you try. Sigh.

You can read plenty more about convert and its many options by typing man convert or going to ImageMagick's documentation for command-line tools.

Enabling Multicore Parallelization

Remember the -P option for xargs I briefly touched on earlier? Well, you can set it to as many concurrent threads as your machine can handle and xargs will automatically parallelize the work across all your CPU cores!
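
If you want to see -P doing its thing without melting your CPU, here's a harmless sketch that swaps convert for sleep. The helper name time_sleep_jobs and the three one-second jobs are made up for the demo. It leans on bash's built-in SECONDS counter, so timings are whole seconds only.

```shell
#!/usr/bin/env bash
# Run three hypothetical one-second jobs through xargs and report the
# elapsed whole seconds. $1 is the value handed to xargs -P.
time_sleep_jobs() {
  local start=$SECONDS
  printf '%s\n' 1 1 1 | xargs -n 1 -P "$1" sleep
  echo $(( SECONDS - start ))
}

serial_secs=$(time_sleep_jobs 1)     # one job at a time: roughly 3 seconds
parallel_secs=$(time_sleep_jobs 3)   # three at once: roughly 1 second
echo "-P 1 took ${serial_secs}s, -P 3 took ${parallel_secs}s"
```

The jobs here just sleep, so they parallelize perfectly. Real image conversions compete for CPU, memory, and disk, so don't expect the speedup to scale quite this neatly.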

I have a quad-core Intel i5 processor in my machine, which gives me up to 8 threads to take advantage of (2 per core, thanks to Hyper-Threading), so I've set my flag to -P 8 as such:

find . -type f -iname '*.CR2' -print0 | xargs -0 -n 1 -P 8 -I {} convert -verbose -units PixelsPerInch {} -colorspace sRGB -resize 2560x2560 -set filename:new '%t-%wx%h' -density 72 -format JPG -quality 80 '%[filename:new].jpg'

If you run the above command, and it successfully begins to process the RAW files, your system may become somewhat unresponsive! Check out how my CPU utilization explodes after running the aforementioned command with -P 8:

Quad-core i5 8-thread crushed

Above: The Dream achieved.
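
If you'd rather not hard-code 8, here's one possible way to ask the machine how many logical CPUs it has and feed that to -P. The helper name logical_cpus is hypothetical. It tries getconf first and falls back to the OS X sysctl key hw.logicalcpu, then to 1 if neither answers with a number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of logical CPUs, or 1 if unknown.
logical_cpus() {
  local n
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.logicalcpu 2>/dev/null) ||
    n=1
  # Guard against empty or non-numeric answers.
  case $n in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

jobs=$(logical_cpus)
echo "Would run: xargs -0 -n 1 -P $jobs -I {} convert ..."
```

Matching -P to the logical CPU count is a reasonable starting point, not a rule. If your machine needs to stay usable while the batch runs, pick a smaller number.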

Wrapping Up

Soon I'll formalize this into a proper Bash script with robust options and a one-command install system for dependencies, but it's already been a long night. In the interim, I've wrapped the one-liner promised into a script function that has a few quick options for adjusting the quality, max-dimension, and source files, the default being *.CR2.

You can copy and paste this into your ~/.bashrc or ~/.bash_profile file to use it right away:

A quick & dirty shell function:

# Convert RAW images to 2560px max-dimension JPEG @ 80 quality to ./output
# check for presence of convert command first:
if [ -x "$(which convert)" ]; then
  function convert-raw-to-jpg() {
    local quality=${1:-80};
    local max_dim=${2:-2560};
    local source_files=${3:-*.CR2};
    echo "Usage: convert-raw-to-jpg [quality=80] [max-dimension-px=2560] [source=*.CR2]";
    echo "Converting all ${source_files} to JPEG (${quality} quality, ${max_dim}px max) to output/...";
    mkdir -p output 2> /dev/null;
    # the pattern stays quoted so that find, not the shell, expands it
    find . -type f -iname "${source_files}" -print0 | \
      xargs -0 -n 1 -P 8 -I {} convert -verbose -units PixelsPerInch {} \
      -colorspace sRGB -resize "${max_dim}x${max_dim}" -set filename:new '%t-%wx%h' \
      -density 72 -format JPG -quality "${quality}" 'output/%[filename:new].jpg';
    echo 'Done.';
  }
fi

And naturally, a demo that instead converts PNGs to 800px max-dimension at 30 quality:

convert-raw-to-jpg 30 800 \*.png # gotta escape that globbing *, for now

GitHub repo coming soon!

Catch me on Twitter.