Ruarí's thoughts

Fun with GNU Parallel

Recently I have been playing a lot with GNU Parallel, which is kind of like xargs on steroids. I really think this is one of the coolest utilities I have used in a long time, so on the off chance that anyone regularly reads this blog (and isn't already aware of it) I thought I'd give it a bit of free advertising. ;)

However, it is late and I should really get to bed, so rather than give you a lengthy explanation of how it works and why it is so great I thought I'd give you a very quick taste, showing off just a very small amount of its power!

It probably helps to provide an example Opera users can relate to, so let's take my recent post on the Opera Desktop team blog about Snapshot 11.01-1170. In it, my fairly short change log included:

Originally posted by ruario:

Some skin fixes and tweaks

Now, some of the changes are obvious, but some people may have been left thinking, "What exactly does that mean?" or "How can I get more details of the specifics?". Well, if you were one of those people, you have come to the right place! ;)

A neat tool for comparing files on Unix-like OSes is diff, but the problem with diff is that it doesn't do much with binary files other than tell you they are different. Given that the skin files are zipped, and hence binary, you hit exactly this problem. Sure, you can unpack the zips first, but that quickly becomes a hassle and fairly time consuming. So what to do? Harness the power of GNU Parallel, of course!

Assuming you saved copies of opera-11.01-1164.i386.linux.tar.xz and opera-11.01-1170.i386.linux.tar.xz from the last couple of snapshots into the same directory, the following command would unpack both tar packages and all compressed files within them, creating an appropriately named subdirectory for each uncompressed component:
parallel -I A --basenamereplace B 'mkdir A_; cd A_; tar xf ../B --strip-components 1; find share | grep -E "\.(ua|zip)$" | parallel "mkdir {}_; cd {}_; unzip -q ../{/}; rm -f ../{/}"; find share -name "*.gz" -exec gunzip {} \;' ::: opera-11.01-1164.i386.linux.tar.xz opera-11.01-1170.i386.linux.tar.xz 
Remember I said "all"? I actually uncompressed the man pages and Unite files for good measure! :p
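
If the two single-letter placeholders in that one-liner look cryptic, the following plain-shell dry run is a rough sketch of how they expand. It assumes that A (set with -I) is replaced by the argument as given and B (set with --basenamereplace) by the same argument with any leading directory removed. The expand_job function and the builds/ directory are made up for illustration; nothing here calls Parallel or unpacks anything, it only prints the first part of the command for each input.

```shell
# Dry run: print what the start of the outer command becomes for each archive.
# expand_job is a hypothetical helper for illustration, not part of GNU Parallel.
expand_job() {
    arg=$1                    # stands in for the A placeholder (-I A)
    base=$(basename "$arg")   # stands in for the B placeholder (--basenamereplace B)
    printf 'mkdir %s_; cd %s_; tar xf ../%s --strip-components 1\n' \
        "$arg" "$arg" "$base"
}

# The second argument sits in a made-up subdirectory to show why the
# basename is needed: after "cd builds/NAME_", the archive is at "../NAME".
for archive in opera-11.01-1164.i386.linux.tar.xz \
               builds/opera-11.01-1170.i386.linux.tar.xz; do
    expand_job "$archive"
done
```

The inner parallel call in the one-liner relies on the same idea, using the default {} placeholder for the full path and {/} for the basename.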

Note: I am using a Beta level Parallel option here '--basenamereplace', which means for this to work you will need GNU Parallel version 20101222 or greater. If your distro does not include such a recent version of GNU Parallel within its software repository, you don't need to worry as you can get packages for a range of distros (or source code) from the GNU Parallel website.

Once the above is done, you can then recursively diff the skin directories to see exactly what changes were made:
diff -r opera-11.01-1164.i386.linux.tar.xz_/share/opera/skin opera-11.01-1170.i386.linux.tar.xz_/share/opera/skin 
Pretty cool, huh? :)
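
To get a feel for what a recursive diff reports without downloading any snapshots, here is a small throwaway sketch. The directory layout and file contents are invented (they are not taken from a real Opera skin); it simply builds two tiny trees that differ in one file and runs diff -r over them.

```shell
# Throwaway demo with made-up files: two "unpacked" trees that differ in one file.
demo=$(mktemp -d)
mkdir -p "$demo/old/skin" "$demo/new/skin"
printf 'Button = 1\n' > "$demo/old/skin/skin.ini"
printf 'Button = 2\n' > "$demo/new/skin/skin.ini"
printf 'same\n' > "$demo/old/skin/readme.txt"
printf 'same\n' > "$demo/new/skin/readme.txt"

# diff exits with status 1 when it finds differences, so that is not an error here.
result=$(cd "$demo" && diff -r old/skin new/skin) || true
printf '%s\n' "$result"

# Clean up the temporary directory.
rm -r "$demo"
```

Only the file that changed should show up in the output; identical files are passed over silently, which is what makes diff -r practical on a fully unpacked build.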

To be honest, the Parallel man page has plenty of much more impressive examples (like running some of your jobs remotely on multiple machines to speed things up, or making better use of available CPU cores), but I have resisted the urge to copy all these great examples out here. If, however, I have whetted your appetite and you want to learn more, I recommend you check out the following two introductory videos that the author himself provides. In fact, I got the links directly from the Parallel man page. (Yeah, a man page with YouTube links to instructional videos. Surely this is a first!)

Part 1: GNU Parallel script processing and execution
Part 2: GNU Parallel script processing and execution

I hope you all enjoy Parallel as much as I do!

P.S. Because I uncompressed every compressed file earlier, you can check all the files that changed throughout the packages, i.e.:
diff -r opera-11.01-1164.i386.linux.tar.xz_ opera-11.01-1170.i386.linux.tar.xz_ 
This may be handy in the future. ;)


Comments

Ruarí Ødegaard (ruario), Tuesday, January 18, 2011 11:55:26 PM

By the way, my example command becomes much more useful if you have a directory full of Opera builds and change the end part from:
::: opera-11.01-1164.i386.linux.tar.xz opera-11.01-1170.i386.linux.tar.xz 
to:
::: opera-*.i386.linux.tar.xz 
This would cause them all to completely unpack recursively into named directories. Ripe for diffing!

In fact, you could remove that from the end entirely, stick a find command on the front, pipe its output into the main parallel command and recursively unpack every copy of Opera on your hard disk! :D
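
A minimal sketch of the find half of that idea, which is worth checking on its own before wiring it up. The list_opera_archives helper is hypothetical, and the name pattern is an assumption based on the file names used in the post.

```shell
# list_opera_archives is a hypothetical helper: it prints every Opera Linux
# tarball found below the given directory (default: the current directory).
list_opera_archives() {
    find "${1:-.}" -type f -name 'opera-*.linux.tar.xz'
}

# Count what would be unpacked before committing to it.
list_opera_archives . | wc -l
```

Parallel reads its arguments from standard input when none are given after the command, so this output could then be piped into the main command from the post with the ::: part removed.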

Kyle Baker (kyleabaker), Wednesday, January 19, 2011 12:41:54 AM

up

Ruarí Ødegaard (ruario), Wednesday, January 19, 2011 8:32:42 PM

So, does anyone know the build numbers of the last 50 *nix snapshots?

I do! :p
seq 0 10 60 | parallel 'wget "http://my.opera.com/desktopteam/blog/?startidx={}" -qO- | sed -rn "s,.*unix/[a-zA-Z0-9]+_([0-9][0-9]\.[0-9][0-9]-[0-9][0-9][0-9][0-9])/.*,\1,p"' | sort -u | tail -n 50 
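
The sed expression is the part that does the real work there, and it can be tried offline. The sketch below feeds it a few invented lines; the snapshot.example URLs are made up and only mimic the shape the pattern expects (a unix/ path component, a name, an underscore, then version-build). It uses -E, which current GNU and BSD seds both accept, in place of -r, and extract_builds is a hypothetical wrapper name.

```shell
# extract_builds reads HTML on stdin and prints the version-build string from
# each line that contains a snapshot-style link. Same pattern as the command
# above, wrapped in a function so it can be tested without the network.
extract_builds() {
    sed -En 's,.*unix/[a-zA-Z0-9]+_([0-9][0-9]\.[0-9][0-9]-[0-9][0-9][0-9][0-9])/.*,\1,p'
}

# Invented sample input: two distinct builds, one duplicate, one non-matching line.
printf '%s\n' \
    '<a href="http://snapshot.example/unix/foo_11.01-1170/">Unix</a>' \
    '<p>No link on this line</p>' \
    '<a href="http://snapshot.example/unix/bar_11.01-1164/">Unix</a>' \
    '<a href="http://snapshot.example/unix/foo_11.01-1170/">Unix again</a>' |
    extract_builds | sort -u | tail -n 50
```

Lines without a matching link print nothing because of -n combined with the p flag, and sort -u collapses builds that are mentioned more than once.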

Ruarí Ødegaard (ruario), Friday, January 21, 2011 10:32:06 AM

I received a private message asking how to build and install Parallel from source, and how to uninstall it again afterwards. Firstly, strictly speaking you don't really need to do this, since Parallel is actually a Perl script, meaning you can probably just extract it from the source package and run it directly. However, 'building' from source has some advantages: you are warned about missing dependencies (if there are any), the executable is placed in a suitable location, and the documentation and man pages are installed. Hence it is probably a good idea if there is no package for your distro and no distro repackaging script. Do check for those first though, as it is always nicer to use a native package if there is one! ;)

Building and installing Parallel from source is pretty easy. Indeed, most packages are pretty easy to build and install from source, though it can seem daunting if it is all new to you.

If you have sudo installed and configured you can install Parallel from source as follows:
$ wget -O- http://ftp.gnu.org/gnu/parallel/parallel-20110122.tar.bz2 | tar -xjf- 
$ cd parallel-20110122
$ ./configure --prefix=/usr/local && make && sudo make install
$ cd ..
$ sudo mkdir -p /usr/local/src
$ sudo mv parallel-20110122 /usr/local/src/.
$ sudo chown -R root:root /usr/local/src/parallel-20110122

If you don't have sudo installed or configured:
$ wget -O- http://ftp.gnu.org/gnu/parallel/parallel-20110122.tar.bz2 | tar -xjf- 
$ cd parallel-20110122
$ ./configure --prefix=/usr/local && make 
$ su
# make install
# cd ..
# mkdir -p /usr/local/src
# mv parallel-20110122 /usr/local/src/.
# chown -R root:root /usr/local/src/parallel-20110122
# exit

Files will be installed in the following locations:
/usr/local/bin/niceload
/usr/local/bin/parallel
/usr/local/bin/sem
/usr/local/bin/sql
/usr/local/share/doc/parallel/
/usr/local/share/man/man1/niceload.1
/usr/local/share/man/man1/parallel.1
/usr/local/share/man/man1/sem.1
/usr/local/share/man/man1/sql.1
/usr/local/src/parallel-20110122/
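
A quick way to confirm the install went where expected is to loop over that list. This is only a sketch: check_parallel_install is a hypothetical helper, the prefix defaults to /usr/local as in the instructions above, and it checks the executables and man pages but not the doc or src directories.

```shell
# check_parallel_install is a hypothetical helper: it reports which of the
# files listed above exist under the given prefix (default: /usr/local)
# and returns non-zero if any of them is missing.
check_parallel_install() {
    prefix=${1:-/usr/local}
    missing=0
    for f in bin/niceload bin/parallel bin/sem bin/sql \
             share/man/man1/niceload.1 share/man/man1/parallel.1 \
             share/man/man1/sem.1 share/man/man1/sql.1; do
        if [ -e "$prefix/$f" ]; then
            echo "ok       $prefix/$f"
        else
            echo "missing  $prefix/$f"
            missing=1
        fi
    done
    return "$missing"
}

check_parallel_install /usr/local || echo "Some files are missing under /usr/local"
```

If you passed a different --prefix to configure, give the same path as the first argument.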

If you ever want or need to uninstall Parallel, you can do so as follows.

With sudo:
$ cd /usr/local/src/parallel-20110122/
$ sudo make uninstall
$ cd ..
$ sudo rm -r parallel-20110122

Without sudo:
$ su
# cd /usr/local/src/parallel-20110122/
# make uninstall
# cd ..
# rm -r parallel-20110122
# exit
Edit: Updated instructions to GNU Parallel version 20110122.

Ruarí Ødegaard (ruario), Sunday, January 23, 2011 2:55:08 PM

A new GNU Parallel release came out yesterday and my blog post got mentioned in the release announcement! Double win!! :D

That has just made my whole week!

http://lists.gnu.org/archive/html/parallel/2011-01/msg00018.html

Thanks to Ole Tange, both for the excellent Parallel and the mention.
