Friday 2010-10-08

While cleaning up some code at work, I wanted to make a user migration script run in parallel. I also wanted adding parallelism to our other scripts to be really easy, so here's a small library for parallelizing bash scripts.

Say you have the following code:

for user in $( cat $user_file ); do
	migrate_user $user $to_host
done

You can parallelize it with:

source "/${path_to}/"
for user in $( cat $user_file ); do
	parallel 5
	migrate_user $user $to_host &
done

This limits the spawned migrate_user()s to 5 in parallel. When the queue is full, the call to parallel() sleeps. When a migrate_user() completes, parallel() returns, and another migrate_user() is spawned.

When parallel() is called, it traps EXIT so your script doesn't exit while it still has jobs running. You can disable this behavior and call parallel_exit() yourself if you want to closely manage the manager ;)
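To make the EXIT trap idea concrete, here is a minimal sketch of one way a script can refuse to exit while background jobs are still running. It is not the library's code: wait_for_jobs_on_exit is a made-up name for illustration, and the real parallel_exit() may well do more than a plain wait.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep a script alive until its background jobs finish.
# wait_for_jobs_on_exit is an illustrative name, not the library's function.

wait_for_jobs_on_exit() {
	# When the shell exits, `wait` with no arguments blocks until every
	# background child of this shell has terminated.
	trap 'wait' EXIT
}

wait_for_jobs_on_exit

# Start a slow job in the background, then fall off the end of the script.
( sleep 0.2; echo "background job finished" ) &
echo "main script reached its last line"
```

Without the trap, the script would return to the caller right away and leave the job running behind it. With the trap, the script only exits once the background job has finished.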

Instead of counting children like GitAll does, it abuses bash's jobs builtin to keep track of how many jobs are running, which keeps it all simple. (In fact, the only non-builtin the library uses is sleep.)
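For the curious, a throttle built on the jobs builtin can be sketched in a few lines. This is my own guess at the approach, not the library's actual source: parallel_sketch is a hypothetical name, and the 0.1 second polling interval is an arbitrary choice (fractional sleep works with GNU and BSD sleep, but is not guaranteed everywhere).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a jobs-based throttle; not the library's real code.

# Block until fewer than $1 background jobs of this shell are running.
parallel_sketch() {
	local max_jobs=$1
	local -a running
	while :; do
		# `jobs -pr` prints the PID of each running background job.
		running=( $(jobs -pr) )
		if [ "${#running[@]}" -lt "$max_jobs" ]; then
			return 0
		fi
		# Queue is full: poll again shortly (sleep is the only non-builtin).
		sleep 0.1
	done
}

# Usage: run four fake jobs, at most two at a time.
for n in 1 2 3 4; do
	parallel_sketch 2
	( sleep 0.2; echo "job $n done" ) &
done
wait
```

Polling with sleep is crude but keeps the function tiny. Newer versions of bash also offer wait -n, which returns as soon as any single job finishes and would avoid the polling.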