The new compute cluster is beginning to feel like a production system. I’m currently run off my feet installing software for the stream of new users. Mostly this is fine, but occasionally I run into software that makes me want to bang my head repeatedly on my desk until the pain goes away; or, more accurately, makes me want to bang the programmer’s head on the desk.
Just today we received a Linux port of a code that has been running on the Windows Condor pool for a while now. Everything seemed fine except for its stubborn refusal to run if it couldn’t find a windowing system. Bear in mind that it doesn’t actually produce any graphical output; it just dies if it can’t connect to X. After a bit of futzing around we discovered that the people who normally run this code do something like:
Xvfb :1 -screen 0 1024x1024x8 &
export DISPLAY=:1
./stupid_code_that_wants_X
Xvfb is the X virtual framebuffer. It gives you a running X server without needing any graphics hardware or a real display.
The snippet above works just great locally, but if you want to launch it as a script under the job scheduling system (we use PBSpro) then you need to be a bit more careful. What happens if two of these jobs land on the same machine? One of them will fail, because display 1 is already allocated. What I really needed was a script that tries to launch Xvfb and increments the display number on failure until it finds one that is free. For your edification, here it is:
# Set XVFB_PID to the PID of the last Xvfb process owned by $USERNAME
# (empty if there is none).
get_xvfb_pid () {
    XVFB_PID=`ps -efww | grep -v grep | grep Xvfb |\
        grep "$USERNAME" | tail -n 1 | awk '{print $2}'`
}
# Start Xvfb on the first free display, counting up from :1.
create_xvfb () {
    USERNAME=`whoami`
    DISPLAYNO=1
    xvfb_success=""
    while [ -z "$xvfb_success" ]
    do
        # Remember the last Xvfb we already own, if any.
        get_xvfb_pid
        old_XVFB_PID=$XVFB_PID
        XVFB_PID=""
        Xvfb :${DISPLAYNO} -screen 0 1024x1024x8 > /dev/null 2>&1 &
        sleep 1
        get_xvfb_pid
        # Success means an Xvfb of ours is running that was not there before.
        if [ -n "$XVFB_PID" ] && [ "$XVFB_PID" != "$old_XVFB_PID" ]
        then
            echo "Started XVFB on display $DISPLAYNO process $XVFB_PID"
            xvfb_success=1
        else
            # Display is taken (or Xvfb died); try the next one.
            DISPLAYNO=$(($DISPLAYNO + 1))
            XVFB_PID=""
        fi
    done
    export XVFB_PID
    export DISPLAY=:${DISPLAYNO}
}
# Stop the Xvfb started by create_xvfb.
kill_xvfb () {
    kill "$XVFB_PID"
}
Source the helper and call it from a script or a shell like so:
[arccacluster8 ~]$ . ./xvfb_helper
[arccacluster8 ~]$ create_xvfb
Started XVFB on display 1 process 9563
[arccacluster8 ~]$ echo $DISPLAY
:1
[arccacluster8 ~]$ echo $XVFB_PID
9563
[arccacluster8 ~]$ ps -efw | grep Xvfb
username 9563 9498 0 19:31 pts/8 00:00:00 Xvfb :1 -screen 0 1024x1024x8
[arccacluster8 ~]$ kill_xvfb
[arccacluster8 ~]$ ps -efw | grep Xvfb
[arccacluster8 ~]$
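In a batch job nobody is around to type kill_xvfb, so it is probably worth letting the shell do it. The following is only a sketch under a few assumptions: run_with_xvfb is a hypothetical wrapper name, and create_xvfb and kill_xvfb are assumed to have been sourced already. It runs the command in a subshell and uses an EXIT trap so that Xvfb is cleaned up whether the command succeeds or fails.

```shell
# Hypothetical wrapper: run_with_xvfb CMD [ARGS...]
# Assumes create_xvfb and kill_xvfb have already been sourced.
run_with_xvfb () {
    (
        create_xvfb || exit 1
        # Fires when this subshell exits, whether the command worked or not.
        trap kill_xvfb EXIT
        "$@"
        # Pass the command's exit status back to the caller.
        exit $?
    )
}

# In a job script this might look like:
#   . ./xvfb_helper
#   run_with_xvfb ./stupid_code_that_wants_X
```

Because everything happens in a subshell, DISPLAY and XVFB_PID do not leak into the rest of the job script.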
I submit that this is a disgraceful hack, but it might come in handy to someone else.
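A slightly less hacky variant would ask the shell rather than ps whether the server came up: `$!` holds the PID of the last background job, and `kill -0` tests whether that process still exists. This is a sketch, not what we actually run. start_on_free_display, SERVER_PID and the sofd_ variables are hypothetical names, and the server command is passed in as arguments so the loop can be tried without Xvfb installed. It makes the same assumption as the script above, namely that the server exits within a second if the display is already taken.

```shell
# Hypothetical helper, meant to be sourced:
#   start_on_free_display FIRST LAST CMD [ARGS...]
# Runs "CMD :N ARGS..." for N = FIRST..LAST until one instance stays alive.
# On success sets SERVER_PID and DISPLAYNO and returns 0; otherwise returns 1.
start_on_free_display () {
    sofd_n=$1
    sofd_last=$2
    sofd_cmd=$3
    shift 3
    while [ "$sofd_n" -le "$sofd_last" ]; do
        "$sofd_cmd" ":$sofd_n" "$@" > /dev/null 2>&1 &
        SERVER_PID=$!
        sleep 1                                  # give the server time to fail
        if kill -0 "$SERVER_PID" 2> /dev/null; then
            DISPLAYNO=$sofd_n
            return 0
        fi
        wait "$SERVER_PID" 2> /dev/null || true  # reap the failed attempt
        sofd_n=$((sofd_n + 1))
    done
    SERVER_PID=""
    return 1
}

# Possible use:
#   start_on_free_display 1 20 Xvfb -screen 0 1024x1024x8 &&
#       export DISPLAY=:$DISPLAYNO
```

The upper bound stops the loop from running forever when Xvfb is missing or broken, which the original loop would happily do.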