Dealing With Stupid Programs That Think They Need X

The new compute cluster is beginning to feel like a production system. I’m currently run off my feet installing software for the stream of new users. Mostly this is fine, but occasionally I run into software that makes me want to bang my head repeatedly on my desk until the pain goes away; or, more accurately, makes me want to bang the programmer’s head on the desk.

Just today we received a Linux port of a code that has been running on the Windows Condor pool for a while now. Everything seemed fine except for its stubborn refusal to run if it couldn’t find a windowing system. Bear in mind that it doesn’t actually produce any graphical output; it just dies if it can’t connect to X. After a bit of futzing around we discovered that the people who normally run this code do something like:

Xvfb :1 -screen 0 1024x1024x8 &
export DISPLAY=:1
./stupid_code_that_wants_X

Xvfb is the X virtual framebuffer. It gives you a running X server without needing any graphics hardware: everything is rendered into memory and nothing is ever shown on a screen.

That works just great locally, but if you want to launch it as a script in the job scheduling system (we use PBSpro) then you need to be a bit more careful. What happens if two of these jobs try to launch on the same machine? One of them will fail because display 1 is already allocated. What I really needed was a script that tries to launch Xvfb and increments the display number on failure until it finds a display that is free. For your edification, here it is:

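# Set XVFB_PID to the PID of this user's most recently listed Xvfb (empty if none).
# Expects USERNAME to be set by the caller.
get_xvfb_pid () {
	XVFB_PID=`ps -efww | grep -v grep | grep Xvfb |\
		grep "$USERNAME" | tail -n 1 | awk '{print $2}'`
	}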

create_xvfb () {
	USERNAME=`whoami`
	DISPLAYNO=1
	# Reset the flag so the function can be called more than once per shell.
	xvfb_success=""
	while [ -z "$xvfb_success" ]
		do
		# Remember the newest Xvfb that existed before this attempt.
		get_xvfb_pid
		old_XVFB_PID=$XVFB_PID
		XVFB_PID=""
		Xvfb :${DISPLAYNO} -screen 0 1024x1024x8 >& /dev/null &
		sleep 1
		get_xvfb_pid
		if ! [ -z "$old_XVFB_PID" ]
			then
			# An older Xvfb exists: succeed only if the newest PID has changed.
			if ! [ -z "$XVFB_PID" ] && [ "$XVFB_PID" != "$old_XVFB_PID" ]
				then
				echo "Started XVFB on display $DISPLAYNO process $XVFB_PID"
				xvfb_success=1
			else
				DISPLAYNO=$(($DISPLAYNO + 1))
				XVFB_PID=""
			fi
		else
			# No Xvfb was running before: any PID found now is ours.
			if ! [ -z "$XVFB_PID" ]
				then
				echo "Started XVFB on display $DISPLAYNO process $XVFB_PID"
				xvfb_success=1
			else
				echo "FAIL! Could not start Xvfb on display $DISPLAYNO"
				DISPLAYNO=$(($DISPLAYNO + 1))
				XVFB_PID=""
			fi
		fi
		done
	export XVFB_PID
	export DISPLAY=:${DISPLAYNO}
	}

kill_xvfb () {
	kill $XVFB_PID
	}

You can then source the helper and call the functions like this:

[arccacluster8]$ . ./xvfb_helper
[arccacluster8]$ create_xvfb
Started XVFB on display 1 process 9563
[arccacluster8 ~]$ echo $DISPLAY
:1
[arccacluster8 ~]$ echo $XVFB_PID
9563
[arccacluster8 ~]$ ps -efw | grep Xvfb
username    9563  9498  0 19:31 pts/8    00:00:00 Xvfb :1 -screen 0 1024x1024x8
[arccacluster8 ~]$ kill_xvfb
[arccacluster8 ~]$ ps -efw | grep Xvfb
[arccacluster8 ~]$

I submit that this is a disgraceful hack, but it might come in handy to someone else.
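
One loose end in a batch job is making sure the Xvfb is killed again even when the program fails. The following is a minimal sketch, not part of the helper above: `run_headless` is a hypothetical name, and it assumes `create_xvfb` and `kill_xvfb` have already been defined (for example by sourcing `./xvfb_helper`).

```shell
# Hypothetical wrapper: run one command under its own Xvfb and always clean up.
# Assumes create_xvfb and kill_xvfb are already defined in the calling shell.
# The body is a subshell, so DISPLAY, XVFB_PID and the trap do not leak out.
run_headless () (
	create_xvfb || exit 1
	# Kill the Xvfb when this subshell exits, whether the command worked or not.
	trap kill_xvfb EXIT
	"$@"
	# Pass the command's exit status back so the job reports it.
	exit $?
)
```

In a job script this would be used as `run_headless ./stupid_code_that_wants_X`. Running the body in a subshell is a design choice: each call gets its own display and nothing is left behind in the calling shell.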

5 thoughts on “Dealing With Stupid Programs That Think They Need X”

  1. I wasn’t aware of Xvfb, thanks Huw.

    With regard to get_xvfb_pid, may I respectfully refer you to the -U option to ps(1) and awk pattern matches?
    The function could be written with less overhead as:

    get_xvfb_pid () {
    XVFB_PID=`ps -efwwU $USERNAME | awk '/Xvfb/ {print $2}'`
    }

    The presence of tail in your pipeline looks wrong too; I think ps(1) sorts by controlling terminal, then process ID, neither of which is likely to be useful!

  2. ps on Linux seems to sort by PID, which, given the way Linux behaves, seems to equate to latest process last. Hence the use of tail -n 1, which effectively gives us the PID of the last Xvfb to be spawned.

    It didn’t occur to me to use the -U flag of ps; I shall use it next time I need to grep $USERNAME, as it’s clearly the better way. The awk trick doesn’t work because the /Xvfb/ pattern also matches the awk command itself, and you end up with the PID of awk, not Xvfb.

  3. Process IDs will wrap, which is my main concern. The awk pattern could be refined to filter out toothpicks, perhaps.

    More importantly, I fed you a duff command line – -e overrides -U in most implementations, so you’d probably need to drop it (I’m pleased to comply with the rules that state when being a smartarse, it’s compulsory to make at least one mistake).
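
One way to refine the pattern, sketched here under assumptions rather than taken from the thread: ask ps for only the PID and command-name columns and compare the command name exactly. `filter_xvfb_pids` is a hypothetical helper name; the input format assumed is what `ps -U "$USERNAME" -o pid= -o comm=` prints, one "PID COMMAND" pair per line.

```shell
# Sketch: print the PIDs of processes whose command name is exactly "Xvfb".
# Reads "PID COMMAND" lines on stdin, for example from:
#   ps -U "$USERNAME" -o pid= -o comm=
# Comparing the second column exactly means awk itself (command name "awk")
# never matches, and neither does a user who happens to be called Xvfb.
filter_xvfb_pids () {
	awk '$2 == "Xvfb" {print $1}'
}
```

With this, the body of get_xvfb_pid could become something like ``XVFB_PID=`ps -U "$USERNAME" -o pid= -o comm= | filter_xvfb_pids | tail -n 1` ``, keeping the original's `tail -n 1` and the caveat about PID ordering that goes with it.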

  4. This script has a race condition: you are testing for the ID of the last running Xvfb to change. Somebody else can start another Xvfb after you test but before you start your Xvfb, thereby fooling your script.

    Proposed fix:

    create_xvfb () {
    DISPLAYNO=1
    xvfb_success=""
    while [ -z "$xvfb_success" ]
    do
    Xvfb :${DISPLAYNO} -screen 0 1024x1024x8 >& /dev/null &
    # $! is the PID of the Xvfb we just launched, so nobody else's can be mistaken for it.
    XVFB_PID=$!
    sleep 1
    # If the display was already taken, Xvfb should have exited by now.
    if ps --pid "$XVFB_PID" > /dev/null
    then
    echo "Started XVFB on display $DISPLAYNO process $XVFB_PID"
    xvfb_success=1
    else
    echo "Failed to run Xvfb on display $DISPLAYNO"
    DISPLAYNO=$(($DISPLAYNO + 1))
    fi
    done
    export XVFB_PID
    export DISPLAY=:${DISPLAYNO}
    }

    kill_xvfb () {
    kill $XVFB_PID
    }

    It simplifies your code, you are not firing grep upon grep upon grep and I think it is more robust too. E.g. what if I had a user named “Xvfb” that would show up in ps?
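
The liveness check in the proposed fix relies on `ps --pid`, which is a procps option. A more portable sketch of the same check uses `kill -0`, which sends no signal and only reports whether the process exists and may be signalled. `is_running` is a hypothetical name, not something from the thread.

```shell
# Sketch: succeed if a process with the given PID exists and we may signal it.
# `kill -0` delivers no signal; it only performs the existence/permission check.
is_running () {
	kill -0 "$1" 2>/dev/null
}
```

In the loop it would replace the ps line as `if is_running "$XVFB_PID"`. One caveat applies to both forms: a child that has exited but has not yet been reaped by the shell can still be reported as present, so the check is only as good as the shell's reaping of background jobs.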
