Re: [Condor-users] BLAST jobs go to 0% CPU; condor thinks they're running

On Tue, 1 Mar 2005, Michael Rusch wrote:

> I have a four-machine Condor pool (three are dual-processor, so there are 7
> virtual machines), with all machines running Windows XP.
> I have tried several times to submit a job cluster that has sixteen
> individual jobs/processes.  They're all BLAST searches, for those who are
> familiar with BLAST.  Each job uses two input files and a batch script
> issues the two commands necessary (formatdb and blastall).  There are a
> total of four input files and the submit script queues one process for every
> ordered pair of input files (for 4x4 = 16 jobs).
> Every time I've submitted the cluster it completes the first four jobs
> (searching a single input file against each of the other ones), and it runs
> the others for about a minute, after which the execute machine beeps (it's
> the "Asterisk" sound), and then processes drop down to 0% CPU.  They do not
> drop down at the same time, but close to one another.  condor_q reports that
> they are still running, but they are not.  In one case, they resumed for a
> brief period of time after several hours of not doing anything.  Nothing
> appears in the Condor logs.
> If I run the jobs without Condor, they work fine (though they take forever).
> Also, I noticed that for some reason the jobs use significantly more CPU when
> run through Condor than when run individually on the local machine.
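
For reference, the job layout described in the quoted message (one process
for every ordered pair of four input files) can be sketched as follows. This
is only an illustration: the file names are hypothetical, and which file of
each pair goes to formatdb and which to blastall is an assumption, since the
message does not say.

```python
from itertools import product

# Hypothetical file names; the original message does not give them.
input_files = ["a.fasta", "b.fasta", "c.fasta", "d.fasta"]

# One job per ordered pair, including each file paired with itself.
# Assumption: the first element is the file given to formatdb and the
# second is the query given to blastall.
jobs = list(product(input_files, repeat=2))

for process_id, (database, query) in enumerate(jobs):
    print(f"process {process_id}: database={database} query={query}")
```

Under this assumed ordering, processes 0 through 3 all share the same first
file, which would match the report that the first four jobs involve a single
input file.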

Are the jobs still alive when the CPU drops to 0%? You can check by
logging into the machines, listing the running processes (with ps, or with
Task Manager or tasklist on Windows), and looking for processes named
condor_exec.exe. If you are programming-savvy and know something about
the BLAST code, you can attach to them with a debugger to see why they're

