
[HTCondor-users] BLAST job crash!



Hello everybody,

I am writing because I am having problems running
BLAST jobs on my test pool.

I run a simple BLAST job with blastn (which creates an output file).
My test pool is made of 2 machines: a quad-core one (submit & execute)
and an 8-core one (manager), so I have 12 slots.
I use a shared space (a NAS mounted over NFS) and vanilla universe jobs.

The problem: a few seconds after the jobs start running,
the job processes on the execute machine go into state D (uninterruptible sleep), and then
the machine itself misbehaves (e.g. I can't even run ls on the root directory).

The various files (log, out, err...) are created correctly, but
I see nothing in them that points to the problem.


My job submit file:
universe 		= vanilla
executable		= /usr/bin/blastn
arguments		= -query /grid/condor/16head.fna -db /grid/condor/16head -out /grid/condor/folder_out/16head_$(Process).out
log			= folder_out/sub2.log.$(Process)
output			= folder_out/sub2.out.$(Process)
error			= folder_out/sub2.err.$(Process)
queue			10

It is random: sometimes the jobs create their output files correctly, sometimes they crash.
It seems to depend on the number of jobs (with queue 5 they usually run correctly,
with queue 10 they usually crash).
Run locally, blastn always works, so I think the problem comes from my
configuration files or from the NAS (the shared space).
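
In case it is relevant, here is a sketch of the same submit file using HTCondor's
built-in file transfer instead of the shared NFS space, so blastn reads the query
and database from the job's local scratch directory and writes its output there.
This is only a sketch: the database file names 16head.nhr/.nin/.nsq are my guess
for a nucleotide database, so the transfer_input_files list may need adjusting.
Would this be a reasonable way to rule out the NAS?

universe		= vanilla
executable		= /usr/bin/blastn
# read inputs and write the output in the job's local scratch directory
arguments		= -query 16head.fna -db 16head -out 16head_$(Process).out
should_transfer_files	= YES
when_to_transfer_output	= ON_EXIT
# query plus the BLAST database files (names assumed; adjust to your database)
transfer_input_files	= /grid/condor/16head.fna, /grid/condor/16head.nhr, /grid/condor/16head.nin, /grid/condor/16head.nsq
log			= folder_out/sub2.log.$(Process)
output			= folder_out/sub2.out.$(Process)
error			= folder_out/sub2.err.$(Process)
queue			10

With this, if I understand the file transfer mechanism correctly, the
16head_$(Process).out files would come back to the submit directory on exit
instead of being written to /grid/condor/folder_out over NFS.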

Please, I need help to resolve this; I have searched on the Internet and so on
but have not managed to solve it.

Thank you.
Regards.


-- Romain --