
[Condor-users] DAGMan slow startup



I am running a DAGMan job that I want to execute more or less in real
time. When I submit the .dag file, the log says the DAG nodes are
submitted right away (right now there are only 3, but there may need to
be 100+), but it takes about 5 minutes before anything more happens. I
have Condor in TEST mode, so it should ignore whether my nodes are busy,
and condor_status says all nodes are Idle. Once the jobs start running,
it only takes about 30-40 seconds until everything is finished
(including postprocessing). It would be nice if I could cut out those
5 minutes of idle time. Is that possible?

Sometimes a job returns the error value 127. What does that mean? I
have had to set the jobs to Retry in the DAGMan submit script, which
re-executes them pretty fast. Where can I find a list of return codes?
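For reference, the Retry setup looks roughly like the sketch below. The
node names (NodeA, NodeB, NodeC), submit file names and retry count are
hypothetical placeholders, not my actual files; JOB and RETRY are the
standard DAGMan keywords for declaring a node and how many times to
resubmit it after a failure.

    # Hypothetical DAG input file: three independent nodes
    JOB NodeA nodeA.sub
    JOB NodeB nodeB.sub
    JOB NodeC nodeC.sub

    # Resubmit each node up to 3 times if it exits with a failure
    RETRY NodeA 3
    RETRY NodeB 3
    RETRY NodeC 3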

My awesome test grid consists of 2 Linux nodes (one of them is the
Master as well as an execute node) and 1 Windows node (not in use right
now).

- Atle