[Condor-users] DAGMan Job Problem



Hi,

I am trying to submit a DAGMan job on Linux.

I have sixteen batches of jobs, and each batch in turn has 41 jobs.

My requirement is that batch2 jobs shouldn’t start until all batch1 jobs are done; similarly, batch3 jobs shouldn’t start until all batch2 jobs are done.

I created a DAGMan job like the one below. The problem is that the DAGMan job fails randomly on batch3, batch4, etc. The reason is that some of the batch3 jobs need input that is the output of some of the batch2 jobs, and Condor complains that the file is not found:

Read so far: Submitting job(s).............................ERROR: Can't open "/u/Senthil/DAGMan/MatlabJobs/immuneic4401.txt"  with flags 00 (No such file or directory)

Based on the timestamps, this file did not exist when the above error message was issued; it was created afterwards. How is this happening? Does Condor DAGMan not wait until all of the parent’s jobs are done before starting the child jobs, or does it only wait for the last job of the parent to complete?

Is it possible to do what I am trying to do with Condor DAGMan?

Could you please let me know?

Thanks,

Senthil

JOB  A  Job_batch_1
JOB  B  Job_batch_2
JOB  C  Job_batch_3
JOB  D  Job_batch_4
JOB  E  Job_batch_5
JOB  F  Job_batch_6
JOB  G  Job_batch_7
JOB  H  Job_batch_8
JOB  I  Job_batch_9
JOB  J  Job_batch_10
JOB  K  Job_batch_11
JOB  L  Job_batch_12
JOB  M  Job_batch_13
JOB  N  Job_batch_14
JOB  O  Job_batch_15
JOB  P  Job_batch_16
PARENT A CHILD B
PARENT B CHILD C
PARENT C CHILD D
PARENT D CHILD E
PARENT E CHILD F
PARENT F CHILD G
PARENT G CHILD H
PARENT H CHILD I
PARENT I CHILD J
PARENT J CHILD K
PARENT K CHILD L
PARENT L CHILD M
PARENT M CHILD N
PARENT N CHILD O
PARENT O CHILD P
Retry A 10
Retry B 10
Retry C 10
Retry D 10
Retry E 10
Retry F 10
Retry G 10
Retry H 10
Retry I 10
Retry J 10
Retry K 10
Retry L 10
Retry M 10
Retry N 10
Retry O 10
Retry P 10
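
For reference, a DAG file with this linear shape could also be generated rather than typed by hand, which makes it easier to rule out a typo in the dependencies. The following is only a minimal sketch in Python, assuming the node names A to P, the submit file names Job_batch_1 to Job_batch_16 and the retry count of 10 shown above; the function name build_chain_dag is hypothetical and not part of Condor. It only writes the text of the DAG file and does not submit anything.

```python
import string


def build_chain_dag(n_batches=16, retries=10):
    """Return the text of a DAG file in which each batch depends on the previous one.

    Node names are single upper-case letters (A, B, ...), so at most 26
    batches are supported by this sketch.
    """
    if not 1 <= n_batches <= 26:
        raise ValueError("n_batches must be between 1 and 26")

    names = string.ascii_uppercase[:n_batches]

    # One JOB line per batch: node name, then the submit file for that batch.
    lines = [f"JOB  {name}  Job_batch_{i}" for i, name in enumerate(names, start=1)]

    # Chain the nodes: every batch is the parent of the batch that follows it.
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in zip(names, names[1:])]

    # Same retry count for every node.
    lines += [f"Retry {name} {retries}" for name in names]

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_chain_dag(), end="")
```

Running the script prints the same 47 lines as the DAG file above; the output can be redirected to a file and passed to condor_submit_dag.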