
Re: [Condor-users] basic submission? (was: birdbath dag submission)



It is definitely helpful to know that your job runs with plain condor_submit before you start using the SOAP interface to submit it.

Eviction happens when a machine decides to run some other job instead of yours. Maybe it had better priority? Anyway, that shouldn't be an issue.
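If it helps to see the sequence at a glance, here is a rough sketch that pulls the event header lines out of a user log like the one quoted below. It assumes headers look exactly like the ones in that log (three-digit code, job id in parentheses, date, time, message); parse_events is just a name made up for this example, not part of Condor.

```python
import re

# Matches event header lines such as:
#   004 (508.000.000) 10/02 23:22:12 Job was evicted.
# Assumption: every event header in the log follows this layout.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$"
)


def parse_events(log_text):
    """Return a list of (code, timestamp, message) tuples, one per event header."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            code, _cluster, _proc, _sub, date, time, message = match.groups()
            events.append((code, f"{date} {time}", message.strip()))
    return events
```

Feeding it the contents of the job's log file gives one tuple per event, so the indented detail lines and the "..." separators are skipped and an eviction (004) followed by a second execution (001) and a termination (005) is easy to spot.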

Your job isn't terminating successfully, though: the log shows it exiting with return value 1. Since you said it is a script, make sure you aren't embedding any paths that might not exist on the execute machine (or inside its execute directory).
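One way to check this is to have the job report where it is running and which of its expected paths are missing before it does any real work. The following is only a sketch: missing_paths and report_environment are hypothetical helpers, and the list of paths to check has to come from your own script.

```python
import os
import sys


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


def report_environment(required, stream=sys.stderr):
    """Print the working directory and any missing paths.

    Returns True when every path in `required` exists, False otherwise.
    """
    print("cwd:", os.getcwd(), file=stream)
    missing = missing_paths(required)
    for path in missing:
        print("missing:", path, file=stream)
    return not missing
```

Calling report_environment with the paths your script depends on at the very start of the job, and exiting early when it returns False, puts the reason for the failure in the job's stderr instead of leaving only a bare return value in the log.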

Care to share your script?



matt

Mariette, Jerome wrote:
All right, I'm still stuck with this job submission. I have no idea why my submission is not working. At first I thought it was because I was submitting my job from Java, but I have the same problem when submitting it with condor_submit!

My job is basically a script with several steps. It works perfectly when launched outside Condor, but I get the following log when running it under Condor:
000 (508.000.000) 10/02 23:09:00 Job submitted from host: <127.0.0.1:8181>
...
001 (508.000.000) 10/02 23:09:04 Job executing on host: <127.0.0.1:51445>
...
006 (508.000.000) 10/02 23:09:12 Image size of job updated: 11968
...
010 (508.000.000) 10/02 23:12:06 Job was suspended.
        Number of processes actually suspended: 5
...
006 (508.000.000) 10/02 23:12:13 Image size of job updated: 70820
...
011 (508.000.000) 10/02 23:22:11 Job was unsuspended.
...
004 (508.000.000) 10/02 23:22:12 Job was evicted.
        (0) Job was not checkpointed.
                Usr 0 00:00:13, Sys 0 00:00:11  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
001 (508.000.000) 10/02 23:29:08 Job executing on host: <127.0.0.1:51445>
...
005 (508.000.000) 10/02 23:29:09 Job terminated.
        (1) Normal termination (return value 1)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job


What does the eviction mean?
It sounds like my job is placed back in the queue and then re-executed from the beginning, so it crashes because some files already exist!

What is going wrong?
Thanks,

Jerome