
Re: [Condor-users] Job eviction on submitting DAG machines



Just upgraded condor_dagman.exe and condor_submit_dag.exe to the 7.4.0 versions to see if this problem would go away, but that doesn't seem to fix the spinning DAG (executing, evicted, executing, etc.) on some people's machines, usually when a few other heavy apps are running. Though I'm not 100% sure that my 'just use the new Condor DAG executables' approach works between 7.0.4 and 7.4.0 -- do I have to update the whole install to 7.4.0?

Same problem, same questions -- hoping somebody has seen this or knows why a DAG would ever evict itself off its own machine and thus spin between IDLE and RUNNING.

Thanks as always,
Steve


From: steveshaw89@xxxxxxxxxxx
To: condor-users@xxxxxxxxxxx
Date: Wed, 18 Nov 2009 21:38:12 +0000
Subject: [Condor-users] Job eviction on submitting DAG machines

Hey all,

I was hoping to get some advice on this problem:

We have some machines that occasionally refuse to run a DAG submitted from the submitter's own machine.  In other words, the submitter will submit a DAG job and the condor_dagman job will just spin between IDLE and RUNNING, being evicted over and over.

e.g. (from the dagman output):

001 (4634.000.000) 11/18 11:03:49 Job executing on host: <10.10.xxx.xxx:1118>
...
004 (4634.000.000) 11/18 11:03:49 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job


(this continues to repeat over and over)...
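
In case it helps anyone check which submit machines are affected, here is a rough Python sketch that counts execute/evict cycles per job in a log like the excerpt above. It assumes only the event header layout shown in that excerpt (three-digit event code, job ID in parentheses, date, time, message); the function names and the threshold of 3 are made up for illustration and are not part of Condor.

```python
import re
from collections import Counter

# Event header layout as seen in the excerpt above:
# "<3-digit code> (<cluster>.<proc>.<subproc>) <MM/DD> <HH:MM:SS> <message>"
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d\d/\d\d) (\d\d:\d\d:\d\d) (.*)$"
)

EXECUTE = "001"  # "Job executing on host"
EVICTED = "004"  # "Job was evicted."


def count_evictions(log_text):
    """Return {(cluster, proc): number of execute -> evict cycles}."""
    cycles = Counter()
    running = set()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # detail lines and "..." separators are skipped
        code = match.group(1)
        job = (int(match.group(2)), int(match.group(3)))
        if code == EXECUTE:
            running.add(job)
        elif code == EVICTED and job in running:
            running.discard(job)
            cycles[job] += 1
    return dict(cycles)


def spinning_jobs(log_text, threshold=3):
    """Jobs evicted at least `threshold` times (threshold is an arbitrary choice)."""
    counts = count_evictions(log_text)
    return sorted(job for job, n in counts.items() if n >= threshold)
```

Calling spinning_jobs() on the text of the log file gives the (cluster, proc) pairs that keep cycling; it only reports the symptom and says nothing about why the eviction happens.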

These machines only submit jobs and do not execute any.  I don't know if the DAGMan job follows the same START rules as jobs on machines in the Condor pool, but how do I ensure that, regardless of circumstances, a user's machine will not evict the DAGMan job?

(We are using 7.0.4 on most user submit machines, but have been upgrading their condor_submit_dag executables to the latest -- I'm pretty sure this issue has been seen by users on either version).

As always, appreciate the assistance :),
Steve

