
Re: [Condor-users] Jobs being shutdown immediately.



That was the first thing I tried ...  we've been using it like that
forever on our current farm at our central location. The reason is
that we have a tonne of short jobs and only a very few large jobs.
So, if there are competing jobs, with PREEMPT on, short jobs take
precedence.  Right?

Regardless ... these tests, including the log I previously sent, are with
only one job being submitted to a farm of three machines.  It's
getting preempted when nothing else is reported by condor_q -global.
The farm hasn't been deployed to artists yet.  condor_q -analyze says
removed for an unknown reason.
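
For anyone who wants to check the timing in the log below, here is a rough
sketch (plain Python, not part of Condor) that measures how long a slot
stayed Busy before PREEMPT fired. The function name and the sample text are
made up for illustration; the sample lines are copied from the log quoted
below, and the sketch assumes all timestamps fall within a single day.

```python
import re

# Sample lines copied from the log quoted later in this message.
SAMPLE_LOG = """\
9/17 12:06:14 Changing activity: Idle -> Busy
9/17 12:06:19 State change: PREEMPT is TRUE
9/17 12:06:19 Changing activity: Busy -> Retiring
9/17 12:06:20 Changing state and activity: Preempting/Killing -> Owner/Idle
"""

# "M/D HH:MM:SS message" as it appears in the log excerpt.
LINE_RE = re.compile(r"^(\d+/\d+) (\d+):(\d+):(\d+) (.*)$")


def seconds_busy_before_preempt(log_text):
    """Seconds between 'Idle -> Busy' and the next 'PREEMPT is TRUE'.

    Returns None if that sequence is not found. Midnight rollover is
    ignored (assumption: both events happen on the same day).
    """
    busy_at = None
    for line in log_text.splitlines():
        # Tolerate mail-quoted lines that start with "> ".
        m = LINE_RE.match(line.lstrip("> ").strip())
        if not m:
            continue
        _date, hh, mm, ss, msg = m.groups()
        t = int(hh) * 3600 + int(mm) * 60 + int(ss)
        if msg.endswith("Idle -> Busy"):
            busy_at = t
        elif "PREEMPT is TRUE" in msg and busy_at is not None:
            return t - busy_at
    return None


if __name__ == "__main__":
    print(seconds_busy_before_preempt(SAMPLE_LOG))
```

On the sample lines this reports a gap of a few seconds, which matches what
I see: the job barely starts before PREEMPT evaluates to TRUE.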

Mark.

On Thu, Sep 17, 2009 at 6:14 AM, David Watrous
<dwatrous@xxxxxxxxxxxxxxxxxx> wrote:
> Mark,
> Check your PREEMPT expression on the workstation.  It is evaluating to True
> and causing the job to terminate.
> Hope this helps,
> Dave
> --
> ===================================
> David Watrous
> main: 888.292.5320
> Cycle Computing, LLC
> Leader in Condor Grid Solutions
> Enterprise Condor Support and Management Tools
> http://www.cyclecomputing.com
> http://www.cyclecloud.com
> On Sep 17, 2009, at 12:24 AM, Mark Tigges wrote:
>
> We have condor (7.0.5) running just fine at our own studio.  I'm
> trying to set it up remotely in Shanghai, and everything is running
> alright.  If I try simple hello world batch files, all works great.
>
> As soon as I try a bigger job, rendering an image for a few minutes,
> jobs get scheduled, start, then go right back down to idle.  Wait 4
> minutes and the cycle repeats itself.  I've been reading manuals for
> hours, googling, and tearing my hair out.  Here's the starter log
> from the machine running the job.
>
> 9/17 12:06:09 match_info called
> 9/17 12:06:09 Received match <10.88.70.102:64805>#1253158085#15#...
> 9/17 12:06:09 State change: match notification protocol successful
> 9/17 12:06:09 Changing state: Unclaimed -> Matched
> 9/17 12:06:10 Request accepted.
> 9/17 12:06:10 Remote owner is yhong@***********
> 9/17 12:06:10 State change: claiming protocol successful
> 9/17 12:06:10 Changing state: Matched -> Claimed
> 9/17 12:06:14 Got activate_claim request from shadow (<10.88.70.26:4063>)
> 9/17 12:06:14 Remote job ID is 75.0
> 9/17 12:06:14 Got universe "VANILLA" (5) from request classad
> 9/17 12:06:14 State change: claim-activation protocol successful
> 9/17 12:06:14 Changing activity: Idle -> Busy
> 9/17 12:06:19 State change: PREEMPT is TRUE
> 9/17 12:06:19 Changing activity: Busy -> Retiring
> 9/17 12:06:19 State change: claim retirement ended/expired
> 9/17 12:06:19 State change: WANT_VACATE is FALSE
> 9/17 12:06:19 Changing state and activity: Claimed/Retiring -> Preempting/Killing
> 9/17 12:06:20 Got KILL_FRGN_JOB while in Preempting state, ignoring.
> 9/17 12:06:20 Got RELEASE_CLAIM while in Preempting state, ignoring.
> 9/17 12:06:20 Starter pid 3524 exited with status 0
> 9/17 12:06:20 State change: starter exited
> 9/17 12:06:20 State change: No preempting claim, returning to owner
> 9/17 12:06:20 Changing state and activity: Preempting/Killing -> Owner/Idle
> 9/17 12:06:20 State change: IS_OWNER is false
> 9/17 12:06:20 Changing state: Owner -> Unclaimed
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>