
Re: [Condor-users] Jobs being shutdown immediately.



Hmmmm, I just checked our slaves here locally, and they're set to
False explicitly; I wasn't as careful as I thought I was.

Ok, thanks.

Mark.

On Thu, Sep 17, 2009 at 7:17 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
>
> The PREEMPT expression has nothing to do with preemption of one job by
> another.  It is for kicking a job off of a machine because of the
> machine policy (e.g. because the machine is needed for some other purpose).
>
> Run the following command to see your PREEMPT expression on the execute
> machine where you are having the problem:
>
> condor_config_val -v PREEMPT
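>
> If PREEMPT is anything other than False on that machine, a quick way
> to rule it out is to force it off in the machine's local config (a
> sketch; the exact config file location varies by install):
>
>    PREEMPT = False
>
> and then run condor_reconfig on that machine so the startd picks up
> the change.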
>
> --Dan
>
> Mark Tigges wrote:
>> That was the first thing I tried ...  we've been using it like that
>> forever on our current farm at our central location. The reason is
>> that we have a tonne of short jobs and only a very few large jobs.
>> So, if there are competing jobs, with PREEMPT on, short jobs take
>> precedence.  Right?
>>
>> Regardless ... these tests (with the log I previously sent) were run
>> with only one job submitted to a farm of three machines.  It's
>> getting preempted when condor_q -global reports nothing else.  The
>> farm hasn't been deployed to artists yet.  condor_q -analyze says it
>> was removed for an unknown reason.
>>
>> Mark.
>>
>> On Thu, Sep 17, 2009 at 6:14 AM, David Watrous
>> <dwatrous@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Mark,
>>> Check your PREEMPT expression on the workstation.  It is evaluating to True
>>> and causing the job to terminate.
>>> Hope this helps,
>>> Dave
>>> --
>>> ===================================
>>> David Watrous
>>> main: 888.292.5320
>>> Cycle Computing, LLC
>>> Leader in Condor Grid Solutions
>>> Enterprise Condor Support and Management Tools
>>> http://www.cyclecomputing.com
>>> http://www.cyclecloud.com
>>> On Sep 17, 2009, at 12:24 AM, Mark Tigges wrote:
>>>
>>> We have Condor (7.0.5) running just fine at our own studio.  I'm
>>> trying to set it up remotely in Shanghai, and everything mostly
>>> works: if I try simple hello-world batch files, all runs great.
>>>
>>> As soon as I try a bigger job (rendering an image for a few
>>> minutes), the job gets scheduled, starts, then drops right back
>>> to idle.  Wait four minutes and the cycle repeats.  I've been
>>> reading manuals for hours, googling, and tearing my hair out.
>>> Here's the starter log from the machine running the job:
>>>
>>> 9/17 12:06:09 match_info called
>>> 9/17 12:06:09 Received match <10.88.70.102:64805>#1253158085#15#...
>>> 9/17 12:06:09 State change: match notification protocol successful
>>> 9/17 12:06:09 Changing state: Unclaimed -> Matched
>>> 9/17 12:06:10 Request accepted.
>>> 9/17 12:06:10 Remote owner is yhong@***********
>>> 9/17 12:06:10 State change: claiming protocol successful
>>> 9/17 12:06:10 Changing state: Matched -> Claimed
>>> 9/17 12:06:14 Got activate_claim request from shadow (<10.88.70.26:4063>)
>>> 9/17 12:06:14 Remote job ID is 75.0
>>> 9/17 12:06:14 Got universe "VANILLA" (5) from request classad
>>> 9/17 12:06:14 State change: claim-activation protocol successful
>>> 9/17 12:06:14 Changing activity: Idle -> Busy
>>> 9/17 12:06:19 State change: PREEMPT is TRUE
>>> 9/17 12:06:19 Changing activity: Busy -> Retiring
>>> 9/17 12:06:19 State change: claim retirement ended/expired
>>> 9/17 12:06:19 State change: WANT_VACATE is FALSE
>>> 9/17 12:06:19 Changing state and activity: Claimed/Retiring ->
>>> Preempting/Killing
>>> 9/17 12:06:20 Got KILL_FRGN_JOB while in Preempting state, ignoring.
>>> 9/17 12:06:20 Got RELEASE_CLAIM while in Preempting state, ignoring.
>>> 9/17 12:06:20 Starter pid 3524 exited with status 0
>>> 9/17 12:06:20 State change: starter exited
>>> 9/17 12:06:20 State change: No preempting claim, returning to owner
>>> 9/17 12:06:20 Changing state and activity: Preempting/Killing -> Owner/Idle
>>> 9/17 12:06:20 State change: IS_OWNER is false
>>> 9/17 12:06:20 Changing state: Owner -> Unclaimed
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>
>>>
>>>