
Re: [Condor-users] ERROR starting jobs: Jobs get evicted for unknown reason (108)




What shows up in your StartLog and StarterLog? The ShadowLog just shows that the job was evicted, but these other logs should show more information about why this happened.
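Following that suggestion, one way to pull the relevant entries out of StartLog and StarterLog is to keep only the lines stamped within a few seconds of the eviction (8/15 16:46:33 in the ShadowLog). This is a hypothetical sketch, not a Condor tool. It assumes the "M/D HH:MM:SS" prefix seen in the logs quoted in this thread and the year 2006, because the prefix does not record a year. It also assumes the log files sit in the current directory. `lines_near` and the 10-second window are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Log lines start with "M/D HH:MM:SS" (no year), as in the ShadowLog quoted here.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ")


def parse_stamp(line, year=2006):
    """Return the line's timestamp as a datetime, or None if it has none.

    The year is an assumption, because the log prefix does not record it.
    """
    m = STAMP.match(line)
    if m is None:
        return None
    month, day, hh, mm, ss = map(int, m.groups())
    return datetime(year, month, day, hh, mm, ss)


def lines_near(log_text, center, window_s=10):
    """Return the lines stamped within +/- window_s seconds of `center`."""
    lo = center - timedelta(seconds=window_s)
    hi = center + timedelta(seconds=window_s)
    hits = []
    for line in log_text.splitlines():
        ts = parse_stamp(line)
        if ts is not None and lo <= ts <= hi:
            hits.append(line)
    return hits


if __name__ == "__main__":
    eviction = datetime(2006, 8, 15, 16, 46, 33)  # timestamp from the ShadowLog
    # Hypothetical paths: point these at the execute machine's LOG directory.
    for name in ("StartLog", "StarterLog"):
        try:
            with open(name) as f:
                text = f.read()
        except FileNotFoundError:
            print(f"{name}: not found")
            continue
        for line in lines_near(text, eviction):
            print(f"{name}: {line}")
```

Lines without a leading timestamp are skipped. If an entry's details continue on unstamped lines, widen the search by hand around the hits.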

--Dan

Thomas Bretz wrote:

Hi,

Many of our submitted jobs randomly get evicted immediately after they start. I have no idea what's going on, because one log file (the ShadowLog) says "unknown reason" and all the other log files contain neither warnings nor errors. The current behaviour of Condor (6.8.0 on SUSE Linux 9 and 10) makes it completely unusable, because some jobs take 20-30 negotiation cycles before they actually start running. I also tried switching on more log output, but that output does not contain any information that hints at why the jobs are evicted.

Any help is welcome,
Thomas

-------------------------------------

Part of the Setup:
WANT_SUSPEND      = False
WANT_VACATE       = False
START     = True
SUSPEND = False
CONTINUE = True
PREEMPT= False
CLAIM_WORKLIFE    = 0
MaxJobRetirementTime = 0
KILL = False
NEGOTIATOR_PRE_JOB_RANK = 0
NEGOTIATOR_POST_JOB_RANK = 0
PREEMPTION_REQUIREMENTS = False
PREEMPTION_RANK = 0
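As a sanity check on the policy above, the following hypothetical sketch parses simple `NAME = value` lines and lists any preemption-related knob that is missing or not literally `False`. It is deliberately naive. It assumes one definition per line, with no line continuations, includes, or macro expansion. So it does not reproduce how Condor itself evaluates its configuration. The knob list is an illustrative selection taken from the settings quoted here.

```python
# Knobs from the policy above that could otherwise let a startd
# suspend, vacate, or kill a running job (an illustrative selection).
POLICY_KNOBS = ("PREEMPT", "KILL", "SUSPEND", "WANT_VACATE",
                "WANT_SUSPEND", "PREEMPTION_REQUIREMENTS")


def parse_config(text):
    """Parse 'NAME = value' lines; later definitions override earlier ones.

    Naive on purpose: no continuations, includes, or $(MACRO) expansion.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Upper-case keys so lookups ignore the case used in the file.
        settings[name.strip().upper()] = value.strip()
    return settings


def knobs_not_false(settings):
    """Return knobs that are missing or set to anything other than False."""
    return [k for k in POLICY_KNOBS if settings.get(k, "").lower() != "false"]


if __name__ == "__main__":
    fragment = """
    WANT_SUSPEND      = False
    WANT_VACATE       = False
    START     = True
    SUSPEND = False
    PREEMPT= False
    KILL = False
    PREEMPTION_REQUIREMENTS = False
    """
    print(knobs_not_false(parse_config(fragment)))  # -> []
```

With the settings quoted above, it prints an empty list. This suggests the eviction is not coming from these policy expressions, which is consistent with the REFUSED message in the ShadowLog pointing at the startd instead.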

ShadowLog:
8/15 16:46:33 ******************************************************
8/15 16:46:33 ** condor_shadow (CONDOR_SHADOW) STARTING UP
8/15 16:46:33 ** /home/condor/condor-6.8.0/sbin/condor_shadow
8/15 16:46:33 ** $CondorVersion: 6.8.0 Jul 19 2006 $
8/15 16:46:33 ** $CondorPlatform: X86_64-LINUX_RHEL3 $
8/15 16:46:33 ** PID = 27459
8/15 16:46:33 ** Log last touched 8/15 16:46:31
8/15 16:46:33 ******************************************************
8/15 16:46:33 Using config source: /home/condor/condor_config
8/15 16:46:33 Using local config sources: 
8/15 16:46:33 /home/condor/hosts/dc08/condor_config.local
8/15 16:46:33 DaemonCore: Command Socket at <132.187.*.*:56626>
8/15 16:46:33 Initializing a VANILLA shadow for job 3105.0
8/15 16:46:33 (3105.0) (27459): Request to run on <132.187.*.*:58903> was REFUSED
8/15 16:46:33 (3105.0) (27459): Job 3105.0 is being evicted
8/15 16:46:33 (3105.0) (27459): logEvictEvent with unknown reason (108), aborting
8/15 16:46:33 (3105.0) (27459): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
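To pull the key facts out of a ShadowLog like this one, a hypothetical sketch can search for three of the messages quoted above: the job being shadowed, the address that REFUSED the request, and the exit status. The regular expressions assume the exact wording of the 6.8.0 messages shown here and may not match other versions. `summarize_shadow` is an illustrative name.

```python
import re

# Patterns for messages that appear verbatim in the ShadowLog above.
JOB = re.compile(r"shadow for job (\d+\.\d+)")
REFUSED = re.compile(r"Request to run on <([^>]+)> was REFUSED")
STATUS = re.compile(r"EXITING WITH STATUS (\d+)")


def summarize_shadow(log_text):
    """Return the job id, refusing address, and exit status (None if absent).

    Uses re.search on the whole text, so it still works when two
    log entries have been run together on one line.
    """
    job = JOB.search(log_text)
    refused = REFUSED.search(log_text)
    status = STATUS.search(log_text)
    return {
        "job": job.group(1) if job else None,
        "refused_by": refused.group(1) if refused else None,
        "exit_status": int(status.group(1)) if status else None,
    }


if __name__ == "__main__":
    sample = (
        "8/15 16:46:33 Initializing a VANILLA shadow for job 3105.0\n"
        "8/15 16:46:33 (3105.0) (27459): Request to run on "
        "<132.187.*.*:58903> was REFUSED\n"
        "8/15 16:46:33 (3105.0) (27459): **** condor_shadow "
        "(condor_SHADOW) EXITING WITH STATUS 108\n"
    )
    print(summarize_shadow(sample))
    # -> {'job': '3105.0', 'refused_by': '132.187.*.*:58903', 'exit_status': 108}
```

Only the first occurrence of each message is reported. A log covering several shadows would need to be split per shadow (for example, by PID) first.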

NegotiatorLog:
8/15 16:43:28     Request 03105.00000:
8/15 16:43:28 Matched 3105.0 tbretz@xxxxxxxxxxxxxxxxxxxxxx <132.187.47.28:52515> preempting none <132.187.47.22:58903> vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxx
8/15 16:43:28 Successfully matched with vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxx
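Because the IP addresses in the ShadowLog are masked, one hypothetical way to tie the refusal to a specific machine is to parse the negotiator's "Matched" line and compare ports. The startd matched here (<132.187.47.22:58903>, vm2) uses port 58903, the same port the ShadowLog reports as REFUSED. That suggests the StartLog on that machine is the one to read. The regex assumes the field order of this single line (job, user, submitting schedd's address, preempting, startd address, slot) and is not a documented format. `parse_match` and `same_port` are illustrative names.

```python
import re

# Assumed layout, taken from the one NegotiatorLog line in this thread:
# Matched <job> <user> <schedd addr> preempting <who> <startd addr> <slot>
MATCHED = re.compile(
    r"Matched (\d+\.\d+) (\S+) <([^>]+)> preempting (\S+) <([^>]+)> (\S+)"
)


def parse_match(line):
    """Return the fields of a 'Matched' line as a dict, or None."""
    m = MATCHED.search(line)
    if m is None:
        return None
    job, user, schedd, preempting, startd, slot = m.groups()
    return {"job": job, "user": user, "schedd": schedd,
            "preempting": preempting, "startd": startd, "slot": slot}


def same_port(addr_a, addr_b):
    """Compare host:port strings by port only (IPs may be masked)."""
    return addr_a.rsplit(":", 1)[-1] == addr_b.rsplit(":", 1)[-1]


if __name__ == "__main__":
    line = ("8/15 16:43:28 Matched 3105.0 tbretz@xxxxxxxxxxxxxxxxxxxxxx "
            "<132.187.47.28:52515> preempting none "
            "<132.187.47.22:58903> vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxx")
    match = parse_match(line)
    # Does the matched startd share a port with the address that REFUSED?
    print(match["slot"], same_port(match["startd"], "132.187.*.*:58903"))
```

Comparing by port alone can give false positives when several execute machines happen to use the same port, so treat a match as a hint about which StartLog to read, not as proof.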

_______________________________________________
Condor-users mailing list

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR