
Re: [HTCondor-users] HTCondor : Long pending Idle jobs



Average job execution time is 10-15 minutes. Some jobs take 30-45 minutes, depending on the number of records to process.


regards,
Ashish | Office: +91 (712) 66-92460
Persistent Systems Ltd., Nagpur | Partners in Innovation | www.persistent.com



-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Thursday, October 27, 2016 7:01 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] HTCondor : Long pending Idle jobs

On 10/27/2016 8:10 AM, Ashish Thool wrote:
> Thanks Todd.
>
> My condor version is 8.2.10.
> Yes, "recurring" means the jobs run once an hour.
> I have at most 10 slots on the machine.
> How can I increase the number of cores to handle the workload?

Well, one way to increase the number of cores to handle the workload is to obtain more execute nodes. :) But if you know your machine can timeshare many of your recurring jobs at once (i.e. you will not run the machine out of RAM or start CPU thrashing), you can increase the number of slots on your execute node by placing the following in your condor_config:

   # Tell HTCondor this machine has 100 cores so we end up with
   # 100 slots.
   NUM_CPUS = 100
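
As a rough way to pick a value for NUM_CPUS, the capacity estimate quoted further down in this thread (slots * 60 min / job minutes) can be sketched in Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, it assumes every job takes the same time and that each slot is kept busy for the whole window, and it counts only the roughly 250 newly added jobs, since the number of pre-existing jobs is not stated in the thread.

```python
import math


def jobs_per_window(slots: int, window_min: int, job_min: int) -> int:
    """Upper bound on jobs that can finish in one window with all slots busy."""
    return slots * (window_min // job_min)


def slots_needed(jobs: int, window_min: int, job_min: int) -> int:
    """Minimum slots so that `jobs` jobs can all finish inside one window."""
    per_slot = window_min // job_min  # whole jobs one slot can finish per window
    if per_slot == 0:
        raise ValueError("a single job does not fit inside the window")
    return math.ceil(jobs / per_slot)


# The example from the thread: 4 slots, hourly window, 15-minute jobs.
print(jobs_per_window(4, 60, 15))    # 16

# The current machine: 10 slots, 15-minute jobs.
print(jobs_per_window(10, 60, 15))   # 40

# Slots needed for about 250 hourly jobs at different run times.
print(slots_needed(250, 60, 10))     # 42
print(slots_needed(250, 60, 15))     # 63
print(slots_needed(250, 60, 45))     # 250
```

With 10 slots and 15-minute jobs, at most about 40 jobs fit in each hour, which would be consistent with most of the new jobs staying idle. Whether one machine can really timeshare that many jobs still depends on RAM and CPU, as noted above.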

> My aim is to make condor handle such a workload comfortably.
> Please find attached the job submit file and log file. Please let me know if you need any other information to resolve the issue.
>

Curious, how long do your jobs run?

Hope the above helps
Todd




>
> regards,
> Ashish
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On 
> Behalf Of Todd Tannenbaum
> Sent: Thursday, October 27, 2016 6:00 PM
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] HTCondor : Long pending Idle jobs
>
> On 10/27/2016 2:48 AM, Ashish Thool wrote:
>> Hi All,
>>
>>
>>
>> I already had recurring jobs running on HTCondor. Then I added 
>> some more recurring jobs (approx. 250 new) on Oct 21, 2016. But most 
>> of the newly added jobs are still in the Idle state. I've restarted 
>> condor several times, but no luck.
>>
>>
>>
>> Please see below the output of condor_q and condor_q -analyze <job id>.
>>
>> I'm unable to fix this issue. None of the newly added jobs executed 
>> even once.
>>
>
> You have not provided enough information for anyone to have any real insight into what is going wrong.
>
> - Not sure what you mean by a "recurring" job. Could you include your submit file?
>
> - No idea what you think _should_ be happening. You have two machines (probably with two cores each?), and four jobs are running.  What do you expect?  What do you want to happen?
>
> - If by "recurring" job you mean a job that is supposed to run once an hour, you could still easily have many jobs permanently in the idle state if the jobs run for any significant period of time. For instance, if you have 4 slots, and you submit hundreds of jobs that run for 15 minutes and are rescheduled every hour, then only 16 jobs out of your batch of hundreds will likely ever start (4 slots * 60 min / 15 min job = 16).
> I.e. it is possible you do not have enough cores to run your workload within the recurring time window.
>
> I could make several other guesses, but cannot really help more without more information about what you are trying to do and how you are attempting to do it (your submit file, an event log from a run, the condor version you are using, etc.).
>
> regards,
> Todd
>
>
> --
> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
> Center for High Throughput Computing   Department of Computer Sciences
> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
> Phone: (608) 263-7132                  Madison, WI 53706-1685
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx 
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>
> DISCLAIMER
> ==========
> This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
>
>
>
>


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685