Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] condor_rm cannot remove very large number of jobs?

Date: Fri, 03 Dec 2010 21:50:47 -0600 (CST)
From: Steven Timm <timm@xxxxxxxx>
Subject: Re: [Condor-users] condor_rm cannot remove very large number of jobs?

There have long been issues with huge numbers of jobs exiting at
once in Condor.  They keep improving performance, but
I don't think they are up to 20K yet.
Look at the various timeouts, including the tool timeouts. I think
that if you lengthen a timeout you can probably get a condor_rm of
20K jobs to finish.  If that doesn't work, then raise the debug level
in the schedd log and that will tell the story.

Steve
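
As a rough sketch of the two suggestions above, a local configuration
fragment might look like the following. SCHEDD_DEBUG and D_FULLDEBUG are
standard Condor settings; TOOL_TIMEOUT_MULTIPLIER and the value 10 are
assumptions on my part, not something named in this thread, so check them
against the manual for your Condor version before relying on them.

```
# Local Condor configuration fragment (illustrative only).

# Make the schedd log more verbose so a failed condor_rm leaves a trace.
SCHEDD_DEBUG = D_FULLDEBUG

# Assumption: scale the timeouts used by command-line tools such as
# condor_rm. The factor 10 is arbitrary.
TOOL_TIMEOUT_MULTIPLIER = 10
```

After editing, condor_reconfig makes the daemons reread their configuration,
and condor_config_val SCHEDD_LOG prints the path of the schedd log to inspect.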


On Fri, 3 Dec 2010, Rob wrote:


Hi,

I encountered this when 'stress-testing' my Condor pool.

Occasionally I submit 20,000 jobs as a single cluster to the queue, which is
OK.
However, removing the jobs with condor_rm does not work when there are so many jobs
in the queue to be removed.

For example:
----------------------------------
$ condor_q | grep " 350." | wc -l
14515

$ condor_rm 350
Couldn't find/remove all jobs in cluster 350.

$ condor_rm -all
Could not remove all jobs.
----------------------------

I then use a script:

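#!/bin/sh
# Remove the jobs of cluster 350 one at a time (procs 0 to 19999).
counter=0
while [ "$counter" -lt 20000 ]
do
    # Discard condor_rm output, including errors for procs already gone.
    condor_rm "350.${counter}" > /dev/null 2>&1
    counter=$(( counter + 1 ))
done
exit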

which removes the jobs one by one, and this works
(though the script takes a long time!).
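
A possible middle ground between one condor_rm for the whole cluster and one
per job is to remove the jobs in batches with condor_rm -constraint. The
sketch below is only an illustration: the helper name batch_constraints, the
RUN variable, and the batch size of 2000 (chosen to stay under the roughly
4000 jobs reported to work in a single call) are assumptions, not something
tested on this pool.

```shell
#!/bin/sh
# batch_constraints CLUSTER TOTAL STEP
# Print one ClassAd constraint per batch of STEP procs (hypothetical helper).
batch_constraints() {
    cluster=$1
    total=$2
    step=$3
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$(( start + step ))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "ClusterId == $cluster && ProcId >= $start && ProcId < $end"
        start=$end
    done
}

# Dry run by default: only print the commands.
# Set RUN=1 in the environment to actually call condor_rm.
batch_constraints 350 20000 2000 | while IFS= read -r constraint; do
    if [ "${RUN:-0}" = 1 ]; then
        condor_rm -constraint "$constraint"
    else
        echo "condor_rm -constraint '$constraint'"
    fi
done
```

Run as is, the script prints the ten condor_rm commands it would issue; with
RUN=1 it issues them, so each call asks the schedd to remove at most 2000 jobs.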


What is the problem with removing all jobs at once with condor_rm?

When the number of jobs to be removed is smaller (say about 4000), condor_rm
can remove them all at once!


I am using condor 7.4.2 on Fedora Linux.

Regards,
Rob.





--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

Follow-Ups:
- Re: [Condor-users] condor_rm cannot remove very large number of jobs?
  - From: Matthew Farrellee

References:
- [Condor-users] condor_rm cannot remove very large number of jobs?
  - From: Rob
