[Condor-users] condor_rm cannot remove very large number of jobs?



Hi,

I encountered this when 'stress-testing' my Condor pool.

Occasionally I submit 20,000 jobs to the queue as a single cluster, which works 
fine.
However, condor_rm fails when there are that many jobs in the queue to be 
removed.

For example:
----------------------------------
$ condor_q | grep " 350." | wc -l
14515

$ condor_rm 350
Couldn't find/remove all jobs in cluster 350.

$ condor_rm -all
Could not remove all jobs.
----------------------------

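As a workaround I use this script:

#!/bin/sh
# Remove every job of cluster 350 individually: 350.0 through 350.19999.
counter=0
while [ "$counter" -lt 20000 ]
do
  # Discard output; IDs that are no longer in the queue just produce errors.
  condor_rm 350.${counter} >> /dev/null 2>&1
  counter=$(( counter + 1 ))
done
exit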

which removes the jobs one-by-one and this works.
(though this script takes a long time!)
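
A possible middle ground between one condor_rm call per job and one call for
the whole cluster is to remove the jobs in chunks, using the -constraint option
of condor_rm with the ClusterId and ProcId job attributes. The following is
only a sketch: the chunk size of 1000 is an arbitrary assumption,
chunk_constraints is a hypothetical helper name, and whether chunked removal
avoids the failure shown above is untested. By default it only prints the
commands; setting RUN=1 executes them.

```shell
#!/bin/sh
# Sketch: remove the jobs of one cluster in chunks instead of one by one.
# Assumed values (not from the test above): chunk size 1000.

# Print one constraint expression per chunk of process IDs.
# Arguments: cluster id, total number of jobs, chunk size.
chunk_constraints() {
  cluster=$1
  total=$2
  chunk=$3
  start=0
  while [ "$start" -lt "$total" ]
  do
    end=$(( start + chunk ))
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    echo "ClusterId == $cluster && ProcId >= $start && ProcId < $end"
    start=$end
  done
}

# Dry run unless RUN=1: print each condor_rm command instead of executing it.
chunk_constraints 350 20000 1000 | while IFS= read -r expr
do
  if [ "${RUN:-0}" = 1 ]; then
    condor_rm -constraint "$expr" >> /dev/null 2>&1
  else
    echo "condor_rm -constraint '$expr'"
  fi
done
```

With 20000 jobs and a chunk size of 1000 this issues 20 condor_rm calls rather
than 20000. Each chunk is also well below the roughly 4000 jobs that condor_rm
handles in one call.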


What goes wrong when condor_rm has to remove all the jobs at once?

When the number of jobs to be removed is smaller (say, about 4000), condor_rm 
can remove them all at once!


I am using Condor 7.4.2 on Fedora Linux.

Regards,
Rob.