
[Condor-users] Can't get rid of a job



Hello,

I'm currently playing around with Condor to see how it works, and got into a situation where I can't remove a job from the queue. I have one machine set up as central manager and submitter, and another as executor. I've been able to submit and complete jobs in the vanilla universe and everything seemed OK, except that I see the jobs constantly cycle between suspended and unsuspended, like the manual warns about on Linux.

Anyway, I then tried to see if I could specify input and output files to transfer (there's a shared file system, so it's not really necessary), and intentionally specified an output file that I knew wouldn't be found. Based on logging that my script did, it looked like the job completed (several suspends later), but the job stayed in the queue. I tried to release it, but Condor says it can't be released (either with -all or with my user name). I then shut down the Condor daemons on both machines (which evicted the job) and restarted them, but the job was still there. It appears to be trying to run the job over and over again, but can never transfer the output file. The job ID never changes.

How can I get rid of the-job-that-won't-die?  I was only fooling around!

Thanks,

Cathy