
Re: [Condor-users] remote condor job never gets removed



Hey Joe, got it! It's amazing what you can learn by actually reading the Condor Manual :-)

Since it isn't explicitly mentioned in the manual, here are the steps to submit a remote job and get the results back:

   $ condor_submit -remote cmhost -pool cmhost remote_vanilla.sub
   Submitting job(s).
   Logging submit event(s).
   1 job(s) submitted to cluster 61.
   Spooling data files for 1 jobs...

After the job completes (JobStatus=4 'C'), get the results and then remove the job:

   $ condor_q -pool cmhost -name cmhost


   -- Schedd: cmhost.bestsystems.co.jp : <172.16.10.117:46010>
    ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
     61.0   ajs             5/18 16:55   0+00:00:06 C  0   9.8  vanilla.sh

   0 jobs; 0 idle, 0 running, 0 held
   $ condor_transfer_data -pool cmhost -name cmhost 61.0
   Fetching data files...
   $ condor_rm -pool cmhost -name cmhost 61.0
   Job 61.0 marked for removal
   $
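
For anyone who wants to script the steps above, here is a rough sketch in plain sh. It is only an illustration, not something taken from the manual. The function names (job_status, fetch_and_remove) and the 30-second default polling interval are made up for this example. It also assumes that condor_q accepts -format together with -pool and -name on your version, so check that before relying on it. JobStatus 1 (idle) and 2 (running) are treated as "keep waiting", 4 as completed, and any other value stops the loop.

```shell
#!/bin/sh
# Sketch only: wait for a remotely submitted job to complete, fetch its
# output from the schedd's spool, then remove it from the queue.
# Function names and the default polling interval are hypothetical.

# Print the numeric JobStatus of one job (4 means completed, shown as 'C').
# $1 = pool (central manager), $2 = schedd name, $3 = job id (cluster.proc)
job_status() {
    condor_q -pool "$1" -name "$2" -format '%d\n' JobStatus "$3"
}

# Poll until the job completes, then transfer its data and remove it.
# $1 = pool, $2 = schedd name, $3 = job id, $4 = seconds between polls
fetch_and_remove() {
    pool=$1 schedd=$2 job=$3 interval=${4:-30}
    while :; do
        status=$(job_status "$pool" "$schedd" "$job") || return 1
        case "$status" in
            4)   break ;;                  # completed: go fetch the results
            1|2) sleep "$interval" ;;      # idle or running: keep waiting
            *)   echo "job $job: unexpected JobStatus '$status'" >&2
                 return 1 ;;               # held, removed, or not in the queue
        esac
    done
    # Only remove the job if the transfer succeeded.
    condor_transfer_data -pool "$pool" -name "$schedd" "$job" &&
        condor_rm -pool "$pool" -name "$schedd" "$job"
}
```

With the job from the example above, the call would be: fetch_and_remove cmhost cmhost 61.0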


Cheers,
Andrew

On 5/20/2006 3:17 AM, Joe Meehean wrote:
Try:

condor_transfer_data <cluster.process>

_joe

Andrew Stubbings wrote:
I have sent the logs to condor-admin. When I said the "submitting machine" I was referring to the machine where condor_submit was invoked. I didn't know that data is not returned to the original machine. I can't find any data for the job left on the machine with the schedd. Is this an effect of the job being stuck in the completed ('C') state?

Andrew

On 5/18/2006 11:03 PM, Erik Paulson wrote:
On Thu, May 18, 2006 at 05:20:45PM +0900, Andrew Stubbings wrote:
A remote job submitted from 6.7.18 (SuSE 9.3/x86_64) to 6.7.19 (SuSE 8.2/x86) completes, but it is never removed from the queue and the results are never returned to the submitting machine:

The SchedLog shows the job completed but ends with an mrec error:

The mrec thing is not the real problem. Please post (or stick
on a website, or send to condor-admin) the whole schedd log
and shadow log, there's not enough here to figure out what's
going on.

BTW, when you say "submitting machine" you mean the machine with
the schedd, right? Not the machine where condor_submit was invoked?
In a remote submit, Condor doesn't return data to the original
machine (there's no agent on that machine to accept the data), so
it stays on the machine with the schedd.

-Erik

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users