
Re: [Condor-users] job stuck in idle mode - HasFileTransfer



Mr. Agarwal,

I don't think TRUST_UID_DOMAIN is the problem.  Run 'condor_status -long | grep ^HasFileTransfer' and 'condor_status -long | grep ^FileSystemDomain' to see what the execute node advertises for each of the two conditions.  Because the conditions are joined by ||, the clause only fails when both of them are false, so fixing either one should satisfy it.  I'm assuming these two conditions were inserted into your job's requirements automatically, either because you enabled file transfer in the submit file or because Condor uses it by default.

Assuming that file transfer can work on all of your machines, HasFileTransfer should be true on every one of them, and FileSystemDomain should be set to the domain that all of the machines belong to (such as "cs.wisc.edu"), depending on your situation.  Check the FILESYSTEM_DOMAIN variable in the configuration files.  If your machines all share a filesystem (using NFS, mounted home directories, or something similar), they should all be set to the same internet subdomain that they belong to.

I know this is basic stuff, but hopefully it will prompt you to check your configuration for anything that is wrong.  Beyond that, I don't know exactly what causes Condor to set HasFileTransfer to true or false.  Search the documentation for descriptions of these variables for more information.
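
To illustrate how the two conditions combine, here is a minimal Python sketch; it is not Condor's actual matchmaking code.  It parses text in the "Attribute = value" style that condor_status -long prints and evaluates the clause from the job's requirements against each machine ad.  The sample ads, the machine names, and the helper names parse_ads and clause_matches are made up for illustration; substitute the real output from your pool.

```python
# Hypothetical sample in the style of `condor_status -long` output.
# Real ads contain many more attributes; these values are invented.
SAMPLE = """
Name = "exec-node"
HasFileTransfer = false
FileSystemDomain = "exec-node"

Name = "other-node"
HasFileTransfer = true
FileSystemDomain = "other-node"
"""


def parse_ads(text):
    """Split blank-line-separated ads into one dict of raw strings per machine."""
    ads = []
    for block in text.strip().split("\n\n"):
        ad = {}
        for line in block.splitlines():
            name, sep, value = line.partition("=")
            if sep:
                ad[name.strip()] = value.strip()
        if ad:
            ads.append(ad)
    return ads


def clause_matches(ad, submit_domain):
    """Evaluate (HasFileTransfer || FileSystemDomain == submit_domain) for one ad.

    A missing attribute is treated as not satisfying its condition.
    The domain comparison ignores case, as ClassAd's == does for strings.
    """
    has_transfer = ad.get("HasFileTransfer", "").lower() == "true"
    fs_domain = ad.get("FileSystemDomain", "").strip('"')
    same_domain = fs_domain.lower() == submit_domain.lower()
    return has_transfer or same_domain


if __name__ == "__main__":
    for ad in parse_ads(SAMPLE):
        verdict = "matches" if clause_matches(ad, "condor-mstr") else "does not match"
        print(ad.get("Name", "<unnamed>"), verdict)
```

A machine satisfies the clause as soon as either attribute is right, so only a machine where HasFileTransfer is false and FileSystemDomain differs from the submit machine's domain is rejected.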

Best Regards,
 - Garrett Heath Koller
kollerg14@xxxxxxxxxxxx

From: condor-users-bounces@xxxxxxxxxxx [condor-users-bounces@xxxxxxxxxxx] on behalf of Shiv Agarwal [shiv@xxxxxxxxxxx]
Sent: Friday, August 19, 2011 6:16 PM
To: condor-users
Subject: [Condor-users] job stuck in idle mode - HasFileTransfer

I have set up a small Condor pool with 1 master node and 1 execute node.

I see no error messages in the master or worker node log files whatsoever. In fact, the worker node does not even receive the request to execute the job. From my understanding, the master node itself decides not to send the job to the execute node.

condor_q -analyze shows me that this particular requirement did not match:

 ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "condor-mstr" ) )  0


I have even set TRUST_UID_DOMAIN = True


Please HELP!


--
Shiv Agarwal