
[Condor-users] Problem running Grid jobs using Condor.



Hello,

I am trying to run a job on a Condor pool, submitted through the Globus Gatekeeper.

However, the jobs are put on hold for the following reason:

HoldReason = "Error from starter on slot1@xxxxxxxxxxxxxxxxxxx: Failed to open '/home/research/bala/.globus/job/vulcan.txcorp.com/9128.1239817731/stdout' as standard output: No such file or directory (errno 2)"

Here is what I already did:
1. Started the execute machine's master daemon as root.

2. Set UID_DOMAIN = txcorp.com in the condor_config on the execute machine.

3. Set TRUST_UID_DOMAIN = TRUE on the execute machine.

4. The account the job is supposed to run as on the execute machine is not in its /etc/passwd file, so I also set SOFT_UID_DOMAIN = TRUE on the execute machine.
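
For reference, the three settings listed above can be read back from a configuration file with a few lines of Python. This is only a rough sketch: read_settings is a hypothetical helper that handles plain NAME = VALUE lines and ignores macros, includes, and local config files, and the sample text simply restates the values from the list. The condor_config_val tool remains the authoritative way to query the configuration Condor actually sees.

```python
# Hypothetical helper: read simple NAME = VALUE lines from condor_config-style text.
# It does not expand $(MACRO) references or follow local config files.

def read_settings(text):
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Configuration names are treated case-insensitively, so normalize them.
        settings[name.strip().upper()] = value.strip()
    return settings

# Sample text restating the settings from the list above (not a full config).
SAMPLE = """
UID_DOMAIN = txcorp.com
TRUST_UID_DOMAIN = TRUE
SOFT_UID_DOMAIN = TRUE
"""

if __name__ == "__main__":
    found = read_settings(SAMPLE)
    for key in ("UID_DOMAIN", "TRUST_UID_DOMAIN", "SOFT_UID_DOMAIN"):
        print(key, "=", found.get(key, "<not set>"))
```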

However, the execute machine (10.0.0.2) cannot do DNS lookups, so it cannot resolve 10.0.0.105 to vulcan.txcorp.com (the submit machine) through DNS. /etc/hosts can be used to map 10.0.0.105 to vulcan.txcorp.com, though.
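
A quick way to see what the system resolver on the execute machine returns for the submit machine's address is a reverse lookup from Python. This is a sketch, not a statement about how Condor itself resolves names: socket.gethostbyaddr goes through the operating system's resolver, which on a typical Linux setup consults /etc/hosts as well as DNS, and reverse_lookup is a name made up for this example.

```python
import socket

def reverse_lookup(ip):
    """Return the host name the system resolver gives for ip, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None

# Example call on the execute machine, using the submit machine's address:
#   reverse_lookup("10.0.0.105")
```

If the call returns None on the execute machine, the resolver has no mapping for that address from any source, including /etc/hosts.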

Questions:
1. Does the execute machine depend only on DNS to resolve an IP address to a host name? And if that lookup fails, does it run the job as nobody?

2. How can I see which account the job is actually run as? I'm guessing that the job runs as nobody when it is supposed to run as bala. How do I check that?
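
One possible check, sketched here under the assumption that a small test job can be submitted to the same execute machine, is to have the job report its own identity. The script below prints the UID it runs under and the matching account name if one exists; describe_identity is a name chosen for this example. Since the account is not in /etc/passwd on the execute machine, the UID may have no passwd entry, so that case is handled explicitly.

```python
import os
import pwd

def describe_identity():
    """Return the effective UID and, if known, the account name for it."""
    uid = os.geteuid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this UID on this machine.
        name = None
    return {"uid": uid, "name": name}

if __name__ == "__main__":
    info = describe_identity()
    print("uid=%d name=%s" % (info["uid"], info["name"] or "<no passwd entry>"))
```

Comparing the printed UID with bala's UID on the submit machine would show whether the job ran as bala or as some other account such as nobody.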

Thanks much!

--
Balamurali Ananthan (bala@xxxxxxxxxx) (720.974.1843)	
Tech-X Corp, 5621 Arapahoe Ave, Suite A, Boulder, CO 80303