
[Condor-users] user on the execute machine



Hi everybody

I have a test cluster running Condor 6.8.5 on CentOS 5.
I tried several test scripts that do not need any input files, and everything worked fine.
Now I'm submitting a job that needs to transfer some input files, and I get this in the log file:

---------------------------------------------------------
000 (019.000.000) 08/08 10:16:04 Job submitted from host: <10.0.0.1:20023>
...
022 (019.000.000) 08/08 10:16:07 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to node1 <10.0.0.101:20086>
...
024 (019.000.000) 08/08 10:16:07 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to node1, rescheduling job
...
---------------------------------------------------------

I looked on the execute machine, and its StarterLog contains this:

---------------------------------------------------------
.....
8/8 10:16:07 Submitting machine is "master1"
8/8 10:16:07 passwd_cache::cache_uid(): getpwnam("foo") failed: Success
8/8 10:16:07 ERROR: Uid for "foo" not found in passwd file and SOFT_UID_DOMAIN is False
8/8 10:16:07 ERROR: Failed to determine what user to run this job as, aborting
8/8 10:16:07 Failed to initialize JobInfoCommunicator, aborting
8/8 10:16:07 Unable to start job.
8/8 10:16:07 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1
-------------------------------------------------------

I guess the issue is related to file access, but I do not understand why Condor is complaining about the user.
Is it supposed to run the job under the local "nobody" account if the user does not have an account on the machine where the job is running?
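
To double-check that "foo" really has no account on node1, a lookup along these lines can be run on the execute machine. This is only a minimal sketch, assuming Python is installed there; user_exists is a helper name made up for illustration, not anything Condor provides. It does the same kind of passwd lookup that the getpwnam("foo") line in the StarterLog reports as failing.

```python
import pwd


def user_exists(name):
    """Return True if `name` resolves through the local passwd database.

    Hypothetical helper for illustration only; it is not part of Condor.
    """
    try:
        # pwd.getpwnam raises KeyError when no entry matches the name.
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    # "foo" is the job owner named in the StarterLog error.
    print("foo has a local account: %s" % user_exists("foo"))
```

If this reports False on node1 but True on master1, the account exists only on the submit side, which would match the StarterLog error.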

Thank you all
MAX