On May 1, 2015, at 8:54 AM, Angelo Fausti Neto <angelofausti@xxxxxxxxx> wrote:
The decision of whether to run a job as the submitting user or the nobody user is based on the UID_DOMAIN configuration parameter of the submit and execute machines. With the usual configuration, in order to run the job as the submitting user, the value must be the same on the two machines and the value must be a substring of the submit machine's full hostname. Otherwise, the job is run as the nobody user.
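To make the rule above concrete, here is a rough sketch of it in Python. This is only a simplified model of the check as I described it, not Condor's actual code. The function name and arguments are made up for illustration, it ignores every other configuration knob that can influence the decision, and the handling of an empty value and of TRUST_UID_DOMAIN (skip the hostname check) is my assumption.

```python
def runs_as_submitting_user(submit_uid_domain, execute_uid_domain,
                            submit_hostname, trust_uid_domain=False):
    """Simplified model of the UID_DOMAIN rule (hypothetical helper).

    Returns True if the job would run as the submitting user,
    False if it would run as the nobody user.
    """
    # Assumption: an empty UID_DOMAIN never matches anything.
    if not submit_uid_domain or not execute_uid_domain:
        return False
    # The value must be the same on the submit and execute machines.
    if submit_uid_domain != execute_uid_domain:
        return False
    # Assumption: TRUST_UID_DOMAIN skips the hostname check.
    if trust_uid_domain:
        return True
    # The value must be a substring of the submit machine's full hostname.
    return execute_uid_domain in submit_hostname
```

For example, with UID_DOMAIN set to example.org on both machines, a submit host seen as submit.example.org passes the check, while the same host seen only as 10.0.0.5 does not.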
The value of UID_DOMAIN won't change while a daemon is running, so it would be weird for an execute node to initially run jobs as the submitting user, and then start running them as user nobody (assuming the same submit machine is involved). One possibility is that after running for a while, the Condor startd starts getting a different result when determining the full hostname of the submit machine, such that it no longer matches the UID_DOMAIN value.
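If you want to test that theory from the execute node, something like the following sketch could be run periodically. It does a reverse DNS lookup of the submit machine's IP address with Python's standard socket module and checks whether the resulting name still contains the UID_DOMAIN value. The function name and the resolver argument are my own inventions, and this only approximates what Condor does internally.

```python
import socket


def check_submit_host(ip, uid_domain, resolver=socket.gethostbyaddr):
    """Reverse-resolve ip and report whether the name contains uid_domain.

    Returns (name, matches). If the reverse lookup fails, the name
    falls back to the IP address itself, which normally will not match.
    The resolver argument exists only so the lookup can be replaced
    in tests; it must behave like socket.gethostbyaddr.
    """
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        name = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        name = ip
    return name, uid_domain in name
```

Calling check_submit_host with the submit machine's address and your UID_DOMAIN value, and logging the result with a timestamp, would show whether the resolved name changes over time.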
If that is happening, you will see the following message in the StarterLog.* logs:
ERROR: the submitting host claims to be in our UidDomain (%s), yet its hostname (%s) does not match. If the above hostname is actually an IP address, Condor could not perform a reverse DNS lookup to convert the IP back into a name. To solve this problem, you can either correctly configure DNS to allow the reverse lookup, or you can enable TRUST_UID_DOMAIN in your condor configuration.
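A quick way to look for that message across all the starter logs is a small script like the one below. It is just a sketch: the function name is made up, the search string is a fragment of the message quoted above, and I am assuming the log files all start with StarterLog and live in a single directory that you pass in.

```python
from pathlib import Path

# Fixed fragment of the error message quoted above.
MARKER = "claims to be in our UidDomain"


def find_uid_domain_errors(log_dir):
    """Return (file name, line number, line) for each matching log line."""
    hits = []
    for path in sorted(Path(log_dir).glob("StarterLog*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going past undecodable bytes.
        with path.open(errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if MARKER in line:
                    hits.append((path.name, line_number, line.rstrip("\n")))
    return hits
```

The log directory on a given machine should be whatever condor_config_val LOG reports there.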
Do the PERMISSION DENIED errors only appear when the machine runs jobs as user nobody, or are they always there? These two errors should not be directly related. But a change in how hostnames are being resolved could connect them.