Re: [Condor-users] All the other machines except central manager don't work!!



Genie,

Is your condor execute directory on NFS with root squashing? The following line is what makes me guess that it might be:

01/13 06:32:30 get_file(): Failed to open file /home/condor/execute/dir_22496/condor_exec.exe, errno = 13: Permission denied.

If EXECUTE is on an NFS mount with root squashing, then the directory needs to be world-writable, because root squashing maps root on the client to an unprivileged user on the server.
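
One way to check both conditions on an execute machine is sketched below. This is only an illustration, not part of Condor: the helper names `is_world_writable` and `filesystem_type` are made up for this example, and it assumes a Linux host where mounts are listed in /proc/mounts format.

```python
import os
import stat


def is_world_writable(path):
    """Return True if 'other' users have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format, one mount per line:
        device mountpoint fstype options dump pass
    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /home does not match /homeX.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

For the directory in the log, you could read /proc/mounts into a string and call `filesystem_type("/home/condor/execute", text)`. If it reports an NFS type and `is_world_writable("/home/condor/execute")` returns False, that would be consistent with the guess above. Whether root squashing is enabled is decided by the export options on the NFS server, which this sketch does not inspect.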

--Dan


Genie Jhang wrote:
Hello, again.
Thanks to all of you, I succeeded in getting Condor running and connecting all the machines in our lab. But when I finally tried to submit jobs, I found that none of the machines except the central manager work! So I dug into the log files. Here's the log.
----------------------------------------------------------------------------------------------------------------------------------
01/13 06:32:30 ******************************************************
01/13 06:32:30 ** condor_starter (CONDOR_STARTER) STARTING UP
01/13 06:32:30 ** /condor/sbin/condor_starter
01/13 06:32:30 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
01/13 06:32:30 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
01/13 06:32:30 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/13 06:32:30 ** $CondorPlatform: I386-LINUX_RHEL3 $
01/13 06:32:30 ** PID = 22496
01/13 06:32:30 ** Log last touched time unavailable (No such file or directory)
01/13 06:32:30 ******************************************************
01/13 06:32:30 Using config source: /condor/etc/condor_config
01/13 06:32:30 Using local config sources:
01/13 06:32:30    /home/condor/condor_config.local
01/13 06:32:30 DaemonCore: Command Socket at <192.168.0.105:33714>
01/13 06:32:30 Done setting resource limits
01/13 06:32:30 Communicating with shadow <192.168.0.109:55237>
01/13 06:32:30 Submitting machine is "pheko09"
01/13 06:32:30 setting the orig job name in starter
01/13 06:32:30 setting the orig job iwd in starter
01/13 06:32:30 get_file(): Failed to open file /home/condor/execute/dir_22496/condor_exec.exe, errno = 13: Permission denied.
01/13 06:32:30 get_file(): consumed 28023 bytes of file transmission
01/13 06:32:30 DoDownload: consuming rest of transfer and failing after encountering the following error: STARTER at 192.168.0.105 failed to write to file /home/condor/execute/dir_22496/condor_exec.exe: (errno 13) Permission denied
01/13 06:32:30 WARNING: File /home/condor/execute/dir_22496/condor_exec.exe can not be accessed by Quill file transfer tracking.
01/13 06:32:30 File transfer failed (status=0).
01/13 06:32:30 ERROR "Failed to transfer files" at line 1882 in file jic_shadow.cpp
01/13 06:32:30 ShutdownFast all jobs.
------------------------------------------------------------------------------------------------------------------------------------
What on earth is the problem? I set ALLOW_WRITE = * in the condor_config file on all the machines.