
Re: [Condor-users] All the other machines except central manager don't work!!



Thanks for your reply, Dan.

As you suggested, I changed the permissions of the directory /home/condor/execute to 777 on all machines.

And I don't use NFS.

Now I'm getting this kind of error:

--------------------------------------------------

022 (218.000.000) 01/13 18:03:15 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot3@pheko05 <192.168.0.105:33682>
...
024 (218.000.000) 01/13 18:03:15 Job reconnection failed
    Job not found at execution machine 
    Can not reconnect to slot3@pheko05, rescheduling job

-------------------------------------------------------------
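To make events like the two above easier to scan in a longer job log, here is a minimal Python sketch that splits the log into events. It assumes every event starts with a header of the form shown above (three-digit event code, cluster.proc.subproc in parentheses, date, time, message) and that indented lines belong to the preceding event. The names EVENT_HEADER and parse_events are made up for illustration and are not part of Condor.

```python
import re

# Header line of an event, as seen in the excerpt above:
# "NNN (cluster.proc.subproc) MM/DD HH:MM:SS message"
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<subproc>\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<message>.*)$"
)


def parse_events(text):
    """Split job log text into events; indented lines become details."""
    events = []
    for line in text.splitlines():
        match = EVENT_HEADER.match(line)
        if match:
            event = match.groupdict()
            event["details"] = []
            events.append(event)
        elif line.strip() == "...":
            continue  # "..." separates events in the log
        elif line.strip() and events:
            events[-1]["details"].append(line.strip())
    return events


SAMPLE = """\
022 (218.000.000) 01/13 18:03:15 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot3@pheko05 <192.168.0.105:33682>
...
024 (218.000.000) 01/13 18:03:15 Job reconnection failed
    Job not found at execution machine
    Can not reconnect to slot3@pheko05, rescheduling job
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["code"], event["cluster"], event["time"], event["message"])
        for detail in event["details"]:
            print("   ", detail)
```

Filtering the result on the code field (for example "022" and "024") would show how often a job was disconnected and rescheduled.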

I have set the following:
UID_DOMAIN = 192.168.0.109
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
USE_NFS = False
SOFT_UID_DOMAIN = TRUE



2010/1/13 Dan Bradley <dan@xxxxxxxxxxxx>
Genie,

Is your condor execute directory on NFS with root squashing?  The following line is what makes me guess that it might be:


01/13 06:32:30 get_file(): Failed to open file /home/condor/execute/dir_22496/condor_exec.exe, errno = 13: Permission denied.

If EXECUTE is on an NFS mount with root squashing, then it needs to be world-writable.
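One way to check this on each execute machine is a small Python sketch like the one below. It reports the permission bits of a directory and tries to create a scratch file in it. The helper names describe_dir and can_create_file are hypothetical. Note that the write test only reflects the user who runs the script, which may differ from the user the starter runs as, so treat the result as a hint rather than a diagnosis.

```python
import os
import stat
import sys
import tempfile


def describe_dir(path):
    """Return the permission bits that matter for a shared execute directory."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    return {
        "path": path,
        "mode": oct(mode),
        "owner_uid": st.st_uid,
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
    }


def can_create_file(path):
    """Try to create (and automatically remove) a scratch file as the current user."""
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The default path is the execute directory named in the log above.
    target = sys.argv[1] if len(sys.argv) > 1 else "/home/condor/execute"
    if os.path.isdir(target):
        print(describe_dir(target))
        print("current user can create files:", can_create_file(target))
    else:
        print("not a directory:", target)
```

Running it once as root and once as an unprivileged user on the same machine can show whether root is being treated differently on that file system.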

--Dan


Genie Jhang wrote:
Hello, again.
 Thanks to all of you, I succeeded in starting Condor on all the machines in our lab and connecting them.
 But when I finally tried to submit jobs, I found that none of the machines except the central manager work!
 So I dug through the log files.
 Here's the log.
 ----------------------------------------------------------------------------------------------------------------------------------  01/13 06:32:30 ******************************************************
01/13 06:32:30 ** condor_starter (CONDOR_STARTER) STARTING UP
01/13 06:32:30 ** /condor/sbin/condor_starter
01/13 06:32:30 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
01/13 06:32:30 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
01/13 06:32:30 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/13 06:32:30 ** $CondorPlatform: I386-LINUX_RHEL3 $
01/13 06:32:30 ** PID = 22496
01/13 06:32:30 ** Log last touched time unavailable (No such file or directory)
01/13 06:32:30 ******************************************************
01/13 06:32:30 Using config source: /condor/etc/condor_config
01/13 06:32:30 Using local config sources:
01/13 06:32:30    /home/condor/condor_config.local
01/13 06:32:30 DaemonCore: Command Socket at <192.168.0.105:33714>
01/13 06:32:30 Done setting resource limits
01/13 06:32:30 Communicating with shadow <192.168.0.109:55237>
01/13 06:32:30 Submitting machine is "pheko09"
01/13 06:32:30 setting the orig job name in starter
01/13 06:32:30 setting the orig job iwd in starter
01/13 06:32:30 get_file(): Failed to open file /home/condor/execute/dir_22496/condor_exec.exe, errno = 13: Permission denied.
01/13 06:32:30 get_file(): consumed 28023 bytes of file transmission
01/13 06:32:30 DoDownload: consuming rest of transfer and failing after encountering the following error: STARTER at 192.168.0.105 failed to write to file /home/condor/execute/dir_22496/condor_exec.exe: (errno 13) Permission denied
01/13 06:32:30 WARNING: File /home/condor/execute/dir_22496/condor_exec.exe can not be accessed by Quill file transfer tracking.
01/13 06:32:30 File transfer failed (status=0).
01/13 06:32:30 ERROR "Failed to transfer files" at line 1882 in file jic_shadow.cpp
01/13 06:32:30 ShutdownFast all jobs.
 ------------------------------------------------------------------------------------------------------------------------------------
 What on earth is the problem?
 I set ALLOW_WRITE = * in the condor_config file on all the machines.
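The starter log above reports errno 13 in two slightly different spellings ("errno = 13" and "(errno 13)"). As a rough aid for reading such logs, this Python sketch pulls the errno numbers out of log text and maps them to their symbolic names using the standard errno module. The regular expression and the function name errno_names are assumptions based only on the two lines in this log, not on any documented Condor log format.

```python
import errno
import re

# Matches both "errno = 13" and "errno 13" as they appear in the log above.
ERRNO_PATTERN = re.compile(r"errno\s*=?\s*(\d+)")


def errno_names(log_text):
    """Return (number, symbolic name) for every errno mentioned in the text."""
    found = []
    for line in log_text.splitlines():
        for match in ERRNO_PATTERN.finditer(line):
            number = int(match.group(1))
            found.append((number, errno.errorcode.get(number, "unknown")))
    return found


SAMPLE = """\
01/13 06:32:30 get_file(): Failed to open file /home/condor/execute/dir_22496/condor_exec.exe, errno = 13: Permission denied.
01/13 06:32:30 DoDownload: consuming rest of transfer and failing after encountering the following error: STARTER at 192.168.0.105 failed to write to file /home/condor/execute/dir_22496/condor_exec.exe: (errno 13) Permission denied
"""

if __name__ == "__main__":
    for number, name in errno_names(SAMPLE):
        print(number, name)
```

Here errno 13 is EACCES, which means the process was refused access to the file or directory. It does not mean the path was missing.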
------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
 