
Re: [Condor-users] Failed to chmod file



Thank you very much, Simon.
I have tried changing the ACL, since I am not running Condor as root, but the result is always the same. This is what I get in the StarterLog:
******************************************************
8/7 10:41:49 Using config source: /home/condor/condor_config
8/7 10:41:49 Using local config sources: 
8/7 10:41:49    /home/condor/hosts/balsa/condor_config.local
8/7 10:41:49 DaemonCore: Command Socket at <143.234.88.55:63601>
8/7 10:41:49 Done setting resource limits
8/7 10:41:49 Communicating with shadow <143.234.88.55:63599>
8/7 10:41:49 Submitting machine is "balsa.macaulay.ac.uk"
8/7 10:41:50 File transfer completed successfully.
8/7 10:41:51 Starting a VANILLA universe job with ID: 176.0
8/7 10:41:51 IWD: /home/condor/hosts/balsa/execute/dir_24354
8/7 10:41:51 Output file: /home/condor/hosts/balsa/execute/dir_24354/condor_output
8/7 10:41:51 Error file: /home/condor/hosts/balsa/execute/dir_24354/condor_error
8/7 10:41:51 About to exec /home/condor/hosts/balsa/execute/dir_24354/condor_exec.exe Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral0
8/7 10:41:51 Create_Process succeeded, pid=24357
8/7 10:42:03 Process exited, pid=24357, status=134
8/7 10:42:03 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:03 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:03 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:03 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:03 File transfer failed, forcing disconnect.
8/7 10:42:03 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:03 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:03 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:03 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 File transfer failed, forcing disconnect.
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:05 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:05 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:05 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:18 Got SIGQUIT.  Performing fast shutdown.
8/7 10:42:18 ShutdownFast all jobs.
8/7 10:42:18 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
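
The DoUpload lines in the log above name two different files: the one the starter was sending and the one the shadow failed to receive. The sketch below pulls both paths out of one of those lines so they can be compared. The helper name and the regular expression are illustrative assumptions, not part of Condor.

```python
import re

# One DoUpload line copied from the StarterLog above.
LINE = (
    "8/7 10:42:03 DoUpload: STARTER at 143.234.88.55 failed to send file(s) "
    "to <143.234.88.55:63599>: error sending "
    "/home/condor/hosts/balsa/execute/dir_24354/core.176.0; "
    "SHADOW at 143.234.88.55 failed to receive file "
    "/home/sp5978/simul2/condorRes/neutral/out0/condor_output"
)

# Hypothetical pattern, written only to match the line format seen in this log.
UPLOAD_RE = re.compile(
    r"error sending (?P<sent>\S+); "
    r"SHADOW at \S+ failed to receive file (?P<received>\S+)"
)

def failed_transfer(line):
    """Return (path the starter was sending, path the shadow was receiving),
    or None if the line is not a DoUpload failure in this format."""
    m = UPLOAD_RE.search(line)
    return (m.group("sent"), m.group("received")) if m else None

sent, received = failed_transfer(LINE)
print("starter was sending :", sent)
print("shadow was receiving:", received)
```

Here the starter reports the core file while the shadow reports condor_output, which may mean the shadow gave up on condor_output first and the starter only noticed the broken pipe on the next file.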

I can see the results in the condor_output file, but the job restarts.
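
One more detail from the log: the process exited with status=134, and a core file (core.176.0) appears in the execute directory. Assuming the status follows the common shell convention of 128 plus the signal number (an assumption, since the log does not say how the status was produced), it can be decoded as in this small sketch. The function name is hypothetical.

```python
def signal_from_status(status):
    """Interpret an exit status under the shell convention 128 + signal number.

    Returns the signal number, or None when the status is 128 or lower
    (an ordinary exit code under this convention).
    """
    return status - 128 if status > 128 else None

print(signal_from_status(134))  # 6
```

Signal 6 is SIGABRT on Solaris and Linux, which would be consistent with the job aborting and leaving a core file even though it had already written its results.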

>>> "Simon Hammond" <simon.hammond@xxxxxxxxx> 07/08/2007 09:53 >>>
I guess you are not running Condor as root?

If so, you can use ACLs to give the user Condor is running as access to
the file.

e.g. setfacl -m u:condor:rwx ./myfile.txt

This will give just the Condor user read, write and execute access to the
file. You may need to adjust the mask to get this to work correctly.
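
A note on the "Not owner (errno: 1)" message in the ShadowLog: errno 1 is EPERM, and POSIX lets a process chmod a file only if its effective user ID matches the file's owner or the process is privileged. An ACL entry grants read/write/execute access but not ownership, so it may not help with this particular call. The following is a minimal sketch of that rule, using a temporary file in place of the real condor_output; the function name is hypothetical and root is used as a simplification of "privileged".

```python
import os
import tempfile

def can_chmod(path):
    """True if the current process should be allowed to chmod `path`.

    POSIX rule, simplified: the effective UID must own the file,
    or the process must be root (UID 0).
    """
    euid = os.geteuid()
    return euid == 0 or os.stat(path).st_uid == euid

# Demonstration on a file this process has just created, so it owns it.
with tempfile.NamedTemporaryFile() as tmp:
    print(can_chmod(tmp.name))
```

Running the same check as the account Condor runs under, against the real condor_output path, would show whether the chmod in the shadow can succeed at all.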


On 07/08/07, Sophie Prieur <s.prieur@xxxxxxxxxxxxxx> wrote:
>
>  Hi everybody,
>
> I have a problem when I submit a job. I get this in the ShadowLog:
> ReliSock::get_file_with_permissions(): Failed to chmod file
> '/home/sp5978/simul2/condorRes/neutral/out1/condor_output': Not owner
> (errno: 1)
> and this in the StarterLog:
> DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <
> 143.234.88.55:51883>; SHADOW at 143.234.88.55 failed to receive file
> /home/sp5978/simul2/condorRes/neutral/out0/condor_output
>
> This is the submit file:
> Universe = vanilla
> Executable = /software/guiswarm/swarm-2.2/bin/javaswarm
>
> Log = condor_log
> Error = condor_error
> Output = condor_output
>
> getenv = true
>
> requirements = ((( OpSys == "SOLARIS29" ) && ( Arch == "SUN4u" )) || ((
> OpSys == "SOLARIS28" ) && ( Arch == "SUN4u" )))
> transfer_input_files = /home/sp5978/simul2/bin/Simul.class,
> /home/sp5978/simul2/bin/BatchSwarm.class,
> /home/sp5978/simul2/bin/Beq0.class, /home/sp5978/simul2/bin/Cell.class,
> /home/sp5978/simul2/bin/DNA.class, /home/sp5978/simul2/bin/DsupK.class,
> /home/sp5978/simul2/bin/incorrectValue.class,
> /home/sp5978/simul2/bin/Individual.class, /home/sp5978/simul2/bin/Map.class,
> /home/sp5978/simul2/bin/missingValue.class,
> /home/sp5978/simul2/bin/ModelSwarm.class,
> /home/sp5978/simul2/bin/noEnoughValues.class,
> /home/sp5978/simul2/bin/ObserverSwarm.class,
> /home/sp5978/simul2/bin/Param.class, /home/sp5978/simul2/bin/Plant.class,
> /home/sp5978/simul2/bin/Project.class, /home/sp5978/simul2/bin/Seed.class,
> /home/sp5978/simul2/bin/Specie.class,
> /home/sp5978/simul2/bin/SwarmUtils.class
> transfer_files = ALWAYS
>
> InitialDir = /home/sp5978/simul2/condorRes/neutral/out0
> Arguments = Simul --batch -cfg
> /home/sp5978/simul2/configFiles/neutral/config-neutral0
> Queue
> InitialDir = /home/sp5978/simul2/condorRes/neutral/out1
> Arguments = Simul --batch -cfg
> /home/sp5978/simul2/configFiles/neutral/config-neutral1
> Queue
> And these are the permissions on condor_output:
> bash-2.03$ ls -l ../condorRes/neutral/out0
> total 377
> -rw-rw-r--    1 sp5978   staff           0 Aug  6 11:18 condor_error
> -rw-rw-r--    1 sp5978   staff      370097 Aug  7 09:30 condor_log
> -rwxrwxrwx    1 sp5978   staff         400 Aug  7 09:29 condor_output
>
> The job runs and I can see the results in the condor_output files, but
> it never finishes: it stays in the queue and restarts after a while.
> Can someone help me?
> Thanks in advance
> Sophie
>
>
> --
> Please note that the views expressed in this e-mail are those of the
> sender and do not necessarily represent the views of the Macaulay
> Institute. This email and any attachments are confidential and are
> intended solely for the use of the recipient(s) to whom they are
> addressed. If you are not the intended recipient, you should not read,
> copy, disclose or rely on any information contained in this e-mail, and
> we would ask you to contact the sender immediately and delete the email
> from your system. Thank you.
> Macaulay Institute and Associated Companies, Macaulay Drive,
> Craigiebuckler, Aberdeen, AB15 8QH.
