
Re: [Condor-users] Failed to chmod file




Has anyone got Condor to work with ACLs on Linux? We seem to have tried a lot between us and can't get this to work.


Si Hammond


On 15 Aug 2007, at 14:14, Sophie Prieur wrote:

Hi Simon,

No, the problem remains the same. Submitted from a user account, a job for Unix doesn't work, but a job for Windows runs without problems. Submitting from Windows works, and submitting as the condor user works.

Sophie

"Simon Hammond" <simon.hammond@xxxxxxxxx> 15/08/2007 10:57 >>>
Sophie,

Have you managed to fix this? I've been experimenting here and we are having
the same problems. The ACL solution doesn't seem to work here either.

Thanks,

Si Hammond


On 08/08/07, Sophie Prieur <s.prieur@xxxxxxxxxxxxxx> wrote:

Simon,

Sorry, but I did not understand your answer. The job runs well and gives results under the condor account, but under another account that is not condor, the job has these problems. I am using NFS, and I get this when I submit the job:
WARNING: Log file /home/sp5978/Simul/condorRes/neutral/out1/condor_log is on NFS.
This could cause log file corruption and is _not_ recommended.

Thank you
Sophie

Si Hammond <simon.hammond@xxxxxxxxx> 07/08/2007 18:25 >>>

Sophie,

Are you running Stork to handle the file transfers? If you're not
using a shared filesystem then you might need that.



S.


On 7 Aug 2007, at 10:46, Sophie Prieur wrote:

Thank you very much, Simon.
I have tried changing the ACL, as I am not running Condor as root, but
the result is always the same, and I get this in the StarterLog:
******************************************************
8/7 10:41:49 Using config source: /home/condor/condor_config
8/7 10:41:49 Using local config sources:
8/7 10:41:49    /home/condor/hosts/balsa/condor_config.local
8/7 10:41:49 DaemonCore: Command Socket at <143.234.88.55:63601>
8/7 10:41:49 Done setting resource limits
8/7 10:41:49 Communicating with shadow <143.234.88.55:63599>
8/7 10:41:49 Submitting machine is "balsa.macaulay.ac.uk"
8/7 10:41:50 File transfer completed successfully.
8/7 10:41:51 Starting a VANILLA universe job with ID: 176.0
8/7 10:41:51 IWD: /home/condor/hosts/balsa/execute/dir_24354
8/7 10:41:51 Output file: /home/condor/hosts/balsa/execute/dir_24354/condor_output
8/7 10:41:51 Error file: /home/condor/hosts/balsa/execute/dir_24354/condor_error
8/7 10:41:51 About to exec /home/condor/hosts/balsa/execute/dir_24354/condor_exec.exe Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral0
8/7 10:41:51 Create_Process succeeded, pid=24357
8/7 10:42:03 Process exited, pid=24357, status=134
8/7 10:42:03 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:03 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:03 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:03 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:03 File transfer failed, forcing disconnect.
8/7 10:42:03 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:03 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:03 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:03 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 File transfer failed, forcing disconnect.
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:05 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:05 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:05 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe).  Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:18 Got SIGQUIT.  Performing fast shutdown.
8/7 10:42:18 ShutdownFast all jobs.
8/7 10:42:18 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0

I can see the results in the condor_output file, but the job restarts.
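
A note on the status=134 line in the log above: if that value is a raw Unix wait status (an assumption; the log does not say so), it decodes to termination by signal 6 (SIGABRT) with the core-dump flag set, which would fit the starter trying to send back core.176.0. Below is a minimal shell sketch of that decoding, using the conventional wait-status bit layout found on Linux and Solaris. The helper name decode_wait_status is hypothetical, not part of Condor.

```shell
#!/bin/sh
# Decode a raw wait status using the conventional Unix bit layout.
# Assumption: the "status=" value in StarterLog is such a raw status.
# Stopped/continued statuses are not handled in this sketch.
decode_wait_status() {
    status=$1
    sig=$((status & 127))            # low 7 bits: terminating signal, 0 for a normal exit
    core=$(( (status >> 7) & 1 ))    # bit 7: core-dump flag
    code=$(( (status >> 8) & 255 ))  # next 8 bits: exit code of a normal exit
    if [ "$sig" -eq 0 ]; then
        echo "exited code=$code"
    else
        echo "signal=$sig core=$core"
    fi
}

decode_wait_status 134   # prints: signal=6 core=1
```

If this reading is right, the job itself is aborting on the execute machine, separately from the output-file transfer failure.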

"Simon Hammond" <simon.hammond@xxxxxxxxx> 07/08/2007 09:53 >>>
I guess you are not running Condor as root?

If so, you can use ACLs to give the user Condor is running as access to
the file,

e.g. setfacl -m u:condor:rwx ./myfile.txt

This gives just the Condor user read/write/execute access to the file.
You may need to adjust the mask to get this to work correctly.
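
A hedged sketch of the suggestion above, assuming Linux setfacl/getfacl and a daemon account named condor. The helper names and the path in the usage comment are made up for illustration. One caveat worth checking: the ShadowLog error reported in this thread is a failed chmod, and chmod is permitted only to the file's owner or root, so an ACL entry granting rwx may not be enough by itself.

```shell
#!/bin/sh
# Sketch only. Assumes Linux setfacl/getfacl and an account named "condor".

# Grant the condor user rwx on a file, then set the mask explicitly in case
# a later chmod narrowed it, and print the resulting ACL for inspection.
grant_condor_access() {
    setfacl -m u:condor:rwx "$1" &&
    setfacl -m m::rwx "$1" &&
    getfacl "$1"
}

# Report whether the current process would be allowed to chmod the file.
# chmod needs ownership (or root); an ACL entry does not change that.
can_chmod() {
    # -O: file exists and is owned by the effective user ID of this process
    [ -O "$1" ] || [ "$(id -u)" -eq 0 ]
}

# Usage (hypothetical path):
#   grant_condor_access ./condor_output
#   can_chmod ./condor_output && echo "chmod would be permitted"
```

Running can_chmod as the account the Condor daemons use, against the output file in the submit directory, would show whether the chmod in the error message can succeed at all.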


On 07/08/07, Sophie Prieur <s.prieur@xxxxxxxxxxxxxx> wrote:

Hi everybody,

I have a problem when I submit a job. I get this in ShadowLog:
ReliSock::get_file_with_permissions(): Failed to chmod file '/home/sp5978/simul2/condorRes/neutral/out1/condor_output': Not owner (errno: 1)
and this in StarterLog:
DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:51883>; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output

This is the submit file:
Universe = vanilla
Executable = /software/guiswarm/swarm-2.2/bin/javaswarm

Log = condor_log
Error = condor_error
Output = condor_output

getenv = true

requirements = ((( OpSys == "SOLARIS29" ) && ( Arch == "SUN4u" )) || (( OpSys == "SOLARIS28" ) && ( Arch == "SUN4u" )))
transfer_input_files = /home/sp5978/simul2/bin/Simul.class, \
    /home/sp5978/simul2/bin/BatchSwarm.class, \
    /home/sp5978/simul2/bin/Beq0.class, \
    /home/sp5978/simul2/bin/Cell.class, \
    /home/sp5978/simul2/bin/DNA.class, \
    /home/sp5978/simul2/bin/DsupK.class, \
    /home/sp5978/simul2/bin/incorrectValue.class, \
    /home/sp5978/simul2/bin/Individual.class, \
    /home/sp5978/simul2/bin/Map.class, \
    /home/sp5978/simul2/bin/missingValue.class, \
    /home/sp5978/simul2/bin/ModelSwarm.class, \
    /home/sp5978/simul2/bin/noEnoughValues.class, \
    /home/sp5978/simul2/bin/ObserverSwarm.class, \
    /home/sp5978/simul2/bin/Param.class, \
    /home/sp5978/simul2/bin/Plant.class, \
    /home/sp5978/simul2/bin/Project.class, \
    /home/sp5978/simul2/bin/Seed.class, \
    /home/sp5978/simul2/bin/Specie.class, \
    /home/sp5978/simul2/bin/SwarmUtils.class
transfer_files = ALWAYS

InitialDir = /home/sp5978/simul2/condorRes/neutral/out0
Arguments = Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral0
Queue
InitialDir = /home/sp5978/simul2/condorRes/neutral/out1
Arguments = Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral1
Queue
And the permissions on condor_output:
bash-2.03$ ls -l ../condorRes/neutral/out0
total 377
-rw-rw-r--    1 sp5978   staff           0 Aug  6 11:18 condor_error
-rw-rw-r--    1 sp5978   staff      370097 Aug  7 09:30 condor_log
-rwxrwxrwx    1 sp5978   staff         400 Aug  7 09:29 condor_output

The job runs and I can see the results in the condor_output files, but
it doesn't finish: it remains in the queue and restarts after a while.
Can someone help me?
Thanks in advance,
Sophie


--
Please note that the views expressed in this e-mail are those of the
sender and do not necessarily represent the views of the Macaulay
Institute. This email and any attachments are confidential and are
intended solely for the use of the recipient(s) to whom they are
addressed. If you are not the intended recipient, you should not read,
copy, disclose or rely on any information contained in this e-mail, and
we would ask you to contact the sender immediately and delete the email
from your system. Thank you.
Macaulay Institute and Associated Companies, Macaulay Drive,
Craigiebuckler, Aberdeen, AB15 8QH.
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


