Re: [Condor-users] Problem with MPI universe job



Hi,

Maybe increasing the value of SEC_TCP_SESSION_TIMEOUT will help?

Pasquale

On 4/4/07, Pasquale Tricarico <tricaric@xxxxxxxxx> wrote:
The output device on the head node is a RAID 5 array attached through a
SCSI Ultra160 connection. When the job finishes, Condor has to copy
about 100 files, each many MB in size, back to the head node, and
this happens for 8 nodes at about the same time, so it is reasonable
that the device cannot keep up with the whole load and that some
connections time out. Is it possible to extend this
time-out period? Say, 5 minutes instead of 30 seconds or so?
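
As a rough sanity check on the load described above, here is a minimal
back-of-the-envelope sketch. The 8 nodes and ~100 files per node come from
this message; the 10 MB file size is only an assumed stand-in for "many MB",
the helper name is hypothetical, and 160 MB/s is the nominal peak rate of an
Ultra160 bus, which a RAID 5 array will probably not sustain on writes.

```python
def min_copy_back_seconds(nodes, files_per_node, file_size_mb, bus_mb_per_s):
    """Lower bound on the time needed to push all output files through one shared bus."""
    total_mb = nodes * files_per_node * file_size_mb
    return total_mb / bus_mb_per_s


# 8 nodes and ~100 files per node are taken from the message.
# 10 MB per file is an ASSUMPTION standing in for "many MB".
# 160 MB/s is the nominal Ultra160 peak; real RAID 5 write rates are lower.
seconds = min_copy_back_seconds(
    nodes=8, files_per_node=100, file_size_mb=10, bus_mb_per_s=160.0
)
print(f"at least {seconds:.0f} s of sustained writing")
```

Under these assumptions the copy-back phase lasts longer than the 30-second
time-out even at the bus's peak rate, so an individual connection could
plausibly stall long enough to time out while the others are being served.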

Thanks,
Pasquale

On 4/4/07, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
>
>
> Pasquale Tricarico wrote:
> > 4/4 02:38:55 condor_write(): timed out writing 65536 bytes to <10.7.7.250:34338>
> >
>
> It is timing out after 30 seconds while trying to copy back 65536 bytes
> of an output file.  Are your output files being written to a very slow
> device?  Or do you have a lot of jobs all writing to this same device at
> the same time?
>
> --Dan