
[Condor-users] spool to execute directory file transfers fail because of sparse and thin vm disks



Hi,
    When the Starter tries to transfer a file from the spool directory, it is not able to transfer a sparse file correctly. It exits with a socket error,
but the network seems to be fine. Is this because Condor is not able to differentiate between sparse, thin, and normal files correctly?

The Starter tried many times, but each time it exited with the following error, and it was never able to transfer that file successfully.

StarterLog:
5/25 19:11:27 get_file(): going to write to filename /vmfs/volumes/1107ffff-0ea6c919/execute/cloudesx2/dir_25659/vmBIbP33_condor-2bd4e3bf.vmss
5/25 19:11:27 get_file: Receiving 4296162111 bytes
5/25 19:18:01 DaemonCore: in SendAliveToParent()
5/25 19:18:01 DaemonCore: Leaving SendAliveToParent() - success
5/25 19:19:32 condor_read(): Socket closed when trying to read 65536 bytes from <192.168.10.7:9621>
5/25 19:19:32 ReliSock::get_bytes_nobuffer: Failed to receive file.
5/25 19:19:32 get_file: wrote 2494562304 bytes to file
5/25 19:19:32 get_file(): ERROR: received 2494562304 bytes, expected 4296162111!
5/25 19:19:32 DoDownload: STARTER at 192.168.10.254 failed to receive file /vmfs/volumes/1107ffff-0ea6c919/execute/cloudesx2/dir_25659/vmBIbP33_condor-2bd4e3bf.vmss
5/25 19:19:32 condor_write(): Socket closed when trying to write 249 bytes to <192.168.10.7:9621>, fd is 8
5/25 19:19:32 Buf::write(): condor_write() failed
5/25 19:19:32 Failed to send download failure report to <192.168.10.7:9621>.
5/25 19:19:32 DoDownload: exiting at 1743
5/25 19:19:32 DaemonCore: No more children processes to reap.
5/25 19:19:32 File transfer failed (status=0).
5/25 19:19:32 Calling client FileTransfer handler function.
5/25 19:19:32 ERROR "Failed to transfer files" at line 1780 in file jic_shadow.cpp
5/25 19:19:32 condor_write(): Socket closed when trying to write 165 bytes to <192.168.10.7:9731>, fd is 10
5/25 19:19:32 Buf::write(): condor_write() failed
5/25 19:19:32 ERROR "Assertion ERROR on (result)" at line 875 in file NTsenders.cpp
5/25 19:19:32 Deleting the StarterHookMgr
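
To put the numbers in the log above side by side, here is a small Python sketch. It is not part of Condor; it only redoes the arithmetic on the byte counts and timestamps copied from the log, and the variable names are made up for illustration. It says nothing about why the socket was closed.

```python
from datetime import datetime

# Values copied from the StarterLog excerpt above.
expected_bytes = 4296162111   # "get_file: Receiving 4296162111 bytes"
received_bytes = 2494562304   # "get_file: wrote 2494562304 bytes to file"

# Only the time of day is parsed; both log lines are from the same date (5/25).
started = datetime.strptime("19:11:27", "%H:%M:%S")
failed = datetime.strptime("19:19:32", "%H:%M:%S")

elapsed_seconds = (failed - started).total_seconds()
missing_bytes = expected_bytes - received_bytes
fraction_received = received_bytes / expected_bytes
bytes_per_second = received_bytes / elapsed_seconds

print("elapsed seconds   :", elapsed_seconds)
print("missing bytes     :", missing_bytes)
print("fraction received : %.3f" % fraction_received)
print("average bytes/sec : %.0f" % bytes_per_second)
```

So the transfer ran for a little over eight minutes and delivered a bit more than half of the expected bytes before the connection dropped.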

Some more analysis: the file size reported by ls, shown below, is the size Condor uses when it starts the transfer.
[root@cloudesx2 cloudesx2]# ls -l /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss
-rwxrwxrwx    1 root     root     4296162111 May 25 17:40 /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss

The size reported by du is shown below; Condor seems to use this size once the transfer is done.
[root@cloudesx2 root]# du /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss
245363 /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss

[root@cloudesx2 cloudesx2]# du -h /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss
240M /vmfs/volumes/nfs2/spool/cluster124.proc0.subproc0/vmBIbP33_condor-2bd4e3bf.vmss
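
As a rough cross-check of the ls and du numbers above, here is a small Python sketch that compares a file's apparent size with the space actually allocated for it, which is the usual way to spot a sparse file. It is not part of Condor; the function names stat_sizes and looks_sparse are made up for illustration. It assumes a Linux-like system where st_blocks is counted in 512-byte units, and it assumes the plain du output above is in 1024-byte blocks (which is consistent with the 240M shown by du -h).

```python
import os

def stat_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path."""
    st = os.stat(path)
    # st_size is what ls -l shows; st_blocks is what du is based on.
    # Assumption: st_blocks is in 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512

def looks_sparse(apparent_bytes, allocated_bytes):
    """A file is likely sparse if less space is allocated than its length."""
    return allocated_bytes < apparent_bytes

# Numbers copied from the ls and du output above.
apparent = 4296162111          # from ls -l
allocated = 245363 * 1024      # from du, assuming 1024-byte blocks

print("looks sparse :", looks_sparse(apparent, allocated))
print("allocated %% : %.1f" % (100.0 * allocated / apparent))
```

With these numbers only about 6% of the apparent size is actually allocated on disk. To check a real file instead, one could call looks_sparse(*stat_sizes(path)) on the spool copy.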

by
Johnson



www.wipro.com