
[Condor-users] Bug in condor_transfer_data (Windows) in Condor 7.6 series



Some months ago I reported on this list that condor_transfer_data
stopped working on my Windows machines when I upgraded from the 7.4
series to the 7.6 series. I got no reply at the time, so it seems that
no one else has run into this issue so far. Since I would really like
to upgrade to Condor 7.6, I have now investigated the issue and found
the change in the source that seems to break the condor_transfer_data
tool on my machines. As a reminder: as soon as I try to transfer data
with condor_transfer_data on any of my Windows machines, I get the
following error:

C:\Documents and Settings\FelixWolfheimer\Desktop\condor>condor_transfer_data 65
Fetching data files...
DCSchedd::receiveJobSandbox:7003:File transfer failed for target job 65.0:
SCHEDD at 10.2.4.60 failed to send file(s) to <10.2.4.60:1318>:
error reading from
C:\Condor/spool\65\0\cluster65.proc0.subproc0\horn.out.log: permission
denied; TOOL failed to receive file(s) from <10.2.4.60:9619>
ERROR: Failed to spool job files.

When I look into the SchedLog file I can see the following error (the
important line is the first one):
01/03/12 16:30:44 (pid:47892) Perm::GetAclInformation failed with error 122

01/03/12 16:30:44 (pid:47892) DoUpload: (Condor error code 13, subcode
1) SCHEDD at 10.2.4.60 failed to send file(s) to <10.2.4.60:1318>:
error reading from
C:\Condor/spool\65\0\cluster65.proc0.subproc0\horn.out.log: permission
denied; TOOL failed to receive file(s) from <10.2.4.60:9619>

01/03/12 16:30:44 (pid:47892) generalJobFilesWorkerThread(): failed to
transfer files for job 65.0

This is the piece of code in Condor causing the error in version 7.6.4
(src/condor_utils/perm.WINDOWS.cpp):
	ACL_SIZE_INFORMATION* acl_info = new ACL_SIZE_INFORMATION();
		// Structure contains the following members:
		//  DWORD   AceCount;
		//  DWORD   AclBytesInUse;
		//  DWORD   AclBytesFree;



	// first get the number of ACEs in the ACL
		if (! GetAclInformation( pacl,		// acl to get info from
					acl_info,	// buffer to receive info
					sizeof(acl_info),  // size in bytes of buffer
					AclSizeInformation // class of info to retrieve
					) ) {
			dprintf(D_ALWAYS, "Perm::GetAclInformation failed with error %d\n",
				GetLastError() );
			return -1;
		}

Here is the piece of code which worked for me (Condor
7.4.4,src/condor_c++_util/perm.cpp):
	ACL_SIZE_INFORMATION* acl_info = new ACL_SIZE_INFORMATION();
		// Structure contains the following members:
		//  DWORD   AceCount;
		//  DWORD   AclBytesInUse;
		//  DWORD   AclBytesFree;

	// first get the number of ACEs in the ACL
		if (! GetAclInformation( pacl,		// acl to get info from
								acl_info,	// buffer to receive info
								24,			// size in bytes of buffer
								AclSizeInformation // class of info to retrieve
								) ) {
			dprintf(D_ALWAYS, "Perm::GetAclInformation failed with error %d\n",
				GetLastError() );
			return -1;
		}

The code in Condor 7.6.4, although a good attempt to parameterize the
size of ACL_SIZE_INFORMATION, is wrong. sizeof(acl_info) is the size of
the pointer (8 bytes on a 64-bit build), not the size of the struct it
points to (three DWORDs, i.e. 12 bytes; 7.4.4 passed a hard-coded 24).
Is there any chance that you can fix this soon in the stable series
(7.6)? BTW: GetLastError returns 122 (ERROR_INSUFFICIENT_BUFFER), which
means the buffer provided is too small and is consistent with my
observations. I wonder why no one else seems to have this problem, as
this bug should make condor_transfer_data fail on any Windows system...
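
To illustrate the difference between the two sizeof forms, here is a
minimal, portable sketch. It does not call the Win32 API: AclSizeInfo
and mock_get_acl_information are hypothetical stand-ins I made up for
ACL_SIZE_INFORMATION and for the length check that GetAclInformation
appears to perform, so treat it as an illustration of the pitfall
rather than as the actual Condor fix.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the Win32 ACL_SIZE_INFORMATION struct:
// three 32-bit DWORD members, as listed in the quoted source comments.
struct AclSizeInfo {
    std::uint32_t AceCount;
    std::uint32_t AclBytesInUse;
    std::uint32_t AclBytesFree;
};

// Hypothetical stand-in for the buffer-length check: succeed only when
// the length declared by the caller can hold the whole struct.
bool mock_get_acl_information(AclSizeInfo* out, std::size_t declared_len) {
    if (out == nullptr || declared_len < sizeof(AclSizeInfo)) {
        return false;  // assumed to correspond to last error 122
    }
    *out = AclSizeInfo{0, 0, 0};
    return true;
}

// Pattern from 7.6.4: sizeof applied to the pointer variable itself,
// which yields the pointer size (4 or 8 bytes), not the struct size.
bool query_with_pointer_size() {
    AclSizeInfo info{};
    AclSizeInfo* acl_info = &info;
    return mock_get_acl_information(acl_info, sizeof(acl_info));
}

// One possible fix: sizeof applied to the pointed-to object, so the
// declared length follows the struct definition automatically.
bool query_with_struct_size() {
    AclSizeInfo info{};
    AclSizeInfo* acl_info = &info;
    return mock_get_acl_information(acl_info, sizeof(*acl_info));
}
```

With these stand-ins, query_with_pointer_size() fails and
query_with_struct_size() succeeds on common 32-bit and 64-bit builds.
In perm.WINDOWS.cpp the equivalent change would presumably be to pass
sizeof(*acl_info) or sizeof(ACL_SIZE_INFORMATION) instead of
sizeof(acl_info).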