
[HTCondor-users] condor_transfer_data makes schedd unresponsive



Hi All,

I'm running condor_version:
$CondorVersion: 8.0.3 Sep 19 2013 BuildID: 174914 $
$CondorPlatform: x86_64_RedHat6 $

Jobs are submitted remotely, and the remote machine retrieves the results via condor_transfer_data. When I try to retrieve the output of completed jobs, the schedd becomes unresponsive. On the remote machine:

condor_transfer_data -pool cloudscheduler.cern.ch -name aiatlas009.cern.ch -const 'JobStatus==4' -all
Fetching data files...

DCSchedd::receiveJobSandbox:6004:Can't receive JobAdsArrayLen from the schedd (<128.142.135.122:8081>)
ERROR: Failed to spool job files.


There is a long (5 min) pause between the "Fetching data files..." line and the DCSchedd error. While this is going on, the schedd seems unresponsive, even to a local query:

condor_q

-- Failed to fetch ads from: <128.142.135.122:8081> : aiatlas009.cern.ch
SECMAN:2007:Failed to end classad message
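A rough way to reproduce the symptom and measure the stall is sketched below in Python. It assumes that condor_transfer_data and condor_q are on PATH on the machine running the script. It also assumes that condor_q accepts -pool and -name, so the schedd can be probed from the same remote host. In the case above, condor_q was run locally on the schedd host instead. The script starts the transfer in the background, waits a few seconds, and then times a condor_q against the same schedd. The helper names are hypothetical.

```python
import subprocess
import time

# Pool and schedd names taken from the commands above; adjust for your site.
POOL = "cloudscheduler.cern.ch"
SCHEDD = "aiatlas009.cern.ch"


def build_transfer_cmd(pool, schedd, constraint):
    """Build the same condor_transfer_data invocation used above (hypothetical helper)."""
    return ["condor_transfer_data", "-pool", pool, "-name", schedd,
            "-const", constraint, "-all"]


def timed_run(cmd, timeout=None):
    """Run cmd; return (returncode, elapsed seconds). returncode is None on timeout."""
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        code = result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        code = None
    return code, time.monotonic() - start


def reproduce(pool=POOL, schedd=SCHEDD, probe_delay=10.0, probe_timeout=60.0):
    """Start the transfer in the background, then time a condor_q against the same schedd."""
    transfer = subprocess.Popen(build_transfer_cmd(pool, schedd, "JobStatus==4"),
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # Give the schedd time to start handling the transfer request.
    time.sleep(probe_delay)
    code, elapsed = timed_run(["condor_q", "-pool", pool, "-name", schedd],
                              timeout=probe_timeout)
    transfer.wait()
    return code, elapsed
```

Calling `reproduce()` returns the condor_q exit code and how long the query took. An exit code of `None` means the query did not finish within `probe_timeout` seconds while the transfer was running.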

Since this is running on a Linux machine, I thought the schedd was supposed to fork a child process to handle the condor_transfer_data request. Is there a setting that could be preventing this?

Cheers,
-Frank


----------
Frank Berghaus
University of Victoria
Research Associate
Physics & Astronomy
UVic Phone: +1 (250) 472-4085
UVic Office: Elliot 201