[HTCondor-users] Condor problem with file transfer



Dear experts,

   We are using Condor as our batch system and have recently run into a problem.

   If we request file transfer in the Condor submit script, the jobs fail with the following message in the job log:

        007 (2885999.000.000) 10/14 11:49:04 Shadow exception!
        Error from slot7@XXX: Could not initiate file transfer

  In condor_q, the affected jobs keep alternating between "Idle" and "Run".

  If the jobs do not request file transfer, they run successfully.

  I also tried shutting down the firewall, but that did not help.

 

  Here is an example of one of our Condor submit scripts:

  Universe             = vanilla
  Notification         = Never
  GetEnv               = True
  Executable           = /moose/AtlUser/liumh/CondorTest/Data/run_ana.sh
  #Arguments            = realdata_001_150501_0042004.txt /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/root/realdata_001_150501_0042004.root realdata_001_150501_0042004.lst
  Arguments            = realdata_001_150501_0042004.txt /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/root/realdata_001_150501_0042004.root
  Output               = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/out/realdata_001_150501_0042004.out
  Error                = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/err/realdata_001_150501_0042004.err
  Log                  = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/log/realdata_001_150501_0042004.log
  +Group               = "BESIII"
  should_transfer_files= yes
  #transfer_input_files = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.txt,/moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.lst
  transfer_input_files = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.txt
  requirements         = (substr(Machine,0,4)!="bl-0"&&ARCH=="X86_64")&& (machine != "bl-3-15.hep.ustc.edu.cn") && (machine != "bl-3-16.hep.ustc.edu.cn")
  WhenToTransferOutput = ON_EXIT_OR_EVICT
  OnExitRemove         = TRUE
  Queue
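
  To separate the file transfer step from the rest of our setup, a stripped-down test along the following lines may be useful. This is only a sketch: input.txt and the test.* file names are placeholders and not part of our real jobs, and using /bin/cat with transfer_executable = false assumes that /bin/cat exists on the execute nodes. If this job also fails with "Could not initiate file transfer", the problem is presumably independent of our paths and requirements expression.

  # Minimal file-transfer test (placeholder file names, submitted from a directory containing input.txt)
  Universe                = vanilla
  # Assumed to exist on every execute node, so the executable itself is not transferred
  Executable              = /bin/cat
  transfer_executable     = false
  Arguments               = input.txt
  # The only file that has to be transferred to the execute node
  should_transfer_files   = YES
  transfer_input_files    = input.txt
  when_to_transfer_output = ON_EXIT
  Output                  = test.out
  Error                   = test.err
  Log                     = test.log
  Queue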

 

  Could you please give me some advice? Thanks a lot!

Best regards,


Minghui