[HTCondor-users] Problem with deferral_time, deferral_prep_time with Condor on Windows



Dear Condor-Users,

 

For a few years I have been using Condor to run programs in batch on a Windows platform, and this works fine for me.

To be able to schedule jobs I would like to use the Time Scheduling option of Condor. The aim is to schedule jobs in the future, but still be able to submit small jobs before the scheduled jobs start.

Reading through the manual I feel I need to use the “deferral_time”, “deferral_prep_time” and “deferral_window” options.

 

The “deferral_time” option only works when I supply it with a large enough “deferral_prep_time”, but then the specific machine is claimed and cannot be used for analysis until the scheduled job finishes.

With a small “deferral_prep_time” (300 sec), the job status is set to Idle and stays Idle forever; the job remains rejected with the message “no match found”.

 

Have you got any idea why this job won’t get processed?

 

Attached are 2 files with the job particulars, showing the output from Condor.

 

Thanks in advance,

Adrian

CurrentTime = time()
NiceUser = false
LocalSysCpu = 0.0
ExitStatus = 0
NTDomain = "HOCG"
WindowsMajorVersion = 6
BufferBlockSize = 32768
MyType = "Job"
WindowsBuildNumber = 7601
NumRestarts = 0
CumulativeSuspensionTime = 0
TargetType = "Machine"
Owner = "adriann"
RemoteUserCpu = 0.0
ClusterId = 158
CompletionDate = 0
QDate = 1455790207
RemoteSysCpu = 0.0
DeferralPrepTime = 650
ExitBySignal = false
User = "adriann@xxxxxxxxxxxxxxxxxxxxxxxx"
WindowsMinorVersion = 1
LastSuspensionTime = 0
WindowsServicePackMajorVersion = 1
LocalUserCpu = 0.0
WindowsServicePackMinorVersion = 0
WantCheckpoint = false
TransferErr = false
WindowsProductType = 1
CondorPlatform = "$CondorPlatform: x86_64_Windows8 $"
RemoteWallClockTime = 0.0
NumSystemHolds = 0
WhenToTransferOutput = "ON_EXIT"
Requestwamit7 = 1
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
CommittedTime = 0
CommittedSlotTime = 0
MaxHosts = 1
CumulativeSlotTime = 0
CoreSize = 0
TotalSuspensions = 0
DiskUsage_RAW = 144
CommittedSuspensionTime = 0
Iwd = "\\leinetapp1\condor\ISTest\WAMIT\Wamit7"
DiskUsage = 150
WantRemoteSyscalls = false
ImageSize_RAW = 2
CondorVersion = "$CondorVersion: 8.2.2 Aug 07 2014 BuildID: 265643 $"
JobUniverse = 5
CurrentHosts = 0
Cmd = "\\leinetapp1\condor\ISTest\WAMIT\Wamit7\runBT.bat"
RequestCpus = 1
MinHosts = 1
BufferSize = 524288
EnteredCurrentStatus = 1455790207
ImageSize = 2
JobPrio = 0
Environment = ""
UserLog = "\\leinetapp1\condor\ISTest\WAMIT\Wamit7\WAMIT7.log"
JobNotification = 1
WantRemoteIO = true
NotifyUser = "adriann@xxxxxxxxxxxxxxxxxxxxx"
Rank = 0.0
In = "/dev/null"
TransferIn = false
TransferOut = false
ShouldTransferFiles = "YES"
TransferInput = "BT.wam,BT.cfg,BT.pot,bt.gdf,BT.frc"
ExecutableSize_RAW = 2
ExecutableSize = 2
TransferInputSizeMB = 0
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
RequestDisk = DiskUsage
DeferralTime = ( CurrentTime + 960 )
DeferralWindow = 480
ScheddInterval = 300
Requirements = ( debug(OpSys == "WINDOWS" && Arch == "X86_64") ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.wamit7 >= Requestwamit7 ) && ( TARGET.HasFileTransfer ) && ( TARGET.HasJobDeferral ) && ( ( ( time() + ScheddInterval ) >= ( DeferralTime - DeferralPrepTime ) ) && ( time() < ( DeferralTime + DeferralWindow ) ) )
AutoClusterId = 4
JobLeaseDuration = 1200
PeriodicHold = false
PeriodicRelease = false
PeriodicRemove = false
OnExitHold = false
OnExitRemove = true
LeaveJobInQueue = false
Args = ""
ConcurrencyLimits = "cpu:1"
Out = "BT_Condor.out"
ProcId = 0
Err = "BT_Condor.err"
GlobalJobId = "LEICDRTST1.internal.hmc.heerema.com#158.0#1455790207"
JobStatus = 1
LastJobStatus = 0
AutoClusterAttrs = "_condor_RequestCpus,_condor_RequestDisk,_condor_RequestMemory,_condor_RequestWAMIT7,JobUniverse,LastCheckpointPlatform,NumCkpts,RequestCpus,RequestDisk,RequestMemory,RequestWAMIT7,DeferralPrepTime,DeferralTime,DeferralWindow,DiskUsage,ImageSize,ScheddInterval,Requirements,NiceUser,ConcurrencyLimits"
LastRejMatchReason = "no match found"
LastRejMatchTime = 1455790341
ServerTime = 1455790343


-- Submitter: LEICDRTST1.internal.hmc.heerema.com : <57.192.11.103:8080> : LEICDRTST1.internal.hmc.heerema.com
User priority for adriann@xxxxxxxxxxxxxxxxxxxxxxxx is not available, attempting to analyze without it.
---
158.000:  Run analysis summary.  Of 1 machines,
      1 are rejected by your job's requirements 
      0 reject your job because of their own requirements 
      0 match and are already running your jobs 
      0 match but are serving other users 
      0 are available to run your job
	No successful match recorded.
	Last failed match: Thu Feb 18 12:08:04 2016

	Reason for last match failure: no match found

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

    ( debug(OpSys == "WINDOWS" && Arch == "X86_64") ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.wamit7 >= Requestwamit7 ) && ( TARGET.HasFileTransfer ) &&
    ( TARGET.HasJobDeferral ) &&
    ( ( ( time() + ScheddInterval ) >= ( DeferralTime - DeferralPrepTime ) ) &&
      ( time() < ( DeferralTime + DeferralWindow ) ) )

Your job defines the following attributes:

    CurrentTime = 1455793689
    DeferralPrepTime = 650
    DeferralTime = 1455794649
    DeferralWindow = 480
    DiskUsage = 150
    ImageSize = 2
    RequestDisk = 150
    RequestMemory = 1
    Requestwamit7 = 1
    ScheddInterval = 300

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx has the following attributes:

    TARGET.Arch = "X86_64"
    TARGET.OpSys = "WINDOWS"
    TARGET.Disk = 16653746176
    TARGET.HasFileTransfer = true
    TARGET.HasJobDeferral = true
    TARGET.Memory = 4095
    TARGET.wamit7 = 3

The Requirements expression for your job reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[11]      never  time() + ScheddInterval
[12]      never  DeferralTime - DeferralPrepTime
[13]      never  ( time() + ScheddInterval ) >= ( DeferralTime - DeferralPrepTime )

Suggestions:

Job ClassAd Requirements expression evaluates to false
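
The arithmetic behind step [13] can be checked by hand. The Python sketch below is only an illustration, not HTCondor code: the function names `deferral_clause` and `matches_with_sliding_deferral` are made up here, and the numbers are copied from the analysis output above. It assumes that `DeferralTime = ( CurrentTime + 960 )` is re-evaluated against the current time on every match attempt, which the output suggests (DeferralTime is still exactly 960 seconds after CurrentTime, long after QDate).

```python
def deferral_clause(now, deferral_time, prep_time, window, schedd_interval):
    """Evaluate the deferral part of the job's Requirements expression."""
    # ( time() + ScheddInterval ) >= ( DeferralTime - DeferralPrepTime )
    early_enough = (now + schedd_interval) >= (deferral_time - prep_time)
    # time() < ( DeferralTime + DeferralWindow )
    not_too_late = now < (deferral_time + window)
    return early_enough and not_too_late


def matches_with_sliding_deferral(now, prep_time, offset=960,
                                  window=480, schedd_interval=300):
    """Same clause, but with DeferralTime recomputed as now + offset
    (assumption: this mirrors DeferralTime = CurrentTime + 960)."""
    return deferral_clause(now, now + offset, prep_time, window, schedd_interval)


# Values reported by the analysis output.
now = 1455793689
matches_now = deferral_clause(now, deferral_time=1455794649, prep_time=650,
                              window=480, schedd_interval=300)
print("match at analysis time:", matches_now)

# Waiting does not help while the deferral time slides along with the clock.
later_attempts = [matches_with_sliding_deferral(now + dt, prep_time=650)
                  for dt in range(0, 3600, 300)]
print("any match during the next hour:", any(later_attempts))

# A prep time of at least offset - schedd_interval = 660 makes the clause true.
print("match with prep_time=660:", matches_with_sliding_deferral(now, prep_time=660))
```

Under that assumption, the first comparison reduces to ScheddInterval >= 960 - DeferralPrepTime, which is 300 >= 310 for a prep time of 650. The clause therefore stays false however long the job waits, and only holds once DeferralPrepTime is at least 660, which is consistent with the job matching only for a large enough “deferral_prep_time”. A deferral time given as a fixed epoch value would not slide in this way.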