
[Condor-users] Unsubscribe




--- Vijay Shiv Kumar <vijayskumar@xxxxxxxxxxx> wrote:

> 
> Dear all,
> 
> I have set up a Condor pool spanning nodes from two clusters; each
> cluster has its own filesystem domain ('bmi.oar.net' and 'cse.oar.net').
> The head-node of one of the clusters serves as the central manager,
> dedicated scheduler and the submit node for the entire pool.
> 
> Now, I wish to execute certain jobs only on a specific cluster. I tried
> to achieve this by having these jobs require a specific FileSystemDomain
> in their ClassAd. However, this request is never matched even though
> unclaimed candidate resources exist in the pool.
> 
> Specific example: one of my jobs must be executed only on the cluster
> with filesystem domain 'cse.oar.net'. (The submit node and this target
> cluster do not share a common filesystem.)
> 
> The job's specific requirements are as follows:
> 
> [vijayskumar@bm-login ~]$ condor_q -long | grep Requirements
> Requirements = (regexp("*.cse.oar.net", FileSystemDomain, "i")) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
> 
> The requested resources for the job are available:
> 
> [vijayskumar@bm-login ~]$ condor_status -const "regexp(\".cse.oar.net\", FileSystemDomain)"
> 
> Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
> 
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  2009  0+00:50:04
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  2009  0+00:50:04
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  2009  0+00:50:04
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  2009  0+00:50:04
> 
> [vijayskumar@bm-login ~]$ condor_status -long | grep FileSys | grep cse
> FileSystemDomain = "cs41.cse.oar.net"
> FileSystemDomain = "cs42.cse.oar.net"
> FileSystemDomain = "cs43.cse.oar.net"
> FileSystemDomain = "cs44.cse.oar.net"
> 
> However, a match never occurs, and the job's requests keep getting
> rejected. (Here, 84544 is the job ID of the job that does not complete.)
> 
> [vijayskumar@bm-login ~] condor_q -better-analyze
> ------------------------------------------------------------------
> 84544.000:  Run analysis summary.  Of 20 machines,
>       20 are rejected by your job's requirements
>        0 reject your job because of their own requirements
>        0 match but are serving users with a better priority in the pool
>        0 match but reject the job for unknown reasons
>        0 match but will not currently preempt their existing job
>        0 are available to run your job
> 
> WARNING:  Be advised:
>     No resources matched request's constraints
> 
> The Requirements expression for your job is:
> 
> ( regexp("*.cse.oar.net", FileSystemDomain, "i") ) && ( target.Arch == "X86_64" ) &&
> ( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
> ( ( target.Memory * 1024 ) >= ImageSize )
> 
> Job ClassAd Requirements expression evaluates to false
> ----------------------------------------------------------------------
> 
> Why does a match not occur? Is there something wrong with the regular
> expression in the job ClassAd? Any help is appreciated.
> 
> Thanks for your time,
> 
> -Vijay
> 
> PS: here is the complete ClassAd for the job that just refuses to get
> executed:
> 
> MyType = "Job"
> TargetType = "Machine"
> ClusterId = 84544
> QDate = 1227117554
> CompletionDate = 0
> Owner = "vijayskumar"
> RemoteWallClockTime = 0.000000
> LocalUserCpu = 0.000000
> LocalSysCpu = 0.000000
> RemoteUserCpu = 0.000000
> RemoteSysCpu = 0.000000
> ExitStatus = 0
> NumCkpts_RAW = 0
> NumCkpts = 0
> NumJobStarts = 0
> NumRestarts = 0
> NumSystemHolds = 0
> CommittedTime = 0
> TotalSuspensions = 0
> LastSuspensionTime = 0
> CumulativeSuspensionTime = 0
> ExitBySignal = FALSE
> CondorVersion = "$CondorVersion: 7.0.1 Feb 26 2008 BuildID: 76180 $"
> CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL3 $"
> RootDir = "/"
> Iwd = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001"
> JobUniverse = 5
> TransferExecutable = FALSE
> Cmd = "/home/vijayskumar/installed/pegasus/default/bin/kickstart"
> MinHosts = 1
> MaxHosts = 1
> CurrentHosts = 0
> WantRemoteSyscalls = FALSE
> WantCheckpoint = FALSE
> JobStatus = 1
> EnteredCurrentStatus = 1227117554
> JobPrio = 0
> User = "vijayskumar@.oar.net"
> NiceUser = FALSE
> EnvDelim = ";"
> JobNotification = 0
> WantRemoteIO = TRUE
> UserLog = "/tmp/Template_P10runC1-053166.log"
> CoreSize = 0
> KillSig = "SIGTERM"
> Rank = 0.000000
> In = "/dev/null"
> TransferIn = FALSE
> Out = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001/Template_P10runC1_0_cseri_cdir.out"
> StreamOut = FALSE
> Err = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001/Template_P10runC1_0_cseri_cdir.err"
> StreamErr = FALSE
> BufferSize = 524288
> BufferBlockSize = 32768
> ShouldTransferFiles = "NO"
> TransferFiles = "NEVER"
> ImageSize_RAW = 172
> ImageSize = 175
> ExecutableSize_RAW = 172
> ExecutableSize = 175
> DiskUsage_RAW = 172
> DiskUsage = 175
> Requirements = (regexp("*.cse.oar.net", FileSystemDomain, "i")) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
> FileSystemDomain = ".bmi.oar.net"
> JobLeaseDuration = 1200
> PeriodicHold = FALSE
> PeriodicRelease = (NumSystemHolds <= 3)
> PeriodicRemove = (NumSystemHolds > 3)
> OnExitHold = FALSE
> OnExitRemove = TRUE
> LeaveJobInQueue = FALSE
> Arguments = "-n pegasus::dirmanager -N pegasus::dirmanager:1.0 -R cseri -w /home/vijayskumar/pegasusrun/work /home/vijayskumar/installed/pegasus/default/bin/dirmanager --create --dir /home/vijayskumar/pegasusrun/work/pegasusexec/vijayskumar/pegasus/Template_P10runC1/run0001"
> DAGNodeName = "Template_P10runC1_0_cseri_cdir"
> pegasus_job_id = "Template_P10runC1_0_cseri_cdir"
> pegasus_wf_xformation = "pegasus::dirmanager"
> pegasus_site = "cseri"
> pegasus_generator = "Pegasus"
> 
=== The rest of the message has been omitted ===
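
A possible explanation for the question quoted above, offered as a sketch rather than a confirmed diagnosis: the job's pattern "*.cse.oar.net" is written like a shell glob, but regexp() expects a regular expression, and a leading "*" has nothing to repeat. The condor_status query that did return machines used ".cse.oar.net" without the asterisk. Separately, the job ad carries its own FileSystemDomain = ".bmi.oar.net", and the analysis output prints target.Arch but a bare FileSystemDomain, so the unscoped reference may be resolving against the job rather than the machine. Something like regexp("[.]cse[.]oar[.]net$", TARGET.FileSystemDomain, "i") might be worth trying. The Python below only illustrates the pattern difference; it assumes Python's re module behaves like Condor's regex engine on this point, and domain_matches is a hypothetical helper, not a Condor API.

```python
import re


def domain_matches(pattern, domain):
    """Hypothetical helper: True/False for a match, None if the pattern is invalid."""
    try:
        return re.search(pattern, domain, re.IGNORECASE) is not None
    except re.error:
        # A pattern such as "*.cse.oar.net" fails to compile: nothing to repeat.
        return None


# Values copied from the quoted condor_status output and job ad.
machine_domains = [
    "cs41.cse.oar.net",
    "cs42.cse.oar.net",
    "cs43.cse.oar.net",
    "cs44.cse.oar.net",
]
job_domain = ".bmi.oar.net"

glob_style = "*.cse.oar.net"       # pattern from the job's Requirements
anchored = "[.]cse[.]oar[.]net$"   # assumed replacement; dots matched literally

print(domain_matches(glob_style, machine_domains[0]))          # None
print([domain_matches(anchored, d) for d in machine_domains])  # [True, True, True, True]
print(domain_matches(anchored, job_domain))                    # False
```

The last line shows why scoping matters: if the expression is evaluated against the job's own FileSystemDomain, even a valid pattern for the cse cluster would come out false.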