[Condor-users] reject your job because of their own requirements



Hi,
I have installed Condor 7.4.2 on Ubuntu 9.10, on a central manager and one worker node.
I have set ALLOW_READ = * and ALLOW_WRITE = *.
condor_status reports my worker node:
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

ip-10-195-111-210. LINUX      INTEL  Owner     Idle     0.000  1714  0+00:48:04
                     Total Owner Claimed Unclaimed Matched Preempting Backfill
         INTEL/LINUX     1     1       0         0       0          0        0
               Total     1     1       0         0       0          0        0

So when I submit my job it is rejected:
condor_q -better-analyze 2.0
-- Submitter: ip-10-245-195-79.ec2.internal : <10.245.195.79:39729> : ip-10-245-195-79.ec2.internal
---
002.000:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
    No successful match recorded.
    Last failed match: Tue Apr 20 10:23:05 2010
    Reason for last match failure: no match found

WARNING:  Be advised:   Request 2.0 did not match any resource's constraints



This is the condor_q -l output:
condor/condor-7.4.2/bin/condor_q -l


-- Submitter: ip-10-245-195-79.ec2.internal : <10.245.195.79:39729> : ip-10-245-195-79.ec2.internal
ClusterId = 1
QDate = 1271757620
CompletionDate = 0
Owner = "ubuntu"
RemoteWallClockTime = 0.000000
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_DEBIAN50 $"
RootDir = "/"
Iwd = "/home/ubuntu"
JobUniverse = 5
Cmd = "/bin/hostname"
MinHosts = 1
MaxHosts = 1
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
EnteredCurrentStatus = 1271757620
JobPrio = 0
User = "ubuntu@xxxxxxxxxxxx"
NiceUser = FALSE
Environment = ""
JobNotification = 2
WantRemoteIO = TRUE
UserLog = "/home/ubuntu/hname.log"
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "hnameout.log"
StreamOut = FALSE
Err = "/dev/null"
TransferErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "NO"
NeverCreateJobSandbox = TRUE
TransferFiles = "NEVER"
ImageSize_RAW = 10
ImageSize = 10
ExecutableSize_RAW = 10
ExecutableSize = 10
DiskUsage_RAW = 10
DiskUsage = 10
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= DiskUsage) && (((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
FileSystemDomain = "ec2.internal"
JobLeaseDuration = 1200
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
LeaveJobInQueue = FALSE
Args = "-f"
GlobalJobId = "ip-10-245-195-79.ec2.internal#1.0#1271757620"
LastJobStatus = 0
JobStatus = 1
ProcId = 0
AutoClusterId = 0
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,DiskUsage,ImageSize,RequestMemory,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
WantMatchDiagnostics = TRUE
LastRejMatchReason = "no match found"
LastRejMatchTime = 1271758280
ServerTime = 1271758294
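
To check the job side by hand, here is a minimal Python sketch of the two memory clauses in the Requirements expression above, using ImageSize = 10 from the job ad and Memory = 1714 from the condor_status output. It assumes JobVMMemory is undefined (it does not appear in the ad), so RequestMemory reduces to ceiling(ImageSize / 1024). The function names are hypothetical and are not part of Condor.

```python
import math

def request_memory(image_size_kb):
    # Assumption: JobVMMemory is undefined, so
    # RequestMemory = ceiling(ImageSize / 1024.0)
    return math.ceil(image_size_kb / 1024.0)

def memory_clauses_ok(machine_memory_mb, image_size_kb):
    # Mirrors ((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)
    req = request_memory(image_size_kb)
    return (machine_memory_mb * 1024 >= image_size_kb) and (req * 1024 >= image_size_kb)

# Values taken from the output above: Memory = 1714, ImageSize = 10
print(request_memory(10))
print(memory_clauses_ok(1714, 10))
```

Both clauses hold for these values, which is consistent with the analysis summary reporting 0 machines rejected by the job's requirements.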

This is the SchedLog:
04/20 09:54:57 (pid:2977) ******************************************************
04/20 09:54:57 (pid:2977) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
04/20 09:54:57 (pid:2977) ** /home/ubuntu/condor/condor-7.4.2/sbin/condor_schedd
04/20 09:54:57 (pid:2977) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
04/20 09:54:57 (pid:2977) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
04/20 09:54:57 (pid:2977) ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/20 09:54:57 (pid:2977) ** $CondorPlatform: I386-LINUX_DEBIAN50 $
04/20 09:54:57 (pid:2977) ** PID = 2977
04/20 09:54:57 (pid:2977) ** Log last touched 4/20 09:54:53
04/20 09:54:57 (pid:2977) ******************************************************
04/20 09:54:57 (pid:2977) Using config source: /home/ubuntu/condor/condor-7.4.2/etc/condor_config
04/20 09:54:57 (pid:2977) Using local config sources:
04/20 09:54:57 (pid:2977)    /home/ubuntu/condor/condor-7.4.2/local.ip-10-245-195-79/condor_config.local
04/20 09:54:57 (pid:2977) DaemonCore: Command Socket at <10.245.195.79:39729>
04/20 09:54:57 (pid:2977) History file rotation is enabled.
04/20 09:54:57 (pid:2977)   Maximum history file size is: 20971520 bytes
04/20 09:54:57 (pid:2977)   Number of rotated history files is: 2
04/20 10:00:20 (pid:2977) Sent ad to central manager for ubuntu@xxxxxxxxxxxx
04/20 10:00:20 (pid:2977) Sent ad to 1 collectors for ubuntu@xxxxxxxxxxxx
04/20 10:00:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:00:20 (pid:2977) AutoCluster:config() significant atttributes changed to JobUniverse,LastCheckpointPlatform,NumCkpts
04/20 10:00:20 (pid:2977) Checking consistency running and runnable jobs
04/20 10:00:20 (pid:2977) Tables are consistent
04/20 10:00:20 (pid:2977) Rebuilt prioritized runnable job list in 0.000s.
04/20 10:00:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:01:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:01:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:01:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:02:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:02:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:02:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:03:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:03:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:03:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:04:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:04:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:04:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:05:20 (pid:2977) Sent ad to central manager for ubuntu@xxxxxxxxxxxx
04/20 10:05:20 (pid:2977) Sent ad to 1 collectors for ubuntu@xxxxxxxxxxxx
04/20 10:05:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:05:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:05:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:06:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:06:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:06:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:07:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:07:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:07:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:08:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:08:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:08:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:09:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:09:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:09:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:10:20 (pid:2977) Sent ad to central manager for ubuntu@xxxxxxxxxxxx
04/20 10:10:20 (pid:2977) Sent ad to 1 collectors for ubuntu@xxxxxxxxxxxx
04/20 10:10:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:10:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:10:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:11:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:11:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:11:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
04/20 10:12:20 (pid:2977) Activity on stashed negotiator socket
04/20 10:12:20 (pid:2977) Negotiating for owner: ubuntu@xxxxxxxxxxxx
04/20 10:12:20 (pid:2977) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected


I think the problem is that ubuntu@xxxxxxxxxxxx does not exist, whereas these users do exist:
ubuntu@ip-10-195-111-210    worker node
ubuntu@ip-10-245-195-79      master node

What can I do?
P.S. My machines are running on Amazon EC2 AMIs.

Thanks in advance.