
Re: [HTCondor-users] "Job has not yet been considered by the matchmaker" when trying to submit to two machines



Hi Oren,

I saw in your attachment that you have the DedicatedScheduler set to ubu3 on the machine that you took that profile from. Is the DedicatedScheduler also set to ubu3 on your ubu2 machine? In other words, is this the same on all of your machines:

DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

All of your execute machines should have the same value for DedicatedScheduler if you are using parallel universe.
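For example, the relevant knobs on each execute node would look something like this (a sketch using ubu3's full hostname as it appears in your condor_q output; the exact value just has to name the machine running the dedicated schedd and be identical on every execute node):

    # On ubu2 and ubu3 alike, in condor_config or a config.d file:
    DedicatedScheduler = "DedicatedScheduler@shilo-ubu3.vi-seem.iucc.ac.il"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

After a condor_reconfig (or restart) you can check what each startd actually advertises, e.g.:

    condor_status -af Name DedicatedScheduler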

Jason Patton

On Sun, Aug 12, 2018 at 6:18 AM Oren Shani <oshani@xxxxxxxxxx> wrote:
Hi All,

Before I explain the problem I encountered, let me tell you that I worked with Condor more than 25 years ago, and I am really happy to see that it is still around and has grown so nicely. Since that was so long ago, I am now starting from scratch, so consider me a newbie.

OK, now to my problem.

I have a Condor cluster with two machines, ubu2 and ubu3. They are identical VMware VMs running Ubuntu 16.04, each with 4 cores. (I add some command output below with the details of what I tried; also see my attached condor-profile.txt.)

Ubu3 is the master and I am only submitting jobs from it. The problem is that as long as I specify a machine count <= 4, all works well. But if I specify a higher machine count, so that ubu2's cores are needed, the job just remains idle, and as far as I can tell, Condor never attempts to run it.

I skimmed through mailing list posts and various documentation sources and couldn't find anything that helped. I am sure I am missing something in the configuration, but I can't figure out what.

Please help!

Thanks,

Oren


oren@shilo-ubu3:~/condor$ condor_q

-- Schedd: shilo-ubu3.vi-seem.iucc.ac.il : <128.139.196.113:9618?... @ 08/12/18 13:37:48
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
oren@shilo-ubu3:~/condor$ condor_status
Name                                OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:06
slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:32
slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:33
slot4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:34
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30
slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30
slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30
slot4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30

               Total Owner Claimed Unclaimed Matched Preempting Backfill Drain

  X86_64/LINUX     8     0       0         8       0          0        0     0

         Total     8     0       0         8       0          0        0     0
oren@shilo-ubu3:~/condor$ cat sleepp.sub
universe = parallel
executable = sleep.sh
log = logfile
output = outfile.$(Node)
error = errfile.$(Node)
machine_count = 5
request_cpus = 1
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue

oren@shilo-ubu3:~/condor$ condor_submit sleepp.sub
Submitting job(s).
1 job(s) submitted to cluster 68.
oren@shilo-ubu3:~/condor$ condor_q

-- Schedd: shilo-ubu3.vi-seem.iucc.ac.il : <128.139.196.113:9618?... @ 08/12/18 13:40:07
OWNER    BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
oren     CMD: sleep.sh   8/12 13:40      _      _      1      1 68.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
oren@shilo-ubu3:~/condor$ condr_q -better-analyze
-bash: condr_q: command not found

oren@shilo-ubu3:~/condor$ condor_q -better-analyze

The Requirements expression for job 68.000 is

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Job 68.000 defines the following attributes:

    DiskUsage = 1
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

The Requirements expression for job 68.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           8  TARGET.Arch == "X86_64"
[1]           8  TARGET.OpSys == "LINUX"
[3]           8  TARGET.Disk >= RequestDisk
[5]           8  TARGET.Memory >= RequestMemory
[7]           8  TARGET.HasFileTransfer


068.000:  Job has not yet been considered by the matchmaker.


068.000: Run analysis summary ignoring user priority. Of 8 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      8 are available to run your job