
Re: [HTCondor-users] "Job has not yet been considered by the matchmaker" when trying to submit to two machines.



Hi Jason,

 

Thank you for your reply.

 

I tried making the values equal, both by defining DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx" on both machines and by removing this line entirely, and neither seems to help.
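
In case it is useful, my understanding is that the value each machine is actually configured with can be checked by running the following on both ubu2 and ubu3 (just a sketch of the check, per the condor_config_val documentation; I am not pasting the real output here):

    condor_config_val DedicatedScheduler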

 

Best,

 

Oren

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Jason Patton
Sent: Tuesday, August 14, 2018 12:43 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] "Job has not yet been considered by the matchmaker" when trying to submit to two machines.

 

Hi Oren,

 

I saw in your attachment that you have the DedicatedScheduler set to ubu3 on the machine that you took that profile from. Is the DedicatedScheduler also set to ubu3 on your ubu2 machine? In other words, is this the same on all of your machines:
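
For example, something along the lines of the dedicated-resource setup in the HTCondor manual, in the local config of every execute machine (the hostname below is taken from your condor_q output and is only my guess at the value you redacted):

    DedicatedScheduler = "DedicatedScheduler@shilo-ubu3.vi-seem.iucc.ac.il"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler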

 

 

All of your execute machines should have the same value for DedicatedScheduler if you are using parallel universe.
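
To compare what every startd is currently advertising, something like this should work (assuming DedicatedScheduler is listed in STARTD_ATTRS, as in the sketch above):

    condor_status -autoformat Machine DedicatedScheduler

If the values differ, or the attribute comes back undefined for the ubu2 slots, that would likely explain why the dedicated scheduler never claims them for your 5-node job.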

 

Jason Patton

 

On Sun, Aug 12, 2018 at 6:18 AM Oren Shani <oshani@xxxxxxxxxx> wrote:

Hi All,

 

Before I explain the problem I encountered, let me say that I worked with Condor more than 25 years ago, and I am really happy to see that it is still around and has grown so nicely. Since that was so long ago, I am effectively starting from scratch, so consider me a newbie.

 

OK, and now to my problem.

 

I have a Condor cluster with two machines, ubu2 and ubu3. They are identical VMware VMs running Ubuntu 16.04, each with 4 cores. (I include some command output below with the details of what I tried; also see my attached condor-profile.txt.)

 

Ubu3 is the master and I am only submitting jobs from it. The problem is that as long as I specify machine_count <= 4, all works well. But if I specify a higher machine count, so that ubu2's cores are needed, the job just remains idle, and as far as I can tell, Condor never attempts to run it.

 

I skimmed through mailing list posts and various documentation sources and couldn't find anything that helped. I am sure I am missing something in the configuration, but I can't figure out what.

 

Please help!

 

Thanks,

 

Oren

 

 

 

 

oren@shilo-ubu3:~/condor$ condor_q

 

 

-- Schedd: shilo-ubu3.vi-seem.iucc.ac.il : <128.139.196.113:9618?... @ 08/12/18 13:37:48

OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

 

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

oren@shilo-ubu3:~/condor$ condor_status

Name                                OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

 

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:06

slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:32

slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:33

slot4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 4004  3+23:05:34

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30

slot2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30

slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30

slot4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1990  0+03:11:30

 

                     Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

 

        X86_64/LINUX     8     0       0         8       0          0        0      0

 

               Total     8     0       0         8       0          0        0      0

 

 

oren@shilo-ubu3:~/condor$ cat sleepp.sub

universe = parallel

executable = sleep.sh

log = logfile

output = outfile.$(Node)

error = errfile.$(Node)

machine_count = 5

request_cpus = 1

should_transfer_files = Yes

when_to_transfer_output = ON_EXIT

queue

 

oren@shilo-ubu3:~/condor$ condor_submit sleepp.sub

Submitting job(s).

1 job(s) submitted to cluster 68.

oren@shilo-ubu3:~/condor$ condor_q

 

 

-- Schedd: shilo-ubu3.vi-seem.iucc.ac.il : <128.139.196.113:9618?... @ 08/12/18 13:40:07

OWNER    BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS

oren     CMD: sleep.sh   8/12 13:40      _      _      1      1 68.0

 

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

oren@shilo-ubu3:~/condor$ condr_q -better-analyze

-bash: condr_q: command not found

 

 

oren@shilo-ubu3:~/condor$ condor_q -better-analyze

 

 

The Requirements expression for job 68.000 is

 

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&

    ( TARGET.HasFileTransfer )

 

Job 68.000 defines the following attributes:

 

    DiskUsage = 1

    ImageSize = 1

    RequestDisk = DiskUsage

    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

 

The Requirements expression for job 68.000 reduces to these conditions:

 

         Slots

Step    Matched  Condition

-----  --------  ---------

[0]           8  TARGET.Arch == "X86_64"

[1]           8  TARGET.OpSys == "LINUX"

[3]           8  TARGET.Disk >= RequestDisk

[5]           8  TARGET.Memory >= RequestMemory

[7]           8  TARGET.HasFileTransfer

 

 

068.000:  Job has not yet been considered by the matchmaker.

 

 

068.000:  Run analysis summary ignoring user priority.  Of 8 machines,

      0 are rejected by your job's requirements

      0 reject your job because of their own requirements

      0 match and are already running your jobs

      0 match but are serving other users

      8 are available to run your job

 

 

 

 

 

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/