
Re: [HTCondor-users] New to ht condor and have basic questions



When adding request_cpus = 4 to the submit file,
the requirements are not matched:

C:\Data\test_condor>condor_submit test_condor.sub
Submitting job(s)........
8 job(s) submitted to cluster 40.

C:\Data\test_condor>condor_q


-- Schedd: LUTECE : <192.168.1.181:60384?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  40.0   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.1   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.2   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.3   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.4   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.5   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.6   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D
  40.7   Mathieu         1/13 17:56   0+00:00:00 I  0   1.0  laszip.exe -i C:\D

8 jobs; 0 completed, 0 removed, 8 idle, 0 running, 0 held, 0 suspended

C:\Data\test_condor>condor_q -analyze 40.0


-- Schedd: LUTECE : <192.168.1.181:60384?...
User priority for Mathieu@LUTECE is not available, attempting to analyze without it.
---
040.000:  Run analysis summary.  Of 8 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "WINDOWS" ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.Cpus >= RequestCpus ) &&
    ( TARGET.FileSystemDomain == MY.FileSystemDomain )


Suggestions:

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( TARGET.Cpus >= 4 )              0                   MODIFY TO 1
2   ( TARGET.Arch == "X86_64" )       8
3   ( TARGET.OpSys == "WINDOWS" )     8
4   ( TARGET.Disk >= 1000 )           8
5   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )
                                      8
6   ( TARGET.FileSystemDomain == "LUTECE" )
                                      8


It seems that HTCondor sees each slot as a single-CPU machine.

I'll have a look at this notion of dynamic slots.
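
For reference, here is a minimal sketch of what a partitionable (dynamic)
slot setup might look like in the execute machine's local configuration.
It assumes a single slot that owns all of the machine's resources; the
percentages are illustrative and this has not been tested on this pool:

```
# Assumption: one partitionable slot that owns the whole machine,
# instead of one static single-CPU slot per core.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

With a slot like this, TARGET.Cpus should report the machine's full core
count, so a job with request_cpus = 4 could match and a 4-CPU dynamic slot
would be carved out of the partitionable one. Slot layout changes generally
need a restart of the startd rather than a simple reconfig.
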
In the meantime, I think I'm facing more or less the same kind of
issue as the one described in the thread:

[HTCondor-users] Numerous, short jobs using HTCondor by Matthew Hinton

The jobs I'm submitting have a pretty short execution time (but there
may be a lot of them), and the rate at which they enter "into production"
(sorry if I'm missing some official HTCondor vocabulary)
is too slow.

I've been trying to play around with the following variables:

JOB_START_COUNT
NEGOTIATOR_INTERVAL
NEGOTIATOR_UPDATE_INTERVAL
NEGOTIATOR_CYCLE_DELAY
SHADOW_WORKLIFE
STARTER_UPDATE_INTERVAL
MASTER_UPDATE_INTERVAL
UPDATE_INTERVAL
SCHEDD_INTERVAL
MAX_NEXT_JOB_START_DELAY
ALIVE_INTERVAL

but did not have much success...

These two lines (not sure which one does the job)

SHADOW_RENICE_INCREMENT = 0
JOB_RENICE_INCREMENT = 0

at least fix the priority of the actual worker process that does the
computation in the end.

Regards,

Mathieu

On 13/01/2016 15:50, Rich Pieri wrote:

> I think you need to specify how many processors each job wants in your
> submission file. For example:
> 
>   request_cpus = 4
> 
> will request 4 processors (cores/threads). The default is 1. You
> typically need to request CPUs equal to the number of threads a
> multi-threaded program will create. If this works then you should look
> into dynamic slots:
> https://research.cs.wisc.edu/htcondor/CondorWeek2012/presentations/thain-dynamic-slots.pdf
> 
> Alternatively, you can try configuring the program to run
> single-threaded and keep the default static slots.
> 


-- 
tel : +33 (0)6 87 30 83 59