Thank you for your quick reply!
I submitted 66 jobs again and checked the output of condor_q -better-analyze 9.
The output is:
Job 9.063 defines the following attributes:
    DiskUsage = 10000
    FileSystemDomain = "domian.com"
    ImageSize = 10000
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
The Requirements expression for job 9.063 reduces to these conditions:
Step    Matched  Condition
-----  --------  ---------
            192  TARGET.Arch == "X86_64"
            192  TARGET.OpSys == "LINUX"
            192  TARGET.Disk >= RequestDisk
            192  TARGET.Memory >= RequestMemory
            192  TARGET.HasFileTransfer
009.063:  Job is running.
Last successful match: Tue Jun  2 02:26:21 2020
009.063:  Run analysis summary ignoring user priority.  Of 192 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
     64 match and are already running your jobs
      0 match but are serving other users
    128 are able to run your job
Everything seems to work, but the jobs still did not execute on the other computers. One machine has 64 cores, so they all actually ran on that one machine. Also, I can only see 64 jobs here although I submitted 66. In condor_q, I found that 2 jobs were already done, but I don't think they could have finished within seconds.
If you submit a lot of jobs, you can check on an idle job with condor_q -better-analyze <jobid> or condor_q -analyze <jobid>. There you will see the number of potential nodes this job could technically run on.
Once you have discovered that it can only run on one machine, you can check why it is not able to run on a specific machine using:
condor_q -better-analyze <jobid> -reverse -machine <nodename> (<nodename> needs to be the FQDN here for some reason)
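If you want to compare these numbers across many jobs, a small script can pull the counts out of the run analysis summary. The following is only a rough sketch: it assumes the summary looks like the one quoted earlier in this thread ("Of N machines," followed by one count per line), and the function name parse_analysis_summary and the SAMPLE text are made up for illustration, not part of HTCondor.

```python
import re

# Assumption: the summary starts with a line containing "Of <N> machines,"
# and is followed by lines of the form "<count> <description>".
_TOTAL = re.compile(r"Of\s+(\d+)\s+machines")
_COUNT_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_analysis_summary(text):
    """Return (total_machines, {description: count}) from analysis output.

    Hypothetical helper: it only understands the layout described above.
    """
    total = None
    counts = {}
    in_summary = False
    for line in text.splitlines():
        match = _TOTAL.search(line)
        if match:
            total = int(match.group(1))
            in_summary = True  # count lines follow the "Of N machines," line
            continue
        if in_summary:
            match = _COUNT_LINE.match(line)
            if match:
                counts[match.group(2)] = int(match.group(1))
    return total, counts


# Sample text copied from the summary quoted in this thread.
SAMPLE = """\
009.063:  Run analysis summary ignoring user priority.  Of 192 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
     64 match and are already running your jobs
      0 match but are serving other users
    128 are able to run your job
"""

if __name__ == "__main__":
    total, counts = parse_analysis_summary(SAMPLE)
    print("machines considered:", total)
    for description, count in counts.items():
        print(count, description)
```

To use it on a live pool, you could save the output of condor_q -better-analyze <jobid> to a file and pass the file contents to the function instead of SAMPLE.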
From: "Xinjie Zeng" <xinjie.zeng@xxxxxxxxxxxxxx>
I have set up HTCondor on three computers using the tarball and condor_config. The version is:
When I submit jobs on the central manager, which is also configured as a submit and execute node, the jobs are executed only on the central manager. They are not executed on the other two computers, as if the three computers were not connected to each other. However, when I run condor_status, I can see all three nodes in the pool. Could anyone help?
Any advice is appreciated!
Thank you very much!