
Re: [HTCondor-users] Condor_submit: job cannot execute in the second node



Hi,

If you submit a 'lot' of jobs, you can check an idle job with condor_q -better-analyze <jobid> or condor_q -analyze <jobid>. The output shows the number of potential nodes this job could technically run on.
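A minimal sketch of that first check, assuming Python 3.7+ and that condor_q is on the PATH of the submit node. It lists idle jobs (JobStatus == 1) using condor_q's -constraint and -af (autoformat) options, then runs -better-analyze on the first one. The helper names (parse_job_ids, analyze_command) are hypothetical; only the condor_q options come from HTCondor itself.

```python
from __future__ import annotations

import subprocess


def parse_job_ids(autoformat_output: str) -> list[str]:
    """Turn 'ClusterId ProcId' lines from `condor_q -af` into 'cluster.proc' job IDs."""
    job_ids = []
    for line in autoformat_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            job_ids.append(f"{fields[0]}.{fields[1]}")
    return job_ids


def analyze_command(job_id: str) -> list[str]:
    """Build the argv that asks condor_q why a single job has not started."""
    return ["condor_q", "-better-analyze", job_id]


def main() -> None:
    try:
        # JobStatus == 1 means Idle in the job ClassAd.
        listing = subprocess.run(
            ["condor_q", "-constraint", "JobStatus == 1", "-af", "ClusterId", "ProcId"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query the queue: {err}")
        return
    idle = parse_job_ids(listing)
    if not idle:
        print("no idle jobs found")
        return
    # The analysis output reports how many slots match the job's requirements.
    subprocess.run(analyze_command(idle[0]), check=False)


if __name__ == "__main__":
    main()
```

If the reported match count is 1 while condor_status lists three nodes, the job's requirements are most likely excluding the other two machines.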

Once you have found that it can only run on one machine, you can check why it cannot run on a specific machine using:

condor_q -better-analyze <jobid> -reverse -machine <nodename> (<nodename> needs to be the fully qualified domain name (FQDN) here, for some reason)
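To repeat that reverse check for every node in the pool, something like the following sketch could be used. It assumes Python 3.7+, that condor_status and condor_q are on the PATH, and that the Machine attribute of each slot ad holds the node's fully qualified host name (which is the form -machine needed above). The function names and the script name check_nodes.py are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def parse_machines(autoformat_output: str) -> list[str]:
    """Deduplicate Machine names: a node with several slots appears once per slot."""
    names = {line.strip() for line in autoformat_output.splitlines() if line.strip()}
    return sorted(names)


def reverse_command(job_id: str, machine: str) -> list[str]:
    """Build the argv that asks why `machine` will not run `job_id`."""
    return ["condor_q", "-better-analyze", job_id, "-reverse", "-machine", machine]


def check_all_machines(job_id: str) -> None:
    try:
        # One Machine value per slot ad in the collector.
        listing = subprocess.run(
            ["condor_status", "-af", "Machine"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query the pool: {err}")
        return
    for machine in parse_machines(listing):
        print(f"=== {machine} ===")
        # check=False: keep going even if the analysis of one node fails.
        subprocess.run(reverse_command(job_id, machine), check=False)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        check_all_machines(sys.argv[1])
    else:
        print("usage: check_nodes.py <jobid>")
```

Taking the names from condor_status rather than typing them by hand avoids the short-name versus FQDN problem mentioned above.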

Best
christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Xinjie Zeng" <xinjie.zeng@xxxxxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 2 June 2020 05:19:45
Subject: [HTCondor-users] Condor_submit: job cannot execute in the second node

Hi All
I have set up HTCondor on three computers using the tarball and condor_config. The version is:
condor_version
$CondorVersion: 8.8.9 May 06 2020 BuildID: 503068 $
$CondorPlatform: x86_64_CentOS7 $

When I submit jobs on the central manager, which is also configured as a submit and execute node, I find that the jobs only execute on the central manager. They don't execute on the other two computers; it seems as if the three computers are not connected to each other. However, when I run condor_status, I can see all three nodes in the pool. Could anyone give some help?
Any advice is appreciated!
Thank you very much!

Warm regards,
Xinjie Zeng

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/