
Re: [HTCondor-users] Beginner's question regarding HTCondor basic setup



Hello,

Glad you are trying out HTCondor.  I read through your description below, and here's what I can suggest:

The condor_schedd is the daemon that manages a job queue, receives job submissions, and communicates with worker nodes to send and receive jobs.  It should be running on the machines you have designated "SUBMITTERS", but as you pointed out, that daemon is currently not running.

I assume the condor_master daemon is running on each of these submit nodes.  Are any other daemons running? (They all have the "condor_" prefix.)  You should be able to edit the configuration on those machines and update the "DAEMON_LIST" setting.  Add "SCHEDD" to this list, and restart HTCondor on that machine.
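As a rough sketch of that change: on a default Windows install the configuration files usually live under C:\condor (for example condor_config.local), but the exact file and the values already present may differ on your machines. The submit-node line could look something like this:

```
# Submit node: condor_master plus condor_schedd (the job queue).
# No STARTD here, so this machine will not run jobs itself.
DAEMON_LIST = MASTER, SCHEDD

# For comparison, typical lists for the other roles in this setup:
#   central manager:  DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
#   worker:           DAEMON_LIST = MASTER, STARTD
#
# After saving, restart HTCondor (e.g. with condor_restart) and
# check the effective value with: condor_config_val DAEMON_LIST
```

Once the schedd is up, condor_submit on that machine should be able to find the "local schedd" it was complaining about.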

(If you are going to run jobs as the user who submitted them, as opposed to a generic user such as "nobody" on Unix systems, each user will also need to run the condor_store_cred command... but let's figure that out once we are sure the daemons are up and running properly.)


Cheers,
-zach


On 8/24/20, 8:59 AM, "HTCondor-users on behalf of Finn Bastiansen" <htcondor-users-bounces@xxxxxxxxxxx on behalf of Finn.Bastiansen@xxxxxxxxx> wrote:

    Dear list members,

    This is my first attempt to use and set up HTCondor and my first post to the mailing list, so hello to you all :-)

    We are trying to configure a minimal HTCondor "cluster" to get familiar with the topic, but it seems that we have misunderstood something or made one or more mistakes...

    If this has already been discussed and solved, I would be happy to be pointed to a mailing-list thread or any other source; my searches so far have not turned up anything helpful.

    We are using version 8.8.9.

    You can find details of our setup below. 

    After installing HTCondor and creating a sample job, I get the message:
    "ERROR: Can't find address of local schedd"

    I then saw in Task Manager that condor_schedd is not running, neither on the machine FROM which I submit the job nor on the machine TO which I submit the job (the central manager). In this context, does "submit jobs" in the manual mean "submit from a client PC to the central manager" OR "submit from the central manager to the pool, i.e. to (a) client(s) executing the job"? Or both? This matters because it determines which boxes need to be checked during setup.

    What could be the reason for this problem? Did I misunderstand something and therefore set it up incorrectly? 
    How can I solve this?

    Thank you for your time and help!
    Finn 



    ####
    Setup Details:

    Intended (and testing) Setup: 
    - 1 scheduling server ("central manager" in the docs), currently a Windows 10 VM => "SCHEDULER"
    - 3-4 desktop machines/laptops from which jobs will be submitted (test: 1 Win10 desktop) => "SUBMITTERS"
    - 10-20 currently unused desktop machines (dedicated to HTCondor, will not be used by humans in parallel; test: 1 laptop) as worker bees which will receive jobs from the scheduler => "WORKERS"

    After reading the docs, we set up the 3 machines using the Windows GUI installer according to the following settings:
    - SCHEDULER: "Create a new HTCondor Pool"; Name of new pool: TEST; Submit jobs to HTCondor pool: Unchecked (because the docs say "Generally jobs should not be either submitted or run on the central manager machine"); "Do not run jobs on this machine".
    - SUBMITTER: "Join existing HTCondor Pool"; Hostname of central manager: (hostname of SCHEDULER); Submit jobs to HTCondor pool: Checked; "Do not run jobs on this machine".
    - WORKER: "Join existing HTCondor Pool"; Hostname of central manager: (hostname of SCHEDULER); Submit jobs to HTCondor pool: Unchecked; "Always run jobs and never suspend them".

    I do not list the remaining setup config because I assume that it is irrelevant for the issue at hand.

    Based on this setup, I created a submit description file "example1_submit.txt" (which calls rscript.exe, passing it the path to an R script as an argument).
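    For reference, a submit description file along those lines might look roughly like the sketch below. It is only an illustration under stated assumptions: the Rscript.exe location (C:\R\bin\Rscript.exe) and the script name analysis.R are hypothetical, and it assumes R is already installed at that same path on every worker, so the executable is not transferred.

```
# Hypothetical example1_submit.txt (paths and file names are placeholders)
universe              = vanilla
# Rscript.exe is assumed to be pre-installed at this path on each worker
executable            = C:\R\bin\Rscript.exe
transfer_executable   = false
# The R script is shipped to the worker and passed as the argument
arguments             = analysis.R
transfer_input_files  = analysis.R
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output                = example1.out
error                 = example1.err
log                   = example1.log
queue
```

    Keeping the R installation on the workers avoids copying the whole R runtime with every job; only the script travels with the job.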

    On the submitter, I then called:
    condor_submit example1_submit.txt

    However, this returns "ERROR: Can't find address of local schedd". condor_schedd.exe is running on neither the SCHEDULER nor the SUBMITTER.



       Finn Bastiansen | Effect Modelling and Statistics   
        RIFCON GmbH | Goldbeckstraße 13 | 69493 Hirschberg 
         T. +49 6201 84528-24 |   Fax: +49 (0)6201 8452899 
       Finn.Bastiansen@xxxxxxxxx |  MEET US!  <http://www.rifcon.de/index.php/en/meet-us> | www.rifcon.de  <http://www.rifcon.de/>




