
Re: [HTCondor-users] Question about cluster setup



The condor_store_cred command should be run as root and the /etc/condor/password.d directory should exist. You can just copy the /etc/condor/password.d/POOL file to other nodes.
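As a rough sketch of the copy option, assuming the POOL file has already been transferred to the new node by some secure means: the helper name `install_pool_password` and the 700/600 permission modes are illustrative assumptions, not something HTCondor ships. Run it as root on the real system so the resulting files are owned by root.

```shell
# Hypothetical helper (not part of HTCondor): place a copied pool password
# file into the password directory on a node.
# Usage: install_pool_password SOURCE_FILE [PASSWORD_DIR]
install_pool_password() {
    src="$1"
    dest_dir="${2:-/etc/condor/password.d}"

    # The directory has to exist before the password file can live in it.
    # Modes 700 and 600 are assumptions that keep the secret private.
    install -d -m 700 "$dest_dir" || return 1
    install -m 600 "$src" "$dest_dir/POOL"
}
```

For example, after copying the file to /root/POOL on an execute node, `install_pool_password /root/POOL` would put it in the default location.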

Let me know if that works for you.

Having a preconfigured read-only execute node image should work fine. The only places that HTCondor writes to are in the /var partition (/var/lib/condor, /var/log/condor, and /var/spool/condor). The job should have read/write access to its own directory in /var/lib/condor/execute.
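One way to check this on a booted node is a small writability test. This is only a sketch: the function name `check_condor_writable` is made up for illustration, and it assumes you run it as the user the HTCondor daemons run as.

```shell
# Hypothetical check: report which directories are writable.
# With no arguments it checks the three /var directories named above.
check_condor_writable() {
    if [ "$#" -eq 0 ]; then
        set -- /var/lib/condor /var/log/condor /var/spool/condor
    fi
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "NOT writable: $dir"
            status=1
        fi
    done
    # Non-zero exit status if any directory is missing or read-only.
    return "$status"
}
```

Calling `check_condor_writable` with no arguments on an execute node prints one line per directory and returns non-zero if any of them cannot be written.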

New nodes contact the collector on the central manager machine. The condor_status command will list the available slots.

...Tim

On 12/2/19 12:51 PM, Sunghyun Park wrote:
Hi, Tim. Thank you so much for the kind and swift response. 

I have some follow-up questions.
  • Security setup: I added a security config file under /etc/condor/config.d/ and ran condor_store_cred add -c. However, I'm getting the following error message:
    • Operation failed.
          Make sure you have CONFIG access to the target Master.
      Do I need to install or configure something before this step? I went through the previous threads, but couldn't find any useful information.

  • In my cluster environment, all execute nodes pull a single Ubuntu OS image in READ-ONLY mode (PXE mode). 
    • In this case, how should I configure each node? I'm thinking of installing condor and configuring it as "execute" in the common OS image. However, if this is the right way to go, I'm not sure how condor can identify each node. (I want to pinpoint each node to offload a certain task later on.)
    • I'm wondering how execution works if a task entails a file-write operation. (Each execute node will pull the OS image in READ-ONLY mode.)

I appreciate your kind answers to these newbie questions. 
Since this is my first time with a cluster environment, I'm trying to understand it step by step. 


On Mon, Dec 2, 2019 at 7:59 AM Tim Theisen <tim@xxxxxxxxxxx> wrote:

Answers below in-line.

On 11/30/19 3:41 PM, Sunghyun Park wrote:
Hi, Tim. 
Thank you so much for the reference. 
I'm looking at the slides and it seems like I need to add some config files at /etc/condor/config.d/.
After the packaged installation, my config.d/ directory is currently empty. 

Slides say that I need to add the following files:
NOTE: You will have to fill in the host name of your central manager.
  • /etc/condor/config.d/51-role-cm (one node)
    • use ROLE: CentralManager
  • /etc/condor/config.d/51-role-submit (some set of nodes)
    • use ROLE: Submit
  • /etc/condor/config.d/51-role-exec (some set of nodes)
    • use ROLE: Execute
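
The role files listed above can be sketched as a small shell helper. The function name `write_role_config` is hypothetical, and the sketch covers only the one-line role files; the central manager host name and the security settings from the slides are not reproduced here.

```shell
# Hypothetical helper: write a one-line role file into a config directory.
# Usage: write_role_config ROLE FILE_NAME [CONFIG_DIR]
write_role_config() {
    role="$1"
    name="$2"
    config_dir="${3:-/etc/condor/config.d}"
    printf 'use ROLE: %s\n' "$role" > "$config_dir/$name"
}

# On the central manager:  write_role_config CentralManager 51-role-cm
# On each submit node:     write_role_config Submit 51-role-submit
# On each execute node:    write_role_config Execute 51-role-exec
```

Writing into /etc/condor/config.d normally requires root, and the daemons only see the new files after they are started or reconfigured.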

Don't forget the security configuration file on the next page. You will have to change the host names mentioned in that file as well. You also need to store the pool password on each node. You can create it with the condor_store_cred command. Either run the command on each node, or copy the /etc/condor/password.d/POOL file to each node.
I understand what it is trying to do but cannot fully follow the details. Here are my questions.
  • Should I use exactly the same file names or a certain naming convention? If so, where do the numbers (49, 51, ..) come from?
The numbers are just examples. They merely determine the order in which the files are interpreted. By convention they are numbered from 00 to 99. Stay away from files that begin with 0; those are often used by the HTCondor team or distributions. The idea is that a file processed later can override a setting made in an earlier file.
  • Are those the only contents of each file? If so, I want to make sure that each file will have a single line of config.
In the example, these files have a single line. It is not important that they have a single line. For example, you may want to add configuration for a specific type of node in that file.
  • Is this the only setup to start with condor? I'm wondering if there is an additional step before I can submit jobs at the cm/submit node to execute node.
Once you have established the configuration, you can start the HTCondor daemons and submit jobs. It is a good place to start.
  • I'm also curious if we can specify "execute" node we want to use when we submit the task. I went through the documents, but couldn't find anything relevant yet.
Using the powerful ClassAd mechanism you can specify the requirements of the job. Usually, the requirements are general, such as the memory and disk required. However, you can make the requirements so specific that the job would only match a single machine.
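
A minimal sketch of such a machine-specific requirement, written as a shell helper that generates a submit description file. The helper name `write_pinned_submit`, the host name, and the file names are placeholders chosen for illustration; `Machine` is the machine ClassAd attribute holding the host name of the execute node.

```shell
# Hypothetical helper: write a submit description file whose requirements
# expression matches only one machine.
# Usage: write_pinned_submit HOSTNAME SUBMIT_FILE
write_pinned_submit() {
    host="$1"
    submit_file="$2"
    cat > "$submit_file" <<EOF
executable   = /bin/hostname
requirements = (Machine == "$host")
output       = pinned.out
error        = pinned.err
log          = pinned.log
queue
EOF
}

# Example: write_pinned_submit exec01.example.com pinned.sub
# Then, on a submit node: condor_submit pinned.sub
```

The job stays idle until a slot on that machine is free, so this kind of pinning is best kept for special cases.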
Thank you so much for answering these baby-step questions. 
Have a nice weekend!


On Fri, Nov 29, 2019 at 2:05 PM Tim Theisen <tim@xxxxxxxxxxx> wrote:

Hello Sung,

I am sorry that our manuals need more work on installation. The condor_configure command is for use in the tarball installations. With the packaged installations, you need to add a few configuration files.

I presented a talk at HTCondor Week last May that has a helpful example. Take a look at the example "3 node pool" in my presentation. If you have further questions, please ask.

Here is a link to the talk:

    https://agenda.hep.wisc.edu/event/1325/session/16/contribution/41

Let me know how it goes.

Regards, ...Tim

On 11/27/19 3:04 PM, Sunghyun Park wrote:
Hi, all. I'm a newbie who is trying to install condor in a cluster environment.
Since I'm not familiar with cluster setup, I'm having trouble installing and configuring condor.
I'm trying to have a separate machine that has the roles of "central manager" and "submit", while having multiple slave nodes that only "execute".
My machines run Ubuntu 18.04, so I installed the Ubuntu package, since installation from the tarball wasn't recommended in the online documentation. https://htcondor.readthedocs.io/en/v8_8_6/admin-manual/installation-startup-shutdown-reconfiguration.html#unix-installation-from-a-repository

Here are the steps I think I should follow:
  1. Install the package on all machines (a machine for "manager/submit" + all slave machines for "execute")
  2. Use condor_configure to configure the type of each machine.
Is this the right approach?
I'm getting errors at the second step when providing the right directories for release/local/install ...

Any suggestion/advice will be greatly helpful.
Thank you!
--
Best, Sung

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736

