
Re: [HTCondor-users] Parallel environment



On 6/3/2016 10:30 AM, Francesca Maccarone wrote:
Thanks Michael,
I took your advice and added these lines to the file
/etc/condor/config.d/00debconf on all machines:

DedicatedScheduler = "DedicatedScheduler@Master"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler


The above is not sufficient. The manual, in section 3.12.8, goes on to say, for instance, that your startd config needs a RANK expression preferring your dedicated scheduler. I suggest you read all of section 3.12.8 :).
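
For example, in addition to the two knobs you already added, the startd
config on the dedicated execute nodes needs something roughly like this
(a sketch, not the full policy from the example file):

RANK = Scheduler =?= $(DedicatedScheduler)

so that jobs coming from your dedicated scheduler are preferred over
other work on those machines.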

Also, I think the condor_config.local.dedicated.resource example config file could be very helpful to you; it is a template of the config knobs needed to support the parallel universe, with lots of comments. It is probably easier to follow than the manual. If you installed via the RPM, you will typically find the examples in /usr/share/doc/condor-X.X.X/examples. For your convenience, here is a link to it: https://is.gd/plLPVn

Finally, if you happen to be using partitionable slots on your execute nodes (if you don't know what these are, you are not using them and can ignore this), you will also need to set ALLOW_PSLOT_PREEMPTION = True.
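
For instance (a sketch; I believe this knob is read by the negotiator,
so it goes in your central manager's configuration):

ALLOW_PSLOT_PREEMPTION = True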

Hope the above helps
Todd


The problem is that when I try to run my jobs, they remain idle. The log
file contains only:

Job submitted from host 192.168.56.101

Only the first job is submitted, while the rest are not. I don't
understand where the problem is.

Thanks in advance

2016-06-02 20:14 GMT+02:00 Michael V Pelletier
<Michael.V.Pelletier@xxxxxxxxxxxx>:

    From: Francesca Maccarone <dike991@xxxxxxxxx>
    Date: 06/02/2016 11:07 AM

    > The problem is that all the jobs in the queue remain idle and are never
    > executed. I want to run my jobs in parallel: what changes should I
    > make to get the desired behavior?

    Ciao, Francesca,

    Take a look at section 3.12.8 of the 8.4.6 manual. In order for a
    machine
    to match a parallel universe job, it must be advertising the
    "DedicatedScheduler" attribute which is set in the configuration and
    pushed to the machine ad using the STARTD_ATTRS config.

    Once this is set up correctly, you should be good to go. The idea here
    is that parallel jobs cannot tolerate having any one of their
    parallel processes on any of the machines terminated unexpectedly,
    so machines set up in this way are presumed to prevent eviction and
    thus be safe for parallel universe submissions.
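
    For reference, a minimal parallel universe submit description to test
    with might look roughly like the following (a sketch; the executable,
    machine count, and log file name are just placeholders):

    universe = parallel
    executable = /bin/sleep
    arguments = 60
    machine_count = 2
    log = parallel_test.log
    queue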

             -Michael Pelletier.




_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685