
Re: [Condor-users] rooster on linux, take 3



Hi Ian,

The HIBERNATE expression is supposed to evaluate to a string specifying the desired power state. The expression you specified below does not do that. It evaluates to a boolean true/false value.
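
As a rough sketch of what that could look like (the "S3" state and the check interval below are assumptions for illustration, not values taken from your setup), the boolean test can be wrapped in ifThenElse() so that the result is a power-state string:

```
# Assumed: suspend to RAM ("S3") when no one is logged in, otherwise stay up ("NONE").
HIBERNATE = ifThenElse(NO_ONE_LOGGED_IN =?= True, "S3", "NONE")
# Assumed interval in seconds; hibernation checks are disabled while this is 0 (the default).
HIBERNATE_CHECK_INTERVAL = 300
```

Which states are actually usable depends on the hardware and OS, so treat "S3" as a placeholder.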

--Dan

On 11/29/11 6:55 AM, Ian Cottam wrote:
Some more info:

I am now testing with a HIBERNATE of just:
HIBERNATE = (NO_ONE_LOGGED_IN =?= True)

and after a couple of hours with no one logged in, it is still powered up.

I copied the setup described here:
<https://twiki.grid.iu.edu/bin/view/Tier3/CondorHawkeyeSetup>.

So I have:

STARTD_CRON_JOBLIST = NOONELOGGEDIN
STARTD_CRON_NOONELOGGEDIN_EXECUTABLE = /etc/condor/local/nooneloggedin.sh
STARTD_CRON_NOONELOGGEDIN_PERIOD = 30s
STARTD_CRON_NOONELOGGEDIN_MODE = Periodic

Is that correct for Condor v7.4.4?

nooneloggedin.sh has been tested independently (as described previously,
below).
-Ian


On 29/11/2011 09:00, "Ian Cottam"<Ian.Cottam@xxxxxxxxxxxxxxxx>  wrote:

Does anyone have a good working setup of "hibernation/rooster wake up" across a
reasonably sized pool (i.e. bigger than just one test PC, although even
that would be interesting)?
Condor v7.4.4 or higher (earlier versions were known to have problems, I
believe)?
Please share your configs with us if you do.

I recently added the recommended way of not hibernating if anyone is
logged in (via a "startd cron", as it's sometimes called) and now the test
machines don't hibernate at all. I have tested that the script generates
NO_USER_LOGGED_IN = True
when only Condor has the PC, and I test for that in the HIBERNATE
expression.

Thanks.
-Ian






On 28/11/2011 18:44, "Dan Bradley"<dan@xxxxxxxxxxxx>  wrote:


On 11/28/11 12:33 PM, Dimitri Maziuk wrote:
On 11/28/2011 09:18 AM, Dan Bradley wrote:

So the next question is how do I figure out what's up with the
negotiator?

(E.g.) with 40 cores busy and 4 cores sleeping condor_q -analyze 961082
says:

-- Submitter: minnow.bmrb.wisc.edu :
<144.92.167.254:9617?sock=13250_c2fa_3>   : minnow.bmrb.wisc.edu
---
961082.000:  Run analysis summary.  Of 44 machines,
...
        4 match but are currently offline
        0 are available to run your job
          No successful match recorded.
          Last failed match: Fri Nov 25 18:18:55 2011
          Reason for last match failure: no match found
-----------------------------------------------------

NegotiatorLog (on D_FULLDEBUG) is not very informative as to why the "4
matching but offline" cores are not a "successful match":

11/25/11 18:17:55     Sending SEND_JOB_INFO/eom
11/25/11 18:17:55     Getting reply from schedd ...
11/25/11 18:17:55     Got JOB_INFO command; getting classad/eom
11/25/11 18:17:55     Request 961082.00000:
11/25/11 18:17:55 matchmakingAlgorithm: limit 4.000000 used 0.000000
pieLeft 4.000000
11/25/11 18:17:55       Rejected 961082.0 bbee@xxxxxxxxxxxxx
<144.92.167.254:9617?sock=13250_c2fa_3>: no match found
--------------------------------------------------------



If you add D_JOB and D_MACHINE to NEGOTIATOR_DEBUG, you will get verbose
logging of every machine considered by the negotiator when trying to
match the job.  Is it even considering the offline machine?  If so, and
if it matches, I would expect the following to be logged by the
negotiator:

"Registering attempt to match offline machine <host.name> by
<user.name>."
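
For reference, a minimal sketch of that setting, assuming D_FULLDEBUG stays enabled as in the log excerpt above:

```
# Adds per-job and per-machine detail to the NegotiatorLog on top of D_FULLDEBUG.
NEGOTIATOR_DEBUG = D_FULLDEBUG D_JOB D_MACHINE
```

The negotiator should pick this up after a condor_reconfig on the central manager. Expect the log to grow quickly while it is enabled.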

--Dan

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


--
Ian Cottam
ext. 61851
IT Services for Research
Faculty of Engineering and Physical Sciences
The University of Manchester
"The only strategy that is guaranteed to fail is not taking risks." Mark
Zuckerberg



