
Re: [HTCondor-users] RANDOM_INTEGER problems on Windows



Hi Ian,

The semantics of $RANDOM_INTEGER() are not well defined with respect to the timing of the throw of the dice. The dice are thrown whenever condor reads the configuration setting that contains the reference to $RANDOM_INTEGER(). For some things in Condor, this happens once per reconfig. For other things, it happens much more frequently, because condor doesn't happen to cache the value.

I had thought that a published classad attribute would only be read from the configuration once per reconfig, but I was wrong. As you discovered, it happens much more frequently.
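To make the difference concrete, here is a toy Python model (not HTCondor code) of the two behaviours described above: a setting whose random macro is re-expanded on every read, and one that is expanded once and reused until the next reconfig. The class names and the `reconfig()` method are made up for illustration, and the bounds are assumed to be inclusive.

```python
import random


class PerReadMacro:
    """Toy model: the random macro is re-expanded on every read (no caching)."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def read(self):
        # A fresh throw of the dice each time the setting is looked up.
        return self.rng.randint(self.low, self.high)


class CachedMacro:
    """Toy model: expanded once, then reused until reconfig() is called."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.reconfig()

    def reconfig(self):
        # The only place where the dice are thrown.
        self._value = self.rng.randint(self.low, self.high)

    def read(self):
        return self._value


rng = random.Random(2012)  # fixed seed so the run is repeatable
per_read = PerReadMacro(0, 10, rng)
cached = CachedMacro(0, 10, rng)

per_read_values = [per_read.read() for _ in range(200)]
cached_values = [cached.read() for _ in range(200)]

print("distinct values, re-expanded on each read:", len(set(per_read_values)))
print("distinct values, cached until reconfig:   ", len(set(cached_values)))
```

A comparison such as `minutes_left == value` only behaves predictably in the second model, where the value stays put between reconfigs.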

Here's an idea: set the random value in an environment variable that is passed to the daemon when it is started. Example:

STARTD_ENVIRONMENT = "_CONDOR_VACATE_FUZZ=$RANDOM_INTEGER(0,10)"
PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(VACATE_FUZZ) )

Unfortunately, in addition to a reconfig of the master, that will require a restart of the startd, since the environment is only set at startup of the daemon. Also note that, for better or worse, all slots in the startd will get the same value of VACATE_FUZZ.

--Dan

On 12/3/12 9:40 AM, Smith, Ian wrote:
Hi Dan,

I tried this but it still didn't seem to work and I still get different
values for $RANDOM_INTEGER() each time I do a condor_status -direct.
I tried something like this so that it should only get one value
per "session" but that didn't seem to work either.

RAND_INT  = ifThenElse( ( MonitorSelfTime == 0     ), \
                         ( $RANDOM_INTEGER( 0, 10 ) ), \
                         0 )

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(RAND_INT) )

Obviously the following should work but my guess is that it would favour evictions
closer to $(REBOOT_TIME):

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) <= $RANDOM_INTEGER( 0, 10 ) ) && \
                     ( ( $(REBOOT_TIME) - ClockMin ) >= 0 )
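One way to check that guess is a small Python calculation (a sketch, not HTCondor code). It assumes exactly one independent, uniform draw R from 0..10 per minute and the test 0 <= minutes_left <= R; the thread does not establish how often the value is really redrawn, so the one-draw-per-minute rate and the function name are assumptions.

```python
from fractions import Fraction


def first_fire_distribution(low=0, high=10):
    """Map minutes-before-reboot -> P(the test is first TRUE at that minute).

    Assumes one independent uniform draw R in [low, high] per minute and
    the test low <= minutes_left <= R.
    """
    n = high - low + 1
    not_yet = Fraction(1)  # probability that nothing has fired so far
    dist = {}
    for minutes_left in range(high, low - 1, -1):
        # P(R >= minutes_left) for R uniform on low..high
        p_true = Fraction(high - minutes_left + 1, n)
        dist[minutes_left] = not_yet * p_true
        not_yet *= 1 - p_true
    return dist


dist = first_fire_distribution()
for minutes_left, p in dist.items():
    print(f"{minutes_left:2d} min before reboot: {float(p):.4f}")
```

Under that assumption the first TRUE tends to come well before the reboot rather than close to it, because the test gets a new chance every minute. A different redraw rate would change the shape.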

any ideas ?

regards,

-ian.

PS The manual might be a bit clearer on what is happening behind the scenes with
$RANDOM_INTEGER() .

-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: 30 November 2012 16:01
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows

Hi Ian,

In the context where you are using it, I'd expect $RANDOM_INTEGER() to
be reevaluated every time the startd restarts or is told to reread the
configuration (condor_reconfig, SIGHUP).

To make things explicit, you could put PERIODIC_VACATE into the startd
ad and make your PREEMPT expression refer to it as a classad attribute
rather than as a configuration macro.  Then you can see the value with
condor_status.  Example of how to configure things that way:

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
STARTD_EXPRS = $(STARTD_EXPRS) PERIODIC_VACATE
PREEMPT         = ($(UWCS_PREEMPT)) || PERIODIC_VACATE

I don't see why your policy is not working the way you want. Perhaps
the above will help make it clear.

--Dan

On 11/30/12 5:25 AM, Smith, Ian wrote:
Hi Dan,

Thanks for the quick reply. Yes having the correct syntax certainly
helps ! I really should RTFM more carefully :-;

The strange thing is though that >this< expression never seems to
evaluate to TRUE (i.e. the jobs never get vacated).

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )

If I run condor_config_val I see different integer values generated so the big
question is how often are the random values updated compared with the
ClockMin values ??? Obviously if it's just generated once on start up
then there's no problem but if the update periods are similar then I
could see why this would not work ...

Imagine for example that it is 10 minutes to reboot time and just
a few integers are generated in the following minute: e.g. 4, 8, 2, 3.
Then PERIODIC_VACATE doesn't evaluate to TRUE. By the same token, on
each succeeding minute the integer needed for this to evaluate to
TRUE may also not be generated.
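To put a rough number on that scenario, here is a short Python sketch (not HTCondor code). It assumes every evaluation of the expression sees a fresh, independent, uniform draw from 0..10, and that minutes-to-reboot takes each value 10, 9, ..., 0 for one minute. The draws-per-minute rate is a free parameter because the real rate is not known from this thread.

```python
from fractions import Fraction


def prob_never_fires(draws_per_minute, low=0, high=10):
    """P(no draw ever equals minutes-to-reboot) over the whole window.

    Assumes independent uniform draws in [low, high] and one matching
    value per minute, for each of the (high - low + 1) minutes.
    """
    n = high - low + 1
    miss = Fraction(n - 1, n)  # a single draw misses the one matching value
    minutes = n                # minutes-to-reboot passes through low..high once
    return miss ** (draws_per_minute * minutes)


for k in (1, 4, 60):
    print(f"{k:2d} draw(s) per minute: P(never TRUE) = {float(prob_never_fires(k)):.6f}")
```

With one draw per minute the test is missed on a substantial fraction of nights under this model; the miss probability shrinks quickly as the number of draws per minute goes up.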

I'm sure there must be a way of expressing this so that
PERIODIC_VACATE evaluates to TRUE just once a day at a randomised
time but I can't see it at the moment.

any ideas ?

many thanks,

-ian.



-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: 29 November 2012 15:21
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows

Hi Ian,

There should be a $ in front of RANDOM_INTEGER.  Does that help?

--Dan

On 11/29/12 6:18 AM, Smith, Ian wrote:
Hello All,

I'm trying to configure our execute hosts to vacate jobs automatically
just before they are rebooted each night. To spread out the
checkpoints I've tried to add some "jitter" with RANDOM_INTEGER thus:

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == RANDOM_INTEGER( 0, 10 ) )
PREEMPT         = $(UWCS_PREEMPT) || ( $(PERIODIC_VACATE) == TRUE )

but this does not seem to work. I can't track down a definitive
error message but it looks like the condor_startd (or possibly
condor_starter) is repeatedly failing and the shadow disconnecting
because of this.
If I take out the randomness, e.g.

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == 0 )

everything works fine.

Has anyone else seen this ? Is RANDOM_INTEGER supported under
Windows or does it have some /dev/random dependence ?

I'm using Condor 7.6.2 on Windows 7 Enterprise.

regards,

-ian.

---------------------------------------
Dr Ian C. Smith,
Advanced Research Computing,
University of Liverpool, UK.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/