
Re: [HTCondor-users] RANDOM_INTEGER problems on Windows




From the code, it appears that the random number generator used in this instance is initialized using the process id of the daemon. I don't know how process ids are assigned on Windows, but I'd be worried that the process id of the condor master might not be uniformly distributed. If this causes you trouble, it shouldn't be hard for someone to fix.
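
To illustrate the concern, here is a minimal Python sketch. It is not HTCondor's actual code: it only assumes a generator whose seed is the process id and nothing else, and the helper name first_draw, the host names and the PID values are all made up for the example. Two daemons that happen to get the same PID then draw the same "random" value.

```python
import random

def first_draw(pid, low=0, high=10):
    """Return the first integer drawn by a generator seeded only with pid."""
    rng = random.Random(pid)   # the seed depends on nothing but the process id
    return rng.randint(low, high)

# Hypothetical PIDs of the same daemon on three different PCs.
pids = {"pc-a": 1234, "pc-b": 1234, "pc-c": 5678}
draws = {host: first_draw(pid) for host, pid in pids.items()}
```

If PIDs on freshly booted PCs cluster around a few small values, many machines would share a seed and therefore share a value.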

--Dan

On 12/4/12 9:20 AM, Smith, Ian wrote:
Hi Dan,

Thanks for that - it did the trick. Final word on the subject - how
are the random number generators seeded ? Or in other words what
are the chances that different PCs will get the same values ?

regards,

-ian.



-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: 03 December 2012 16:55
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows

Hi Ian,

The semantics of $RANDOM_INTEGER() are not well defined with respect to
the timing of the throw of the dice.  The dice are thrown whenever
condor reads the configuration setting that contains the reference to
$RANDOM_INTEGER().  For some things in Condor, this happens once per
reconfig.  For other things, it happens much more frequently, because
condor doesn't happen to cache the value.

I had thought that a published classad attribute would only be read
from the configuration once per reconfig, but I was wrong.  As you
discovered, it happens much more frequently.
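
The difference between the two behaviours can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how condor's configuration code is written: the Config class, its method names and the VACATE_FUZZ entry are hypothetical.

```python
import random

class Config:
    """Toy config table: values are callables so a macro can re-roll on read."""

    def __init__(self, macros):
        self.macros = macros
        self.cache = {}

    def read(self, name):
        # Uncached read: the macro is expanded again on every lookup.
        return self.macros[name]()

    def read_cached(self, name):
        # Cached read: expanded once, then reused until reconfig().
        if name not in self.cache:
            self.cache[name] = self.macros[name]()
        return self.cache[name]

    def reconfig(self):
        # Forget cached values so the next cached read rolls again.
        self.cache.clear()

rng = random.Random(0)  # fixed seed so the sketch is repeatable
cfg = Config({"VACATE_FUZZ": lambda: rng.randint(0, 10)})
uncached = [cfg.read("VACATE_FUZZ") for _ in range(50)]
cached = [cfg.read_cached("VACATE_FUZZ") for _ in range(50)]
```

Here cached holds one value repeated 50 times, while uncached holds a fresh draw per read, which matches what repeated condor_status queries showed.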

Here's an idea: set the random value in an environment variable that is
passed to the daemon when it is started.  Example:

STARTD_ENVIRONMENT = "_CONDOR_VACATE_FUZZ=$RANDOM_INTEGER(0,10)"
PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(VACATE_FUZZ) )

Unfortunately, in addition to a reconfig of the master, that will
require a restart of the startd, since the environment is only set at
startup of the daemon.  Also note that, for better or worse, all slots
in the startd will get the same value of VACATE_FUZZ.

--Dan

On 12/3/12 9:40 AM, Smith, Ian wrote:
Hi Dan,

I tried this but it still didn't seem to work and I still get
different values for $RANDOM_INTEGER() each time I do a
condor_status -direct.

I tried something like this so that it should only get one value per
"session" but that didn't seem to work either.

RAND_INT  = ifThenElse( ( MonitorSelfTime == 0     ), \
                          ( $RANDOM_INTEGER( 0, 10 ) ), \
                          0 )

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(RAND_INT) )

Obviously the following should work but my guess is that it would
favour evictions closer to $(REBOOT_TIME):

PERIODIC_VACATE = ( ( ( $(REBOOT_TIME) - ClockMin ) <= $RANDOM_INTEGER( 0, 10 ) ) && \
                    ( ( $(REBOOT_TIME) - ClockMin ) >= 0 ) )
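
One way to check that guess is to work out when the expression would first become TRUE. The Python sketch below does this under assumptions that are not confirmed anywhere in this thread: a fresh uniform draw at every evaluation, and a fixed number of evaluations per minute. The function name and the evals_per_minute parameter are made up for the example.

```python
from fractions import Fraction

def first_trigger_distribution(evals_per_minute=1, fuzz_max=10):
    """P(first TRUE happens d minutes before reboot), for d = fuzz_max .. 0.

    Assumes a fresh uniform draw R in [0, fuzz_max] at every evaluation and
    the test 0 <= d <= R, evaluated evals_per_minute times in each minute.
    """
    n = fuzz_max + 1
    still_waiting = Fraction(1)   # probability nothing has triggered so far
    dist = {}
    for d in range(fuzz_max, -1, -1):
        p_true = Fraction(n - d, n)                      # P(R >= d), one draw
        p_minute = 1 - (1 - p_true) ** evals_per_minute  # any draw this minute
        dist[d] = still_waiting * p_minute
        still_waiting *= 1 - p_minute
    return dist

dist = first_trigger_distribution()
```

Under these assumptions the chance that a single evaluation is TRUE does grow as the reboot approaches, but the minute in which a job is first vacated also depends on how often the expression is evaluated: raising evals_per_minute moves the probability towards the start of the window (d = 10).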

any ideas ?

regards,

-ian.

PS The manual might be a bit clearer on what is happening behind the
scenes with $RANDOM_INTEGER().

-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: 30 November 2012 16:01
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows

Hi Ian,

In the context where you are using it, I'd expect $RANDOM_INTEGER()
to be reevaluated every time the startd restarts or is told to reread
the configuration (condor_reconfig, SIGHUP).

To make things explicit, you could put PERIODIC_VACATE into the
startd ad and make your PREEMPT expression refer to it as a classad
attribute rather than as a configuration macro.  Then you can see the
value with condor_status.  Example of how to configure things that
way:

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
STARTD_EXPRS = $(STARTD_EXPRS) PERIODIC_VACATE
PREEMPT         = ($(UWCS_PREEMPT)) || PERIODIC_VACATE

I don't see why your policy is not working the way you want. Perhaps
the above will help make it clear.

--Dan

On 11/30/12 5:25 AM, Smith, Ian wrote:
Hi Dan,

Thanks for the quick reply. Yes having the correct syntax certainly
helps ! I really should RTFM more carefully :-;

The strange thing is though that >this< expression never seems to
evaluate to TRUE (i.e. the jobs never get vacated).

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )

If I run condor_config_val I see different integer values generated so
the big question is how often are the random values updated compared
with the ClockMin values ??? Obviously if it's just generated once
on start up then there's no problem but if the update periods are
similar then I could see why this would not work ...

Imagine for example that it is 10 minutes to reboot time and that
just a few integers are generated in the following minute: e.g. 4,
8, 2, 3. Then PERIODIC_VACATE doesn't evaluate to TRUE. By the same
token, on each succeeding minute the integer needed for this to
evaluate to TRUE may also not be generated.
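
The size of that effect can be estimated with a short Python sketch. It assumes, without confirmation from the thread, that every evaluation draws a fresh uniform integer in 0..10 and that the draws are independent; the function name and the total_evals parameter are made up for the example.

```python
def prob_never_true(total_evals, fuzz_max=10):
    """Chance that 'minutes_left == fresh random draw' is never TRUE.

    total_evals is the number of times the expression is evaluated while
    minutes_left is inside 0..fuzz_max. Each evaluation matches with
    probability 1/(fuzz_max + 1), independently of the others.
    """
    n = fuzz_max + 1
    return ((n - 1) / n) ** total_evals

p_once_a_minute = prob_never_true(11)   # one evaluation in each of 11 minutes
p_two_evals = prob_never_true(2)        # e.g. evaluated only twice in the window
```

With one evaluation per minute the expression is never TRUE roughly 35% of the time, and with only two evaluations in the window roughly 83% of the time, so missed vacates are plausible whenever the value is re-rolled on each read.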

I'm sure there must be a way of expressing this so that
PERIODIC_VACATE evaluates to TRUE just once a day at a randomised
time but I can't see it at the moment.

any ideas ?

many thanks,

-ian.



-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: 29 November 2012 15:21
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows

Hi Ian,

There should be a $ in front of RANDOM_INTEGER.  Does that help?

--Dan

On 11/29/12 6:18 AM, Smith, Ian wrote:
Hello All,

I'm trying to configure our execute hosts to vacate jobs
automatically just before they are rebooted each night. To spread
out the checkpoints I've tried to add some "jitter" with
RANDOM_INTEGER thus:

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == RANDOM_INTEGER( 0, 10 ) )
PREEMPT         = $(UWCS_PREEMPT) || ( $(PERIODIC_VACATE) == TRUE )

but this does not seem to work. I can't track down a definitive
error message but it looks like the condor_startd (or possibly
condor_starter) is repeatedly failing and the shadow disconnecting
because of this.

If I take out the randomness, e.g.

PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == 0 )

everything works fine.

Has anyone else seen this ? Is RANDOM_INTEGER supported under
Windows or does it have some /dev/random dependence ?

I'm using Condor 7.6.2 on Windows 7 Enterprise.

regards,

-ian.

---------------------------------------
Dr Ian C. Smith,
Advanced Research Computing,
University of Liverpool, UK.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/