
Re: [HTCondor-users] RANDOM_INTEGER problems on Windows



Hi Dan,

Thanks for that - it did the trick. One last question on the subject: how
are the random number generators seeded? In other words, what
are the chances that different PCs will get the same values?

regards,

-ian.
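
For anyone reading this thread later: the failure mode discussed in the quoted messages below (an equality test against a value that is re-rolled on every read, versus a value fixed once at daemon startup) can be illustrated with a small simulation. This is only a rough sketch and not HTCondor code. It assumes exactly one evaluation per minute over the last 10 minutes before reboot, and all function names are made up for illustration.

```python
import random
from collections import Counter

FUZZ_MAX = 10  # upper bound used in $RANDOM_INTEGER(0, 10)


def first_match_rerolled(rng, evals_per_minute=1):
    """Minutes-to-reboot at which the equality test first holds when a fresh
    random integer is drawn on every evaluation; None if it never holds.
    The evaluation rate is an assumption, not a documented HTCondor value."""
    for minutes_left in range(FUZZ_MAX, -1, -1):
        for _ in range(evals_per_minute):
            if minutes_left == rng.randint(0, FUZZ_MAX):
                return minutes_left
    return None


def first_match_fixed(rng):
    """Same test, but the random integer is drawn once up front, as if it
    had been fixed when the daemon started."""
    fuzz = rng.randint(0, FUZZ_MAX)
    for minutes_left in range(FUZZ_MAX, -1, -1):
        if minutes_left == fuzz:
            return minutes_left
    return None


def simulate(trials=20000, seed=1):
    """Return two Counters mapping first-match minute (or None) to a count."""
    rng = random.Random(seed)
    rerolled = Counter(first_match_rerolled(rng) for _ in range(trials))
    fixed = Counter(first_match_fixed(rng) for _ in range(trials))
    return rerolled, fixed


if __name__ == "__main__":
    rerolled, fixed = simulate()
    trials = sum(rerolled.values())
    print("never vacated (re-rolled):", rerolled[None] / trials)
    print("never vacated (fixed):    ", fixed[None] / trials)
```

Under these assumptions the re-rolled test misses all 11 chances with probability (10/11)^11, roughly 0.35, whereas a value drawn once always matches exactly one minute. More frequent evaluation would reduce the misses but would also pull the first match towards the start of the window.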



> -----Original Message-----
> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> Sent: 03 December 2012 16:55
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> 
> Hi Ian,
> 
> The semantics of $RANDOM_INTEGER() are not well defined with respect to
> the timing of the throw of the dice.  The dice are thrown whenever
> condor reads the configuration setting that contains the reference to
> $RANDOM_INTEGER().  For some things in Condor, this happens once per
> reconfig.  For other things, it happens much more frequently, because
> condor doesn't happen to cache the value.
> 
> I had thought that a published classad attribute would only be read
> from the configuration once per reconfig, but I was wrong.  As you
> discovered, it happens much more frequently.
> 
> Here's an idea: set the random value in an environment variable that is
> passed to the daemon when it is started.  Example:
> 
> STARTD_ENVIRONMENT = "_CONDOR_VACATE_FUZZ=$RANDOM_INTEGER(0,10)"
> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(VACATE_FUZZ) )
> 
> Unfortunately, in addition to a reconfig of the master, that will
> require a restart of the startd, since the environment is only set at
> startup of the daemon.  Also note that, for better or worse, all slots
> in the startd will get the same value of VACATE_FUZZ.
> 
> --Dan
> 
> On 12/3/12 9:40 AM, Smith, Ian wrote:
> > Hi Dan,
> >
> > I tried this but it still didn't seem to work and I still get
> > different values for $RANDOM_INTEGER() each time I do a
> > condor_status -direct.
> > I tried something like this so that it should only get one value per
> > "session" but that didn't seem to work either.
> >
> > RAND_INT  = ifThenElse( ( MonitorSelfTime == 0     ), \
> >                          ( $RANDOM_INTEGER( 0, 10 ) ), \
> >                          0 )
> >
> > PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(RAND_INT) )
> >
> > Obviously the following should work but my guess is that it would
> > favour evictions closer to $(REBOOT_TIME):
> >
> > PERIODIC_VACATE = ( ( ( $(REBOOT_TIME) - ClockMin ) <= $RANDOM_INTEGER( 0, 10 ) ) && \
> >                     ( ( $(REBOOT_TIME) - ClockMin ) >= 0 ) )
> >
> > any ideas ?
> >
> > regards,
> >
> > -ian.
> >
> > PS The manual could be a bit clearer on what is happening behind the
> > scenes with $RANDOM_INTEGER().
> >
> >> -----Original Message-----
> >> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >> Sent: 30 November 2012 16:01
> >> To: htcondor-users@xxxxxxxxxxx
> >> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> >>
> >> Hi Ian,
> >>
> >> In the context where you are using it, I'd expect $RANDOM_INTEGER()
> >> to be reevaluated every time the startd restarts or is told to reread
> >> the configuration (condor_reconfig, SIGHUP).
> >>
> >> To make things explicit, you could put PERIODIC_VACATE into the
> >> startd ad and make your PREEMPT expression refer to it as a classad
> >> attribute rather than as a configuration macro.  Then you can see the
> >> value with condor_status.  Example of how to configure things that way:
> >>
> >> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
> >> STARTD_EXPRS = $(STARTD_EXPRS) PERIODIC_VACATE
> >> PREEMPT         = ($(UWCS_PREEMPT)) || PERIODIC_VACATE
> >>
> >> I don't see why your policy is not working the way you want. Perhaps
> >> the above will help make it clear.
> >>
> >> --Dan
> >>
> >> On 11/30/12 5:25 AM, Smith, Ian wrote:
> >>> Hi Dan,
> >>>
> >>> Thanks for the quick reply. Yes having the correct syntax certainly
> >>> helps ! I really should RTFM more carefully :-;
> >>>
> >>> The strange thing is though that >this< expression never seems to
> >>> evaluate to TRUE (i.e. the jobs never get vacated).
> >>>
> >>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
> >>>
> >>> If I run condor_config_val I see different integer values generated, so
> >>> the big question is how often the random values are updated compared
> >>> with the ClockMin values. Obviously if it's just generated once
> >>> on start up then there's no problem but if the update periods are
> >>> similar then I could see why this would not work ...
> >>>
> >>> Imagine for example that it is 10 minutes to reboot time and that
> >>> just a few integers are generated in the following minute, e.g. 4, 8, 2, 3.
> >>> Then PERIODIC_VACATE doesn't evaluate to TRUE. By the same token, on
> >>> each succeeding minute the integer needed for this to evaluate to TRUE
> >>> may also not be generated.
> >>>
> >>> I'm sure there must be a way of expressing this so that
> >>> PERIODIC_VACATE evaluates to TRUE just once a day at a randomised time
> >>> but I can't see it at the moment.
> >>>
> >>> any ideas ?
> >>>
> >>> many thanks,
> >>>
> >>> -ian.
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >>>> Sent: 29 November 2012 15:21
> >>>> To: htcondor-users@xxxxxxxxxxx
> >>>> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> >>>>
> >>>> Hi Ian,
> >>>>
> >>>> There should be a $ in front of RANDOM_INTEGER.  Does that help?
> >>>>
> >>>> --Dan
> >>>>
> >>>> On 11/29/12 6:18 AM, Smith, Ian wrote:
> >>>>> Hello All,
> >>>>>
> >>>>> I'm trying to configure our execute hosts to vacate jobs automatically
> >>>>> just before they are rebooted each night. To spread out the
> >>>>> checkpoints I've tried to add some "jitter" with RANDOM_INTEGER thus:
> >>>>>
> >>>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == RANDOM_INTEGER( 0, 10 ) )
> >>>>> PREEMPT         = $(UWCS_PREEMPT) || ( $(PERIODIC_VACATE) == TRUE )
> >>>>>
> >>>>> but this does not seem to work. I can't track down a definitive
> >>>>> error message but it looks like the condor_startd (or possibly
> >>>>> condor_starter) is repeatedly failing and the shadow disconnecting
> >>>>> because of this.
> >>>>>
> >>>>> If I take out the randomness, e.g.
> >>>>>
> >>>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == 0 )
> >>>>>
> >>>>> everything works fine.
> >>>>>
> >>>>> Has anyone else seen this ? Is RANDOM_INTEGER supported under
> >>>>> Windows or does it have some /dev/random dependence ?
> >>>>>
> >>>>> I'm using Condor 7.6.2 on Windows 7 Enterprise.
> >>>>>
> >>>>> regards,
> >>>>>
> >>>>> -ian.
> >>>>>
> >>>>> ---------------------------------------
> >>>>> Dr Ian C. Smith,
> >>>>> Advanced Research Computing,
> >>>>> University of Liverpool, UK.
> >>>>> _______________________________________________
> >>>>> HTCondor-users mailing list
> >>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> >>>>> with a
> >>>>> subject: Unsubscribe
> >>>>> You can also unsubscribe by visiting
> >>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >>>>>
> >>>>> The archives can be found at:
> >>>>> https://lists.cs.wisc.edu/archive/htcondor-users/