
Re: [HTCondor-users] RANDOM_INTEGER problems on Windows



OK thanks. I'll keep an eye on the logfiles and we'll see.

regards,

-ian.

> -----Original Message-----
> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-
> bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> Sent: 04 December 2012 17:35
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> 
> 
> From the code, it appears that the random number generator used in
> this instance is initialized using the process id of the daemon.  I
> don't know how process ids are assigned in Windows, but I'd be worried
> that the process id of the condor master might not be uniformly
> distributed.
> If this causes you trouble, it shouldn't be hard for someone to fix.
> 
> --Dan
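
A small illustration of the seeding concern, as a sketch only. It uses Python's `random` module as a stand-in, which is an assumption: it is not the generator HTCondor uses, and the process ids below are made up. It only shows the general property that two generators seeded with the same value produce the same sequence.

```python
import random

def draws_for_pid(pid, count=5, low=0, high=10):
    """Seed a generator with a (hypothetical) process id and return its first draws."""
    rng = random.Random(pid)  # stand-in for a generator seeded by process id
    return [rng.randint(low, high) for _ in range(count)]

# Two hosts whose daemons happen to get the same process id see identical values.
host_a = draws_for_pid(1234)
host_b = draws_for_pid(1234)
print(host_a == host_b)
```

If process ids across a pool cluster around a few values, the spread of random values could be correspondingly narrow.
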
> 
> On 12/4/12 9:20 AM, Smith, Ian wrote:
> > Hi Dan,
> >
> > Thanks for that - it did the trick. Final word on the subject - how
> > are the random number generators seeded ? Or in other words what are
> > the chances that different PCs will get the same values ?
> >
> > regards,
> >
> > -ian.
> >
> >
> >
> >> -----Original Message-----
> >> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-
> >> bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >> Sent: 03 December 2012 16:55
> >> To: htcondor-users@xxxxxxxxxxx
> >> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> >>
> >> Hi Ian,
> >>
> >> The semantics of $RANDOM_INTEGER() are not well defined with respect
> >> to the timing of the throw of the dice.  The dice are thrown whenever
> >> condor reads the configuration setting that contains the reference to
> >> $RANDOM_INTEGER().  For some things in Condor, this happens once per
> >> reconfig.  For other things, it happens much more frequently, because
> >> condor doesn't happen to cache the value.
> >>
> >> I had thought that a published classad attribute would only be read
> >> from the configuration once per reconfig, but I was wrong.  As you
> >> discovered, it happens much more frequently.
> >>
> >> Here's an idea: set the random value in an environment variable that
> >> is passed to the daemon when it is started.  Example:
> >>
> >> STARTD_ENVIRONMENT = "_CONDOR_VACATE_FUZZ=$RANDOM_INTEGER(0,10)"
> >> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(VACATE_FUZZ) )
> >>
> >> Unfortunately, in addition to a reconfig of the master, that will
> >> require a restart of the startd, since the environment is only set at
> >> startup of the daemon.  Also note that, for better or worse, all
> >> slots in the startd will get the same value of VACATE_FUZZ.
> >>
> >> --Dan
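
To see why fixing the value once at daemon start-up helps, here is a rough Python model (not HTCondor code). The names `minutes_to_reboot` and `fuzz` are illustrative stand-ins for `$(REBOOT_TIME) - ClockMin` and `VACATE_FUZZ`, and one evaluation per minute is an assumption.

```python
import random
from collections import Counter

def vacate_minute(fuzz, window=10):
    """Return the minutes-to-reboot value at which `minutes_to_reboot == fuzz` holds."""
    for minutes_to_reboot in range(window, -1, -1):
        if minutes_to_reboot == fuzz:
            return minutes_to_reboot
    return None  # fuzz outside the window: the expression never becomes true

def spread(hosts, seed=0, window=10):
    """Draw one fuzz value per host at 'start-up' and count hosts per vacate minute."""
    rng = random.Random(seed)
    return Counter(vacate_minute(rng.randint(0, window), window) for _ in range(hosts))

print(sorted(spread(1000).items()))
```

In this model every host vacates exactly once inside the window, and the vacate times are spread across the eleven possible minutes.
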
> >>
> >> On 12/3/12 9:40 AM, Smith, Ian wrote:
> >>> Hi Dan,
> >>>
> >>> I tried this but it still didn't seem to work and I still get
> >>> different values for $RANDOM_INTEGER() each time I do a
> >>> condor_status -direct.
> >>> I tried something like this so that it should only get one value per
> >>> "session" but that didn't seem to work either.
> >>>
> >>> RAND_INT  = ifThenElse( ( MonitorSelfTime == 0     ), \
> >>>                           ( $RANDOM_INTEGER( 0, 10 ) ), \
> >>>                           0 )
> >>>
> >>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $(RAND_INT) )
> >>>
> >>> Obviously the following should work but my guess is that it would
> >>> favour evictions closer to $(REBOOT_TIME):
> >>>
> >>> PERIODIC_VACATE = ( ( ( $(REBOOT_TIME) - ClockMin ) <= $RANDOM_INTEGER( 0, 10 ) ) && \
> >>>                     ( ( $(REBOOT_TIME) - ClockMin ) >= 0 ) )
> >>>
> >>> any ideas ?
> >>>
> >>> regards,
> >>>
> >>> -ian.
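
The bias of the `<=` variant can be estimated with a short calculation. This is a sketch under stated assumptions, not a description of what the startd actually does: it assumes one evaluation per minute, a fresh uniform draw from 0 to 10 on each evaluation, and that the job is vacated the first time the expression is TRUE. The function name is hypothetical.

```python
from fractions import Fraction

def first_trigger_distribution(window=10):
    """P(first TRUE occurs at each minutes-to-reboot value), one fresh draw per minute."""
    dist = {}
    still_waiting = Fraction(1)          # probability nothing has triggered yet
    for m in range(window, -1, -1):      # m = minutes to reboot: 10, 9, ..., 0
        p_true = Fraction(window + 1 - m, window + 1)  # P(m <= R), R uniform on 0..window
        dist[m] = still_waiting * p_true
        still_waiting *= 1 - p_true
    return dist

dist = first_trigger_distribution()
for m, p in dist.items():
    print(m, round(float(p), 4))
```

Under these assumptions the first trigger is most likely around 7 to 8 minutes before the reboot rather than right at it, so the bias may run towards earlier evictions; more frequent evaluation would pull it earlier still.
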
> >>>
> >>> PS The manual might be a bit clearer on what is happening behind the
> >>> scenes with $RANDOM_INTEGER().
> >>>
> >>>> -----Original Message-----
> >>>> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-
> >>>> bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >>>> Sent: 30 November 2012 16:01
> >>>> To: htcondor-users@xxxxxxxxxxx
> >>>> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> >>>>
> >>>> Hi Ian,
> >>>>
> >>>> In the context where you are using it, I'd expect $RANDOM_INTEGER()
> >>>> to be reevaluated every time the startd restarts or is told to reread
> >>>> the configuration (condor_reconfig, SIGHUP).
> >>>>
> >>>> To make things explicit, you could put PERIODIC_VACATE into the
> >>>> startd ad and make your PREEMPT expression refer to it as a classad
> >>>> attribute rather than as a configuration macro.  Then you can see the
> >>>> value with condor_status.  Example of how to configure things that way:
> >>>>
> >>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
> >>>> STARTD_EXPRS = $(STARTD_EXPRS) PERIODIC_VACATE
> >>>> PREEMPT         = ($(UWCS_PREEMPT)) || PERIODIC_VACATE
> >>>>
> >>>> I don't see why your policy is not working the way you want.
> >>>> Perhaps the above will help make it clear.
> >>>>
> >>>> --Dan
> >>>>
> >>>> On 11/30/12 5:25 AM, Smith, Ian wrote:
> >>>>> Hi Dan,
> >>>>>
> >>>>> Thanks for the quick reply. Yes having the correct syntax
> >>>>> certainly helps ! I really should RTFM more carefully :-;
> >>>>>
> >>>>> The strange thing is though that >this< expression never seems to
> >>>>> evaluate to TRUE (i.e. the jobs never get vacated).
> >>>>>
> >>>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == $RANDOM_INTEGER(0, 10) )
> >>>>>
> >>>>> If I run condor_config_val I see different integer values generated so the
> >>>>> big question is how often are the random values updated compared
> >>>>> with the ClockMin values ??? Obviously if it's just generated once
> >>>>> on start up then there's no problem but if the update periods are
> >>>>> similar then I could see why this would not work ...
> >>>>>
> >>>>> Imagine for example that it is 10 minutes to reboot time and
> >>>>> just a few integers are generated in the following minute: e.g. 4, 8, 2, 3.
> >>>>> Then PERIODIC_VACATE doesn't evaluate to TRUE. By the same token
> >>>>> on each succeeding minute the integer needed for this to evaluate to TRUE
> >>>>> may also not be generated.
> >>>>>
> >>>>> I'm sure there must be a way of expressing this so that
> >>>>> PERIODIC_VACATE evaluates to TRUE just once a day at a randomised time
> >>>>> but I can't see it at the moment.
> >>>>>
> >>>>> any ideas ?
> >>>>>
> >>>>> many thanks,
> >>>>>
> >>>>> -ian.
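
The scenario above can be put into numbers with a short sketch. The assumptions are loud ones: every evaluation draws a fresh, independent uniform integer from 0 to 10, there is a fixed number of evaluations per minute, and the window runs from 10 minutes before reboot down to 0. None of this is confirmed HTCondor behaviour, and the function name is hypothetical.

```python
from fractions import Fraction

def p_never_true(draws_per_minute=1, window=10):
    """P(`minutes_to_reboot == R` is never TRUE) with a fresh uniform draw per evaluation."""
    p_miss = Fraction(window, window + 1)          # one draw misses the required value
    evaluations = draws_per_minute * (window + 1)  # minutes window..0 inclusive
    return p_miss ** evaluations

print(float(p_never_true(1)))   # one evaluation per minute
print(float(p_never_true(4)))   # four per minute, as in the 4, 8, 2, 3 example
```

Under these assumptions a complete miss on a given night is quite possible with one evaluation per minute, and becomes less likely as evaluations get more frequent; the expression could also be TRUE more than once in the same window.
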
> >>>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-
> >>>>>> bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> >>>>>> Sent: 29 November 2012 15:21
> >>>>>> To: htcondor-users@xxxxxxxxxxx
> >>>>>> Subject: Re: [HTCondor-users] RANDOM_INTEGER problems on Windows
> >>>>>>
> >>>>>> Hi Ian,
> >>>>>>
> >>>>>> There should be a $ in front of RANDOM_INTEGER.  Does that help?
> >>>>>>
> >>>>>> --Dan
> >>>>>>
> >>>>>> On 11/29/12 6:18 AM, Smith, Ian wrote:
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I'm trying to configure our execute hosts to vacate jobs automatically
> >>>>>>> just before they are rebooted each night. To spread out the
> >>>>>>> checkpoints I've tried to add some "jitter" with RANDOM_INTEGER thus:
> >>>>>>>
> >>>>>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == RANDOM_INTEGER( 0, 10 ) )
> >>>>>>> PREEMPT         = $(UWCS_PREEMPT) || ( $(PERIODIC_VACATE) == TRUE )
> >>>>>>>
> >>>>>>> but this does not seem to work. I can't track down a definitive
> >>>>>>> error message but it looks like the condor_startd (or possibly
> >>>>>>> condor_starter) is repeatedly failing and the shadow disconnecting
> >>>>>>> because of this.
> >>>>>>> If I take out the randomness, e.g.
> >>>>>>> If I take out the randomness, e.g.
> >>>>>>>
> >>>>>>> PERIODIC_VACATE = ( ( $(REBOOT_TIME) - ClockMin ) == 0 )
> >>>>>>>
> >>>>>>> everything works fine.
> >>>>>>>
> >>>>>>> Has anyone else seen this ? Is RANDOM_INTEGER supported under
> >>>>>>> Windows or does it have some /dev/random dependence ?
> >>>>>>>
> >>>>>>> I'm using Condor 7.6.2 on Windows 7 Enterprise.
> >>>>>>>
> >>>>>>> regards,
> >>>>>>>
> >>>>>>> -ian.
> >>>>>>>
> >>>>>>> ---------------------------------------
> >>>>>>> Dr Ian C. Smith,
> >>>>>>> Advanced Research Computing,
> >>>>>>> University of Liverpool, UK.
> >>>>>>> _______________________________________________
> >>>>>>> HTCondor-users mailing list
> >>>>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> >>>>>>> with a
> >>>>>>> subject: Unsubscribe
> >>>>>>> You can also unsubscribe by visiting
> >>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >>>>>>>
> >>>>>>> The archives can be found at:
> >>>>>>> https://lists.cs.wisc.edu/archive/htcondor-users/