
Re: [Condor-users] "is not an integer" (in config file)



You can always write a test job that goes to each machine and does a
condor_version.

Alternatively, you could do what my .cgi scripts
( http://tardis.dl.ac.uk/Condor/cgi-bin/CondorVersion.cgi for example )
use:
condor_status  -master -f "%-32s" Machine -f "%s\n" CondorVersion

That will tell you the version of Condor for all machines in the pool.
(I seem to remember that using the -master flag means that anything
running a condor_master daemon will get an entry; leaving it out will
only get the execute machines.)
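
As a rough sketch of how that output could be checked for stragglers, the
Python below groups machines by the version string they report. The helper
names (parse_versions, query_pool) are made up for illustration, and the
parsing assumes machine names are shorter than 32 characters, so that the
%-32s padding leaves at least one space before the version string.

```python
import subprocess
from collections import defaultdict


def parse_versions(output):
    """Group machine names by the CondorVersion string they report.

    Expects one line per machine: the name padded to 32 columns,
    followed by the version string (the format used in the command above).
    """
    by_version = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # First whitespace-delimited token is the machine, the rest is the version.
        machine, _, version = line.partition(" ")
        by_version[version.strip()].append(machine)
    return dict(by_version)


def query_pool():
    """Run the condor_status command from this message and return its output."""
    cmd = ["condor_status", "-master",
           "-f", "%-32s", "Machine",
           "-f", "%s\\n", "CondorVersion"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    for version, machines in sorted(parse_versions(query_pool()).items()):
        print(f"{len(machines):4d} machine(s): {version}")
        for name in sorted(machines):
            print(f"       {name}")
```

If more than one version shows up in the listing, the odd machines out are
the ones to look at first.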

If you are interested in using the .cgi scripts as a starter for your
Condor pool web site, let me know.

Cheers

JK

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Finch, Ralph
> Sent: Thursday, April 10, 2008 7:23 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] "is not an integer" (in config file)
> 
> I'm 99.9% sure that all machines are using 7.0.1.  On the 
> problem machines I looked backward in the MasterLog file to 
> see the version number when they started, all were 7.0.1
> 
> There's been odd behavior of the pool since everything was 
> upgraded from 6.8.X last week.  The main problem is that our 
> hyperthreaded SMP machines still appear as a total of 4 
> slots, even though COUNT_HYPERTHREAD_CPUS = FALSE in 
> condor_config.local:
> 
> slot1@xxxxxxxxxxxx WINNT51    INTEL  Owner     Idle     0.780   767  0+01:14:49
> slot2@xxxxxxxxxxxx WINNT51    INTEL  Claimed   Busy     0.990   767  0+01:44:18
> slot3@xxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000   767  0+00:02:03
> slot4@xxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000   767  0+00:02:04
> 
> (VENICE is a two-cpu [not dual-core], hyperthreaded Wintel machine).
> 
> Ralph Finch
> 916-653-7552
> 
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
> Sent: Thursday, April 10, 2008 8:44 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] "is not an integer" (in config file)
> 
> Finch, Ralph wrote:
> > condor 7.0.1 on all machines in a Wintel pool.
> > 
> > I'm getting different behavior on what should be identical machines.
> > 
> > In each machine's condor_config.local file I added the following line:
> > 
> > TOUCH_LOG_INTERVAL = 3600 * 24
> > 
> > I generally like to use a product, rather than the result, to make it 
> > clearer (in this case, the touch log interval is a day long).
> > 
> 
> Makes sense, but unfortunately not allowed in this specific case. 
> Expressions like the above are allowable in ClassAd 
> expressions, and thus are allowed in condor_config parameters 
> that are specifying ClassAd expressions (like Start, Suspend, 
> Rank, etc), but are typically not allowed elsewhere.  Someday 
> we hope to make this better / more consistent.
> 
> > After adding the line I copied the file to each machine in the pool 
> > and issued condor_reconfig -all
> > 
> > Most machines accepted the change without problem: (masterlog)
> > 
> > 4/10 08:09:31 Reconfiguring all running daemons.
> > 4/10 08:09:31 Sent signal 1 to STARTD (pid 7424)
> > 4/10 08:09:31 Sent signal 1 to SCHEDD (pid 904)
> > 4/10 08:09:31 Return from HandleReq <handle_reconfig()>
> > 4/10 08:09:31 Return from Handler <DaemonCore::HandleReqSocketHandler>
> > 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> > 
> > But some machines did not like the new line and died: (masterlog)
> > 
> > 4/10 08:03:05 Reconfiguring all running daemons.
> > 4/10 08:03:05 Sent signal 1 to STARTD (pid 13404)
> > 4/10 08:03:05 Sent signal 1 to SCHEDD (pid 18172)
> > 4/10 08:03:05 Return from HandleReq <handle_reconfig()>
> > 4/10 08:03:05 Return from Handler <DaemonCore::HandleReqSocketHandler>
> > 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:06:52 ERROR "TOUCH_LOG_INTERVAL in the condor configuration is not an integer (3600 * 24).  Please set it to an integer in the range -2147483648 to 2147483647 (default 60)." at line 1331 in file ..\src\condor_c++_util\condor_config.C
> > 4/10 08:06:52 Sent SIGKILL to STARTD (pid 13404) and all it's children.
> > 4/10 08:06:53 Sent SIGKILL to SCHEDD (pid 18172) and all it's children.
> > 4/10 08:06:53 **** Condor (condor_MASTER) EXITING WITH STATUS 1
> > 
> > 
> > Any ideas why the different behavior?
> > 
> 
> Maybe the machines where it appeared to have succeeded have simply not
> (yet) attempted to fetch the value of TOUCH_LOG_INTERVAL?
> It is fetched on demand at run time.
> 
> Another idea: perhaps some machines in your pool are running 
> an older version of Condor that doesn't look at TOUCH_LOG_INTERVAL ?
> 
> regards,
> Todd
> 
> -- 
> Todd Tannenbaum                       University of Wisconsin-Madison
> Condor Project Research               Department of Computer Sciences
> tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
> 
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to 
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/
> 