
Re: [Condor-users] "is not an integer" (in config file)



I'm 99.9% sure that all machines are using 7.0.1. On the problem
machines I looked back through the MasterLog file for the version
number logged at startup, and all were 7.0.1.

The pool has behaved oddly since everything was upgraded from 6.8.x
last week. The main problem is that our hyperthreaded SMP machines
still appear as 4 slots in total, even though
COUNT_HYPERTHREAD_CPUS = FALSE is set in condor_config.local:

slot1@xxxxxxxxxxxx WINNT51    INTEL  Owner     Idle     0.780   767  0+01:14:49
slot2@xxxxxxxxxxxx WINNT51    INTEL  Claimed   Busy     0.990   767  0+01:44:18
slot3@xxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000   767  0+00:02:03
slot4@xxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000   767  0+00:02:04

(VENICE is a two-CPU [not dual-core], hyperthreaded Wintel machine, so
with hyperthread CPUs not counted it should show 2 slots, not 4.)
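A rough model of the slot arithmetic, as a Python sketch. It assumes the default of one slot per detected CPU and two hardware threads per physical CPU; both are assumptions, and expected_slots is a hypothetical helper, not part of Condor.

```python
def expected_slots(physical_cpus: int, threads_per_cpu: int,
                   count_hyperthread_cpus: bool) -> int:
    """Slots expected under a one-slot-per-detected-CPU model (assumption)."""
    if count_hyperthread_cpus:
        # Every logical (hyperthread) CPU is counted as its own CPU.
        return physical_cpus * threads_per_cpu
    # Hyperthread siblings ignored: one CPU per physical processor.
    return physical_cpus


# VENICE: two physical CPUs, each assumed to expose two hardware threads.
print(expected_slots(2, 2, count_hyperthread_cpus=False))  # 2, what the setting should give
print(expected_slots(2, 2, count_hyperthread_cpus=True))   # 4, what condor_status shows
```

Under this model, seeing 4 slots suggests the hyperthread CPUs are still being counted despite the setting.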

Ralph Finch
916-653-7552


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Thursday, April 10, 2008 8:44 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] "is not an integer" (in config file)

Finch, Ralph wrote:
> condor 7.0.1 on all machines in a Wintel pool.
> 
> I'm getting different behavior on what should be identical machines.
> 
> In each machine's condor_config.local file I added the following line:
> 
> TOUCH_LOG_INTERVAL = 3600 * 24
> 
> I generally like to use a product, rather than the result, to make it 
> clearer (in this case, the touch log interval is a day long).
> 

Makes sense, but unfortunately not allowed in this specific case. 
Expressions like the above are valid in ClassAd expressions, and thus
are allowed in condor_config parameters that specify ClassAd
expressions (like Start, Suspend, Rank, etc.), but are typically not
allowed elsewhere.  Someday we hope to make this more consistent.
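To make the distinction concrete, here is a minimal Python sketch of an integer-only parameter lookup. It is a hypothetical model, not Condor's code: read_int_param and ConfigError are invented for illustration, while the default of 60 and the 32-bit range come from the error message quoted below. The practical consequence is to write the product out, e.g. TOUCH_LOG_INTERVAL = 86400 (that is, 3600 * 24), and keep the explanation in a comment.

```python
# Hypothetical sketch: an integer-only config lookup that rejects arithmetic
# expressions instead of evaluating them. NOT Condor's actual parser.

INT_MIN, INT_MAX = -2147483648, 2147483647  # range quoted in the error message


class ConfigError(Exception):
    """Raised when a parameter cannot be read as an integer."""


def read_int_param(config: dict, name: str, default: int = 60) -> int:
    """Return config[name] as a 32-bit integer, or default if it is unset."""
    raw = config.get(name)
    if raw is None:
        return default
    text = raw.strip()
    try:
        value = int(text)  # only a plain integer literal is accepted
    except ValueError:
        raise ConfigError(
            f"{name} in the condor configuration is not an integer ({text})."
        ) from None
    if not INT_MIN <= value <= INT_MAX:
        raise ConfigError(f"{name} = {value} is outside the 32-bit range.")
    return value


print(read_int_param({"TOUCH_LOG_INTERVAL": "86400"}, "TOUCH_LOG_INTERVAL"))  # 86400
try:
    read_int_param({"TOUCH_LOG_INTERVAL": "3600 * 24"}, "TOUCH_LOG_INTERVAL")
except ConfigError as err:
    print(err)
```

A ClassAd-valued parameter such as Start would instead be handed to an expression evaluator, which is why the same "3600 * 24" text works there.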

> After adding the line I copied the file to each machine in the pool 
> and issued condor_reconfig -all
> 
> Most machines accepted the change without problem: (masterlog)
> 
> 4/10 08:09:31 Reconfiguring all running daemons.
> 4/10 08:09:31 Sent signal 1 to STARTD (pid 7424)
> 4/10 08:09:31 Sent signal 1 to SCHEDD (pid 904)
> 4/10 08:09:31 Return from HandleReq <handle_reconfig()>
> 4/10 08:09:31 Return from Handler <DaemonCore::HandleReqSocketHandler>
> 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> 
> But some machines did not like the new line and died: (masterlog)
> 
> 4/10 08:03:05 Reconfiguring all running daemons.
> 4/10 08:03:05 Sent signal 1 to STARTD (pid 13404)
> 4/10 08:03:05 Sent signal 1 to SCHEDD (pid 18172)
> 4/10 08:03:05 Return from HandleReq <handle_reconfig()>
> 4/10 08:03:05 Return from Handler <DaemonCore::HandleReqSocketHandler>
> 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:06:52 ERROR "TOUCH_LOG_INTERVAL in the condor configuration is not an integer (3600 * 24).  Please set it to an integer in the range -2147483648 to 2147483647 (default 60)." at line 1331 in file ..\src\condor_c++_util\condor_config.C
> 4/10 08:06:52 Sent SIGKILL to STARTD (pid 13404) and all it's children.
> 4/10 08:06:53 Sent SIGKILL to SCHEDD (pid 18172) and all it's children.
> 4/10 08:06:53 **** Condor (condor_MASTER) EXITING WITH STATUS 1
> 
> 
> Any ideas why the behavior differs?
> 

Maybe the machines where it appeared to succeed have simply not (yet)
attempted to fetch the value of TOUCH_LOG_INTERVAL?  It is fetched on
demand at run time.
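The failing MasterLog is consistent with this: the reconfig at 08:03:05 went through cleanly, and the ERROR only appeared at 08:06:52, almost four minutes later. A minimal Python sketch of on-demand lookup shows how that can happen. LazyConfig and its methods are hypothetical names for illustration, not Condor's implementation.

```python
# Hypothetical model of on-demand parameter lookup: reconfig only stores the
# raw text, and a bad value is detected the first time it is requested.


class LazyConfig:
    def __init__(self) -> None:
        self._raw: dict = {}

    def reconfig(self, raw: dict) -> None:
        """Replace the raw settings; nothing is parsed or validated here."""
        self._raw = dict(raw)

    def get_int(self, name: str, default: int) -> int:
        """Parse the value on demand; a bad value fails only at this point."""
        text = self._raw.get(name)
        if text is None:
            return default
        try:
            return int(text)
        except ValueError:
            raise ValueError(f"{name} is not an integer ({text.strip()})") from None


cfg = LazyConfig()
cfg.reconfig({"TOUCH_LOG_INTERVAL": "3600 * 24"})  # succeeds: nothing parsed yet
# ...later, when the daemon first needs the interval:
try:
    cfg.get_int("TOUCH_LOG_INTERVAL", 60)
except ValueError as err:
    print("fatal:", err)
```

Under this model, two identical machines can differ only in whether the value has been requested yet.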

Another idea: perhaps some machines in your pool are running an older
version of Condor that doesn't look at TOUCH_LOG_INTERVAL?

regards,
Todd

-- 
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
