
Re: [Condor-users] "is not an integer" (in config file)



Finch, Ralph wrote:
condor 7.0.1 on all machines in a Wintel pool.

I'm getting different behavior on what should be identical machines.

In each machine's condor_config.local file I added the following line:

TOUCH_LOG_INTERVAL = 3600 * 24

I generally like to use a product, rather than the result, to make it clearer (in this case, the touch log interval is a day long).


Makes sense, but unfortunately that is not allowed in this specific case. Expressions like the one above are allowed in ClassAd expressions, and thus in condor_config parameters that specify ClassAd expressions (like Start, Suspend, Rank, etc.), but are typically not allowed elsewhere. Someday we hope to make this better and more consistent.
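For a plain integer setting such as TOUCH_LOG_INTERVAL, one workaround that keeps the "product" readable is to write the computed value (3600 * 24 = 86400) and put the arithmetic in a comment. This is a sketch that assumes the comment sits on its own line, since text after the value on the same line may be read as part of the value rather than as a comment:

```
# One day: 3600 seconds * 24 hours
TOUCH_LOG_INTERVAL = 86400
```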

After adding the line, I copied the file to each machine in the pool and issued condor_reconfig -all

Most machines accepted the change without problem: (masterlog)

4/10 08:09:31 Reconfiguring all running daemons.
4/10 08:09:31 Sent signal 1 to STARTD (pid 7424)
4/10 08:09:31 Sent signal 1 to SCHEDD (pid 904)
4/10 08:09:31 Return from HandleReq <handle_reconfig()>
4/10 08:09:31 Return from Handler <DaemonCore::HandleReqSocketHandler>
4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>

But some machines did not like the new line and died: (masterlog)

4/10 08:03:05 Reconfiguring all running daemons.
4/10 08:03:05 Sent signal 1 to STARTD (pid 13404)
4/10 08:03:05 Sent signal 1 to SCHEDD (pid 18172)
4/10 08:03:05 Return from HandleReq <handle_reconfig()>
4/10 08:03:05 Return from Handler <DaemonCore::HandleReqSocketHandler>
4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
4/10 08:06:52 ERROR "TOUCH_LOG_INTERVAL in the condor configuration is
not an integer (3600 * 24).  Please set it to an integer in the range
-2147483648 to 2147483647 (default 60)." at line 1331 in file
..\src\condor_c++_util\condor_config.C
4/10 08:06:52 Sent SIGKILL to STARTD (pid 13404) and all it's children.
4/10 08:06:53 Sent SIGKILL to SCHEDD (pid 18172) and all it's children.
4/10 08:06:53 **** Condor (condor_MASTER) EXITING WITH STATUS 1


Any ideas why the different behavior?


Maybe the machines where it appeared to succeed simply have not (yet) attempted to fetch the value of TOUCH_LOG_INTERVAL? It is fetched on demand at run time.
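The logs are consistent with this: on the failing machine the error appears at 08:06:52, almost four minutes after the reconfig at 08:03:05. The timing difference can be illustrated with a small Python model. It is a hypothetical sketch, not Condor's code: the class LazyConfig and its method get_int are invented for illustration. The sketch assumes only that the raw text is stored at reconfig time and is parsed as a plain integer, within the 32-bit range quoted in the error message, the first time a daemon reads it.

```python
import re

# 32-bit signed range, matching the range quoted in the ERROR message.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


class LazyConfig:
    """Hypothetical model: values are stored as raw text, validated on lookup."""

    def __init__(self, raw):
        # No validation here, so a "reconfig" with a bad value still succeeds.
        self.raw = dict(raw)

    def get_int(self, name, default):
        text = self.raw.get(name)
        if text is None:
            return default
        # Strict parse: optional sign and digits only; arithmetic is rejected.
        if not re.fullmatch(r"[+-]?\d+", text.strip()):
            raise ValueError(f"{name} in the configuration is not an integer ({text}).")
        value = int(text)
        if not INT_MIN <= value <= INT_MAX:
            raise ValueError(f"{name} is out of range ({text}).")
        return value


if __name__ == "__main__":
    cfg = LazyConfig({"TOUCH_LOG_INTERVAL": "3600 * 24"})  # accepted at load time
    try:
        cfg.get_int("TOUCH_LOG_INTERVAL", 60)  # fails only on the first lookup
    except ValueError as err:
        print("error:", err)
```

In this model, two machines with identical files behave differently until each one actually performs the lookup, which is the pattern seen in the two master logs.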

Another idea: perhaps some machines in your pool are running an older version of Condor that doesn't look at TOUCH_LOG_INTERVAL?

regards,
Todd

--
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257