
[Condor-users] condor & RH4



We recently upgraded to Red Hat Enterprise Linux 4 and are now seeing some new problems with our Condor system. Condor apparently can no longer determine on its own how much memory and swap space is available on the upgraded machines. Example log files:

StartLog:

3/23 10:35:53 ** $CondorPlatform: I386-LINUX_RH9 $
3/23 10:35:53 ** PID = 30532
3/23 10:35:53 ******************************************************
3/23 10:35:53 Using config file: /users/condor/condor_config
3/23 10:35:53 Using local config files: /net/condor/etc/rime.local
3/23 10:35:53 DaemonCore: Command Socket at <128.95.99.136:33656>
3/23 10:35:53 Error computing physical memory with calc_phys_mem().
                MEMORY parameter not defined in config file.
                Try setting MEMORY to the number of megabytes of RAM.
3/23 10:35:53 ERROR "Can't compute physical memory." at line 60 in file ResAttributes.C

Note that before the upgrade, the MEMORY parameter was not set for any of the machines, and everything worked fine. I have worked around the problem by adding a MEMORY line to all the .local files, but I don't really like this band-aid solution.
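
For illustration, a rough sketch of how such a MEMORY line could be generated per machine, assuming the kernel reports MemTotal in kB in /proc/meminfo. The function names mem_total_mb and memory_config_line are hypothetical, and this is not a description of what calc_phys_mem() does internally; it only produces the value in megabytes that the error message asks for.

```python
import re


def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole megabytes."""
    # /proc/meminfo reports the value in kB, e.g. "MemTotal:  2074312 kB".
    match = re.search(r"^MemTotal:\s+(\d+)\s*kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1)) // 1024


def memory_config_line(meminfo_text):
    """Format the value as a line for a Condor local config file."""
    return "MEMORY = %d" % mem_total_mb(meminfo_text)
```

On a Linux host, passing the contents of /proc/meminfo to memory_config_line() yields a line that can be pasted into the machine's .local file.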

SchedLog:

3/23 17:15:07 Sent ad to central manager for stinson@xxxxxxxxxxxxxxxxxxxx
3/23 17:15:33 Activity on stashed negotiator socket
3/23 17:15:33 Negotiating for owner: stinson@xxxxxxxxxxxxxxxxxxxx
3/23 17:15:33 Checking consistency running and runnable jobs
3/23 17:15:33 Tables are consistent
3/23 17:15:33 Swap space estimate reached! No more jobs can be run!
3/23 17:15:33 Solution: get more swap space, or set RESERVED_SWAP = 0

As a result, jobs sit idle in the queue forever. They are submitted fine, but the above messages repeat over and over and the jobs never actually run.
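
To check what the kernel itself reports for swap on an affected machine, a minimal sketch along these lines could be used, assuming /proc/meminfo carries SwapTotal and SwapFree in kB. The function name swap_mb is hypothetical, and nothing here reflects how Condor computes its own swap estimate; it is only a way to see the numbers the OS exposes.

```python
import re


def swap_mb(meminfo_text):
    """Return (total, free) swap in whole megabytes from /proc/meminfo-style text."""
    values = {}
    for key in ("SwapTotal", "SwapFree"):
        # Each field appears on its own line, e.g. "SwapTotal:  4192956 kB".
        match = re.search(r"^%s:\s+(\d+)\s*kB" % key, meminfo_text, re.MULTILINE)
        if match is None:
            raise ValueError("%s not found in meminfo text" % key)
        values[key] = int(match.group(1)) // 1024
    return values["SwapTotal"], values["SwapFree"]
```

If the kernel shows plenty of free swap while the schedd still logs "Swap space estimate reached!", that would suggest the estimate rather than the machine is at fault.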

I suspect that this issue has already been addressed; if so, I apologize that I wasn't able to find the thread. Any input is much appreciated. Thanks,


Rok