
Re: [Condor-users] Problem running mpi job on Condor 6.7.5 Feb 28 2005, I386-LINUX_RH9



Good morning to you too.

Thanks for your answer. The hosts are rebooted every day, because this Condor
setup is a special one that uses Linux remote boot.

On Wed, Apr 20, 2005 at 03:33:43PM -0500, Greg Thain wrote:
> Thanks for your complete logs -- this really helps us debug this kind
> of thing.  The key line in the log is this:
> 
> > Found 0 potential dedicated resources
> 
> This means that, despite your setup, the dedicated scheduler has not
> found any of the machines you have dedicated to it.  Did you restart the
> startds after changing their configuration?

I have enabled full debug output for the startds, and here is the output when I
submit the MPI job:

reconfig:
4/21 10:06:15 Got SIGHUP.  Re-reading config files.
4/21 10:06:15 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
4/21 10:06:15 Will use UDP to update collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:06:15 UidDomain = "ben.tuwien.ac.at"
4/21 10:06:15 FileSystemDomain = "ben.tuwien.ac.at"
4/21 10:06:15 Subnet = "193.170.74"
4/21 10:06:15 Swap space: 0
4/21 10:06:15 186015992 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:06:15 Looking up RESERVED_DISK parameter
4/21 10:06:15 Reserving 5120 kbytes for file system
4/21 10:06:15 Disk space: 186010872
4/21 10:06:16 MainConfig finish
4/21 10:06:16 CronMgr: Doing config (reconfig)
4/21 10:06:16 DaemonCore: in SendAliveToParent()
4/21 10:06:16 DaemonCore: attempting to connect to '<193.170.74.30:32768>'
4/21 10:06:16 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
4/21 10:06:16 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:06:16 DaemonCore: No more children processes to reap.
4/21 10:06:20 Trying to update collector <193.170.74.44:9618>
4/21 10:06:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:06:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:06:20 Sent update to 1 collector(s)
4/21 10:07:35 Getting monitoring info for pid 4231
4/21 10:08:16 Swap space: 0
4/21 10:08:16 186015992 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:08:16 Looking up RESERVED_DISK parameter
4/21 10:08:16 Reserving 5120 kbytes for file system
4/21 10:08:16 Disk space: 186010872
4/21 10:08:20 Trying to update collector <193.170.74.44:9618>
4/21 10:08:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:08:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:08:20 Sent update to 1 collector(s)


submission:
4/21 10:10:16 Swap space: 0
4/21 10:10:16 186014528 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:10:16 Looking up RESERVED_DISK parameter
4/21 10:10:16 Reserving 5120 kbytes for file system
4/21 10:10:16 Disk space: 186009408
4/21 10:10:20 Trying to update collector <193.170.74.44:9618>
4/21 10:10:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:10:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:10:20 Sent update to 1 collector(s)
4/21 10:11:35 Getting monitoring info for pid 4231
4/21 10:12:16 Swap space: 0
4/21 10:12:16 186014520 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:12:16 Looking up RESERVED_DISK parameter
4/21 10:12:16 Reserving 5120 kbytes for file system
4/21 10:12:16 Disk space: 186009400
4/21 10:12:20 Trying to update collector <193.170.74.44:9618>
4/21 10:12:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:12:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:12:20 Sent update to 1 collector(s)
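
In case it helps anyone reading along, here is a minimal sketch of how the
excerpts above could be summarized. It is not part of Condor:
summarize_startlog is a hypothetical helper, and it assumes every log entry
starts with a "month/day hh:mm:ss" timestamp, as in the output above. It only
reports when a reconfig happened and when updates were sent to the collector;
it says nothing about which attributes the startd advertised.

```python
import re

# Assumed entry format: "4/21 10:06:15 message text"
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) (.*)$")


def summarize_startlog(text):
    """Collect timestamps of reconfigs and collector updates (hypothetical helper)."""
    reconfigs, updates = [], []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # no timestamp: blank line or continuation of a wrapped entry
        stamp, message = match.groups()
        if message.startswith("Got SIGHUP"):
            reconfigs.append(stamp)
        elif message.startswith("Sent update to"):
            updates.append(stamp)
    return {"reconfigs": reconfigs, "updates": updates}


# A few entries copied from the startd output above.
sample = """\
4/21 10:06:15 Got SIGHUP.  Re-reading config files.
4/21 10:06:20 Sent update to 1 collector(s)
4/21 10:08:20 Sent update to 1 collector(s)
"""

summary = summarize_startlog(sample)
print(summary)
```

Run against the full log, this would show one reconfig at 10:06:15 and an
update to the collector every two minutes, both before and after the
submission.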


Thanks for your help.

Philipp Kolmann

-- 
If you have problems in Windows: REBOOT
If you have problems in Linux:   BE ROOT