Re: [Condor-users] Jobs remaining Idle
- Date: Mon, 22 May 2006 11:10:59 -0600
- From: rnayar@xxxxxxxx
- Subject: Re: [Condor-users] Jobs remaining Idle
Hey buddy, I don't know if this will help you, but "failing to connect" could be
the result of your firewall not being configured properly. I had a little run-in
with this problem when I was getting Condor to work.
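One quick way to check the firewall theory is to test whether a plain TCP connection to the collector port succeeds from the other machine. Your CollectorLog shows errno = 10060 on port 9618; 10060 is the Winsock "connection timed out" code, which is what a firewall silently dropping packets tends to look like. Below is a minimal Python sketch, not anything that ships with Condor. The function name can_connect and the script name are made up for illustration, and you would pass in your own host and port.

```python
import socket
import sys


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of Condor.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name is an assumption): python check_port.py <host> <port>
    target_host = sys.argv[1]
    target_port = int(sys.argv[2])
    if can_connect(target_host, target_port):
        print("TCP connection succeeded")
    else:
        print("TCP connection failed; a firewall may be blocking this port")
```

Run it from the Windows Server 2003 box against the central manager on port 9618, and then in the other direction against whatever port the schedd reports. A failure only tells you the port is unreachable from that machine. It does not prove the firewall is the cause, but it narrows things down.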
Quoting "Shaun J. O'Callaghan" <Shaun.OCallaghan@xxxxxxxxxxxx>:
> Hi there,
> Firstly, apologies if this has been dealt with in this list already.
> I've searched through this list, and the docs, and don't seem to be able
> to find an answer.
> I'm running a test Condor pool at the moment. I have a Windows XP
> machine (the master server) and a Windows Server 2003 machine (the only
> other machine in the pool).
> I've written a test application, a 'hello world' app, in C just to
> demonstrate that jobs actually get executed and run ok. However, the
> jobs are queued and then appear to run briefly before entering the
> "Idle" state which is where they stay. I submit the job from the
> Windows Server 2003 machine to the pool.
> The submit file is as follows:
> executable = condortestapp.exe
> universe = vanilla
> Requirements = (OpSys == "WINNT50") || (OpSys == "WINNT51") || (OpSys ==
> error = error.output
> output = out.output
> Negotiator.log has the following line:
> 5/22 16:42:32 DC_AUTHENTICATE: attempt to open invalid session
> GEOG41:2204:1148048993:2, failing.
> CollectorLog.log has the following:
> 5/22 16:42:08 (Sent 7 ads in response to query)
> 5/22 16:42:08 Got QUERY_STARTD_PVT_ADS
> 5/22 16:42:08 (Sent 2 ads in response to query)
> 5/22 16:42:32 SubmittorAd : Inserting ** "<
> Administrator@xxxxxxxxxxxxxxxxxx , xxx.xxx.xxx.xxx >"
> 5/22 16:42:32 stats: Inserting new hashent for
> 'Submittor':'Administrator@xxxxxxxxxxxxxxxxxx:' xxx.xxx.xxx.xxx'
> 5/22 16:42:49 Got QUERY_SCHEDD_ADS
> 5/22 16:42:49 (Sent 1 ads in response to query)
> 5/22 16:46:44 Housekeeper: Ready to clean old ads
> 5/22 16:46:44 Cleaning StartdAds ...
> 5/22 16:46:44 Cleaning StartdPrivateAds ...
> 5/22 16:46:44 Cleaning ScheddAds ...
> 5/22 16:46:44 Cleaning SubmittorAds ...
> 5/22 16:46:44 Cleaning LicenseAds ...
> 5/22 16:46:44 Cleaning MasterAds ...
> 5/22 16:46:44 Cleaning CkptServerAds ...
> 5/22 16:46:44 Cleaning CollectorAds ...
> 5/22 16:46:44 Cleaning StorageAds ...
> 5/22 16:46:44 Housekeeper: Done cleaning
> 5/22 16:46:48 Can't connect to < xxx.xxx.xxx.xxx:9618>:0, errno = 10060
> 5/22 16:46:48 Will keep trying for 10 seconds...
> 5/22 16:46:57 Connect failed for 10 seconds; returning FALSE
> 5/22 16:46:57 ERROR:
> SECMAN:2003:TCP connection to <xxx.xxx.xxx.xxx:9618> failed
> 5/22 16:46:57 Can't send UPDATE_COLLECTOR_AD to collector
> (condor.cs.wisc.edu): Failed to send UDP update command to collector
> 5/22 16:47:09 (Sent 8 ads in response to query)
> 5/22 16:47:09 Got QUERY_STARTD_PVT_ADS
> 5/22 16:47:09 (Sent 2 ads in response to query)
> condor_q -analyze gives the following output from the Windows Server
> 2003 machine:
> 011.000: Run analysis summary. Of 2 machines,
> 0 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 0 match, but are serving users with a better priority in the pool
> 2 match, match, but reject the job for unknown reasons
> 0 match, but will not currently preempt their existing job
> 0 are available to run your job
> Last successful match: Mon May 22 16:47:10 2006
> 1 jobs; 1 idle, 0 running, 0 held
> condor_q -global gives the following output from the Windows XP machine
> (central server):
> -- Failed to fetch ads from: <xxx.xxx.xxx.xxx:12566> :
> internaldomain.com (IP of Windows Server 2003)
> If anybody can shed any light on why these jobs are remaining idle,
> that'd be great. I'm sure it's a pretty straightforward error, but I
> just can't seem to put my finger on it.
> Thanks in advance,
> Shaun James O'Callaghan