
Re: [HTCondor-users] HTCondor condor_status shows no available workers



It looks like the security settings are causing the condor_collector daemon to not trust the other daemons for network communications.

Can you send me the output of the following commands:
% condor_config_val FULL_HOSTNAME
% condor_config_val IP_ADDRESS
% condor_config_val ALLOW_WRITE
% condor_config_val ALLOW_ADMINISTRATOR
% condor_config_val COLLECTOR_HOST
% host `condor_config_val FULL_HOSTNAME`

And send me the contents of these files:
/var/log/condor/.collector_address
/var/log/condor/.master_address
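
The commands and files above can be gathered in one pass. This is only a minimal sketch, assuming a POSIX shell with condor_config_val and host on the PATH; report_cmd, report_file and collect_diagnostics are hypothetical helper names for illustration, not HTCondor tools.

```shell
#!/bin/sh
# Sketch: gather the diagnostics requested above into one report.
# report_cmd, report_file and collect_diagnostics are hypothetical helpers.

report_cmd() {
    # Print the command line, then its output (stderr included).
    echo "% $*"
    "$@" 2>&1 || echo "(command failed or is not installed)"
    echo
}

report_file() {
    # Print a file's contents, or a note if it cannot be read.
    echo "== $1"
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "(not readable)"
    fi
    echo
}

collect_diagnostics() {
    # The configuration values asked for above.
    for knob in FULL_HOSTNAME IP_ADDRESS ALLOW_WRITE ALLOW_ADMINISTRATOR COLLECTOR_HOST; do
        report_cmd condor_config_val "$knob"
    done
    # What the configured hostname resolves to.
    report_cmd host "$(condor_config_val FULL_HOSTNAME 2>/dev/null)"
    # The address files written by the daemons.
    report_file /var/log/condor/.collector_address
    report_file /var/log/condor/.master_address
}
```

Running `collect_diagnostics > condor-diag.txt` after sourcing the file would leave a single report that can be pasted into a reply.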

 -- Jaime Frey

On Feb 20, 2013, at 7:43 AM, Dan Coffey <dan_coffey@xxxxxxxxxxx> wrote:

Also, in looking through my logs, I am seeing the following error in the NegotiatorLog:

02/20/13 08:16:16 ---------- Started Negotiation Cycle ----------
02/20/13 08:16:16 Phase 1:  Obtaining ads from collector ...
02/20/13 08:16:16   Getting Scheduler, Submitter and Machine ads ...
02/20/13 08:16:16   Sorting 0 ads ...
02/20/13 08:16:16   Getting startd private ads ...
02/20/13 08:16:16 condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from collector at <127.0.1.1:9618>.
02/20/13 08:16:16 IO: Failed to read packet header
02/20/13 08:16:16 Couldn't fetch ads: communication error
02/20/13 08:16:16 Aborting negotiation cycle
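
The collector address in the error above, 127.0.1.1, lies in the IPv4 loopback range 127.0.0.0/8, which may be worth comparing against what the machine's hostname resolves to. The following is a minimal sketch for pulling that address out of such a log line and flagging loopback; it assumes the line format shown above, and collector_addr and is_loopback are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: extract the collector address from a log line and flag loopback.
# collector_addr and is_loopback are hypothetical helper names.

collector_addr() {
    # Extract the IP from "collector at <IP:PORT>" in the given line.
    printf '%s\n' "$1" | sed -n 's/.*collector at <\([0-9.]*\):[0-9]*>.*/\1/p'
}

is_loopback() {
    # Any IPv4 address in 127.0.0.0/8 is a loopback address.
    case "$1" in
        127.*) return 0 ;;
        *) return 1 ;;
    esac
}

line='02/20/13 08:16:16 condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from collector at <127.0.1.1:9618>.'
addr=$(collector_addr "$line")
if is_loopback "$addr"; then
    echo "$addr is a loopback address"
fi
```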



Dan


On Wed, Feb 20, 2013 at 8:21 AM, Dan Coffey <dan_coffey@xxxxxxxxxxx> wrote:
Hi Harshad and list,

I tried a few things:
- I did an apt-get purge and re-installed to be sure I got new configuration files
- I restarted condor to make sure that it was running all the daemons
- I let watch condor_status run all night

Still no workers showed up. Keep in mind that I touched nothing in the config file.

I removed it again and changed the source from "deb http://research.cs.wisc.edu/htcondor/debian/development/ lenny contrib" to "deb http://research.cs.wisc.edu/htcondor/debian/stable/ squeeze contrib"

Same result.

I am running this on a Dell PowerEdge 1950 server.  OS is Ubuntu 12.04 LTS.

Thanks for any advice!

Dan


On Wed, Feb 20, 2013 at 6:23 AM, Harshad Prajapati <harshad.b.prajapati@xxxxxxxxx> wrote:
Dan Coffey:

I had experienced the same problem.

For a default personal Condor installation, you do not need to touch any configuration files.

As per my understanding:

condor_status takes some time to report machines because of the delay before the daemons report to the collector.

So wait a little while. The machine should be listed in the output within 2 minutes at most. You can keep checking by running the following command:
watch condor_status

With regards,
Harshad Prajapati


On Wed, Feb 20, 2013 at 11:14 AM, Dan Coffey <dan_coffey@xxxxxxxxxxx> wrote:
Hello,

I am just getting started with HTCondor.  I am running Ubuntu 12.04 LTS and have installed via apt-get with the source "http://research.cs.wisc.edu/htcondor/debian/stable/ lenny contrib"

This yielded HTCondor version:
7.8.7 Dec 12 2012 BuildID: 86173 $
$CondorPlatform: x86_64_deb_5.0 $

I have done my best to touch as little as possible. To show myself that condor is running:

$ pstree | grep condor
     |-condor_master-+-condor_collecto
     |               |-condor_negotiat
     |               |-condor_schedd---condor_procd
     |               `-condor_startd

The issue is that when I run condor_status, my command prompt simply returns.  No workers are listed at all, no information is printed.

I checked that the daemons are correctly specified in the config file; as best I can tell, it is set up correctly.

Please let me know if you have any advice!

Thank you,

Dan



Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project