
Re: [HTCondor-users] I can not run condor_master on the 2nd node



Quoting Todd Tannenbaum <tannenba@xxxxxxxxxxx>:


Hi Rene,

Couple quick thoughts -

1. First kill all condor processes on the 2nd node; from the logs below it looks like you may have had two instances of condor_master running, and they were conflicting with each other. Be certain that
  ps -auxw | grep condor
does not show anything.
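
A minimal sketch of the same check using pgrep, which matches on process names and so does not list the grep command itself. The helper name check_procs is hypothetical, and the sketch assumes pgrep (from procps) is installed on the node:

```shell
# check_procs PATTERN: succeed only when no process name matches PATTERN.
# (Hypothetical helper; relies on pgrep from procps being available.)
check_procs() {
    # pgrep -l prints "PID name" for each match and exits 0 if any match.
    if pgrep -l "$1"; then
        echo "processes matching '$1' are still running" >&2
        return 1
    fi
    echo "no processes matching '$1' found"
}

check_procs condor || echo "stop these before restarting condor_master"
```

If any PIDs are listed, they would need to be stopped before condor_master is started again.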

2. At least on my nodes (SL6), the user "condor" has ownership and write permissions (not just root) to the lock and run directories; behold:

  $ ls -ld /var/run/condor
  drwxr-xr-x 2 condor condor 4096 Feb 26 15:54 /var/run/condor
  $ ls -ld /var/lock/condor
  drwxr-xr-x 4 condor condor 4096 Feb 26 16:17 /var/lock/condor


Thanks very much for the help, Todd. I killed all condor processes, then changed the owner and group of those folders to condor, and restarted condor_master.

Everything is running now: condor_master and condor_startd under user condor, and condor_procd under user root. I hope that is OK.

Finally, I can see all cores available.

3. You may want to consider using the HTCondor Debian Repository, as it tends to be updated w/ bug fixes etc faster than downstream distros. See https://research.cs.wisc.edu/htcondor/debian/

HTCondor was installed earlier by my professor. I have not spoken with him yet, but I do not think he would use anything other than the standard distro installation.

One more time, thanks a lot to all condor users who helped me.

Regards,
Rene


Hope the above helps
Todd