
Re: [Condor-users] old version of condor_master keeps trying to start every 3 minutes



Thanks to everyone for the tips.
A quick fix was to use Dan's 'chmod a-x /opt/condor/sbin/condor_master',
while Garrett's 'chkconfig --list | fgrep condor' revealed a
'rocks-condor' service enabled at runlevels 3, 4 and 5.

I'll dig into this when time permits, but it looks like the problem's on
the way to resolution.
Now if only I could get my parallel jobs running ... but that's another
thread.

Thanks
Steve

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Koller, Garrett
Sent: 11 August 2011 20:32
To: Condor-Users Mail List
Subject: Re: [Condor-users] old version of condor_master keeps trying to
start every 3 minutes

Steven,

It seems that your Condor is running on Red Hat.  Red Hat uses
'chkconfig' to manage its services.  Try running 'chkconfig --list |
fgrep condor'.  If a Condor service is present (usually just called
"condor") and it is listed as "on" for any of the runlevels, the culprit
is probably this script that manages the "condor" service.  My guess is
that the said script is designed to wait three minutes and re-execute
condor_master if it exits for any reason.  Try fixing the
"/etc/init.d/condor" script to call the arguments properly, since that
is the problem in the first place (condor_master is printing the
"usage:" message because it was given an invalid argument).
Let me/us know what you find out.
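
A minimal sketch of that check, assuming the usual 'chkconfig --list'
column layout (service name followed by N:on / N:off fields). The helper
name enabled_condor_services is made up for illustration; it only filters
the listing and changes nothing on the system.

```shell
#!/bin/sh
# Sketch: list Condor-related services that chkconfig reports as "on".
# Assumes the SysV-style layout printed by `chkconfig --list`:
#   name  0:off  1:off  2:off  3:on  4:on  5:on  6:off
# enabled_condor_services is a hypothetical helper name.

enabled_condor_services() {
    # Reads `chkconfig --list` output on stdin.
    awk 'tolower($1) ~ /condor/ {
        levels = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                levels = levels (levels == "" ? "" : ",") substr($i, 1, 1)
            }
        }
        if (levels != "") print $1 ": on at runlevels " levels
    }'
}

# Run against the live system only where chkconfig exists.
if command -v chkconfig >/dev/null 2>&1; then
    chkconfig --list 2>/dev/null | enabled_condor_services
fi
```

Once a service shows up, 'grep -l condor_master /etc/init.d/*' should
point at the script that actually launches the binary.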

Best Regards,
 - Garrett
Washington and Lee University

________________________________________
From: condor-users-bounces@xxxxxxxxxxx
[condor-users-bounces@xxxxxxxxxxx] on behalf of Todd Tannenbaum
[tannenba@xxxxxxxxxxx]
Sent: Thursday, August 11, 2011 2:14 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] old version of condor_master keeps trying to
start every 3 minutes

Dan Bradley wrote:
> Steve,
>
> It really sounds like something on your system is configured to
> periodically start the master.
>
> If you can't turn that off, I suggest the following:
>
> chmod a-x /opt/condor/sbin/condor_master
>
> --Dan

And/or replace /opt/condor/sbin/condor_master with a shell script that
sleeps for days, and then at your leisure use ps (or pstree) to see what
process is the parent to determine where/what keeps invoking the old
version.
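
A rough sketch of such a stand-in, under some assumptions: the real 7.0.5
binary has been moved aside first, and the names write_standin,
STANDIN_LOG and STANDIN_SLEEP are invented here for illustration. The
generated script logs its parent PID and arguments, then sleeps (7 days
by default).

```shell
#!/bin/sh
# Sketch: write a stand-in for condor_master that records who started it.
# write_standin, STANDIN_LOG and STANDIN_SLEEP are hypothetical names.

write_standin() {
    # $1 = path to create, e.g. /opt/condor/sbin/condor_master
    # (move the real binary aside before overwriting it).
    cat > "$1" <<'EOF'
#!/bin/sh
LOG=${STANDIN_LOG:-/tmp/condor_master_standin.log}
# Record our PID, the parent PID and the arguments we were given.
echo "$(date) pid=$$ ppid=$PPID args=$*" >> "$LOG"
# Capture the parent's command line now, in case it exits while we sleep.
ps -o pid= -o ppid= -o args= -p "$PPID" >> "$LOG" 2>&1
# Idle so the invoker can be inspected with ps or pstree at leisure.
exec sleep "${STANDIN_SLEEP:-604800}"
EOF
    chmod 755 "$1"
}
```

While the stand-in sleeps, 'ps -ef | grep condor_master' shows its PPID
and 'pstree -p' shows the full ancestry; the log also records the
arguments, which may explain the "Usage:" message.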

Todd


>
> On 8/11/11 10:39 AM, Steven Platt wrote:
>> Hello,
>>
>> Taking Matt's advice I'm upgrading to 7.6.2 and everything's going well
>> ... until I had a look at my MasterLog on the master machine.
>>
>> Without any condor running (confirmed by 'ps aux | grep condor_') the
>> following pops up in the MasterLog every 3 minutes
>>
>> 8/11 16:21:44 ******************************************************
>> 8/11 16:21:44 ** condor_master (CONDOR_MASTER) STARTING UP
>> 8/11 16:21:44 ** /opt/condor/sbin/condor_master
>> 8/11 16:21:44 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
>> 8/11 16:21:44 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
>> 8/11 16:21:44 ** PID = 29397
>> 8/11 16:21:44 ** Log last touched 8/11 16:18:45
>> 8/11 16:21:44 ******************************************************
>> 8/11 16:21:44 Using config source: /home/condor/condor_config
>> 8/11 16:21:44 Using local config sources:
>> 8/11 16:21:44    /opt/condor/etc/condor_config.local
>> 8/11 16:21:44 DaemonCore: Command Socket at <xxx.xxx.xxx.xx:49852>
>> 8/11 16:21:44 Usage: /opt/condor/sbin/condor_master [-f] [-t] [-n name]
>> 8/11 16:21:44 **** condor_master (condor_MASTER) EXITING WITH STATUS 1
>>
>> With 7.6.2 successfully started it's this every 3 minutes...
>>
>> 8/11 16:06:44 ******************************************************
>> 8/11 16:06:44 ** condor_master (CONDOR_MASTER) STARTING UP
>> 8/11 16:06:44 ** /opt/condor/sbin/condor_master
>> 8/11 16:06:44 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
>> 8/11 16:06:44 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
>> 8/11 16:06:44 ** PID = 28695
>> 8/11 16:06:44 ** Log last touched 8/11 16:06:13
>> 8/11 16:06:44 ******************************************************
>> 8/11 16:06:44 Using config source: /home/condor/condor_config
>> 8/11 16:06:44 Using local config sources:
>> 8/11 16:06:44    /opt/condor/etc/condor_config.local
>> 8/11 16:06:44 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
>> 8/11 16:06:44 ERROR "Can't get lock on "/tmp/condor-lock.queen/InstanceLock"" at line 848 in file master.C
>>
>> I've trawled through configs & crontabs and have switched off our
>> monitoring service, all to no avail.
>> It could be a non-condor problem as we're running on a Rocks cluster
>> (v5.1) that was installed by someone ~3 years ago who's since left.
>>
>> What I do know is that this is probably the cause of the problems
>> reported earlier
>>
>> https://lists.cs.wisc.edu/archive/condor-users/2011-August/msg00037.shtml
>>
>> Has anyone come across anything similar?
>>
>> Thanks
>> Steve
>> -----------------------------------------
>>
>> **************************************************************************
>>
>> The information contained in the EMail and any attachments is
>> confidential and intended solely and for the attention and use of
>> the named addressee(s). It may not be disclosed to any other person
>> without the express authority of the HPA, or the intended
>> recipient, or both. If you are not the intended recipient, you must
>> not disclose, copy, distribute or retain this message or any part
>> of it. This footnote also confirms that this EMail has been swept
>> for computer viruses, but please re-sweep any attachments before
>> opening or saving. HTTP://www.HPA.org.uk
>>
>> **************************************************************************
>>
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/


--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685
