
Re: [HTCondor-users] No Starter found to Run



I've seen this after editing the condor_config file and accidentally deleting the variable that specifies the condor_starter binary. It is worth checking that your condor_config still points to the correct starter binary.
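
One way to run that check is sketched below. It assumes the variable in question is STARTER and that the condor_config_val tool is on the PATH of the compute node; the function names are made up for this example and are not part of HTCondor.

```python
import os
import shutil
import subprocess
from typing import Optional


def query_starter_path() -> Optional[str]:
    """Ask HTCondor which binary STARTER resolves to; None if it cannot be resolved."""
    if shutil.which("condor_config_val") is None:
        return None  # HTCondor command-line tools are not on PATH
    result = subprocess.run(
        ["condor_config_val", "STARTER"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None  # assumption: a non-zero exit means STARTER is not defined
    return result.stdout.strip() or None


def starter_problem(path: Optional[str]) -> Optional[str]:
    """Describe what looks wrong with the starter path, or return None if it looks usable."""
    if not path:
        return "STARTER could not be resolved (undefined, or condor_config_val is not on PATH)"
    if not os.path.isfile(path):
        return f"STARTER points to {path}, which is not a file"
    if not os.access(path, os.X_OK):
        return f"{path} exists but is not executable"
    return None


if __name__ == "__main__":
    print(starter_problem(query_starter_path()) or "STARTER looks fine")
```

Run it on the compute node that logs the error, since that is where the startd looks for the starter.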

On 18.06.2013 at 18:39, "Vishal Shah" <vishal.b.shah@xxxxxxxxxxx> wrote:
Steve,

I changed the logging to D_FULLDEBUG and the following is what I found in the StartLog:

06/18/13 16:32:01 slot1: Received match <10.144.6.164:9226>#1371570586#147#...
06/18/13 16:32:01 slot1: Started match timer (605) for 120 seconds.
06/18/13 16:32:01 slot1: State change: match notification protocol successful
06/18/13 16:32:01 slot1: Changing state: Unclaimed -> Matched
06/18/13 16:32:01 slot1: Canceled match timer (605)
06/18/13 16:32:01 slot1: Schedd addr = <10.144.6.174:9943>
06/18/13 16:32:01 slot1: Alive interval = 300
06/18/13 16:32:01 slot1: Received ClaimId from schedd (<10.144.6.164:9226>#1371570586#147#...)
06/18/13 16:32:01 slot1: No starter found to run this job!  Is something wrong with your Condor installation?
06/18/13 16:32:01 slot1: Request to claim resource refused.
06/18/13 16:32:01 slot1: State change: claiming protocol failed
06/18/13 16:32:01 slot1: Changing state: Matched -> Owner
06/18/13 16:32:01 slot1: State change: IS_OWNER is false
06/18/13 16:32:01 slot1: Changing state: Owner -> Unclaimed
06/18/13 16:32:02 Getting monitoring info for pid 23777
06/18/13 16:32:03 Getting monitoring info for pid 23777
06/18/13 16:32:04 Getting monitoring info for pid 23777
06/18/13 16:32:05 Publishing ClassAd for 'mips' to slot 1
06/18/13 16:32:05 Publishing ClassAd for 'kflops' to slot 1
06/18/13 16:32:05 Trying to update collector <10.144.6.174:9618>
06/18/13 16:32:05 Attempting to send update via UDP to collector mrg_node.espc.nesdis.noaa.gov <10.144.6.174:9618>
06/18/13 16:32:05 slot1: Sent update to 1 collector(s)

There does not seem to be any more pertinent information about the issue; however, you may see something that I do not. One interesting thing to note is that the StarterLog is empty.
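
For longer logs, the pattern in the excerpt above can be pulled out with a small script. This is only a sketch: it assumes the StartLog line format shown in this message, and the function name and command-line handling are invented for the example.

```python
import re
import sys
from typing import List, Tuple

# Matches lines such as "... slot1: Changing state: Unclaimed -> Matched"
STATE_CHANGE = re.compile(r"(slot\w+): Changing state: (\w+) -> (\w+)")
NO_STARTER = "No starter found to run this job"


def summarize(lines: List[str]) -> Tuple[List[Tuple[str, str, str]], int]:
    """Return (slot, old_state, new_state) transitions and the count of 'no starter' errors."""
    transitions = []
    failures = 0
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            transitions.append(match.groups())
        elif NO_STARTER in line:
            failures += 1
    return transitions, failures


if __name__ == "__main__":
    # The log path is passed as an argument; its location depends on the installation.
    with open(sys.argv[1]) as log:
        transitions, failures = summarize(log.readlines())
    for slot, old, new in transitions:
        print(f"{slot}: {old} -> {new}")
    print(f"'No starter found' errors: {failures}")
```

If every Unclaimed -> Matched transition is followed by this error and a return to Unclaimed, the claim is being refused before any starter is launched, which would be consistent with the empty StarterLog.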

Thanks,
Vishal



Date: Tue, 18 Jun 2013 09:28:03 -0500
From: timm@xxxxxxxx
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] No Starter found to Run

Look at the StartLog on the compute nodes; that will give
you some more clues. If you don't see anything,
set the logging of the startd and starter to D_FULLDEBUG.
I've never seen this problem happen before, but it should be possible
to figure out with the right debug level.

Steve Timm


On Tue, 18 Jun 2013, Vishal Shah wrote:

> Hello,
>
> I am having an issue configuring condor to run jobs. Currently there is a
> node that acts as a head node from which the jobs are submitted and where
> all of the nodes in the compute unit can be seen; however, when submitting
> a job, the logs in the head node indicate that the job has been submitted,
> but the job does not run. The queue shows that the job has been submitted;
> however the state is perpetually pending. The following is a snippet of
> the StartLog on the compute node:
>
> 06/18/13 14:07:19 slot1: Received match
> <10.144.6.164:9532>#1371557953#351#...
> 06/18/13 14:07:19 slot1: State change: match notification protocol
> successful
> 06/18/13 14:07:19 slot1: Changing state: Unclaimed -> Matched
> 06/18/13 14:07:19 slot1: No starter found to run this job!  Is something
> wrong with your Condor installation?
> 06/18/13 14:07:19 slot1: Request to claim resource refused.
> 06/18/13 14:07:19 slot1: State change: claiming protocol failed
> 06/18/13 14:07:19 slot1: Changing state: Matched -> Owner
> 06/18/13 14:07:19 slot1: State change: IS_OWNER is false
> 06/18/13 14:07:19 slot1: Changing state: Owner -> Unclaimed
>
> Does anybody have insight into this issue?
>
> Thanks,
> Vishal
>
>

------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm@xxxxxxxx http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.

_______________________________________________ HTCondor-users mailing list To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/
