
Re: [Condor-users] Bug



Hello all,


In the end we had to change the scheduler. This means that we are no longer
using Condor, as Condor is not compatible with the current release of MPICH.
I also want to point out that the latest distribution of ROCKS cluster we are
using still provides a Condor scheduler that is still not compatible
(according to the manual) with the latest MPICH.
This is rather strange, especially since the older versions of MPICH are not
so easy to find.


Regards,

Martin Lukac

Quoting Erik Paulson <epaulson@xxxxxxxxxxx>:

> On Mon, Oct 18, 2004 at 11:41:29AM -0700, lukacm@xxxxxxx wrote:
> > Hello,
> > 
> > can someone help me find out how I can debug the fact that my MPI jobs
> > go directly to the Idle state no matter how the cluster is configured
> > with Condor? There is no log, and the only difference between a
> > successful Vanilla job and an idle MPI job is in the NegotiatorLog.
> > For Vanilla I have:
> > Phase 4.1:  Negotiating with schedds ...
> > 10/18 18:36:01   Negotiating with selvan@local at <10.1.1.1:41547>
> > 10/18 18:36:01     Request 00443.00000:
> > 10/18 18:36:01       Matched 443.0 selvan@local <10.1.1.1:41547> preempting none <10.255.255.254:32810>
> > 10/18 18:36:01       Successfully matched with compute-0-0.local
> > 10/18 18:36:01     Got NO_MORE_JOBS;  done negotiating
> > 
> > However, for MPI the log stops at:
> > Phase 4.1:  Negotiating with schedds ...
> > 
> > All machines are unclaimed.
> > 
> > thank you
> > 
> > martin lukac
> 
> What does the schedd log say when you submit a job?
> 
> -Erik
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> http://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
>