
Re: [Condor-users] Bug



Hello,

The SchedLog says this:
10/18 19:02:12 Using config file: /opt/condor/etc/condor_config
10/18 19:02:12 Using local config files: /opt/condor/local.frontend-0/condor_config.local
10/18 19:02:12 DaemonCore: Command Socket at <10.1.1.1:42769>
10/18 19:02:12 "/opt/condor/sbin/condor_shadow.pvm -classad" did not produce any output, ignoring
10/19 03:02:12 DaemonCore: Command received via UDP from host <10.1.1.1:33185>
10/19 03:02:12 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
10/19 15:00:27 DaemonCore: Command received via UDP from host <10.1.1.1:33242>
10/19 15:00:27 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
10/19 15:00:28 Sent ad to central manager for selvan@local
10/19 15:00:28 Called reschedule_negotiator()

The problem is that I cannot find any useful information in it about why the
MPI job does not start. All activity seems to freeze somewhere for MPI jobs,
while Vanilla jobs work fine.

Thank you,

martin

Quoting Erik Paulson <epaulson@xxxxxxxxxxx>:

> On Mon, Oct 18, 2004 at 11:41:29AM -0700, lukacm@xxxxxxx wrote:
> > Hello,
> > 
> > can someone help me find out how I can debug the fact that my MPI jobs go
> > directly to the Idle state no matter how the cluster is configured with
> > Condor? There is no log, and the only difference between a successful
> > Vanilla job and an idle MPI job is in the NegotiatorLog.
> > For Vanilla I have:
> > Phase 4.1:  Negotiating with schedds ...
> > 10/18 18:36:01   Negotiating with selvan@local at <10.1.1.1:41547>
> > 10/18 18:36:01     Request 00443.00000:
> > 10/18 18:36:01       Matched 443.0 selvan@local <10.1.1.1:41547> preempting none <10.255.255.254:32810>
> > 10/18 18:36:01       Successfully matched with compute-0-0.local
> > 10/18 18:36:01     Got NO_MORE_JOBS;  done negotiating
> > 
> > However, for MPI the log stops at:
> > Phase 4.1:  Negotiating with schedds ...
> > 
> > All machines are unclaimed.
> > 
> > Thank you,
> > 
> > martin lukac
> 
> What does the schedd log say when you submit a job?
> 
> -Erik