
Re: [HTCondor-users] Job Scheduling





> Date: Sun, 2 Jun 2013 18:09:24 +0100
> From: b.candler@xxxxxxxxx
> To: htcondor-users@xxxxxxxxxxx
> CC: muakrules@xxxxxxxx
> Subject: Re: [HTCondor-users] Job Scheduling
>
> On 02/06/2013 10:52, Usman Khan wrote:
> > On 06/02/2013 12:58 PM, Brian Candler wrote:
> >> On 01/06/2013 18:32, Muak rules wrote:
> >>> I just did what you asked me to do.
> >>> Only one worker node is showing, but on the worker node itself the
> >>> queue was not showing.
> >> What do you mean by "not showing queue"?
> > I mean the queue was not showing on the worker node.
> And what does that mean?? What command did you type, on which machine,
> and what response did you see?

I ran condor_q on the worker node and the queue it showed was empty. All I understood from this is that the worker node will not show whether a job is executing on it.

> > How can I find out whether a job is running on a worker machine if I
> > don't have any access to the master node?
> The master node is where you run condor_submit to queue a job, condor_q
> to examine the queue, condor_status to look at what the worker machines
> are doing.
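
As a rough sketch of the workflow described above, run on the master node: the file name sleep.sub and the sleep job are made up for illustration, and the small run helper is a hypothetical convenience that only prints a command when the HTCondor tools are not installed.

```shell
#!/bin/sh
# Sketch only: sleep.sub and the job it describes are illustrative assumptions.

# Run a command if it is installed; otherwise just print what would be run.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# A minimal vanilla-universe submit description file.
cat > sleep.sub <<'EOF'
universe   = vanilla
executable = /bin/sleep
arguments  = 60
output     = sleep.out
error      = sleep.err
log        = sleep.log
queue
EOF

run condor_submit sleep.sub   # queue the job
run condor_q                  # examine the queue on this machine
run condor_q -run             # running jobs, with the machine each one runs on
run condor_status             # what the worker machines are doing
```

The queue lives on the machine where condor_submit was run, so condor_q typed on a worker node would normally show an empty queue even while that worker is executing a job.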
>
> Are you saying you don't have access to the master node? Well, it is
> possible to run these tools on one machine and ask them to query a
> different master node. But this adds extra command-line options. Also
> you would need to set up access permissions so that access from that
> other machine was allowed.
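
For illustration, querying a different master node from another machine might look like the sketch below. The host name master.example.org is a placeholder, it assumes the master node acts as both the pool's central manager and the submit machine, and it only works once the master's access permissions allow the querying machine. The run helper is again a hypothetical convenience that prints the command when the tools are not installed.

```shell
#!/bin/sh
# Sketch only: master.example.org is a placeholder host name.
POOL="master.example.org"

# Run a command if it is installed; otherwise just print what would be run.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Ask the remote pool's central manager about the worker machines.
run condor_status -pool "$POOL"

# Ask the queue held on the remote submit machine (assumed to be the same host).
run condor_q -pool "$POOL" -name "$POOL"
```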
>
> If I were you, I'd start simple. One master node, a number of execute
> nodes, everyone logs into the master node to submit jobs.

Will you please explain how this would work: "everyone logs into the master node to submit jobs"?

>
> > And what should I do if I want to migrate my job from one worker
> > machine to another if I'm using the standard universe?
> condor doesn't, as far as I know, support any form of "live" migration.
> If you're using standard universe then you have checkpointing, so I
> suppose it's possible to terminate a job and have it restart on another
> node from that checkpoint, but I don't use standard universe so I don't
> really know (I use vanilla universe)
>
> > Will you please help me out with this? Thanks.
> I've tried to explain this as simply as I can, but if I've failed then
> I'm sorry, I don't think I can put it any more simply than I already have.
>
> Also, please remember to reply to the list, not just to me personally.
>
> Regards,
>
> Brian.
>