Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
- Date: Fri, 11 Dec 2015 21:30:37 +0000
- From: Zach Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
Hmm. It shouldn't take 5 minutes to match. It also shouldn't be evicted after 5 seconds.
Can you open a ticket by sending email to htcondor-admin@xxxxxxxxxxx and attach your NegotiatorLog and condor_config?
I'll investigate those two issues.
Cheers,
-zach
> -----Original Message-----
> From: Sameem Zahoor [mailto:zsameem@xxxxxxxxx]
> Sent: Friday, December 11, 2015 12:24 AM
> To: Zach Miller
> Cc: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
>
> Hi Zach,
>
> So first about the NEGOTIATOR_INTERVAL setting. By default it was set to
> 60. I changed it to 10. Still it took 20 minutes to go from Idle to Running
> for the first run. Then it did improve. The jobs were considered within 5
> minutes or so.
>
>
> For the second issue, all condor daemons are running under user condor.
> Checked it using ps. I changed the permissions on the directory containing
> the job submit file. Got the same permissions error.
> Finally I removed the output setting in the job file and tested that. The
> job runs fine now. I'll figure this out later.
>
> Is there any other setting that needs to be changed to increase the
> frequency of the matchmaking process?
>
> Also, I need to know the setting that changes the minimum time a job runs
> before it is evicted. The default on my computer is 5 seconds. Any time a
> job runs for more than 5 seconds it gets evicted, and because I am in the
> vanilla universe there is no checkpointing, so it starts over and never
> finishes.
>
> Thanks all
>
> On Fri, Dec 11, 2015 at 1:25 AM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
>
>
> > -----Original Message-----
> > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of ap817
> > Sent: Thursday, December 10, 2015 5:12 AM
> > To: HTCondor-Users Mail List
> > Cc: Sameem Zahoor
> > Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
> >
> > As per the permission issue, you have to make sure the script in the
> > condor executable is indeed executable. You can do this by changing file
> > permissions with chmod 777 (filename).
>
> You should _not_ make your executable world writable. Someone else
> on the system could potentially change the script to do something nasty,
> and then you would execute it. You can use 'chmod 755' instead. However,
> that's not the real issue here either, just thought I would clarify.
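
To make the difference between the two modes concrete, here is a minimal sketch. The temporary file is only a stand-in for the job script (an assumption for illustration, not a file from this thread), and `stat -c` assumes GNU coreutils on Linux.

```shell
# Stand-in for the job script; a temporary file used only for illustration.
script=$(mktemp)

# 777 = rwxrwxrwx: any user on the machine can rewrite the script.
chmod 777 "$script"
mode_open=$(stat -c '%a' "$script")

# 755 = rwxr-xr-x: everyone can read and run it, only the owner can modify it.
chmod 755 "$script"
mode_safe=$(stat -c '%a' "$script")

echo "before: $mode_open  after: $mode_safe"
rm -f "$script"
```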
>
>
> > On 10.12.2015 11:02, Sameem Zahoor wrote:
> > > Hi all,
> > >
> > > I am working on a Policy for HTCondor setup for a cluster at ICTS [1].
> > > I have installed and configured condor on my laptop for now and I am
> > > facing two issues.
> > >
> > > 1. CONDOR MATCHMAKER:
> > >
> > > When I submit a job, it sits idle for a long time (around 20 mins). On
> > > doing condor_q -analyze, the result says:
> > >
> > > " 005.000: Request has not yet been considered by the matchmaker"
> > >
> > > Now I want to know how to increase the rate at which the matchmaker
> > > looks for the jobs in the queue.
>
> In your condor_config, you can set NEGOTIATOR_INTERVAL to something
> like 15 seconds and jobs will be considered more often.
>
> You can also run the command "condor_reschedule" to hint to the
> matchmaker to try again.
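
As a sketch of the suggestion above, the setting would look like this in condor_config (or a local config file it includes). The value 15 is only the example figure mentioned above, not a tuned recommendation.

```
# Seconds between negotiation cycles (the poster reported a default of 60).
NEGOTIATOR_INTERVAL = 15
```

After editing the file, running condor_reconfig asks the daemons to re-read their configuration, and condor_config_val NEGOTIATOR_INTERVAL shows the value currently in effect.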
>
>
> > > 2. PERMISSION ISSUE:
> > >
> > > Eventually after 20 minutes or so when the job starts, it goes into
> > > the HOLD state, with the following error in the logs:
> > >
> > > 012 (005.000.000) 12/10 15:55:33 Job was held.
> > > Error from slot2@sameem-Ideapad: Failed to open
> > > '/home/sameem/Desktop/condor/testjob/outputfile' as standard output:
> > > Permission denied (errno 13)
> > > Code 7 Subcode 13
>
> This isn't related to the executable file, but rather HTCondor is
> trying to open that file to store your stdout for the job. Presumably you
> specified this by setting "output" in your job submit file. Does that path
> exist, and is it writable by you? Outside of HTCondor, try:
> touch /home/sameem/Desktop/condor/testjob/outputfile
>
> and if that doesn't work try to figure out why. Maybe the ownership
> or permissions are incorrect on the parent directories. It might also
> depend on what user you are running HTCondor as (root? condor? yourself?)
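
If the touch test fails, something like the following rough sketch can narrow down where the problem sits. The function name check_output_path is hypothetical (it is not an HTCondor tool), and it only tests access for whichever user runs it, so it should be run as the user the job actually runs as.

```shell
# check_output_path FILE
# Reports the first obstacle to creating or overwriting FILE as the current user.
# Hypothetical helper for illustration; it is not part of HTCondor.
check_output_path() {
    target=$1
    dir=$(dirname "$target")

    # An existing file must itself be writable.
    if [ -e "$target" ] && [ ! -w "$target" ]; then
        echo "exists but not writable: $target"
        return 1
    fi

    # Walk up until we reach a directory that is visible to this user.
    d=$dir
    while [ ! -d "$d" ] && [ "$d" != "/" ] && [ "$d" != "." ]; do
        d=$(dirname "$d")
    done
    if [ "$d" != "$dir" ]; then
        echo "missing or unreachable below: $d"
        return 1
    fi

    # Creating a file needs write and search permission on its directory.
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "cannot create files in: $dir"
        return 1
    fi

    echo "ok: $target"
}
```

For the path in this thread the call would be: check_output_path /home/sameem/Desktop/condor/testjob/outputfile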
>
>
> Cheers,
> -zach
>
>
>