
Re: [HTCondor-users] HTCondor matchmaker and permissions problem.



Hmm.  It shouldn't take 5 minutes to match.  It also shouldn't be evicted after 5 seconds.

Can you open a ticket by sending email to htcondor-admin@xxxxxxxxxxx and attach your NegotiatorLog and condor_config?

I'll investigate those two issues.


Cheers,
-zach


> -----Original Message-----
> From: Sameem Zahoor [mailto:zsameem@xxxxxxxxx]
> Sent: Friday, December 11, 2015 12:24 AM
> To: Zach Miller
> Cc: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
> 
> Hi Zach,
> 
> So first about the NEGOTIATOR_INTERVAL setting. By default it was set to
> 60. I changed it to 10. Still it took 20 minutes to go from Idle to Running
> for the first run. Then it did improve. The jobs were considered within 5
> minutes or so.
> 
> 
> For the second issue, all condor daemons are running under user condor.
> Checked it using ps. I changed the permissions on the directory containing
> the job submit file. Got the same permissions error.
> Finally I removed the output setting in the job file and tested that. The
> job runs fine now. I'll figure this out later.
> 
> Is there any other setting that needs to be changed to increase the
> frequency of the matchmaking process?
> 
> Also I need to know the setting to change the minimum time for a job to run
> before it is evicted. The default on my computer is 5 seconds. Because
> anytime a job runs for more than 5 seconds, it gets evicted and because I
> am in Vanilla universe, there is no checkpointing, and it starts over, and
> is never finished.
> 
> Thanks all
> 
> On Fri, Dec 11, 2015 at 1:25 AM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
> 
> 
> 	> -----Original Message-----
> 	> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of ap817
> 	> Sent: Thursday, December 10, 2015 5:12 AM
> 	> To: HTCondor-Users Mail List
> 	> Cc: Sameem Zahoor
> 	> Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
> 	>
> 	> As per the permission issue, you have to make sure the script in the
> 	> condor executable is indeed executable. You can do this by changing file
> 	> permissions with chmod 777 (filename).
> 
> 	You should _not_ make your executable world writable.  Someone else on
> 	the system could potentially change the script to do something nasty, and
> 	then you would execute it.  You can use 'chmod 755' instead.  However,
> 	that's not the real issue here either, just thought I would clarify.
> 
> 
> 	> On 10.12.2015 11:02, Sameem Zahoor wrote:
> 	> > Hi all,
> 	> >
> 	> > I am working on a Policy for HTCondor setup for a cluster at ICTS [1].
> 	> > I have installed and configured condor on my laptop for now and I am
> 	> > facing two issues.
> 	> >
> 	> > 1. CONDOR MATCHMAKER:
> 	> >
> 	> > When I submit a job, it sits idle for a long time (around 20 mins). On
> 	> > running condor_q -analyze, the result says:
> 	> >
> 	> >  " 005.000:  Request has not yet been considered by the matchmaker"
> 	> >
> 	> > Now I want to know how to increase the rate at which the matchmaker
> 	> > looks for the jobs in the queue.
> 
> 	In your condor_config, you can set NEGOTIATOR_INTERVAL to something like
> 	15 seconds and jobs will be considered more often.
> 
> 	You can also run the command "condor_reschedule" to hint to the matchmaker
> 	to try again.
> 
> 
> 	> > 2. PERMISSION ISSUE:
> 	> >
> 	> > Eventually after 20 minutes or so when the job starts, it goes into
> 	> > the HOLD state, with the following error in the logs
> 	> >
> 	> > 012 (005.000.000) 12/10 15:55:33 Job was held.
> 	> >  Error from slot2@sameem-Ideapad: Failed to open
> 	> > '/home/sameem/Desktop/condor/testjob/outputfile' as standard output:
> 	> > Permission denied (errno 13)
> 	> >  Code 7 Subcode 13
> 
> 	This isn't related to the executable file, but rather HTCondor is trying
> 	to open that file to store your stdout for the job.  Presumably you
> 	specified this by setting "output" in your job submit file.  Does that path
> 	exist, and is it writable by you?  Outside of HTCondor, try:
> 	  touch /home/sameem/Desktop/condor/testjob/outputfile
> 
> 	and if that doesn't work try to figure out why.  Maybe the ownership or
> 	permissions are incorrect on the parent directories.  It might also depend
> 	on what user you are running HTCondor as (root?  condor?  yourself?)
> 
> 
> 	Cheers,
> 	-zach
> 
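On the NEGOTIATOR_INTERVAL suggestion quoted above, here is a minimal sketch of making that change from the shell. The helper name set_knob and the config file path in the usage comment are assumptions for illustration only, not HTCondor tools or guaranteed locations; "condor_config_val -config" lists the files an installation actually reads. The sketch relies on HTCondor using the last definition when a setting appears more than once, so appending a line is enough.

```shell
#!/bin/sh
# set_knob: hypothetical helper, not part of HTCondor.
# Appends "NAME = VALUE" to a config file. A later definition of the
# same setting overrides an earlier one, so the old line can stay.
set_knob() {
    file=$1
    name=$2
    value=$3
    printf '%s = %s\n' "$name" "$value" >> "$file"
}

# Example usage (the path is an assumption; adjust to your install):
#   set_knob /etc/condor/condor_config.local NEGOTIATOR_INTERVAL 15
#   condor_reconfig                        # ask running daemons to reread config
#   condor_config_val NEGOTIATOR_INTERVAL  # show the configured value
#   condor_reschedule                      # hint the matchmaker to try again
```

Appending rather than editing in place keeps the previous value visible in the file, which makes it easy to revert while experimenting.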
> 
>
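For the "Permission denied (errno 13)" hold reason quoted above, here is a rough sketch that extends the suggested touch test by also walking the parent directories. check_output_path is a hypothetical helper name, not an HTCondor command. It only reflects the permissions of whoever runs it, so it is most useful when run as the same user the job runs as.

```shell
#!/bin/sh
# check_output_path: hypothetical helper, not part of HTCondor.
# Reports why the invoking user might be unable to create or write FILE.
check_output_path() {
    target=$1
    dir=$(dirname "$target")
    problem=""

    # Walk from the file's directory up toward the root. A finding higher
    # up overwrites one lower down, so the topmost broken ancestor is the
    # one that gets reported.
    d=$dir
    while :; do
        if [ ! -d "$d" ]; then
            problem="missing or unreachable directory: $d"
        elif [ ! -x "$d" ]; then
            problem="no search (x) permission on: $d"
        fi
        parent=$(dirname "$d")
        # dirname stops changing at "/" or "."; that ends the walk.
        [ "$parent" = "$d" ] && break
        d=$parent
    done

    if [ -n "$problem" ]; then
        echo "$problem"
        return 1
    fi

    if [ -e "$target" ]; then
        # An existing file must itself be writable.
        if [ ! -w "$target" ]; then
            echo "file exists but is not writable: $target"
            return 1
        fi
    elif [ ! -w "$dir" ]; then
        # A new file needs write permission on its directory.
        echo "cannot create files in: $dir"
        return 1
    fi

    echo "ok: $target"
    return 0
}

# Example usage, with the path from the error message:
#   check_output_path /home/sameem/Desktop/condor/testjob/outputfile
```

Unlike touch, this does not create the file, so it can be run repeatedly without leaving anything behind.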