Re: [HTCondor-users] HTCondor matchmaker and permissions problem.



Hi Zach,

First, about the NEGOTIATOR_INTERVAL setting: by default it was set to 60, and I changed it to 10. The first job still took 20 minutes to go from Idle to Running. After that it improved, and jobs were considered within about 5 minutes.


For the second issue, all condor daemons are running as user condor (I checked with ps). I changed the permissions on the directory containing the job submit file and got the same permissions error.
Finally, I removed the output setting from the job file and tested that. The job runs fine now. I'll figure this out later.

Is there any other setting that needs to be changed to increase the frequency of the matchmaking process?

Also, I need to know the setting that changes the minimum time a job runs before it is evicted. The default on my computer is 5 seconds. Any job that runs for more than 5 seconds gets evicted, and because I am in the Vanilla universe there is no checkpointing, so it starts over and never finishes.

Thanks all

On Fri, Dec 11, 2015 at 1:25 AM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of ap817
> Sent: Thursday, December 10, 2015 5:12 AM
> To: HTCondor-Users Mail List
> Cc: Sameem Zahoor
> Subject: Re: [HTCondor-users] HTCondor matchmaker and permissions problem.
>
> As per the permission issue, you have to make sure the script in the
> condor executable is indeed executable. You can do this by changing file
> permissions with chmod 777 (filename).

You should _not_ make your executable world writable. Someone else on the system could potentially change the script to do something nasty, and then you would execute it. You can use 'chmod 755' instead. However, that's not the real issue here either, just thought I would clarify.


> On 10.12.2015 11:02, Sameem Zahoor wrote:
> > Hi all,
> >
> > I am working on a Policy for HTCondor setup for a cluster at ICTS [1].
> > I have installed and configured condor on my laptop for now and I am
> > facing two issues.
> >
> > 1. CONDOR MATCHMAKER:
> >
> > When I submit a job, it sits idle for a long time (around 20 mins). On
> > running condor_q -analyze, the result says:
> >
> >   "005.000:  Request has not yet been considered by the matchmaker"
> >
> > Now I want to know how to increase the rate at which the matchmaker
> > looks for the jobs in the queue.

In your condor_config, you can set NEGOTIATOR_INTERVAL to something like 15 seconds and jobs will be considered more often.

You can also run the command "condor_reschedule" to hint to the matchmaker to try again.
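
As a rough sketch, the change might look like the fragment below. The file name is an assumption (the local config file differs between installs), and 15 is only the example value mentioned above. After editing, running condor_reconfig should make the daemons pick up the new value.

```
# Local HTCondor config file (name and location are assumptions;
# they vary by install).
# Seconds between negotiation cycles; the default is 60.
NEGOTIATOR_INTERVAL = 15
```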


> > 2. PERMISSION ISSUE:
> >
> > Eventually after 20 minutes or so when the job starts, it goes into
> > the HOLD state, with the following error in the logs
> >
> > 012 (005.000.000) 12/10 15:55:33 Job was held.
> >  Error from slot2@sameem-Ideapad: Failed to open
> > '/home/sameem/Desktop/condor/testjob/outputfile' as standard output:
> > Permission denied (errno 13)
> >  Code 7 Subcode 13

This isn't related to the executable file. Rather, HTCondor is trying to open that file to store your job's stdout. Presumably you specified this by setting "output" in your job submit file. Does that path exist, and is it writable by you? Outside of HTCondor, try:
 touch /home/sameem/Desktop/condor/testjob/outputfile

and if that doesn't work try to figure out why. Maybe the ownership or permissions are incorrect on the parent directories. It might also depend on what user you are running HTCondor as (root? condor? yourself?)
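
One way to inspect the parent directories is to list the ownership and permissions of every component of the path, from the file up to the root. This is a minimal sketch assuming a POSIX shell; check_path is a made-up helper name for illustration, not an HTCondor tool.

```shell
#!/bin/sh
# check_path (hypothetical helper): print ownership and permissions for
# the given path and each of its parent directories, up to "/".
check_path() {
    p=$1
    while :; do
        # ls -ld describes the entry itself rather than its contents.
        ls -ld "$p" 2>/dev/null || echo "cannot stat: $p"
        # Stop at the filesystem root (or "." for a relative path).
        case $p in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}

check_path /home/sameem/Desktop/condor/testjob/outputfile
```

The user that ends up opening the file needs execute (search) permission on every directory in the list, plus write permission on the last directory to create the file, or on the file itself if it already exists.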


Cheers,
-zach