Re: [HTCondor-users] HTCondor-users Digest, Vol 20, Issue 7



Thank you, BrianB for the guidance re Schedd! I appreciate it!


On Jul 9, 2015, at 8:33 PM, htcondor-users-request@xxxxxxxxxxx wrote:

Send HTCondor-users mailing list submissions to
htcondor-users@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
or, via email, send a message with subject or body 'help' to
htcondor-users-request@xxxxxxxxxxx

You can reach the person managing the list at
htcondor-users-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of HTCondor-users digest..."


Today's Topics:

  1. Re: Submit fails - RHEL 7 in NIS environment and FS
     authentication (Klint Gore)
  2. Re: Question--Is it feasible to fix SCHEDD IP Address/Port?
     (Brian Bockelman)
  3. Re: not starting jobs in condor ver 8.3.6 (Todd L Miller)
  4. Re: not starting jobs in condor ver 8.3.6 (Jan Balewski)
  5. Condor installation issues (Siddharth Srivastava)


----------------------------------------------------------------------

Message: 1
Date: Thu, 09 Jul 2015 00:19:54 +0000
From: Klint Gore <kgore4@xxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submit fails - RHEL 7 in NIS environment
and FS authentication
Message-ID:
<b1da62d7c39646a3b9a77447e1ac2249@xxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; CHARSET=US-ASCII

Can I add a "me too" to that?  I've got new servers arriving today that are going to get centos7.

Klint.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Feldt, Andrew N.
Sent: Thursday, 9 July 2015 6:10 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Submit fails - RHEL 7 in NIS environment and FS authentication

Folks,

Ok, the NIS part turned out to be a red herring.  The real issue is that Condor 8.2.8 does not work on RHEL 7 with SELinux enabled (verified by temporarily disabling SELinux, after which FS authentication works just fine).  I see that there is a ticket for this already at:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5012

so I will put my RHEL 7 upgrade plans on hold until this is resolved (we have committed to run SELinux and have been doing so ever since RHEL 6 came out).

Andy


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


------------------------------

Message: 2
Date: Wed, 08 Jul 2015 21:25:54 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Question--Is it feasible to fix SCHEDD
IP Address/Port?
Message-ID: <91DBA451-D6D4-4536-8DE2-1034B976DA34@xxxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

Hi Tom,

You can achieve this by tweaking the schedd's command line arguments:

SCHEDD_ARGS = -p 8080

(found this here: http://ben.versionzero.org/wiki/Condor_SOAP_Interface)

On newer versions of HTCondor, this should work:

SCHEDD_PORT = 8080

In future versions of HTCondor, the plan is to default all daemons to 9618 (and share the same port when there are multiple daemons on a given host).
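
With the schedd pinned to a fixed port as above, a client can confirm that something is listening there using a plain TCP connect, with no query or address-file lookup. A minimal Python sketch: the host name `localhost` is a placeholder, 8080 is the port from the example above, and a successful connect only shows that the port is open, not that the listener is actually the schedd.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, or name resolution failed
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; 8080 matches the SCHEDD_ARGS example
    print(port_is_open("localhost", 8080))
```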

Brian

On Jul 8, 2015, at 6:02 PM, Thomas B Winans <tbw@xxxxxxxxxxxx> wrote:

My goal is to have a fixed combination for web service invocation for which no query or file lookup is necessary. I see that SCHEDD_ADDRESS_FILE can be written to, but this is not my goal...

Many thanks,
Tom

Tom Winans
tbw@xxxxxxxxxxxx
http://tomwinans.info





------------------------------

Message: 3
Date: Thu, 09 Jul 2015 14:27:40 -0500 (CDT)
From: Todd L Miller <tlmiller@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] not starting jobs in condor ver 8.3.6
Message-ID: <alpine.DEB.2.02.1507091407060.23398@azaphrael>
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; format=flowed

I hope you can figure out what I need to change to get my condor jobs to
start. Thanks for looking into my issue.

From the negotiator log:

07/02/15 10:45:37 Phase 4.1:  Negotiating with schedds ...
07/02/15 10:45:37   Negotiating with cosy11@xxxxxxxxxxx at
<198.125.163.121:9601?addrs=10.200.60.19-9601&noUDP>
07/02/15 10:45:37 0 seconds so far
07/02/15 10:45:37 SECMAN: FAILED: Received "DENIED" from server for user
unauthenticated@unmapped using method (no authentication).
07/02/15 10:45:37 ERROR: SECMAN:2010:Received "DENIED" from server for
user unauthenticated@unmapped using method (no authentication).
07/02/15 10:45:37     Failed to send NEGOTIATE command to
cosy11@xxxxxxxxxxx (<198.125.163.121:9601?addrs=10.200.60.19-9601&noUDP>)
07/02/15 10:45:37   Error: Ignoring submitter for this cycle

The schedd says:

07/02/15 10:45:37 (pid:1818) PERMISSION DENIED to unauthenticated@unmapped
from host 10.200.60.19 for command 416 (NEGOTIATE), access level
NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching
ALLOW entry for this request; identifiers used for this host:
10.200.60.19,cond-b-7212ad3c-320d-4981-9d09-b535053abab8.novalocal,
hostname size = 1, original ip address = 10.200.60.19
07/02/15 10:45:37 (pid:1818) DC_AUTHENTICATE: Command not authorized, done!

Which is what I was trying to fix with my earlier suggestion (B);
sorry that I asked you to turn the wrong knobs.  Try adding
ALLOW_NEGOTIATOR and ALLOW_NEGOTIATOR_SCHEDD to your config, setting each
to 10.200.60.*, or the specific machine hosting the negotiator
(10.200.60.19).

Note that HTCondor thinks that 10.200.60.19 is
cond-b-7212ad3c-320d-4981-9d09-b535053abab8.novalocal, not
oswrk121.lns.mit.edu.  This makes sense, because (at least for me)
oswrk121.lns.mit.edu looks up as 198.125.163.121.
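
To see which ALLOW_* setting a denial points at, and whether a candidate pattern would cover the identifiers HTCondor reports, the schedd log entry above can be picked apart with a short Python sketch. The regular expression is fitted to this one log format, the `ALLOW_<level>` naming is inferred from the log text and the settings named above, and `fnmatch` globbing is only a stand-in for HTCondor's own matching rules, which may differ.

```python
import re
from fnmatch import fnmatch

# Fitted to the PERMISSION DENIED entry quoted above; other formats may not match.
DENIAL = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host (?P<host>\S+) "
    r"for command (?P<number>\d+) \((?P<command>\w+)\), "
    r"access level (?P<level>\w+):.*"
    r"identifiers used for this host: (?P<ids>.*?), hostname size"
)


def parse_denial(line):
    """Extract command, access level and host identifiers from one log entry."""
    m = DENIAL.search(line)
    if m is None:
        return None
    ids = [i.strip() for i in m.group("ids").split(",") if i.strip()]
    return {"command": m.group("command"), "level": m.group("level"),
            "identifiers": ids}


def matching_patterns(identifiers, patterns):
    """Return the patterns that glob-match at least one identifier."""
    return [p for p in patterns if any(fnmatch(i, p) for i in identifiers)]


# The schedd entry from above, with the mail client's line wraps joined.
LINE = ("07/02/15 10:45:37 (pid:1818) PERMISSION DENIED to unauthenticated@unmapped "
        "from host 10.200.60.19 for command 416 (NEGOTIATE), access level "
        "NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching "
        "ALLOW entry for this request; identifiers used for this host: "
        "10.200.60.19,cond-b-7212ad3c-320d-4981-9d09-b535053abab8.novalocal, "
        "hostname size = 1, original ip address = 10.200.60.19")

denial = parse_denial(LINE)
print("setting to check: ALLOW_" + denial["level"])
print(matching_patterns(denial["identifiers"], ["*.lns.mit.edu", "10.200.60.*"]))
```

In this sketch only the IP-based pattern covers the reported identifiers, since neither identifier ends in lns.mit.edu.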


I'm pretty sure what's going on is a disagreement between
different parts of HTCondor about how to handle TCP_FORWARDING_HOST.
If you look at the string the negotiator said it was trying to connect to,

<198.125.163.121:9601?addrs=10.200.60.19-9601&noUDP>

you can see that the first and second IP addresses are not the same; the
first is the TCP_FORWARDING_HOST and the second is the IP address that
HTCondor found on the machine.
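
The two addresses can be pulled out of that string mechanically. A rough Python sketch, assuming the layout seen in this one example (`<host:port?addrs=host-port&flag>`); it is not HTCondor's own parser and ignores any other forms the string may take.

```python
from urllib.parse import parse_qs


def parse_address(sinful):
    """Split '<host:port?key=value&flag>' into host, port and parameters."""
    inner = sinful.strip().lstrip("<").rstrip(">")
    hostport, _, query = inner.partition("?")
    host, _, port = hostport.rpartition(":")
    # keep_blank_values=True so bare flags such as "noUDP" are not dropped
    params = parse_qs(query, keep_blank_values=True)
    return {"host": host, "port": int(port), "params": params}


def advertised_hosts(sinful):
    """Return the primary host and the hosts listed in the addrs parameter."""
    addr = parse_address(sinful)
    # each addrs entry in the example has the form "host-port"
    alternates = [a.rpartition("-")[0] for a in addr["params"].get("addrs", [])]
    return addr["host"], alternates


primary, alternates = advertised_hosts(
    "<198.125.163.121:9601?addrs=10.200.60.19-9601&noUDP>")
if primary not in alternates:
    print("primary address", primary, "differs from", alternates)
```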

- ToddM


------------------------------

Message: 4
Date: Thu, 09 Jul 2015 15:54:32 -0400
From: Jan Balewski <janstar1122@xxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] not starting jobs in condor ver 8.3.6
Message-ID: <AF43A16A-E733-4C43-922C-92EF823B5EFB@xxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"


On Jul 9, 2015, at 3:27 PM, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:

Try adding ALLOW_NEGOTIATOR and ALLOW_NEGOTIATOR_SCHEDD to your config, setting each to 10.200.60.*, or the specific machine hosting the negotiator (10.200.60.19).

Thanks Todd,
I did it, and the jobs still do not start.

This is the new content of the relevant part of /etc/condor/condor_config.local:
###############################################################################
# Security settings
###############################################################################
# Allow local host and the central manager to manage the node
ALLOW_ADMINISTRATOR = $(FULL_HOSTNAME), $(CONDOR_HOST)
# master needs this two particular versions
ALLOW_READ = *.lns.mit.edu,10.200.60.*
ALLOW_WRITE = *.lns.mit.edu,10.200.60.*
# Fix to version 8.3.6 suggested by ToddL
ALLOW_NEGOTIATOR = 10.200.60.*
ALLOW_NEGOTIATOR_SCHEDD = 10.200.60.*
###############################################################################

At 15:38 I made this change on IP=121, which is the condor master and has 6 job slots open.

I executed service condor restart,
then submitted 12 jobs. All are idle, even though 14 job slots are open.

At 15:42 I made a similar change on IP=122, which is just a condor worker node. I did not expect it to change anything, since the change on IP=121 did not help.

To be absolutely sure, I rebooted both VMs a few minutes later and verified again: condor jobs do not start, even though 6+8 job slots are open.

Perhaps you can find time to look at the log files again. I copied them from both VMs:
$ scp -rp root@xxxxxxxxxxxxxxxxxxxx:/var/log/condor condor-122
$ scp -rp root@xxxxxxxxxxxxxxxxxxxx:/var/log/condor condor-121
and posted them here:
https://www.dropbox.com/sh/8z7pxbdc5j4yh43/AADh4J3WjukpJKsX55lsTMuva?dl=0

Thanks for looking into it,
Jan



------------------------------

Message: 5
Date: Thu, 09 Jul 2015 19:32:50 -0700
From: Siddharth Srivastava <siddys@xxxxxxxxx>
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Condor installation issues
Message-ID:
<CAAMWsuCN2C8z1fqR6urqYwLbGwNNv2DLqrtNj6AD_dqwitVt+Q@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi All,
    I have been working on getting a Condor installation up and running
on a 16-core Ubuntu 14.04 machine, with the intended use of speeding up some
computations in FSL, a widely used set of tools for neuroimaging. I installed
condor using apt-get install.
This is where I am:
1) condor_status shows me 16 slots, with slot1@machine_name,
slot2@machine_name etc.

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot10@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:15
slot11@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:16
slot12@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:17
slot13@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:18
slot14@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:19
slot15@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:20
slot16@siddys-HP-Z LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:50:13
slot1@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:20:51
slot2@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.290 4020  0+09:45:15
slot3@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:16
slot4@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:17
slot5@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:18
slot6@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:19
slot7@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:20
slot8@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:50:13
slot9@siddys-HP-Z4 LINUX      X86_64 Unclaimed Idle      0.000 4020  0+09:45:14
                    Total Owner Claimed Unclaimed Matched Preempting Backfill

       X86_64/LINUX    16     0       0        16       0          0        0

              Total    16     0       0        16       0          0        0
-----------------------------------------------------------------
My question for this stage is: does this look like a setup in which I can
submit 16 jobs in parallel?

2) This is a personal installation, where everything is on the local
machine, and I am trying to maximize the utility of the 16 cores. Do I have
to configure any special settings during the installation? apt-get did not
give me any options. I have seen some web sites that say that one generally
gets an interactive curses-like window to perform some basic settings during
installation, but this was not my experience when I was installing.
3) My machine has a private address, and even though I am
under a domain name, the machine itself is not discoverable on the
internet/intranet. Are there any special config file directives that I must
set in this scenario? I do see some errors like the following in the
MasterLog (the xxx's are mine):

PERMISSION DENIED to unauthenticated@unmapped from host 10.xxx.xxx.xxx for
command 454 (DAEMONS_OFF), access level ADMINISTRATOR: reason:
ADMINISTRATOR authorization policy contains no matching ALLOW entry for
this request; identifiers used for this host: 10.xxx.xxx.xxx,
siddys-hp-z440-workstation.xxx.xxx.xxx.org, hostname size = 1, original ip
address = 10.xxx.xxx.xxx

I sincerely hope that someone can help with this issue.
Thanks,
Sid.

------------------------------

End of HTCondor-users Digest, Vol 20, Issue 7
*********************************************