Re: [Condor-users] Condor 7.4.2 not working on Rocks 5.3

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

On Tue, Jul 13, 2010 at 9:00 AM, Philip Papadopoulos <philip.papadopoulos@xxxxxxxxx> wrote:

Gary,
Can you put a tar file/instructions for me to download the same thing you are doing and I will try to run on a cluster here in San Diego to see if I see the same results?

Thanks,
Phil

On Tue, Jul 13, 2010 at 8:56 AM, Gary Orser <garyorser@xxxxxxxxx> wrote:
Nope, added HOSTALLOW_READ, same symptom
on head /etc/init.d/rocks-condor restart

for i in `seq 1 100` ; do condor_submit subs/ncbi++_blastp.sub ; done

Submitting job(s).
Logging submit event(s).

1 job(s) submitted to cluster 106.
.
.
.
Logging submit event(s).
1 job(s) submitted to cluster 137.

WARNING: File /home/orser/tests/results/ncbi++_blastp.sub.137.0.err is not writable by condor.

WARNING: File /home/orser/tests/results/ncbi++_blastp.sub.137.0.out is not writable by condor.

Can't send RESCHEDULE command to condor scheduler
Submitting job(s)
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <153.90.184.186:40031>

On Tue, Jul 13, 2010 at 9:46 AM, Gary Orser <garyorser@xxxxxxxxx> wrote:
[root@bugserv1 ~]# condor_config_val -dump | grep ALL
ALL_DEBUG =
ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_NEGOTIATOR = $(CONDOR_HOST)
ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
ALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)
ALLOW_READ = *
ALLOW_READ_COLLECTOR = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_READ_STARTD = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_WRITE = $(HOSTALLOW_WRITE)
ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_WRITE_STARTD = $(ALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE = bugserv1.core.montana.edu, *.local, *.local
SMALLJOB = (TARGET.ImageSize < (15 * 1024))

Looks like HOSTALLOW_READ is not set.
Is that the same as ALLOW_READ?

On Mon, Jul 12, 2010 at 6:16 PM, Mag Gam <magawake@xxxxxxxxx> wrote:

can you type,

condor_config_val -dump ?

Looks like a security issue,

http://www.cs.wisc.edu/condor/manual/v7.4/3_6Security.html#sec:Host-Security

What does HOSTALLOW_READ and HOSTALL_WRITE look like?

On Mon, Jul 12, 2010 at 1:12 PM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> Gary,
>
> It may help to look in SchedLog to see what is happening to your
> condor_schedd.
>
> --Dan
>
> Gary Orser wrote:
>>
>> Trying sending again ...
>>
>> On Fri, Jul 9, 2010 at 10:46 AM, Gary Orser <garyorser> wrote:
>>
>> Hi all,
>>
>> I just upgraded my cluster from Rocks 5.1 to 5.3.
>> This upgraded Condor from 7.2.? to 7.4.2.
>>
>> I've got everything running, but it won't stay up.
>> (I have had the previous configuration running with condor for
>> years, done millions of hours of compute)
>>
>> I have a good repeatable test case.
>> (each job runs for a couple of minutes)
>>
>> [orser@bugserv1 tests]$ for i in `seq 1 100` ; do condor_submit
>> subs/ncbi++_blastp.sub ; done
>> Submitting job(s).
>> Logging submit event(s).
>> 1 job(s) submitted to cluster 24.
>> Submitting job(s).
>> Logging submit event(s).
>> 1 job(s) submitted to cluster 25.
>> Submitting job(s).
>> .
>> .
>> .
>> Submitting job(s).
>> Logging submit event(s).
>> 1 job(s) submitted to cluster 53.
>>
>> WARNING: File /home/orser/tests/results/ncbi++_blastp.sub.53.0.err
>> is not writable by condor.
>>
>> WARNING: File /home/orser/tests/results/ncbi++_blastp.sub.53.0.out
>> is not writable by condor.
>> Can't send RESCHEDULE command to condor scheduler
>> Submitting job(s)
>> ERROR: Failed to connect to local queue manager
>> CEDAR:6001:Failed to connect to <153.90.184.186:40026
>> <http://153.90.184.186:40026>>
>> Submitting job(s)
>> ERROR: Failed to connect to local queue manager
>> CEDAR:6001:Failed to connect to <153.90.184.186:40026
>> <http://153.90.184.186:40026>>
>> Submitting job(s)
>>
>> [orser@bugserv1 tests]$ cat subs/ncbi++_blastp.sub
>> ####################################
>> ## run distributed blast ##
>> ## Condor submit description file ##
>> ####################################
>> getenv = True
>> universe = Vanilla
>> initialdir = /home/orser/tests
>> executable = /share/bio/ncbi-blast-2.2.22+/bin/blastn
>> input = /dev/null
>> output = results/ncbi++_blastp.sub.$(Cluster).$(Process).out
>> WhenToTransferOutput = ON_EXIT_OR_EVICT
>> error = results/ncbi++_blastp.sub.$(Cluster).$(Process).err
>> log = results/ncbi++_blastp.sub.$(Cluster).$(Process).log
>> notification = Error
>>
>> arguments = "-db /share/data/db/nt -query
>> /home/orser/tests/data/gdo0001.fas -culling_limit 20 -evalue 1E-5
>> -num_descriptions 10 -num_alignments 100 -parse_deflines -show_gis
>> -outfmt 5"
>>
>> queue
>>
>> [root@bugserv1 etc]# condor_q
>>
>> -- Failed to fetch ads from: <153.90.84.186:40026
>> <http://153.90.84.186:40026>> : bugserv1.core.montana.edu
>> <http://bugserv1.core.montana.edu>
>> CEDAR:6001:Failed to connect to <153.90.184.186:40026
>> <http://153.90.184.186:40026>>
>>
>>
>> I can restart the head node with.
>> /etc/init.d/rocks-condor stop
>> rm -f /tmp/condor*/*
>> /etc/init.d/rocks-condor start
>>
>> and the jobs that got submitted do run.
>>
>> I have trawled through the archives, but haven't found anything
>> that might be useful.
>>
>> I've looked at the logs, but not finding any clues there.
>> I can provide them if that might be useful.
>>
>> The changes from a stock install, are minor.
>> (I just brought the cluster up this week)
>>
>> [root@bugserv1 etc]# diff condor_config.local
>> condor_config.local.08Jul09
>> 20c20
>> < LOCAL_DIR = /mnt/system/condor ---
>> > LOCAL_DIR = /var/opt/condor
>> 27,29c27
>> < PREEMPT = True <
>> UWCS_PREEMPTION_REQUIREMENTS = ( $(StateTimer) > (8 * $(HOUR)) && \
>> < RemoteUserPrio > SubmittorPrio * 1.2 ) || (MY.NiceUser
>> == True)
>> ---
>> > PREEMPT = False
>>
>> Just a bigger volume, and 8 hour preemption quanta.
>>
>> Ideas?
>>
>> -- Cheers, Gary
>> Systems Manager, Bioinformatics
>> Montana State University
>>
>>
>>
>>
>> --
>> Cheers, Gary
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

--
Cheers, Gary

--
Cheers, Gary

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)

Mailing List Archives

Public Access

Re: [Condor-users] Condor 7.4.2 not working on Rocks 5.3