
Re: [Condor-users] Submit from (linux or Windows) to a Linux Manager FAILED with "Failed rank condition: MY.Rank > MY.CurrentRank"



Hi Dan, 

Running the command 'condor_status -format "%s\n" rank' on the three stations returns the following:
0.00000
0.00000
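
For illustration only: the condition quoted by the analyser uses a strict "greater than", so two equal values do not satisfy it. The sketch below assumes CurrentRank is also 0.0, which the output above does not show. rank_condition is a hypothetical helper that only repeats the comparison; it does not query Condor.

```shell
#!/bin/sh
# rank_condition RANK CURRENT_RANK
# Hypothetical helper (not a Condor tool): prints "pass" when RANK is
# strictly greater than CURRENT_RANK, and "fail" otherwise.
rank_condition() {
    awk -v r="$1" -v c="$2" 'BEGIN {
        # "+ 0" forces a numeric comparison instead of a string comparison
        if (r + 0 > c + 0) print "pass"; else print "fail"
    }'
}

# Assumed values: Rank as reported above, CurrentRank assumed to be 0.0
rank_condition 0.00000 0.00000
# A higher rank would satisfy the strict comparison
rank_condition 1.0 0.0
```

With equal values the first call reports "fail", which is at least consistent with the message seen here, though it does not by itself explain why the job stays idle.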

Nevertheless, I investigated further: 

There is another station, named station4 (Windows XP), which can submit.

Scenario1:
Station4 (XP machine) does a condor_submit of a script with UNMATCHED requirements (Arch and OpSys don't match the execute machine). Since there is no match, the job stays in the queue in the IDLE state.
     From NegotiatorLog:  Rejected 2.0 John.doe@Station4 <10.171.2.152:1112>: no match found

Station1 (Linux submit) does a condor_submit of a correct script. There is a match, but I get the message "Failed rank condition: MY.Rank > MY.CurrentRank".
From NegotiatorLog:  Matched 7.0 condor@XXXXXX <10.171.2.228:6993> preempting none <10.171.2.4:53957> slot1@station2

On both station1 and station4, I cleared all queues using "condor_rm XX".

Station1 (Linux submit) does a condor_submit of a correct script. The job is RUNNING :)
   From NegotiatorLog:  Matched 46.0 J-Chris@XXXXX <10.171.2.162:1171> preempting none <10.171.2.4:53957> slot1@station2

Scenario2:
All queues are empty.

Station3 (Windows XP) does a condor_submit of a script with MATCHED requirements (Arch = X86_64 and OpSys = Linux). Even though all queues are empty, the job stays in the IDLE state with "Failed rank condition: MY.Rank > MY.CurrentRank".
   From NegotiatorLog:  Matched 47.0 J-Chris@XXXXXX <10.171.2.162:1171> preempting none <10.171.2.4:53957> slot1@station2
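
To compare the two scenarios side by side, the negotiator decisions can be reduced to one short line per job. This is a minimal sketch, assuming the log lines look like the "Matched" and "Rejected" excerpts quoted above; summarize_negotiator is a hypothetical helper name, and any prefix before the keyword (such as a timestamp) is simply skipped.

```shell
#!/bin/sh
# summarize_negotiator: hypothetical helper, not part of Condor.
# Reads NegotiatorLog-style lines on stdin and prints, per decision:
#   <job id> matched <slot>      for "Matched ..." lines
#   <job id> rejected <reason>   for "Rejected ..." lines
summarize_negotiator() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            if ($i == "Matched") {
                # The last field is the slot name, e.g. slot1@station2
                print $(i + 1), "matched", $NF
                break
            }
            if ($i == "Rejected") {
                # The reason is the text after the submitter address "<...>: "
                reason = $0
                sub(/^.*>: */, "", reason)
                print $(i + 1), "rejected", reason
                break
            }
        }
    }'
}

# Example with two lines taken from the excerpts in this message
summarize_negotiator <<'EOF'
Rejected 2.0 John.doe@Station4 <10.171.2.152:1112>: no match found
Matched 7.0 condor@XXXXXX <10.171.2.228:6993> preempting none <10.171.2.4:53957> slot1@station2
EOF
```

On a real pool the function would be fed the negotiator log file itself; its location depends on the local configuration and is not given in this thread.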


Thanks for your help
Jc

On Thu, Feb 19, 2009 at 11:33 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
What rank expressions are configured for your execute machines?

condor_status -format "%s\n" rank

Also, in the negotiator log, do you see it trying to negotiate with the
submit machine with the idle job?  If so, what does it say?

--Dan

Jean-Christophe Fillion-Robin wrote:
> Hi Folks,
>
> I set up a simple infrastructure with three stations:
>  - Station1: Ubuntu 8.04.2 / Kernel 2.6.24-17-generic / i686
>  - Station2: Ubuntu 8.04.2 / Kernel 2.6.24-23-generic / Intel 64
> (bi-processors)
>  - Station3: Windows XP Pro SP3 / Intel core Duo
>
> I installed the following packages, respectively:
>  - condor-7.2.0-linux-x86-rhel5-dynamic.tar.gz
>  - condor-7.2.0-linux-x86_64-rhel5-dynamic.tar.gz
>  - condor-7.2.0-winnt50-x86.msi
>
> On all installations, security has been set up using the following:
>
> SEC_DEFAULT_AUTHENTICATION = NEVER
> SEC_DEFAULT_ENCRYPTION = NEVER
> SEC_DEFAULT_INTEGRITY = NEVER
> SEC_DEFAULT_NEGOTIATION = REQUIRED
> QUEUE_ALL_USERS_TRUSTED = True
>
> All stations can run 'condor_status'
>
> Only station1 managed to submit the condor script described below
> successfully. Both station1 and station3 managed to submit the job,
> but it stays in the queue in the "Idle" state forever.
>
> Running the command 'condor_q -l -analyse', I obtained the following
> error message on both stations: 'Failed rank condition: MY.Rank >
> MY.CurrentRank'
>
> Condor Script
> ---------------------------------------
> executable=script_to_run2.sh
> universe=vanilla
> arguments=Example.$(Cluster).$(Process) 100
> output=results.output.$(Process)
> error=results.error.$(Process)
> log=results.log
> notification=never
> Requirements = TARGET.UidDomain == "XXXXXX.XXX" && \
>                TARGET.FileSystemDomain == "XXXXXX.XXX" && \
>                TARGET.Arch =="X86_64" && TARGET.OpSys == "LINUX"
> should_transfer_files=YES
> when_to_transfer_output = ON_EXIT
> queue
> ---------------------------------------
>
> The associated shell script
> ---------------------------------------
> #! /bin/sh
>
> echo "I'm process id $$ on" `hostname`
> echo "This is sent to standard error" 1>&2
> date
> echo "Running as binary $0" "$@"
> echo "My name (argument 1) is $1"
> echo "My sleep duration (argument 2) is $2"
> sleep $2
> echo "Sleep of $2 seconds finished.  Exiting"
> exit 42
> ---------------------------------------
>
> Note: There seems to be no suspicious message in the various log files.
>
> I would appreciate any hints regarding the possible cause of
> the  'Failed rank condition: MY.Rank > MY.CurrentRank' error.
>
> Thanks for your help
> J-Chris
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>



--
Phone: +1 (518) 371-3971 x304