
Re: [Condor-users] Issue introduced in 7.2.3 (Windows) and transfer_output_remaps



The start log for the VM has nothing in it related to these unsuccessful runs, as the job never goes into the run state. When I remove the macro from transfer_output_remaps, there is of course a log of the successful job run.


On May 22, 2009, at 4:26 AM, Matt Hope wrote:

Ah ok - how about the starter log for that vm?

-----Original Message-----
From: Andrew Cunningham [mailto:andrewc@xxxxxxx]
Sent: 21 May 2009 20:32
To: Condor-Users Mail List
Cc: Matt Hope
Subject: Re: [Condor-users] Issue introduced in 7.2.3 (Windows) and transfer_output_remaps



I changed the requirements to Name="slot1@machine_name" to force the
job to run on a particular machine/slot (the problem is not particular
to one machine).
Just adding the $$(macro) to transfer_output_remaps causes the
machine to alternate between "Owner", "Unclaimed" and "Matched",
and the job stays idle forever.
The StartLog excerpt below shows this.
The pool master's NegotiatorLog has nothing interesting; it just says
"matched", as one would expect.

It also seems to leave Condor on this machine in a state where a
reconfig is necessary.


5/21 12:11:15 slot1: match_info called
5/21 12:11:15 slot1: Received match <192.168.0.196:1301>#1242932534#4#...
5/21 12:11:15 slot1: State change: match notification protocol successful
5/21 12:11:15 slot1: Changing state: Unclaimed -> Matched
5/21 12:13:15 slot1: State change: match timed out
5/21 12:13:15 slot1: Changing state: Matched -> Owner
5/21 12:13:15 slot1: State change: IS_OWNER is false
5/21 12:13:15 slot1: Changing state: Owner -> Unclaimed
5/21 12:16:15 slot1: match_info called
5/21 12:16:15 slot1: Received match <192.168.0.196:1301>#1242932534#6#...
5/21 12:16:15 slot1: State change: match notification protocol successful
5/21 12:16:15 slot1: Changing state: Unclaimed -> Matched
5/21 12:18:15 slot1: State change: match timed out
5/21 12:18:15 slot1: Changing state: Matched -> Owner
5/21 12:18:15 slot1: State change: IS_OWNER is false
5/21 12:18:15 slot1: Changing state: Owner -> Unclaimed
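
In the excerpt above, each match times out two minutes after it is received and the slot drops back to Unclaimed. For anyone wanting to check a longer StartLog for the same cycle, here is a minimal Python sketch. It assumes the line layout seen in this excerpt; the function names and the `sample` list are made up for illustration and are not part of Condor.

```python
import re

# Matches lines such as
# "5/21 12:11:15 slot1: Changing state: Unclaimed -> Matched"
# (layout assumed from the excerpt in this thread).
STATE_CHANGE = re.compile(r"^(\S+ \S+) (\S+): Changing state: (\w+) -> (\w+)")

def state_transitions(lines):
    """Return (timestamp, slot, old_state, new_state) for each state change."""
    return [m.groups() for m in map(STATE_CHANGE.match, lines) if m]

def count_match_timeouts(lines):
    """Count how many times a match was received but never claimed."""
    return sum("State change: match timed out" in line for line in lines)

# A few lines copied from the excerpt, used here as sample input.
sample = [
    "5/21 12:11:15 slot1: Changing state: Unclaimed -> Matched",
    "5/21 12:13:15 slot1: State change: match timed out",
    "5/21 12:13:15 slot1: Changing state: Matched -> Owner",
    "5/21 12:13:15 slot1: Changing state: Owner -> Unclaimed",
]

for ts, slot, old, new in state_transitions(sample):
    print(ts, slot, old, "->", new)
print("match timeouts:", count_match_timeouts(sample))
```

A count of timeouts that keeps growing while the job stays idle would point to the same symptom described here: the slot is matched, but the claim never follows.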




On May 21, 2009, at 1:36 AM, Matt Hope wrote:

Check the negotiator and schedd logs, anything of interest in there?

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Andrew
Cunningham
Sent: 20 May 2009 23:47
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Issue introduced in 7.2.3 (Windows) and transfer_output_remaps


We are running a Windows-only grid of 4 machines and we are seeing a
very strange problem with 7.2.3 that seems to have been introduced
since 6.9.3 (we upgraded all machines to 7.2.3).

Our vanilla universe .sub file has a line

transfer_output_remaps = "VA14.va1=VA14_$$(arch)_$$(opsys).va1"
This worked as expected under 6.9.3

Now under 7.2.3 when we submit the job it is correctly matched with a
machine, but never goes into the run state, it is rejected for
'unknown reasons' and stays in the idle state.

Running condor_q -analyze gives a response I have never seen:
-- Failed to fetch ads from: <192.168.0.11:2638> : CARDIFF CEDAR:6001:Failed to connect to <192.168.0.11:2638>

(The machine CARDIFF is the machine that submitted the job.)

If we then change the line to, say,
transfer_output_remaps = "VA14.va1=VA14_INTEL_WINNT51.va1"
or remove transfer_output_remaps completely, and submit the job again,
the job is immediately matched, goes into the run state, and completes
as expected.

(condor_q -analyze does not report any errors, as expected)
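
To make the two variants above easier to compare: $$(attr) is meant to be replaced with the value of that attribute from the matched machine's ad, so on a machine with Arch INTEL and OpSys WINNT51 the macro form should end up identical to the hard-coded line. The Python below is only an illustration of that intended result, not Condor's implementation. The function name `expand_match_macros` and the `machine_ad` dictionary are hypothetical, and forms such as a default value inside the macro are not handled.

```python
import re

def expand_match_macros(value, machine_ad):
    """Replace each $$(attr) in value with machine_ad's value for attr.

    Illustration only: attribute names are looked up case-insensitively,
    and an unknown attribute raises KeyError.
    """
    ad = {name.lower(): val for name, val in machine_ad.items()}

    def lookup(match):
        return str(ad[match.group(1).lower()])

    return re.sub(r"\$\$\(([^)]+)\)", lookup, value)

# Hypothetical machine ad carrying the two attributes used in this thread.
machine_ad = {"Arch": "INTEL", "OpSys": "WINNT51"}
remap = "VA14.va1=VA14_$$(arch)_$$(opsys).va1"

print(expand_match_macros(remap, machine_ad))
# prints: VA14.va1=VA14_INTEL_WINNT51.va1
```

Since the expanded string is the same as the hard-coded one that works, the failure appears to lie in the substitution step itself rather than in the resulting file name.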

This is 100% reproducible and, as I said, has only emerged as a
problem after upgrading all machines to 7.2.3. I can only conclude
that some strange bug was introduced in macro substitution.

Andrew







_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


