
[Condor-users] condor error 10054




 
 Hi,
         I am trying to run a jar file in the java universe, and for this I need to transfer the other required jars to the executing machines. I submitted this job to run on 4 machines, but it runs on only one machine and the jobs fail with error 10054. Please suggest something.
 
 
The job description file is:
 
####################
#
# Example 1
# Execute a single Java class,
# not on a shared file system
#
####################
universe = java
executable = jegrid.jar
#Requirements = Machine == "johndoe.pesgrid.wipro.com"
jar_files = jegrid.jar,lib\commons-logging-1.0.4.jar,lib\concurrent.jar,lib\log4j-1.2.13.jar,lib\jms.jar,lib\jgroups-all-2.2.9.4.jar,lib\picocontainer-1.1.jar
arguments = org.jegrid.ServerMain test
output = jegrid.output
error = jegrid.error
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
getenv = true
queue
 
 
 
ShadowLog contents are:
 
2/6 11:20:11 ******************************************************
2/6 11:20:11 ** condor_shadow (CONDOR_SHADOW) STARTING UP
2/6 11:20:11 ** C:\condor\bin\condor_shadow.exe
2/6 11:20:11 ** $CondorVersion: 6.8.7 Nov 29 2007 $
2/6 11:20:11 ** $CondorPlatform: INTEL-WINNT50 $
2/6 11:20:11 ** PID = 2400
2/6 11:20:11 ** Log last touched 2/6 11:19:55
2/6 11:20:11 ******************************************************
2/6 11:20:11 Using config source: C:\condor\condor_config
2/6 11:20:11 Using local config sources:
2/6 11:20:11    C:\condor/condor_config.local
2/6 11:20:11 DaemonCore: Command Socket at <10.201.42.248:1701>
2/6 11:20:11 Initializing a JAVA shadow for job 58.0
2/6 11:20:12 (58.0) (2400): Request to run on <10.201.42.248:1671> was ACCEPTED
2/6 11:20:14 (58.0) (2400): condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.201.42.248:1671>.
2/6 11:20:14 (58.0) (2400): Can no longer talk to condor_starter <10.201.42.248:1671>
2/6 11:20:14 (58.0) (2400): Trying to reconnect to disconnected job
2/6 11:20:14 (58.0) (2400): LastJobLeaseRenewal: 1202277014 Wed Feb 06 11:20:14 2008
2/6 11:20:14 (58.0) (2400): JobLeaseDuration: 1200 seconds
2/6 11:20:14 (58.0) (2400): JobLeaseDuration remaining: 1200
2/6 11:20:14 (58.0) (2400): Attempting to locate disconnected starter
2/6 11:20:14 (58.0) (2400): locateStarter(): ClaimId (<10.201.42.248:1671>#1202206022#4601#689705291) and GlobalJobId ( JOHNDOE.pesgrid.wipro.com#1202206246#58.0 ) not found
2/6 11:20:14 (58.0) (2400): Reconnect FAILED: Job not found at execution machine
2/6 11:20:14 (58.0) (2400): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
 
 
StartLog contents are:
 
 
2/6 11:20:07 DaemonCore: Command received via UDP from host <10.201.42.242:42047>
2/6 11:20:07 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
2/6 11:20:07 match_info called
2/6 11:20:07 Received match <10.201.42.248:1671>#1202206022#4601#...
2/6 11:20:07 State change: match notification protocol successful
2/6 11:20:07 Changing state: Unclaimed -> Matched
2/6 11:20:07 DaemonCore: Command received via TCP from host <10.201.42.248:1698>
2/6 11:20:07 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
2/6 11:20:07 Request accepted.
2/6 11:20:07 Remote owner is Manjari@xxxxxxxxxxxxxxxxxxxxxxxxx
2/6 11:20:07 State change: claiming protocol successful
2/6 11:20:07 Changing state: Matched -> Claimed
2/6 11:20:11 DaemonCore: Command received via TCP from host <10.201.42.248:1706>
2/6 11:20:11 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
2/6 11:20:11 Got activate_claim request from shadow (<10.201.42.248:1706>)
2/6 11:20:11 Remote job ID is 58.0
2/6 11:20:12 Got universe "JAVA" (10) from request classad
2/6 11:20:12 State change: claim-activation protocol successful
2/6 11:20:12 Changing activity: Idle -> Busy
2/6 11:20:14 DaemonCore: Command received via UDP from host <10.201.42.248:1715>
2/6 11:20:14 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
2/6 11:20:14 Starter pid 3620 died on signal -1073741819 (exception ACCESS_VIOLATION)
2/6 11:20:14 State change: starter exited
2/6 11:20:14 Changing activity: Busy -> Idle
2/6 11:20:14 DaemonCore: Command received via TCP from host <10.201.42.248:1716>
2/6 11:20:14 DaemonCore: received command 1200 (CA_CMD), calling handler (command_classad_handler)
2/6 11:20:14 Aborting CA_LOCATE_STARTER
2/6 11:20:14 ClaimId (<10.201.42.248:1671>#1202206022#4601#689705291) and GlobalJobId ( JOHNDOE.pesgrid.wipro.com#1202206246#58.0 ) not found
2/6 11:20:14 DaemonCore: Command received via UDP from host <10.201.42.248:1718>
2/6 11:20:14 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
2/6 11:20:14 State change: received RELEASE_CLAIM command
2/6 11:20:14 Changing state and activity: Claimed/Idle -> Preempting/Vacating
2/6 11:20:14 State change: No preempting claim, returning to owner
2/6 11:20:14 Changing state and activity: Preempting/Vacating -> Owner/Idle
2/6 11:20:14 State change: IS_OWNER is false
2/6 11:20:14 Changing state: Owner -> Unclaimed
2/6 11:20:14 DaemonCore: Command received via UDP from host <10.201.42.248:1719>
2/6 11:20:14 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
2/6 11:20:14 Warning: can't find resource with ClaimId (<10.201.42.248:1671>#1202206022#4601#...)
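
A note on the two numeric codes in the logs above. errno 10054 in the ShadowLog is the Winsock code WSAECONNRESET (connection reset by peer), and the "signal" -1073741819 in the StartLog is the signed 32-bit form of the Windows status 0xC0000005 (ACCESS_VIOLATION), as the log line itself says. Both entries are stamped 11:20:14, which suggests, though the logs alone do not prove it, that the shadow saw the reset because the starter had crashed. The short Python sketch below only decodes the two numbers; the helper names and the one-entry lookup table are made up for illustration and are not part of Condor.

```python
# Illustrative helpers only; the names here are hypothetical, not Condor APIs.

# Small lookup table with the one Winsock code seen in the ShadowLog.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET (connection reset by peer)",
}


def describe_winsock_error(code: int) -> str:
    """Return a readable name for a Winsock error code, if it is in the table."""
    return WINSOCK_ERRORS.get(code, f"unknown Winsock error {code}")


def to_ntstatus_hex(signed_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status value."""
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(describe_winsock_error(10054))
    print(to_ntstatus_hex(-1073741819))
```

If that reading is right, the 10054 is a symptom, and the thing to investigate is why condor_starter dies with an access violation on the execute machine.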
 
 
Please help me out
 
 
