
Re: [Condor-users] condor_read(): Socket closed when trying to read 5 bytes in StartLog



> 10/17 13:09:57 Starter pid 31556 exited with status 1
> 10/17 13:09:57 State change: starter exited
> 10/17 13:09:57 Changing activity: Busy -> Idle

A starter exiting with status 1 means the starter itself hit an error. Look in the StarterLog on that machine for the cause.
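
To find which starter runs to look up, the failing exits can be pulled out of the StartLog first. The following is only a minimal sketch: it assumes the line format shown in the excerpt in this thread, and the name `failed_starters` and the sample list are hypothetical, not part of Condor.

```python
import re

# Matches StartLog lines of the form seen in this thread, e.g.
#   10/17 13:09:57 Starter pid 31556 exited with status 1
STARTER_EXIT = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Starter pid (?P<pid>\d+) exited with status (?P<status>\d+)"
)


def failed_starters(lines):
    """Return (timestamp, pid, status) for each starter with a nonzero exit."""
    failures = []
    for line in lines:
        # Drop mail-quote markers ("> ") and trailing whitespace before matching.
        match = STARTER_EXIT.match(line.lstrip("> ").rstrip())
        if match and int(match["status"]) != 0:
            failures.append(
                (match["stamp"], int(match["pid"]), int(match["status"]))
            )
    return failures


# Hypothetical sample taken from the log lines quoted in this thread.
sample = [
    "10/17 13:09:57 DaemonCore: No more children processes to reap.",
    "10/17 13:09:57 Starter pid 31556 exited with status 1",
    "10/17 13:09:57 State change: starter exited",
    "10/17 13:09:57 Starter pid 31557 exited with status 1",
]

for stamp, pid, status in failed_starters(sample):
    print(f"{stamp} starter {pid} exited with status {status}")
```

The timestamps it reports narrow down which StarterLog entries on the execute machine are worth reading.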

Best,


matt

Johnson koil Raj wrote:
Hi,

   When I submitted a job, it matched every machine in the pool. The
negotiator then sent a match for one particular machine, X. But the
StartLog on machine X shows the output below, and the job stays idle.

1) Why doesn't Condor find another suitable machine in the pool, given
that the job does not start on machine X?
2) Why does it keep trying to run the job on the same machine X?

3) What kind of error is "condor_read(): Socket closed"?

-------------Start Log ---------

10/17 13:09:57 Remote global job ID is scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval [300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes from <127.0.0.1:43202>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31556 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle
10/17 13:09:57 Got activate_claim request from shadow (<10.201.42.242:9603>)
10/17 13:09:57 Read request ad and starter from shadow.
10/17 13:09:57 Swap space: 1052124
10/17 13:09:57 28786748 kbytes available for "/vm/local.grid7/execute"
10/17 13:09:57 Looking up RESERVED_DISK parameter
10/17 13:09:57 Reserving 5120 kbytes for file system
10/17 13:09:57 Total execute space: 28781628
10/17 13:09:57 Remote job ID is 161.0
10/17 13:09:57 Remote global job ID is scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval [300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes from <127.0.0.1:34016>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31557 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle


by
Johnson