
Re: [Condor-users] condor_read(): Socket closed when trying to read 5 bytes in StartLog



If you have a machine with multiple slots configured you'll have multiple StarterLog files, e.g. StarterLog.slot1. If there is still nothing in those files you should increase your debug level, because the starter is apparently exiting abnormally.
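A quick way to see which per-slot starter logs exist and when each was last written is sketched below. This is only an illustration: the helper name starter_logs is made up here, and the log directory is an assumption (condor_config_val LOG prints the real one on the execute machine). Raising the debug level is normally a configuration change such as STARTER_DEBUG = D_FULLDEBUG followed by a reconfig, but check the manual for your version before relying on that exact setting.

```python
from datetime import datetime
from pathlib import Path


def starter_logs(log_dir):
    """Return (name, size_bytes, last_modified) for each StarterLog* file in log_dir."""
    rows = []
    for path in sorted(Path(log_dir).glob("StarterLog*")):
        if path.is_file():
            info = path.stat()
            rows.append((path.name, info.st_size, datetime.fromtimestamp(info.st_mtime)))
    return rows


if __name__ == "__main__":
    import sys

    # Assumption: the log directory is passed as the first argument;
    # it defaults to the current directory if omitted.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size, mtime in starter_logs(log_dir):
        print(f"{name}\t{size} bytes\tlast modified {mtime:%m/%d %H:%M:%S}")
```

An empty file, or one whose last-modified time predates the failed activation, suggests the starter exited before it logged anything at the current debug level.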

Best,


matt

Johnson koil Raj wrote:
Hi matt,
  It actually didn't start the Starter process. There is no entry in
the StarterLog at that time.

The StarterLog was last touched at 10/17 10:44:30.

by
Johnson




On Fri, 2008-10-17 at 07:50 -0500, Matthew Farrellee wrote:
 > 10/17 13:09:57 Starter pid 31556 exited with status 1
 > 10/17 13:09:57 State change: starter exited
 > 10/17 13:09:57 Changing activity: Busy -> Idle

The Starter exiting with status 1 is an error in the Starter. Look in the StarterLog.
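To see how often this happens, the StartLog can be scanned for starter exit lines. The sketch below assumes the line format shown in the quoted excerpt ("Starter pid N exited with status S"); the names EXIT_RE and starter_exits are made up for this example and are not part of Condor.

```python
import re

# Matches lines such as "10/17 13:09:57 Starter pid 31556 exited with status 1",
# optionally preceded by an email quote marker (" > ").
EXIT_RE = re.compile(
    r"^(?:\s*>\s*)?(?P<stamp>\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"Starter pid (?P<pid>\d+) exited with status (?P<status>\d+)"
)


def starter_exits(lines):
    """Return (timestamp, pid, status) for every starter exit line found."""
    exits = []
    for line in lines:
        match = EXIT_RE.match(line)
        if match:
            exits.append((match["stamp"], int(match["pid"]), int(match["status"])))
    return exits


def failed_exits(lines):
    """Keep only the exits with a non-zero status."""
    return [entry for entry in starter_exits(lines) if entry[2] != 0]
```

Calling failed_exits(open(path)) on a StartLog gives the times to look for in the matching StarterLog.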

Best,


matt

Johnson koil Raj wrote:
Hi,

   When I submitted a job, it matched with all machines in the pool.
The Negotiator sent a match with a particular machine X, but machine X's
StartLog shows the following and the job stays Idle.

1) Why doesn't Condor find another suitable machine in the pool, given
that the job did not start on machine X?
2) Why does it keep trying the same machine X to run that job?

3) What kind of error is "condor_read(): Socket closed"?

-------------Start Log ---------

10/17 13:09:57 Remote global job ID is scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval [300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes from <127.0.0.1:43202>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31556 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle
10/17 13:09:57 Got activate_claim request from shadow (<10.201.42.242:9603>)
10/17 13:09:57 Read request ad and starter from shadow.
10/17 13:09:57 Swap space: 1052124
10/17 13:09:57 28786748 kbytes available for "/vm/local.grid7/execute"
10/17 13:09:57 Looking up RESERVED_DISK parameter
10/17 13:09:57 Reserving 5120 kbytes for file system
10/17 13:09:57 Total execute space: 28781628
10/17 13:09:57 Remote job ID is 161.0
10/17 13:09:57 Remote global job ID is scorpio.pesgrid.wipro.com#1224227353#161.0
10/17 13:09:57 JobLeaseDuration not defined: using 1800 (alive_interval [300] * max_missed [6]
10/17 13:09:57 About to Create_Process "condor_starter -f scorpio.pesgrid.wipro.com"
10/17 13:09:57 Create_Process: using fast clone() to create child process.
10/17 13:09:57 Got RemoteUser (idealgrid@xxxxxxxxxxxxxxxxx) from request classad
10/17 13:09:57 Got universe "VM" (13) from request classad
10/17 13:09:57 State change: claim-activation protocol successful
10/17 13:09:57 Changing activity: Idle -> Busy
10/17 13:09:57 condor_read(): Socket closed when trying to read 5 bytes from <127.0.0.1:34016>
10/17 13:09:57 IO: EOF reading packet header
10/17 13:09:57 Closing job ClassAd update socket from starter.
10/17 13:09:57 DaemonCore: No more children processes to reap.
10/17 13:09:57 Starter pid 31557 exited with status 1
10/17 13:09:57 State change: starter exited
10/17 13:09:57 Changing activity: Busy -> Idle
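The "Socket closed when trying to read 5 bytes" line followed by "EOF reading packet header" reads like one end of a local socket closing before the other end has received a fixed-size header. The sketch below reproduces that situation with plain Python sockets. It is not Condor code: read_exact and demo are hypothetical names, and the 5-byte header length is taken only from the log message.

```python
import socket

HEADER_LEN = 5  # the size named in the log message; an assumption about its meaning


def read_exact(sock, n):
    """Read exactly n bytes, or raise EOFError if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # recv() returns b"" once the peer has closed its end
            raise EOFError(f"socket closed when trying to read {n} bytes")
        buf += chunk
    return buf


def demo():
    """Close one end of a socket pair before writing, then try to read a header."""
    parent, child = socket.socketpair()
    child.close()  # stands in for a child process that exits before sending anything
    try:
        read_exact(parent, HEADER_LEN)
    except EOFError as exc:
        return str(exc)
    finally:
        parent.close()
    return "read ok"
```

Under this reading, the message on the reading side is a symptom: the interesting question is why the writer went away, which here is the starter exiting with status 1.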


by
Johnson
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/

