
[Condor-users] Unable to start EC2 instance



I'm trying out Condor 7.6.1, installed from the rhap.stripped.tar.gz tarball.

I get the following in my GAHP log:
06/22/11 09:33:37 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
06/22/11 09:38:38 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]

06/22/11 09:38:38 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
06/22/11 09:42:08 EOF reached on pipe 0
06/22/11 09:42:08 stdin buffer closed, exiting
06/22/11 09:47:19 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]

06/22/11 09:47:19 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
06/22/11 09:48:33 EOF reached on pipe 0
06/22/11 09:48:33 stdin buffer closed, exiting
06/22/11 09:49:18 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out"
Detail: [no detail]

06/22/11 09:49:18 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out


The submission file is simple:
universe = grid
grid_resource = amazon https://ec2.amazonaws.com/
periodic_release = NumHolds < 3
+NumHolds = 0
periodic_remove = NumHolds >= 3 || (JobStatus == 2 && time()-ShadowBday > 1*60*60)
executable = RunEC2VM
amazon_keypair_file = keypair.$(Process)

amazon_ami_id = ami-4ed12d27
amazon_instance_type = m1.large
amazon_user_data = condor:landphil.rocksclusters.org:40000:50000
amazon_private_key = /home/phil/.ec2/pk.pem
amazon_public_key = /home/phil/.ec2/cert.pem

queue 1
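
For readers unfamiliar with the policy expressions above, here is a rough Python sketch of what periodic_release and periodic_remove are meant to do. It assumes JobStatus == 2 means "running" (as in Condor's job status codes) and that every attribute is defined; real ClassAd evaluation also has to cope with undefined attributes, which this sketch ignores. The function and constant names are hypothetical and chosen only for illustration.

```python
RUNNING = 2                    # Condor JobStatus value for a running job
MAX_HOLDS = 3                  # threshold used in both expressions
MAX_RUN_SECONDS = 1 * 60 * 60  # one hour, as written in periodic_remove


def periodic_release(num_holds):
    """Mirror of: periodic_release = NumHolds < 3"""
    return num_holds < MAX_HOLDS


def periodic_remove(num_holds, job_status, now, shadow_bday):
    """Mirror of:
    periodic_remove = NumHolds >= 3 ||
                      (JobStatus == 2 && time()-ShadowBday > 1*60*60)
    """
    ran_too_long = job_status == RUNNING and (now - shadow_bday) > MAX_RUN_SECONDS
    return num_holds >= MAX_HOLDS or ran_too_long
```

In words: a held job is released until it has been held three times, and a job is removed once it reaches three holds or has been running for more than an hour.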


And the condor_config_val output (the salient settings, I think):
$ condor_config_val -dump | grep -i amazon
AMAZON_GAHP = $(SBIN)/amazon_gahp
AMAZON_GAHP_LOG = /tmp/AmazonGahpLog.$(USERNAME)
GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_AMAZON = 20

and
$ condor_config_val -dump | grep -i ssl  
SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
SOAP_SSL_SKIP_HOST_CHECK = True

I've tried both with and without SOAP_SSL_SKIP_HOST_CHECK.
The file named by SOAP_SSL_CA_FILE exists.
If I try WITHOUT the line
SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
then I get:
 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"SSL_ERROR_SSL
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
Detail: SSL connect failed in tcp_connect()
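
One way to test the CA bundle independently of the GAHP is a plain TLS handshake against the same endpoint. The following is only a minimal sketch: it uses Python's standard ssl module rather than the gSOAP stack inside amazon_gahp, so a success here shows the CA file can verify the server but says nothing definite about the GAHP itself. The function names are hypothetical.

```python
import socket
import ssl


def make_context(ca_file=None):
    """Build a verifying TLS context.

    With ca_file set, only that bundle is trusted; with None, the
    system default trust store is used.
    """
    return ssl.create_default_context(cafile=ca_file)


def check_handshake(host, port=443, ca_file=None, timeout=10.0):
    """Connect to host:port, verify the certificate chain and host name,
    and return the negotiated TLS protocol version string.

    Raises ssl.SSLError (or a subclass) if verification fails, and
    OSError on connection problems or timeouts.
    """
    ctx = make_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, check_handshake("ec2.amazonaws.com", ca_file="/etc/pki/tls/cert.pem") would exercise the same CA file as the SOAP_SSL_CA_FILE setting above.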


Right now I'm flummoxed.

Thanks,
Phil

--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)