
Re: [Condor-users] Unable to start EC2 instance




I will try that when I get in this AM (I'm on the west coast) and report back.
Thanks,
Phil

 
On Thu, Jun 23, 2011 at 7:34 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
You could extract just condor_submit, the gridmanager, and the ec2_gahp.

Cheers,
Tim

On Thu, 2011-06-23 at 07:26 -0700, Philip Papadopoulos wrote:
> Do I need all of condor 7.7 or can I just extract the ec2_gahp
> executable from it?
>
> Thanks,
> Phil
>
>
>
> On Thu, Jun 23, 2011 at 4:56 AM, Matthew Farrellee <matt@xxxxxxxxxx>
> wrote:
>
>         On 06/22/2011 02:49 PM, Philip Papadopoulos wrote:
>
>
>                 Trying out Condor 7.6.1 -- installed via the
>                 rhap.stripped.tar.gz
>
>                 I get the following in my GAHP log.
>                 06/22/11 09:33:37 Command(AMAZON_VM_STATUS_ALL) got
>                 error(code:Client,
>                 msg:End of file or no input: Operation interrupted or
>                 timed out
>                 06/22/11 09:38:38 Call to DescribeInstances failed:
>                 SOAP 1.1 fault:
>                 SOAP-ENV:Client [no subcode]
>                 "End of file or no input: Operation interrupted or
>                 timed out"
>                 Detail: [no detail]
>
>                 06/22/11 09:38:38 Command(AMAZON_VM_STATUS_ALL) got
>                 error(code:Client,
>                 msg:End of file or no input: Operation interrupted or
>                 timed out
>                 06/22/11 09:42:08 EOF reached on pipe 0
>                 06/22/11 09:42:08 stdin buffer closed, exiting
>                 06/22/11 09:47:19 Call to DescribeInstances failed:
>                 SOAP 1.1 fault:
>                 SOAP-ENV:Client [no subcode]
>                 "End of file or no input: Operation interrupted or
>                 timed out"
>                 Detail: [no detail]
>
>                 06/22/11 09:47:19 Command(AMAZON_VM_STATUS_ALL) got
>                 error(code:Client,
>                 msg:End of file or no input: Operation interrupted or
>                 timed out
>                 06/22/11 09:48:33 EOF reached on pipe 0
>                 06/22/11 09:48:33 stdin buffer closed, exiting
>                 06/22/11 09:49:18 Call to DescribeInstances failed:
>                 SOAP 1.1 fault:
>                 SOAP-ENV:Client [no subcode]
>                 "End of file or no input: Operation interrupted or
>                 timed out"
>                 Detail: [no detail]
>
>                 06/22/11 09:49:18 Command(AMAZON_VM_STATUS_ALL) got
>                 error(code:Client,
>                 msg:End of file or no input: Operation interrupted or
>                 timed out
>
>
>                 The submission file is simple:
>                 universe = grid
>                 grid_resource = amazon https://ec2.amazonaws.com/
>                 periodic_release = NumHolds < 3
>                 +NumHolds = 0
>                 periodic_remove = NumHolds >= 3 || (JobStatus == 2 && time()-ShadowBday > 1*60*60)
>                 executable = RunEC2VM
>                 amazon_keypair_file = keypair.$(Process)
>
>                 amazon_ami_id = ami-4ed12d27
>                 amazon_instance_type = m1.large
>                 amazon_user_data = condor:landphil.rocksclusters.org:40000:50000
>                 amazon_private_key = /home/phil/.ec2/pk.pem
>                 amazon_public_key = /home/phil/.ec2/cert.pem
>
>                 queue 1
>
>
>                 And the condor_config_val  (The salient ones I think)
>                 $ condor_config_val -dump | grep -i amazon
>                 AMAZON_GAHP = $(SBIN)/amazon_gahp
>                 AMAZON_GAHP_LOG = /tmp/AmazonGahpLog.$(USERNAME)
>                 GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_AMAZON = 20
>
>                 and
>                 $ condor_config_val -dump | grep -i ssl
>                 SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
>                 SOAP_SSL_SKIP_HOST_CHECK = True
>
>                 I've tried both with and without
>                 SOAP_SSL_SKIP_HOST_CHECK.
>                 The file named by SOAP_SSL_CA_FILE exists.
>                 If I try WITHOUT the
>                 SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
>                 then I get
>                  Call to DescribeInstances failed: SOAP 1.1 fault:
>                 SOAP-ENV:Client [no
>                 subcode]
>                 "SSL_ERROR_SSL
>                 error:14090086:SSL
>                 routines:SSL3_GET_SERVER_CERTIFICATE:certificate
>                 verify failed"
>                 Detail: SSL connect failed in tcp_connect()
>
>
>                 Right now I'm flummoxed.
>
>                 Thanks,
>                 Phil
>
>                 --
>                 Philip Papadopoulos, PhD
>                 University of California, San Diego
>
>                 858-822-3628 <tel:858-822-3628> (Ofc)
>                 619-331-2990 <tel:619-331-2990> (Fax)
>
>         Phil,
>
>         I'm assuming you aren't getting those errors 100% of the time,
>         and that you're actually talking to AWS's EC2 service.
>
>         I've seen similar intermittent issues in the past. They came
>         and went by days. After much investigation, I eventually
>         chalked them up to transient issues with AWS' EC2 SOAP
>         interface. The amazon_gahp was Condor's first means to
>         interact with EC2 and was written to the (then popular) SOAP
>         interface. Over the years the EC2 Query interface has
>         apparently taken hold as the interface of choice, with many
>         EC2 clones not supporting SOAP. In response, the ec2_gahp has
>         been written, available in 7.7, against the Query interface.
>         You should try it out, especially on a day when the SOAP
>         interface is failing, so that we can get a better handle on
>         whether the issue is truly SOAP vs. Query.
>
>         Best,
>
>
>         matt
>
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/




--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)
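
Matt's question in the quoted thread is whether the DescribeInstances failures happen 100% of the time or come and go by day. One rough way to check is to tally the failure records in the GAHP log. The following is only a sketch: the function name summarize_gahp_log and the SAMPLE list are made up for illustration, and it assumes each log record starts with an "MM/DD/YY HH:MM:SS" timestamp on a single line, as in the excerpt above.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp that starts each GAHP log record, e.g. "06/22/11 09:38:38".
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")

# Hypothetical sample, shortened from the log excerpt quoted in this thread.
SAMPLE = [
    "06/22/11 09:38:38 Call to DescribeInstances failed: SOAP 1.1 fault:",
    "06/22/11 09:42:08 EOF reached on pipe 0",
    "06/22/11 09:47:19 Call to DescribeInstances failed: SOAP 1.1 fault:",
    "06/22/11 09:49:18 Call to DescribeInstances failed: SOAP 1.1 fault:",
]


def summarize_gahp_log(lines):
    """Return (failures per day, seconds between consecutive failures)."""
    failures = []
    for line in lines:
        match = STAMP.match(line.strip())
        # Only records that report a failed DescribeInstances call are counted.
        if match and "Call to DescribeInstances failed" in match.group(2):
            failures.append(datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S"))
    per_day = Counter(ts.date().isoformat() for ts in failures)
    gaps = [(b - a).total_seconds() for a, b in zip(failures, failures[1:])]
    return per_day, gaps


if __name__ == "__main__":
    per_day, gaps = summarize_gahp_log(SAMPLE)
    print(dict(per_day))
    print(gaps)
```

To run it against a real log, pass the open file instead of SAMPLE, for example the file that AMAZON_GAHP_LOG points to. A count of failures per day only suggests a pattern; it does not show whether the SOAP interface itself was at fault.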