
Re: [Condor-users] Condor + EC2



Hey Matthew,

Since I am not using 7.2 anymore, I may have a look at shared_port. I just have a question about the limit on workers.
According to the documentation:

http://research.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#SECTION004336000000000000000
"SHARED_PORT_MAX_WORKERS An integer that specifies the maximum number of sub-processes
created by condor_shared_port while servicing requests to connect to the daemons that are
sharing the port. The default is 50."

The default is 50 workers; how far can I increase this value? If I push it too high, should I expect problems? Isn't there a limit on the number of connections that can be accepted per second, or something like that?

Can you send me the instructions for using a password?

Nice suggestion about the hour; it makes more sense than the way I am doing it.

Thanks,

On Fri, Nov 18, 2011 at 3:40 AM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
As you've discovered, you don't need the shared_port, you can do ranges. The benefit of shared_port is that you only have to open a single port. On execute nodes the range isn't so bad, since you'll have a fixed number of ports needed. On the submit side you'll need a port or more per running job, so the range may have to be wide.
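
To make the two options concrete, here is a rough sketch of what each might look like in condor_config.local. It is only an illustration: the helper name port_config is made up, the range 40000-40050 comes from the security group used later in this thread, and the shared port number 9615 is an assumed value, so adjust everything to your pool.

```shell
#!/bin/sh
# Sketch only: print condor_config.local fragments for the two approaches.
# port_config is a hypothetical helper, not part of Condor.
#   port_config range LOW HIGH   -> restrict daemons to a port range
#   port_config shared PORT      -> route daemons through condor_shared_port
port_config() {
    case "$1" in
        range)
            # One port (or more) per running job is needed on the submit
            # side, so choose HIGH - LOW accordingly.
            printf 'LOWPORT = %s\nHIGHPORT = %s\n' "$2" "$3"
            ;;
        shared)
            # Only the single port given here has to be opened.
            printf 'USE_SHARED_PORT = True\n'
            printf 'DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT\n'
            printf 'SHARED_PORT_ARGS = -p %s\n' "$2"
            ;;
        *)
            echo "usage: port_config range LOW HIGH | shared PORT" >&2
            return 1
            ;;
    esac
}
```

For example, `port_config range 40000 40050 | sudo tee -a /etc/condor/condor_config.local` would append the range settings; check the manual for your Condor version before relying on any of these knobs.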

It isn't too surprising that the hibernation detection doesn't work well inside a VM.

I'd suggest setting up PASSWORD authentication instead of hostname-based. I've instructions sitting around here if you're interested.

One $ optimization: EC2 charges by the hour, and partial hours get the full hour charge. Consider shutting down only if idle and near an hour boundary.
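
A minimal sketch of that check, assuming the billed hour starts roughly when the instance boots, so uptime can stand in for time since launch. The function names and the 55-minute threshold are illustrative choices, not anything Condor or EC2 provides.

```shell
#!/bin/sh
# Sketch only: decide whether an idle instance is close enough to the end
# of its current billed hour to be worth powering off.

# minutes_into_hour UPTIME_SECONDS
# Prints how many whole minutes of the current hour have elapsed.
minutes_into_hour() {
    echo $(( ($1 % 3600) / 60 ))
}

# should_poweroff UPTIME_SECONDS IDLE [THRESHOLD_MINUTES]
# IDLE is 1 when the machine is idle, 0 otherwise.
# Succeeds (exit 0) only if idle AND at least THRESHOLD_MINUTES
# (default 55, an assumed value) into the current hour.
should_poweroff() {
    uptime_s=$1
    idle=$2
    threshold=${3:-55}
    [ "$idle" -eq 1 ] && [ "$(minutes_into_hour "$uptime_s")" -ge "$threshold" ]
}

# Example use from cron (not executed here):
#   uptime_s=$(cut -d. -f1 /proc/uptime)
#   should_poweroff "$uptime_s" 1 && /sbin/poweroff
```

With this, an instance that goes idle ten minutes into an hour keeps running (and stays available for new jobs) until the hour it has already paid for is nearly used up.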

Best,


matt


On 11/17/2011 09:24 PM, Tiago Macarios wrote:
Thanks Chi, but setting up condor on amazon was pretty straightforward.
Only the poweroff part did not go as expected. As I promised, I am
posting here how to do it; you can get everything running in 15 minutes.
Feel free to optimize for your needs, it fits mine the way it is.

<> Instructions on how to set up a condor pool in the amazon cloud using
ubuntu 10.04 LTS (ami-ad36fbc4):

Set a security group to enable ports:
TCP: 22, 9628, 40000-40050
UDP: 9618, 40000-40050
Launch the AMI and, once connected, download the necessary files:
$ wget http://www.cs.wisc.edu/condor/debian/development/dists/squeeze/contrib/binary-amd64/condor-7.7.2-1-deb_6.0_amd64.deb
$ wget http://ftp.us.debian.org/debian/pool/main/o/openssl098/libssl0.9.8_0.9.8o-7_amd64.deb

Install everything:
$ sudo apt-get install libc6-i386 libvirt0 libxen3
$ sudo dpkg -i libssl0.9.8_0.9.8o-7_amd64.deb
$ sudo dpkg -i condor-7.7.2-1-deb_6.0_amd64.deb

I wrote a script that runs before the condor service and sets the
machine up as Central Manager or Worker, depending on the user-data
exposed by the REST interface.
You can download it here: http://pastie.org/pastes/2881119/text
Save the script to /etc/init.d/ and set it to run before condor on the
appropriate run levels:
$ for run_level in 2 3 4 5;
 > do
 > sudo ln -vs /etc/init.d/condor_config /etc/rc${run_level}.d/S01condor-config
 > echo /etc/rc${run_level}.d/S01condor-config
 > done
$ sudo chmod 755 /etc/init.d/condor_config
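
For readers who cannot reach the pastie link, the role selection could be sketched roughly as below. This is not the script from the link, only a guess at its core idea: the function name condor_host_line is hypothetical, and the sketch assumes that a worker is launched with the master's public DNS as user-data while the manager is launched with none.

```shell
#!/bin/sh
# Sketch only (NOT the script from the pastie link).
# condor_host_line USER_DATA
# Prints the CONDOR_HOST setting for this machine:
#   - non-empty user-data: treat it as the master's DNS name (Worker)
#   - empty user-data:     this machine is the Central Manager
condor_host_line() {
    if [ -n "$1" ]; then
        printf 'CONDOR_HOST = %s\n' "$1"
    else
        printf 'CONDOR_HOST = $(FULL_HOSTNAME)\n'
    fi
}

# Example use at boot (not executed here). The URL is the EC2 instance
# metadata address for user-data; wget prints nothing if none was set.
#   user_data=$(wget -q -O - http://169.254.169.254/latest/user-data || true)
#   condor_host_line "$user_data" >> /etc/condor/condor_config.local
```

A real init script would also need to set DAEMON_LIST for each role and avoid appending the same line on every boot.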

I also set a crontab to turn the machines off automatically if idle for
more than 5 minutes:
$ sudo crontab file

where file is:
*/1 * * * * if grep -Fxq 'CONDOR_HOST = $(FULL_HOSTNAME)' /etc/condor/condor_config.local; then loadavg=`cat /proc/loadavg | tr " " "\n" | awk 'NR==2'`; loadavgpoweroff=`echo "${loadavg} < 0.01" | bc`; if [ ${loadavgpoweroff} -ne "0" ]; then /sbin/poweroff; fi; fi;
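
The idle test in that one-liner reads the second field of /proc/loadavg, which is the 5-minute load average, and compares it against 0.01. The same check can be written as a small script, which is easier to read and to try by hand. The function names are made up for this sketch, and the threshold is a parameter rather than a fixed value.

```shell
#!/bin/sh
# Sketch only: the idle test from the crontab entry, split into functions.

# five_min_load LOADAVG_LINE
# Prints the second field of a /proc/loadavg line (5-minute load average).
five_min_load() {
    echo "$1" | awk '{ print $2 }'
}

# is_idle LOADAVG_LINE [THRESHOLD]
# Succeeds (exit 0) when the 5-minute load average is below THRESHOLD
# (default 0.01, the value used in the crontab entry).
is_idle() {
    five_min_load "$1" | awk -v t="${2:-0.01}" '{ exit !($1 < t) }'
}

# Example use (not executed here):
#   is_idle "$(cat /proc/loadavg)" && /sbin/poweroff
```

Using awk for the comparison also removes the dependency on bc, which is not installed on every minimal image.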

<> After everything is finished, just clean up and rebundle the machine.
Running the Manager: Just start the AMI.
Running the Workers: Set the number of workers, and in the "Advanced
Instance Options" set the user-data to the public DNS of the Master and
set Shutdown Behavior to "Terminate", and you are good to go.

It is my first time using condor, so if I can optimize something please
let me know. Thanks a lot for the help guys.


On Thu, Nov 17, 2011 at 5:50 PM, Chi Chan <chichan2008@xxxxxxxxx
<mailto:chichan2008@xxxxxxxxx>> wrote:

   If you are new to Condor, then it can be hard to setup a Condor
   cluster on EC2.

   You may want to see if the open-source StarCluster project from MIT
   fits your need:

   http://web.mit.edu/stardev/cluster/

   "StarCluster 0.91 Demo": http://www.youtube.com/watch?v=vC3lJcPq1FY

   Currently, the only scheduler StarCluster supports is Sun Grid Engine
   (now called "Open Grid Scheduler" with Oracle), but people are adding
   Torque & Condor support. I think it is way easier to setup a cluster
   with StarCluster than doing all of them by hand.

   And if someone from the Condor project can give the StarCluster
   developers a hand in implementing the Condor plugin, then open source
   schedulers can all share one single EC2 provisioning layer, and can
   easily migrate from job scheduler to scheduler.

   --Chi



   On Tue, Nov 15, 2011 at 1:27 AM, Tiago Macarios
   <tiagomacarios@xxxxxxxxx <mailto:tiagomacarios@gmail.com>> wrote:
    > Hi all,
    > I am trying to get condor to run on amazon ec2. I am trying to use ubuntu
    > and the standard apt-get ubuntu version (7.2.4), but I saw in some places
    > that this version may be a bit old for that. Did any of you guys make it
    > work using this configuration?
    > Would you have any suggestion of distros I could try? I found this link
    > that seems to be what I wanted to do, but I am not familiar with Fedora
    > and wanted to try ubuntu first:
    > http://spinningmatt.wordpress.com/2011/11/10/getting-started-condor-and-ec2-ec2-execute-node/
    > To be honest I am new to condor, and have been playing around for about
    > 2 weeks on a couple of virtual machines. I tried installing the last
    > stable release of condor on ubuntu, but I failed. Is there a tutorial for
    > installing it on ubuntu?
    > I saw this version here: http://neuro.debian.net/pkgs/condor.html
    > Did anyone try it out?
    > Please help me out. I am more than willing to write documentation about
    > it if I get things running.
    > --
    > Tiago
    >
    >
    >
    > _______________________________________________
    > Condor-users mailing list
    > To unsubscribe, send a message to condor-users-request@xxxxxxxxedu
    > <mailto:condor-users-request@cs.wisc.edu> with a
    > subject: Unsubscribe
    > You can also unsubscribe by visiting
    > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
    >
    > The archives can be found at:
    > https://lists.cs.wisc.edu/archive/condor-users/
    >
    >


