
Re: [HTCondor-users] Stop making users explore



Miha,

As a new adopter of HTCondor, moving from a poorly-maintained, poorly-understood, five-year-old Sun Grid Engine system over the past few months, I think your post is pretty much spot on. I found that even the administrators have to explore - it took me a while to figure out just how my users' intended jobs mapped onto a meaningful, coherent HTCondor configuration, and I'm still getting the hang of rank and requirements expressions.

I can't remember the last time I saw a single-CPU machine in a business context, yet it took me a surprising amount of time to get a handle on dynamic slots, I think because I didn't have a clear understanding of how the pieces fit together.

I mean, the "Introduction" to the Admin's section of the manual is basically just a reference guide for the different daemons, describing what they do, but not so much why they do it, with whom, and to what end. I sometimes felt like I was trying to learn to drive by reading a Haynes manual.

The "hole" I wanted was standing up a pool of dedicated machines for vanilla jobs, and it was a bit of work to find the proper drill bit.

I think the challenge the dev team is trying to deal with in what they're putting out is that there are so many different possible use scenarios - Windows machines, Linux/Solaris/etc, EC2, different UID and FS domains, scavenging from desktops, you name it - and not everyone is going to want a vanilla-only dedicated-exec-host pool. Plus they've been immersed in it for so long, they may not even realize what it really takes for an average admin to go from zero to condor_submit.

I've been collecting some mental notes as I've worked through the process of introducing HTCondor to our environment and migrating from SGE, which I hope to use to help others avoid the kind of effort I had to expend in order to make the journey.

And I'm thinking that a clear walkthrough on standing up a simple Linux-only vanilla pool with dedicated exec hosts, a "condor" username and home directory, shared filesystem, common UID domain, and all that very basic stuff would be a good foundation for novices - even novices with 25 years of experience, like me - to start building from.

This is how I've got about 1,000 cores in the pool running pretty well, particularly after teaching my users about the joys of $_CONDOR_SCRATCH_DIR. But as I'm gaining confidence and knowledge, I'm eyeing the hundreds of 8-core Windows desktops on the network with my mouth watering over the transformational power they could offer to my users and our customers, and meanwhile I preach the HTCondor gospel to the users and gain converts.
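
For anyone who hasn't met it yet: HTCondor sets _CONDOR_SCRATCH_DIR in a job's environment to a per-job scratch directory on the execute host, so intermediate files can stay on local disk rather than on the shared filesystem. A minimal sketch of how a job script might use it, assuming the job is written in Python; the helper names and the fallback to the system temp directory are illustrative assumptions, not anything from the manual:

```python
import os
import tempfile
from pathlib import Path


def job_scratch_dir() -> Path:
    """Return the per-job scratch directory.

    Inside an HTCondor job, _CONDOR_SCRATCH_DIR points at the job's
    sandbox on the execute host. Outside HTCondor (e.g. testing on a
    workstation) the variable is unset, so this hypothetical helper
    falls back to the system temp directory.
    """
    return Path(os.environ.get("_CONDOR_SCRATCH_DIR", tempfile.gettempdir()))


def write_intermediate(name: str, data: bytes) -> Path:
    """Write an intermediate file into scratch and return its path."""
    path = job_scratch_dir() / name
    path.write_bytes(data)
    return path
```

Only the final results then need to be copied back to shared storage when the job finishes.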

    -Michael Pelletier.