
Re: [HTCondor-users] HTC and HPC





Thanks Inês.

Here is what I learned from engineers I trust:

When Hadoop started, it was different from classic grid engines such as SGE, HTCondor, and PBS. Over time, the big data frameworks are becoming more and more like grid engines, except in their scalability. As you would agree, SGE was built around stable systems with a predictable number of jobs. Most jobs were large: big batch processing.

Current-day data centers, by contrast, are built from off-the-shelf components. These servers cost less, but their failure rates are higher. Also, the job sizes, IMO, are smaller than typical SGE job sizes; as a result, the number of jobs being scheduled is very high compared to SGE. These frameworks are optimized for smaller, more frequent, real-time jobs.

Here is something that might be of help:

"Although a variety of cluster schedulers (e.g. Torque, Sun Grid Engine, Condor) already exist in the scientific computing community, they are not well suited for today's data center environment. These schedulers generally give jobs coarse-grained static allocations of the cluster (e.g. X nodes for the full duration of the job). This is problematic because many cluster applications are elastic (can scale up and down), so utilization is not optimal under static partitioning, and because data-intensive applications such as MapReduce need to run a few tasks on every node of the cluster to read data locally. To address these challenges, Mesos is designed around two principles:

Fine-grained sharing: Mesos allocates resources at the level of "tasks" within a job, allowing applications to scale up and down over time and to take turns accessing data on cluster nodes.
 
Application-controlled scheduling: Applications control which nodes their tasks run on, allowing them to achieve placement goals such as data locality."

Since you found a proper use for HTCondor in machine learning, perhaps you should document it somewhere so others can use it. It is also interesting to try the newest tools (Hadoop, Spark, Mesos) and define the boundary of what can be done with these tools.

These tools were created by people who are today star coders, open source architects, and entrepreneurs, and who were not even born when Condor started.

Cheers,

miha

On Thu, Dec 4, 2014 at 3:00 PM, Ines Dutra <ines@xxxxxxxxxxxx> wrote:
I have been using Condor for many years to run thousands of experiments in machine learning. The evaluation methodology used in machine learning experiments requires parameter tuning and cross validation, which can easily span thousands of processes, even when not using big data. So far, I am very happy with the results and the many resources I can use to run my experiments! So I would sign up for the "Who should be using HTCondor" column of the table.
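
A minimal sketch of how parameter tuning combined with cross validation fans out into thousands of independent processes. The parameter names, grid values, fold count, and repeat count below are all hypothetical and chosen only for illustration; they do not describe any particular experiment.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative only.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [2, 4, 6, 8, 10],
    "regularization": [0.0, 0.1, 1.0, 10.0, 100.0],
}
n_folds = 10    # assumed: 10-fold cross validation
n_repeats = 5   # assumed: each setting repeated with 5 different seeds


def enumerate_jobs(grid, folds, repeats):
    """Return one dict per independent process: a parameter setting, a fold, a repeat."""
    names = sorted(grid)
    jobs = []
    for values in product(*(grid[name] for name in names)):
        for repeat in range(repeats):
            for fold in range(folds):
                job = dict(zip(names, values))
                job["fold"] = fold
                job["repeat"] = repeat
                jobs.append(job)
    return jobs


jobs = enumerate_jobs(param_grid, n_folds, n_repeats)
print(len(jobs))  # 4 * 5 * 5 settings * 10 folds * 5 repeats = 5000
```

No entry depends on any other, so each one can be queued as a separate process on whatever machines happen to be free.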

Best regards,

Inês Dutra.
Dept of Radiology, UW-Madison, USA
on leave from Dept of Computer Science, University of Porto, Portugal



On Thu, Dec 4, 2014 at 3:06 PM, Miha Ahronovitz <myinnervoice@xxxxxxxxx> wrote:
This is an excellent question. After 34 years of Condor and tens of thousands of threads, there is still no clarity about what HTCondor is and why to use it.

A simple table with two columns, "Who should be using HTCondor" and "Who should not be using HTCondor", would help.

Here is a quote from bit.ly/1FhQTMZ 

There is an acute need to transform the classical high performance computing (HPC) skill set. Adrian Cockcroft, from Battery Ventures, says in a recent interview with InsideHPC:
"The biggest skill shortage in IT today is related to big data and data scientists. Most people in HPC have the analysis and math skills needed to meet this need, but you may need to retool from being an MPI programmer to learning R or Hadoop or Spark."
 

I would add Mesos as well. Times have changed. So what is the position of HTCondor today, relative to the newer products? Relative to Docker and the new arrival Rocket?

I think the top thinkers in this group should spend some time clearly positioning where HTCondor fits and what unexplored potential this software has.

Cheers,

Miha Ahronovitz

--- --- --- --- --- --- --- --- --- --- --- --- ---

Miha Ahronovitz

Principal Ahrono Associates

Web: http://www.ahrono.com/

Blog: http://my-inner-voice.blogspot.com/

c: 408 422 2757

e: miha.ahronovitz@xxxxxxxxxx

tw: @myinnervoice

--- --- --- --- --- --- --- --- --- --- --- --- ---



On Thu, Dec 4, 2014 at 3:10 AM, marrodriguez <marc.rodriguez@xxxxxxxx> wrote:
Hi,

I am interested in deploying Condor at my site, but I have a question: why is Condor considered an HTC batch system and not an HPC one? Does it have any disadvantages in the HPC field compared to Slurm, PBS, or SGE?

Thanks in advance
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

