
Re: [HTCondor-users] Application specific scheduler



Thanks, Tevfik.

I will not argue much, but just to play the music: which rescue DAG? The "new-style" rescue DAG? Some values from a condor_submit_dag instance are not passed to lower-level condor_submit_dag instances (like -oldrescue), while others are (like -debug), and so on.

Such complexity is bound to create errors and misunderstandings. We cannot hide this from the users.

I am not a Condor engineer, but after 20 years in resource management products I know a bit. What I know is that this type of complexity will make the product unusable for many people who are not "Condor alumni". Do I have to be a Word alumnus to use Word? Do I have to be an Android alumnus to use my phone and text?

Also, such talented engineers are wasting their talents, as their fabulous experience is not easy to market for the highest-paid jobs. Quora had a thread where someone asked: "What type of engineer I should be to qualify for $500K per year salary package at Google?" We learn that these jobs exist, and many more pay $200K-plus per year.

The only way to command these types of packages is to do something people want.
I have in mind something like http://agaveapi.co/

Using this API, portals like this one were created:
http://www.iplantcollaborative.org/

In my dreams, I could see every new implementation of HTCondor being like a portal. Can you imagine the joy Nick will give his users by delivering an HTCondor portal customized to their needs?

I know this is easier said than done, but I think this team must start sometime. Now is the right moment.

Miha



--- --- --- --- --- --- --- --- --- --- --- --- ---

Miha Ahronovitz

Principal Ahrono Associates

Web: http://www.ahrono.com/

Blog: http://my-inner-voice.blogspot.com/

c: 408 422 2757

e: miha.ahronovitz@xxxxxxxxxx

tw: @myinnervoice

--- --- --- --- --- --- --- --- --- --- --- --- ---




On Mon, Jun 30, 2014 at 8:25 AM, Tevfikkosar <tkosar@xxxxxxxxxxx> wrote:
And, to add what Ken just said, even in the case of a failure where the user would need some manual action to fix the problem, the entire workflow still does not need to be restarted. DAGMan would create a "rescue DAG", marking already completed jobs as "DONE", and would only rerun/retry unfinished jobs in the workflow. Another feature with 10+ years of history...

Tevfik Kosar
A Condor Alumnus

-- Sent from a mobile phone.
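
The rescue behaviour described above (completed nodes marked "DONE", only unfinished nodes rerun) can be sketched as a toy model. This is not DAGMan's own code; the function name `nodes_to_rerun` and the node names are hypothetical, and the sketch assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def nodes_to_rerun(parents, done):
    """Return the nodes that still need to run, parents before children.

    parents: dict mapping each node name to the set of nodes it depends on.
    done:    set of node names already completed (the ones a rescue DAG
             would mark as DONE).
    """
    # static_order() yields every node after all of its parents.
    order = TopologicalSorter(parents).static_order()
    return [node for node in order if node not in done]


# Hypothetical diamond workflow: A -> (B, C) -> D
workflow = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

# Suppose A and B finished before the failure; only C and D remain.
remaining = nodes_to_rerun(workflow, done={"A", "B"})
print(remaining)
```

The point of the model is only that the set of finished nodes is carried over, so the work already done is not repeated when the workflow is resubmitted.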


On Jun 30, 2014, at 10:20, "R. Kent Wenger" <wenger@xxxxxxxxxxx> wrote:

On Sat, 28 Jun 2014, Miha Ahronovitz wrote:

So Nick says, I want to migrate my home-grown distributed environment to
HTCondor. As a new user he considers 3 options. Miron says use DAGMan. Miha
asks why. Miron says because it manages job dependencies. Gabriel says
DAGMan is the way to go, but he wonders "why, in case of failure, one
has to restart the workflow rather than retry the failed jobs."
Kent Wenger from the CHTC team clarifies and says, yes, we know it is a problem,
gives the link and has a name for it: this is issue #2831.
Let me stop here. Nick seems an experienced sysadmin / engineer. But
HTCondor-list has 2,100 subscribers. How many of these subscribers know
about DAGMan? Maybe they search and read why, in case of failure, they have to
resubmit all jobs from the beginning?

Just to clarify, I was assuming (perhaps incorrectly) that Gabriel was referring to the case where the user has to take some kind of manual action to fix the problem with a job that failed, before retrying that job.

If a job fails, but it may succeed on being retried without any action from the user, the retry option in DAGMan can handle that case.  The retry option for nodes in DAGMan has existed for a long time (10+ years, I think), so hopefully many people are aware of that...

Kent
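
For readers who have not seen the retry option mentioned above, here is a minimal sketch that writes out a small DAG input file using the JOB, PARENT ... CHILD and RETRY keywords. The helper `render_dag`, the node names and the submit file names (`a.sub`, `b.sub`) are made up for illustration; the retry count of 3 is an arbitrary choice.

```python
def render_dag(jobs, edges, retries):
    """Build the text of a DAGMan input file.

    jobs:    dict of node name -> submit description file
    edges:   list of (parent, child) node-name pairs
    retries: dict of node name -> number of automatic retries
    """
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    lines += [f"RETRY {name} {count}" for name, count in retries.items()]
    return "\n".join(lines) + "\n"


# Hypothetical two-node workflow: B runs after A, and B is retried
# up to 3 times if it fails, with no action needed from the user.
dag_text = render_dag(
    jobs={"A": "a.sub", "B": "b.sub"},
    edges=[("A", "B")],
    retries={"B": 3},
)
print(dag_text)
```

The resulting text could be saved to a file and handed to condor_submit_dag; a failure that needs a manual fix is the separate rescue DAG case discussed earlier in the thread.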
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/