
Re: [HTCondor-users] Put jobs on hold if output or error files grow large?



I like Rita's motto:
Get your facts first, then you can distort them as you please.--

I will take her advice, as the Product Manager in me is puzzled.

1. Carsten says:
2. Brian, one of the most competent Condor men I know, answers
3. Carsten comments:
4. Brian clarifies 
Rita then proposes a new feature nice to have
  • Here is how we attack a similar problem, we look at disk utilization (write) and if we notice a spike in the traffic we start to snoop around. We consider a spike as a function of standard deviation from 24 hours. 
So here is what happens. A champion user is de facto doing big data with HTCondor. This is "complex", using "ancient" code and DAGMan, whose documentation is not for the faint of heart. It does not work. Rita proposes a new feature.

I have seen this sequence many times since I started watching this user group. Before we decide to add yet another feature to the gargantuan list of features HTCondor has accumulated in over thirty years, the little PM bird sitting on my shoulder asks me:
  1. Are there many users like Carsten?
  2. Is this a typical big data feature?
  3. Are there any related big data features we should add to make handling very large files a pleasure?
  4. Do we want to bury the Big Data features into the same colossal code named HTCondor, or do we want to make a separate module, HTCondor BigData Extensions?
  5. How do we know the ontology of Big Data?
    1. See a recent reference to the Ontology of Data Science
  6. What is the definition of Big Data?
  7. Can we produce clear documentation about Big Data?
    For example, in the HTCondor manual there is a section,
     Contrib and Source Modules, which looks like a disparate farmers' market, grouping together Hadoop with View, Quill (a PostgreSQL app, not one of the newer NoSQL products), etc.
I turned around and told my little PM bird to stop asking questions that are difficult to answer. This is the way HTCondor has survived for 30+ years, and changing HTCondor means changing its DNA. This is not possible. Yet I have hope that I am wrong.

Miha

PS: But for the record, someone must point out the emperor's invisible clothes. My name is associated with Bosco and HTC (and therefore HTCondor), so I was an insider Product Manager without a title. When I became the Sun Grid Engine Product Manager, no one at Sun in the year 2000 wanted this position. It was really a position where (1) if the product succeeds, everyone else takes the credit, and (2) if the product is criticized, all the blame goes to the PM, while everyone else says "I told you so."

I became "Mr. Grid Engine", and the product is still available today, but I am the first to say it is a legacy product. A few companies make a living supporting it, but it may not last forever.

My daughter, 22, tells me everyone today wants to be a Product Manager, not a coder. Maybe my blog made the name glamorous :-) but basically all successes in high tech in Silicon Valley are due to excellent Product Managers. Take Mark Zuckerberg: coder A#1, but he transformed into Product Manager A#1 on his way to becoming the CEO of a large public company.


"The essential first step is to think for yourself. "

Thiel, Peter; Masters, Blake (2014-09-16). Zero to One: Notes on Startups, 

--- --- --- --- --- --- --- --- --- --- --- --- ---

Miha Ahronovitz

Principal Ahrono Associates

Web: http://www.ahrono.com/

Blog: http://my-inner-voice.blogspot.com/

c: 408 422 2757

e: miha.ahronovitz@xxxxxxxxxx

tw: @myinnervoice

--- --- --- --- --- --- --- --- --- --- --- --- ---



On Sat, Dec 6, 2014 at 6:52 AM, Rita <rmorgan466@xxxxxxxxx> wrote:
This would be a nice feature to have.
Here is how we attack a similar problem, we look at disk utilization (write) and if we notice a spike in the traffic we start to snoop around. We consider a spike as a function of standard deviation from 24 hours. 
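
The spike rule described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not Rita's actual tooling: it assumes one disk-write sample per hour, a trailing window of 24 samples, and a hypothetical threshold of three standard deviations above the window mean. The function name find_spikes and all parameter names are made up for this example.

```python
from statistics import mean, stdev


def find_spikes(samples, window=24, threshold=3.0):
    """Return indices of samples that look like write-traffic spikes.

    A sample counts as a spike when it exceeds the mean of the preceding
    `window` samples by more than `threshold` standard deviations.
    Assumption: `samples` holds one disk-write measurement per hour, so the
    default window covers the previous 24 hours.
    """
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]  # the trailing 24-hour window
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation of the window
        # Note: with a perfectly flat history (sigma == 0) any increase
        # at all is flagged.
        if samples[i] - mu > threshold * sigma:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Hypothetical hourly write volumes in MB: a steady baseline, then a burst.
    hourly_mb = [100, 102, 98, 101, 99, 100] * 4 + [100, 500, 101]
    print(find_spikes(hourly_mb))
```

Flagged indices are only a cue to "start to snoop around"; which job or user caused the burst still has to be found by other means.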

On Wed, Jul 23, 2014 at 4:03 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:

On Jul 23, 2014, at 2:24 PM, Carsten Aulbert <Carsten.Aulbert@xxxxxxxxxx> wrote:

> Hi Brian
>
> On 07/23/2014 09:21 PM, Brian Bockelman wrote:
>> Would MAX_TRANSFER_OUTPUT_MB be what you are looking for?
>>
>> That places the job on hold if the final output files (all of them in aggregate) are above a certain size.
>>
>> (See http://research.cs.wisc.edu/htcondor/manual/v8.1/3_3Configuration.html)
>
> I briefly looked at that, but I'm not sure if HTCondor counts this if
> one uses a shared file system or if these files are placed on the execute
> node's local storage. Or will these be counted as well?

This only counts the files done through HTCondor file transfer.  So, files on a shared file system likely won't be counted.

Brian
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
--- Get your facts first, then you can distort them as you please.--
