
[Condor-users] collecting data on standard universe jobs



Hello -

I'm doing an analysis of the data sharing and variability within Condor
jobs, and a new feature in 6.6.9 and 6.7.6 supports it.  If you
administer a Condor pool that runs standard universe jobs, I'd appreciate
it very much if you would read the following email and consider
configuring your pool to help me in this analysis.

Specifically, this new feature sends a copy of the email Condor sends when
a standard universe job completes to a configurable email address.
This email includes information about the files used during the job's
lifetime and the quantity of data read/written from each.  We run
usernames and filenames through a hash, so we don't know who's accessing
what, but we're still able to see what sort of sharing goes on between
jobs.

To turn the feature on, you'll need to upgrade your submit machines to
6.6.9 or 6.7.6 and add:

EMAIL_NOTIFICATION_CC = johnbent@xxxxxxxxxxx

to the config file on your submit machine.

Some things:

1. It's _opt out_ - so if you turn it on, every job from that submit
machine will send email, unless the job includes:
+AllowNotificationCC = FALSE
in the submit file.

2. The email to EMAIL_NOTIFICATION_CC always gets sent, even if the user
has said notification=never.  In this case, the submitter would not get
email, but I would.

3. In addition, I'm also investigating users' ability to make
predictions about their jobs.  Another new feature is that users can set
+EmailAttributes in their submit files, which will then include
arbitrary information in the notification emails.  If you can upgrade your
pools, then any willing users can use +EmailAttributes to make
estimates that I can then include in my study.
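
As a rough illustration of points 1 and 2, the rule for who receives the
completion email can be modelled as below.  This is only a sketch of the
behavior described in this message, not Condor's actual code; the function
name, its parameters, and the example addresses are hypothetical.

```python
def completion_recipients(user_email, user_wants_email,
                          cc_address=None, allow_cc=True):
    """Return who gets the job-completion email (hypothetical model).

    user_wants_email -- False models notification=never in the submit file
    cc_address       -- value of EMAIL_NOTIFICATION_CC, or None if unset
    allow_cc         -- False models +AllowNotificationCC = FALSE
    """
    recipients = []
    if user_wants_email:
        recipients.append(user_email)
    # The CC copy ignores the user's notification setting; only the
    # explicit per-job opt-out suppresses it.
    if cc_address and allow_cc:
        recipients.append(cc_address)
    return recipients


# notification=never, no opt-out: only the CC address receives the email.
print(completion_recipients("user@example.org", False, "study@example.org"))
```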

I've included below the email that I sent to the users in the UWCS pool
when we made the upgrade here.

Thanks!

-John

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

To all standard universe users of the local Condor pool:

Starting 3/1/05, I will be collecting information about the I/O
behavior of jobs in the UW-CS Condor pool as part of my dissertation
research.  Specifically, I will be looking for patterns of data sharing
across different jobs.  This information will be collected from all
standard universe jobs by default, but you may opt out if you wish
(instructions are below).

The data collected for each standard universe job will be the user name,
the job name, the job's cluster and process IDs, the job runtime, and the
name of each file read or written by that job, along with the amount of
data read or written.

However, the user names and the file names will be run through a standard
SHA-1 hash to preserve privacy, so the resulting data which I will be
examining will be anonymized.
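
A minimal sketch of why hashed names still reveal sharing: the same name
always hashes to the same digest, so two jobs touching the same file can be
matched without recovering the file name.  This assumes a plain, unsalted
SHA-1 of the UTF-8 encoded name; the exact input format used in the study
is not stated here, and the helper name and example paths are hypothetical.

```python
import hashlib


def anonymize(name: str) -> str:
    """Return the SHA-1 hex digest of a user or file name (assumed scheme)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


# Two jobs reading the same (hypothetical) input file yield equal digests,
# while a different file yields a different one.
job_a_file = anonymize("/data/shared/input.dat")
job_b_file = anonymize("/data/shared/input.dat")
other_file = anonymize("/data/private/output.dat")

print(job_a_file == job_b_file)  # shared file is detectable
print(job_a_file == other_file)  # distinct files stay distinct
```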

To opt out and not have information about your jobs collected, include the
following in your submit files:
+AllowNotificationCC = FALSE

Conversely, if you would like to know about the behavior of your own jobs,
you can email me and I will not anonymize your jobs.  When the study is
complete, I will then send you an individualized report about your jobs.
Note that in such a case, any published or released information would
still of course preserve privacy.

Finally, all of the above will happen by default (unless you choose to
opt out).  In addition, I am interested in how well users can predict
different types of information about their jobs.

Users who are willing to provide this extra information can do so by
including the following extra lines in their submit files:

 +EstimateIO = "123Mb"
 +EstimateInput = "73Mb"
 +EstimateOutput = "60Mb"
 +EstimateInputShared = "72Mb"
 +EstimateRuntime = "1000s"
 +EmailAttributes = "EstimateIO,EstimateInput,EstimateOutput,EstimateInputShared,EstimateRuntime"

Users can provide as many or as few of these estimates as they like.
Each estimate provided must then be listed in the EmailAttributes line.
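
Since every estimate has to be repeated in the EmailAttributes line, one way
to keep the two in sync is to generate the lines from a single table.  The
helper below is a hypothetical convenience sketch, not part of Condor; only
the attribute names and example values come from the lines above.

```python
def estimate_lines(estimates):
    """Build submit-file lines for the given estimates (hypothetical helper).

    estimates -- dict mapping attribute name to value, e.g.
                 {"EstimateIO": "123Mb", "EstimateRuntime": "1000s"}
    """
    lines = ['+{} = "{}"'.format(name, value)
             for name, value in estimates.items()]
    if estimates:
        # List exactly the attributes that were provided, in the same order.
        lines.append('+EmailAttributes = "{}"'.format(",".join(estimates)))
    return lines


for line in estimate_lines({"EstimateIO": "123Mb", "EstimateRuntime": "1000s"}):
    print(line)
```

The printed lines can then be pasted into the submit file.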

I would very much like to encourage everyone to take the extra time if
possible and include these estimates.  Don't worry about their accuracy;
just make the best guess that you can.  This information will immediately
help my research and will ultimately improve Condor for all of us.

Please direct any questions about this to me (johnbent@xxxxxxxxxxx), any
member of the Condor Staff, or Miron Livny (miron@xxxxxxxxxxx).

Thanks in advance to all who are willing to participate in both the
default study and the additional estimate study.

John Bent