
[Condor-users] collecting data on standard universe jobs

Hello -

I'm doing an analysis of data sharing and variability within Condor
jobs.  This analysis is enabled by a new feature in 6.6.9 and 6.7.6.  For
those of you who administer Condor pools that run standard universe jobs,
I'd appreciate it very much if you would read the following email and
consider configuring your pool to help me in this analysis.

Specifically, this new feature sends a copy of the email Condor sends when
a standard universe job completes to a configurable email address.
This email includes information about the files used during the job's
lifetime and the quantity of data read from and written to each.  We run
usernames and filenames through a hash, so we don't know who's accessing
what, but we're still able to see what sort of sharing goes on between
jobs.  (An example of what this email looks like is attached.)

To turn the feature on, you'll need to upgrade your submit machines to
6.6.9 or 6.7.6 and add:

EMAIL_NOTIFICATION_CC = johnbent@xxxxxxxxxxx

to the config file on your submit machine.

A few things to note:

1. It's _opt out_: if you turn it on, every job from that submit
machine will send this email unless the job includes:
+AllowNotificationCC = FALSE
in the submit file.

2. The copy to the EMAIL_NOTIFICATION_CC address always gets sent, even if
the user has set notification = never.  In this case, the submitter would
not get email, but I would.

3. In addition, I'm investigating users' ability to make predictions
about their jobs.  Another new feature is that users can set
+EmailAttributes in their submit files, which will then include
arbitrary information in the notification emails.  If you can upgrade your
pools, then any willing users can use +EmailAttributes to report
estimates that I can then include in my study.
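To make items 1 and 2 concrete, here is a small Python model of who receives the completion email. It is a sketch under stated assumptions, not Condor's code: the function name and its parameters are hypothetical, and the user's own notification setting is reduced to a single owner_notified flag instead of modeling each notification value.

```python
from typing import Mapping, Optional


def completion_recipients(
    owner: str,
    owner_notified: bool,
    job_attrs: Mapping[str, object],
    notification_cc: Optional[str],
) -> list[str]:
    """Hypothetical model of who gets a standard universe completion email.

    owner_notified  -- whether the user's own notification setting asks for
                       mail on this completion (False for notification = never)
    job_attrs       -- custom attributes from the submit file, e.g.
                       {"AllowNotificationCC": False}
    notification_cc -- the EMAIL_NOTIFICATION_CC address from the submit
                       machine's config, or None if it is not set
    """
    recipients = []
    if owner_notified:
        recipients.append(owner)
    # Opt out, not opt in: the copy is sent unless the job explicitly sets
    # +AllowNotificationCC = FALSE, and it ignores the user's own setting.
    if notification_cc and job_attrs.get("AllowNotificationCC", True) is not False:
        recipients.append(notification_cc)
    return recipients


if __name__ == "__main__":
    # notification = never, no opt-out: only the CC address gets mail.
    print(completion_recipients("alice", False, {}, "cc@example.org"))
```

The names "alice" and "cc@example.org" are placeholders. The model only captures the two rules stated above: the CC copy depends solely on the opt-out attribute, while the owner's copy depends solely on the owner's setting.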

I've included below the email that I sent to the users in the UWCS pool
when we made the upgrade here.




To all standard universe users of the local Condor pool:

Starting 3/1/05, I will be collecting information about the I/O
behavior of jobs in the UW-CS Condor pool as part of my dissertation
research.  Specifically, I will be looking for patterns of data sharing
across different jobs.  This information will be collected from all
standard universe jobs by default, but you may opt out if you wish
(instructions are below).

The data collected for each standard universe job will be the user name,
the job name, the job's cluster and process IDs, the job runtime, and the
name of each file read or written by the job, along with the amount of
data read or written.

However, the user names and the file names will be run through a standard
SHA-1 hash to preserve privacy, so the resulting data which I will be
examining will be anonymized.
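As a rough illustration of this anonymization, the Python sketch below hashes a user name with SHA-1 and hashes each component of a file path separately, which resembles the 40-hex-character path components in the attached example. The per-component scheme and the function names are assumptions for illustration; Condor's actual hashing code may differ. The property that matters for the study is that SHA-1 is deterministic, so the same file opened by two different jobs maps to the same anonymized name and sharing stays visible.

```python
import hashlib


def anonymize(name: str) -> str:
    """SHA-1 hex digest of a name: always 40 hexadecimal characters."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def anonymize_path(path: str) -> str:
    """Hash each component of an absolute path separately (hypothetical scheme).

    Equal inputs give equal digests, so two jobs that open the same file
    still produce the same anonymized path.
    """
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(anonymize(p) for p in parts)


if __name__ == "__main__":
    print(anonymize("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
    print(anonymize_path("/home/alice/input.dat") == anonymize_path("/home/alice/input.dat"))  # True
```

The first printed value is the standard SHA-1 test vector for the string "abc", which is a quick way to confirm the hashing step behaves as expected.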

To opt out and not have information about your jobs collected, include the
following in your submit files:
+AllowNotificationCC = FALSE

Conversely, if you would like to know about the behavior of your own jobs,
you can email me and I will not anonymize your jobs.  When the study is
complete, I will send you an individualized report about your jobs.
Note that in such a case, any published or released information would
still preserve your privacy.

All of the above will happen by default (unless you choose to opt out).
In addition, I am interested in how well users can predict different
kinds of information about their jobs.

Users who are willing to provide this extra information can do so by
including the following extra lines in their submit files:

 +EstimateIO = "123Mb"
 +EstimateInput = "73Mb"
 +EstimateOutput = "60Mb"
 +EstimateInputShared = "72Mb"
 +EstimateRuntime = "1000s"
 +EmailAttributes = "EstimateIO,EstimateInput,EstimateOutput,EstimateInputShared,EstimateRuntime"

Users can provide as many or as few of these estimates as they like.
Each estimate you provide must also be listed in the +EmailAttributes line.
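Since every estimate must also appear in the EmailAttributes line, a quick check before submitting can catch omissions. The Python sketch below is hypothetical and not a Condor tool: it scans submit-file text for +Name = value lines, treats any attribute whose name starts with "Estimate" as an estimate (an assumption based on the names above), and reports the ones missing from EmailAttributes. It compares names case-insensitively, since ClassAd attribute names are case-insensitive.

```python
import re

# Matches custom-attribute lines such as:  +EstimateIO = "123Mb"
ATTR_LINE = re.compile(r'^\s*\+(\w+)\s*=\s*(.+?)\s*$')


def parse_custom_attrs(submit_text: str) -> dict[str, str]:
    """Collect +Name = value lines from submit-file text (values kept raw)."""
    attrs = {}
    for line in submit_text.splitlines():
        m = ATTR_LINE.match(line)
        if m:
            attrs[m.group(1)] = m.group(2)
    return attrs


def unlisted_estimates(submit_text: str) -> list[str]:
    """Return Estimate* attributes that are missing from EmailAttributes."""
    attrs = parse_custom_attrs(submit_text)
    # Find EmailAttributes case-insensitively; its value is a quoted,
    # comma-separated list of attribute names.
    raw = next((v for k, v in attrs.items() if k.lower() == "emailattributes"), '""')
    listed = {name.strip().lower() for name in raw.strip('"').split(",") if name.strip()}
    return sorted(
        name for name in attrs
        if name.lower().startswith("estimate") and name.lower() not in listed
    )


if __name__ == "__main__":
    submit = '''
 +EstimateIO = "123Mb"
 +EstimateRuntime = "1000s"
 +EmailAttributes = "EstimateIO"
'''
    print(unlisted_estimates(submit))  # ['EstimateRuntime']
```

Run against the full example block above (all five estimates plus the complete EmailAttributes line), the check returns an empty list.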

I would very much like to encourage everyone to take the extra time, if
possible, to include these estimates.  Don't worry about their accuracy;
just make the best guess that you can.  This information will immediately
help my research and will ultimately improve Condor for all of us.

Please direct any questions about this to me (johnbent@xxxxxxxxxxx), any
member of the Condor Staff, or Miron Livny (miron@xxxxxxxxxxx).

Thanks in advance to all who are willing to participate in both the
default study and the additional estimate study.

John Bent
Uwcs-condor mailing list
This is an automated email from the Condor system
on machine 2cafb1dbe93118e4f79a844d022328a71a686812 Do not reply. 
Your condor job /afs/cs.wisc.edu/516b9783fca517eecbd1d064da2d165310b19759/4e0a689d1ebbe753a7d5b535b62d7eabacb6cc9e/5b7dcd14a4faa2cdd54cf6eb8d4bc35da31914a1/fa77123c115913ca9e97c82fc48a0b47e477b7b4/490df2abc3c9875fc0f9f85c511e9d6dd771eb59/c1bb4b2f7b06fd65fea89464a59eb1dbdcdfbd31 TRACEDUMPPUBjbb_500-8p-MOESI_CMP_XNUCA_TRACE-1000-1152-0.trace.gz TRACEDUMPPUBjbb_500-8p-MOESI_CMP_XNUCA_TRACE-1000-1152-0.cache.cache exited with status 0. 
Submitted by: f9f7d68df5572d7c8129fcf921e2184bd0f6d039 
	Submitted at:        Thu Mar 10 12:09:39 2005
	Completed at:        Thu Mar 10 15:39:36 2005
	Real Time:           0 03:29:57
	Run Time:            0 02:51:44
	Committed Time:      0 02:51:44
	Remote User Time:    0 02:36:10
	Remote System Time:  0 00:00:00
	Total Remote Time:   0 02:36:10
	Local User Time:     0 00:00:00
	Local System Time:   0 00:00:00
	Total Local Time:    0 00:00:00
	Virtual Image Size:  32706 Kilobytes
Checkpoints written: 8
Checkpoint restarts: 4
	211.6 MB read
	79.2 MB written
Buffer Configuration:
	512.0 KB max buffer space per open file
	32.0 KB buffer block size
Total I/O:
	7.1 KB/s effective throughput
	3 files opened
	8479 reads totaling 71.6 MB
	16 writes totaling 586.0 B 
	1 seeks
I/O by File:
	opened 1 times
	7787 reads totaling 60.8 MB
	0 writes totaling 0.0 B 
	0 seeks
	opened 1 times
	0 reads totaling 0.0 B 
	16 writes totaling 586.0 B 
	0 seeks
	opened 1 times
	692 reads totaling 10.8 MB
	0 writes totaling 0.0 B 
	1 seeks
Remote System Calls:
	CONDOR_get_file_info_new           2
	CONDOR_report_file_info_new       10
	CONDOR_get_buffer_info             1
	CONDOR_get_ckpt_mode               1
	CONDOR_register_opsys              1
	CONDOR_register_arch               1
	CONDOR_register_ckpt_server        1
	CONDOR_lseekwrite                  1
	CONDOR_lseekread                  22
	CONDOR_register_fs_domain          1
	CONDOR_register_uid_domain         1
	CONDOR_get_a_out_name              1
	CONDOR_file_info                   2
	CONDOR_get_ckpt_name               1
	CONDOR_get_iwd                     1
	CONDOR_startup_info_request        1
	CONDOR_get_file_stream             2
	CONDOR_reallyexit                  1
	CONDOR_chdir                       2
	CONDOR_close                       2
	CONDOR_lseek                       4
	CONDOR_open                        2
ReadEstimate = "50Mb"
WriteEstimate = "500b"
ShareEstimate = "0Kb"