Re: [Condor-users] understanding condor history file


> Has anyone got an answer for me, please? I would really appreciate some help.
> Cheers,
> Santanu
> On 22/10/10 16:35, Santanu Das wrote:
> >Hi there,
> >
> >Recently we started seeing some mismatches in our accounting data and
> >when I looked into the history file, I found that a number of fields are
> >duplicated. Can anyone please explain the meaning of those values?
> >My concerns are especially with RemoteWallClockTime,
> >CompletionDate and JobStatus, but I'd like to know about the rest of the
> >fields as well.

This is my understanding of the duplicate entries.

When a job is added to the job queue (i.e. in the job_queue.log file)
the parameters shared by all jobs in the cluster are identified by
entries of the form:

103 0<cluster>.-1 <parameter> <value>

(I'm not sure why there is a leading zero, but there is.)

Parameters unique to each process within the cluster are then added:

103 <cluster>.<process> <parameter> <value>

Parameters that are deleted are marked like this:

104 <cluster>.<process> <parameter>

What you see in the history file is first the set of cluster-wide
parameters (i.e. the "0<cluster>.-1" values), followed by the final
set of process specific parameters (i.e. the "<cluster>.<process>"
values that have not been deleted).
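
To make the overlay concrete, here is a minimal Python sketch of how I
picture these entries combining. It only handles the 103 and 104 entry
types described above and skips every other line. The function names
(replay, effective_ad) and the sample values are made up for
illustration, and falling back to the cluster-wide value after a 104 is
my assumption, not something checked against the Condor source.

```python
from collections import defaultdict


def replay(lines):
    """Replay 103 (set) and 104 (delete) entries into one dict per key."""
    ads = defaultdict(dict)
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[0] == "103":
            _, key, name, value = parts
            ads[key][name] = value.strip()
        elif len(parts) == 3 and parts[0] == "104":
            _, key, name = parts
            ads[key].pop(name, None)
        # Any other entry type is ignored in this sketch.
    return ads


def effective_ad(ads, cluster, process):
    """Cluster-wide values first, then process-specific values on top."""
    merged = dict(ads.get("0%d.-1" % cluster, {}))
    merged.update(ads.get("%d.%d" % (cluster, process), {}))
    return merged


# Hypothetical log fragment for cluster 12, process 0.
sample_log = [
    "103 012.-1 CommittedTime 0",
    '103 012.-1 Owner "alice"',
    "103 12.0 CommittedTime 3600",
    "103 12.0 Scratch 1",
    "104 12.0 Scratch",
]

if __name__ == "__main__":
    print(effective_ad(replay(sample_log), 12, 0))
```

The process-specific CommittedTime replaces the cluster-wide one, and
the deleted Scratch parameter does not appear in the result.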

For example, a cluster is queued with "CommittedTime = 0" as a
cluster-wide value shared by all of its processes.  When a process
finishes, it gets its own value for "CommittedTime".

So when you see a duplicate parameter in the history file, you should
ignore the earlier value: the duplicate only means that there is a
process-specific value that differs from the cluster-wide one.

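So you don't want to simply "sort" a history file entry, because order
is important: the last value for each parameter is the one that counts.
Use something like this instead, which keeps only the last value for
each parameter and then sorts the result: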

gawk '{ H[$1]=$0 } END { N=asort(H); for (I=1; I<=N; ++I) print H[I] }'
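
For anyone without gawk to hand, the same last-value-wins idea can be
sketched in Python. This is only a rough equivalent: the function name
last_value_wins and the sample lines are made up, blank lines are
skipped, and Python sorts by code point, which may order lines
differently from gawk's asort in some locales.

```python
def last_value_wins(lines):
    """Keep the last line seen for each first field, then sort the lines."""
    latest = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first whitespace-separated field plays the role of $1 in gawk.
        latest[fields[0]] = line.rstrip("\n")
    return sorted(latest.values())


# Hypothetical history entry with two duplicated parameters.
entry = [
    "CommittedTime = 0",
    "JobStatus = 1",
    "CommittedTime = 3600",
    "JobStatus = 4",
]

if __name__ == "__main__":
    for line in last_value_wins(entry):
        print(line)
```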

To apply this to the entire history file you would use something like:

gawk '{ H[$1]=$0 } /^\*\*\*/ { N=asort(H); for (I=1; I<=N; ++I) print H[I]; delete H }'

Note that this moves each "***" marker from the end of its entry to
the start, but that actually looks nicer when checking the file by hand.
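
If the goal is accounting rather than reading the file by hand, it may
be easier to load each entry into a dictionary. The sketch below
assumes that parameters are written as "Name = Value" lines and that
each entry ends with a line starting with "***". The function name
iter_history_records and the sample data are made up for illustration.

```python
def iter_history_records(lines):
    """Yield one dict per history entry; later duplicates overwrite earlier."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("***"):
            # The marker closes the current entry.
            if record:
                yield record
            record = {}
        elif "=" in line:
            name, value = line.split("=", 1)
            record[name.strip()] = value.strip()
    # Lines after the last marker form an incomplete entry and are dropped.


# Hypothetical history fragment: one complete entry, one incomplete.
sample_history = [
    "ClusterId = 12",
    "RemoteWallClockTime = 0.0",
    "CommittedTime = 0",
    "RemoteWallClockTime = 3612.0",
    "CommittedTime = 3600",
    "***",
    "ClusterId = 13",
]

if __name__ == "__main__":
    for job in iter_history_records(sample_history):
        print(job["ClusterId"], job["RemoteWallClockTime"])
```

With a real file you would pass an open file object instead of the
sample list, for example iter_history_records(open(path)), where path
is wherever your history file lives.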

I hope this helps.