
Re: [HTCondor-users] Interpreting XferStatsLog and understanding message flow



Hello,


Are there any race conditions when condor writes to the XferStatsLog?

This could explain the weird log entries I observed.


Thanks,

George


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of George Papadimitriou <georgpap@xxxxxxx>
Sent: Tuesday, March 13, 2018 10:12:53 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Interpreting XferStatsLog and understanding message flow
 

Hello,


I'm trying to read and understand the log entries in the XferStatsLog, which present TCP connection statistics from condor io.

However, I'm a bit confused by the message flow, and it's not entirely clear what each log entry means.


Here is example output from the log (excluding the TCP-related fields) for a workflow with 4 jobs (2 running at the same time), where condor stages files in to and out from the worker nodes.


           Timestamp                                                             Description (???)         JobId          files        bytes      seconds         dest
02/08/18 18:52:31 (1822.0) (523676): (peer stats from starter): 02/08/18 18:52:31 (1823.0) (523677): File Transfer Upload 1823 8 13070752 0 10.103.0.11
02/08/18 18:52:31 (1823.0) (523677): (peer stats from starter): 02/08/18 18:55:54 (1822.0) (523676): File Transfer Download 1822 5 196890 0.01 10.103.0.11
02/08/18 18:55:54 (1822.0) (523676): (peer stats from starter): File Transfer Upload 1822 5 196890 0.03 10.103.0.12
02/08/18 18:56:07 (1823.0) (523677): File Transfer Download 1823 5 196900 0.01 10.103.0.11
02/08/18 18:56:07 (1823.0) (523677): (peer stats from starter): File Transfer Upload 1823 5 196900 0.03 10.103.0.12
02/08/18 18:56:07 (1824.0) (524617): File Transfer Upload 1823 10 13248090 0 10.103.0.11
02/08/18 18:56:07 (1824.0) (524617): (peer stats from starter): 02/08/18 18:56:17 (1826.0) (524676): File Transfer Upload 1826 10 13248093 0 10.103.0.11
02/08/18 18:56:17 (1826.0) (524676): (peer stats from starter): 02/08/18 19:22:15 (1824.0) (524617): File Transfer Download 1824 3 1842522 0.01 10.103.0.11
02/08/18 19:22:15 (1824.0) (524617): (peer stats from starter): File Transfer Upload 1824 3 1842522 0.03 10.103.0.12
02/08/18 19:22:30 (1826.0) (524676): File Transfer Download 1826 3 1842569 0.01 10.103.0.11
02/08/18 19:22:30 (1826.0) (524676): (peer stats from starter): File Transfer Upload 1826 3 1842569 0.03 10.103.0.12

In this example, the field I have marked as "Description" is the most confusing to me, especially because some records (e.g. lines 1, 2, 7, 8) seem to contain information about multiple jobs, and I cannot work out what they mean.
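
One way to inspect those multi-job lines is to split each physical line wherever a new record prefix begins. The following is only a minimal sketch: the prefix pattern (timestamp, job id in parentheses, a second number in parentheses, then a colon) is inferred from the sample output above rather than taken from any HTCondor documentation, and the names PREFIX and split_fragments are made up for illustration.

```python
import re

# Assumed prefix shape, inferred from the sample above (not from HTCondor docs):
# "MM/DD/YY HH:MM:SS (cluster.proc) (number):"
PREFIX = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \(\d+\.\d+\) \(\d+\):")


def split_fragments(line):
    """Split one physical log line at every embedded record prefix."""
    starts = [m.start() for m in PREFIX.finditer(line)]
    # Keep any leading text that does not begin with a prefix.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(line)]
    return [line[a:b].strip() for a, b in zip(bounds, bounds[1:])]


# First line of the sample output above.
sample = (
    "02/08/18 18:52:31 (1822.0) (523676): (peer stats from starter): "
    "02/08/18 18:52:31 (1823.0) (523677): File Transfer Upload 1823 8 13070752 0 10.103.0.11"
)

for fragment in split_fragments(sample):
    print(fragment)
```

Applied to line 1, this yields two fragments: a "(peer stats from starter):" prefix for job 1822.0 with no statistics after it, and a complete upload record for job 1823.0. That only shows where the line can be cut; it does not by itself explain why the two pieces ended up on the same line.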

Could someone explain the meaning of each record and perhaps give me a high-level description of the message flow?


Thanks,

George