
[HTCondor-users] Job Status not Updated When Using The BOINC GAHP



When using the BOINC GAHP, the job status is still not being updated. Below is a detailed explanation of the situation, which leads to a few questions. Does anyone have any idea why the first requests have min_mod_time = 0 while subsequent ones carry a timestamp? And why, even though the first requests seem to return the statuses, are the HTCondor job statuses not being updated?

The condor_q command shows the jobs as idle:

-- Schedd: boinc-submitter.cern.ch : <188.184.165.253:13440?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  17.0   test            7/31 21:22   0+00:00:00 I  0   0.0 Sixtrack
  18.0   test            8/1  12:10   0+00:00:00 I  0   0.0 Sixtrack
  19.0   test            8/1  13:02   0+00:00:00 I  0   0.0 Sixtrack
  20.0   test            8/1  13:06   0+00:00:00 I  0   0.0 Sixtrack
  21.0   test            8/1  13:10   0+00:00:00 I  0   0.0 Sixtrack
  22.0   test            8/2  00:03   0+00:00:00 I  0   0.0 Sixtrack
  23.0   test            8/2  10:21   0+00:00:00 I  0   0.0 Sixtrack
  26.0   test            8/6  23:45   0+00:00:00 I  0   0.0 Sixtrack
  27.0   test            8/9  14:48   0+00:00:00 I  0   0.0 Sixtrack

The GAHP requests an update, as can be seen in the GridmanagerLog:

BOINC_QUERY_BATCHES 4 0 9 condor#boinc-submitter.cern.ch#Sixtrack#1470046242 condor#boinc-submitter.cern.ch#Sixtrack#1470049845 condor#boinc-submitter.cern.ch#Sixtrack#1470746903 condor#boinc-submitter.cern.ch#Sixtrack#1470042521 condor#boinc-submitter.cern.ch#Sixtrack#1470049582 condor#boinc-submitter.cern.ch#Sixtrack#1470126076 condor#boinc-submitter.cern.ch#Sixtrack#1470519953 condor#boinc-submitter.cern.ch#Sixtrack#1470049378 condor#boinc-submitter.cern.ch#Sixtrack#1470089006
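To make the request format explicit, here is a small Python sketch that splits such a line into its fields. The layout (`BOINC_QUERY_BATCHES <request-id> <min_mod_time> <batch-count> <batch-name>...`) is inferred from the log lines in this message, not taken from a GAHP specification, and `parse_query_batches` and `QueryBatchesRequest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryBatchesRequest:
    request_id: int
    min_mod_time: float  # 0 on the first poll in the log, a Unix timestamp later
    batch_names: List[str]


def parse_query_batches(command: str) -> QueryBatchesRequest:
    # Assumed layout, inferred from the GridmanagerLog lines in this thread:
    #   BOINC_QUERY_BATCHES <request-id> <min_mod_time> <count> <batch-name>...
    # The logged form may be wrapped in single quotes, so strip those first.
    tokens = command.strip().strip("'").split()
    if len(tokens) < 4 or tokens[0] != "BOINC_QUERY_BATCHES":
        raise ValueError("not a BOINC_QUERY_BATCHES command")
    request_id = int(tokens[1])
    min_mod_time = float(tokens[2])
    count = int(tokens[3])
    names = tokens[4:]
    if len(names) != count:
        raise ValueError(f"expected {count} batch names, got {len(names)}")
    return QueryBatchesRequest(request_id, min_mod_time, names)


if __name__ == "__main__":
    req = parse_query_batches(
        "BOINC_QUERY_BATCHES 4 0 1 condor#boinc-submitter.cern.ch#Sixtrack#1470046242"
    )
    print(req.request_id, req.min_mod_time, req.batch_names)
```

Parsed this way, the only field that differs between request 4 and request 14 (apart from the request ID) is min_mod_time; the nine batch names are the same.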

And we do get the following response, which seems correct:

08/09/16 16:37:17 [157885] GAHP[157887] -> '4' 'NULL' '1470753439.291700' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470046242#18.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470049845#21.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470746903#27.0' 'IN_PROGRESS' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470042521#17.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470049582#20.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470126076#23.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470519953#26.0' 'IN_PROGRESS' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470049378#19.0' 'ERROR' '1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470089006#22.0' 'ERROR'
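The reply can be read the same way. The sketch below assumes, again only from the log lines here, that after the request ID, the error field (`NULL` on success) and the server time, each queried batch contributes a job count followed by that many name/status pairs. It also assumes that the last `#` field of a job name is the HTCondor `cluster.proc` ID (such as `18.0` or `27.0`), which matches the IDs shown by condor_q. Both function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def parse_query_batches_reply(line: str) -> Tuple[int, float, List[Dict[str, str]]]:
    # Assumed reply layout, inferred only from the log lines in this thread:
    #   <request-id> <error or NULL> <server_time>
    #   then, per queried batch: <job-count> followed by job-count pairs
    #   of <job-name> <status>
    fields = re.findall(r"'([^']*)'", line)  # every single-quoted token
    request_id, error, server_time = int(fields[0]), fields[1], float(fields[2])
    if error != "NULL":
        raise RuntimeError(f"request {request_id} failed: {error}")
    batches: List[Dict[str, str]] = []
    i = 3
    while i < len(fields):
        count = int(fields[i])
        i += 1
        jobs: Dict[str, str] = {}
        for _ in range(count):
            jobs[fields[i]] = fields[i + 1]  # job name -> BOINC status
            i += 2
        batches.append(jobs)
    return request_id, server_time, batches


def condor_job_id(job_name: str) -> str:
    # Assumption: the last '#'-separated field is the HTCondor cluster.proc ID.
    return job_name.rsplit("#", 1)[-1]


if __name__ == "__main__":
    reply = ("'4' 'NULL' '1470753439.291700' "
             "'1' 'condor#boinc-submitter.cern.ch#Sixtrack#1470046242#18.0' 'ERROR'")
    _, _, batches = parse_query_batches_reply(reply)
    for batch in batches:
        for name, status in batch.items():
            print(condor_job_id(name), status)
```

Read with this layout, the reply to request 4 carries a status for each of the nine jobs, while the reply to request 14 is nine batches with zero jobs each, i.e. nothing is reported.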

But after this, BOINC_QUERY_BATCHES 14 and later requests include a timestamp where previously there was 0:

08/09/16 19:32:27 [158462] GAHP[158465] <- 'BOINC_QUERY_BATCHES 14 1470763888.431700 9 condor#boinc-submitter.cern.ch#Sixtrack#1470046242 condor#boinc-submitter.cern.ch#Sixtrack#1470049845 condor#boinc-submitter.cern.ch#Sixtrack#1470746903 condor#boinc-submitter.cern.ch#Sixtrack#1470042521 condor#boinc-submitter.cern.ch#Sixtrack#1470049582 condor#boinc-submitter.cern.ch#Sixtrack#1470126076 condor#boinc-submitter.cern.ch#Sixtrack#1470519953 condor#boinc-submitter.cern.ch#Sixtrack#1470049378 condor#boinc-submitter.cern.ch#Sixtrack#1470089006'

And the following is returned:

08/09/16 16:39:18 [157885] GAHP[157887] -> '14' 'NULL' '1470753559.707100' '0' '0' '0' '0' '0' '0' '0' '0' '0'

The difference in the submitted XML is:

<min_mod_time>0.000000</min_mod_time>
<min_mod_time>1470763888.431700</min_mod_time>
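The two replies are consistent with min_mod_time acting as a "modified since" filter: with 0 the server returns every job, with a timestamp it returns only jobs changed after it. Assuming that is the intended semantics (not confirmed here), a client has to keep the statuses from the first full reply, because later replies carry only changes. The hypothetical `BatchStatusCache` class below sketches that pattern. Advancing the cursor to the returned server time is also an assumption, since the logged min_mod_time of request 14 does not equal the server time returned for request 4.

```python
from typing import Dict


class BatchStatusCache:
    """Hypothetical client-side state for 'modified since' polling.

    Not HTCondor code. Assumes the server returns only jobs whose status
    changed after min_mod_time, so an empty reply means "no changes".
    """

    def __init__(self) -> None:
        self.min_mod_time = 0.0           # 0 asks for every job
        self.status: Dict[str, str] = {}  # job ID -> last known BOINC status

    def apply_reply(self, server_time: float, changes: Dict[str, str]) -> None:
        # Merge the delta first, then advance the cursor (assumed to be the
        # reply's server time) so the next poll asks only for newer changes.
        self.status.update(changes)
        self.min_mod_time = server_time


if __name__ == "__main__":
    cache = BatchStatusCache()
    cache.apply_reply(1470753439.2917, {"18.0": "ERROR", "27.0": "IN_PROGRESS"})
    cache.apply_reply(1470753559.7071, {})  # the later, empty reply
    print(cache.status)  # earlier statuses survive the empty delta
```

Under this reading, if the statuses from the first reply were never applied to the job ads while min_mod_time still advanced, every later poll would come back empty and the jobs would stay idle in condor_q. That is one possibility worth checking, not a diagnosis.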

And the corresponding returned XML documents are, respectively:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<query_batch2>
<server_time>1470764185.2719</server_time>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470042521#17.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470046242#18.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470049378#19.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470049582#20.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470049845#21.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470089006#22.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470126076#23.0</job_name>
        <status>ERROR</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470169805#24.0</job_name>
        <status>DONE</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470314864#25.0</job_name>
        <status>DONE</status>
    </job>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470519953#26.0</job_name>
        <status>IN_PROGRESS</status>
    </job>
</query_batch2>

and

<?xml version="1.0" encoding="ISO-8859-1" ?>
<query_batch2>
<server_time>1470764518.9648</server_time>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
   <batch_size>0</batch_size>
</query_batch2>
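In both replies above, `<query_batch2>` is flat: each `<batch_size>` element is followed by that many `<job>` siblings rather than being wrapped in a batch element. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module that regroups the jobs per batch, assuming that layout holds in general; `parse_query_batch2` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Job = Tuple[str, str]  # (job_name, status)


def parse_query_batch2(xml_bytes: bytes) -> Tuple[float, List[List[Job]]]:
    # Assumed layout, taken from the two replies above: <server_time>, then
    # for each batch a <batch_size> element followed by that many <job> siblings.
    root = ET.fromstring(xml_bytes)
    server_time = float(root.findtext("server_time", default="nan"))
    batches: List[List[Job]] = []
    remaining = 0
    for child in root:
        if child.tag == "batch_size":
            if remaining:
                raise ValueError("previous batch is missing jobs")
            remaining = int(child.text)
            batches.append([])
        elif child.tag == "job":
            if remaining <= 0:
                raise ValueError("<job> outside any declared batch")
            batches[-1].append((child.findtext("job_name"), child.findtext("status")))
            remaining -= 1
    if remaining:
        raise ValueError("last batch is missing jobs")
    return server_time, batches


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<query_batch2>
<server_time>1470764185.2719</server_time>
   <batch_size>1</batch_size>
    <job>
<job_name>condor#boinc-submitter.cern.ch#Sixtrack#1470519953#26.0</job_name>
        <status>IN_PROGRESS</status>
    </job>
   <batch_size>0</batch_size>
</query_batch2>"""
    print(parse_query_batch2(sample))
```

Applied to the second reply, it yields nine empty batches, matching the nine `'0'` counts in the GAHP reply to request 14.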