
Re: [HTCondor-users] unwelcome job classad



For the submit node 

INO_USER = true

is correct but incomplete. You also have to say that the variable INO_USER should be automatically inserted into jobs. One way to do this is to add it to the
SUBMIT_ATTRS list like this:

SUBMIT_ATTRS = $(SUBMIT_ATTRS) INO_USER
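
A minimal sketch of how the two halves of this setup could fit together, assuming the INO_USER attribute and the condor_config.local file mentioned in this thread. The TARGET. prefix and the =?= operator on the execute side are an assumption added here for illustration, not something the original configuration used:

```
# Submit node (condor_config.local)
# Define the value and ask condor_submit to copy it into every job ad.
INO_USER = true
SUBMIT_ATTRS = $(SUBMIT_ATTRS) INO_USER

# Execute node (condor_config.local)
# TARGET. refers to the job ad when START is evaluated.
# =?= evaluates to false, rather than undefined, for jobs that
# do not carry an INO_USER attribute at all.
START = TARGET.INO_USER =?= true
```

The attribute is added at submit time, so only jobs submitted after the change should carry it. Running condor_q -long with a job id prints the full job ClassAd, which is one way to check whether INO_USER is present.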


-tj



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Nagaraj Panyam <pn@xxxxxxxxxxx>
Sent: Friday, March 26, 2021 4:05 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] unwelcome job classad
 

Hi,


On 3/24/21 10:16 PM, John M Knoeller wrote:
IsValidCheckpointPlatform is automatically inserted into the Requirements for the Execute nodes to ensure that Standard universe jobs that are being resumed after a checkpoint will not match machines that they cannot resume on.

The sub-expression TARGET.JobUniverse isnt 1 is how this is limited to standard universe. Standard universe is a (nearly) obsolete universe that only a very small number of Condor users are still using. In fact, standard universe has been removed from the latest version of HTCondor.

So you can safely ignore IsValidCheckpointPlatform and assume that the reason jobs are stuck in idle is the INO_USER expression.

Indeed, I have INO_USER=TRUE in the condor_config.local of submit node.

And on the exec nodes where I want those jobs to go, I have START = INO_USER == TRUE


I expected that all jobs submitted on that submit node will carry INO_USER=TRUE. Is something wrong with my syntax?


Also, is there a way I can see all the variables that a job carries with it to the exec node?


Thanks

Nagaraj






-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Nagaraj Panyam <pn@xxxxxxxxxxx>
Sent: Wednesday, March 24, 2021 1:15 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] unwelcome job classad
 
Hi,

All of a sudden all jobs of all users are idle and I find the reason as
below. Point is, I myself have put in the requirement for INO_USER, but
I have not put in the requirement for IsValidCheckpointPlatform. Could
it be in some updated packages? How to take it out?

==================================================

The Requirements expression for this slot is

     (START) &&
     (IsValidCheckpointPlatform)

   START is
     INO_USER == true

   IsValidCheckpointPlatform is
     (TARGET.JobUniverse isnt 1 ||
       ((MY.CheckpointPlatform isnt undefined) &&
         ((TARGET.LastCheckpointPlatform is MY.CheckpointPlatform) ||
           (TARGET.NumCkpts == 0))))

This slot defines the following attributes:

     CheckpointPlatform = "LINUX X86_64 3.10.0-1160.11.1.el7.x86_64 normal N/A avx avx2 ssse3 sse4_1 sse4_2"

The Requirements expression for this slot reduces to these conditions:

        Clusters
Step    Matched  Condition
-----  --------  ---------
[0]           0  INO_USER == true
[1]           0  IsValidCheckpointPlatform

======================================================

Thanks  a lot!

Nagaraj

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
