
Re: [Condor-users] CheckpointPlatform error



Todd, the message is saying that CheckpointPlatform is missing from
the _job_ ClassAd; it has nothing to do with machine ClassAds.
I have been seeing this warning from condor_q -better-analyze
for a very long time and, as you say, it is harmless.

To explore the "unknown reasons" why a job is rejected by certain machines, run
  condor_q -ana -l <Clusterid>
which will give you the last reason that a particular
job was rejected. Typical causes are group quotas, or the negotiator
simply not having run yet.

Steve Timm



Queue

But after the job has been submitted, condor_q -better-analyze
returns:
-----------------------
002.000:  Run analysis summary.  Of 6 machines,
      0 are rejected by your job's requirements
      2 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job

The following attributes are missing from the job ClassAd:

CheckpointPlatform
----------------------
Where is the error? What is CheckpointPlatform?

From the Condor Manual -

CheckpointPlatform: A string which opaquely encodes various aspects
about a machine's operating system, hardware, and kernel attributes.
It is used to identify systems where previously taken checkpoints for
the standard universe may resume.

But it is strange to see better-analyze saying it is missing. CheckpointPlatform should appear by default in all machine ClassAds, so the above message from condor_q -better-analyze would imply that some of your machines are not advertising a checkpoint platform. I would be curious to see what this command returns:
 condor_status -con 'CheckpointPlatform =?= UNDEFINED'
(it will print out all machines in your pool that do not have CheckpointPlatform defined).
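If you are checking many jobs, the "Run analysis summary" quoted above can be read mechanically. What follows is a minimal sketch, assuming the summary wording exactly as it appears in this thread (other Condor versions may phrase these lines differently); the function name summarize_analysis and the key names it prints are hypothetical, not part of Condor. It reads condor_q -better-analyze output on stdin and prints the machine counts for a few categories, including the "unknown reasons" count that Steve suggests following up with condor_q -ana -l.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): parse the text printed by
# `condor_q -better-analyze` on stdin and print key=value counts from the
# "Run analysis summary" section. Assumes the summary wording shown in
# this thread; adjust the patterns if your Condor version words it differently.
summarize_analysis() {
  awk '
    # Start of the summary block; skip the header line itself.
    /Run analysis summary/ { in_summary = 1; next }
    # $1 is the leading count on each summary line (awk skips leading spaces).
    # "job.s" matches "job'"'"'s" without embedding a quote in the awk program.
    in_summary && /are rejected by your job.s requirements/ { print "rejected_by_job=" $1 }
    in_summary && /reject your job because of their own requirements/ { print "rejected_by_machine=" $1 }
    in_summary && /reject the job for unknown reasons/ { print "unknown_reasons=" $1 }
    # Last line of the summary block; stop looking after it.
    in_summary && /are available to run your job/ { print "available=" $1; in_summary = 0 }
  '
}
```

For the job in this thread, something like `condor_q -better-analyze 2.0 | summarize_analysis` would be expected to report unknown_reasons=4; a non-zero value there is the cue to run condor_q -ana -l for that job, while the condor_status constraint above separately shows whether any machines lack CheckpointPlatform.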