
Re: [Condor-users] CheckpointPlatform error



Hello

Thank you for all the answers.

OK. I have changed my job to just: echo "hello world". All the files are
world-readable/writable.

After submitting the job, "condor_q -ana -l <Clusterid>" returns:
----------------------------
slot2@xxxxxxxxx Failed offer constraint
---
011.000:  Run analysis summary.  Of 6 machines,
      0 are rejected by your job's requirements
      2 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
----------------------------

In the NegotiatorLog I found this error:
05/10 10:27:08 Phase 4.1:  Negotiating with schedds ...
05/10 10:27:08   Negotiating with condor@xxxxxxxxxxxxxxxx at
<192.168.1.40:44936>
05/10 10:27:08 0 seconds so far
05/10 10:27:08 condor_read() failed: recv() returned -1, errno = 104
Connection reset by peer, reading 5 bytes from schedd
condor@xxxxxxxxxxxxxxxxx
05/10 10:27:08 IO: Failed to read packet header
05/10 10:27:08     Failed to get reply from schedd
05/10 10:27:08   Error: Ignoring submitter for this cycle

In the SchedLog I found this other error:
05/10 10:27:08 (pid:2505) Sent ad to central manager for
condor@xxxxxxxxxxxxxxxx
05/10 10:27:08 (pid:2505) Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxx
05/10 10:27:08 (pid:2505) Can't find address for startd
dalia.intranet.iac3.eu
05/10 10:27:08 (pid:2505) PERMISSION DENIED to unauthenticated user from
host 192.168.1.40 for command 493 (NEGOTIATE_WITH_SIGATTRS), access
level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case
for the full reason
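
In case it helps anyone comparing these logs, here is a minimal sketch for pulling the denied host, command and access level out of a PERMISSION DENIED line like the one above. The regular expression is only based on this one sample line, and the names DENIED_RE and parse_denied are my own (hypothetical, not part of Condor), so other log formats may need adjustments.

```python
import re

# Hypothetical helper: the pattern is derived only from the SchedLog line
# quoted above, not from any official description of the log format.
DENIED_RE = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host (?P<host>\S+) "
    r"for command (?P<command>\d+) \((?P<name>\w+)\), "
    r"access level (?P<level>\w+)"
)


def parse_denied(line):
    """Return a dict of fields from a PERMISSION DENIED line, or None."""
    # Collapse whitespace so lines wrapped by the mail client still match.
    flat = " ".join(line.split())
    match = DENIED_RE.search(flat)
    return match.groupdict() if match else None


sample = """05/10 10:27:08 (pid:2505) PERMISSION DENIED to unauthenticated user from
host 192.168.1.40 for command 493 (NEGOTIATE_WITH_SIGATTRS), access
level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case
for the full reason"""

info = parse_denied(sample)
print(info)
```

Running the same function over every line of the SchedLog should list each denied host together with the access level it was refused at.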

I think there may be an error in my Condor configuration.

Thanks for any advice

Regards



On Fri, 07-05-2010 at 09:38 -0500, Steven Timm wrote:
> Todd--the message is saying that CheckpointPlatform is not in
> the _job_ classad, this has nothing to do with machine classads.
> I have been seeing this warning message from condor_q -better-analyze
> for a very long time and, like you say, it is harmless.
> 
> To explore the "unknown reasons" why a job is rejected by certain machines,
> condor_q -ana -l <Clusterid>
> will give you the last reason that a particular
> job was rejected.  Typical causes are group quotas, or sometimes the
> negotiator just hasn't run yet.
> 
> Steve Timm
> 
> 
> 
> >> Queue
> >> 
> >> But after the job has been submitted the condor_q -better-analyze
> >> returns:
> >> -----------------------
> >> 002.000:  Run analysis summary.  Of 6 machines,
> >>       0 are rejected by your job's requirements
> >>       2 reject your job because of their own requirements
> >>       0 match but are serving users with a better priority in the pool
> >>       4 match but reject the job for unknown reasons
> >>       0 match but will not currently preempt their existing job
> >>       0 match but are currently offline
> >>       0 are available to run your job
> >> 
> >> The following attributes are missing from the job ClassAd:
> >> 
> >> CheckpointPlatform
> >> ----------------------
> >> Where is the error? What is the CheckpointPlatform?
> >
> > From the Condor Manual -
> >
> >> CheckpointPlatform: A string which opaquely encodes various aspects
> >> about a machine's operating system, hardware, and kernel attributes.
> >> It is used to identify systems where previously taken checkpoints for
> >> the standard universe may resume.
> >
> > But it is strange to see better-analyze saying it is missing.
> > CheckpointPlatform should appear by default in all the machine classads, so the
> > above message from condor_analyze would imply that some of your machines are
> > not advertising a checkpoint platform.  I would be curious to see what this
> > command
> >  condor_status -con 'CheckpointPlatform =?= UNDEFINED'
> > returns (it will print out all machines in your pool that do not have
> > CheckpointPlatform defined).