
Re: [Condor-users] Problem Condor Job Stays Idle Because of target.CkptArch




       Last successful match: Tue Nov 20 22:36:21 2007


This indicates that the job is successfully getting matched to a machine. Something must be going wrong when Condor tries to run the job on that machine. Look for clues about what is going wrong in these logs:

The "user log": /usr/local/globus-4.0.5//var/globus-condor.log
The ShadowLog (condor_config_val SHADOW_LOG)
The StartLog (condor_config_val STARTD_LOG)
The StarterLog (condor_config_val STARTER_LOG)
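
As a rough sketch of how to look at the daemon logs listed above, assuming condor_config_val is on your PATH and that you run it on the machine whose log you want (ShadowLog on the submit machine, StartLog and StarterLog on the execute machine). The helper name show_condor_log is made up for illustration; it is not a Condor command.

```shell
# show_condor_log MACRO [LINES]
# Hypothetical helper: resolve a log path with condor_config_val and
# print the last LINES lines of it (50 by default).
show_condor_log() {
    macro=$1                      # e.g. SHADOW_LOG, STARTD_LOG, STARTER_LOG
    lines=${2:-50}

    # Ask Condor where the log lives; bail out if the macro is not defined.
    log_path=$(condor_config_val "$macro") || {
        echo "could not resolve $macro" >&2
        return 1
    }

    # The file may be missing or owned by another user.
    if [ ! -r "$log_path" ]; then
        echo "log not readable: $log_path" >&2
        return 1
    fi

    echo "== $macro: $log_path =="
    tail -n "$lines" "$log_path"
}
```

For example, `show_condor_log SHADOW_LOG 100` on the submit machine. On machines with several VMs the starter log may be written with a per-VM suffix, in which case the path reported for STARTER_LOG will not exist as-is and the helper will just report that it is not readable.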

I hope that helps!

--Dan

Nitin Gavhane wrote:

Hello all,
I am submitting a job through Globus to Condor, but the job stays in the idle state. The job details are as follows.
================================================
*The job description generated by GRAM is as follows:*

[condor@niting-w2p etc]$ cat /tmp/condor_job_description
#
# description file for condor submission
#
Universe = standard
Notification = Never
Executable = /home/psegrid/NIP/nip
Requirements = OpSys == "LINUX"  && Arch == "INTEL"
Environment = GLOBUS_LOCATION=/usr/local/globus-4.0.5/;X509_CERT_DIR=/etc/grid-security/certificates;X509_USER_PROXY=;X509_USER_CERT=;X509_USER_KEY=;HOME=/home/psegrid;LOGNAME=psegrid;SCRATCH_DIRECTORY=/home/psegrid/.globus/scratch;JAVA_HOME=/usr/java/jdk1.6.0_03/jre;GLOBUS_GRAM_JOB_HANDLE=https://192.168.7.221:8443/wsrf/services/ManagedExecutableJobService?7f408200-9789-11dc-9f1a-b41f06e1e2ea;LD_LIBRARY_PATH=
Arguments =
InitialDir = /home/psegrid
Input = /dev/null
Log = /usr/local/globus-4.0.5//var/globus-condor.log
log_xml = True
#Extra attributes specified by client

Output = /home/psegrid/stdout
Error = /home/psegrid/stderr
queue 1
=======================================================================
*[psegrid@niting-w2p NIP]$ condor_q -better-analyze*


-- Submitter: niting-w2p.corp.cdac.in : <192.168.7.221:42993> : niting-w2p.corp.cdac.in
---
005.000:  Run analysis summary.  Of 7 machines,
     4 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     3 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job
       Last successful match: Tue Nov 20 22:36:21 2007

The Requirements expression for your job is:

( target.OpSys == "LINUX" && target.Arch == "INTEL" ) &&
( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) ) && ( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   target.Arch == "INTEL"            3
2   target.OpSys == "LINUX"           7
3   ( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) )
                                      7
4   ( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) )
                                      7
5   ( target.Disk >= 20000 )          7
6   ( ( 1024 * target.Memory ) >= 20000 )
                                      7



==========================================================
*[psegrid@niting-w2p NIP]$ condor_status*

Name          OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime

vm1@niting-w2 LINUX  INTEL   Unclaimed  Idle      0.000   469  0+00:05:26
vm2@niting-w2 LINUX  INTEL   Unclaimed  Idle      0.140   469  0+00:26:42
sskadam-w2p.c LINUX  INTEL   Unclaimed  Idle      0.000   248  0+00:44:38
vm1@psewebs-w LINUX  X86_64  Unclaimed  Idle      0.400   753  0+00:30:04
vm2@psewebs-w LINUX  X86_64  Unclaimed  Idle      0.000   753  0+00:30:05
vm3@psewebs-w LINUX  X86_64  Unclaimed  Idle      0.000   753  0+00:30:06
vm4@psewebs-w LINUX  X86_64  Unclaimed  Idle      0.000   753  0+00:30:27

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

 INTEL/LINUX      3      0        0          3        0           0         0
X86_64/LINUX      4      0        0          4        0           0         0

       Total      7      0        0          7        0           0         0
==============================================================
*The daemon details for all three machines are as follows:*

[condor@niting-w2p etc]$ ./test.sh
current file: condor_config
##  checkpoint server isn't available or USE_CKPT_SERVER is set to
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
##  checkpoint server?  If False, the CKPT_SERVER_HOST set on
##  the submit machine is used.  Otherwise, the CKPT_SERVER_HOST set
STARTER_CHOOSES_CKPT_SERVER = True
#WALL_CLOCK_CKPT_INTERVAL = 3600
##  setting is only used if USE_CKPT_SERVER (from above) is True.
#COMPRESS_PERIODIC_CKPT = False
#COMPRESS_VACATE_CKPT = False
#SLOW_CKPT_SPEED = 0
DAEMON_LIST                     = MASTER, STARTD, SCHEDD
#DC_DAEMON_LIST = \
=============
current file: psewebs-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
DAEMON_LIST   = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
=============
current file: niting-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
=============
current file: sskadam-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
===============================

Please tell me what is wrong with the job submission.
Thank you.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nitin M. Gavhane
MS in Advanced Software Technologies
International Institute of Information Technology
P-14,Hinjewadi,Pune, India.
---------------------------------------------------------------------------------------------------------------------------
