
Re: [Condor-users] problem regarding job submission



Hi,
 
I had several problems with WinXP and Condor v6.6.10 as well.
 
I'm not an expert, but check this list! I was able to solve every problem I had with one or another of the fixes listed below.
 
Check the following:
- If you are not submitting the jobs from the master node, check condor_cred_add
- Check the firewall
- What does condor_status say?
- Reinstall Condor: uninstall, REMOVE the directory, reboot, install and CHECK THE SETTINGS
- SETTINGS: Try the IP address of the master node instead of its name (DNS errors), and check read/write permissions and so on!
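
The firewall and DNS items above can be checked with a few lines of Python. This is only a minimal sketch, not part of Condor: the function names are made up for illustration, and the default port 9618 is taken from the collector's command socket in the quoted collector log (<172.31.5.115:9618>), so adjust it if your pool uses a different one. It only tells you whether the name resolves and whether a TCP connection gets through; it says nothing about Condor's own settings.

```python
import socket

# Assumption: 9618 is the collector port shown in the quoted collector log.
COLLECTOR_PORT = 9618


def resolve(host):
    """Return the IPv4 address for host, or None if name resolution fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        # socket.gaierror is a subclass of OSError
        return None


def can_connect(host, port=COLLECTOR_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, ...
        return False


def diagnose(host, port=COLLECTOR_PORT, timeout=5.0):
    """Classify the result as 'dns-failure', 'tcp-blocked' or 'ok'."""
    ip = resolve(host)
    if ip is None:
        return "dns-failure"
    if not can_connect(ip, port, timeout):
        return "tcp-blocked"
    return "ok"
```

Run it from the submit machine, for example diagnose("research1") with the master's name and then diagnose("172.31.5.115") with its IP address. If the name gives "dns-failure" but the IP address gives "ok", use the IP address in the settings; "tcp-blocked" in both cases points towards the firewall, or the collector is not running.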
 
Good luck!
 
 
----- Original Message -----
Sent: Tuesday, January 09, 2007 12:46 PM
Subject: [Condor-users] problem regarding job submission

Hi,
I am Santosh Jaiswal. I have installed Condor 6.6.10 on Windows XP and I am having a problem with job submission. When I submit a job (prime.sub), it is submitted without any error, but it does not produce any output and gives only the following messages:

submitting job(s).
logging submit event(s).
1 job(s) submitted to cluster 1.

I am sending the log information.

START LOG
1/9 15:45:39 ******************************************************
1/9 15:45:39 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
1/9 15:45:39 ** C:\Condor\bin\condor_startd.exe
1/9 15:45:39 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 15:45:39 ** $CondorPlatform: INTEL-WINNT50 $
1/9 15:45:39 ** PID = 1768
1/9 15:45:39 ******************************************************
1/9 15:45:39 Using config file: C:\Condor\condor_config
1/9 15:45:39 Using local config files: C:\Condor/condor_config.local
1/9 15:45:39 DaemonCore: Command Socket at <172.31.5.115:1235>
1/9 15:45:39 "C:\Condor/bin/condor_starter.pvm -classad" did not produce any output, ignoring
1/9 15:45:39 "C:\Condor/bin/condor_starter.std -classad" did not produce any output, ignoring
1/9 15:45:39 vm1: New machine resource allocated
1/9 15:45:39 vm2: New machine resource allocated
1/9 15:45:44 About to run initial benchmarks.
1/9 15:45:50 Completed initial benchmarks.
1/9 15:45:50 vm2: State change: IS_OWNER is false
1/9 15:45:50 vm2: Changing state: Owner -> Unclaimed
1/9 15:45:50 vm1: State change: IS_OWNER is false
1/9 15:45:50 vm1: Changing state: Owner -> Unclaimed


schedlog


1/9 16:17:24 DaemonCore: Command received via UDP from host <172.31.5.115:1435>
1/9 16:17:24 DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
1/9 16:17:24 Got SIGTERM. Performing graceful shutdown.
1/9 16:17:24 shutdown graceful
1/9 16:17:24 Deleting Cronmgr
1/9 16:17:24 All resources are free, exiting.
1/9 16:17:24 **** condor_startd.exe (condor_STARTD) EXITING WITH STATUS 0
1/9 16:17:24 ******************************************************
1/9 16:17:24 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
1/9 16:17:24 ** C:\Condor\bin\condor_startd.exe
1/9 16:17:24 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 16:17:24 ** $CondorPlatform: INTEL-WINNT50 $
1/9 16:17:24 ** PID = 712
1/9 16:17:24 ******************************************************
1/9 16:17:24 Using config file: C:\Condor\condor_config
1/9 16:17:24 Using local config files: C:\Condor/condor_config.local
1/9 16:17:24 DaemonCore: Command Socket at <172.31.5.115:1447>
1/9 16:17:24 "C:\Condor/bin/condor_starter.pvm -classad" did not produce any output, ignoring
1/9 16:17:24 "C:\Condor/bin/condor_starter.std -classad" did not produce any output, ignoring
1/9 16:17:24 vm1: New machine resource allocated
1/9 16:17:24 vm2: New machine resource allocated
1/9 16:17:29 About to run initial benchmarks.

collector log



1/9 16:15:39 Housekeeper:  Ready to clean old ads
1/9 16:15:39      Cleaning StartdAds ...
1/9 16:15:39      Cleaning StartdPrivateAds ...
1/9 16:15:39      Cleaning ScheddAds ...
1/9 16:15:39      Cleaning SubmittorAds ...
1/9 16:15:39      Cleaning LicenseAds ...
1/9 16:15:39      Cleaning MasterAds ...
1/9 16:15:39      Cleaning CkptServerAds ...
1/9 16:15:39      Cleaning CollectorAds ...
1/9 16:15:39      Cleaning StorageAds ...
1/9 16:15:39 Housekeeper:  Done cleaning
1/9 16:15:40 (Sent 8 ads in response to query)
1/9 16:15:40 Got QUERY_STARTD_PVT_ADS
1/9 16:15:40 (Sent 3 ads in response to query)
1/9 16:16:23 Can't connect to <128.105.143.14:9618>:0, errno = 10060
1/9 16:16:23 Will keep trying for 10 seconds...
1/9 16:16:24 Connect failed for 10 seconds; returning FALSE
1/9 16:16:24 ERROR:
SECMAN:2003:TCP connection to <128.105.143.14:9618> failed

1/9 16:16:24 Can't send UPDATE_COLLECTOR_AD to collector (condor.cs.wisc.edu): Failed to send UDP update command to collector
1/9 16:17:24 Got SIGTERM. Performing graceful shutdown.
1/9 16:17:24 **** condor_collector.exe (condor_COLLECTOR) EXITING WITH STATUS 0
1/9 16:17:24 ******************************************************
1/9 16:17:24 ** condor_collector.exe (CONDOR_COLLECTOR) STARTING UP
1/9 16:17:24 ** C:\Condor\bin\condor_collector.exe
1/9 16:17:24 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 16:17:24 ** $CondorPlatform: INTEL-WINNT50 $
1/9 16:17:24 ** PID = 1400
1/9 16:17:24 ******************************************************
1/9 16:17:24 Using config file: C:\Condor\condor_config
1/9 16:17:24 Using local config files: C:\Condor/condor_config.local
1/9 16:17:24 DaemonCore: Command Socket at <172.31.5.115:9618>
1/9 16:17:24 In ViewServer::Init()
1/9 16:17:24 In CollectorDaemon::Init()
1/9 16:17:24 In ViewServer::Config()
1/9 16:17:24 In CollectorDaemon::Config()
1/9 16:17:24 enable: Creating stats hash table
1/9 16:17:24 (Sent 0 ads in response to query)
1/9 16:17:24 Got QUERY_STARTD_PVT_ADS
1/9 16:17:24 (Sent 0 ads in response to query)
1/9 16:17:24 WARNING:  No master ad for < research1 >
1/9 16:17:24 ScheddAd    : Inserting ** "< research1 , 172.31.5.115 >"
1/9 16:17:24 stats: Inserting new hashent for 'Schedd':'research1':'172.31.5.115'
1/9 16:17:29 ** Master < research1 > rejuvenated from recently down
1/9 16:17:29 stats: Inserting new hashent for 'Master':'research1':'172.31.5.115'
1/9 16:17:39 WARNING:  No master ad for < vm1@research1 >
1/9 16:17:39 StartdAd    : Inserting ** "< vm1@research1 , 172.31.5.115 >"
1/9 16:17:39 stats: Inserting new hashent for 'Start':'vm1@research1':'172.31.5.115'
1/9 16:17:39 StartdPvtAd  : Inserting ** "< vm1@research1 , 172.31.5.115 >"
1/9 16:17:39 stats: Inserting new hashent for 'StartdPvt':'vm1@research1':'172.31.5.115'
1/9 16:17:40 WARNING:  No master ad for < vm2@research1 >
1/9 16:17:40 StartdAd    : Inserting ** "< vm2@research1 , 172.31.5.115 >"
1/9 16:17:40 stats: Inserting new hashent for 'Start':'vm2@research1':'172.31.5.115'
1/9 16:17:40 StartdPvtAd  : Inserting ** "< vm2@research1 , 172.31.5.115 >"
1/9 16:17:40 stats: Inserting new hashent for 'StartdPvt':'vm2@research1':'172.31.5.115'
1/9 16:17:57 DC_AUTHENTICATE: attempt to open invalid session research1:1896:1168386753:6, failing.
1/9 16:18:16 DC_AUTHENTICATE: attempt to open invalid session research1:1896:1168386797:9, failing.
1/9 16:21:01 DC_AUTHENTICATE: attempt to open invalid session research1:1896:1168386753:7, failing.
1/9 16:21:01 DC_AUTHENTICATE: attempt to open invalid session research1:1896:1168386753:7, failing.


match log

1/9 16:15:40      Matched 1.0 np@xxxxxxxxxx <172.31.5.119:3264> preempting none <172.31.5.119:3287>
1/9 16:15:40      Rejected 2.0 np@xxxxxxxxxx <172.31.5.119:3264>: no match found



master log

1/9 15:45:38 ******************************************************
1/9 15:45:38 ** Condor (CONDOR_MASTER) STARTING UP
1/9 15:45:38 ** C:\Condor\bin\condor_master.exe
1/9 15:45:38 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 15:45:38 ** $CondorPlatform: INTEL-WINNT50 $
1/9 15:45:38 ** PID = 284
1/9 15:45:38 ******************************************************
1/9 15:45:38 Using config file: C:\Condor\condor_config
1/9 15:45:38 Using local config files: C:\Condor/condor_config.local
1/9 15:45:38 DaemonCore: Command Socket at <172.31.5.115:1234>
1/9 15:45:38 Started DaemonCore process "C:\Condor/bin/condor_collector.exe", pid and pgroup = 1896
1/9 15:45:38 Started DaemonCore process "C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 1452
1/9 15:45:38 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 1768
1/9 15:45:38 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 180
1/9 16:17:24 DaemonCore: Command received via TCP from host <172.31.5.115:1430>
1/9 16:17:24 DaemonCore: received command 453 (RESTART), calling handler (admin_command_handler)
1/9 16:17:24 Sent signal 15 to COLLECTOR (pid 1896)
1/9 16:17:24 Sent signal 15 to NEGOTIATOR (pid 1452)
1/9 16:17:24 Sent signal 15 to STARTD (pid 1768)
1/9 16:17:24 Sent signal 15 to SCHEDD (pid 180)
1/9 16:17:24 DaemonCore: Command received via UDP from host <172.31.5.115:1442>
1/9 16:17:24 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
1/9 16:17:24 The COLLECTOR (pid 1896) exited with status 0
1/9 16:17:24 DaemonCore: Command received via UDP from host <172.31.5.115:1443>
1/9 16:17:24 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
1/9 16:17:24 The NEGOTIATOR (pid 1452) exited with status 0
1/9 16:17:24 DaemonCore: Command received via UDP from host <172.31.5.115:1444>
1/9 16:17:24 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
1/9 16:17:24 The STARTD (pid 1768) exited with status 0
1/9 16:17:24 DaemonCore: Command received via UDP from host <172.31.5.115:1445>
1/9 16:17:24 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
1/9 16:17:24 The SCHEDD (pid 180) exited with status 0
1/9 16:17:24 All daemons are gone.  Restarting.
1/9 16:17:24 Restarting master right away.
1/9 16:17:24 Doing exec( "C:\Condor/bin/condor_master.exe" )
1/9 16:17:24 ******************************************************
1/9 16:17:24 ** Condor (CONDOR_MASTER) STARTING UP
1/9 16:17:24 ** C:\Condor\bin\condor_master.exe
1/9 16:17:24 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 16:17:24 ** $CondorPlatform: INTEL-WINNT50 $
1/9 16:17:24 ** PID = 1608
1/9 16:17:24 ******************************************************
1/9 16:17:24 Using config file: C:\Condor\condor_config
1/9 16:17:24 Using local config files: C:\Condor/condor_config.local
1/9 16:17:24 DaemonCore: Command Socket at <172.31.5.115:1446>
1/9 16:17:24 Started DaemonCore process "C:\Condor/bin/condor_collector.exe", pid and pgroup = 1400
1/9 16:17:24 Started DaemonCore process "C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 972
1/9 16:17:24 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 712
1/9 16:17:24 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 400


negotiator log


1/9 16:15:40 ---------- Started Negotiation Cycle ----------
1/9 16:15:40 Phase 1:  Obtaining ads from collector ...
1/9 16:15:40  Getting all public ads ...
1/9 16:15:40  Sorting 8 ads ...
1/9 16:15:40  Getting startd private ads ...
1/9 16:15:40 Got ads: 8 public and 3 private
1/9 16:15:40 Public ads include 1 submitter, 3 startd
1/9 16:15:40 Phase 2:  Performing accounting ...
1/9 16:15:40 Phase 3:  Sorting submitter ads by priority ...
1/9 16:15:40 Phase 4.1:  Negotiating with schedds ...
1/9 16:15:40  Negotiating with np@xxxxxxxxxx at <172.31.5.119:3264>
1/9 16:15:40    Request 00001.00000:
1/9 16:15:40      Matched 1.0 np@xxxxxxxxxx <172.31.5.119:3264> preempting none <172.31.5.119:3287>
1/9 16:15:40      Successfully matched with research
1/9 16:15:40    Request 00002.00000:
1/9 16:15:40      Rejected 2.0 np@xxxxxxxxxx <172.31.5.119:3264>: no match found
1/9 16:15:40    Got NO_MORE_JOBS;  done negotiating
1/9 16:15:40 ---------- Finished Negotiation Cycle ----------
1/9 16:17:24 Got SIGTERM. Performing graceful shutdown.
1/9 16:17:24 **** condor_negotiator.exe (condor_NEGOTIATOR) EXITING WITH STATUS 0
1/9 16:17:24 ******************************************************
1/9 16:17:24 ** condor_negotiator.exe (CONDOR_NEGOTIATOR) STARTING UP
1/9 16:17:24 ** C:\Condor\bin\condor_negotiator.exe
1/9 16:17:24 ** $CondorVersion: 6.6.10 Jun 22 2005 $
1/9 16:17:24 ** $CondorPlatform: INTEL-WINNT50 $
1/9 16:17:24 ** PID = 972
1/9 16:17:24 ******************************************************
1/9 16:17:24 Using config file: C:\Condor\condor_config
1/9 16:17:24 Using local config files: C:\Condor/condor_config.local
1/9 16:17:24 DaemonCore: Command Socket at <172.31.5.115:9614>
1/9 16:17:24 ACCOUNTANT_HOST = None (local)
1/9 16:17:24 NEGOTIATOR_INTERVAL = 300 sec
1/9 16:17:24 NEGOTIATOR_TIMEOUT = 30 sec
1/9 16:17:24 PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && RemoteUserPrio > SubmittorPrio * 1.2
1/9 16:17:24 PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
1/9 16:17:24 ---------- Started Negotiation Cycle ----------
1/9 16:17:24 Phase 1:  Obtaining ads from collector ...
1/9 16:17:24  Getting all public ads ...
1/9 16:17:24  Sorting 0 ads ...
1/9 16:17:24  Getting startd private ads ...
1/9 16:17:24 Got ads: 0 public and 0 private
1/9 16:17:24 Public ads include 0 submitter, 0 startd
1/9 16:17:24 Phase 2:  Performing accounting ...
1/9 16:17:24 Phase 3:  Sorting submitter ads by priority ...
1/9 16:17:24 Phase 4.1:  Negotiating with schedds ...
1/9 16:17:24 ---------- Finished Negotiation Cycle ----------
1/9 16:22:24 ---------- Started Negotiation Cycle ----------
1/9 16:22:24 Phase 1:  Obtaining ads from collector ...
1/9 16:22:24  Getting all public ads ...
1/9 16:22:24  Sorting 4 ads ...
1/9 16:22:24  Getting startd private ads ...
1/9 16:22:24 Got ads: 4 public and 2 private
1/9 16:22:24 Public ads include 0 submitter, 2 startd
1/9 16:22:24 Phase 2:  Performing accounting ...
1/9 16:22:24 Phase 3:  Sorting submitter ads by priority ...
1/9 16:22:24 Phase 4.1:  Negotiating with schedds ...
1/9 16:22:24 ---------- Finished Negotiation Cycle ----------


Kindly sort out my problem as early as possible.
Regards
Santosh


