
Re: [HTCondor-users] StartLog Error



On 1/31/2014 6:23 AM, Dennis Zheleznyak wrote:
Hi Todd,

I meant execution machine :)

Thanks for the long explanation. However, c:\condor\execute exists. Do you
still think the error occurred because the service was still up?

Thank you,
Dennis.


The Windows Condor service stays running as long as the condor_master.exe process is running. Is the condor_startd.exe process running? If it is, you should see the execute node (its slots) via "condor_status". Did something change? That is, did this machine used to appear in condor_status and one day stop, or has it never (to your knowledge) appeared? If the latter, it is perhaps a firewall or configuration issue.
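
A rough way to script the process check above, offered as a sketch and not an official HTCondor tool: it shells out to the stock Windows tasklist command and looks for the daemon's image name in the output. The function names are made up for illustration, and the plain substring match is an assumption about tasklist output that should be good enough for a quick check.

```python
import subprocess


def image_listed(tasklist_output, image_name):
    """Return True if image_name appears in text captured from tasklist."""
    return image_name.lower() in tasklist_output.lower()


def daemon_running(image_name="condor_startd.exe"):
    """Ask Windows tasklist whether a process with this image name exists.

    Hypothetical helper: only meaningful on Windows, where tasklist is available.
    """
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return image_listed(result.stdout, image_name)
```

On the machine in question, daemon_running("condor_master.exe") and daemon_running() would show whether the master and the startd are both up.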

Todd


On Thu, Jan 30, 2014 at 10:29 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

On 1/30/2014 1:59 PM, Dennis Zheleznyak wrote:

Hi everyone,

Today I encountered an issue where a machine didn't appear in my Condor
pool. All services were up, and I didn't see any errors in the logs except
this one:

I'm running Condor 8.0.5 on Windows 7 Professional 64-bit; this machine is
a submit-only node.

StartLog:
01/30/14 18:48:30 slot1: New machine resource allocated
01/30/14 18:48:30 slot2: New machine resource allocated
01/30/14 18:48:30 slot3: New machine resource allocated
01/30/14 18:48:30 slot4: New machine resource allocated
01/30/14 18:48:30 ERROR "stat exec path (C:\condor\execute), errno: 2 (No such file or directory)" at line 97 in file c:\condor\execute\dir_29540\userdir\src\condor_startd.v6\util.cpp

Thank you,
Dennis.



My guess is the HTCondor service was still up because the
condor_master.exe daemon was likely still running, and attempting to
periodically restart the condor_startd.exe daemon.  The condor_master
should have been sending email about the problem to you, assuming you
configured the CONDOR_ADMIN and SMTP_SERVER settings in your condor_config
file.

The problem is the condor_startd needs c:\condor\execute to exist.
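
To see what a stat of that directory returns outside of HTCondor, something like the following could be run on the machine. This is only a sketch: check_execute_dir is a made-up helper, not part of HTCondor, and it reports what the account running the script sees, which may differ from what the account running the HTCondor service sees.

```python
import os
import stat


def check_execute_dir(path):
    """Return (ok, detail) describing whether path is an existing directory."""
    try:
        info = os.stat(path)
    except OSError as exc:
        # errno 2 is ENOENT, the same code shown in the StartLog error.
        return False, f"stat failed, errno: {exc.errno} ({exc.strerror})"
    if not stat.S_ISDIR(info.st_mode):
        return False, "path exists but is not a directory"
    return True, "directory exists"
```

For example, check_execute_dir(r"C:\condor\execute") returns (True, "directory exists") when the directory is present and readable, and a (False, ...) pair carrying the errno otherwise.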

But given that this machine is just a submit node, why run a condor_startd
at all?  The condor_startd is the daemon that executes jobs; the
condor_schedd is the one that accepts submitted jobs.  I suggest removing
"STARTD" from the DAEMON_LIST entry in this machine's condor_config[.local]
file.
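
Editing the file by hand is the simplest route. If the change has to be made on many machines, a small script along these lines could do it. This is a sketch under assumptions: drop_daemon is a hypothetical helper, it only handles a DAEMON_LIST assignment written on a single line, and it does not understand line continuations or macro references such as $(DAEMON_LIST).

```python
import re


def drop_daemon(config_text, daemon="STARTD"):
    """Return config_text with daemon removed from any DAEMON_LIST line."""
    out = []
    for line in config_text.splitlines():
        # Match "DAEMON_LIST = ..." and keep the left-hand side as written.
        m = re.match(r"^(\s*DAEMON_LIST\s*=\s*)(.*)$", line, flags=re.IGNORECASE)
        if m:
            # Entries may be separated by commas and/or whitespace.
            names = [n for n in re.split(r"[,\s]+", m.group(2).strip()) if n]
            kept = [n for n in names if n.upper() != daemon.upper()]
            line = m.group(1) + ", ".join(kept)
        out.append(line)
    return "\n".join(out)
```

Given the text "DAEMON_LIST = MASTER, SCHEDD, STARTD", the function returns "DAEMON_LIST = MASTER, SCHEDD"; every other line passes through unchanged.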

Note that if you do not run a condor_startd, then the machine will not
appear when you do "condor_status", but that is just because by default
condor_status will show information about startds (i.e. execute nodes).
You could see all your submit nodes by doing
   condor_status -schedd
and/or information about all users that have jobs submitted via
   condor_status -submitters

Hope the above helps,
Todd



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685