[HTCondor-users] Starter exited with status -1073740940



I'm running HTCondor 8.2.1 in a small cluster on AWS and I'm having a
hard time getting my Windows jobs to run. The execute node runs
Windows Server 2008 R2 (which HTCondor identifies as Windows 7). The job
matches and appears to start, but then condor_starter.exe dies. The
StartLog records:

07/10/14 17:48:47 condor_read() failed: recv(fd=1012) returned
-1, errno = 10054 , reading 5 bytes from <127.0.0.1:50882>.
07/10/14 17:48:47 IO: Failed to read packet header
07/10/14 17:48:47 Closing job ClassAd update socket from starter.
07/10/14 17:48:47 Starter pid 336 exited with status -1073740940

From the StarterLog:
07/10/14 17:48:47 (fd:7) (pid:336) (D_HOSTNAME) Daemon client (shadow)
address determined: name: "ip-10-151-7-218.ec2.internal", pool:
"NULL", alias: "NULL", addr: "<10.151.7.218:48140?noUDP>"
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) Communicating with
shadow <10.151.7.218:48140?noUDP>
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) Submitting machine is
"ip-10-151-7-218.ec2.internal"
07/10/14 17:48:47 (fd:7) (pid:336) (D_SYSCALLS) Doing
CONDOR_register_starter_info
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_write(fd=604
<10.151.7.218:59144>,,size=515,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_read(fd=604
<10.151.7.218:59144>,,size=5,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_NETWORK) condor_read(fd=604
<10.151.7.218:59144>,,size=8,timeout=300,flags=0,non_blocking=0)
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) setting the orig job
name in starter
07/10/14 17:48:47 (fd:7) (pid:336) (D_ALWAYS) setting the orig job iwd
in starter
07/10/14 17:48:47 (fd:7) (pid:336) (D_PRIV) PRIV_CONDOR -->
PRIV_CONDOR at c:\condor\execute\dir_18128\userdir\src\condor_starter.v6.1\basestarter.cpp:1789

And then it goes poof. I see on the MagicNumbers page[1] that negative
statuses might mean "Possibly missing libraries or missing functions
in libraries on Windows. Try running from the command line to see if
you get any errors." I tried running from the command line and got no
output, error or otherwise. The other daemons seem to be fine. Any
ideas what's going on here?
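
One way to get more out of a negative Windows exit status is to
reinterpret it as an unsigned 32-bit value and read it as an NTSTATUS
code. The sketch below does that for the status in the StartLog. The
helper name to_ntstatus and the small lookup table are illustrative
assumptions, not part of HTCondor. The table holds only the one code
relevant here, which Microsoft documents as STATUS_HEAP_CORRUPTION. That
narrows the crash to heap corruption in the starter process, but does
not by itself say which module is responsible.

```python
# Hypothetical helper: reinterpret a signed 32-bit process exit status
# as the unsigned hex form used in Microsoft's NTSTATUS documentation.
def to_ntstatus(exit_status: int) -> str:
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


# Deliberately tiny lookup table (an assumption for illustration);
# only the code seen in this thread is listed.
KNOWN_NTSTATUS = {
    "0xC0000374": "STATUS_HEAP_CORRUPTION",
}

if __name__ == "__main__":
    status = -1073740940  # from "Starter pid 336 exited with status ..."
    code = to_ntstatus(status)
    print(code, KNOWN_NTSTATUS.get(code, "unknown"))
    # prints: 0xC0000374 STATUS_HEAP_CORRUPTION
```

Similarly, errno 10054 in the condor_read() failure is the Winsock code
WSAECONNRESET (connection reset by peer), which is consistent with the
starter dying while the startd still had its socket open.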

[1] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers


Thanks,
BC

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing