
Re: [Condor-users] ACCESS_VIOLATION under Windows



Came across this message in the archives, and didn't see a resolution in the thread.
 
If I had to guess, I'd say that condor doesn't realize that you're giving it a full path to java.exe.  It is still trying to find it inside the execute dir.  Look at this line from the log:
 
>  7/25 16:04:52 (fd:9) (pid:3940) About to exec
> C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
> -Xmx247m -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> -Dchirp.config=C:\condor\execute\dir_3940\chirp.config
> CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start
> C:\condor\execute\dir_3940\jvm.end JavaTest
 
While your Java location appears to be set correctly, you can see it's still trying to execute:
 
C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
 
I was having the same problem (which is how I found this thread in the archives): condor was looking for java.exe in the execute dir. I then specified the full path using forward slashes, and now it works great. You might try that. From my condor_config:
 
JAVA = C:/Windows/System32/Java.exe
 
From my StarterLog, before:
 
1/16 10:41:47 About to exec C:\condor/execute\dir_4264\JAVA.EXE ...
 
After:
 
1/16 10:50:47 About to exec C:/Windows/System32/Java.exe ...
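To make the guess above a bit more concrete, here is a rough Python model of the behavior the log lines suggest. It is NOT condor's actual code: `resolve_cmd` is a hypothetical helper, and the rule it encodes (pass the command through if it looks like a drive-rooted absolute path, otherwise prepend the execute dir) is an assumption inferred only from the "About to exec" lines above.

```python
import ntpath


# Hypothetical model of how the starter might build the command path.
# This is NOT condor's source; it only reproduces what the logs show.
def resolve_cmd(execute_dir: str, cmd: str) -> str:
    """Return cmd unchanged if it looks absolute, else join it onto execute_dir."""
    # ntpath.isabs works on any OS and accepts both "\" and "/" separators.
    # A leading double quote hides the "C:" drive, so a quoted path is
    # treated as relative under this model.
    if ntpath.isabs(cmd):
        return cmd
    return execute_dir + "\\" + cmd


if __name__ == "__main__":
    exec_dir = r"C:\condor/execute\dir_3940"
    # JAVA value as it appears in the quoted StarterLog: wrapped in literal
    # quotes, with doubled backslashes.
    quoted = r'"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"'
    # The forward-slash setting from my condor_config.
    forward = "C:/Windows/System32/Java.exe"
    # A bare executable name, like my "before" StarterLog line.
    bare = "JAVA.EXE"

    for cmd in (quoted, forward, bare):
        print(resolve_cmd(exec_dir, cmd))
```

Under this model the forward-slash path is recognized as absolute and passed through unchanged, matching the "After" line, while the quoted value and the bare JAVA.EXE both get the execute dir prepended, matching the two "before" lines.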
 
Michael
 
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> Wojtek Goscinski
>  Sent: Sunday, July 29, 2007 8:19 PM
>  To: condor-users@xxxxxxxxxxx
>  Subject: [Condor-users] ACCESS_VIOLATION under Windows
>
> Hi All,
>
>  I'm experiencing a problem setting up a Windows box as a Condor execute
> node, specifically to execute Java jobs.
>
>  I have a Windows box running XP SP2. It is set up purely as an execute
> node. The start daemon successfully picks up the job and attempts to
> execute it. It spawns the condor_starter, but the condor_starter seems to
> crash with an exception (an ACCESS_VIOLATION).
>
>  As you can see in the log below, the starter process seems to try to launch
> Java, but this ends in an exception. The starter crashes immediately after
> that last log line. I've confirmed that Java exists at the specified location.
>
>  I assume this might be some sort of Windows security issue, but I'm not
> sure how to debug it. The Condor VM user was given execute rights on the
> Java directory, though I'm not sure whether this is enough.
>
>  Any help or tips for debugging are most welcome.
>
>  -james
>
>
>  Start Log
>  -------------
>
>  7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: setting sock->decode()
>  7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: allowing an empty message
> for sock.
>  7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: Success.
>  7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: Command received via UDP from
> host < 172.19.189.3:9629>
>  7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: received command 60011
> (DC_NOP), calling handler (handle_nop())
>  7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:52 (fd:3) (pid:3636) Calling Handler
> <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
>  7/25 16:04:52 (fd:3) (pid:3636) KEYCACHEX: removing session
> hp-test-02:3636:1185343491:6 for <172.19.189.3:9618 >
>  7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: pid 3940 exited with status
> -1073741819, invoking reaper 1 <reaper>
>  7/25 16:04:52 (fd:3) (pid:3636) Starter pid 3940 died on signal -1073741819
> (exception ACCESS_VIOLATION)
>  7/25 16:04:52 (fd:3) (pid:3636) Entering ProcFamily::hardkill
>  7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_c++_util\killfamily.C:274
>  7/25 16:04:52 (fd:3) (pid:3636) Destroying Daemon object:
>  7/25 16:04:52 (fd:3) (pid:3636) Type: 1 (any), Name: (null), Addr: <
> 172.19.189.3:9611>
>  7/25 16:04:52 (fd:3) (pid:3636) FullHost: (null), Host: (null), Pool:
> (null), Port: -1
>  7/25 16:04:52 (fd:3) (pid:3636) IsLocal: N, IdStr: (null), Error: (null)
>  7/25 16:04:52 (fd:3) (pid:3636)  --- End of Daemon object info ---
>  7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found
> (OpenProcess err=1308)
>  7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found
> (OpenProcess err=1308)
>  7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: parent: 3940 family:
>  7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: alive_cpu_user = 0, exited_cpu
> = 0, max_image = 3624k
>  7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_c++_util\killfamily.C:475
>  7/25 16:04:52 (fd:3) (pid:3636) Attempting to remove
> C:\condor\execute\dir_3940 as SuperUser (system)
>  7/25 16:04:52 (fd:3) (pid:3636) Deleted ProcFamily w/ pid 3940 as parent
>  7/25 16:04:52 (fd:3) (pid:3636) State change: starter exited
>  7/25 16:04:52 (fd:3) (pid:3636) Changing activity: Busy -> Idle
>  7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:52 (fd:3) (pid:3636) In cancel_timer(), id=66
>  7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:52 (fd:3) (pid:3636) In DaemonCore Timeout()
>  7/25 16:04:52 (fd:3) (pid:3636)
>
>  Starter Log
>  ----------------
>  7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: setting sock->decode()
>  7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: allowing an empty message
> for sock.
>  7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: Success.
>  7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: Command received via UDP from
> host < 172.19.189.3:9614>
>  7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: received command 60011
> (DC_NOP), calling handler (handle_nop())
>  7/25 16:04:51 (fd:8) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:51 (fd:8) (pid:3940) Calling Handler
> <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
>  7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: tid 3300 exited with status 1,
> invoking reaper 2 <FileTransfer::Reaper()>
>  7/25 16:04:51 (fd:8) (pid:3940) File transfer completed successfully.
>  7/25 16:04:51 (fd:6) (pid:3940) Destroying Daemon object:
>  7/25 16:04:51 (fd:6) (pid:3940) Type: 1 (any), Name: (null), Addr:
> <172.19.189.3:9618>
>  7/25 16:04:51 (fd:6) (pid:3940) FullHost: (null), Host: (null), Pool:
> (null), Port: -1
>  7/25 16:04:51 (fd:6) (pid:3940) IsLocal: N, IdStr: (null), Error: (null)
>  7/25 16:04:51 (fd:6) (pid:3940)  --- End of Daemon object info ---
>  7/25 16:04:52 (fd:6) (pid:3940) Calling client FileTransfer handler
> function.
>  7/25 16:04:52 (fd:6) (pid:3940) in DaemonCore NewTimer()
>  7/25 16:04:52 (fd:6) (pid:3940)
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492,
> period = 0, handler_descrip=<deferred job start>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551,
> period = 0, handler_descrip=<dc_touch_log_file>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731,
> period = 240, handler_descrip=<self_monitor>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791,
> period = 300, handler_descrip=<check_session_cache>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661,
> period = 1170,
> handler_descrip=<DaemonCore::SendAliveToParent>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292,
> period = 1801, handler_descrip=<handle_cookie_refresh>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290,
> period = 0, handler_descrip=<DaemonCore::ReInit()>
>  7/25 16:04:52 (fd:6) (pid:3940)
>  7/25 16:04:52 (fd:6) (pid:3940) leaving DaemonCore NewTimer, id=7
>  7/25 16:04:52 (fd:6) (pid:3940) Job 71.0 set to execute immediately
>  7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at
> ..\src\condor_daemon_core.V6\daemon_core.C:2743
>  7/25 16:04:52 (fd:6) (pid:3940) In DaemonCore Timeout()
>  7/25 16:04:52 (fd:6) (pid:3940)
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492,
> period = 0, handler_descrip=<deferred job start>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551,
> period = 0, handler_descrip=<dc_touch_log_file>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731,
> period = 240, handler_descrip=<self_monitor>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791,
> period = 300, handler_descrip=<check_session_cache>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661,
> period = 1170,
> handler_descrip=<DaemonCore::SendAliveToParent>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292,
> period = 1801, handler_descrip=<handle_cookie_refresh>
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290,
> period = 0, handler_descrip=<DaemonCore::ReInit()>
>  7/25 16:04:52 (fd:6) (pid:3940)
>  7/25 16:04:52 (fd:6) (pid:3940) DaemonCore: Calling handler for Timer 7
> (deferred job start)
>  7/25 16:04:52 (fd:6) (pid:3940) Starting a JAVA universe job with ID: 71.0
>  7/25 16:04:52 (fd:6) (pid:3940) In OsProc::OsProc()
>  7/25 16:04:52 (fd:6) (pid:3940) Main job KillSignal: 15 (Unknown)
>  7/25 16:04:52 (fd:6) (pid:3940) Main job RmKillSignal: 15 (Unknown)
>  7/25 16:04:52 (fd:6) (pid:3940) Main job HoldKillSignal: 15 (Unknown)
>  7/25 16:04:52 (fd:6) (pid:3940) SYSAPI_GET_LOADAVG is undefined, using
> default value of True
>  7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Cmd="C:\\Program
> Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
>  7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Args=-Xmx247m -classpath
> C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> -Dchirp.config=C:\condor\execute\dir_3940\chirp.config
> CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start
> C:\condor\execute\dir_3940\jvm.end JavaTest
>  7/25 16:04:52 (fd:6) (pid:3940) in VanillaProc::StartJob()
>  7/25 16:04:52 (fd:6) (pid:3940) in OsProc::StartJob()
>  7/25 16:04:52 (fd:6) (pid:3940) IWD: C:\condor/execute\dir_3940
>  7/25 16:04:52 (fd:6) (pid:3940) get_port_range - (LOWPORT,HIGHPORT) is
> (9600,9700).
>  7/25 16:04:52 (fd:6) (pid:3940) TokenCache contents:
>  condor-reuse-vm1@.
>  7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_USER at
> ..\src\condor_starter.V6.1\os_proc.C:227
>  7/25 16:04:52 (fd:7) (pid:3940) Input file: NUL
>  7/25 16:04:52 (fd:8) (pid:3940) Output file:
> C:\condor/execute\dir_3940\JavaTest.output.0
>  7/25 16:04:52 (fd:9) (pid:3940) Error file:
> C:\condor/execute\dir_3940\JavaTest.error.0
>  7/25 16:04:52 (fd:9) (pid:3940) Doing CONDOR_begin_execution
>  7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
>  7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
>  7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
>  7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
>  7/25 16:04:52 (fd:9) (pid:3940) Renice expr "10" evaluated to 10
>  7/25 16:04:52 (fd:9) (pid:3940) About to exec
> C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
> -Xmx247m -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;.
> -Dchirp.config=C:\condor\execute\dir_3940\chirp.config
> CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start
> C:\condor\execute\dir_3940\jvm.end JavaTest
>  7/25 16:04:52 (fd:9) (pid:3940) Env =
> _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3940
> _CONDOR_HIGHPORT=9700 _CONDOR_LOWPORT=9600
>  7/25 16:04:52 (fd:9) (pid:3940)
> JOB_INHERITS_STARTER_ENVIRONMENT is undefined, using
> default value of False
>  7/25 16:04:52 (fd:9) (pid:3940) PRIV_USER --> PRIV_CONDOR at
> ..\src\condor_starter.V6.1\os_proc.C:343
>  7/25 16:04:52 (fd:9) (pid:3940) In
> DaemonCore::Create_Process(C:\condor/execute\dir_3940\"C:\\Program
> Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE",...)
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
>