Re: [Condor-users] ACCESS_VIOLATION under Windows



Wojtek Goscinski wrote:
It looks like this might be a user privileges issue. I've just
discovered some entries in the Windows Event Viewer, including "Unknown
username or bad password" and "The user has not been granted the
requested logon type at this machine", both for condor-reuse-vm1.


Maybe something with your local security policy?

In both cases the user exists, and I've given that user the "log
on locally" and "log on as a batch job" rights through Local Security
Settings.


Condor should give the condor-reuse-vm1 account all the required privileges when it first creates the account (assuming the service is started as local system). I wonder if just deleting the account and letting Condor re-create it would help?

Just to confirm, does the condor_starter run as a service or does it
run as the condor-reuse-vm1 user?


The condor_starter process itself will run as LocalSystem (aka a service), but it launches the job as condor-reuse-vm1, or whatever other user is specified in the config file, or as the submitting user if that is requested in the config file.

-Todd





Regards,

James

On 8/8/07, Wojtek Goscinski <Wojtek.Goscinski@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Ben, just to provide you with some additional information about things
I've tried...

I have also tried the latest development release.

In addition, I have tried turning off DEP, which is sometimes
mentioned in discussions of the ACCESS_VIOLATION exception:
https://lists.cs.wisc.edu/archive/condor-users/2007-May/msg00103.shtml

Neither of these solved the issue.

Do you have any other suggestions?


Regards,

James


On 8/3/07, Wojtek Goscinski <Wojtek.Goscinski@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Ben,

Please find attached the logs and configs for the box in question.
Regards,

james


On 8/2/07, Ben Burnett <burnett@xxxxxxxxxxx> wrote:
James:

That's strange; however, you have set the configuration correctly, so
you're not missing anything. It sounds as if the core files simply haven't
been created. Could you turn your debugging level up (STARTER_DEBUG = D_ALL),
re-run the job, and repost the resulting logs in full?

-B

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Wojtek Goscinski
Sent: Wednesday, August 01, 2007 12:08 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] ACCESS_VIOLATION under Windows

Hi Ben,

SMTP is currently unavailable from that machine because of a firewall
issue, which I'm getting fixed.
I set CREATE_CORE_FILES = true, which I assume should give me a core
file in the log directory. However, I do not get a core file in
either the machine's log directory or the directory I submitted the
Java job from.

Am I missing something? Do I have to set something else for core files
to be dumped to the log directory, or is it possible that a core file is
not created?

Regards,

James


On 7/31/07, Ben Burnett <burnett@xxxxxxxxxxx> wrote:



Hi James:

I wonder if you could post the core file from the execute node's
starter; it should have been emailed to your admin email address after the crash.



-B




From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Wojtek Goscinski
Sent: Sunday, July 29, 2007 8:19 PM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] ACCESS_VIOLATION under Windows

Hi All,

I'm experiencing a problem setting up a Windows box as a Condor execute
node, specifically to execute Java jobs.

I have a Windows box running XP SP2. It is set up purely as an execute
node. The start daemon successfully picks up the job and attempts to
execute it. It spawns the condor_starter, but the condor_starter seems to
crash with an exception (an ACCESS_VIOLATION).

As you can see in the logs below, the starter process tries to launch
java, but this seems to end in an exception. The starter crashes
immediately after the last line of its log. I've confirmed that java
exists at the location specified, etc.

I assume this might be some sort of Windows security issue, but I'm not
sure how to debug it. The Condor VM user was given rights to execute the
java directory, though I'm not sure whether this is enough.

Any help or tips for debugging are most welcome.

-james


Start Log
---------

7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: setting sock->decode()
7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: allowing an empty message for sock.
7/25 16:04:52 (fd:3) (pid:3636) DC_AUTHENTICATE: Success.
7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: Command received via UDP from host <172.19.189.3:9629>
7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:52 (fd:3) (pid:3636) Calling Handler <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
7/25 16:04:52 (fd:3) (pid:3636) KEYCACHEX: removing session hp-test-02:3636:1185343491:6 for <172.19.189.3:9618>
7/25 16:04:52 (fd:3) (pid:3636) DaemonCore: pid 3940 exited with status -1073741819, invoking reaper 1 <reaper>
7/25 16:04:52 (fd:3) (pid:3636) Starter pid 3940 died on signal -1073741819 (exception ACCESS_VIOLATION)
7/25 16:04:52 (fd:3) (pid:3636) Entering ProcFamily::hardkill
7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_c++_util\killfamily.C:274
7/25 16:04:52 (fd:3) (pid:3636) Destroying Daemon object:
7/25 16:04:52 (fd:3) (pid:3636) Type: 1 (any), Name: (null), Addr: <172.19.189.3:9611>
7/25 16:04:52 (fd:3) (pid:3636) FullHost: (null), Host: (null), Pool: (null), Port: -1
7/25 16:04:52 (fd:3) (pid:3636) IsLocal: N, IdStr: (null), Error: (null)
7/25 16:04:52 (fd:3) (pid:3636)  --- End of Daemon object info ---
7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found (OpenProcess err=1308)
7/25 16:04:52 (fd:3) (pid:3636) ProcAPI: pid # 3940 was not found (OpenProcess err=1308)
7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: parent: 3940 family:
7/25 16:04:52 (fd:3) (pid:3636) ProcFamily: alive_cpu_user = 0, exited_cpu = 0, max_image = 3624k
7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_c++_util\killfamily.C:475
7/25 16:04:52 (fd:3) (pid:3636) Attempting to remove C:\condor\execute\dir_3940 as SuperUser (system)
7/25 16:04:52 (fd:3) (pid:3636) Deleted ProcFamily w/ pid 3940 as parent
7/25 16:04:52 (fd:3) (pid:3636) State change: starter exited
7/25 16:04:52 (fd:3) (pid:3636) Changing activity: Busy -> Idle
7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:52 (fd:3) (pid:3636) In cancel_timer(), id=66
7/25 16:04:52 (fd:3) (pid:3636) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:52 (fd:3) (pid:3636) In DaemonCore Timeout()
7/25 16:04:52 (fd:3) (pid:3636)

Starter Log
-----------
7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: setting sock->decode()
7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: allowing an empty message for sock.
7/25 16:04:51 (fd:8) (pid:3940) DC_AUTHENTICATE: Success.
7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: Command received via UDP from host <172.19.189.3:9614>
7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
7/25 16:04:51 (fd:8) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:51 (fd:8) (pid:3940) Calling Handler <HandleDC_SERVICEWAITPIDS()> for Signal 60009 <DC_SERVICEWAITPIDS>
7/25 16:04:51 (fd:8) (pid:3940) DaemonCore: tid 3300 exited with status 1, invoking reaper 2 <FileTransfer::Reaper()>
7/25 16:04:51 (fd:8) (pid:3940) File transfer completed successfully.
7/25 16:04:51 (fd:6) (pid:3940) Destroying Daemon object:
7/25 16:04:51 (fd:6) (pid:3940) Type: 1 (any), Name: (null), Addr: <172.19.189.3:9618>
7/25 16:04:51 (fd:6) (pid:3940) FullHost: (null), Host: (null), Pool: (null), Port: -1
7/25 16:04:51 (fd:6) (pid:3940) IsLocal: N, IdStr: (null), Error: (null)
7/25 16:04:51 (fd:6) (pid:3940)  --- End of Daemon object info ---
7/25 16:04:52 (fd:6) (pid:3940) Calling client FileTransfer handler function.
7/25 16:04:52 (fd:6) (pid:3940) in DaemonCore NewTimer()
7/25 16:04:52 (fd:6) (pid:3940)
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492, period = 0, handler_descrip=<deferred job start>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551, period = 0, handler_descrip=<dc_touch_log_file>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731, period = 240, handler_descrip=<self_monitor>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791, period = 300, handler_descrip=<check_session_cache>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661, period = 1170, handler_descrip=<DaemonCore::SendAliveToParent>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292, period = 1801, handler_descrip=<handle_cookie_refresh>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290, period = 0, handler_descrip=<DaemonCore::ReInit()>
7/25 16:04:52 (fd:6) (pid:3940)
7/25 16:04:52 (fd:6) (pid:3940) leaving DaemonCore NewTimer, id=7
7/25 16:04:52 (fd:6) (pid:3940) Job 71.0 set to execute immediately
7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_CONDOR at ..\src\condor_daemon_core.V6\daemon_core.C:2743
7/25 16:04:52 (fd:6) (pid:3940) In DaemonCore Timeout()
7/25 16:04:52 (fd:6) (pid:3940)
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> Timers
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> ~~~~~~
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 7, when = 1185343492, period = 0, handler_descrip=<deferred job start>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 6, when = 1185343551, period = 0, handler_descrip=<dc_touch_log_file>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 3, when = 1185343731, period = 240, handler_descrip=<self_monitor>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 1, when = 1185343791, period = 300, handler_descrip=<check_session_cache>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 5, when = 1185344661, period = 1170, handler_descrip=<DaemonCore::SendAliveToParent>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 2, when = 1185345292, period = 1801, handler_descrip=<handle_cookie_refresh>
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore--> id = 4, when = 1185372290, period = 0, handler_descrip=<DaemonCore::ReInit()>
7/25 16:04:52 (fd:6) (pid:3940)
7/25 16:04:52 (fd:6) (pid:3940) DaemonCore: Calling handler for Timer 7 (deferred job start)
7/25 16:04:52 (fd:6) (pid:3940) Starting a JAVA universe job with ID: 71.0
7/25 16:04:52 (fd:6) (pid:3940) In OsProc::OsProc()
7/25 16:04:52 (fd:6) (pid:3940) Main job KillSignal: 15 (Unknown)
7/25 16:04:52 (fd:6) (pid:3940) Main job RmKillSignal: 15 (Unknown)
7/25 16:04:52 (fd:6) (pid:3940) Main job HoldKillSignal: 15 (Unknown)
7/25 16:04:52 (fd:6) (pid:3940) SYSAPI_GET_LOADAVG is undefined, using default value of True
7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Cmd="C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE"
7/25 16:04:52 (fd:6) (pid:3940) JavaProc: Args=-Xmx247m -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;. -Dchirp.config=C:\condor\execute\dir_3940\chirp.config CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start C:\condor\execute\dir_3940\jvm.end JavaTest
7/25 16:04:52 (fd:6) (pid:3940) in VanillaProc::StartJob()
7/25 16:04:52 (fd:6) (pid:3940) in OsProc::StartJob()
7/25 16:04:52 (fd:6) (pid:3940) IWD: C:\condor/execute\dir_3940
7/25 16:04:52 (fd:6) (pid:3940) get_port_range - (LOWPORT,HIGHPORT) is (9600,9700).
7/25 16:04:52 (fd:6) (pid:3940) TokenCache contents:
 condor-reuse-vm1@.
7/25 16:04:52 (fd:6) (pid:3940) PRIV_CONDOR --> PRIV_USER at ..\src\condor_starter.V6.1\os_proc.C:227
7/25 16:04:52 (fd:7) (pid:3940) Input file: NUL
7/25 16:04:52 (fd:8) (pid:3940) Output file: C:\condor/execute\dir_3940\JavaTest.output.0
7/25 16:04:52 (fd:9) (pid:3940) Error file: C:\condor/execute\dir_3940\JavaTest.error.0
7/25 16:04:52 (fd:9) (pid:3940) Doing CONDOR_begin_execution
7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfds=0
7/25 16:04:52 (fd:9) (pid:3940) condor_read(): nfound=1
7/25 16:04:52 (fd:9) (pid:3940) Renice expr "10" evaluated to 10
7/25 16:04:52 (fd:9) (pid:3940) About to exec C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE" -Xmx247m -classpath C:\condor/lib;C:\condor/lib/scimark2lib.jar;. -Dchirp.config=C:\condor\execute\dir_3940\chirp.config CondorJavaWrapper C:\condor\execute\dir_3940\jvm.start C:\condor\execute\dir_3940\jvm.end JavaTest
7/25 16:04:52 (fd:9) (pid:3940) Env = _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3940 _CONDOR_HIGHPORT=9700 _CONDOR_LOWPORT=9600
7/25 16:04:52 (fd:9) (pid:3940) JOB_INHERITS_STARTER_ENVIRONMENT is undefined, using default value of False
7/25 16:04:52 (fd:9) (pid:3940) PRIV_USER --> PRIV_CONDOR at ..\src\condor_starter.V6.1\os_proc.C:343
7/25 16:04:52 (fd:9) (pid:3940) In DaemonCore::Create_Process(C:\condor/execute\dir_3940\"C:\\Program Files\\Java\\jre1.5.0_06\\bin\\JAVA.EXE",...)

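For reference, the Start Log reports the starter's exit status as the signed decimal -1073741819 and labels it ACCESS_VIOLATION. The short Python sketch below is a hypothetical helper (not part of Condor) showing how that decimal value maps to the name: read as an unsigned 32-bit Windows NTSTATUS code it is 0xC0000005, STATUS_ACCESS_VIOLATION, and its top two bits (binary 11) mark it as an error-severity status. It assumes the logged value is a 32-bit NTSTATUS printed as a signed integer, and its lookup table deliberately holds only the one code seen in this thread.

```python
"""Hypothetical helper: decode a Windows exit status as printed in Condor logs."""

# Only the code seen in this thread. Assumption: the logged value is a
# 32-bit NTSTATUS code printed as a signed integer.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

# NTSTATUS severity lives in the top two bits of the 32-bit code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def to_ntstatus(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned NTSTATUS code."""
    return status & 0xFFFFFFFF


def describe(status: int) -> str:
    """Return the signed value, its hex NTSTATUS form, name and severity."""
    code = to_ntstatus(status)
    name = KNOWN_NTSTATUS.get(code, "unknown code")
    severity = SEVERITY[code >> 30]
    return f"{status} -> 0x{code:08X} ({name}, severity={severity})"


if __name__ == "__main__":
    # Exit status of starter pid 3940, copied from the Start Log.
    print(describe(-1073741819))
```

Running the script prints: -1073741819 -> 0xC0000005 (STATUS_ACCESS_VIOLATION, severity=error). The same masking could be applied to other negative exit statuses in these logs, with the table extended accordingly.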




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to
condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

