
Re: [condor-users] Condor starter crashes in NTsenders



Colin,

Unfortunately, I can't.  I saved the client's logs at the time, but not the log files on the Central Manager, and those have long since rolled over.

Since then, I've changed the network connection that grid node was using from a wireless card to a direct line into the LAN.  The starter is still exiting continuously, but not with that particular error.

For the current problem, I checked ShadowLog on the Central Manager, and found this:
3/10 14:09:56 (fd:5) (1820.0) (2940): DoUpload: Permission denied to read file C:\Condor/spool\cluster1820.proc0.subproc0\azetidine_t2.out.0!
3/10 14:09:56 (fd:5) (1820.0) (2940): DoUpload: exiting at 1154

The file in question is one of the intermediate files created by the job as a manual checkpoint.
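
Since the ShadowLog complains about read permission, one quick sanity check is to try opening the file directly. Below is a minimal sketch in Python; the helper name can_read is my own and is not part of Condor. It only tests access for whichever account runs the script, which may not be the account the Condor daemons run under, so a success here does not rule out the permission problem.

```python
def can_read(path):
    """Try to open `path` for reading.

    Returns (True, None) on success, or (False, reason) if the open fails.
    Hypothetical helper for illustration; it checks access only for the
    user running this script, not for the Condor service account.
    """
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just an open
        return True, None
    except OSError as exc:
        # PermissionError and FileNotFoundError are both subclasses of OSError
        return False, f"{type(exc).__name__}: {exc}"
```

Calling can_read on the spool file named in the ShadowLog line, from the machine that holds the spool directory, would show whether the file is at least readable for the logged-in user.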

StarterLog on the grid node at this time period reads:
3/10 14:09:56 (fd:5) DaemonCore: Command received via UDP from host <192.168.33.165:3208>
3/10 14:09:56 (fd:5) DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
3/10 14:09:56 (fd:5) DaemonCore: tid 1280 exited with status 0, invoking reaper 2 <FileTransfer::Reaper()>
3/10 14:09:56 (fd:5) File transfer failed (status=0).
3/10 14:09:56 (fd:3) Calling client FileTransfer handler function.
3/10 14:09:56 (fd:3) ERROR "Failed to transfer files" at line 577 in file ..\src\condor_starter.V6.1\starter_class.C
3/10 14:09:56 (fd:3) ShutdownFast all jobs.
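
For anyone who wants to pull the ERROR lines out of a long StarterLog instead of scrolling for them, here is a rough Python sketch. The regular expression is only inferred from the log lines quoted in this thread, and find_errors is a hypothetical helper, not a Condor tool; other Condor versions may format these lines differently.

```python
import re

# Pattern inferred from the lines quoted in this thread (an assumption):
#   <date> <time> (fd:N) ERROR "<message>" at line <n> in file <source>
ERROR_RE = re.compile(
    r'^(?P<stamp>\d+/\d+ \d+:\d+:\d+) \(fd:\d+\) '
    r'ERROR "(?P<message>.*)" at line (?P<line>\d+) in file (?P<source>.+)$'
)

def find_errors(log_lines):
    """Return (timestamp, message, line number, source file) for each ERROR line."""
    hits = []
    for raw in log_lines:
        match = ERROR_RE.match(raw.strip())
        if match:
            hits.append((
                match.group("stamp"),
                match.group("message"),
                int(match.group("line")),
                match.group("source"),
            ))
    return hits

# Two lines copied from the StarterLog excerpt above
sample = [
    r'3/10 14:09:56 (fd:5) File transfer failed (status=0).',
    r'3/10 14:09:56 (fd:3) ERROR "Failed to transfer files" at line 577 '
    r'in file ..\src\condor_starter.V6.1\starter_class.C',
]

for stamp, message, line, source in find_errors(sample):
    print(stamp, message, line, source)
```

To run it against a real log, pass the open file object instead of sample, for example find_errors(open(path)) with path pointing at the StarterLog.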

Does this help?
-David


-----Original Message-----
From: Colin Stolley [mailto:stolley@xxxxxxxxxxx]
Sent: Tuesday, March 09, 2004 6:14 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [condor-users] Condor starter crashes in NTsenders


>then immediately crashes.  The StarterLog on the run machine contains this 
>to explain the crashes:
>
>3/4 16:41:16 (fd:3) In CStarter::StartJob()
>3/4 16:41:16 (fd:3) Doing CONDOR_get_job_info
>3/4 16:41:16 (fd:3) ERROR "Assertion ERROR on (result)" at line 148 in 
>file ..\src\condor_starter.V6.1\NTsenders.C
>3/4 16:41:16 (fd:3) ShutdownFast all jobs.

Can you post a snippet of the corresponding ShadowLog when this happens?

thanks,
Colin
Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>
