
Re: [Condor-users] ProcAPI sanity failure, age = -98161996



This problem seems to have been related to job privileges. 

I followed the "Job Privileges" section of:
http://condor.optena.com/display/CONDOR/Common+Windows+Problems

I then set Condor to run as me (a privileged user) on each machine. My "hello-world" job seems to work fine now.
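
For anyone searching the archives: the telling line in the StarterLog quoted below is "CreateProcess failed, errno=5". Assuming the number printed there is the raw Win32 GetLastError() code (I have not checked the starter source), 5 is ERROR_ACCESS_DENIED, which fits the privileges explanation. Here is a rough Python sketch for pulling such failures out of a log. The function name and the small code table are my own, not anything shipped with Condor.

```python
import re

# Small hand-picked subset of Win32 system error codes (from winerror.h).
# Assumption: the "errno" Condor logs here is the GetLastError() value.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    193: "ERROR_BAD_EXE_FORMAT",
    1314: "ERROR_PRIVILEGE_NOT_HELD",
}

# Matches the message format seen in the StarterLog excerpt in this thread.
PATTERN = re.compile(r"CreateProcess failed, errno=(\d+)")


def explain_create_process_failures(log_text):
    """Return a (code, name) pair for each CreateProcess failure in the log."""
    results = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group(1))
        results.append((code, WIN32_ERRORS.get(code, "unknown")))
    return results


sample = "3/13 10:57:07 Create_Process: CreateProcess failed, errno=5"
print(explain_create_process_failures(sample))
```

Codes missing from the table come back as "unknown", so look those up in the Win32 error code list rather than trusting this table.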

Thanks!
Matt

 

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
> Sent: Monday, March 13, 2006 12:56 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] ProcAPI sanity failure, age = -98161996
> 
> Here is the corresponding StarterLog.vm2 on ORCLUS01.na.sas.com.
> 
> Thanks,
> Matt
> 
> 
> 3/13 10:57:06 ******************************************************
> 3/13 10:57:06 ** condor_starter (CONDOR_STARTER) STARTING UP
> 3/13 10:57:06 ** C:\condor\bin\condor_starter.exe
> 3/13 10:57:06 ** $CondorVersion: 6.7.17 Feb 18 2006 $
> 3/13 10:57:06 ** $CondorPlatform: INTEL-WINNT50 $
> 3/13 10:57:06 ** PID = 3660
> 3/13 10:57:06 ******************************************************
> 3/13 10:57:06 Using config file: C:\condor\condor_config
> 3/13 10:57:06 Using local config files: C:\condor/condor_config.local
> 3/13 10:57:06 DaemonCore: Command Socket at <10.40.12.183:4696>
> 3/13 10:57:06 SEC_DEFAULT_SESSION_DURATION is undefined, 
> using default value of 3600
> 3/13 10:57:06 Setting resource limits not implemented!
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using 
> default value of 0
> 3/13 10:57:06 Communicating with shadow <10.40.12.183:4689>
> 3/13 10:57:06 Shadow version: $CondorVersion: 6.7.17 Feb 18 2006 $
> 3/13 10:57:06 Submitting machine is "ORCLUS01.na.sas.com"
> 3/13 10:57:06 ShouldTransferFiles is "YES", transfering files
> 3/13 10:57:06 STARTER_ALLOW_RUNAS_OWNER is undefined, using 
> default value of False
> 3/13 10:57:06 init_user_ids: want user 'nobody@.', current is 
> '(null)@(null)'
> 3/13 10:57:06 Using dynamic user account.
> 3/13 10:57:06 dynuser: Re-enabling account (condor-reuse-vm2)
> 3/13 10:57:06 dynuser::createuser(condor-reuse-vm2) successful
> 3/13 10:57:06 perm::init() starting up for account 
> (condor-reuse-vm2) domain (NULL)
> 3/13 10:57:06 perm::init: Found Account Name condor-reuse-vm2
> 3/13 10:57:06 Done moving to directory "C:\condor\execute\dir_3660"
> 3/13 10:57:06 TokenCache contents: 
> condor-reuse-vm2@.
> 3/13 10:57:06 JICShadow::initIOProxy(): Job does not define 
> WantIOProxy
> 3/13 10:57:06 No StarterUserLog found in job ClassAd
> 3/13 10:57:06 Starter will not write a local UserLog
> 3/13 10:57:06 Changing the executable name
> 3/13 10:57:06 entering FileTransfer::Init
> 3/13 10:57:06 entering FileTransfer::SimpleInit
> 3/13 10:57:06 TransferIntermediate="(none)"
> 3/13 10:57:06 entering FileTransfer::DownloadFiles
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using 
> default value of 0
> 3/13 10:57:06 entering FileTransfer::Download
> 3/13 10:57:06 About to sock duplicate, old sock=6C0 new 
> sock=FFFFFFFF state=0
> 3/13 10:57:06 Socket duplicated, old sock=6C0 new sock=698 state=0
> 3/13 10:57:06 In win32_thread_start_func
> 3/13 10:57:06 entering FileTransfer::DownloadThread
> 3/13 10:57:06 entering FileTransfer::DoDownload sync=1
> 3/13 10:57:06 TokenCache contents: 
> condor-reuse-vm2@.
> 3/13 10:57:06 get_file(): going to write to filename 
> C:\condor/execute\dir_3660\condor_exec.exe
> 3/13 10:57:06 get_file: Receiving 473 bytes
> 3/13 10:57:06 get_file: wrote 473 bytes to file
> 3/13 10:57:06 ReliSock::get_file_with_permissions(): received 
> null permissions from peer, not setting
> 3/13 10:57:06 ProcAPI sanity failure, cpuusage = -0.000000
> 3/13 10:57:06 ProcAPI sanity failure, age = -98162766
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using 
> default value of 0
> 3/13 10:57:06 File transfer completed successfully.
> 3/13 10:57:07 Calling client FileTransfer handler function.
> 3/13 10:57:07 Job 13.1 set to execute immediately
> 3/13 10:57:07 DaemonCore: in SendAliveToParent()
> 3/13 10:57:07 DaemonCore: attempting to connect to 
> '<10.40.12.183:1737>'
> 3/13 10:57:07 STARTER_TIMEOUT_MULTIPLIER is undefined, using 
> default value of 0
> 3/13 10:57:07 SEC_TCP_SESSION_TIMEOUT is undefined, using 
> default value of 20
> 3/13 10:57:07 Starting a VANILLA universe job with ID: 13.1
> 3/13 10:57:07 In OsProc::OsProc()
> 3/13 10:57:07 Main job KillSignal: 15 (Unknown)
> 3/13 10:57:07 Main job RmKillSignal: 15 (Unknown)
> 3/13 10:57:07 Main job HoldKillSignal: 15 (Unknown)
> 3/13 10:57:07 in VanillaProc::StartJob()
> 3/13 10:57:07 Executable is .bat, so running 
> C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat
> 3/13 10:57:07 in OsProc::StartJob()
> 3/13 10:57:07 IWD: C:\condor/execute\dir_3660
> 3/13 10:57:07 TokenCache contents: 
> condor-reuse-vm2@.
> 3/13 10:57:07 Input file: NUL
> 3/13 10:57:07 Output file: C:\condor/execute\dir_3660\hello1.out
> 3/13 10:57:07 Error file: NUL
> 3/13 10:57:07 Renice expr "10" evaluated to 10
> 3/13 10:57:07 About to exec C:\WINDOWS\system32\cmd.exe 
> condor_exec.exe /Q /C condor_exec.bat
> 3/13 10:57:07 Env = _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3660
> 3/13 10:57:07 GetBinaryType() returned 0
> 3/13 10:57:07 TokenCache contents: 
> condor-reuse-vm2@.
> 3/13 10:57:07 Create_Process: CreateProcess failed, errno=5
> 3/13 10:57:07 ERROR 
> "Create_Process(C:\WINDOWS\system32\cmd.exe,condor_exec.exe 
> /Q /C condor_exec.bat, ...) failed" at line 373 in file 
> ..\src\condor_starter.V6.1\os_proc.C
> 3/13 10:57:07 ShutdownFast all jobs.
> 3/13 10:57:07 Got ShutdownFast when no jobs running.
> 3/13 10:57:31 NET_REMAP_ENABLE is undefined, using default 
> value of False
> 
> 
> 
> > > Here's the shadow log on the submit machine - I am not sure
> > > if that helps...
> > > 
> > 
> > What would be more useful would be StarterLog.vm2 on 
> > ORCLUS01.na.sas.com
> > 
> > > 
> > > In the MasterLog, I also keep seeing the following:
> > > "ProcAPI sanity failure, age = xxxx". This error seems serious.
> > 
> > I think we fixed this bug just this morning (the type we were using
> > didn't have enough precision, hence the bogus value) - it will be in
> > 6.7.18.
> > 
> > -Erik
> 
>
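
A note on the "ProcAPI sanity failure, age = ..." lines: Erik's explanation above is that the type in use did not have enough precision. I don't know which type or code path was involved, so the following is only an illustration of that general class of bug, not the actual Condor code. It assumes a process age measured in 100 ns ticks (the unit Windows uses for FILETIME values) being stored in a signed 32-bit integer. The helper name is made up.

```python
import ctypes

TICKS_PER_SECOND = 10_000_000  # 100 ns ticks, the FILETIME unit on Windows


def age_in_int32_ticks(age_seconds):
    """Hypothetical helper: store a tick count in a signed 32-bit integer.

    ctypes.c_int32 does no overflow checking, so values that do not fit
    wrap around the way they would in a C int.
    """
    ticks = age_seconds * TICKS_PER_SECOND
    return ctypes.c_int32(ticks).value


# A 100-second-old process still fits in 32 bits.
print(age_in_int32_ticks(100))   # 1000000000

# A 300-second-old process does not fit, and the stored age goes negative.
print(age_in_int32_ticks(300))   # -1294967296
```

A sanity check of the form "age must not be negative" would trip on the second value even though nothing is wrong with the process itself, which is consistent with the message being harmless noise until the fix in 6.7.18.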