
Re: [Condor-users] Args not found error



In fact, I submitted from a 6.7.14 schedd to a 6.7.14 startd.  The job
did, however, originate from a Condor-C schedd running 6.7.18.  The job
file I used is:

universe = grid
executable = pi-compute
arguments= 5000000
output = out.$(Process)
log = log.$(Process)

# Condor
grid_resource = $$(Resource)
queue 1
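
For what it's worth, the submit file above already uses the old-style
(unquoted) arguments syntax that Dan suggests below.  My guess, based
purely on the "Args not found in JobAd" message, is that the 6.7.14
starter is looking for the old-style Args attribute while the dump
below only contains the new-style Arguments.  A quick way to check
which form ends up in the job ClassAd is something like:

condor_q -long 1956.0 | grep -iE '^(Args|Arguments) ='

(1956.0 being the job id from the condor_q output quoted below.)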

Dan Bradley wrote:
> Ryan,
>
> There was a bug in 6.7.15 through 6.7.17 that caused an incompatibility  
> problem when submitting jobs from these versions to an older starter  
> (in your case 6.7.14).  Am I guessing correctly that you submitted the  
> job from a 6.7.15-6.7.17 schedd?
>
> Upgrading to 6.7.18 should solve the problem.  Another possible  
> workaround is to always use the "old style" arguments syntax in your  
> submit file (no quoting) and if you have no arguments at all, to  
> explicitly set arguments to an empty value in your submit file.   
> Example:
>
> arguments=
>
> --Dan
>
> On Apr 5, 2006, at 6:01 PM, Ryan Garver wrote:
>
>   
>> I'm getting a weird error when I submit a job.  The program runs fine
>> from a local console; however, when run through condor (in the vanilla
>> universe) I get a strange error:
>>
>> 4/5 15:54:08 ******************************************************
>> 4/5 15:54:08 ** condor_starter (CONDOR_STARTER) STARTING UP
>> 4/5 15:54:08 ** /home/condor/6.7.14/sbin/condor_starter
>> 4/5 15:54:08 ** $CondorVersion: 6.7.14 Dec 13 2005 $
>> 4/5 15:54:08 ** $CondorPlatform: I386-LINUX_RH9 $
>> 4/5 15:54:08 ** PID = 28678
>> 4/5 15:54:08 ******************************************************
>> 4/5 15:54:08 Using config file: /home/condor/condor_config
>> 4/5 15:54:08 Using local config files:
>> /home/condor/hosts/sei/condor_config.local
>> 4/5 15:54:08 DaemonCore: Command Socket at <128.111.45.22:45276>
>> 4/5 15:54:08 Done setting resource limits
>> 4/5 15:54:08 Communicating with shadow <128.111.45.35:51873>
>> 4/5 15:54:08 Submitting machine is "pompone.cs.ucsb.edu"
>> 4/5 15:54:08 Starting a VANILLA universe job with ID: 1956.0
>> 4/5 15:54:08 Args not found in JobAd.  Aborting OsProc::StartJob.
>> 4/5 15:54:08 Failed to start job, exiting
>> 4/5 15:54:08 ShutdownFast all jobs.
>> 4/5 15:54:08 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
>>
>> This is funny because I do have an Arguments value set in my JobAd,
>> and the binary that ends up in the spool directory runs as expected:
>>
>> $ condor_q -long
>> -- Submitter: pompone.cs.ucsb.edu : <128.111.45.35:34041> : pompone.cs.ucsb.edu
>> MyType = "Job"
>> TargetType = "Machine"
>> GlobalJobId = "pompone.cs.ucsb.edu#1144276112#1956.0"
>> RootDir = "/"
>> MinHosts = 1
>> WantRemoteSyscalls = FALSE
>> WantCheckpoint = FALSE
>> RemoteSpoolDir = "/tmp/home/rgarver/dynamic_condor/localcondor/conf.noir/spool/cluster2.proc0.subproc0"
>> JobPrio = 0
>> NiceUser = FALSE
>> WantRemoteIO = TRUE
>> CoreSize = 0
>> KillSig = "SIGTERM"
>> Rank = 0.000000
>> In = "/dev/null"
>> TransferIn = FALSE
>> Out = "out.0"
>> StreamOut = FALSE
>> Err = "/dev/null"
>> TransferErr = FALSE
>> BufferSize = 524288
>> BufferBlockSize = 32768
>> ShouldTransferFiles = "NO"
>> TransferFiles = "NEVER"
>> ImageSize = 12
>> ExecutableSize = 12
>> DiskUsage = 12
>> Requirements = TRUE
>> GlobusResubmit = FALSE
>> GlobusStatus = 32
>> NumGlobusSubmits = 0
>> JobUniverse = 5
>> QDate = 1144276072
>> CompletionDate = 0
>> LocalUserCpu = 0.000000
>> LocalSysCpu = 0.000000
>> RemoteUserCpu = 0.000000
>> RemoteSysCpu = 0.000000
>> ExitStatus = 0
>> NumCkpts = 0
>> NumRestarts = 0
>> NumSystemHolds = 0
>> CommittedTime = 0
>> TotalSuspensions = 0
>> CumulativeSuspensionTime = 0
>> ExitBySignal = FALSE
>> JobNotification = 0
>> LeaveJobInQueue = JobStatus == 4
>> User = "rgarver@xxxxxxxxxxx"
>> Owner = "rgarver"
>> PeriodicRemove = (StageInFinish > 0) =!= TRUE && CurrentTime > QDate + 28800
>> SubmitterId = "rgarver@xxxxxxxxxxxxxxxxxxx"
>> Arguments = "5000000"
>> Environment = ""
>> ClusterId = 1956
>> ProcId = 0
>> StageInStart = 1144276132
>> SUBMIT_Iwd = "/tmp/home/rgarver/condor_install/daisy/conf.pompone/spool/cluster2.proc0.subproc0"
>> Iwd = "/home/condor/hosts/pompone/spool/cluster1956.proc0.subproc0"
>> SUBMIT_Cmd = "/tmp/home/rgarver/condor_install/daisy/conf.pompone/spool/cluster2.proc0.subproc0/pi-compute"
>> Cmd = "/home/condor/hosts/pompone/spool/cluster1956.proc0.subproc0/pi-compute"
>> StageInFinish = 1144276133
>> ReleaseReason = "Data files spooled"
>> LastHoldReason = "Spooling input data files"
>> JobStartDate = 1144276138
>> PeriodicHold = FALSE
>> PeriodicRelease = FALSE
>> OnExitHold = FALSE
>> OnExitRemove = TRUE
>> WantMatchDiagnostics = TRUE
>> LastMatchTime = 1144277635
>> NumJobMatches = 7
>> OrigMaxHosts = 1
>> LastJobLeaseRenewal = 1144277648
>> JobLastStartDate = 1144277645
>> JobCurrentStartDate = 1144277648
>> JobRunCount = 30
>> RemoteWallClockTime = 14.000000
>> LastRemoteHost = "sei.cs.ucsb.edu"
>> LastClaimId = "<128.111.45.22:34762>#1140125522#550"
>> CurrentHosts = 0
>> JobStatus = 1
>> EnteredCurrentStatus = 1144277648
>> LastSuspensionTime = 0
>> MaxHosts = 1
>> ServerTime = 1144277881
>>
>> Any suggestions?
>>
>> -- 
>> Ryan Garver
>> <rgarver@xxxxxxxxxxx>
>>
>> _______________________________________________
>> Condor-users mailing list
>> Condor-users@xxxxxxxxxxx
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>     
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>   
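
For the archives: as I read Dan's suggestion, the workaround for a job
with no arguments at all is just an explicit empty old-style assignment
in the submit file.  A minimal sketch (with a made-up executable name)
would be:

universe = vanilla
executable = some-job-with-no-args
arguments =
queue 1

That case doesn't apply here, since my job does pass an argument.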


-- 
Ryan Garver
<rgarver@xxxxxxxxxxx>