
Re: [Condor-users] Jobs stay Idle ... been looking for 24 hours....



I got it. Thanks, Nick.

I had to add "#!/bin/csh -f" as the first line of the script.
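
For anyone who hits the same "Exec format error": the kernel refuses to exec a file that is neither a recognized binary format nor a script beginning with "#!". An interactive shell usually hides this by falling back to running the file itself, which is presumably why the script worked when launched by hand but not under the starter. Below is a minimal Python sketch of a pre-submit check. The helper names (classify_executable, write_script) and the temporary files are illustrative only and are not part of Condor.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of a Linux ELF binary


def classify_executable(path):
    """Return 'elf', 'script', or 'unknown' from the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(ELF_MAGIC):
        return "elf"
    if head.startswith(b"#!"):
        return "script"
    # Neither a binary nor an interpreter line: a direct exec of this
    # file is expected to fail with "Exec format error".
    return "unknown"


def write_script(body, shebang=None):
    """Write a throwaway executable script (hypothetical helper for the demo)."""
    fd, path = tempfile.mkstemp(suffix=".csh")
    with os.fdopen(fd, "w") as f:
        if shebang:
            f.write(shebang + "\n")
        f.write(body)
    os.chmod(path, 0o755)
    return path


body = "echo hello\n"
without_shebang = write_script(body)
with_shebang = write_script(body, "#!/bin/csh -f")

result_without = classify_executable(without_shebang)
result_with = classify_executable(with_shebang)
print(result_without)  # unknown
print(result_with)     # script

# Clean up the temporary files.
os.remove(without_shebang)
os.remove(with_shebang)
```

Running a check like this on the executable before condor_submit catches the missing interpreter line without waiting for the job to go on hold.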

On 4/26/07, Askar Zaidi <askar.zaidi@xxxxxxxxx> wrote:
Yes! Specifying Arch and OpSys helped. But now the job runs and immediately goes into the "held" state.
----------------------------------------------------------------------------------------------------------------------------------
Job Log:
000 (012.000.000) 04/26 15:20:28 Job submitted from host: <128.226.128.31:39183>
...
001 (012.000.000) 04/26 15:20:32 Job executing on host: <128.226.128.46:40854>
...
007 (012.000.000) 04/26 15:20:33 Shadow exception!
        Error from starter on vm1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to execute '/home/condor/leftouts': Exec format error
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
012 (012.000.000) 04/26 15:20:33 Job was held.
        Error from starter on vm1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to execute '/home/condor/leftouts': Exec format error
        Code 6 Subcode 8
-------------------------------------------------------------------------------------------------------------------------------------
I know I am so close to getting this all running ... well, at least I hope so ;-)

Thanks !!
Askar


On 4/26/07, Nick LeRoy <nleroy@xxxxxxxxxxx> wrote:
On Thu April 26 2007 1:46 pm, Askar Zaidi wrote:
> Hi,
Hello,

> My jobs stay idle forever...
Yup.  ;)  See below.


> vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.000   378  0+00:10:09
<snip>
> vm3@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.000   504  0+00:10:11
> vm1@xxxxxxxxx LINUX       X86_64 Owner      Idle       0.890   250
<snip>
> vm4@clouseau. LINUX       X86_64 Unclaimed  Idle       0.000   250  0+00:10:07
> vm1@dogmatix. LINUX       X86_64 Owner      Idle       0.110   501  0+00:10:10
<snip>
>
>                      Total Owner Claimed Unclaimed Matched Preempting Backfill
>
>          INTEL/LINUX     8     8       0         0       0          0        0
>         X86_64/LINUX    16     9       0         7       0          0        0
>
>                Total    24    17       0         7       0          0        0


> ( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&

If you look at all of the above, you'll see:

1. Your job's requirements specify 'Arch == "INTEL"'
2. While there are indeed some INTEL/LINUX machines in the pool, they are all
in the "Owner" state, and thus won't run jobs.  The others are all "X86_64",
which (as far as Condor is concerned) is different from "INTEL".

So, what should you do?

1. Look at the START expression on the 32-bit machines and see why they're all
in the Owner state.
2. Allow your job to run on "INTEL" and "X86_64" machines (after all, these
are "INTEL" binaries, and should run on either IA32 or X86_64, right?).  You
can do this by adding something like this to your requirements expression:

Requirements = ( (OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")) )

This will allow Condor to match your job to machines whose Arch attribute
is "INTEL" or "X86_64".
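
To make the effect of the wider expression concrete, here is a rough Python sketch of the filtering involved. It is only an illustration and not how Condor's matchmaker is implemented: real matchmaking also evaluates each machine's own START and Requirements expressions, rank, and priorities. The machine names are made up, and the "only Unclaimed machines are candidates" rule is a simplification of the Owner-state behaviour described above.

```python
# Hypothetical machine ads, loosely modelled on the condor_status output above.
machines = [
    {"Name": "vm1@example-a", "OpSys": "LINUX", "Arch": "INTEL", "State": "Owner"},
    {"Name": "vm3@example-a", "OpSys": "LINUX", "Arch": "INTEL", "State": "Owner"},
    {"Name": "vm1@example-b", "OpSys": "LINUX", "Arch": "X86_64", "State": "Owner"},
    {"Name": "vm4@example-c", "OpSys": "LINUX", "Arch": "X86_64", "State": "Unclaimed"},
]


def original_requirements(machine):
    """The job's original expression: Arch == "INTEL" && OpSys == "LINUX"."""
    return machine["Arch"] == "INTEL" and machine["OpSys"] == "LINUX"


def widened_requirements(machine):
    """The suggested expression: LINUX and (INTEL or X86_64)."""
    return machine["OpSys"] == "LINUX" and machine["Arch"] in ("INTEL", "X86_64")


def candidates(machine_ads, requirements):
    """Names of machines that are available and satisfy the job's requirements."""
    return [
        m["Name"]
        for m in machine_ads
        if m["State"] == "Unclaimed" and requirements(m)
    ]


before = candidates(machines, original_requirements)
after = candidates(machines, widened_requirements)
print(before)  # []
print(after)   # ['vm4@example-c']
```

With the original expression every INTEL machine is in the Owner state, so nothing is left to match; the widened expression lets the Unclaimed X86_64 machine through.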

Hope this helps

-Nick

--
           <<< There is no spoon. >>>
/`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy   http://www.cs.wisc.edu/condor
\    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
|_*_|   608-265-5761                    Department of Computer Sciences