
Re: [Condor-users] Condor on X86_64 no run works



Some thoughts:

1. You mention "flock". You shouldn't need this if you just have a
single pool.

2. I notice you have vm1, vm2 ... vm5 mentioned, which implies more than
   4 processors per node. You might have hyperthreading turned on, in
   which case Condor will (possibly) register 8 slots per node.

3. Have you tried
   condor_q -analyze
   or
   condor_q -better-analyze
   to see why the jobs aren't matching?

4. You do a "queue 5", but all the jobs write to the same error and
   output files, which may not be what you want. To write to different
   ones, use something like
   output = loop$(PROCESS).out
   error = loop$(PROCESS).err

5. I can't see a
   log = loop.log
   line. This is useful: have a look in there to see what is produced.
   [Note: don't use $(PROCESS) for this one.]

6. Have a look in the SchedLog on your submit node to see what is in
   there.

7. Are these nodes on a cluster, i.e. on a private network? If so, you
   will need full connectivity between all submit nodes and all execute
   nodes. See the paper and presentation at
   http://epubs.cclrc.ac.uk/work-details?w=34452
   for more details.
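
Putting points 4 and 5 together, your submit file might look something
like the sketch below. It is untested and assumes the rest of the file
stays as you posted it; the loop.log name is only a suggestion.

```
# sketch only: per-job output/error files plus one shared log
universe     = vanilla
executable   = loop
output       = loop$(PROCESS).out
error        = loop$(PROCESS).err
log          = loop.log
requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
               (Arch == "X86_64" && OpSys == "LINUX")
queue 5
```

With "queue 5", $(PROCESS) takes the values 0 to 4, so you should end up
with loop0.out ... loop4.out and loop0.err ... loop4.err.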

Good luck

JK

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of jmferrer
> Sent: Monday, November 05, 2007 12:34 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Condor on X86_64 no run works
> 
> Hi.
> 
> I'm trying to build a cluster with:
> 
>     OpenSuse 10.2
>     Condor-6.8.6
>     Kernel suse 2.6.18.2-34-default
> 
> 
> System:
> 
> 1 Central Manager  1cpu x P4 ----------> no execute and yes flock
> 19 nodes, each with 2 quad-core Intel X86_64 CPUs
> 
> I share /home from the Central Manager (to all nodes, via NFS)
> 
> If I run condor_status
> 
> gargamel:/home/condor # condor_status
> 
> Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
> 
> vm1@smurf0 LINUX       X86_64 Owner      Idle       0.000   996  0+00:06:45
> vm2@smurf0 LINUX       X86_64 Unclaimed  Idle       0.000   996  4+23:45:04
> vm3@smurf0 LINUX       X86_64 Unclaimed  Idle       0.000   996  4+23:45:05
> vm4@smurf0 LINUX       X86_64 Unclaimed  Idle       0.000   996  4+23:45:07
> vm5@smurf0 LINUX       X86_64 Unclaimed  Idle       0.000   996  4+23:45:08
> ..............................
>                Total    87     1       0        86       0         0        0
> 
> some nodes are off
> 
> My submit file
> gargamel:/home/condor # cat /home/pepe/test_condor/loop.submit
> # automatically generated description file
> universe = vanilla
> executable = loop
> output = loop.out
> error = loop.err
> Requirements   = (Arch =="INTEL" && OpSys == "LINUX") || \
>                  (Arch =="X86_64" && OpSys == "LINUX")
> queue 5
> 
> 
> 
> 
> Can somebody show me how to make this work?
> 
> 
> 
> Sorry for my English, I'm from Almeria IR.