
Re: [Condor-users] about setting up dedicated resources in Condor Windows cluster



fly zebra wrote:
Hi All,

This may be a newbie question about how to set up dedicated
resources in a Condor (condor-7.5.2-winnt50-x86) Windows cluster; please
offer help if you can.

I read section "3.13.10.1 Selecting and Setting Up a
Dedicated Scheduler" in the Condor manual, but still cannot make a
parallel job work after trying several times.


Couple pointers:

1. You do not need to set up a Condor "dedicated scheduler" to use dedicated resources. The only reason to set up a dedicated scheduler in Condor is if you must submit parallel universe jobs, i.e. jobs that require multiple machines at the same time. Typical examples are MPI or PVM jobs, usually on Unix; folks using MPI on Windows are fairly rare. If you do not need the parallel universe, e.g. you just want to submit loads of vanilla universe jobs, you do not need to set this up. Maybe the "dedicated scheduler" in Condor would be better named "the parallel scheduler" :).

2. If you truly need to submit parallel universe jobs, you need to customize more settings than you mentioned below. The easiest way to do this is to consult the example file:
  c:\condor\etc\condor_config.local.dedicated.resource
for the rest of the settings you need. It is well-commented.
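
For orientation, here is a rough sketch of the kind of settings the manual's dedicated-scheduling section describes for an execute machine that should run only jobs from the dedicated scheduler. Treat it as an illustration under assumptions, not a copy of the example file: the host name submit.example.org is a hypothetical placeholder for the machine whose condor_schedd you submit from, and your own policy may need different START and RANK expressions. Note also that Condor configuration macros are referenced as $(NAME), with parentheses rather than braces.

```
# Sketch only; submit.example.org is a hypothetical placeholder host name.
# Name the schedd that acts as the dedicated (parallel) scheduler.
DedicatedScheduler = "DedicatedScheduler@submit.example.org"

# Advertise the attribute in the machine ClassAd.
# (STARTD_EXPRS is the older name for the same list.)
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# Policy: run only jobs that come from the dedicated scheduler,
# and never suspend, preempt, or kill them.
START        = Scheduler =?= $(DedicatedScheduler)
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False
RANK         = Scheduler =?= $(DedicatedScheduler)
```

These lines would go in the local configuration of each execute machine; the commented example file remains the better reference for your exact version.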

Hope the above pointers help.
Todd


My Condor Windows cluster consists of one central
manager (headnode.condor.org) and two execution machines
(c01.condor.org, c02.condor.org).
I added the following configuration lines to the condor_config file on
all three machines:
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxx
STARTD_EXPRS = ${STARTD_EXPRS}, DedicatedScheduler

The following is the job description file (jdf.sub):
universe = parallel
environment = path=c:\winnt\system32
executable = simpleCounter.exe
output = simpleCounter.out
error = simpleCounter.err
log = simpleCounter.log
machine_count = 1
arguments = 1 100
queue

After submitting the job with "condor_submit jdf.sub",
there is no error, but the job never runs (it just stays in the idle state).
Any suggestions?

Thanks in advance,
Kimaru


--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685