
[HTCondor-users] New to HTCondor and have basic questions



Hello,

I'm completely new to HTCondor concepts and usage.
I've been "playing" with it, reading the user manual and watching video
tutorials for a week, and now I'd like to get some pointers from "real"
users.

First, let me describe the background and motivations for using (at
least trying to use) HTCondor...

I work in the field of topographic lidar data processing, and if I'm
understanding HTCondor's purpose correctly (which is far from certain), we'd
like to use it together with a command-line software suite called lastools
(http://rapidlasso.com/) that runs on Windows (so Windows will be the OS on
all machines), to address the task of ground extraction from large amounts
of airborne lidar data, as well as other tasks covered by the lastools
executables.

Those large data sets are already pre-tiled into smaller files, so the
individual, parallelizable jobs are of the form:

  lasground.exe -i file_in.las -o file_classified_out.las

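A typical batch is then just a list of such independent commands, one per
tile (the tile names below are only made up for illustration):

  lasground.exe -i tile_0001.las -o tile_0001_ground.las
  lasground.exe -i tile_0002.las -o tile_0002_ground.las
  lasground.exe -i tile_0003.las -o tile_0003_ground.las
  ...
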
Up until now, we have been using fairly powerful multicore machines, and the
software already has a command-line option, -cores N, which splits the work
across the machine's cores when multiple input files are given (in a
"one file, one core" fashion).

In order to process larger datasets in less time, I think we could spread
the whole set of files across several machines in our office, since all the
jobs are completely independent, and this idea plus a Google search led me
to HTCondor...

Those machines are networked through a 10Gb internal LAN to a common
storage area, and each machine has the lasground software installed (so
"sending" the executable and all the RPC machinery could be avoided in
our case).

So far, I think this matches HTCondor's purpose (of course probably only a
very small part of it: we do not need remote procedure calls, since the
software to execute is already deployed on each target machine, nor file
transfer, since we have shared storage, and we probably do not need many of
the fancy features described in the user manual).
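
If I read the manual correctly, this should let us skip the transfer
machinery entirely in the submit description, with something like the
following (transfer_executable and should_transfer_files are the commands I
found for this; I may be misreading their exact intent):

  universe              = vanilla
  executable            = c:\lastools\bin\lasground.exe
  transfer_executable   = false
  should_transfer_files = no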

I've installed the HTCondor software on a single machine first, for trial
purposes. This machine has two network cards, so I found that I had to add

NETWORK_INTERFACE = 192.168.1.66

to the condor_config file (unless I missed something). I've attached my
condor_config file as well as the submit file I've tried so far...
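
To double-check that the setting is actually picked up, the comments in the
config file point at condor_config_val, which I understand can be used like
this:

  condor_config_val -v NETWORK_INTERFACE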

When submitting this file with condor_submit, it eventually runs
everything and all tasks complete successfully, but I never see the
machine's CPU usage go up and use all the available resources, as it does
when typing a single command with the -cores 8 option, which shows 100% CPU
usage in the Windows Task Manager.

Here, the CPU usage remains very low, and of course the whole set of jobs
takes much longer to complete than the direct (but single-machine)
traditional way (which is not automatically scalable to multiple machines).
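
For what it's worth, my understanding is that the standard tools can show
how much is really running in parallel, e.g.

  condor_status     (how many slots the machine advertises)
  condor_q -run     (how many jobs are running at a given moment)

I can post that output as well if it helps.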

I'm not using any kind of nice-level settings inside HTCondor, as I
understand that by default the end process should run at the default
priority level (in the Windows sense of process priority). When looking at
the various condor_shadow daemons that are spawned, I can indeed see that
they run at normal priority (but each of them stays at a low CPU usage when
it should be working intensively).

The machine is dedicated, and should not care about "desktop user"
comfort while doing computation tasks.

So now that the background is settled, my first "real" question is: am
I missing something about why the CPU usage does not go up once the
condor_submit command has been issued?

Best regards,

Mathieu
-- 
tel : +33 (0)6 87 30 83 59
####################
#
####################
Executable = c:\lastools\bin\laszip.exe
Universe = vanilla
should_transfer_files = no
initialdir = C:\Data\test_condor
#error = err.$(Process)
#input = in.$(Process)
#output = $(filename).out
log = test.$(Process).log
arguments = -i $(initialdir)\$(filename)
queue filename matching files *.las
######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?   
RELEASE_DIR = C:\condor

##  Where is the local condor directory for each host?  This is where the local config file(s), logs and
##  spool/execute directories are located. This is the default for Linux and Unix systems.
#LOCAL_DIR = $(TILDE)
##  This is the default on Windows systems
#LOCAL_DIR = $(RELEASE_DIR)

##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)\etc\$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = FALSE

##  The normal way to do configuration with RPMs is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = $(LOCAL_DIR)\config
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

##  Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed
use SECURITY : HOST_BASED
##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu

##--------------------------------------------------------------------
## Values set by the condor_configure script:
##--------------------------------------------------------------------

CONDOR_HOST = $(FULL_HOSTNAME)
NETWORK_INTERFACE = 10.192.4.53
COLLECTOR_NAME = ATLAS
UID_DOMAIN = 
CONDOR_ADMIN = 
SMTP_SERVER = 
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS)
ALLOW_ADMINISTRATOR = $(IP_ADDRESS)
JAVA = C:\PROGRA~2\Java\JRE18~1.0_6\bin\java.exe
use POLICY : ALWAYS_RUN_JOBS
WANT_VACATE = FALSE
WANT_SUSPEND = FALSE
nice_user = false
START = True
SUSPEND = False
PREEMPT = False
KILL = False
NEGOTIATOR_CONSIDER_PREEMPTION = False
DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR STARTD