Re: [HTCondor-users] New to ht condor and have basic questions
- Date: Tue, 12 Jan 2016 15:22:19 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] New to ht condor and have basic questions
The condor_shadow processes do not run the job. They exist to act as a proxy for the user on the submit machine while the job is running (usually on another machine). The condor_shadow handles the submit side of the file transfer (if there is any) and writes events into the userlog when the job changes state. You should expect it to be mostly idle when the job is actually running.
The job itself will run on an HTCondor execute node (which can be the same machine as the submit node). On Windows the jobs run under the services desktop, so the job will not be visible in Task Manager unless you show processes from all users.
If you show processes from all users, you should expect to see some processes called condor_exec that are your actual job.
So I guess the question I have is "how are you determining that the cpu is not busy?"
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Mathieu Peyréga
Sent: Monday, January 11, 2016 4:46 PM
Subject: [HTCondor-users] New to ht condor and have basic questions
I'm all new to HTCondor concepts and usage.
I've been "playing" with it, reading the users manual and watching video tutorials for one week, and now I'd like to get clues from "real"
First, let me describe the background and motivations for using (at least trying to use) HTCondor...
I work in the field of topographic lidar data processing, and if I understand HTCondor's purpose correctly (which is not certain), we'd like to use it together with a command-line software package called lastools (http://rapidlasso.com/) that runs on Windows (so this will be the OS on all our machines) to handle ground extraction from large airborne lidar data sets, or other tasks in the set of lastools executables.
Those large data sets are already pre-tiled into smaller files, so the individual "parallelizable" jobs are of the form: lasground.exe -i file_in.las -o file_classified_out.las
Up to now, we were using fairly powerful multicore machines, and the software already has a command-line option, -cores N, which splits the work across the machine's cores when multiple input files are given (in a "one file, one core" fashion).
To process larger data sets in less time, I think we could spread the whole set of files across several machines in our office, as all the jobs are independent. This idea plus a Google search led me to HTCondor...
Those machines are networked through a 10Gb internal LAN to a common storage area, and each machine has the lasground software installed (so "sending" the executable and all the RPC machinery could be avoided in our case).
So far, I think this matches HTCondor's purpose (though probably only a very small part of it: we do not need the remote procedure calls, since the software to execute is already deployed on each target machine; we do not need the file transfer, since we have shared storage; and we probably do not need many of the fancy features described in the user manual).
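As a rough illustration of the setup described above (pre-installed executable, shared storage, one job per tile), a submit description file might look something like the sketch below. This is not the attached submit file. The install path, the `classified` and `logs` directories, and the variable name `tile` are all hypothetical. It also assumes an HTCondor version recent enough to support `queue ... matching files` and the `$Fn()` macro, that condor_submit is run from the directory holding the tiles, and that the execute machines can reach that directory under the same path.

```
# Sketch only: every path and name below is an assumption, not taken from the thread.
universe              = vanilla

# lasground.exe is already installed on every machine, so do not ship it with the job.
executable            = C:\LAStools\bin\lasground.exe
transfer_executable   = false

# Input and output live on shared storage, so skip HTCondor file transfer.
should_transfer_files = NO

# One tile per job; $Fn(tile) expands to the file name without its extension.
arguments             = -i $(tile) -o classified\$Fn(tile)_ground.las

log                   = lasground.log
output                = logs\$Fn(tile).out
error                 = logs\$Fn(tile).err

# Queue one job for each .las file found in the submit directory.
queue tile matching files *.las
```

Each job here processes a single file, so the -cores option is left out; the parallelism would come from HTCondor running several such jobs at once rather than from lasground itself.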
I've installed the HTCondor software on a single machine first for trial purposes. This machine has two network cards, so I found that I had to add
NETWORK_INTERFACE = 192.168.1.66
to the condor_config file (unless I missed something). I attached my condor_config file as well as the submit file I've tried so far...
When I submit this file with condor_submit, everything eventually runs and all tasks complete successfully, but I never see the machine's CPU usage go up and use all the available resources, as it does when I type a single command with the -cores 8 option, which shows 100% CPU usage in Windows Task Manager.
Here, the CPU usage remains very low, and of course the whole set of jobs takes much longer to complete than the direct (but single-machine) traditional way (which does not automatically scale to multiple machines).
I'm not using any kind of nice-level commands in HTCondor, as I understand that by default the end process should run at the default priority level (in the Windows sense of process priority). When looking at the various condor_shadow daemons that are started, I can indeed see that they run at normal priority (but each of them stays at low CPU usage when it should be using the CPU intensively).
The machine is dedicated, and should not care about "desktop user" comfort while doing computation tasks.
So now that the background is set, my first "real" question is: am I missing something about why the CPU usage is not going up once the condor_submit command is sent?