
Re: [HTCondor-users] move spool and log folders on high-speed network drive



Hi Jordan,
 
OK, just to get this out of the way: I am a PhD student, not an IT officer, so unfortunately I have some limitations in my knowledge of the extended network setup of Condor. This is a new system for all of us here that we are trying to sort out.
 
I submit my jobs via a manual process: I create a folder structure in a particular network location containing the executable I want to run, some input files for the executable, and a submit script that Condor uses to distribute the jobs across the system. The submission is done from my machine using cmd and the "condor_submit" command, which points to the submit.sub file in the network location. The output files of each node are downloaded to the network folder I start the submit.sub job from. The log and spool files are by default stored on my local drive, hence the limitation.
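As a rough sketch of the workflow described above (all file names and the queue count are placeholders, not taken from the actual setup):

```
## submit.sub -- minimal sketch of a submit description file of the kind
## described above: an executable, input files for it, and output written
## back next to the submit file. All names here are hypothetical.
universe                = vanilla
executable              = myprog.exe
transfer_input_files    = input1.dat, input2.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = out.$(Process).txt
error                   = err.$(Process).txt
log                     = myprog.log
queue 100
```

Submission from cmd would then look something like `condor_submit \\server\share\jobs\submit.sub` (path hypothetical).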
 
Regarding the UidDomain settings:
 
##--------------------------------------------------------------------
##  Network domain parameters:
##--------------------------------------------------------------------
##  Internet domain of machines sharing a common UID space.  If your
##  machines don't share a common UID space, set it to
##  UID_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own UID space.
UID_DOMAIN=
 
##  Internet domain of machines sharing a common file system.
##  If your machines don't use a network file system, set it to
##  FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own file system.
#FILESYSTEM_DOMAIN    = $(FULL_HOSTNAME)
 
##  This macro is used to specify a short description of your pool.
##  It should be about 20 characters long. For example, the name of
##  the UW-Madison Computer Science Condor Pool is ``UW-Madison CS''.
COLLECTOR_NAME         = My Pool - $(CONDOR_HOST)
 
######################################################################
 
I think those settings are set up correctly, as I am still able to run a few hundred jobs without any problem.
 
Antonis
 
 
Antonis
Sent: Thursday, March 28, 2013 6:24 PM
Subject: Re: [HTCondor-users] move spool and log folders on high-speed network drive
 
Antonis,

I haven't personally used the NFS functionality, so I'm not sure how much help I can be, but I'll try.

If you haven't yet read the file transfer documentation, you can do so here: http://research.cs.wisc.edu/htcondor/manual/v7.6/2_5Submitting_Job.html#SECTION00353000000000000000

What are your FileSystemDomain and UidDomain settings set to? From the docs:

So, if a pool does have access to a shared file system, the pool administrator must correctly configure Condor such that all the machines mounting the same files have the same FileSystemDomain configuration. Similarly, all machines that share common user information must be configured to have the same UidDomain configuration.
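For concreteness, a minimal sketch of what matching settings look like in the config file (the domain name here is a placeholder, not from this pool):

```
## Every machine mounting the same shared file system should carry the
## same FILESYSTEM_DOMAIN, and machines sharing user accounts the same
## UID_DOMAIN. "cs.example.edu" is a hypothetical value.
FILESYSTEM_DOMAIN = cs.example.edu
UID_DOMAIN        = cs.example.edu
```

If these differ between the submit machine and the execute machines, Condor assumes no shared file system and falls back to its own file transfer.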
 
 
On Thu, Mar 28, 2013 at 2:13 PM, Antonis Sergis <sergis_antonis@xxxxxxxxxxx> wrote:
Hello John,
 
Many thanks for the advice! I am afraid an HTTP server such as S3 is beyond our scope (and budget) at the moment. We would prefer something simpler, since the gear is already installed, and I wouldn't want to get the university's network people involved at this stage. There must be a simpler way to get this done given our setup; it's just a matter of changing Condor's config file to read and write the spool and log files on the network drive. It is so frustrating because it worked for some time and now I can't even figure out why it doesn't!
 
Antonis
 
Sent: Thursday, March 28, 2013 5:57 PM
Subject: Re: [HTCondor-users] move spool and log folders on high-speed network drive
 
Antonis-
 
We were planning on attempting something similar, but we think we found a better solution:
 
We use HTTP servers as our "spool" and a modified curl plugin. We can send you the source if you'd like, but since we use S3, I'm not sure how much it would help you. Apache or nginx as your HTTP server, plus the curl plugin that comes with Condor, will definitely take a lot of load off of your submit machines.
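As a rough illustration of the approach John describes (server and file names are hypothetical), input files can be fetched over HTTP by listing URLs in the submit file, which hands the transfer to the curl plugin on the execute node instead of shipping the file from the submit machine's spool:

```
## Sketch: pulling input over HTTP instead of from the submit machine.
## The execute node fetches the URL directly via the curl plugin, so the
## submit host never has to serve the file itself. Names are placeholders.
universe                = vanilla
executable              = myprog.exe
transfer_input_files    = http://fileserver.example.edu/jobs/input1.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
```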
 
Thanks,
John Lambert


On Thu, Mar 28, 2013 at 1:11 PM, Antonis Sergis <sergis_antonis@xxxxxxxxxxx> wrote:
Hello. I am writing to get some more ideas on a problem which is becoming rather hard to tackle. My machine acts as a Condor submit node, and unfortunately we realised that the local disk transfer speed for the log and spool files is too slow and limits our maximum job count. Replacing the disk with an SSD would only bring the next bottleneck, processor speed, closer. I therefore decided to alter the config file so that the submit machine exchanges data over our very fast connection and network storage. I have tried out different things over the last few days. I got it to work momentarily, and the number of jobs I could run simultaneously went up from the earlier limit of 300 to 1200, but then the processor maxed out and stopped taking on more jobs. We are planning to split the administration work across other PCs to get the processing speed required and the maximum number of jobs running. I have tried adding the network folders as the spool and log pathnames in the configuration file:
 
######################################################################
##  Daemon-wide settings:
######################################################################
 
##  Pathnames
LOG     = \\PATHNAME\log
SPOOL   = \\PATHNAME\spool
EXECUTE = $(LOCAL_DIR)/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib
INCLUDE = $(RELEASE_DIR)/include
SBIN    = $(BIN)
LIBEXEC = $(BIN)
 
However, this does not work: the Condor service shuts down, and I cannot restart it or query it unless I change the config file back to the original one (i.e. local log and spool folders). I am running Condor on a Windows 7 machine. Replacing the disk with an SSD is not an option, as the job sizes are quite large and there are no funds to do that at scale, while the network storage can provide the speed we are after. Any ideas?
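One plausible culprit, offered as an assumption rather than something confirmed in this thread: on Windows the condor_master service typically runs as the Local System account, which has no network credentials, so UNC paths for LOG and SPOOL fail the moment the service starts. A sketch of the kind of change that would be needed (share name, domain, and account are all placeholders):

```
## Sketch only -- share and server names are hypothetical.
## Forward slashes are also accepted in HTCondor config on Windows.
LOG   = //fileserver/condor/log
SPOOL = //fileserver/condor/spool
```

together with reconfiguring the Condor service, from an elevated cmd prompt, to run under a domain account that can read and write the share (service and account names are assumptions to verify locally): `sc config condor obj= "DOMAIN\condoruser" password= <password>`.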
 
Cheers
 
Antonis

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
 
