
Re: [HTCondor-users] DAG jobs with data affinity

On Mon, Apr 29, 2013 at 11:14:31AM +0300, Edward Aronovich wrote:
> Hi,
>       I want to run a bunch of jobs which have some data correlation
> between them. That is, if each job uses 10 input files (out of
> thousands), there are 9 other jobs that each use 9 of those 10 files
> plus one additional file.
>      Since these files are large, the data movement takes most of
> the time (approximately as long as the processing itself), and therefore
> we would like to minimize the data transfer.
>       There is a notion of gang scheduling that deals with CPU affinity,
> but I could not find a similar solution for data affinity.
>      Any suggestions?

1. HTCondor tries really hard to clean up after itself when a job is
finished, so you have to make sure that you do actually succeed in
caching the files in a place where HTCondor is not going to delete them.
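
One way to keep inputs around between jobs is to have the job itself copy them into a directory outside the scratch directory that HTCondor removes when the job exits. The following is only a sketch under assumptions: CACHE_DIR and SOURCE_DIR are hypothetical locations (HTCondor defines neither), the cache directory must be writable by the user the job runs as, and the inputs must be reachable from the execute machine rather than shipped by HTCondor's own file transfer.

```shell
#!/bin/sh
# Hypothetical locations; adjust for your site. HTCondor does not define these.
CACHE_DIR="${CACHE_DIR:-/var/tmp/job-input-cache}"
SOURCE_DIR="${SOURCE_DIR:-/shared/inputs}"

# fetch_cached NAME: print the path of a local copy of NAME, copying it from
# SOURCE_DIR only if it is not already present in CACHE_DIR.
fetch_cached() {
	name=$1
	mkdir -p "$CACHE_DIR" || return 1
	if test ! -f "$CACHE_DIR/$name"; then
		# Copy under a temporary name first, so that another job on the
		# same machine never sees a partially written file.
		tmp="$CACHE_DIR/.$name.$$"
		if ! cp "$SOURCE_DIR/$name" "$tmp"; then
			rm -f "$tmp"
			return 1
		fi
		mv "$tmp" "$CACHE_DIR/$name" || return 1
	fi
	echo "$CACHE_DIR/$name"
}
```

A job wrapper would call fetch_cached once per input file, so a later job that lands on the same machine only copies the one file it does not share with its predecessor.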

2. The simplest solution is then to have the post script of one job write
out, or edit, the submit file for the next job, so that the next job is
pinned to the machine where the first one ran. Here is what I am
thinking (completely untested and off the top of my head, so YMMV):


job A A.cmd
job B B.cmd
script post A A.post $JOBID
parent A child B

Contents of A.post:

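#!/bin/sh
# $1 is the job ID from $JOBID. LastRemoteHost may look like "slot1@host"; Machine is only the host part.
condor_history -l "$1" | grep -i '^lastremotehost' | sed -E 's/^[^=]*= *"([^@"]*@)?([^"]*)".*$/requirements = Machine == "\2"/' > A.remotehost
# Copy B.template to B.cmd, replacing the <requirements> placeholder line.
while IFS= read -r line; do
	if test "$line" = "<requirements>"; then
		cat A.remotehost
	else
		echo "$line"
	fi
done < B.template > B.cmd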
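
To see what the substitution does without submitting anything, the sketch below runs the same two steps on made-up data. The LastRemoteHost value, the host name, and the contents of B.template are all invented for illustration, and the function names are hypothetical. Real condor_history output may carry a slot prefix such as slot1@, which the sed expression strips on the assumption that the Machine attribute holds only the host name.

```shell
#!/bin/sh
# Turn one 'LastRemoteHost = "slot@host"' line (the format printed by
# condor_history -l) into a submit-file requirements line.
# The slot prefix is optional.
host_to_requirements() {
	sed -E 's/^[^=]*= *"([^@"]*@)?([^"]*)".*$/requirements = Machine == "\2"/'
}

# Copy a template from stdin to stdout, replacing the line that is exactly
# <requirements> with the contents of the file named in $1.
fill_template() {
	while IFS= read -r line; do
		if test "$line" = "<requirements>"; then
			cat "$1"
		else
			printf '%s\n' "$line"
		fi
	done
}

# Made-up example data (not real HTCondor output).
workdir=$(mktemp -d)
echo 'LastRemoteHost = "slot1@node17.example.org"' \
	| host_to_requirements > "$workdir/A.remotehost"
cat > "$workdir/B.template" <<'EOF'
executable = B.sh
<requirements>
queue
EOF
fill_template "$workdir/A.remotehost" < "$workdir/B.template" > "$workdir/B.cmd"
cat "$workdir/B.cmd"
```

With this sample input, B.cmd ends up with the placeholder line replaced by requirements = Machine == "node17.example.org", and the other two lines are copied through unchanged.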

Contents of B.post:

Nathan Panike