
[HTCondor-users] DAG jobs with data affinity



Hi,

 

I want to run a batch of jobs whose input data overlap heavily. Each job uses 10 input files (out of thousands), and for each job there are 9 other jobs that each use 9 of those 10 files plus one additional file.

 

Since these files are large, data movement accounts for a large part of the run time (roughly as long as the processing itself), so we would like to minimize data transfer.
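
To make the overlap concrete, here is a rough Python sketch of the kind of grouping I have in mind. It assumes one reading of the pattern above: groups of 10 jobs that share 9 files, with one extra file per job. The file and job names, the greedy grouping rule, and the idea that a machine keeps files between jobs of the same group are all assumptions for illustration; nothing here is HTCondor-specific.

```python
def build_jobs(num_groups):
    """Hypothetical workload: 10 jobs per group share 9 files, plus 1 unique file each."""
    jobs = {}
    for g in range(num_groups):
        shared = [f"g{g}_shared_{k}" for k in range(9)]
        for j in range(10):
            jobs[f"job_{g}_{j}"] = frozenset(shared + [f"g{g}_extra_{j}"])
    return jobs


def group_by_overlap(jobs, min_shared=9):
    """Greedy grouping: a job joins the first group whose seed job shares >= min_shared files."""
    groups = []  # list of (seed_files, [job names])
    for name, files in jobs.items():
        for seed, members in groups:
            if len(seed & files) >= min_shared:
                members.append(name)
                break
        else:
            # No existing group overlaps enough, so this job seeds a new group.
            groups.append((files, [name]))
    return [members for _, members in groups]


def transfers(jobs, groups):
    """Files moved if each group runs on one machine that keeps files between jobs (assumed)."""
    return sum(len(set().union(*(jobs[n] for n in members))) for members in groups)


jobs = build_jobs(num_groups=3)
groups = group_by_overlap(jobs)
naive = sum(len(files) for files in jobs.values())  # every job fetches all 10 files
grouped = transfers(jobs, groups)                    # each distinct file fetched once per group
print(f"naive transfers: {naive}, grouped transfers: {grouped}")
```

With this toy workload each group needs 19 distinct files instead of 100 separate transfers, which is the kind of saving we are hoping the scheduler could give us.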

 

There is a notion of gang scheduling that deals with CPU affinity, but I could not find a similar solution for data affinity.

 

Any suggestions?

 

Thanks,

Eddie