
[HTCondor-users] best way to use cached data



This is probably one of the easiest and most common problems in the HPC community, but as a newcomer I still haven't found the right solution.

I have over 1TB of data in total (spread across many files), and I need it for most jobs.
For simplicity, assume we have one slave (execute) machine that runs the computation and one master (submit) machine that submits jobs.

Say we put our data on the slave: how do we let Condor know that the data can be found under the XYZ directory? Is there a special submit command or configuration setting for this?
I tried to run a script that reads a file on the slave machine, but the job couldn't read it even though it had full permissions. Does Condor run every job in an isolated environment?
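For concreteness, here is roughly the kind of submit file I have been trying; the script name (read_data.sh) and the /data/cached path are just placeholders for my actual setup:

# Vanilla universe job whose script tries to read a file already on the slave.
universe                = vanilla
executable              = read_data.sh
# The argument points at the directory that is supposed to exist on the slave;
# /data/cached is a placeholder for wherever the data actually lives.
arguments               = /data/cached/input.bin
# Transfer files only when there is no shared filesystem; the cached data
# itself is deliberately not listed in transfer_input_files, since it
# should already be sitting on the slave.
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
log    = read_data.log
output = read_data.out
error  = read_data.err
queue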

My real concern is to cache frequently used data on some servers, so that when I run jobs I can either have Condor pull the data over, or let Condor decide where each job should be sent depending on where the data already exists.
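What I imagine (though I don't know whether this is the intended mechanism) is something along these lines, where HasCachedData, CachedDataDir and /data/cached are names I made up: the slave advertises that it holds the data, and the submit file only matches machines that advertise it.

# In the slave's local configuration (e.g. condor_config.local):
HasCachedData = True
CachedDataDir = "/data/cached"
STARTD_ATTRS = $(STARTD_ATTRS) HasCachedData CachedDataDir

# In the submit file, restrict matching to machines that advertise the cache:
requirements = (TARGET.HasCachedData =?= True)

Is that the usual approach, or is there a better way to steer jobs toward the data?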

I've looked at DAGMan, but it seems to be aimed at multi-job workflows rather than data placement.

Thanks
John