
Re: [HTCondor-users] best way to use cached data



Thanks.
That will work if all VMs have the same storage layout and directory structure, but it's not ideal, since not every VM can have the same capacity.
I guess the real question is whether anyone has ever tried to integrate/connect HTCondor with an SRM such as BeStMan or the dCache SRM.

I've just read a number of papers and they seem pretty interesting. They basically "manage" data access, lifetime, and replication.
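
For example, could something like HTCondor's file transfer plugin mechanism be used to pull inputs straight from an SRM endpoint? A rough guess is below; the plugin path and URL are made up and I haven't tried any of this:

    # condor_config on the execute nodes: register a plugin that handles srm:// URLs
    # (/usr/libexec/condor/srm_plugin would be a wrapper around an SRM client such as srmcp)
    FILETRANSFER_PLUGINS = $(FILETRANSFER_PLUGINS), /usr/libexec/condor/srm_plugin

    # job submit file: let the plugin fetch the input directly from the storage element
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = srm://se.example.org:8443/srm/v2/server?SFN=/data/bigfile.dat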

John


On Mon, Dec 10, 2012 at 12:45 AM, Sarnath K - ERS, HCLTech <k_sarnath@xxxxxxx> wrote:

The simplest way is to have the same directory structure on all machines

and simply pass the path as an argument.

That would do.

 

Condor will run the program you specified on the execute machine.
If the data is there, bingo! It will work. That's it.
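
Something along these lines ought to do it (an untested sketch; analyze.sh and /data/shared are just placeholder names):

    universe                = vanilla
    executable              = analyze.sh
    # the large data set is pre-staged at the same path on every execute machine,
    # so pass that path as an argument and read the files in place
    arguments               = /data/shared
    # only the small executable is transferred; the cached data is never copied
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = analyze.out
    error                   = analyze.err
    log                     = analyze.log
    queue

The job runs in a scratch directory on the execute machine, but it can still open /data/shared by absolute path as long as the user the job runs as has read permission there.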

 

Having said that so confidently, I should admit that I am not a frequent user of Condor either.

But this is what I recollect. Good luck!

 

From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John Wong
Sent: Monday, December 10, 2012 11:07 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] best way to use cached data

 

This is probably the easiest and most common problem in the HPC community, but as a newcomer I still haven't found the right solution.

 

I have over 1 TB of data in total (spread across many files), and I need it for most jobs.

For simplicity, assume we have one slave machine that runs the computation and one master machine that submits jobs.

 

Say we put our data on the slave: how do we let Condor know that the data can be found under the XYZ directory? Is there a special command or special configuration?

I tried to run a script that reads a file on the slave machine, but Condor couldn't read it even though it has full privileges. Does Condor run every job in an isolated environment?

 

My real goal is to cache frequently used data on some servers, and then when I run jobs, either have Condor pull the data over, or let Condor decide where the jobs should be sent depending on where the data already lives.
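
For instance, I was picturing something like tagging the machines that already hold the data and matching on that tag, though I have no idea whether this is the intended way (HasCachedData is a name I made up):

    # condor_config.local on the execute machines that hold the cached data
    HasCachedData = True
    STARTD_ATTRS = $(STARTD_ATTRS) HasCachedData

    # and in the job submit file, only match machines that advertise the data
    requirements = (TARGET.HasCachedData =?= True)

Is that the right direction, or is there a built-in mechanism for this?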

 

I've looked at DAG-C, but it seems to be aimed at multi-job workflows.

 

Thanks

John

 




