
Re: [Condor-users] condor-g data transfer mechanism

Hi Maryam,


I am not sure what you mean by Condor-G meta-scheduling.

In our work we use the Pegasus Workflow Management System (http://pegasus.isi.edu) to map an abstract (resource-independent) DAG onto the distributed resources. As part of the mapping, Pegasus adds nodes to the DAG to stage the data to the execution sites and move the results off. 

Our workflow execution engine, DAGMan, uses Condor-G to send the jobs in the workflow to the grid sites. The results do not have to come back to the submitter. If you don’t want to save the intermediate products, they will exist on the remote sites only for the duration of the workflow, and only the final data will be brought back to where you want it to go. If intermediate data needs to be moved between sites, it will be done via third-party transfer, so that the submitter is not involved.


For large data it is usually better to transfer it from A to B directly. When jobs are mapped to the same site and there is a shared file system on that site, B can access the data directly.
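
To make the idea concrete, here is a rough Python sketch of the planning step described above: a transfer node is added only on edges whose parent and child jobs are mapped to different sites, and same-site edges are left alone on the assumption of a shared file system. This is not Pegasus code or its API; the names Job, Edge, and add_transfer_nodes, as well as the site and job names, are made up for illustration.

```python
from collections import namedtuple

# Hypothetical types for illustration; these are not Pegasus or Condor classes.
Job = namedtuple("Job", ["name", "site"])
Edge = namedtuple("Edge", ["parent", "child"])


def add_transfer_nodes(jobs, edges):
    """Return (jobs, edges) with a transfer node on every cross-site edge."""
    site_of = {job.name: job.site for job in jobs}
    new_jobs = list(jobs)
    new_edges = []
    for edge in edges:
        src, dst = site_of[edge.parent], site_of[edge.child]
        if src == dst:
            # Same site: assume a shared file system, so no transfer is needed.
            new_edges.append(edge)
            continue
        # Different sites: insert a transfer node that moves the data directly
        # from src to dst (third-party), without going through the submitter.
        transfer = Job(
            name=f"transfer_{edge.parent}_to_{edge.child}",
            site=f"{src}->{dst}",
        )
        new_jobs.append(transfer)
        new_edges.append(Edge(edge.parent, transfer.name))
        new_edges.append(Edge(transfer.name, edge.child))
    return new_jobs, new_edges


# Made-up example: two jobs on siteA feeding one job on siteB.
jobs = [Job("preprocess", "siteA"), Job("analyze", "siteA"), Job("merge", "siteB")]
edges = [Edge("preprocess", "analyze"), Edge("analyze", "merge")]

planned_jobs, planned_edges = add_transfer_nodes(jobs, edges)
for e in planned_edges:
    print(f"{e.parent} -> {e.child}")
```

In this toy example the preprocess-to-analyze edge is untouched because both jobs run on siteA, while the analyze-to-merge edge gets one transfer node in between. A real planner would also handle stage-in of the initial inputs and stage-out of the final results.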


Please let me know if you have any questions,






From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Maryam Khademi
Sent: Thursday, December 04, 2008 7:06 PM
To: Condor-Users Mail List
Subject: [Condor-users] condor-g data transfer mechanism



I am using Condor-G to execute a DAG. As I understand its mechanism, the meta-scheduler dispatches the jobs among different grid sites. Meanwhile, all the intermediate data are transferred to the submitter machine, so that if a job on grid site A needs the output of another job on grid site B, it can get it through the submitter. Is that correct? With this method, what about dependent jobs mapped to the same grid site? Is the input data again requested from the submitter?