
Re: [Condor-users] about submission jobs from condor-g to globus to condor pool



Hi Dan,

Thanks for your clear explanation. I got it working by setting GlobusRSL
in the submit files.

But I am also interested in setting up NFS. Since I am new to NFS, may I
ask a question about the NFS settings? When setting up NFS, should I
configure it on all machines (the execution nodes) with the same user
accounts, or only on the cluster head node (the submitter)?

Thanks,
Carson

>
> No, it will not work to simply set FILESYSTEM_DOMAIN to the same name on
> all nodes. You need a shared filesystem too!
>
> However, if you are willing to mess around with an alternate
> configuration that has some restrictions, it is possible to get Condor's
> transfer-files mode to work for jobs submitted by the Globus Condor
> jobmanager. Then you do not need a common filesystem domain for jobs to
> be matched to other worker nodes. When submitting jobs through Globus,
> it will be necessary to specify some special RSL settings that turn on
> transfer-files mode and tell Condor about any additional input files
> that are needed by the job.
>
> --Dan
>
> Carson Hung wrote:
>
>>First of all, thanks a lot for your explanation.
>>In other words, all the nodes in the cluster are supposed to share the
>> same file system using NFS or AFS. However, in this case, if I don't
>> have a shared file system, can I simply set FILESYSTEM_DOMAIN to
>> the same name on all nodes? Will this work?
>>
>>Thanks,
>>Carson
>>
>>>The automatic filesystem requirements are there to prevent jobs from
>>> running on worker nodes that cannot access the job's working
>>> directory, executable, and input files. For jobs submitted by hand,
>>> you can simply tell Condor to transfer files back and forth between
>>> the submission machine and the execution machine to avoid the
>>> requirement of being in the same filesystem domain. For jobs submitted
>>> through the Globus condor jobmanager, you normally need the gatekeeper
>>> machine and the worker nodes to have a shared filesystem (at least for
>>> the home directory of the account on the gatekeeper that runs the
>>> jobs). To tell Condor that you have such a common filesystem,
>>> configure FILESYSTEM_DOMAIN in condor_config on all nodes that share
>>> that filesystem.
>>>
>>>For more information on configuring Condor with respect to shared
>>> filesystems, see:
>>>
>>>http://www.cs.wisc.edu/condor/manual/v6.6/3_3Configuration.html#sec:Shared-Filesystem-Config-File-Entries
>>>
>>>--Dan
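To make the matching behavior described above concrete, here is a toy Python model. It is a sketch, not Condor's actual matchmaking code: the class and field names and the hostnames are hypothetical, and the rule that the automatic MY.FileSystemDomain == TARGET.FileSystemDomain clause is skipped whenever file transfer is enabled is a simplifying assumption. It also assumes each node advertises the FILESYSTEM_DOMAIN from its own condor_config, and that a node left unconfigured uses its own hostname, so nodes with differing values never match a job that depends on a shared filesystem.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobAd:
    # Hypothetical, simplified stand-in for a job ClassAd.
    file_system_domain: str  # assumed to come from the submit machine's FILESYSTEM_DOMAIN
    file_transfer: bool      # True if the job uses Condor's transfer-files mode


@dataclass
class MachineAd:
    # Hypothetical, simplified stand-in for a worker node's machine ClassAd.
    name: str
    file_system_domain: str  # FILESYSTEM_DOMAIN from this node's condor_config


def filesystem_requirement_ok(job: JobAd, machine: MachineAd) -> bool:
    """Model of the automatically added MY.FileSystemDomain == TARGET.FileSystemDomain clause.

    Assumption: with file transfer on, the job does not need the shared
    filesystem, so this model does not apply the clause at all.
    """
    if job.file_transfer:
        return True
    return job.file_system_domain == machine.file_system_domain


def matching_machines(job: JobAd, machines: List[MachineAd]) -> List[str]:
    """Return the names of machines that pass the filesystem check, in pool order."""
    return [m.name for m in machines if filesystem_requirement_ok(job, m)]


if __name__ == "__main__":
    # Each node left at its own hostname: only the gatekeeper itself matches.
    pool = [
        MachineAd("gatekeeper", "gatekeeper.example.org"),
        MachineAd("node1", "node1.example.org"),
        MachineAd("node2", "node2.example.org"),
    ]
    print(matching_machines(JobAd("gatekeeper.example.org", False), pool))

    # With file transfer enabled, every node is eligible.
    print(matching_machines(JobAd("gatekeeper.example.org", True), pool))
```

In this model, giving every node the same domain string also makes the match succeed, but as noted earlier in the thread, that only helps if the nodes really do share the job's files; the model does not check that.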
>>>
>>>Carson Hung wrote:
>>>
>>>>Hi,
>>>>
>>>>I have tried submitting jobs with Condor-G from a remote machine to a
>>>> machine that runs both the gatekeeper and the central manager of a Condor pool.
>>>>
>>>>However, the jobs are rejected within the Condor pool and stay there. I
>>>> checked them with condor_q -analyze on the central manager.
>>>>
>>>>I found that a strange requirement had been added automatically, i.e.
>>>> MY.FileSystemDomain == TARGET.FileSystemDomain
>>>>
>>>>I think this is why the jobs are rejected. Can anyone suggest
>>>> why this occurs and how I can solve it, please?
>>>>
>>>>Thanks very much for any suggestions.
>>>>Carson
>>>>
>>>>
>>>>_______________________________________________
>>>>Condor-users mailing list
>>>>Condor-users@xxxxxxxxxxx
>>>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users