
Re: [Condor-users] Condor-G Features



>
>
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> timm@xxxxxxxx  http://home.fnal.gov/~timm/
> Fermilab Computing Division, Scientific Computing Facilities,
> Grid Facilities Department, FermiGrid Services Group, Assistant Group
> Leader.
>
> On Tue, 13 May 2008, txcom2003@xxxxxxxxxxxxxxxxxxx wrote:
>
>> Hello There ....
>>
>> I have installed Condor-G, which is bundled with Condor 7.0.1, and it can
>> now submit jobs to Globus.
>> I have some questions about two Condor-G features: checkpointing and
>> matchmaking.
>>
>> 1. Condor-G doesn't support checkpointing.
>> When Condor-G submits to Globus, it can submit the job type "condor",
>> which can use the standard universe, and the standard universe supports
>> checkpointing. So I think Condor-G also supports checkpointing, but only
>> indirectly (when Condor is used as the local scheduler). Is that true?
>
> I have heard of people who have done this--i.e. override the universe
> type that is submitted on the remote end so it is standard rather
> than vanilla, which is the default.  I have never done it myself.
> And at best you could only checkpoint/restart on the remote end, if
> they happen to be running a checkpoint server, which many grid sites
> don't. Doesn't help if the whole remote globus resource is down.
>

I added these lines to the Condor-G submit file:

universe=grid
globus_xml=<jobType>globusJobType</jobType>

where globusJobType is one of single|multiple|condor|mpi.
When globusJobType is set to condor, the Globus job manager translates the
job into a Condor submit file and submits it to the Condor central manager
at the remote site. If Condor at the remote site supports checkpointing, I
don't know whether the job will be checkpointed or not, since I submitted
it from Condor-G.
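
For context, here is a minimal sketch of what a complete submit file along these lines might look like. It assumes a gt4 (WS-GRAM) resource. The host name, port, and executable are placeholders, not values from my setup, and globus_xml is the spelling of the submit command as I understand it from the condor_submit documentation.

```
# Sketch only: host name, port, and file names are placeholders.
universe      = grid
# gt4 resource: service address followed by the remote scheduler type
grid_resource = gt4 https://gatekeeper.example.org:8443/wsrf/services/ManagedJobFactoryService Condor
executable    = my_job
output        = my_job.out
error         = my_job.err
log           = my_job.log
# Extra XML handed through to the remote job description
globus_xml    = <jobType>condor</jobType>
queue
```

Whether the remote Condor then runs the job in the standard universe (and so checkpoints it) depends on how the remote job manager writes its own submit file, which this sketch does not control.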

>>
>> 2. Condor-G doesn't support matchmaking.
>> According to the manual, Condor-G does not support matchmaking, but the
>> GlideIn mechanism can run Condor daemons (startd) on the remote resource,
>> so matchmaking can be done that way. Is there any other mechanism
>> besides GlideIn in the current version of Condor-G?
>>
> There is Condor-G matchmaking, in
> which instead of saying
>
> grid_resource = gt2 mygatekeeper.com/jobmanager-condor
>
> you say
>
> grid_resource = $$(GridResource)
>
> But if you do that, you have to somehow produce a list of
> condor Classads that describe the clusters.  That is not included
> with Condor. Various grids have come up with their own.
>
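
As I understand the suggestion, the submit side would look roughly like the sketch below. The attribute name GridResource follows the example quoted above. It has to be defined in the resource ClassAds that someone advertises to the collector, and those ads are the part that does not ship with Condor. The executable and log names are placeholders.

```
# Sketch only: assumes resource ads that define a GridResource attribute.
universe      = grid
executable    = my_job
log           = my_job.log
# Filled in at match time from the matched resource ad
grid_resource = $$(GridResource)
# Only match ads that actually define the attribute
requirements  = (TARGET.GridResource =!= UNDEFINED)
queue
```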

By matchmaking I mean job Requirements, such as operating system,
minimum memory, CPU, etc.
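
To make that concrete, this is the kind of requirements expression I have in mind. It is only a sketch. OpSys, Arch, and Memory are the attribute names that ordinary Condor machine ads use, and it is an assumption that a grid's resource ads would define them too. The specific values are just examples.

```
# Sketch only: attribute names and values are illustrative.
universe      = grid
executable    = my_job
grid_resource = $$(GridResource)
# Each clause is true only if the matched ad defines that attribute
requirements  = (TARGET.OpSys == "LINUX") && (TARGET.Arch == "X86_64") && (TARGET.Memory >= 1024)
queue
```

With a fixed grid_resource instead of the $$() form, I would not expect the job to go through matchmaking, so as far as I can tell such requirements would never be compared with any resource.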

I have one more question:
I submitted 4 jobs and killed the condor_master (Condor-G) while the jobs
were in these states:
345.0 tonny PENDING   Condor cluster-02.petruk. /home/grid/mahasis
346.0 tonny STAGE_OUT Condor cluster-02.petruk. /home/grid/mahasis
347.0 tonny ACTIVE    Condor cluster-02.petruk. /home/grid/mahasis
348.0 tonny DONE      Condor cluster-02.petruk. /home/grid/mahasis
349.0 tonny PENDING   Condor cluster-02.petruk. /home/grid/mahasis

When I started the condor_master again, the job that was in the "DONE"
state was not removed from the queue until I removed it with condor_rm,
and the jobs that were not done were evicted and resubmitted. Why didn't
Condor-G simply reconnect to the Globus job manager and continue
monitoring the jobs that had already been submitted? (I have read that one
of Condor-G's fault-tolerance features is that after a local crash, for
example a Condor-G crash, Condor-G reconnects to the job manager and
continues the job.) Why did this happen?
Thanks