
Re: [HTCondor-users] Job Realtime output file



Hi Guillermo,

This was working before, correct?  Do your normal SGE jobs work correctly?

The most reliable way to debug this is to raise the debug level on the gridmanager and then check the stderr recorded in the GRIDMANAGER_LOG.
1) In the file ~/bosco/local.bosco/condor_config.local, add this line at the bottom:
GRIDMANAGER_DEBUG = D_FULLDEBUG

2) Reconfigure Condor:
$ condor_reconfig

3) Release any held jobs:
$ condor_release mastablasta
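
As a rough sketch of step 1 (this is not an official Bosco tool), a small shell helper could add the debug line only when it is not already present, so that running it twice does not leave duplicate entries. The function name enable_gridmanager_debug is made up for illustration; the config path and the setting are the ones from the steps above.

```shell
# Hypothetical helper (not part of Bosco or HTCondor): append the debug
# setting to a config file unless that exact line is already present.
enable_gridmanager_debug() {
    config_file="$1"
    setting='GRIDMANAGER_DEBUG = D_FULLDEBUG'

    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if grep -qxF "$setting" "$config_file" 2>/dev/null; then
        return 0
    fi

    # If the file is non-empty and does not end in a newline, add one first
    # so the setting does not get glued onto the previous line.
    if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
        printf '\n' >> "$config_file"
    fi

    printf '%s\n' "$setting" >> "$config_file"
}
```

You would call it as `enable_gridmanager_debug ~/bosco/local.bosco/condor_config.local` and then continue with steps 2 and 3. If you are unsure where the gridmanager log lives, `condor_config_val GRIDMANAGER_LOG` should print its path.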



-Derek




On Mar 7, 2013, at 6:08 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello,
> 
> On 02/28/2013 07:34 AM, Derek Weitzel wrote:
>> Hi Guillermo,
>> 
>> Since your SGE direct submissions worked, something odd must be going on with the glidein submission.  This should work.
>> 
>> What's the output from `condor_q -hold` on the glidein_wrapper.sh jobs?
>> 
>> -Derek
>> 
> 
> condor_q -hold
> -- Submitter: brugal : <192.168.6.2:11000?sock=2009_e522_3> : brugal
>  ID      OWNER          HELD_SINCE  HOLD_REASON                                
>  152.0   mastablasta     3/7  06:14 Attempts to submit failed                  
>  153.0   mastablasta     3/7  06:15 Attempts to submit failed
> 
> I'm very interested in this because I need to stream stdout from my SGE jobs on a remote cluster, and with the grid universe I can't.
> 
> Best regards,
> Guillermo.
>> 
>> 
>> On Feb 27, 2013, at 2:50 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> 
>> 
>>> Hello,
>>> 
>>> I already tried that, with no success.
>>> I have added my SGE cluster to the Bosco resources:
>>> 
>>> $ bosco_cluster -l
>>> gmarco@cacique/condor
>>> 
>>> 
>>> But the wrapper jobs stay held forever.
>>> 
>>> $ condor_q
>>> -- Submitter: brugal : <192.168.6.2:11000?sock=27234_2f33_3> : brugal
>>>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD                  
>>>   52.0   gmarco          2/27 02:52   0+00:00:00 I  0   0.0  bwa_glide.sh 52 0 
>>>   53.0   gmarco          2/27 02:52   0+00:00:00 H  0   0.0  glidein_wrapper.sh
>>>   54.0   gmarco          2/27 02:53   0+00:00:00 H  0   0.0  glidein_wrapper.sh
>>>   55.0   gmarco          2/27 02:53   0+00:00:00 H  0   0.0  glidein_wrapper.sh
>>>   56.0   gmarco          2/27 02:54   0+00:00:00 H  0   0.0  glidein_wrapper.sh
>>>   57.0   gmarco          2/27 02:54   0+00:00:00 H  0   0.0  glidein_wrapper.sh
>>> 
>>> 
>>> I guess the only way to submit a job to an SGE cluster through Bosco/Condor is with the grid universe.
>>> 
>>> Best regards,
>>> Guillermo.
>>> 
>>> On 02/26/2013 06:46 PM, Jaime Frey wrote:
>>> 
>>>> Bosco has an advanced submit option that automatically distributes your jobs between multiple clusters. One of its benefits is that you can stream the job's stdout and stderr back to your submit machine while the job executes.
>>>> 
>>>> On the Bosco installation page, check out section 5.2.2 Glidein Job submission example:
>>>> 
>>>> https://twiki.grid.iu.edu/bin/view/CampusGrids/BoscoInstall
>>>> 
>>>> 
>>>>  -- Jaime
>>>> 
>>>> On Feb 26, 2013, at 10:45 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>>> Hello,
>>>>> 
>>>>> I have added my cluster to the Bosco cluster resource list.
>>>>> I'm a bit confused: does this mean I won't be able to stream output with Bosco? As far as I can tell, the only way to submit jobs to SGE with Bosco is through the grid universe.
>>>>> 
>>>>> Best regards,
>>>>> Guillermo.
>>>>> 
>>>>> 
>>>>> On 02/26/2013 05:24 PM, Jaime Frey wrote:
>>>>> 
>>>>>> On Feb 26, 2013, at 9:54 AM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Tue, 26 Feb 2013, Guillermo Marco Puche wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Since my jobs take hours or even days to complete, it would be nice to have the output file stored in the condor directory updated while they run. At the moment, my output file is updated with standard output only when the job completes.
>>>>>>>> 
>>>>>>>> Is there any way to update the output file every X minutes or seconds? That way I could check on job execution without logging in to the SGE grid cluster.
>>>>>>>> 
>>>>>>>> 
>>>>>>> It sounds like streaming output should do what you want.  Take a look at the relevant section in the condor_submit man page:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> http://research.cs.wisc.edu/htcondor/manual/v7.9/condor_submit.html#81025
>>>>>> The streaming of output files isn't supported for grid universe jobs.
>>>>>> 
>>>>>> Streaming of output should work with Bosco's multi-cluster mode. The potential drawback there is that your home machine needs to be contactable from the execution machines and needs to stay on the network for the duration of the job's run.
>>>>>> 
>>>>>> 
>>>> Thanks and regards,
>>>> Jaime Frey
>>>> UW-Madison HTCondor Project
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to 
>>>> 
>>>> htcondor-users-request@xxxxxxxxxxx
>>>> 
>>>>  with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> 
>>>> 
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>> 
>>>> 
>>>> 
>>>> The archives can be found at:
>>>> 
>>>> 
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>> 
>> 
> 
