
Re: [HTCondor-users] Job Realtime output file



On 03/09/2013 10:08 PM, Guillermo Marco Puche wrote:
On 09/03/2013 18:02, Derek Weitzel wrote:
Hi Guillermo,

#$ -o output.out
#$ -e error.out
Bosco doesn't use the -o and -e directives.  As a matter of fact, it probably sets those to /dev/null.  Instead, it redirects the output and error of the process to a file in ~/bosco/sandbox.
I know those are SGE directives. From my point of view, if SGE handles the job, it should also be able to handle its own error and output logs.
I've already been looking for those output files on the remote machine in ~/bosco/sandbox, but I'm not really sure. I'll look on Monday as soon as I can and report back.
I just executed a job and found something very odd.

Below is the folder that Condor creates inside the sandbox, with the logs in it.
Is there any way to make Bosco generate a simpler directory structure based on the job name, instead of all those strange names?
I ask because "1f2a/1f2ad606/brugal_11000_brugal#168.0#1362972287" has nothing to do with the job name "bl_d260fb5a6a6b".
/home/mastablasta/bosco/sandbox/1f2a/1f2ad606/brugal_11000_brugal#168.0#1362972287
-rw-r--r-- 1 mastablasta users    0 Mar 11 12:14 bl_d260fb5a6a6b.e212
-rw-r--r-- 1 mastablasta users    0 Mar 11 12:14 bl_d260fb5a6a6b.o212
-rw-r--r-- 1 mastablasta users    0 Mar 11 12:14 bl_d260fb5a6a6b.pe212
-rw-r--r-- 1 mastablasta users    0 Mar 11 12:14 bl_d260fb5a6a6b.po212
-rwxr-xr-x 1 mastablasta users  287 Mar 11 12:14 condor_exec.exe
-rw-rw-r-- 1 mastablasta users 5643 Mar 11 12:24 _condor_stderr
-rw-rw-r-- 1 mastablasta users    0 Mar 11 12:14 _condor_stdout
drwxrwxr-x 2 mastablasta users 4096 Mar 11 12:14 home_bl_d260fb5a6a6b
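
Since the sandbox path is not derived from the job name, one workaround is to search the sandbox for files that carry the job name. The following is only a sketch: the helper name find_job_sandbox is hypothetical, and it assumes the SGE files are named <job name>.<suffix>, as in the listing above.

```shell
# Hypothetical helper: print the sandbox directory (or directories) that
# contain files named "<job name>.*".
#   $1: sandbox root, e.g. ~/bosco/sandbox
#   $2: job name, e.g. bl_d260fb5a6a6b
find_job_sandbox() {
    find "$1" -type f -name "$2.*" -exec dirname {} \; | sort -u
}
```

For the job above, `find_job_sandbox ~/bosco/sandbox bl_d260fb5a6a6b` should print the long directory shown in the listing.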


Also stdout is being redirected to _condor_stderr and not to _condor_stdout.

$ tail _condor_stderr 
[bwa_aln_core] calculate SA coordinate... 124.27 sec
[bwa_aln_core] write to the disk... 0.03 sec
[bwa_aln_core] 9175040 sequences have been processed.
[bwa_aln_core] calculate SA coordinate... 123.26 sec
[bwa_aln_core] write to the disk... 0.04 sec
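
A possible explanation, offered as an assumption and not confirmed anywhere in this thread: many command-line tools print progress messages on stderr and keep stdout for results, and the [bwa_aln_core] lines look like such progress messages. In that case they would end up in _condor_stderr even when redirection works correctly. The sketch below uses a made-up stand-in function, progress_demo, to show the effect.

```shell
# Stand-in for a tool that reports progress on stderr and results on stdout.
# progress_demo is hypothetical; it is not part of bwa, Bosco, or Condor.
progress_demo() {
    echo "[demo] processing..." >&2   # progress message goes to stderr
    echo "result-line"                # actual result goes to stdout
}

# Example: send each stream to its own file.
#   progress_demo > stdout.txt 2> stderr.txt
```

If the job script itself writes the results to a file, stdout would stay empty, which would match the zero-byte _condor_stdout above.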

Thank you.

Best regards,
Guillermo.
I'm not sure how SGE will handle setting the output and error twice, since Bosco already sets them.  Try removing these lines from your custom submit script parameters.

-Derek




On Mar 9, 2013, at 8:17 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:

Hello Derek,

I'm not at work during the weekend, but my real question is: why are the SGE error and stdout files not being generated?

I mean, after fixing the SGE configs inside Bosco I can select the queue, number of slots, etc., all the custom SGE job parameters. But even when I specify two different files:

#$ -o output.out
#$ -e error.out

Those are not being generated. I don't really need to stream the output, since the workdir for the jobs is on NFS and can also be accessed from the submit machine with Bosco.

I'll check the debug options on Sunday.

Thank you!

Best regards,
Guillermo.

On 08/03/2013 20:53, Derek Weitzel wrote:
Hi Guillermo,

This was working before, correct?  Do your normal SGE jobs work correctly?

The proper way to debug this is to raise the debugging level on the gridmanager and check the stderr in the GRIDMANAGER_LOG.
1) In the file ~/bosco/local.bosco/condor_config.local, add to the bottom: 
GRIDMANAGER_DEBUG = D_FULLDEBUG

2) Reconfigure Condor:
$ condor_reconfig

3) Release any held jobs:
$ condor_release mastablasta
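
The three steps above could be scripted roughly as follows. This is a sketch only: the function name enable_gridmanager_debug is made up here, and the config path and user name are the ones quoted in this message.

```shell
# Hypothetical helper: append the debug setting to a Condor config file,
# but only if the exact line is not already there (safe to run twice).
#   $1: config file, e.g. ~/bosco/local.bosco/condor_config.local
enable_gridmanager_debug() {
    grep -qxF 'GRIDMANAGER_DEBUG = D_FULLDEBUG' "$1" 2>/dev/null ||
        printf '%s\n' 'GRIDMANAGER_DEBUG = D_FULLDEBUG' >> "$1"
}

# Usage, following steps 1) to 3):
#   enable_gridmanager_debug ~/bosco/local.bosco/condor_config.local
#   condor_reconfig
#   condor_release mastablasta
```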



-Derek




On Mar 7, 2013, at 6:08 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:


Hello,

On 02/28/2013 07:34 AM, Derek Weitzel wrote:

Hi Guillermo,

Since your SGE direct submissions worked, something odd must be going on with the glidein submission.  This should work.

What's the output from `condor_q -hold` on the glidein_wrapper.sh jobs?

-Derek


condor_q -hold
-- Submitter: brugal : <192.168.6.2:11000?sock=2009_e522_3> : brugal
 ID      OWNER          HELD_SINCE  HOLD_REASON                                
 152.0   mastablasta     3/7  06:14 Attempts to submit failed                  
 153.0   mastablasta     3/7  06:15 Attempts to submit failed
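
When there are many held jobs, the IDs can be pulled out of this listing with a small filter. The sketch below assumes the column layout shown above; held_job_ids is a hypothetical name, not a Condor tool.

```shell
# Hypothetical filter: read `condor_q -hold` output on stdin and print the
# job IDs, i.e. first fields of the form <cluster>.<proc>.
held_job_ids() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { print $1 }'
}

# Usage:
#   condor_q -hold | held_job_ids
```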

I'm very interested in this because I need to stream stdout from my SGE jobs on the remote cluster, and with the grid universe I can't.

Best regards,
Guillermo.

On Feb 27, 2013, at 2:50 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:



Hello,

I already tried that, with no success.
I have my SGE cluster added to the Bosco resources:

$ bosco_cluster -l
gmarco@cacique/condor


But the wrapper jobs stay on hold forever.

$ condor_q
-- Submitter: brugal : <192.168.6.2:11000?sock=27234_2f33_3> : brugal
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD                  
  52.0   gmarco          2/27 02:52   0+00:00:00 I  0   0.0  bwa_glide.sh 52 0 
  53.0   gmarco          2/27 02:52   0+00:00:00 H  0   0.0  glidein_wrapper.sh
  54.0   gmarco          2/27 02:53   0+00:00:00 H  0   0.0  glidein_wrapper.sh
  55.0   gmarco          2/27 02:53   0+00:00:00 H  0   0.0  glidein_wrapper.sh
  56.0   gmarco          2/27 02:54   0+00:00:00 H  0   0.0  glidein_wrapper.sh
  57.0   gmarco          2/27 02:54   0+00:00:00 H  0   0.0  glidein_wrapper.sh


I guess the only way to submit a job to an SGE cluster through Bosco/Condor is with the grid universe.

Best regards,
Guillermo.

On 02/26/2013 06:46 PM, Jaime Frey wrote:


Bosco has an advanced submit option that automatically distributes your jobs between multiple clusters. One of its benefits is that you can stream the job's stdout and stderr back to your submit machine while the job executes.

On the Bosco installation page, check out section 5.2.2 Glidein Job submission example:


https://twiki.grid.iu.edu/bin/view/CampusGrids/BoscoInstall



 -- Jaime

On Feb 26, 2013, at 10:45 AM, Guillermo Marco Puche <guillermo.marco@xxxxxxxxxxxxxxxxxxxxx> wrote:



Hello,

I have my cluster added to the Bosco cluster resource list.
I'm a bit confused: so I won't be able to use streamed output with Bosco? The only way to submit jobs to SGE with Bosco is through the grid universe.

Best regards,
Guillermo.


On 02/26/2013 05:24 PM, Jaime Frey wrote:


On Feb 26, 2013, at 9:54 AM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:




On Tue, 26 Feb 2013, Guillermo Marco Puche wrote:




Since my jobs take hours or even days to complete, it would be nice to update the output file stored in the condor directory while they run. At the moment my output file is updated with standard output only when the job has completed.

Is there any way to update the output file, say, every X minutes or seconds? That way I could check job execution without logging in to the SGE cluster.



It sounds like streaming output should do what you want.  Take a look at the relevant section in the condor_submit man page:




http://research.cs.wisc.edu/htcondor/manual/v7.9/condor_submit.html#81025
The streaming of output files isn't supported for grid universe jobs.

Streaming of output should work with Bosco's multi-cluster mode. The potential drawback there is that your home machine needs to be contactable from the execution machines and needs to stay on the network for the duration of the job's run.
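
For reference, streaming is requested in the submit description file with the stream_output and stream_error commands. The sketch below writes a minimal submit file from the shell. The helper name write_streaming_submit, the file names job.out, job.err and job.log, and the choice of the vanilla universe are assumptions made for illustration, not settings confirmed in this thread.

```shell
# Hypothetical helper: write a submit description file that asks Condor to
# stream stdout and stderr back while the job runs.
#   $1: path of the submit file to create
#   $2: executable to run
write_streaming_submit() {
    cat > "$1" <<EOF
universe = vanilla
executable = $2
output = job.out
error = job.err
log = job.log
stream_output = True
stream_error = True
queue
EOF
}

# Usage:
#   write_streaming_submit job.sub bwa_glide.sh
#   condor_submit job.sub
```

As noted above, these two settings have no effect for grid universe jobs.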



Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/