
Re: [Condor-users] condor_ssh_to_job



While googling for condor_ssh_to_job, I found an interesting example on this page: https://twiki.grid.iu.edu/bin/view/Engagement/HtmlVersion (see section 9, "Appendix: Monitoring a running job"). The example shows two interesting commands: glidein_ls and glidein_interactive. This is very cool, but as far as I can tell from a quick read, they are part of the glideinWMS project. Is there anything like this in Condor? I guess I could look at the command files (which are Python-based) to understand how this works in glideinWMS and maybe try to convert them. But if someone has a different idea, please be my guest :-)

I have the feeling Condor already has this; I just don't know how yet :-)

Sassy

 

On Mon, Jul 18, 2011 at 8:10 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
Hi,

I'm running Condor on Linux, with a total of 200 slots in my pool.

From time to time, my users would like to interact with a job while it is running.
For example, if they look at the job's output file (stdout) and see an error, they would like to ssh to the job and make some changes to the future input files (in the execute dir).
I managed to ssh to the job, and I even get a welcome screen that points me to the slot the job is running on.
I also get the PID of the process, but I don't know how to attach to the process.

If the executable in my job.sub is a Perl script that takes different args and calls different tools (like matlab, gcc, etc.), how can I get into a mode that looks as if I ran the command from my console, where I can see the stdout tail on screen and press CTRL+C to terminate the job, the same as I do in a non-Condor environment?
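To make it concrete, the behaviour I am after looks roughly like the Python sketch below. It is only an illustration of the idea, not something Condor provides: `follow` and `watch` are names I made up, and the stdout path and PID are placeholders for whatever the ssh session to the job shows. It also assumes the job's stdout is readable as a file on the machine where the script runs.

```python
import os
import signal
import sys
import time


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield lines appended to `path`, similar to `tail -f`.

    max_idle_polls only exists so the sketch can stop on its own;
    None means keep following forever.
    """
    idle = 0
    with open(path, "r") as handle:
        handle.seek(0, os.SEEK_END)  # start at the current end of the file
        while max_idle_polls is None or idle < max_idle_polls:
            line = handle.readline()
            if line:
                idle = 0
                yield line
            else:
                idle += 1
                time.sleep(poll_interval)


def watch(path, pid):
    """Print new output until CTRL+C, then forward SIGINT to the job."""
    try:
        for line in follow(path):
            sys.stdout.write(line)
            sys.stdout.flush()
    except KeyboardInterrupt:
        # Same signal a console sends to the foreground process on CTRL+C.
        os.kill(pid, signal.SIGINT)
```

Calling `watch(stdout_path, job_pid)` with the placeholder values filled in would give me the "tail on screen plus CTRL+C" mode described above, as long as I am allowed to signal that PID.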

The thing is that if one of the tools hits an error, it drops into its own shell, as matlab does for example, where I can provide or change some parameters and resume the run. Under Condor, however, the tool just drops into its shell and I cannot attach to it. From Condor's perspective the job is running, but in fact it is idle, waiting for input on that shell (matlab in my case, but there are other tools as well).
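The symptom can be reproduced outside Condor with a small Python sketch. The child process here is only a stand-in I wrote for a tool like matlab (it is not matlab itself), and the parameter string is made up: the child prints an error, then blocks reading from stdin, so it still counts as running while doing no work.

```python
import subprocess
import sys
import time

# Stand-in for a tool that drops into its own prompt on error:
# it prints a message and then blocks reading one line from stdin.
CHILD_CODE = (
    "import sys\n"
    "print('error: dropping to prompt', flush=True)\n"
    "answer = sys.stdin.readline().strip()\n"
    "print('resuming with', answer, flush=True)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

time.sleep(1.0)
# The process is still alive (poll() returns None) but it is idle.
still_running = proc.poll() is None

# Only once something arrives on its stdin does it continue and exit.
output, _ = proc.communicate("new_param=42\n", timeout=10)
exit_code = proc.returncode

print("still running while waiting:", still_running)
print("exit code after input:", exit_code)
print(output, end="")
```

In my jobs nothing ever writes to the tool's stdin, so the job stays in the "still running" state indefinitely.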

I tried to use gdb, but that seems to have stalled my job. The minute I did that, the job log file stopped being updated. Until then it had printed a lot of info (I use the stream option), but once I used gdb there was no more activity on the running machine.
I know the job gets into shell mode because an error occurred. If there is no error the job completes successfully, but my users would really like to debug the job when it gets into this mode, without having to rerun it from the beginning or outside Condor.

Can someone please provide an example or some feedback?


Thanks
Sassy