
Re: [Condor-users] GAHP and proxy

Barnett P. Chiu wrote:
> Hi, Jan,
>
> Thanks for the reply.
> I ran strace -f on the condor_submit command, and condor_gridmanager seemed to have opened the right proxy, as shown on the first line below:

I suggest that you strace the condor_gridmanager process (forked by condor_schedd, I think) rather than condor_submit, which just communicates with condor_schedd. That additional strace may reveal failed system calls related to directory/file access. You can attach to the running condor_schedd process with strace -f -p <pid>, so that the processes it forks are traced as well. You can also replace the condor_gridmanager executable and/or the java binary (the path of which is configurable in Condor) with a little shell script wrapper which does something like 'exec strace -f -o /tmp/strace.log <real binary> "$@"'.

I suspect that your problems are related to incorrect access permissions on some directory (or on one of its parent directories). If you see EACCES in the strace output, that would be a big clue. Your remark that it works with one user but not another also supports this hypothesis.

> Looking at the GridmanagerLog again, I do see a return code 7, which indicates a failure to activate a Globus module while searching for the CA?

The bug report I previously mentioned had something about CA search, but I don't know precisely what this message means.

> Any way to intercept the proxy in the scratch directory before it disappears?

You could suspend the condor_gridmanager process with "gdb -p <pid>" ("detach", "quit" to make it continue), but it may be difficult to hit the sweet spot or step through if your Condor doesn't have debugging symbols.

If you're on Linux, you could also override certain library calls with LD_PRELOAD (that would require the wrapper script mentioned above) to add an artificial delay. The difficulty is then to avoid delaying the wrong calls too much. For example:

/*
 * Intercepts calls to close/fclose.
 * Compile with:
 *   gcc -fPIC -c -o hijack.o hijack.c
 *   gcc -shared -o hijack.so hijack.o -ldl
 * Usage: env LD_PRELOAD=./hijack.so <some program>
 */
#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

/* Pointers to the real libc functions, resolved on first use. */
static int (*_fclose)(FILE *f);
static int (*_close)(int fd);
static FILE *fl;    /* log file, opened lazily */

int close(int fd)
{
    if (_close == NULL)
        _close = (int (*)(int fd)) dlsym(RTLD_NEXT, "close");

    if (!fl) { fl = fopen("/tmp/hijack.log", "a"); }
    /* skip logging if the log file could not be opened */
    if (fl) { fprintf(fl, "close(%d)\n", fd); fflush(fl); }

    return _close(fd);
}

int fclose(FILE *f)
{
    if (_fclose == NULL)
        _fclose = (int (*)(FILE *f)) dlsym(RTLD_NEXT, "fclose");

    if (!fl) { fl = fopen("/tmp/hijack.log", "a"); }
    if (fl) { fprintf(fl, "fclose(%d)\n", fileno(f)); fflush(fl); }
// uncomment to delay 10 seconds here...
//    sleep(10);

    return _fclose(f);
}

Jan Ploski