
[Condor-users] Error running jobs on heterogeneous cluster



I have a test cluster with one Debian node, one Kubuntu node and one
Fedora node, and I get different errors on the different nodes. I guess
I need a local executable on every node, compiled for that specific
distro? Is there some kind of requirement I can state in the submit
file to specify which distro the executable needs in order to run? Is
there a way to send along the libraries my executable needs, or do they
have to be at the same path on each node? Can I keep them on NFS? I
guess I would then need to compile with the NFS paths to the library
files in the Makefile?

# Submit file
universe                = vanilla
executable              = dagoc
output                  = dagoc.out.$(CLUSTER).$(PROCESS)
error                   = dagoc.err.$(CLUSTER).$(PROCESS)
log                     = dagoc.log.$(CLUSTER)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = /mnt/dagocproject/dbases/TEST.db
arguments               = -c -start=10 -stop=20 /mnt/dagocproject/setups/TEST_remote.sup
queue 5


What does the following error mean?
dagoc.err.102.0 and dagoc.err.102.4
------------------------------------------------------------------------------------
condor_exec.exe: symbol lookup error: condor_exec.exe: undefined
symbol: _ZSt22__uninitialized_copy_aIN9__gnu_cxx17__normal_iteratorIPKSsSt6vectorISsSaISsEEEEPSsSsET0_T_SA_S9_SaIT1_E


For the next error, I assume I need to compile the executable on the
node that reported it:
dagoc.err.102.1
------------------------------------------------------------------------------------
condor_exec.exe: /lib/tls/i686/cmov/libc.so.6: version `GLIBC_2.4' not
found (required by condor_exec.exe)
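
For reference, a message of this form generally means the binary was
linked against a C library that provides the symbol version GLIBC_2.4,
while the libc on the execute node is older. The following is only a
rough Python sketch for comparing the two; the helper names
required_glibc and local_glibc are made up for illustration, and it
assumes Python is available on the node being checked.

```python
import platform
import re


def required_glibc(stderr_text):
    """Return the highest GLIBC_x.y version named in a loader error
    message as a tuple of ints, or None if none is mentioned."""
    versions = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", stderr_text)
    return max(
        (tuple(int(part) for part in v.split(".")) for v in versions),
        default=None,
    )


def local_glibc():
    """Return the glibc version of the running interpreter's platform
    as a tuple of ints, or None if it cannot be determined."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return tuple(int(part) for part in re.findall(r"\d+", version))


def node_is_too_old(stderr_text):
    """True if the error names a glibc newer than the local one.
    Returns None when either version is unknown."""
    needed = required_glibc(stderr_text)
    have = local_glibc()
    if needed is None or have is None:
        return None
    return have < needed
```

Running node_is_too_old() on the contents of dagoc.err.102.1 on each
node would show which nodes cannot run a binary built this way.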

dagoc.log.11:
------------------------------------------------------------------------------------
000 (102.000.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
...
000 (102.001.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
...
000 (102.002.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
...
000 (102.003.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
...
000 (102.004.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
...
001 (102.000.000) 10/24 09:37:30 Job executing on host: <xxx.251>
...
005 (102.000.000) 10/24 09:37:32 Job terminated.
        (1) Normal termination (return value 127)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        819  -  Run Bytes Sent By Job
        23796974  -  Run Bytes Received By Job
        819  -  Total Bytes Sent By Job
        23796974  -  Total Bytes Received By Job
...
001 (102.001.000) 10/24 09:37:32 Job executing on host: <xxx.245>
...
005 (102.001.000) 10/24 09:37:32 Job terminated.
        (1) Normal termination (return value 1)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        107  -  Run Bytes Sent By Job
        23796974  -  Run Bytes Received By Job
        107  -  Total Bytes Sent By Job
        23796974  -  Total Bytes Received By Job
...
001 (102.002.000) 10/24 09:37:32 Job executing on host: <xxx.247>
...
001 (102.003.000) 10/24 09:37:34 Job executing on host: <xxx.247>
...
001 (102.004.000) 10/24 09:37:39 Job executing on host: <xxx.251>
...
005 (102.004.000) 10/24 09:37:39 Job terminated.
        (1) Normal termination (return value 127)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        819  -  Run Bytes Sent By Job
        23796974  -  Run Bytes Received By Job
        819  -  Total Bytes Sent By Job
        23796974  -  Total Bytes Received By Job
...
005 (102.002.000) 10/24 09:37:46 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:07, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:07, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        4536073  -  Run Bytes Sent By Job
        23796974  -  Run Bytes Received By Job
        4536073  -  Total Bytes Sent By Job
        23796974  -  Total Bytes Received By Job
...
005 (102.003.000) 10/24 09:37:48 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:07, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:07, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        4536071  -  Run Bytes Sent By Job
        23796974  -  Run Bytes Received By Job
        4536071  -  Total Bytes Sent By Job
        23796974  -  Total Bytes Received By Job
...
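
To see more easily which host produced which return value, the log
above can be summarized per job. This is a minimal Python sketch that
assumes the log has exactly the layout shown here (a three-digit event
code, the job id in parentheses, and "..." between events); the
function name summarize is made up for illustration.

```python
import re

# Event header, e.g. "001 (102.000.000) 10/24 09:37:30 Job executing on host: <xxx.251>"
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) \S+ \S+ (.*)$")
HOST_RE = re.compile(r"Job executing on host: <([^>]*)>")
RETURN_RE = re.compile(r"\(return value (\d+)\)")


def summarize(log_text):
    """Map (cluster, process) to a dict holding the execute host and
    the return value, for the events found in log_text."""
    jobs = {}
    terminating = None  # job whose 005 (terminated) event is being read
    for line in log_text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            code, cluster, proc, rest = event.groups()
            key = (int(cluster), int(proc))
            host = HOST_RE.search(rest)
            if code == "001" and host:
                jobs.setdefault(key, {})["host"] = host.group(1)
            terminating = key if code == "005" else None
            continue
        if line.strip() == "...":
            terminating = None  # end of the current event
            continue
        value = RETURN_RE.search(line)
        if value and terminating is not None:
            jobs.setdefault(terminating, {})["return_value"] = int(value.group(1))
    return jobs
```

Applied to the log above, this should pair the two jobs that ran on
xxx.251 with return value 127, the job on xxx.245 with return value 1,
and the two jobs on xxx.247 with return value 0.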