
[Condor-users] Am i doing right?



Hi
I just wanted to know whether I am doing this right.
The program, job script, log, and outputs are below.
Please comment.
Samir
 
PS: I only see co1ndout.0 as an output file. Shouldn't there also be something like co1ndout.1, co1ndout.2, and so on?
 
 
 
 
The program:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

#include <mpi.h>

int main (int argc, char *argv[]) {
        int myrank, size;       /* queried below but not printed */
        char HOST[256];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Zero the buffer so the name stays NUL-terminated even if truncated */
        memset(HOST, 0, sizeof(HOST));
        gethostname(HOST, sizeof(HOST) - 1);

        /* Each rank prints the name of the host it is running on */
        printf("%s \n", HOST);

        MPI_Finalize();
        return 0;
}
 
Job script:
 
universe                = parallel
#initialdir             = /home/skhanal/
executable              = /home/skhanal/mp1script
arguments               = /home/skhanal/cpi
machine_count           = 6
should_transfer_files   = yes
when_to_transfer_output = on_exit
transfer_input_files    = /home/skhanal/cpi
output                  = co1ndout.$(NODE)
error                   = co1nderr.$(NODE)
log                     = condor.log

queue

Content of co1ndout.0:
 
running /home/skhanal/cpi on 6 LINUX ch_p4 processors
Created /var/opt/condor/execute/dir_5207/PIazvckb5389
compute-0-1.local
compute-0-2.local
compute-0-3.local
compute-0-0.local
compute-0-4.local
compute-0-7.local
 
Content of co1nderr.0:
(empty)
 
 
Content of condor.log:
 
000 (082.000.000) 03/20 18:21:35 Job submitted from host: <129.1.64.210:32773>
...
014 (082.000.000) 03/20 18:23:41 Node 0 executing on host: <10.255.255.247:32785>
...
014 (082.000.001) 03/20 18:23:41 Node 1 executing on host: <10.255.255.254:32785>
...
014 (082.000.002) 03/20 18:23:41 Node 2 executing on host: <10.255.255.250:32785>
...
014 (082.000.003) 03/20 18:23:41 Node 3 executing on host: <10.255.255.253:32785>
...
014 (082.000.004) 03/20 18:23:41 Node 4 executing on host: <10.255.255.252:32785>
...
014 (082.000.005) 03/20 18:23:42 Node 5 executing on host: <10.255.255.251:32785>
...
001 (082.000.000) 03/20 18:23:42 Job executing on host: MPI_job
...
015 (082.000.000) 03/20 18:23:47 Node 0 terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        1357  -  Run Bytes Sent By Node
        333145  -  Run Bytes Received By Node
        1357  -  Total Bytes Sent By Node
        333145  -  Total Bytes Received By Node
...
005 (082.000.000) 03/20 18:23:47 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        1357  -  Run Bytes Sent By Job
        1998870  -  Run Bytes Received By Job
        1357  -  Total Bytes Sent By Job
        1998870  -  Total Bytes Received By Job