
Re: [HTCondor-users] Fwd: MPI jobs are not writing.



Hi Malathi,


Two ideas to try:
1. Change "should_transfer_files" to "no" since it appears that you do not need to use file transfer.
2. Try adding "transfer_executable = no" while still keeping "should_transfer_files = yes".
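For reference, applied to your submit file the two options would look roughly like this (only one of them at a time; just a sketch):

# Option 1: rely on the shared filesystem, no HTCondor file transfer
should_transfer_files = no

# Option 2: keep file transfer on, but do not transfer the executable
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_executable = no

Also keep in mind that with should_transfer_files = YES the job runs in a scratch directory on the execute node, so any file your program opens with a relative path is created there rather than in your /home; that would explain why the output only shows up when you use absolute paths.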

Jason Patton

On Sun, May 27, 2018 at 10:58 PM, Malathi Deenadayalan <malathi@xxxxxxxx> wrote:



From: "Malathi Deenadayalan" <malathi@xxxxxxxx>
To: "htcondor-admin" <htcondor-admin@xxxxxxxxxxx>
Sent: Friday, May 25, 2018 9:55:48 AM
Subject: MPI jobs are not writing.

Hi,

I am using the parallel universe to submit MPI jobs, and I am doing some benchmarking of I/O performance. If I submit with 10 or 100 cores and use absolute paths, the program runs fine.

The problems are:

1) When I don't use an absolute path, the program does not write into my /home.

2) If I increase the core count to 600, the job goes idle, starts running, then goes idle again, looping like that without ever actually running.

3) What could be the reason?

The program and submit file are attached below.

Program 1.

#include <mpi.h>
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#define MB 1048576

/* Return the current time as a string (used to time-stamp log lines). */
char *get_current_time()
{
    time_t current_time;
    char *ct;

    current_time = time(NULL);
    ct = ctime(&current_time);
    ct[strlen(ct) - 1] = '\0';   /* strip the trailing newline */
    return ct;
}

int main(int argc, char **argv)
{
    int size, rank, name_len;
    char host_name[MPI_MAX_PROCESSOR_NAME];
    char file_names[500];
    char cwd[500];
    char *buffer;
    FILE *fd;
    long buffer_size = MB;
    size_t writen_rec_len;
    clock_t begin, end;

    begin = clock();

    /* Optional argument: buffer size in MB (default 1 MB) */
    if (argc == 2)
    {
        buffer_size *= atoi(argv[1]);
    }

    /* Initialize the MPI environment */
    MPI_Init(NULL, NULL);
    /* Get the number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Get the rank of this process */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Get the host name (only used in the log messages) */
    MPI_Get_processor_name(host_name, &name_len);

    if (getcwd(cwd, sizeof(cwd)) != NULL)
        printf("%s:Current working dir: %s\n", get_current_time(), cwd);
    printf("%s:[process=%d]Buffer size in MB=%ld\n", get_current_time(), rank, buffer_size / MB);

    buffer = (char *)malloc(buffer_size);
    if (buffer == NULL)
    {
        printf("%s:[rank=%d] **** MALLOC FAILED ****\n", get_current_time(), rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    memset(buffer, 'a', buffer_size);

    /* Each rank writes its own file, named after the host and the rank */
    sprintf(file_names, "%s_%d_created_file.dat", host_name, rank);
    fd = fopen(file_names, "wb");
    if (fd == NULL)
    {
        printf("%s: %s, ***FILE NOT OPENED ****\n", get_current_time(), file_names);
    }
    else
    {
        printf("%s: %s, is created\n", get_current_time(), file_names);
        writen_rec_len = fwrite(buffer, buffer_size, 1, fd);
        if (writen_rec_len != 1)
        {
            printf("%s:[rank=%d] **** WRITE FAILED ****\n", get_current_time(), rank);
        }
        else
        {
            printf("%s:[rank=%d] written %zu record(s) of size %ld\n", get_current_time(), rank, writen_rec_len, buffer_size);
        }
        fclose(fd);
    }

    /* Finalize the MPI environment */
    MPI_Finalize();

    end = clock();
    printf("%s:%s,rank %d out of %d processors, elapsed CPU time in clock ticks =%ld\n",
           get_current_time(), host_name, rank, size, (long)(end - begin));
    free(buffer);
    return 0;
}
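
The program is built with the OpenMPI wrapper compiler, for example:

mpicc -o data_write data_write.c

(assuming the source is saved as data_write.c); the resulting data_write binary is what gets passed to openmpiscript in the submit file below.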

=======================================================================================================================
submit file:
########################################
## This is for the openmpi script to work
#######################################
JOBNAME = data_write
universe = parallel
machine_count = 50
buffer_size= 124
executable = ~/parallel_IO/openmpiscript
arguments = ~/parallel_IO/mpi_code/data_write $(buffer_size)
#transfer_input_files = test_prog,condor_ssh,sshd.sh
#request_cpus = 1
getenv = true
#should_transfer_files = IF_NEEDED
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = $(JOBNAME).out
error = $(JOBNAME).err
log    = $(JOBNAME).log

queue

==============================================================================

I have 23 nodes with 32 cores (2 processors) each, and this is the output of:

[malathi.d@cn101:] $ condor_status -af:h Name DedicatedScheduler
Name         DedicatedScheduler
slot1@cn101  DedicatedScheduler@cn101
slot2@cn101  DedicatedScheduler@cn101
slot3@cn101  DedicatedScheduler@cn101
slot4@cn101  DedicatedScheduler@cn101
slot5@cn101  DedicatedScheduler@cn101

slot1@cn102  DedicatedScheduler@cn101
slot2@cn102  DedicatedScheduler@cn101
slot3@cn102  DedicatedScheduler@cn101
slot4@cn102  DedicatedScheduler@cn101
slot5@cn102  DedicatedScheduler@cn101
slot6@cn102  DedicatedScheduler@cn101
slot7@cn102  DedicatedScheduler@cn101
slot8@cn102  DedicatedScheduler@cn101

slot1@cn103  DedicatedScheduler@cn101
slot2@cn103  DedicatedScheduler@cn101
slot3@cn103  DedicatedScheduler@cn101
slot4@cn103  DedicatedScheduler@cn101
slot5@cn103  DedicatedScheduler@cn101
slot6@cn103  DedicatedScheduler@cn101
slot7@cn103  DedicatedScheduler@cn101

slot1@cn104  DedicatedScheduler@cn101
slot2@cn104  DedicatedScheduler@cn101
slot3@cn104  DedicatedScheduler@cn101
slot4@cn104  DedicatedScheduler@cn101
slot5@cn104  DedicatedScheduler@cn101

slot32@cn104 DedicatedScheduler@cn101
slot1@cn105  DedicatedScheduler@cn101
slot2@cn105  DedicatedScheduler@cn101
slot3@cn105  DedicatedScheduler@cn101
slot4@cn105  DedicatedScheduler@cn101
slot5@cn105  DedicatedScheduler@cn101


Can you help me?

I also want to do performance testing for NFS and GPFS; please kindly advise.

Regards,
Malathi.


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/