K-Means-OpenMPI

This repository contains the code for the final parallel programming project of the Advanced Computer Architecture course (AY 2021/22).

Execution Instructions

1 - Set Up the Git Repository

If you have not already done so, download and install Git from git-scm.com. Then clone the repository with the following command:

$ git clone https://github.com/EdoardoVacchini01/K-Means-OpenMPI.git

2 - Install Open MPI

Download Open MPI from the official Open MPI website and install it by following the official guide. The project was developed with Open MPI 4.0.5, but any compatible version should work. This step is only required if you want to run the parallel implementation.

Please note that, as stated on the Open MPI website, the Microsoft Windows version of Open MPI has been discontinued due to lack of interest.

3 - Generate the Dataset

If you do not have a dataset to run the K-Means clustering algorithm on, you can generate sample datasets of your choice with the Python script generate_dataset.py (this requires Python to be installed on your computer). For a detailed description of the script's parameters, run:

$ ./generate_dataset.py --help

Please note that if you intend to run the application on your own dataset, the file must meet two requirements:

the first line of the file must contain the number of data points in the dataset;
each subsequent line must represent a single data point, with its coordinates separated by a single space.
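As an illustration of the two requirements above, the following C sketch reads a dataset in this format. It assumes (neither point is stated in this README) that every data point has the same number of coordinates, inferred from the first data point, and that each line fits in 4 KiB. The Dataset type and read_dataset function are hypothetical names, not the repository's own code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory dataset: point i, coordinate d -> coords[i * nDims + d]. */
typedef struct {
    long nPoints;    /* value read from the first line */
    int nDims;       /* inferred from the first data point (assumption) */
    double *coords;  /* row-major coordinates; caller frees after a successful load */
} Dataset;

/* Counts whitespace-separated fields in a line. */
static int count_fields(const char *line) {
    int n = 0, inField = 0;
    for (; *line != '\0'; line++) {
        if (*line == ' ' || *line == '\t' || *line == '\n' || *line == '\r') {
            inField = 0;
        } else if (!inField) {
            inField = 1;
            n++;
        }
    }
    return n;
}

/* Reads a dataset from fp. Returns 0 on success, -1 on malformed input. */
int read_dataset(FILE *fp, Dataset *ds) {
    char line[4096]; /* assumption: one data line fits in 4 KiB */

    ds->coords = NULL;
    /* Requirement 1: the first line holds the number of points. */
    if (fscanf(fp, "%ld", &ds->nPoints) != 1 || ds->nPoints <= 0) return -1;
    if (fgets(line, sizeof line, fp) == NULL) return -1; /* consume the rest of line 1 */

    /* Requirement 2: one point per line; count coordinates on the first one. */
    if (fgets(line, sizeof line, fp) == NULL) return -1;
    ds->nDims = count_fields(line);
    if (ds->nDims == 0) return -1;

    ds->coords = malloc((size_t)ds->nPoints * ds->nDims * sizeof *ds->coords);
    if (ds->coords == NULL) return -1;

    /* Parse the first point, which is already in the buffer. */
    char *p = line;
    for (int d = 0; d < ds->nDims; d++) ds->coords[d] = strtod(p, &p);

    /* Remaining points: fscanf treats newlines as ordinary whitespace,
       so line boundaries are not re-checked here. */
    for (long i = 1; i < ds->nPoints; i++) {
        for (int d = 0; d < ds->nDims; d++) {
            if (fscanf(fp, "%lf", &ds->coords[i * ds->nDims + d]) != 1) {
                free(ds->coords);
                ds->coords = NULL;
                return -1;
            }
        }
    }
    return 0;
}
```

For example, a file whose first line is 3, followed by the lines "1.0 2.0", "3.5 4.0" and "-1 0", loads as three two-dimensional points.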

4 - Compile and Run the Program

To compile and run the serial application, run the following commands:

$ gcc -Wall *.c -o executableFile
$ ./executableFile [datasetFile] [outputFile] [nClusters] [maxIterations]

If you want to compile and run the parallel application instead, first make sure to switch to the parallel branch, then compile and run the Open MPI program:

$ git checkout parallel
$ mpicc -Wall *.c -o executableFile
$ mpirun -n N executableFile [datasetFile] [outputFile] [nClusters] [maxIterations]

In the commands above, N is the number of processes to launch (for example, one per available core), executableFile is the name you want to give to the executable, datasetFile is the file containing the dataset, outputFile is the file the results are written to, nClusters is the number of clusters you want the algorithm to find, and maxIterations is the maximum number of iterations the algorithm will perform.
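This README does not describe the clustering algorithm itself. To illustrate how nClusters and maxIterations typically drive K-Means, here is a minimal C sketch of the standard Lloyd iteration; it is not necessarily how this repository implements it. The function name kmeans, the row-major point layout and the caller-provided initial centroids are all assumptions.

```c
#include <float.h>
#include <stdlib.h>

/* Squared Euclidean distance between two points with nDims coordinates. */
static double sq_dist(const double *a, const double *b, int nDims) {
    double s = 0.0;
    for (int d = 0; d < nDims; d++) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/*
 * Hypothetical sketch of Lloyd's K-Means (not the repository's code).
 * points:    nPoints * nDims values, row-major.
 * centroids: nClusters * nDims values, initialised by the caller, updated in place.
 * labels:    output, one cluster index per point.
 * Returns the number of rounds in which at least one label changed
 * (at most maxIterations), or -1 if memory allocation fails.
 */
int kmeans(const double *points, long nPoints, int nDims,
           double *centroids, int nClusters, int *labels, int maxIterations) {
    double *sums = malloc((size_t)nClusters * nDims * sizeof *sums);
    long *counts = malloc((size_t)nClusters * sizeof *counts);
    if (sums == NULL || counts == NULL) {
        free(sums);
        free(counts);
        return -1;
    }
    for (long i = 0; i < nPoints; i++) labels[i] = -1;

    int iter;
    for (iter = 0; iter < maxIterations; iter++) {
        /* Assignment step: attach every point to its nearest centroid. */
        int changed = 0;
        for (long i = 0; i < nPoints; i++) {
            int best = 0;
            double bestDist = DBL_MAX;
            for (int k = 0; k < nClusters; k++) {
                double dist = sq_dist(&points[i * nDims], &centroids[k * nDims], nDims);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = 1;
            }
        }
        if (!changed) break; /* converged: no point switched cluster */

        /* Update step: move each centroid to the mean of its points. */
        for (int k = 0; k < nClusters; k++) counts[k] = 0;
        for (long j = 0; j < (long)nClusters * nDims; j++) sums[j] = 0.0;
        for (long i = 0; i < nPoints; i++) {
            counts[labels[i]]++;
            for (int d = 0; d < nDims; d++)
                sums[labels[i] * nDims + d] += points[i * nDims + d];
        }
        for (int k = 0; k < nClusters; k++) {
            if (counts[k] == 0) continue; /* empty cluster keeps its old centroid */
            for (int d = 0; d < nDims; d++)
                centroids[k * nDims + d] = sums[k * nDims + d] / counts[k];
        }
    }
    free(sums);
    free(counts);
    return iter;
}
```

In this sketch the loop stops either when no point changes cluster or after maxIterations rounds, which matches the role this README gives to maxIterations.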

All parameters are optional: a default value is used for any parameter that is not provided. However, the parameters are positional, so they must appear in the order shown above; to set a later parameter (e.g. maxIterations), you must also supply every parameter that precedes it. If not set, nClusters defaults to 3 and maxIterations defaults to 100.
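Because the parameters are positional with defaults, argument handling could look like the following minimal C sketch. The Params struct and the parse_params function are hypothetical; the defaults for nClusters (3) and maxIterations (100) come from this README, while the default file names are placeholders, since the README does not state them.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical holder for the four positional command-line parameters. */
typedef struct {
    const char *datasetFile;
    const char *outputFile;
    int nClusters;
    int maxIterations;
} Params;

/* Parses a strictly positive int; returns 0 on success, -1 on invalid input. */
static int parse_positive_int(const char *s, int *out) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0 || v > INT_MAX) return -1;
    *out = (int)v;
    return 0;
}

/* Fills p from argv; missing trailing parameters keep their defaults.
   Returns 0 on success, -1 if a numeric parameter is invalid. */
int parse_params(int argc, char **argv, Params *p) {
    p->datasetFile = "dataset.txt";  /* placeholder default (assumption) */
    p->outputFile = "output.txt";    /* placeholder default (assumption) */
    p->nClusters = 3;                /* default stated in the README */
    p->maxIterations = 100;          /* default stated in the README */

    /* Positional: argv[1..4] = datasetFile outputFile nClusters maxIterations */
    if (argc > 1) p->datasetFile = argv[1];
    if (argc > 2) p->outputFile = argv[2];
    if (argc > 3 && parse_positive_int(argv[3], &p->nClusters) != 0) return -1;
    if (argc > 4 && parse_positive_int(argv[4], &p->maxIterations) != 0) return -1;
    return 0;
}
```

With this scheme, passing only datasetFile, outputFile and nClusters keeps maxIterations at 100, whereas maxIterations cannot be set without also passing the three parameters before it.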

When the program finishes, outputFile contains the list of centroids followed by the cluster identifier assigned to each data point, listed in the same order as the points appear in the dataset file.
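This README does not specify the exact layout of outputFile. The C sketch below writes one plausible version of it, assuming one centroid per line with space-separated coordinates, followed by one cluster identifier per line. write_output is a hypothetical name, and the repository's actual format may differ.

```c
#include <stdio.h>

/* Writes centroids (one per line, space-separated coordinates) followed by
   one cluster identifier per line. The layout is an assumption.
   "%g" prints up to 6 significant digits; use "%.17g" for exact round-tripping.
   Returns 0 on success, -1 on a write error. */
int write_output(FILE *fp, const double *centroids, int nClusters, int nDims,
                 const int *labels, long nPoints) {
    for (int k = 0; k < nClusters; k++) {
        for (int d = 0; d < nDims; d++) {
            if (d > 0 && fputc(' ', fp) == EOF) return -1;
            if (fprintf(fp, "%g", centroids[k * nDims + d]) < 0) return -1;
        }
        if (fputc('\n', fp) == EOF) return -1;
    }
    /* Labels follow the order of the points in the dataset file. */
    for (long i = 0; i < nPoints; i++) {
        if (fprintf(fp, "%d\n", labels[i]) < 0) return -1;
    }
    return 0;
}
```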
