K-Means-OpenMPI

This repository contains the code for the final parallel programming project of the Advanced Computer Architecture course (AY 2021/22).

Execution Instructions

1 - Set Up the Git Repository

If not already done, download and install git from git-scm.com. Then clone the repository with the following command:

$ git clone https://github.com/EdoardoVacchini01/K-Means-OpenMPI.git

2 - Install Open MPI

Download Open MPI 4.0.5 from the official Open MPI website and install it following the official guide. The project was developed with Open MPI 4.0.5, but you can download any compatible version of your choice. If you are not interested in the parallel implementation, you can skip this step.

Please note that, as stated on the Open MPI website, the Microsoft Windows version of Open MPI has been discontinued due to a lack of interest.

3 - Generate the Dataset

If you do not have a dataset to run the K-Means clustering algorithm on, and Python is installed on your computer, you can run the Python script generate_dataset.py to generate sample datasets of your choice. For a detailed description of the script's parameters, run:

$ ./generate_dataset.py --help

Please note that if you intend to run the application with your own dataset, it must meet two requirements:

  • the first line of the dataset must contain the number of data points in the dataset;
  • each subsequent line of the file must represent a single data point, with the coordinates separated by a single space.
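The two requirements above can be sketched in a few lines of Python. This is only an illustration, not the repository's generate_dataset.py: the function names format_dataset and parse_dataset are hypothetical, and the sketch assumes that coordinates are written as plain decimal numbers.

```python
def format_dataset(points):
    """Return the dataset text: the point count first, then one point per line."""
    lines = [str(len(points))]
    for point in points:
        # Coordinates are separated by exactly one space.
        lines.append(" ".join(str(c) for c in point))
    return "\n".join(lines) + "\n"


def parse_dataset(text):
    """Parse dataset text and check the header against the number of points."""
    lines = text.splitlines()
    expected = int(lines[0])
    # Splitting on a single space makes float() fail on doubled spaces,
    # which enforces the single-space requirement.
    points = [tuple(float(c) for c in line.split(" ")) for line in lines[1:] if line]
    if len(points) != expected:
        raise ValueError(f"header says {expected} points, found {len(points)}")
    return points
```

For example, format_dataset([(0.5, 1.25), (3.0, -2.0)]) yields a three-line file whose first line is 2, and parse_dataset reads it back into the same two points.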

4 - Compile and Run the Program

To compile and run the serial application, run the following commands:

$ gcc -Wall *.c -o executableFile
$ ./executableFile [datasetFile] [outputFile] [nClusters] [maxIterations]

If you want to compile and run the parallel application instead, first make sure to switch to the parallel branch, then compile and run the Open MPI program:

$ git checkout parallel
$ mpicc -Wall *.c -o executableFile
$ mpirun -n N executableFile [datasetFile] [outputFile] [nClusters] [maxIterations]

In the commands above:

  • N is the number of processes (typically one per core) you want to execute the parallel application on;
  • executableFile is the name you want to give to the executable;
  • datasetFile is the file containing the dataset;
  • outputFile is the file the results are written to;
  • nClusters is the number of clusters you want the algorithm to find;
  • maxIterations is the maximum number of iterations the algorithm will run.

Every parameter is optional (a default value is used for any parameter that is not provided), but the parameters must be given in the order shown above. If not set, nClusters defaults to 3 and maxIterations defaults to 100.
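Because the parameters are identified by position, setting a later parameter presumably requires supplying every earlier one as well. The Python sketch below illustrates that behaviour under stated assumptions: parse_args is a hypothetical helper, not the program's actual argument handling, and the default file names dataset.txt and output.txt are placeholders, since the real defaults for the two files are not documented here.

```python
# Parameter names in the order the program expects them.
PARAMETER_NAMES = ("datasetFile", "outputFile", "nClusters", "maxIterations")

DEFAULTS = {
    "datasetFile": "dataset.txt",  # placeholder: actual default not documented
    "outputFile": "output.txt",    # placeholder: actual default not documented
    "nClusters": 3,                # default stated in this README
    "maxIterations": 100,          # default stated in this README
}


def parse_args(argv):
    """Map positional arguments onto parameter names, falling back to defaults."""
    values = dict(DEFAULTS)
    # zip stops at the shorter sequence, so missing trailing arguments keep
    # their defaults and extra arguments are ignored in this sketch.
    for name, raw in zip(PARAMETER_NAMES, argv):
        values[name] = int(raw) if name in ("nClusters", "maxIterations") else raw
    return values
```

With this mapping, parse_args(["data.txt", "out.txt", "5"]) sets nClusters to 5 and leaves maxIterations at 100, while there is no way to set maxIterations alone without also passing the three parameters before it.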

When the program has finished running, outputFile contains the list of centroids followed by the cluster identifiers that have been assigned to the data points, in the same order as they appear in the dataset file.
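To make the roles of nClusters, maxIterations and the two parts of the output more concrete, here is a minimal pure-Python sketch of the standard K-Means (Lloyd's) iteration. It is not the repository's C implementation: the function name k_means is hypothetical, and taking the first nClusters points as initial centroids is an assumption made only to keep the example deterministic. It returns the centroids and one cluster identifier per data point, in input order; the exact layout of outputFile is not reproduced here.

```python
def k_means(points, n_clusters=3, max_iterations=100):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    # Assumption: the first n_clusters points serve as the initial centroids.
    centroids = [list(p) for p in points[:n_clusters]]
    assignments = [-1] * len(points)

    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop early once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return centroids, assignments
```

On the four points (0, 0), (0, 1), (10, 10), (10, 11) with n_clusters=2, the sketch converges to the centroids (0, 0.5) and (10, 10.5), with the first two points in cluster 0 and the last two in cluster 1.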

Contributors

daniele-murer, edoardovacchini01
