Marius

Marius is a system under active development for training embeddings for large-scale graphs on a single machine.

Training on large scale graphs requires a large amount of data movement to get embedding parameters from storage to the computational device. Marius is designed to mitigate/reduce data movement overheads using:

Pipelined training and IO
Partition caching and buffer-aware data orderings

Details on how Marius works can be found in our OSDI '21 Paper, where experiment scripts and configurations can be found in the osdi2021 branch.

Requirements

(Other versions may work, but are untested)

Ubuntu 18.04 or MacOS 10.15
CUDA 10.1 or 10.2 (If using GPU training)
CuDNN 7 (If using GPU training)
1.7 >= pytorch
python >= 3.6
pip >= 21
GCC >= 9 (On Linux) or Clang 12.0 (On MacOS)
cmake >= 3.12
make >= 3.8

Installation from source with Pip

Install latest version of PyTorch for your CUDA version:

Linux:
- CUDA 10.1: python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
- CUDA 10.2: python3 -m pip install torch==1.7.1
- CPU Only: python3 -m pip install torch==1.7.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
MacOS:
- CPU Only: python3 -m pip install torch==1.7.1
Clone the repository git clone https://github.com/marius-team/marius.git
Build and install Marius cd marius; python3 -m pip install .

Full script (without torch install)

git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install .

Installation from source with CMake

Clone the repository git clone https://github.com/marius-team/marius.git
Install dependencies cd marius; python3 -m pip install -r requirements.txt
Create build directory mkdir build; cd build
Run cmake in the build directory cmake ../ (CPU-only build) or cmake ../ -DUSE_CUDA=1 (GPU build)
Make the marius executable. make marius_train -j

Full script (without torch install)

git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install -r requirements.txt
mkdir build
cd build
cmake ../ -DUSE_CUDA=1
make -j

Training a graph

Training embeddings on a graph requires three steps.

Define a configuration file. This example will use the config already defined in examples/training/configs/fb15k_gpu.ini

See docs/configuration.rst for full details on the configuration options.
Preprocess the dataset marius_preprocess fb15k output_dir/

The first argument of marius/tools/preprocess.py defines the dataset we wish to download and preprocess, in this case fb15k. The second argument tells the preprocessor where to put the preprocessed dataset.
Run the training executable with the config file. marius_train examples/training/configs/fb15k_gpu.ini

The output of the first epoch should be similar to the following.

[info] [03/18/21 01:33:18.778] Metadata initialized
[info] [03/18/21 01:33:18.778] Training set initialized
[info] [03/18/21 01:33:18.779] Evaluation set initialized
[info] [03/18/21 01:33:18.779] Preprocessing Complete: 2.605s
[info] [03/18/21 01:33:18.791] ################ Starting training epoch 1 ################
[info] [03/18/21 01:33:18.836] Total Edges Processed: 40000, Percent Complete: 0.082
[info] [03/18/21 01:33:18.862] Total Edges Processed: 80000, Percent Complete: 0.163
[info] [03/18/21 01:33:18.892] Total Edges Processed: 120000, Percent Complete: 0.245
[info] [03/18/21 01:33:18.918] Total Edges Processed: 160000, Percent Complete: 0.327
[info] [03/18/21 01:33:18.944] Total Edges Processed: 200000, Percent Complete: 0.408
[info] [03/18/21 01:33:18.970] Total Edges Processed: 240000, Percent Complete: 0.490
[info] [03/18/21 01:33:18.996] Total Edges Processed: 280000, Percent Complete: 0.571
[info] [03/18/21 01:33:19.021] Total Edges Processed: 320000, Percent Complete: 0.653
[info] [03/18/21 01:33:19.046] Total Edges Processed: 360000, Percent Complete: 0.735
[info] [03/18/21 01:33:19.071] Total Edges Processed: 400000, Percent Complete: 0.816
[info] [03/18/21 01:33:19.096] Total Edges Processed: 440000, Percent Complete: 0.898
[info] [03/18/21 01:33:19.122] Total Edges Processed: 480000, Percent Complete: 0.980
[info] [03/18/21 01:33:19.130] ################ Finished training epoch 1 ################
[info] [03/18/21 01:33:19.130] Epoch Runtime (Before shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Before shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.130] Edges Shuffled
[info] [03/18/21 01:33:19.130] Epoch Runtime (Including shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Including shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.148] Starting evaluating
[info] [03/18/21 01:33:19.254] Pipeline flush complete
[info] [03/18/21 01:33:19.271] Num Eval Edges: 50000
[info] [03/18/21 01:33:19.271] Num Eval Batches: 50
[info] [03/18/21 01:33:19.271] Auc: 0.973, Avg Ranks: 24.477, MRR: 0.491, Hits@1: 0.357, Hits@5: 0.651, Hits@10: 0.733, Hits@20: 0.806, Hits@50: 0.895, Hits@100: 0.943

To train using CPUs only, use the examples/training/configs/fb15k_cpu.ini configuration file instead.

Using the Python API

Sample Code

Below is a sample python script which trains a single epoch of embeddings on fb15k.

import marius as m
from marius.tools import preprocess

def fb15k_example():

    preprocess.fb15k(output_dir="output_dir/")
    
    config_path = "examples/training/configs/fb15k_cpu.ini"
    config = m.parseConfig(config_path)

    train_set, eval_set = m.initializeDatasets(config)

    model = m.initializeModel(config.model.encoder_model, config.model.decoder_model)

    trainer = m.SynchronousTrainer(train_set, model)
    evaluator = m.SynchronousEvaluator(eval_set, model)

    trainer.train(1)
    evaluator.evaluate(True)


if __name__ == "__main__":
    fb15k_example()

Marius in Docker

Marius can be deployed within a docker container. Here is a sample ubuntu dockerfile (located at examples/docker/dockerfile) which contains the necessary dependencies preinstalled for GPU training.

Building and running the container

Build an image with the name marius and the tag example:
docker build -t marius:example -f examples/docker/dockerfile examples/docker

Create and start a new container instance named gaius with:
docker run --name gaius -itd marius:example

Run docker ps to verify the container is running

Start a bash session inside the container:
docker exec -it gaius bash

Sample Dockerfile

See examples/docker/dockerfile

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt update

RUN apt install -y g++ \ 
         make \
         wget \
         unzip \
         vim \
         git \
         python3-pip

# install gcc-9
RUN apt install -y software-properties-common
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt update
RUN apt install -y gcc-9 g++-9
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9

# install cmake 3.20
RUN wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.sh
RUN mkdir /opt/cmake
RUN sh cmake-3.20.0-linux-x86_64.sh --skip-license --prefix=/opt/cmake/
RUN ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake

# install pytorch
RUN python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Citing Marius

Arxiv Version:

@misc{mohoney2021marius,
      title={Marius: Learning Massive Graph Embeddings on a Single Machine}, 
      author={Jason Mohoney and Roger Waleffe and Yiheng Xu and Theodoros Rekatsinas and Shivaram Venkataraman},
      year={2021},
      eprint={2101.08358},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

OSDI Version (not yet available):

bronzepot / marius Goto Github PK

marius's Introduction

Marius

Requirements

Installation from source with Pip

Full script (without torch install)

Installation from source with CMake

Full script (without torch install)

Training a graph

Using the Python API

Sample Code

Marius in Docker

Building and running the container

Sample Dockerfile

Citing Marius

marius's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent