Giter VIP home page Giter VIP logo

marius's Introduction

Marius

Marius is a system under active development for training embeddings for large-scale graphs on a single machine.

Training on large scale graphs requires a large amount of data movement to get embedding parameters from storage to the computational device. Marius is designed to mitigate/reduce data movement overheads using:

  • Pipelined training and IO
  • Partition caching and buffer-aware data orderings

Details on how Marius works can be found in our OSDI '21 Paper, where experiment scripts and configurations can be found in the osdi2021 branch.

Requirements

(Other versions may work, but are untested)

  • Ubuntu 18.04 or MacOS 10.15
  • CUDA 10.1 or 10.2 (If using GPU training)
  • CuDNN 7 (If using GPU training)
  • 1.7 >= pytorch
  • python >= 3.6
  • pip >= 21
  • GCC >= 9 (On Linux) or Clang 12.0 (On MacOS)
  • cmake >= 3.12
  • make >= 3.8

Installation from source with Pip

  1. Install latest version of PyTorch for your CUDA version:

    Linux:

    • CUDA 10.1: python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
    • CUDA 10.2: python3 -m pip install torch==1.7.1
    • CPU Only: python3 -m pip install torch==1.7.1+cpu -f https://download.pytorch.org/whl/torch_stable.html

    MacOS:

    • CPU Only: python3 -m pip install torch==1.7.1
  2. Clone the repository git clone https://github.com/marius-team/marius.git

  3. Build and install Marius cd marius; python3 -m pip install .

Full script (without torch install)

git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install .

Installation from source with CMake

  1. Clone the repository git clone https://github.com/marius-team/marius.git

  2. Install dependencies cd marius; python3 -m pip install -r requirements.txt

  3. Create build directory mkdir build; cd build

  4. Run cmake in the build directory cmake ../ (CPU-only build) or cmake ../ -DUSE_CUDA=1 (GPU build)

  5. Make the marius executable. make marius_train -j

Full script (without torch install)

git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install -r requirements.txt
mkdir build
cd build
cmake ../ -DUSE_CUDA=1
make -j

Training a graph

Training embeddings on a graph requires three steps.

  1. Define a configuration file. This example will use the config already defined in examples/training/configs/fb15k_gpu.ini

    See docs/configuration.rst for full details on the configuration options.

  2. Preprocess the dataset marius_preprocess fb15k output_dir/

    The first argument of marius/tools/preprocess.py defines the dataset we wish to download and preprocess, in this case fb15k. The second argument tells the preprocessor where to put the preprocessed dataset.

  3. Run the training executable with the config file. marius_train examples/training/configs/fb15k_gpu.ini

The output of the first epoch should be similar to the following.

[info] [03/18/21 01:33:18.778] Metadata initialized
[info] [03/18/21 01:33:18.778] Training set initialized
[info] [03/18/21 01:33:18.779] Evaluation set initialized
[info] [03/18/21 01:33:18.779] Preprocessing Complete: 2.605s
[info] [03/18/21 01:33:18.791] ################ Starting training epoch 1 ################
[info] [03/18/21 01:33:18.836] Total Edges Processed: 40000, Percent Complete: 0.082
[info] [03/18/21 01:33:18.862] Total Edges Processed: 80000, Percent Complete: 0.163
[info] [03/18/21 01:33:18.892] Total Edges Processed: 120000, Percent Complete: 0.245
[info] [03/18/21 01:33:18.918] Total Edges Processed: 160000, Percent Complete: 0.327
[info] [03/18/21 01:33:18.944] Total Edges Processed: 200000, Percent Complete: 0.408
[info] [03/18/21 01:33:18.970] Total Edges Processed: 240000, Percent Complete: 0.490
[info] [03/18/21 01:33:18.996] Total Edges Processed: 280000, Percent Complete: 0.571
[info] [03/18/21 01:33:19.021] Total Edges Processed: 320000, Percent Complete: 0.653
[info] [03/18/21 01:33:19.046] Total Edges Processed: 360000, Percent Complete: 0.735
[info] [03/18/21 01:33:19.071] Total Edges Processed: 400000, Percent Complete: 0.816
[info] [03/18/21 01:33:19.096] Total Edges Processed: 440000, Percent Complete: 0.898
[info] [03/18/21 01:33:19.122] Total Edges Processed: 480000, Percent Complete: 0.980
[info] [03/18/21 01:33:19.130] ################ Finished training epoch 1 ################
[info] [03/18/21 01:33:19.130] Epoch Runtime (Before shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Before shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.130] Edges Shuffled
[info] [03/18/21 01:33:19.130] Epoch Runtime (Including shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Including shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.148] Starting evaluating
[info] [03/18/21 01:33:19.254] Pipeline flush complete
[info] [03/18/21 01:33:19.271] Num Eval Edges: 50000
[info] [03/18/21 01:33:19.271] Num Eval Batches: 50
[info] [03/18/21 01:33:19.271] Auc: 0.973, Avg Ranks: 24.477, MRR: 0.491, Hits@1: 0.357, Hits@5: 0.651, Hits@10: 0.733, Hits@20: 0.806, Hits@50: 0.895, Hits@100: 0.943

To train using CPUs only, use the examples/training/configs/fb15k_cpu.ini configuration file instead.

Using the Python API

Sample Code

Below is a sample python script which trains a single epoch of embeddings on fb15k.

import marius as m
from marius.tools import preprocess

def fb15k_example():

    preprocess.fb15k(output_dir="output_dir/")
    
    config_path = "examples/training/configs/fb15k_cpu.ini"
    config = m.parseConfig(config_path)

    train_set, eval_set = m.initializeDatasets(config)

    model = m.initializeModel(config.model.encoder_model, config.model.decoder_model)

    trainer = m.SynchronousTrainer(train_set, model)
    evaluator = m.SynchronousEvaluator(eval_set, model)

    trainer.train(1)
    evaluator.evaluate(True)


if __name__ == "__main__":
    fb15k_example()

Marius in Docker

Marius can be deployed within a docker container. Here is a sample ubuntu dockerfile (located at examples/docker/dockerfile) which contains the necessary dependencies preinstalled for GPU training.

Building and running the container

Build an image with the name marius and the tag example:
docker build -t marius:example -f examples/docker/dockerfile examples/docker

Create and start a new container instance named gaius with:
docker run --name gaius -itd marius:example

Run docker ps to verify the container is running

Start a bash session inside the container:
docker exec -it gaius bash

Sample Dockerfile

See examples/docker/dockerfile

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt update

RUN apt install -y g++ \ 
         make \
         wget \
         unzip \
         vim \
         git \
         python3-pip

# install gcc-9
RUN apt install -y software-properties-common
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt update
RUN apt install -y gcc-9 g++-9
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9

# install cmake 3.20
RUN wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.sh
RUN mkdir /opt/cmake
RUN sh cmake-3.20.0-linux-x86_64.sh --skip-license --prefix=/opt/cmake/
RUN ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake

# install pytorch
RUN python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Citing Marius

Arxiv Version:

@misc{mohoney2021marius,
      title={Marius: Learning Massive Graph Embeddings on a Single Machine}, 
      author={Jason Mohoney and Roger Waleffe and Yiheng Xu and Theodoros Rekatsinas and Shivaram Venkataraman},
      year={2021},
      eprint={2101.08358},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

OSDI Version (not yet available):

marius's People

Contributors

jasonmoho avatar anzexie avatar cthoyt avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.