Marius is a system under active development for training embeddings for large-scale graphs on a single machine.
Training on large-scale graphs requires moving large amounts of data to get embedding parameters from storage to the computational device. Marius is designed to reduce these data movement overheads using:
- Pipelined training and IO
- Partition caching and buffer-aware data orderings
Details on how Marius works can be found in our OSDI '21 paper. Experiment scripts and configurations can be found in the osdi2021 branch.
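As a rough illustration of the first idea, pipelining overlaps loading the next batch with computing on the current one. The sketch below is a toy model in plain Python, not Marius's implementation: `load_batch` and `train_step` are hypothetical stand-ins, and a bounded queue plays the role of the buffer between the IO stage and the compute stage.

```python
import queue
import threading


def load_batch(i):
    # Hypothetical stand-in for reading embedding parameters from storage.
    return list(range(i * 4, i * 4 + 4))


def train_step(batch):
    # Hypothetical stand-in for the compute stage; returns a fake "loss".
    return sum(batch)


def pipelined_training(num_batches, buffer_size=2):
    """Overlap loading (producer thread) with compute (consumer loop)."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded: limits batches in flight
    sentinel = object()  # marks the end of the batch stream

    def producer():
        for i in range(num_batches):
            buf.put(load_batch(i))  # blocks while the buffer is full
        buf.put(sentinel)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        batch = buf.get()  # blocks until the next batch is ready
        if batch is sentinel:
            break
        results.append(train_step(batch))
    worker.join()
    return results


if __name__ == "__main__":
    print(pipelined_training(3))
```

The bounded queue is the key design choice in this sketch: it lets the loader run ahead of the compute stage, but only by `buffer_size` batches, so memory use stays fixed.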
(Other versions may work, but are untested)
- Ubuntu 18.04 or MacOS 10.15
- CUDA 10.1 or 10.2 (If using GPU training)
- CuDNN 7 (If using GPU training)
- pytorch >= 1.7
- python >= 3.6
- pip >= 21
- GCC >= 9 (On Linux) or Clang 12.0 (On MacOS)
- cmake >= 3.12
- make >= 3.8
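A quick way to check an installed version against a minimum from the list above is to compare the numeric components. The helper below is a minimal sketch using only the standard library; the function names are hypothetical, and it ignores local suffixes such as `+cu101` rather than interpreting them.

```python
import re
import sys


def parse_version(text):
    """Turn '1.7.1+cu101' into (1, 7, 1); suffixes such as '+cu101' are ignored."""
    core = re.match(r"\d+(\.\d+)*", text)
    if core is None:
        raise ValueError("no version number found in {!r}".format(text))
    return tuple(int(part) for part in core.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, comparing numeric components."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '21' compares equal to '21.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    py = ".".join(str(n) for n in sys.version_info[:3])
    print("python", py, "ok" if meets_minimum(py, "3.6") else "too old")
    try:
        import torch  # optional: only checked if it is already installed
        status = "ok" if meets_minimum(torch.__version__, "1.7") else "too old"
        print("torch", torch.__version__, status)
    except ImportError:
        print("torch not installed")
```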
- Install the version of PyTorch that matches your CUDA version:
Linux:
- CUDA 10.1:
python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
- CUDA 10.2:
python3 -m pip install torch==1.7.1
- CPU Only:
python3 -m pip install torch==1.7.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
MacOS:
- CPU Only:
python3 -m pip install torch==1.7.1
- CUDA 10.1:
- Clone the repository
git clone https://github.com/marius-team/marius.git
- Build and install Marius
cd marius; python3 -m pip install .
git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install .
- Clone the repository
git clone https://github.com/marius-team/marius.git
- Install dependencies
cd marius; python3 -m pip install -r requirements.txt
- Create build directory
mkdir build; cd build
- Run cmake in the build directory
CPU-only build:
cmake ../
GPU build:
cmake ../ -DUSE_CUDA=1
- Make the marius executable
make marius_train -j
git clone https://github.com/marius-team/marius.git
cd marius
python3 -m pip install -r requirements.txt
mkdir build
cd build
cmake ../ -DUSE_CUDA=1
make -j
Training embeddings on a graph requires three steps.
- Define a configuration file. This example uses the config already defined in examples/training/configs/fb15k_gpu.ini. See docs/configuration.rst for full details on the configuration options.
- Preprocess the dataset
marius_preprocess fb15k output_dir/
The first argument of marius/tools/preprocess.py defines the dataset we wish to download and preprocess, in this case fb15k. The second argument tells the preprocessor where to put the preprocessed dataset.
- Run the training executable with the config file.
marius_train examples/training/configs/fb15k_gpu.ini
The output of the first epoch should be similar to the following.
[info] [03/18/21 01:33:18.778] Metadata initialized
[info] [03/18/21 01:33:18.778] Training set initialized
[info] [03/18/21 01:33:18.779] Evaluation set initialized
[info] [03/18/21 01:33:18.779] Preprocessing Complete: 2.605s
[info] [03/18/21 01:33:18.791] ################ Starting training epoch 1 ################
[info] [03/18/21 01:33:18.836] Total Edges Processed: 40000, Percent Complete: 0.082
[info] [03/18/21 01:33:18.862] Total Edges Processed: 80000, Percent Complete: 0.163
[info] [03/18/21 01:33:18.892] Total Edges Processed: 120000, Percent Complete: 0.245
[info] [03/18/21 01:33:18.918] Total Edges Processed: 160000, Percent Complete: 0.327
[info] [03/18/21 01:33:18.944] Total Edges Processed: 200000, Percent Complete: 0.408
[info] [03/18/21 01:33:18.970] Total Edges Processed: 240000, Percent Complete: 0.490
[info] [03/18/21 01:33:18.996] Total Edges Processed: 280000, Percent Complete: 0.571
[info] [03/18/21 01:33:19.021] Total Edges Processed: 320000, Percent Complete: 0.653
[info] [03/18/21 01:33:19.046] Total Edges Processed: 360000, Percent Complete: 0.735
[info] [03/18/21 01:33:19.071] Total Edges Processed: 400000, Percent Complete: 0.816
[info] [03/18/21 01:33:19.096] Total Edges Processed: 440000, Percent Complete: 0.898
[info] [03/18/21 01:33:19.122] Total Edges Processed: 480000, Percent Complete: 0.980
[info] [03/18/21 01:33:19.130] ################ Finished training epoch 1 ################
[info] [03/18/21 01:33:19.130] Epoch Runtime (Before shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Before shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.130] Edges Shuffled
[info] [03/18/21 01:33:19.130] Epoch Runtime (Including shuffle/sync): 339ms
[info] [03/18/21 01:33:19.130] Edges per Second (Including shuffle/sync): 1425197.8
[info] [03/18/21 01:33:19.148] Starting evaluating
[info] [03/18/21 01:33:19.254] Pipeline flush complete
[info] [03/18/21 01:33:19.271] Num Eval Edges: 50000
[info] [03/18/21 01:33:19.271] Num Eval Batches: 50
[info] [03/18/21 01:33:19.271] Auc: 0.973, Avg Ranks: 24.477, MRR: 0.491, Hits@1: 0.357, Hits@5: 0.651, Hits@10: 0.733, Hits@20: 0.806, Hits@50: 0.895, Hits@100: 0.943
To train using CPUs only, use the examples/training/configs/fb15k_cpu.ini configuration file instead.
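To make the numbers in the training log easier to read, the sketch below uses the standard definitions of the ranking metrics (MRR is the mean of 1/rank, Hits@k is the fraction of true edges ranked in the top k) and the relation throughput = edges / runtime. The ranks are made up for illustration and the function names are hypothetical. Whether Marius's evaluator handles ties or filtering in exactly this way is not specified here.

```python
def ranking_metrics(ranks, ks=(1, 5, 10)):
    """Compute average rank, MRR, and Hits@k from 1-based ranks of the true edges."""
    n = len(ranks)
    metrics = {
        "avg_rank": sum(ranks) / n,
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # Fraction of true edges whose rank falls within the top k.
        metrics["hits@{}".format(k)] = sum(r <= k for r in ranks) / n
    return metrics


def implied_edge_count(edges_per_second, runtime_ms):
    """Edges processed in an epoch, recovered from logged throughput and runtime."""
    return edges_per_second * runtime_ms / 1000.0


if __name__ == "__main__":
    example_ranks = [1, 2, 4, 10, 20]  # made-up ranks, for illustration only
    for name, value in ranking_metrics(example_ranks).items():
        print("{}: {:.3f}".format(name, value))
    # Values taken from the sample log: 1425197.8 edges/s over 339 ms.
    print(round(implied_edge_count(1425197.8, 339)))
```

Applied to the logged 1425197.8 edges per second and 339 ms, the second helper gives roughly 483,142 edges for the epoch.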
Below is a sample Python script which trains a single epoch of embeddings on fb15k.
import marius as m
from marius.tools import preprocess

def fb15k_example():
    # Download and preprocess the fb15k dataset into output_dir/
    preprocess.fb15k(output_dir="output_dir/")
    # Load the CPU training configuration
    config_path = "examples/training/configs/fb15k_cpu.ini"
    config = m.parseConfig(config_path)
    # Build the datasets and the model described by the config
    train_set, eval_set = m.initializeDatasets(config)
    model = m.initializeModel(config.model.encoder_model, config.model.decoder_model)
    trainer = m.SynchronousTrainer(train_set, model)
    evaluator = m.SynchronousEvaluator(eval_set, model)
    # Train for one epoch, then evaluate
    trainer.train(1)
    evaluator.evaluate(True)

if __name__ == "__main__":
    fb15k_example()
Marius can be deployed within a Docker container. A sample Ubuntu Dockerfile, located at examples/docker/dockerfile, contains the necessary dependencies preinstalled for GPU training.
Build an image with the name marius and the tag example:
docker build -t marius:example -f examples/docker/dockerfile examples/docker
Create and start a new container instance named gaius with:
docker run --name gaius -itd marius:example
Run docker ps to verify the container is running.
Start a bash session inside the container:
docker exec -it gaius bash
Contents of examples/docker/dockerfile:
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt update
RUN apt install -y g++ \
    make \
    wget \
    unzip \
    vim \
    git \
    python3-pip
# install gcc-9
RUN apt install -y software-properties-common
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt update
RUN apt install -y gcc-9 g++-9
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
RUN update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9
# install cmake 3.20
RUN wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.sh
RUN mkdir /opt/cmake
RUN sh cmake-3.20.0-linux-x86_64.sh --skip-license --prefix=/opt/cmake/
RUN ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake
# install pytorch
RUN python3 -m pip install torch==1.7.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
arXiv Version:
@misc{mohoney2021marius,
title={Marius: Learning Massive Graph Embeddings on a Single Machine},
author={Jason Mohoney and Roger Waleffe and Yiheng Xu and Theodoros Rekatsinas and Shivaram Venkataraman},
year={2021},
eprint={2101.08358},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
OSDI Version (not yet available):