
elf's Introduction

PyTorch Logo


PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.
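For instance, here is a minimal sketch of both features together (it assumes a CUDA-capable GPU is available and falls back to the CPU otherwise):

import torch

# Tensor computation with optional GPU acceleration
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 3, device=device)
y = x @ x.t() + 1.0  # NumPy-like math, executed on the chosen device

# Tape-based autograd: record operations, then replay them backward
w = torch.randn(3, 3, device=device, requires_grad=True)
(y * w).sum().backward()  # populates w.grad
print(w.grad.shape)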

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

Component Description
torch A Tensor library like NumPy, with strong GPU support
torch.autograd A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.jit A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
torch.nn A neural networks library deeply integrated with autograd designed for maximum flexibility
torch.multiprocessing Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
torch.utils DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

Tensor illustration

PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs, such as slicing, indexing, mathematical operations, linear algebra, and reductions. And they are fast!
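A few of these routines in action (a small illustrative sample, not an exhaustive tour):

import torch

t = torch.arange(12, dtype=torch.float32).reshape(3, 4)
print(t[1:, ::2])           # slicing and strided indexing
print(t.sum(dim=0))         # reductions
print(torch.linalg.svd(t))  # linear algebra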

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.

Dynamic graph
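Because the graph is rebuilt on every forward pass, ordinary Python control flow can change the network's structure from one iteration to the next. A tiny sketch of what that looks like:

import torch

x = torch.randn(4, requires_grad=True)
h = x
depth = int(torch.randint(1, 4, (1,)))  # a different depth on every run
for _ in range(depth):
    h = torch.tanh(h)
h.sum().backward()  # differentiates whatever graph was actually executed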

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
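As a tiny illustration (not from the official docs), operations run eagerly, and an invalid one raises at the exact line that caused it:

import torch

x = torch.ones(2, 3)
y = x + 2      # executes immediately, no deferred graph
z = x.view(7)  # raises a RuntimeError right here (6 elements cannot be
               # viewed as 7), with a stack trace pointing at this line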

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast — whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API, is designed to be straightforward, with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.
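For the Python route, the canonical pattern is torch.autograd.Function. Here is a hedged sketch of a layer whose forward pass runs in SciPy (ScipySigmoid is a name made up for this example):

import torch
from scipy.special import expit  # SciPy's numerically stable sigmoid

class ScipySigmoid(torch.autograd.Function):
    # A layer whose forward pass is computed outside PyTorch, in SciPy.
    @staticmethod
    def forward(ctx, input):
        out = torch.from_numpy(expit(input.detach().cpu().numpy()))
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        (out,) = ctx.saved_tensors
        return grad_output * out * (1 - out)  # derivative of the sigmoid

x = torch.randn(5, dtype=torch.float64, requires_grad=True)
ScipySigmoid.apply(x).sum().backward()
print(x.grad)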

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.
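For a taste of the inline route, here is a minimal sketch using torch.utils.cpp_extension.load_inline (assumes a working C++ toolchain; add_one is a function made up for this example, and compilation happens on first use):

import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
torch::Tensor add_one(torch::Tensor x) {
    return x + 1;
}
"""

# load_inline compiles the snippet and exposes the listed functions as a module.
ext = load_inline(name="add_one_ext", cpp_sources=cpp_source, functions=["add_one"])
print(ext.add_one(torch.zeros(3)))  # tensor([1., 1., 1.])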

Installation

Binaries

Commands to install binaries via Conda or pip wheels are on our website: https://pytorch.org/get-started/locally/

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX1/TX2, Jetson Xavier NX/AGX, and Jetson AGX Orin are provided here, and the L4T container is published here.

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source

Prerequisites

If you are installing from source, you will need:

  • Python 3.8 or later (for Linux, Python 3.8.1+ is needed)
  • A compiler that fully supports C++17, such as clang or gcc (on Linux, gcc 9.4.0 or newer is required)
  • Visual Studio or Visual Studio Build Tool on Windows

* PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/. The build tools do not come with Visual Studio Code by default.

* We highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

An example of environment setup is shown below:

  • Linux:
$ source <CONDA_INSTALL_DIR>/bin/activate
$ conda create -y -n <CONDA_NAME>
$ conda activate <CONDA_NAME>
  • Windows:
$ source <CONDA_INSTALL_DIR>\Scripts\activate.bat
$ conda create -y -n <CONDA_NAME>
$ conda activate <CONDA_NAME>
$ call "C:\Program Files\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

NVIDIA CUDA Support

If you want to compile with CUDA support, select a supported version of CUDA from our support matrix, then install the following:

Note: You can refer to the cuDNN Support Matrix for the cuDNN versions compatible with the various supported CUDA versions, CUDA drivers, and NVIDIA hardware.

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here.

AMD ROCm Support

If you want to compile with ROCm support, install:

  • AMD ROCm 4.0 or above

ROCm is currently supported only on Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.

Intel GPU Support

If you want to compile with Intel GPU support, follow these instructions.

If you want to disable Intel GPU support, export the environment variable USE_XPU=0. Other potentially useful environment variables may be found in setup.py.

Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install Dependencies

Common

conda install cmake ninja
# Run this command on native Windows
conda install rust
# Run this command from the PyTorch directory after cloning the source code using the “Get the PyTorch Source“ section above
pip install -r requirements.txt

On Linux

pip install mkl-static mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda121  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

# (optional) If using torch.compile with inductor/triton, install the matching version of triton
# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running the command.
make triton

On macOS

# Add this package on Intel x86 processor machines only
pip install mkl-static mkl-include
# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

pip install mkl-static mkl-include
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Install PyTorch

On Linux

If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:

export _GLIBCXX_USE_CXX11_ABI=1

Please note that starting from PyTorch 2.5, the PyTorch build with XPU supports both new and old C++ ABIs. Previously, XPU only supported the new C++ ABI. If you want to compile with Intel GPU support, please follow Intel GPU Support.

If you're compiling for AMD ROCm then first run this command:

# Only run this if you're compiling for ROCm
python tools/amd_build/build_amd.py

Install PyTorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

Aside: If you are using Anaconda, you may experience an error caused by the linker:

build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.8.1+.

On macOS

python3 setup.py develop

On Windows

If you want to build legacy Python code, please refer to Building on legacy code and CUDA

CPU-only builds

In this mode PyTorch computations will run on your CPU, not your GPU.

python setup.py develop

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the build environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instructions here are an example for setting up both MKL and Intel OpenMP. Without these configurations for CMake, the Microsoft Visual C OpenMP runtime (vcomp) will be used.

CUDA based build

In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching.

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To install it onto already installed CUDA, run the CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019, and Ninja are supported as the generator of CMake. If ninja.exe is detected in PATH, then Ninja will be used as the default generator, otherwise, it will use VS 2017 / 2019.
If Ninja is selected as the generator, the latest MSVC will get selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN, a.k.a. MKLDNN or DNNL, and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for some other environment variable configurations.


:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py develop

Adjust Build Options (Optional)

Optionally, you can adjust the configuration of CMake variables (without building first) by doing the following. For example, adjusting the pre-detected directories for cuDNN or BLAS can be done with such a step.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run with docker v19.03+

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g., for multithreaded data loaders) the default shared memory segment size that the container runs with may not be enough, and you should increase the shared memory size with either the --ipc=host or --shm-size command line options to nvidia-docker run.
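A multi-worker DataLoader is the typical place this limit bites, since worker processes hand batches back through shared memory. A minimal sketch (illustrative only):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 32, 32))
# Each worker is a separate process; batches travel back through /dev/shm,
# which is exactly what --ipc=host or --shm-size enlarges in a container.
loader = DataLoader(dataset, batch_size=64, num_workers=4)
for (batch,) in loader:
    pass  # consume batches as usual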

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass the PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch

You can also pass the CMAKE_VARS="..." environment variable to specify additional CMake variables to be passed to CMake during the build. See setup.py for the list of available variables.

make -f docker.Makefile

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error, run npm install katex. If it persists, try npm install -g katex.

Note: if you installed nodejs with a different package manager (e.g., conda) then npm will probably install a version of katex that is not compatible with your version of nodejs and doc builds will fail. A combination of versions that is known to work is node@6.13.1 and katex@0.13.18. To install the latter with npm you can run npm install -g katex@0.13.18

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three pointers to get you started:

Resources

Communication

Releases and Contributing

Typically, PyTorch has three minor releases a year. Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page. For more information about PyTorch releases, see the Release page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Soumith Chintala, Gregory Chanan, Dmytro Dzhulgakov, Edward Yang, and Nikita Shulga with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary DeVito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.

elf's People

Contributors

jma127, ppwwyyxx, soumith, yuandong-tian, zchen0211


elf's Issues

When will it start to train?

After 29,000 games, training still has not started.

=== Record Stats (0) ====
B/W/A: 14683/14317/29000 (50.631%). B #Resign: 11904 (41.0483%), W #Resign: 11770 (40.5862%), #NoResign: 5326 (18.3655%)
Dynamic resign threshold: 0.01
Move: [0, 100): 0, [100, 200): 0, [200, 300): 0, [300, up): 29000
=== End Record Stats ====

The server script is as follows:

save=./myserver game=elfgames.go.game model=df_kl model_file=elfgames.go.df_model3 \
stdbuf -o 0 -e 0 python -u ./train.py \
    --mode train --batchsize 2048 \
    --num_games 64 --keys_in_reply V \
    --T 1 --use_data_parallel \
    --num_minibatch 1000 --num_episode 1000000 \
    --mcts_threads 8 --mcts_rollout_per_thread 100 \
    --keep_prev_selfplay --keep_prev_selfplay \
    --use_mcts --use_mcts_ai2 \
    --mcts_persistent_tree --mcts_use_prior \
    --mcts_virtual_loss 5 --mcts_epsilon 0.25 \
    --mcts_alpha 0.03 --mcts_puct 0.85 \
    --resign_thres 0.01 --gpu 0 \
    --server_id myserver --eval_num_games 400 \
    --eval_winrate_thres 0.55 --port 1234 \
    --q_min_size 200 --q_max_size 4000 \
    --save_first \
    --num_block 5 --dim 64 \
    --weight_decay 0.0002 --opt_method sgd \
    --bn_momentum=0 --num_cooldown=50 \
    --expected_num_client 496 \
    --selfplay_init_num 0 --selfplay_update_num 0 \
    --eval_num_games 0 --selfplay_async \
    --lr 0.01 --momentum 0.9 1>> log.log 2>&1 &

I want to know when it will start to train.

Unused parameter fails build

Hi all, I'm trying to build ELF and I got this:

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:24: error: unused parameter 's' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                       ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:35: error: unused parameter 'next_move_number' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                                  ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:69: error: unused parameter 'moves' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                                                                    ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:87:45: error: unused parameter 'a' [-Werror,-Wunused-parameter]
  static std::string to_string(const Actor& a) {

Thanks for the help, and good luck with the development.

runtime error

Hello, I successfully compiled the source code on Ubuntu 18.04, but
got an error message. Am I missing something?

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/home/jehee/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/home/jehee/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/home/jehee/ELF/src_py/elf/__init__.py", line 8, in <module>
    from _elf import *
ImportError: dynamic module does not define module export function (PyInit__elf)

Please do a 400 game match Phoenix Go vs ELF OpenGo

I looked at the cgos chart; Cronus stopped playing in March while Monroe stopped in April, so the two never played one another. I don't believe that Cronus is stronger than the current ELF, since some tests have already shown that ELF seems to be stronger.

Can your team do a 400 game match between Phoenix Go and ELF and publish the results?
Tencent, who made Cronus/Phoenix, stated that in their next AI competition they will prohibit ELF from playing. I think it is because they don't want to see ELF win.

Facebook has changed Go forever

Okay, so I'm trying to think this out logically. Facebook releases ELF OpenGo, instantly becoming by far the strongest public open-source/weights Go AI program, and overnight it gets adopted as the baseline by many other programs... so going forward most programs are going to be more or less the same strength, if not simply identical white-label builds altogether, so what is left? It seems the "engine" part is now more or less solved and, I would say, undifferentiable. Superhuman on a GTX 970 is basically the end of the road.

The remaining things of innovation are GUI, analysis, high handicap, different komi, teaching tools, etc etc

And marketing/branding/mindshare/PR for the Go bots (including Leela Zero) will perhaps be more important than ever before. We are going to see a consolidation of Go AI bots, and my guess is only one or two will survive this, and that is if they are lucky.

Some immediate implications: it has essentially killed commercial Go, at least from the standpoint of selling Go engines. We can't compare to chess, because not only does chess have an order of magnitude larger userbase, especially in the West, chess also enjoyed a good two decades in which classical algorithms and programming sustained a healthy ecosystem of different engines competing with one another for top listings. With the advent of the "zero" method, all zero programs converge to the same ultimate state and it's just a matter of compute. There is really nothing left to do, more or less.

This also means there is little to no point in having Go AI engine competitions and matches. We already see that cgos is defunct and its benchmark is less and less useful, the UEC cup ended, Zen pulled the plug and called it quits, and I seriously doubt we'll see another version or edition of CrazyStone; now, with so many engines adopting the Facebook weights, what's the point? I see this as portending the demise of Go AI competitions and engine-vs-engine games as well. Think about it: LZ beat DolBaram in that last competition match, now DolBaram adopts ELF weights, and ELF is stronger than both Phoenix and FineArt... it doesn't take much to put two and two together and see where this is headed. Didn't Golaxy just beat Ke Jie last week? I'll bet that was the shortest triumph ever. And whatever air of exclusivity FineArt enjoyed prior to the Facebook event has now been obliterated; top pros in China no longer need FineArt to get a competitive advantage in training when everyone in the world with half a decent graphics card can now run the same or better. The implications are indeed far reaching.

Let's examine the distributed, community-based crowd-computing angle. It took the public six months to get LZ to top pro level from scratch, yet Facebook only needed two weeks and arguably far surpassed top pro level, going deep into superhuman territory. Not that I know it is going to happen, but there is nothing to prevent Facebook from doing it again: say, another couple of months down the road, it could suddenly drop a new weight file that becomes the new state of the art, far surpassing anything any community effort could have hoped to come up with in that span of time. Who knows, maybe Google will see all this and publish the AGZ weights, or maybe in another few months the second round of weights that Facebook puts out will far surpass AGZ altogether! In light of the recent developments these are all realistic possibilities now! But none of these possibilities fosters morale for community initiatives.

I'm thankful that prior to Facebook dropping ELF onto the world, LZ had already reached and, imho, surpassed top pro level with its last/final network 131 (I see 132 just came out hours after the Haylee game 2 and is 60% stronger!), and that the LZ project was able to convert the ELF weights into the native LZ format so that they can be used just like any other weight file; it's now even working great in Lizzie.

I hope that the Leela Zero project finds a way to position itself to best take advantage of this new and changing landscape. By far it enjoys the most mindshare in the Go community at large right now, and I hope it continues to evolve and find ways of remaining relevant and bringing value to people's lives.

What does it mean that it won 200 games against LZ on "default settings"?

ELF OpenGo has been successful playing against both other open source bots and human Go players. We played and won 200 games against LeelaZero (158603eb, Apr. 25, 2018), the strongest publicly available bot, using its default settings and no pondering.

Does this mean that only 3200 visits were used? By default settings I'm assuming this refers to LZ's own match game settings? Can we get more details on the exact specifications and hardware used for both sides in these matches? And when it is stated that it played and won 200 matches, does that mean it won all 200 matches that it played, or that it simply won a total of 200 matches out of an unknown number of total matches played? Can you publish the SGFs for these 200 games?

Can you advise as to how strong the raw network is on a single playout? (Can you release a binary executable for Windows and Linux?)

edit: Has the raw training data of the self-played games been published?

The release article mentioned that one of the objectives of this openness is to help community projects such as Leela Zero. I see no better way of doing that than releasing the raw played games and allowing LZ to immediately train on them to get stronger (assuming it was a 200:0 match under objectively equal conditions, and not like the DeepMind AlphaZero vs Stockfish chess shenanigans).

--mcts_puct tuning

How did you tune the --mcts_puct values? Is it true that different values are used for generating self-play games for training vs match play?

I think self-play for training uses --mcts_puct 0.85

--mcts_puct 0.85 --mcts_rollout_per_thread 200 \

And match play uses --mcts_puct 1.50
https://github.com/pytorch/ELF/blob/a4edc96e8bf94aa1a84134431ce3758a6ade27c7/README.rst#running-a-go-bot

Edit: BTW I think this is the relevant part of the AGZ paper:

"AlphaGo Zero tuned the hyper-parameter of its search by Bayesian optimisation. In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning."

It doesn't really clarify whether this tuning was done for self-play only, or was something more expensive involving the entire training feedback loop.

Unexpected key(s) in state_dict: "init_conv.1.num_batches_tracked"

I have successfully installed PyTorch (python -c "import torch" works).
make and make test run successfully.
I have run the path fixer (source scripts/devmode_set_pythonpath.sh):

echo $PYTHONPATH
$HOME/src/elf/src_py/:$HOME/src/elf/build/elf/:$HOME/src/elf/build/elfgames/go/

But when I try to run the gtp.sh command (after downloading the pretrained model):

Traceback (most recent call last):
  File "df_console.py", line 40, in <module>
    model = model_loader.load_model(GC.params)
  File "$HOME/src/elf/src_py/rlpytorch/model_loader.py", line 164, in load_model
    check_loaded_options=self.options.check_loaded_options)
  File "$HOME/src/elf/src_py/rlpytorch/model_base.py", line 139, in load
    self.load_state_dict(sd)
  File "$HOME/src/elf/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model_PolicyValue:
	Unexpected key(s) in state_dict: "init_conv.1.num_batches_tracked", "pi_final_conv.1.num_batches_tracked", "value_final_conv.1.num_batches_tracked", "resnet.resnet.0.conv_lower.1.num_batches_tracked", "resnet.resnet.0.conv_upper.1.num_batches_tracked", "resnet.resnet.1.conv_lower.1.num_batches_tracked", "resnet.resnet.1.conv_upper.1.num_batches_tracked", "resnet.resnet.2.conv_lower.1.num_batches_tracked", "resnet.resnet.2.conv_upper.1.num_batches_tracked", "resnet.resnet.3.conv_lower.1.num_batches_tracked", "resnet.resnet.3.conv_upper.1.num_batches_tracked", "resnet.resnet.4.conv_lower.1.num_batches_tracked", "resnet.resnet.4.conv_upper.1.num_batches_tracked", "resnet.resnet.5.conv_lower.1.num_batches_tracked", "resnet.resnet.5.conv_upper.1.num_batches_tracked", "resnet.resnet.6.conv_lower.1.num_batches_tracked", "resnet.resnet.6.conv_upper.1.num_batches_tracked", "resnet.resnet.7.conv_lower.1.num_batches_tracked", "resnet.resnet.7.conv_upper.1.num_batches_tracked", "resnet.resnet.8.conv_lower.1.num_batches_tracked", "resnet.resnet.8.conv_upper.1.num_batches_tracked", "resnet.resnet.9.conv_lower.1.num_batches_tracked", "resnet.resnet.9.conv_upper.1.num_batches_tracked", "resnet.resnet.10.conv_lower.1.num_batches_tracked", "resnet.resnet.10.conv_upper.1.num_batches_tracked", "resnet.resnet.11.conv_lower.1.num_batches_tracked", "resnet.resnet.11.conv_upper.1.num_batches_tracked", "resnet.resnet.12.conv_lower.1.num_batches_tracked", "resnet.resnet.12.conv_upper.1.num_batches_tracked", "resnet.resnet.13.conv_lower.1.num_batches_tracked", "resnet.resnet.13.conv_upper.1.num_batches_tracked", "resnet.resnet.14.conv_lower.1.num_batches_tracked", "resnet.resnet.14.conv_upper.1.num_batches_tracked", "resnet.resnet.15.conv_lower.1.num_batches_tracked", "resnet.resnet.15.conv_upper.1.num_batches_tracked", "resnet.resnet.16.conv_lower.1.num_batches_tracked", "resnet.resnet.16.conv_upper.1.num_batches_tracked", "resnet.resnet.17.conv_lower.1.num_batches_tracked", "resnet.resnet.17.conv_upper.1.num_batches_tracked", "resnet.resnet.18.conv_lower.1.num_batches_tracked", "resnet.resnet.18.conv_upper.1.num_batches_tracked", "resnet.resnet.19.conv_lower.1.num_batches_tracked", "resnet.resnet.19.conv_upper.1.num_batches_tracked".

I tried redownloading the model with no effect. I am up to date. Could it be linked to using PyTorch in a virtualenv?
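A workaround that has worked for similar version mismatches (a hedged sketch, not an official fix): filter the state dict just before the load_state_dict(sd) call shown in the traceback (model_base.py, line 139). The num_batches_tracked buffers only count the batches seen by each BatchNorm layer during training, so dropping them is harmless for inference:

# Hypothetical patch inside model_base.py, right before the existing load call:
sd = {k: v for k, v in sd.items() if not k.endswith("num_batches_tracked")}
self.load_state_dict(sd)

Alternatively, if your PyTorch version supports it, self.load_state_dict(sd, strict=False) ignores the mismatched keys.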

ELF?

ELF is already a thing (the executable binary format), and the acronym has nothing to do with that use.

Not trying to offend, just asking for a justification.

Segmentation fault (core dumped) on Ubuntu 17.10

I compiled PyTorch and ELF Go on an Ubuntu 17.10 machine, and when I try to launch the program I get this error:
./gtp.sh: line 18: 24480 Segmentation fault      (core dumped) game=elfgames.go.game model=df_pred model_file=elfgames.go.df_model3 python3 df_console.py --mode online --keys_in_reply V rv --use_mcts --mcts_verbose_time --mcts_use_prior --mcts_persistent_tree --load $MODEL --server_addr localhost --port 1234 --replace_prefix resnet.module,resnet --no_check_loaded_options --no_parameter_print --leaky_relu "$@"

I tested PyTorch and it works fine, so any idea where the error comes from?

ImportError: dynamic module does not define module export function (PyInit__elf)

make test
(cd build/elf && GTEST_COLOR=1 ctest --output-on-failure)
Test project /mnt/ken-volume/ai/ELF/build/elf
Start 1: test_cpp_elf_options_OptionMapTest
1/2 Test #1: test_cpp_elf_options_OptionMapTest .... Passed 0.01 sec
Start 2: test_cpp_elf_options_OptionSpecTest
2/2 Test #2: test_cpp_elf_options_OptionSpecTest ... Passed 0.00 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) = 0.02 sec
(cd build/elfgames/go && GTEST_COLOR=1 ctest --output-on-failure)
Test project /mnt/ken-volume/ai/ELF/build/elfgames/go
Start 1: test_cpp_elfgames_go_base_coord_test
1/6 Test #1: test_cpp_elfgames_go_base_coord_test ........... Passed 0.01 sec
Start 2: test_cpp_elfgames_go_base_go_test
2/6 Test #2: test_cpp_elfgames_go_base_go_test .............. Passed 0.00 sec
Start 3: test_cpp_elfgames_go_base_board_feature_test
3/6 Test #3: test_cpp_elfgames_go_base_board_feature_test ... Passed 0.00 sec
Start 4: test_cpp_elfgames_go_base_symmetry_test
4/6 Test #4: test_cpp_elfgames_go_base_symmetry_test ........ Passed 0.01 sec
Start 5: test_cpp_elfgames_go_sgf_sgf_test
5/6 Test #5: test_cpp_elfgames_go_sgf_sgf_test .............. Passed 0.01 sec
Start 6: test_cpp_elfgames_go_mcts_mcts_test
6/6 Test #6: test_cpp_elfgames_go_mcts_mcts_test ............ Passed 0.01 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 0.05 sec
ken@ken-server1:/XXX/ELF$ source scripts/devmode_set_pythonpath.sh
ken@ken-server1:/XXX/ELF$ cd scripts/elfgames/go/
ken@ken-server1:/XXX/ELF/scripts/elfgames/go$ ./gtp.sh /mnt/ken-volume/ai/ELF/pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1
Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/mnt/ken-volume/ai/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/mnt/ken-volume/ai/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/mnt/ken-volume/ai/ELF/src_py/elf/__init__.py", line 11, in <module>
    from .context_utils import ContextArgs
  File "/mnt/ken-volume/ai/ELF/src_py/elf/context_utils.py", line 7, in <module>
    from elf.options import auto_import_options, PyOptionSpec
  File "/mnt/ken-volume/ai/ELF/src_py/elf/options/__init__.py", line 8, in <module>
    from .py_option_map import PyOptionMap
  File "/mnt/ken-volume/ai/ELF/src_py/elf/options/py_option_map.py", line 10, in <module>
    from _elf import _options
ImportError: dynamic module does not define module export function (PyInit__elf)

cannot make a move with GeForce GTX 650

Trying to run ./gtp.sh ./v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 512 --resign_thres 0.05 --mcts_virtual_loss 1
in a supported environment. I try to play with the command:
genmove B
and get this error:
... /root/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py:116: UserWarning: Found GPU0 GeForce GTX 650 which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. ...
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1525389156111/work/aten/src/THCUNN/generic/Threshold.cu line=34 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "df_console.py", line 78, in <module>
    GC.run()
  File "/root/ELF/src_py/elf/utils_elf.py", line 435, in run
    self._call(smem, *args, **kwargs)
  File "/root/ELF/src_py/elf/utils_elf.py", line 398, in _call
    reply = self._cb[idx](picked, *args, **kwargs)
  File "df_console.py", line 60, in actor
    return console.actor(batch)
  File "/root/ELF/scripts/elfgames/go/console_lib.py", line 302, in actor
    reply = self.evaluator.actor(batch)
  File "/root/ELF/src_py/rlpytorch/trainer/trainer.py", line 97, in actor
    state_curr = m.forward(batch)
  File "/root/ELF/src_py/elfgames/go/df_model3.py", line 274, in forward
    s = self.init_conv(s)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 46, in forward
    return F.threshold(input, self.threshold, self.value, self.inplace)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 603, in threshold
    return torch._C._nn.threshold(input, threshold, value)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch-nightly_1525389156111/work/aten/src/THCUNN/generic/Threshold.cu:34

Is there no way to run ELF with CUDA capability 3.0? Which compute capability is enough: 3.5, 5.2, 6.1, 7.0? It would be nice to mention these requirements in the prerequisites.

Something may be lost in CMakelists.txt

When I cd to the project root and run make to build, an error occurs; CMakeError.log is as follows:

CMakeFiles/cmTC_d0017.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status

Then I added -lpthread in set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -lpthread -Werror -Wextra -Wno-register -fPIC -march=native") to fix it.

The only difference is that I am on Ubuntu 16.04 with cuDNN 7.1.

AttributeError: Can't get attribute '_rebuild_tensor_v2'

This error happens when trying to read the weights file using an older version of PyTorch. I assume this is why you say PyTorch needs to be built from source. However, I've tried that all evening and I can't find a way to navigate all of the nvcc/gcc/CUDA incompatibilities to get it to compile. Many errors, all of which are common when I google them, with lots of workarounds, but all of them only partially work. Fundamentally it seems like some sort of std::tuple issue with CUDA/nvcc which Nvidia acknowledges but says it won't fix until the next CUDA release.

Is there any chance you could save your weight file out into an older PyTorch format? Then I could just install python-pytorch-cuda-0.3.1-2 for my version of Linux and be up and running in moments. ELF itself compiled fine, and it runs with python-pytorch-cuda-0.3.1-2... it just can't read the weights file. pytorch/pytorch#5729 states it's because of the newer file format.

Thanks!
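One workaround that circulated for exactly this error (a hedged sketch for torch 0.3.x; it monkeypatches private internals and is not an official fix): define the missing _rebuild_tensor_v2 in terms of the old _rebuild_tensor before calling torch.load.

import torch
import torch._utils

# Unofficial shim: delegate to the pre-0.4 _rebuild_tensor and attach the
# extra metadata that the newer serialization format carries.
try:
    torch._utils._rebuild_tensor_v2
except AttributeError:
    def _rebuild_tensor_v2(storage, storage_offset, size, stride,
                           requires_grad, backward_hooks):
        tensor = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor
    torch._utils._rebuild_tensor_v2 = _rebuild_tensor_v2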

How to read the source code

emmmm. Actually, I want to know where I can start reading the source code. Can someone give me some intuition or a guide?

ImportError: dynamic module does not define module export function (PyInit__elf)

EDIT: I spoke too quickly; I ran make clean && make and this issue was gone.

I have successfully installed PyTorch (python -c "import torch" works).
make and make test run successfully.
I have run the path fixer (source scripts/devmode_set_pythonpath.sh):

echo $PYTHONPATH
$HOME/src/elf/src_py/:$HOME/src/elf/build/elf/:$HOME/src/elf/build/elfgames/go/

But when I try to run the gtp.sh command (after downloading the pretrained model):

./gtp.sh ~/src/elf/pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "$HOME/src/elf/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "$HOME/src/elf/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "$HOME/src/elf/src_py/elf/__init__.py", line 8, in <module>
    from _elf import *
ImportError: dynamic module does not define module export function (PyInit__elf)

Any ideas of what might be the cause?

Training failed with AtrributeError

Hi, I have deployed ELF Go on my 2-GPU machine and tried to train a Go bot (starting the server on one GPU and the client on the other). The client successfully did the selfplay and sent the records to the server, but the server failed to train with the following error:

Traceback (most recent call last):
  File "./train.py", line 131, in <module>
    runner.run()
  File "/root/maxim/ELF/src_py/rlpytorch/runner/single_process.py", line 113, in run
    self.GC.printSummary()
  File "/root/maxim/ELF/src_py/elf/utils_elf.py", line 463, in printSummary
    self.GC.printSummary()
AttributeError: '_elfgames_go.GameContext' object has no attribute 'printSummary'

When I investigated the code, I found that the GCWrapper defined in utils_elf.py ultimately calls the GameContext in the C++ code, but there is no method named printSummary in src_cpp.

I hope the developers can check whether this is a bug or something else. Thank you very much.

Hangs when run Go bot and "genmove b"

Linux: 16.04
Python: 3.6
GCC: 7.3
GPU: No
pytorch: pytorch-nightly
conda: anaconda

I compile an executable, and run the Go bot.

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

It seems to run successfully.
When I enter "list_commands", it returns values correctly.

clear_board
exit
final_score
genmove
komi
list_commands
name
play
protocol_version
quit
showboard
version

But after I enter "genmove b", the Go bot hangs with no response.
What's wrong? What can I do?

Module Not Found Error: No module named '_elf'

When I run the gtp.sh script I get the following error:

./gtp.sh ../../../pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1
Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/home/farhad/drive/research/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/home/farhad/drive/research/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/home/farhad/drive/research/ELF/src_py/elf/__init__.py", line 11, in <module>
    from .context_utils import ContextArgs
  File "/home/farhad/drive/research/ELF/src_py/elf/context_utils.py", line 7, in <module>
    from elf.options import auto_import_options, PyOptionSpec
  File "/home/farhad/drive/research/ELF/src_py/elf/options/__init__.py", line 8, in <module>
    from .py_option_map import PyOptionMap
  File "/home/farhad/drive/research/ELF/src_py/elf/options/py_option_map.py", line 10, in <module>
    from _elf import _options
ModuleNotFoundError: No module named '_elf'

Some questions about training a bot

I changed the network to 64x5 and started 2 clients. One client is running start_server.sh and start_client.sh simultaneously, and the other one is running start_client.sh. From the log file I find that there are over 4000 selfplay games after running for 40 hours. However, I cannot find any training information. I want to know when the training will start.

ModuleNotFoundError: No module named 'rlpytorch'

I installed the latest version of Miniconda for Python 3.6. Then I followed all the steps in the readme, including installing PyTorch from source. I'm running Ubuntu 18.04, a fresh install. I got the following error when I tried to run the program with the following command:

./gtp.sh network.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
ModuleNotFoundError: No module named 'rlpytorch'

Possibility of a 40B ELF Go?

Firstly, thanks to the ELF Go team for your great work.

Any chance, in the future, of continuing ELF Go up to maybe 40 blocks?

(Considering how successful and popular it has been so far.)

Successfully installed and played, but cannot play with GoGui

Thanks to the Facebook Go team, now anyone can play Go against a top player.

I have successfully installed ELF and played with the pretrained network, strictly following the instructions here. But I have some further questions:

  1. cuDNN 7.0 is required in the build instructions, but the prebuilt pytorch-nightly was built upon cuDNN 7.1. Why?

  2. The bot played well in the bash command shell, but when I use GoGui 1.4.9 as the graphical interface, the following messages appear:
    "Text lines before the status character of the first response line are not allowed by the GTP standard"
    "The Go program is not responding to the command 'name'."
    and the program is stuck there, repeatedly prompting these error messages.
    Can anyone help me?
    Does the program's debug output cause this problem?

Thanks a lot.

Which config was used to defeat Leela Zero?

Hi, I have successfully run the OpenGo bot with the following command, which uses 4096*2 = 8192 rollouts:

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 4096 --resign_thres 0.05 --mcts_virtual_loss 1

On the other hand, I also started a Leela bot with model 158603eb, using the following command, which also uses 8192 rollouts:
src/leelaz -p 8192 -v 0 -r 5 --timemanage off -t 1 -d --noponder --gpu 1 --weights 158603eb61a1e5e9dcd1aee157d813063292ae68fbc8fcd24502ae7daf4d7948 --gtp

I put both bots on my private CGOS and found that OpenGo could not defeat Leela Zero in the 3 most recent games, which is weird to me because the author of Leela Zero claimed that leela-elf (a Leela bot using the ELF weights) beat leela-zero 167:18:

leelaz-18e6 v leelaz-elf (185/1000 games)
board size: 19   komi: 7.5
              wins              black         white       avg cpu
leelaz-18e6     18  9.73%       9   9.68%     9   9.78%     81.09
leelaz-elf     167 90.27%       83 90.22%     84 90.32%    127.66

So I am curious whether my OpenGo config is wrong. How should I set the config so as to defeat Leela Zero?

Strange move of Open Go

I compiled OpenGo and used the weights provided in this repo. But when I test the strength of the program, it plays some strange moves. I am not sure if this is its style, but I don't think it is the optimal move in such a situation.
BTW, I tested it with 6400 playouts.
BTW, I test it with 6400 playouts.

Answer some questions about batchsize.

Recently we have seen a lot of questions about ELF OpenGo in many forums. Here I try to answer some of them.

Chinese version here

First, we sincerely thank the LeelaZero team for converting our pre-trained v0 model to a LeelaZero-compatible format, so that the Go community could verify its strength immediately via LeelaZero by interactively playing with it. This shows that our experiments are reproducible and can truly benefit the community. We are truly happy about it.

One issue that the LeelaZero team found is that OpenGo-v0 might not perform that well when the number of rollouts is small (e.g., 800 or 1600). This is because we use batching in MCTS: the network feeds forward only after receiving a batch of rollouts (e.g., 8 or 16). This substantially improves GPU efficiency (on an M40 it is something like 5.5s -> 1.2s per 1600 rollouts), at the price of weakening the strength of the bot, particularly when the number of rollouts is small. This is because MCTS is intrinsically a sequential algorithm: to maximize its strength, each rollout should be played after all the previous rollouts have been played and the Q values in each node have been updated. Batching, on the other hand, introduces parallel evaluation and reduces the effective number of total rollouts.

The solution is simple: reduce the batch size when the number of rollouts is small. We suggest using batchsize=4 when the total number of rollouts is 800 or 1600, which could make the thinking time longer. The default setting batchsize=16 is good only when the total number of rollouts is large (e.g., 80k). Note that a larger batch size might not help. The batch size can be modified with the switches --mcts_rollout_per_batch and --batchsize. Currently, please just specify the same number for both switches (this is research code, so you know how it is).


Some people might wonder what happens for self-play in our setting. Indeed, there seems to be a dilemma if we only use 1.6k rollouts for self-play: a small batch size leads to GPU inefficiency, while a large batch size weakens the moves. We solve it with an ELF-specific design. For a selfplay process we spawn 32 concurrent games with a maximal batch size of 128. Each concurrent game runs its own MCTS without any batching. When a rollout reaches a leaf, it sends the current game situation to ELF, and ELF dynamically batches game situations from multiple games together and hands the batch to PyTorch for network forwarding. This makes the batch size a variable. During selfplay, the average batch size is around 90, which is good for overall GPU utilization.
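A toy sketch of the dynamic-batching idea (illustrative only; the names below are made up and this is not ELF's actual API): many concurrent games push leaf positions into one queue, and a single evaluator drains whatever is currently waiting into one forward pass, so the batch size varies with demand instead of being fixed.

import queue

import torch

# Games enqueue board tensors here when their MCTS rollouts reach a leaf.
leaf_queue: "queue.Queue[torch.Tensor]" = queue.Queue()

def evaluate_pending(net: torch.nn.Module, max_batch: int = 128):
    # Batch together however many leaf states are waiting (up to max_batch).
    states = []
    while not leaf_queue.empty() and len(states) < max_batch:
        states.append(leaf_queue.get())
    if not states:
        return None
    with torch.no_grad():
        return net(torch.stack(states))  # one variable-size batched forward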

time_settings and time_left support; and playout limit?

I tested this version, and it only supports the following GTP commands:

list_commands

= boardsize
clear_board
exit
final_score
genmove
komi
list_commands
name
play
protocol_version
quit
showboard
version

Is there any plan to add time control commands, or a playout limit parameter?
Thanks.

can you do shogi?

how about shogi next?

We need an OpenChess
and an OpenShogi

and there is a Chinese game called five star chess or something like that, I can't recall.

thx

hung after running a go bot

Hi, I have built ELF and successfully run the following command:

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

but I found the process hangs forever with the output:

[2018-05-03 15:28:29.535] [rlpytorch.model_loader.load_env] [info] Loading env
[2018-05-03 15:28:29.550] [rlpytorch.model_loader.load_env] [info] Parsed options: {'T': 1,
 'actor_only': False,
 'adam_eps': 0.001,
 'additional_labels': ['aug_code', 'move_idx'],
...
 'white_puct': -1.0,
 'white_use_policy_network_only': False}
[2018-05-03 15:28:29.551] [rlpytorch.model_loader.load_env] [info] Finished loading env
Wait all games[1] to register their mailbox

What's wrong with it? How can I play with the bot via GTP commands?

Multi-GPU for training on the server side?

Thanks for releasing OpenGo! I was just wondering if the server could support training a model with multiple GPUs. It appears from start_server.sh that 8 threads are supported, but there is only one GPU specified in the command line options.

Chess version?

@yuandong-tian @jma127 can you please tell us if you are thinking about adapting ELF to a chess version (ELF OpenChess)?

As you probably know, there is already an adaptation of LeelaZero for chess named "LeelaChess Zero (LCZero)": https://github.com/glinscott/leela-chess (http://lczero.org). They are making great advances (https://docs.google.com/spreadsheets/d/1zcXqNzLNBT8RjTHO_AppL6WN0j8TGmOIh6osLPmaB6E/edit#gid=0) but they probably don't have enough machine power to reach SF for three months (or more). It would be great if you could quickly (in one or two weeks) reach the milestone DeepMind achieved by defeating Stockfish.
