lbl: the original version
hlbl: a hierarchical version using a Huffman tree
lbl_mp: lbl with multiprocessing and Cythonized training
setup: used to compile the extension module
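The modules above implement variants of the log-bilinear language model. A minimal sketch of the LBL scoring rule (Mnih & Hinton style), assuming standard NumPy and illustrative names and shapes, not the repository's actual API:

```python
import numpy as np

# Hedged sketch of log-bilinear scoring: the predicted representation is a
# sum of linearly transformed context word vectors, and the next-word
# distribution is a softmax over dot products with target word vectors.
rng = np.random.default_rng(0)

V, D, N = 10, 4, 2                   # vocab size, embedding dim, context length
R = rng.standard_normal((V, D))      # context word representations
Q = rng.standard_normal((V, D))      # target word representations
C = rng.standard_normal((N, D, D))   # per-position combination matrices
b = np.zeros(V)                      # per-word biases

def predict_distribution(context):
    """P(w | context) under a log-bilinear model (illustrative only)."""
    # Predicted representation: sum of transformed context vectors.
    r_hat = sum(C[i] @ R[w] for i, w in enumerate(context))
    scores = Q @ r_hat + b               # one score per vocabulary word
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()

p = predict_distribution([3, 7])
```

The hierarchical version (hlbl) replaces the full softmax with a walk down a Huffman tree, turning the O(V) normalization into O(log V) binary decisions.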
- Clone the repository
git clone https://github.com/aanodin/Log-bilinear-language-models
- Install Python 2.7 and dependencies
sudo aptitude install libatlas-base-dev gfortran python python-dev build-essential g++
- Install Python modules (a temporary swap file is created first because compiling numpy and scipy can exhaust memory on small machines)
sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
sudo /sbin/mkswap /var/swap.1
sudo /sbin/swapon /var/swap.1
sudo pip install numpy
sudo pip install scipy
sudo pip install cython
sudo pip install argparse
sudo swapoff /var/swap.1
sudo rm /var/swap.1
- Install the tool from the repository
cd Log-bilinear-language-models
python setup.py install
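For reference, a setup.py that compiles a Cython extension against NumPy typically looks like the sketch below; the module and source names here are assumptions, not the repository's actual file layout:

```python
# Hedged sketch of a setup.py for a Cython extension; "lbl_inner.pyx" is a
# hypothetical source file, not necessarily what this repository compiles.
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    name="lbl",
    ext_modules=cythonize("lbl_inner.pyx"),  # compile .pyx to a C extension
    include_dirs=[np.get_include()],         # expose the NumPy C headers
)
```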
- Train the model:
python main.py --train input.txt --save-net network.hdf5
- Evaluate another (or the same) file:
python main.py --ppl input.txt --net network.hdf5
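The --ppl flag reports perplexity. As a reminder of the standard definition (this mirrors the usual formula, not the script's internal code):

```python
import numpy as np

# Perplexity is the exponential of the average negative log-probability
# the model assigns to each word in the evaluation text.
def perplexity(word_probs):
    probs = np.asarray(word_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(probs))))

# A model assigning probability 0.25 to every word has perplexity 4.
perplexity([0.25, 0.25, 0.25, 0.25])  # -> 4.0
```

Lower perplexity means the model finds the text less surprising.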
- Evaluate using an ARPA language model:
python main.py --ppl input.txt --net network.hdf5 --arpa ClarinLM.lm.1 0.2
- Evaluate using an ARPA language model and save the new model:
python main.py --ppl input.txt --net network.hdf5 --arpa ClarinLM.lm.1 0.2 --save-lm newClarinLM.lm.1
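The trailing number (0.2) after the ARPA file is presumably an interpolation weight. A minimal sketch of linear interpolation between the two models, assuming the weight applies to the ARPA model (the exact mixing rule used by main.py is an assumption):

```python
# Hedged sketch of linear interpolation between LBL and ARPA n-gram
# probabilities; which model the 0.2 weight applies to is an assumption.
def interpolate(p_lbl, p_arpa, lam=0.2):
    """Mix the two models' probabilities with weight lam on the ARPA model."""
    return (1.0 - lam) * p_lbl + lam * p_arpa

interpolate(0.10, 0.30, lam=0.2)  # -> 0.14
```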