This is the final project of our course, Internet-based Information Extraction. The task is to extract all employee relations from a given text.
- install anaconda3
- create a new environment with python2.7 in anaconda
    conda create -n re-py27 python=2.7
    conda activate re-py27
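- install pyltp
    pip install pyltp
  (pyltp is compiled automatically during installation, so this takes around ten minutes)
- download LTP 3.4 model for pyltp
  extract the model into the project folder and rename it to ltp_data
- install libsvm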
  on Linux, libsvm cannot be installed directly with conda/pip, nor from a wheel file; on Windows it can be installed from a wheel
  - download source code
      git clone https://github.com/cjlin1/libsvm
  - compile
      cd libsvm
      make
      cd python
      make
  - copy files to re-py27 environment
    - copy
      - libsvm/python/commonutil.py
      - libsvm/python/svm.py
      - libsvm/python/svmutil.py
      to /home/zhouyiyuan/anaconda3/envs/re-py27/lib/python2.7/
    - copy libsvm/libsvm.so.2 to /home/zhouyiyuan/anaconda3/envs/re-py27/lib/
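The copy step can also be scripted. The following is a minimal sketch, not part of the project: `install_libsvm` is a hypothetical helper name, and it assumes libsvm has already been compiled and that the destination directories of the conda environment already exist.

```python
import os
import shutil

# Python wrapper files shipped in libsvm/python (as listed in the copy step).
PY_FILES = ("commonutil.py", "svm.py", "svmutil.py")


def install_libsvm(libsvm_dir, env_lib_dir, py_subdir="python2.7"):
    """Copy the libsvm Python wrappers and shared library into a conda env.

    libsvm_dir  -- root of the compiled libsvm checkout
    env_lib_dir -- the environment's lib directory, e.g. .../envs/re-py27/lib
    Returns the list of destination paths that were written.
    """
    py_dest = os.path.join(env_lib_dir, py_subdir)
    if not os.path.isdir(py_dest):
        raise IOError("destination does not exist: %s" % py_dest)

    copied = []
    for name in PY_FILES:
        shutil.copy(os.path.join(libsvm_dir, "python", name), py_dest)
        copied.append(os.path.join(py_dest, name))

    # The shared library goes one level up, next to the python2.7 directory.
    shutil.copy(os.path.join(libsvm_dir, "libsvm.so.2"), env_lib_dir)
    copied.append(os.path.join(env_lib_dir, "libsvm.so.2"))
    return copied


# Example call (paths are placeholders, adjust to your machine):
# install_libsvm("libsvm", os.path.expanduser("~/anaconda3/envs/re-py27/lib"))
```

Afterwards, `python -c "import svmutil"` inside the re-py27 environment is a quick way to check that the files were picked up.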
- download source code
- download Chinese word vectors list
  - choose any one txt file from the github, then make a new dir and move the extracted txt file there
      mkdir word_list
      mv /your/path/download_wordlist.txt word_list/
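A minimal sketch of reading the downloaded word vector file. It assumes the common plain-text layout: an optional header line with the vocabulary size and dimension, then one word per line followed by its whitespace-separated values. `load_word_vectors` is a hypothetical helper, not part of the project code, so check the layout of the file you actually downloaded.

```python
import io


def load_word_vectors(path, limit=None):
    """Read a text word-vector file into a dict: word -> list of floats.

    Assumed layout (not confirmed by the project):
      first line (optional): "<vocab_size> <dimension>"
      other lines:           "<word> <v1> <v2> ... <vd>"
    limit -- stop after this many words (useful for quick experiments)
    """
    vectors = {}
    with io.open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            parts = line.split()
            # A two-field first line is treated as the size/dimension header.
            if line_no == 0 and len(parts) == 2:
                continue
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
            if limit is not None and len(vectors) >= limit:
                break
    return vectors


# Example call (file name is a placeholder):
# vectors = load_word_vectors("word_list/download_wordlist.txt", limit=50000)
```

The `limit` argument is there because these files can be large, and loading only the first entries is often enough while debugging.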
- pyltp
  used for word segmentation, part-of-speech tagging, and named entity recognition
- libsvm
  svm api: training, prediction, and reading data
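To illustrate how the two dependencies are typically called, here is a hedged sketch rather than the project's actual code. `tag_sentence` and `to_libsvm_line` are hypothetical helper names. The pyltp calls follow the 0.2.x-style API (`load`, `segment`, `postag`, `recognize`) and assume the model files `cws.model`, `pos.model` and `ner.model` sit in `ltp_data`; the feature indices passed to `to_libsvm_line` are made up for the example.

```python
import os


def tag_sentence(text, model_dir="ltp_data"):
    """Segment, POS-tag and NER-tag one sentence with pyltp.

    Returns a list of (word, postag, netag) tuples.
    pyltp is imported lazily so the rest of this file works without it.
    """
    from pyltp import Segmentor, Postagger, NamedEntityRecognizer

    segmentor = Segmentor()
    segmentor.load(os.path.join(model_dir, "cws.model"))
    postagger = Postagger()
    postagger.load(os.path.join(model_dir, "pos.model"))
    recognizer = NamedEntityRecognizer()
    recognizer.load(os.path.join(model_dir, "ner.model"))
    try:
        words = list(segmentor.segment(text))
        postags = list(postagger.postag(words))
        netags = list(recognizer.recognize(words, postags))
    finally:
        # Free the native models once tagging is done.
        segmentor.release()
        postagger.release()
        recognizer.release()
    return list(zip(words, postags, netags))


def to_libsvm_line(label, features):
    """Format one example in libsvm's sparse text format.

    features -- dict mapping a 1-based feature index to a numeric value;
                zero values are omitted and indices are written ascending.
    """
    pairs = ["%d:%g" % (i, v) for i, v in sorted(features.items()) if v != 0]
    return " ".join([str(label)] + pairs)


# Lines written this way can be read back and used with svmutil:
# from svmutil import svm_read_problem, svm_train, svm_predict
# y, x = svm_read_problem("train.txt")        # placeholder file name
# model = svm_train(y, x, "-t 0")             # linear kernel, as an example
# labels, acc, vals = svm_predict(y, x, model)
```

Under Python 2.7, `text` should be a UTF-8 encoded string when passed to pyltp.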
- run the following command to train and test:
    bash train_and_test.sh
- to train only:
    bash train.sh
- to test only:
    bash test.sh
- train
- test
- feature extraction