ccks_kg's Introduction

Method Overview

This code relies on the open-source tool HanLP and the official pretrained model bert-base-chinese.

In the project directory, the three folders expirement_attr, expirement_er, and expirement_re contain experiments run during the evaluation, and the data folder holds the evaluation data.

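1. Entity extraction method

Entities of two types, "person" (人物) and "institution" (机构), are extracted with the HanLP named entity recognition tool.

Entities of four types, "research report" (研报), "article" (文章), "risk" (风险), and "institution" (机构), are extracted with rules.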
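The README does not list the rules themselves. As a rough illustration only, rule matching of this kind can be sketched with regular expressions. The two patterns below (a title inside 《》 ending in 报告, and a short phrase ending in 风险), the type names `report` and `risk`, and the function `extract_by_rules` are all assumptions made for this sketch, not the project's actual rules.

```python
import re

# Hypothetical rules, not taken from the project: each pattern has exactly
# one capture group that holds the entity text.
RULES = {
    # A title wrapped in 《》 that ends with "报告" (report).
    "report": re.compile(r"《([^《》]+报告)》"),
    # A run of 2 to 10 non-punctuation characters followed by "风险" (risk).
    "risk": re.compile(r"([^，。、；：\s]{2,10}风险)"),
}


def extract_by_rules(text):
    """Return (entity, type) pairs found by the regex rules, in rule order."""
    found = []
    for entity_type, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((match.group(1), entity_type))
    return found


if __name__ == "__main__":
    sample = "中信证券发布《新能源行业深度报告》。风险提示：汇率波动风险、政策变动风险。"
    for entity, entity_type in extract_by_rules(sample):
        print(entity_type, entity)
```

Hand-written patterns like these tend to be precise but narrow, which is one reason to complement them with the distant supervision procedure.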

Besides rule matching, distant supervision can also be used, mainly to extract entities from research reports. The procedure is as follows:

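1. Use rules and external tools to extract an initial set of entities.

2. Split the raw data evenly into two halves, one for training and one for testing, and label the training half with distant supervision.

3. Split the distantly labelled data 4:1 into a training set and a validation set, and train the model.

4. Use the model trained in the previous step to predict on the test set and extract a further set of entities.

5. Check whether the stopping condition is met, and stop the loop if it is.

6. Filter out some of the entities with rule matching, add the remaining entities to the seed knowledge graph, then go back to step 2 and repeat the training, extracting entities iteratively.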
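The steps above can be sketched as the loop below. This is a minimal sketch under stated assumptions, not the project's code: all function names are hypothetical, the BERT model is replaced by caller-supplied `train_fn` and `predict_fn` callables, and the stopping condition (no new entity names, or a round limit) is assumed because the README does not specify one. The even train/test split of step 2 is assumed to have been done by the caller.

```python
import random


def bio_label(text, seed_entities):
    """Distant supervision: tag each character B-/I-/O by matching seed entity names."""
    labels = ["O"] * len(text)
    # Match longer names first so a short name cannot pre-empt a longer one.
    for name, etype in sorted(seed_entities.items(), key=lambda kv: -len(kv[0])):
        start = text.find(name)
        while start != -1:
            end = start + len(name)
            if all(tag == "O" for tag in labels[start:end]):
                labels[start] = "B-" + etype
                for i in range(start + 1, end):
                    labels[i] = "I-" + etype
            start = text.find(name, start + 1)
    return labels


def split_train_val(samples, ratio=0.8, seed=0):
    """Shuffle and split samples 4:1 into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


def bootstrap(train_docs, test_docs, seeds, train_fn, predict_fn, filter_fn, max_rounds=5):
    """Iteratively grow the seed entity dict (name -> type).

    train_fn(train, val) -> model, predict_fn(model, docs) -> {name: type},
    and filter_fn({name: type}) -> {name: type} are supplied by the caller.
    """
    seeds = dict(seeds)
    for _ in range(max_rounds):
        labelled = [(doc, bio_label(doc, seeds)) for doc in train_docs]  # step 2
        train, val = split_train_val(labelled)                           # step 3
        model = train_fn(train, val)
        candidates = predict_fn(model, test_docs)                        # step 4
        new = {n: t for n, t in candidates.items() if n not in seeds}
        if not new:                                                      # step 5 (assumed condition)
            break
        seeds.update(filter_fn(new))                                     # step 6
    return seeds
```

Passing the model as callables keeps the loop independent of any particular tagger, so a BERT-based sequence labeller could be plugged in without changing the control flow.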

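2. Attribute extraction method

Attributes are extracted with rule matching.

3. Relation extraction method

Relations are extracted with rule matching.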

Running the Program

Install Python 3.7 and PyTorch 1.3 first.

Then install the required dependencies with the following commands:

pip install jieba
pip install hanlp
pip install pytorch_pretrained_bert

Start the program with the following command:

python main.py

The final results are saved in the output folder, in a file named answers.json.
