
bond_prospectus's Introduction

Prospectus Text and Bond Issuance Spreads | FinTech Research Paper

Project file structure

.
├── README.md
├── STATA
│   ├── Code
│   ├── Data
│   ├── Full_0521.dta
│   ├── Graph_Density.png
│   ├── Main.do
│   ├── README.md
│   ├── Tables_0518
│   └── Tables_0521
├── api
│   ├── bundle_merge.py
│   ├── factor_construct.py
│   ├── factor_construct_heter.py
│   ├── pdf2txt.py
│   └── pdf_access.py
├── config
│   ├── diydict.txt
│   └── stopwords.txt
├── pipeline
│   ├── target_fvalue
│   ├── target_fvalue(4425,\ 27).csv
│   ├── target_fvalue.csv
│   ├── target_fvector
│   └── text_splitted
└── src
    ├── all_pdf
    ├── all_text
    ├── parse_target_all.csv
    ├── urllib
    ├── urllib_combined.csv
    ├── wind_info_all.csv
    └── wind_info_gre.csv
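  • README.md: project description

  • Folder STATA: Stata code for descriptive statistics and regressions

  • Folder api: Python code for text retrieval and processing

  • Folder config: external configuration files

    • diydict.txt: user-defined dictionary
    • stopwords.txt: user-defined stop-word list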
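  • Folder pipeline: intermediate data produced during computation

    • Folder target_fvalue: text indicators computed in chunks
    • target_fvalue(4425,\ 27).csv: final panel of text indicators for the sample
    • target_fvalue.csv: text indicators merged from the chunked results
    • Folder target_fvector: word-frequency vectors computed in chunks
    • Folder text_splitted: text after word segmentation, stop-word removal, and sentence splitting; available for future research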
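  • Folder src: externally supplied data

    • Folder all_pdf: the downloaded original PDF files
    • Folder all_text: TXT files read from the original PDFs
    • parse_target_all.csv: the filtered sample
    • urllib_combined.csv: PDF URL records merged from the crawler results
    • Folder urllib: prospectus download addresses crawled with the "八抓鱼采集器" scraper
    • wind_info_all.csv: all corporate bonds and enterprise bonds issued in 2016-2021, from Wind
    • wind_info_gre.csv: all green bonds issued in 2016-2021, from Wind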
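
The layout above can be captured in a small path helper. This is a minimal sketch, not code from the repository: `project_paths` and `parse_stopwords` are hypothetical names, and the one-word-per-line format of stopwords.txt is an assumption.

```python
from pathlib import Path


def project_paths(root):
    """Map the folders described above to paths under the project root."""
    root = Path(root)
    return {
        "config": root / "config",
        "all_pdf": root / "src" / "all_pdf",
        "all_text": root / "src" / "all_text",
        "text_splitted": root / "pipeline" / "text_splitted",
        "target_fvalue": root / "pipeline" / "target_fvalue",
        "target_fvector": root / "pipeline" / "target_fvector",
    }


def parse_stopwords(lines):
    """Build a stop-word set from lines of text.

    Assumes one word per line (an assumption about stopwords.txt);
    blank lines and surrounding whitespace are ignored.
    """
    return {line.strip() for line in lines if line.strip()}
```

An open file handle can be passed directly as `lines`, for example the result of `open(project_paths(root)["config"] / "stopwords.txt", encoding="utf-8")`.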

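Python workflow

  1. Before running the code, set _PATH at the top of the code to the project root directory
  2. pdf_access.py: downloads the PDF files, using the addresses in src/urllib/*.xlsx
  3. pdf2txt.py: reads each PDF obtained in the previous step and stores it as TXT
  4. factor_construct.py: analyzes the TXT documents one by one, computing word-frequency indicators and word-frequency vectors
  5. bundle_merge.py: merges the results that the previous step stored in chunks
  6. factor_construct_heter.py: computes text heterogeneity indicators from the merged word-frequency vectors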
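
Steps 4 to 6 can be illustrated with a small, self-contained sketch. It is not the repository's implementation: all function names are hypothetical, the input is assumed to be already segmented into tokens, and the heterogeneity measure (one minus the cosine similarity between a document and the pooled vector of all other documents) is an assumed definition chosen only for illustration.

```python
import math
from collections import Counter


def word_frequency(tokens, stopwords=frozenset()):
    """Step 4 (sketch): count the tokens of one document, skipping stop words."""
    return Counter(t for t in tokens if t and t not in stopwords)


def merge_chunks(chunks):
    """Step 5 (sketch): combine per-chunk {doc_id: Counter} dicts into one dict."""
    merged = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def heterogeneity(vectors):
    """Step 6 (assumed definition): 1 - cosine similarity between each
    document and the pooled word counts of all other documents."""
    result = {}
    for doc_id, vec in vectors.items():
        others = Counter()
        for other_id, other_vec in vectors.items():
            if other_id != doc_id:
                others.update(other_vec)  # Counter.update adds counts
        result[doc_id] = 1.0 - cosine_similarity(vec, others)
    return result
```

Under this assumed definition, a document that shares no vocabulary with the rest of the sample scores 1, and a document whose word distribution matches the pooled distribution scores close to 0.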

bond_prospectus's People

Contributors

wins-m
