CNKI-selenium-crawler

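Configuration:

This project uses the selenium module, with Firefox as the browser.

1. Download geckodriver from https://github.com/mozilla/geckodriver/releases

2. Place the build that matches your system in both the Firefox installation directory and Python's Scripts folder.

3. Add the Firefox installation path to Path under the user variables in the system's environment variables.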
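Once the steps above are done, a quick check that Selenium can find geckodriver and start Firefox might look like the sketch below. It is not taken from the project's code: it assumes Selenium 4 is installed, and the helper names `find_geckodriver` and `make_firefox_driver` are hypothetical.

```python
import shutil


def find_geckodriver(name="geckodriver"):
    """Return the full path of geckodriver if it is on PATH, else None."""
    return shutil.which(name)


def make_firefox_driver(headless=False):
    """Start Firefox through Selenium (assumes Selenium 4 is installed)."""
    # Imported lazily so the PATH check above also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    driver_path = find_geckodriver()
    if driver_path is None:
        raise RuntimeError("geckodriver was not found on PATH")

    options = Options()
    if headless:
        # Run Firefox without opening a window.
        options.add_argument("-headless")
    service = Service(executable_path=driver_path)
    return webdriver.Firefox(service=service, options=options)
```

Calling `make_firefox_driver()` and then `driver.get(...)` on a CNKI page should open the browser; call `driver.quit()` when finished. If `find_geckodriver()` returns `None`, the driver is not on the path and steps 2 and 3 are worth rechecking.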

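Features:

1. Crawling social science fund project data

2. Crawling paper metadata

3. Crawling the journal literature in each paper's references (参考文献) and citing literature (引证文献)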

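Notes:

1. Works on any network; no CNKI subscription is required.

2. You can run the original code as is, crawling from the social science fund projects through to the references and citing literature of the resulting papers, or customize the workflow.

3. Crawl speed is adjustable: change the value passed to t.sleep() in the programs. A value between 1 and 6 is recommended, and it can be randomized with random.

4. Paper metadata must be crawled strictly in the order of the three programs: titles and related fields, then citation counts and related fields, then paper URLs.

5. All results are saved as Excel files; check the file paths. The fund number serves as the primary key in this project.

6. For learning purposes only.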
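The randomized delay suggested in note 3 can be sketched as follows. This is an illustration rather than the project's code: the function name `polite_sleep` is hypothetical, and the 1 to 6 second defaults simply mirror the recommended range. The project's `t.sleep()` presumably comes from `import time as t`; the sketch uses the plain `time` module.

```python
import random
import time


def polite_sleep(low=1.0, high=6.0, sleep=time.sleep):
    """Pause for a random number of seconds in [low, high] and return it."""
    if low > high:
        raise ValueError("low must not exceed high")
    delay = random.uniform(low, high)
    # The sleep function is injectable so the helper can be tested quickly.
    sleep(delay)
    return delay
```

Replacing a fixed `t.sleep(3)` between page loads with `polite_sleep()` spaces requests irregularly while keeping the average pace within the recommended range.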

Contributors:

stay-leave
