NewsSpiders

Crawls news content and makes it searchable on the web.

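A crawler for news websites whose results can be searched through a web page.
Implementation language: Python
Technologies: Scrapy (crawling framework) + Django (web framework) + jieba (word segmentation module)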
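This README does not describe how the search side uses jieba's output. To illustrate why segmentation matters for searching Chinese news, here is a hypothetical in-memory inverted index with a pluggable tokenizer. It is a sketch, not the project's search code: the class and function names are made up, and the default whitespace tokenizer stands in for jieba so the example runs without it.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A tokenizer maps a piece of text to a list of terms.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Fallback tokenizer: lowercase and split on whitespace (fine for English, not Chinese)."""
    return text.lower().split()


class InvertedIndex:
    """Hypothetical in-memory index mapping each term to the set of document ids containing it."""

    def __init__(self, tokenize: Tokenizer = whitespace_tokenize) -> None:
        self.tokenize = tokenize
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def _terms(self, text: str) -> List[str]:
        # Drop empty or whitespace-only tokens (jieba can emit spaces as tokens).
        return [t.strip() for t in self.tokenize(text) if t.strip()]

    def add(self, doc_id: str, text: str) -> None:
        """Record every term of `text` as occurring in `doc_id`."""
        for term in self._terms(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing ALL query terms (boolean AND)."""
        terms = self._terms(query)
        if not terms:
            return set()
        # Copy the first posting set so intersecting does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

With jieba installed, `InvertedIndex(tokenize=jieba.lcut_for_search)` would split an unspaced Chinese headline into words; with the whitespace tokenizer the whole headline would become a single term and most queries would miss it.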

Installation

Install Python modules

pip install -U setuptools
pip install scrapy
pip install pymongo
pip install jieba
pip install Django
pip install uwsgi
pip install mongo-connector
pip install elastic-doc-manager
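To confirm the packages above landed in the interpreter that will run the spiders and the web app, a small check script (hypothetical, e.g. check_deps.py) can look each module up without importing it. The import names are assumptions about the pip packages: Django is imported as `django` and mongo-connector as `mongo_connector`.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the pip packages above. uwsgi is left out because its
# Python module only exists inside a running uWSGI server.
REQUIRED = ["scrapy", "pymongo", "jieba", "django", "mongo_connector"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```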

Install system dependencies

  • uwsgi
  • nginx
  • mongo
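# Start
cd $MONGO_PATH
# Port 10001: use one of the next two commands (replica-set member or standalone)
mongod --port 10001 --dbpath data/ --logpath log/mongodb.log --fork --replSet myDevReplSet
mongod --port 10001 --dbpath data/ --logpath log/mongodb.log --fork
# Replica set "myset" on ports 10002/10003 (--rest was removed in MongoDB 3.6; omit it on newer versions)
mongod --port 10002 --dbpath data02/ --rest --replSet myset &
mongod --port 10003 --dbpath data03/ --rest --replSet myset &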
  • elasticsearch
mongo-connector -m 127.0.0.1:10002 -t 127.0.0.1:9200 -d elastic_doc_manager
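mongo-connector tails MongoDB's oplog, which only exists on replica-set members, and mongod instances started with --replSet have no primary until the set is initiated once. A minimal sketch of that step with pymongo, assuming the "myset" members on ports 10002 and 10003 from the commands above; the helper names are hypothetical. It is equivalent to running rs.initiate(...) in the mongo shell.

```python
from typing import Any, Dict, Sequence

# Members of the "myset" replica set started above (ports 10002 and 10003).
MYSET_HOSTS = ["127.0.0.1:10002", "127.0.0.1:10003"]


def replica_set_config(name: str, hosts: Sequence[str]) -> Dict[str, Any]:
    """Build the configuration document for MongoDB's replSetInitiate command."""
    return {
        "_id": name,  # must equal the --replSet value given to every member
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


def initiate(seed: str, config: Dict[str, Any]) -> None:
    """Send replSetInitiate to one member; run once, against a fresh set."""
    # Imported here so the config builder above works without pymongo installed.
    from pymongo import MongoClient

    # directConnection=True: talk to this single node, since the set is not initiated yet.
    client = MongoClient(f"mongodb://{seed}/", directConnection=True)
    try:
        client.admin.command("replSetInitiate", config)
    finally:
        client.close()
```

Usage: `initiate(MYSET_HOSTS[0], replica_set_config("myset", MYSET_HOSTS))`. A two-member set cannot elect a new primary when either member is down, so this layout suits local development; production sets usually add a third member or an arbiter.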

Run

cd /path/to/project/newsSpider
scrapy crawl beiqingwang
scrapy crawl btime
scrapy crawl cnr
scrapy crawl eastmoney
scrapy crawl focus
scrapy crawl jingjiaoribao
scrapy crawl net163
scrapy crawl peoplebj
scrapy crawl people
scrapy crawl xinhuanet
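The ten `scrapy crawl` commands above run one spider at a time. A hypothetical helper (e.g. run_spiders.py) could loop over the same spider names with subprocess and report which crawls ended with a non-zero exit status. It assumes it is started from the same Scrapy project directory (the one containing scrapy.cfg); the `runner` parameter exists only so the loop can be exercised without Scrapy installed.

```python
import subprocess
from typing import Callable, List, Sequence

# Spider names taken from the `scrapy crawl` commands above.
SPIDERS = [
    "beiqingwang", "btime", "cnr", "eastmoney", "focus",
    "jingjiaoribao", "net163", "peoplebj", "people", "xinhuanet",
]


def crawl_command(spider: str) -> List[str]:
    """Build the argv for one `scrapy crawl <spider>` invocation."""
    return ["scrapy", "crawl", spider]


def run_all(
    spiders: Sequence[str] = SPIDERS,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run each spider in turn and return the names whose process exited non-zero.

    A zero exit status does not guarantee that every page was fetched successfully.
    """
    failed = []
    for name in spiders:
        result = runner(crawl_command(name))
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Scrapy's `scrapy.crawler.CrawlerProcess` can instead run several spiders inside one Python process; the subprocess version keeps each crawl in its own process, mirroring the manual commands exactly.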

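# -M master process, -p 4 worker processes, -s/--socket uwsgi-protocol sockets (for nginx),
# -d daemonize and log to the given file, --py-autoreload 1 reload on code changes (development only)
uwsgi -M -p 4 \
    -s 0.0.0.0:9090 --socket /tmp/uwsgi.sock \
    -d /apps/logs/uwsgi.log \
    --chdir /path/to/project \
    --wsgi-file /home/dev/javin/python/yuqing/yuQing/yuQing/wsgi.py \
    --enable-threads --py-autoreload 1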
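By default, uWSGI's --wsgi-file loads a Python file and calls the object named `application` in it; Django's generated wsgi.py provides that name through django.core.wsgi.get_wsgi_application(). Before pointing uWSGI at the Django project, a throwaway file such as the hypothetical smoke_test.py below can confirm that the uWSGI (and nginx) side works on its own.

```python
from typing import Callable, Iterable, List, Tuple


def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: answer every request with a plain-text "ok"."""
    body = b"ok\n"
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a direct browser test, `uwsgi --http :9090 --wsgi-file smoke_test.py` serves plain HTTP; the -s/--socket options in the command above speak the uwsgi binary protocol, which is meant to sit behind nginx's uwsgi_pass.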

References

Contributors

fevin