
Comments (8)

eivaltsec commented on August 31, 2024

(screenshot of the error traceback)
It looks like there are two errors. Environment: Ubuntu 14, Python 3.4.


taogeT commented on August 31, 2024

@ROCHOU Please post your config file rather than a screenshot, and wrap it with insert code.

It looks like a problem with SQLAlchemy and the database connection. Were the tables created properly? Are all the packages from the requirements installed? pip freeze will show which packages are in use.

Start with SQLAlchemy.
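
One quick way to isolate this (a minimal sketch, not from the thread; the URI and credentials are placeholders for the SQLALCHEMY_DATABASE_URI in settings.py) is to try connecting with SQLAlchemy directly, outside of Scrapy:

# connect_check.py -- run with the same Python interpreter that runs the spider
from sqlalchemy import create_engine

# Placeholder URI: substitute the SQLALCHEMY_DATABASE_URI from settings.py
DATABASE_URI = 'mysql+pymysql://user:password@localhost/dbname'

try:
    engine = create_engine(DATABASE_URI)   # raises ImportError if the DBAPI driver is missing
    result = engine.execute('SELECT 1')    # raises OperationalError if the database is unreachable
    print('connection OK:', result.scalar())
except Exception as exc:
    print('connection check failed:', exc)

If the failure already shows up here, the problem is in the database setup or the driver, not in the spider code.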


eivaltsec commented on August 31, 2024

Output of pip freeze:

  • Flask==0.11.1
  • Flask-Bootstrap==3.3.7.0
  • Flask-Celery-py3==0.2.4
  • Flask-Cors==3.0.2
  • Flask-Login==0.3.2
  • Flask-Migrate==2.0.0
  • Flask-OAuthlib==0.9.3
  • Flask-RESTful==0.3.5
  • Flask-SQLAlchemy==2.1
  • Flask-Script==2.0.5
  • Flask-Vue==0.3.4
  • Flask-WTF==0.13.1
  • Jinja2==2.8
  • Mako==1.0.4
  • Markdown==2.6.7
  • MarkupSafe==0.23
  • PyDispatcher==2.0.5
  • SQLAlchemy==1.1.2
  • Scrapy==1.2.1
  • Twisted==16.4.1
  • WTForms==2.1
  • Werkzeug==0.11.11
  • alembic==0.8.8
  • amqp==1.4.9
  • aniso8601==1.2.0
  • anyjson==0.3.3
  • appdirs==1.4.3
  • attrs==16.2.0
  • billiard==3.3.0.23
  • celery==4.0.2
  • cffi==1.8.3
  • chardet==2.2.1
  • click==6.6
  • colorama==0.2.5
  • command-not-found==0.3
  • coverage==4.2
  • cryptography==1.5.2
  • cssselect==1.0.0
  • dominate==2.2.1
  • html5lib==0.999
  • idna==2.1
  • itsdangerous==0.24
  • kombu==3.0.37
  • language-selector==0.1
  • lxml==3.6.4
  • mysqlclient==1.3.10
  • oauthlib==2.0.0
  • packaging==16.8
  • parsel==1.0.3
  • pyOpenSSL==16.2.0
  • pyasn1==0.1.9
  • pyasn1-modules==0.0.8
  • pycparser==2.16
  • pycurl==7.19.3
  • pygobject==3.12.0
  • pyparsing==2.2.0
  • python-apt==0.9.3.5
  • python-dateutil==2.5.3
  • python-editor==1.0.1
  • pytz==2016.7
  • queuelib==1.4.2
  • redis==2.10.5
  • requests==2.2.1
  • requests-oauthlib==0.8.0
  • scrapy-redis==0.6.3
  • service-identity==16.0.0
  • six==1.10.0
  • ufw===0.34-rc-0ubuntu2
  • unattended-upgrades==0.1
  • urllib3==1.7.1
  • visitor==0.1.3
  • w3lib==1.15.0
  • wheel==0.24.0
  • zope.interface==4.3.2

settings.py

# -*- coding: utf-8 -*-
from urllib.parse import quote

# Scrapy settings for gather project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'gather'

SPIDER_MODULES = ['gather.spiders']
NEWSPIDER_MODULE = 'gather.spiders'

LOG_LEVEL = 'INFO'
REACTOR_THREADPOOL_MAXSIZE = 50
CLOSESPIDER_TIMEOUT = 1000

# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0) Gecko/20121026 Firefox/16.0'
USER_AGENT_FILE = 'ua.txt'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See http://scrapy.readthedocs.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 0.35
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 8
CONCURRENT_REQUESTS_PER_IP = 8

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.4,ja;q=0.2',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': 1
}

# Enable or disable spider middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'gather.middlewares.MyCustomSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'gather.middlewares.RandomUserAgentMiddleware': 500
}

DOWNLOADER_CLIENTCONTEXTFACTORY = 'scrapy.core.downloader.contextfactory.BrowserLikeContextFactory'

# Enable or disable extensions
# See http://scrapy.readthedocs.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'gather.pipelines.SqlalchemyPipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See http://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

# Database URI
SQLALCHEMY_DATABASE_URI = 'mysql://root:[email protected]/bdm255611631_db'

# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Ensure all spiders share same duplicates filter through redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Default requests serializer is pickle, but it can be changed to any module
# with loads and dumps functions. Note that pickle is not compatible between
# python versions.
# Caveat: In python 3.x, the serializer must return strings keys and support
# bytes as values. Because of this reason the json or msgpack module will not
# work by default. In python 2.x there is no such issue and you can use
# 'json' or 'msgpack' as serializers.
#SCHEDULER_SERIALIZER = "scrapy_redis.picklecompat"

# Don't cleanup redis queues, allows to pause/resume crawls.
#SCHEDULER_PERSIST = True

# Schedule requests using a priority queue. (default)
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'

# Schedule requests using a queue (FIFO).
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'

# Max idle time to prevent the spider from being closed when distributed crawling.
# This only works if queue class is SpiderQueue or SpiderStack,
# and may also block the same time when your spider start at the first time (because the queue is empty).
#SCHEDULER_IDLE_BEFORE_CLOSE = 10

# Store scraped item in redis for post-processing.
#ITEM_PIPELINES = {
#    'scrapy_redis.pipelines.RedisPipeline': 300
#}

# The item pipeline serializes and stores the items in this redis key.
#REDIS_ITEMS_KEY = '%(spider)s:items'

# The items serializer is by default ScrapyJSONEncoder. You can use any
# importable path to a callable object.
#REDIS_ITEMS_SERIALIZER = 'json.dumps'

# Specify the host and port to use when connecting to Redis (optional).
#REDIS_HOST = '127.0.0.1'
#REDIS_PORT = 6379

# Specify the full Redis URL for connecting (optional).
# If set, this takes precedence over the REDIS_HOST and REDIS_PORT settings.
#REDIS_URL = 'redis://user:pass@hostname:9001'
REDIS_URL = 'redis://:[email protected]:6379'
# Custom redis client parameters (i.e.: socket timeout, etc.)
#REDIS_PARAMS  = {}
# Use custom redis client class.
#REDIS_PARAMS['redis_cls'] = 'myproject.RedisClient'

# If True, it uses redis' ``spop`` operation. This could be useful if you
# want to avoid duplicates in your start urls list. In this cases, urls must
# be added via ``sadd`` command or you will get a type error from redis.
#REDIS_START_URLS_AS_SET = False

# How many start urls to fetch at once.
#REDIS_START_URLS_BATCH_SIZE = 16

# Default start urls key for RedisSpider and RedisCrawlSpider.
#REDIS_START_URLS_KEY = '%(name)s:start_urls'

Here is a screenshot of the tables:
(screenshot of the database tables)


taogeT commented on August 31, 2024

@ROCHOU The DBAPI driver is not installed.
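
That is, SQLAlchemy resolved the mysql dialect but could not import a DBAPI driver for it in the interpreter running the spider. A hedged sketch of the fix (host, user, and database names are placeholders): install one of the Python 3 MySQL drivers and, if needed, name it explicitly after the dialect in the URI.

# Hedged sketch: install a MySQL DBAPI driver for Python 3, for example
#   pip install mysqlclient    (C extension, provides the MySQLdb module)
#   pip install PyMySQL        (pure Python)
# then name the driver after the dialect in the connection URI.
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://user:password@localhost/dbname'    # mysqlclient
# SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://user:password@localhost/dbname'  # PyMySQL

A plain mysql:// URI defaults to the MySQLdb driver, so it fails with an import error if mysqlclient is not importable.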


eivaltsec commented on August 31, 2024

Thanks, let me check first and fill in the missing packages.


eivaltsec commented on August 31, 2024

I'd still like to ask: which third-party MySQL library did you install for Python 3? Someone on SF said Python 3 doesn't support mysql-python, and that mysqlclient or PyMySQL should be used instead.


taogeT commented on August 31, 2024

@ROCHOU I use PostgreSQL, so I'm not especially familiar with the MySQL side. I'd suggest checking the official SQLAlchemy documentation for its recommended drivers and then the individual driver projects' sites; that should cover everything.
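
For reference, the SQLAlchemy URL carries both the dialect and the DBAPI driver, so the documentation's list of supported drivers maps directly onto the URI prefix (a hedged sketch; hosts and credentials are placeholders):

# Format: dialect+driver://user:password@host[:port]/database
POSTGRES_URI = 'postgresql+psycopg2://user:password@localhost/dbname'  # PostgreSQL via psycopg2
MYSQL_URI = 'mysql+pymysql://user:password@localhost/dbname'           # MySQL via PyMySQL

Leaving out the +driver part falls back to the dialect's default DBAPI (psycopg2 for postgresql, MySQLdb for mysql).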


eivaltsec commented on August 31, 2024

Thanks, OP.

