Comments (8)
When I comment out SCHEDULER = "scrapy_redis.scheduler.Scheduler" in settings.py, the scrapy shell command works again.
The earlier note about the debug message "Reading URLs from redis list" makes no difference; the message is still present when scrapy shell works: https://paste.ee/r/rAL1V
The question is: why is scrapy_redis.scheduler.Scheduler not compatible with scrapy shell <url>?
Thank you
from scrapy-redis.
Scrapy 1.1.0 has finally been released on PyPI. It would be awesome if scrapy-redis supported Python 3 and Scrapy 1.1.0.
from scrapy-redis.
The problem happens when the engine calls enqueue_request and that method calls self.queue.push(request).
I'm using the SpiderQueue, and its push method throws an error when calling self._encode_request(request):
def push(self, request):
    """Push a request"""
    self.server.lpush(self.key, self._encode_request(request))
The _encode_request method, defined in queue.Base, uses Scrapy's request_to_dict to serialize the request:
def _encode_request(self, request):
    """Encode a request object"""
    return pickle.dumps(request_to_dict(request, self.spider), protocol=-1)
I added three print calls to request_to_dict to inspect its inputs:
def request_to_dict(request, spider=None):
    """Convert Request object to a dict.

    If a spider is given, it will try to find out the name of the spider method
    used in the callback and store that as the callback.
    """
    cb = request.callback
    print('callback: ', cb)
    print('request: ', request)
    print('spider: ', spider)
    if callable(cb):
        cb = _find_method(spider, cb)
    eb = request.errback
    if callable(eb):
        eb = _find_method(spider, eb)
    d = {
        'url': to_unicode(request.url),  # urls should be safe (safe_string_url)
        'callback': cb,
        'errback': eb,
        'method': request.method,
        'headers': dict(request.headers),
        'body': request.body,
        'cookies': request.cookies,
        'meta': request.meta,
        '_encoding': request._encoding,
        'priority': request.priority,
        'dont_filter': request.dont_filter,
    }
    return d
Running the shell command prints:
callback: <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>>
request: <GET http://www.epocacosmeticos.com.br/any-url-goes-here>
spider: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>
The previous error:
ValueError: Function <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>> is not a method of: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>
happens when _find_method is executed:
def _find_method(obj, func):
    if obj:
        try:
            func_self = six.get_method_self(func)
        except AttributeError:  # func has no __self__
            pass
        else:
            if func_self is obj:
                return six.get_method_function(func).__name__
    raise ValueError("Function %s is not a method of: %s" % (func, obj))
I don't know yet what is wrong.
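The failure can be reproduced without Scrapy or Twisted. The following is a minimal sketch, assuming Python 3 (so `__self__` and `__func__` stand in for the `six` helpers); the `Spider` and `Deferred` classes and the `find_method` name are hypothetical stand-ins, not the real library classes. It suggests the check fails simply because the callback is bound to a Deferred rather than to the spider:

```python
class Spider:
    """Hypothetical stand-in for a Scrapy spider."""

    def parse(self, response):
        return response


class Deferred:
    """Hypothetical stand-in for twisted.internet.defer.Deferred."""

    def callback(self, result):
        return result


def find_method(obj, func):
    """Python 3 rendering of the _find_method logic quoted above."""
    if obj:
        try:
            func_self = func.__self__
        except AttributeError:  # func has no __self__
            pass
        else:
            if func_self is obj:
                return func.__func__.__name__
    raise ValueError("Function %s is not a method of: %s" % (func, obj))


spider = Spider()

# A callback bound to the spider resolves to its method name.
print(find_method(spider, spider.parse))

# A callback bound to any other object (here, a Deferred stand-in) is rejected.
try:
    find_method(spider, Deferred().callback)
except ValueError as exc:
    print("ValueError:", exc)
```

Under these assumptions, any request whose callback belongs to an object other than the spider cannot be converted to a dict, and therefore cannot be pushed to the Redis queue.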
from scrapy-redis.
Sorry for the late response. Py3k support is on its way (see #53). Your issue seems related to either your spider or a middleware yielding a request with a Deferred's method as its callback (media middleware?).
I have created #54 to follow up this deserialization issue.
from scrapy-redis.
The latest release supports Python 3.x. Besides that, it's a Scrapy limitation that callbacks must be spider methods. However, I have added a TODO to avoid serializing those requests.
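One way such a guard might look is sketched below. This is not the scrapy-redis implementation; `is_spider_method` and `can_serialize` are hypothetical helper names, and the request objects are plain `SimpleNamespace` stand-ins rather than real `scrapy.Request` instances. What a scheduler should do with a request that fails the check (for example, keep it in memory instead of pushing it to Redis) is left open here.

```python
from types import SimpleNamespace


def is_spider_method(spider, func):
    """Return True if func is None or a method bound to this spider."""
    if func is None:
        return True
    return getattr(func, "__self__", None) is spider


def can_serialize(request, spider):
    """Hypothetical guard: both callback and errback must belong to the spider."""
    return (is_spider_method(spider, request.callback)
            and is_spider_method(spider, request.errback))


class Spider:
    """Hypothetical stand-in for a Scrapy spider."""

    def parse(self, response):
        return response


class Other:
    """Any non-spider object, such as a Deferred."""

    def callback(self, result):
        return result


spider = Spider()
good = SimpleNamespace(callback=spider.parse, errback=None)
bad = SimpleNamespace(callback=Other().callback, errback=None)

print(can_serialize(good, spider))
print(can_serialize(bad, spider))
```

The identity check (`is spider`) mirrors the comparison in `_find_method`, so a request that passes this guard should not trigger the ValueError shown earlier in the thread.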
from scrapy-redis.
Any updates? It's still an issue in 2018. :/
from scrapy-redis.
@Congee a workaround is to open the shell first (i.e., scrapy shell) and then fetch the URL (fetch('https://...')).
from scrapy-redis.
@rmax Yeah, that's a workaround. Thanks anyway.
from scrapy-redis.