Comments (8)

rafaelcapucho commented on May 9, 2024

When I comment out SCHEDULER = "scrapy_redis.scheduler.Scheduler" in settings.py, the scrapy shell command works again.
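
For reference, the scrapy-redis part of settings.py looks roughly like this (a sketch based on the scrapy-redis README; the Redis URL is a placeholder, not taken from this report):

    # settings.py (sketch): the scrapy-redis pieces involved here.
    # Commenting out the SCHEDULER line makes `scrapy shell <url>` work again.
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"               # replaces Scrapy's default scheduler
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"   # dedupe via Redis
    SCHEDULER_PERSIST = True                                      # keep the queue between runs
    REDIS_URL = "redis://localhost:6379"                          # placeholder value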

The note above about the debug message "Reading URLs from redis list" doesn't make a difference; it is still present when scrapy shell works: https://paste.ee/r/rAL1V

The question is: why is scrapy_redis.scheduler.Scheduler not compatible with scrapy shell <url>?

Thank you

rafaelcapucho commented on May 9, 2024

Scrapy 1.1.0 is finally released on PyPI... it would be awesome if scrapy-redis supported Python 3 and Scrapy 1.1.0.

rafaelcapucho commented on May 9, 2024

The problem happens when the engine calls enqueue_request and that method calls self.queue.push(request).

I'm using the SpiderQueue, and the push method throws an error when calling self._encode_request(request):

    def push(self, request):
        """Push a request"""
        self.server.lpush(self.key, self._encode_request(request))
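
For context, the scheduler method that triggers this push looks roughly like the following (a simplified sketch, not a verbatim copy of the scrapy-redis source):

    # Simplified sketch of scrapy_redis.scheduler.Scheduler.enqueue_request:
    # every request the engine schedules ends up in self.queue.push(request),
    # which is where the serialization below fails.
    def enqueue_request(self, request):
        if not request.dont_filter and self.df.request_seen(request):
            return False             # duplicate, dropped by the Redis dupefilter
        self.queue.push(request)     # pickles the request and pushes it into Redis
        return True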

The _encode_request method, defined in queue.Base, uses Scrapy's request_to_dict to serialize the request:

    def _encode_request(self, request):
        """Encode a request object"""
        return pickle.dumps(request_to_dict(request, self.spider), protocol=-1)
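
The worker side performs the symmetric operation, which is why only callbacks that can be looked up by name on the spider are serializable. A rough sketch of that counterpart (in Scrapy 1.x, request_from_dict lives in scrapy.utils.reqser):

    # Sketch of the matching decode step in queue.Base: callbacks stored as
    # names are resolved back to methods of self.spider when popped from Redis.
    def _decode_request(self, encoded_request):
        """Rebuild a Request previously pushed by _encode_request."""
        return request_from_dict(pickle.loads(encoded_request), self.spider)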

I added three print statements to request_to_dict to inspect the inputs:

def request_to_dict(request, spider=None):
    """Convert Request object to a dict.

    If a spider is given, it will try to find out the name of the spider method
    used in the callback and store that as the callback.
    """
    cb = request.callback
    print('callback: ', cb)
    print('request: ', request)
    print('spider: ', spider)

    if callable(cb):
        cb = _find_method(spider, cb)
    eb = request.errback
    if callable(eb):
        eb = _find_method(spider, eb)
    d = {
        'url': to_unicode(request.url),  # urls should be safe (safe_string_url)
        'callback': cb,
        'errback': eb,
        'method': request.method,
        'headers': dict(request.headers),
        'body': request.body,
        'cookies': request.cookies,
        'meta': request.meta,
        '_encoding': request._encoding,
        'priority': request.priority,
        'dont_filter': request.dont_filter,
    }
    return d

Running the shell command, it prints:

    callback: <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>>
    request: <GET http://www.epocacosmeticos.com.br/any-url-goes-here>
    spider: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>

The previous error:
ValueError: Function <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>> is not a method of: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>

happens when _find_method is executed:

def _find_method(obj, func):
    if obj:
        try:
            func_self = six.get_method_self(func)
        except AttributeError:  # func has no __self__
            pass
        else:
            if func_self is obj:
                return six.get_method_function(func).__name__
    raise ValueError("Function %s is not a method of: %s" % (func, obj))
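
This matches what the prints show: the shell's request carries a bound method of a Deferred as its callback, not a method of the spider. A minimal standalone reproduction (assuming Scrapy 1.x, where request_to_dict lives in scrapy.utils.reqser; the spider here is a throwaway example):

    import scrapy
    from scrapy.utils.reqser import request_to_dict
    from twisted.internet.defer import Deferred

    class DemoSpider(scrapy.Spider):
        name = 'demo'

        def parse(self, response):
            pass

    spider = DemoSpider()

    # Works: the callback is a method of the spider, so it is stored by name.
    ok = scrapy.Request('http://example.com/', callback=spider.parse)
    print(request_to_dict(ok, spider)['callback'])   # -> 'parse'

    # Fails: the callback is a bound method of a Deferred, not of the spider,
    # so _find_method raises the ValueError shown above.
    bad = scrapy.Request('http://example.com/', callback=Deferred().callback)
    request_to_dict(bad, spider)                     # raises ValueError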

I don't know yet what is wrong.

rmax commented on May 9, 2024

Sorry for the late response. Py3k support is on its way (see #53). Your issue seems to be related to either your spider or a middleware yielding a request with a deferred object as its callback (the media middleware?).

I have created #54 to follow up this deserialization issue.

rmax commented on May 9, 2024

The latest release supports Python 3.x. Beyond that, it's a Scrapy limitation that callbacks must be spider methods. However, I have added a TODO to avoid serializing such requests.
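
One way such a guard could look (a hypothetical sketch only; the helper name is made up and this is not the actual TODO implementation):

    # Hypothetical check: only push requests whose callback/errback can be
    # reconstructed from a name, i.e. bound methods of the running spider.
    def _is_serializable(request, spider):
        for func in (request.callback, request.errback):
            if callable(func) and getattr(func, '__self__', None) is not spider:
                return False
        return True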

Congee commented on May 9, 2024

Any updates? It's still an issue in 2018. :/

rmax commented on May 9, 2024

@Congee a workaround is to open the shell first (i.e. scrapy shell) and then fetch the URL (fetch('https://...')).
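
In other words (the URL below is just a placeholder):

    $ scrapy shell                     # open the shell without a URL
    >>> fetch('https://example.com/')  # then fetch inside the shell
    >>> response                       # the fetched response is now available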

Congee commented on May 9, 2024

@rmax Yeah, that's a workaround. Thanks anyway.
