Comments (8)

rafaelcapucho commented on May 9, 2024

When I comment out SCHEDULER = "scrapy_redis.scheduler.Scheduler" in settings.py, the scrapy shell command works again.
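
For reference, the scrapy-redis part of settings.py looks roughly like this (a sketch based on the scrapy-redis README; the Redis URL is a placeholder, not taken from this report):

    # settings.py (sketch): the scrapy-redis pieces involved here.
    # Commenting out the SCHEDULER line makes `scrapy shell <url>` work again.
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"               # replaces Scrapy's default scheduler
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"   # dedupe via Redis
    SCHEDULER_PERSIST = True                                      # keep the queue between runs
    REDIS_URL = "redis://localhost:6379"                          # placeholder value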

The note above about the debug message "Reading URLs from redis list" doesn't make a difference; it is still present when scrapy shell works: https://paste.ee/r/rAL1V

The question is: why is scrapy_redis.scheduler.Scheduler not compatible with scrapy shell <url>?

Thank you

rafaelcapucho commented on May 9, 2024

Scrapy 1.1.0 is finally released on PyPI... it would be awesome if scrapy-redis supported Python 3 and Scrapy 1.1.0.

rafaelcapucho commented on May 9, 2024

The problem happens when the engine calls enqueue_request and that method calls self.queue.push(request).

I'm using the SpiderQueue, and the push method throws an error when calling self._encode_request(request):

    def push(self, request):
        """Push a request"""
        self.server.lpush(self.key, self._encode_request(request))
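
For context, the scheduler method that triggers this push looks roughly like the following (a simplified sketch, not a verbatim copy of the scrapy-redis source):

    # Simplified sketch of scrapy_redis.scheduler.Scheduler.enqueue_request:
    # every request the engine schedules ends up in self.queue.push(request),
    # which is where the serialization below fails.
    def enqueue_request(self, request):
        if not request.dont_filter and self.df.request_seen(request):
            return False             # duplicate, dropped by the Redis dupefilter
        self.queue.push(request)     # pickles the request and pushes it into Redis
        return True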

The _encode_request method, defined in queue.Base, uses Scrapy's request_to_dict to serialize the request:

    def _encode_request(self, request):
        """Encode a request object"""
        return pickle.dumps(request_to_dict(request, self.spider), protocol=-1)
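
The worker side performs the symmetric operation, which is why only callbacks that can be looked up by name on the spider are serializable. A rough sketch of that counterpart (in Scrapy 1.x, request_from_dict lives in scrapy.utils.reqser):

    # Sketch of the matching decode step in queue.Base: callbacks stored as
    # names are resolved back to methods of self.spider when popped from Redis.
    def _decode_request(self, encoded_request):
        """Rebuild a Request previously pushed by _encode_request."""
        return request_from_dict(pickle.loads(encoded_request), self.spider)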

I added three print statements to request_to_dict to inspect the inputs:

def request_to_dict(request, spider=None):
    """Convert Request object to a dict.

    If a spider is given, it will try to find out the name of the spider method
    used in the callback and store that as the callback.
    """
    cb = request.callback
    print('callback: ', cb)
    print('request: ', request)
    print('spider: ', spider)

    if callable(cb):
        cb = _find_method(spider, cb)
    eb = request.errback
    if callable(eb):
        eb = _find_method(spider, eb)
    d = {
        'url': to_unicode(request.url),  # urls should be safe (safe_string_url)
        'callback': cb,
        'errback': eb,
        'method': request.method,
        'headers': dict(request.headers),
        'body': request.body,
        'cookies': request.cookies,
        'meta': request.meta,
        '_encoding': request._encoding,
        'priority': request.priority,
        'dont_filter': request.dont_filter,
    }
    return d

Running the shell command, it prints:

    callback: <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>>
    request: <GET http://www.epocacosmeticos.com.br/any-url-goes-here>
    spider: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>

The previous error:
ValueError: Function <bound method Deferred.callback of <Deferred at 0x7f3e13b69358>> is not a method of: <EpocaCosmeticosSpider 'epocacosmeticos.com.br' at 0x7f3e13032d30>

happens when _find_method is executed:

def _find_method(obj, func):
    if obj:
        try:
            func_self = six.get_method_self(func)
        except AttributeError:  # func has no __self__
            pass
        else:
            if func_self is obj:
                return six.get_method_function(func).__name__
    raise ValueError("Function %s is not a method of: %s" % (func, obj))
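
This matches what the prints show: the shell's request carries a bound method of a Deferred as its callback, not a method of the spider. A minimal standalone reproduction (assuming Scrapy 1.x, where request_to_dict lives in scrapy.utils.reqser; the spider here is a throwaway example):

    import scrapy
    from scrapy.utils.reqser import request_to_dict
    from twisted.internet.defer import Deferred

    class DemoSpider(scrapy.Spider):
        name = 'demo'

        def parse(self, response):
            pass

    spider = DemoSpider()

    # Works: the callback is a method of the spider, so it is stored by name.
    ok = scrapy.Request('http://example.com/', callback=spider.parse)
    print(request_to_dict(ok, spider)['callback'])   # -> 'parse'

    # Fails: the callback is a bound method of a Deferred, not of the spider,
    # so _find_method raises the ValueError shown above.
    bad = scrapy.Request('http://example.com/', callback=Deferred().callback)
    request_to_dict(bad, spider)                     # raises ValueError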

I don't know yet what is wrong.

rmax commented on May 9, 2024

Sorry for the late response. Py3k support is on its way (see #53). Your issue seems to be related to either your spider or a middleware yielding a request with a deferred object as its callback (the media middleware?).

I have created #54 to follow up this deserialization issue.

rmax commented on May 9, 2024

The latest release supports Python 3.x. Beyond that, it's a Scrapy limitation that callbacks must be spider methods. However, I have added a TODO to avoid serializing such requests.
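
One way such a guard could look (a hypothetical sketch only; the helper name is made up and this is not the actual TODO implementation):

    # Hypothetical check: only push requests whose callback/errback can be
    # reconstructed from a name, i.e. bound methods of the running spider.
    def _is_serializable(request, spider):
        for func in (request.callback, request.errback):
            if callable(func) and getattr(func, '__self__', None) is not spider:
                return False
        return True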

Congee commented on May 9, 2024

Any updates? It's still an issue in 2018. :/

rmax commented on May 9, 2024

@Congee a workaround is to open the shell first (i.e. scrapy shell) and then fetch the URL (fetch('https://...')).
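
In other words (the URL below is just a placeholder):

    $ scrapy shell                     # open the shell without a URL
    >>> fetch('https://example.com/')  # then fetch inside the shell
    >>> response                       # the fetched response is now available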

Congee commented on May 9, 2024

@rmax Yeah, that's a workaround. Thanks anyway.
