
Comments (11)

verginer commented on August 21, 2024

Updated the XPath query; it now works again.
Airbnb changed the CSS on their landing page.

from bnb_scrapy_tutorial.

gesteves91 commented on August 21, 2024

Hi Lucca, really nice work. Could you confirm whether the script still works? Airbnb seems to have changed the CSS one more time, and I'm getting 0 bytes even after cloning your repo.
Thanks,

verginer commented on August 21, 2024

I have fixed it on my end with the latest commit. If you could confirm that it also works for you, that would be great.

verginer commented on August 21, 2024

OK, I have found the issue: Airbnb now sometimes serves a different site, where page= is replaced by section_offset=, but this only affects the .com domain from certain countries.

However, I don't have time right now to fix it, considering all the possible permutations.
If you would like to work on a possible solution, that would be great.

Michaelp86 commented on August 21, 2024

Hi Lucca,

It worked well a few days ago, but I think there is a new problem with finding the last page.

def last_pagenumer_in_search(self, response):
    try:  # to get the last page number
        last_page_number = int(response
                               .xpath('//ul[@class="list-unstyled"]/li[last()-1]/a/@href')
                               .extract()[0]
                               .split('page=')[1]
                               )
        return last_page_number
    except (IndexError, ValueError):  # assumed fallback; the original handler is not shown here
        return None

On the website the link is still at li[last()-1]/a/@href, but it no longer contains 'page='.
Could someone have a look? I can't figure out how to fix it.
Cheers,

Michael

toxydose commented on August 21, 2024

I found that 'page=' has been changed to 'section_offset='. This value is not present on the first page; the second page has section_offset=1, and the 17th page has section_offset=16.
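
For anyone hitting this, here is a minimal sketch of how both URL styles could be mapped to a page number. It assumes the offset is always one less than the page number, as observed above, and that a link with neither parameter is the first page. The function name page_number_from_href and the example hrefs are made up for illustration and are not part of the repo.

```python
from urllib.parse import parse_qs, urlparse


def page_number_from_href(href):
    """Return the 1-based page number encoded in a pagination link.

    Assumes 'page=N' means page N and 'section_offset=K' means page K + 1.
    A link with neither parameter is treated as the first page.
    """
    query = parse_qs(urlparse(href).query)
    if "page" in query:
        return int(query["page"][0])
    if "section_offset" in query:
        return int(query["section_offset"][0]) + 1
    return 1


# Hypothetical hrefs, for illustration only.
print(page_number_from_href("/s/Some-City?page=17"))
print(page_number_from_href("/s/Some-City?section_offset=16"))
print(page_number_from_href("/s/Some-City"))
```

Parsing the query string instead of splitting on 'page=' avoids the IndexError when the parameter is missing, and it keeps working if other query parameters come before or after it.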

verginer commented on August 21, 2024

This has been addressed in #4 and solved #6

aabid0193 commented on August 21, 2024

Would it be possible to get another update? I am now getting 0 bytes from this.

aabid0193 commented on August 21, 2024

Airbnb seems to have changed the CSS one more time, and I'm getting 0 bytes even after cloning your repo.
This is incredible work though,
Thanks

verginer commented on August 21, 2024

Hi @aabid0193, thanks, glad you like it. Getting this version of the code to work would currently require rewriting it from the ground up, since the data is no longer served as it was a year ago. In the README.md I have a link to an up-to-date extension of this code. Wish you the best of luck with scraping 😉

aabid0193 commented on August 21, 2024

Thank you!
