Comments (22)

vladfrangu avatar vladfrangu commented on May 26, 2024

Is there any way you could send us a minimum reproduction sample we can use to debug this further? 🙏

from crawlee.

cybairfly avatar cybairfly commented on May 26, 2024

Sure.

// For more information, see https://crawlee.dev/
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { firefox } from 'playwright';
import { router } from './routes.js';
import cookies from './cookies.json' assert { type: "json" };
import storage from './storage.json' assert { type: "json" };

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async (crawlingContext, gotoOptions) => {
            const {page} = crawlingContext;
            await page.context().addCookies(cookies.map(cookie => ({...cookie, sameSite: 'None'})));

            // Seed localStorage inside the page with the entries from storage.json.
            const data = storage;
            const code = items =>
                Object
                    .entries(items)
                    .forEach(([key, value]) => localStorage[key] = value);

            await page.evaluate(code, data).catch(error => console.warn(error.message));
        },
    ],
    postNavigationHooks: [
        async (crawlingContext, gotoOptions) => {
            await crawlingContext.closeCookieModals();
        }
    ],
    headless: false,
    // proxyConfiguration: new ProxyConfiguration({ proxyUrls: ['...'] }),
    requestHandler: router,
    // Comment this option to scrape the full website.
    maxRequestsPerCrawl: 20,
    requestHandlerTimeoutSecs: 5 * 60,
    useSessionPool: true,
    persistCookiesPerSession: true,
    launchContext: {
        launcher: firefox,
        useChrome: true,
        useIncognitoPages: false
    },
});

// Note: `startUrls` is not defined in this sample; supply your own array of URLs.
await crawler.run(startUrls);

vladfrangu avatar vladfrangu commented on May 26, 2024

Maybe something smaller? Or, if it's easier for you, a GitHub repository we can clone? Either way, we'll take a look.

cybairfly avatar cybairfly commented on May 26, 2024

Great, thanks. It's not really dependent on any specific code. The issue is simply that crawlers appear to be using incognito contexts with cookies and other site-data features disabled. The question is how to change that.

B4nan avatar B4nan commented on May 26, 2024

FWIW, useIncognitoPages defaults to false, so I doubt your problem is about that.

cybairfly avatar cybairfly commented on May 26, 2024

Indeed. Desperate attempt to induce some change in behavior.

B4nan avatar B4nan commented on May 26, 2024

Don't you need to use page.evaluate here to actually execute the code in the browser? As that is where the localStorage object lives.

cybairfly avatar cybairfly commented on May 26, 2024

Yes, there is a page.evaluate call. The problem is that it's evaluating within an incognito context where access to these window properties is disabled.
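To make it easier to see what actually fails inside the page, the seeding function from the sample could report the outcome instead of only logging a warning. The following is a minimal sketch, not Crawlee or Playwright API: `seedStorage` and its return shape are hypothetical names, and the optional `storage` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: self-contained so it can be passed to page.evaluate().
function seedStorage(items, storage) {
    try {
        // Reading `localStorage` is itself what throws when access is denied,
        // so the lookup stays inside the try block.
        const target = storage ?? globalThis.localStorage;
        let written = 0;
        for (const [key, value] of Object.entries(items)) {
            target.setItem(key, String(value));
            written += 1;
        }
        return { ok: true, written };
    } catch (error) {
        // Report the error name (for example "SecurityError") to the caller.
        return { ok: false, reason: error.name, message: error.message };
    }
}
```

In the hook from the sample it would be called as `const result = await page.evaluate(seedStorage, storage);`, after which `result.reason` shows whether the page rejected the access.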

B4nan avatar B4nan commented on May 26, 2024

And another suspicious thing, why would you use useChrome: true with firefox?

launcher: firefox,
useChrome: true,

B4nan avatar B4nan commented on May 26, 2024

Maybe firefox opens in incognito context by default, not sure about that. Does it work when you actually use chrome?

cybairfly avatar cybairfly commented on May 26, 2024

Have tried shuffling various different properties around so there might be some leftovers but pretty sure that has no effect here.

cybairfly avatar cybairfly commented on May 26, 2024

Doesn't matter which browser is used. The problem seems to be at a higher level - browserPool most likely.

B4nan avatar B4nan commented on May 26, 2024

Well, I am more than sure that what you say is not true with Chrome. We would be well aware if we opened in incognito by default, as that hurts performance badly (~50% overhead).

cybairfly avatar cybairfly commented on May 26, 2024

Negative on chrome. Same thing across all browsers.

cybairfly avatar cybairfly commented on May 26, 2024

Not sure what's wrong. The setup seems pretty standard. Caught me by surprise to find out about the above.

cybairfly avatar cybairfly commented on May 26, 2024

Didn't see any reason for it other than some flag the library is using on startup, since it's happening with any browser. Should be quite straightforward to reproduce by visiting chrome://settings/content/cookies

cybairfly avatar cybairfly commented on May 26, 2024

Standard Playwright project setup produced by npx crawlee create

B4nan avatar B4nan commented on May 26, 2024

I don't see how the settings page is connected to this. When I open settings locally in incognito mode, they open in the normal window.

cybairfly avatar cybairfly commented on May 26, 2024

Most likely, that is what's causing the problem with access to local storage as described in https://www.chromium.org/for-testers/bug-reporting-guidelines/uncaught-securityerror-failed-to-read-the-localstorage-property-from-window-access-is-denied-for-this-document

cybairfly avatar cybairfly commented on May 26, 2024

I don't see any other reason why all browsers should have this setting enabled unless the library is forcing that behavior, through a launch flag...

cybairfly avatar cybairfly commented on May 26, 2024

I can confirm that in a local browser, the settings open in a non-incognito window for me as well. However, I'm not entirely sure that an incognito window and an incognito context in Playwright are the same thing and can be compared that way.

cybairfly avatar cybairfly commented on May 26, 2024

I seem to have misunderstood the third-party cookie settings page by glancing over it. This option seems to be enabled always, since it's about third-party cookies. The question then is about the possibility to change this setting in order to avoid the error. Since this no longer seems to be related to the library, I'll need to dig deeper and find out if this setting can be changed through flags.

If this setting is checked, third-party scripts cookies are disallowed and access to localStorage may result in thrown SecurityError exceptions.
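The thread does not settle how to change this setting. As one possible direction, Playwright contexts can be created with a `storageState` object that pre-populates cookies and per-origin localStorage, which avoids writing to `localStorage` from an evaluated script. The sketch below only builds that object from data shaped like the `cookies.json` and `storage.json` files in the sample above. `toStorageState`, the example origin, and the example values are hypothetical, and whether or how the result can be passed through Crawlee is not covered in this thread.

```javascript
// Hypothetical helper: build an object in the shape accepted by Playwright's
// `storageState` option (for example in browser.newContext({ storageState })).
function toStorageState(origin, cookies, items) {
    return {
        // Same sameSite override as in the sample's preNavigationHook.
        cookies: cookies.map(cookie => ({ ...cookie, sameSite: 'None' })),
        origins: [
            {
                origin,
                // localStorage entries are { name, value } pairs with string values.
                localStorage: Object.entries(items).map(([name, value]) => ({
                    name,
                    value: String(value),
                })),
            },
        ],
    };
}

// Example data only; replace with the contents of cookies.json and storage.json.
const state = toStorageState(
    'https://example.com',
    [{ name: 'session', value: 'abc', domain: 'example.com', path: '/' }],
    { theme: 'dark', visits: 3 },
);
console.log(JSON.stringify(state, null, 2));
```

Because the state is applied when the context is created, no page script has to touch `localStorage` before navigation. Whether this sidesteps the SecurityError described above would still need to be verified.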
