Replies: 22 comments
-
Is there any way you could send us a minimum reproduction sample we can use to debug this further? 🙏
-
Sure.
// For more information, see https://crawlee.dev/
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';
import { firefox } from 'playwright';
import { router } from './routes.js';
import cookies from './cookies.json' assert { type: "json" };
import storage from './storage.json' assert { type: "json" };

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async (crawlingContext, gotoOptions) => {
            const { page } = crawlingContext;
            await page.context().addCookies(cookies.map(cookie => ({ ...cookie, sameSite: 'None' })));
            {
                const data = storage;
                const code = items =>
                    Object
                        .entries(items)
                        .forEach(([key, value]) => localStorage[key] = value);
                await page.evaluate(code, data).catch(error => console.warn(error.message));
            }
        },
    ],
    postNavigationHooks: [
        async (crawlingContext, gotoOptions) => {
            await crawlingContext.closeCookieModals();
        },
    ],
    headless: false,
    // proxyConfiguration: new ProxyConfiguration({ proxyUrls: ['...'] }),
    requestHandler: router,
    // Comment this option to scrape the full website.
    maxRequestsPerCrawl: 20,
    requestHandlerTimeoutSecs: 5 * 60,
    useSessionPool: true,
    persistCookiesPerSession: true,
    launchContext: {
        launcher: firefox,
        useChrome: true,
        useIncognitoPages: false,
    },
});

await crawler.run(startUrls);
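One possible explanation, not confirmed anywhere in this thread: Crawlee evaluates preNavigationHooks before the navigation, so the page.evaluate call in the hook likely runs against the initial about:blank document, where Chromium denies localStorage access. A minimal sketch of an alternative, assuming that is what happens here, registers the write with Playwright's page.addInitScript so it runs inside each newly created document instead. The helper name seedLocalStorage is hypothetical and not part of Crawlee or Playwright.

```javascript
// Hypothetical helper (not part of Crawlee or Playwright).
// Registers a script that Playwright runs in every new document of `page`,
// before the site's own scripts, so the write happens on the target origin
// rather than on the pre-navigation about:blank document.
function seedLocalStorage(page, items) {
    // The callback is serialized and executed in the browser, so it must not
    // close over Node-side variables; `items` is passed in as the argument.
    return page.addInitScript((entries) => {
        try {
            for (const [key, value] of Object.entries(entries)) {
                window.localStorage.setItem(key, String(value));
            }
        } catch (error) {
            // Storage can still be denied, e.g. in sandboxed or third-party frames.
            console.warn(error.message);
        }
    }, items);
}
```

Inside the hook, `await seedLocalStorage(page, storage);` would take the place of the page.evaluate call. Note that an init script runs on every navigation and in every frame, so the entries are rewritten each time and on whatever origin the page visits.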
-
Maybe something smaller? Or, if it's easier for you, a GitHub repository we can clone? Either way, we'll take a look.
-
Great, thanks. It's not really dependent on any specific code. The issue is simply that crawlers apparently use incognito contexts with cookies and other site data features disabled. The question is how to change that.
-
FWIW,
-
Indeed. Desperate attempt to induce some change in behavior.
-
Don't you need to use
-
Yes, there is
-
And another suspicious thing, why would you use
-
Maybe Firefox opens in an incognito context by default; not sure about that. Does it work when you actually use Chrome?
-
I have tried shuffling various properties around, so there might be some leftovers, but I'm pretty sure that has no effect here.
-
Doesn't matter which browser is used. The problem seems to be at a higher level -
-
Well, I am more than sure that what you say is not true with Chrome. We would be well aware if we opened in incognito by default, as that hurts performance badly (~50% overhead).
-
Not sure what's wrong. The setup seems pretty standard. Caught me by surprise to find out about the above.
-
Didn't see any reason for it other than some flag the library is using on startup, since it's happening with any browser. Should be quite straightforward to reproduce by visiting
-
Standard Playwright project setup produced by
-
I don't see how the settings page is connected to this. When I open the settings locally in incognito mode, they open in a normal window.
-
Most likely, that is what's causing the problem with access to local storage as described in https://www.chromium.org/for-testers/bug-reporting-guidelines/uncaught-securityerror-failed-to-read-the-localstorage-property-from-window-access-is-denied-for-this-document
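To narrow down which document is rejecting the access, it may help to record the page URL alongside the outcome of a read attempt. The sketch below assumes only Playwright's page.url() and page.evaluate(); the helper name probeLocalStorage and the shape of its return value are hypothetical.

```javascript
// Hypothetical diagnostic helper (not part of Crawlee or Playwright).
// Reports whether localStorage is readable in the page's current document,
// together with the URL of that document.
async function probeLocalStorage(page) {
    const url = page.url(); // 'about:blank' until the first navigation completes
    try {
        const length = await page.evaluate(() => window.localStorage.length);
        return { url, accessible: true, length, error: null };
    } catch (error) {
        // In Chromium this is where the "Access is denied for this document"
        // DOMException from the issue description would surface.
        return { url, accessible: false, length: null, error: error.message };
    }
}
```

Logging `await probeLocalStorage(page)` from both a pre-navigation and a post-navigation hook would show whether the error is tied to the browser's cookie settings or only to the document the hook runs against.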
-
I don't see any other reason why all browsers should have this setting enabled unless the library is forcing that behavior, through a launch flag...
-
I can confirm that in a local browser, the settings open in a non-incognito window for me as well. However, I'm not entirely sure that an incognito window and an incognito context in Playwright are the same thing and can be compared in this way.
-
I seem to have misunderstood the third-party cookie settings page by glancing over it. This option seems to be enabled always, since it's about third-party cookies. The question then is about the possibility to change this setting in order to avoid the error. Since this no longer seems to be related to the library, I'll need to dig deeper and find out if this setting can be changed through flags.
-
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/browser (BrowserCrawler)
Issue description
Injection of localStorage produces error: DOMException: Failed to read the 'localStorage' property from 'Window': Access is denied for this document.
Investigation is pointing to https://www.chromium.org/for-testers/bug-reporting-guidelines/uncaught-securityerror-failed-to-read-the-localstorage-property-from-window-access-is-denied-for-this-document
PlaywrightCrawler is opening incognito contexts for both Chrome and Firefox. Using the option useIncognitoPages is not helping.
Code sample
Package version
latest
Node.js version
latest
Operating system
No response
Apify platform
I have tested this on the next release
No response
Other context
No response