Comments (3)
Hey @calebpeffer, I checked the progress for samsara.com and managed to retrieve the partial_data during the crawl, which took about 1.5 hours. The crawler successfully scraped all 6,295 pages, but now that the crawl is complete, the crawl status check returns data=null. I also couldn't find the job data in Supabase, which suggests an error occurred while saving this large amount of data.
I'm currently looking into ways to improve how we save job data to Supabase.
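As a rough client-side illustration of the symptom described above, here is a minimal sketch that falls back to partial_data when data comes back null. Only the field names data and partial_data come from this thread; the overall response shape and the helper name extract_pages are assumptions for illustration, not the actual Firecrawl client code.

```python
from typing import Any, Dict, List


def extract_pages(status_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return scraped pages from a crawl status response.

    Assumption: the response is a dict that may carry `data` and
    `partial_data` keys, as mentioned in the comment above. The rest
    of the shape is hypothetical.
    """
    data = status_response.get("data")
    if data is not None:
        return data
    # `data` is null (the case reported here): use whatever partial
    # results are present, or an empty list if there are none.
    return status_response.get("partial_data") or []


# Example with a made-up response mirroring the reported state.
finished_but_empty = {
    "status": "completed",
    "data": None,
    "partial_data": [{"url": "https://samsara.com/"}],
}
pages = extract_pages(finished_but_empty)
```

This only works around the symptom on the caller's side; it does not address the underlying save failure.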
Hey @calebpeffer, I've resolved a bug with the playground request, and now you can crawl samsara.com without any issues.
Regarding the developers' webpage, none of the links on this page are children of /reference/overview. I tested this by requesting with crawlerOptions.allowBackwardCrawling: true (which is no longer available on the playground), and it successfully retrieved the links.
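To make the "children of /reference/overview" point concrete, here is a minimal sketch of a child-path check and of a request body that sets the option mentioned above. The host example.com, the helper names, and the exact matching rule are assumptions for illustration; only crawlerOptions.allowBackwardCrawling comes from this thread, and the crawler's real link filter may differ.

```python
from urllib.parse import urlparse


def is_child_link(start_url: str, link: str) -> bool:
    """True when `link` is on the same host and under the start URL's path.

    Hypothetical rule, used only to illustrate why sibling pages such as
    /reference/api would be skipped when starting at /reference/overview.
    """
    start, target = urlparse(start_url), urlparse(link)
    base_path = start.path.rstrip("/") + "/"
    target_path = target.path.rstrip("/") + "/"
    return start.netloc == target.netloc and target_path.startswith(base_path)


def build_crawl_body(start_url: str, allow_backward: bool) -> dict:
    """Build a crawl request body with the option named in the comment."""
    return {
        "url": start_url,
        "crawlerOptions": {"allowBackwardCrawling": allow_backward},
    }


# Placeholder URLs; the real developers' page is not given in the thread.
start = "https://example.com/reference/overview"
sibling = "https://example.com/reference/api"
child = "https://example.com/reference/overview/auth"

sibling_is_child = is_child_link(start, sibling)
child_is_child = is_child_link(start, child)
body = build_crawl_body(start, allow_backward=True)
```

Under this rule a sibling page is not treated as a child of the start path, which is the situation where enabling backward crawling would be needed.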
Thanks @rafaelsideguide. Relayed this to the customer!