(Using the latest source code as of 2024-03-29)
I requested three posts, but only one was downloaded successfully; the others hit the following errors.
The FIRST post download hit an exception at:
    def get_url_soup(self, url: str) -> BeautifulSoup:
        """
        Gets soup from URL using logged in selenium driver
        """
        try:
            self.driver.get(url)  # <<< EXCEPTION HIT HERE
            return BeautifulSoup(self.driver.page_source, "html.parser")
        except Exception as e:
            raise ValueError(f"Error fetching page: {e}") from e
CALLSTACK
get_url_soup (/Users/username/Dev/Substack2Markdown/substack_scraper.py:341)
scrape_posts (/Users/username/Dev/Substack2Markdown/substack_scraper.py:228)
main (/Users/username/Dev/Substack2Markdown/substack_scraper.py:394)
(/Users/username/Dev/Substack2Markdown/substack_scraper.py:398)
OUTPUT
0%| | 0/3 [00:00<?, ?it/s]Error scraping post: Error fetching page: Message: no such execution context
(Session info: MicrosoftEdge=123.0.2420.65)
Stacktrace:
0 msedgedriver 0x0000000104bc99d8 msedgedriver + 4823512
1 msedgedriver 0x0000000104bc1a13 msedgedriver + 4790803
2 msedgedriver 0x0000000104787d35 msedgedriver + 359733
3 msedgedriver 0x000000010477434a msedgedriver + 279370
4 msedgedriver 0x00000001047732a3 msedgedriver + 275107
5 msedgedriver 0x00000001047736df msedgedriver + 276191
6 msedgedriver 0x0000000104781fa4 msedgedriver + 335780
7 msedgedriver 0x000000010479211b msedgedriver + 401691
8 msedgedriver 0x00000001047968ab msedgedriver + 420011
9 msedgedriver 0x0000000104773c8b msedgedriver + 277643
10 msedgedriver 0x0000000104791da0 msedgedriver + 400800
11 msedgedriver 0x000000010480887f msedgedriver + 886911
12 msedgedriver 0x00000001047ec543 msedgedriver + 771395
13 msedgedriver 0x00000001047c0dbf msedgedriver + 593343
14 msedgedriver 0x00000001047c171e msedgedriver + 595742
15 msedgedriver 0x0000000104b85f32 msedgedriver + 4546354
16 msedgedriver 0x0000000104b8c2c6 msedgedriver + 4571846
17 msedgedriver 0x0000000104b67d5a msedgedriver + 4423002
18 msedgedriver 0x0000000104b8cd2d msedgedriver + 4574509
19 msedgedriver 0x0000000104b583d4 msedgedriver + 4359124
20 msedgedriver 0x0000000104bb0348 msedgedriver + 4719432
21 msedgedriver 0x0000000104bb04c1 msedgedriver + 4719809
22 msedgedriver 0x0000000104bc15a7 msedgedriver + 4789671
23 libsystem_pthread.dylib 0x00007ff803e6818b _pthread_start + 99
24 libsystem_pthread.dylib 0x00007ff803e63ae3 thread_start + 15
33%|███████████████████████████████████████████████████████████████▎ | 1/3 [00:16<00:32, 16.22s/it]Error scraping post: 'NoneType' object has no attribute 'text'
67%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:50<00:25, 25.26s/it]
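In case it helps with triage: the `no such execution context` error might be transient, so one possible workaround would be to retry the page fetch a few times before giving up. The sketch below is only an illustration and is not taken from the repository; `with_retries`, `attempts`, and `delay_s` are hypothetical names, and whether a retry actually clears this Edge driver error is an assumption I have not verified.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay_s: float = 2.0) -> T:
    """Call action() up to `attempts` times, sleeping `delay_s` seconds between failures.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:  # broad on purpose, mirrors get_url_soup
            last_error = e
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed: surface the last error in the same style as get_url_soup.
    raise ValueError(f"Error fetching page after {attempts} attempts: {last_error}") from last_error
```

Inside `scrape_posts` it could be used roughly as `soup = with_retries(lambda: self.get_url_soup(url))`, assuming the surrounding code stays as shown in the snippet above.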
The SECOND post download hit an exception at:
    def scrape_posts(self, num_posts_to_scrape: int = 0) -> None:
        """
        Iterates over all posts and saves them as markdown files
        """
        ...
        title, subtitle, like_count, date, md = self.extract_post_data(soup)  # <<< EXCEPTION HIT HERE
CALLSTACK
scrape_posts (/Users/avib/AviDev/Substack2Markdown/substack_scraper.py:232)
main (/Users/avib/AviDev/Substack2Markdown/substack_scraper.py:394)
(/Users/avib/AviDev/Substack2Markdown/substack_scraper.py:398)
OUTPUT
33%|████████████████████████████▎ | 1/3 [02:02<04:05, 122.79s/it]Error scraping post: 'NoneType' object has no attribute 'text'
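The `'NoneType' object has no attribute 'text'` message usually means a BeautifulSoup lookup such as `find` or `select_one` matched nothing and returned `None`, and `.text` was then read from that result. I have not confirmed which lookup inside `extract_post_data` fails, so the following is only a sketch of a guard; `text_or_default` is a hypothetical helper, and the `SimpleNamespace` object merely stands in for a parsed tag.

```python
from types import SimpleNamespace
from typing import Any, Optional


def text_or_default(element: Optional[Any], default: str = "") -> str:
    """Return element.text with surrounding whitespace removed, or `default` if element is None.

    Hypothetical helper: not part of the repository.
    """
    if element is None:
        return default
    return element.text.strip()


# Stand-in for a tag returned by a successful lookup.
found = SimpleNamespace(text="  My Post Title  ")
# Stand-in for a lookup that matched nothing (find/select_one return None).
missing = None

title = text_or_default(found, "Untitled")
subtitle = text_or_default(missing, "")
```

With a guard like this, a post that lacks one of the expected elements (a subtitle, for example) would fall back to the default value instead of aborting the whole post.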