
twitter-insight-llm's People

Contributors

alexzhangji

twitter-insight-llm's Issues

Is it limited to 50 tweets, or is that a Twitter website limit?

Tried a few times, but only 50 liked tweets were downloaded. The last few lines of the log are below:

`2024-04-19 02:27:04,735 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements {'using': 'css selector', 'value': "div[data-testid='videoPlayer']"}
2024-04-19 02:27:04,738 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements HTTP/1.1" 200 0
2024-04-19 02:27:04,739 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":[]} | headers=HTTPHeaderDict({'Content-Length': '12', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,739 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,739 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements {'using': 'css selector', 'value': "div[data-testid='tweetPhoto']"}
2024-04-19 02:27:04,742 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements HTTP/1.1" 200 0
2024-04-19 02:27:04,742 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":[]} | headers=HTTPHeaderDict({'Content-Length': '12', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,742 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,742 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements {'using': 'css selector', 'value': "div[data-testid='videoPlayer']"}
2024-04-19 02:27:04,745 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements HTTP/1.1" 200 0
2024-04-19 02:27:04,745 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":[]} | headers=HTTPHeaderDict({'Content-Length': '12', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,745 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,745 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements {'using': 'css selector', 'value': "div[data-testid='tweetPhoto']"}
2024-04-19 02:27:04,747 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/elements HTTP/1.1" 200 0
2024-04-19 02:27:04,747 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":[]} | headers=HTTPHeaderDict({'Content-Length': '12', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,748 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,748 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element {'using': 'css selector', 'value': "div[data-testid='reply']"}
2024-04-19 02:27:04,751 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element HTTP/1.1" 200 0
2024-04-19 02:27:04,751 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":{"element-6066-11e4-a52e-4f735466cecf":"f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.584"}} | headers=HTTPHeaderDict({'Content-Length': '127', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,751 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,751 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/execute/sync {'script': '/* getAttribute */return (function(){return (function(){var d=this||self;function f(a,b){function c(...', 'args': [{'element-6066-11e4-a52e-4f735466cecf': 'f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.584'}, 'aria-label']}
2024-04-19 02:27:04,770 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/execute/sync HTTP/1.1" 200 0
2024-04-19 02:27:04,770 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":"19 Replies. Reply"} | headers=HTTPHeaderDict({'Content-Length': '29', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,770 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,770 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element {'using': 'css selector', 'value': "div[data-testid='retweet']"}
2024-04-19 02:27:04,774 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element HTTP/1.1" 200 0
2024-04-19 02:27:04,774 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":{"element-6066-11e4-a52e-4f735466cecf":"f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.585"}} | headers=HTTPHeaderDict({'Content-Length': '127', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,774 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,774 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/execute/sync {'script': '/* getAttribute */return (function(){return (function(){var d=this||self;function f(a,b){function c(...', 'args': [{'element-6066-11e4-a52e-4f735466cecf': 'f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.585'}, 'aria-label']}
2024-04-19 02:27:04,776 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/execute/sync HTTP/1.1" 200 0
2024-04-19 02:27:04,776 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=200 | data={"value":"89 reposts. Repost"} | headers=HTTPHeaderDict({'Content-Length': '30', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,776 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,776 - selenium.webdriver.remote.remote_connection - DEBUG - POST http://localhost:50146/session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element {'using': 'css selector', 'value': "div[data-testid='like']"}
2024-04-19 02:27:04,780 - urllib3.connectionpool - DEBUG - http://localhost:50146 "POST /session/861c99898cb0272e5e265c5f1b0d503b/element/f.E668CD9998144E0499B79B48775B5205.d.F61E3435C244E04795D5639F5028A5C9.e.520/element HTTP/1.1" 404 0
2024-04-19 02:27:04,780 - selenium.webdriver.remote.remote_connection - DEBUG - Remote response: status=404 | data={"value":{"error":"no such element","message":"no such element: Unable to locate element: {"method":"css selector","selector":"div[data-testid='like']"}\n (Session info: chrome=124.0.6367.60)","stacktrace":"0 chromedriver 0x00000001031be934 chromedriver + 4368692\n1 chromedriver 0x00000001031b6dc8 chromedriver + 4337096\n2 chromedriver 0x0000000102ddac04 chromedriver + 289796\n3 chromedriver 0x0000000102e1ce00 chromedriver + 560640\n4 chromedriver 0x0000000102e13368 chromedriver + 521064\n5 chromedriver 0x0000000102e555ec chromedriver + 792044\n6 chromedriver 0x0000000102e11ab4 chromedriver + 514740\n7 chromedriver 0x0000000102e1250c chromedriver + 517388\n8 chromedriver 0x0000000103182e50 chromedriver + 4124240\n9 chromedriver 0x0000000103187c40 chromedriver + 4144192\n10 chromedriver 0x0000000103168818 chromedriver + 4016152\n11 chromedriver 0x0000000103188570 chromedriver + 4146544\n12 chromedriver 0x000000010315a2cc chromedriver + 3957452\n13 chromedriver 0x00000001031a7eb8 chromedriver + 4275896\n14 chromedriver 0x00000001031a8034 chromedriver + 4276276\n15 chromedriver 0x00000001031b6a28 chromedriver + 4336168\n16 libsystem_pthread.dylib 0x000000019c7f3fa8 _pthread_start + 148\n17 libsystem_pthread.dylib 0x000000019c7eeda0 thread_start + 8\n"}} | headers=HTTPHeaderDict({'Content-Length': '1699', 'Content-Type': 'application/json; charset=utf-8', 'cache-control': 'no-cache'})
2024-04-19 02:27:04,780 - selenium.webdriver.remote.remote_connection - DEBUG - Finished Request
2024-04-19 02:27:04,914 - __main__ - INFO -

Done saving to data/tweets_2024-04-19_02.xlsx. Total of 50 unique tweets.`
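The log shows the scraper reading engagement counts from each button's `aria-label` (for example `"19 Replies. Reply"` and `"89 reposts. Repost"`), and it also shows the `div[data-testid='like']` lookup failing with `no such element` for one tweet. A minimal sketch of a tolerant count parser follows, assuming the count always appears as a leading integer (possibly with comma separators) in the label; `parse_count` is a hypothetical helper, not a function from this project. Returning 0 for a missing label means one unfound button does not have to abort the whole tweet. One untested possibility for the missing `like` button is that already-liked tweets expose a different `data-testid`, such as `unlike`.

```python
import re
from typing import Optional

# Assumption: the count is the leading integer of the label, e.g. "19 Replies. Reply"
# or "1,234 Likes. Like" (comma thousands separators allowed).
_COUNT_RE = re.compile(r"^\s*(\d[\d,]*)")


def parse_count(aria_label: Optional[str]) -> int:
    """Return the leading number in an engagement button's aria-label.

    Returns 0 when the label is missing (e.g. the button was not found)
    or does not start with a number.
    """
    if not aria_label:
        return 0
    match = _COUNT_RE.match(aria_label)
    if match is None:
        return 0
    # Drop thousands separators before converting.
    return int(match.group(1).replace(",", ""))
```

With the labels from the log, `parse_count("19 Replies. Reply")` gives 19 and `parse_count("89 reposts. Repost")` gives 89, while `parse_count(None)` gives 0.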

Error when there are many likes; the browser crashes

2024-03-08 10:42:16,951 - __main__ - INFO - Tweet: <selenium.webdriver.remote.webelement.WebElement (session="6d68c615a31a66014517db5a18a144ed", element="f.4558563DBA9B5CDD7FCE8259F38DCFD1.d.A0B7EFFDA9F1D15135C1B32CAF8A42EB.e.12597")>
Traceback (most recent call last):
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\tenacity\__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "E:\Twitter-Insight-LLM-main\twitter_data_ingestion.py", line 152, in _process_tweet
author_name, author_handle = self._extract_author_details(tweet)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Twitter-Insight-LLM-main\twitter_data_ingestion.py", line 240, in _extract_author_details
author_details = self._get_element_text(
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Twitter-Insight-LLM-main\twitter_data_ingestion.py", line 196, in _get_element_text
return parent.find_element(By.XPATH, selector).text
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\selenium\webdriver\remote\webelement.py", line 417, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\selenium\webdriver\remote\webelement.py", line 395, in _execute
return self._parent.execute(command, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 347, in execute
self.error_handler.check_response(response)
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found
(Session info: chrome=122.0.6261.112); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#stale-element-reference-exception
Stacktrace:
GetHandleVerifier [0x00007FF656AFAD32+56930]
(No symbol) [0x00007FF656A6F632]
(No symbol) [0x00007FF6569242E5]
(No symbol) [0x00007FF656929261]
(No symbol) [0x00007FF65692B6EB]
(No symbol) [0x00007FF65692B7B0]
(No symbol) [0x00007FF65696955C]
(No symbol) [0x00007FF656969A2C]
(No symbol) [0x00007FF65695F13C]
(No symbol) [0x00007FF65698BCDF]
(No symbol) [0x00007FF65695F09A]
(No symbol) [0x00007FF65698BEB0]
(No symbol) [0x00007FF6569A81E2]
(No symbol) [0x00007FF65698BA43]
(No symbol) [0x00007FF65695D438]
(No symbol) [0x00007FF65695E4D1]
GetHandleVerifier [0x00007FF656E76ABD+3709933]
GetHandleVerifier [0x00007FF656ECFFFD+4075821]
GetHandleVerifier [0x00007FF656EC818F+4043455]
GetHandleVerifier [0x00007FF656B99766+706710]
(No symbol) [0x00007FF656A7B90F]
(No symbol) [0x00007FF656A76AF4]
(No symbol) [0x00007FF656A76C4C]
(No symbol) [0x00007FF656A66904]
BaseThreadInitThunk [0x00007FF8AB1B7344+20]
RtlUserThreadStart [0x00007FF8ABE026B1+33]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "E:\Twitter-Insight-LLM-main\twitter_data_ingestion.py", line 311, in <module>
scraper.fetch_tweets(
File "E:\Twitter-Insight-LLM-main\twitter_data_ingestion.py", line 55, in fetch_tweets
row = self._process_tweet(tweet)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\tenacity\__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\tenacity\__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\abc\PycharmProjects\pythonProject\.venv\Lib\site-packages\tenacity\__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x173c9675e80 state=finished raised StaleElementReferenceException>]
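The chained traceback shows `tenacity` retrying `_process_tweet` and every attempt raising `StaleElementReferenceException`, which finally surfaces as `RetryError`. Retrying with the same `WebElement` is unlikely to help, because a reference generally stays stale once the timeline re-renders and drops that node. Below is a minimal, library-agnostic sketch of the alternative, assuming the caller can supply a `locate` callable that finds the tweet again on each attempt (for example by its permalink); `retry_with_relocate` and its parameters are hypothetical names, not part of `tenacity` or this project.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")  # element handle, e.g. a WebElement (assumption)
R = TypeVar("R")  # result of processing one element


def retry_with_relocate(
    locate: Callable[[], T],
    process: Callable[[T], R],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.0,
) -> R:
    """Run process(locate()) up to `attempts` times.

    Each attempt calls `locate` again, so a handle that went stale during
    the previous attempt is replaced instead of being reused.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return process(locate())
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error unchanged
            if delay > 0:
                time.sleep(delay)  # optional pause before re-locating
    raise AssertionError("unreachable")
```

With Selenium, `retry_on` would be `(StaleElementReferenceException,)` from `selenium.common.exceptions`. How to re-find one specific tweet reliably on a virtualized timeline is left open in this sketch.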

Maybe the .json file isn't found? Not sure

Traceback (most recent call last):
File "d:\codes\.vscode\twitter_ingection.py", line 311, in <module>
scraper.fetch_tweets(
File "d:\codes\.vscode\twitter_ingection.py", line 74, in fetch_tweets
self._save_to_json(row, filename=f"{cur_filename}.json")
File "d:\codes\.vscode\twitter_ingection.py", line 292, in _save_to_json
with open(filename, "a", encoding="utf-8") as file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'data/tweets_2024-04-12_17-53-05.json'
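`open(filename, "a")` creates a missing file but not a missing directory, so this `FileNotFoundError` means the relative `data/` folder did not exist under the current working directory when the script ran. A minimal sketch that creates the parent directory before appending; `save_to_json` is a hypothetical stand-in for the project's `_save_to_json`, and writing one JSON object per line is an assumption, since the original method body beyond the `open` call is not shown.

```python
import json
import os
from typing import Any, Dict


def save_to_json(row: Dict[str, Any], filename: str) -> None:
    """Append one record to `filename`, creating its parent directory first.

    Assumption: records are stored as JSON Lines (one object per line).
    """
    parent = os.path.dirname(filename)
    if parent:
        # Without this, open(..., "a") raises FileNotFoundError when e.g. data/ is missing.
        os.makedirs(parent, exist_ok=True)
    with open(filename, "a", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII tweet text readable in the file.
        file.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Because `data/...` is a relative path, it resolves against the directory the script is launched from; running the script from another folder can reproduce the same error even if a `data/` folder exists next to the script.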
