maxdemaio / thelangbot

Twitter bot to help you learn foreign languages. Building a community through tweets. Retweets #100DaysOfLanguage and #langtwt. Archived since the monetization of the Twitter API.

Home Page: https://maxdemaio.github.io/thelangbot/

Python 90.72% Dockerfile 9.28%
challenge foreign-language-learning foreign-language language 100daysoflanguage python mysql

thelangbot's Issues

Add Langtwt introduction starter

Make a CSS card that people can copy and paste into Twitter to introduce themselves, and add it to the website. Alternatively, they could click a hyperlink that pre-fills the tweet using intent queries.

Implement Anti-Spamming Logic

Recently, there was an influx of accounts that spammed the #langtwt and #100DaysOfLanguage hashtags on Twitter. This resulted in the bot retweeting all of their posts. To prevent this, I should figure out how to rate limit accounts which are spamming so that the community doesn't have to deal with seeing too many posts from one user.
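One possible approach, sketched here under assumptions: cap how many tweets from a single user get retweeted in one bot run. The names `limit_per_user` and `MAX_PER_USER` are hypothetical, and tweets are modeled as plain `(user_id, tweet_id)` tuples rather than the API's tweet objects.

```python
from collections import Counter

MAX_PER_USER = 2  # hypothetical cap per bot run


def limit_per_user(tweets, max_per_user=MAX_PER_USER):
    """Keep at most max_per_user tweets per user ID, preserving order."""
    seen = Counter()
    kept = []
    for user_id, tweet_id in tweets:
        if seen[user_id] < max_per_user:
            kept.append((user_id, tweet_id))
        seen[user_id] += 1
    return kept
```

A cap per run only throttles bursts inside a single batch. Limiting a user across runs would need some state kept between runs.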

Limit DB queries

Currently, for every Twitter user in my list of tweets, we query the DB for their username.

Better idea:

  • Query the DB once for blacklist/supporters
SELECT * FROM example
  • Turn query object into Python accessible object
  • Check if they are in that list
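The steps above could look roughly like this. It is a minimal sketch: `sqlite3` stands in for the MySQL connection so the example runs on its own, and the `banned` table, its `user_id` column, and the `fetch_id_set` helper are all hypothetical names.

```python
import sqlite3


def fetch_id_set(cursor, query):
    """Run one query and return the first column of every row as a set."""
    cursor.execute(query)
    return {row[0] for row in cursor.fetchall()}


# Stand-in database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE banned (user_id INTEGER)")
cur.executemany("INSERT INTO banned VALUES (?)", [(111,), (222,)])

banned_ids = fetch_id_set(cur, "SELECT user_id FROM banned")  # one query per run

for user_id in (111, 333):
    if user_id in banned_ids:  # set lookup, no further DB round trips
        print(f"skip {user_id}")
```

Because the helper only relies on `execute` and `fetchall`, it should work with any DB-API style cursor.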

Self Hosting or Cloud Hosting

Heroku might start charging and getting rid of free-tier products. Their docs only say that anything using free dynos will be removed. We use their scheduler and MySQL DB, but no docs mention that these will be removed yet.

Make Repo More Developer Friendly

Dev setup guide with an env.example file: an easy way to recreate the bot if others want to. Also, get some CI for test cases up in here, and perhaps later some containerization.

Fix thelangbot!

  • Dockerfile with Cron job
  • JSON, getting rid of that darn DB because it's not needed
    • Index on immutable ID

Extract DB functions into a `utils.py` File

Right now I think it would be good to offload the db functions into a util file. Each function could take a cursor parameter so that the setup can still stay in bot.py

  • create utils.py
  • move functions from bot.py over to utils.py
  • add cursor parameter to all functions (also pass as a parameter in bot.py)
  • import the functions in bot.py

Retweet tweets in order

Currently, the bot retweets tweets in order from newest->oldest, since the API gives us an iterator in that order.

Make it so that we can start from the end, and then iterate from oldest->newest.
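A minimal sketch of one way to do this, assuming the batch is small enough to hold in memory: materialize the iterator into a list and reverse it. The helper name `oldest_first` is hypothetical, and plain integers stand in for tweet objects.

```python
def oldest_first(tweets):
    """Materialize a newest-first iterator and return it oldest-first."""
    return list(tweets)[::-1]


newest_first = iter([30, 20, 10])  # stand-in for the API's iterator
ordered = oldest_first(newest_first)
```

An iterator cannot be walked backwards directly, so it has to be collected first. That trades a little memory for the correct order.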

Duplicate Status Error: Users Tweeting the Same Tweet Across Multiple Accounts

Basically someone or some group owns multiple "@garage" accounts for language. They tweet out the same content at more or less the same time in multiple languages. Since we're retweeting them, it seems like we're spamming since the text is the same (despite 2 languages, which doesn't seem to matter).

For now, I will keep these users banned until an alternate solution is found. Unfortunately, we will not be able to retweet their duplicate tweets because of this error.

Example tweet 1 (1620150411725250564):

スーツケースを詰めなければいけません。 Suutsukeesu o tsumenakereba ikemasen. I need to pack my suitcase.

Example tweet 2 (1620150411117076483):

Мне надо упаковать чемодан. I need to pack my suitcase.

Found tweet by @GarageJapanese
User ID is: 1116761967996354562
Tweet ID is: 1620150411725250564
Tweet retweeted!
Found tweet by @GarageRussian
User ID is: 1249460011991883776
Tweet ID is: 1620150411117076483
Traceback (most recent call last):
  File "/app/bot.py", line 71, in <module>
    main()
  File "/app/bot.py", line 33, in main
    retweet(api, tweets, banned_ids, supporter_ids, last_seen_id)
  File "/app/bot.py", line 65, in retweet
    api.retweet(tweet.id)
  File "/usr/local/lib/python3.9/site-packages/tweepy/binder.py", line 253, in _call
    return method.execute()
  File "/usr/local/lib/python3.9/site-packages/tweepy/binder.py", line 234, in execute
    raise TweepError(error_msg, resp, api_code=api_error_code)
tweepy.error.TweepError: [{'code': 187, 'message': 'Status is a duplicate.'}]
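In the traceback above, the first duplicate aborts the whole run. One alternative to banning these users might be to catch the error per tweet and carry on. This is a sketch only: `ApiError` is a hypothetical stand-in for the library's exception (the traceback shows it being raised with an `api_code` keyword), and `retweet_fn` stands in for `api.retweet`.

```python
DUPLICATE_STATUS = 187  # error code shown in the traceback above


class ApiError(Exception):
    """Hypothetical stand-in for the client library's error type."""

    def __init__(self, message, api_code=None):
        super().__init__(message)
        self.api_code = api_code


def retweet_all(retweet_fn, tweet_ids):
    """Retweet each ID; skip duplicates instead of aborting the run."""
    skipped = []
    for tweet_id in tweet_ids:
        try:
            retweet_fn(tweet_id)
        except ApiError as err:
            if err.api_code != DUPLICATE_STATUS:
                raise  # unknown failure: let it surface
            skipped.append(tweet_id)
    return skipped
```

Only code 187 is swallowed. Any other API error still propagates, so real failures are not hidden.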

Convert Tests to Use `unittest`

For now we just make sure the tweets are being retweeted via the command line in tester.py. But we could set these up as unit tests with assertions to make sure when code is changed we can quickly run/verify all is good.

  • create unified testing file
  • import unittest
  • create class for LangbotTests
  • move main method of tester.py to be a valid method in the LangbotTests class
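The checklist above might end up looking something like this. It is a hedged sketch: `should_retweet` is a hypothetical helper invented for illustration, not a function from bot.py or tester.py.

```python
import unittest


def should_retweet(user_id, banned_ids):
    """Hypothetical helper: retweet only users who are not banned."""
    return user_id not in banned_ids


class LangbotTests(unittest.TestCase):
    def test_banned_user_is_skipped(self):
        self.assertFalse(should_retweet(1, {1, 2}))

    def test_regular_user_is_retweeted(self):
        self.assertTrue(should_retweet(3, {1, 2}))
```

Saved in a test file, this can be run with `python -m unittest`.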

get rid of the DB

We want to get rid of the DB. It creates a point of failure and we don't even really need one. TBH we don't even store sensitive data. We could just legit have text files. We will just update them in version control. All we have are blacklisted/banned users (ban by immutable IDs), supporter IDs (people that have supported the bot and get likes alongside retweets), and then we'll try and remove storing the "last seen ID" because of #28.
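A minimal sketch of the text-file idea, assuming one numeric ID per line. The `load_ids` helper and the `#` comment convention are hypothetical choices, not something the repo already defines.

```python
from pathlib import Path


def load_ids(path):
    """Read one numeric ID per line; ignore blank lines and # comments."""
    ids = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(int(line))
    return ids
```

Allowing comment lines would make it possible to note next to each ID why an account was banned, which keeps the file readable in version control.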

MySQL DB update issue

Currently the bot will go down in the situation where the app fails to do an UPDATE on the MySQL database. At the very end of retweeting tweets, the bot will attempt to store the last seen ID into the DB. If it fails, this causes the last seen tweet to be stale, and if the bot re-queries Twitter, most subsequent tweets from the stale ID will have already been retweeted.

Error seen from logs:

2022-04-28T13:22:41.643519+00:00 app[scheduler.3653]:     Utils.storeLastSeenId(mydb, mycursor, currLastSeenId)
2022-04-28T13:22:41.643519+00:00 app[scheduler.3653]:   File "/app/utils.py", line 41, in storeLastSeenId
2022-04-28T13:22:41.643580+00:00 app[scheduler.3653]:     mycursor.execute("UPDATE tweet SET tweetId = '%s' WHERE id = 1", (exampleId,))
2022-04-28T13:22:41.643582+00:00 app[scheduler.3653]:   File "/app/.heroku/python/lib/python3.9/site-packages/mysql/connector/cursor_cext.py", line 269, in execute
2022-04-28T13:22:41.643692+00:00 app[scheduler.3653]:     result = self._cnx.cmd_query(stmt, raw=self._raw,
2022-04-28T13:22:41.643693+00:00 app[scheduler.3653]:   File "/app/.heroku/python/lib/python3.9/site-packages/mysql/connector/connection_cext.py", line 510, in cmd_query
2022-04-28T13:22:41.643837+00:00 app[scheduler.3653]:     raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
2022-04-28T13:22:41.643856+00:00 app[scheduler.3653]: mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query

Possible solution: attempt to store the last ID in the iterable first, and if this fails then we should not continue with retweets.
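The possible solution could be sketched as follows. The assumptions: tweet IDs increase over time, so the largest ID in the batch is the newest, and `run_once`, `store_last_seen_id`, and `retweet_fn` are hypothetical names standing in for the bot's real functions.

```python
def run_once(tweet_ids, store_last_seen_id, retweet_fn):
    """Persist the newest ID first; retweet only if that write succeeds."""
    if not tweet_ids:
        return 0
    try:
        store_last_seen_id(max(tweet_ids))
    except Exception as err:
        print(f"Could not store last seen ID, skipping run: {err}")
        return 0
    for tweet_id in sorted(tweet_ids):  # oldest first
        retweet_fn(tweet_id)
    return len(tweet_ids)
```

The trade-off is that if retweeting fails after the ID has been stored, those tweets are skipped rather than retweeted twice.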
