Comments (15)
I have some experience with the API limits :) since I used to scrape the entire OSRS GE.
But first things first: some refactoring. A database would be beneficial. What data are we getting from the plugin?
It would be nice if we had the following information from the plugin:
- player_reporter
- player_reported
- location
I suggest two endpoints: report_player & report_players.
Both endpoints do inserts in the database.
Tables:
- player_reports
- players (all unique players)
The difference between report_player & report_players is that for the latter we would set a column in player_reports, nearby_players, to True (1).
In the players table we keep track of when a player was created, whether they are banned, and the ban date.
A ban is detected when a player is removed from the hiscores.
For the hiscores we need some tables.
Table: Highscores
Columns:
- player_id (as defined in players)
- Timestamp
- stats (skill lvl & xp, minigame rank, score)
Table: Highscores_latest (not sure if this is needed)
Columns:
- player_id (as defined in players)
- stats (skill lvl & xp, minigame rank, score)
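The tables above can be sketched as a schema. This is a minimal illustration using SQLite; the exact column names (nearby_players, banned_date, etc.) and types are assumptions based on this discussion, not the project's actual schema.

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (
    id          INTEGER PRIMARY KEY,
    name        TEXT UNIQUE NOT NULL,
    created_at  TEXT,
    banned      INTEGER DEFAULT 0,    -- set when the player drops off the hiscores
    banned_date TEXT
);
CREATE TABLE player_reports (
    id             INTEGER PRIMARY KEY,
    reporter_id    INTEGER REFERENCES players(id),
    reported_id    INTEGER REFERENCES players(id),
    location       TEXT,
    nearby_players INTEGER DEFAULT 0, -- 1 when inserted via report_players
    created_at     TEXT
);
CREATE TABLE highscores (
    player_id INTEGER REFERENCES players(id),
    ts        TEXT,
    stats     TEXT                    -- skill lvl & xp, minigame rank, score
);
CREATE TABLE highscores_latest (
    player_id INTEGER PRIMARY KEY REFERENCES players(id),
    stats     TEXT
);
""")
conn.execute("INSERT INTO players (name, created_at) VALUES ('example_player', '2021-03-01')")
row = conn.execute("SELECT id, name FROM players").fetchone()
```

A real deployment would likely use MySQL/Postgres with proper timestamp types, but the relationships stay the same.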
We would need routes to request data from the database. I suggest:
- /player?player_name=
- /player_reports?reporter_player_id=&reported_player_id=
- /highscores?player_id=&start_date=&end_date=
- /highscores_latest?player_id=
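Since a Flask API is mentioned below, the proposed routes could look roughly like this. The in-memory PLAYERS dict is a stand-in for the database, and all handler bodies are illustrative assumptions, not the project's actual implementation.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Illustrative stand-in for the players table.
PLAYERS = {"some_player": {"id": 1, "name": "some_player", "banned": False}}

@app.route("/player")
def get_player():
    name = request.args.get("player_name", "")
    player = PLAYERS.get(name)
    if player is None:
        return jsonify({"error": "player not found"}), 404
    return jsonify(player)

@app.route("/highscores_latest")
def highscores_latest():
    player_id = request.args.get("player_id", type=int)
    # A real implementation would query the highscores_latest table here.
    return jsonify({"player_id": player_id, "stats": {}})
```

The other routes (/player_reports, /highscores with start_date/end_date) would follow the same query-parameter pattern.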
Can you set up the database side? I can set up the Flask API that will run on the server.
What I have described should be the basis for a nice website that can showcase our bot detector :D.
Additionally, it should be the basis for our AI ideas.
The AI workflow will be the following:
- Request data
- pre processing (Data cleaning & Feature engineering )
- ai modelling
- model evaluation
- model deployment
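The workflow steps above can be sketched end to end with scikit-learn. Everything here is a stand-in: the data is random, the labels are synthetic, and the scaler/KNN choices are assumptions used only to illustrate the pipeline shape (78 features matches the hiscore data discussed later in this thread).

```python
import pickle

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 1. Request data (stand-in: random stats, 78 features per player)
X = rng.normal(size=(200, 78))
y = (X[:, 0] > 0).astype(int)  # synthetic bot / not-bot labels

# 2. Pre-processing: scaling as a simple cleaning/feature-engineering step
# 3. AI modelling: KNN, matching the classifier used elsewhere in this thread
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 4. Model evaluation on a held-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))

# 5. Model deployment: pickle the fitted model, as with OSRS_KNN_V1 below
blob = pickle.dumps(model)
```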
from bot-detector-core-files.
It might also be useful to have the plugin push a user token, so we can stop abuse?
from bot-detector-core-files.
Hey there - sorry about the data flow mess. I'll work on cleaning that up so it's much more readable.
- "OSRS_KNN_V1" - This pickled file is the KNN classifier, you can use:
osrsknn = pickle.load(open("OSRS_KNN_V1","rb"))
osrsknn_predict = osrsknn.predict(PLAYER_IND[2].reshape(1,-1))
print(osrsknn_predict)
to use the classifier.
- "ykmfile" - These are the labels produced by KMeans (n_clusters = 300). They are in the order of the input data and correlate with "pnamefile", which holds the player names. E.g. ykmfile[8] is the group label of pnamefile[8].
- "pnamefile" - These are the player names.
"PIfile" - This is the raw dataform and needs to have
x = np.reshape(PIfile,(-1,78))
passed through it in order to produce an array with [ENTRIES,78]. ENTRIES = the number of total names added to the dataset, and 78 being the number of features. This raw dataform has not been normalized or adjusted in any way. -
"traindata" - This is just the section of the raw data used in the KNN classifier, and can largely be ignored.
I will focus on making a much more readable format shortly - which should help answer some of your questions.
from bot-detector-core-files.
The data you have are only player names?
from bot-detector-core-files.
FYI, we are doing something very similar to:
https://www.youtube.com/watch?v=Dk4Yahv2lek&list=PLX9loFun2zNkqwEk3abeMzZnVlT0YPxkp
but in a cleaner way.
from bot-detector-core-files.
The data you have are only player names?
No, the data are the stats from the hiscores, located in "PIfile" :)
So if loaded in:
ykmfile = generated labels
PIfile.reshape(-1, 78) = hiscore data
pnamefile = names
So that:
ykmfile[4] is the label for player pnamefile[4], with features PIfile[4]
from bot-detector-core-files.
FYI, we are doing something very similar to:
https://www.youtube.com/watch?v=Dk4Yahv2lek&list=PLX9loFun2zNkqwEk3abeMzZnVlT0YPxkp
but in a cleaner way.
Really cool project!
from bot-detector-core-files.
I don't know how effective the raw player stats are for detecting bots.
With some data engineering, we can scrape the hiscores every hour, 6 hours, or day to get the XP gains over time.
A hypothesis is that bots gain XP at a similar rate in one specific skill, compared to normal players gaining XP across many skills.
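The XP-gains-over-time idea can be sketched with numpy. The snapshot values below are invented, and "concentration" is one hypothetical feature for the hypothesis above, not an agreed-upon metric:

```python
import numpy as np

# Hypothetical XP snapshots for one player across 3 skills,
# one row per scrape (e.g. every 6 hours).
snapshots = np.array([
    [1000,  5000, 200],
    [1000,  9000, 200],
    [1000, 13000, 200],
])

gains = np.diff(snapshots, axis=0)  # XP gained between consecutive scrapes
total = gains.sum()

# Fraction of all gains concentrated in the single most-trained skill;
# a value near 1.0 would support the "bots train one skill" hypothesis.
concentration = gains.sum(axis=0).max() / total
```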
from bot-detector-core-files.
also gathering labeled data will make it way easier :D
from bot-detector-core-files.
I don't know how effective the raw player stats are for detecting bots. With some data engineering, we can scrape the hiscores every hour, 6 hours, or day to get the XP gains over time. A hypothesis is that bots gain XP at a similar rate in one specific skill, compared to normal players gaining XP across many skills.
Yeah, I would love to scrape the hiscores every 6 hrs; unfortunately there is a rate limit of 2-3 seconds per name, so 100K names could take ~69 hrs to scrape.
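As a back-of-the-envelope check of that estimate, assuming one request every ~2.5 seconds:

```python
# 100K names at ~2.5 s per hiscore lookup.
names = 100_000
seconds_per_name = 2.5
hours = names * seconds_per_name / 3600  # ~69.4 hours for a full pass
```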
from bot-detector-core-files.
also gathering labeled data will make it way easier :D
It would be, but we don't know the labels, unfortunately, since we can't easily trust the accuracy of labels sent in for individual players. So KMeans can group players on their stats and output labels for us. Those labels then go into the KNN classifier, which seems to work well so far, at least.
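That KMeans-then-KNN approach can be sketched with synthetic data. The real run used n_clusters = 300 on actual hiscore stats; the small random data and 5 clusters here are stand-ins to keep the demo fast:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 78))  # stand-in for the (ENTRIES, 78) hiscore data

# Unsupervised step: KMeans groups players by stats and emits labels.
km = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = km.fit_predict(X)

# Supervised step: a KNN classifier trained on those generated labels,
# so newly reported players can be assigned to an existing group.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
new_player = rng.normal(size=(1, 78))
group = knn.predict(new_player)
```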
from bot-detector-core-files.
It's a very tough situation due to the API ratelimit.
from bot-detector-core-files.
Excellent suggestions. I will work this week and weekend to make the code much easier to read. We will also try to include the reporting player's info, as well as the location of the found players.
I can definitely set up the database side and properly reconfigure everything so that it is clear and manageable. I have also recently set up a Flask app on a Linode server with gunicorn and nginx, as a test for a switch from Google Cloud to Linode; however, my Flask app is very rudimentary, so changes are highly appreciated.
I will let you know once the changes have been made and when a database becomes available. This will definitely help improve the workflow from this point onward.
As for the data we are getting from the plugin: only player names are being sent at this time. Those names are then processed on our end to retrieve the OSRS hiscore data. The location and the reporting player were planned for later updates, but we can shift the schedule to include these values earlier.
from bot-detector-core-files.
Is the data in JSON format?
For a minimum viable product, the location and the reporting player would be really good, combined with a website. It gives people something to show; with a bit of luck, Sir Pugger will pick it up :D.
(Recently I got myself a VPS for my tools as well, but I'm not experienced in any of that Linux stuff :p)
Maybe send me a message on Twitter so we can share a .env file: @3xtreme4all
from bot-detector-core-files.
Haha, no. Embarrassingly, the data is in a text file format. I'm going to try to convert it all into JSON from now on. Also, I'd be super excited to have a great-looking website where you can look up statistics etc. That would really be remarkable to add in the future!
Also, don't worry - I don't know anything about Linux either. I just followed a tutorial on YouTube (as with basically everything I've done so far; YouTube is the way to go).
from bot-detector-core-files.