
abuse-data

TL;DR: I've found a way to reliably detect abuse matches by applying anomaly detection to game stats.

Introduction

While following a machine learning course, I started wondering whether what I was learning could be applied to detecting abuse matches in Dota2. This has been quite a hot topic on here lately, with many people lamenting the number of boosters, account buyers and so on, so it seemed worthwhile to do some testing. If enough match statistics followed a Gaussian distribution, it should in theory be possible to detect outliers automatically by choosing the right cut-off value. As it turns out, many stats are approximately normally distributed, and anomaly detection based on a multivariate Gaussian distribution works surprisingly well!

Method

Feel free to skip this section, it's slightly heavy on math.

I collected two non-overlapping sets of match details, each containing about n ≈ 100,000 ten-player matches. I used the first set as the training set and the second as the test set to verify the results. For each match, I took the GPM, XPM, kills, deaths and assists of all ten players. Each of these stats was sorted by value, separately for the winning and the losing team, and the sorted results were concatenated into a feature vector. This way each match is represented by a vector of 50 features (5 stats × 2 teams × 5 players), laid out as follows:

| Winning team GPM   | Losing team GPM    | Winning team XPM   | ... |
|--------------------|--------------------|--------------------|-----|
| Lowest ... Highest | Lowest ... Highest | Lowest ... Highest | ... |
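The feature construction above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code from the repository: the per-player dict keys (`gpm`, `xpm`, `kills`, `deaths`, `assists`) and the function name `feature_vector` are hypothetical, and the real match-details JSON would need to be mapped onto them. It reproduces the layout in the table: for each stat, the winning team's five values in ascending order, then the losing team's five values in ascending order.

```python
# Hypothetical record layout: each player is a dict with these keys. The real
# match-details JSON from the Steam Web API may name its fields differently.
STATS = ("gpm", "xpm", "kills", "deaths", "assists")
PLAYERS_PER_TEAM = 5


def feature_vector(winners, losers):
    """Build the feature vector for one 10-player match.

    For every stat, the winning team's values are sorted ascending, followed
    by the losing team's values sorted ascending: 5 * 2 * 5 = 50 features.
    """
    if len(winners) != PLAYERS_PER_TEAM or len(losers) != PLAYERS_PER_TEAM:
        raise ValueError("expected five players on each team")
    features = []
    for stat in STATS:
        features.extend(sorted(float(p[stat]) for p in winners))
        features.extend(sorted(float(p[stat]) for p in losers))
    return features


if __name__ == "__main__":
    # Toy match with made-up numbers, only to show the layout.
    winning = [{"gpm": g, "xpm": 500, "kills": 5, "deaths": 3, "assists": 10}
               for g in (620, 410, 380, 550, 300)]
    losing = [{"gpm": g, "xpm": 400, "kills": 2, "deaths": 8, "assists": 6}
              for g in (350, 290, 420, 310, 270)]
    x = feature_vector(winning, losing)
    print(len(x))  # → 50
    print(x[:5])   # winning-team GPM, lowest to highest
```

Sorting within each team makes the vector independent of player slot order, so matches with the same spread of stats map to the same point regardless of which slot each player occupied.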

After stacking the feature vectors of the training set into one large matrix, I calculated its mean vector μ and covariance matrix Σ. These define a model that can be used to decide whether or not a match is an abuse match:

$$p(x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\,\exp\!\left(-\tfrac{1}{2}\,(x-\mu)\,\Sigma^{-1}\,(x-\mu)^{T}\right)$$

where x is a feature vector from the test set (written as a row vector, like μ) and k is the number of features. Most of this model, namely Σ^(-1), |Σ| and the normalising constant, only needs to be computed once; what remains for each tested sample is cheap enough to run in real time. This means you could feed the values returned by the GetMatchHistoryBySequenceNum endpoint directly into this model and have it flag suspect matches and players.
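A sketch of fitting the model and scoring a match, assuming the training feature vectors are already available as a list of equal-length lists. It is plain standard-library Python with hypothetical names (`GaussianModel`, `log10_density` and the helpers), not the repository's implementation. It works in log space, log10 p(x) = [−(k/2)·ln 2π − ½·ln|Σ| − ½·(x−μ)Σ⁻¹(x−μ)ᵀ] / ln 10, so very small densities do not underflow to 0, and it uses a Cholesky factor Σ = LLᵀ instead of an explicit inverse. The constant term is computed once in the constructor; per match only a forward substitution (O(k²) work for k = 50) remains.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Maximum-likelihood covariance (divides by n, not n - 1)."""
    n, k = len(rows), len(mu)
    cov = [[0.0] * k for _ in range(k)]
    for row in rows:
        d = [row[j] - mu[j] for j in range(k)]
        for i in range(k):
            for j in range(i, k):
                cov[i][j] += d[i] * d[j]
    for i in range(k):
        for j in range(i, k):
            cov[i][j] /= n
            cov[j][i] = cov[i][j]
    return cov


def cholesky(a):
    """Lower-triangular L with a = L L^T; fails if a is not positive definite."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


class GaussianModel:
    """Multivariate Gaussian fitted on training feature vectors."""

    def __init__(self, training_rows):
        self.mu = mean_vector(training_rows)
        self.low = cholesky(covariance_matrix(training_rows, self.mu))
        k = len(self.mu)
        # log|Sigma| = 2 * sum(log L_ii); this and the constant are computed once.
        log_det = 2.0 * sum(math.log(self.low[i][i]) for i in range(k))
        self.log_norm = -0.5 * (k * math.log(2.0 * math.pi) + log_det)

    def log10_density(self, x):
        """log10 p(x) for a single feature vector x."""
        k = len(self.mu)
        # Forward substitution solves L y = (x - mu); the quadratic form
        # (x - mu) Sigma^-1 (x - mu)^T then equals y . y.
        y = [0.0] * k
        for i in range(k):
            s = (x[i] - self.mu[i]) - sum(self.low[i][m] * y[m] for m in range(i))
            y[i] = s / self.low[i][i]
        quad = sum(v * v for v in y)
        return (self.log_norm - 0.5 * quad) / math.log(10.0)
```

For the full 100,000-match training set, NumPy (`np.cov`, `np.linalg.slogdet`) or SciPy's `scipy.stats.multivariate_normal.logpdf` would compute the same quantities far faster than these pure-Python loops.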

Results

Looking at the values of p(x) obtained by running the model on each feature vector in the test set, the majority of matches (about 90%) score between 10^(-80) and 10^(-60), with outliers between 0 and 10^(-100). Based on this I set the cut-off at 10^(-100) and only considered matches with p(x) < 10^(-100). The matches below this cut-off (about 700 samples) all fall into the following categories:

The latter two categories all have p(x) values between 10^(-110) and 10^(-100), so in theory they could be excluded by lowering the cut-off value, but then some bot matches would slip through as well. These matches make up only a minority of the outliers, and I consider only the last category to be real false positives. By saving the accounts involved in each suspect match and looking for accounts that occur multiple times, it is reasonable to assume these last two cases will be eliminated. This would leave only genuine bot abuse matches and persistent feeders.
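The filtering described in this section, a cut-off at p(x) < 10^(-100) followed by looking for accounts that show up in several suspect matches, might look like the sketch below. The function names, the `accounts_by_match` mapping and the threshold of two flagged matches (`min_matches=2`) are assumptions for illustration. The cut-off is applied to log10 p(x), which is equivalent to comparing p(x) with 10^(-100); the optional `ignore` set is there to drop placeholder ids, such as those used for anonymous players.

```python
from collections import Counter

LOG10_CUTOFF = -100.0  # p(x) < 10^-100, the cut-off chosen in this section


def flag_matches(scored, cutoff=LOG10_CUTOFF):
    """scored: iterable of (match_id, log10_p) pairs; returns suspect match ids."""
    return [match_id for match_id, log10_p in scored if log10_p < cutoff]


def repeat_accounts(flagged_ids, accounts_by_match, min_matches=2, ignore=frozenset()):
    """Count, per account, how many flagged matches it appears in.

    accounts_by_match maps match_id -> list of account ids (hypothetical layout).
    Accounts seen in fewer than min_matches flagged matches are dropped, which
    is meant to filter out one-off false positives.
    """
    skip = set(ignore)
    counts = Counter()
    for match_id in flagged_ids:
        # set() so an account is counted at most once per match
        counts.update(set(accounts_by_match[match_id]) - skip)
    return {acc: n for acc, n in counts.items() if n >= min_matches}
```

Raising `min_matches` trades recall for precision: fewer one-off false positives survive, but a bot account that has only been caught once is also dropped.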

Conclusion

Feeding the match details obtained from the Web API's history endpoint through this model makes it very easy to detect bot accounts. This shows that even a very basic machine learning model, built with very little effort, could make big strides in cleaning up bot accounts. Furthermore, these preliminary results suggest that, with further tweaking, the model could equally detect certain player behaviour such as excessive feeding. This could automate or complement the reporting system and make it more accurate.

All code used to produce these results can be found on GitHub.


Contributors: crazy-duck
