chrismuir / mma-data-scrape
Scrape and clean MMA/UFC data using R and rvest
All of the below event #s are missing from the scrape:
225
207
204
198
196
It's 2 FX, 2 FUEL, and 1 FOX card.
It would be a pretty useful indicator of fighter/fight importance, and could be used to calculate fighters' PPV sales impact.
For this snippet of code:
# Eliminate all championship tags from strings within variables
# FighterA, FighterB, and Belt.
id <- which(colnames(bouts) %in% c("FighterA", "FighterB", "Belt"))
bouts[, id] <- lapply(
  bouts[, id], function(x)
    gsub(" \\(Fighter)| \\(c)| \\(ic)| \\(UFC Champion)| \\(Pride Champion)",
         "", x, ignore.case = TRUE))
This isn't an issue (not sure where else to post), but I think the (c) tag can be useful, as it's the only data we have to determine who went into the fight as champion. It could then be used to distinguish a title defense from a title fight win.
I personally commented this code out. I then wrote the data to an .xls file, moved the (c) tags to their own column, and can now also track who the champions were entering each fight.
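As an alternative to the spreadsheet step, the (c) tag could be captured in R before it is stripped. This is only a minimal sketch, assuming `bouts` has character columns `FighterA` and `FighterB` as in the snippet above and that the tag appears as " (c)" after the name. The `ChampA` and `ChampB` column names and the sample rows are made up for illustration.

```r
# Toy data standing in for the scraped `bouts` data frame (made-up rows).
bouts <- data.frame(
  FighterA = c("Fighter One (c)", "Fighter Three"),
  FighterB = c("Fighter Two", "Fighter Four (c)"),
  stringsAsFactors = FALSE
)

# Pattern for the champion tag: a space followed by "(c)".
champ_tag <- " \\(c\\)"

# Record who entered the fight as champion (hypothetical new columns).
bouts$ChampA <- grepl(champ_tag, bouts$FighterA, ignore.case = TRUE)
bouts$ChampB <- grepl(champ_tag, bouts$FighterB, ignore.case = TRUE)

# Only now strip the tag from the names.
bouts$FighterA <- sub(champ_tag, "", bouts$FighterA, ignore.case = TRUE)
bouts$FighterB <- sub(champ_tag, "", bouts$FighterB, ignore.case = TRUE)
```

With the flags kept as logical columns, the remaining tags can still be removed by the existing cleanup code without losing the champion information.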
Again, not sure if I should open up new issues, as this isn't related to any coding error, but there are issues with names.
As would be expected with a user-contributed site like Wikipedia, names are typed differently from time to time.
Georges St-Pierre vs Georges St. Pierre.
James Te Huna vs James Te-Huna
I'm manually going through and making changes now, but I don't know if this is of any use to the actual scrape itself.
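One possible way to surface such variants without going through them by hand is to compare names on a normalized key. This is a rough sketch, assuming the spelling differences are limited to case, punctuation, and spacing. `name_key` is a hypothetical helper, not part of the scrape.

```r
# The name variants mentioned above.
fighters <- c("Georges St-Pierre", "Georges St. Pierre",
              "James Te Huna", "James Te-Huna")

# Hypothetical helper: lower-case the name and drop everything that is
# not an ASCII letter or digit, so punctuation and spacing variants collapse.
# Caveat: accented letters are dropped too, so this is only a rough key.
name_key <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Group the distinct spellings by key; keep keys with more than one spelling.
distinct_names <- unique(fighters)
variants <- split(distinct_names, name_key(distinct_names))
variants <- variants[lengths(variants) > 1]

# Map every name to the first spelling seen for its key.
keys <- name_key(fighters)
canonical <- fighters[match(keys, keys)]
```

Here `variants` lists the spellings that need review, and `canonical` simply picks the first spelling encountered; which spelling is actually correct would still need a manual check.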