I enjoyed this project because we had to work with a lot of messy data pulled from a variety of datasets. I spent considerable time figuring out how to create a list of lists while maintaining a persistent count. It should have been basic, but it required some thought and pacing around my room.
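Here is a minimal sketch of one way to build a list of lists while keeping a persistent count. It assumes the goal was to split a flat list into fixed-size sublists and track a running total across them. The function name `batch_with_count`, the batch size, and the sample data are illustrative assumptions, not the code used in the project.

```python
def batch_with_count(items, batch_size):
    """Split items into sublists of at most batch_size and return them
    together with a running count of every item seen.

    Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batches = []  # the list of lists
    current = []  # the sublist currently being filled
    count = 0     # persists across sublists; never reset inside the loop

    for item in items:
        current.append(item)
        count += 1
        # Close the sublist once it is full and start a fresh one.
        if len(current) == batch_size:
            batches.append(current)
            current = []

    # Keep any leftover items that did not fill a whole sublist.
    if current:
        batches.append(current)

    return batches, count


batches, total = batch_with_count(list(range(7)), 3)
print(batches, total)  # [[0, 1, 2], [3, 4, 5], [6]] 7
```

The key detail is that `count` lives outside the per-sublist state, so starting a new sublist does not reset it.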
As for my results, I thought it was clear that countries with large populations would be underrepresented in the population proportion, but it's also clear that in these countries there are fewer politicians in general. We also didn't really talk about newer countries that don't have many politicians in their history.
I was blown away that North Korea had such a high proportion of high-quality articles, since its politics are so secretive. But then I realized it's probably because the politicians we know about are very high profile and the others are not even known.
In conclusion, more than learning about bias in data, I came away feeling that studies in sociological areas rely more on math than anything else. It's obvious that Western countries were overrepresented, but that probably correlates with where the traffic for Wikipedia is coming from. Also, I am always hesitant to base any major conclusions about humanity on some black-box NLP API that is spitting out classifications.