This page contains the materials for the short course Automated Text Analysis in Political Science, taught to political science MA students at CEU (16-27 April 2018). Materials will be added as we go along.
Instructor: Martijn Schoonvelde
You can find the syllabus here. For any questions, send me an email at mschoonvelde[at]gmail[dot]com.
Date | Link |
---|---|
April 23, 17:00h | Assignment 1 |
April 30, 17:00h | Assignment 2 |
Date | Slides | Date | Slides |
---|---|---|---|
April 16 | Link | April 23 | Link |
April 17 | Link | April 24 | Link |
April 18 | Link | April 25 | Link |
April 19 | Link | April 26 | Link |
April 20 | Link | April 27 | Presentations |
Date | Link |
---|---|
April 16 | Introduction |
April 17 | Script |
April 18 | Script, Data |
April 19 | Script, Data |
April 20 | Script, Data |
April 23 | Script, Data |
April 24 | Script, Data |
April 25 | Applications |
April 26 | Conclusion |
April 27 | Presentations |
For some code in the code practice scripts, I made use of materials by Jos Elkink here and here, and by Wouter van Atteveldt here and here. The setup of the code practice scripts follows the structure in Welbers, K., Van Atteveldt, W., & Benoit, K. (2017) (see below for citation). For some slides in week 1 of the course, I made use of materials by Pablo Barberá and Ken Benoit here.
April 16: 15:30 - 17:10:
- Introduction to the course and to EUSpeech, a dataset which we will use for running examples: Link
- Required reading:
- Schumacher, G., Schoonvelde, M., Traber, D., Dahiya, T., & De Vries, E. (2016). EUSpeech: a New Dataset of EU Elite Speeches. In: Proceedings of the International Conference on the Advances in Computational Analysis of Political Text, 75-80.
April 17: 15:30 - 17:10:
- A survey of automated text analysis in political science. Supervised and unsupervised methods. Validation, validation, validation. Text Analysis in R.
- Required reading:
- Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
- Welbers, K., Van Atteveldt, W., & Benoit, K. (2017). Text Analysis in R. Communication Methods and Measures, 11(4), 245-265.
April 18: 15:30 - 19:00:
- Pre-processing data. Going from text to data, including a few notes of caution. Discussion of the research design and research note.
- Required reading:
- Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Forthcoming at Political Analysis.
- Greene, Z., Ceron, A., Schumacher, G., & Fazekas, Z. (2016). The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries. Working paper.
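The step from text to data discussed in the April 18 session can be sketched in a few lines. This is a minimal illustration in plain Python, not taken from the course scripts: the stopword list, the function names `preprocess` and `document_term_matrix`, and the two example sentences are all hypothetical. It shows how one pre-processing choice (dropping stopwords) changes the resulting document-term matrix, which is the kind of sensitivity the required readings caution about.

```python
import re
from collections import Counter

# Hypothetical mini stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "we"}


def preprocess(text, remove_stopwords=True):
    """Lowercase the text, keep alphabetic tokens, optionally drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def document_term_matrix(docs, remove_stopwords=True):
    """Return (vocabulary, rows); each row holds the term counts of one document."""
    counts = [Counter(preprocess(d, remove_stopwords)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Hypothetical example documents.
docs = ["We support the Union.", "The union is in crisis, and we must act."]

vocab, rows = document_term_matrix(docs)
vocab_full, rows_full = document_term_matrix(docs, remove_stopwords=False)

print(vocab)       # vocabulary after stopword removal
print(vocab_full)  # larger vocabulary when stopwords are kept
```

Comparing `vocab` with `vocab_full` makes the point concrete: the same two documents yield matrices of different shape depending on a single pre-processing decision.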
April 19: 15:30 - 17:10:
- Systematically describing and comparing texts.
- Required reading:
- Chapters 3 and 4 of Silge, J., & Robinson, D. (2018). Text Mining with R: A Tidy Approach. O'Reilly Media, Inc. Available at Link
April 20: 15:30 - 17:10:
- Using dictionaries to measure sentiment, happiness and other things we're interested in.
- Required reading:
- Pennebaker, J. W., & King, L. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77(6), 1296-1312.
- Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.
- Suggested reading:
- Rooduijn, M., & Pauwels, T. (2011). Measuring populism: Comparing two methods of content analysis. West European Politics, 34(6), 1272-1283.
- Rheault, L., Beelen, K., Cochrane, C., & Hirst, G. (2016). Measuring Emotion in Parliamentary Debates with Automated Textual Analysis. PLoS One, 11(12).
- 17:00: Coding Assignment 1 Due
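The dictionary approach from the April 20 session amounts to counting how many tokens in a text match each category of a word list. The sketch below is a minimal illustration in plain Python under stated assumptions: the toy dictionary, the function names `dictionary_scores` and `net_sentiment`, and the example sentence are hypothetical, and real applications rely on validated dictionaries such as those discussed in the readings.

```python
import re

# Hypothetical toy dictionary; real applications use validated word lists.
SENTIMENT_DICTIONARY = {
    "positive": {"good", "growth", "hope", "success"},
    "negative": {"bad", "crisis", "fear", "failure"},
}


def dictionary_scores(text, dictionary=SENTIMENT_DICTIONARY):
    """Return, per category, the share of tokens that match the dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        # An empty text matches nothing; avoid dividing by zero.
        return {category: 0.0 for category in dictionary}
    return {
        category: sum(t in words for t in tokens) / total
        for category, words in dictionary.items()
    }


def net_sentiment(text):
    """Positive share minus negative share; above zero means net positive."""
    scores = dictionary_scores(text)
    return scores["positive"] - scores["negative"]


# Hypothetical example sentence.
speech = "There is hope for growth after the crisis."
print(dictionary_scores(speech))
print(net_sentiment(speech))
```

Dividing by the total number of tokens keeps scores comparable across texts of different length. Whether the dictionary actually measures the concept of interest in a given corpus still has to be validated.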
April 23: 09:00 - 10:40:
- Scaling methods for locating texts on an underlying (political) dimension. What do they mean? And how do they work?
- Required reading:
- Slapin, J. B., & Proksch, S. O. (2008). A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science, 52(3), 705-722.
- Hjorth, F., Klemmensen, R., Hobolt, S., Hansen, M. E., & Kurrild-Klitgaard, P. (2015). Computers, coders, and voters: Comparing automated methods for estimating party positions. Research & Politics, 2(2).
- Suggested reading:
- Lo, J., Proksch, S. O., & Slapin, J. B. (2016). Ideological clarity in multiparty competition: A new measure and test using election manifestos. British Journal of Political Science, 46(3), 591-610.
April 24: 09:00 - 10:40:
- Topic models, unsupervised models for summarizing what a text is about.
- Required reading:
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.
- Roberts, M., et al. (2014). Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082.
- Suggested reading:
- Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8-23.
- http://www.scottbot.net/HIAL/index.html@p=19113.html
April 25: 09:00 - 10:40 & 11:00 - 12:40:
- New developments in automated text analysis: (i) crowd-sourcing, (ii) measurement of elite personality, and (iii) measurement of semantic shifts.
- Required reading:
- Ramey, A. J., Klingler, J. D., & Hollibaugh, G. E. (2016). Measuring elite personality using speech. Political Science Research and Methods, 1-22.
- Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295.
- Suggested reading:
- Azarbonyad, H., Dehghani, M., Beelen, K., Arkut, A., Marx, M., & Kamps, J. (2017). Words are Malleable: Computing Semantic Shifts in Political and Media Discourse. Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 1509-1518.
April 26: 09:00 - 10:40:
- Loose ends, review, and general discussion of pros and cons of automated text analysis.
April 27: 09:00 - 10:40:
- Research design presentations.
30 April, 17:00: Coding Assignment 2 Due
4 May, 17:00: Research Note Due