This project analyzes the recent positions of the leading players in Artificial Intelligence (AI): China, Europe, and the United States. The analysis extracts insights from the provided documents, which are grouped into "Main sources" and "Additional sources."
- Python: Programming language used for analysis.
- Libraries:
  - pandas
  - nltk
  - gensim
  - wordcloud
  - networkx
  - matplotlib
- `Main_sources/`: Folder containing the primary documents for mandatory analysis.
- `Additional_sources/`: Folder for optional documents, including "AI_EUvsUS.pdf."
- Clean the text by converting it to lowercase, removing stopwords, and keeping only alphanumeric tokens.
- Generate bi-grams and tri-grams to capture meaningful phrases in the text.
- Create word clouds to visually represent the most frequent words in the text.
- Calculate various statistics, including the number of words, unique words, and entropy of the text.
- Create a network graph to visualize relationships between words and derive insights.
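The cleaning and statistics steps above can be sketched as follows. This is an illustrative, dependency-free version: the stopword set is a small inline placeholder (the project itself uses nltk's English stopword corpus), and `clean_text` / `text_stats` are hypothetical helper names, not functions from the repository.

```python
import math
import re
from collections import Counter

# A few illustrative stopwords; the project uses nltk's full English list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for"}

def clean_text(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def text_stats(tokens):
    """Word count, unique-word count, and Shannon entropy (bits) of the token distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {"words": total, "unique_words": len(counts), "entropy": entropy}

tokens = clean_text("The race for AI leadership: China, Europe, and the US invest in AI.")
print(text_stats(tokens))  # e.g. {'words': 8, 'unique_words': 7, 'entropy': 2.75}
```

The entropy here is the usual Shannon measure over the word-frequency distribution: higher values indicate a more varied vocabulary.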
- Generate bi-grams and tri-grams.
- Perform topic detection and generate word clouds.
- Calculate statistics and visualize networks.
- Write findings in the report file.
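A minimal way to generate the bi-grams and tri-grams used in the workflow is a zip-based sliding window; `nltk.util.ngrams` provides the same functionality via the library, but this dependency-free sketch makes the mechanics explicit:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Toy token list standing in for a cleaned document.
tokens = ["artificial", "intelligence", "strategy", "artificial", "intelligence", "policy"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Most frequent bi-gram in the toy list:
print(Counter(bigrams).most_common(1))  # [(('artificial', 'intelligence'), 2)]
```

Counting the resulting tuples with `Counter` is then enough to surface the most meaningful phrases.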
- Calculate and print top words and bigrams for China, Europe, and the US.
- Load and preprocess text data for EU and US.
- Generate word clouds for China, Europe, and the US.
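The per-country top-word and top-bigram step can be sketched like this; the token lists are toy stand-ins for the real preprocessed documents, and `top_items` is an illustrative helper name:

```python
from collections import Counter

def top_items(tokens, n_top=3):
    """Return the top words and top bi-grams for one country's token list."""
    bigrams = list(zip(tokens, tokens[1:]))
    return Counter(tokens).most_common(n_top), Counter(bigrams).most_common(n_top)

# Toy stand-ins for the cleaned China / Europe / US documents.
corpora = {
    "China": ["ai", "development", "plan", "ai", "innovation"],
    "Europe": ["trustworthy", "ai", "regulation", "ai", "ethics"],
    "US": ["ai", "initiative", "research", "ai", "leadership"],
}

for country, tokens in corpora.items():
    words, bigrams = top_items(tokens)
    print(country, words, bigrams)
```

The same counts can feed directly into `wordcloud.WordCloud.generate_from_frequencies` for the word-cloud step.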
- Review the attached report for detailed findings and insights from the analysis of the AI positions of China, Europe, and the United States.