KoreanDiscordChatAnalyzer

Project Overview

The KoreanDiscordChatAnalyzer is an NLP-based project aimed at analyzing chat data from public Korean Discord communities. The goal is to extract meaningful trends, sentiments, and key phrases from chat conversations to gain insights into community dynamics, interests, and feedback. It can serve as a resource for community managers, game developers, and researchers interested in understanding the social interactions and preferences within these digital spaces.

Breakdown

Data Collection

  • Prerequisite: A Discord bot is required for data collection. To create a Discord bot and obtain a bot token, follow this tutorial.
  • The data_collection/discord_data_collector.py script uses a Discord bot to gather chat data from a specified channel in a Korean Discord community. Once configured with the necessary permissions, the bot fetches up to 100,000 messages (the limit can be changed as needed) and stores them as a CSV file containing the text content, author, and timestamp of each message. Timestamps are converted from UTC to Korea Standard Time.
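
The timestamp conversion and CSV output described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: the helper names `to_kst` and `write_messages` and the CSV column names are assumptions, and fetching messages from Discord is left out.

```python
import csv
from datetime import datetime, timedelta, timezone

# Korea Standard Time is a fixed UTC+9 offset with no daylight saving time.
KST = timezone(timedelta(hours=9), name="KST")


def to_kst(utc_dt):
    """Convert a UTC datetime (aware, or naive and assumed UTC) to KST."""
    if utc_dt.tzinfo is None:
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(KST)


def write_messages(rows, out):
    """Write (author, content, utc_timestamp) tuples to a file-like object as CSV.

    The column names are hypothetical, chosen for this sketch.
    """
    writer = csv.writer(out)
    writer.writerow(["author", "content", "timestamp_kst"])
    for author, content, created_at in rows:
        writer.writerow([author, content, to_kst(created_at).isoformat()])
```

A message sent at 15:30 UTC on 1 January is therefore stored with a timestamp of 00:30 on 2 January in KST.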

Data Cleaning

The cleaning step performs the following operations:

  • Retention of only specified characters such as Korean, English, specific special characters, and emojis.
  • Removal of URLs and Discord-specific markup (like user mentions and custom emojis).
  • Normalization of repeated sequences to a maximum of three repetitions.
  • Trimming whitespace.
  • Removal of lines that are empty, contain only whitespace, or consist solely of punctuation, Korean punctuation, numbers, or one to two English characters.
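
The operations above could be combined into a single function along the following lines. This is a sketch under stated assumptions rather than the project's implementation: the character whitelist, the emoji ranges, the punctuation set, and the name `clean_message` are all illustrative choices.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Mentions such as <@123>, <@!123>, <@&123>, <#123> and custom emojis
# such as <:name:123> or <a:name:123>.
MARKUP_RE = re.compile(r"<(?:@[!&]?|#)\d+>|<a?:\w+:\d+>")
# Assumed whitelist: Hangul syllables and jamo, ASCII letters, digits,
# whitespace, a few punctuation marks, and common emoji ranges.
DISALLOWED_RE = re.compile(
    r"[^\uAC00-\uD7A3\u3131-\u318Ea-zA-Z0-9\s.,!?~"
    r"\u2600-\u27BF\U0001F300-\U0001FAFF]"
)
# Any sequence repeated four or more times in a row.
REPEAT_RE = re.compile(r"(.+?)\1{3,}")
# Lines made up only of digits, whitespace, and punctuation,
# or of one to two English letters.
NOISE_RE = re.compile(r"[\d\s.,!?~]*|[a-zA-Z]{1,2}")


def clean_message(text):
    """Return the cleaned message, or None if nothing meaningful is left."""
    text = URL_RE.sub(" ", text)
    text = MARKUP_RE.sub(" ", text)
    text = DISALLOWED_RE.sub("", text)
    text = REPEAT_RE.sub(r"\1\1\1", text)  # cap repeats at three
    text = re.sub(r"\s+", " ", text).strip()
    return None if NOISE_RE.fullmatch(text) else text
```

URLs and markup are removed before the whitelist is applied so that their fragments do not survive as stray letters or digits.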

Analysis

Chat Frequency over Time

Identify patterns in chat activity over time to understand peak activity periods and any noticeable trends.

  • Use the timestamps in the collected data to aggregate message counts on a daily / weekly / monthly basis.
  • Plot these counts in a time series graph to visualize the chat activity over time.
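
As a rough sketch of the aggregation step, the function below counts messages per day, ISO week, or month. It assumes timestamps are stored as ISO 8601 strings; the function name `count_messages` and the key formats are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_messages(timestamps, period="daily"):
    """Count messages per day, ISO week, or month from ISO 8601 strings."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if period == "daily":
            key = dt.strftime("%Y-%m-%d")
        elif period == "weekly":
            year, week, _ = dt.isocalendar()
            key = f"{year}-W{week:02d}"
        elif period == "monthly":
            key = dt.strftime("%Y-%m")
        else:
            raise ValueError(f"unknown period: {period!r}")
        counts[key] += 1
    # Sort by key so the result is in chronological order.
    return dict(sorted(counts.items()))
```

The returned mapping is ordered chronologically, so its keys and values can be passed directly to a plotting library as the x and y series of a time series graph.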

(Monthly) Most Active Users

Recognize the most active community members on a monthly basis to potentially reward engagement or identify key influencers.

  • Summarize the number of messages per user for each month.
  • Rank users by their message count to find the top contributors.
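
One possible way to express these two steps is shown below. The sketch assumes each message is an `(author, iso_timestamp)` pair with a `YYYY-MM` prefix on the timestamp; the function name `top_users_by_month` and the `top_n` parameter are illustrative.

```python
from collections import Counter, defaultdict


def top_users_by_month(messages, top_n=3):
    """Return {month: [(author, message_count), ...]} with the top_n authors."""
    per_month = defaultdict(Counter)
    for author, timestamp in messages:
        month = timestamp[:7]  # "YYYY-MM" prefix of an ISO 8601 string
        per_month[month][author] += 1
    return {
        month: counter.most_common(top_n)
        for month, counter in sorted(per_month.items())
    }
```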

(Monthly) Term Frequency

Analyze the most frequently used terms or words in the community chats to gauge the prevalent topics or interests.

  • Tokenize the chat messages into words, remove common stopwords, and count the frequency of each term on a monthly basis.
  • Plot the terms in a bar chart or a word cloud.
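
The counting step might look like the sketch below. It is a deliberate simplification: tokens are runs of Hangul syllables or English letters found with a regular expression, whereas Korean text is usually tokenized with a morphological analyzer (for example via KoNLPy). The stopword list and the name `monthly_term_frequency` are placeholders.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"그리고", "근데", "the", "a"}

# Simplified tokenizer: runs of Hangul syllables or of English letters.
TOKEN_RE = re.compile(r"[가-힣]+|[a-zA-Z]+")


def monthly_term_frequency(messages, stopwords=STOPWORDS):
    """messages: iterable of (iso_timestamp, text). Returns {month: Counter}."""
    per_month = defaultdict(Counter)
    for timestamp, text in messages:
        tokens = (token.lower() for token in TOKEN_RE.findall(text))
        per_month[timestamp[:7]].update(t for t in tokens if t not in stopwords)
    return dict(per_month)
```

Calling `most_common(n)` on a month's counter gives the terms and counts for a bar chart, and the counter itself can serve as the frequency input for a word cloud.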

Sentiment Analysis

Assess the general sentiment (positive, negative, neutral) of the community chats to understand the overall mood or response to certain topics or events.

Topic Modeling

Uncover underlying topics in the chat messages to identify key themes or subjects of interest within the community.

  • Use topic modeling algorithms like Latent Dirichlet Allocation (LDA) to extract topics from the collection of text data.
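
To make the idea concrete, here is a small collapsed Gibbs sampler for LDA in pure Python. It is a teaching sketch, not the project's method: in practice a library such as gensim or scikit-learn would be the usual choice, and the function name `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `iterations`) are assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

After fitting, `topic_word[k].most_common(10)` lists the highest-count words for topic `k`, which is typically how topics are inspected and labeled.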

Roadmap

  • Collect data from Discord
  • Clean and process the collected text
  • Perform core data analysis
  • Build a web interface to visualize insights
  • Add tests to ensure reliability

Data Privacy and Ethics

  • Public Data Only: The analysis is performed exclusively on publicly available chat data. No private conversations or personal data are used.
  • Anonymization: All data used in the project is anonymized to ensure individual privacy is maintained.
  • Ethical Considerations: The project adheres to ethical guidelines for data usage and analysis, ensuring no community or individual is targeted or negatively impacted.

License

Distributed under the GNU General Public License v3.0. See the LICENSE file for more information.

Acknowledgments
