dtaivpp / community-pulse

This is a system that lets anyone start monitoring their community's activity.

License: Other

Python 100.00%

community-pulse's Issues

[Feature] Add CLI for Running

There should be a CLI that lets users pass in configuration parameters for a run, such as the config file location or which specific jobs to run.

Examples:
pulse --config=/etc/pulse/ --job=twitter
pulse --job twitter discourse
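A minimal sketch of how the two example invocations could be parsed with Python's standard argparse module. The `parse_args` helper, the `/etc/pulse/` default, and the "no --job means run every job" behavior are assumptions for illustration, not the project's actual interface.

```python
import argparse


def parse_args(argv=None):
    """Parse pulse command-line options (hypothetical interface)."""
    parser = argparse.ArgumentParser(prog="pulse", description="Run community-pulse jobs.")
    parser.add_argument(
        "--config",
        default="/etc/pulse/",  # assumed default, taken from the example above
        help="Directory containing the pulse configuration file.",
    )
    parser.add_argument(
        "--job",
        nargs="+",  # accepts both --job=twitter and --job twitter discourse
        default=None,  # assumption: None means "run every configured job"
        help="One or more job names to run.",
    )
    return parser.parse_args(argv)
```

With `nargs="+"`, `--job=twitter` yields `["twitter"]` and `--job twitter discourse` yields `["twitter", "discourse"]`, so both example styles map to the same list of job names.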

[Feature] Create default dashboards by job type

Users should be presented with default dashboards for all the data in a particular job type. These dashboards should be generically applicable and versioned so that they can be updated at a later date.

SubTasks:

Create dashboards and store them in the repo
Store the version of dashboards in a meta index
Check the version of the dashboard on connect and upgrade
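The version check in the subtasks above could look roughly like the sketch below. It is an assumption-heavy illustration: a plain dict stands in for the meta index (job type to installed version), and `Dashboard`, `dashboards_to_upgrade`, and `record_upgrade` are hypothetical names rather than existing project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    job_type: str
    version: int
    saved_objects: str  # hypothetical: exported dashboard definition stored in the repo


def dashboards_to_upgrade(repo_dashboards, meta_versions):
    """Return repo dashboards that are missing or newer than what the meta index records.

    meta_versions maps job_type -> installed version; it stands in for a meta index query.
    """
    stale = []
    for dash in repo_dashboards:
        installed = meta_versions.get(dash.job_type)
        if installed is None or installed < dash.version:
            stale.append(dash)
    return stale


def record_upgrade(meta_versions, dash):
    """After installing a dashboard, store its version so the next connect skips it."""
    meta_versions[dash.job_type] = dash.version
```

On connect, each dashboard returned by `dashboards_to_upgrade` would be imported and then passed to `record_upgrade`, so repeated connects are no-ops until the repo version changes.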

[FEATURE] Threaded jobs

At the moment every job executes in series. This causes two problems: one failing job takes down the whole pipeline, and most jobs (of different types) could run in parallel but instead wait on each other.

community-pulse should have a way to ensure that jobs of the same type (e.g. twitter) are executed in series so that they do not exhaust their API limits, while jobs of different types are executed in parallel so that they do not block each other.
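One way to get "serial within a type, parallel across types" is to group jobs by type and give each group its own worker thread. This is a sketch using the standard `concurrent.futures.ThreadPoolExecutor`; the `(name, job_type, callable)` job tuples and the `run_jobs` / `run_group` names are assumptions, not the project's current job model.

```python
import logging
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_group(jobs):
    """Run jobs of one type one after another so they share that API's rate limit."""
    results = {}
    for name, func in jobs:
        try:
            results[name] = func()
        except Exception as exc:  # isolate failures so one bad job cannot stop the rest
            logging.exception("job %s failed", name)
            results[name] = exc
    return results


def run_jobs(jobs):
    """jobs: iterable of (name, job_type, callable) tuples (hypothetical shape).

    Each job type gets its own thread, so different types run in parallel
    while jobs of the same type run in series.
    """
    groups = defaultdict(list)
    for name, job_type, func in jobs:
        groups[job_type].append((name, func))

    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [pool.submit(run_group, group) for group in groups.values()]
        for future in futures:
            results.update(future.result())
    return results
```

Catching exceptions inside `run_group` is what keeps one failing job from taking down the pipeline; the failure is logged and returned in place of a result.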

Duplicate entries being added

At the moment, the query that should return the most recent tweet ID only returns the latest tweet from the previous day. This needs debugging to determine the cause.

[BUG] Indexes fail to create silently with capital letters in job names

If you use capital letters in a job name, index creation fails silently. Two parts are needed to fix this:

Automatically lowercase job names
Pull errors out of the bulk insert and present in a standard way
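Both parts can be sketched in a few lines. OpenSearch rejects index names containing uppercase letters, and a `_bulk` response reports per-item failures under `items` (with a top-level `errors` flag) instead of failing the whole request, which is why the problem is silent. The helper names below are hypothetical, and the error record layout is just one possible "standard way" to present failures.

```python
def normalize_index_name(job_name):
    """OpenSearch index names must be lowercase, so lowercase the job name first."""
    return job_name.lower()


def extract_bulk_errors(response):
    """Collect per-item failures from a _bulk API response dict into uniform records."""
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Each item has one key naming the action: index, create, update, or delete.
        for action, result in item.items():
            if "error" not in result:
                continue
            error = result["error"]
            failures.append({
                "action": action,
                "index": result.get("_index"),
                "status": result.get("status"),
                "type": error.get("type") if isinstance(error, dict) else None,
                "reason": error.get("reason") if isinstance(error, dict) else str(error),
            })
    return failures
```

The returned records could then be logged or raised after every bulk insert so index failures are no longer silent.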

[BUG] get_marker fails if the last tweet is older than 7 days

get_marker currently returns the last scraped tweet ID. The problem is that when that tweet is more than seven days old, passing it to Twitter as the since_id causes the Twitter automation to fail.

Therefore the get_marker function in twitter.py should only return an ID if the tweet's created_at is less than 7 days old.
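A minimal sketch of the proposed check, assuming the last stored tweet is available as a dict with an `id` and a timezone-aware `created_at` datetime. The real get_marker in twitter.py reads the marker from OpenSearch, so this signature is hypothetical; the optional `now` argument exists only to make the function testable.

```python
from datetime import datetime, timedelta, timezone

SEARCH_WINDOW = timedelta(days=7)


def get_marker(last_tweet, now=None):
    """Return the last scraped tweet ID only if it is recent enough to use as since_id.

    last_tweet: hypothetical dict with "id" and timezone-aware "created_at", or None.
    Returning None means "no since_id", so the search falls back to its default window.
    """
    if last_tweet is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    if now - last_tweet["created_at"] < SEARCH_WINDOW:
        return last_tweet["id"]
    return None
```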

[FEATURE] Add other integrations

At the moment community-pulse only supports Twitter, but the goal is to eventually support all the places where communities meet and discuss information.

Some ideas are:
Find New Articles:

Medium.com
Dev.to

Async Discussions:

Discourse
Github
Stack Overflow

Sync Discussions:

Discord
Slack

Social Media:

Duplicate tweet regression

There is currently a bug where duplicate tweets are being introduced.

[FEATURE] Separate out tweet processing from collection

At the moment get_data in twitter.py tightly couples collecting and processing tweets, which makes it difficult to troubleshoot. It should be separated into two functions: one that retrieves the data from Twitter and another that enriches it for storage in OpenSearch.
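A sketch of the proposed split, under the assumption that raw tweets arrive as dicts with `id`, `text`, and `created_at`. The `search` callable stands in for whatever Twitter client call the job uses, and the hashtag field is only an example enrichment; none of these names come from twitter.py.

```python
def fetch_tweets(search):
    """Collection step: only talks to the source. search is any callable returning raw tweet dicts."""
    return list(search())


def enrich_tweet(tweet, job_name):
    """Processing step: turn one raw tweet into an OpenSearch document.

    Pure function (no network access), so it can be unit tested with saved tweets.
    """
    text = tweet.get("text", "")
    return {
        "tweet_id": tweet["id"],
        "job_name": job_name,
        "text": text,
        "created_at": tweet["created_at"],
        # Example enrichment (hypothetical): hashtags without the leading '#'.
        "hashtags": [word[1:] for word in text.split() if word.startswith("#")],
    }


def get_data(search, job_name):
    """Thin wrapper that composes the two steps."""
    return [enrich_tweet(tweet, job_name) for tweet in fetch_tweets(search)]
```

Keeping `enrich_tweet` free of I/O means a bad document can be reproduced from a saved raw tweet without calling Twitter at all.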

[Feature] Multi Job Support

Multi Job Support works as follows. Users should be able to name their jobs whatever they wish in the config file. These names should be carried over into the index names. A default dashboard should be deployed for each job type and be filterable by job name.

Sub tasks:

Add job_type field
Combine all new indexes of similar job type into alias
Name indexes by job name
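The three subtasks could fit together as sketched below: each job writes to an index named after it (lowercased, since OpenSearch requires lowercase index names), every document carries `job_type` and `job_name`, and an `_aliases` request body groups the indexes of one type under a shared alias. The `<job_type>-all` alias naming and the helper names are hypothetical choices for illustration.

```python
def index_name(job_name):
    """Name each index after its job; lowercased because OpenSearch rejects uppercase names."""
    return job_name.lower()


def tag_document(doc, job_name, job_type):
    """Add job_name and job_type so a per-type dashboard can filter by job."""
    return {**doc, "job_name": job_name, "job_type": job_type}


def alias_actions(jobs):
    """Build an _aliases request body grouping job indexes by type.

    jobs: mapping of job name -> job type, as read from the config file.
    The "<job_type>-all" alias name is a hypothetical convention.
    """
    actions = []
    for job_name, job_type in jobs.items():
        actions.append({"add": {"index": index_name(job_name), "alias": f"{job_type.lower()}-all"}})
    return {"actions": actions}
```

The resulting body is the shape accepted by the `POST _aliases` endpoint, so a default dashboard could point at the per-type alias and still filter on `job_name`.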

