aicoe-aiops / github-labeler

License: Other

Python 6.55% Makefile 1.60% Jupyter Notebook 91.34% Shell 0.05% JavaScript 0.39% Dockerfile 0.06%

github-labeler's People

Contributors

Watchers

Forkers

antter michaelclifford harshad16 aakankshaduggal cdolfi yuunlee

github-labeler's Issues

Update OWNERS to add Aakanksha as an approver

Acceptance Criteria :
Updated OWNERS list to add @aakankshaduggal as an approver

use .env file to manage environment variables

Can you add a .env file to your local repository and provide an .env.example file in the repo that provides a template for the expected environment variables required by your project?

Here is an example of how to use the dotenv package to read env variables into your notebooks.

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

This issue came up due to this line, where I assume a variable is being set outside the notebook:
https://github.com/aicoe-aiops/github-labeler/pull/3/files#diff-35e4337b926c8d4cfdaefebb14768d4b7fca4988ca74f7f50b833333205e8619R39

make k-fold loop outside training and use same dataset for both models

Is your feature request related to a problem? Please describe.
Currently, to compare fastText and SVM, we create the sub-datasets twice and perform k-fold cross validation on each. Since we take negative samples randomly, the two datasets differ, which introduces variance. If fastText and SVM are trained on different datasets that vary in how predictable they are, we cannot compare them fairly.

Describe the solution you'd like
I would like the functions to be altered so both models use the same dataset. Instead of using 5-fold cross validation, we should randomize the negative sampling 5 times, train both models on 80% of each dataset, and average the validation results on the remaining 20%.
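The proposal above could be sketched roughly as follows. This is only an illustration in plain Python, not the project's code: `build_dataset`, `compare_models`, and the `trainers` interface (a function that takes training data and returns a `predict(text)` callable) are hypothetical names, and accuracy stands in for whatever metric the notebooks actually use.

```python
import random
from statistics import mean


def build_dataset(positives, negative_pool, rng):
    """Pair all positives with an equally sized random draw of negatives."""
    k = min(len(positives), len(negative_pool))
    negatives = rng.sample(negative_pool, k=k)
    data = [(text, 1) for text in positives] + [(text, 0) for text in negatives]
    rng.shuffle(data)
    return data


def compare_models(positives, negative_pool, trainers,
                   repeats=5, train_frac=0.8, seed=0):
    """Average held-out accuracy per model over repeated negative samplings.

    `trainers` maps a model name to a function that takes training data and
    returns a predict(text) -> 0/1 callable (hypothetical interface).
    """
    rng = random.Random(seed)
    scores = {name: [] for name in trainers}
    for _ in range(repeats):
        # Negatives are re-drawn once per repeat, not once per model.
        data = build_dataset(positives, negative_pool, rng)
        cut = int(len(data) * train_frac)
        train, test = data[:cut], data[cut:]
        # Every model sees exactly the same train/test split.
        for name, fit in trainers.items():
            predict = fit(train)
            correct = sum(predict(text) == label for text, label in test)
            scores[name].append(correct / len(test))
    return {name: mean(values) for name, values in scores.items()}
```

Because the dataset is built once per repeat and shared, any difference in the averaged scores comes from the models rather than from different negative samples.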

[EPIC] Add the project to Operate First Website

Re-do the Readme file to match the format for all Data science projects on Opf
Add the project to Operate-first Data Science Page
(https://github.com/operate-first/operate-first.github.io/issues/319)

New major release

Hey, Kebechet!

Create a new major release, please.

Create pre-processing notebook

Is your feature request related to a problem? Please describe.
GitHub issue data is very noisy and would need a fair amount of pre-processing before it could be fed into most models.

Describe the solution you'd like
A notebook/script to preprocess the text with some options, and testing to see which one works the best
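A minimal sketch of what "preprocessing with some options" might look like, using only the standard library. The function name `preprocess` and its flags are assumptions made for illustration; the steps the notebook actually tests may differ.

```python
import re

# Built from single backticks so the pattern matches Markdown code fences.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)
URL = re.compile(r"https?://\S+")
PUNCTUATION = re.compile(r"[^\w\s]")


def preprocess(text, strip_code=True, strip_urls=True,
               lowercase=True, strip_punct=True):
    """Clean issue text; each flag toggles one optional step."""
    if strip_code:
        text = CODE_BLOCK.sub(" ", text)
    if strip_urls:
        text = URL.sub(" ", text)
    if lowercase:
        text = text.lower()
    if strip_punct:
        text = PUNCTUATION.sub(" ", text)
    # Collapse whatever whitespace the substitutions left behind.
    return " ".join(text.split())
```

Keeping each step behind a flag makes it straightforward to train with different combinations and compare which one works best.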

Work on midway presentation

Create methods for visualizing word vectors

Is your feature request related to a problem? Please describe.
fastText is a blackbox model and I would like to visualize how it works.

Describe the solution you'd like
A notebook where a visualization of word vectors is done

Make visualization look cooler

Is your feature request related to a problem? Please describe.
Currently, the ft_viz notebook trains an unsupervised model on the small openshift/origin dataset or something similar. This is not enough data to give the word vectors a very meaningful interpretation. We now have a much more powerful word vector model that would be far more interesting to visualize.

Describe the solution you'd like
Change the ft_viz notebook to import the newest word vector model from ceph, and perform our visualizations on that model instead.

New minor release

Hey, Kebechet!

Create a new minor release, please.

(just want to see what happens)

make a demo

functionality to exclude bots from dataset

Is your feature request related to a problem? Please describe.
There are often bots that create issues or tag labels that we do not want to include in the dataset.

Describe the solution you'd like
A way to exclude certain users/bots from counting their issues or tags.
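One possible shape for such a filter is sketched below. The record layout (`author`, `label_events`, `actor`, `label`) is hypothetical and chosen only for the example; the real dataset columns may be named differently.

```python
def exclude_users(issues, excluded_users):
    """Drop issues opened by excluded users and labels applied by them."""
    excluded = {name.lower() for name in excluded_users}
    kept = []
    for issue in issues:
        if issue["author"].lower() in excluded:
            continue  # the whole issue was created by an excluded account
        labels = [
            event["label"]
            for event in issue["label_events"]
            if event["actor"].lower() not in excluded
        ]
        # Copy the issue so the input records are left untouched.
        kept.append({**issue, "labels": labels})
    return kept
```

Usernames are compared case-insensitively, so the exclusion list does not need to match the exact capitalisation of each account name.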

No pinned version for src

Package src does not use pinned version in Pipfile: {'editable': True, 'path': './'}

Make README more user-friendly

Is your feature request related to a problem? Please describe.
Currently, the README gives a high-level overview of what the project sets out to do. It gives no explanation of the code that exists within the project or how to use it.

Describe the solution you'd like
A thorough, enhanced README that explains ALL the code would be useful.

make pretrained W2V ceph compatible and able to pick up where it left off

Reduce fastText models size

Is your feature request related to a problem? Please describe.
fastText has proven itself to be quite useful in predicting github labels, but the models are way too big, especially for the pre-trained version. It is not convenient to save multiple models and load them in the app. Some care has already been taken to reduce model size, such as reducing vocabulary and vector size in the src/data/build_w2v_vocab notebook. More can still be done to significantly reduce the size.

Describe the solution you'd like

There are two main ideas. One is to quantize the model, which the fastText Python package supports directly. The other is to throw out even more vocabulary (e.g. words that never appear in our target issues) and reduce the vector size yet again. We could have one fine-tuned vector model that is useful for anyone, plus smaller, specific ones per user.
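The vocabulary idea can be illustrated on a plain word-to-vector mapping. This is a simplified sketch under that assumption: `prune_vocabulary` is a hypothetical helper, it assumes lowercase keys, and it does not operate on a fastText binary, which also stores subword information.

```python
import re
from collections import Counter


def prune_vocabulary(vectors, target_texts, min_count=1):
    """Keep only words seen at least `min_count` times in the target issues."""
    counts = Counter(
        token
        for text in target_texts
        for token in re.findall(r"\w+", text.lower())
    )
    # Counter returns 0 for unseen words, so they are dropped.
    return {word: vec for word, vec in vectors.items()
            if counts[word] >= min_count}
```

Raising `min_count` trades coverage for size, which may suit the smaller per-user models mentioned above.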

Pre commit - Pod pending issue

Increase the prow resource limits to fix this issue.

Update Readme with project description

Update the README.md file to provide a brief summary of this project and remove the template repo structure info.

Missing document `existing_methods_research.md`

The document - reports/existing_methods_research.md seems empty or missing.

Acceptance Criteria :

Find the missing content
Add existing methods research to the document.

Demo the opf/support bot and get it connected!

Is your feature request related to a problem? Please describe.
We need to actually implement this somewhere. A good place would be operate-first/support

Describe the solution you'd like
We need a github app that can label github issues on the opf/support repo. The app has been created and can be found here: https://github.com/apps/issue-labeler-opfirst

Note that it is not deployed yet; it has only been tested locally.

This issue is currently blocked by: operate-first/continuous-delivery#18

The image needs to be updated in order for the app to actually work.

create app

Describe the solution you'd like
There must be an app that can take a title and body as input and return a list of label names as outputs
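The requested interface might look something like the sketch below. It assumes one binary scorer per label, passed in as a mapping from label name to a callable returning a score between 0 and 1; `predict_labels`, `classifiers`, and `threshold` are illustrative names, not the app's actual API.

```python
def predict_labels(title, body, classifiers, threshold=0.5):
    """Return the names of all labels whose scorer clears the threshold."""
    text = f"{title}\n{body}"
    return sorted(
        name
        for name, score_fn in classifiers.items()
        if score_fn(text) >= threshold
    )
```

For example, with toy keyword scorers standing in for trained models:

```python
classifiers = {
    "bug": lambda t: 0.9 if "crash" in t.lower() else 0.1,
    "docs": lambda t: 0.8 if "readme" in t.lower() else 0.2,
}
print(predict_labels("Crash on start", "See README", classifiers))
```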

Make a blog post

Is your feature request related to a problem? Please describe.
Right now, there is a lot of work done that is not very well explained.

Describe the solution you'd like
A blog post would be a good way to thoroughly explain the work done and the motivations behind it.

Application cannot be managed by Kebechet due to it containing an unsupported package location.

Kebechet cannot support maintaining this application as it contains a local
version of a package.

The package causing the issue is - src
Linked SHA - 9e6e308

For more information, see Pipfile and Pipfile.lock.

Environment details


Kebechet version: 1.5.4
Python version: 3.8.6
Platform: Linux-4.18.0-305.19.1.el8_4.x86_64-x86_64-with-glibc2.2.5
pipenv version: pipenv, version 2020.11.15

/kind bug
/priority critical-urgent

Create overlay for bot

Is your feature request related to a problem? Please describe.
The bot needs to be deployed to OpenShift.

Describe the solution you'd like
The best way to do this is probably to create an overlay for the bot, so that its own image can be built and ArgoCD will create its pod.

New minor release

Hey, Kebechet!

Create a new minor release, please.

Replace use_ceph with environment variables

Is your feature request related to a problem? Please describe.

Currently in every notebook and script that uses storage there is a line of code that says use_ceph = True. This is an inefficient way to do this.

Describe the solution you'd like

There should be a USE_CEPH environment variable or something similar.
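A small sketch of how the hard-coded flag could be replaced. `USE_CEPH` comes from the request above; the helper `env_flag` and the set of accepted truthy strings are assumptions made for the example.

```python
import os

# Strings treated as "true"; anything else counts as false (an assumption).
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


# Replaces the hard-coded `use_ceph = True` in each notebook or script.
use_ceph = env_flag("USE_CEPH")
```

The optional `environ` argument exists only so the helper can be exercised without touching the real process environment.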

preprocess data that enters app

The app still applies an old preprocessing technique to the data it receives. It needs to import the proper preprocessing technique from a different notebook/script.

Label bot dataset [issue originally from mi repository]

We've had an issue in the mi repository (thoth-station/mi#278 (comment)) for a long time, related to labeling GitHub issues (one of our interns at the time built a Transformer-based model, which is referenced in the issue).

Do you think the interest is the same and the issue can be moved (or closed and referenced) here?

Kebechet update manager: KeyError - 420da9ba23

Description

This is an automated issue generated by Kebechet. The update manager threw an exception (KeyError) at
runtime. If you think this exception is a bug please open an issue upstream at https://github.com/thoth-station/kebechet
otherwise use the traceback below to help you fix whatever issues were encountered with your repository.

Traceback

Traceback (most recent call last):
  File "/home/user/kebechet/kebechet_runners.py", line 193, in run
    instance.run(**manager_configuration)
  File "/home/user/kebechet/managers/update/update.py", line 919, in run
    result = self._do_update(
  File "/home/user/kebechet/managers/update/update.py", line 762, in _do_update
    old_environment = self._get_all_packages_versions()
  File "/home/user/kebechet/managers/update/update.py", line 210, in _get_all_packages_versions
    "version": package_info["version"][len("==") :],
KeyError: 'version'

adjust app so it can handle empty requests

Is your feature request related to a problem? Please describe.
Sometimes issues are empty, or they become empty after preprocessing. In this case an error occurs.

Describe the solution you'd like
We want to return nothing, not create an error.
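One way to get that behaviour is to check for empty text after preprocessing and before the model is called. The sketch below is hypothetical: `safe_predict` is an invented wrapper, `preprocess` and `predict` are passed in as stand-ins for the app's real functions, and "nothing" is interpreted here as an empty list of labels.

```python
def safe_predict(title, body, preprocess, predict):
    """Return [] instead of raising when there is no text left to classify."""
    text = preprocess(f"{title or ''} {body or ''}")
    if not text.strip():
        # Empty issue, or emptied by preprocessing: skip the model entirely.
        return []
    return predict(text)
```

Checking after preprocessing covers both cases from the description: issues that arrive empty and issues that become empty once noise is stripped.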

Add Markdown Descriptions to Each Notebook

As a Data Scientist and an end-user of the notebooks, it is easier to understand the objective of the notebook if we have a brief introduction header and a concluding footer.

Acceptance Criteria :

Polish the existing notebooks with -

A heading
An Introduction
A conclusion for cells that have visualizations or interesting insights
A final conclusion with future work overview.

Explore usability on operate-first

Is your feature request related to a problem? Please describe.
Now that the project is practically done and is usable, we need to see which repo it can help out in. Most repos in operate-first have little to no labelled issue data, so we will have to explore to see what works

Describe the solution you'd like
The pipeline should be run on a handful of different repos to see which gives the best-looking results for an issue-labeler.

Application cannot be managed by Kebechet due to it containing an unsupported package location in rhel:8 environment.

Kebechet cannot support maintaining this application as it contains a local
version of a package.

The package causing the issue is - src
Linked SHA - 554cb47

For more information, see Pipfile and Pipfile.lock.

Environment details


Kebechet version: 1.6.6
Python version: 3.8.8
Platform: Linux-4.18.0-305.10.2.el8_4.x86_64-x86_64-with-glibc2.2.5
pipenv version: pipenv, version 2020.11.15

/kind bug
/priority critical-urgent

Update pipeline

Is your feature request related to a problem? Please describe.
The Elyra pipeline should be updated to include the preprocessing step.

Describe the solution you'd like
In the .pipeline file, the notebooks/preprocess.ipynb notebook should be run after the data extraction and before the model training.

add JupyterBook to this repo

use https://drive.google.com/file/d/1XsHRhRhQc3n1PnHNRTXL_5Qz5qRvf4eA/view as a basis.

convert presentation slides and notebooks into JupyterBook format
convert voiceover of video to text for book with HTTP://rev.ai
deploy JupyterBook via shower.meteor.zone

Polish the existing doc - `reports/existing_research.md`

This document - reports/existing_research.md seems to have some formatting issues.

Acceptance Criteria -

Clean up this document

Add content to `existing_research.md`

Add content to existing_research.md; it is currently an empty file.

New major release

Hey, Kebechet!

Create a new major release, please.

see how pretrained fastText model can improve performance

Is your feature request related to a problem? Please describe.
The current fastText model has to learn language from scratch, which is difficult/impossible with little training data

Describe the solution you'd like
I would like there to be a method to use a pretrained fastText model, downloading either the generic Wikipedia model or creating an unsupervised pretrained model trained on github issues.

Additional context
Pretrained English model available at https://fasttext.cc/docs/en/english-vectors.html

New minor release

Hey, Kebechet!

Create a new minor release, please.

introduce balanced negative sampling amongst issues

Is your feature request related to a problem? Please describe.
When negative samples are taken at random, the most popular labels overwhelm the dataset. If these labels are easy to predict, such as "bot", the binary classification problem becomes too trivial.

Describe the solution you'd like
In the model notebook, there should be a function that takes negative samples evenly from the other labels.
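A rough sketch of such a function, drawing negatives round-robin so that no single popular label dominates. The name `balanced_negatives` and the `issues_by_label` mapping are assumptions; for simplicity it treats each issue as listed under exactly one label and not carrying the target label.

```python
import random


def balanced_negatives(issues_by_label, target_label, n_samples, seed=0):
    """Draw negatives as evenly as possible from every non-target label."""
    rng = random.Random(seed)
    pools = {
        label: list(issues)
        for label, issues in issues_by_label.items()
        if label != target_label and issues
    }
    for pool in pools.values():
        rng.shuffle(pool)
    picked = []
    # Round-robin: one issue per label per pass until enough are collected
    # or every pool has been used up.
    while len(picked) < n_samples and pools:
        for label in list(pools):
            if len(picked) == n_samples:
                break
            picked.append(pools[label].pop())
            if not pools[label]:
                del pools[label]
    return picked
```

If fewer than `n_samples` negatives exist in total, the function returns everything available rather than raising.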

Kebechet pipfile-requirements manager: ValueError - caa120a9ae

Description

This is an automated issue generated by Kebechet. The pipfile-requirements manager threw an exception (ValueError) at
runtime. If you think this exception is a bug please open an issue upstream at https://github.com/thoth-station/kebechet
otherwise use the traceback below to help you fix whatever issues were encountered with your repository.

Traceback

Traceback (most recent call last):
  File "/home/user/kebechet/kebechet_runners.py", line 193, in run
    instance.run(**manager_configuration)
  File "/home/user/kebechet/managers/pipfile_requirements/pipfile_requirements.py", line 94, in run
    else sorted(self.get_pipfile_requirements(file_contents))
  File "/home/user/kebechet/managers/pipfile_requirements/pipfile_requirements.py", line 46, in get_pipfile_requirements
    raise ValueError(
ValueError: Package src does not use pinned version: {'editable': True, 'path': './'}

implement pretrained W2V into fasttext models

Kebechet version manager: FileNotFoundError - b50e189391

Description

This is an automated issue generated by Kebechet. The version manager threw an exception (FileNotFoundError) at
runtime. If you think this exception is a bug please open an issue upstream at https://github.com/thoth-station/kebechet
otherwise use the traceback below to help you fix whatever issues were encountered with your repository.

Traceback

Traceback (most recent call last):
  File "/home/user/kebechet/kebechet_runners.py", line 193, in run
    instance.run(**manager_configuration)
  File "/home/user/kebechet/managers/version/version.py", line 460, in run
    changelog = self._compute_changelog(
  File "/home/user/kebechet/managers/version/version.py", line 306, in _compute_changelog
    with open("CHANGELOG.md", "r+") as changelog_file:
FileNotFoundError: [Errno 2] No such file or directory: 'CHANGELOG.md'
