chaoss / grimoirelab-manuscripts

Bitergia reports engine

License: GNU General Public License v3.0

Python 4.73% TeX 0.65% Makefile 0.01% Jupyter Notebook 70.82% HTML 23.81%

grimoirelab-manuscripts's Introduction

GrimoireLab Manuscripts

The aim of this project is to generate reports automatically from enriched indexes. These indexes contain items from Perceval data sources (git commits, GitHub pull requests, Bugzilla bugs, ...) that have been enriched using GrimoireELK.

To follow the basic steps, you need the enriched indexes in the Elasticsearch instance whose URL is passed as a parameter to the report tool.

The basic command for creating a report for the git, gerrit, its and mls data sources from April 2015 to April 2017, by quarters, is:

bin/manuscripts -g --data-sources git gerrit its mls -u <elastic_url> -s 2015-04-01 -e 2017-04-01 -d project_data -i quarter

and the PDF is generated in project_data/report.pdf

Usage

Use the -h flag to show usage, as follows:

$ bin/manuscripts -h
-d DATA_DIR, --data-dir DATA_DIR
                        Directory to store the data results

Params:

-d, --data-dir: directory to store data files that will be used to create the report PDF file (csv and eps files containing metrics results).

grimoirelab-manuscripts's Issues

get_trend() does not calculate the trend properly for the Authors class

I'm not sure I understood the get_trend() method. I tried to get the trend in the number of authors over the last 12 months, but it just takes the number of authors on the start date (on that day only) and the number of authors on the end date (on that day only) and does the math.

from datetime import date, timedelta, timezone
from dateutil import parser
from manuscripts.metrics import git

url = "https://mydashboard.com/data"
start_date = "2017-04-02"
end_date = "2018-04-02"

sd = parser.parse(start_date).replace(tzinfo=timezone.utc)
ed = parser.parse(end_date).replace(tzinfo=timezone.utc)
g = git.Authors(url, "git", start=sd, end=ed, esfilters={}, interval=None, offset=None)

print(g.get_trend())

Is the code above right? Did I do something wrong?

How to reproduce:

get code with version 53604fe
install the requirements.txt in a fresh new virtual environment
execute the script above with a git index with some data, at least from 2017-04-01
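
For comparison, a trend is often defined as the change between the last two periods of a time series rather than between two single days. The sketch below assumes that definition. `trend_from_series` is a hypothetical helper, not the Manuscripts implementation; it only mirrors the `(last, percentage)` return shape, and the author counts are made up.

```python
def trend_from_series(values):
    """Return (last, percentage) for a list of per-period counts.

    `percentage` is the change of the last period relative to the
    previous one. Hypothetical helper, not part of Manuscripts.
    """
    if len(values) < 2:
        raise ValueError("need at least two periods to compute a trend")
    previous, last = values[-2], values[-1]
    if previous == 0:
        # Relative change is undefined when the previous period is empty
        return last, None
    percentage = (last - previous) / previous * 100
    return last, percentage


# Monthly author counts (made-up numbers)
monthly_authors = [10, 12, 8, 10]
print(trend_from_series(monthly_authors))  # (10, 25.0)
```

With this definition, the 12-month query would first be split into monthly buckets, and the trend would compare the last two buckets.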

The data and figs for the report creation are copied recursively

Each time a report is generated, the data and figs folders used to create the PDF report file are copied recursively, which produces nested copies. For example:

(acs@dellx) (quarters-titles % u=) ~/devel/grimoirelab-manuscripts-fork-acs $ ls -l bitergia-servers/opnfv/report_data/figs/figs/data/figs/data
total 804
drwxrwxr-x 2 acs acs   4096 mar  1 14:40 activity
....
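
Without knowing the exact cause, one defensive measure is to refuse to copy a directory into itself. The following is a minimal sketch under that assumption; `copy_report_dir` is a hypothetical helper and is not taken from the Manuscripts code.

```python
import shutil
from pathlib import Path


def copy_report_dir(src, dst):
    """Copy src into dst, refusing to copy a directory into itself."""
    src_path = Path(src).resolve()
    dst_path = Path(dst).resolve()
    # A destination inside the source would nest copies on every run
    if dst_path == src_path or src_path in dst_path.parents:
        raise ValueError(f"{dst_path} is inside {src_path}; refusing to copy")
    # dirs_exist_ok requires Python 3.8 or later
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return dst_path
```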

No documentation available about the expected type of the arguments

The methods of the metrics library should document the types of the parameters they expect. This is what we have in the code:

    def __init__(self, es_url, es_index, start=None, end=None, esfilters={},
                 interval=None, offset=None):
        """es connection and filter to be used"""
        self.es_url = es_url
        ..

If we go to the help offered by the module, we just get this:

     |  ----------------------------------------------------------------------
     |  Methods inherited from manuscripts.metrics.metrics.Metrics:
     |  
     |  __init__(self, es_url, es_index, start=None, end=None, esfilters={}, interval=None, offset=None)
     |      es connection and filter to be used
     |  
     |  get_agg(self)
     |      Returns an aggregated value
     |  
     |  get_definition(self)
     |  
     |  get_list(self)
     |  
     |  get_metrics_data(self, query)
     |      Get the metrics data from ES
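
As an illustration of what such documentation could look like, here is a sketch that combines type hints with a docstring. The types are assumptions: `start` and `end` follow the datetime values used in the get_trend() script earlier on this page, while the types of `interval` and `offset` are guesses. `DocumentedMetrics` is a stand-in class, not the real `Metrics`.

```python
from datetime import datetime, timezone
from typing import Optional


class DocumentedMetrics:
    """Illustrative stand-in for a metrics class (not the real one)."""

    def __init__(self, es_url: str, es_index: str,
                 start: Optional[datetime] = None,
                 end: Optional[datetime] = None,
                 esfilters: Optional[dict] = None,
                 interval: Optional[str] = None,
                 offset: Optional[str] = None):
        """Set up the Elasticsearch connection and the filters to use.

        :param es_url: base URL of the Elasticsearch instance
        :param es_index: name of the enriched index to query
        :param start: timezone-aware start of the analysed period
        :param end: timezone-aware end of the analysed period
        :param esfilters: field/value pairs used to filter items
        :param interval: bucket size for time series (assumed to be a str)
        :param offset: offset applied to the buckets (assumed to be a str)
        """
        self.es_url = es_url
        self.es_index = es_index
        self.start = start
        self.end = end
        # A fresh dict per instance avoids sharing a mutable default
        self.esfilters = dict(esfilters or {})
        self.interval = interval
        self.offset = offset
```

With this in place, `help(DocumentedMetrics)` would show both the signature with types and the per-parameter descriptions.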

Error when sortinghat is not installed

I just installed manuscripts with:

pip install manuscripts

and then tried to run it, but I got an error:

$ manuscripts --help
Traceback (most recent call last):
  File "/tmp/gl/bin/manuscripts", line 38, in <module>
    from manuscripts.config import Config
  File "/tmp/gl/lib/python3.6/site-packages/manuscripts/config.py", line 27, in <module>
    from grimoire_elk.utils import get_connectors
  File "/tmp/gl/lib/python3.6/site-packages/grimoire_elk/utils.py", line 75, in <module>
    from .elk.git import GitEnrich
  File "/tmp/gl/lib/python3.6/site-packages/grimoire_elk/elk/git.py", line 36, in <module>
    from .study_ceres_aoc import areas_of_code, ESPandasConnector
  File "/tmp/gl/lib/python3.6/site-packages/grimoire_elk/elk/study_ceres_aoc.py", line 28, in <module>
    from cereslib.events.events import Git, Events
  File "/tmp/gl/lib/python3.6/site-packages/cereslib/events/events.py", line 27, in <module>
    from grimoire_elk.elk.sortinghat_gelk import SortingHat
  File "/tmp/gl/lib/python3.6/site-packages/grimoire_elk/elk/sortinghat_gelk.py", line 30, in <module>
    from sortinghat import api
ModuleNotFoundError: No module named 'sortinghat'

It seems sortinghat is not in the dependencies in setup.py.
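
A quick way to confirm which modules are missing from an environment, before touching setup.py, is to ask the import system directly. This is a small diagnostic sketch; `missing_modules` is a hypothetical helper and only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The result depends on what is installed in the current environment
print(missing_modules(["json", "sortinghat"]))
```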

URL and Latex error

When I tried the command manuscripts -d /tmp/reports -u http://localhost:9200 -n GrimoireLab --data-sources git
I encountered an error:

Traceback (most recent call last):
  File "/home/prabhat/venvs/grimoirelab/bin/manuscripts", line 130, in <module>
    report.create()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/report.py", line 711, in create
    self.create_data_figs()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/report.py", line 603, in create_data_figs
    self.sections()[section]()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/report.py", line 267, in sec_overview
    (last, percentage) = m.get_trend()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/metrics/metrics.py", line 201, in get_trend
    ts = self.get_ts()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/metrics/metrics.py", line 150, in get_ts
    res = self.get_metrics_data(query)
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/manuscripts/metrics/metrics.py", line 138, in get_metrics_data
    r.raise_for_status()
  File "/home/prabhat/venvs/grimoirelab/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 406 Client Error: Not Acceptable for url: http://localhost:9200/git_enrich/_search

This error arises because manuscripts sends a POST request to the wrong URL (http://localhost:9200/git_enrich/_search). The enriched index can be accessed successfully by sending the POST to http://localhost:9200/git/_search.

As a temporary fix for this particular problem, I tried a workaround after cloning the manuscripts code. Everything then worked, except that the final PDF was not generated.
Generating it required pdfLaTeX, which I installed, but then it prompted Enter file name: and, when a file name was entered in the console, it showed the error ! LaTeX Error: File `csvsimple.sty' not found. In this case all the components (.tex, .png, etc. files) are present in the /tmp/reports folder, but the final PDF couldn't be generated.

Add the analysis of time waiting for the submitter and time waiting for the reviewer

The Gerrit code review workflow is a process between one or more reviewers, one or more submitters, and the CI systems that provide automated responses and votes for the piece of code.

Having the total time waiting for a submitter and the total time waiting for a reviewer is useful when trying to understand where to help the community. In some cases the community may need more reviewers, while in others it may need an extra mentoring process for newcomers.

It would be good to have this filtered at the level of project and for the whole process for a given code review process.

This ticket could be potentially extended to other code review tools.
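
To make the two quantities concrete, here is a rough sketch under a deliberately simplified model: each event in a review is attributed to either the submitter or a reviewer, and the time until the next event counts as waiting for the other side. The event format and the `waiting_times` helper are assumptions for illustration, and CI activity is ignored.

```python
from datetime import datetime, timedelta


def waiting_times(events):
    """Split elapsed time into waits for the submitter and for a reviewer.

    `events` is a time-ordered list of (timestamp, actor) pairs, where
    actor is "submitter" or "reviewer". After a submitter acts, the change
    waits for a reviewer, and vice versa. Simplified, assumed model.
    """
    totals = {"submitter": timedelta(), "reviewer": timedelta()}
    for (t0, actor), (t1, _next_actor) in zip(events, events[1:]):
        waiting_for = "reviewer" if actor == "submitter" else "submitter"
        totals[waiting_for] += t1 - t0
    return totals


# Made-up review history
day = datetime(2018, 1, 1)
events = [
    (day, "submitter"),                       # patch uploaded
    (day + timedelta(hours=5), "reviewer"),   # first review
    (day + timedelta(hours=8), "submitter"),  # new patch set
    (day + timedelta(hours=9), "reviewer"),   # approval
]
print(waiting_times(events))
```

Summing these totals per project would give the project-level view requested above.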

CC @rpaik

Change parent classes for classes in github_issues.py and jira.py

Currently, some classes in github_issues.py and jira.py inherit from the classes in its.py. We should change the design and make it more consistent.

[Manuscripts2/report] Create a PDF reports using the csv/png files generated.

This issue is about discussing how the PDF should be generated from the files generated containing the metric information.

I use macOS and I often run into problems when generating the report using LaTeX. I am looking into the problems and I'll create the first draft using the existing infrastructure.

Apart from that, I am also looking at:

ReportLab
- here is an example on how to create a report using reportlab
- here is another one

@jgbarah @valeriocos thoughts on this?

List available data sources

Right now, there is no way of knowing which data sources are available for generating a report. It would be great having something like:

$ manuscripts --list-data-sources
Available data sources: git gerrit github

At least on a first try, those could be just the data sources supported by Manuscripts, in general.
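
A possible shape for this option, sketched with argparse. The `--list-data-sources` flag and the three data source names come from the example above; the function names are hypothetical, and the real list of supported sources may differ.

```python
import argparse

# Taken from the example output above; the real list may differ
SUPPORTED_DATA_SOURCES = ["git", "gerrit", "github"]


def build_parser():
    parser = argparse.ArgumentParser(prog="manuscripts")
    parser.add_argument("--list-data-sources", action="store_true",
                        help="list the supported data sources and exit")
    return parser


def data_sources_message():
    return "Available data sources: " + " ".join(SUPPORTED_DATA_SOURCES)


def main(argv):
    """Return the text to print for the given command-line arguments."""
    args = build_parser().parse_args(argv)
    if args.list_data_sources:
        return data_sources_message()
    return None


print(main(["--list-data-sources"]))  # Available data sources: git gerrit github
```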

Creating chainable functions and New Classes to calculate the Metrics

This issue proposes creating functions to segregate the metrics according to different fields, such as: by_author, by_organizations and by_period (weeks, months, years) which can be applied directly to metrics objects and the corresponding aggregations can be obtained.

These functions will make it easier for the user to do analysis on the basis of different users or orgs and to see the metrics evolve over periodic intervals.

They are currently implemented in this file as part of the Metric class.

We will also be creating new classes to calculate the metrics.
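
The chaining idea can be illustrated with a toy builder in which every `by_*` method records a grouping and returns the object itself. This is only a sketch of the pattern: `MetricQuery`, the field names, and the bucket representation are hypothetical and do not reflect the actual implementation.

```python
class MetricQuery:
    """Toy chainable query builder; each by_* method returns self."""

    def __init__(self, index):
        self.index = index
        self.buckets = []  # ordered list of (kind, value) groupings

    def by_author(self):
        self.buckets.append(("terms", "author"))
        return self

    def by_organizations(self):
        self.buckets.append(("terms", "organization"))
        return self

    def by_period(self, period="month"):
        if period not in ("week", "month", "year"):
            raise ValueError(f"unsupported period: {period}")
        self.buckets.append(("date_histogram", period))
        return self


query = MetricQuery("git").by_author().by_period("month")
print(query.buckets)  # [('terms', 'author'), ('date_histogram', 'month')]
```

Returning `self` is what allows the calls to be chained in any order.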

How to add the Metrics into Manuscripts

As per the discussion with Jesus, this ticket is about how to add code for Metrics into Manuscripts.

Branches! It was so obvious, that I didn't think of it at all.
We can start by writing rough code in Jupyter notebooks and structuring it there. Once it is finalised, we can add the clean code to the master branch of Manuscripts, so that the commit history has only clean code and definite updates on the project.
When we want to add a functionality into Manuscripts, a new issue can be created and that can be linked back to this main issue.

Other ideas are welcome! :D

@jgbarah @valeriocos @acs

Error if no project name is specified

When launching manuscripts with no project name (eg, no -n option), there is an error. For example:

$ manuscripts -d /tmp/report -u http://localhost:9200 --data-sources git
2018-03-14 09:31:11,530 Generating the report from 2015-01-01 00:00:00+00:00 to 2018-03-13 23:59:59.999999+00:00
2018-03-14 09:31:11,530 Generating the report data and figs from 2015-01-01 00:00:00+00:00 to 2018-03-13 23:59:59.999999+00:00
2018-03-14 09:31:11,530 Generating Overview
2018-03-14 09:31:11,889 Generating Communication Channels
2018-03-14 09:31:11,890 Generating Detailed Activity by Project
2018-03-14 09:31:11,890 Activity data for: general
2018-03-14 09:31:12,321 Community data for: general
2018-03-14 09:31:12,577 Process data for: general
2018-03-14 09:31:12,577 Data and figs done
2018-03-14 09:31:12,577 Generating PDF report
Traceback (most recent call last):
  File "/tmp/gl/bin/manuscripts", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/jgb/src/jgbarah-grimoire/grimoirelab-manuscripts/bin/manuscripts", line 133, in <module>
    report.create()
  File "./manuscripts/report.py", line 748, in create
    self.create_pdf()
  File "./manuscripts/report.py", line 673, in create_pdf
    project_replace = self.report_name.replace(' ', r'\ ')
AttributeError: 'NoneType' object has no attribute 'replace'

It would be much better if a default name were used, so that it doesn't fail when no -n argument is given. Given how the project name is now used in the report, I think a good default name would be "Unnamed". The messages written by Manuscripts should make it obvious that the -n argument was not used, that the default "Unnamed" project name was used because of that, and how to use -n to specify a name.
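
A minimal sketch of that behaviour, assuming argparse is used for option parsing. The `-n` option and the "Unnamed" default come from the text above; the `--name` long form and the `parse_report_name` helper are assumptions for illustration.

```python
import argparse
import logging

DEFAULT_REPORT_NAME = "Unnamed"


def parse_report_name(argv):
    """Return the project name, falling back to a default with a warning."""
    parser = argparse.ArgumentParser(prog="manuscripts")
    # "--name" is a hypothetical long form of -n
    parser.add_argument("-n", "--name", default=None,
                        help="project name shown in the report")
    # Ignore the other options; only the name matters in this sketch
    args, _unknown = parser.parse_known_args(argv)
    if args.name is None:
        logging.warning("No -n argument given; using the default project "
                        "name '%s'. Use -n NAME to set one.",
                        DEFAULT_REPORT_NAME)
        return DEFAULT_REPORT_NAME
    return args.name


name = parse_report_name(["-d", "/tmp/report", "--data-sources", "git"])
# The call that failed in the traceback now works on a real string
print(name.replace(" ", r"\ "))  # Unnamed
```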

Cleaning Up esquery.py file

The esquery.py file still does not fully use the elasticsearch_dsl module and has hard-coded queries in some of its functions. It also uses outdated methods to match terms and search the indices.

This ticket is about:

cleaning up the file and making each function use the classes (Search, Agg, etc.) from the elasticsearch_dsl module

Update metrics.py, esquery.py and test_esquery.py to use only elasticsearch_dsl module

After #57, this issue adds the use of elasticsearch_dsl to other files such as metrics.py.

The get_aggs function in esquery.py returns a JSON dict, which is then used in metrics.py to get the results via the requests.get method.
This issue is about completely removing the direct querying of Elasticsearch and using elasticsearch_dsl instead.

EDIT: The second PR is ready. Please review #58 so that the second PR can be added on top of it.

Add latest versions of dependencies to remove error

Hey @valeriocos ,
The problem still persists when I use the command
p2o.py --enrich --index perceval_git_raw --index-enrich perceval_git -e http://localhost:9200 --no_inc --debug --db-host 127.0.0.1 --db-sortinghat sortinghatDB --db-user root git https://github.com/chaoss/grimoirelab-perceval
to generate the index for the git data source. I was doing this in order to generate the report.
It is using grimoire-elk==0.36.0.

Originally posted by @harshalmittal4 in chaoss/grimoirelab-elk#574 (comment)

Re-designing the functions to calculate Metrics.

Right now, the functions and classes (in metrics.py and esquery.py) that we use to calculate the metrics are complex and repetitive. They get us the results but specialising them for specific classes is too complex and code reuse is difficult to achieve.
We are going to simplify them and add more functional and easy to use chain-able functions which let us apply filters easily and give the output in a structured format.

We will try to use only elasticsearch_dsl objects directly and look at different possible methods in which these helper functions and classes can be created.

For example:
If we are counting closed issues in a repo, we should be able to do it just by specifying the filters item_type="issue" or pull_request="false" and then applying the sum aggregation to the Search object created.
But right now, we have to create an object from the GithubIssuesMetrics class which inherits from the Metrics class and then use that object's functions to get the aggregation. This is rather cumbersome.

This Notebook will be the experimentation lab for this issue.
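
To show how small the filter-plus-aggregation idea can be, the sketch below builds the Elasticsearch request body as a plain dict, so that it runs without elasticsearch_dsl installed. The `build_count_query` helper is hypothetical, the `state` and `id` field names are assumptions about the enriched index, and a `value_count` aggregation stands in for the aggregation step.

```python
def build_count_query(filters, count_field):
    """Build an Elasticsearch request body that counts filtered documents."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [{"term": {field: value}}
                           for field, value in filters.items()]
            }
        },
        "aggs": {"total": {"value_count": {"field": count_field}}},
    }


# "state" and "id" are assumed field names
body = build_count_query({"item_type": "issue", "state": "closed"}, "id")
print(body["query"]["bool"]["filter"])
```

The same filters could be applied to an elasticsearch_dsl Search object; the dict form is used here only to keep the example dependency-free.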

Parsing only needed params from sirmordred config file

Manuscripts parses mordred config files including params it won't use. It should parse only those params needed to run manuscripts, and not everything else. Right now, if a new param appears in Mordred, manuscripts will break as it happened in #96.

To improve and ease code maintainability, additions or changes in mordred configs should not break manuscripts if they do not affect the params used here.
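
One way to get this behaviour is to read the file with a generic parser and look up only the options that are needed, so unknown sections and options are simply ignored. The sketch below assumes an INI-style config; the section and option names in `NEEDED_PARAMS` and the `read_needed_params` helper are hypothetical.

```python
import configparser

# Hypothetical (section, option) pairs that the report tool would need
NEEDED_PARAMS = [("es_enrichment", "url"), ("general", "short_name")]

SAMPLE_CONFIG = """
[general]
short_name = GrimoireLab
brand_new_param = whatever

[es_enrichment]
url = http://localhost:9200

[some_new_section]
x = 1
"""


def read_needed_params(text):
    """Return only the needed params; anything else in the file is ignored."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    values = {}
    for section, option in NEEDED_PARAMS:
        # fallback=None keeps missing entries from raising
        values[(section, option)] = parser.get(section, option, fallback=None)
    return values


print(read_needed_params(SAMPLE_CONFIG))
```

New params such as `brand_new_param` never reach the validation step, so they cannot break the tool.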

Let users specify a logo for the reports

For historical reasons, the logo currently shown in the reports generated by manuscripts is the Bitergia logo. By default, it should be the GrimoireLab logo, and we should let users choose any other logo via an optional command-line parameter. This could be one or two pull requests (one for changing the default, another for the new option), and the task seems easy. If anyone is interested, I will support them as much as possible.

Documentation on available sources and metrics

Currently there is no explicit information on what sources and metrics are supported.

This can be done by directly updating the README.md file. It would probably be better to have this documented in the source code and to generate the documentation from there.

I have no experience with this kind of documentation in Python, so I would like to have your opinion @acs and @jgbarah beforehand.

[Metrics] How should the GMD metrics be added and a report created?

Currently, using manuscripts2, we can generate a report for the CHAOSS metrics.

We also need to look into generating the GMD, D&I, Risk and Value metrics and adding them to the same report. This issue is for discussing how these metrics should be added to the report.

Possible options:

We follow the structure that is currently being followed. When we want to create the report, we pass a flag such as --gmd, --risk and so on, so that those metrics are calculated and the report is generated. The default will be what is generated right now (CHAOSS metrics).

Conflicts between manuscripts and redis

Hi, users are unable to run manuscripts due to a dependency conflict with the redis package. As shown in the following full dependency graph of manuscripts, kingarthur requires redis ==3.0.0, while grimoire-elk requires redis <=2.10.6,>=2.10.0.

According to pip’s “first found wins” installation strategy, redis 2.10.6 is the actually installed version. However, redis 2.10.6 does not satisfy ==3.0.0.
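
The conflict can be checked by hand with a few lines of code. The sketch below is a simplified stand-in for a real version-specifier parser: `satisfies` is a hypothetical helper that supports only `==`, `<=` and `>=` on purely numeric versions.

```python
def parse_version(text):
    """Turn '2.10.6' into (2, 10, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a spec such as '<=2.10.6,>=2.10.0'."""
    installed = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        op, wanted_text = clause[:2], clause[2:]
        if op not in ("==", "<=", ">="):
            raise ValueError(f"unsupported clause: {clause}")
        wanted = parse_version(wanted_text)
        if op == "==" and installed != wanted:
            return False
        if op == "<=" and installed > wanted:
            return False
        if op == ">=" and installed < wanted:
            return False
    return True


print(satisfies("2.10.6", "<=2.10.6,>=2.10.0"))  # True
print(satisfies("2.10.6", "==3.0.0"))            # False
```

No single redis version can satisfy both ranges, so one of the two requirements has to be relaxed.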

Dependency tree-----------

manuscripts - 0.2.20
| +- elasticsearch-dsl(install version:7.1.0 version range:*)
| | +- elasticsearch(install version:7.1.0 version range:>=7.0.0,<8.0.0)
| | | +- urllib3(install version:1.24.3 version range:>=1.21.1)
| | +- ipaddress(install version:1.0.23 version range:*)
| | +- python-dateutil(install version:2.8.1 version range:*)
| | +- six(install version:1.13.0 version range:*)
| +- grimoire-elk(install version:0.63.0 version range:>=0.30.4)
| | +- cereslib(install version:0.1.8 version range:>=0.1.0)
| | | +- grimoire-elk(install version:0.63.0 version range:>=0.30.23)
| | | | +- cereslib(install version:0.1.8 version range:>=0.1.0)
| | | | +- elasticsearch(install version:6.3.1 version range:==6.3.1)
| | | | +- elasticsearch-dsl(install version:6.3.1 version range:==6.3.1)
| | | | +- graal(install version:0.2.3 version range:>=0.2.2)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- kingarthur(install version:0.1.18 version range:>=0.1.1)
| | | | +- pandas(install version:0.22.0 version range:==0.22.0)
| | | | +- perceval(install version:0.12.24 version range:>=0.9.6)
| | | | +- perceval-finos(install version:0.1.6 version range:>=0.1.0)
| | | | +- perceval-mozilla(install version:0.2.9 version range:>=0.1.4)
| | | | +- perceval-opnfv(install version:0.1.16 version range:>=0.1.2)
| | | | +- perceval-puppet(install version:0.1.15 version range:>=0.1.4)
| | | | +- pymysql(install version:0.9.3 version range:>=0.7.0)
| | | | +- redis(install version:2.10.6 version range:<=2.10.6,>=2.10.0)
| | | | +- requests(install version:2.21.0 version range:==2.21.0)
| | | | +- sortinghat(install version:0.7.7 version range:>=0.6.2)
| | | | +- urllib3(install version:1.24.3 version range:==1.24.3)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.8)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- pandas(install version:0.22.0 version range:>=0.19.2)
| | | +- scipy(install version:1.4.0rc2 version range:*)
| | +- elasticsearch(install version:6.3.1 version range:==6.3.1)
| | | +- urllib3(install version:1.24.3 version range:>=1.21.1)
| | +- elasticsearch-dsl(install version:6.3.1 version range:==6.3.1)
| | | +- elasticsearch(install version:6.4.0 version range:>=6.0.0,<7.0.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1)
| | | +- ipaddress(install version:1.0.23 version range:*)
| | | +- python-dateutil(install version:2.8.1 version range:*)
| | | +- six(install version:1.13.0 version range:*)
| | +- graal(install version:0.2.3 version range:>=0.2.2)
| | | +- bandit(install version:1.6.2 version range:>=1.4.0)
| | | | +- colorama(install version:0.4.3 version range:>=0.3.9)
| | | | +- gitpython(install version:3.0.5 version range:>=1.0.1)
| | | | +- pyyaml(install version:5.2 version range:>=3.13)
| | | | +- six(install version:1.13.0 version range:>=1.10.0)
| | | | +- stevedore(install version:1.31.0 version range:>=1.20.0)
| | | +- flake8(install version:3.7.9 version range:>=3.7.7)
| | | +- lizard(install version:1.16.6 version range:>=1.16.3)
| | | +- networkx(install version:2.4 version range:>=2.1)
| | | | +- decorator(install version:4.4.1 version range:>=4.3.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.0)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- pydot(install version:1.4.1 version range:>=1.2.4)
| | | | +- pyparsing(install version:2.4.5 version range:>=2.1.4)
| | | +- pylint(install version:2.4.4 version range:>=1.8.4)
| | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | +- kingarthur(install version:0.1.18 version range:>=0.1.1)
| | | +- cheroot(install version:8.2.1 version range:>=8.2.1)
| | | +- cherrypy(install version:18.5.0 version range:>=17.4.2)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.10)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.23)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- redis(install version:3.0.0 version range:==3.0.0)
| | | +- rq(install version:1.0.0 version range:==1.0.0)
| | +- pandas(install version:0.22.0 version range:==0.22.0)
| | +- perceval(install version:0.12.24 version range:>=0.9.6)
| | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- soupsieve(install version:1.9.5 version range:>=1.2)
| | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- sgmllib3k(install version:1.0.0 version range:*)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | +- perceval-finos(install version:0.1.6 version range:>=0.1.0)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.9)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.12)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- perceval-mozilla(install version:0.2.9 version range:>=0.1.4)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.0)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.12)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- perceval-opnfv(install version:0.1.16 version range:>=0.1.2)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.9)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.12)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- perceval-puppet(install version:0.1.15 version range:>=0.1.4)
| | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.9)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.8.0)
| | | +- perceval(install version:0.12.24 version range:>=0.12.12)
| | | | +- beautifulsoup4(install version:4.8.1 version range:>=4.3.2)
| | | | +- dulwich(install version:0.18.6 version range:<0.19,>=0.18.5)
| | | | +- feedparser(install version:6.0.0b1 version range:>=5.1.3)
| | | | +- grimoirelab-toolkit(install version:0.1.10 version range:>=0.1.4)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | | +- requests(install version:2.21.0 version range:>=2.7.0)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- pymysql(install version:0.9.3 version range:>=0.7.0)
| | +- redis(install version:2.10.6 version range:<=2.10.6,>=2.10.0)
| | +- requests(install version:2.21.0 version range:==2.21.0)
| | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- sortinghat(install version:0.7.7 version range:>=0.6.2)
| | | +- jinja2(install version:2.10.3 version range:*)
| | | | +- markupsafe(install version:1.1.1 version range:>=0.23)
| | | +- pandas(install version:0.22.0 version range:==0.22.0)
| | | +- pymysql(install version:0.9.3 version range:>=0.7.0)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | | +- pyyaml(install version:5.2 version range:>=3.12)
| | | +- requests(install version:2.21.0 version range:>=2.9)
| | | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | | +- sqlalchemy(install version:1.3.11 version range:>=1.2)
| | | +- urllib3(install version:1.24.3 version range:>=1.22)
| | +- urllib3(install version:1.24.3 version range:==1.24.3)
| +- matplotlib(install version:3.2.0rc1 version range:*)
| | +- cycler(install version:0.10.0 version range:>=0.10)
| | | +- six(install version:1.13.0 version range:*)
| | +- kiwisolver(install version:1.1.0 version range:>=1.0.1)
| | | +- setuptools(install version:42.0.2 version range:*)
| | +- numpy(install version:1.18.0rc1 version range:>=1.11)
| | +- pyparsing(install version:2.4.5 version range:>=2.0.1)
| | +- python-dateutil(install version:2.8.1 version range:>=2.1)
| +- prettyplotlib(install version:0.1.7 version range:*)
| | +- brewer2mpl(install version:1.4.1 version range:>=1.3.1)
| | +- matplotlib(install version:3.2.0rc1 version range:>=1.2.1)
| | | +- cycler(install version:0.10.0 version range:>=0.10)
| | | | +- six(install version:1.13.0 version range:*)
| | | +- kiwisolver(install version:1.1.0 version range:>=1.0.1)
| | | | +- setuptools(install version:42.0.2 version range:*)
| | | +- numpy(install version:1.18.0rc1 version range:>=1.11)
| | | +- pyparsing(install version:2.4.5 version range:>=2.0.1)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.1)
| +- sortinghat(install version:0.7.7 version range:>=0.4.2)
| | +- jinja2(install version:2.10.3 version range:*)
| | | +- markupsafe(install version:1.1.1 version range:>=0.23)
| | +- pandas(install version:0.22.0 version range:==0.22.0)
| | +- pymysql(install version:0.9.3 version range:>=0.7.0)
| | +- python-dateutil(install version:2.8.1 version range:>=2.6.0)
| | +- pyyaml(install version:5.2 version range:>=3.12)
| | +- requests(install version:2.21.0 version range:>=2.9)
| | | +- certifi(install version:2019.11.28 version range:>=2017.4.17)
| | | +- chardet(install version:3.0.4 version range:<3.1.0,>=3.0.2)
| | | +- idna(install version:2.8 version range:>=2.5,<2.9)
| | | +- urllib3(install version:1.24.3 version range:>=1.21.1,<1.25)
| | +- sqlalchemy(install version:1.3.11 version range:>=1.2)
| | +- urllib3(install version:1.24.3 version range:>=1.22)

Thanks for your help.
Best,
Neolith

Change start_date from '2015-01-01' to the date of the first commit

Currently, the default start date is set to 2015-01-01.
Instead of using that hard coded value, we can change the start date of analysis to the date the first commit was made.
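
The selection logic itself is simple, as the following sketch shows. It assumes the commit dates are already available as a list; in practice they would have to be obtained from the git index. `report_start` is a hypothetical helper, and the existing default is kept as a fallback for empty indexes.

```python
from datetime import datetime, timezone

# The current hard-coded default, kept as a fallback
DEFAULT_START = datetime(2015, 1, 1, tzinfo=timezone.utc)


def report_start(commit_dates, default=DEFAULT_START):
    """Return the earliest commit date, or the default when there are none."""
    return min(commit_dates, default=default)


# Made-up commit dates
commits = [
    datetime(2016, 3, 10, tzinfo=timezone.utc),
    datetime(2014, 7, 1, tzinfo=timezone.utc),
]
print(report_start(commits).date())  # 2014-07-01
print(report_start([]).date())       # 2015-01-01
```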

How to run reports?

In the README.md I read, as an example of the command for running reports (slightly edited):

bin/report -g --data-sources git gerrit its mls -u <elastic_url> -s 2015-04-01 -e 2017-04-01 \
  -d project_data -i quarter

I guess I can figure out all the options except for -d project_data. Do you happen to have an example of that project_data file, or maybe a description of the format?

Produce reports in Markdown format

Currently, Manuscripts only supports generating reports in PDF format. We want to give the user the option to generate the same reports in Markdown as well.

Generating reports in Markdown will give us two benefits:

The Markdown report can be hosted on GitHub directly, so the owners of a project could, for example, add it to the project's README.md.
Markdown will allow us to generate the same reports in HTML easily.

Change the error which is thrown when no arguments are provided to 'manuscripts' command

Currently, when we type the command manuscripts on the command line, without providing any arguments, it throws

2018-04-29 18:43:54,451 Missing needed params for Report None, 2015-01-01 00:00:00+00:00, 2018-04-28 23:59:59.999999+00:00, None

This does not say what exactly is missing.
We could either name the missing parameters (es_url, start, end and data_sources), such as:

2018-04-29 18:43:54,451 Missing needed params for Report: elasticsearch-url, start date, end date, data-sources

or we can throw an error such as:

No arguments provided!
Usage: manuscripts <options> <arguments>
Try 'manuscripts --help' for more information.

Also, there does not exist a help command, so maybe we can add that too.
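The first option could be sketched as follows. This is only an illustration: the function names are made up, and the parameter labels are the ones proposed in the message above.

```python
def missing_params(es_url=None, start=None, end=None, data_sources=None):
    """Return the labels of the required report parameters that were not provided."""
    required = [
        ("elasticsearch-url", es_url),
        ("start date", start),
        ("end date", end),
        ("data-sources", data_sources),
    ]
    return [label for label, value in required if not value]


def missing_params_message(**params):
    """Build a readable error message, or return None when nothing is missing."""
    missing = missing_params(**params)
    if not missing:
        return None
    return "Missing needed params for Report: " + ", ".join(missing)
```

With this approach the message lists only what is actually missing, so default values for the start and end dates would not be reported as errors.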

Porting manuscripts to use new functions and classes added in #67

I've tried to create more chain-able functions/classes to be able to calculate the metrics. They are added as manuscripts2 through #67.

This issue is about calculating the current metrics, that manuscripts produces, using these new functions/classes.
My approach will be to use the flow that is currently being used.

Create classes for each of the data sources
Create sections of the report
Calculate the metrics in each section using the above classes
Generate the report.

In addition, this new report will mostly use Markdown instead of LaTeX.

Let users decide the name of an index for each data source

Now, the name of the index for each supported data source is wired in the code. It would be better if that wiring acted just as a default name, that the user can override via command line options. Pull requests with that aim are welcome.
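A minimal sketch of the idea, where the wired names act only as defaults. The default index names and the data_source=index option format are assumptions made for the example, not what Manuscripts currently uses.

```python
# Hypothetical default index names; the names wired into Manuscripts may differ.
DEFAULT_INDICES = {
    "git": "git",
    "gerrit": "gerrit",
    "its": "its",
    "mls": "mls",
}


def parse_index_overrides(pairs):
    """Turn ['git=my_git', ...] (an assumed option format) into a dict."""
    overrides = {}
    for pair in pairs:
        source, sep, index = pair.partition("=")
        if not sep or not source or not index:
            raise ValueError("expected <data_source>=<index>, got %r" % pair)
        overrides[source] = index
    return overrides


def resolve_indices(data_sources, overrides=None):
    """Map each data source to the user's index name, else to the default."""
    overrides = overrides or {}
    return {
        source: overrides.get(source, DEFAULT_INDICES.get(source, source))
        for source in data_sources
    }
```

For example, resolve_indices(["git", "gerrit"], parse_index_overrides(["git=git_enrich"])) keeps the default for gerrit and uses git_enrich for git.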

Error while generating git index

Related to #125

406 Client Error: Not Acceptable for url: http://127.0.0.1:9200/git_enrich/_search

I am trying to run the sample report generation example given in the documentation. I ran the following sequence of commands:
$ p2o.py --enrich --index git_raw --index-enrich git -e http://127.0.0.1:9200 --no_inc --debug git https://github.com/grimoirelab/perceval.git
$ p2o.py --enrich --index git_raw --index-enrich git -e http://127.0.0.1:9200 --no_inc --debug git https://github.com/grimoirelab/GrimoireELK.git
$ kidash --elastic_url-enrich http://127.0.0.1:9200 --import /tmp/git-dashboard.json
The dashboard is created successfully. Next, I tried to generate the report with the following command:
$ manuscripts -g -d /tmp/reports -u http://127.0.0.1:9200 -n GrimoireLab --data-sources git
which throws the Client Error above. It seems the URL is not created, and that is why the manuscripts command throws this error.

http://127.0.0.1:9200 is fine
http://127.0.0.1:9200/git_raw?pretty=true is also fine
http://127.0.0.1:9200/git_enrich/_search but this throws Error 404

This suggests that the enrich_backend function is not creating this URL. However, I investigated further and found that the enrich_backend(args) function in arthur.py (called through p2o.py) is working fine, as it logs "Enrich backend completed" in the terminal.

manuscripts showing error

When I executed

manuscripts -d /tmp/reports -u http://localhost:9200 -n GrimoireLab --data-sources git

the following error was shown:

[2018-02-28 05:19:22,352] Debug mode activated
[2018-02-28 05:19:22,353] Generating the report from 2015-01-01 00:00:00+00:00 to 2018-02-27 23:59:59.999999+00:00
[2018-02-28 05:19:22,353] Generating the report data and figs from 2015-01-01 00:00:00+00:00 to 2018-02-27 23:59:59.999999+00:00
[2018-02-28 05:19:22,353] Generating Overview
[2018-02-28 05:19:22,353] CSV file /tmp/reports/data_source_evolution.csv generation in progress
[2018-02-28 05:19:22,355] Metric: 'Commits' (commits); Query: {"size": 0, "aggs": {"1": {"date_histogram": {"time_zone": "UTC", "extended_bounds": {"min": 1420070400000, "max": 1519775999999}, "min_doc_count": 0, "interval": "year", "field": "grimoire_creation_date"}, "aggs": {"2": {"cardinality": {"precision_threshold": 3000, "field": "hash"}}}}}, "query": {"range": {"grimoire_creation_date": {"gte": "2015-01-01T00:00:00+00:00", "lte": "2018-02-27T23:59:59.999999+00:00"}}}, "from": 0}
Traceback (most recent call last):
  File "/usr/local/bin/manuscripts", line 130, in <module>
    report.create()
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/report.py", line 711, in create
    self.create_data_figs()
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/report.py", line 603, in create_data_figs
    self.sections()[section]()
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/report.py", line 267, in sec_overview
    (last, percentage) = m.get_trend()
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/metrics/metrics.py", line 201, in get_trend
    ts = self.get_ts()
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/metrics/metrics.py", line 150, in get_ts
    res = self.get_metrics_data(query)
  File "/usr/local/lib/python3.4/dist-packages/manuscripts/metrics/metrics.py", line 138, in get_metrics_data
    r.raise_for_status()
  File "/usr/local/lib/python3.4/dist-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 406 Client Error: Not Acceptable for url: http://localhost:9200/github_raw/git_enrich/_search

Any leads?

Add a --chaoss flag to generate a report using CHAOSS only metrics (the current report)

Following #111, we want to add a --chaoss flag that lets the user generate a report using CHAOSS metrics only, i.e., the report that is currently being generated.

This will be the default behaviour of Manuscripts and this will allow us to add different flags (--gmd, --risk and so on) which will be responsible for generating reports using different metrics.

Support filters per data source

Right now in manuscripts, when you specify filters in the command line to be applied to the queries to collect the metrics data, they are applied to all data sources. But some filters are only valid for some data sources (for example, filtering the merge commits for git). So we need to support filters by data source.
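One way this could look, as a sketch only: filters without a prefix apply to every data source, while filters prefixed with a data source name apply just to that one. The [data_source:]field=value syntax and the field names used below are hypothetical, not an existing Manuscripts option format.

```python
def parse_filter(spec):
    """Split '[data_source:]field=value' into (data_source or None, field, value)."""
    key, sep, value = spec.partition("=")
    if not sep:
        raise ValueError("expected [data_source:]field=value, got %r" % spec)
    if ":" in key:
        source, field = key.split(":", 1)
    else:
        source, field = None, key
    return source, field, value


def filters_for(data_source, specs):
    """Collect the global filters plus the ones scoped to the given data source."""
    selected = {}
    for spec in specs:
        source, field, value = parse_filter(spec)
        if source is None or source == data_source:
            selected[field] = value
    return selected
```

With specs such as ["project=GrimoireLab", "git:is_merge=false"], only the git queries would receive the is_merge filter, while every data source would receive the project filter.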

Make git the name of the enriched index for git

AFAIK, by default Mordred is creating the alias for the git enriched index as git. However, it seems Manuscripts is looking for git_enrich (see #16). Could somebody more knowledgeable on Manuscripts confirm this, so that we keep this issue open, or deny, and we just close it?

Test issue

This is a test issue created for the purpose of testing the data that is collected by grimoire-elk when a raw index is created and the format of that data.

I am using p2o.py on manuscripts and creating raw and enriched indices and looking at the data generated so that appropriate code can be added to grimoire-elk/perceval for creating some fields in the enriched index.

New names for files, classes in manuscripts2

Even when the structure in manuscripts2 starts to take shape, we need to discuss better names for the structures there. For example, "new_functions" is not a good name ;-)

Could we discuss here how to name files, classes and methods?

Generation and indexing of enriched data to test the reports and other functions

The reports generated (and other files/functions) need to be tested using data fetched from elasticsearch.
Right now, the functions in manuscripts2 are tested on git data from the grimoirelab-perceval repository, which is fetched at the time of testing. This causes the tests to fail after some time, when more commits are added to grimoirelab-perceval.

The other approach that is being used is to take the raw data for the data sources --> upload it to ES --> enrich that raw data --> upload that data again to ES.
This approach works but it is slow and using raw data is unnecessary as the enriched data can be directly used.

This issue is to solve the problem regarding the tests. We have to find out a good way to get frozen (which is stored locally) enriched data --> upload it to elasticsearch --> query the data for results.

We are looking for a solution which provides us with the enriched data easily and which is capable of generating that enriched data again if something goes wrong. It should also be able to upload the data into elasticsearch with proper mappings for the data.
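As a rough sketch of the upload step, frozen enriched items stored locally as JSON could be turned into a payload for the Elasticsearch bulk API. The helper name and the use of a uuid field as document id are assumptions for the example, and some Elasticsearch versions also expect a document type in the action line or in the URL, which this sketch leaves out.

```python
import json


def bulk_payload(index, items, id_field="uuid"):
    """Build a newline-delimited bulk body that indexes each item into `index`."""
    lines = []
    for item in items:
        # Each document is preceded by an action line naming the index and id.
        action = {"index": {"_index": index, "_id": item[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(item))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The mappings would still have to be created on the index before posting this body to the _bulk endpoint.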

Import error : Cannot import name timezone

While running the command
manuscripts2 -n Perceval_Project -d PERCEVAL-REPORTS -s 2016-05-01 -e 2018-04-10 -i quarter -u http://localhost:9200/ --data-sources git github_issues github_prs \
  --indices perceval_git perceval_github_issues perceval_github_prs -l logo.png
to generate the report, the error arises.

Add functions to calculate the metrics for the remaining data sources

Currently, by using Manuscripts2, we can generate a report for the git, github issues and github prs data sources only.

This issue is about adding the functionality to calculate the metrics using the remaining data sources:

its (issue tracking systems)
gerrit
mailing lists
stackexchange
jira

The functions and classes for these data sources will follow the same pattern as the currently implemented data sources.

Option to change the copyright holder

Now, reports are "signed" with Bitergia as copyright holder. This should be an option, so anyone running manuscripts could have a report "signed" by them, as they can already change, for example, the logo. The default copyright holder should be "CHAOSS/GrimoireLab".

Implementing the current Report using new functions

I've been working on redesigning the functions and classes so as to calculate the Metrics in a better way.

This issue is about trying to implement the current reports that Manuscripts produces using these new functions. I am limiting this issue to only calculate the Metrics defined under Github Issues, PRs and Git Commits, as this is for testing purposes and most of the Metrics are under these 3 categories.
I was unable to add these new functions directly into manuscripts (by creating a branch and generating a report with them, as happens currently) because the current functions use an inherited class for each metric and differ from what I have implemented.
These new functions cannot just be plugged into manuscripts and will mostly require a complete makeover of the manuscripts project, if it is decided that they are correct and should be implemented.
The new functions and classes have been created, keeping in mind the metrics under GMD and others so that calculating them becomes easier and useful for the users.

I've implemented the metrics in this notebook and this is the reference pdf showing the metrics calculated by manuscripts using the old functions.

@jgbarah @valeriocos @acs please have a look at the notebook!

Require DCO sign-off for new commits

This issue is to activate probot/dco (or a similar bot) to check that all commits in this repository have a sign-off.

The CHAOSS Project Charter section 8.2.1 requires that all contributions are signed-off. The CHAOSS project has been piloting the use of DCO sign-offs. Once contributors know how to do it, sign-offs are easy to do with little overhead.

For users of the git command line interface, a sign-off is accomplished with the -s as part of the commit command: git commit -s -m 'This is a commit message'

For users of the GitHub interface, a sign-off is accomplished by writing Signed-off-by: Your Name <your.email@example.com> into the commit comment field. This can be automated by using a browser plugin like scottrigby/dco-gh-ui
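For illustration, the kind of check such a bot performs can be sketched as a test for a Signed-off-by line in the commit message. This is a simplified, hypothetical check and not the actual logic of any DCO bot.

```python
import re

# A sign-off is a line of the form "Signed-off-by: Name <email>".
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(message):
    """Return True if the commit message contains a sign-off line."""
    return bool(SIGN_OFF.search(message))
```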

To-Do for repo maintainers: Please inform your contributors about DCO sign-offs and comment on this issue when you are ready for the DCO bot to be activated on this repository.

Define 'Path to Maintainership' and 'List Maintainers'

Per proposal in chaoss/community#5 :

The README.md of the repository contains a list of who the maintainers are. Each CHAOSS repository brings together different people, and they document in the repository-specific CONTRIBUTING.md how someone becomes a maintainer of their repository.

TODO:

Specify 'path to maintainership' for this repo in CONTRIBUTING.md file
List current maintainers in README.md file

Graphs are off-center and extend beyond the right margin

The reports I have seen appear to have an issue with figures.

The figures are off-center to the right and extend past the right margin.

Here is where I reported the issue initially:
https://gitlab.com/Bitergia/c/OpenDayLight/support/issues/56

PROJECT-NAME not getting replaced in the report.pdf file

After running the manuscripts command with the necessary parameters, including -n <name of the report>, the generated report.pdf still shows the project name as PROJECT-NAME; the placeholder is not replaced.

The problem might be in report.py:

cmd = ['grep -rl PROJECT-NAME . | xargs sed -i s/PROJECT-NAME/' + project_replace + '/g']
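An unquoted sed expression like the one above can fail when the project name contains spaces or a slash, which may be one cause of the problem. A possible alternative, sketched here under the assumption that the placeholder lives in .tex files under the report directory, is to do the replacement in Python. The function name and the extension filter are illustrative, not existing Manuscripts code.

```python
import os


def replace_in_tree(root, old, new, extensions=(".tex",)):
    """Replace `old` with `new` in matching files under `root`; return changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
            if old in text:
                # Plain string replacement, so no shell or regex escaping is needed.
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(text.replace(old, new))
                changed.append(path)
    return changed
```

A call such as replace_in_tree(report_dir, "PROJECT-NAME", project_name) would then handle names like "My Project" without any quoting.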