python-sprints / pandas-mentoring

Mentoring new pandas contributors.

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 89.22% Makefile 3.82% Python 6.96%

pandas-mentoring's Introduction

Python sprints website

This is the website of the Python sprints group.

It was started by the London Python Sprints meetup, but it is open to any other Python User Group (PUG) interested in running sprints.

Website setup

The website is built using Jekyll, a static website generator written in Ruby (yes, Ruby!) and supported by GitHub Pages. To build the website locally you need to:

Install ruby and ruby-dev
- dnf install ruby ruby-devel on Fedora, CentOS and RedHat
- apt-get install ruby-full ruby-dev on Debian and Ubuntu
- brew install ruby on macOS
- More information about installing Ruby is here https://www.ruby-lang.org/en/documentation/installation/.
Install Jekyll and Bundler with gem install jekyll bundler.
Install dependencies with bundle install in the project directory.
Run the server with bundle exec jekyll serve.
Open the rendered website at http://localhost:4000/

How to add your chapter

Send a pull request adding a new file _chapters/<your-chapter-name>.md, where <your-chapter-name> is the name of your chapter in lowercase ASCII, with words separated by underscores (e.g. london_python_sprints).
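If you want to derive the file name automatically, the naming rule above can be sketched in Python. The helper `chapter_file_name` is hypothetical and not part of the site. It also makes an assumption the text doesn't state: accented letters are reduced to their plain ASCII base letter rather than dropped.

```python
import re
import unicodedata


def chapter_file_name(chapter_name: str) -> str:
    """Return the _chapters/<name>.md path for a human-readable chapter name."""
    # Decompose accented letters (e.g. "ã" -> "a" + combining tilde), then drop
    # every character that is not plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", chapter_name)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    # Runs of letters and digits become words; anything else separates words.
    words = re.findall(r"[a-z0-9]+", ascii_name)
    return "_chapters/" + "_".join(words) + ".md"


print(chapter_file_name("London Python Sprints"))
# _chapters/london_python_sprints.md
print(chapter_file_name("Python Sprints São Paulo"))
# _chapters/python_sprints_sao_paulo.md
```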

The file starts with a header section containing some fields (opened and closed with ---), followed by the main description of the chapter. The data is stored in markdown files, which can contain either markdown formatting or plain HTML; it is up to you how to style your content.
The image that will represent your chapter should be in JPEG format, and the layout is optimized for a size of 1920 x 600 px. Please name it <your_chapter_name>_1920x600px.jpg. This is the format of the file:

---
category: "the-city-where-your-chapter-is-located-in"
title: "Name of your chapter"
meetup_link: <url-of-your-other-website-if-any>
address: "Description of your location, usually City, Country"
country_code: <2-letter-country-code-used-for-the-flag>
image: <relative path to the chapter image>
lat: <float-number-with-latitude-for-the-marker-in-the-map>
lng: <float-number-with-longitude-for-the-marker-in-the-map>
sponsors:
    - <first-sponsor-id-to-be-listed-in-your-page>
    - <second-sponsor-id-to-be-listed-in-your-page>
    - <feel-free-to-add-as-many-as-you-want>
---

Here you can add any information about your group: how it started, what its goals are,
and what people can expect from it.

This is a good time to remind you that no sort of discrimination (gender, religion,
age, sexual orientation...) is tolerated in the Python sprints (or in the Python
community in general). As an organizer, you must not allow any sort of harassment
at the events. Feel free to add to your chapter description any policies you have to
make sure the events are as diverse and welcoming as possible.

Example chapter setup

---
category: "london"
title: "London Python Sprints"
meetup_link: https://www.meetup.com/Python-Sprints/
address: London, United Kingdom
country_code: gb
image: static/images/chapters/london_python_sprints_1920x600px.jpg
lat: 51.512344
lng: -0.090985
sponsors:
  - harvey_nash
  - touch_surgery
  - bloomberg
---
The content of your chapter's description goes below the closing ---.
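Before sending the pull request, it can help to check the header of your file locally. The sketch below splits a file into header fields and description, then lists missing fields. It handles only the flat `key: value` lines and `- item` lists used in these templates, so it is not a real YAML parser (Jekyll uses a full one). The names `parse_front_matter`, `missing_chapter_fields` and `REQUIRED_CHAPTER_FIELDS` are hypothetical. Which fields are treated as required is an assumption: meetup_link ("if any") and sponsors are left out.

```python
from __future__ import annotations

# Assumption: every template field except meetup_link and sponsors is required.
REQUIRED_CHAPTER_FIELDS = {
    "category", "title", "address", "country_code", "image", "lat", "lng",
}


def parse_front_matter(text: str) -> tuple[dict, str]:
    """Split a file into (header fields, description).

    Only flat `key: value` lines and `- item` lists are supported; this is
    not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("the file must start with a --- line")
    fields: dict = {}
    list_key = None  # key whose `- item` entries are currently being read
    for i, line in enumerate(lines[1:], start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "---":
            # Everything after the closing --- is the description.
            return fields, "\n".join(lines[i + 1:])
        if stripped.startswith("- ") and list_key is not None:
            fields[list_key].append(stripped[2:].strip())
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"cannot parse header line: {line!r}")
        key, value = key.strip(), value.strip().strip('"')
        if value:
            fields[key] = value
            list_key = None
        else:
            # A key without a value (e.g. `sponsors:`) starts a list.
            fields[key] = []
            list_key = key
    raise ValueError("missing closing --- line")


def missing_chapter_fields(fields: dict) -> set:
    """Return the required chapter fields that are absent from the header."""
    return REQUIRED_CHAPTER_FIELDS - set(fields)


example = """---
category: "london"
title: "London Python Sprints"
meetup_link: https://www.meetup.com/Python-Sprints/
address: London, United Kingdom
country_code: gb
image: static/images/chapters/london_python_sprints_1920x600px.jpg
lat: 51.512344
lng: -0.090985
sponsors:
  - harvey_nash
  - touch_surgery
  - bloomberg
---
The content of your chapter's description goes here.
"""
fields, description = parse_front_matter(example)
print(missing_chapter_fields(fields))  # set()
print(fields["sponsors"])  # ['harvey_nash', 'touch_surgery', 'bloomberg']
```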

How to add an event

Before sending an event, you need to have your chapter set up. See the section How to add your chapter for more information on how to do it.

You can also add a sponsor, to give credit to the companies/institutions supporting your group with a venue, pizzas, beers... And if you are sprinting on a project that is not yet in the system, you may also want to add it.

To create an event, add a new file to the _posts directory, with a file name following the format YYYY-MM-DD-slug-of-the-event.md, where YYYY-MM-DD is the date of the event (this will be the date shown on the site), and the slug of the event is its title in a URL-friendly format (only lowercase ASCII letters and numbers, with words separated by hyphens). For example, 2017-12-31-django-bugfixing.md could be the name of a sprint "Django bugfixing" happening on December 31st, 2017.
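The file-name rule can be expressed as a small Python function. `event_file_name` is a hypothetical helper, not something the site provides. It reproduces the example above and shows how the slug keeps only lowercase ASCII letters and digits.

```python
import re
from datetime import date


def event_file_name(event_date: date, title: str) -> str:
    """Return a _posts file name following YYYY-MM-DD-slug-of-the-event.md."""
    # Keep only lowercase ASCII letters and digits, joining words with hyphens.
    # Note: non-ASCII characters are simply dropped by this pattern.
    slug = "-".join(re.findall(r"[a-z0-9]+", title.lower()))
    return f"{event_date.isoformat()}-{slug}.md"


print(event_file_name(date(2017, 12, 31), "Django bugfixing"))
# 2017-12-31-django-bugfixing.md
```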

The content of the file has a header section with some fields (started and finished with ---), and the main description of the event afterwards. This is the format:

---
category: "<same-as-category-of-your-chapter>"
title: "Short summary of your event"
level: "Target audience of the event (e.g. Beginners, All levels, Advanced,...)"
time: "hh:mm"
rsvp_link: <url-of-your-meetup-eventbrite-etc-page>
project: <id-of-the-project-you-will-work-on>
sponsor: <id-of-the-sponsor-for-the-event>
---

This space is the main description of the event, where you can provide further details.

Note that you don't need to add information about the project if it already exists in the `_projects` folder: the description of the
project, the logo, and the environment setup instructions should be rendered automatically
once you specify the id of the project. If the project has not been added yet, you will need to add it before referencing it here. If the event is not related to a specific project, you can leave the `project` field blank.

Similarly, if you specify the id of the sponsor, a box with its information will appear.
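Because the project and sponsor boxes depend on the ids matching existing entries, a quick local check can catch typos. The sketch below assumes, as the examples suggest, that the `project` and `sponsor` values of an event refer to the `obj_id` field declared in `_projects/*.md` and `_sponsors/*.md`. The function names are hypothetical, and the way Jekyll actually resolves the ids may differ.

```python
import tempfile
from pathlib import Path


def declared_ids(folder: Path) -> set:
    """Collect the obj_id values declared in the headers of folder/*.md."""
    ids = set()
    for path in folder.glob("*.md"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("obj_id:"):
                ids.add(line.partition(":")[2].strip())
                break
    return ids


def check_event_references(site_root: Path, project: str, sponsor: str) -> list:
    """List the project/sponsor ids of an event that no file declares yet."""
    problems = []
    # Empty values are skipped: the project may be left blank for events not
    # tied to a specific project (treating an empty sponsor the same way is
    # an assumption).
    if project and project not in declared_ids(site_root / "_projects"):
        problems.append(f"unknown project id: {project!r}")
    if sponsor and sponsor not in declared_ids(site_root / "_sponsors"):
        problems.append(f"unknown sponsor id: {sponsor!r}")
    return problems


# Usage on a throwaway site with one project and one sponsor.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "_projects").mkdir()
    (root / "_sponsors").mkdir()
    (root / "_projects" / "pandas.md").write_text("---\nobj_id: pandas\n---\n", encoding="utf-8")
    (root / "_sponsors" / "harvey_nash.md").write_text("---\nobj_id: harvey_nash\n---\n", encoding="utf-8")
    ok = check_event_references(root, "pandas", "harvey_nash")
    bad = check_event_references(root, "numpy", "harvey_nash")

print(ok)   # []
print(bad)  # ["unknown project id: 'numpy'"]
```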

Example event setup

---
category: "london"
title: "Pandas internals"
level: "All levels"
time: "18:30"
rsvp_link: https://www.meetup.com/Python-Sprints/events/249350212/
project: pandas
sponsor: harvey_nash
---
The content of your event's description goes below ---.

You may want to copy one of the most recent events in _posts to use as a reference.

How to add a sponsor

If your local sponsor is not already in the _sponsors folder, you can easily add it.
To add a sponsor logo, copy it to the static/images/sponsors folder with a name matching the obj_id of your sponsor. We use PNG files for our sponsors. The maximum size is 258 x 82 px, so please scale logos down, keeping their proportions, until one of those dimensions matches. Logos are displayed 150 px wide, so a logo narrower than that will be stretched to match that width.

Create a <name_of_your_sponsor>.md file in the _sponsors folder using the format below:

---
obj_id: <unique_identifier_of_your_sponsor>
name: "name of your sponsor"
logo: <relative path to your sponsor logo>
link: <website link to your sponsor>
address: "sponsor's full address"
lat: <float-number-with-latitude-for-the-marker-in-the-map>
lng: <float-number-with-longitude-for-the-marker-in-the-map>
---
Here you can place a short description of your sponsor's business etc.

Example sponsor setup

---
obj_id: quantum_black
name: "Quantum Black"
logo: static/images/sponsors/quantum_black.png
link: https://www.quantumblack.com/
address: "Kinnaird House, 1 Pall Mall<br/>London, SW1Y 5AU, UK"
lat: 51.507954
lng: -0.130718
---
QuantumBlack is an advanced analytics firm operating at the intersection of strategy, technology & design to improve performance outcomes for organisations. With roots in Formula One, we now work across sectors with some of the world's leading organisations in advanced industries, healthcare and finance.

How to add a project

If the project you are going to work on is not already in the _projects folder, you can easily add it.
To add a project logo please copy it to the static/images/projects folder. We use png files for projects.

Create a <name_of_your_project>.md file using the format below:

---
obj_id: <unique_identifier_of_your_project>
name: "name of your project"
logo: <relative path to your project logo>
website: <website link to your project>
setup_html: |
    <p>
        <!-- (link to) instructions on how to set up, in html format -->
    </p>
---
Here you can place a short description of your project.

Example project setup

---
obj_id: pandas
name: "Pandas"
logo: static/images/projects/pandas_logo_donation.png
website: https://pandas.pydata.org/
setup_html: |
    <p>
        Please follow the instructions in this link:
        <a href="https://python-sprints.github.io/pandas/guide/index.html">
            https://python-sprints.github.io/pandas/guide/index.html
        </a>
    </p>
---
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

How does Jekyll work?

Posts (files in _posts/*.md) are the event pages in markdown
Projects (_projects/*.md) are the open source projects we contribute to
Sponsors (_sponsors/*.md) are the companies providing the venue and pizzas
Layouts (files in _layouts/*.html) are equivalent to Django templates, and used to render the posts (events)
CSS is managed with scss/sass and built by Jekyll
GitHub Pages automatically builds the site when the repository is updated


pandas-mentoring's Issues

Fix tseries.offsets docstrings with missing period in the summary (set 2)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.QuarterOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BQuarterEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BQuarterBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.QuarterEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.QuarterBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.YearOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BYearEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BYearBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.YearEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.YearBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.FY5253.normalize: Summary does not end with a period
pandas.tseries.offsets.FY5253Quarter.normalize: Summary does not end with a period
pandas.tseries.offsets.Easter.normalize: Summary does not end with a period
pandas.tseries.offsets.Tick.normalize: Summary does not end with a period
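To see what the "Summary does not end with a period" check looks for, here is a simplified sketch. It treats the first paragraph of a docstring as its summary. It is an illustration only, not the pandas `validate_docstrings.py` implementation. `summary_ends_with_period` and the two example functions are hypothetical.

```python
import inspect


def summary_ends_with_period(obj) -> bool:
    """Return True if the summary (first paragraph) of obj's docstring ends with '.'."""
    # inspect.getdoc removes the indentation Python keeps in docstrings.
    doc = inspect.getdoc(obj) or ""
    # The summary is the first paragraph, which may span several lines.
    first_paragraph = doc.strip().split("\n\n")[0]
    summary = " ".join(line.strip() for line in first_paragraph.splitlines())
    return summary.endswith(".")


def example_fixed():
    """Compute the result.

    Any extended description goes in the following paragraphs.
    """


def example_broken():
    """Compute the result"""


print(summary_ends_with_period(example_fixed))   # True
print(summary_ends_with_period(example_broken))  # False
```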

Add learning points related to creating PRs and issues to the contributing.md

I could work on adding the learning points related to creating PRs and issues to contributing.md (as discussed in #75).

What do you think?

Error accessing gitter channel

After signing up to Gitter with GitHub and granting access to the repo, I get this error:
"404. This is not the chat you're looking for. Sorry. :("

Copy alternative pandas setup instructions

In #41, we added the official pandas documentation for contributing.

A bit more than a year ago, we organized a worldwide pandas sprint to improve the documentation, and since the pandas official documentation wasn't great for beginners, we built an alternative page on how to set up the pandas development environment.

This documentation should be used, or at least considered, when improving the official pandas contributing pages. For now, this issue is only about moving the document to this repo (inside /doc/source/, but probably separate from the development one, so things don't get mixed).

This is the document: https://raw.githubusercontent.com/python-sprints/python-sprints.github.io/master/pandas/guide/_sources/pandas_setup.rst.txt (the html version is available here)

Add link to the pandas code of conduct in the README file

The pandas README file provides a short section about contributing to the project: https://github.com/pandas-dev/pandas#contributing-to-pandas-

While pandas has a strict code of conduct to make sure no sort of harassment or discrimination happens in the project development, there is no reference or link to it in that section. The code of conduct is in: https://github.com/pandas-dev/pandas/blob/master/.github/CODE_OF_CONDUCT.md

I think it's worth adding a reference and a link to the code of conduct in that section.

Please make sure to comment in this issue if you want to work on this, and don't work on it if someone else has already started. Also, please tag me when you open the PR in pandas.

Select a Name for the Team

Please comment with names you see fit for the team, and react to the ones you love.

Update to_parquet/pyarrow tests

This issue is a bit tricky to set up, since it requires the latest (master branch) version of arrow/pyarrow locally, but other than that it should be easy:

pandas-dev/pandas#27955

Please add a comment to the original issue (also here) to claim it if you plan to work on it.

New citing and artwork section in the pandas website

In the new pandas website, I'd like to have a section for resources to cite pandas. To be specific, I want people to be able to cite pandas in scientific papers, and also to use the pandas logo correctly.

Scikit-learn has something like that in their about us section:
https://scikit-learn.org/stable/about.html#citing-scikit-learn

In the current website we've got two papers in the talks section:
https://pandas.pydata.org/talks.html I'm not sure which one would be preferred for citing pandas (I will check with Wes later to see if he has a preference).

For the logo, I expect to have different versions (for white background, for black background, with text, without...). Besides providing the svg files for them, it would be nice to provide some guidelines on how we expect the logo to be used (leaving a certain margin, not distorting proportions, not placing text on top...). To give an idea, see the brand guidelines from JetBrains: https://www.jetbrains.com/company/brand/ (most big companies have one, so it should be easy to find many other examples).

While we don't have the new logo yet, it would be useful to have a first version of this page while building the website (it should be very easy to update the logo later and get the final version).

For now the new website should live in this repo, in the /doc/ directory.

DataFrame.to_parquet fails with index parameter

This issue should be created in the pandas repo, but I'm creating it here so nobody outside this group claims it.

Please consider this before claiming this issue:

This is somewhat advanced: besides requiring decent Python/pandas skills, it may take a significant amount of time, so please don't claim it if you don't have that time in the next few days
If you already made your first PR to pandas, please give priority to someone who hasn't (you're always welcome to look for "Good first issues" in pandas, and hopefully I can soon create enough issues for everyone who wants to work on them).
Feel free to research and discuss this issue even if you don't claim it, but it's better to let someone new open the PR and follow the process end to end

See the following example:

>>> import pandas
>>> pandas.DataFrame({'foo': [1, 2], 'bar': [3, 4]}).to_parquet('test.parquet', engine='pyarrow', index=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3a1f9432d201> in <module>()
      1 import pandas
----> 2 pandas.DataFrame({'foo': [1, 2], 'bar': [3, 4]}).to_parquet('test.parquet', engine='pyarrow', index=False)

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1943         from pandas.io.parquet import to_parquet
   1944         to_parquet(self, fname, engine,
-> 1945                    compression=compression, **kwargs)
   1946 
   1947     @Substitution(header='Write out the column names. If a list of strings '

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    255     """
    256     impl = get_engine(engine)
--> 257     return impl.write(df, path, compression=compression, **kwargs)
    258 
    259 

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/io/parquet.py in write(self, df, path, compression, coerce_timestamps, **kwargs)
    119             self.api.parquet.write_table(
    120                 table, path, compression=compression,
--> 121                 coerce_timestamps=coerce_timestamps, **kwargs)
    122 
    123     def read(self, path, columns=None, **kwargs):

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor, **kwargs)
   1090                 compression=compression,
   1091                 use_deprecated_int96_timestamps=use_int96,
-> 1092                 **kwargs) as writer:
   1093             writer.write_table(table, row_group_size=row_group_size)
   1094     except Exception:

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/parquet.py in __init__(self, where, schema, flavor, version, use_dictionary, compression, use_deprecated_int96_timestamps, **options)
    309             use_dictionary=use_dictionary,
    310             use_deprecated_int96_timestamps=use_deprecated_int96_timestamps,
--> 311             **options)
    312         self.is_open = True
    313 

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()

TypeError: __cinit__() got an unexpected keyword argument 'index'

Based on the documentation, with index=False the DataFrame should be saved without index. But instead a pyarrow exception is raised.

Tasks to do:

Verify that in the latest (master branch) pandas the problem still exists, and the documentation is still incorrect
Check if pyarrow supports writing without the index (it should), and how
Use git blame to see who the main contributor(s) to the to_parquet functionality are
Open an issue in the pandas repo explaining the problem in detail, and propose your solution; please tag me @datapythonista and also the person who has worked most on that part
Claim the issue immediately after creating it with a comment saying you're working on it
Wait for confirmation on the approach you're proposing, and when maintainers are happy with it, open a PR with the fix

Set up azure-pipelines to automatically build and publish the documentation

After #31 is complete, we should set up a continuous integration system (azure-pipelines) to automatically build the documentation to html, and publish it in a website, every time there are changes.

Fix tseries.offsets docstrings with missing period in the summary (set 3)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.Day.normalize: Summary does not end with a period
pandas.tseries.offsets.Hour.normalize: Summary does not end with a period
pandas.tseries.offsets.Minute.normalize: Summary does not end with a period
pandas.tseries.offsets.Second.normalize: Summary does not end with a period
pandas.tseries.offsets.Milli.normalize: Summary does not end with a period
pandas.tseries.offsets.Micro.normalize: Summary does not end with a period
pandas.tseries.offsets.Nano.normalize: Summary does not end with a period
pandas.tseries.offsets.BDay.normalize: Summary does not end with a period
pandas.tseries.offsets.BMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CBMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.CBMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CDay.normalize: Summary does not end with a period
pandas.tseries.frequencies.to_offset: Summary does not end with a period

Please share thoughts on SURVEY.md

Fix DatetimeIndex and other docstrings for missing period at the end of summary

Same as #132 but for the following docstrings:

pandas.IndexSlice: Summary does not end with a period
pandas.MultiIndex.names: Summary does not end with a period
pandas.MultiIndex.is_lexsorted: Summary does not end with a period
pandas.MultiIndex.reorder_levels: Summary does not end with a period
pandas.DatetimeIndex.snap: Summary does not end with a period
pandas.DatetimeIndex.to_perioddelta: Summary does not end with a period
pandas.DatetimeIndex.to_pydatetime: Summary does not end with a period
pandas.DatetimeIndex.to_series: Summary does not end with a period
pandas.TimedeltaIndex: Summary does not end with a period
pandas.PeriodIndex.is_leap_year: Summary does not end with a period
pandas.api.extensions.ExtensionArray._concat_same_type: Summary does not end with a period
pandas.api.extensions.ExtensionArray.dropna: Summary does not end with a period

Discuss PR review conventions

Discuss:

How many reviews and approvals we want to enforce before a PR is cleared for a merge, if any?
Should this be enforced through the GitHub repository settings?

Create conda dependencies file with sphinx

Once we start changing the pandas contribution documentation (#12) we'll have to generate the html version of it. This is being done in most Python projects with Sphinx. To make it easier to create a development environment for this, we can use conda, and specify the dependencies in a file named environment.yml. We can base it on the pandas one, but for now we'll just add sphinx (the same version as pandas).

Implement header in Sphinx

As discussed in #51, we need a separate PR to implement the header. I think we first need to determine what its value should be, though: whether we should stick to pandas' implementation or not (and if not, what the value of header should be).

Create gitter channel for pandas-mentoring

Create a gitter channel to aid in collaborating on issues among the pandas-mentoring members

Add PR review conventions to Contributing.md

PR rules:
Input from two reviewers is needed to merge a PR.
After one reviewer approves, the next reviewer can merge.
xref #70

Copy the documentation about contributing to pandas to this repository

There is a directory doc/source/development in pandas that contains a few files with the documentation on how to contribute. That documentation is kind of all right, and contains useful information, but it could easily be improved to make it easier for anyone to follow (including ourselves).

What we can do is to copy that documentation in this repository, improve it here in an iterative way, and then open a PR in pandas with all the updates.

I think it's probably good to keep these files here in the same directory as in pandas, doc/source/development.

I'd also save somewhere (in a file in the repo, or in a comment on this ticket) the pandas commit the files were copied from, so we can later see if something changes in pandas while we make the changes here. The commit hash can be obtained simply with git log.

Add link to gitter channel created in issue #55 to "README.md"

Add link to the new learning points page from the README

We recently added this document: https://github.com/python-sprints/pandas-mentoring/blob/master/LEARNING_POINTS.md

It's probably worth adding a link to it from the README file.

Add "Things we learned" heading to README.md

Add a new header "Things we learned" to list the lessons learned.

Create a Slack channel for pandas-mentoring for easy communication

Make explicit in pandas docs the imports and the options

See pandas-dev/pandas#28038

Until now, there has been a hidden code block at the beginning of every documentation page with imports, random seeds and options. There is agreement to make that code explicit, so the users can reproduce exactly the code, and there is no "magic" going on.

Let's start by opening a PR to remove that header in this page: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

My preferred option is to remove the {{ header }} variable and keep the currentmodule in its place, but not the code block. Then, just add what is needed in the code blocks where things are first used. For example, add import pandas as pd at the beginning of the first code block. We probably only need the code with the random seed in the block where numpy.random is first used.
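The mechanical part of that change could be sketched as a text transformation on the .rst source. This assumes the hidden header appears as a line containing exactly `{{ header }}`, that the page's module is `pandas`, and that code blocks use the `.. ipython:: python` directive. `make_imports_explicit` is a hypothetical helper, and the result would still need a manual review (for example, the numpy import and random seed are not handled).

```python
import re


def make_imports_explicit(rst: str, module: str = "pandas",
                          first_import: str = "import pandas as pd") -> str:
    """Replace the {{ header }} line and add an explicit import to the first block."""
    # 1. Swap the hidden header for an explicit currentmodule directive.
    rst = re.sub(r"^\{\{ header \}\}[ \t]*$", f".. currentmodule:: {module}",
                 rst, count=1, flags=re.MULTILINE)
    # 2. Put the import at the top of the first ipython code block.
    lines = rst.split("\n")
    for i, line in enumerate(lines):
        if line.strip() == ".. ipython:: python":
            j = i + 1
            # Skip directive options such as ":suppress:" and blank lines.
            while j < len(lines) and (lines[j].strip().startswith(":") or not lines[j].strip()):
                j += 1
            if j < len(lines):
                # Reuse the indentation of the block's first code line.
                indent = lines[j][: len(lines[j]) - len(lines[j].lstrip())]
                lines.insert(j, indent + first_import)
            break
    return "\n".join(lines)


page = """{{ header }}

========
IO tools
========

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2]})
"""
result = make_imports_explicit(page)
print(result)
```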

After the change, it would be good to diff the html version from before the change against the one from after it, to see how much it changed (and add that diff to the PR description).
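One way to produce that diff is Python's standard `difflib` module. This sketch assumes you saved a copy of the rendered page before rebuilding the docs. You would read both versions with something like `Path(...).read_text()`, where the paths are up to you. `html_diff` is a hypothetical helper, and running `diff -u` on the two files would give an equivalent result.

```python
import difflib


def html_diff(before: str, after: str, name: str = "io.html") -> str:
    """Return a unified diff between two rendered versions of the same page."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"before/{name}",
        tofile=f"after/{name}",
    ))


before = "<p>Intro</p>\n<pre>df.head()</pre>\n"
after = "<p>Intro</p>\n<pre>import pandas as pd\ndf.head()</pre>\n"
print(html_diff(before, after))
```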

Note that several people in the pandas team are likely to have opinions on how this change should be implemented. So be prepared for several reviews, and several iterations of proposed changes and questions. :)

Timestamp docstring with missing periods

Same as #132, but for some other docstrings.

pandas.Timestamp.resolution: Summary does not end with a period
pandas.Timestamp.tz: Summary does not end with a period
pandas.Timestamp.ceil: Summary does not end with a period
pandas.Timestamp.combine: Summary does not end with a period
pandas.Timestamp.floor: Summary does not end with a period
pandas.Timestamp.fromordinal: Summary does not end with a period
pandas.Timestamp.isoweekday: Summary does not end with a period
pandas.Timestamp.replace: Summary does not end with a period
pandas.Timestamp.round: Summary does not end with a period
pandas.Timestamp.weekday: Summary does not end with a period
pandas.Timedelta.ceil: Summary does not end with a period
pandas.Timedelta.floor: Summary does not end with a period
pandas.Timedelta.isoformat: Summary does not end with a period
pandas.Timedelta.round: Summary does not end with a period

fix the slight typo in markdown in Aya's username addition.

Just remove the space in @Aya-S's entry. In Markdown, there must be no space between the link text in square brackets and the link in parentheses.
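If the same mistake shows up in several entries, a regular expression can find and fix it. In the sketch below, `fix_spaced_links` is a hypothetical helper, and the GitHub URL in the usage line is only illustrative. The pattern only handles spaces or tabs between `]` and `(`, and URLs without spaces or parentheses.

```python
import re

# "[text] (url)": link text and target separated by spaces or tabs.
SPACED_LINK = re.compile(r"\[([^\]]+)\][ \t]+\(([^)\s]+)\)")


def fix_spaced_links(markdown: str) -> str:
    """Remove the space that stops `[text] (url)` from rendering as a link."""
    return SPACED_LINK.sub(r"[\1](\2)", markdown)


print(fix_spaced_links("- [@Aya-S] (https://github.com/Aya-S)"))
# - [@Aya-S](https://github.com/Aya-S)
```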

Add .gitignore to the project

Add a .gitignore and include the html files generated with Sphinx in it.

Fix IntervalIndex docstring for missing period at the end of summary

Same as #132 but for IntervalIndex docstrings:

pandas.IntervalIndex.from_tuples: Summary does not end with a period
pandas.IntervalIndex.left: Summary does not end with a period
pandas.IntervalIndex.right: Summary does not end with a period
pandas.IntervalIndex.mid: Summary does not end with a period
pandas.IntervalIndex.closed: Summary does not end with a period
pandas.IntervalIndex.length: Summary does not end with a period
pandas.IntervalIndex.is_non_overlapping_monotonic: Summary does not end with a period
pandas.IntervalIndex.set_closed: Summary does not end with a period
pandas.IntervalIndex.to_tuples: Summary does not end with a period

git push error

I'm getting this error when I try to push, kindly help:

remote: Permission to python-sprints/pandas-mentoring.git denied to dorothykiz1.
fatal: unable to access 'https://github.com/python-sprints/pandas-mentoring.git/': The requested URL returned error: 403

structuring the LEARNING_POINTS.md document

How about structuring it based on topics like git, environment setup, etc.? Or just listing the points for now and grouping them at a later point?

Adding License file to the project.

Should we add a LICENSE file for this project? If yes, which license should we use?

Create notebook with histogram of number of wrong docstrings

In pandas there are many docstrings that have known errors, like parameters that are not documented, examples that do not run, formatting issues...

We have a script that is able to generate all of them in a json file (you need a pandas development environment to run it, and it should be run on an up-to-date master branch):

./scripts/validate_docstrings.py --format=json > pandas_docstring_errors.json

After generating the json file, we need a jupyter notebook that opens that file in pandas, and shows how many of each error need to be fixed. The resulting notebook can be added to a notebooks/ directory in this repo.
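The counting step of such a notebook could start like the sketch below. It assumes the json file maps each object name to a dict whose "errors" entry is a list of [code, message] pairs. Please check this against the actual output of the script before relying on it. `count_error_codes` is hypothetical, and the sample data is made up.

```python
import json
from collections import Counter


def count_error_codes(report: dict) -> Counter:
    """Count how many times each error code appears across all docstrings."""
    counts = Counter()
    for info in report.values():
        # Each error is assumed to be a [code, message] pair.
        counts.update(code for code, _message in info.get("errors", []))
    return counts


# Made-up sample in the assumed format of pandas_docstring_errors.json.
sample = json.loads("""{
  "pandas.func_a": {"errors": [["SS03", "Summary does not end with a period"]]},
  "pandas.func_b": {"errors": [["SS03", "Summary does not end with a period"],
                               ["PR01", "Parameters {freq} not documented"]]},
  "pandas.func_c": {"errors": []}
}""")
print(count_error_codes(sample).most_common())
# [('SS03', 2), ('PR01', 1)]
```

With the real file, the report could be loaded with `json.loads(Path("pandas_docstring_errors.json").read_text())`. The counts could then be plotted in the notebook with pandas, for example `pandas.Series(counts).sort_values().plot.barh()` (which needs matplotlib).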

DISCUSSION: Propose topics for pandas tutorials

In the pandas documentation, we would like to add tutorials that cover end to end real use cases of pandas. This should make things very easy for first time users trying to address a specific problem with pandas.

Based on my personal experience, these are the kinds of problems I usually address:

Exploratory analysis of a dataset to answer specific questions
Build a pipeline transforming one or more data sources to generate an output (for example to train a machine learning model)
Forecasting of time series (for example stock market data)
Preprocessing of textual data (for most NLP problems there are surely better tools like nltk, but pandas can be more flexible for some cases)

I'm sure people are doing other cool things with pandas; it would be great to brainstorm and find more use cases that are worth having a tutorial for.

Granting Aya-S the needed permissions to enter gitter channel and review PRs

Hello Everyone,
It would be great if someone with the required privileges could grant me the needed permissions to enter the gitter channel and review PRs.
TIA!

Fix RangeIndex and other docstrings for missing period in summary

Same as #132 but for other docstrings:

pandas.Categorical.dtype: Summary does not end with a period
pandas.merge_ordered: Summary does not end with a period
pandas.bdate_range: Summary does not end with a period
pandas.period_range: Summary does not end with a period
pandas.timedelta_range: Summary does not end with a period
pandas.interval_range: Summary does not end with a period
pandas.util.hash_pandas_object: Summary does not end with a period
pandas.Grouper: Summary does not end with a period
pandas.Index.memory_usage: Summary does not end with a period
pandas.Index.fillna: Summary does not end with a period
pandas.Index.dropna: Summary does not end with a period
pandas.Int64Index: Summary does not end with a period
pandas.UInt64Index: Summary does not end with a period
pandas.Float64Index: Summary does not end with a period
pandas.RangeIndex.start: Summary does not end with a period
pandas.RangeIndex.stop: Summary does not end with a period
pandas.RangeIndex.step: Summary does not end with a period

Set up sphinx to build the documentation

Once #12 and #17 are done, we should set up Sphinx to be able to convert our markdown documentation into its html version.

Documentation for sphinx is available at: http://www.sphinx-doc.org/en/master/

Should we add new issue template?

Maybe we could use a .github directory to include an issue template.
Not only will it standardize the issue content, but it will also help newcomers become more familiar with the workflow (and unsaid rules :speak_no_evil:) of adding issues, since most open-source projects come with a predefined issue template.
For starters, we can follow the one from pandas-dev: https://github.com/pandas-dev/pandas/blob/master/.github/ISSUE_TEMPLATE.md

Fix tseries.offsets docstrings with missing period in the summary (set 1)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.DateOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessDay.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessHour.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessDay.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessHour.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.Week.normalize: Summary does not end with a period
pandas.tseries.offsets.WeekOfMonth.normalize: Summary does not end with a period
pandas.tseries.offsets.LastWeekOfMonth.normalize: Summary does not end with a period
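A minimal sketch of what this check means, assuming the summary is the first paragraph of the docstring (everything up to the first blank line), as in the numpydoc format that pandas docstrings follow. This is not the code of pandas' own docstring validation; `get_summary`, `summary_ends_with_period`, `before_fix` and `after_fix` are hypothetical names used only for illustration.

```python
import inspect


def get_summary(obj):
    """Return the first paragraph of obj's docstring, joined into one line."""
    doc = inspect.getdoc(obj) or ""  # getdoc also removes the indentation
    lines = []
    for line in doc.splitlines():
        if not line.strip():
            if lines:  # the first blank line after some text ends the summary
                break
            continue
        lines.append(line.strip())
    return " ".join(lines)


def summary_ends_with_period(obj):
    """Check the convention that the docstring summary finishes with a period."""
    return get_summary(obj).endswith(".")


def before_fix():
    """Return something useful

    Extended description.
    """


def after_fix():
    """Return something useful.

    Extended description.
    """


print(summary_ends_with_period(before_fix))  # False
print(summary_ends_with_period(after_fix))   # True
```

For each docstring in the list, the fix is the same as going from `before_fix` to `after_fix`: add the missing period at the end of the first paragraph.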

Fix IntervalArray docstrings with missing period in the summary

Same as #132, but for the following docstrings:

pandas.arrays.IntervalArray.left: Summary does not end with a period
pandas.arrays.IntervalArray.right: Summary does not end with a period
pandas.arrays.IntervalArray.closed: Summary does not end with a period
pandas.arrays.IntervalArray.mid: Summary does not end with a period
pandas.arrays.IntervalArray.length: Summary does not end with a period
pandas.arrays.IntervalArray.is_non_overlapping_monotonic: Summary does not end with a period
pandas.arrays.IntervalArray.from_tuples: Summary does not end with a period
pandas.arrays.IntervalArray.set_closed: Summary does not end with a period
pandas.arrays.IntervalArray.to_tuples: Summary does not end with a period

DISCUSSION: Data for pandas examples

Examples in the pandas documentation very often create simple DataFrame objects, and many of them just use random data; see for example https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#id1

>>> df = pandas.DataFrame(numpy.random.randn(5, 2), columns=list('AB'))
>>> df
          A         B
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
3  0.119209 -1.044236
4 -0.861849 -2.104569

Then, if I want to show an operation, I can get something like:

>>> df + 2
          A         B
0  2.469112  1.717137
1  0.490941  0.864368
2  3.212112  1.826785
3  2.119209  0.955764
4  1.138151 -0.104569

In my opinion the example is quite useless (other than for showing the syntax), because if you don't know what the operation does, the example doesn't help you understand it.

The best example I could come up with to overcome that (probably not great) is:

>>> df = pandas.DataFrame({"num_arms": [0, 0, 2],
...                        "num_legs": [4, 4, 2]},
...                       index=["dog", "cat", "monkey"])
>>> df
        num_arms  num_legs
dog            0         4
cat            0         4
monkey         2         2

Then, when an operation is performed, it's easy to guess what it does, or to double-check a guess you already have:

>>> df + 2
        num_arms  num_legs
dog            2         6
cat            2         6
monkey         4         4

We already use this data in some examples: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename_axis.html

While this works well in some places, we found this dataset insufficient to show all pandas functionality. And while we initially wanted to standardize the data used in the examples, to make things easier for recurring users, we eventually forgot about it.

But while it's surely not simple, I think it'd be ideal if we could find a very small number of datasets that can be used in all pandas examples. The ones I think we surely need are:

A simple example like the one proposed
One with a MultiIndex (probably on both axes)
A timeseries dataset

If we're able to find the ones we need, I think it'd also be great if we could have something like:

>>> import pandas
>>> animals = pandas.sample_data('animals')
>>> animals
        num_arms  num_legs
dog            0         4
cat            0         4
monkey         2         2

That should make the examples much simpler, and let them go straight to the point they are trying to show. See for example how, in the MultiIndex example here, creating the DataFrame distracts from the operation shown: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
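`pandas.sample_data` does not exist in pandas; it is only the proposal above. A minimal sketch of the idea as a standalone function, covering the three kinds of dataset listed earlier, could look like this. The function name, the dataset names, and the values in the MultiIndex and timeseries datasets are hypothetical placeholders, not a decided design.

```python
import pandas as pd


def _animals():
    """The simple dataset from the example above."""
    return pd.DataFrame(
        {"num_arms": [0, 0, 2], "num_legs": [4, 4, 2]},
        index=["dog", "cat", "monkey"],
    )


def _animals_multiindex():
    """Same idea, with a MultiIndex on both the rows and the columns."""
    index = pd.MultiIndex.from_tuples(
        [("mammal", "dog"), ("mammal", "cat"), ("bird", "parrot")],
        names=["class", "animal"],
    )
    columns = pd.MultiIndex.from_tuples(
        [("limbs", "legs"), ("limbs", "wings"), ("head", "eyes")],
        names=["part", "feature"],
    )
    data = [[4, 0, 2], [4, 0, 2], [2, 2, 2]]
    return pd.DataFrame(data, index=index, columns=columns)


def _timeseries():
    """Five consecutive days with easy-to-follow placeholder values."""
    index = pd.date_range("2019-01-01", periods=5, freq="D", name="date")
    return pd.DataFrame({"value": [1, 2, 3, 4, 5]}, index=index)


_DATASETS = {
    "animals": _animals,
    "animals_multiindex": _animals_multiindex,
    "timeseries": _timeseries,
}


def sample_data(name):
    """Return a new DataFrame for the named example dataset (hypothetical helper)."""
    try:
        builder = _DATASETS[name]
    except KeyError:
        raise ValueError(
            f"Unknown dataset {name!r}, available: {sorted(_DATASETS)}"
        ) from None
    # Build a fresh object on every call, so one example can't modify
    # the data another example sees.
    return builder()


animals = sample_data("animals")
print(animals + 2)
```

Building each dataset from a function, rather than storing module-level DataFrames, keeps examples independent of each other, which matters for docstrings that show in-place operations.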

@python-sprints/pandas-mentoring thoughts? Ideas on datasets?

Fix docstrings where the summary does not end with a period.

In pandas we try to make sure that all the documentation pages follow the same conventions. For example, we expect all the summaries to finish with a period.

Here is a list of docstrings that don't follow this convention:

pandas.DataFrame.keys: Summary does not end with a period
pandas.read_clipboard: Summary does not end with a period
pandas.ExcelFile.parse: Summary does not end with a period
pandas.HDFStore.put: Summary does not end with a period
pandas.HDFStore.get: Summary does not end with a period
pandas.HDFStore.select: Summary does not end with a period
pandas.HDFStore.keys: Summary does not end with a period
pandas.HDFStore.groups: Summary does not end with a period
pandas.HDFStore.walk: Summary does not end with a period
pandas.io.stata.StataReader.data: Summary does not end with a period
pandas.io.formats.style.Styler.loader: Summary does not end with a period
pandas.io.formats.style.Styler.set_caption: Summary does not end with a period
pandas.plotting.deregister_matplotlib_converters: Summary does not end with a period
pandas.plotting.register_matplotlib_converters: Summary does not end with a period

You can see in the first one how the first sentence should finish with a period, but doesn't: https://dev.pandas.io/reference/api/pandas.DataFrame.keys.html

Please tag me @datapythonista when opening the PR in pandas. Thanks!

[Funding Options][RGSOC][Future Ref.]

As per the discussion [1] on PR #86, this issue keeps track of the revamp of RGSoC [2], so that it can be added to the list of funding opportunities if needed.

[1]: #86 (comment)
[2]: https://railsgirlssummerofcode.org/blog/2019-03-21-the-future-of-rgsoc

Creating a Contributing.md

Hi! I was wondering if it might be helpful to create a Markdown file detailing how to contribute to this repository. Maybe this is too meta (a contributing guideline for a repository whose work will end up changing the contributing guideline of pandas), but I've seen a lot of great advice from Marc and others on how to create better issues or PRs and how to style them, and thought it would be nice to have this advice written somewhere.

Write simple installation instructions for pandas

pandas has a fairly complete documentation page with installation instructions:

https://pandas.pydata.org/pandas-docs/stable/install.html

But personally, I don't want users who come to pandas for the first time (and possibly to the whole PyData ecosystem) to find this document and have to use it to get started. I think it makes much more sense to provide a short document describing how we think they should set everything up, followed by a link to the existing document if they are interested in advanced installation instructions.

What I think we should provide is a document that explains step by step:

How to get Anaconda
How to set up an environment with pandas
How to import and possibly show the pandas version
Link to the tutorials (we are working on them) and to the advanced installation instructions if they want more options
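The "import and show the version" step could be as short as the snippet below; `pd` is the conventional alias. `pd.show_versions()` is optional and prints a longer report that also covers Python and the installed dependencies, which is mostly useful for troubleshooting.

```python
# Import pandas under its conventional alias and print the installed version.
import pandas as pd

print(pd.__version__)

# Optional, for troubleshooting: uncomment to print pandas, Python and
# dependency versions in one report.
# pd.show_versions()
```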

I'm unsure whether the instructions should show how to get the environment directly in JupyterLab, show it first in a Python terminal and then in JupyterLab, or ignore JupyterLab and just use the Python terminal.

I think it may make sense to have screenshots and make the installation instructions as easy and visual as possible.

For now they should go into our /doc/ directory, and when we're happy with them, we'll open the PR in pandas.

Update NumFOCUS language in the pandas home page

NumFOCUS (the non-profit that supports pandas) is asking all projects to standardize the language they use about NumFOCUS. In pandas, we mention NumFOCUS at the beginning of the home page, pandas.pydata.org.

The requested text is:

[Project Name] is a Sponsored Project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States. NumFOCUS provides [Project Name] with fiscal, legal, and administrative support to help ensure the health and sustainability of the project. Visit numfocus.org for more information.

Donations to [Project Name] are managed by NumFOCUS. For donors in the United States, your gift is tax-deductible to the extent provided by law. As with any donation, you should consult with your tax adviser about your particular tax situation.

Note that pandas is written in lowercase, even at the beginning of a sentence.

The website is in a separate repo: https://github.com/pandas-dev/pandas-website

Validate that master branch is not used in the CI

One "mistake" that most people made in their first PR was to make the changes in the master branch. This shouldn't be done, because it leaves the local repository in a state in which no branches for new features can be created.

In #32 we'll soon have a CI system implemented. Can we add a check to the CI that fails when a PR is opened from the master branch?
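A minimal sketch of such a check as a Python script, assuming the CI from #32 is GitHub Actions, which sets the `GITHUB_HEAD_REF` environment variable to the source branch of a pull request. Other CI services expose that branch under different variable names, so this would need adapting. `is_master_branch` and `main` are hypothetical names.

```python
import os
import sys


def is_master_branch(ref):
    """Return True if a branch name (or a full refs/heads/... ref) is master."""
    prefix = "refs/heads/"
    if ref.startswith(prefix):
        ref = ref[len(prefix):]
    return ref == "master"


def main():
    # Assumption: GitHub Actions, where GITHUB_HEAD_REF is only set for
    # pull request builds.
    head = os.environ.get("GITHUB_HEAD_REF", "")
    if not head:
        print("Not a pull request build, skipping the branch check.")
        return 0
    if is_master_branch(head):
        print("This PR was opened from the master branch. Please create a "
              "new branch for your changes and open the PR from it.")
        return 1
    return 0


if __name__ == "__main__":
    # Exit explicitly only on failure; a zero status ends the script normally.
    status = main()
    if status:
        sys.exit(status)
```

Because only the branch name is checked, a PR opened from the master branch of a contributor's fork fails too, which is the case the issue is about.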

Remove alias in pandas docs for numpy.random.randn

See the discussion here: pandas-dev/pandas#28038 (comment)

In pandas we have a header variable that we inject at the top of all documentation pages. One of the things in it is randn = np.random.randn, which is a very bad idea that only serves to confuse users.

With a quick grep I could only find a single case where it is used:

whatsnew/v0.10.0.rst:  In [58]: p4d = Panel4D(randn(2, 2, 5, 4),

We should replace that case to use np.random.randn explicitly, remove that line from the header in doc/source/conf.py, and check in the doc build that nothing fails after that change (this can be done locally or in the CI).
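The grep can be sketched in Python so it is easy to rerun after the change. This assumes the documentation sources are the `.rst` files under `doc/source` and only looks for calls written as `randn(`; `find_bare_randn` and `scan` are hypothetical helpers, and the doc build remains the real test.

```python
import re
from pathlib import Path

# Matches "randn(" unless it is preceded by a dot or an identifier character,
# so np.random.randn(...) and numpy.random.randn(...) are not reported.
BARE_RANDN = re.compile(r"(?<![\w.])randn\(")


def find_bare_randn(text):
    """Return the 1-based line numbers of text that call randn without a prefix."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BARE_RANDN.search(line)
    ]


def scan(doc_source="doc/source"):
    """Print path:line for every use of the bare alias in the .rst files."""
    for path in sorted(Path(doc_source).rglob("*.rst")):
        for number in find_bare_randn(path.read_text(encoding="utf-8")):
            print(f"{path}:{number}")
```

Calling `scan()` from the root of the pandas repository should list the whatsnew/v0.10.0.rst line quoted above, and nothing once it has been changed to np.random.randn.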

Merge doc makefiles into a Python script

In the doc/ directory we have two different scripts created by Sphinx: Makefile and make.bat. Sphinx creates both because a Makefile only works on Unix systems (mainly Linux or Mac), so a separate make.bat file is provided to build the docs on Windows.

But instead of having two separate scripts, it seems more efficient to have a single script in a language that works on any platform, and Python looks like a good option for that. Can we replace those files with a Python file that builds the documentation and provides the same functionality?
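A minimal sketch of such a script, assuming the generated Makefile uses the standard Sphinx "make mode" (`sphinx-build -M <target> <sourcedir> <builddir>`) with `source` and `build` directories; the actual directory names and any custom targets in the existing files would need to be checked. Running Sphinx as `python -m sphinx` uses the same interpreter on every platform. `sphinx_command` and `main` are hypothetical names, and defaulting to `html` is an assumption (the Sphinx Makefile prints help by default).

```python
"""Build the Sphinx documentation on any platform."""
import argparse
import subprocess
import sys

SOURCE_DIR = "source"  # assumption: same layout as the generated Makefile
BUILD_DIR = "build"


def sphinx_command(target, source_dir=SOURCE_DIR, build_dir=BUILD_DIR):
    """Return the command equivalent to `make <target>` in a Sphinx Makefile."""
    return [sys.executable, "-m", "sphinx", "-M", target, source_dir, build_dir]


def main(argv=None):
    """Parse the target from the command line and run Sphinx with it."""
    parser = argparse.ArgumentParser(description="Build the documentation.")
    parser.add_argument(
        "target",
        nargs="?",
        default="html",
        help="Sphinx builder to run, e.g. html, latexpdf or clean",
    )
    args = parser.parse_args(argv)
    # Return Sphinx's exit status so failures propagate to the caller or CI.
    return subprocess.call(sphinx_command(args.target))
```

Saved as a hypothetical doc/make.py, with `sys.exit(main())` under an `if __name__ == "__main__":` guard, `python make.py html` would then do the same on Linux, Mac and Windows as `make html` and `make.bat html`.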

pandas issue: typo in roadmap

I just realized that the recently created pandas roadmap has a typo. In the section https://dev.pandas.io/development/roadmap.html#decoupling-of-indexing-and-internals the word "uses" is duplicated.

If anybody here wants to fix that, you should already know how to open a PR not using master ;)

Please add a comment in this issue if you're taking it, and if someone else already has, don't work on the same thing; that's not nice (open source is about working as a team, and there is no competition at all, not even among different projects doing the same thing). Also, make sure there isn't already an open PR in pandas for this.

The file to edit should be doc/source/development/roadmap.md (didn't check in detail).
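Before opening the PR, one quick way to look for other repeated words in the same file is a regular-expression check like the one below. It is a hypothetical helper, not part of pandas' tooling, and it also flags legitimate repeats such as "that that", so every hit needs a human look.

```python
import re

# A word, some whitespace (possibly a line break), then the same word again.
DUPLICATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_duplicated_words(text):
    """Return (line_number, word) pairs for every repeated word in text."""
    results = []
    for match in DUPLICATED_WORD.finditer(text):
        # Line of the first occurrence, counting from 1.
        line_number = text.count("\n", 0, match.start()) + 1
        results.append((line_number, match.group(1)))
    return results
```

For example, `find_duplicated_words(open("doc/source/development/roadmap.md").read())` would list candidate repeats in the roadmap, assuming that is the right file.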

Not able to join Gitter? Can we use Slack for better coordination?

Hey all, since many people (including myself) are having issues joining the Gitter channel for our discussions, I'd like to propose a solution: can we choose Slack as a better way of discussing? From my side, that would be much easier.
