pandas-mentoring's Introduction

Python sprints website

This is the website of the Python sprints group.

It was started by the London Python Sprints meetup, but it is open to any other Python User Group (PUG) interested in running sprints.

Website setup

The website is built using Jekyll, a static website generator written in Ruby (yes, Ruby!) and supported by GitHub Pages. To build the website locally you need to:

  • Install ruby and ruby-dev
  • Install Jekyll and Bundler with gem install jekyll bundler.
  • Install dependencies with bundle install in the project directory.
  • Run the server with bundle exec jekyll serve.
  • Open the rendered website at http://localhost:4000/

How to add your chapter

Send a pull request adding a new file _chapters/<your-chapter-name>.md, where <your-chapter-name> is the name of your chapter in lowercase ASCII, with words separated by underscores (e.g. london_python_sprints).
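A minimal sketch of how a chapter name could be turned into that file name. It assumes accented letters should be reduced to their plain ASCII base and that digits are allowed; `chapter_filename` is a hypothetical helper, not part of the site:

```python
import re
import unicodedata


def chapter_filename(chapter_name: str) -> str:
    """Turn a chapter name into a _chapters/<name>.md path (hypothetical helper)."""
    # Decompose accented letters ("ã" -> "a" + tilde) and drop what isn't ASCII
    ascii_name = (
        unicodedata.normalize("NFKD", chapter_name)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    # Any run of characters other than a-z/0-9 acts as a word separator
    words = re.findall(r"[a-z0-9]+", ascii_name)
    return "_chapters/" + "_".join(words) + ".md"


print(chapter_filename("London Python Sprints"))
# _chapters/london_python_sprints.md
```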

The content of the file has a header section with some fields (started and finished with ---), followed by the main description of the chapter. The files are Markdown, so the description can contain either Markdown formatting or plain HTML; it is up to you how to style your content.

The image that will represent your chapter should be a JPEG. The layout is optimized for an image size of 1920 x 600 px. Please name it <your_chapter_name>_1920x600px.jpg.

This is the format of the file:

---
category: "the-city-where-your-chapter-is-located-in"
title: "Name of your chapter"
meetup_link: <url-of-your-other-website-if-any>
address: "Description of your location, usually City, Country"
country_code: <2-letter-country-code-used-for-the-flag>
image: <relative path to the chapter image>
lat: <float-number-with-latitude-for-the-marker-in-the-map>
lng: <float-number-with-longitude-for-the-marker-in-the-map>
sponsors:
    - <first-sponsor-id-to-be-listed-in-your-page>
    - <second-sponsor-id-to-be-listed-in-your-page>
    - <feel-free-to-add-as-many-as-you-want>
---

Here you can add any information about your group: how it started, what its goals are,
and what people can expect from it.

This is a good time to remind you that no sort of discrimination (gender, religion,
age, sexual orientation...) is tolerated in the Python sprints (or in the Python
community in general). As an organizer, you must not allow any sort of harassment
at the events. Feel free to add to your chapter description any policies you have
to make sure the events are as diverse and welcoming as possible.

Example chapter setup

---
category: "london"
title: "London Python Sprints"
meetup_link: https://www.meetup.com/Python-Sprints/
address: London, United Kingdom
country_code: gb
image: static/images/chapters/london_python_sprints_1920x600px.jpg
lat: 51.512344
lng: -0.090985
sponsors:
  - harvey_nash
  - touch_surgery
  - bloomberg
---
The content of your chapter's description goes below the closing ---.
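Before opening the pull request, a quick local check can catch a missing header field. The sketch below is a hypothetical helper, not something the site ships: it assumes the header is delimited by the first two --- lines and only scans top-level keys (so it is not a real YAML parser), and it treats every field from the template as expected, even though some of them (like meetup_link) may be optional:

```python
from typing import Set, Tuple

# Fields listed in the chapter template; which ones the site really
# requires is an assumption, so this only reports gaps
CHAPTER_FIELDS = {
    "category", "title", "meetup_link", "address", "country_code",
    "image", "lat", "lng", "sponsors",
}


def split_front_matter(text: str) -> Tuple[str, str]:
    """Split a Jekyll-style file into (front matter, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("the file must start with a --- line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("no closing --- line found")


def top_level_keys(front_matter: str) -> Set[str]:
    """Return top-level keys; a simplified scan, not a real YAML parser."""
    keys = set()
    for line in front_matter.splitlines():
        # Indented lines (e.g. "  - harvey_nash" under sponsors) are skipped
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_chapter_fields(text: str) -> Set[str]:
    """Template fields that do not appear in the file's header."""
    front_matter, _body = split_front_matter(text)
    return CHAPTER_FIELDS - top_level_keys(front_matter)


example = """---
category: "london"
title: "London Python Sprints"
meetup_link: https://www.meetup.com/Python-Sprints/
address: London, United Kingdom
country_code: gb
image: static/images/chapters/london_python_sprints_1920x600px.jpg
lat: 51.512344
lng: -0.090985
sponsors:
  - harvey_nash
---
The content of your chapter's description goes here.
"""

print(missing_chapter_fields(example))  # set()
```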

How to add an event

Before adding an event, you need to have your chapter set up. See the section "How to add your chapter" for more information on how to do it.

You can also add a sponsor, to give credit to the companies/institutions supporting your group with a venue, pizzas, beers... And if you are sprinting on a project that is not in the system yet, you may also want to add it.

To create an event, add a new file to the _posts directory with a file name following the format YYYY-MM-DD-slug-of-the-event.md, where YYYY-MM-DD is the date of the event (this will be the date shown on the site), and the slug of the event is its title in a URL-friendly format (only lowercase ASCII letters and numbers, with words separated by hyphens). For example, 2017-12-31-django-bugfixing.md could be the name of a sprint "Django bugfixing" happening on December 31, 2017.
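A minimal sketch of building and checking such a file name. It assumes the slug keeps only lowercase ASCII letters and digits, with accented letters reduced to their ASCII base; `event_filename` and `POST_NAME` are hypothetical names used for illustration:

```python
import datetime
import re
import unicodedata

# File names like 2017-12-31-django-bugfixing.md
POST_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def event_filename(date: datetime.date, title: str) -> str:
    """Build a _posts/YYYY-MM-DD-slug.md path for an event."""
    # Reduce accented letters to ASCII, then keep lowercase letters and digits
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    slug = "-".join(re.findall(r"[a-z0-9]+", ascii_title))
    # date.isoformat() gives the YYYY-MM-DD part
    return f"_posts/{date.isoformat()}-{slug}.md"


path = event_filename(datetime.date(2017, 12, 31), "Django bugfixing")
print(path)  # _posts/2017-12-31-django-bugfixing.md
print(bool(POST_NAME.match(path.split("/")[-1])))  # True
```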

The content of the file has a header section with some fields (started and finished with ---), and the main description of the event afterwards. This is the format:

---
category: "<same-as-category-of-your-chapter>"
title: "Short summary of your event"
level: "Target audience of the event (e.g. Beginners, All levels, Advanced,...)"
time: "hh:mm"
rsvp_link: <url-of-your-meetup-eventbrite-etc-page>
project: <id-of-the-project-you-will-work-on>
sponsor: <id-of-the-sponsor-for-the-event>
---

This space is the main description of the event, where you can provide further details.

Note that you don't need to add information about the project if it already exists in the `_projects` folder, as the description of the
project, the logo, and the environment setup instructions will be rendered automatically
once you specify the id of the project. If the project has not been added yet, you will need to add it before referencing it here. If the event is not related to a specific project, you can leave the `project` field blank.

Similarly, when you specify the id of the sponsor, a box with the sponsor's information will appear.
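A rough sketch of checking that the ids used in an event exist before pushing it. It assumes ids are matched against the obj_id field of the files in _projects and _sponsors, and that it runs from the repository root; `read_obj_ids` and `unknown_references` are hypothetical helpers, and the scan is a simplified line match rather than YAML parsing:

```python
from pathlib import Path
from typing import List, Optional, Set


def read_obj_ids(folder: Path) -> Set[str]:
    """Collect the obj_id of every .md file in a collection folder."""
    ids = set()
    for path in folder.glob("*.md"):
        for line in path.read_text(encoding="utf-8").splitlines():
            # Simplified: take the first top-level "obj_id:" line as the id
            if line.startswith("obj_id:"):
                ids.add(line.split(":", 1)[1].strip().strip('"'))
                break
    return ids


def unknown_references(site_root: Path, project: Optional[str],
                       sponsor: Optional[str]) -> List[str]:
    """List the project/sponsor ids of an event that no file defines."""
    problems = []
    # An empty project is allowed for events not tied to a specific project
    if project and project not in read_obj_ids(site_root / "_projects"):
        problems.append(f"project {project!r} not found in _projects/")
    if sponsor and sponsor not in read_obj_ids(site_root / "_sponsors"):
        problems.append(f"sponsor {sponsor!r} not found in _sponsors/")
    return problems
```

For the example event below, `unknown_references(Path("."), "pandas", "harvey_nash")` should return an empty list as long as both ids are defined in the repository.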

Example event setup

---
category: "london"
title: "Pandas internals"
level: "All levels"
time: "18:30"
rsvp_link: https://www.meetup.com/Python-Sprints/events/249350212/
project: pandas
sponsor: harvey_nash
---
The content of your event's description goes below the closing ---.

You may want to copy one of the last events in _posts to be used as reference.

How to add a sponsor

If your local sponsor is not already in the _sponsors folder, you can easily add it.
To add a sponsor logo, copy it to the static/images/sponsors folder with a name matching the obj_id of your sponsor. Sponsor logos are PNG files. The maximum size is 258 x 82 px, so please scale your logo down until it matches one of those dimensions. Logos are displayed 150 px wide, so a logo narrower than that will be stretched to that width.
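The scaling rule above can be computed as in this sketch, which assumes the aspect ratio should be preserved and that logos already within the limits are left as they are (the site stretches narrow ones anyway); `fit_logo` is a hypothetical helper:

```python
from typing import Tuple


def fit_logo(width: int, height: int,
             max_width: int = 258, max_height: int = 82) -> Tuple[int, int]:
    """Return the (width, height) to scale a logo to so it fits the limits."""
    # The smaller ratio is the one that makes the logo hit a limit first;
    # capping at 1.0 means logos that already fit are left unchanged
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


print(fit_logo(1000, 200))  # (258, 52)
```

For example, a 1000 x 200 px logo hits the width limit first, so it would be scaled to 258 x 52 px.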

Create a <name_of_your_sponsor>.md file in the _sponsors folder using the format below:

---
obj_id: <unique_identifier_of_your_sponsor>
name: "name of your sponsor"
logo: <relative path to your sponsor logo>
link: <website link to your sponsor>
address: "sponsor's full address"
lat: <float-number-with-latitude-for-the-marker-in-the-map>
lng: <float-number-with-longitude-for-the-marker-in-the-map>
---
Here you can place a short description of your sponsor's business etc.

Example sponsor setup

---
obj_id: quantum_black
name: "Quantum Black"
logo: static/images/sponsors/quantum_black.png
link: https://www.quantumblack.com/
address: "Kinnaird House, 1 Pall Mall<br/>London, SW1Y 5AU, UK"
lat: 51.507954
lng: -0.130718
---
QuantumBlack is an advanced analytics firm operating at the intersection of strategy, technology & design to improve performance outcomes for organisations. With roots in Formula One, we now work across sectors with some of the world's leading organisations in advanced industries, healthcare and finance.

How to add a project

If the project you are going to work on is not already in the _projects folder, you can easily add it.
To add a project logo, copy it to the static/images/projects folder. Project logos are PNG files.

Create a <name_of_your_project>.md file in the _projects folder using the format below:

---
obj_id: <unique_identifier_of_your_project>
name: "name of your project"
logo: <relative path to your project logo>
website: <website link to your project>
setup_html: |
    <p>
        <!-- (link to) instructions on how to set up the project, in HTML format -->
    </p>
---
Here you can place a short description of your project.

Example project setup

---
obj_id: pandas
name: "Pandas"
logo: static/images/projects/pandas_logo_donation.png
website: https://pandas.pydata.org/
setup_html: |
    <p>
        Please follow the instructions in this link:
        <a href="https://python-sprints.github.io/pandas/guide/index.html">
            https://python-sprints.github.io/pandas/guide/index.html
        </a>
    </p>
---
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

How does Jekyll work?

  • Posts (files in _posts/*.md) are the event pages in markdown
  • Projects (_projects/*.md) are the open source projects we contribute to
  • Sponsors (_sponsors/*.md) are the companies providing the venue and pizzas
  • Layouts (files in _layouts/*.html) are similar to Django templates, and are used to render the posts (events)
  • CSS is managed with scss/sass and built by Jekyll
  • GitHub Pages automatically builds the site when the repository is updated

pandas-mentoring's People

Contributors

a-sal, asvithajanani, aya-s, ayowolet, bhagyac, bhavaniravi, bhuvanakundumani, computergeeknerd, datapythonista, dorothykiz1, dujm, eloisaelias, galuhsahid, martinagvilas, mkhalusova, montjoile, sara-02, shilpavijay, smiaa, sparalic, tanyaacjain, wuraolaoyewusi, yomaokobiah


pandas-mentoring's Issues

Fix tseries.offsets docstrings with missing period in the summary (set 2)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.QuarterOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BQuarterEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BQuarterBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.QuarterEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.QuarterBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.YearOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BYearEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BYearBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.YearEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.YearBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.FY5253.normalize: Summary does not end with a period
pandas.tseries.offsets.FY5253Quarter.normalize: Summary does not end with a period
pandas.tseries.offsets.Easter.normalize: Summary does not end with a period
pandas.tseries.offsets.Tick.normalize: Summary does not end with a period

Error accessing Gitter channel

After signing up to Gitter using GitHub and granting access to the repo, I get this error:
"404. This is not the chat you're looking for. Sorry. :("

Copy alternative pandas setup instructions

In #41, we added the official pandas documentation for contributing.

A bit more than a year ago, we organized a worldwide pandas sprint to improve the documentation, and since the pandas official documentation wasn't great for beginners, we built an alternative page on how to set up the pandas development environment.

This documentation should be used, or at least considered, when improving the official pandas contributing pages. For now, this issue is only about moving the document to this repo (inside /doc/source/, but probably separate from the development one, so things don't get mixed).

This is the document: https://raw.githubusercontent.com/python-sprints/python-sprints.github.io/master/pandas/guide/_sources/pandas_setup.rst.txt?_sm_au_=iVVfbnLZnRjK7PVH (the html version is available here)

Add link to the pandas code of conduct in the README file

The pandas README file provides a short section about contributing to the project: https://github.com/pandas-dev/pandas#contributing-to-pandas-

While pandas has a strict code of conduct to make sure no sort of harassment or discrimination happens in the project development, there is no reference or link to it in that section. The code of conduct is in: https://github.com/pandas-dev/pandas/blob/master/.github/CODE_OF_CONDUCT.md

I think it's worth adding a reference and a link to the code of conduct in that section.

Please make sure to comment in this issue if you want to work on this, and make sure that you don't work on this if someone else already started. Also, please tag me when you open the PR in pandas.

Update to_parquet/pyarrow tests

Setting up the latest (master branch) version of arrow/pyarrow locally can be a bit tricky for this issue, but other than that it should be easy:

pandas-dev/pandas#27955

Please add a comment to the original issue (also here) to claim it if you plan to work on it.

New citing and artwork section in the pandas website

In the new pandas website, I'd like to have a section for resources to cite pandas. To be specific, I want people to be able to cite pandas in scientific papers, and also to use the pandas logo correctly.

Scikit-learn has something like that in their about us section:
https://scikit-learn.org/stable/about.html#citing-scikit-learn

In the current website we've got two papers in the talks section:
https://pandas.pydata.org/talks.html I'm not sure which one would be preferred for citing pandas (I will check with Wes later to see if he has a preference).

For the logo, I expect to have different versions (for white background, for black background, with text, without...). Besides providing the SVG files for them, it would be nice to provide some guidelines on how we expect the logo to be used (leaving a certain margin, not distorting the proportions, not placing text on top...). To give an idea, see the brand guidelines from JetBrains: https://www.jetbrains.com/company/brand/ (most big companies have one, so it should be easy to find many other examples).

While we don't have the new logo yet, it would be useful to have a first version of this page while building the website (it should be very easy to update the logo later and get the final version).

For now the new website should live in this repo, in the /doc/ directory.

DataFrame.to_parquet fails with index parameter

This issue should be created in the pandas repo, but I'm creating it here so that nobody outside this group claims it.

Please consider this before claiming this issue:

  • This is somewhat advanced: besides requiring decent Python/pandas skills, it may take a significant amount of time, so please don't claim it if you won't have that time in the next few days
  • If you already made your first PR to pandas, please give priority to someone who hasn't (you're always welcome to look for "Good first issues" in pandas, and hopefully I can soon create enough issues for everyone who wants to work on them).
  • Feel free to research this issue and discuss it... even if you don't claim it, but it's better to let someone new open the PR and follow the process end to end

See the following example:

>>> import pandas
>>> pandas.DataFrame({'foo': [1, 2], 'bar': [3, 4]}).to_parquet('test.parquet', engine='pyarrow', index=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3a1f9432d201> in <module>()
      1 import pandas
----> 2 pandas.DataFrame({'foo': [1, 2], 'bar': [3, 4]}).to_parquet('test.parquet', engine='pyarrow', index=False)

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1943         from pandas.io.parquet import to_parquet
   1944         to_parquet(self, fname, engine,
-> 1945                    compression=compression, **kwargs)
   1946 
   1947     @Substitution(header='Write out the column names. If a list of strings '

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    255     """
    256     impl = get_engine(engine)
--> 257     return impl.write(df, path, compression=compression, **kwargs)
    258 
    259 

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pandas/io/parquet.py in write(self, df, path, compression, coerce_timestamps, **kwargs)
    119             self.api.parquet.write_table(
    120                 table, path, compression=compression,
--> 121                 coerce_timestamps=coerce_timestamps, **kwargs)
    122 
    123     def read(self, path, columns=None, **kwargs):

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor, **kwargs)
   1090                 compression=compression,
   1091                 use_deprecated_int96_timestamps=use_int96,
-> 1092                 **kwargs) as writer:
   1093             writer.write_table(table, row_group_size=row_group_size)
   1094     except Exception:

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/parquet.py in __init__(self, where, schema, flavor, version, use_dictionary, compression, use_deprecated_int96_timestamps, **options)
    309             use_dictionary=use_dictionary,
    310             use_deprecated_int96_timestamps=use_deprecated_int96_timestamps,
--> 311             **options)
    312         self.is_open = True
    313 

/opt/anaconda2/envs/markdowna/lib/python3.5/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()

TypeError: __cinit__() got an unexpected keyword argument 'index'

Based on the documentation, with index=False the DataFrame should be saved without its index. But instead, a pyarrow exception is raised.

Tasks to do:

  • Verify that in the latest (master branch) pandas the problem still exists, and the documentation is still incorrect
  • Check if pyarrow supports writing without the index (it should), and how
  • Use git blame to see who is the main contributor(s) to the to_parquet functionality
  • Open an issue in the pandas repo explaining in detail the problem, and propose your solution; please tag me @datapythonista and also the person who worked more on that part
  • Claim the issue immediately after creating it with a comment saying you're working on it
  • Wait for confirmation on the approach you're proposing, and when maintainers are happy with it, open a PR with the fix

Fix tseries.offsets docstrings with missing period in the summary (set 3)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.Day.normalize: Summary does not end with a period
pandas.tseries.offsets.Hour.normalize: Summary does not end with a period
pandas.tseries.offsets.Minute.normalize: Summary does not end with a period
pandas.tseries.offsets.Second.normalize: Summary does not end with a period
pandas.tseries.offsets.Milli.normalize: Summary does not end with a period
pandas.tseries.offsets.Micro.normalize: Summary does not end with a period
pandas.tseries.offsets.Nano.normalize: Summary does not end with a period
pandas.tseries.offsets.BDay.normalize: Summary does not end with a period
pandas.tseries.offsets.BMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CBMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.CBMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CDay.normalize: Summary does not end with a period
pandas.tseries.frequencies.to_offset: Summary does not end with a period

Fix DatetimeIndex and other docstring for missing period at the end of summary

Same as #132 but for the following docstrings:

pandas.IndexSlice: Summary does not end with a period
pandas.MultiIndex.names: Summary does not end with a period
pandas.MultiIndex.is_lexsorted: Summary does not end with a period
pandas.MultiIndex.reorder_levels: Summary does not end with a period
pandas.DatetimeIndex.snap: Summary does not end with a period
pandas.DatetimeIndex.to_perioddelta: Summary does not end with a period
pandas.DatetimeIndex.to_pydatetime: Summary does not end with a period
pandas.DatetimeIndex.to_series: Summary does not end with a period
pandas.TimedeltaIndex: Summary does not end with a period
pandas.PeriodIndex.is_leap_year: Summary does not end with a period
pandas.api.extensions.ExtensionArray._concat_same_type: Summary does not end with a period
pandas.api.extensions.ExtensionArray.dropna: Summary does not end with a period

Discuss PR review conventions

Discuss:

  • How many reviews and approvals, if any, do we want to require before a PR can be merged?
  • Should this be enforced through the GitHub repository settings?

Create conda dependencies file with sphinx

Once we start changing the pandas contribution documentation (#12) we'll have to generate the HTML version of it. Most Python projects do this with Sphinx. To make it easier to create a development environment for this, we can use conda and specify the dependencies in a file named environment.yml. We can base it on the pandas one, but for now we'll just add sphinx (the same version as pandas).

Implement header in Sphinx

As discussed in #51, we need a separate PR to implement the header. I think we probably need to determine what its value should be, though: whether we should stick to pandas' implementation or not (and if not, what the value of the header should be).

Copy the documentation about contributing to pandas to this repository

There is a directory doc/source/development in pandas that contains a few files with the documentation on how to contribute. That documentation is kind of all right and contains useful information, but it could easily be improved to make it easier for anyone to follow (including ourselves).

What we can do is to copy that documentation in this repository, improve it here in an iterative way, and then open a PR in pandas with all the updates.

I think it's probably good to keep these files in the same directory here as in pandas, doc/source/development.

I'd also save somewhere (in a file in the repo, or in a comment on this ticket) the commit of the pandas version the files were copied from, so we can later see if something changes in pandas while we make the changes here. The commit hash can be obtained simply with git log.

Make explicit in pandas docs the imports and the options

See pandas-dev/pandas#28038

Until now, there has been a hidden code block at the beginning of every documentation page with imports, random seeds and options. There is agreement to make that code explicit, so the users can reproduce exactly the code, and there is no "magic" going on.

Let's start by opening a PR to remove that header in this page: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

My preferred option is to remove the {{ header }} variable and keep the currentmodule in its place, but not the code block. Then, just add what's needed in the code blocks where things are first used. For example, add import pandas as pd at the beginning of the first code block. We probably only need the code with the random seed in the block where numpy.random is first used.

After the change, it would be good to diff the HTML version of the page before and after the change, see how much it changed, and add that diff to the PR description.
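A minimal sketch of that diff using Python's standard difflib module, assuming you saved a copy of the built io.html before the change and rebuilt the docs after it; `html_diff` and the file names are hypothetical:

```python
import difflib
from pathlib import Path


def html_diff(before: Path, after: Path) -> str:
    """Unified diff between two builds of the same HTML page."""
    old = before.read_text(encoding="utf-8").splitlines(keepends=True)
    new = after.read_text(encoding="utf-8").splitlines(keepends=True)
    # Lines starting with "-" were removed by the change, "+" were added
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(before), tofile=str(after))
    )
```

Calling `print(html_diff(Path("io_before.html"), Path("io_after.html")))` would print text that can be pasted into the PR description; an empty string means the rendered page did not change.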

Note that several people in the pandas team are likely to have opinions on how this change should be implemented. So be prepared for several reviews, and several iterations of proposed changes and questions. :)

Timestamp docstring with missing periods

Same as #132, but for some other docstrings.

pandas.Timestamp.resolution: Summary does not end with a period
pandas.Timestamp.tz: Summary does not end with a period
pandas.Timestamp.ceil: Summary does not end with a period
pandas.Timestamp.combine: Summary does not end with a period
pandas.Timestamp.floor: Summary does not end with a period
pandas.Timestamp.fromordinal: Summary does not end with a period
pandas.Timestamp.isoweekday: Summary does not end with a period
pandas.Timestamp.replace: Summary does not end with a period
pandas.Timestamp.round: Summary does not end with a period
pandas.Timestamp.weekday: Summary does not end with a period
pandas.Timedelta.ceil: Summary does not end with a period
pandas.Timedelta.floor: Summary does not end with a period
pandas.Timedelta.isoformat: Summary does not end with a period
pandas.Timedelta.round: Summary does not end with a period

Fix IntervalIndex docstring for missing period at the end of summary

Same as #132 but for IntervalIndex docstrings:

pandas.IntervalIndex.from_tuples: Summary does not end with a period
pandas.IntervalIndex.left: Summary does not end with a period
pandas.IntervalIndex.right: Summary does not end with a period
pandas.IntervalIndex.mid: Summary does not end with a period
pandas.IntervalIndex.closed: Summary does not end with a period
pandas.IntervalIndex.length: Summary does not end with a period
pandas.IntervalIndex.is_non_overlapping_monotonic: Summary does not end with a period
pandas.IntervalIndex.set_closed: Summary does not end with a period
pandas.IntervalIndex.to_tuples: Summary does not end with a period

Create notebook with histogram of number of wrong docstrings

In pandas there are many docstrings that have known errors, like parameters that are not documented, examples that do not run, formatting issues...

We have a script that can list all of them in a JSON file (you need a pandas development environment to run it, and it should be run on an updated master branch):

./scripts/validate_docstrings.py --format=json > pandas_docstring_errors.json

After generating the JSON file, we need a Jupyter notebook that loads that file with pandas and shows how many of each error need to be fixed. The resulting notebook can be added to a notebooks/ directory in this repo.
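A starting point for the notebook could be a small counting step like the sketch below. It assumes the JSON maps each docstring name to an object whose "errors" entry is a list of [code, message] pairs; that layout should be checked against the actual output of the script before relying on it. `count_errors` and `load_and_count` are hypothetical helpers:

```python
import json
from collections import Counter


def count_errors(results: dict) -> Counter:
    """Count the occurrences of each error code across all docstrings.

    Assumes results maps a docstring name to a dict whose "errors" entry
    is a list of [code, message] pairs.
    """
    counts = Counter()
    for info in results.values():
        for code, _message in info.get("errors", []):
            counts[code] += 1
    return counts


def load_and_count(path: str = "pandas_docstring_errors.json") -> Counter:
    """Read the file generated by validate_docstrings.py and count errors."""
    with open(path, encoding="utf-8") as f:
        return count_errors(json.load(f))
```

In the notebook, `pandas.Series(load_and_count()).sort_values().plot.barh()` would then draw one bar per error code.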

DISCUSSION: Propose topics for pandas tutorials

In the pandas documentation, we would like to add tutorials that cover end to end real use cases of pandas. This should make things very easy for first time users trying to address a specific problem with pandas.

Based on my personal experience, these are the kinds of problems I usually address:

  • Exploratory analysis of a dataset to answer specific questions
  • Build a pipeline transforming one or more data sources to generate an output (for example to train a machine learning model)
  • Forecasting of time series (for example stock market data)
  • Preprocessing of textual data (for most NLP problems there are surely better tools like nltk, but pandas can be more flexible for some cases)

I'm sure people are doing other cool things with pandas; it would be great to brainstorm and find more use cases that are worth having a tutorial.

Fix RangeIndex and other docstrings for missing period in summary

Same as #132 but for other docstrings:

pandas.Categorical.dtype: Summary does not end with a period
pandas.merge_ordered: Summary does not end with a period
pandas.bdate_range: Summary does not end with a period
pandas.period_range: Summary does not end with a period
pandas.timedelta_range: Summary does not end with a period
pandas.interval_range: Summary does not end with a period
pandas.util.hash_pandas_object: Summary does not end with a period
pandas.Grouper: Summary does not end with a period
pandas.Index.memory_usage: Summary does not end with a period
pandas.Index.fillna: Summary does not end with a period
pandas.Index.dropna: Summary does not end with a period
pandas.Int64Index: Summary does not end with a period
pandas.UInt64Index: Summary does not end with a period
pandas.Float64Index: Summary does not end with a period
pandas.RangeIndex.start: Summary does not end with a period
pandas.RangeIndex.stop: Summary does not end with a period
pandas.RangeIndex.step: Summary does not end with a period

Fix tseries.offsets docstrings with missing period in the summary (set 1)

Same as #132 but for the following docstrings:

pandas.tseries.offsets.DateOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessDay.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessHour.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessDay.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessHour.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.MonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.BusinessMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.CustomBusinessMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthOffset.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthEnd.normalize: Summary does not end with a period
pandas.tseries.offsets.SemiMonthBegin.normalize: Summary does not end with a period
pandas.tseries.offsets.Week.normalize: Summary does not end with a period
pandas.tseries.offsets.WeekOfMonth.normalize: Summary does not end with a period
pandas.tseries.offsets.LastWeekOfMonth.normalize: Summary does not end with a period

Fix IntervalArray docstrings with missing period in the summary

Same as #132, but for other docstrings:

pandas.arrays.IntervalArray.left: Summary does not end with a period
pandas.arrays.IntervalArray.right: Summary does not end with a period
pandas.arrays.IntervalArray.closed: Summary does not end with a period
pandas.arrays.IntervalArray.mid: Summary does not end with a period
pandas.arrays.IntervalArray.length: Summary does not end with a period
pandas.arrays.IntervalArray.is_non_overlapping_monotonic: Summary does not end with a period
pandas.arrays.IntervalArray.from_tuples: Summary does not end with a period
pandas.arrays.IntervalArray.set_closed: Summary does not end with a period
pandas.arrays.IntervalArray.to_tuples: Summary does not end with a period

DISCUSSION: Data for pandas examples

Very often, the examples in the pandas documentation create simple DataFrame objects, and many of them just use random data; see for example https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#id1

>>> import numpy
>>> import pandas
>>> df = pandas.DataFrame(numpy.random.randn(5, 2), columns=list('AB'))
>>> df
          A         B
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
3  0.119209 -1.044236
4 -0.861849 -2.104569

Then, if I want to show an operation, I can get something like:

>>> df + 2
          A         B
0  2.469112  1.717137
1  0.490941  0.864368
2  3.212112  1.826785
3  2.119209  0.955764
4  1.138151 -0.104569

In my opinion, this example is quite useless (other than for showing the syntax), because if you don't know what the operation does, the example doesn't help you understand it.

The best example I could find to overcome that (probably not great, but the best I could find) is:

>>> df = pandas.DataFrame({"num_legs": [4, 4, 2],
...                        "num_arms": [0, 0, 2]},
...                       ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2

Then, when performing an operation, it's easy to guess what it's doing, or to double-check a guess you already have:

>>> df + 2
        num_legs  num_arms
dog            6         2
cat            6         2
monkey         4         4

We are already using some of those in some examples: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename_axis.html

While this worked well in some places, we found this dataset quite insufficient to show all pandas functionality. And while we initially wanted to standardize the data used in the examples, so things would be easier for recurring users, we eventually forgot about it.

But while it's surely not simple, I think it'd be ideal if we could find a small number of datasets that can be used in all pandas examples. The ones I think we surely need are:

  • A simple example like the one proposed
  • One with a MultiIndex (probably on both axes)
  • A timeseries dataset

If we're able to find the ones we need, I think it'd also be great if we could have something like:

>>> import pandas
>>> animals = pandas.sample_data('animals')
>>> animals
        num_arms  num_legs
dog            0         4
cat            0         4
monkey         2         2

That would make the examples much simpler and let them go straight to the point they are trying to make. See, for example, how creating the MultiIndex DataFrame distracts from the operation shown here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
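A minimal sketch of how the proposed helper could work, assuming a simple registry that maps each dataset name to a function that builds it. `sample_data` and the registry are hypothetical (nothing like this exists in pandas), and only the `animals` dataset from above is registered; the MultiIndex and timeseries datasets would be added the same way. Building a new DataFrame on every call means an example that modifies its data can't affect the next one.

```python
import pandas


def _animals():
    """Build the small 'animals' DataFrame used in the examples above."""
    return pandas.DataFrame({"num_legs": [4, 4, 2],
                             "num_arms": [0, 0, 2]},
                            index=["dog", "cat", "monkey"])


# Hypothetical registry: dataset name -> function that builds a new copy.
# The MultiIndex and timeseries datasets would be registered here too.
_SAMPLE_DATASETS = {
    "animals": _animals,
}


def sample_data(name):
    """Return a freshly built example dataset (hypothetical, not a pandas API)."""
    try:
        build = _SAMPLE_DATASETS[name]
    except KeyError:
        available = ", ".join(sorted(_SAMPLE_DATASETS))
        raise ValueError(
            f"Unknown sample dataset {name!r}; available: {available}"
        ) from None
    return build()
```

An example would then only need `animals = sample_data("animals")` followed by the operation being documented, such as `animals + 2`.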

@python-sprints/pandas-mentoring thoughts? Ideas on datasets?

Fix docstrings where the summary does not end with a period.

In pandas we try to make sure that all the documentation pages follow the same conventions. For example, we expect all the summaries to finish with a period.

Here is a list of docstrings that don't follow this convention:

pandas.DataFrame.keys: Summary does not end with a period
pandas.read_clipboard: Summary does not end with a period
pandas.ExcelFile.parse: Summary does not end with a period
pandas.HDFStore.put: Summary does not end with a period
pandas.HDFStore.get: Summary does not end with a period
pandas.HDFStore.select: Summary does not end with a period
pandas.HDFStore.keys: Summary does not end with a period
pandas.HDFStore.groups: Summary does not end with a period
pandas.HDFStore.walk: Summary does not end with a period
pandas.io.stata.StataReader.data: Summary does not end with a period
pandas.io.formats.style.Styler.loader: Summary does not end with a period
pandas.io.formats.style.Styler.set_caption: Summary does not end with a period
pandas.plotting.deregister_matplotlib_converters: Summary does not end with a period
pandas.plotting.register_matplotlib_converters: Summary does not end with a period

Taking the first one as an example, you can see that the first sentence should end with a period, and it doesn't: https://dev.pandas.io/reference/api/pandas.DataFrame.keys.html
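To check a fix before opening the PR, the convention can be tested with a few lines of Python. This is only a sketch that assumes the summary is the first paragraph of the docstring (everything before the first blank line); pandas' own docstring validation script is much more complete and is the one to rely on. The function names below are illustrative.

```python
import inspect


def summary_ends_with_period(obj):
    """Return True if the summary of obj's docstring ends with a period.

    The summary is taken to be the first paragraph of the docstring,
    i.e. everything before the first blank line.
    """
    doc = inspect.getdoc(obj) or ""
    first_paragraph = doc.split("\n\n", 1)[0]
    # Join a summary that wraps over several lines into a single string.
    summary = " ".join(first_paragraph.split())
    return summary.endswith(".")


def labels_before_fix():
    """
    Return the column labels

    Longer description of what the method does.
    """


def labels_after_fix():
    """
    Return the column labels.

    Longer description of what the method does.
    """
```

The same function can be called on a real object, e.g. `summary_ends_with_period(pandas.DataFrame.keys)`, to confirm the fix once the docstring is updated.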

Please tag me @datapythonista when opening the PR in pandas. Thanks!

Creating a Contributing.md

Hi! I was wondering if it might be helpful to create a markdown file detailing how to contribute to this repository. Maybe this is too meta (a contributing guide for a repository whose work will end up changing the pandas contributing guide), but I've seen lots of great advice from Marc and others on how to write better issues and PRs and how to style them, and thought it would be nice to have this advice written down somewhere.

Write simple installation instructions for pandas

pandas has a fairly complete documentation page with installation instructions:

https://pandas.pydata.org/pandas-docs/stable/install.html

But personally, I don't want users who come to pandas for the first time (and possibly to the whole PyData ecosystem) to find this document and have to use it to get started. I think it makes much more sense to provide a short document explaining how we think they should set everything up, with a link to the existing document for anyone interested in advanced installation instructions.

What I think we should provide is a document that explains step by step:

  • How to get Anaconda
  • How to set up an environment with pandas
  • How to import and possibly show the pandas version
  • Link to the tutorials (we are working on them) and to the advanced installation instructions if they want more options

I'm unsure whether the instructions should set up the environment directly in JupyterLab, show it first in a Python terminal and then in JupyterLab, or ignore JupyterLab and just use the Python terminal.

I think it may make sense to have screenshots and make the installation instructions as easy and visual as possible.

For now they should go into our /doc/ directory, and when we're happy with them, we'll open the PR in pandas.
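For the step that shows how to import pandas and display its version, the instructions could end with a small check like this sketch. It also prints a readable message if pandas is missing from the active environment, a common first-time problem; the function name is just for illustration.

```python
import importlib.util


def installed_pandas_version():
    """Return the pandas version in this environment, or None if missing."""
    if importlib.util.find_spec("pandas") is None:
        return None
    import pandas
    return pandas.__version__


version = installed_pandas_version()
if version is None:
    print("pandas is not installed in this environment; "
          "activate the environment you created and try again.")
else:
    print(f"pandas {version} is installed and ready to use.")
```

For bug reports, the advanced instructions could also mention `pandas.show_versions()`, which prints the versions of pandas and its dependencies.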

Update NumFOCUS language in the pandas home page

NumFOCUS (the non-profit that supports pandas) is asking all its projects to standardize the language they use about NumFOCUS. In pandas, we mention NumFOCUS at the beginning of the home page, pandas.pydata.org.

The requested text is the following:

[Project Name] is a Sponsored Project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States. NumFOCUS provides [Project Name] with fiscal, legal, and administrative support to help ensure the health and sustainability of the project. Visit numfocus.org for more information.

Donations to [Project Name] are managed by NumFOCUS. For donors in the United States, your gift is tax-deductible to the extent provided by law. As with any donation, you should consult with your tax adviser about your particular tax situation.

Note that pandas is written in lowercase, even at the beginning of a sentence.

The website is in a separate repo: https://github.com/pandas-dev/pandas-website

Validate that master branch is not used in the CI

One "mistake" that most people made in their first PR was to make the changes in the master branch. This shouldn't be done, because it leaves the local repository in a state from which new feature branches can no longer be created cleanly.

In #32 we'll soon have a CI system implemented. Could we add a check to the CI that fails when a PR is opened from the master branch?
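A minimal sketch of such a check as a Python script the CI could run. The CI system from #32 isn't specified here, so the environment variable is an assumption: GitHub Actions exposes the source branch of a pull request as `GITHUB_HEAD_REF`, and other services use different names. A PR opened from a fork's master branch also reports its head branch as `master`, which is exactly the case to catch.

```python
import os
import sys

# Assumption: the CI exposes the PR source branch in this variable
# (GITHUB_HEAD_REF on GitHub Actions). Adjust it for the CI used in #32.
BRANCH_ENV_VAR = "GITHUB_HEAD_REF"
FORBIDDEN_BRANCHES = {"master"}


def is_forbidden_branch(branch):
    """Return True if a PR opened from this branch should fail the check."""
    return branch.strip() in FORBIDDEN_BRANCHES


def main():
    """Return the exit status for the CI: 0 to pass, 1 to fail."""
    branch = os.environ.get(BRANCH_ENV_VAR, "")
    if not branch:
        # Not a pull request build (or a different CI): nothing to check.
        print(f"{BRANCH_ENV_VAR} is not set; skipping the branch check")
        return 0
    if is_forbidden_branch(branch):
        print(f"This PR was opened from '{branch}'. Please create a new "
              "branch for your changes and open the PR from it.",
              file=sys.stderr)
        return 1
    print(f"PR branch '{branch}' is fine")
    return 0
```

Saved as a script, it would end with the usual `if __name__ == "__main__": sys.exit(main())` entry point, so the CI step fails whenever `main()` returns 1.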

Remove alias in pandas docs for numpy.random.randn

See the discussion here: pandas-dev/pandas#28038 (comment)

In pandas we have a header variable that we inject at the top of all documentation pages. One of the things it contains is randn = np.random.randn, which is a very bad idea that only serves to confuse users.

With a quick grep I could find only a single case where it is used:

whatsnew/v0.10.0.rst:  In [58]: p4d = Panel4D(randn(2, 2, 5, 4),

We should replace that case to use np.random.randn explicitly, remove that line from the header in doc/source/conf.py, and check that the doc build doesn't fail after that change (this can be done locally or in the CI).
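The quick grep can also be written as a small Python sketch, handy for re-checking after the header line is removed. It assumes the pages are `.rst` files under doc/source and that it is run from the root of the repository; it flags only calls to the bare alias, not `np.random.randn(...)`. The helper names are illustrative.

```python
import re
from pathlib import Path

# A call to the bare alias "randn(": not preceded by a dot (np.random.randn)
# or by another identifier character (e.g. "myrandn").
BARE_RANDN = re.compile(r"(?<![\w.])randn\(")


def find_bare_randn(text):
    """Return the 1-based numbers of the lines that call the bare alias."""
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if BARE_RANDN.search(line)]


def scan_docs(source_dir="doc/source"):
    """Print path:line for every use of the alias in the .rst files."""
    root = Path(source_dir)
    if not root.is_dir():
        print(f"{root} not found; run this from the root of the repository")
        return
    for path in sorted(root.rglob("*.rst")):
        for number in find_bare_randn(path.read_text(encoding="utf-8")):
            print(f"{path}:{number}")


if __name__ == "__main__":
    scan_docs()
```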

Merge doc makefiles into a Python script

In the doc/ directory we have two different scripts created by Sphinx: Makefile and make.bat. Sphinx generates both because a Makefile only works on Unix systems (mainly Linux and macOS), so it provides a separate make.bat file to build the docs on Windows.

But instead of having two separate scripts, it seems more efficient to have a single script in a language that works on any platform, and Python looks like a good option for that. Can we replace those files with a Python script that builds the documentation and provides the same functionality?
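A minimal sketch of such a script, assuming the layout that recent versions of sphinx-quickstart generate, with separate source/ and build/ directories inside doc/. Those generated Makefile and make.bat files forward their target to `sphinx-build -M` (Sphinx's "make mode"), so the script can do the same through `python -m sphinx`, which works on every platform. Names like `build_command` are illustrative, and any custom targets in the existing files would need to be ported by hand.

```python
import argparse
import subprocess
import sys
from pathlib import Path


def build_command(target, doc_dir):
    """Return the Sphinx call that Makefile/make.bat would run for target."""
    return [
        sys.executable, "-m", "sphinx",
        "-M", target,                # "make mode": html, latexpdf, clean, ...
        str(doc_dir / "source"),     # assumed source directory
        str(doc_dir / "build"),      # assumed build directory
    ]


def main(argv=None):
    """Parse the target from the command line and run Sphinx."""
    parser = argparse.ArgumentParser(description="Build the documentation.")
    parser.add_argument("target", nargs="?", default="html",
                        help="Sphinx builder name or 'clean' (default: html)")
    args = parser.parse_args(argv)
    # Resolve paths relative to this script, so it works from any directory.
    doc_dir = Path(__file__).resolve().parent
    return subprocess.call(build_command(args.target, doc_dir))
```

Saved as doc/make.py with a final `if __name__ == "__main__": sys.exit(main())`, it would be used as `python make.py html` or `python make.py clean` on any operating system.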

pandas issue: typo in roadmap

I just realized that the recently created pandas roadmap has a typo: in the section https://dev.pandas.io/development/roadmap.html#decoupling-of-indexing-and-internals the word "uses" is duplicated.

If anybody here wants to fix that, you should already know how to open a PR not using master ;)

Please add a comment in this issue if you're taking this, and if someone already has, don't work on it too; that's not nice (open source is about working as a team, there is no competition at all, not even among different projects doing the same thing). Also, make sure there isn't already an open PR in pandas for this.

The file to edit should be doc/source/development/roadmap.md (didn't check in detail).
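Before opening the PR, it may be worth checking the rest of the file for the same kind of typo. A minimal sketch with a regular expression (the function name is illustrative); it will also flag legitimate repetitions, so each match should be reviewed by hand:

```python
import re

# A word followed, after whitespace, by the same word again (e.g. "uses uses").
# Matching is case-insensitive, so "The the" is reported too.
DUPLICATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_duplicated_words(text):
    """Return (line number, repeated words) pairs for each match in text."""
    results = []
    for match in DUPLICATED_WORD.finditer(text):
        # Count the newlines before the match to get its 1-based line number.
        line_number = text.count("\n", 0, match.start()) + 1
        results.append((line_number, match.group(0)))
    return results
```

Working on the whole text rather than line by line also catches a word repeated across a line break. It could be run on the contents of doc/source/development/roadmap.md.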
