deppen8 / pandas-vet
A plugin for Flake8 that checks pandas code
Home Page: https://pandas-vet.readthedocs.io
License: MIT License
pandas-vet
is available from PyPI (https://pypi.org/project/pandas-vet/) and conda-forge (https://github.com/conda-forge/pandas-vet-feedstock). I would love to automate the process that uploads a pandas-vet release to both of these places.
This will help avoid embarrassing situations like #78
Is there a good reason to prefer .loc over .at when you need to get a single value?
We use .at over .loc in our codebase when we want to signal to other developers that we intend to get a single value, as opposed to a Series or DataFrame.
The warning seems to assume the developer picked .at for speed, while their intention is more likely to have picked it for correctness and clarity.
Check for the import pandas as pd pattern. If only import pandas is found, give error message:
Use import pandas as pd convention
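A minimal sketch of how such a check might be written with the standard ast module (the function and message names here are illustrative, not the actual pandas-vet internals):

```python
import ast

PD001 = "PD001 pandas should always be imported as 'import pandas as pd'"

def check_pandas_import(source: str) -> list:
    """Return (lineno, message) tuples for un-aliased pandas imports.

    Note: `from pandas import ...` is an ast.ImportFrom node and is
    not covered by this sketch.
    """
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pandas" and alias.asname != "pd":
                    errors.append((node.lineno, PD001))
    return errors
```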
The contributing guide needs an item to explain installing in "develop" mode.
Need to add:
This feature request is one of the topics in Minimally Sufficient Pandas, namely:
https://deppen8.github.io/pandas-bw/general/Python-funcs-vs-pandas.png
In order to implement this, we will need to be able to validate types, e.g. that the argument inside sum, min, max, or abs is a data frame.
Solving this false positive should be easier than solving all of them at the same time (#81), because in pandas .values is a property, not a method.
There are existing flake8 lint warnings in the source and tests
(.venv) > flake8 {pandas_vet,tests}
pandas_vet/__init__.py:116:80: E501 line too long (83 > 79 characters)
pandas_vet/__init__.py:134:80: E501 line too long (94 > 79 characters)
pandas_vet/__init__.py:137:80: E501 line too long (83 > 79 characters)
pandas_vet/__init__.py:155:80: E501 line too long (105 > 79 characters)
pandas_vet/__init__.py:158:80: E501 line too long (83 > 79 characters)
pandas_vet/__init__.py:174:80: E501 line too long (107 > 79 characters)
pandas_vet/__init__.py:177:80: E501 line too long (83 > 79 characters)
pandas_vet/__init__.py:232:80: E501 line too long (85 > 79 characters)
pandas_vet/__init__.py:365:80: E501 line too long (84 > 79 characters)
pandas_vet/__init__.py:369:80: E501 line too long (82 > 79 characters)
pandas_vet/__init__.py:373:80: E501 line too long (84 > 79 characters)
pandas_vet/__init__.py:383:80: E501 line too long (83 > 79 characters)
pandas_vet/__init__.py:386:80: E501 line too long (85 > 79 characters)
pandas_vet/__init__.py:389:80: E501 line too long (102 > 79 characters)
pandas_vet/__init__.py:392:80: E501 line too long (93 > 79 characters)
pandas_vet/__init__.py:395:80: E501 line too long (90 > 79 characters)
pandas_vet/__init__.py:398:80: E501 line too long (81 > 79 characters)
pandas_vet/__init__.py:401:80: E501 line too long (107 > 79 characters)
tests/test_PD015.py:56:80: E501 line too long (86 > 79 characters)
The simple fix here is to just update this code. But it may also be a good idea to add a flake8 lint check in CI, and to consider whether the library should adopt black auto-formatting and how that might influence a flake8 config file.
Is your feature request related to a problem? Please describe.
I want to make it easier for new contributors to get going.
Describe the solution you'd like
Create a Dockerfile that installs the necessary files and dependencies for developing and testing the code. Modern IDEs like VSCode and PyCharm allow you to connect to a running Docker container which makes it a breeze to work on a container.
Check for deprecated .ix. Give error message:
'.ix' is deprecated. Use '.loc' or '.iloc' instead.
We should document both the general disabling/ignoring of checks as well as the per-line disabling provided by flake8 (# noqa).
This is a spin-off from #81
This is a dedicated issue for the big discussion in #74
The problem is that many of our checks rely on the type of the object being a pandas object. This is a fundamental issue with static linting in Python because the AST doesn't know what type a thing is. This leads to false positives for things like re.sub() or dict.values().
I am open to suggestions on how to get around this, but it will likely be a big job. Some kind of integration with mypy
or some other way to leverage type annotations might be a way to fix this, at least for folks who use those type annotations. What exactly that looks like is unclear to me, so please let me know if you have any ideas.
For now, the undesirable workaround is to turn off checks that are particularly bothersome.
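As a purely illustrative interim idea (not something pandas-vet implements), a heuristic could restrict a check to receivers whose names look pandas-ish, trading false positives for false negatives; the allow-list here is hypothetical:

```python
import ast

DF_LIKE_NAMES = {"df", "frame", "series"}  # illustrative allow-list

def receiver_looks_like_pandas(call: ast.Call) -> bool:
    """Heuristic: only treat x.values()/x.sub() etc. as pandas calls
    when the receiver is a bare name that matches a pandas-ish naming
    pattern. This suppresses hits on re.sub(), tar.add(), and similar."""
    func = call.func
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        name = func.value.id.lower()
        return name in DF_LIKE_NAMES or name.endswith("_df")
    return False
```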
Is your feature request related to a problem? Please describe.
pd.concat([df for df in my_dfs])
works, but could also be written as
pd.concat(df for df in my_dfs)
thanks to PEP 289.
Describe the solution you'd like
Flag cases when a generator expression could be used instead of a list comprehension
Describe alternatives you've considered
Giving a warning from within pandas - problem is, at runtime, one does not know whether a function has been passed a list written as a list comprehension or if it was a preformed list. So this is likely better-suited to a linter
Additional context
The trickiest part would probably be to identify all the cases when this can be done - that might mean having to parse the codebase and look for public functions where an argument is annotated with Iterable
. I'm happy to raise a PR if you'd like this
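A rough sketch of how the trigger side might be detected with the standard ast module (the function name is hypothetical, and a real check would also need the allow-list of callees discussed above):

```python
import ast

def flag_listcomp_args(source: str) -> list:
    """Return line numbers where a call receives a list comprehension
    as a positional argument, which could often be a generator
    expression instead (PEP 289). Only safe to suggest when the callee
    iterates the argument exactly once."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for arg in node.args:
                if isinstance(arg, ast.ListComp):
                    hits.append(arg.lineno)
    return hits
```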
Describe the bug
First of all, thanks; this repository looks awesome. What I expect to happen is that pandas-vet should run when flake8 is invoked by pre-commit, but it doesn't.
To Reproduce
Steps to reproduce the behavior:
pip install pre-commit==2.1.1
pip install pandas-vet==0.2.2
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v2.4.0
  hooks:
  - id: flake8
pre-commit install
Expected behavior
I would like it to run the same way when executed by the pre-commit hook.
Check for use of the pd.merge function. Preferred is the .merge method. Even the pandas docs use .merge in the documentation of pd.merge! See flashcard.
Give error message:
Use '.merge' method instead of 'pd.merge' function. They have equivalent functionality.
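For example, the two spellings produce identical results:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"key": [1, 2], "b": [3.0, 4.0]})

# Flagged: module-level function
via_function = pd.merge(left, right, on="key")
# Preferred: DataFrame method with equivalent functionality
via_method = left.merge(right, on="key")
```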
Check for the .stack method. See flashcard. Give error message:
Prefer '.melt' to '.stack'. '.melt' allows direct column renaming and avoids a MultiIndex
Creating this issue to propose and discuss additions that should be in the next release. Anything that seems like a winner will get added to the Milestone and we can work from there.
Check for pd.read_table function calls. See flashcard here. Give error message:
'pd.read_table' is deprecated. Use 'pd.read_csv' for all delimited files.
Describe the bug
raised exception
To Reproduce
Steps to reproduce the behavior:
Have the following code in a file
def some_function(dataFrame, in_place=False):
    return dataFrame.drop([], inplace=in_place)
Expected behavior
Allow flake8
to report violations and not throw exceptions.
Additional context
bash-5.1# cat /usr/lib/python3.9/site-packages/pandas_vet/version.py
__version__ = "0.2.2"
This is running on a docker container based on alpine:3.14.1. Same results obtained on a mac.
Things work if we do not provide a variable:
def some_function(dataFrame, in_place=False):
    return dataFrame.drop([], inplace=False)
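A defensive version of the keyword inspection could guard the attribute access with an isinstance check so that a variable argument is simply skipped instead of crashing (a sketch only; has_inplace_true is a hypothetical helper, not the actual pandas-vet code):

```python
import ast

def has_inplace_true(call: ast.Call) -> bool:
    """True only when the call passes a literal inplace=True.

    Guarding with isinstance(kw.value, ast.Constant) avoids raising
    when the argument is a variable like inplace=in_place."""
    for kw in call.keywords:
        if kw.arg == "inplace":
            return isinstance(kw.value, ast.Constant) and kw.value.value is True
    return False
```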
Describe the bug
For PD011, should the suggestion be to_numpy() instead of to_array() (given the suggestion in the warnings of https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html)? Perhaps I am wrong on this, but I just wanted to bring it up or at least get clarification on the potential difference.
Try to respond to as many of the following as possible
Generally describe the pandas
behavior that the linter should check for and why that is a problem. Links to resources, recommendations, docs appreciated
The linter should check for nunique being compared to 1. The detected pattern is less performant because it does not leverage short-circuiting when multiple unique values are found, and simply continues counting.
def setup(n):
    return pd.Series(list(range(n)))

def setup(n):
    return pd.Series([1] * (n - 1) + [2])
Suggest specific syntax or pattern(s) that should trigger the linter (e.g., .iat)
df.column.nunique() == 1
df.column.nunique() != 1
df.column.nunique(dropna=True) == 1
df.column.nunique(dropna=True) != 1
df.column.nunique(dropna=False) == 1
df.column.nunique(dropna=False) != 1
Suggest specific syntax or pattern(s) that the linter should allow (e.g., .iloc)
Note that the solution is simple when there are no NaN values:
(series.values[0] == series.values).all()
And needs some additional logic when NaN/NA values are present.
For dropna=True
v = series.values
v = remove_na_arraylike(v)
if v.shape[0] == 0:
    return False
(v[0] == v).all()
For dropna=False
v = s.values
if v.shape[0] == 0:
    return False
(v[0] == v).all() or not pd.notna(v).any()
if included in pandas:
series.is_constant()
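Pulling the fragments above together, a self-contained helper might look like this (a sketch that uses pd.notna in place of the internal remove_na_arraylike, and .to_numpy() rather than .values; is_constant is a hypothetical name):

```python
import pandas as pd

def is_constant(series: pd.Series, dropna: bool = True) -> bool:
    """Short-circuit-friendly constancy check, sketching the pattern
    the linter would recommend over `.nunique() == 1`."""
    values = series.to_numpy()
    if dropna:
        values = values[pd.notna(values)]
        if values.shape[0] == 0:
            return False
        return bool((values[0] == values).all())
    if values.shape[0] == 0:
        return False
    # With dropna=False, an all-NaN series counts as constant
    return bool((values[0] == values).all() or not pd.notna(values).any())
```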
Suggest a specific error message that the linter should display (e.g., "Use '.iloc' instead of '.iat'. If speed is important, use numpy indexing")
Consider checking equality to the first element instead of '.nunique() == 1' when testing for a constant column.
Are you willing to try to implement this check?
Developing the check functions for each linter error requires exploration of the AST nodes to determine the corresponding attributes to be compared against the valid code patterns. This presents a barrier to quickly implementing new linter checks that could be reduced if there were a tool that returns the appropriate AST node attributes for a specified pattern.
The envisioned solution might utilize the following form:
attributes = ast_explore(code_signature)
where code_signature
is a string representing the code pattern of interest, and attributes
is a string (or list of strings) representing the composed attributes for the corresponding AST node. This string could then be appended to the AST node in the check
functions:
if node.<attribute_string> == <test_condition>:
or
for attribute in node.<attribute_string>:
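A bare-bones starting point for such a tool could simply parse the pattern and dump the node, leaving the attribute-path extraction for later (a sketch only; ast_explore is the envisioned name, not an existing function):

```python
import ast

def ast_explore(code_signature: str) -> str:
    """Return a readable dump of the AST for a code pattern, as a
    first step toward mapping a pattern to node attributes."""
    tree = ast.parse(code_signature, mode="eval")
    return ast.dump(tree.body)
```

For example, ast_explore("df.isnull()") reveals the Call/Attribute/Name nesting that a check function would need to traverse.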
Installing pandas-vet with pip or conda on Windows or WSL results in flake8 --version
returning
3.7.9 () CPython 3.7.3 on Windows
3.7.9 () CPython 3.8.2 on Linux
Expected behavior
pip install flake8 pandas-vet==0.2.2
flake8 --version
3.7.9 (flake8-pandas-vet: 0.2.1, mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.1) CPython 3.7.3 on Windows
To Reproduce
Steps to reproduce the behavior:
pip install flake8 pandas-vet
or pip install flake8 pandas-vet==0.2.2
flake8 --version
Additional Context
Running flake8 -v --version
results in:
flake8.plugins.manager MainProcess 65 INFO Loading entry-points for "flake8.extension".
flake8.plugins.manager MainProcess 73 INFO Loading entry-points for "flake8.report".
flake8.plugins.manager MainProcess 80 INFO Loading plugin "F" from entry-point.
flake8.plugins.manager MainProcess 118 INFO Loading plugin "pycodestyle.ambiguous_identifier" from entry-point.
flake8.plugins.manager MainProcess 123 INFO Loading plugin "pycodestyle.bare_except" from entry-point.
flake8.plugins.manager MainProcess 123 INFO Loading plugin "pycodestyle.blank_lines" from entry-point.
flake8.plugins.manager MainProcess 123 INFO Loading plugin "pycodestyle.break_after_binary_operator" from entry-point.
flake8.plugins.manager MainProcess 123 INFO Loading plugin "pycodestyle.break_before_binary_operator" from entry-point.
flake8.plugins.manager MainProcess 123 INFO Loading plugin "pycodestyle.comparison_negative" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.comparison_to_singleton" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.comparison_type" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.compound_statements" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.continued_indentation" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.explicit_line_join" from entry-point.
flake8.plugins.manager MainProcess 124 INFO Loading plugin "pycodestyle.extraneous_whitespace" from entry-point.
flake8.plugins.manager MainProcess 125 INFO Loading plugin "pycodestyle.imports_on_separate_lines" from entry-point.
flake8.plugins.manager MainProcess 125 INFO Loading plugin "pycodestyle.indentation" from entry-point.
flake8.plugins.manager MainProcess 125 INFO Loading plugin "pycodestyle.maximum_doc_length" from entry-point.
flake8.plugins.manager MainProcess 125 INFO Loading plugin "pycodestyle.maximum_line_length" from entry-point.
flake8.plugins.manager MainProcess 125 INFO Loading plugin "pycodestyle.missing_whitespace" from entry-point.
flake8.plugins.manager MainProcess 126 INFO Loading plugin "pycodestyle.missing_whitespace_after_import_keyword" from entry-point.
flake8.plugins.manager MainProcess 126 INFO Loading plugin "pycodestyle.missing_whitespace_around_operator" from entry-point.
flake8.plugins.manager MainProcess 126 INFO Loading plugin "pycodestyle.module_imports_on_top_of_file" from entry-point.
flake8.plugins.manager MainProcess 126 INFO Loading plugin "pycodestyle.python_3000_async_await_keywords" from entry-point.
flake8.plugins.manager MainProcess 126 INFO Loading plugin "pycodestyle.python_3000_backticks" from entry-point.
flake8.plugins.manager MainProcess 127 INFO Loading plugin "pycodestyle.python_3000_has_key" from entry-point.
flake8.plugins.manager MainProcess 127 INFO Loading plugin "pycodestyle.python_3000_invalid_escape_sequence" from entry-point.
flake8.plugins.manager MainProcess 127 INFO Loading plugin "pycodestyle.python_3000_not_equal" from entry-point.
flake8.plugins.manager MainProcess 127 INFO Loading plugin "pycodestyle.python_3000_raise_comma" from entry-point.
flake8.plugins.manager MainProcess 127 INFO Loading plugin "pycodestyle.tabs_obsolete" from entry-point.
flake8.plugins.manager MainProcess 128 INFO Loading plugin "pycodestyle.tabs_or_spaces" from entry-point.
flake8.plugins.manager MainProcess 128 INFO Loading plugin "pycodestyle.trailing_blank_lines" from entry-point.
flake8.plugins.manager MainProcess 128 INFO Loading plugin "pycodestyle.trailing_whitespace" from entry-point.
flake8.plugins.manager MainProcess 128 INFO Loading plugin "pycodestyle.whitespace_around_comma" from entry-point.
flake8.plugins.manager MainProcess 129 INFO Loading plugin "pycodestyle.whitespace_around_keywords" from entry-point.
flake8.plugins.manager MainProcess 129 INFO Loading plugin "pycodestyle.whitespace_around_named_parameter_equals" from entry-point.
flake8.plugins.manager MainProcess 129 INFO Loading plugin "pycodestyle.whitespace_around_operator" from entry-point.
flake8.plugins.manager MainProcess 129 INFO Loading plugin "pycodestyle.whitespace_before_comment" from entry-point.
flake8.plugins.manager MainProcess 129 INFO Loading plugin "pycodestyle.whitespace_before_parameters" from entry-point.
flake8.plugins.manager MainProcess 130 INFO Loading plugin "C90" from entry-point.
flake8.plugins.manager MainProcess 131 INFO Loading plugin "PD" from entry-point.
flake8.plugins.manager MainProcess 147 INFO Loading plugin "default" from entry-point.
flake8.plugins.manager MainProcess 151 INFO Loading plugin "pylint" from entry-point.
flake8.plugins.manager MainProcess 151 INFO Loading plugin "quiet-filename" from entry-point.
flake8.plugins.manager MainProcess 151 INFO Loading plugin "quiet-nothing" from entry-point.
Describe the bug
False positive: PD005 for the regex sub method
To Reproduce
PD005 for this code:
return re.sub(r"\s+", " ", variable).strip()
Expected behavior
no warning
As discussed in #64, it would be good to have some checks that can be implemented but are "off" by default. These would be the most opinionated checks that would be a bit too strict to be activated out-of-the-box.
Check for .at and .iat indexing methods. These should probably be separate checks for consistency with other cases. See flashcard. Give error messages:
Use '.loc' instead of '.at'. If speed is important, use numpy.
Use '.iloc' instead of '.iat'. If speed is important, use numpy.
Describe the bug
PD005 is triggered when any module has a method named something like .add()
To Reproduce
I noticed this while using the tarfile
module. The tarfile
module has a method named .add()
Here is the snippet. Flake8 triggered PD005 on both of the tar.add()
calls.
tar = tarfile.open(tar_filename, "w")
tar.add(other_filename)  # <-- triggers PD005
for filename in filenames:
    tar.add(filename)  # <-- triggers PD005
tar.close()
Expected behavior
I expect this to only be triggered when a method like .add()
is called on a pandas
object.
Additional context
I expect that this applies to the comparison operators (PD006) as well. Any fix for PD005 should also be made for PD006.
First off, thank you for developing this plugin!
Is your feature request related to a problem? Please describe.
read_table is no longer deprecated in favour of read_csv following a discussion. As far as I can tell from reading the references in the given motivation, this means there is no longer a good reason to recommend read_csv over read_table per se.
Describe the solution you'd like
The rule should either be categorised as opinionated to require an opt-in (like PD901), removed, or changed to only emit a warning if read_table is called with sep="," (which is what ruff has just done).
It would be cool to have examples of real pandas scripts to feed to the tests. This might help us track down bugs. For example, I am anticipating we might catch false positives in some places where pandas overlaps numpy.
Check for use of the parameter inplace=True. If found, give error message:
inplace operations do not always behave as you might expect
@simchuck is adding some good notes to the wiki about tools to explore and understand the AST. We should add an item "0" in the "How to add a check to the linter" section of README.md
that links folks to the wiki for tips on working with the AST.
Is your feature request related to a problem? Please describe.
I really like this project, and I want to contribute, but there are some roadblocks in the way.
Getting the project setup on my local machine had some issues:
This is not meant to be one big complaint. This project is really useful to me, so thanks for building it.
When flake8 runs check_for_ix, check_for_at, or check_for_iat, it raises AttributeError: 'Name' object has no attribute 'attr'.
This seems to indicate that node.value returns a Name object and that the node.value.attr access should be changed. Not sure if we can follow the pattern for check_for_isnull, etc., because .ix[] and .at[] are accessed via subscripting rather than method calls.
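A guarded predicate along these lines would avoid the crash (a sketch; uses_ix is a hypothetical helper name):

```python
import ast

def uses_ix(node: ast.Subscript) -> bool:
    """True for `<expr>.ix[...]`; returns False instead of raising
    AttributeError when the subscripted value is a bare Name, as in
    `x[0]`, where node.value has no .attr."""
    value = node.value
    return isinstance(value, ast.Attribute) and value.attr == "ix"
```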
https://deppen8.github.io/pandas-bw/general/values-array-to_numpy.png
Check for calls to .values. If found, error message:
Use .to_numpy() if you need a numpy ndarray, or .array (returns a PandasArray)
nox can do much of what TravisCI is doing for us, only it can do it locally and, in the case of black, update the local version that can then be pushed to the repo.
Description/Steps to Reproduce
For the below file, pandas-vet returns the error tmp.py:2:1: PD005 Use arithmetic operator instead of method. It looks like the rule is intended for pandas.DataFrame.sub but is being applied to re.sub:
import re
re.sub('', '', '')
Need to add:
Basically, everything that is currently in the README.md should be moved to the docs. The README.md should have minimal info and a basic example.
Check for .pivot and .unstack methods. See flashcard. Give error message:
'.pivot' and '.unstack' functionality can be achieved with just '.pivot_table'
Every class, function, etc. should have a docstring that at least describes functionality, but many are currently lacking.
Describe the bug
Cannot use the --annoy
flag with flake8
To Reproduce
Steps to reproduce the behavior:
Create a tmp.py file, install pandas-vet, and run flake8 with the --annoy flag. This fails with the error: flake8: error: no such option: --annoy
tmp.py:
import pandas
that = 1
this = that
print(this)
df = 1
Terminal:
(py368) C:\Users\king.kyle\>flake8 tmp.py
tmp.py:1:1: F401 'pandas' imported but unused
tmp.py:1:1: D100 Missing docstring in public module
tmp.py:5:1: T001 print found.
(py368) C:\Users\king.kyle\>python -m pip install pandas-vet
Collecting pandas-vet
Using cached https://files.pythonhosted.org/packages/21/53/d031fd623fde85f554c73d87c431ad4cf5d929d89c1cd728ab5e4d145a52/pandas_vet-0.2.1-py3-none-any.whl
Installing collected packages: pandas-vet
Successfully installed pandas-vet-0.2.1
(py368) C:\Users\king.kyle\>flake8 tmp.py
tmp.py:1:1: F401 'pandas' imported but unused
tmp.py:1:1: D100 Missing docstring in public module
tmp.py:1:1: PD001 pandas should always be imported as 'import pandas as pd'
tmp.py:5:1: T001 print found.
(py368) C:\Users\king.kyle\>flake8 tmp.py --annoy
Usage: flake8 [options] file file ...
flake8: error: no such option: --annoy
(py368) C:\Users\king.kyle\>
Expected behavior
Expected the error PD901 'df' is a bad variable name. Be kinder to your future self.
Environment:
(py368) C:\Users\king.kyle\Developer\Packages\common_dev>python
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
(py368) C:\Users\king.kyle\Developer\Packages\common_dev>flake8 --version
3.7.9 (flake8-blind-except: 0.1.1, flake8-bugbear: 19.8.0, flake8-docstrings: 1.5.0, pydocstyle: 4.0.1, flake8-mock: 0.3, flake8-pandas-vet: 0.2.1, flake8-print: 3.1.4, flake8-tuple: 0.4.1, flake8_builtins: 1.4.1, flake8_commas: 2.0.0, flake8_deprecated: 1.2, flake8_isort: 2.3, flake8_quotes: 2.1.1, logging-format: 0.6.0, mccabe: 0.6.1, naming: 0.8.2, pycodestyle: 2.5.0, pyflakes: 2.1.1) CPython 3.6.8 on Windows
I would like to expand our docs beyond just the README file. A simple Read-the-Docs site or GitHub Page would be good. It should include pages for
With #88 we have tests, flake8 and black running on every PR and head. We should get some of those fancy banners in the readme to reflect that these are passing.
Take a look at how black does these for inspiration:
https://github.com/psf/black/blame/ce14fa8b497bae2b50ec48b3bd7022573a59cdb1/README.md#L5-L14
For help with #24, we need to better understand the range of groupby
uses. We should collect some real-world uses of groupby
to better understand whether a linting pattern will raise too many false positives.
I have recently started adopting JupyterBook for documenting projects and I love it. It is great for adding the things that I would like to add to pandas-vet
like tutorials and mixing Markdown and reST Sphinx docs.
It also has nice pre-defined GitHub Actions for building the docs and putting them in a branch for GitHub Pages.
Not even sure if this can be implemented, but maybe with a clever regular expression, it could. See the flashcard for some more details.
All of our tests are currently in test_PD001.py. We need to either change the name of that file to something more general or break out our tests into separate files corresponding to the error codes. I prefer the latter. So for each error code, we should have a different test_PD<code>.py file.
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
We should implement checks for all of the text-based arithmetic and comparison operators. If found, recommend using the operator itself. Something like:
Use <operator> instead of <method>

| use | check for |
| --- | --- |
| + | .add |
| - | .sub and .subtract |
| * | .mul and .multiply |
| / | .div, .divide and .truediv |
| ** | .pow |
| // | .floordiv |
| % | .mod |
| > | .gt |
| < | .lt |
| >= | .ge |
| <= | .le |
| == | .eq |
| != | .ne |
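For illustration, each text method in the table is interchangeable with its operator:

```python
import pandas as pd

s = pd.Series([4, 9, 16])

via_methods = s.sub(1).mul(2).floordiv(3)   # flagged spellings
via_operators = (s - 1) * 2 // 3            # recommended spellings
```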
Check for .isnull and .notnull methods. Check them separately.
If .isnull is found, give error message:
Use .isna instead of .isnull
If .notnull is found, give error message:
Use .notna instead of .notnull
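For reference, the aliases are interchangeable; the check would simply nudge toward the newer names:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

# .isnull/.notnull are aliases of .isna/.notna; prefer the latter
na_mask = s.isna()
ok_mask = s.notna()
```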