pandas-dev / pandas-stubs
Public type stubs for pandas
License: BSD 3-Clause "New" or "Revised" License
See #54 (comment)
>>> import pandas as pd
>>> pd.read_xml('path/to/file', xpath=xpath, stylesheet=xsl)
reports
error: Module has no attribute "read_xml"; maybe "read_html" or "read_excel"?
Describe the bug
The Series constructor's index argument can accept None
To Reproduce
from __future__ import annotations
import pandas as pd
def f(idx: pd.Index | None) -> pd.Series:
    return pd.Series([1,2,3],index=idx)
The error is
error: Argument "index" to "Series" has incompatible type "Optional[Index]"; expected "Union[str, int, Series[Any], List[Any], Index]"
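At runtime, pandas treats index=None the same as leaving index out and builds a default RangeIndex, so the stub's index parameter arguably needs to admit None. A minimal runtime check (assumes pandas is installed):

```python
from __future__ import annotations

import pandas as pd


def f(idx: pd.Index | None) -> pd.Series:
    # Passing None is equivalent to omitting index: pandas builds a RangeIndex.
    return pd.Series([1, 2, 3], index=idx)


default = f(None)
labelled = f(pd.Index(["a", "b", "c"]))
```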
Please complete the following information:
@Dr-Irv Could you please tell me what the process is to publish the package on conda? I finished the CD process that automatically publishes a new version to GitHub and PyPI, but I'm not sure whether it is possible to automate the conda process.
DataFrame.quantile should accept Sequence[float] rather than just List[float].
import numpy as np
import pandas as pd
df = pd.DataFrame({"a": np.arange(100.0)})
df.quantile(np.array([0.25, 0.75]))
The error is
error: No overload variant of "quantile" of "DataFrame" matches argument type "ndarray[Any, dtype[Any]]"
note: Possible overload variants:
note: def quantile(self, q: float = ..., axis: Literal['columns', 'index', 0, 1] = ..., numeric_only: bool = ..., interpolation: Union[str, Literal['linear', 'lower', 'higher', 'midpoint', 'nearest']] = ...) -> Series[Any]
note: def quantile(self, q: List[float], axis: Literal['columns', 'index', 0, 1] = ..., numeric_only: bool = ..., interpolation: Union[str, Literal['linear', 'lower', 'higher', 'midpoint', 'nearest']] = ...) -> DataFrame
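For reference, these are the runtime return types the two quantile overloads describe: a scalar q gives a Series, while a list-like q gives a DataFrame indexed by the quantiles. One caveat before switching the stub to Sequence[float]: ndarray is not registered as a collections.abc.Sequence, so the ndarray call above may need its own entry (for example an ndarray type) in the annotation. A sketch, assuming pandas and NumPy are installed:

```python
import collections.abc

import numpy as np
import pandas as pd

# One column "a" holding 0.0, 1.0, ..., 99.0.
df = pd.DataFrame({"a": np.arange(100.0)})

scalar_q = df.quantile(0.25)                   # scalar q -> Series indexed by column
array_q = df.quantile(np.array([0.25, 0.75]))  # ndarray q -> DataFrame indexed by q
tuple_q = df.quantile((0.25, 0.75))            # other list-likes also work at runtime

# ndarray does not count as a Sequence, so Sequence[float] alone would not cover it.
ndarray_is_sequence = isinstance(np.array([0.25, 0.75]), collections.abc.Sequence)
```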
to_datetime does not accept NumPy datetime64, which is a valid type for conversion.
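A quick runtime check that to_datetime handles NumPy datetime64 input, both as a scalar and as an array (assumes pandas and NumPy are installed):

```python
import numpy as np
import pandas as pd

# Scalar datetime64 -> Timestamp
stamp = pd.to_datetime(np.datetime64("2022-07-01"))

# datetime64 array -> DatetimeIndex
stamps = pd.to_datetime(np.array(["2022-07-01", "2022-07-02"], dtype="datetime64[ns]"))
```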
At some point, they will cut over and take what we have here.
It seems that when the 'tight' option was added to the 'orient' parameter, the type hints were not updated (nor were the exception messages).
Related to: pandas-dev/pandas#47450
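Assuming this refers to DataFrame.to_dict and DataFrame.from_dict (the methods whose orient parameter accepts 'tight' in pandas 1.4 and later), this is the runtime behavior the type hints would need to cover (assumes pandas is installed):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=pd.Index(["r0", "r1"], name="row"))

# 'tight' is the 'split' layout plus the index and column names.
tight = df.to_dict(orient="tight")

# from_dict accepts the same orient value and round-trips the frame.
restored = pd.DataFrame.from_dict(tight, orient="tight")
```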
Could you please remove pandas and matplotlib from the dependencies of the pandas-stubs package.
Why would you have a separate stubs package and then depend on the main package and even matplotlib?
This package was super slim until 6 days ago: https://github.com/VirtusLab/pandas-stubs/blob/master/setup.py#L38-L40
Thank you.
import pandas as pd
import numpy as np
indices = np.array([0,1,2,3], dtype=np.int64)
values_s = pd.Series(np.arange(10),name="a")
values_s.iloc[indices]
The error is
error: No overload variant of "__getitem__" of "_iLocIndexerSeries" matches argument type "ndarray[Any, dtype[signedinteger[_64Bit]]]"
note: Possible overload variants:
note: def __getitem__(self, int) -> Any
note: def __getitem__(self, Union[Index, slice]) -> Series[Any]
My pyright test action, which is forked from Jake Bailey's but adds the warn on partial option, just got updated with the latest changes from Jake's, which are quite extensive. It would be good to kick off a test run of the stubs (I don't have permission). If things aren't working please open an issue at https://github.com/gramster/pyright-action
mypy 0.961
pandas-stubs 1.4.2.220626
import pandas
df: pandas.DataFrame
df.groupby(df.index)
output:
error: Argument 1 to "groupby" of "DataFrame" has incompatible type "Index"; expected "Union[List[str], str, None]" [arg-type]
The documentation explicitly caters for Series and dict; it should be expanded to clarify that Index is also OK:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
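At runtime, grouping by the frame's own Index works and groups rows by label, which is the case the groupby stub would need to allow. A small check (assumes pandas is installed):

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["x", "y", "x"])

# Rows sharing an index label end up in the same group.
totals = df.groupby(df.index).sum()
```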
We already use pyproject.toml to store some of the project's configuration, so I thought: why don't we use poetry to manage and publish the project?
Here's the idea. Mypy, Pyright and pytest can all store their configuration in the pyproject.toml file. We could run them through poetry, split out the development dependencies and publish the whole project. It would make contributing easier and cleaner for anyone.
Download the project -> poetry install -> make code -> poetry run tests -> poetry publish
Compared to pandas, the CI is amazingly fast :) but it could be faster. Installing the dependencies takes longer than any other CI step, and it is especially slow on Windows (1 min 56 s); it is faster on macOS (1 min 11 s) and Ubuntu (49 s).
I think there are at least two approaches:
pyright and mypy have issues dealing with ambiguous overloads on generics, as pointed out by numpy. See the following:
The issue comes into play with subtraction. For pandas, we'd like an untyped Series to remain untyped, but subtracting two Series[Timestamp] to yield Series[Timedelta]. This doesn't seem possible. Right now, the current stubs return Series[Timestamp] when subtracting two untyped series, but do return Series[Timedelta] when subtracting two Series[Timestamp]. I discovered this by using the new assert_type() feature.
In the current stubs from MS copied here, we use Series[bool], Series[Timestamp], Series[Timedelta], Series[float] and Series[int] as arguments and/or return types of different methods, which sharpens up some of the type checks.
Possible solutions:
BoolSeries, TimestampSeries, TimedeltaSeries, FloatSeries and IntSeries that are typing subclasses of Series, which can help with series that have types and those that don't.
Need to experiment with (2), and if it can't work, just remove all the generic stuff.
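A minimal sketch of the subclass approach, using hypothetical stand-in classes rather than the real stubs: the precise overloads live on a TimestampSeries subclass, so a plain Series keeps its loose signature while TimestampSeries - TimestampSeries resolves to TimedeltaSeries. The class bodies and list-backed storage are illustrative only.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any, Generic, TypeVar, overload

S1 = TypeVar("S1")


class Series(Generic[S1]):
    """Hypothetical stand-in for pandas.Series, backed by a plain list."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    def __sub__(self, other: Series[Any]) -> Series[Any]:
        # Generic case: subtracting untyped series stays untyped.
        return Series([a - b for a, b in zip(self.values, other.values)])


class TimedeltaSeries(Series[timedelta]):
    pass


class TimestampSeries(Series[datetime]):
    # The precise overloads exist only on the subclass. A type checker may flag
    # the narrower parameter types as an incompatible override of Series.__sub__.
    @overload  # type: ignore[override]
    def __sub__(self, other: TimestampSeries) -> TimedeltaSeries: ...
    @overload
    def __sub__(self, other: TimedeltaSeries) -> TimestampSeries: ...
    def __sub__(self, other: Any) -> Any:
        diffs = [a - b for a, b in zip(self.values, other.values)]
        if isinstance(other, TimestampSeries):
            return TimedeltaSeries(diffs)  # datetime - datetime -> timedelta
        return TimestampSeries(diffs)  # datetime - timedelta -> datetime
```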
Minimal working file for pd.Timestamp:
# ––– file ts_bug.py
import pandas as pd
pd.date_range('2000-01-01', '2000-01-10') < pd.Timestamp('2000-01-05')
Usage:
$ python ts_bug.py # → 👍
$ mypy ts_bug.py
ts_bug.py:4: error: Unsupported operand types for > ("Timestamp" and "DatetimeIndex")
Found 1 error in 1 file (checked 1 source file)
Minimal working file for datetime.datetime:
# ––– file dt_bug.py
import datetime as dt
import pandas as pd
pd.date_range('2000-01-01', '2000-01-10') < dt.datetime(2000, 1, 5)
Usage:
$ python dt_bug.py # → 👍
$ mypy dt_bug.py
dt_bug.py:5: error: Unsupported operand types for > ("datetime" and "DatetimeIndex")
Found 1 error in 1 file (checked 1 source file)
Versions:
$ python --version
Python 3.10.5
$ python -m pip freeze | grep 'pandas\|mypy'
mypy==0.961
mypy-extensions==0.4.3
pandas==1.4.3
pandas-stubs==1.4.3.220702
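Both comparisons from the report run fine at runtime and return a NumPy boolean array, including the reflected form with the scalar on the left, so the stubs would need to type both directions. A runtime check (assumes pandas and NumPy are installed):

```python
import datetime as dt

import numpy as np
import pandas as pd

dti = pd.date_range("2000-01-01", "2000-01-10")

mask_ts = dti < pd.Timestamp("2000-01-05")   # index on the left, Timestamp on the right
mask_dt = dti < dt.datetime(2000, 1, 5)      # same with a stdlib datetime
mask_rev = pd.Timestamp("2000-01-05") > dti  # scalar on the left (reflected comparison)
```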
import pandas as pd
df = pd.DataFrame({"a":[1,2,3]}, index = [0,1,2])
df.reindex([2,1,0])
returns
error: Too many arguments for "reindex" of "DataFrame"
Xref #54 (comment)
>>> import pandas as pd
>>> pd.testing.assert_frame_equal(pd.DataFrame(), pd.DataFrame(), check_index_type=False)
reports
error: Unexpected keyword argument "check_index_type" for "assert_frame_equal"
placeholder.
import pandas as pd
dti = pd.date_range("2000-1-1", periods=10)
dti.tzinfo # mypy error, but is `None`.
import pandas as pd
out = pd.to_datetime("dajsldaljd", errors="coerce")
reveal_type(out)
returns the wrong type:
note: Revealed type is "pandas._libs.tslibs.timestamps.Timestamp"
Also, running mypy on
import pandas as pd
out = pd.to_datetime("dajsldaljd", errors="coerce")
out is pd.NaT
returns
error: Non-overlapping identity check (left operand type: "Timestamp", right operand type: "NaTType")
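At runtime the coerced result is the NaT singleton rather than a Timestamp, which is why the identity check above should not be flagged. A check (assumes pandas is installed); a return type along the lines of Timestamp | NaTType for this overload is one possible fix, not a confirmed stub change:

```python
import pandas as pd

out = pd.to_datetime("dajsldaljd", errors="coerce")

# An unparseable string is coerced to pd.NaT, whose class is not Timestamp.
is_nat = out is pd.NaT
result_type = type(out).__name__
```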
DataFrame.sample generally returns a DataFrame object with a subsample of the original's rows, but mypy complains that the output is a Series. From what I can tell, it is also typed as Series[S1] in the stubs code:
pandas-stubs/pandas-stubs/core/frame.pyi, line 1800 in 7a5d7a8
Is this a bug, or is it the intended type and I'm missing a step?
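A runtime check that sampling rows returns a DataFrame with all the original columns (assumes pandas is installed):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

# Sampling picks rows and keeps every column, so the result is a DataFrame.
picked = df.sample(n=2, random_state=0)
```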
I don't know whether only pyright: basic support is currently targeted. If not, it would be nice if this could be inferred, since as far as I know it should always be str.
# pyright: strict
import pandas as pd
df = pd.DataFrame()
for name in df.columns: # Type of "name" is unknown Pylance(reportUnknownVariableType)
    print(name)
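One thing to weigh before inferring str: column labels are arbitrary hashables at runtime, so a frame can mix, for instance, integer and string labels. A small check (assumes pandas is installed):

```python
import pandas as pd

df = pd.DataFrame({0: [1], "b": [2]})

# Iterating over the columns yields the labels as they were given.
label_types = [type(name).__name__ for name in df.columns]
```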
A file bug.py with the following code:
>>> import pandas as pd
>>> pd.read_csv('/path/to/file', usecols=['col1', 'col2'])
mypy --config mypy.ini bug.py
Result:
bug.py:2: error: No overload variant of "read_csv" matches argument types "str", "List[str]"
Also, PyCharm displays this message:
Expected type 'Union[ExtensionArray, ndarray, (Any) -> Any, None]', got 'list[str]' instead
usecols should accept List[str].
pandas-stubs version: 1.2.0.62
mypy : 0.961
>>> import distutils
>>> import pandas as pd
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : fr_FR.UTF-8
pandas : 1.3.5
numpy : 1.23.0
pytz : 2022.1
dateutil : 2.8.2
pip : 22.1.2
setuptools : 62.6.0
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3 (dt dec pq3 ext lo64)
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.39
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
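A runtime check that a list of column names is accepted by usecols, using an in-memory CSV instead of a file path (assumes pandas is installed):

```python
import io

import pandas as pd

csv = io.StringIO("col1,col2,col3\n1,2,3\n4,5,6\n")

# Only the listed columns are parsed.
subset = pd.read_csv(csv, usecols=["col1", "col2"])
```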
When testing the distribution with poetry run poe test_dist, any changes you have not committed (including new files) get lost, since we remove the pandas-stubs directory to check things.
It is probably best not to test the dist if there are uncommitted changes (or new files) in the pandas-stubs folder, and instead to output an error message asking whether the developer wants to commit.
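A minimal sketch of the proposed guard, written as a hypothetical helper that a test_dist task could call first. It relies on git status --porcelain, whose output lists modified and untracked files (untracked ones with a ?? prefix); the function names are illustrative and not part of the existing tooling.

```python
from __future__ import annotations

import subprocess
import sys


def dirty_paths(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` output into the paths it lists."""
    # Each line is "XY <path>": a two-character status code, a space, then the path.
    return [line[3:] for line in porcelain.splitlines() if line]


def ensure_clean(folder: str = "pandas-stubs") -> None:
    """Exit with an error if the folder has uncommitted or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", folder],
        capture_output=True,
        text=True,
        check=True,
    )
    paths = dirty_paths(result.stdout)
    if paths:
        sys.exit(
            f"Uncommitted changes or new files in {folder}/: {', '.join(paths)}\n"
            "Commit them before running test_dist, since that step removes the folder."
        )
```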
@BrenoJesusFernandes
Hello guys, what do you think about reorganizing the tests so they are easier to find and run? There was work in progress on this in the VirtusLab project. Here is the link:
https://github.com/VirtusLab/pandas-stubs/tree/feature/reorganize_tests
mypy 0.961
pandas-stubs 1.4.2.220626
s: pandas.Series
s.cat.codes
output:
"ellipsis" has no attribute "codes"
Currently the type annotation for Index.difference is
def difference(self, other: Union[List[T1], Index]) -> Index: ...
None values are disallowed for the other argument if it is a list. As a result, the following code, which runs perfectly fine, gives an error when mypy is used to check it:
from pandas.core.indexes.api import Index
index = Index(["a", "b", "c"])
index.difference(["a", None])
I think the function type should be changed to:
def difference(self, other: Union[List[Optional[T1]], Index]) -> Index: ...
import pandas as pd
df = pd.DataFrame({"a":[1,2,3],"b":[0.0,1,1]})
df.columns= ["c", "d"] # mypy error, columns is `Index` so cannot take List[str]
I think a lot of the pandas labels would make sense here, but not all. I propose going through them and adding any labels that make sense.
cc @Dr-Irv
import pandas as pd
df = pd.DataFrame({"x": [1,2,3]})
val = df.loc[0, "x"]
reports
Argument of type "tuple[Literal[0], Literal['x']]" cannot be assigned to parameter "idx" of type "Tuple[Tuple[slice, ...], StrLike]" in function "__getitem__"
Tuple entry 1 is incorrect type
"Literal[0]" is incompatible with "Tuple[slice, ...]"
Visible using Python 3.9, not 3.7. I didn't check 3.8.
Hi all,
I raised an issue over at pylance:
microsoft/pylance-release#2933
There has been an update, but the error now mentions TimestampSeries instead of Series[Timestamp].
To replicate:
import pandas as pd
df = pd.DataFrame(['2020-01-01','2019-01-01'])
dates = pd.to_datetime(df[0], format = '%Y-%m-%d')
date_to_compare = pd.to_datetime('2019-02-01', format = '%Y-%m-%d')
date_mask = date_to_compare <= dates
This generates the following type error in VS Code:
Operator "<=" not supported for types "Timestamp" and "TimestampSeries"
Pylance(reportGeneralTypeIssues)
I would like to create a PR to help, but I've only just started figuring out how type stubs work, so it might take me some time :-)
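One way a stub could type the scalar-on-the-left comparison is to overload the scalar's __le__ so that a TimestampSeries argument returns a boolean series. The sketch below uses hypothetical stand-in classes (Timestamp, TimestampSeries and BoolSeries wrapping plain lists), not the pandas-stubs definitions:

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, overload


class BoolSeries:
    """Hypothetical stand-in for a boolean Series."""

    def __init__(self, values: list[bool]) -> None:
        self.values = values


class TimestampSeries:
    """Hypothetical stand-in for a Series of timestamps."""

    def __init__(self, values: list[datetime]) -> None:
        self.values = values

    def __ge__(self, other: datetime) -> BoolSeries:
        return BoolSeries([v >= other for v in self.values])


class Timestamp:
    """Hypothetical stand-in for pd.Timestamp wrapping a datetime."""

    def __init__(self, value: datetime) -> None:
        self.value = value

    @overload
    def __le__(self, other: Timestamp) -> bool: ...
    @overload
    def __le__(self, other: TimestampSeries) -> BoolSeries: ...
    def __le__(self, other: Any) -> Any:
        if isinstance(other, Timestamp):
            return self.value <= other.value
        if isinstance(other, TimestampSeries):
            # ts <= s is the same as s >= ts, element by element.
            return other >= self.value
        return NotImplemented
```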
I am aware that this repo is "born" from the existing stubs, but it seems like there are some stubs that live in the pylance bundle (I have no idea where on GitHub, though) that we can probably add in?
For example, the entire io folder does not exist in this repo.
An example is Styler, for which the to_html method does not exist in the Pylance bundle.
Path on my machine:
~\.vscode\extensions\ms-python.vscode-pylance-2022.6.30\dist\bundled\stubs\pandas\io\formats\style.pyi
Apologies if there is a misunderstanding on my side—also interested in contributing, as I use pandas daily.
import pandas as pd
import numpy as np
indices = np.array([0,1,2,3], dtype=np.int64)
values_s = pd.Series(np.arange(10),name="a")
reveal_type(values_s.iloc[indices])
The revealed type is incorrect; it should be Series[Any]:
note: Revealed type is "Any"
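A sketch of overloads that route an integer ndarray explicitly to the collection-returning variant, so an ndarray argument does not end up typed as Any. The indexer class is a hypothetical stand-in backed by a list (a returned list plays the role of the Series); it assumes NumPy is installed:

```python
from __future__ import annotations

from typing import Any, Union, overload

import numpy as np
import numpy.typing as npt


class ILocSketch:
    """Hypothetical stand-in for a Series' iloc indexer, backed by a list."""

    def __init__(self, values: list[Any]) -> None:
        self._values = values

    # A single position returns a single element.
    @overload
    def __getitem__(self, idx: int) -> Any: ...
    # Slices, lists and integer arrays return a collection (a Series in pandas).
    @overload
    def __getitem__(
        self, idx: Union[slice, list[int], npt.NDArray[np.integer[Any]]]
    ) -> list[Any]: ...
    def __getitem__(self, idx: Any) -> Any:
        if isinstance(idx, (int, np.integer)):
            return self._values[int(idx)]
        if isinstance(idx, slice):
            return self._values[idx]
        return [self._values[int(i)] for i in idx]
```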
https://github.com/VirtusLab/pandas-stubs suggests installing the new stubs with
pip install git+https://github.com/pandas-dev/pandas-stubs
But I get the following error. Thanks!
pip install "git+https://github.com/pandas-dev/pandas-stubs"
Collecting git+https://github.com/pandas-dev/pandas-stubs
Cloning https://github.com/pandas-dev/pandas-stubs to /tmp/pip-req-build-9xsaj4f8
Running command git clone --filter=blob:none --quiet https://github.com/pandas-dev/pandas-stubs /tmp/pip-req-build-9xsaj4f8
Resolved https://github.com/pandas-dev/pandas-stubs to commit 2eba4d4e512421927a4ba2e6d0ac7bbd4e934afe
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [4 lines of output]
/tmp/pip-build-env-02ltw3m2/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:459: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
warnings.warn(msg, warning_class)
running egg_info
error: error in 'egg_base' option: 'src.tmp' does not exist or is not a directory
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
I'm a bit surprised that you would rename the top-level directory of the stub files to something other than pandas. Surely it should reflect the package/main module name?
(This change also broke our github actions to pull stubs from here, but that's our problem. In investigating the break I noticed this change and it seems wrong to me to use this name. Want to raise this ASAP in case it results in further changes).
import pandas as pd
from typing import Sequence
def f(i: Sequence[str]):
    return pd.Series([1,2,3],index=i)
The error is
exp.py: note: In function "f":
exp.py:4: error: Function is missing a return type annotation
exp.py:5: error: No overload variant of "Series" matches argument types "List[int]", "Sequence[str]"
exp.py:5: note: Possible overload variants:
exp.py:5: note: def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: DatetimeIndex, index: Union[str, int, Series[Any], List[Any], Index] = ..., dtype: Any = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> TimestampSeries
exp.py:5: note: def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: object, dtype: Type[S1], index: Union[str, int, Series[Any], List[Any], Index] = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> Series[S1]
exp.py:5: note: def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: object = ..., index: Union[str, int, Series[Any], List[Any], Index] = ..., dtype: Any = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> Series[Any]
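At runtime the index argument takes other label sequences besides a list, such as a tuple or a range. A check (assumes pandas is installed):

```python
import pandas as pd

# Non-list sequences of labels are accepted as an index.
from_tuple = pd.Series([1, 2, 3], index=("x", "y", "z"))
from_range = pd.Series([1, 2, 3], index=range(10, 13))
```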
Pandas doesn't define TimedeltaSeries and TimestampSeries:
pandas-stubs/tests/test_timefuncs.py, line 12 in b4db0a6
poetry mypy_src fails in a clean environment because that step doesn't include the create_mypy_pkg_file step.
@BrenoJesusFernandes
The current workaround is to run poetry test_src.
Hi,
This is a duplicate of VirtusLab/pandas-stubs#53. I'm opening it here because the repository moved.
I updated pandas-stubs from 1.1.0.7 to 1.1.0.11 and wanted to see what specifically changed. I came here hoping to find a tag or release, but cannot find either. It would really help with discoverability and usage if the repository tags (at minimum; releases would be nice too) reflected the release versions on PyPI, and vice versa.
Index should accept Sequence[Any] or similar. Sequence[Hashable] would probably be as accurate as possible.
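One caveat with Sequence[Hashable]: a str is itself a Sequence[str], yet Index rejects a bare string at runtime, so that annotation alone would also accept a call that fails. A check (assumes pandas is installed):

```python
import collections.abc

import pandas as pd

# A str is itself a Sequence[str], and therefore a Sequence[Hashable] ...
str_is_sequence = isinstance("abc", collections.abc.Sequence)

# ... but Index rejects a bare string at runtime.
try:
    pd.Index("abc")
    rejected = False
except TypeError:
    rejected = True

labels = pd.Index(("a", "b", "c"))  # a tuple of labels is accepted
```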
mypy 0.961
pandas-stubs 1.4.2.220626
df: pandas.DataFrame
df.ffill()
df.bfill()
output:
error: All overload variants of "ffill" of "DataFrame" require at least one argument [call-overload]
note: Possible overload variants:
note: def ffill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[True], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> None
note: def ffill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[False], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> DataFrame
error: All overload variants of "bfill" of "DataFrame" require at least one argument [call-overload]
note: def bfill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[True], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> None
note: def bfill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[False], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> DataFrame
From the documentation:
inplace bool, default False
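Given that inplace defaults to False, one fix is to give inplace a default in the DataFrame-returning overload, so that a bare df.ffill() matches it. A sketch on a hypothetical list-backed Frame class (axis, limit and downcast are omitted for brevity):

```python
from __future__ import annotations

from typing import Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for DataFrame, backed by a list of rows."""

    def __init__(self, rows: list[list[Optional[float]]]) -> None:
        self.rows = rows

    # inplace=True must be passed explicitly and returns None.
    @overload
    def ffill(self, *, inplace: Literal[True]) -> None: ...
    # inplace defaults to False, so a call with no arguments matches this variant.
    @overload
    def ffill(self, *, inplace: Literal[False] = ...) -> Frame: ...
    def ffill(self, *, inplace: bool = False) -> Optional[Frame]:
        filled: list[list[Optional[float]]] = []
        last: list[Optional[float]] = [None] * (len(self.rows[0]) if self.rows else 0)
        for row in self.rows:
            # Replace each missing value with the last non-missing one above it.
            new_row = [v if v is not None else last[i] for i, v in enumerate(row)]
            last = new_row
            filled.append(new_row)
        if inplace:
            self.rows = filled
            return None
        return Frame(filled)
```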
xref #46850
pandas-stubs/pandas-stubs/core/frame.pyi, line 958 in f2b783c
The type of other should include list[Series | DataFrame].
Easy to reproduce using any DataFrame.
for col in df: # col is Hashable
    df[col] # mypy error
Hi,
The latest version of pandas-stubs (1.4.2.220626) depends on Pandas version 1.4.2. This prevents updating to Pandas 1.4.3 if we want to use the latest version of pandas-stubs.
ERROR: Cannot install -r requirements.txt (line 22) and pandas==1.4.3 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested pandas==1.4.3
pandas-stubs 1.4.2.220626 depends on pandas==1.4.2
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
Need to add a type for DatetimeIndex.isocalendar()
Example:
dates = pd.date_range(start = '2012-01-01', end = '2019-12-31', freq = 'W-MON')
dates.isocalendar()
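For reference, this is the runtime result a DatetimeIndex.isocalendar() stub would need to describe: a DataFrame with year, week and day columns, indexed by the original dates (assumes pandas is installed):

```python
import pandas as pd

dates = pd.date_range(start="2012-01-01", end="2019-12-31", freq="W-MON")

# One row per date; columns hold the ISO year, ISO week number and ISO weekday.
iso = dates.isocalendar()
```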
@BrenoJesusFernandes
Using poetry, the file my_paths.pth gets created in site-packages in the current environment.
Can that be avoided?
If you use the environment for something else, it still points to the directory you were working in for pandas-stubs.
Should accept anything that supports int, which would include all scalar Numpy integer and unsigned integer types.
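The report does not name the function involved, so the sketch below uses a hypothetical take_position. If the intent is integer-like values, typing.SupportsIndex may be a closer fit than SupportsInt: NumPy integer scalars implement __index__, while float implements __int__ but not __index__.

```python
from __future__ import annotations

import operator
from typing import SupportsIndex


def take_position(values: list[str], pos: SupportsIndex) -> str:
    """Hypothetical API: accepts int, bool, NumPy integer scalars, or any
    other object implementing __index__, and normalises it to an int."""
    return values[operator.index(pos)]


class Position:
    """Minimal user-defined type that opts in by implementing __index__."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __index__(self) -> int:
        return self.n


# A float implements __int__ but not __index__, so operator.index rejects it.
try:
    take_position(["a", "b"], 1.0)  # type: ignore[arg-type]
    float_accepted = True
except TypeError:
    float_accepted = False
```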