pandas-stubs's People

Contributors

abizzinotto, adamchainz, aftabby, aholmes, amotzop, anilbey, azureblade3808, bashtage, breno-jesus-fernandes, danielroseman, daverball, diepala, dr-irv, eford36, gandhis1, hamdanal, hauntsaninja, lemeteore, matheusfelipeog, mutricyl, paw-lu, phofl, ramvikrams, skatsuta, taoufik07, tdsmith, teshao, tmke8, twoertwein, wakabame

pandas-stubs's Issues

read_xml is missing

See #54 (comment)

>>> import pandas as pd
>>> pd.read_xml('path/to/file', xpath=xpath, stylesheet=xsl)

reports

error: Module has no attribute "read_xml"; maybe "read_html" or "read_excel"?
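To show that read_xml is a real runtime function (it was added in pandas 1.3), here is a minimal sketch that parses a small inline document. The XML content is made up for illustration, and it assumes the standard-library parser (parser="etree") so that lxml is not required; note that the stylesheet= argument used in the report does need lxml.

```python
import io

import pandas as pd

# Made-up XML; each <row> element becomes one DataFrame row.
xml = """<?xml version="1.0"?>
<data>
  <row><a>1</a><b>x</b></row>
  <row><a>2</a><b>y</b></row>
</data>"""

# parser="etree" uses xml.etree.ElementTree from the standard library.
df = pd.read_xml(io.StringIO(xml), parser="etree")
print(df)
```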

BUG: Argument "index" to "Series" has incompatible type "Optional[Index]"; expected "Union[str, int, Series[Any], List[Any], Index]"

Describe the bug
The index argument of the Series constructor can accept None.

To Reproduce

from __future__ import annotations

import pandas as pd

def f(idx: pd.Index | None) -> pd.Series:
    return pd.Series([1,2,3],index=idx)

error is

error: Argument "index" to "Series" has incompatible type "Optional[Index]"; expected "Union[str, int, Series[Any], List[Any], Index]"

Please complete the following information:

  • OS: Windows
  • OS version: 11
  • Python version: 3.9
  • Type checker: mypy 0.961
  • pandas-stubs version: 1.4.3.220704
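The runtime half of this report is easy to confirm: passing index=None is accepted and pandas falls back to a default RangeIndex. A minimal sketch reusing the reporter's function f:

```python
from __future__ import annotations

import pandas as pd


def f(idx: pd.Index | None) -> pd.Series:
    # index=None is valid at runtime: pandas then builds a default RangeIndex.
    return pd.Series([1, 2, 3], index=idx)


print(f(None).index.tolist())
print(f(pd.Index(["a", "b", "c"])).index.tolist())
```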

Conda distribution

@Dr-Irv Could you please tell me what the process is for publishing the package on conda? I finished the CD process that automatically publishes each new version to GitHub and PyPI, but I'm not sure whether the conda process can be automated.

BUG: No overload variant of "quantile" of "DataFrame" matches argument type "ndarray[Any, dtype[floating[Any]]]"

DataFrame.quantile should accept Sequence[float] rather than just List[float].

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(100.0)})
df.quantile(np.array([0.25, 0.75]))

Error is

error: No overload variant of "quantile" of "DataFrame" matches argument type "ndarray[Any, dtype[Any]]"
note: Possible overload variants:
note:     def quantile(self, q: float = ..., axis: Literal['columns', 'index', 0, 1] = ..., numeric_only: bool = ..., interpolation: Union[str, Literal['linear', 'lower', 'higher', 'midpoint', 'nearest']] = ...) -> Series[Any]      
note:     def quantile(self, q: List[float], axis: Literal['columns', 'index', 0, 1] = ..., numeric_only: bool = ..., interpolation: Union[str, Literal['linear', 'lower', 'higher', 'midpoint', 'nearest']] = ...) -> DataFrame 
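At runtime, DataFrame.quantile already accepts a NumPy array of quantiles and returns a DataFrame indexed by those quantiles. The sketch below checks that; converting with .tolist() is one possible workaround until the stubs accept Sequence[float] (an assumption about the checker, since ndarray.tolist() is typed loosely in NumPy's stubs).

```python
import numpy as np
import pandas as pd

# One column "a" holding 0.0, 1.0, ..., 99.0.
df = pd.DataFrame({"a": np.arange(100.0)})

q = np.array([0.25, 0.75])
result = df.quantile(q)  # works at runtime; returns a DataFrame indexed by q
print(result)

# Possible type-checker workaround: pass a plain list instead of the array.
result_list = df.quantile(q.tolist())
```

With the default linear interpolation, the 0.25 quantile of 0..99 lands at position 0.25 * 99 = 24.75, so that is also the value.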

BUG: No overload variant of "__getitem__" of "_iLocIndexerSeries" matches argument type "ndarray[Any, dtype[signedinteger[_64Bit]]]"

import pandas as pd
import numpy as np

indices = np.array([0,1,2,3], dtype=np.int64)
values_s = pd.Series(np.arange(10),name="a")
values_s.iloc[indices]

The error is

 error: No overload variant of "__getitem__" of "_iLocIndexerSeries" matches argument type "ndarray[Any, dtype[signedinteger[_64Bit]]]"
note: Possible overload variants:
note:     def __getitem__(self, int) -> Any
note:     def __getitem__(self, Union[Index, slice]) -> Series[Any]
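For comparison, the same positional selection runs fine at runtime and yields a Series, which is what a third overload accepting an integer array would need to describe:

```python
import numpy as np
import pandas as pd

indices = np.array([0, 1, 2, 3], dtype=np.int64)
values_s = pd.Series(np.arange(10), name="a")

# Positional selection with an integer ndarray returns a Series at runtime.
subset = values_s.iloc[indices]
print(type(subset).__name__, subset.tolist())
```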

Test the updated GitHub action

My pyright test action, which is forked from Jake Bailey's but adds the warn on partial option, just got updated with the latest changes from Jake's, which are quite extensive. It would be good to kick off a test run of the stubs (I don't have permission). If things aren't working please open an issue at https://github.com/gramster/pyright-action

Proposal: One file to all dependencies and configs of the project

We already use pyproject.toml to store some of the project's configuration, so why don't we use poetry to manage and publish the project?

Here's the deal. Mypy, Pyright and pytest can all store their configuration in pyproject.toml. We could run them through poetry, split out the development dependencies, and publish the whole project. It would make contributing easier and cleaner for everyone.

Download the project -> poetry install -> make code -> poetry run tests -> poetry publish

CI: Making the CI faster

Compared to pandas, the CI is amazingly fast :) but it could be faster. Installing the dependencies takes longer than any other CI step. It is especially slow on Windows (1min56), faster on macOS (1min11) and fastest on Ubuntu (49s).

I think there are at least two approaches:

  • Caching: can probably be achieved easily in combination with the poetry GitHub actions install-poetry or actions-poetry.
  • poetry.lock: If we are fine with all developers using the same version of packages, we could simply commit poetry.lock to the repository. This makes poetry significantly faster as it doesn't need to resolve which packages/versions are compatible with each other. We would need to update poetry.lock every now and then (for example when there is a new mypy version).

Resolve issues with ambiguity in typed Series

pyright and mypy have issues in dealing with overloads on generics that are ambiguous, as pointed out by numpy. See the following:

The issue comes into play with subtraction. For pandas, we'd like an untyped Series to remain untyped, but subtracting two Series[Timestamp] to yield Series[Timedelta]. This doesn't seem possible. Right now, the current stubs return Series[Timestamp] when subtracting two untyped Series, but do return Series[Timedelta] when subtracting two Series[Timestamp]. I discovered this by using the new assert_type() feature.

In the current stubs from MS copied here, we use Series[bool], Series[Timestamp], Series[Timedelta], Series[float] and Series[int] as arguments and/or return types of different methods, which sharpens up some of the type checks.

Possible solutions:

  1. Give up on using generic Series.
  2. For typing purposes, create BoolSeries, TimestampSeries, TimedeltaSeries, FloatSeries and IntSeries that are typing subclasses of Series that can help with series that have types and those that don't.

Need to experiment with (2), and if it can't work, just remove all the generic stuff.
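For reference, here is the runtime behavior that a typed subtraction should describe. The sketch only checks dtypes (kind "M" is datetime64, kind "m" is timedelta64) and does not settle which stub design, generic Series or subclasses such as TimestampSeries, is preferable. The dates are arbitrary.

```python
import pandas as pd

# Two datetime64 Series: the runtime counterpart of Series[Timestamp].
start = pd.Series(pd.to_datetime(["2022-01-01", "2022-03-01"]))
end = pd.Series(pd.to_datetime(["2022-01-08", "2022-03-02"]))

delta = end - start
# dtype.kind: "M" = datetime64, "m" = timedelta64.
print(start.dtype.kind, delta.dtype.kind)
print(delta.tolist())
```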

BUG: Unsupported operand types for DatetimeIndex, Timestamp and datetime

Minimal working file for pd.Timestamp:

# ––– file ts_bug.py
import pandas as pd

pd.date_range('2000-01-01', '2000-01-10') < pd.Timestamp('2000-01-05')

Usage:

$ python ts_bug.py  # → 👍
$ mypy ts_bug.py 
ts_bug.py:4: error: Unsupported operand types for > ("Timestamp" and "DatetimeIndex")
Found 1 error in 1 file (checked 1 source file)

Minimal working file for datetime.datetime:

# ––– file dt_bug.py
import datetime as dt
import pandas as pd

pd.date_range('2000-01-01', '2000-01-10') < dt.datetime(2000, 1, 5)

Usage:

$ python dt_bug.py  # → 👍
$ mypy dt_bug.py 
dt_bug.py:5: error: Unsupported operand types for > ("datetime" and "DatetimeIndex")
Found 1 error in 1 file (checked 1 source file)

Versions:

$ python --version
Python 3.10.5
$ python -m pip freeze | grep 'pandas\|mypy'
mypy==0.961
mypy-extensions==0.4.3
pandas==1.4.3
pandas-stubs==1.4.3.220702
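Both comparisons from the report run at runtime and return a boolean NumPy array with one entry per date, which is the result type the stubs would need to allow for DatetimeIndex compared with Timestamp or datetime:

```python
import datetime as dt

import pandas as pd

idx = pd.date_range("2000-01-01", "2000-01-10")

# Element-wise comparisons; each returns a boolean numpy array of length 10.
mask_ts = idx < pd.Timestamp("2000-01-05")
mask_dt = idx < dt.datetime(2000, 1, 5)
print(mask_ts.sum(), mask_dt.sum())
```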

BUG: `to_datetime` can return `NaTType`

import pandas as pd
out = pd.to_datetime("dajsldaljd", errors="coerce")
reveal_type(out)

returns the wrong type:

note: Revealed type is "pandas._libs.tslibs.timestamps.Timestamp"

Also, mypy on

import pandas as pd
out = pd.to_datetime("dajsldaljd", errors="coerce")
out is pd.NaT

returns

error: Non-overlapping identity check (left operand type: "Timestamp", right operand type: "NaTType")
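With errors="coerce", unparseable input yields the NaT singleton rather than a Timestamp, so callers typically need an identity check against pd.NaT. A minimal sketch; parse_or_none is a hypothetical helper, not a pandas function:

```python
from __future__ import annotations

import pandas as pd


def parse_or_none(text: str) -> pd.Timestamp | None:
    """Hypothetical helper: parse a date string, mapping bad input to None."""
    out = pd.to_datetime(text, errors="coerce")
    # Bad input gives pd.NaT (type NaTType), which is not a Timestamp.
    if out is pd.NaT:
        return None
    return out


print(parse_or_none("2022-07-04"), parse_or_none("dajsldaljd"))
```

Once the stubs include NaTType in the return type, the `is pd.NaT` check is what lets a checker narrow the result back to Timestamp.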

Type is not inferred when iterating over `df.columns`

I don't know whether only pyright's basic mode is currently targeted. If not, it would be nice if this type could be inferred since, as far as I know, it should always be str.

# pyright: strict
import pandas as pd

df = pd.DataFrame()

for name in df.columns:  # Type of "name" is unknown Pylance(reportUnknownVariableType)
    print(name)
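A quick runtime check suggests that column labels are not always str: a DataFrame built without explicit column names gets integer labels. That may be why a plain str element type could be too narrow for df.columns in general.

```python
import pandas as pd

# Without explicit names, pandas assigns integer column labels 0, 1, ...
df_default = pd.DataFrame([[1, 2]])
print(list(df_default.columns))

# With a dict, the (str) keys become the column labels.
df_named = pd.DataFrame({"x": [1], "y": [2]})
for name in df_named.columns:
    print(type(name).__name__, name)
```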

usecols in read_csv should accept type List[str]

Reproducible example

  1. Create a file named bug.py with the following code:
>>> import pandas as pd
>>> pd.read_csv('/path/to/file', usecols=['col1', 'col2'])
  2. Run mypy:
 mypy --config mypy.ini bug.py

Result:

bug.py:2: error: No overload variant of "read_csv" matches argument types "str", "List[str]"

Also, PyCharm displays this message:

Expected type 'Union[ExtensionArray, ndarray, (Any) -> Any, None]', got 'list[str]' instead 

Expected behaviour

usecols should accept List[str]
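The expected behavior matches what pandas does at runtime. A minimal sketch, with an inline CSV standing in for '/path/to/file' (the data is made up):

```python
import io

import pandas as pd

# Made-up CSV content in place of a real file.
csv_text = "col1,col2,col3\n1,2,3\n4,5,6\n"

# usecols accepts a list of column names at runtime.
df = pd.read_csv(io.StringIO(csv_text), usecols=["col1", "col2"])
print(df)
```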

Installed Version

pandas-stubs version: 1.2.0.62
mypy: 0.961

>>> import distutils
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python           : 3.10.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.5.0
Version          : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : fr_FR.UTF-8

pandas           : 1.3.5
numpy            : 1.23.0
pytz             : 2022.1
dateutil         : 2.8.2
pip              : 22.1.2
setuptools       : 62.6.0
Cython           : None
pytest           : 7.1.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.9.3 (dt dec pq3 ext lo64)
jinja2           : 3.1.2
IPython          : 8.4.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : 1.4.39
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

poetry test_dist destroys uncommitted changes

When testing the distribution with poetry run poe test_dist, any uncommitted changes (including new files) are lost, since we remove the pandas-stubs directory to check things.

Probably best to not test the dist if there are uncommitted changes (or new files) in the pandas-stubs folder, and output an error message asking if the developer wants to commit.

@BrenoJesusFernandes
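One possible shape for the proposed guard, sketched in Python under the assumption that it runs before test_dist deletes anything. The function names uncommitted_paths and ensure_clean are hypothetical and not part of the project's poe tasks; the only external command used is the standard `git status --porcelain`, which also lists untracked files.

```python
from __future__ import annotations

import subprocess
import sys


def uncommitted_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is 'XY <path>'; untracked files have XY == '??'.
    Paths containing special characters are quoted by git and are not unquoted here.
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def ensure_clean(folder: str = "pandas-stubs") -> None:
    """Hypothetical guard: stop if `folder` has uncommitted or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", folder],
        capture_output=True,
        text=True,
        check=True,
    )
    dirty = uncommitted_paths(result.stdout)
    if dirty:
        print(f"Uncommitted changes in {folder}/:", *dirty, sep="\n  ", file=sys.stderr)
        sys.exit("Commit or stash these changes before running test_dist.")
```

Calling ensure_clean() first thing in the test_dist task would turn silent data loss into an error message.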

Index.difference should allow for None values in the "other" list

Currently the type annotation for Index.difference is

def difference(self, other: Union[List[T1], Index]) -> Index: ...

None values are disallowed for the other argument if it is a list. As a result of that the following code, which runs perfectly fine, will give an error when mypy is used to check it:

from pandas.core.indexes.api import Index

index = Index(["a", "b", "c"])
index.difference(["a", None])

I think the function type should be changed to:

def difference(self, other: Union[List[Optional[T1]], Index]) -> Index: ...
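The runtime behavior supports the proposed annotation: None in the other list is accepted and simply matches nothing in this index.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

# None is allowed in `other`; only "a" is removed here.
result = index.difference(["a", None])
print(result.tolist())
```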

Issue/PR labels

I think a lot of the pandas labels would make sense here, but not all. I propose going through them and adding any labels that make sense.

cc @Dr-Irv

DataFrame.loc not taking an integer as first index

import pandas as pd

df = pd.DataFrame({"x": [1,2,3]})
val = df.loc[0, "x"]

reports

Argument of type "tuple[Literal[0], Literal['x']]" cannot be assigned to parameter "idx" of type "Tuple[Tuple[slice, ...], StrLike]" in function "__getitem__"
  Tuple entry 1 is incorrect type
    "Literal[0]" is incompatible with "Tuple[slice, ...]"

Visible using Python 3.9, not 3.7. I didn't check 3.8.
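At runtime the lookup is a plain label-based scalar access (row label 0 of the default RangeIndex, column "x"). df.at, pandas' accessor for a single row/column label pair, returns the same value:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Scalar lookup by (row label, column label).
val = df.loc[0, "x"]
# df.at is the dedicated accessor for one row/column label pair.
val_at = df.at[0, "x"]
print(val, val_at)
```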

Operator "<=" not supported for types "Timestamp" and "TimestampSeries"

Hi all,
I raised an issue over at pylance:
microsoft/pylance-release#2933
There has been an update, but the error now mentions TimestampSeries instead of Series[Timestamp].
To replicate:

import pandas as pd

df =  pd.DataFrame(['2020-01-01','2019-01-01'])
dates = pd.to_datetime(df[0], format = '%Y-%m-%d')
date_to_compare = pd.to_datetime('2019-02-01', format = '%Y-%m-%d')
date_mask = date_to_compare <= dates

Generated the type error in vs code:

Operator "<=" not supported for types "Timestamp" and "TimestampSeries"
Pylance(reportGeneralTypeIssues)

I would like to create a PR to help, but I have only just started figuring out how type stubs work, so it might take some time :-)
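At runtime the comparison is element-wise and produces a boolean Series. Writing it with the Series on the left gives the same result; whether that form avoids the checker error depends on the installed stubs and is not verified here.

```python
import pandas as pd

df = pd.DataFrame(["2020-01-01", "2019-01-01"])
dates = pd.to_datetime(df[0], format="%Y-%m-%d")
date_to_compare = pd.to_datetime("2019-02-01", format="%Y-%m-%d")

# Timestamp <= Series: element-wise, returns a boolean Series.
date_mask = date_to_compare <= dates
print(date_mask.tolist())

# Equivalent at runtime, with the Series as the left operand.
date_mask_flipped = dates >= date_to_compare
```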

Integrate Unused Microsoft Stubs

I am aware that this repo is "born" from the existing stubs, but it seems like there are some stubs that live in the pylance bundle (I have no idea where on GitHub, though) that we can probably add in?

For example, the entire io folder does not exist in this repo.

One example is Styler, whose to_html method does not exist in the pylance bundle.

Path on my machine:
~\.vscode\extensions\ms-python.vscode-pylance-2022.6.30\dist\bundled\stubs\pandas\io\formats\style.pyi


Apologies if there is a misunderstanding on my side—also interested in contributing, as I use pandas daily.

BUG: Series.iloc returns type `Any`

import pandas as pd
import numpy as np

indices = np.array([0,1,2,3], dtype=np.int64)
values_s = pd.Series(np.arange(10),name="a")
reveal_type(values_s.iloc[indices])

The revealed type is wrong; it should be Series[Any].

note: Revealed type is "Any"
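Until the stubs return Series for array indexers, one common workaround is typing.cast, which only changes what the type checker sees (at runtime it returns its argument unchanged):

```python
from typing import cast

import numpy as np
import pandas as pd

indices = np.array([0, 1, 2, 3], dtype=np.int64)
values_s = pd.Series(np.arange(10), name="a")

# cast() narrows the checker's view from Any to Series; no runtime conversion happens.
subset = cast(pd.Series, values_s.iloc[indices])
print(subset.tolist())
```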

error in 'egg_base' option: 'src.tmp' does not exist or is not a directory

https://github.com/VirtusLab/pandas-stubs suggests to install the new stubs with
pip install git+https://github.com/pandas-dev/pandas-stubs

But I get the following error. Thanks!

pip install "git+https://github.com/pandas-dev/pandas-stubs"
Collecting git+https://github.com/pandas-dev/pandas-stubs
  Cloning https://github.com/pandas-dev/pandas-stubs to /tmp/pip-req-build-9xsaj4f8
  Running command git clone --filter=blob:none --quiet https://github.com/pandas-dev/pandas-stubs /tmp/pip-req-build-9xsaj4f8
  Resolved https://github.com/pandas-dev/pandas-stubs to commit 2eba4d4e512421927a4ba2e6d0ac7bbd4e934afe
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      /tmp/pip-build-env-02ltw3m2/overlay/lib/python3.10/site-packages/setuptools/config/setupcfg.py:459: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
        warnings.warn(msg, warning_class)
      running egg_info
      error: error in 'egg_base' option: 'src.tmp' does not exist or is not a directory
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Why the new location?

I'm a bit surprised that you would rename the top-level directory of the stub files to something other than pandas. Surely it should reflect the package/main module name?

(This change also broke our GitHub actions that pull stubs from here, but that's our problem. While investigating the breakage I noticed this change, and using this name seems wrong to me. I want to raise this ASAP in case it leads to further changes.)

BUG: Series `index` parameter should accept `Sequence[str]` (or, more generally, `Sequence[Optional[Hashable]]`)

import pandas as pd
from typing import Sequence

def f(i: Sequence[str]):
    return pd.Series([1,2,3],index=i)

Error is

exp.py: note: In function "f":
exp.py:4: error: Function is missing a return type annotation
exp.py:5: error: No overload variant of "Series" matches argument types "List[int]", "Sequence[str]"
exp.py:5: note: Possible overload variants:
exp.py:5: note:     def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: DatetimeIndex, index: Union[str, int, Series[Any], List[Any], Index] = ..., dtype: Any = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> TimestampSeries
exp.py:5: note:     def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: object, dtype: Type[S1], index: Union[str, int, Series[Any], List[Any], Index] = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> Series[S1]
exp.py:5: note:     def [S1 in (str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta, datetime64)] Series(cls, data: object = ..., index: Union[str, int, Series[Any], List[Any], Index] = ..., dtype: Any = ..., name: Optional[Hashable] = ..., copy: bool = ..., fastpath: bool = ...) -> Series[Any]
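A tuple is a Sequence[str] but not a List, and pandas accepts it as an index at runtime, which is the case the current List-only annotation rejects:

```python
from typing import Sequence

import pandas as pd


def f(i: Sequence[str]) -> pd.Series:
    return pd.Series([1, 2, 3], index=i)


# Tuple input: a Sequence[str] that is not a list.
s = f(("a", "b", "c"))
print(s.index.tolist())
```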

Repository tags and releases do not reflect PyPi versions

Hi,
This is a duplicate of VirtusLab/pandas-stubs#53. Opening here because the repository moved.

I updated pandas-stubs from 1.1.0.7 to 1.1.0.11 and wanted to see what specifically changed. I came here hoping to find a tag or release, but could not find either. It would really help discoverability and usage if the repository tags (at minimum; releases would be nice too) matched the release versions on PyPI, and vice versa.

inplace argument should be optional in ffill() and bfill()

mypy 0.961
pandas-stubs 1.4.2.220626

df: pandas.DataFrame
df.ffill()
df.bfill()

output:

error: All overload variants of "ffill" of "DataFrame" require at least one argument  [call-overload]
note: Possible overload variants:
note:     def ffill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[True], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> None
note:     def ffill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[False], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> DataFrame
error: All overload variants of "bfill" of "DataFrame" require at least one argument  [call-overload]
note:     def bfill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[True], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> None
note:     def bfill(self, axis: Optional[Literal['columns', 'index', 0, 1]] = ..., *, inplace: Literal[False], limit: Optional[int] = ..., downcast: Optional[Dict[Any, Any]] = ...) -> DataFrame

From the documentation:

inplace bool, default False
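A sketch of the overload shape that would match the documented default: the overload returning a new object gets a default for inplace, so a call with no arguments resolves to it. Frame is a hypothetical stand-in class, not the actual pandas-stubs code; the third overload is an assumed fallback for a plain bool argument, and bfill would follow the same pattern.

```python
from __future__ import annotations

from typing import Literal, overload


class Frame:
    """Hypothetical stand-in for DataFrame, used only to show the overload shape."""

    @overload
    def ffill(self, *, inplace: Literal[True]) -> None: ...
    @overload
    def ffill(self, *, inplace: Literal[False] = ...) -> Frame: ...
    @overload
    def ffill(self, *, inplace: bool = ...) -> Frame | None: ...
    def ffill(self, *, inplace: bool = False) -> Frame | None:
        # Mirrors the documented default: inplace=False returns a new object.
        return None if inplace else Frame()
```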

Latest release of Pandas (1.4.3) cannot be used with the latest release of pandas-stubs (1.4.2.220626)

Hi,
The latest version of pandas-stubs (1.4.2.220626) depends on Pandas version 1.4.2. This prevents updating to Pandas 1.4.3 if we want to use the latest version of pandas-stubs.

ERROR: Cannot install -r requirements.txt (line 22) and pandas==1.4.3 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested pandas==1.4.3
    pandas-stubs 1.4.2.220626 depends on pandas==1.4.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

DatetimeIndex.isocalendar() is missing

Need to add a type for DatetimeIndex.isocalendar()

Example:

dates = pd.date_range(start = '2012-01-01', end = '2019-12-31', freq = 'W-MON')
dates.isocalendar()
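At runtime (pandas 1.1 and later), DatetimeIndex.isocalendar() returns a DataFrame indexed by the dates with ISO "year", "week" and "day" columns, so a DataFrame return type is what the stub would need:

```python
import pandas as pd

dates = pd.date_range(start="2012-01-01", end="2019-12-31", freq="W-MON")

# One row per date; columns are the ISO year, week number and weekday (Monday = 1).
iso = dates.isocalendar()
print(iso.head())
```

The first date here is Monday 2012-01-02, which falls in ISO week 1 of 2012.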

poetry creates my_paths.pth in environment - should get removed

@BrenoJesusFernandes

When using poetry, the file my_paths.pth gets created in site-packages of the current environment.

Can that be avoided?

If you then use the environment for something else, it still points to the directory you were working in for pandas-stubs.
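To check whether the stray file is present in the active environment, a small sketch can look in the site-packages directory reported by sysconfig. find_pth is a hypothetical helper name; it only checks the "purelib" directory.

```python
from __future__ import annotations

import sysconfig
from pathlib import Path


def find_pth(name: str = "my_paths.pth") -> Path | None:
    """Return the path of `name` in this environment's site-packages, if it exists."""
    # "purelib" is the site-packages directory for pure-Python packages.
    candidate = Path(sysconfig.get_paths()["purelib"]) / name
    return candidate if candidate.exists() else None


found = find_pth()
print(found or "my_paths.pth not present in this environment")
```

If it is found, deleting it (for example with `found.unlink()`) removes the path entry for that environment.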
