Comments (14)

sauloal commented on May 19, 2024

Hello @nils-braun
I would install with pip, but the changes for Python 3.7 are not available via pip yet and I was curious to try them.
I'll try installing the requirements.
Regards

from dask-sql.

nils-braun commented on May 19, 2024

The warnings are fine (they are deprecations in the pandas <-> dask interface). Thank you so much for testing all of this. I will increase the minimum required pandas version to 1.1.0 (which works, I just tested).
Thanks for testing 1.2.1. I tested with 1.2.0 some time ago and it failed, but I will repeat the tests.

sauloal commented on May 19, 2024

After running pip install -e ".[dev]", it works perfectly:

$ python dask-sql-test.py
       name    id         x
0       Tim  1017  0.999988
1   Norbert   994  0.999949
0     Frank   983  0.999970
0    Oliver   990  0.999987
1       Dan   979  0.999991
0   Michael  1021  0.999992
0     Quinn  1012  0.999973
1    Xavier   986  0.999925
0   Charlie   961  0.999986
1     Alice  1003  0.999981
0    Ingrid  1030  0.999991
0     Zelda  1050  0.999923
0     Sarah  1084  0.999987
0     Edith  1013  0.999989
0    Ursula  1015  0.999990
1  Patricia   974  0.999999
0     Jerry   994  0.999999
0     Wendy  1000  0.999990
0     Laura  1014  0.999986
0       Ray   975  0.999939
1    Hannah   940  0.999986
0    Yvonne  1033  0.999981
0       Bob   976  0.999978
0    George  1026  0.999993
1     Kevin   992  0.999981
0    Victor   983  0.999997
0.9999788188689883

nils-braun commented on May 19, 2024

Thanks again!
To run those commands, you still need to install two additional packages:

  • For the tests, you need pytest-cov, which measures test coverage (it can be installed via pip).
  • For the Java setup, you need to install mvn (Maven). How to do this depends on your setup, but you should be able to just download it (it is not a Python package).

Thanks for testing all of that. As you can see, dask-sql does not have very good support for non-conda installations (so far).
I will make the two additional installation steps clearer in the documentation.
If people install dask-sql directly via pip (not for development, but for production), these steps are not needed. You can have a look at the conda.yaml to see what is needed for development.
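
The two extra development prerequisites named in this comment could be checked before running the tests with something like the sketch below. It is only an illustration: the function name missing_dev_prerequisites is made up for this example and is not part of dask-sql, and it assumes that the Maven executable is called mvn and that pytest-cov is importable as pytest_cov.

```python
import importlib.util
import shutil


def missing_dev_prerequisites(executables=("mvn",), modules=("pytest_cov",)):
    """Return the names of required executables/modules that cannot be found.

    Hypothetical helper for illustration; not part of dask-sql.
    """
    missing = []
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(exe) is None:
            missing.append(exe)
    for mod in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(mod) is None:
            missing.append(mod)
    return missing


if __name__ == "__main__":
    missing = missing_dev_prerequisites()
    if missing:
        print("Missing development prerequisites:", ", ".join(missing))
    else:
        print("All development prerequisites found.")
```

Checking for the executable on PATH rather than for a specific install location keeps the sketch independent of how Maven was installed (apt, manual download, conda).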

nils-braun commented on May 19, 2024

I am planning to do a patch release soon, possibly within the next few days. Hopefully it will be easier after that!

nils-braun commented on May 19, 2024

Thanks again for continuing to test even after running into all those blockers. In the related PR, I have added a better error message for the error you mentioned, as well as some more documentation. After the PR is merged, you should be able to install the development requirements with

pip install -e ".[dev]"

sauloal commented on May 19, 2024

It got much further, but some tests still fail:

$ git clone https://github.com/nils-braun/dask-sql.git

$ cd dask-sql

$ sudo apt install maven

$ pip install pytest-cov

$ python setup.py java

$ pytest tests

=============================================== short test summary info ================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword ar...
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (col...
FAILED tests/integration/test_model.py::test_training_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
FAILED tests/integration/test_model.py::test_clustering_and_prediction - ValueError: Can not import model dask_ml.clu...
FAILED tests/integration/test_model.py::test_iterative_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
================================ 5 failed, 119 passed, 3 skipped, 8 warnings in 47.94s =================================

sauloal commented on May 19, 2024

dask-ml is also a requirement

$ pip install "dask[complete]"
$ pip install dask-ml
$ pytest tests

================================================================= short test summary info ==================================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword argument 'atol'
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="c") are different
================================================== 2 failed, 122 passed, 3 skipped, 6 warnings in 55.26s ===================================================

The first error seems to be caused by the pandas version:

$ python
Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
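
The TypeError reported here is consistent with pandas 1.0.5: the rtol and atol arguments of assert_frame_equal were added in pandas 1.1.0. A minimal sketch of how one could check for this at runtime is shown below, assuming only that the function exposes an inspectable signature. The helper name accepts_keyword is invented for this illustration and is not taken from dask-sql or pandas.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares a parameter called `name`.

    Hypothetical helper for illustration only.
    """
    return name in inspect.signature(func).parameters


if __name__ == "__main__":
    try:
        from pandas.testing import assert_frame_equal
    except ImportError:
        print("pandas is not installed")
    else:
        # Expected to be False on pandas 1.0.x and True on pandas >= 1.1
        print("atol supported:", accepts_keyword(assert_frame_equal, "atol"))
```

Inspecting the signature avoids comparing version strings, but raising the minimum pandas version, as discussed in this thread, is the simpler fix.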

nils-braun commented on May 19, 2024

Right! Just for your information, I have added pip install -e ".[dev]" in the newest version, so you do not need to figure out all the dev requirements on your own :-)
Are you still running with pandas version 1.1.5? (If you want, could you post your pip list?)

nils-braun commented on May 19, 2024

Ah, you edited your answer at the same moment I replied. Very good! So it seems there is some incompatibility. Could you try again with 1.1.5 if possible?

sauloal commented on May 19, 2024

just updated:

$ pip install --upgrade pandas
Collecting pandas
  Downloading pandas-1.2.1-cp37-cp37m-manylinux1_x86_64.whl (9.9 MB)
     |████████████████████████████████| 9.9 MB 5.5 MB/s
Requirement already satisfied, skipping upgrade: pytz>=2017.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2020.1)
Requirement already satisfied, skipping upgrade: numpy>=1.16.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (1.19.4)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied, skipping upgrade: six>=1.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
ERROR: dask-sql 0.3.0 has requirement pandas<1.2.0, but you'll have pandas 1.2.1 which is incompatible.
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.0.5
    Uninstalling pandas-1.0.5:
      Successfully uninstalled pandas-1.0.5
Successfully installed pandas-1.2.1

Despite the pip error, all tests passed, with some warnings:

===================================================================== warnings summary =====================================================================
tests/integration/test_rex.py::test_like
tests/integration/test_rex.py::test_date_functions
  /mnt/d/Programs/dask/dask-sql/dask_sql/context.py:201: DeprecationWarning: register_dask_table is deprecated, use the more general create_table instead.
    DeprecationWarning,

tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
  /home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arcsin
    result = getattr(ufunc, method)(*inputs, **kwargs)

tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
  /home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arccos
    result = getattr(ufunc, method)(*inputs, **kwargs)

tests/integration/test_rex.py::test_date_functions
  /home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:88: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.
    if callable(getattr(self._meta, key)):

tests/integration/test_rex.py::test_date_functions
tests/integration/test_rex.py::test_date_functions
  /home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:43: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.
    out = getattr(getattr(obj, accessor, obj), attr)

-- Docs: https://docs.pytest.org/en/latest/warnings.html

======================================================= 124 passed, 3 skipped, 9 warnings in 51.46s ========================================================

sauloal commented on May 19, 2024

Always a pleasure to help and to add a great tool to the toolbox :)
Cheers

nils-braun commented on May 19, 2024

I have opened PR #129, which fixes the tests for pandas 1.0 and 1.1. The requirement is now only >=1.0 and <1.2 (we still need the upper bound because of dask/dask#7156).
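
To illustrate what a >=1.0 and <1.2 requirement means for the pandas versions tried in this thread, here is a small sketch. It is a simplification: the helper names parse_version and in_supported_range are invented for this example, only the leading numeric components are compared, and pre-release or dev suffixes are ignored. It is not how pip or dask-sql evaluate the requirement.

```python
import re


def parse_version(version):
    """Extract the leading numeric components, e.g. '1.2.1' -> (1, 2, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_supported_range(version, lower=(1, 0), upper=(1, 2)):
    """Check lower <= version < upper, mirroring 'pandas>=1.0,<1.2'."""
    return lower <= parse_version(version) < upper


if __name__ == "__main__":
    # Versions that appear in this thread
    for candidate in ("1.0.5", "1.1.5", "1.2.1"):
        print(candidate, in_supported_range(candidate))
```

With these bounds, 1.0.5 and 1.1.5 fall inside the range and 1.2.1 falls outside it, which matches the pip message about pandas<1.2.0 quoted earlier in the thread.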

nils-braun commented on May 19, 2024

The upper pandas version requirement is gone now - the problem in dask is fixed.
