Comments (14)
Hello @nils-braun
I would install with pip, but the changes for Python 3.7 have not been released on pip yet and I was curious to try them.
I'll try installing the requirements.
Regards
from dask-sql.
The warnings are fine (they are deprecations in the pandas <-> dask interface). Thank you so much for testing this all out. I will increase the minimum required pandas version to 1.1.0 (which works, I just tested).
Thanks for testing 1.2.1 - I tested some time ago with 1.2.0 and it failed, but I will repeat the tests.
from dask-sql.
After running pip install -e ".[dev]", it works perfectly:
$ python dask-sql-test.py
name id x
0 Tim 1017 0.999988
1 Norbert 994 0.999949
0 Frank 983 0.999970
0 Oliver 990 0.999987
1 Dan 979 0.999991
0 Michael 1021 0.999992
0 Quinn 1012 0.999973
1 Xavier 986 0.999925
0 Charlie 961 0.999986
1 Alice 1003 0.999981
0 Ingrid 1030 0.999991
0 Zelda 1050 0.999923
0 Sarah 1084 0.999987
0 Edith 1013 0.999989
0 Ursula 1015 0.999990
1 Patricia 974 0.999999
0 Jerry 994 0.999999
0 Wendy 1000 0.999990
0 Laura 1014 0.999986
0 Ray 975 0.999939
1 Hannah 940 0.999986
0 Yvonne 1033 0.999981
0 Bob 976 0.999978
0 George 1026 0.999993
1 Kevin 992 0.999981
0 Victor 983 0.999997
0.9999788188689883
from dask-sql.
Thanks again!
To run those commands, you still want to install two additional packages:
- For the tests, it is pytest-cov, which measures test coverage automatically (it can be installed via pip).
- For the Java setup, you need to install mvn (Maven). How this is done depends on your setup, but I guess you should be able to just download it (it is not a Python package).
Thanks for testing all that out. As you see, dask-sql does not have very good support for non-conda installations (so far).
I will make the two additional installation steps more clear in the documentation.
If people install dask-sql directly via pip (not for development, but for production), these things do not need to be done. You can have a look into the conda.yaml to see what is needed for development.
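For anyone following along without conda, the two extra prerequisites mentioned above can be checked before running the build. This is only a sketch: the helper name `missing_dev_prerequisites` is made up for illustration, and it assumes Maven is exposed as `mvn` on the PATH and that pytest-cov is importable as `pytest_cov`.

```python
import importlib.util
import shutil


def missing_dev_prerequisites():
    """Return the names of development prerequisites that cannot be found."""
    missing = []
    # Maven is a standalone tool, not a Python package, so look for it on PATH.
    if shutil.which("mvn") is None:
        missing.append("maven (mvn)")
    # pytest-cov installs an importable module named pytest_cov.
    if importlib.util.find_spec("pytest_cov") is None:
        missing.append("pytest-cov")
    return missing


if __name__ == "__main__":
    missing = missing_dev_prerequisites()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All development prerequisites found.")
```

An empty list means both tools were found. It does not check their versions.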
from dask-sql.
I am planning to do a patch release soon, possibly within the next few days. Hopefully it will be easier after that!
from dask-sql.
Thanks again for continuing to test even after running into all those blockers. In the related PR, I have added a better error message for the error you mentioned, plus some more documentation. After the PR is merged, you should be able to install the development requirements with
pip install -e ".[dev]"
from dask-sql.
It got much further, but some tests still fail:
$ git clone https://github.com/nils-braun/dask-sql.git
$ cd dask-sql
$ sudo apt install maven
$ pip install pytest-cov
$ python setup.py java
$ pytest tests
=============================================== short test summary info ================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword ar...
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (col...
FAILED tests/integration/test_model.py::test_training_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
FAILED tests/integration/test_model.py::test_clustering_and_prediction - ValueError: Can not import model dask_ml.clu...
FAILED tests/integration/test_model.py::test_iterative_and_prediction - ModuleNotFoundError: No module named 'dask_ml'
================================ 5 failed, 119 passed, 3 skipped, 8 warnings in 47.94s =================================
from dask-sql.
dask-ml is also a requirement:
$ pip install "dask[complete]"
$ pip install dask-ml
$ pytest tests
================================================================= short test summary info ==================================================================
FAILED tests/integration/test_analyze.py::test_analyze - TypeError: assert_frame_equal() got an unexpected keyword argument 'atol'
FAILED tests/integration/test_groupby.py::test_group_by_nan - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="c") are different
================================================== 2 failed, 122 passed, 3 skipped, 6 warnings in 55.26s ===================================================
The first error seems to be caused by the pandas version:
$ python
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
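The TypeError above says the installed assert_frame_equal does not accept an atol keyword. One way to confirm this kind of mismatch without reading release notes is to inspect the function signature. The sketch below uses only the standard library. `supports_keyword` and the two stand-in functions are hypothetical names for illustration, not part of pandas or dask-sql.

```python
import inspect


def supports_keyword(func, name):
    """Return True if func accepts a keyword argument called name."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for an older and a newer signature (hypothetical, illustration only).
def old_style(left, right, check_less_precise=False):
    pass


def new_style(left, right, rtol=1e-5, atol=1e-8):
    pass


print(supports_keyword(old_style, "atol"))  # False
print(supports_keyword(new_style, "atol"))  # True
```

With pandas installed, the same helper could be pointed at pandas.testing.assert_frame_equal. Given the error message above, it would presumably report False on 1.0.5.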
from dask-sql.
Right! Just for your information, I have added pip install -e ".[dev]" in the newest version, so you do not need to find out all dev requirements on your own :-)
Are you still running with pandas version 1.1.5? (If you want, can you post your pip list?)
from dask-sql.
Ah, you edited your answer at the same moment I did. Very good! So it seems there is some incompatibility. Could you try with 1.1.5 again if possible?
from dask-sql.
Just updated:
$ pip install --upgrade pandas
Collecting pandas
Downloading pandas-1.2.1-cp37-cp37m-manylinux1_x86_64.whl (9.9 MB)
|████████████████████████████████| 9.9 MB 5.5 MB/s
Requirement already satisfied, skipping upgrade: pytz>=2017.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2020.1)
Requirement already satisfied, skipping upgrade: numpy>=1.16.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (1.19.4)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /home/saulo/anaconda3/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied, skipping upgrade: six>=1.5 in /home/saulo/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
ERROR: dask-sql 0.3.0 has requirement pandas<1.2.0, but you'll have pandas 1.2.1 which is incompatible.
Installing collected packages: pandas
Attempting uninstall: pandas
Found existing installation: pandas 1.0.5
Uninstalling pandas-1.0.5:
Successfully uninstalled pandas-1.0.5
Successfully installed pandas-1.2.1
Despite the error, the tests passed with some warnings:
===================================================================== warnings summary =====================================================================
tests/integration/test_rex.py::test_like
tests/integration/test_rex.py::test_date_functions
/mnt/d/Programs/dask/dask-sql/dask_sql/context.py:201: DeprecationWarning: register_dask_table is deprecated, use the more general create_table instead.
DeprecationWarning,
tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
/home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arcsin
result = getattr(ufunc, method)(*inputs, **kwargs)
tests/integration/test_rex.py::test_math_operations
tests/integration/test_rex.py::test_math_operations
/home/saulo/anaconda3/lib/python3.7/site-packages/pandas/core/arraylike.py:358: RuntimeWarning: invalid value encountered in arccos
result = getattr(ufunc, method)(*inputs, **kwargs)
tests/integration/test_rex.py::test_date_functions
/home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:88: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.
if callable(getattr(self._meta, key)):
tests/integration/test_rex.py::test_date_functions
tests/integration/test_rex.py::test_date_functions
/home/saulo/anaconda3/lib/python3.7/site-packages/dask/dataframe/accessor.py:43: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.
out = getattr(getattr(obj, accessor, obj), attr)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================================= 124 passed, 3 skipped, 9 warnings in 51.46s ========================================================
from dask-sql.
Always a pleasure to help and to get a great tool to the toolbox :)
Cheers
from dask-sql.
I have opened PR #129, which fixes the tests for pandas 1.0 and 1.1. The requirement is now only >=1.0 and <1.2 (we still need the upper bound because of dask/dask#7156).
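As a rough illustration of what a ">=1.0 and <1.2" constraint means for the versions tried in this thread, here is a simplified check using only the standard library. The function names are hypothetical, and the parsing ignores pre-release suffixes. A real project would more likely rely on pip or the packaging library for this.

```python
import re


def release_tuple(version):
    """Extract the leading numeric components, e.g. '1.2.1' -> (1, 2, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_supported_range(version, minimum=(1, 0), upper=(1, 2)):
    """True if minimum <= version < upper, comparing release tuples."""
    release = release_tuple(version)
    return minimum <= release < upper


for candidate in ("1.0.5", "1.1.5", "1.2.1"):
    print(candidate, in_supported_range(candidate))
# 1.0.5 True
# 1.1.5 True
# 1.2.1 False
```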
from dask-sql.
The upper pandas version requirement is gone now - the problem in dask is fixed.
from dask-sql.