sql-machine-learning / pysqlflow
SQLFlow client library for Python
License: Apache License 2.0
The log of running make release:
# Tag it
git tag v0.7.0
git push --tags
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/sql-machine-learning/pysqlflow.git
* [new tag] v0.7.0 -> v0.7.0
# Bump version for development
git checkout develop
Switched to branch 'develop'
Your branch is ahead of 'origin/develop' by 1 commit.
(use "git push" to publish your local commits)
sed -i '' -E "s/[0-9]+, [0-9]+, [0-9]+/0, 8, 0, 'dev'/" sqlflow/_version.py
git commit -a -m "start 0.8.0"
[develop 46f42d0] start 0.8.0
1 file changed, 1 insertion(+), 1 deletion(-)
git push origin develop
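Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 394 bytes | 394.00 KiB/s, done.
Total 4 (delta 2), reused 0 (delta 0)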
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/develop.
remote: error: 2 of 2 required status checks are expected. At least 1 approving review is required by reviewers with write access.
To https://github.com/sql-machine-learning/pysqlflow.git
! [remote rejected] develop -> develop (protected branch hook declined)
error: failed to push some refs to 'https://github.com/sql-machine-learning/pysqlflow.git'
By the way, it seems that make release only works on macOS, not in the Docker Ubuntu image.
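One likely cause is the sed -i '' form in the log above, which is the BSD/macOS spelling of in-place editing and is parsed differently by GNU sed. A portable alternative is to do the version bump in Python. The sketch below is only an illustration: the helper names are hypothetical, and it assumes sqlflow/_version.py contains a tuple body such as 0, 7, 0, as the sed pattern implies.

```python
import re
from pathlib import Path

# Same pattern as the sed command in the release log: three comma-separated integers.
VERSION_TUPLE = re.compile(r"[0-9]+, [0-9]+, [0-9]+")


def bump_version(text, new_version):
    """Replace the first 'X, Y, Z' tuple body in text with new_version."""
    return VERSION_TUPLE.sub(new_version, text, count=1)


def bump_version_file(path, new_version):
    """Rewrite the version file in place (hypothetical helper, not part of the Makefile)."""
    version_file = Path(path)
    version_file.write_text(bump_version(version_file.read_text(), new_version))


# Example on an in-memory string; the file contents here are an assumption.
bumped = bump_version("VERSION = (0, 7, 0)\n", "0, 8, 0, 'dev'")
```

Unlike sed, which replaces the first match on every line, this replaces only the first match in the whole file.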
So users can access the docs at http://wangkuiyi.github.io/pysqlflow/
In order to reuse ipython-sql, we need to
Then we can plug the implemented SQLAlchemy dialect into ipython-sql, and a user can use it as follows:
In [1]: %load_ext sql
In [2]: %%sql sqlflow://username:password@localhost/shakes
...: select * from iris.iris limit 10;
Out[2]:
+--------------+-------------+--------------+-------------+-------+
| sepal_length | sepal_width | petal_length | petal_width | class |
+--------------+-------------+--------------+-------------+-------+
| 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 5 | 2.3 | 3.3 | 1 | 1 |
| 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 5.7 | 3.8 | 1.7 | 0.3 | 0 |
| 4.4 | 3.2 | 1.3 | 0.2 | 0 |
| 5.4 | 3.4 | 1.5 | 0.4 | 0 |
| 6.9 | 3.1 | 5.1 | 2.3 | 2 |
| 6.7 | 3.1 | 4.4 | 1.4 | 1 |
| 5.1 | 3.7 | 1.5 | 0.4 | 0 |
+--------------+-------------+--------------+-------------+-------+
These three levels of abstraction enable users to write extensions in Python and perform visualizations.
In most scheduling scenarios, SQLFlow should read from the current table partition. The scheduler can expose the partition name in the environment, for example export DT=20190722, and the SQLFlow SQL can then look like:
SELECT * from train
WHERE partition = ${DT}
...
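A minimal sketch of how such a placeholder could be filled in on the client side before the statement is sent. The function name render_sql is hypothetical and this is not an existing SQLFlow feature; it relies only on the standard library's string.Template, which happens to use the same ${NAME} syntax.

```python
import os
from string import Template


def render_sql(sql, env=None):
    """Substitute ${NAME} placeholders in sql from env (defaults to os.environ).

    Raises KeyError if a placeholder has no matching variable.
    """
    variables = os.environ if env is None else env
    return Template(sql).substitute(variables)


# The mapping stands in for what the scheduler would export (export DT=20190722).
rendered = render_sql(
    "SELECT * from train WHERE partition = ${DT}",
    {"DT": "20190722"},
)
```

A literal dollar sign elsewhere in the statement would have to be written as $$ with this approach.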
The if and else branches have the same code:
pysqlflow/sqlflow/compound_message.py
Lines 34 to 39 in da0fffd
Description:
SQLFlow should support command-line options like 'hive -e ' or 'hive -f '.
This would help developers and other users schedule SQLFlow tasks.
When retrieving a large number of columns, the current representation splits one row into multiple rows. It is quite messy...
We may use a data-frame-like representation by default.
@ktong Any suggestions for this?
execute
and fetch
We need to create an account at https://pypi.org/ and follow the docs at https://docs.travis-ci.com/user/deployment/pypi/ to change user and password.secure in .travis.yml.
https://github.com/sql-machine-learning/pysqlflow/blob/develop/sqlflow/sqlflow.py
Should we git rm it?
Since the package is released to PyPI under the project name sqlflow, we need to change README.md:
pip install pysqlflow to pip install sqlflow
and the link to the PyPI icon.
We can wrap a client inside the magic command %%sqlflow so that we can play around with it.
Maybe adding the sqlflow repo as a submodule of pysqlflow is a good way.
Handle exceptions on gRPC responses to reduce noisy messages.
For now, test_client.py only tests decode. We need a way to test the execute function, either by starting a standalone gRPC server or by mocking it.
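The mocking option could look roughly like the sketch below. MiniClient is a hypothetical stand-in written for this example, not pysqlflow's actual Client; the only detail taken from the project is that the client calls a Run method on its stub and iterates over the streamed responses.

```python
from unittest import mock


class MiniClient:
    """Hypothetical stand-in for the real client, used only to show the mocking idea."""

    def __init__(self, stub):
        self._stub = stub

    def execute(self, operation):
        # Collect every response the (mocked) server streams back.
        return list(self._stub.Run(operation))


def make_fake_stub(responses):
    """Build a mock stub whose Run() returns an iterator, like a gRPC stream."""
    stub = mock.Mock()
    stub.Run.return_value = iter(responses)
    return stub


stub = make_fake_stub(["row 1", "row 2"])
client = MiniClient(stub)
result = client.execute("select * from iris.iris limit 2;")

# The mock records the call, so the test can check what was sent to the server.
stub.Run.assert_called_once_with("select * from iris.iris limit 2;")
```

This keeps the test free of any network dependency; a standalone gRPC server would exercise more of the stack but is slower to set up.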
When I use the Python API to execute standard SQL, it works, but extended SQL does not work. It raises an exception:
runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DNNClassifier WITH n_classes = 3, hidden_units = [10, 20] COLUMN sepal_lengt' at line 3"
Here is my test code:
from sqlflow.client import Client

client = Client(server_url="localhost:50051")
client.execute("""SELECT *
FROM train
TRAIN DNNClassifier
WITH
n_classes = 3,
hidden_units = [10, 20]
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO my_dnn_model;""")
Simple. The selec command is a syntax error.
%%sqlflow
selec * from iris.train limit 2;
log
---------------------------------------------------------------------------
_Rendezvous Traceback (most recent call last)
<ipython-input-11-7415a0a29612> in <module>
----> 1 get_ipython().run_cell_magic('sqlflow', '', 'selec * from iris.train limit 2;\n')
/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2350 with self.builtin_trap:
2351 args = (magic_arg_s, cell)
-> 2352 result = fn(*args, **kwargs)
2353 return result
2354
</usr/local/lib/python3.5/dist-packages/decorator.py:decorator-gen-126> in execute(self, line, cell)
/usr/local/lib/python3.5/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
/usr/local/lib/python3.5/dist-packages/sqlflow/magic.py in execute(self, line, cell)
39
40 """
---> 41 return self.client.execute('\n'.join([line, cell]))
42
43 def load_ipython_extension(ipython):
/usr/local/lib/python3.5/dist-packages/sqlflow/client.py in execute(self, operation)
106 """
107 stream_response = self._stub.Run(pb.Request(sql=operation))
--> 108 first = next(stream_response)
109 if first.WhichOneof('response') == 'message':
110 _LOGGER.info(first.message.message)
/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in __next__(self)
361
362 def __next__(self):
--> 363 return self._next()
364
365 def next(self):
/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in _next(self)
355 raise StopIteration()
356 elif self._state.code is not None:
--> 357 raise self
358
359 def __iter__(self):
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'selec * from iris.train limit 2' at line 1"
debug_error_string = "{"created":"@1555475872.921841400","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'selec * from iris.train limit 2' at line 1","grpc_status":2}"
>
%%sqlflow
SELECT *
FROM boston.train
TO EXPLAIN sqlflow_models.my_xgb_regression_model
WITH
shap_summary.plot_type="dot",
shap_summary.alpha=1,
shap_summary.sort=True
USING TreeExplainer;
<IPython.core.display.HTML object>
[inline base64-encoded PNG image data, truncated]
I happened to notice the following code in our .travis.yml file:
Lines 29 to 30 in 8352cfc
I noticed the -r wangkuiyi/sqlflow part. Does it mean that the deployment would build the pip package from github.com/wangkuiyi/pysqlflow (instead of sql-machine-learning/pysqlflow) and publish it to PyPI?
Statements such as VIEW, DROP, INSERT, and DELETE don't return tables.
The current protobuf has two types of messages: a table and a string. What should pysqlflow return/print?
According to the design doc (https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/auth_design.md#session), we need to add a session struct to the gRPC proto.
Lines 126 to 150 in da0fffd
With the above implementation, all the logs are shown in the notebook only at the end of the SQL program's execution.
The current implementation of the sqlflow command line is outdated.
The client no longer yields query results row by row.
Here is the SQLFlow installation:
(base) C:\Users\aayus>pip install sqlflow
Collecting sqlflow
Using cached sqlflow-0.11.0-py3-none-any.whl (17 kB)
Requirement already satisfied: ipython==7.9 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (7.9.0)
Requirement already satisfied: grpcio<2,>=1.17 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (1.26.0)
Requirement already satisfied: protobuf<4,>=3.6 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (3.11.2)
Requirement already satisfied: pandas in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (1.0.5)
Requirement already satisfied: pickleshare in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.7.5)
Requirement already satisfied: jedi>=0.10 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.15.2)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (2.0.10)
Requirement already satisfied: backcall in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.2.0)
Requirement already satisfied: setuptools>=18.5 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (47.3.1.post20200622)
Requirement already satisfied: colorama; sys_platform == "win32" in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.4.3)
Requirement already satisfied: traitlets>=4.2 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (4.3.3)
Requirement already satisfied: pygments in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (2.6.1)
Requirement already satisfied: decorator in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (4.4.2)
Requirement already satisfied: six>=1.5.2 in c:\users\aayus\anaconda3\lib\site-packages (from grpcio<2,>=1.17->sqlflow) (1.15.0)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (1.18.5)
Requirement already satisfied: pytz>=2017.2 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (2020.1)
Requirement already satisfied: parso>=0.5.2 in c:\users\aayus\anaconda3\lib\site-packages (from jedi>=0.10->ipython==7.9->sqlflow) (0.5.2)
Requirement already satisfied: wcwidth in c:\users\aayus\anaconda3\lib\site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython==7.9->sqlflow) (0.2.4)
Requirement already satisfied: ipython-genutils in c:\users\aayus\anaconda3\lib\site-packages (from traitlets>=4.2->ipython==7.9->sqlflow) (0.2.0)
Installing collected packages: sqlflow
Successfully installed sqlflow-0.11.0
When I create a Jupyter notebook file, it shows this error:
UsageError: Cell magic %%sqlflow not found.
The current protocol buffer definition exposes two methods: Query and Execute.
service SQLFlow {
rpc Query (Request) returns (stream RowSet);
rpc Execute (Request) returns (stream Messages);
}
The client needs to decide which method to call. This would require SQL parsing on the client side, even though the server already has a parser.
As a side note, we shouldn't let the user choose which method to call, because a typical SQL console doesn't distinguish between Query and Execute.
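One way to avoid client-side parsing is a single call whose response stream carries either kind of payload, so the client only dispatches on what comes back. The sketch below models that idea with plain Python classes; RowSet, Message, and collect are hypothetical names for illustration and are not the generated protobuf types.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass
class RowSet:
    """Stand-in for a table-shaped response."""
    rows: List[list]


@dataclass
class Message:
    """Stand-in for a plain-text response, e.g. training logs."""
    text: str


Response = Union[RowSet, Message]


def collect(stream: Iterable[Response]) -> Tuple[List[list], List[str]]:
    """Split one mixed response stream into table rows and text messages."""
    rows, messages = [], []
    for response in stream:
        if isinstance(response, RowSet):
            rows.extend(response.rows)
        elif isinstance(response, Message):
            messages.append(response.text)
        else:
            raise TypeError("unexpected response type: %r" % type(response))
    return rows, messages


# The client never inspects the SQL text; it only looks at what the server sent.
query_result = collect([RowSet([[6.4, 2.8]]), RowSet([[5.0, 2.3]])])
train_result = collect([Message("epoch 1 done"), Message("model saved")])
```

With this shape the server's parser stays the single place that decides whether a statement is a query or a command.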
Date placeholders such as ${yyyyMMdd-7d} and ${yyyyMMdd - 7d} should all be replaced. One or more spaces may appear around the operator -.
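A minimal sketch of whitespace-tolerant replacement using a regular expression. The function name expand_dates is hypothetical, and two details are assumptions not stated above: the offset is counted in days back from a given date, and a bare ${yyyyMMdd} expands to that date itself.

```python
import re
from datetime import date, timedelta

# Matches ${yyyyMMdd}, ${yyyyMMdd-7d}, ${yyyyMMdd - 7d}, ${ yyyyMMdd  -  7d }, ...
DATE_EXPR = re.compile(r"\$\{\s*yyyyMMdd\s*(?:-\s*(\d+)\s*d\s*)?\}")


def expand_dates(sql, today):
    """Replace each date placeholder in sql with a yyyyMMdd string relative to today."""

    def replace(match):
        days_back = int(match.group(1)) if match.group(1) else 0
        return (today - timedelta(days=days_back)).strftime("%Y%m%d")

    return DATE_EXPR.sub(replace, sql)


today = date(2019, 7, 22)
tight = expand_dates("WHERE dt = ${yyyyMMdd-7d}", today)
spaced = expand_dates("WHERE dt = ${yyyyMMdd   -   7d}", today)
```

Both calls produce the same statement, since the \s* groups absorb any amount of spacing around the operator.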