pysqlflow's People

Contributors

ktong, lhw362950217, tonyyang-svail, typhoonzero, wangkuiyi, weiguoz, yancey1989

pysqlflow's Issues

make release still publishes to PyPI even when the command fails

The log of running make release:

# Tag it
git tag v0.7.0
git push --tags
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/sql-machine-learning/pysqlflow.git
 * [new tag]         v0.7.0 -> v0.7.0
# Bump version for development
git checkout develop
Switched to branch 'develop'
Your branch is ahead of 'origin/develop' by 1 commit.
  (use "git push" to publish your local commits)
sed -i '' -E "s/[0-9]+, [0-9]+, [0-9]+/0, 8, 0, 'dev'/" sqlflow/_version.py
git commit -a -m "start 0.8.0"
[develop 46f42d0] start 0.8.0
 1 file changed, 1 insertion(+), 1 deletion(-)
git push origin develop
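Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 394 bytes | 394.00 KiB/s, done.
Total 4 (delta 2), reused 0 (delta 0)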
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/develop.
remote: error: 2 of 2 required status checks are expected. At least 1 approving review is required by reviewers with write access.
To https://github.com/sql-machine-learning/pysqlflow.git
 ! [remote rejected] develop -> develop (protected branch hook declined)
error: failed to push some refs to 'https://github.com/sql-machine-learning/pysqlflow.git'

By the way, it seems that make release only works on macOS, not in a Docker Ubuntu image.
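
A plausible reason for the macOS-only behavior is the sed -i '' form in the log: that is BSD/macOS syntax, and GNU sed (as shipped in Ubuntu images) does not accept a separate empty suffix argument. A minimal sketch of a portable version bump in Python follows. The function names are hypothetical, and the layout of sqlflow/_version.py is assumed to contain a "major, minor, patch" triple as the sed pattern suggests.

```python
import re
from pathlib import Path

# Same pattern and replacement as the sed command in the log.
_VERSION_PATTERN = re.compile(r"[0-9]+, [0-9]+, [0-9]+")


def bump_version_text(text, replacement="0, 8, 0, 'dev'"):
    """Replace the first version triple on each line, like sed's s/// without the g flag."""
    lines = text.splitlines(keepends=True)
    return "".join(_VERSION_PATTERN.sub(replacement, line, count=1) for line in lines)


def bump_version_file(path, replacement="0, 8, 0, 'dev'"):
    """Rewrite the file in place; no dependency on the local sed flavor."""
    version_file = Path(path)
    version_file.write_text(bump_version_text(version_file.read_text()))
```

Calling bump_version_file("sqlflow/_version.py") from the release target would then replace the sed line on both platforms.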

Remaining work in pysqlflow

In order to reuse ipython-sql, we need to

  • Wrap grpc client and protobuf inside a PySQLFlow client #9
  • Wrap PySQLFlow client inside a DBAPI; please be aware of pagination.
  • Wrap DBAPI inside a sqlalchemy dialect

Then we can plug the implemented sqlalchemy dialect into ipython-sql, and a user can use it as follows:

In [1]: %load_ext sql

In [2]: %%sql sqlflow://username:password@localhost/shakes
   ...: select * from iris.iris limit 10;
Out[2]:
+--------------+-------------+--------------+-------------+-------+
| sepal_length | sepal_width | petal_length | petal_width | class |
+--------------+-------------+--------------+-------------+-------+
|          6.4 |         2.8 |          5.6 |         2.2 |     2 |
|            5 |         2.3 |          3.3 |           1 |     1 |
|          4.9 |         2.5 |          4.5 |         1.7 |     2 |
|          4.9 |         3.1 |          1.5 |         0.1 |     0 |
|          5.7 |         3.8 |          1.7 |         0.3 |     0 |
|          4.4 |         3.2 |          1.3 |         0.2 |     0 |
|          5.4 |         3.4 |          1.5 |         0.4 |     0 |
|          6.9 |         3.1 |          5.1 |         2.3 |     2 |
|          6.7 |         3.1 |          4.4 |         1.4 |     1 |
|          5.1 |         3.7 |          1.5 |         0.4 |     0 |
+--------------+-------------+--------------+-------------+-------+

These three levels of abstraction enable users to write extensions in Python and perform visualizations.
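
The middle layer (client inside a DBAPI, with pagination) could be sketched roughly as below. This is only an illustration under assumptions: FakeClient is a hypothetical stand-in for the PySQLFlow client, and Cursor implements just a small subset of the PEP 249 cursor interface.

```python
import itertools


class FakeClient:
    """Hypothetical stand-in for a PySQLFlow client; hands out rows lazily."""

    def __init__(self, rows):
        self._rows = rows

    def execute(self, operation):
        # A real client would stream rows from the server over gRPC.
        return iter(self._rows)


class Cursor:
    """A small subset of the PEP 249 cursor interface."""

    arraysize = 2  # page size used by fetchmany(); chosen only for illustration

    def __init__(self, client):
        self._client = client
        self._rows = iter(())

    def execute(self, operation):
        self._rows = self._client.execute(operation)

    def fetchone(self):
        # PEP 249: return None when no more rows are available.
        return next(self._rows, None)

    def fetchmany(self, size=None):
        size = self.arraysize if size is None else size
        # islice pulls at most `size` rows, so a large result is never
        # materialized all at once.
        return list(itertools.islice(self._rows, size))

    def fetchall(self):
        return list(self._rows)
```

Because the cursor only pulls rows on demand, a sqlalchemy dialect built on top of it could page through large results without loading everything into memory.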

Support variable replacing in SQL

In most scheduling scenarios, SQLFlow should read from the current partition of a table. The scheduler can expose the partition name through the environment, for example export DT=20190722, and the SQLFlow SQL can then look like:

SELECT * from train 
WHERE partition = ${DT}
...
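
On the client side, this kind of replacement could be sketched with the standard library's string.Template, which already understands the ${NAME} form. The helper name render_sql is hypothetical and this is not an existing pysqlflow API.

```python
import os
from string import Template


def render_sql(sql, env=None):
    """Substitute ${NAME} placeholders from the environment (or a given mapping)."""
    variables = os.environ if env is None else env
    # substitute() raises KeyError for a missing variable instead of
    # silently sending a half-rendered statement to the server.
    return Template(sql).substitute(variables)
```

For example, render_sql("SELECT * from train WHERE partition = ${DT}", {"DT": "20190722"}) yields the statement with the partition filled in. One caveat of this approach: a literal $ in the SQL has to be written as $$.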

Support for command line

Description:
SQLFlow should support a command-line mode like 'hive -e ' or 'hive -f '.
It would help developers and other users schedule SQLFlow tasks.
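
A rough sketch of what such an entry point might look like with argparse is shown below. The program name sqlflow and the helper names are assumptions for illustration; the -e and -f flags simply mirror the hive options mentioned above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="sqlflow")
    # Exactly one of -e / -f must be given, as with hive.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-e", dest="statement", help="SQL statement to run")
    source.add_argument("-f", dest="file", help="path of a file containing SQL")
    return parser


def load_sql(args):
    """Return the SQL text selected on the command line."""
    if args.statement is not None:
        return args.statement
    with open(args.file) as sql_file:
        return sql_file.read()
```

The returned text could then be handed to the existing client, so a scheduler only needs to invoke one command per task.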

Better presentation of Rows

When retrieving a large number of columns, the current representation splits one row into multiple rows. It is quite messy...

[Screenshot: Screen Shot 2019-09-25 at 4 53 02 PM]

We could use a DataFrame-like representation by default.

[Screenshot: Screen Shot 2019-09-25 at 5 11 16 PM]

@ktong Any suggestions for this?

fix pypi project name

The package is released to PyPI under the project name sqlflow, so README.md needs two changes:

change pip install pysqlflow to pip install sqlflow,
and update the link behind the PyPI icon.

execute extended sql failed

When I use the Python API to execute standard SQL, it works, but extended SQL does not work; it raises an exception:

runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DNNClassifier WITH n_classes = 3, hidden_units = [10, 20] COLUMN sepal_lengt' at line 3"

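Here is my test code:

# Import path assumed from the package layout (sqlflow/client.py).
from sqlflow.client import Client

client = Client(server_url="localhost:50051")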
client.execute("""SELECT *
FROM train
TRAIN DNNClassifier
WITH
  n_classes = 3,
  hidden_units = [10, 20]
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO my_dnn_model;""")

Error logs are not friendly.

A simple example: the misspelled "selec" command is a syntax error.

%%sqlflow
selec * from iris.train limit 2;

log

---------------------------------------------------------------------------
_Rendezvous                               Traceback (most recent call last)
<ipython-input-11-7415a0a29612> in <module>
----> 1 get_ipython().run_cell_magic('sqlflow', '', 'selec * from iris.train limit 2;\n')

/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2350             with self.builtin_trap:
   2351                 args = (magic_arg_s, cell)
-> 2352                 result = fn(*args, **kwargs)
   2353             return result
   2354 

</usr/local/lib/python3.5/dist-packages/decorator.py:decorator-gen-126> in execute(self, line, cell)

/usr/local/lib/python3.5/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/usr/local/lib/python3.5/dist-packages/sqlflow/magic.py in execute(self, line, cell)
     39 
     40         """
---> 41         return self.client.execute('\n'.join([line, cell]))
     42 
     43 def load_ipython_extension(ipython):

/usr/local/lib/python3.5/dist-packages/sqlflow/client.py in execute(self, operation)
    106         """
    107         stream_response = self._stub.Run(pb.Request(sql=operation))
--> 108         first = next(stream_response)
    109         if first.WhichOneof('response') == 'message':
    110             _LOGGER.info(first.message.message)

/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         return self._next()
    364 
    365     def next(self):

/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in _next(self)
    355                         raise StopIteration()
    356                     elif self._state.code is not None:
--> 357                         raise self
    358 
    359     def __iter__(self):

_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'selec * from iris.train limit 2' at line 1"
	debug_error_string = "{"created":"@1555475872.921841400","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"runExec failed: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'selec * from iris.train limit 2' at line 1","grpc_status":2}"
>
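
One possible direction is to catch the RPC error at the client boundary and surface only the server-side message. The sketch below is duck-typed on a details() method, which the errors raised by failed gRPC calls typically expose. SQLFlowError and execute_friendly are hypothetical names, not part of the current package.

```python
class SQLFlowError(Exception):
    """Hypothetical exception carrying only the server-side message."""


def execute_friendly(execute, sql):
    """Call execute(sql) and turn RPC-style errors into a short SQLFlowError."""
    try:
        return execute(sql)
    except Exception as err:
        details = getattr(err, "details", None)
        if not callable(details):
            raise  # not an RPC-style error; leave it untouched
        # "from None" suppresses the chained gRPC traceback.
        raise SQLFlowError(details()) from None
```

With this wrapper the user would mainly see the "runExec failed: Error 1064 ..." text rather than the frames inside grpc and IPython.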

Cannot show image on the 0.9.0 tag

%%sqlflow
SELECT *
FROM boston.train
TO EXPLAIN sqlflow_models.my_xgb_regression_model
WITH
    shap_summary.plot_type="dot",
    shap_summary.alpha=1,
    shap_summary.sort=True
USING TreeExplainer;
<IPython.core.display.HTML object>

<div align='center'><img src='

Travis deployment from the correct repo

I happened to notice the following code in our .travis.yml file:

pysqlflow/.travis.yml

Lines 29 to 30 in 8352cfc

# secure token: `echo -n "password" | travis encrypt --add deploy.password -r wangkuiyi/pysqlflow`
secure: sDAa83Ty6gY5VEcGX8vZcy6poyHod2TUnZ9XQ/Bc8n9o5Zm1b7om6tvo8rJ5AyokDcWmXCTj8wqP4wcajnvzRee25dOfjZync23h+lrO23ovH7omSwmbjzaDIQa8OjlUz9AckI7GcW8h1Q9ARmr4vSBrGmk6qj0OADVNcsRRa7899ICArqI3PAfLj8zgu6Z2fvAcyzUi/Fmal4kOn/gjiB6Kpid2AxuWysyGe/lqxdIYfPmWCOKko68pkwcI+NJZoIZaK+C4Ahpv3SvKlvPBS9/JxTsKXy7zKbBrLQfqSGHRX06sciHtDD9BxY0kQmh0CezvZvl+inqodoTL6gS7QfEt4pdagKb2vYvJrDNdyZNJSyuI357oTMGpS4bOCfxl5iQY73MttZXnC8OqmMJAWOcMhwMIayKjJJPALkCVyi5bHeqORepMIAx4QvAb/N1IObe+ltUCkmofSqxk0g26V6xcVB/ECfdjwPWY0h76MhCC+VkncuGImRcGaUQ4j/tnKDN2wDnst4WFs9xZf4Jq55gQyz+6loxU3Nhj3TzOpl8vFC92ozqfz4THjiymxstJfff0UidGQrP6rmWFTJFf5TePFdqVwkEVTs/tasetVP9aFL4ixR89A1uHQEL/n3Eby6AyD6ODIP0qTIkLadPe8yNrT9Qvg2FFRPOAbMmQyDk=

I noticed the -r wangkuiyi/pysqlflow part -- does it mean that the deployment would build the pip package from github.com/wangkuiyi/pysqlflow (instead of sql-machine-learning/pysqlflow) and publish it to PyPI?

CompoundMessage should not aggregate logs

pysqlflow/sqlflow/client.py

Lines 126 to 150 in da0fffd

if first.WhichOneof('response') == 'message':
    # if the first line is html tag like,
    # merge all return strings then render the html on notebook
    if re.match(r'<[a-z][\s\S]*>.*', first.message.message):
        resp_list = [first.message.message]
        for res in stream_response:
            if res.WhichOneof('response') == 'eoe':
                _LOGGER.info("end execute %s, spent: %d" % (res.eoe.sql, res.eoe.spent_time_seconds))
                compound_message.add_html('\n'.join(resp_list), res)
                break
            resp_list.append(res.message.message)
        from IPython.core.display import display, HTML
        display(HTML('\n'.join(resp_list)))
    else:
        all_messages = []
        all_messages.append(first.message.message)
        eoe = None
        for res in stream_response:
            if res.WhichOneof('response') == 'eoe':
                _LOGGER.info("end execute %s, spent: %d" % (res.eoe.sql, res.eoe.spent_time_seconds))
                eoe = res
                break
            _LOGGER.debug(res.message.message)
            all_messages.append(res.message.message)
        compound_message.add_message('\n'.join(all_messages), eoe)

With the above implementation, all the logs are shown in the notebook only at the end of the SQL program's execution, rather than as they are produced.
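
A non-aggregating variant could emit each message as soon as it arrives and only return the end-of-execution marker. The sketch below uses plain (kind, payload) tuples as a simplified stand-in for the protobuf responses; stream_messages and the emit parameter are hypothetical names.

```python
def stream_messages(responses, emit=print):
    """Emit each log line immediately; return the end-of-execution payload."""
    for kind, payload in responses:
        if kind == "eoe":
            return payload
        # Shown right away instead of being collected into a list.
        emit(payload)
    return None
```

In a notebook, emit could be print or a display call, so training logs would appear while the job is still running.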

UsageError: Cell magic `%%sqlflow` not found.

Here is the SQLFlow installation:

(base) C:\Users\aayus>pip install sqlflow
Collecting sqlflow
Using cached sqlflow-0.11.0-py3-none-any.whl (17 kB)
Requirement already satisfied: ipython==7.9 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (7.9.0)
Requirement already satisfied: grpcio<2,>=1.17 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (1.26.0)
Requirement already satisfied: protobuf<4,>=3.6 in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (3.11.2)
Requirement already satisfied: pandas in c:\users\aayus\anaconda3\lib\site-packages (from sqlflow) (1.0.5)
Requirement already satisfied: pickleshare in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.7.5)
Requirement already satisfied: jedi>=0.10 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.15.2)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (2.0.10)
Requirement already satisfied: backcall in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.2.0)
Requirement already satisfied: setuptools>=18.5 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (47.3.1.post20200622)
Requirement already satisfied: colorama; sys_platform == "win32" in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (0.4.3)
Requirement already satisfied: traitlets>=4.2 in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (4.3.3)
Requirement already satisfied: pygments in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (2.6.1)
Requirement already satisfied: decorator in c:\users\aayus\anaconda3\lib\site-packages (from ipython==7.9->sqlflow) (4.4.2)
Requirement already satisfied: six>=1.5.2 in c:\users\aayus\anaconda3\lib\site-packages (from grpcio<2,>=1.17->sqlflow) (1.15.0)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (1.18.5)
Requirement already satisfied: pytz>=2017.2 in c:\users\aayus\anaconda3\lib\site-packages (from pandas->sqlflow) (2020.1)
Requirement already satisfied: parso>=0.5.2 in c:\users\aayus\anaconda3\lib\site-packages (from jedi>=0.10->ipython==7.9->sqlflow) (0.5.2)
Requirement already satisfied: wcwidth in c:\users\aayus\anaconda3\lib\site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython==7.9->sqlflow) (0.2.4)
Requirement already satisfied: ipython-genutils in c:\users\aayus\anaconda3\lib\site-packages (from traitlets>=4.2->ipython==7.9->sqlflow) (0.2.0)
Installing collected packages: sqlflow
Successfully installed sqlflow-0.11.0

When I create a Jupyter notebook file, it shows this error:

UsageError: Cell magic %%sqlflow not found.

Combine Query and Execute to Run

The current protocol buffer definition exposes two methods: Query and Execute.

service SQLFlow {
  rpc Query (Request) returns (stream RowSet);
  rpc Execute (Request) returns (stream Messages);
}

The client needs to decide which method to call. This would require SQL parsing at the client side, while the server already has a parser.

As a side note, we shouldn't make the user choose which method to call, because a typical SQL console doesn't distinguish between Query and Execute.
