
databricks-dbapi's People

Contributors

crflynn, evanjt1


databricks-dbapi's Issues

Working workaround to run on Windows (SASL problem)

I use Windows with Python 3.9, so I can't install the sasl package. SASL is extremely hard to compile on Windows, and there are no compiled wheels for Python versions newer than 3.7.

I spent a few days on this and found out that the Hive connection on Azure Databricks DOES NOT USE sasl. I then tried to open a Hive connection with SQLAlchemy directly and it worked, but the same code from this package didn't.

Solution to run databricks connection on Windows:

  1. Install databricks-dbapi without dependencies (the --no-deps option) and install them manually instead (thrift, six, pyhive, etc.).
  2. When installing pyhive, also use --no-deps, because pyhive will otherwise try to install sasl.
  3. Go to the site-packages\PyHive-0.6.4.dist-info\METADATA file and comment out the one line that declares sasl as required, so the result looks like this:
    <!-- Requires-Dist: sasl (>=0.2.1) ; extra == 'hive' -->
  4. Enjoy.

@crflynn You could add this solution to the README, because the whole Internet points to precompiled sasl packages, which stop at Python 3.7.

Support Databricks SQL endpoints using pyhive

Hello! Thanks for making this; it has been very helpful, and I have a suggestion for an improvement.

We have found the pyodbc SQLAlchemy dialect quite problematic, e.g. case statements on boolean columns and inserts don't seem to work:

query = session.query(case([(Model.some_boolean_column, 1)], else_=0))
session.add(Model(column="value"))
session.commit()

I believe this is because most of the pyodbc dialect just inherits from the pyhive library. We want to use Databricks SQL endpoints, so I put together a workaround adding support for SQL endpoints in the pyhive mode: https://github.com/Tom-Newton/databricks-dbapi?organization=Tom-Newton&organization=Tom-Newton . It was quite simple to do, and I thought you might want to make a similar change to this primary repo so others can benefit. I just used the Databricks SQL Connector for Python for making the pyhive connections; it supports both general-purpose compute clusters and SQL endpoints.
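For reference, a minimal sketch of how a pyhive-mode connection URL is assembled. The host, token, and endpoint path below are hypothetical placeholders; the `/sql/1.0/endpoints/...` shape of http_path is what targets a SQL endpoint rather than a cluster:

```python
def databricks_pyhive_url(host: str, token: str, database: str = "default") -> str:
    # Token auth uses the literal username "token"; the personal access
    # token goes in the password slot of the URL.
    return f"databricks+pyhive://token:{token}@{host}:443/{database}"

url = databricks_pyhive_url("myworkspace.azuredatabricks.net", "dapi0123")
# Then (sketch, not executed here):
# engine = create_engine(url, connect_args={
#     # an endpoint-style path, as opposed to a cluster's http_path:
#     "http_path": "/sql/1.0/endpoints/<endpoint_id>",
# })
```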

Preventing SSL certificate verification?

Hello,

The module is awesome. Does it support HTTPS without certificate validation, though? I get:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1091)
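I don't know whether the package exposes this directly, but at the Python level skipping verification usually means supplying an unverified ssl.SSLContext to whatever transport the connection uses; whether pyhive's transport accepts such a context is an assumption. A sketch of the context itself:

```python
import ssl

# An SSLContext that skips certificate and hostname verification.
# Only for testing against self-signed certificates: this defeats
# TLS's authentication guarantees.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False
insecure_ctx.verify_mode = ssl.CERT_NONE
```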

Doesn't seem to support SQL Analytics cluster (tried on Azure)

SQL Analytics has become the most important cluster type for us to connect to from tools like Apache Superset. We are currently able to use this driver to connect our Superset instance to Databricks on both AWS and Azure; however, it doesn't seem to support SQL Analytics clusters for now.

Are there any plans to add support for them? Also, would it make sense to switch to the ODBC drivers?

Hard dependency on `pyhive[hive]` causes installation issues

Thanks for creating this package! We're investigating using it in OpenSAFELY.

This may not be an easy issue to solve, and it certainly won't be high priority for you, but I just wanted to flag that the way this package's dependencies are configured has caused us some problems. Specifically, the dependency on pyhive[hive] pulls in sasl, which we don't need and which is awkward to install because it has no wheel and needs the libsasl2 headers to compile.

To work around this, we've had to fork the package and edit the dependencies (opensafely-core#1):
https://github.com/opensafely-core/databricks-dbapi/pull/1/files

It's possible that future updates to pip and pip-tools will give us other ways to work around this without needing to fork, but I thought it was worth mentioning in case an easier solution occurs to you.

Unable to reflect table

Hi there!

I'm going through your example of using this with SQLAlchemy. I've got a Databricks cluster spun up, have generated a PAT, and am able to successfully connect. However, whenever I try to do something like

Table("my_table", MetaData(bind=engine), autoload=True)

I get

expected string or bytes-like object
Traceback (most recent call last):
  File "/scripts/pipelines/airflow/operators/mysql_to_detabricks_operator.py", line 137, in databricks_table_sa
    self._databricks_table, MetaData(bind=self.databricks_sqla), autoload=True
  File "<string>", line 2, in __new__
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 559, in __new__
    metadata._remove_table(name, schema)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
    exc_value, with_traceback=exc_tb,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 554, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 648, in _init
    resolve_fks=resolve_fks,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py", line 689, in _autoload
    _extend_on=_extend_on,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2221, in run_callable
    return conn.run_callable(callable_, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1660, in run_callable
    return callable_(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 467, in reflecttable
    table, include_columns, exclude_columns, resolve_fks, **opts
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 666, in reflecttable
    table_name, schema, **table.dialect_kwargs
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 392, in get_columns
    self.bind, table_name, schema, info_cache=self.info_cache, **kw
  File "/usr/local/lib/python3.7/site-packages/pyhive/sqlalchemy_hive.py", line 319, in get_columns
    col_type = re.search(r'^\w+', col_type).group(0)
  File "/usr/local/lib/python3.7/re.py", line 185, in search
    return _compile(pattern, flags).search(string)

Digging a little bit, I've discovered that the return of DESCRIBE my_table returns something like:
[screenshot of the DESCRIBE my_table output]

whereas this line tells me it's expecting to break on "# Partition Information". As a result, col_type ends up being None when re.search is run.
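The failure can be reproduced without a cluster. pyhive's get_columns() runs re.search(r'^\w+', col_type) on each DESCRIBE row, and the section rows Databricks emits have no type value; the simulated rows below are an assumption about the shape of that output:

```python
import re

# Simulated DESCRIBE output (an assumption about its shape): Databricks
# appends section rows such as "# Partition Information" whose type
# column is empty, unlike the plain column rows above them.
DESCRIBE_ROWS = [
    ("id", "int", ""),
    ("name", "string", ""),
    ("# Partition Information", None, None),
    ("# col_name", "data_type", "comment"),
]

def find_failing_rows(rows):
    """Return the rows on which pyhive's col_type parsing would blow up."""
    failed = []
    for col_name, col_type, _comment in rows:
        try:
            # pyhive/sqlalchemy_hive.py get_columns() does exactly this:
            re.search(r"^\w+", col_type).group(0)
        except TypeError:
            # re.search(pattern, None) raises
            # "expected string or bytes-like object"
            failed.append(col_name)
    return failed
```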

What am I missing? I've inherited this code and I'm new to the whole pyspark/hive/etc. space, so I'm not sure this library is even the culprit. Please let me know what other information I can provide. FWIW, here's my pip freeze:

alabaster==0.7.12
alembic==1.4.2
amqp==2.6.1
ansiwrap==0.8.4
apache-airflow==1.10.9
apispec==1.3.3
appdirs==1.4.3
appnope==0.1.2
argcomplete==1.11.1
arrow==0.14.2
asn1crypto==0.24.0
astroid==2.2.5
async-generator==1.10
attrs==19.3.0
Authlib==0.14.3
autopep8==1.5.4
avro-python3==1.10.0
aws-xray-sdk==0.95
azure-common==1.1.8
azure-nspkg==2.0.0
azure-storage-blob==0.37.1
azure-storage-common==0.37.1
azure-storage-nspkg==2.0.0
Babel==2.8.0
backcall==0.2.0
bcrypt==3.1.7
billiard==3.6.3.0
black==20.8b1
boto==2.49.0
boto3==1.7.84
botocore==1.10.84
bumpversion==0.5.3
cached-property==1.5.1
cachetools==4.1.0
cattrs==0.9.2
celery==4.4.4
certifi==2020.4.5.1
cffi==1.14.0
cfgv==3.2.0
chardet==3.0.4
click==7.1.2
colorama==0.4.3
colorlog==4.0.2
configparser==3.5.3
cookies==2.2.1
coverage==4.5.3
croniter==0.3.32
cryptography==2.8
cycler==0.10.0
databricks-connect==6.5.1
databricks-dbapi==0.3.0
databricks-test==0.0.4
decorator==4.4.2
defusedxml==0.6.0
dill==0.3.1.1
distlib==0.3.1
dnspython==1.16.0
doc8==0.8.1
docker==4.4.1
docopt==0.6.2
docutils==0.16
ecdsa==0.16.1
email-validator==1.1.1
entrypoints==0.3
factory-boy==2.12.0
Faker==1.0.8
filelock==3.0.12
flake8==3.8.4
Flask==1.1.2
Flask-Admin==1.5.4
Flask-AppBuilder==2.3.4
Flask-Babel==1.0.0
Flask-Bcrypt==0.7.1
Flask-Caching==1.3.3
Flask-JWT-Extended==3.24.1
Flask-Login==0.4.1
Flask-OpenID==1.2.5
Flask-SQLAlchemy==2.4.3
flask-swagger==0.2.13
Flask-WTF==0.14.3
flower==0.9.4
freezegun==1.0.0
funcsigs==1.0.2
future==0.16.0
futures==3.1.1
google-api-core==1.14.2
google-api-python-client==1.7.11
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-oauth==1.0.1
googleapis-common-protos==1.51.0
graphviz==0.14
gunicorn==19.10.0
hmsclient==0.1.1
httplib2==0.18.1
humanize==0.5.1
identify==1.5.11
idna==2.8
ijson==3.0.4
imagesize==1.2.0
importlib-metadata==1.7.0
iniconfig==1.0.0
ipykernel==5.1.4
ipython==7.19.0
ipython-genutils==0.2.0
iso8601==0.1.12
isort==4.3.21
itsdangerous==1.1.0
JayDeBeApi==1.2.1
jedi==0.18.0
Jinja2==2.10.3
jmespath==0.10.0
joblib==1.0.0
JPype1==0.7.5
json-merge-patch==0.2
jsondiff==1.1.1
jsonpickle==1.4.2
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.0
kiwisolver==1.3.1
kombu==4.6.11
lazy-object-proxy==1.4.1
lockfile==0.12.2
m2r==0.2.1
Mako==1.1.3
Markdown==2.6.11
MarkupSafe==1.1.1
marshmallow==2.21.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
matplotlib==3.1.2
mccabe==0.6.1
mirakuru==2.3.0
mistune==0.8.4
mock==4.0.3
moto==1.3.4
mypy-extensions==0.4.3
mysqlclient==1.4.4
natsort==7.0.1
nbclient==0.5.1
nbformat==5.0.8
nest-asyncio==1.4.3
nodeenv==1.5.0
numpy==1.18.4
packaging==20.8
pandas==0.25.3
papermill==2.0.0
paramiko==2.7.1
parso==0.8.1
pathspec==0.8.1
pbr==5.5.1
pendulum==1.4.4
pexpect==4.8.0
pickleshare==0.7.5
pluggy==0.13.1
port-for==0.4
pre-commit==2.9.3
prison==0.1.3
prompt-toolkit==3.0.10
protobuf==3.12.2
psutil==5.7.0
ptyprocess==0.7.0
py==1.10.0
py4j==0.10.7
pyaml==20.4.0
pyarrow==0.17.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyAthena==1.6.1
pycodestyle==2.6.0
pycparser==2.20
pycryptodome==3.9.9
pycryptodomex==3.9.7
pyflakes==2.2.0
Pygments==2.6.1
PyHive==0.6.2
PyJWT==1.7.1
pykwalify==1.7.0
pylint==2.3.1
pymongo==3.8.0
PyNaCl==1.4.0
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyrsistent==0.16.0
pysftp==0.2.9
pytest==6.2.1
pytest-freezegun==0.4.2
pytest-mongo==1.2.1
pytest-mysql==2.0.1
pytest-sftpserver==1.3.0
python-daemon==2.1.2
python-dateutil==2.8.1
python-decouple==3.4
python-editor==1.0.4
python-jose==2.0.2
python3-openid==3.1.0
pytz==2020.1
pytzdata==2019.3
PyYAML==5.1.1
pyzmq==20.0.0
redis==3.5.3
regex==2020.11.13
requests==2.23.0
requests-mock==1.7.0
responses==0.12.1
restructuredtext-lint==1.3.2
rsa==4.0
s3transfer==0.1.13
sasl==0.2.1
scikit-learn==0.22.1
scipy==1.6.0
setproctitle==1.1.10
simple-salesforce==1.10.1
six==1.15.0
slackclient==1.3.0
snowballstemmer==2.0.0
snowflake-connector-python==2.0.0
Sphinx==3.3.1
sphinx-rtd-theme==0.5.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-confluencebuilder==1.3.0
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
SQLAlchemy==1.3.17
SQLAlchemy-JSONField==0.9.0
SQLAlchemy-Utils==0.36.6
sqlparse==0.3.0
sshtunnel==0.1.5
stevedore==3.3.0
tabulate==0.8.7
tenacity==4.12.0
termcolor==1.1.0
text-unidecode==1.2
textwrap3==0.9.2
thrift==0.13.0
thrift-sasl==0.4.2
toml==0.10.1
tornado==5.1.1
tqdm==4.55.1
traitlets==5.0.5
typed-ast==1.4.0
typing-extensions==3.7.4.2
tzlocal==1.5.1
unicodecsv==0.14.1
uritemplate==3.0.1
urllib3==1.25.10
vine==1.3.0
virtualenv==20.2.2
wcwidth==0.2.5
websocket-client==0.54.0
Werkzeug==0.16.0
wrapt==1.11.1
WTForms==2.2.1
xmltodict==0.12.0
zdesk==2.7.1
zipp==3.1.0
zope.deprecation==4.4.0

Error when installing databricks-dbapi on Azure Databricks

Hi there,

I am getting the following error when trying to install databricks-dbapi on my Databricks Cluster on Azure:

List(/databricks/python/bin/pip, install, databricks-dbapi[sqlalchemy]==0.5.0, --disable-pip-version-check) exited with code 1.   ERROR: Command errored out with exit status 1:
  In file included from sasl/saslwrapper.cpp:254:0:
  sasl/saslwrapper.h:22:10: fatal error: sasl/sasl.h: No such file or directory
   #include <sasl/sasl.h>
            ^~~~~~~~~~~~~
  compilation terminated.

It seems like the sasl headers are missing. Any idea how this can be solved? I tried Databricks Runtime 7.5 and 8.0 Beta; the problem is the same for both.

Can't install databricks-dbapi on macOS 11

Hello.

I have been trying to install databricks-dbapi, to no avail. I tried installing libsasl2 as you recommended here, but there continues to be a barrage of errors and warnings (please see ErrorLog.txt).

Can you please assist? Thank you

macOS version 11.3.1
Python version 3.9.5

Connecting Databricks Delta Lake using SQL Alchemy (Pyhive)

  1. I have used Flask and the SQLAlchemy ORM for a REST API.
  2. The API simply connects to one of the Databricks Delta Lake databases, queries one particular table, and returns the result.
  3. I used the following sample code from https://pypi.org/project/databricks-dbapi/ to create the SQLAlchemy engine (Azure Databricks with token; provide the token, region for the URL, database name, and http_path with the cluster name):
     engine = create_engine(
         "databricks+pyhive://token:@.azuredatabricks.net:443/",
         connect_args={"http_path": "<azure_databricks_http_path>"}
     )
  4. I wrote the REST API, created a Docker image, and deployed the image on the AKS cluster.
  5. For the first few days the API was returning responses. Since yesterday I see the attached error (Databricks_Error.txt).
     A single line from the error: sqlState=None, errorCode=0, errorMessage='Invalid SessionHandle: SessionHandle [4d00c5b7-2ee4-448a-92de-cb04deb773b6]'), operationHandle=None)
  6. Yesterday I restarted the Docker container for this pod from the agent-pool node's serial console, after which the API worked. But this morning I got the issue again. I'm not sure why it is occurring; could you please help me?

Databricks_Error.txt
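For what it's worth, "Invalid SessionHandle" typically means a pooled connection outlived its server-side session (for example, across a cluster or container restart). A common SQLAlchemy-level mitigation is to validate and recycle pooled connections; the option values below are assumptions to tune for your workload:

```python
# Pool settings to pass to create_engine(...) alongside the URL and
# connect_args shown in the issue above (a sketch, not a confirmed fix):
POOL_OPTIONS = {
    "pool_pre_ping": True,   # issue a lightweight check before reusing a connection
    "pool_recycle": 3600,    # retire pooled connections older than one hour
}

# engine = create_engine(
#     "databricks+pyhive://token:@.azuredatabricks.net:443/",
#     connect_args={"http_path": "<azure_databricks_http_path>"},
#     **POOL_OPTIONS,
# )
```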
