snowpark-python's Introduction

Snowflake Snowpark Python API

The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.

Source code | Developer guide | API reference | Product documentation | Samples

Getting started

Have your Snowflake account ready

If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.

Create a Python virtual environment

You can use miniconda, anaconda, or virtualenv to create a Python 3.8, 3.9, 3.10 or 3.11 virtual environment.

For the best experience when using Snowpark with UDFs, we recommend creating a local conda environment with the Snowflake channel.
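
For example (the environment name and Python version below are just placeholders):

conda create --name snowpark-env -c https://repo.anaconda.com/pkgs/snowflake python=3.9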

Install the library to the Python virtual environment

pip install snowflake-snowpark-python

Optionally, install pandas in the same environment if you want to use pandas-related features:

pip install "snowflake-snowpark-python[pandas]"

Create a session and use the APIs

from snowflake.snowpark import Session

connection_parameters = {
  "account": "<your snowflake account>",
  "user": "<your snowflake user>",
  "password": "<your snowflake password>",
  "role": "<snowflake user role>",
  "warehouse": "<snowflake warehouse>",
  "database": "<snowflake database>",
  "schema": "<snowflake schema>"
}

session = Session.builder.configs(connection_parameters).create()
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
df.show()
pandas_df = df.to_pandas()  # this requires pandas installed in the Python environment
result = df.collect()
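
If you want to persist the results, you can also save the DataFrame to a table and close the session when you are done (a sketch; the table name is a placeholder):

df.write.mode("overwrite").save_as_table("my_results")  # creates or replaces the table
session.close()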

Samples

The Developer Guide and API references have basic sample code. Snowflake-Labs has more curated demos.

Logging

Configure the logging level for snowflake.snowpark to get Snowpark Python API logs. Because Snowpark uses the Snowflake Python Connector, you may also want to configure the logging level for snowflake.connector when the error originates in the Python Connector. For instance,

import logging
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

Contributing

Please refer to CONTRIBUTING.md.

snowpark-python's People

Contributors

sfc-gh-aalam, sfc-gh-aling, sfc-gh-anavalos, sfc-gh-azhan, sfc-gh-hayu, sfc-gh-helmeleegy, sfc-gh-jdu, sfc-gh-joshi, sfc-gh-jrose, sfc-gh-kdama, sfc-gh-kwagner, sfc-gh-lspiegelberg, sfc-gh-mabrennan, sfc-gh-mkeller, sfc-gh-mrojas, sfc-gh-mvashishtha, sfc-gh-nkrishna, sfc-gh-nkumar, sfc-gh-okostakis, sfc-gh-rsureshbabu, sfc-gh-sfan, sfc-gh-smirzaei, sfc-gh-stan, sfc-gh-thoyt, sfc-gh-vbudati, sfc-gh-vnayak, sfc-gh-yixie, sfc-gh-yuwang, sfc-gh-yzou, sfc-gh-zliu


snowpark-python's Issues

SNOW-643388: Are CTEs supported?

What is the current behavior?

Using the same dataframe in a self-join results in generated SQL that has multiple copies of the same sub-query.

What is the desired behavior?

It seems beneficial to have such dataframe operations use CTEs when generating the SQL.

How would this improve snowflake-snowpark-python?

More efficient query execution.

References, Other Background

I see that the DataFrame API offers the cache_result() function. However, that makes use of temp tables and requires the end user to explicitly call cache_result() (rather than the library figuring out when to use CTEs).
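
For illustration, a minimal sketch of the pattern in question (the table and column names are placeholders):

base = session.table("orders")
big = base.filter(base["amount"] > 100)
small = base.filter(base["amount"] <= 100)
joined = big.join(small, ["id"])  # the generated SQL repeats the "orders" sub-query on both sides

# Current workaround: materialize once with cache_result(), which creates a temp table.
cached = base.cache_result()
joined_cached = cached.filter(cached["amount"] > 100).join(cached.filter(cached["amount"] <= 100), ["id"])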

[Bug] Pickling compatibility host OS vs. runtime for functions

Reported in JIRA: https://snowflakecomputing.atlassian.net/browse/SNOW-470079

When using snowpark-python with UDFs from a local macOS runtime (e.g., installed via brew), the most recent pickle protocol is not supported. This is because Python 3.9 uses pickle protocol 5 as its standard protocol, while 3.6 and 3.7 use protocol 3 and 3.8 uses protocol 4 by default. See https://docs.python.org/3/library/pickle.html#data-stream-format for details.

Because many data scientists use their local Python installation, which may run on macOS, serialization should automatically be forced to the highest commonly supported pickle protocol, or pickle protocol 3 should simply be used as the standard.
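
For illustration, a sketch of how a client could pin the protocol when serializing a callable (cloudpickle, which Snowpark depends on, is used here; the function and protocol value are just examples):

import pickle
import cloudpickle

def square(x):
    return x * x

# On Python 3.8+, pickle.HIGHEST_PROTOCOL is 5, which older server-side runtimes reject.
payload_default = cloudpickle.dumps(square)             # may be written with protocol 5
payload_compat = cloudpickle.dumps(square, protocol=4)  # pinned to a lower, widely supported protocol
print(pickle.HIGHEST_PROTOCOL, len(payload_default), len(payload_compat))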

A sample error traceback for a simple UDF (lambda x: x * x), input_types=[DoubleType()], return_type=FloatType() is shown below:

connection = <snowflake.connector.connection.SnowflakeConnection object at 0x121e05700>
cursor = <snowflake.connector.cursor.SnowflakeCursor object at 0x121de8c10>
error_class = <class 'snowflake.connector.errors.ProgrammingError'>
error_value = {'done_format_msg': False, 'errno': 100357, 'msg': 'Python Interpreter Error:\nValueError: unsupported pickle protocol...ction TEMPUDF_1452976317 with handler udf_py_1696308750.compute', 'sfqid': '019f4131-0503-f02a-0000-9003052c00ba', ...}

    @staticmethod
    def default_errorhandler(
        connection: "SnowflakeConnection",
        cursor: "SnowflakeCursor",
        error_class: Type["Error"],
        error_value: Dict[str, str],
    ) -> None:
        """Default error handler that raises an error.
    
        Args:
            connection: Connections in which the error happened.
            cursor: Cursor in which the error happened.
            error_class: Class of error that needs handling.
            error_value: A dictionary of the error details.
    
        Raises:
            A Snowflake error.
        """
>       raise error_class(
            msg=error_value.get("msg"),
            errno=error_value.get("errno"),
            sqlstate=error_value.get("sqlstate"),
            sfqid=error_value.get("sfqid"),
            done_format_msg=error_value.get("done_format_msg"),
            connection=connection,
            cursor=cursor,
        )
E       snowflake.connector.errors.ProgrammingError: 100357 (P0000): Python Interpreter Error:
E       ValueError: unsupported pickle protocol: 5 in function TEMPUDF_1452976317 with handler udf_py_1696308750.compute

/usr/local/lib/python3.9/site-packages/snowflake/connector/errors.py:188: ProgrammingError

Host OS facts:
macOS Big Sur, 11.6
Python 3.9.7 (default, Sep 3 2021, 12:37:55)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin

run with 99d9328 (recent master)

SNOW-683625: Standardize RLIKE (REGEXP/RLIKE in Snowflake) and add CONTAINS method to Column

This is more of a Snowflake syntax issue.

What is the current behavior?

To match any string containing ABC, you currently have to write .rlike('.*ABC.*').
Also, Snowflake's CONTAINS is not yet available on the Snowpark Column.

What is the desired behavior?

To match any string containing ABC, .rlike('ABC') should be enough.

How would this improve snowflake-snowpark-python?

Make RLIKE consistent with other SQL dialects, e.g., PySpark and MySQL.

References, Other Background

According to Snowflake Docs on REGEXP,

The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$'). To match any string starting with ABC, the pattern would be 'ABC.*'.
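
As an interim workaround (a sketch; the table and column names are placeholders), the pattern can be padded manually to emulate a contains-style match:

from snowflake.snowpark.functions import col

df = session.table('t')
df.filter(col('s').rlike('.*ABC.*')).show()  # works today because REGEXP anchors the pattern at both ends
# Desired: df.filter(col('s').rlike('ABC')), and/or a Column.contains('ABC') method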

SNOW-696531: Moving files from S3 bucket to stage.

I was wondering if there is a way to move a zip file from an S3 bucket to a stage? This is not a data file (CSV or so), but a zip of some Python functions. I currently upload them to the stage from my local machine using PUT file://... and import them into my UDF.

But, is there a way to upload the file to my stage from a CDN or an S3 bucket?
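
For reference, the current local-upload path looks roughly like this (a sketch; the file path and stage name are placeholders):

session.file.put('file:///tmp/my_functions.zip', '@my_stage', auto_compress=False)
session.add_import('@my_stage/my_functions.zip')  # make the zip available to the UDF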

SNOW-664933: session.create_dataframe() crashes when the Pandas dataframe is empty.

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8

  2. What operating system and processor architecture are you using?

Snowflake Python UDF

  3. What are the component versions in the environment (pip freeze)?

Only snowflake-snowpark-python

  4. What did you do?

Using session.create_dataframe() does not work with an empty pandas.DataFrame object.

Here's the full code:

orders_table = session.table('orders')
pandas_df = orders_table.to_pandas()
filtered_pandas_df = pandas_df.loc[pandas_df['CUSTOMER_ID'] == '<DOESNT_EXIST>']
# filtered_pandas_df is an empty pandas DataFrame.
snowpark_df = session.create_dataframe(filtered_pandas_df)
# Here I got an exception.

Here it is:

File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/snowpark/session.py:1066, in Session.write_pandas(self, df, table_nam
e, database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite)
   1060     else:                                                                                                                     
   1061         location = (                                       
   1062             (database + "." if database else "")                                                                              
   1063             + (schema + "." if schema else "")
   1064             + (table_name)                                 
   1065         )                                                                                                                     
-> 1066     success, nchunks, nrows, ci_output = write_pandas(                                                                        
   1067         self._conn._conn,
   1068         df,            
   1069         table_name,                                                                                                           
   1070         database=database,                                                                                                    
   1071         schema=schema,   
   1072         chunk_size=chunk_size,                                                                                                
   1073         compression=compression,                                                                                              
   1074         on_error=on_error,                                                                                                    
   1075         parallel=parallel,                                                                                                    
   1076         quote_identifiers=quote_identifiers,                                                                                  
   1077         auto_create_table=auto_create_table,                                                                                  
   1078         create_temp_table=create_temp_table,
   1079         overwrite=overwrite,                                                                                                  
   1080     )                                                      
   1081 except ProgrammingError as pe:                             
   1082     if pe.msg.endswith("does not exist"):
                                                                                                                                      
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:183, in write_pandas(conn, df, table_name, 
database, schema, chunk_size, compression, on_error, parallel, quote_identifiers, auto_create_table, create_temp_table, overwrite, tab
le_type)                                                           
    180         raise                                                                                                                 
    182 with TemporaryDirectory() as tmp_folder:                   
--> 183     for i, chunk in chunk_helper(df, chunk_size):                                                                             
    184         chunk_path = os.path.join(tmp_folder, f"file{i}.txt")      
    185         # Dump chunk into parquet file                                                                                        
                                 
File ~/work/venvs/snowpark/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py:37, in chunk_helper(lst, n)
     35 def chunk_helper(lst: T, n: int) -> Iterator[tuple[int, T]]:
     36     """Helper generator to chunk a sequence efficiently with current index like if enumerate was called on sequence."""       
---> 37     for i in range(0, len(lst), n):
     38         yield int(i / n), lst[i : i + n]                                                                                      
                                 
ValueError: range() arg 3 must not be zero     

  5. What did you expect to see?

I expected to get an empty snowflake.snowpark.DataFrame with the defined columns so that I'd be able to write it to a table with no rows.
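
A possible workaround until this is fixed (a sketch; it keeps the filter on the Snowpark side so that no empty pandas DataFrame is ever materialized, and the target table name is a placeholder):

from snowflake.snowpark.functions import col

orders_table = session.table('orders')
snowpark_df = orders_table.filter(col('CUSTOMER_ID') == '<DOESNT_EXIST>')  # empty result, schema preserved
snowpark_df.write.mode('append').save_as_table('target_table')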

SNOW-616914: AttributeError: 'SnowflakeCursor' object has no attribute 'query'

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.8.13

  2. What operating system and processor architecture are you using?

Windows-10-10.0.19042-SP0

  3. What are the component versions in the environment (pip freeze)?

argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work
argon2-cffi-bindings @ file:///C:/ci/argon2-cffi-bindings_1644569878360/work
asn1crypto==1.5.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
attrs @ file:///opt/conda/conda-bld/attrs_1642510447205/work
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
beautifulsoup4 @ file:///C:/ci/beautifulsoup4_1650274792587/work
bh-snowflake @ file:///C:/Users/local/package
bleach @ file:///opt/conda/conda-bld/bleach_1641577558959/work
certifi==2022.5.18.1
cffi @ file:///C:/ci_310/cffi_1642682485096/work
charset-normalizer==2.0.12
click==8.1.3
cloudpickle==2.0.0
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
cryptography==3.3.2
dask==2021.10.0
debugpy @ file:///C:/ci/debugpy_1637073815078/work
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work
distributed==2021.10.0
entrypoints @ file:///C:/ci/entrypoints_1649926621247/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
fastjsonschema @ file:///tmp/build/80754af9/python-fastjsonschema_1620414857593/work/dist
fsspec==2022.5.0
greenlet==1.1.2
HeapDict==1.0.1
idna==3.3
importlib-resources @ file:///tmp/build/80754af9/importlib_resources_1625135880749/work
ipykernel @ file:///C:/ci/ipykernel_1647000978151/work/dist/ipykernel-6.9.1-py3-none-any.whl
ipython @ file:///C:/ci/ipython_1651601994376/work
ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work
jedi @ file:///C:/ci/jedi_1644315425835/work
Jinja2 @ file:///opt/conda/conda-bld/jinja2_1647436528585/work
jsonschema @ file:///C:/ci/jsonschema_1650026220037/work
jupyter-client @ file:///C:/ci/jupyter_client_1650534470695/work
jupyter-core @ file:///C:/ci/jupyter_core_1651656285178/work
jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work
locket==1.0.0
MarkupSafe @ file:///C:/ci/markupsafe_1654489871526/work
matplotlib-inline @ file:///tmp/build/80754af9/matplotlib-inline_1628242447089/work
mistune==0.8.4
msgpack==1.0.4
nbclient @ file:///C:/ci/nbclient_1650290386732/work
nbconvert @ file:///C:/ci/nbconvert_1649741005454/work
nbformat @ file:///C:/ci/nbformat_1649845101866/work
nest-asyncio @ file:///C:/ci/nest-asyncio_1649829929372/work
notebook @ file:///C:/ci/notebook_1651648658345/work
numpy==1.22.4
oscrypto==1.3.0
packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
pandas==1.3.5
pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
partd==1.2.0
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
prometheus-client @ file:///opt/conda/conda-bld/prometheus_client_1643788673601/work
prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1633440160888/work
psutil==5.9.1
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pyarrow==5.0.0
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodomex==3.14.1
Pygments @ file:///opt/conda/conda-bld/pygments_1644249106324/work
PyJWT==2.4.0
pyOpenSSL==20.0.1
pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work
pyrsistent @ file:///C:/ci/pyrsistent_1636111468851/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
pytz==2022.1
pywin32==302
pywinpty @ file:///C:/ci_310/pywinpty_1644230983541/work/target/wheels/pywinpty-2.0.2-cp38-none-win_amd64.whl
PyYAML==6.0
pyzmq @ file:///C:/ci/pyzmq_1638435185959/work
requests==2.28.0
Send2Trash @ file:///tmp/build/80754af9/send2trash_1632406701022/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
snowflake-connector-python==2.7.1
snowflake-snowpark-python==0.7.0
snowflake-sqlalchemy==1.3.4
sortedcontainers==2.4.0
soupsieve @ file:///tmp/build/80754af9/soupsieve_1636706018808/work
SQLAlchemy==1.4.37
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
tblib==1.7.0
terminado @ file:///C:/ci/terminado_1644322757089/work
testpath @ file:///tmp/build/80754af9/testpath_1624638946665/work
toolz==0.11.2
tornado @ file:///C:/ci/tornado_1606942392901/work
traitlets @ file:///tmp/build/80754af9/traitlets_1636710298902/work
typing_extensions==4.2.0
urllib3==1.26.9
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
webencodings==0.5.1
wincertstore==0.2
zict==2.2.0
zipp @ file:///C:/ci/zipp_1652274073489/work
  4. What did you do?

I simply followed along with the README of this repo.
The specific lines that produced the error were:

session = Session.builder.configs(snowflake_conn_prop).create()
print(session.sql('select current_warehouse(), current_database(), current_schema()').collect())
  5. What did you expect to see?

The selected values.
For some background, this and all other queries work in my WSL setup running Ubuntu 20.x. I also want this library to work on my native Windows platform.

  6. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    
2022-06-22 11:54:22,394 - MainThread cursor.py:616 - execute() - DEBUG - executing SQL/command
2022-06-22 11:54:22,394 - MainThread cursor.py:696 - execute() - INFO - query: [select current_warehouse(), current_database(), current_schema()]
2022-06-22 11:54:22,394 - MainThread connection.py:1250 - _next_sequence_counter() - DEBUG - sequence counter: 35
2022-06-22 11:54:22,394 - MainThread cursor.py:445 - _execute_helper() - DEBUG - Request id: 0055f58c-e7fc-4093-8030-8632ddc1d27d
2022-06-22 11:54:22,405 - MainThread cursor.py:448 - _execute_helper() - DEBUG - running query [select current_warehouse(), current_database(), current_schema()]
2022-06-22 11:54:22,405 - MainThread cursor.py:457 - _execute_helper() - DEBUG - is_file_transfer: False
2022-06-22 11:54:22,406 - MainThread connection.py:923 - cmd_query() - DEBUG - _cmd_query
2022-06-22 11:54:22,407 - MainThread connection.py:946 - cmd_query() - DEBUG - sql=[select current_warehouse(), current_database(), current_schema()], sequence_id=[35], is_file_transfer=[False]
2022-06-22 11:54:22,407 - MainThread network.py:1117 - _use_requests_session() - DEBUG - Session status for SessionPool 'bta67228.us-east-1.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-06-22 11:54:22,408 - MainThread network.py:823 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-06-22 11:54:22,409 - MainThread network.py:804 - add_request_guid() - DEBUG - Request guid: aa10eab1-d73e-4ca1-ac55-3afc95d585b9
2022-06-22 11:54:22,410 - MainThread network.py:973 - _request_exec() - DEBUG - socket timeout: 60
2022-06-22 11:54:22,554 - MainThread connectionpool.py:452 - _make_request() - DEBUG - https://xxx.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=0055f58c-e7fc-4093-8030-8632ddc1d27d&request_guid=aa10eab1-d73e-4ca1-ac55-3afc95d585b9 HTTP/1.1" 200 1703
2022-06-22 11:54:22,554 - MainThread network.py:1002 - _request_exec() - DEBUG - SUCCESS
2022-06-22 11:54:22,554 - MainThread network.py:1122 - _use_requests_session() - DEBUG - Session status for SessionPool 'bta67228.us-east-1.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-06-22 11:54:22,554 - MainThread network.py:711 - _post_request() - DEBUG - ret[code] = None, after post request
2022-06-22 11:54:22,554 - MainThread network.py:735 - _post_request() - DEBUG - Query id: 01a51f8e-0402-244c-0000-bc03181f96d2
2022-06-22 11:54:22,560 - MainThread cursor.py:718 - execute() - DEBUG - sfqid: 01a51f8e-0402-244c-0000-bc03181f96d2
2022-06-22 11:54:22,561 - MainThread cursor.py:720 - execute() - INFO - query execution done
2022-06-22 11:54:22,561 - MainThread cursor.py:722 - execute() - DEBUG - SUCCESS
2022-06-22 11:54:22,561 - MainThread cursor.py:725 - execute() - DEBUG - PUT OR GET: None
2022-06-22 11:54:22,562 - MainThread cursor.py:820 - _init_result_and_meta() - DEBUG - Query result format: arrow
2022-06-22 11:54:22,563 - MainThread server_connection.py:324 - run_query() - ERROR - Failed to execute query select current_warehouse(), current_database(), current_schema()
'SnowflakeCursor' object has no attribute 'query'

SNOW-650150: Add some indication to `DataFrame.show()` that it isn't all the results

What is the current behavior?

No indication that show is only printing top results, which could mislead users that aren't aware.

>>> df = session.range(1,12).to_df('col1') # create 11 element sequence
>>> df.show()
----------
|"COL1"  |
----------
|1       |
|2       |
|3       |
|4       |
|5       |
|6       |
|7       |
|8       |
|9       |
|10      |
----------
>>> df.show(n=11)
----------
|"COL1"  |
----------
|1       |
|2       |
|3       |
|4       |
|5       |
|6       |
|7       |
|8       |
|9       |
|10      |
|11      |
----------

What is the desired behavior?

Option 1:
Some indication that this is a paged result. Like 1/n, or 15 out of # rows (if that is efficient to do).

Option 2:
Create a DataFrame.head() function that replicates pandas behavior, making it more obvious that this is a preview.
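
Option 2 could be approximated with today's APIs (a sketch, not the proposed implementation):

def head(df, n=5):
    # Return the first n rows as a list of Row objects, as a pandas-style preview.
    return df.limit(n).collect()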

How would this improve snowflake-snowpark-python?

Users will not unknowingly omit results when result sets are right around the default row threshold of show.

References, Other Background

SNOW-698642: ExternalBrowser authentication spins when launched via WSL

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.8.14

  2. What operating system and processor architecture are you using?

    Linux-5.15.74.2-microsoft-standard-WSL2-x86_64-with-glibc2.35

  3. What are the component versions in the environment (pip freeze)?

asn1crypto==1.5.1
certifi==2022.9.24
cffi==1.15.1
charset-normalizer==2.1.1
cloudpickle==2.0.0
cryptography==38.0.3
filelock==3.8.0
idna==3.4
networkx==2.8.8
numpy==1.23.5
oscrypto==1.3.0
pandas==1.5.1
pycparser==2.21
pycryptodomex==3.15.0
PyJWT==2.6.0
pyOpenSSL==22.1.0
python-dateutil==2.8.2
pytz==2022.6
requests==2.28.1
scipy==1.9.3
six==1.16.0
snowflake-connector-python==2.8.2
snowflake-snowpark-python==1.0.0
sql-metadata==2.6.0
sqlparse==0.4.3
typing_extensions==4.4.0
urllib3==1.26.12

  4. What did you do?

Running WSL2 with Ubuntu 22.04.01 with Windows 11. I created a session and set authenticator to externalbrowser. I have installed wslu and set BROWSER=wslview. I have confirmed that I can launch Chrome (my default browser) via WSL. The webpage is launched in chrome but I am stuck with a webpage that stays on Redirecting to Snowflake client in 1 seconds.... I do see an error message that says tcgetpgrp failed: Not a tty. After a while, I get the error message: Failed to connect to DB: <my-account>.snowflakecomputing.com:443, Invalid HTTP request from web browser. Idp authentication could have failed.

  5. What did you expect to see?

To be authenticated as if I was running a Windows machine

  6. Can you set logging to DEBUG and collect the logs?

I don't suspect that it has to do with the code itself.

Thanks!

SNOW-651614: UDTF example returns error

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.8.13

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.19042-SP0

  3. What are the component versions in the environment (pip freeze)?

    asn1crypto @ file:///C:/ci/asn1crypto_1652344143274/work
    Bottleneck @ file:///C:/Windows/Temp/abs_3198ca53-903d-42fd-87b4-03e6d03a8381yfwsuve8/croots/recipe/bottleneck_1657175565403/work
    brotlipy==0.7.0
    certifi @ file:///C:/Windows/TEMP/abs_e9b7158a-aa56-4a5b-87b6-c00d295b01fanefpc8_o/croots/recipe/certifi_1655968940823/work/certifi
    cffi @ file:///C:/Windows/Temp/abs_6808y9x40v/croots/recipe/cffi_1659598653989/work
    charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
    cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1632508026186/work
    cryptography @ file:///C:/ci/cryptography_1639472366776/work
    idna @ file:///tmp/build/80754af9/idna_1637925883363/work
    mkl-fft==1.3.1
    mkl-random @ file:///C:/ci/mkl_random_1626186184278/work
    mkl-service==2.4.0
    numexpr @ file:///C:/Windows/Temp/abs_e2036a32-9fe9-47f3-a04c-dbb1c232ba4b334exiur/croots/recipe/numexpr_1656940304835/work
    numpy @ file:///C:/Windows/Temp/abs_2a1e1vbeag/croots/recipe/numpy_and_numpy_base_1659432712056/work
    oscrypto @ file:///tmp/build/80754af9/oscrypto_1633350059025/work
    packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
    pandas @ file:///C:/ci/pandas_1641443171711/work
    pyarrow==8.0.0
    pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
    pycryptodomex @ file:///C:/ci/pycryptodomex_1613765448814/work
    PyJWT @ file:///C:/ci/pyjwt_1657529430378/work
    pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
    pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work
    PySocks @ file:///C:/ci/pysocks_1605287845585/work
    python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
    pytz @ file:///C:/Windows/TEMP/abs_90eacd4e-8eff-491e-b26e-f707eba2cbe1ujvbhqz1/croots/recipe/pytz_1654762631027/work
    requests @ file:///C:/ci/requests_1657717096906/work
    six @ file:///tmp/build/80754af9/six_1644875935023/work
    snowflake-connector-python @ file:///C:/Windows/Temp/abs_2a3xz7c_sg/croots/recipe/snowflake-connector-python_1659699091424/work
    snowflake-snowpark-python @ file:///C:/Windows/TEMP/abs_39001p8e_v/croots/recipe/snowflake-snowpark-python_1659013087223/work
    snowflake-sqlalchemy==1.4.0
    typing_extensions @ file:///C:/Windows/TEMP/abs_dd2d0moa85/croots/recipe/typing_extensions_1659638831135/work
    urllib3 @ file:///C:/Windows/TEMP/abs_65ynz4fdmi/croots/recipe/urllib3_1659110473919/work
    win-inet-pton @ file:///C:/ci/win_inet_pton_1605306167264/work
    wincertstore==0.2

  4. What did you do?

I am using your example code from the documentation.

from snowflake.snowpark.types import IntegerType, StructField, StructType
from snowflake.snowpark.functions import udtf, lit

class GeneratorUDTF:
    def process(self, n):
        for i in range(n):
            yield (i,)

generator_udtf = udtf(GeneratorUDTF, output_schema=StructType([StructField("number", IntegerType())]), input_types=[IntegerType()])
session.table_function(generator_udtf(lit(3))).collect()  # Query it by calling it
# [Row(NUMBER=0), Row(NUMBER=1), Row(NUMBER=2)]
session.table_function(generator_udtf.name, lit(3)).collect()  # Query it by using the name
# [Row(NUMBER=0), Row(NUMBER=1), Row(NUMBER=2)]

Or you can lateral-join a UDTF like any other table function:

df = session.create_dataframe([2, 3], schema=["c"])
df.join_table_function(generator_udtf(df["c"])).sort("c", "number").show()

Error:
000904 (42000): SQL compilation error: error line 1 at position 92
invalid identifier 'C'

  5. What did you expect to see?


|"C" |"NUMBER" |

|2 |0 |
|2 |1 |
|3 |0 |
|3 |1 |
|3 |2 |

  6. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    

SNOW-686233: Ability to Convert 'snowflake.snowpark.dataframe.DataFrame' to spark.sql.DataFrame

What is the current behavior?

Creating a DataFrame or executing SQL statements on Snowflake returns a snowflake.snowpark.dataframe.DataFrame by default, with no option to convert it to a Spark DataFrame similar to to_pandas.

I'm planning to run Snowpark in a Databricks environment; are snowflake.snowpark.dataframe.DataFrame objects distributed across Spark workers by default?
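
As an interim approach (a sketch; it assumes a PySpark SparkSession named spark is available, e.g. on Databricks, and that snowpark_df is an existing Snowpark DataFrame), you can round-trip through pandas, though this collects the full result set to the client first:

pandas_df = snowpark_df.to_pandas()          # pull the result set out of Snowflake
spark_df = spark.createDataFrame(pandas_df)  # distribute it as a Spark DataFrame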

What is the desired behavior?

Provide a to_spark_df() method.

How would this improve snowflake-snowpark-python?

The ability to execute queries on Snowflake and use the post-computation data on other platforms for specific downstream tasks in a distributed way with Spark DataFrames.

References, Other Background

N/A

SNOW-682238: Long running program iterating over data with to_local_iterator crashing unexpectedly

  1. What version of Python are you using?
Python 3.8.9 (default, Apr  6 2022, 10:43:36)
[GCC 9.4.0]
  2. What operating system and processor architecture are you using?
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
  3. What are the component versions in the environment (pip freeze)?
snowflake-snowpark-python==0.9.0
  4. What did you do?

To summarize, I have a long running program iterating over a huge table. This job performs a lot of processing on the data for every iteration. The code looks something like this:

import gc
import os
import sys

from snowflake.snowpark import (
    DataFrame,
    Session
)

from snowflake.snowpark.row import Row

snowflake_connect_options = {
    "account": os.environ['SNOWFLAKE_ACCOUNT'],
    "user": os.environ['SNOWFLAKE_USER'],
    "password": os.environ['SNOWFLAKE_PASSWORD'],
    "warehouse": os.environ['SNOWFLAKE_WAREHOUSE'],
    "role": os.environ['SNOWFLAKE_ROLE'],
    "database": os.environ['SNOWFLAKE_DATABASE'],
    "client_session_keep_alive": True,
}

session = Session.builder.configs(snowflake_connect_options).create()

local_iterator = session.table(qualified_raw_table).to_local_iterator()

data_list = []

for row_data in local_iterator:
    # Perform processing operations on row_data -> processed_data
    data_list.append(processed_data)

    data_size_mb = (sys.getsizeof(data_list) + sum([sys.getsizeof(data_list[x]) for x in range(len(data_list))])) // 1000000

    if data_size_mb > 500:
        # Write data_list contents to destination
        del data_list[:]
    gc.collect()

  5. What did you expect to see?

I expect the code to run through the local_iterator without issue; instead, this happened after 6 hours (meaning it's not the 4-hour timeout messing with me):

File "/path/to/code.py", line X, in code_function
   for row_data in local_iterator:
File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py", line 505, in to_local_iterator
   yield from self._session._conn.execute(
File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/utils.py", line 478, in result_set_to_iter
   row = Row(*data)
TypeError: type object argument after * must be an iterable, not NoneType

  6. Can you set logging to DEBUG and collect the logs?

Unfortunately it's hard for me to do this as the job takes a long time to run and generates a lot of other logs.

pkg_resources.Requirement.parse() replaces underscores with hyphens -> Cannot include most packages with underscores in UDFs / UDTFs / Stored Procs

  1. What version of Python are you using?
    Python 3.8.13 (default, Mar 28 2022, 11:38:47)
    [GCC 7.5.0]

  2. What operating system and processor architecture are you using?
    Linux-5.10.25-linuxkit-x86_64-with-glibc2.17

  3. What are the component versions in the environment (pip freeze)?
    altair==4.2.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 asn1crypto==1.4.0 asttokens==2.0.5 attrs==21.4.0 backcall==0.2.0 backports.zoneinfo==0.2.1 beautifulsoup4==4.11.1 bleach==5.0.1 blinker==1.4 blis==0.7.8 cachetools==4.2.2 catalogue==2.0.7 certifi==2021.10.8 cffi==1.15.1 chardet==4.0.0 charset-normalizer==2.0.4 click==8.1.3 click-plugins==1.1.1 cligj==0.7.2 cloudpickle==2.0.0 commonmark==0.9.1 cryptography==36.0.2 cycler==0.11.0 cymem==2.0.6 debugpy==1.6.0 decorator==5.1.1 defusedxml==0.7.1 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl entrypoints==0.4 et-xmlfile==1.1.0 executing==0.8.3 fastjsonschema==2.15.3 filelock==3.7.1 Fiona==1.8.21 fonttools==4.33.3 geojson==2.5.0 geopandas==0.9.0 gitdb==4.0.9 GitPython==3.1.27 gmaps==0.9.0 htbuilder==0.6.0 huggingface-hub==0.8.1 idna==3.3 imageio==2.19.3 importlib-metadata==4.12.0 importlib-resources==5.8.0 ipykernel==6.15.0 ipython==8.4.0 ipython-genutils==0.2.0 ipywidgets==7.7.1 iteration-utilities==0.11.0 jedi==0.18.1 Jinja2==3.1.2 joblib==1.1.0 jsonschema==4.6.1 jupyter==1.0.0 jupyter-client==7.3.4 jupyter-console==6.4.4 jupyter-core==4.10.0 jupyterlab-pygments==0.2.2 jupyterlab-widgets==1.1.1 kiwisolver==1.4.3 langcodes==3.3.0 lxml==4.9.0 MarkupSafe==2.1.1 matplotlib==3.5.2 matplotlib-inline==0.1.3 mistune==0.8.4 munch==2.5.0 murmurhash==1.0.7 nbclient==0.6.6 nbconvert==6.5.0 nbformat==5.4.0 nest-asyncio==1.5.5 networkx==2.8.4 nltk==3.7 notebook==6.4.12 numpy==1.22.3 openpyxl==3.0.10 oscrypto==1.2.1 packaging==21.3 pandas==1.4.2 pandas-datareader==0.10.0 pandocfilters==1.5.0 parso==0.8.3 pathy==0.6.2 patsy==0.5.2 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.1.1 preshed==3.0.6 prometheus-client==0.14.1 prompt-toolkit==3.0.30 protobuf==3.20.1 psutil==5.9.1 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pycryptodomex==3.15.0 pydantic==1.8.2 pydeck==0.7.1 Pygments==2.12.0 PyJWT==2.1.0 Pympler==1.0.1 pyOpenSSL==22.0.0 pyparsing==3.0.9 pyproj==3.3.1 pyrsistent==0.18.1 python-dateutil==2.8.2 pytz==2022.1 pytz-deprecation-shim==0.1.0.post0 PyWavelets==1.3.0 PyYAML==6.0 pyzmq==23.2.0 qtconsole==5.3.1 QtPy==2.1.0 regex==2021.8.3 requests==2.28.1 rich==12.4.4 sacremoses==0.0.53 scikit-image==0.19.2 scikit-learn==1.0.2 scipy==1.7.1 seaborn==0.9.0 semver==2.13.0 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.95 Shapely==1.8.2 six==1.16.0 smart-open==5.2.1 smmap==5.0.0 snowflake-connector-python==2.7.9 snowflake-snowpark-python==0.7.0 soupsieve==2.3.2.post1 spacy==3.3.0 spacy-legacy==3.0.9 spacy-loggers==1.0.2 sqlparse==0.4.2 srsly==2.4.3 st-annotated-text==3.0.0 stack-data==0.3.0 statsmodels==0.13.2 streamlit==1.10.0 terminado==0.15.0 thinc==8.0.17 threadpoolctl==3.1.0 tifffile==2022.5.4 timm==0.5.4 tinycss2==1.1.1 tokenizers==0.12.1 toml==0.10.2 toolz==0.11.2 torch==1.8.1 torchvision==0.9.1 tornado==6.1 tqdm==4.64.0 traitlets==5.3.0 transformers==4.18.0 typer==0.4.1 typing_extensions==4.1.0 tzdata==2022.1 tzlocal==4.2 urllib3==1.26.8 validators==0.20.0 wasabi==0.9.1 watchdog==2.1.9 wcwidth==0.2.5 webencodings==0.5.1 widgetsnbextension==3.6.1 wordcloud==1.8.2.2 xgboost==1.5.0 zipp==3.8.0

  4. What did you do?
    I tried to create a UDF with dependencies on a package with underscores.
    def test() -> None:
        import spacy
        nlp = spacy.load("en_core_web_sm")

    session.udf.register(func=test, packages=['spacy', 'spacy-model-en_core_web_sm'])

  5. What did you expect to see?
    Expected Result: UDF is created
    Result: Error-Message "Cannot add package spacy-model-en-core-web-sm because it is not available in Snowflake."

My investigations so far:
As you can see, the resulting error message replaced all underscores with hyphens.
My investigation showed that the problem comes from the pkg_resources.Requirement.parse function which replaces underscores with hyphens.
As a result, the package_name does not match anymore.

I can replicate this behaviour by running the following code:
import pkg_resources

package_req = pkg_resources.Requirement.parse('spacy-model-en_core_web_sm')
package_name = package_req.key
print(package_name)

Output: spacy-model-en-core-web-sm

Using underscores in package_names is also discouraged as described here.

Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

List of currently affected packages:
async_generator backports.functools_lru_cache backports.shutil_get_terminal_size backports_abc category_encoders cmake_setuptools console_shortcut cx_oracle essential_generators et_xmlfile factory_boy flask_cors get_terminal_size huggingface_hub idna_ssl importlib_resources ipython_genutils jupyter_client jupyter_core jupyter_server jupyter_telemetry jupyterlab_pygments jupyterlab_server jupyterlab_widgets korean_lunar_calendar line_profiler memory_profiler mpl_sample_data multi_key_dict mypy_extensions opentracing_instrumentation opt_einsum plaster_pastedeploy posix_ipc prometheus_client pure_eval pyramid_jinja2 python_http_client pyviz_comms readme_renderer requests_download requests_ntlm ruamel_yaml semantic_version service_identity smart_open spacy-model-en_core_web_lg spacy-model-en_core_web_md spacy-model-en_core_web_sm spacy-model-en_core_web_trf stack_data typing_inspect vega_datasets win32_setctime

SNOW-654710: Drop cached temporary tables that are no longer being referenced

What is the current behavior?

cache_result does not clean up a temporary table when it is no longer being referenced. This results in unnecessary storage credits throughout the entirety of the session whenever a result is re-cached, which is a common thing to do in an interactive session.

>>> dff = session.create_dataframe([1,2,3])

>>> dff.cache_result().explain()

---------DATAFRAME EXECUTION PLAN----------
Query List:
1.
SELECT  *  FROM (SNOWPARK_TEMP_TABLE_X3FCJ1U38A)
...

--------------------------------------------

>>> dff.cache_result().explain()

---------DATAFRAME EXECUTION PLAN----------
Query List:
1.
SELECT  *  FROM (SNOWPARK_TEMP_TABLE_Z9H68STVDH)
....

What is the desired behavior?

Ideal behavior would be temporary-table cleanup by defining a __del__ method for snowflake.snowpark.dataframe.DataFrame. I'll admit that I don't have a deep understanding of Snowpark's inner workings, but on the surface it seems like all cached results are essentially a select * from (temp_table_name), which seems generalizable. In pseudo-ish code, it could be something like:

# Module: snowflake.snowpark.dataframe.DataFrame

class DataFrame:
    ...

    def __del__(self):
        if self.is_cached:
            temp_table = ...  # extract the name from the self.queries string or some other method
            self._session.sql(f'drop table {temp_table}').collect()

How would this improve snowflake-snowpark-python?

Avoid overcharging on storage credits :)

References, Other Background

SNOW-649339: Allow usage of a pre-configured NAMED File Format working with data in staged files

What is the current behavior?

Currently all the DataFrameReader file methods (.csv, .parquet, .json, etc.) create temp file formats with options from Python dictionaries.

Even if the external stage has a default (named) file format attached, the reader methods can fail if file format options are not passed to the DataFrameReader.

This also ties into the copy_into_table method.
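
For context, the reader options currently have to be re-specified in Python even when the stage already has a named file format attached (a sketch; the stage, file, schema, and options are placeholders):

from snowflake.snowpark.types import StructType, StructField, StringType

schema = StructType([StructField("C1", StringType())])
df = (
    session.read
    .schema(schema)
    .option("field_delimiter", "|")  # format options re-declared here instead of reusing the named file format
    .option("skip_header", 1)
    .csv("@my_ext_stage/data/file.csv")
)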

What is the desired behavior?

Be able to pass a pre-configured file format name and/or have the DataFrameReader methods use the attached file format to the external stage.

How would this improve snowflake-snowpark-python?

  • Allow users not to have to re-configure file format options in snowpark, just use the file format/stage that's already set up in snowflake.

  • Mix and match different file formats between stages without having to reset all the file/copy options again.

References, Other Background

https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/_internal/analyzer/snowflake_plan.py#L685
Could bypass this query if a user passes a named file format, or the option to use the file format configured for the selected stage.

SNOW-664934: Dataframe schemas don't persist between Pandas and Snowpark

  1. What version of Python are you using?

Python 3.8

  2. What operating system and processor architecture are you using?

Snowflake Python UDF

  3. What are the component versions in the environment (pip freeze)?

Only snowflake-snowpark-python

  4. What did you do?

Snowpark has a problem/bug in that it does not maintain types between pandas and Snowpark, nor does it allow manually setting the schema.

For instance,

df1 = session.sql(sql).to_pandas()
df2 = session.create_dataframe(df1)

The timestamp field, which had TimestampType, has become a LongType in df2.

I've also tried to store the schema and use it, but got the same result.

df1 = session.sql(sql)
df1_schema = df1.schema
df1 = df1.to_pandas()
df2 = session.create_dataframe(df1, df1_schema)

This stops me from being able to write the DataFrame back to the table as it needs to be of TimestampType rather than LongType.

  5. What did you expect to see?

The schemas persist among the conversions.

SNOW-616002: Ability to overwrite tables when using write_pandas()

What is the current behavior?

Using the write_pandas() API there is no way to overwrite contents of a Snowflake table with that of a Pandas DataFrame without first executing a truncate SQL statement.

What is the desired behavior?

Using the write_pandas() API there should be a way to optionally overwrite contents of a Snowflake table with that of a Pandas DataFrame.

How would this improve snowflake-snowpark-python?

Better user experience for Python developers, especially those who are not familiar with the TRUNCATE SQL statement or do not know how to write SQL.

References, Other Background

Similar to overwriting contents of a Snowflake table when writing contents of Snowpark DataFrames using df.write.mode('overwrite')
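
A workaround today (a sketch; pandas_df is an existing pandas DataFrame and the table name is a placeholder) is to go through a Snowpark DataFrame, which already supports overwrite semantics:

snowpark_df = session.create_dataframe(pandas_df)
snowpark_df.write.mode("overwrite").save_as_table("my_table")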

SNOW-656406: `table.merge()` tries to reference a Snowpark Temp Table that does not exist

Been experiencing this issue when trying to write a MERGE operation using Snowpark. Thanks for taking a look!

  1. What version of Python are you using?
python --version
Python 3.8.13
  2. What operating system and processor architecture are you using?
python -c 'import platform; print(platform.platform())'
macOS-12.2-arm64-arm-64bit
  3. What are the component versions in the environment (pip freeze)?

Note: the current version of snowflake-snowpark-python is 0.9.0, but I have also experienced the same issue in 0.7.0 and 0.8.0.

agate==1.6.3
alembic==1.6.5
aniso8601==7.0.0
anyio==3.6.1
asn1crypto==1.5.1
attrs==22.1.0
Babel==2.10.3
beautifulsoup4==4.11.1
black==22.6.0
bleach==5.0.1
boto3==1.24.66
botocore==1.27.66
braintree==4.16.0
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.0.0
colorama==0.4.5
coloredlogs==14.0
commonmark==0.9.1
coverage==6.4.4
croniter==1.3.5
cryptography==36.0.2
dagit==0.15.9
dagster==0.15.9
dagster-aws==0.15.9
dagster-cloud==0.15.9
dagster-cloud-cli==0.15.9
dagster-dbt==0.15.9
dagster-graphql==0.15.9
dagster-pandas==0.15.9
-e git+ssh://[email protected]/dribbble/data-sdk.git@20e06af78f1c389a62c2247eae6ecab9adb52133#egg=data_sdk
dbt-core==1.2.0
dbt-extractor==0.4.1
dbt-junitxml==0.1.5
dbt-snowflake==1.2.0
defusedxml==0.7.1
Deprecated==1.2.13
docstring-parser==0.14.1
elasticsearch==7.17.6
elasticsearch-dsl==7.4.0
entrypoints==0.4
eventbrite==3.3.5
fastjsonschema==2.16.1
filelock==3.8.0
flatdict==4.0.1
future==0.18.2
gql==2.0.0
graphene==2.1.9
graphql-core==2.3.2
graphql-relay==2.0.1
grpcio==1.48.1
grpcio-health-checking==1.43.0
h11==0.13.0
hologram==0.0.15
httptools==0.4.0
humanfriendly==10.0
idna==3.3
importlib-metadata==4.12.0
iniconfig==1.1.1
isodate==0.6.1
jaraco.classes==3.2.2
Jinja2==2.11.3
jmespath==1.0.1
jsonschema==3.2.0
junit-xml==1.9
jupyter-core==4.11.1
jupyter_client==7.3.5
jupyterlab-pygments==0.2.2
keyring==23.9.0
leather==0.3.4
Logbook==1.5.3
Mako==1.2.2
MarkupSafe==2.0.1
mashumaro==2.9
minimal-snowplow-tracker==0.0.2
mistune==0.8.4
more-itertools==8.14.0
msgpack==1.0.4
mypy-extensions==0.4.3
nbclient==0.5.13
nbconvert==6.4.5
nbformat==5.4.0
nest-asyncio==1.5.5
networkx==2.8.6
numpy==1.23.2
oscrypto==1.3.0
packaging==21.3
pandas==1.4.4
pandocfilters==1.5.0
parsedatetime==2.4
pathspec==0.9.0
pendulum==2.1.2
pep562==1.1
platformdirs==2.5.2
pluggy==1.0.0
promise==2.3
prompt-toolkit==3.0.31
protobuf==3.20.1
psycopg2==2.9.3
py==1.11.0
pycparser==2.21
pycryptodomex==3.15.0
PyGithub @ git+https://github.com/dribbble/PyGithub@538b30b5814fa5093b3785ed022ec58cc2e86bf0
Pygments==2.13.0
PyJWT==2.4.0
PyNaCl==1.5.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.18.1
pytest==7.1.2
pytest-cov==3.0.0
python-dateutil==2.8.2
python-decouple==3.6
python-dotenv==0.21.0
python-editor==1.0.4
python-slugify==6.1.2
pytimeparse==1.1.8
pytz==2022.2.1
pytzdata==2020.1
PyYAML==6.0
pyzmq==23.2.1
questionary==1.10.0
requests==2.28.1
rich==12.5.1
Rx==1.6.1
s3transfer==0.6.0
shellingham==1.5.0
six==1.16.0
sniffio==1.3.0
snowflake-connector-python==2.7.12
snowflake-snowpark-python==0.9.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.40
sqlparse==0.4.2
starlette==0.20.4
tabulate==0.8.10
testpath==0.6.0
text-unidecode==1.3
tomli==2.0.1
toposort==1.7
tornado==6.2
tqdm==4.64.1
traitlets==5.3.0
typer==0.6.1
typing-compat==0.1.0
typing_extensions==4.3.0
urllib3==1.26.12
uvicorn==0.18.3
uvloop==0.16.0
watchdog==2.1.9
watchfiles==0.16.1
wcwidth==0.2.5
webencodings==0.5.1
websockets==10.3
Werkzeug==2.1.2
wrapt==1.14.1
zipp==3.8.1

  4. What did you do?

I tried to run the following (target being a Snowpark Table and source being a Snowpark DataFrame)

from snowflake.snowpark.functions import when_matched, when_not_matched

merge_mapping = {v: source[v] for v in source.columns}

target.merge(
    source,
    target["id"] == source["id"],
    [
        when_matched().update(merge_mapping),
        when_not_matched().insert(merge_mapping),
    ],
)

  5. What did you expect to see?

I expected the statement to merge source into target, using an update + insert strategy based on whether a record was matched or not matched according to the id field.

What I got was an error that SNOWPARK_TEMP_TABLE_X does not exist or is not authorized (X being a random alphanumeric string of length 10). I noticed that the error only occurred when there were more than a few rows (e.g. > 10) in the source DataFrame. Running this statement in tests with <= 10 rows in the source DataFrame produced the expected result.

Full stack trace below:

snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 002003 (42S02): SQL compilation error:
Object 'SNOWPARK_TEMP_TABLE_4G6YDVDRWU' does not exist or not authorized.
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
    yield
  File "/usr/local/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 421, in iterate_with_context
    next_output = next(iterator)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/execution/plan/execute_step.py", line 554, in _gen_fn
    gen_output = output_manager.handle_output(output_context, output.value)
  File "/opt/dagster/dagster_home/dagster_analytics/dagster_analytics/io_managers/snowpark.py", line 111, in handle_output
    merged_rows = table.merge(
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/telemetry.py", line 76, in wrap
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/table.py", line 506, in merge
    new_df._internal_collect_with_tag(),
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py", line 446, in _internal_collect_with_tag
    return self._session._conn.execute(
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 368, in execute
    result_set, result_meta = self.get_result_set(
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py", line 149, in wrap
    raise ne.with_traceback(tb) from None
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py", line 82, in wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 405, in get_result_set
    result = self.run_query(
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 103, in wrap
    raise ex
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 97, in wrap
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 325, in run_query
    raise ex
  File "/usr/local/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 317, in run_query
    results_cursor = self._cursor.execute(query, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 804, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/usr/local/lib/python3.8/site-packages/snowflake/connector/errors.py", line 276, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
  File "/usr/local/lib/python3.8/site-packages/snowflake/connector/errors.py", line 331, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.8/site-packages/snowflake/connector/errors.py", line 210, in default_errorhandler
    raise error_class(

  6. Can you set logging to DEBUG and collect the logs?

I can do this if necessary, can also provide the compiled SQL statements that Snowflake tried to run.

SNOW-687991: Align the table_name quotation in save_as_table() and write_pandas()

What is the current behavior?

  • snowpark_df.write.save_as_table('abc') creates a table ABC in snowflake
  • session.write_pandas(pandas_df, 'abc') creates a table abc in snowflake
  • session.write_pandas(pandas_df, 'abc', quote_identifiers=False) creates a table ABC in snowflake

What is the desired behavior?

  • session.write_pandas(pandas_df, 'abc') creates a table ABC in snowflake
  • session.write_pandas(pandas_df, 'abc', quote_identifiers=True) creates a table abc in snowflake

How would this improve snowflake-snowpark-python?

Make writing Snowflake tables consistent.

References, Other Background

As explained in write_pandas docs

quote_identifiers – By default, identifiers, specifically database, schema, table and column names (from DataFrame.columns) will be quoted. If set to False, identifiers are passed on to Snowflake without quoting, i.e. identifiers will be coerced to uppercase by Snowflake.

SNOW-641398: CREATE TABLE privilege required on schemas when appending data

I am trying to insert data into a Snowflake table using a role which has insert privileges on table my_table, but not create table on my_table's schema. I expect this to work as follows:

target_table = session.table("my_schema.my_table")
insert_table = session.create_dataframe(
    [[1]],
    schema=target_table.schema
)
insert_table.write.save_as_table("my_schema.my_table", mode="append")

However, on running this, I get the error Insufficient privileges to operate on schema 'MY_SCHMEA'. Granting CREATE TABLE privileges on my_schema to the role being used by session resolves the problem, but should be unnecessary when I only want to append data (and the table already exists).

SNOW-656385: Cannot import snowflake.snowpark.session in Google Colab

Please answer these questions before submitting your issue. Thanks!

  1. Python version: 3.8
    3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]

  2. What operating system and processor architecture are you using?
    Google Colab with a conda installation via:

!wget -O mini.sh https://repo.anaconda.com/miniconda/Miniconda3-py38_4.8.2-Linux-x86_64.sh
!chmod +x mini.sh
!bash ./mini.sh -b -f -p /usr/local
!conda install -q -y jupyter
!conda install -q -y google-colab -c conda-forge
!python -m ipykernel install --name "py38" --user

Conda info :

active environment : None
       user config file : /root/.condarc
 populated config files : 
          conda version : 4.14.0
    conda-build version : not installed
         python version : 3.8.1.final.0
       virtual packages : __linux=5.4.188=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /usr/local  (writable)
      conda av data dir : /usr/local/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /usr/local/pkgs
                          /root/.conda/pkgs
       envs directories : /usr/local/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/4.14.0 requests/2.22.0 CPython/3.8.1 Linux/5.4.188+ ubuntu/18.04.6 glibc/2.27
                UID:GID : 0:0
             netrc file : None
             offline mode : False

4. What are the component versions in the environment (`pip freeze`)?

aiohttp==3.8.1
aiosignal==1.2.0
anyio==3.5.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asn1crypto==1.3.0
asttokens==2.0.5
async-timeout==4.0.2
attrs==21.4.0
Babel==2.9.1
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==4.1.0
cachetools==5.2.0
certifi==2022.6.15
cffi==1.14.0
chardet==3.0.4
charset-normalizer==2.1.1
cloudpickle==2.0.0
conda==4.14.0
conda-package-handling==1.6.0
cryptography==35.0.0
cytoolz==0.11.0
debugpy==1.5.1
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
executing==0.8.3
fastjsonschema==2.16.2
filelock==3.6.0
frozenlist==1.2.0
google-auth==2.11.0
google-colab==1.0.0
idna==2.8
importlib-resources==5.2.0
ipykernel==6.9.1
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==7.6.5
jedi==0.18.1
Jinja2==3.0.3
json5==0.9.6
jsonschema==4.4.0
jupyter==1.0.0
jupyter-client==7.1.2
jupyter-console==6.4.3
jupyter-core==4.10.0
jupyter-server==1.18.1
jupyterlab==3.4.4
jupyterlab-pygments==0.1.2
jupyterlab-server==2.12.0
jupyterlab-widgets==1.0.0
MarkupSafe==2.1.1
matplotlib-inline==0.1.6
mistune==0.8.4
multidict==5.2.0
nbclassic==0.3.5
nbclient==0.5.13
nbconvert==6.4.4
nbformat==5.3.0
nest-asyncio==1.5.5
notebook==6.4.12
numpy==1.17.5
oscrypto==1.2.1
packaging==21.3
pandas==1.0.5
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
portpicker==1.3.1
prometheus-client==0.14.1
prompt-toolkit==3.0.20
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.7
pycosat==0.6.3
pycparser==2.19
pycryptodomex==3.15.0
Pygments==2.11.2
PyJWT==2.4.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.18.0
PySocks==1.7.1
python-dateutil==2.8.2
pytz==2022.1
pyu2f==0.1.5
pyzmq==22.3.0
qtconsole==5.3.2
QtPy==2.2.0
requests==2.22.0
rsa==4.9
ruamel-yaml==0.15.87
Send2Trash==1.8.0
sip==4.19.13
six==1.14.0
sniffio==1.2.0
snowflake-connector-python==2.7.12
snowflake-snowpark-python==0.8.0
soupsieve==2.3.1
stack-data==0.2.0
terminado==0.13.1
testpath==0.6.0
toolz==0.11.2
tornado==6.1
tqdm==4.42.1
traitlets==5.1.1
typing-extensions==4.3.0
urllib3==1.25.8
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==0.58.0
widgetsnbextension==3.5.2
yarl==1.6.3
zipp==3.8.0


5. What did you do?

 Running from snowflake.snowpark.session import Session fails with an import error.
 
6. What did you expect to see?
 I expected the import to succeed. Instead, Google Colab fails to import snowflake.snowpark.session, so I cannot start a Snowflake session in Colab.

cf google colab notebook
https://colab.research.google.com/drive/1tKhlTJ8r_gbwTpwRd3SOv0FBCOm3tgkf?usp=sharing


8. Can you set logging to DEBUG and collect the logs?




SNOW-698565: SnowPark interface cannot be analysed by mypy (missing py.typed)

What is the current behavior?

When running mypy on code that uses the SnowPark API, the following style of error is reported for each SnowPark module referenced:

error: Skipping analyzing "snowflake.snowpark": module is installed, but missing library stubs or py.typed marker

What is the desired behavior?

Code using methods and classes from SnowPark modules can be checked.

How would this improve snowflake-snowpark-python?

It would allow mypy to analyse code that uses Snowpark Python, for better type safety.

References, Other Background

See snowflakedb/snowflake-connector-python#991 for the same issue (now fixed) with the base connector.
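
As a stopgap until a py.typed marker ships, the import can be excluded from analysis (a sketch; mypy's per-module ignore_missing_imports setting achieves the same thing):

# Stopgap sketch: tell mypy to skip this import until snowflake.snowpark
# ships inline type information or stubs.
from snowflake.snowpark import Session  # type: ignore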

SNOW-646934: Installing Snowpark using pip

I am trying to create an AWS Lambda layer for Snowpark and I get errors that the snowpark module is not found.

  1. What version of Python are you using?
    3.8

  2. What operating system and processor architecture are you using?

Linux-4.14.287-215.504.amzn2.x86_64-x86_64-with-glibc2.2.5

  1. What are the component versions in the environment (pip freeze)?

asn1crypto==1.5.1
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.0
cloudpickle==2.0.0
cryptography==36.0.2
idna==3.3
oscrypto==1.3.0
pycparser==2.21
pycryptodomex==3.15.0
PyJWT==2.4.0
pyOpenSSL==22.0.0
pytz==2022.2.1
requests==2.28.1
snowflake-connector-python==2.7.11
snowflake-snowpark-python==0.8.0
typing-extensions==4.3.0
urllib3==1.26.11

  1. What did you do?
    I ran pip install on Amazon Linux 2, in an AWS Cloud9 Amazon EC2 environment.

  2. What did you expect to see?
    I do not see a snowpark folder under the snowflake directory. I do see it when I pip install on a Mac.

  3. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    

Force schema for a list of dictionaries returns NULLs

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

3.8.10 64 bits

  1. What operating system and processor architecture are you using?

Windows 11

  1. What are the component versions in the environment (pip freeze)?

asn1crypto==1.5.1
asttokens==2.0.5
azure-core==1.24.2
azure-monitor-opentelemetry-exporter==1.0.0b6
backcall==0.2.0
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.0.12
cloudpickle==2.0.0
colorama==0.4.5
cryptography==36.0.2
debugpy==1.6.2
decorator==5.1.1
Deprecated==1.2.13
entrypoints==0.4
executing==0.8.3
idna==3.3
ipykernel==6.15.1
ipython==8.4.0
isodate==0.6.1
jedi==0.18.1
jupyter-client==7.3.4
jupyter-core==4.11.1
matplotlib-inline==0.1.3
msrest==0.7.1
nest-asyncio==1.5.5
ntlm-auth==1.5.0
numpy==1.23.1
oauthlib==3.2.0
opentelemetry-api==1.12.0rc2
opentelemetry-sdk==1.12.0rc2
opentelemetry-semantic-conventions==0.32b0
oscrypto==1.3.0
packaging==21.3
pandas==1.4.3
parso==0.8.3
pickleshare==0.7.5
prompt-toolkit==3.0.30
psutil==5.9.1
pure-eval==0.2.2
pyarrow==6.0.1
pycparser==2.21
pycryptodomex==3.15.0
Pygments==2.12.0
PyJWT==2.4.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pypiwin32==223
python-dateutil==2.8.2
pytz==2022.1
pywin32==304
pyzmq==23.2.0
requests==2.28.1
requests-negotiate-sspi==0.5.2
requests-ntlm==1.1.0
requests-oauthlib==1.3.1
six==1.16.0
snowflake-connector-python==2.7.9
snowflake-snowpark-python==0.8.0
stack-data==0.3.0
tornado==6.2
traitlets==5.3.0
typing_extensions==4.3.0
urllib3==1.26.10
wcwidth==0.2.5
wrapt==1.14.1

  1. What did you do?

Firstly, create a dataframe from a list of dictionaries

myList = [
	{
		'foo':12,
		'bar':14
	},
	{
		'foo':52,
		'bar':641
	},
	{
		'foo':6,
		'bar':84
	}
]

dfmyList = session.create_dataframe(myList)
print(dfmyList.schema)

StructType([StructField('FOO', LongType(), nullable=False), StructField('BAR', LongType(), nullable=False)])

dfmyList.show()

-----------------
|"FOO"  |"BAR"  |
-----------------
|12     |14     |
|52     |641    |
|6      |84     |
-----------------

After that, I try the same but forcing the schema:

from snowflake.snowpark.types import IntegerType, StringType, StructType, StructField, LongType, ArrayType, BooleanType, DateType
# from the output of the previous dfmyList.schema
myListSchema = StructType([StructField('FOO', LongType(), nullable=False), StructField('BAR', LongType(), nullable=False)])

dfmyListSchema = session.create_dataframe(myList, myListSchema)

print(dfmyListSchema.schema)

StructType([StructField('FOO', LongType(), nullable=True), StructField('BAR', LongType(), nullable=True)])

dfmyListSchema.show()

-----------------
|"FOO"  |"BAR"  |
-----------------
|NULL   |NULL   |
|NULL   |NULL   |
|NULL   |NULL   |
-----------------
  1. What did you expect to see?
 -----------------
|"FOO"  |"BAR"  |
-----------------
|12     |14     |
|52     |641    |
|6      |84     |
-----------------
  1. Can you set logging to DEBUG and collect the logs?

No
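
A workaround that may avoid the NULLs (a sketch; it assumes the problem is tied to passing dicts together with an explicit schema): convert the dicts to tuples ordered to match the schema's fields before calling create_dataframe.

# Workaround sketch: pass tuples ordered to match the schema fields (FOO, BAR)
# instead of dicts when an explicit schema is supplied.
ordered_rows = [(d['foo'], d['bar']) for d in myList]
dfmyListSchema = session.create_dataframe(ordered_rows, myListSchema)
dfmyListSchema.show()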

SNOW-612394: Support for Python >= 3.9

What is the current behavior?

Install fails on python 3.9

What is the desired behavior?

That it works on python 3.9. Our use-case is installing snowpark in the jupyterhub/docker-stacks scipy image, which is in fact already on python 3.10.

How would this improve snowflake-snowpark-python?

More flexibility. Is there really such a breaking change in Python 3.9 that warrants this oddly specific version pin on 3.8? I suspect simply relaxing the pin would be enough, as I'm not aware of any incompatibilities between 3.8 and >=3.9.

SNOW-666918: session.write_pandas with create_temp_table fails

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
session.write_pandas(input_table_df, "output_table", auto_create_table=True, create_temp_table=True)
  1. What did you expect to see?

A temporary table created.

  1. Logs in the snowpark console when running the stored procedure.
snowflake.connector.errors.ProgrammingError: 090236 (42601): Stored procedure execution error: Unsupported statement type 'temporary TABLE'.
 in function MAKE_RATIO with handler make_ratio   

SNOW-693539: Allow case insensitive access to fields on a row

What is the current behavior?

Employee=Row('FIRSTNAME', 'LASTNAME', 'GENDER', 'SALARY')
employee1=Employee('Robert', 'Williams', 'M', 62)
print(employee1.firstname)

This will throw an exception indicating that the field firstname is not present

What is the desired behavior?

Allow using the attributes in a case-insensitive way.

How would this improve snowflake-snowpark-python?

It would make things easier in scenarios where you deal with rows.

For example, it is common to have methods like:

def processRows(l):
    for m in l:
        m.firstName + m.lastName

In Spark you can probably do:

rows = df.collect()
processRows(rows)

but in Snowflake we would need to adjust the method to match the field names.

References, Other Background

The code in the Row class already has a __getattr__.

So we can do something like:

Employee=Row('FIRSTNAME', 'LASTNAME', 'GENDER', 'SALARY',Row.CASE_INSENSITIVE)
employee1=Employee('Robert', 'Williams', 'M', 62)
print(employee1.firstname)

A test modification for Row can be:

from typing import Any, Dict, Iterable, Union

# NOTE: this sketch assumes the existing module-level helper
# _restore_row_from_pickle from snowflake/snowpark/row.py is in scope.


class Row(tuple):
    """Represents a row in :class:`DataFrame`.

    It is immutable and works like a tuple or a named tuple.

    >>> row = Row(1, 2)
    >>> row
    Row(1, 2)
    >>> row[0]
    1
    >>> len(row)
    2
    >>> row[0:1]
    Row(1)
    >>> named_row = Row(name1=1, name2=2)
    >>> named_row
    Row(name1=1, name2=2)
    >>> named_row["name1"]
    1
    >>> named_row.name1
    1
    >>> row == named_row
    True

    A ``Row`` object is callable. You can use it to create other ``Row`` objects:

    >>> Employee = Row("name", "salary")
    >>> emp1 = Employee("John", 10000)
    >>> emp1
    Row(name='John', salary=10000)
    >>> emp2 = Employee("James", 20000)
    >>> emp2
    Row(name='James', salary=20000)

    """
    
    def __new__(cls, *values: Any, **named_values: Any):
        if values and named_values:
            raise ValueError("Either values or named_values is required but not both.")
        if named_values:
            # After py3.7, dict is ordered(not sorted) by item insertion sequence.
            # If we support 3.6 or older someday, this implementation needs changing.
            row = tuple.__new__(cls, tuple(named_values.values()))
            row.__dict__["_case_insensitive"] = False
            row.__dict__["_named_values"] = named_values
            row.__dict__["_fields"] = tuple(named_values.keys())
        else:

            # Guard against an empty *values tuple before checking the sentinel.
            is_ci = bool(values) and values[-1] == Row.CASE_INSENSITIVE
            if is_ci:
                values = values[:-1]
            row = tuple.__new__(cls, values)
            row.__dict__["_case_insensitive"] = is_ci
            row.__dict__["_named_values"] = None
            row.__dict__["_fields"] = None

        # _fields is for internal use only. Users shouldn't set this attribute.
        # It contains a list of str representing column names. It also allows duplicates.
        # snowflake DB can return duplicate column names, for instance, "select a, a from a_table."
        # When return a DataFrame from a sql, duplicate column names can happen.
        # But using duplicate column names is obviously a bad practice even though we allow it.
        # It's value is assigned in __setattr__ if internal code assign value explicitly.
        row.__dict__["_has_duplicates"] = None
        return row

    def __getitem__(self, item: Union[int, str, slice]):
        if isinstance(item, int):
            return super().__getitem__(item)
        elif isinstance(item, slice):
            return Row(*super().__getitem__(item))
        else:  # str
            self._populate_named_values_from_fields()
            # get from _named_values first
            if self._named_values:
                return self._named_values[item]
            # we have duplicated fields and _named_values is not populated,
            # so indexing fields
            elif self._fields and self._check_if_having_duplicates():
                try:
                    index = self._fields.index(item.upper())  # may throw ValueError
                    return super().__getitem__(index)  # may throw IndexError
                except (IndexError, ValueError):
                    raise KeyError(item)
            # no column names/keys/fields
            else:
                raise KeyError(item)

    def __setitem__(self, key, value):
        raise TypeError("Row object does not support item assignment")

    def __getattr__(self, item):
        if self._case_insensitive:
            item = item.upper()
        self._populate_named_values_from_fields()
        if self._named_values and item in self._named_values:
            return self._named_values[item]
        elif self._fields and self._check_if_having_duplicates():
            try:
                index = self._fields.index(item.upper())  # may throw ValueError
                return self[index]  # may throw IndexError
            except (IndexError, ValueError):
                raise AttributeError(f"Row object has no attribute {item}")
        else:
            raise AttributeError(f"Row object has no attribute {item}")

    def __setattr__(self, key, value):
        if key != "_fields":
            raise AttributeError("Can't set attribute to Row object")
        if value is not None:
            self.__dict__["_fields"] = value

    def __contains__(self, item):
        self._populate_named_values_from_fields()
        if self._named_values:
            return item in self._named_values
        elif self._fields:
            return item in self._fields
        else:
            return super().__contains__(item)

    def __call__(self, *args, **kwargs):
        """Create a new Row from current row."""
        if args and kwargs:
            raise ValueError(
                "A Row instance should have either values or field-value pairs but not both."
            )
        elif args and len(args) != len(self):
            raise ValueError(f"{len(self)} values are expected.")
        self._populate_named_values_from_fields()
        if self._named_values:
            if args:
                raise ValueError(
                    "The Row object can't be called with a list of values"
                    "because it already has fields and values."
                )
            new_row = Row(**self._named_values)
            for input_key, input_value in kwargs.items():
                if input_key not in self._named_values:
                    raise ValueError(f"Wrong keyword argument {input_key} for {self}")
                new_row._named_values[input_key] = input_value
            return new_row
        elif self._fields and self._check_if_having_duplicates():
            raise ValueError(
                "The Row object can't be called because it has duplicate fields"
            )
        else:
            if kwargs:
                raise ValueError(
                    "The Row object can't be called with field-value pairs "
                    "because it doesn't have field-value pairs"
                )
            if len(args) != len(self) or any(
                not isinstance(value, str) for value in self
            ):
                raise ValueError(
                    "The called Row object and input values must have the same size and the called Row object shouldn't have any non-str fields."
                )
            new_row = Row(**{k: v for k, v in zip(self, args)})
            new_row.__dict__["_case_insensitive"] = self._case_insensitive
            return new_row

    def __copy__(self):
        return _restore_row_from_pickle(self, self._named_values, self._fields)

    def __repr__(self):
        if self._fields:
            return "Row({})".format(
                ", ".join(f"{k}={v!r}" for k, v in zip(self._fields, self))
            )
        elif self._named_values:  # pragma: no cover
            # this might not be reachable because when there is self._named_values, there is always self._fields
            # Need to review later.
            return "Row({})".format(
                ", ".join(f"{k}={v!r}" for k, v in self._named_values.items())
            )

        else:
            return "Row({})".format(", ".join(f"{v!r}" for v in self))

    def __reduce__(self):
        return (
            _restore_row_from_pickle,
            (tuple(self), self._named_values, self._fields),
        )

    def as_dict(self, recursive: bool = False) -> Dict:
        """Convert to a dict if this row object has both keys and values.

        Args:
            recursive: Recursively convert child :class:`Row` objects to dicts. Default is False.

        >>> row = Row(name1=1, name2=2, name3=Row(childname=3))
        >>> row.as_dict()
        {'name1': 1, 'name2': 2, 'name3': Row(childname=3)}
        >>> row.as_dict(True)
        {'name1': 1, 'name2': 2, 'name3': {'childname': 3}}
        """
        self._populate_named_values_from_fields()
        if not self._named_values:
            raise TypeError(
                "Cannot convert a Row without key values or duplicated keys to a dict."
            )
        if not recursive:
            return dict(self._named_values)
        return self._convert_dict(self._named_values)

    def _convert_dict(
        self, obj: Union["Row", Dict, Iterable[Union["Row", Dict]]]
    ) -> Union[Dict, Iterable[Dict]]:
        if isinstance(obj, Row):
            return obj.as_dict(True)
        elif isinstance(obj, dict):
            child_dict = {}
            for k, v in obj.items():
                child_dict[k] = self._convert_dict(v)
            return child_dict
        elif isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, bytearray)):
            return [self._convert_dict(x) for x in obj]

        return obj

    def _populate_named_values_from_fields(self) -> None:
        # populate _named_values dict if we have unduplicated fields
        if (
            self._named_values is None
            and self._fields
            and not self._check_if_having_duplicates()
        ):
            self.__dict__["_named_values"] = {k: v for k, v in zip(self._fields, self)}

    def _check_if_having_duplicates(self) -> bool:
        if self._has_duplicates is None:
            # Usually we don't have duplicate keys
            self.__dict__["_has_duplicates"] = bool(
                len(set(self._fields)) != len(self._fields)
            )
        return self._has_duplicates

    # Add aliases for user code migration
    asDict = as_dict

Row.CASE_INSENSITIVE = "@@@@@@CS@@@@@@"

SNOW-672261: Add functions `first_value` and `last_value`

What is the current behavior?

The functions first_value and last_value are not currently available, although the groundwork already exists for ranking functions (e.g. lag / lead). While it is possible to use call_function("last_value", col("colname")) for basic use of this function, it does not appear to be possible to execute the IGNORE NULLS version of these functions (i.e. last_value(colname IGNORE NULLS) OVER ...).
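
For reference, this is roughly what is possible today with call_function plus a window spec (a sketch with placeholder column names; the IGNORE NULLS variant has no equivalent):

# Sketch of the current workaround; "grp", "ts" and "colname" are placeholder
# column names used only for illustration.
from snowflake.snowpark import Window
from snowflake.snowpark.functions import call_function, col

w = Window.partition_by(col("grp")).order_by(col("ts"))
df = df.with_column("last_val", call_function("last_value", col("colname")).over(w))
# Not expressible today: last_value(colname IGNORE NULLS) OVER (...)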

What is the desired behavior?

How would this improve snowflake-snowpark-python?

References, Other Background

update snowflake-snowpark-python package in Snowflake

What is the current behavior?

snowflake-snowpark-python v0.12.0 was released on Oct 14, 2022, but the Snowflake packages list still only has v0.10.0 as of Oct 26, 2022.

What is the desired behavior?

Synchronize the release of snowflake-snowpark-python with the version available inside Snowflake.

How would this improve snowflake-snowpark-python?

Make new Snowpark features available for creating stored functions and stored procedures.

References, Other Background

Here is how to list the available snowflake-snowpark-python versions in Snowflake:

select * from information_schema.packages 
where language = 'python'
    and package_name = 'snowflake-snowpark-python'
order by to_numeric(split_part(version, '.', 2)) desc

SNOW-704049: Pearson Correlation, Covariance

What is the current behavior?

SQL CORR or COVAR is called via DataFrame.stat.corr(). It returns a single value instead of a correlation or covariance matrix with all pair-wise coefficients. The approach is not mature and doesn't scale to wide tables.
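
For context, each coefficient currently has to be requested pair by pair, which is what makes wide tables impractical; a sketch of the pairwise approach (column names are placeholders):

# Sketch: building a full matrix today requires O(n^2) separate corr() calls,
# which is why the approach does not scale to ~1k columns.
cols = ["c1", "c2", "c3"]  # placeholder column names
matrix = {
    (a, b): df.stat.corr(a, b)
    for i, a in enumerate(cols)
    for b in cols[i:]
}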

What is the desired behavior?

Return a symmetric output matrix with all correlation coefficients, with the ability to run on wide tables (1k+ columns). pandas.DataFrame.corr(), R cor(method="pearson"), and SAS PROC CORR all do this; Snowflake/Snowpark does not.

How would this improve snowflake-snowpark-python?

It would provide additional functionality for a core feature engineering/data science workflow, scale to the wide tables where correlation is used, and meet customer demand (Fidelity, GIC).

References, Other Background

This topic has been widely discussed for a long time. See - https://snowflakecomputing.atlassian.net/browse/SNOW-587357.

SNOW-666406: Options for unit testing?

I'm evaluating dbt and specifically python models (beta feature) in dbt as documented here: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/python-models

I'm trying to come up with some way of unit testing these dbt Python models. By "unit test" I mean something that could run independently of any Snowflake database and wouldn't need to connect to any "real" database, though I would accept a localhost emulation that accomplishes the same goal. Is there any way to create a Snowpark Session object that could do this?

Here's what I'm trying to do:

dbt model foo.py:

def add_one(x):
    return x + 1


def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("my_first_dbt_model")

    df = df.where("id = 1")
    df = df.withColumn("id_plus_one", add_one(df["id"]))
    return df

pytest test_foo.py:

from unittest.mock import Mock

from models.example.foo import model
from snowflake.snowpark import Session


def create_unit_test_session():
    return Session.builder.configs(
        {
             # is this possible?        
        }
    ).create()


def test_foo():
    mock_dbt = Mock()
    mock_session = create_unit_test_session()

    input_df = mock_session.create_dataframe([1, 2, 3, 4]).to_df("id")

    mock_dbt.ref = Mock(return_value=input_df)
    result_df = model(mock_dbt, mock_session)

    expected_df = mock_session.create_dataframe([(1, 2)]).to_df("id", "id_plus_one")
    assert result_df.collect() == expected_df.collect()
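
For what it's worth, newer releases of snowflake-snowpark-python than the one in this environment document a local testing mode that creates a Session without connecting to a Snowflake account; a sketch, assuming that option is available in the installed version:

# Sketch assuming a Snowpark release that supports the local testing option;
# this builds a Session that does not connect to a real Snowflake account.
from snowflake.snowpark import Session

def create_unit_test_session():
    return Session.builder.configs({"local_testing": True}).create()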

SNOW-651322: Snowpark with Pandas as a Lambda layer - Library size issues

What is the current behavior?

I need to add "Snowpark with Pandas" as an AWS Lambda layer; however, the size of the Python libraries goes above 250 MB, so I cannot use it.

What is the desired behavior?

Please provide a trimmed-down version of the library so that the size of the Python libraries stays below 250 MB.

How would this improve snowflake-snowpark-python?

A smaller package would make it possible to deploy Snowpark (with pandas) as an AWS Lambda layer within the 250 MB limit.

References, Other Background

SNOW-654151: DataFrame.write.save_as_table() raises exception in snowpark-python 0.8.0

  1. What version of Python are you using?
    I'm running Snowpark in a Snowflake stored procedure with:
language python
runtime_version = 3.8
  1. What operating system and processor architecture are you using?
    The bug occurs in a Snowflake stored procedure which appears to be running Linux-5.4.181-99.354.amzn2.aarch64-aarch64-with-glibc2.34

  2. What are the component versions in the environment (pip freeze)?
    I'm using the native Snowpark python environment (as specified above). The only package included is:
    packages = ('snowflake-snowpark-python==0.8.0')

  3. What did you do?
    Attempted to write a dataframe to a snowflake table using:
    DataFrame.write.save_as_table(destination_table_name, mode="append")

  4. What did you expect to see?
    I expected the dataframe to be written to the specified table. Instead, the write consistently raises the exception TypeError: argument of type 'NoneType' is not iterable (stack trace below).

Extra details:

  • Works as expected with snowflake-snowpark-python==0.7.0
  • Appears to work when writing a dataframe with a very small number of rows (on the order of 10), but fails even for around 100 rows. Potentially points to a bug related to batch inserts?

Stack trace:

Python Interpreter Error:
Traceback (most recent call last):
  File "_udf_code.py", line 66, in run
  File "_udf_code.py", line 111, in write_alerts_to_database
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/telemetry.py", line 165, in wrap
    result = func(*args, **kwargs)
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/dataframe_writer.py", line 142, in save_as_table
    session._conn.execute(snowflake_plan, _statement_params=statement_params)
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 369, in execute
    result_set, result_meta = self.get_result_set(
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py", line 81, in wrap
    return func(*args, **kwargs)
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 407, in get_result_set
    self.run_batch_insert(query.sql, query.rows, **kwargs)
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 104, in wrap
    raise ex
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 98, in wrap
    return func(*args, **kwargs)
  File "/usr/lib/python_udf/1adf2bad1b775134282f52c6525f68044babf4feecd5a313ae7083de4f58eaa3/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py", line 458, in run_batch_insert
    and "QUERY_TAG" in kwargs["_statement_params"]
TypeError: argument of type 'NoneType' is not iterable
 in function TEST_PROC with handler run
  1. Can you set logging to DEBUG and collect the logs?
    Happy to do so if possible, but I would need some help to understand how I can get the stored procedure to output the logs.
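
Looking at the stack trace above, the failure is the membership test on kwargs["_statement_params"] when it is None. Until that is fixed, a possible (untested) workaround sketch is to pass an explicit statement_params dict so the value is never None:

# Untested workaround sketch based on the stack trace: supply a non-None
# statement_params so the "QUERY_TAG" membership check has a dict to inspect.
DataFrame.write.save_as_table(
    destination_table_name,
    mode="append",
    statement_params={"QUERY_TAG": "save_as_table_workaround"},
)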

SNOW-652229: Documentation error: `save_as_table` doesn't accept a table_type parameter

Please answer these questions before submitting your issue. Thanks!
Sorry for skipping over the boilerplate questions, but I don't think any would be applicable. I just want to point out an inconsistency in this documentation section (which also comes up in the alias and other related functions).

The documentation shows this example to create a temp table

>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"])
>>> df.write.mode("overwrite").save_as_table("my_table", table_type="temporary")

but this will throw an error since table_type is not an accepted argument:

TypeError: save_as_table() got an unexpected keyword argument 'table_type'

SNOW-665294: Pandas DataFrame types not working in VS Code

It looks like the recent change of return types here:

 def to_pandas(
        self,
        *,
        statement_params: Optional[Dict[str, str]] = None,
        block: bool = True,
        **kwargs: Dict[str, Any],
    ) -> Union["pandas.DataFrame", AsyncJob]:

causes some issues with VS Code autocomplete, which sees this as returning (Any | AsyncJob); that turned my code into a lot of red squiggly lines.

I imagine the double-quoted string literal type here is the issue: previously I expect the return type was treated as Any, so it wouldn't auto-complete but also wouldn't show red squiggles. The code actually runs fine, but editors are confused by the type being returned.
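
A possible editor-side workaround (a sketch, not a library fix): explicitly narrow the union for the type checker when block=True is used, so the editor treats the result as a pandas DataFrame.

# Sketch: narrow Union["pandas.DataFrame", AsyncJob] for the type checker.
from typing import cast

import pandas as pd

pandas_df = cast(pd.DataFrame, df.to_pandas())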

SNOW-647070: Installing Pandas and Snowflake separately

I installed Snowpark and pandas separately. This code works on a Mac; however, when I run it on Amazon Linux 2 it fails.

import os
from tokenize import String
import pandas as pd
import json
from io import StringIO
from datetime import datetime
from botocore.exceptions import ClientError
from snowflake.snowpark import Session
from snowflake.snowpark.types import StructType, StructField, StringType, IntegerType
from snowflake.connector.pandas_tools import write_pandas
from snowflake.snowpark.functions import when_matched, when_not_matched
from snowflake.snowpark.functions import col, lit

connection_parameters = {
}

def write_to_snowflake(cdf):
    session = Session.builder.configs(connection_parameters).create()
    df = session.create_dataframe(cdf)
    df.write.mode("overwrite").save_as_table("cdp_staging", table_type="temporary")
    print("after stage")
    session.close()

def stage():
    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }
    df = pd.DataFrame(data)
    write_to_snowflake(df)

stage()

At run time I get the error:
"create_dataframe() function only accepts data as a list, tuple or a pandas DataFrame."

Does it need a specific version of pandas? Thanks!

What version of Python are you using?
3.8

What operating system and processor architecture are you using?

Linux-4.14.287-215.504.amzn2.x86_64-x86_64-with-glibc2.2.5

What are the component versions in the environment (pip freeze)?
asn1crypto==1.5.1
boto3==1.24.53
botocore==1.27.53
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.0
cloudpickle==2.0.0
cryptography==36.0.2
idna==3.3
jmespath==1.0.1
numpy==1.23.2
oscrypto==1.3.0
pandas==1.4.3
pycparser==2.21
pycryptodomex==3.15.0
PyJWT==2.4.0
pyOpenSSL==22.0.0
python-dateutil==2.8.2
pytz==2022.2.1
requests==2.28.1
s3transfer==0.6.0
six==1.16.0
snowflake-connector-python==2.7.11
snowflake-snowpark-python==0.8.0
typing_extensions==4.3.0
urllib3==1.26.11
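
Note that the pip freeze above does not include pyarrow. One plausible explanation (an assumption, not verified) is that the connector's pandas support never loads without it, in which case Snowpark will not recognize pandas DataFrames in create_dataframe. A quick diagnostic sketch to run inside the Lambda environment:

# Diagnostic sketch: check whether pyarrow (needed by the connector's pandas
# support) can be imported in this environment.
try:
    import pyarrow
    print("pyarrow", pyarrow.__version__)
except ImportError as exc:
    print("pyarrow not available:", exc)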

What did you do?
Open your AWS Cloud9 Amazon EC2 environment.
Install Python 3.8 and pip3 by running the following commands:
$ sudo amazon-linux-extras install python3.8
$ curl -O https://bootstrap.pypa.io/get-pip.py
$ python3.8 get-pip.py --user
Create a python folder and install the packages into it by running the following commands:
python3.8 -m pip install snowflake-snowpark-python -t python/ --upgrade
python3.8 -m pip install pandas -t python/ --upgrade

What did you expect to see?
Pandas and snowpark to work on Linux 2 as I need to run this code in AWS Lambda

Can you set logging to DEBUG and collect the logs?
Response
{
"errorMessage": "create_dataframe() function only accepts data as a list, tuple or a pandas DataFrame.",
"errorType": "TypeError",
"stackTrace": [
" File "/var/lang/lib/python3.8/imp.py", line 234, in load_module\n return load_source(name, filename, file)\n",
" File "/var/lang/lib/python3.8/imp.py", line 171, in load_source\n module = _load(spec)\n",
" File "", line 702, in _load\n",
" File "", line 671, in _load_unlocked\n",
" File "", line 843, in exec_module\n",
" File "", line 219, in _call_with_frames_removed\n",
" File "/var/task/lambda_function.py", line 81, in \n stage()\n",
" File "/var/task/lambda_function.py", line 62, in stage\n write_to_snowflake(df)\n",
" File "/var/task/lambda_function.py", line 45, in write_to_snowflake\n df=session.create_dataframe(cdf)\n",
" File "/opt/python/snowflake/snowpark/session.py", line 1133, in create_dataframe\n raise TypeError(\n"
]
}

SNOW-685998: Unexpected behavior around join and filter order

Summary: The order of filter and join statements on a table determines whether the query succeeds or fails. To me, this is unexpected behavior.

In short: table.join().join().filter() works, table.filter().join().join() fails.

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8.15 (default, Oct 27 2022, 17:03:06)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

  1. What operating system and processor architecture are you using?

Linux-5.16.13-generic-aarch64-with-glibc2.17

  1. What are the component versions in the environment (pip freeze)?

anyio==3.5.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asn1crypto==1.5.1
asttokens==2.0.5
attrs==21.4.0
Babel==2.9.1
backcall==0.2.0
beautifulsoup4==4.10.0
bleach==4.1.0
certifi==2022.9.24
cffi==1.15.1
charset-normalizer==2.1.1
cloudpickle==2.0.0
cryptography==36.0.2
debugpy==1.5.1
decorator==5.1.1
defusedxml==0.7.1
distlib==0.3.6
entrypoints==0.4
executing==0.8.3
fastjsonschema==2.16.2
filelock==3.8.0
fire==0.4.0
greenlet==1.1.2
idna==3.4
importlib-metadata==4.11.3
pytz==2022.5
PyYAML==6.0
pyzmq==22.3.0
qtconsole==5.2.2
QtPy==2.0.1
requests==2.28.1
scipy==1.8.0
Send2Trash==1.8.0
six==1.16.0
sniffio==1.2.0
snowflake-connector-python==2.8.0
snowflake-snowpark-python==0.12.0
snowflake-sqlalchemy==1.3.3
soupsieve==2.3.1
SQLAlchemy==1.4.32
stack-data==0.2.0
statsmodels==0.13.2
tenacity==8.0.1
termcolor==1.1.0
terminado==0.13.3
testpath==0.6.0
tinycss2==1.2.1
toml==0.10.2
tomli==2.0.1
tornado==6.1
traitlets==5.1.1
typing_extensions==4.4.0
urllib3==1.26.12
virtualenv==20.16.6
virtualenv-clone==0.5.7
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.1
widgetsnbextension==3.6.0
wquantiles==0.6
yapf==0.32.0
zipp==3.7.0

  1. What did you do?
# setup
session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b1"]).create_or_replace_temp_view('table_a')
session.create_dataframe([[1, 20], [3, 40]], schema=["a", "b2"]).create_or_replace_temp_view('table_b')
session.create_dataframe([[1, 200], [3, 400]], schema=["a", "b3"]).create_or_replace_temp_view('table_c')
table_a = session.table('table_a')
table_b = session.table('table_b')
table_c = session.table('table_c')

# works
joined_data = table_a.join(
    table_b, (table_a['a'] == table_b['a'])
).join(
    table_c, (table_b['a'] == table_c['a'])
).filter(
    table_a.b1 != lit(4)
).to_pandas()

# doesn't work
joined_data = table_a.filter(
    table_a.b1 != lit(4)
).join(
    table_b, (table_a['a'] == table_b['a'])
).join(
    table_c, (table_b['a'] == table_c['a'])
).to_pandas()

Leads to: SnowparkSQLException: (1304): 000904 (42000): SQL compilation error: error line 1 at position 214
invalid identifier 'A'

  1. What did you expect to see?

I expected both variants to work.
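
A workaround that may help until this is fixed (a sketch, not verified): reference columns from the filtered dataframe itself, rather than from the original table_a, when building the join conditions.

# Sketch: build the join conditions from the filtered dataframe's own columns
# instead of the original table_a columns.
filtered_a = table_a.filter(table_a.b1 != lit(4))
joined_data = filtered_a.join(
    table_b, filtered_a['a'] == table_b['a']
).join(
    table_c, table_b['a'] == table_c['a']
).to_pandas()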

  1. Can you set logging to DEBUG and collect the logs?
2022-10-31 09:43:34,998 - MainThread cursor.py:629 - execute() - DEBUG - executing SQL/command
2022-10-31 09:43:34,999 - MainThread cursor.py:714 - execute() - INFO - query: [SELECT  *  FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BIGINT AS "B1") WHERE ("B1" !=...]
2022-10-31 09:43:34,999 - MainThread connection.py:1309 - _next_sequence_counter() - DEBUG - sequence counter: 52
2022-10-31 09:43:35,000 - MainThread cursor.py:459 - _execute_helper() - DEBUG - Request id: 64861187-0363-41e3-9eb4-b82085281cb6
2022-10-31 09:43:35,000 - MainThread cursor.py:461 - _execute_helper() - DEBUG - running query [SELECT  *  FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BIGINT AS "B1") WHERE ("B1" !=...]
2022-10-31 09:43:35,001 - MainThread cursor.py:470 - _execute_helper() - DEBUG - is_file_transfer: False
2022-10-31 09:43:35,001 - MainThread connection.py:979 - cmd_query() - DEBUG - _cmd_query
2022-10-31 09:43:35,001 - MainThread connection.py:1002 - cmd_query() - DEBUG - sql=[SELECT  *  FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BIGINT AS "B1") WHERE ("B1" !=...], sequence_id=[52], is_file_transfer=[False]
2022-10-31 09:43:35,002 - MainThread network.py:1147 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-10-31 09:43:35,003 - MainThread network.py:827 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-10-31 09:43:35,006 - MainThread network.py:808 - add_request_guid() - DEBUG - Request guid: a8863f98-b7ce-4819-a054-e7503a15af33
2022-10-31 09:43:35,007 - MainThread network.py:1006 - _request_exec() - DEBUG - socket timeout: 60
2022-10-31 09:43:35,261 - MainThread connectionpool.py:456 - _make_request() - DEBUG - [https://snowhouse.snowflakecomputing.com:443](https://snowhouse.snowflakecomputing.com/) "POST /queries/v1/query-request?requestId=64861187-0363-41e3-9eb4-b82085281cb6&request_guid=a8863f98-b7ce-4819-a054-e7503a15af33 HTTP/1.1" 200 1082
2022-10-31 09:43:35,262 - MainThread network.py:1032 - _request_exec() - DEBUG - SUCCESS
2022-10-31 09:43:35,262 - MainThread network.py:1152 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-10-31 09:43:35,263 - MainThread network.py:715 - _post_request() - DEBUG - ret[code] = None, after post request
2022-10-31 09:43:35,263 - MainThread network.py:739 - _post_request() - DEBUG - Query id: 01a7fe47-0605-0b3a-0001-dd14b94dd35f
2022-10-31 09:43:35,264 - MainThread cursor.py:736 - execute() - DEBUG - sfqid: 01a7fe47-0605-0b3a-0001-dd14b94dd35f
2022-10-31 09:43:35,264 - MainThread cursor.py:738 - execute() - INFO - query execution done
2022-10-31 09:43:35,264 - MainThread cursor.py:740 - execute() - DEBUG - SUCCESS
2022-10-31 09:43:35,265 - MainThread cursor.py:743 - execute() - DEBUG - PUT OR GET: None
2022-10-31 09:43:35,265 - MainThread cursor.py:840 - _init_result_and_meta() - DEBUG - Query result format: arrow
2022-10-31 09:43:35,266 - MainThread cursor.py:854 - _init_result_and_meta() - INFO - Number of results in first chunk: 0
2022-10-31 09:43:35,266 - MainThread cursor.py:629 - execute() - DEBUG - executing SQL/command
2022-10-31 09:43:35,267 - MainThread cursor.py:714 - execute() - INFO - query: [SELECT "A" AS "l_avoo_A", "B1" AS "B1" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...]
2022-10-31 09:43:35,267 - MainThread connection.py:1309 - _next_sequence_counter() - DEBUG - sequence counter: 53
2022-10-31 09:43:35,267 - MainThread cursor.py:459 - _execute_helper() - DEBUG - Request id: 0066d3c6-0df6-4129-a3fc-a5f384c194fd
2022-10-31 09:43:35,268 - MainThread cursor.py:461 - _execute_helper() - DEBUG - running query [SELECT "A" AS "l_avoo_A", "B1" AS "B1" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...]
2022-10-31 09:43:35,268 - MainThread cursor.py:470 - _execute_helper() - DEBUG - is_file_transfer: False
2022-10-31 09:43:35,268 - MainThread connection.py:979 - cmd_query() - DEBUG - _cmd_query
2022-10-31 09:43:35,269 - MainThread connection.py:1002 - cmd_query() - DEBUG - sql=[SELECT "A" AS "l_avoo_A", "B1" AS "B1" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...], sequence_id=[53], is_file_transfer=[False]
2022-10-31 09:43:35,270 - MainThread network.py:1147 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-10-31 09:43:35,270 - MainThread network.py:827 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-10-31 09:43:35,271 - MainThread network.py:808 - add_request_guid() - DEBUG - Request guid: ba390adc-5e07-4f38-817a-03bdc5fd5d8e
2022-10-31 09:43:35,271 - MainThread network.py:1006 - _request_exec() - DEBUG - socket timeout: 60
2022-10-31 09:43:35,455 - MainThread connectionpool.py:456 - _make_request() - DEBUG - [https://snowhouse.snowflakecomputing.com:443](https://snowhouse.snowflakecomputing.com/) "POST /queries/v1/query-request?requestId=0066d3c6-0df6-4129-a3fc-a5f384c194fd&request_guid=ba390adc-5e07-4f38-817a-03bdc5fd5d8e HTTP/1.1" 200 1086
2022-10-31 09:43:35,456 - MainThread network.py:1032 - _request_exec() - DEBUG - SUCCESS
2022-10-31 09:43:35,456 - MainThread network.py:1152 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-10-31 09:43:35,457 - MainThread network.py:715 - _post_request() - DEBUG - ret[code] = None, after post request
2022-10-31 09:43:35,457 - MainThread network.py:739 - _post_request() - DEBUG - Query id: 01a7fe47-0605-0b44-0001-dd14b94e14fb
2022-10-31 09:43:35,457 - MainThread cursor.py:736 - execute() - DEBUG - sfqid: 01a7fe47-0605-0b44-0001-dd14b94e14fb
2022-10-31 09:43:35,458 - MainThread cursor.py:738 - execute() - INFO - query execution done
2022-10-31 09:43:35,458 - MainThread cursor.py:740 - execute() - DEBUG - SUCCESS
2022-10-31 09:43:35,458 - MainThread cursor.py:743 - execute() - DEBUG - PUT OR GET: None
2022-10-31 09:43:35,459 - MainThread cursor.py:840 - _init_result_and_meta() - DEBUG - Query result format: arrow
2022-10-31 09:43:35,459 - MainThread cursor.py:854 - _init_result_and_meta() - INFO - Number of results in first chunk: 0
2022-10-31 09:43:35,460 - MainThread cursor.py:629 - execute() - DEBUG - executing SQL/command
2022-10-31 09:43:35,460 - MainThread cursor.py:714 - execute() - INFO - query: [SELECT "A" AS "r_zqsj_A", "B2" AS "B2" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...]
2022-10-31 09:43:35,460 - MainThread connection.py:1309 - _next_sequence_counter() - DEBUG - sequence counter: 54
2022-10-31 09:43:35,461 - MainThread cursor.py:459 - _execute_helper() - DEBUG - Request id: bd78ee48-aa9d-4e74-864f-0a02b956d7db
2022-10-31 09:43:35,461 - MainThread cursor.py:461 - _execute_helper() - DEBUG - running query [SELECT "A" AS "r_zqsj_A", "B2" AS "B2" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...]
2022-10-31 09:43:35,461 - MainThread cursor.py:470 - _execute_helper() - DEBUG - is_file_transfer: False
2022-10-31 09:43:35,462 - MainThread connection.py:979 - cmd_query() - DEBUG - _cmd_query
2022-10-31 09:43:35,462 - MainThread connection.py:1002 - cmd_query() - DEBUG - sql=[SELECT "A" AS "r_zqsj_A", "B2" AS "B2" FROM ( SELECT 0 :: BIGINT AS "A", 0 :: BI...], sequence_id=[54], is_file_transfer=[False]
2022-10-31 09:43:35,463 - MainThread network.py:1147 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-10-31 09:43:35,463 - MainThread network.py:827 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-10-31 09:43:35,464 - MainThread network.py:808 - add_request_guid() - DEBUG - Request guid: f185476b-0ebf-4db5-8e2c-4e89cd35bbd6
2022-10-31 09:43:35,464 - MainThread network.py:1006 - _request_exec() - DEBUG - socket timeout: 60
2022-10-31 09:43:35,650 - MainThread connectionpool.py:456 - _make_request() - DEBUG - [https://snowhouse.snowflakecomputing.com:443](https://snowhouse.snowflakecomputing.com/) "POST /queries/v1/query-request?requestId=bd78ee48-aa9d-4e74-864f-0a02b956d7db&request_guid=f185476b-0ebf-4db5-8e2c-4e89cd35bbd6 HTTP/1.1" 200 1083
2022-10-31 09:43:35,651 - MainThread network.py:1032 - _request_exec() - DEBUG - SUCCESS
2022-10-31 09:43:35,652 - MainThread network.py:1152 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-10-31 09:43:35,652 - MainThread network.py:715 - _post_request() - DEBUG - ret[code] = None, after post request
2022-10-31 09:43:35,652 - MainThread network.py:739 - _post_request() - DEBUG - Query id: 01a7fe47-0605-0b3f-0001-dd14b94cf7c7
2022-10-31 09:43:35,652 - MainThread cursor.py:736 - execute() - DEBUG - sfqid: 01a7fe47-0605-0b3f-0001-dd14b94cf7c7
2022-10-31 09:43:35,653 - MainThread cursor.py:738 - execute() - INFO - query execution done
2022-10-31 09:43:35,653 - MainThread cursor.py:740 - execute() - DEBUG - SUCCESS
2022-10-31 09:43:35,654 - MainThread cursor.py:743 - execute() - DEBUG - PUT OR GET: None
2022-10-31 09:43:35,654 - MainThread cursor.py:840 - _init_result_and_meta() - DEBUG - Query result format: arrow
2022-10-31 09:43:35,655 - MainThread cursor.py:854 - _init_result_and_meta() - INFO - Number of results in first chunk: 0
2022-10-31 09:43:35,655 - MainThread cursor.py:629 - execute() - DEBUG - executing SQL/command
2022-10-31 09:43:35,656 - MainThread cursor.py:714 - execute() - INFO - query: [SELECT  *  FROM (( SELECT 0 :: BIGINT AS "l_avoo_A", 0 :: BIGINT AS "B1") AS SNO...]
2022-10-31 09:43:35,656 - MainThread connection.py:1309 - _next_sequence_counter() - DEBUG - sequence counter: 55
2022-10-31 09:43:35,656 - MainThread cursor.py:459 - _execute_helper() - DEBUG - Request id: ec6e1d38-92a6-411d-9f91-b18f7b0f3035
2022-10-31 09:43:35,657 - MainThread cursor.py:461 - _execute_helper() - DEBUG - running query [SELECT  *  FROM (( SELECT 0 :: BIGINT AS "l_avoo_A", 0 :: BIGINT AS "B1") AS SNO...]
2022-10-31 09:43:35,657 - MainThread cursor.py:470 - _execute_helper() - DEBUG - is_file_transfer: False
2022-10-31 09:43:35,658 - MainThread connection.py:979 - cmd_query() - DEBUG - _cmd_query
2022-10-31 09:43:35,658 - MainThread connection.py:1002 - cmd_query() - DEBUG - sql=[SELECT  *  FROM (( SELECT 0 :: BIGINT AS "l_avoo_A", 0 :: BIGINT AS "B1") AS SNO...], sequence_id=[55], is_file_transfer=[False]
2022-10-31 09:43:35,659 - MainThread network.py:1147 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-10-31 09:43:35,659 - MainThread network.py:827 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-10-31 09:43:35,659 - MainThread network.py:808 - add_request_guid() - DEBUG - Request guid: d1d1e7fe-8d65-4fc0-8606-f8326fe383f2
2022-10-31 09:43:35,660 - MainThread network.py:1006 - _request_exec() - DEBUG - socket timeout: 60
2022-10-31 09:43:35,878 - MainThread connectionpool.py:456 - _make_request() - DEBUG - [https://snowhouse.snowflakecomputing.com:443](https://snowhouse.snowflakecomputing.com/) "POST /queries/v1/query-request?requestId=ec6e1d38-92a6-411d-9f91-b18f7b0f3035&request_guid=d1d1e7fe-8d65-4fc0-8606-f8326fe383f2 HTTP/1.1" 200 371
2022-10-31 09:43:35,878 - MainThread network.py:1032 - _request_exec() - DEBUG - SUCCESS
2022-10-31 09:43:35,879 - MainThread network.py:1152 - _use_requests_session() - DEBUG - Session status for SessionPool 'snowhouse.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-10-31 09:43:35,879 - MainThread network.py:715 - _post_request() - DEBUG - ret[code] = 000904, after post request
2022-10-31 09:43:35,880 - MainThread network.py:739 - _post_request() - DEBUG - Query id: 01a7fe47-0605-0da9-0001-dd14b94e31c3
2022-10-31 09:43:35,880 - MainThread cursor.py:736 - execute() - DEBUG - sfqid: 01a7fe47-0605-0da9-0001-dd14b94e31c3
2022-10-31 09:43:35,881 - MainThread cursor.py:738 - execute() - INFO - query execution done
2022-10-31 09:43:35,881 - MainThread cursor.py:788 - execute() - DEBUG - {'data': {'internalError': False, 'errorCode': '000904', 'age': 0, 'sqlState': '42000', 'queryId': '01a7fe47-0605-0da9-0001-dd14b94e31c3', 'line': -1, 'pos': -1, 'type': 'COMPILATION'}, 'code': '000904', 'message': "SQL compilation error: error line 1 at position 214\ninvalid identifier 'A'", 'success': False, 'headers': None}
---------------------------------------------------------------------------
SnowparkSQLException                      Traceback (most recent call last)
Input In [61], in <cell line: 1>()
----> 1 joined_data = table_a.filter(
      2     table_a.b1 != lit(4)
      3 ).join(
      4     table_b, (table_a['a'] == table_b['a'])
      5 ).join(
      6     table_c, (table_b['a'] == table_c['a'])
      7 )
      8 print(joined_data.queries)
      9 joined_data.to_pandas()

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/telemetry.py:223, in df_api_usage.<locals>.wrap(*args, **kwargs)
    221 @functools.wraps(func)
    222 def wrap(*args, **kwargs):
--> 223     r = func(*args, **kwargs)
    224     plan = r._select_statement or r._plan
    225     # Some DataFrame APIs call other DataFrame APIs, so we need to remove the extra call

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py:2038, in DataFrame.join(self, right, on, how, lsuffix, rsuffix, **kwargs)
   2033     elif not isinstance(using_columns, list):
   2034         raise TypeError(
   2035             f"Invalid input type for join column: {type(using_columns)}"
   2036         )
-> 2038     return self._join_dataframes(
   2039         right,
   2040         using_columns,
   2041         create_join_type(join_type or "inner"),
   2042         lsuffix=lsuffix,
   2043         rsuffix=rsuffix,
   2044     )
   2046 raise TypeError("Invalid type for join. Must be Dataframe")

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py:2246, in DataFrame._join_dataframes(self, right, using_columns, join_type, lsuffix, rsuffix)
   2236 def _join_dataframes(
   2237     self,
   2238     right: "DataFrame",
   (...)
   2243     rsuffix: str = "",
   2244 ) -> "DataFrame":
   2245     if isinstance(using_columns, Column):
-> 2246         return self._join_dataframes_internal(
   2247             right,
   2248             join_type,
   2249             join_exprs=using_columns,
   2250             lsuffix=lsuffix,
   2251             rsuffix=rsuffix,
   2252         )
   2254     if isinstance(join_type, (LeftSemi, LeftAnti)):
   2255         # Create a Column with expression 'true AND <expr> AND <expr> .."
   2256         join_cond = Column(Literal(True))

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py:2302, in DataFrame._join_dataframes_internal(self, right, join_type, join_exprs, lsuffix, rsuffix)
   2293 def _join_dataframes_internal(
   2294     self,
   2295     right: "DataFrame",
   (...)
   2300     rsuffix: str = "",
   2301 ) -> "DataFrame":
-> 2302     (lhs, rhs) = _disambiguate(
   2303         self, right, join_type, [], lsuffix=lsuffix, rsuffix=rsuffix
   2304     )
   2305     expression = join_exprs._expression if join_exprs is not None else None
   2306     join_logical_plan = Join(
   2307         lhs._plan,
   2308         rhs._plan,
   2309         join_type,
   2310         expression,
   2311     )

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py:218, in _disambiguate(lhs, rhs, join_type, using_columns, lsuffix, rsuffix)
    213 normalized_using_columns = {quote_name(c) for c in using_columns}
    214 #  Check if the LHS and RHS have columns in common. If they don't just return them as-is. If
    215 #  they do have columns in common, alias the common columns with randomly generated l_
    216 #  and r_ prefixes for the left and right sides respectively.
    217 #  We assume the column names from the schema are normalized and quoted.
--> 218 lhs_names = [attr.name for attr in lhs._output]
    219 rhs_names = [attr.name for attr in rhs._output]
    220 common_col_names = [
    221     n
    222     for n in lhs_names
    223     if n in set(rhs_names) and n not in normalized_using_columns
    224 ]

File ~/.pyenv/versions/3.8.15/lib/python3.8/functools.py:967, in cached_property.__get__(self, instance, owner)
    965 val = cache.get(self.attrname, _NOT_FOUND)
    966 if val is _NOT_FOUND:
--> 967     val = self.func(instance)
    968     try:
    969         cache[self.attrname] = val

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py:3435, in DataFrame._output(self)
   3430 @cached_property
   3431 def _output(self) -> List[Attribute]:
   3432     return (
   3433         self._select_statement.column_states.projection
   3434         if self._select_statement
-> 3435         else self._plan.output
   3436     )

File ~/.pyenv/versions/3.8.15/lib/python3.8/functools.py:967, in cached_property.__get__(self, instance, owner)
    965 val = cache.get(self.attrname, _NOT_FOUND)
    966 if val is _NOT_FOUND:
--> 967     val = self.func(instance)
    968     try:
    969         cache[self.attrname] = val

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:215, in SnowflakePlan.output(self)
    213 @cached_property
    214 def output(self) -> List[Attribute]:
--> 215     return [Attribute(a.name, a.datatype, a.nullable) for a in self.attributes]

File ~/.pyenv/versions/3.8.15/lib/python3.8/functools.py:967, in cached_property.__get__(self, instance, owner)
    965 val = cache.get(self.attrname, _NOT_FOUND)
    966 if val is _NOT_FOUND:
--> 967     val = self.func(instance)
    968     try:
    969         cache[self.attrname] = val

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:209, in SnowflakePlan.attributes(self)
    207 @cached_property
    208 def attributes(self) -> List[Attribute]:
--> 209     output = analyze_attributes(self.schema_query, self.session)
    210     self.schema_query = schema_value_statement(output)
    211     return output

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/schema_utils.py:81, in analyze_attributes(sql, session)
     78     session._run_query(sql)
     79     return convert_result_meta_to_attribute(session._conn._cursor.description)
---> 81 return session._get_result_attributes(sql)

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/session.py:1048, in Session._get_result_attributes(self, query)
   1047 def _get_result_attributes(self, query: str) -> List[Attribute]:
-> 1048     return self._conn.get_result_attributes(query)

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:147, in SnowflakePlan.Decorator.wrap_exception.<locals>.wrap(*args, **kwargs)
    143     else:
    144         ne = SnowparkClientExceptionMessages.SQL_EXCEPTION_FROM_PROGRAMMING_ERROR(
    145             e
    146         )
--> 147         raise ne.with_traceback(tb) from None
    148 else:
    149     ne = SnowparkClientExceptionMessages.SQL_EXCEPTION_FROM_PROGRAMMING_ERROR(
    150         e
    151     )

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/analyzer/snowflake_plan.py:85, in SnowflakePlan.Decorator.wrap_exception.<locals>.wrap(*args, **kwargs)
     83 def wrap(*args, **kwargs):
     84     try:
---> 85         return func(*args, **kwargs)
     86     except snowflake.connector.errors.ProgrammingError as e:
     87         tb = sys.exc_info()[2]

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/snowpark/_internal/server_connection.py:209, in ServerConnection.get_result_attributes(self, query)
    207 @SnowflakePlan.Decorator.wrap_exception
    208 def get_result_attributes(self, query: str) -> List[Attribute]:
--> 209     return convert_result_meta_to_attribute(self._cursor.describe(query))

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py:825, in SnowflakeCursor.describe(self, *args, **kwargs)
    816 """Obtain the schema of the result without executing the query.
    817 
    818 This function takes the same arguments as execute, please refer to that function
   (...)
    822     The schema of the result.
    823 """
    824 kwargs["_describe_only"] = kwargs["_is_internal"] = True
--> 825 self.execute(*args, **kwargs)
    826 return self._description

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py:803, in SnowflakeCursor.execute(self, command, params, _bind_stage, timeout, _exec_async, _no_retry, _do_reset, _put_callback, _put_azure_callback, _put_callback_output_stream, _get_callback, _get_azure_callback, _get_callback_output_stream, _show_progress_bar, _statement_params, _is_internal, _describe_only, _no_results, _is_put_get, _raise_put_get_error, _force_put_overwrite, file_stream)
    799     is_integrity_error = (
    800         code == "100072"
    801     )  # NULL result in a non-nullable column
    802     error_class = IntegrityError if is_integrity_error else ProgrammingError
--> 803     Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
    804 return self

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/connector/errors.py:275, in Error.errorhandler_wrapper(connection, cursor, error_class, error_value)
    252 @staticmethod
    253 def errorhandler_wrapper(
    254     connection: SnowflakeConnection | None,
   (...)
    257     error_value: dict[str, str | bool | int],
    258 ) -> None:
    259     """Error handler wrapper that calls the errorhandler method.
    260 
    261     Args:
   (...)
    272         exception to the first handler in that order.
    273     """
--> 275     handed_over = Error.hand_to_other_handler(
    276         connection,
    277         cursor,
    278         error_class,
    279         error_value,
    280     )
    281     if not handed_over:
    282         raise Error.errorhandler_make_exception(
    283             error_class,
    284             error_value,
    285         )

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/connector/errors.py:330, in Error.hand_to_other_handler(connection, cursor, error_class, error_value)
    328 if cursor is not None:
    329     cursor.messages.append((error_class, error_value))
--> 330     cursor.errorhandler(connection, cursor, error_class, error_value)
    331     return True
    332 elif connection is not None:

File /media/psf/Home/vm_files/notebooks/venv/lib/python3.8/site-packages/snowflake/connector/errors.py:209, in Error.default_errorhandler(connection, cursor, error_class, error_value)
    191 @staticmethod
    192 def default_errorhandler(
    193     connection: SnowflakeConnection,
   (...)
    196     error_value: dict[str, str],
    197 ) -> None:
    198     """Default error handler that raises an error.
    199 
    200     Args:
   (...)
    207         A Snowflake error.
    208     """
--> 209     raise error_class(
    210         msg=error_value.get("msg"),
    211         errno=error_value.get("errno"),
    212         sqlstate=error_value.get("sqlstate"),
    213         sfqid=error_value.get("sfqid"),
    214         done_format_msg=error_value.get("done_format_msg"),
    215         connection=connection,
    216         cursor=cursor,
    217     )

SnowparkSQLException: (1304): 01a7fe47-0605-0da9-0001-dd14b94e31c3: 000904 (42000): 01a7fe47-0605-0da9-0001-dd14b94e31c3: SQL compilation error: error line 1 at position 214
invalid identifier 'A'

SNOW-690918: Write to table with specified schema and data types

Is there a way to write a table to Snowflake with a specified schema, including data types? I've noticed there is a way to specify column names, but I would like to go beyond that and also specify which data types the columns should have.

Thanks!
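
A minimal sketch of what works today (the table name MY_TYPED_TABLE and the columns are purely illustrative): pass a StructType with explicit data types to create_dataframe and then save the result as a table, so the created columns follow the declared types.

import datetime
from snowflake.snowpark.types import StructType, StructField, StringType, DoubleType, DateType

# Declare the schema with data types, not just column names.
schema = StructType([
    StructField("name", StringType()),
    StructField("amount", DoubleType()),
    StructField("load_date", DateType()),
])

df = session.create_dataframe(
    [["a", 1.0, datetime.date(2022, 1, 1)],
     ["b", 2.5, datetime.date(2022, 1, 2)]],
    schema=schema,
)
# save_as_table creates the table using the declared column types.
df.write.mode("overwrite").save_as_table("MY_TYPED_TABLE")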

SNOW-687955: Multiple stored procedures with same name but different arguments

Multiple stored procedures with same name but different arguments

  1. What did you do?

Snowpark and the snowflake connector creates multiple stored procedures with the same name, but different arguments.

For example, these are the procedures that were existing in my schema:

  • FORECAST(VARCHAR, NUMBER, VARCHAR)
  • FORECAST(VARCHAR, NUMBER, NUMBER)
  • FORECAST(NUMBER, VARCHAR, NUMBER)
  2. What did you expect to see?

Creating procedures with create or replace should result in only one procedure with that name, irrespective of the arguments passed to it. Otherwise, it becomes very hard to maintain and keep track of the procedures.
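
For context, Snowflake overloads stored procedures by name plus argument signature, so CREATE OR REPLACE only replaces the overload whose signature matches. A minimal cleanup sketch, using the FORECAST signatures listed above, is to drop each overload explicitly:

# Each overload must be dropped by its exact argument signature.
session.sql("DROP PROCEDURE IF EXISTS FORECAST(VARCHAR, NUMBER, VARCHAR)").collect()
session.sql("DROP PROCEDURE IF EXISTS FORECAST(VARCHAR, NUMBER, NUMBER)").collect()
session.sql("DROP PROCEDURE IF EXISTS FORECAST(NUMBER, VARCHAR, NUMBER)").collect()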

SNOW-689375: Location of Dependency Manager of Columns

Apologies if this is not the appropriate forum to ask, I am asking this for education purposes.

Let's say i have the following example:

df = session.table("some_db.some_schema.some_table")
df = df.with_column('.test', F.sql_expr("case when coalesce(foo, bar) then 1 else 0 end"))
df.select('.test').show(10)

Where in the repository is the section in which F.sql_expr("case when coalesce(foo, bar) then 1 else 0 end") determines that it depends on the columns foo and bar?

Thanks!

SNOW-668693: Create_dataframe add quote_identifiers option

What is the current behavior?

Currently session.create_dataframe forces quoted identifiers when creating a snowpark dataframe from a pandas dataframe.
There is no option to remove quoted identifiers.

What is the desired behavior?

I would like to be able to use session.create_dataframe(.... quote_identifiers = False) to create temp tables with easy to use column names.

How would this improve snowflake-snowpark-python?

Under the hood this uses write_pandas, which allows you to specify quote_identifiers=True/False. Adding the same option to create_dataframe would keep the API consistent.

References, Other Background

I am happy to submit this as a PR; I wanted to open a discussion first.
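
In the meantime, a possible workaround sketch (pdf and MY_TABLE are illustrative names, and parameter availability depends on the installed version) is to call write_pandas directly, which already exposes quote_identifiers, and then read the table back:

# write_pandas already accepts quote_identifiers; with False, the column names
# are created unquoted and stay easy to reference.
session.write_pandas(
    pdf,
    "MY_TABLE",
    auto_create_table=True,
    quote_identifiers=False,
)
df = session.table("MY_TABLE")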

SNOW-654658: SnowparkFetchDataException when using to_pandas after v0.9.0 upgrade

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

Windows-10-10.0.19042-SP0

  3. What are the component versions in the environment (pip freeze)?

    Replace with the output of python -m pip freeze
    affinegap==1.12
    anyio @ file:///C:/ci/anyio_1644463701441/work/dist
    argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work
    argon2-cffi-bindings @ file:///C:/ci/argon2-cffi-bindings_1644569878360/work
    asn1crypto @ file:///tmp/build/80754af9/asn1crypto_1596577642040/work
    asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
    attrs @ file:///opt/conda/conda-bld/attrs_1642510447205/work
    Babel @ file:///tmp/build/80754af9/babel_1620871417480/work
    backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
    beautifulsoup4 @ file:///C:/ci/beautifulsoup4_1650274792587/work
    bleach @ file:///opt/conda/conda-bld/bleach_1641577558959/work
    Bottleneck @ file:///C:/ci/bottleneck_1648029103672/work
    brotlipy==0.7.0
    BTrees==4.10.0
    categorical-distance==1.9
    certifi @ file:///C:/Windows/TEMP/abs_e9b7158a-aa56-4a5b-87b6-c00d295b01fanefpc8_o/croots/recipe/certifi_1655968940823/work/certifi
    cffi @ file:///C:/ci_310/cffi_1642682485096/work
    charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
    cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1632508026186/work
    colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
    cryptography @ file:///C:/ci/cryptography_1616769344312/work
    cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
    datetime-distance==0.1.3
    debugpy @ file:///C:/ci/debugpy_1637073815078/work
    decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
    dedupe==2.0.17
    dedupe-variable-datetime==0.1.5
    dedupe-variable-name==0.0.14
    defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work
    DoubleMetaphone==1.1
    entrypoints @ file:///C:/ci/entrypoints_1649926621247/work
    executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
    fastjsonschema @ file:///tmp/build/80754af9/python-fastjsonschema_1620414857593/work/dist
    filelock==3.8.0
    fonttools==4.25.0
    future==0.18.2
    haversine==2.6.0
    highered==0.2.1
    idna @ file:///tmp/build/80754af9/idna_1637925883363/work
    importlib-resources @ file:///tmp/build/80754af9/importlib_resources_1625135880749/work
    ipdb==0.13.9
    ipykernel @ file:///C:/ci/ipykernel_1647000978151/work/dist/ipykernel-6.9.1-py3-none-any.whl
    ipympl==0.9.1
    ipython @ file:///C:/ci/ipython_1651601994376/work
    ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work
    ipywidgets @ file:///tmp/build/80754af9/ipywidgets_1634143127070/work
    jedi @ file:///C:/ci/jedi_1644315425835/work
    Jinja2 @ file:///opt/conda/conda-bld/jinja2_1647436528585/work
    joblib @ file:///tmp/build/80754af9/joblib_1635411271373/work
    json5 @ file:///tmp/build/80754af9/json5_1624432770122/work
    jsonschema @ file:///C:/ci/jsonschema_1650026220037/work
    jupyter==1.0.0
    jupyter-client @ file:///C:/ci/jupyter_client_1650534470695/work
    jupyter-console @ file:///opt/conda/conda-bld/jupyter_console_1647002188872/work
    jupyter-core @ file:///C:/ci/jupyter_core_1651656285178/work
    jupyter-server @ file:///C:/Windows/TEMP/abs_d3c42c59-765d-4f9b-9fa3-ad5b1369485611i_yual/croots/recipe/jupyter_server_1658754493238/work
    jupyterlab @ file:///C:/ci/jupyterlab_1658909327110/work
    jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work
    jupyterlab-server @ file:///C:/ci/jupyterlab_server_1650271783893/work
    jupyterlab-widgets @ file:///tmp/build/80754af9/jupyterlab_widgets_1609884341231/work
    kiwisolver @ file:///C:/ci/kiwisolver_1644962567532/work
    Levenshtein-search==1.4.5
    MarkupSafe @ file:///C:/ci/markupsafe_1621528314575/work
    matplotlib @ file:///C:/ci/matplotlib-suite_1647423638658/work
    matplotlib-inline @ file:///tmp/build/80754af9/matplotlib-inline_1628242447089/work
    mistune==0.8.4
    mkl-fft==1.3.1
    mkl-random @ file:///C:/ci/mkl_random_1626186184278/work
    mkl-service==2.4.0
    munkres==1.1.4
    nbclassic @ file:///opt/conda/conda-bld/nbclassic_1644943264176/work
    nbclient @ file:///C:/ci/nbclient_1650290386732/work
    nbconvert @ file:///C:/ci/nbconvert_1649741005454/work
    nbformat @ file:///C:/ci/nbformat_1649845101866/work
    nest-asyncio @ file:///C:/ci/nest-asyncio_1649829929372/work
    notebook @ file:///C:/ci/notebook_1651648658345/work
    numexpr @ file:///C:/ci/numexpr_1640704408721/work
    numpy @ file:///C:/ci/numpy_and_numpy_base_1653574890766/work
    opencv-python==4.6.0.66
    oscrypto @ file:///tmp/build/80754af9/oscrypto_1633350059025/work
    packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work
    pandas @ file:///C:/ci/pandas_1641443171711/work
    pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work
    parseratorvariable==0.0.18
    parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
    patsy==0.5.2
    persistent==4.9.0
    pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
    Pillow==9.0.1
    probableparsing==0.0.1
    probablepeople==0.5.4
    prometheus-client @ file:///opt/conda/conda-bld/prometheus_client_1643788673601/work
    prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1633440160888/work
    pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
    pyarrow==4.0.1
    pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
    pycryptodomex @ file:///C:/ci/pycryptodomex_1613765448814/work
    Pygments @ file:///opt/conda/conda-bld/pygments_1644249106324/work
    pyhacrf-datamade==0.2.6
    PyJWT @ file:///C:/ci/pyjwt_1619651841937/work
    PyLBFGS==0.2.0.14
    pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1635333100036/work
    pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work
    pyrsistent @ file:///C:/ci/pyrsistent_1636111468851/work
    PySocks @ file:///C:/ci/pysocks_1605287845585/work
    python-crfsuite==0.9.8
    python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
    pytz==2021.3
    pyusb==1.2.1
    pywin32==302
    pywinpty @ file:///C:/ci_310/pywinpty_1644230983541/work/target/wheels/pywinpty-2.0.2-cp38-none-win_amd64.whl
    PyYAML==6.0
    pyzmq @ file:///C:/ci/pyzmq_1638435185959/work
    qtconsole @ file:///C:/ci/qtconsole_1657545340046/work
    QtPy @ file:///opt/conda/conda-bld/qtpy_1649073884068/work
    requests @ file:///opt/conda/conda-bld/requests_1641824580448/work
    scikit-learn @ file:///C:/ci/scikit-learn_1642599125707/work
    scipy @ file:///C:/ci/scipy_1630606942584/work
    seaborn @ file:///tmp/build/80754af9/seaborn_1629307859561/work
    Send2Trash @ file:///tmp/build/80754af9/send2trash_1632406701022/work
    simplecosine==1.2
    sip==4.19.13
    six @ file:///tmp/build/80754af9/six_1644875935023/work
    sklearn-pandas==2.2.0
    sniffio @ file:///C:/ci/sniffio_1614030707456/work
    snowflake-connector-python==2.7.12
    snowflake-snowpark-python==0.9.0
    soupsieve @ file:///tmp/build/80754af9/soupsieve_1636706018808/work
    stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
    statsmodels==0.13.2
    terminado @ file:///C:/ci/terminado_1644322757089/work
    testpath @ file:///tmp/build/80754af9/testpath_1624638946665/work
    threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
    toml==0.10.2
    tornado @ file:///C:/ci/tornado_1606942392901/work
    traitlets @ file:///tmp/build/80754af9/traitlets_1636710298902/work
    typing-extensions==4.3.0
    urllib3 @ file:///C:/ci/urllib3_1650640043075/work
    wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
    webencodings==0.5.1
    websocket-client @ file:///C:/ci/websocket-client_1614804473297/work
    widgetsnbextension @ file:///C:/ci/widgetsnbextension_1645009558218/work
    win-inet-pton @ file:///C:/ci/win_inet_pton_1605306167264/work
    wincertstore==0.2
    zipp @ file:///opt/conda/conda-bld/zipp_1641824620731/work
    zope.index==5.2.0
    zope.interface==5.4.0

  4. What did you do?

dff = session.create_dataframe([1, 2, 3, 4], schema=["a"])
dff.to_pandas()
  5. What did you expect to see?

A pandas DataFrame should be returned. Instead got the following error:

SnowparkFetchDataException: (1406): Failed to fetch a Pandas Dataframe. The error is: to_pandas() did not return a Pandas DataFrame. If you use session.sql(...).to_pandas(), the input query can only be a SELECT statement. Or you can use session.sql(...).collect() to get a list of Row objects for a non-SELECT statement, then convert it to a Pandas DataFrame.

This wasn't an issue prior to upgrading to 0.9.0.
Now I need a workaround using .collect(), which is surprisingly seamless, but it is still odd that the behavior broke.
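
For reference, the .collect() workaround looks roughly like this sketch (Row.as_dict() converts each row to a plain dict that pandas accepts):

import pandas as pd

rows = dff.collect()
# Build the pandas DataFrame from the collected Row objects.
pdf = pd.DataFrame([row.as_dict() for row in rows])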

  6. Can you set logging to DEBUG and collect the logs?

If necessary I can add this.

SNOW-646407: Referencing columns per source table/dataframe after join.

What is the current behavior?

When we join two dataframes that happen to have columns with the same name, Snowpark adds a seemingly random string prefix to the conflicting column names (for example "r_gxpp_TIMEZONE"). There is no mention in the docs of how to get a reference to this random prefix programmatically, so referencing such a column after the join is cumbersome, to say the least.

The workaround of renaming the columns before the join, e.g. df_joined = df1.join(df2.select(df2.col('timezone').as_('timezone_df2')), df1.col('id') == df2.col('id')), can quickly get convoluted when working with multiple joins; a spelled-out version is sketched below.
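
Spelled out, that rename-before-join workaround looks something like this sketch (column and alias names are illustrative):

from snowflake.snowpark.functions import col

# Rename the conflicting column on one side before the join so both timezone
# values stay addressable afterwards.
df2_renamed = df2.with_column_renamed("timezone", "timezone_df2")
df_joined = df1.join(df2_renamed, df1.col("id") == df2_renamed.col("id"))
df_joined.select(col("timezone").as_("timezone1"), col("timezone_df2").as_("timezone2"))

Newer Snowpark releases also appear to accept lsuffix/rsuffix arguments on join, which may avoid the manual rename; check the API reference for your installed version.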

What is the desired behavior?

I would like to be able to reference the columns like

df_joined = df1.join(df2, df1.col('id') == df2.col('id')) 
df_joined.select(df1.col('timezone').as_('timezone1'), df2.col('timezone').as_('timezone2'))

The bare minimum would be to document this behaviour (and the workaround) as well as provide a way of deducing the prefix added to the conflicting columns.

How would this improve snowflake-snowpark-python?

This would make the syntax more closely match SQL, where you can do something like

select
  df1.timezone as timezone1,
  df2.timezone as timezone2
from table1 as df1
  join table2 as df2 on df1.id = df2.id 

References, Other Background

SNOW-667858: `to_pandas` will only convert `TimestampType`s to a native pandas date format, and not `DateType`s

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.19042-SP0

  3. What are the component versions in the environment (pip freeze)?

    Replace with the output of python -m pip freeze


``` affinegap==1.12 anyio @ file:///C:/ci/anyio_1644463701441/work/dist argon2-cffi @ file:///opt/conda/conda-bld/argon2-cffi_1645000214183/work argon2-cffi-bindings @ file:///C:/ci/argon2-cffi-bindings_1644569878360/work asn1crypto @ file:///tmp/build/80754af9/asn1crypto_1596577642040/work asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work attrs @ file:///opt/conda/conda-bld/attrs_1642510447205/work Babel @ file:///tmp/build/80754af9/babel_1620871417480/work backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work beautifulsoup4 @ file:///C:/ci/beautifulsoup4_1650274792587/work bleach @ file:///opt/conda/conda-bld/bleach_1641577558959/work Bottleneck @ file:///C:/ci/bottleneck_1648029103672/work brotlipy==0.7.0 BTrees==4.10.0 categorical-distance==1.9 certifi @ file:///C:/Windows/TEMP/abs_e9b7158a-aa56-4a5b-87b6-c00d295b01fanefpc8_o/croots/recipe/certifi_1655968940823/work/certifi cffi @ file:///C:/ci_310/cffi_1642682485096/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1632508026186/work colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work convertdate==2.4.0 cryptography @ file:///C:/ci/cryptography_1616769344312/work cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work datetime-distance==0.1.3 debugpy @ file:///C:/ci/debugpy_1637073815078/work decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work dedupe==2.0.17 dedupe-variable-datetime==0.1.5 dedupe-variable-name==0.0.14 defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work DoubleMetaphone==1.1 entrypoints @ file:///C:/ci/entrypoints_1649926621247/work et-xmlfile==1.1.0 executing @ file:///opt/conda/conda-bld/executing_1646925071911/work fastjsonschema @ file:///tmp/build/80754af9/python-fastjsonschema_1620414857593/work/dist filelock==3.8.0 fonttools==4.25.0 future==0.18.2 haversine==2.6.0 highered==0.2.1 hijri-converter==2.2.4 holidays==0.11.3.1 idna @ file:///tmp/build/80754af9/idna_1637925883363/work importlib-resources @ file:///tmp/build/80754af9/importlib_resources_1625135880749/work ipdb==0.13.9 ipykernel @ file:///C:/ci/ipykernel_1647000978151/work/dist/ipykernel-6.9.1-py3-none-any.whl ipympl==0.9.1 ipython==8.5.0 ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work ipywidgets @ file:///tmp/build/80754af9/ipywidgets_1634143127070/work jedi @ file:///C:/ci/jedi_1644315425835/work Jinja2 @ file:///opt/conda/conda-bld/jinja2_1647436528585/work joblib @ file:///tmp/build/80754af9/joblib_1635411271373/work json5 @ file:///tmp/build/80754af9/json5_1624432770122/work jsonschema @ file:///C:/ci/jsonschema_1650026220037/work jupyter==1.0.0 jupyter-client @ file:///C:/ci/jupyter_client_1650534470695/work jupyter-console @ file:///opt/conda/conda-bld/jupyter_console_1647002188872/work jupyter-core @ file:///C:/ci/jupyter_core_1651656285178/work jupyter-server @ file:///C:/Windows/TEMP/abs_d3c42c59-765d-4f9b-9fa3-ad5b1369485611i_yual/croots/recipe/jupyter_server_1658754493238/work jupyterlab @ file:///C:/ci/jupyterlab_1658909327110/work jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work jupyterlab-server @ file:///C:/ci/jupyterlab_server_1650271783893/work jupyterlab-widgets @ file:///tmp/build/80754af9/jupyterlab_widgets_1609884341231/work kiwisolver @ file:///C:/ci/kiwisolver_1644962567532/work korean-lunar-calendar==0.3.1 Levenshtein-search==1.4.5 libusb-package @ 
file:///C:/Users/maxim.kupfer/libusb-package MarkupSafe @ file:///C:/ci/markupsafe_1621528314575/work matplotlib @ file:///C:/ci/matplotlib-suite_1647423638658/work matplotlib-inline @ file:///tmp/build/80754af9/matplotlib-inline_1628242447089/work mistune==0.8.4 mkl-fft==1.3.1 mkl-random @ file:///C:/ci/mkl_random_1626186184278/work mkl-service==2.4.0 munkres==1.1.4 nbclassic @ file:///opt/conda/conda-bld/nbclassic_1644943264176/work nbclient @ file:///C:/ci/nbclient_1650290386732/work nbconvert @ file:///C:/ci/nbconvert_1649741005454/work nbformat @ file:///C:/ci/nbformat_1649845101866/work nest-asyncio @ file:///C:/ci/nest-asyncio_1649829929372/work notebook @ file:///C:/ci/notebook_1651648658345/work numexpr @ file:///C:/ci/numexpr_1640704408721/work numpy @ file:///C:/ci/numpy_and_numpy_base_1653574890766/work opencv-python==4.6.0.66 openpyxl==3.0.10 oscrypto @ file:///tmp/build/80754af9/oscrypto_1633350059025/work packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work pandas @ file:///C:/ci/pandas_1641443171711/work pandocfilters @ file:///opt/conda/conda-bld/pandocfilters_1643405455980/work parseratorvariable==0.0.18 parso @ file:///opt/conda/conda-bld/parso_1641458642106/work patsy==0.5.2 persistent==4.9.0 pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==9.0.1 probableparsing==0.0.1 probablepeople==0.5.4 prometheus-client @ file:///opt/conda/conda-bld/prometheus_client_1643788673601/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1633440160888/work pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work pyarrow==8.0.0 pycountry==22.3.5 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pycryptodomex @ file:///C:/ci/pycryptodomex_1613765448814/work Pygments @ file:///opt/conda/conda-bld/pygments_1644249106324/work pyhacrf-datamade==0.2.6 PyJWT @ file:///C:/ci/pyjwt_1619651841937/work PyLBFGS==0.2.0.14 PyMeeus==0.5.11 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1635333100036/work pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work pyrsistent @ file:///C:/ci/pyrsistent_1636111468851/work PySocks @ file:///C:/ci/pysocks_1605287845585/work python-crfsuite==0.9.8 python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work pytz==2021.3 pyusb==1.2.1 pywin32==302 pywinpty @ file:///C:/ci_310/pywinpty_1644230983541/work/target/wheels/pywinpty-2.0.2-cp38-none-win_amd64.whl PyYAML==6.0 pyzmq @ file:///C:/ci/pyzmq_1638435185959/work qtconsole @ file:///C:/ci/qtconsole_1657545340046/work QtPy @ file:///opt/conda/conda-bld/qtpy_1649073884068/work requests @ file:///opt/conda/conda-bld/requests_1641824580448/work rise==5.7.1 scikit-learn @ file:///C:/ci/scikit-learn_1642599125707/work scipy @ file:///C:/ci/scipy_1630606942584/work seaborn @ file:///tmp/build/80754af9/seaborn_1629307859561/work Send2Trash @ file:///tmp/build/80754af9/send2trash_1632406701022/work simplecosine==1.2 sip==4.19.13 six @ file:///tmp/build/80754af9/six_1644875935023/work sklearn-pandas==2.2.0 sniffio @ file:///C:/ci/sniffio_1614030707456/work snowflake-connector-python==2.7.12 snowflake-snowpark-python==0.10.0 soupsieve @ file:///tmp/build/80754af9/soupsieve_1636706018808/work stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work statsmodels==0.13.2 terminado @ file:///C:/ci/terminado_1644322757089/work testpath @ file:///tmp/build/80754af9/testpath_1624638946665/work threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work toml==0.10.2 tornado 
@ file:///C:/ci/tornado_1606942392901/work traitlets @ file:///tmp/build/80754af9/traitlets_1636710298902/work typing-extensions==4.3.0 urllib3 @ file:///C:/ci/urllib3_1650640043075/work wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work webencodings==0.5.1 websocket-client @ file:///C:/ci/websocket-client_1614804473297/work widgetsnbextension @ file:///C:/ci/widgetsnbextension_1645009558218/work win-inet-pton @ file:///C:/ci/win_inet_pton_1605306167264/work wincertstore==0.2 zipp @ file:///opt/conda/conda-bld/zipp_1641824620731/work zope.index==5.2.0 zope.interface==5.4.0 ```

  4. What did you do?

So I'm not sure whether it's a bug or a feature, but to_pandas will only convert TimestampTypes to a native pandas date format, and not DateTypes. Is there any reason we can't have both, for instance when we have dates but don't need the time precision?

>>> session.create_dataframe(
...     ['2022-09-23'],
...         schema=StructType([StructField('dt', DateType())])
...         ).to_pandas().dtypes
...
DT    object
dtype: object

>>> session.create_dataframe(
...     ['2022-09-23'],
...         schema=StructType([StructField('dt', TimestampType())])
...         ).to_pandas().dtypes
...
DT    datetime64[ns]
dtype: object

  5. What did you expect to see?

DateTypes being converted to datetime dtype in pandas.
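
Until that changes, a client-side workaround sketch is to convert the DATE column after fetching; it comes back as Python date objects with dtype object, and pd.to_datetime turns it into datetime64[ns]:

import pandas as pd
from snowflake.snowpark.types import StructType, StructField, DateType

pdf = session.create_dataframe(
    ['2022-09-23'],
    schema=StructType([StructField('dt', DateType())])
).to_pandas()
# Convert the object-dtype DATE column to a native pandas datetime.
pdf['DT'] = pd.to_datetime(pdf['DT'])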

  6. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    

SNOW-651557: (`pyarrow==9.0.0` issue) Python kernel consistently killed by `.to_pandas()` or `.toPandas()` calls in a notebook

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?
@lostmygithubaccount ➜ /workspaces/codyspace/jaffle_shop/pandas_jaffle_shop (cody/snowy) $ python --version --version
Python 3.8.13 (default, Aug  2 2022, 11:41:59) 
[GCC 10.2.1 20210110]
  2. What operating system and processor architecture are you using?
@lostmygithubaccount ➜ /workspaces/codyspace/jaffle_shop/pandas_jaffle_shop (cody/snowy) $ python -c 'import platform; print(platform.platform())'
Linux-5.4.0-1086-azure-x86_64-with-glibc2.2.5
  3. What are the component versions in the environment (pip freeze)?
@lostmygithubaccount ➜ /workspaces/codyspace/jaffle_shop/pandas_jaffle_shop (cody/snowy) $ python -m pip freeze 
agate==1.6.3
alembic==1.8.1
asn1crypto==1.5.1
asttokens==2.0.8
attrs==22.1.0
Babel==2.10.3
backcall==0.2.0
black==22.6.0
bokeh==2.4.3
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.0.0
colorama==0.4.5
cramjam==2.5.0
cryptography==36.0.2
cycler==0.11.0
dask==2022.8.1
databricks-cli==0.17.2
-e git+https://github.com/dbt-labs/dbt-core.git@7f8d9a7af976f640e376900773a0d793acf3a3ce#egg=dbt_core&subdirectory=core
dbt-extractor==0.4.1
-e git+https://github.com/dbt-labs/dbt-core.git@7f8d9a7af976f640e376900773a0d793acf3a3ce#egg=dbt_postgres&subdirectory=plugins/postgres
-e git+https://github.com/dbt-labs/dbt-snowflake.git@0b4868df4347c53d4b16069b3e9fe29237d7bb72#egg=dbt_snowflake
-e git+https://github.com/dbt-labs/dbt-core.git@7f8d9a7af976f640e376900773a0d793acf3a3ce#egg=dbt_tests_adapter&subdirectory=tests/adapter
debugpy==1.6.3
decorator==5.1.1
distributed==2022.8.1
docker==5.0.3
entrypoints==0.4
executing==0.10.0
fastparquet==0.8.2
Flask==2.2.2
fonttools==4.37.0
fsspec==2022.7.1
future==0.18.2
gitdb==4.0.9
GitPython==3.1.27
greenlet==1.1.2
gunicorn==20.1.0
HeapDict==1.0.1
hologram==0.0.15
idna==3.3
importlib-metadata==4.12.0
importlib-resources==5.9.0
iniconfig==1.1.1
ipykernel==6.15.1
ipython==8.4.0
isodate==0.6.1
itsdangerous==2.1.2
jedi==0.18.1
jeepney==0.8.0
Jinja2==3.1.2
joblib==1.1.0
jsonschema==3.2.0
jupyter-client==7.3.4
jupyter-core==4.11.1
keyring==23.8.2
kiwisolver==1.4.4
leather==0.3.4
lightgbm==3.3.2
locket==1.0.0
Logbook==1.5.3
Mako==1.2.1
MarkupSafe==2.1.1
mashumaro==3.0.3
matplotlib==3.5.3
matplotlib-inline==0.1.6
minimal-snowplow-tracker==0.0.2
mlflow==1.28.0
msgpack==1.0.4
mypy-extensions==0.4.3
nest-asyncio==1.5.5
networkx==2.8.6
numpy==1.23.2
oauthlib==3.2.0
oscrypto==1.3.0
packaging==21.3
pandas==1.4.3
parsedatetime==2.4
parso==0.8.3
partd==1.3.0
pathspec==0.9.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
platformdirs==2.5.2
pluggy==1.0.0
prometheus-client==0.14.1
prometheus-flask-exporter==0.20.3
prompt-toolkit==3.0.30
protobuf==4.21.5
psutil==5.9.1
psycopg2-binary==2.9.3
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pyarrow==9.0.0
pycparser==2.21
pycryptodomex==3.15.0
Pygments==2.13.0
PyJWT==2.4.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.18.1
pytest==7.1.2
python-dateutil==2.8.2
python-slugify==6.1.2
pytimeparse==1.1.8
pytz==2022.2.1
PyYAML==6.0
pyzmq==23.2.1
querystring-parser==1.2.4
requests==2.28.1
scikit-learn==1.1.2
scipy==1.9.0
seaborn==0.11.2
SecretStorage==3.3.3
six==1.16.0
smmap==5.0.0
snowflake-connector-python==2.7.11
snowflake-snowpark-python==0.8.0
sortedcontainers==2.4.0
SQLAlchemy==1.4.40
sqlparse==0.4.2
stack-data==0.4.0
tabulate==0.8.10
tblib==1.7.0
text-unidecode==1.3
threadpoolctl==3.1.0
tokenize-rt==4.2.1
tomli==2.0.1
toolz==0.12.0
tornado==6.1
traitlets==5.3.0
typing_extensions==4.3.0
urllib3==1.26.12
wcwidth==0.2.5
websocket-client==1.3.3
Werkzeug==2.2.2
zict==2.2.0
zipp==3.8.1
  4. What did you do?

Anything that gets a Snowpark table/DataFrame and calls .to_pandas() or .toPandas() on it. I tried a lot of things. It seemed to just stop working a couple of weeks ago despite no change on PyPI. Swap the model string below for any table in your session; connection details have been changed.

# snowflake
import yaml
import snowflake.snowpark

from snowflake.snowpark import types
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf, sql_expr

# connection details from dbt profile
with open("/home/vscode/.dbt/profiles.yml", "r") as f:
    profiles = yaml.safe_load(f)
    dev_profile = profiles["snowflake"]["outputs"]["dev"]

conn_params = {
    "account": dev_profile["account"],
    "user": dev_profile["user"],
    "role": dev_profile["role"],
    "warehouse": dev_profile["warehouse"],
    "database": dev_profile["database"],
    "schema": dev_profile["schema"],
    "authenticator": dev_profile["authenticator"],
}

# build session
s = Session.builder.configs(conn_params).create()

# get a dataframe
model = "raw_customers"
df = s.table(model)
print(df.show(5))

# anything below here fails and kills the Python kernel
pdf = df.to_pandas()
pdf = df.toPandas()
pdf = df.select("*").to_pandas()

My code/notebooks showing this are here: https://github.com/dbt-labs/jaffle_shop/tree/cody/snowy (see Untitled.ipynb for the above). The kernel dies:

(screenshot of the kernel crash message)

  5. What did you expect to see?

My kernel not killed; pandas dataframe returned.

  6. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)
    

Sure:

Copied output:

2022-08-25 00:09:24,556 - MainThread cursor.py:630 - execute() - DEBUG - executing SQL/command
2022-08-25 00:09:24,557 - MainThread cursor.py:715 - execute() - INFO - query: [SELECT  *  FROM (raw_customers)]
2022-08-25 00:09:24,558 - MainThread connection.py:1302 - _next_sequence_counter() - DEBUG - sequence counter: 2
2022-08-25 00:09:24,559 - MainThread cursor.py:460 - _execute_helper() - DEBUG - Request id: f629bab4-19ec-4841-ad5d-eaa235dd19d2
2022-08-25 00:09:24,560 - MainThread cursor.py:462 - _execute_helper() - DEBUG - running query [SELECT  *  FROM (raw_customers)]
2022-08-25 00:09:24,560 - MainThread cursor.py:471 - _execute_helper() - DEBUG - is_file_transfer: False
2022-08-25 00:09:24,561 - MainThread connection.py:972 - cmd_query() - DEBUG - _cmd_query
2022-08-25 00:09:24,562 - MainThread connection.py:995 - cmd_query() - DEBUG - sql=[SELECT  *  FROM (raw_customers)], sequence_id=[2], is_file_transfer=[False]
2022-08-25 00:09:24,563 - MainThread network.py:1147 - _use_requests_session() - DEBUG - Session status for SessionPool 'ska67070.snowflakecomputing.com', SessionPool 1/1 active sessions
2022-08-25 00:09:24,564 - MainThread network.py:827 - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2022-08-25 00:09:24,564 - MainThread network.py:808 - add_request_guid() - DEBUG - Request guid: 94b7e29c-adb3-48b6-ac5f-c1bf4220f908
2022-08-25 00:09:24,565 - MainThread network.py:1006 - _request_exec() - DEBUG - socket timeout: 60
2022-08-25 00:09:24,613 - MainThread connectionpool.py:456 - _make_request() - DEBUG - https://ska67070.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId=f629bab4-19ec-4841-ad5d-eaa235dd19d2&request_guid=94b7e29c-adb3-48b6-ac5f-c1bf4220f908 HTTP/1.1" 200 3025
2022-08-25 00:09:24,614 - MainThread network.py:1032 - _request_exec() - DEBUG - SUCCESS
2022-08-25 00:09:24,615 - MainThread network.py:1152 - _use_requests_session() - DEBUG - Session status for SessionPool 'ska67070.snowflakecomputing.com', SessionPool 0/1 active sessions
2022-08-25 00:09:24,616 - MainThread network.py:715 - _post_request() - DEBUG - ret[code] = None, after post request
2022-08-25 00:09:24,617 - MainThread network.py:739 - _post_request() - DEBUG - Query id: 01a68329-0401-c351-000d-378309b360d2
2022-08-25 00:09:24,617 - MainThread cursor.py:737 - execute() - DEBUG - sfqid: 01a68329-0401-c351-000d-378309b360d2
2022-08-25 00:09:24,618 - MainThread cursor.py:739 - execute() - INFO - query execution done
2022-08-25 00:09:24,618 - MainThread cursor.py:741 - execute() - DEBUG - SUCCESS
2022-08-25 00:09:24,619 - MainThread cursor.py:744 - execute() - DEBUG - PUT OR GET: None
2022-08-25 00:09:24,619 - MainThread cursor.py:841 - _init_result_and_meta() - DEBUG - Query result format: arrow
2022-08-25 00:09:24,620 - MainThread server_connection.py:322 - run_query() - DEBUG - Execute query [queryID: 01a68329-0401-c351-000d-378309b360d2]  SELECT  *  FROM (raw_customers)
2022-08-25 00:09:24,621 - MainThread arrow_iterator.cpython-38-x86_64-linux-gnu.so:0 - __cinit__() - DEBUG - Batches read: 1
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info. View Jupyter [log](command:jupyter.viewOutput) for further details.
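
Given the pyarrow hint in the issue title, one quick sanity check (a sketch that only inspects installed package metadata) is to compare the installed pyarrow version with the range the connector declares; an out-of-range pyarrow (such as 9.0.0 here) can crash the Arrow result fetch instead of failing cleanly.

from importlib.metadata import version, requires

print("pyarrow:", version("pyarrow"))
print("snowflake-connector-python:", version("snowflake-connector-python"))
# List the pyarrow requirement declared by the connector.
print([r for r in (requires("snowflake-connector-python") or []) if r.startswith("pyarrow")])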

SNOW-670481: Creating procedure using functions with variable arguments (*args)

I am trying to create a stored procedure using the Snowflake connector by executing the statement below. This approach seems to work for creating a UDF, but not for a stored procedure. It seems that, when the statement is executed, an AST parser checks the handler's arguments.

I would like to take variable arguments so that I can adapt my function to the requirements. Is there a way to achieve this, or to bypass the limitation?

  1. What did you do?
create or replace procedure var_arg_proc(arg_1 STRING)
    returns STRING
    language python
    runtime_version = '3.8'
    packages = ('snowflake-snowpark-python')
    handler = 'my_function'
AS
$$
from snowflake.snowpark.session import Session

def my_function(session: Session, *args):
    return args[0]
$$;
  2. Can you set logging to DEBUG and collect the logs?
100357 (P0000): Python function is defined with 1 arguments (including session), but stored procedure 
definition contains 1 arguments. Python function arguments are expected to be session and stored procedure 
defined arguments in function VAR_ARG_PROC with handler my_function
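
One possible workaround sketch (the procedure name, handler, and types are illustrative) is to declare a single ARRAY argument and unpack it inside the handler, since a stored procedure signature must list each argument explicitly:

from snowflake.snowpark.types import ArrayType, StringType

def var_arg_handler(session, args: list) -> str:
    # Unpack whatever was passed in the single ARRAY argument.
    return str(args[0]) if args else ""

session.sproc.register(
    var_arg_handler,
    name="VAR_ARG_PROC",
    return_type=StringType(),
    input_types=[ArrayType(StringType())],
    packages=["snowflake-snowpark-python"],
    replace=True,
)

The procedure can then be called with CALL VAR_ARG_PROC(ARRAY_CONSTRUCT('a', 'b')).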

SNOW-701482: Inconsistency with numeric datatypes in Snowpark table/DataFrame Schema

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    • Python 3.8.13
  2. What operating system and processor architecture are you using?

    • macOS-10.16-x86_64-i386-64bit
  3. What are the component versions in the environment (pip freeze)?

    • snowflake-connector-python==2.7.12
    • snowflake-snowpark-python==1.0.0
  4. What did you do?

tbl_name = 'NUMERIC_TABLE'

create_table_query = f'''
create or replace TABLE {tbl_name} (
    NUMBER_COL NUMBER(38,0),
    INT_COL INT,
    DOUBLE_COL DOUBLE,
    FLOAT_COL FLOAT,
    LONG_COL BIGINT,
    SHORT_COL SMALLINT,
    BYTE_COL TINYINT
)
'''

client.session.sql(create_table_query).show()

tbl = client.session.table('NUMERIC_TABLE')

print('Snowpark Schema:')

for f in tbl.schema.fields:
    print(f)

"""
---------------------------------------------
|"status"                                   |
---------------------------------------------
|Table NUMERIC_TABLE successfully created.  |
---------------------------------------------

Snowpark Schema:
StructField('NUMBER_COL', LongType(), nullable=True)
StructField('INT_COL', LongType(), nullable=True)
StructField('DOUBLE_COL', DoubleType(), nullable=True)
StructField('FLOAT_COL', DoubleType(), nullable=True)
StructField('LONG_COL', LongType(), nullable=True)
StructField('SHORT_COL', LongType(), nullable=True)
StructField('BYTE_COL', LongType(), nullable=True)
"""
  5. What did you expect to see?

Creating a table with various numeric types returns only LongType() for most integer columns when viewing the Snowpark table/DataFrame schema property.

There seem to be some inconsistencies between the generated Snowpark schema and the data types the table was created with.

Running a DESCRIBE TABLE NUMERIC_TABLE; returns:

name          type
NUMBER_COL    NUMBER(38,0)
INT_COL       NUMBER(38,0)
DOUBLE_COL    FLOAT
FLOAT_COL     FLOAT
LONG_COL      NUMBER(38,0)
SHORT_COL     NUMBER(38,0)
BYTE_COL      NUMBER(38,0)

The above is expected given Snowflake's synonymous numeric type names (INT, BIGINT, SMALLINT, and TINYINT are all stored as NUMBER(38,0), and DOUBLE is an alias for FLOAT), but the Snowpark schema fields reported on the table/DataFrame still seem pretty far off from the declared types.
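
As a point of comparison (a sketch; the table name is illustrative), columns declared with a non-zero scale do surface as DecimalType in the Snowpark schema, which suggests the collapse to LongType only happens for scale-0 NUMBER columns:

session.sql("create or replace temp table NUMERIC_SCALE_TABLE (c NUMBER(10,2))").collect()
for f in session.table("NUMERIC_SCALE_TABLE").schema.fields:
    print(f)  # expected to show StructField('C', DecimalType(10, 2), nullable=True)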

  6. Can you set logging to DEBUG and collect the logs?
2022-11-29 14:20:36,803 - Snowpark Session information: 
"version" : 1.0.0,
"python.version" : 3.8.13,
"python.connector.version" : 2.7.12,
"python.connector.session.id" : 28283155476234,
"os.name" : Darwin

2022-11-29 14:20:37,129 - Execute query [queryID: 01a8a45c-0506-3265-0000-19b9b8c44872] 
create or replace TABLE NUMERIC_TABLE (
    NUMBER_COL NUMBER(38,0),
    INT_COL INT,
    DOUBLE_COL DOUBLE,
    FLOAT_COL FLOAT,
    LONG_COL BIGINT,
    SHORT_COL SMALLINT,
    BYTE_COL TINYINT
)

SNOW-687930: session.write_pandas creates a table that can only be referenced with double quotes

Not sure if this is related to issue #512.

  1. What did you do?

After creating a table inside a stored procedure with

session.write_pandas(result, 'result_table', auto_create_table=True, overwrite=True)

I can only view the table using

SELECT * FROM "result_table"

and not using

SELECT * FROM result_table

  2. What did you expect to see?

To be able to view the table using: SELECT * FROM result_table -> without the quotes on the table name.

  3. Can you set logging to DEBUG and collect the logs?
002003 (42S02): SQL compilation error:
Object 'RESULT_TABLE' does not exist or not authorized.
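
A likely cause is that write_pandas quotes the table name exactly as given, so 'result_table' becomes a case-sensitive lowercase identifier. Two workaround sketches (parameter availability depends on the installed version):

# Option 1: pass an uppercase name so the quoted identifier matches Snowflake's
# default case-insensitive resolution of unquoted names.
session.write_pandas(result, 'RESULT_TABLE', auto_create_table=True, overwrite=True)

# Option 2: if your version exposes it, disable identifier quoting altogether.
session.write_pandas(result, 'result_table', auto_create_table=True, overwrite=True,
                     quote_identifiers=False)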
