bwiley1 / pandleau


A quick and easy way to convert a Pandas DataFrame to a Tableau .hyper or .tde extract.

License: MIT License

Python 100.00%

python tableau pandas tde hyper

pandleau's Issues

No Module named pandleau

Hi @bwiley1, please help. Currently getting error after running the sample code that you have posted on the wiki.

File "/venv/local/lib/python2.7/site-packages/pandleau/__init__.py", line 1, in <module>
from pandleau.pandleau import *
ImportError: No module named pandleau

I installed pandleau via pip and have its dependencies properly installed.

Thank you!

Edit:

Never mind. I found the fix here #5
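
The fix referenced in #5 is not reproduced here. As a hedged illustration only: a likely cause on Python 2 is that `from pandleau.pandleau import *` inside the package's `__init__.py` is resolved as an implicit relative import, because the inner module shares the package's name. An explicit relative import avoids that ambiguity. The sketch below builds a throwaway package with the same layout; `demo_pkg` and `hello` are made-up names, not part of pandleau.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package whose inner module shares the package name,
# mirroring the pandleau/pandleau.py layout. "demo_pkg" is a made-up name.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "demo_pkg.py"), "w") as handle:
    handle.write("def hello():\n    return 'hello from demo_pkg'\n")

# The explicit relative import ("from .demo_pkg") always refers to the
# sibling module, on both Python 2.5+ and Python 3.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write("from .demo_pkg import *\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
greeting = demo_pkg.hello()
```

Later tracebacks in this list show `from . import pandleau` in `__init__.py`, which is the same idea.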

Error on Windows Install

Hello,

This works on my Mac but not on my Windows machine, even though I seemingly installed everything correctly.

Traceback (most recent call last):
  File ".\helpscout.py", line 73, in <module>
    df_tableau.to_tableau('PATH\helpscout30.hyper', add_index=False)
  File "C:\PATH\lib\site-packages\pandleau-0.3.1-py3.7.egg\pandleau\pandleau.py", line 112, in to_tableau
    new_extract = Extract( path )
  File "C:\PATH\lib\site-packages\tableausdk\HyperExtract.py", line 594, in __init__
    raise Exceptions.TableauException(ret, Exceptions.GetLastErrorMessage())
tableausdk.Exceptions.TableauException: TableauException (999): Unknown error

Duplication of rows in tde file when read in tableau

Hello bwiley1,

I was using your code and it generates the tde files successfully, but when I read a generated tde file into Tableau, Tableau shows 16 rows, with the column incrementing across the 16 columns.

For clarity, consider a tde file with a single row and how it gets read into Tableau.

Please let me know whether this is a bug or whether I have made an error in my approach.

Thanks
Sai

Rows are appended when you refresh the .hyper (or .tde)

I have an issue: when I rerun the same script and look at the .hyper (or .tde) file, I see duplicate rows.

I expected the data to be overwritten so that I could update my Tableau Server.
I have not used "tableauserverclient" yet, but I am not sure it would be any different.

OS: Windows and MacOS
Python: 3.7.4
Tableau Desktop: 2019.2.4
Tableau Server: 2019.3.0
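
A possible workaround, sketched under the assumption (taken from the report above) that writing to an existing extract adds rows to it: delete the old file before each run so the write starts from scratch. The helper name `remove_stale_extract` is hypothetical and not part of pandleau.

```python
import os
import tempfile


def remove_stale_extract(path):
    """Delete an existing extract file; return True if something was removed."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


# Small demonstration with an empty placeholder file instead of a real extract.
demo_path = os.path.join(tempfile.mkdtemp(), "example.hyper")
open(demo_path, "w").close()
removed_first = remove_stale_extract(demo_path)   # the file existed
removed_again = remove_stale_extract(demo_path)   # nothing left to remove
```

In a real script the call would go immediately before `df_tableau.to_tableau(path, add_index=False)`.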

set_spatial question

Hi! Thank you for this awesome package. So helpful!

I am running into difficulty with the set_spatial function. I'm reading in a geojson file of NYC neighborhood tabulation areas using geopandas, then converting that geodataframe into a regular pandas dataframe, pandleau-ing it, then setting the spatial column to the geometry column. Everything seems to be working fine, everything runs without error, except then the geometry column in the data source in Tableau Server is null. I've tried playing around with the indicator argument and have double checked that there aren't null geometries or issues with the geojson file itself and can't find anything wrong on either front. Appreciate any guidance you can provide!

I'm using Python 3.6 and Tableau Server Version: 2019.2.1 (20192.19.0621.1547) 64-bit Linux.

import os
import sys
import pandas as pd
import geopandas as gpd
import tableauserverclient as tsc
from tableausdk import *
from tableausdk.HyperExtract import *
from pandleau import *

#read in geojson file
ntas = gpd.read_file('nyc_ntas.geojson')  

#convert to pandas dataframe for pandleau-ing
nta_pd = pd.DataFrame(ntas)
nta_df = pandleau(nta_pd)

#set spatial column
nta_df.set_spatial('geometry', indicator=True)

nta_df.to_tableau('tableau\\nyc_ntas.hyper', add_index=False)
server.auth.sign_in(tableau_auth)
mydatasourceitem = tsc.DatasourceItem('project_id', name='test_nyc_ntas')
item = server.datasources.publish(mydatasourceitem, 'tableau\\nyc_ntas.hyper', 'Overwrite')
print("{} published with id: {}".format(item.name, item.id))
server.auth.sign_out()
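
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: `pd.DataFrame(ntas)` keeps shapely geometry objects in the `geometry` column, while the Tableau extract's spatial type is filled from well-known text (WKT) strings. The sketch below converts any object exposing a `.wkt` attribute (as shapely geometries do) into its WKT string. `geometry_to_wkt` and `FakePoint` are hypothetical names used only for this illustration.

```python
def geometry_to_wkt(value):
    """Return the WKT text of a geometry-like object, or None for missing values."""
    if value is None:
        return None
    wkt = getattr(value, "wkt", None)
    # Values without a .wkt attribute are passed through as plain text.
    return wkt if wkt is not None else str(value)


class FakePoint:
    """Stand-in for a shapely geometry so the sketch runs without shapely."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def wkt(self):
        return "POINT ({} {})".format(self.x, self.y)


converted = [geometry_to_wkt(item) for item in (FakePoint(30, 10), None, "POINT (1 2)")]
```

With the code above, this might be applied as `nta_pd['geometry'] = nta_pd['geometry'].apply(geometry_to_wkt)` before calling `pandleau(nta_pd)`.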

deprecation warning from pandas

Python36\site-packages\pandleau\pandleau.py:81: FutureWarning: pandas.lib is
deprecated and will be removed in a future version.
You can access infer_dtype as pandas.api.types.infer_dtype
  return pandleau.mapper[pandas.lib.infer_dtype(column.dropna())]

Support for multi-table extracts?

It's possible now to create multiple-table storage extracts using tableausdk in python: docs.

It would be a great feature for Pandleau to support as well - is there any plan to implement this?

python version 2 or 3?

Hi Benjamin,

I am kind of new to Python and eager to use your library to make it easy to switch between Python and Tableau. I have the following question: one of the libraries used by your module is 'tableausdk'. It seems that this is a Python 2 only package, or am I wrong?

I am trying to pip install your module but it keeps giving me version mismatch errors (I run on Python 3.6).
Thanks in advance!

Melvin

parsing 13m x 89 table is quite slow. Any suggestion for improvements?

Hi,
This is not really an issue, but I am using pandleau to create a hyper file out of a 13M x 89 table.
Half of the columns are strings and half are numbers.
The process takes quite a while (7 hours on a 16 GB desktop).
Could you suggest potential improvements?
I saw the notes about Unicode slowing down Python. Are there any tricks to get around that?
Kind regards

ModuleNotFoundError: No module named 'tableausdk.HyperExtract'

Hey thanks for creating this library.

I just installed tableausdk(TableauSDK-10300.18.0510.1135) and ran its setup.py, then pip installed pandleau. But when running either import pandleau or from pandleau import *, I get this error:

>>> from pandleau import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxepstein/myproject/myenv/lib/python3.6/site-packages/pandleau/__init__.py", line 1, in <module>
    from . import pandleau
  File "/Users/maxepstein/myproject/myenv/lib/python3.6/site-packages/pandleau/pandleau.py", line 11, in <module>
    from tableausdk.HyperExtract import *
ModuleNotFoundError: No module named 'tableausdk.HyperExtract'

Navigating to my myproject/myenv/lib/python3.6/site-packages/tableausdk and running ls, I can confirm there's no HyperExtract module there.

Looking up this issue on Tableau help, I see ["Note: Use the Extract API 2.0 to create .hyper files for Tableau 2018.1 and later. Use the Tableau SDK to create .tde files for previous versions of Tableau."](https://onlinehelp.tableau.com/current/api/sdk/en-us/help.htm)

But Pandleau says it requires the Tableausdk and then exports .hyper files. Is there something I missed installing, or am I using this wrong? Thanks

Not recognizing installed tableausdk

When installing the package from pip or by running setup.py with python, it looks for the tableausdk via pip and cannot find it. (I already installed the extract api and verified that it correctly installed by importing it in Python.) I was able to install it by running setup.py and removing the reference to tableausdk. I then tested it by converting a data frame to a .hyper file and it worked fine.

Unrelated issue, but just a heads up that infer_dtype in the pandas api was new in version 0.21.0, so this package won't work with earlier versions of pandas.

Bug with large strings

Large strings cause a fatal error and cause Python to crash on my system.

Namely, it looks like strings that are >1,024 characters in length are causing the crash while running the to_tableau function.

As a workaround I have limited my string length, but I'm not sure if there is a way to correct or catch this behavior.
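
The workaround described above could look something like the following sketch. The 1,024-character limit comes from the report and is not a documented threshold; `clip_text` is a hypothetical helper, not a pandleau function.

```python
MAX_TEXT_LENGTH = 1024  # limit observed in the report above, not a documented value


def clip_text(value, max_length=MAX_TEXT_LENGTH):
    """Truncate long strings; leave short strings and non-string values untouched."""
    if isinstance(value, str) and len(value) > max_length:
        return value[:max_length]
    return value


sample = ["short", "x" * 2000, None, 42]
clipped = [clip_text(item) for item in sample]
```

For a DataFrame, the same helper could be applied to each text column before export, for example `df[col] = df[col].map(clip_text)`.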

Windows OSError when export large table to hyper

I get this error when executing "pandleau.to_tableau(filepath, add_index = False)":

line 44, in execute
df_tableau.to_tableau(file_path, add_index=False)
File "C:\Program Files\SysESS\Sympathy for Data 1.6.1\Python\lib\site-packages\pandleau-0.3.2_snapshot-py3.7.egg\pandleau\pandleau.py", line 160, in to_tableau
new_extract.close()
File "C:\Program Files\SysESS\Sympathy for Data 1.6.1\Python\lib\site-packages\tableausdk\HyperExtract.py", line 601, in close
hyperExtract_lib.TabExtractClose( self._handle )
OSError: exception: access violation reading 0x0000000000000000

It only occurs when running on a low-spec machine (7 GB RAM) with a large table. Exporting the very same table works on a high-spec machine (32 GB RAM).

This thread discusses the same topic:
https://community.tableau.com/thread/280830

That thread proposes killing the "hyperd.exe" process before closing the hyper file.

TableauException (303): extract path must have .tde extension

first of all, thanks for a great tool.

second, i'm getting the following error:

df_tableau = pandleau(example_df)
df_tableau.to_tableau('test.hyper', add_index=False)
Traceback (most recent call last):
File "", line 1, in
File "/Users/philipzelichenko/Downloads/pandleau-master/pandleau/pandleau.py", line 112, in to_tableau
new_extract = Extract( path )
File "/usr/local/lib/python3.6/site-packages/tableausdk/Extract.py", line 594, in __init__
raise Exceptions.TableauException(ret, Exceptions.GetLastErrorMessage())
tableausdk.Exceptions.TableauException: TableauException (303): extract path must have .tde extension

any idea why?
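
The traceback goes through tableausdk/Extract.py rather than HyperExtract.py, which suggests the older Tableau SDK (which writes .tde files) is the one installed. A minimal sketch, assuming that is the cause, of choosing the file extension based on which module can be found; `module_available` and `match_extension` are hypothetical helpers, not pandleau functions.

```python
import importlib.util
import os


def module_available(name):
    """Return True if the dotted module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. tableausdk itself) is missing.
        return False


def match_extension(path, hyper_available):
    """Swap the file extension to .hyper or .tde to suit the installed SDK."""
    stem, _ = os.path.splitext(path)
    return stem + (".hyper" if hyper_available else ".tde")


hyper_available = module_available("tableausdk.HyperExtract")
output_path = match_extension("test.hyper", hyper_available)
```

The resulting `output_path` would then be passed to `df_tableau.to_tableau(output_path, add_index=False)`.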

Updates to operate with pandas 1.0.0

It seems that pandas versions newer than 0.21 aren't supported by the current version of Pandleau. Is there an upgrade plan in sight?

Also, great job on the package, it has been really useful

Append instead of overwrite

Is there any way to append to the hyper extract rather than overwrite the file?

Some values are causing fatal errors

Hello,

In a few days I have already saved about 40 Pandas DataFrames to .hyper and I am very thankful for that.
Somehow 3 Pandas DataFrames could not be saved to .hyper (some segments are saved successfully, however) and I could not troubleshoot the issue.

I made a sample with 3 lines. As you will see it is the last row that causes a fatal error. I'd be very happy if you could find the issue or provide a workaround.

In the zip file you will find:

Exported .hyper file - Test.hyper
The data causing the issues - Bug_pandleau_data.sqlite3
A notebook to read the data and try the conversion + some attempt at fixing the issue - Troubleshooting_pandleau.ipynb

troubleshoot_pandleau.zip

EDIT:

I tried saving the data from the sqlite3 to a .xlsx and a .csv and then to .hyper and had exactly the same issue.
What I gave you is fake data.
Sorry for closing and reopening the issue my finger slipped :(

Thanks in advance,

Cheers,

Thibault

Incorrect parsing

Hello,

just saved a Pandas DataFrame and had another issue. The values are not correctly parsed by Tableau.

Here is a zip containing:

Data to reproduce the bug - Bug_pandleau_data2.sqlite3
Expected in Tableau - Expected.png
How it really is read in Tableau - In_Tableau.png
Tested hyper file - Test.hyper
Notebook to create the hyper file - Troubleshooting_pandleau2.ipynb

troubleshooting_pandleau2.zip

TOP = expected, BOTTOM = read by Tableau

Thanks in advance for your help! Hope you find a solution or you have a workaround.

Cheers,

Thibault

unicode column types in pandas 0.23.0

pandleau.py
In the mapper dictionary (line 37), could you add the following data type:
'unicode':Type.UNICODE_STRING
Thank you!

ModuleNotFoundError not defined in python2

1841213

this broke python 2 -- ModuleNotFoundError is python3 only, but ImportError is probably close enough
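
As a short illustration of why ImportError is a safe substitute: on Python 3.6 and later, ModuleNotFoundError is a subclass of ImportError, so a handler for ImportError catches both, and the same code also runs on Python 2. The helper name `optional_import` is made up for this sketch.

```python
import importlib


def optional_import(name):
    """Import a module by name, returning None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Also catches ModuleNotFoundError on Python 3.6+.
        return None


json_module = optional_import("json")
missing_module = optional_import("no_such_module_xyz")
```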

Publish to pypi without tableausdk dependency?

In recent commits of setup.py the requirement for tableausdk was dropped. Is it possible to publish a version with this change to pypi?
I'm including pandleau in some docker containers, and because I can't pip/pipenv install pandleau, it makes the docker build more complex.

Thanks in advance!

Hyper API works much faster.

I've tried the Hyper API. Inserting data into a hyper file with pandas.DataFrame.iterrows() is not fast, but using the Hyper SQL command "Copy" to create the hyper file directly from a CSV is much faster, almost 10-100x. The only problem is that the data first has to be written to CSV, which is slow with pandas. Luckily, the datatable package for Python runs at about the speed of R's data.table. I tested it on data with 600M rows and 31 columns, and building the hyper file from the CSV took nearly 17 seconds.
Reference:
https://github.com/tableau/hyper-api-samples/blob/main/Tableau-Supported/Python/create_hyper_file_from_csv.py
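
A rough sketch of the CSV approach described above, loosely following the linked sample. It relies on the separate `tableauhyperapi` package, which pandleau does not use. The two-column schema and the function names `copy_command` and `csv_to_hyper` are assumptions for illustration; the schema has to match the actual CSV.

```python
def copy_command(escaped_table_name, csv_path):
    """Build a Hyper COPY statement that loads a CSV file with a header row."""
    quoted_path = "'" + csv_path.replace("'", "''") + "'"
    return (
        "COPY {} FROM {} WITH (FORMAT csv, DELIMITER ',', HEADER)"
        .format(escaped_table_name, quoted_path)
    )


def csv_to_hyper(csv_path, hyper_path):
    """Create (or replace) a .hyper file and bulk-load the CSV into it."""
    # Imported here so copy_command stays usable without tableauhyperapi.
    from tableauhyperapi import (
        Connection, CreateMode, HyperProcess, NOT_NULLABLE,
        SqlType, TableDefinition, Telemetry,
    )

    # Hypothetical schema; replace with the columns of the real CSV.
    table = TableDefinition(
        table_name="Extract",
        columns=[
            TableDefinition.Column("id", SqlType.big_int(), NOT_NULLABLE),
            TableDefinition.Column("name", SqlType.text()),
        ],
    )
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint, database=hyper_path,
                        create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            connection.catalog.create_table(table_definition=table)
            # execute_command returns the number of rows loaded.
            return connection.execute_command(
                copy_command(str(table.table_name), csv_path))


example_sql = copy_command('"Extract"', "data.csv")
```

Because the file is opened with `CreateMode.CREATE_AND_REPLACE`, each run overwrites the previous extract instead of adding to it.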

__init__.py

from pandleau import pandleau
You are using the Tableau SDK, please save the output as .tde format

ImportError Traceback (most recent call last)
in ()
----> 1 from pandleau import pandleau

C:\Users\perkir1\AppData\Local\Continuum\anaconda2\lib\site-packages\pandleau\__init__.py in ()
----> 1 from pandleau.pandleau import *

ImportError: No module named pandleau

The package has issues with its __init__.py file: importing it returns a "No module named pandleau" import error.

Bizarre error running pandleau on RHEL7

Hey,

Running on RHEL7 with Python 2.7 and pandleau, I am getting a massive exception in pandleau.py:

OSError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found

I was using pandas 0.24.something. Downgrading to 0.20.3 didn't fix it.
What did fix it was doing a local import of pandas rather than an import pandas at the top of the file.
I don't know what triggers it. It could be that pandleau tries to be smart and detect whether you are using the old tableausdk API or the new one. It seems that the combination of pandas and the import
from tableausdk import *
somehow causes this problem.

Have you ever seen it? Would you know how to address it?
For now I had to copy and paste the code and remove the global import.
I am pretty sure this has to do with pandas: when I run the Extract API Python samples I don't get any error, but when I edit a sample and add an import pandas, the sample blows up as well.
Any chance you can reproduce this and help?
Thanks

No module named 'tableausdk'

Hello,
I am not able to get past this error message:

  File "/Users/jakovvidulic/Code/0_to_publish/readcsv.py", line 26, in <module>
    from pandleau import pandleau
  File "/Users/jakovvidulic/.pyenv/versions/3.9.9/lib/python3.9/site-packages/pandleau/__init__.py", line 1, in <module>
    from pandleau.pandleau import *
  File "/Users/jakovvidulic/.pyenv/versions/3.9.9/lib/python3.9/site-packages/pandleau/pandleau.py", line 10, in <module>
    from tableausdk import *
ModuleNotFoundError: No module named 'tableausdk'

The site for downloading Tableau SDK you put on https://pypi.org/project/pandleau/ is not available.
How do I fix this?
