To address the `UnicodeDecodeError` you're encountering when querying binary data that is not valid UTF-8 from a VARBINARY column in Trino using SQL Lab in Superset, consider implementing a custom JSON serializer that handles binary data by encoding it in a format such as base64. This lets binary values be serialized as strings that JSON can represent without errors.
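Here is a minimal sketch of the base64 idea. It assumes Python's standard-library `json` module, which passes any object it cannot serialize (including `bytes`) to the `default` hook. `encode_binary` is a hypothetical name, not an existing Superset helper. The simplejson comment further down in this thread reports that simplejson decodes bytes itself instead of calling `default`, so with that library the conversion has to happen before `dumps`.

```python
import base64
import json
from typing import Any


def encode_binary(obj: Any) -> Any:
    """`default` hook for json.dumps: base64-encode raw bytes, reject anything else."""
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    # Mirror json's own error for unsupported types instead of returning obj
    # unchanged, which would trigger a circular-reference error.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# A VARBINARY-style value that is not valid UTF-8 (0x85 cannot start a UTF-8 sequence).
row = {"id": 1, "payload": b"\x00\x85"}
print(json.dumps(row, default=encode_binary))
# {"id": 1, "payload": "AIU="}
```

Clients that need the original bytes can recover them with `base64.b64decode`.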
Here are the steps you can take:
- Implement a Custom Serializer: Create a custom JSON serializer for binary data that encodes it in a format such as base64 before serialization.
- Modify JSON Serialization Calls: Update the `json.dumps` calls in `sqllab/api.py` to use this custom serializer, particularly where execution results and data are serialized.
- Adjust Data Decompression Logic: In `utils/core.py`, ensure the decompression logic handles binary data correctly, which may involve type checks before decompression and appropriate handling of binary data.
- Thorough Testing: Test querying binary data from VARBINARY columns extensively to confirm that the `UnicodeDecodeError` no longer occurs.
- Documentation: Update the project documentation to describe how binary data is handled in SQL Lab queries.
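The serialization, decompression and testing steps might fit together roughly as follows. This is a sketch under assumptions: `serialize_results` and `decompress_payload` are hypothetical helpers, not Superset's actual functions. The sketch also assumes results are JSON-encoded and zlib-compressed before storage. The round trip at the end doubles as a minimal check that a non-UTF-8 value survives.

```python
import base64
import json
import zlib
from typing import Any, Union


def _encode_binary(obj: Any) -> Any:
    # Custom serializer hook: bytes become base64 text.
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(bytes(obj)).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def serialize_results(payload: dict) -> bytes:
    """Hypothetical: JSON-encode a result payload (bytes -> base64), then compress it."""
    text = json.dumps(payload, default=_encode_binary)
    return zlib.compress(text.encode("utf-8"))


def decompress_payload(blob: Union[bytes, bytearray, memoryview], decode: bool = True):
    """Hypothetical: type-check the blob before decompressing; decode only on request."""
    if not isinstance(blob, (bytes, bytearray, memoryview)):
        raise TypeError(f"expected a bytes-like blob, got {type(blob).__name__}")
    data = zlib.decompress(bytes(blob))
    # After base64 encoding the JSON text is plain ASCII, so decoding is safe here;
    # callers handling arbitrary binary blobs can pass decode=False.
    return data.decode("utf-8") if decode else data


stored = serialize_results({"columns": ["payload"], "data": [{"payload": b"\x00\x85"}]})
restored = json.loads(decompress_payload(stored))
print(restored["data"][0]["payload"])  # AIU=
print(base64.b64decode(restored["data"][0]["payload"]) == b"\x00\x85")  # True
```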
This approach should help you resolve the encoding issues you're facing with binary data in Superset's SQL Lab.
Sources
- superset/sqllab/api.py
- superset/utils/core.py
- superset/key_value/types.py
- tests/unit_tests/key_value/codec_test.py
- tests/integration_tests/utils_tests.py
- superset/db_engine_specs/trino.py
- superset/db_engine_specs/presto.py
- superset/models/core.py
- superset/sql_lab.py
- superset/sqllab/exceptions.py
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
I don't think so, but it's possible. I was trying to allow UTF-8 strings through without them being escaped. It's possible this has to do with something being escaped incorrectly, which may or may not be the same chunk o' code.
+1
We're running into this as well. It also affects the Table viz (and possibly others), not just SQL Lab.
@rusackas Could this have anything to do with the `stringify` stuff you were trying to remove?
It seems that simplejson handles bytes itself and never calls the `default` function for them:

```python
import simplejson


def convert_bytes(val):
    if isinstance(val, bytes):
        return 'converted-' + repr(val)
    # Unsupported types should raise rather than be returned unchanged
    raise TypeError(f"Object of type {type(val).__name__} is not JSON serializable")


print(simplejson.__version__)
# 3.17.3
print(simplejson.dumps({'a': b'b'}, default=convert_bytes))
# {"a": "b"}  <- the bytes were decoded as UTF-8; convert_bytes was never called
print(simplejson.dumps({'a': b'\x00\x85'}, default=convert_bytes))
# raises UnicodeDecodeError
```

So bytes should be handled before they go into the `dumps` function.
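One way to follow that advice is to walk the result structure and convert bytes before any `dumps` call, so the serializer never sees a raw bytes object. This is a minimal sketch that uses base64 as suggested earlier in the thread. `bytes_to_base64` is a hypothetical helper, and dict keys are deliberately left untouched.

```python
import base64
import json
from typing import Any


def bytes_to_base64(value: Any) -> Any:
    """Recursively replace bytes values with base64 text before serialization."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, dict):
        # Only values are converted; keys are left as they are.
        return {key: bytes_to_base64(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is how JSON encodes them anyway.
        return [bytes_to_base64(item) for item in value]
    return value


# dumps now only receives str/int values, so no bytes decoding is attempted.
print(json.dumps(bytes_to_base64({"a": b"b", "rows": [(1, b"\x00\x85")]})))
# {"a": "Yg==", "rows": [[1, "AIU="]]}
```

Because the conversion happens up front, the same cleaned structure should serialize identically whether it is passed to `json.dumps` or `simplejson.dumps`.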