
Comments (5)

jiayuasu commented on August 22, 2024

@remibaar Makes sense to me. Would you please update the doc of Sedona website and create a PR? I am happy to accept it!


remibaar commented on August 22, 2024

After some further investigation, I see that the Databricks runtime also ships H3 functionality, for which it uses com.uber h3 version 3.7.0. Could this conflict with version 4.1.1, which Sedona uses? That would explain the error, since polygonToCells is not available in H3 3.x.
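To check which H3 jars a cluster actually ships, a quick listing from a notebook `%sh` cell can help. This is a minimal sketch: `/databricks/jars` is the jar directory referred to in this thread, while the function name `list_h3_jars` and the `*h3*.jar` file name pattern are assumptions made for illustration.

```shell
# List jar files whose names mention "h3" in a jar directory.
# Assumption: the helper name and the '*h3*.jar' pattern are illustrative only.
list_h3_jars() {
  jar_dir="${1:-/databricks/jars}"
  # -iname matches case-insensitively; errors such as a missing directory are silenced.
  find "$jar_dir" -maxdepth 1 -type f -iname '*h3*.jar' 2>/dev/null | sort
}
```

Calling `list_h3_jars` with no argument inspects `/databricks/jars`; pass another directory to inspect a different location. More than one line of output with different version numbers would hint at the kind of conflict described here.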


remibaar commented on August 22, 2024

I managed to solve the issue! It was indeed caused by the version of H3 that is installed in the Databricks runtime.

I adjusted the init script so that it removes the older H3 jar from the Databricks jars, which solves the issue.
This is the code for my new init script:

%sh

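# Create the script directory if it does not exist yet, then write the init script
mkdir -p /dbfs/FileStore/sedona/scripts
cat > /dbfs/FileStore/sedona/scripts/sedona-init.sh <<'EOF'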
#!/bin/bash
#
# File: sedona-init.sh
# 
# On cluster startup, this script will copy the Sedona jars to the cluster's default jar directory.
# In order to activate Sedona functions, remember to add to your spark configuration the Sedona extensions: "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"

# Remove default H3 version of databricks, as it is not compatible with Sedona > 1.5.0
rm -f /databricks/jars/*com.uber*h3*.jar

# Copy jars
cp /dbfs/FileStore/sedona/jars/*.jar /databricks/jars

EOF

Note: This breaks the built-in H3 functionality of Databricks; the built-in H3 functions will now throw a NoClassDefFoundError. However, I believe the H3 functions of Sedona supersede the built-in ones.
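After a cluster restart, it may be worth confirming that the jar directory ended up in the intended state. The following is a rough sketch rather than part of the init script: the function name `check_sedona_jars` and the `sedona*.jar` file name pattern are assumptions, while the `*com.uber*h3*.jar` pattern is the one used by the `rm -f` line.

```shell
# Post-condition check: succeeds only if no Databricks H3 jar is left and
# at least one Sedona jar is present in the jar directory.
# Assumption: the helper name and the 'sedona*.jar' pattern are illustrative;
# adjust the pattern to the jar names you actually uploaded.
check_sedona_jars() {
  jar_dir="${1:-/databricks/jars}"
  rc=0
  # Same file name pattern as the rm -f line in the init script.
  if [ -n "$(find "$jar_dir" -maxdepth 1 -name '*com.uber*h3*.jar' 2>/dev/null)" ]; then
    echo "Databricks H3 jar still present in $jar_dir"
    rc=1
  fi
  if [ -z "$(find "$jar_dir" -maxdepth 1 -name 'sedona*.jar' 2>/dev/null)" ]; then
    echo "no Sedona jar found in $jar_dir"
    rc=1
  fi
  return "$rc"
}
```

Running `check_sedona_jars` in a `%sh` cell prints nothing and returns 0 when both conditions hold; otherwise it prints which condition failed.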

I will keep this issue open, because I am going to create a PR for a change in the docs.
https://github.com/apache/sedona/blob/master/docs/setup/databricks.md


jiayuasu commented on August 22, 2024

The main reason is that we shade the uber-h3 jar into sedona-spark-shaded, which leads to conflicts. An alternative fix is to use the sedona-spark jar, which does not shade anything, and manually download all of Sedona's dependency jars: https://github.com/apache/sedona/blob/master/pom.xml#L139


remibaar commented on August 22, 2024

"Another alternative to fix this is that: use sedona-spark jar which does not shade anything, and manually download all dependency jars of Sedona"

Please correct me if I am wrong, but with this method you still cannot use both the H3 of Sedona and the H3 of Databricks, because they use different, incompatible major versions (Sedona uses 4.1.1, Databricks uses 3.7.0).
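As a small illustration of the major-version mismatch, the major version can be read off a jar file name. This is only a sketch: the helper name `jar_major_version` is made up, and it assumes the file name ends in an `x.y.z` version before `.jar`, which may not hold for every jar.

```shell
# Print the major version encoded in a jar file name such as "h3-4.1.1.jar".
# Assumption: the name ends in a dotted version (x.y or x.y.z) before ".jar";
# nothing is printed if no such version is found.
jar_major_version() {
  # Strip the directory and the .jar suffix, then keep the first number of
  # the trailing dotted version.
  basename "$1" .jar | sed -n 's/.*[^0-9.]\([0-9][0-9]*\)\.[0-9][0-9.]*$/\1/p'
}
```

Comparing the output for the two H3 jars (for example 3 versus 4) makes the incompatibility visible without opening the jars.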

My personal recommendation would be to remove the H3 3.7.0 jar from the Databricks runtime. This disables the H3 functions of Databricks, but allows the use of the H3 functions of Sedona.
In my opinion, the H3 functions of Sedona are more feature-complete.

For example, one of the features I need is the fullCover option of the ST_H3CellIDs function, which is available in Sedona but not in the Databricks implementation.

