
Comments (1)

gobiviswanath avatar gobiviswanath commented on May 26, 2024

The metastore can be imported into the new workspace, but all data will have to live on external mounts.

a) For external tables, the mounts will have to be remounted in the new workspace with the exact same directory structure. Run %fs mounts in the old workspace and recreate the exact mount structure in the destination workspace. The output of %fs mounts on the source and destination workspaces must match if any tables in the metastore point at those mounts.
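The parity check between the two %fs mounts listings can be scripted. Below is a minimal sketch with hypothetical names, assuming the listings have been exported from each workspace (e.g. via dbutils.fs.mounts()) as (mount_point, source) pairs, since dbutils itself is only available inside a Databricks notebook:

```python
def mount_diff(source_mounts, dest_mounts):
    """Compare (mount_point, source) pairs from two workspaces.

    Returns mount points missing in the destination, and mount points
    that exist in both but point at different storage locations.
    """
    src = dict(source_mounts)
    dst = dict(dest_mounts)
    missing = sorted(mp for mp in src if mp not in dst)
    mismatched = sorted(mp for mp in src if mp in dst and src[mp] != dst[mp])
    return missing, mismatched

# Example (hypothetical mounts): /mnt/sales is absent in the destination,
# /mnt/logs exists but points at a different storage account.
src = [("/mnt/sales", "abfss://sales@acct.dfs.core.windows.net/"),
       ("/mnt/logs", "abfss://logs@acct.dfs.core.windows.net/")]
dst = [("/mnt/logs", "abfss://logs@other.dfs.core.windows.net/")]
missing, mismatched = mount_diff(src, dst)
print(missing)      # ['/mnt/sales']
print(mismatched)   # ['/mnt/logs']
```

Both lists must come back empty before any metastore tables that reference mounted paths will resolve in the destination workspace.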

b) For tables using abfss:// or adl:// directly, declare the storage credentials in the Spark config and use that cluster for the migration.

Example, in the destination workspace:

python import_db.py --azure --profile secondary --metastore --cluster-name CLUSTER_WITH_SPARKCREDS_FOR_STORAGE

adl://

https://docs.databricks.com/data/data-sources/azure/azure-datalake.html

spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.client.id", "<application-id>")
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.credential", dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>"))
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

abfss://

https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html

spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope-name>/<key-name>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token

wasbs://

https://docs.databricks.com/data/data-sources/azure/azure-storage.html

spark.hadoop.fs.azure.account.key.<storage-account>.blob.core.windows.net <storage-account-access-key>

s3://

https://docs.databricks.com/data/data-sources/aws/amazon-s3.html#access-s3-buckets-directly

c) For managed tables, we will have to copy dbfs:/user/hive/warehouse/ to a dbfs:/tempMount location, then copy from dbfs:/tempMount to the dbfs:/user/hive/warehouse/ location in the new workspace in Azure. The temp hop is needed because the root DBFS is locked in Azure. Users with thousands of TBs of data can reach out to their Azure SAs to work out a plan that avoids the overhead of copying to a temp location.

The copy tool can be used to copy between temp and dbfs mount.

At the source workspace: dbutils.fs.cp("dbfs:/user/hive/warehouse/", "dbfs:/tempMount/", True)
At the destination workspace: dbutils.fs.cp("dbfs:/tempMount/", "dbfs:/user/hive/warehouse/", True)
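The two cp calls above copy the whole warehouse root recursively. When copying table-by-table instead, the relative layout under dbfs:/user/hive/warehouse/ must be preserved through the temp hop; a small helper (hypothetical, not part of the migration tool) sketches that path rewrite:

```python
def rewrite_root(path, old_root="dbfs:/user/hive/warehouse/", new_root="dbfs:/tempMount/"):
    """Map a path under one DBFS root onto the same relative path under another.

    Raises ValueError if the path is not under old_root, so a typo
    cannot silently send data to the wrong destination.
    """
    if not path.startswith(old_root):
        raise ValueError(f"{path!r} is not under {old_root!r}")
    return new_root + path[len(old_root):]

# A table directory keeps its database/table layout through the temp hop.
print(rewrite_root("dbfs:/user/hive/warehouse/sales.db/orders"))
# dbfs:/tempMount/sales.db/orders
```

Each rewritten pair can then be fed to dbutils.fs.cp(src, dst, True) inside the notebook.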

dbfs:/tempMount can be backed by ADLS Gen2, ADLS Gen1, or Blob storage, or even S3:

https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html or
https://docs.databricks.com/data/data-sources/azure/azure-datalake.html or
https://docs.databricks.com/data/data-sources/azure/azure-storage.html

AWS is similar, but there, I suppose, we have tools to copy the data directly from root bucket to root bucket, avoiding the temp hop.

from db-migration.

