The metastore can be imported into the new workspace, but all of the table data will have to live on external mounts.
a) For external tables, the mounts will have to be remounted in the new workspace with the exact same directory structure. Run %fs mounts
in the old workspace and replicate the exact mount structure in the destination workspace. The output of %fs mounts
on the source and destination workspaces must match wherever a table in the metastore points at a mount. A quick way to compare the two is sketched below.
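A minimal sketch of comparing the mounts programmatically; run it on both workspaces and diff the output. Credentials are not part of the mount listing, so each mount is recreated on the destination with dbutils.fs.mount, supplying the storage credentials again.

```python
# Run on BOTH workspaces and diff the output: every mount point that a
# metastore table references must resolve to the same storage URI.
for m in sorted(dbutils.fs.mounts(), key=lambda m: m.mountPoint):
    print(m.mountPoint, "->", m.source)
```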
b) For tables that reference abfss:// or adl:// paths directly, declare the storage credentials in the Spark config of a cluster and use that cluster for the migration.
Example, in the destination workspace:
python import_db.py --azure --profile secondary --metastore --cluster-name CLUSTER_WITH_SPARKCREDS_FOR_STORAGE
adl://
https://docs.databricks.com/data/data-sources/azure/azure-datalake.html
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.client.id", "<application-id>")
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.credential", dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>"))
spark.sparkContext.hadoopConfiguration.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
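To confirm the credentials work before running the import, a quick read is enough; the store name and file path below are hypothetical placeholders:

```python
# Succeeds only if the OAuth settings above are correct.
spark.read.text("adl://<datalake-store-name>.azuredatalakestore.net/<path-to-a-text-file>").show(3)
```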
abfss://
https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html
spark.hadoop.fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net <service-credential>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
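The same settings can also be applied at the notebook level to test access before the migration run; a minimal sketch, where every angle-bracket name is a placeholder:

```python
# Notebook-level equivalent of the cluster Spark config above.
account = "<storage-account-name>.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}",
               dbutils.secrets.get(scope="<scope-name>", key="<key-name>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}",
               "https://login.microsoftonline.com/<directory-id>/oauth2/token")
```

Note that the migration tool runs against the named cluster, so for the actual import the values belong in the cluster Spark config as shown above.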
wasbs://
https://docs.databricks.com/data/data-sources/azure/azure-storage.html
spark.hadoop.fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
s3://
https://docs.databricks.com/data/data-sources/aws/amazon-s3.html#access-s3-buckets-directly
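For key-based access, the cluster Spark config might look like the sketch below; the fs.s3a keys are standard Hadoop S3A settings, the values are placeholders, and an instance profile attached to the cluster is the other common option per the linked docs:

```
spark.hadoop.fs.s3a.access.key <aws-access-key-id>
spark.hadoop.fs.s3a.secret.key <aws-secret-access-key>
```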
c) For managed tables, copy dbfs:/user/hive/warehouse/ to a dbfs:/tempMount location, then in the new Azure workspace copy from dbfs:/tempMount back to dbfs:/user/hive/warehouse/. The temp location is needed because the root DBFS is locked in Azure. Users with thousands of TBs of data can reach out to their Azure SAs to work out a plan that avoids the overhead of the copy to the temp location.
The dbutils.fs.cp command can be used to copy between the temp mount and DBFS.
At the source workspace: dbutils.fs.cp("dbfs:/user/hive/warehouse/", "dbfs:/tempMount/", True)
At the destination workspace: dbutils.fs.cp("dbfs:/tempMount/", "dbfs:/user/hive/warehouse/", True)
dbfs:/tempMount can be a mount of ADLS Gen2, ADLS Gen1, or Blob storage, or even S3; a sketch of creating it follows the links below:
https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html
https://docs.databricks.com/data/data-sources/azure/azure-datalake.html
https://docs.databricks.com/data/data-sources/azure/azure-storage.html
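A minimal sketch of creating the temp mount from a Blob storage container with an account key; every angle-bracket name is a hypothetical placeholder, and since mount points conventionally live under /mnt, the cp paths above would become dbfs:/mnt/tempMount:

```python
# Mount the shared staging container at the same path on BOTH workspaces,
# then run the dbutils.fs.cp commands above against it.
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net/",
    mount_point="/mnt/tempMount",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope-name>", key="<key-name>"),
    },
)
```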
AWS-to-AWS migration is similar, but I believe there are tools to copy the data directly from root bucket to root bucket, avoiding the /temp staging step.