mrchristine / db-migration
Databricks Migration Tools
License: Other
When an Admin becomes the migration owner, the old owner must be switched to CAN_MANAGE
permissions instead of remaining the actual owner.
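A minimal sketch of what this could look like against the Permissions API payload shape; the helper name, ACL structure, and email addresses are illustrative, not part of this tool:

```python
# Sketch: rebuild an access-control list so the previous owner keeps
# CAN_MANAGE after an admin takes over ownership. The entry shape mirrors
# the Databricks Permissions API; names and emails are hypothetical.

def demote_old_owner(acl, old_owner, new_owner):
    """Return a new ACL where old_owner gets CAN_MANAGE and new_owner IS_OWNER."""
    updated = [e for e in acl if e.get("user_name") not in (old_owner, new_owner)]
    updated.append({"user_name": old_owner, "permission_level": "CAN_MANAGE"})
    updated.append({"user_name": new_owner, "permission_level": "IS_OWNER"})
    return updated

acl = [{"user_name": "alice@example.com", "permission_level": "IS_OWNER"}]
print(demote_old_owner(acl, "alice@example.com", "admin@example.com"))
```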
We used this tool to migrate Delta tables. There is an issue with the Delta table import:
the migration tool generates the Delta import command in the format below, but it fails.
CREATE TABLE events
USING DELTA
Error:
org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.

Full traceback:
Py4JJavaError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62 try:
---> 63     return f(*a, **kw)
     64 except py4j.protocol.Py4JJavaError as e:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327     "An error occurred while calling {0}{1}{2}.\n".
--> 328     format(target_id, ".", name), value)
    329 else:

Py4JJavaError: An error occurred while calling o210.sql.
: org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.;
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.com$databricks$sql$transaction$tahoe$commands$CreateDeltaTableCommand$$assertPathEmpty(CreateDeltaTableCommand.scala:186)
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:136)
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:93)
	at com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:428)
	at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:409)
	at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:55)
The correct command should be
CREATE TABLE events
USING DELTA
LOCATION '/mnt/delta/events'
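A hedged sketch of the fix on the generation side: carry the table location into the emitted DDL. The helper name and inputs are illustrative, not the tool's actual code:

```python
def add_location(ddl: str, location: str) -> str:
    """Append a LOCATION clause to a generated Delta DDL if it lacks one."""
    if "LOCATION" in ddl.upper():
        return ddl  # already has a location clause; leave it alone
    return f"{ddl.rstrip()}\nLOCATION '{location}'"

ddl = "CREATE TABLE events\nUSING DELTA"
print(add_location(ddl, "/mnt/delta/events"))
# CREATE TABLE events
# USING DELTA
# LOCATION '/mnt/delta/events'
```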
Running:
python3 /databricks/driver/src/databricks-migration/export_db.py --profile DEFAULT --azure --libs
gives this error message:
Get: https://eastus2.azuredatabricks.net/api/1.2/libraries/list
Traceback (most recent call last):
File "/databricks/driver/src/databricks-migration/export_db.py", line 128, in <module>
main()
File "/databricks/driver/src/databricks-migration/export_db.py", line 67, in main
lib_c.log_library_details()
File "/databricks/driver/src/databricks-migration/dbclient/LibraryClient.py", line 18, in log_library_details
all_libs = self.get('/libraries/list', version='1.2')
File "/databricks/driver/src/databricks-migration/dbclient/dbclient.py", line 60, in get
raise Exception("Error. GET request failed with code {}\n{}".format(http_status_code, raw_results.text))
Exception: Error. GET request failed with code 400
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 400 </title>
</head>
<body>
<h2>HTTP ERROR: 400</h2>
<p>Problem accessing /api/1.2/libraries/list. Reason:
<pre> The V1 APIs for clusters and libraries are disabled. Please use the V2 APIs.</pre></p>
<hr />
</body>
</html>
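The workspace here has the 1.2 clusters/libraries APIs disabled, so the list call has to move to the 2.0 Libraries API. A sketch of building the replacement URL; the helper is illustrative, the endpoint paths are the documented 2.0 ones:

```python
def libraries_status_endpoint(host, cluster_id=None):
    """Build the 2.0 Libraries API URL replacing the removed 1.2 /libraries/list.

    Without a cluster_id, lists library statuses for all clusters;
    with one, lists statuses for that cluster only.
    """
    if cluster_id:
        return f"{host}/api/2.0/libraries/cluster-status?cluster_id={cluster_id}"
    return f"{host}/api/2.0/libraries/all-cluster-statuses"

print(libraries_status_endpoint("https://eastus2.azuredatabricks.net"))
```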
add documentation on data migration options
We support different export types that can help reduce the export / import time for notebooks.
https://docs.databricks.com/dev-tools/api/latest/workspace.html#exportformat
Supporting the SOURCE notebook format removes output from cells and reduces the total size of the exported objects.
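A sketch of what requesting the SOURCE format might look like when building the workspace-export call; the helper and parameter dict are illustrative, the format values come from the linked docs:

```python
def export_params(path: str, fmt: str = "SOURCE") -> dict:
    """Query parameters for GET /api/2.0/workspace/export.

    SOURCE strips cell output, shrinking the exported object compared
    to HTML/DBC exports.
    """
    allowed = {"SOURCE", "HTML", "JUPYTER", "DBC"}
    if fmt not in allowed:
        raise ValueError(f"format must be one of {sorted(allowed)}")
    return {"path": path, "format": fmt}

print(export_params("/Users/someone/notebook"))
```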
Add support for importing and exporting cluster policies.
During a migration, if the email case sensitivity has changed, then the users defined in the ACLs need to be updated.
Review all ACLs to verify that this is updated.
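One way to sketch the update, assuming exported ACL entries carry a user_name email field; the helper is illustrative and simply lower-cases the addresses before import:

```python
def normalize_acl_emails(acl):
    """Lower-case user emails in exported ACL entries so email case
    changes between workspaces do not break the import."""
    fixed = []
    for entry in acl:
        entry = dict(entry)  # copy; do not mutate the exported log
        if "user_name" in entry:
            entry["user_name"] = entry["user_name"].lower()
        fixed.append(entry)
    return fixed

print(normalize_acl_emails([{"user_name": "John.Doe@Example.com",
                             "permission_level": "CAN_MANAGE"}]))
```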
I pulled the changed code and tried re-importing the problematic DDL, but it looks like a similar error:

During handling of the above exception, another exception occurred:

ParseException Traceback (most recent call last)
<command> in <module>
      2 )
      3 USING delta
----> 4 LOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered' """)

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
    702 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    703 """
--> 704 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    705
    706 @since(2.0)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303 answer = self.gateway_client.send_command(command)
   1304 return_value = get_return_value(
-> 1305     answer, self.gateway_client, self.target_id, self.name)
   1306
   1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    100 converted = convert_exception(e.java_exception)
    101 if not isinstance(converted, UnknownException):
--> 102     raise converted
    103 else:
    104     raise

ParseException:
no viable alternative at input 'CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered
(
)'(line 2, pos 2)

== SQL ==
 CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered
 (
 )
--^^^
USING delta
LOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered'
Complete Metastore Import Time: 0:06:29.123450
I am getting an error when trying to export metastore for a database with tables that point to files on Azure Storage account.
Here is the error.
Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure
Within a notebook, I can run the same statement that I find in the failed_metastore.log using the same spark cluster and I don't get an error.
Here is the python statement I can run successfully:
spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]
Out[2]:
CREATE TABLE dw.account_diagnosis_related_group_fact (
  account_diagnosis_related_group_dimension_key BIGINT,
  effective_from_date DATE,
  effective_to_date DATE,
  tenant_key BIGINT,
  account_dimension_key BIGINT,
  account_key BIGINT,
  diagnosis_related_group_dimension_key BIGINT,
  diagnosis_related_group_code STRING,
  relationship_type_code_key BIGINT,
  relationship_type_code STRING,
  source_code_key BIGINT,
  source_code STRING,
  diagnosis_related_group_condition_code_key BIGINT,
  diagnosis_related_group_condition_code STRING,
  diagnosis_related_group_condition_description STRING,
  diagnosis_related_group_length_of_stay_days_count INT,
  diagnosis_related_group_qualifier_code_key BIGINT,
  diagnosis_related_group_qualifier_code STRING,
  diagnosis_related_group_qualifier_description STRING,
  illness_severity_class_code_key BIGINT,
  illness_severity_class_code STRING,
  illness_severity_class_description STRING,
  mortality_risk_class_code_key BIGINT,
  mortality_risk_class_code STRING,
  mortality_risk_class_description STRING,
  arithmetic_average_length_of_stay DECIMAL(5,2),
  geometric_average_length_of_stay DECIMAL(5,2),
  relative_weighting_factor DECIMAL(18,4),
  diagnosis_related_group_sequence BIGINT,
  diagnosis_related_group_billing_indicator INT,
  account_diagnosis_related_group_count INT,
  document_key BIGINT,
  document_dimension_key BIGINT,
  diagnosis_related_group_comparison_indicator INT
)
USING delta
LOCATION 'abfss://[email protected]/edw/prod/dw/account_diagnosis_related_group_fact.delta'
Could you please add an option to exclude a database from the metastore export. For example:
--exclude-database DATABASE Database name to be excluded from the metastore export. Single database name supported.
A friendlier option would be to exclude a comma-delimited list of databases:
--exclude-database DATABASES Database name(s) to be excluded from the metastore export. Comma-delimited database names supported.
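A sketch of how the comma-delimited flavor could be parsed with argparse; the flag wiring here is illustrative, not the tool's actual CLI:

```python
import argparse

# Hypothetical parser showing how a comma-delimited --exclude-database
# flag could be split into a set of database names to skip.
parser = argparse.ArgumentParser()
parser.add_argument("--exclude-database", default="",
                    help="Comma-delimited database name(s) to exclude "
                         "from the metastore export.")

args = parser.parse_args(["--exclude-database", "tmp_db,scratch"])
excluded = {db for db in args.exclude_database.split(",") if db}
print(sorted(excluded))
# ['scratch', 'tmp_db']
```

The export loop would then skip any database whose name is in `excluded`.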
As the code is currently implemented, when exporting data from the metastore, if the cluster JSON spec used in the data/ folder does not have an instance profile with access to the S3 buckets defined for the tables, the DDL export for those tables will fail.
Hi Team,
Kindly let us know if there is a way to dump the extracted metadata of a Databricks workspace into a SQL database.
Is there a utility for this, similar to the Python script we used to import and export the metastore?
Is it possible to export just the database names in a workspace, without the table definitions? I am asking because some databases/tables point to Data Lake Storage Gen2, and I get an error when trying to export a table located on Data Lake Storage Gen2 even though passthrough credentials are enabled.
The error I get is
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Alternatively, is there a way to query the databases that exist in advance, in order to pick which databases to export? I checked the Databricks CLI, but I don't think it has a metastore listing function.
Basically, I only care about tables stored in DBFS, not in Azure Data Lake Storage.
Hello! Is it possible to import users into a new workspace without notifying the users? Thank you.
Need to add paging support for large DDLs.
For files > 1 KB, upload to a DBFS /tmp/ location, read the file locally within Python, then replay it via Spark SQL on the cluster.
Files < 1 KB can be replayed as a DDL string directly via the execution context.
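The size cutoff above can be sketched as a simple dispatch; the names and the 1 KB threshold mirror the description and are otherwise illustrative:

```python
DIRECT_LIMIT = 1024  # bytes; larger DDLs get staged through DBFS /tmp/

def replay_strategy(ddl: str) -> str:
    """Pick how to replay a DDL: small ones inline through the execution
    context, large ones uploaded to DBFS and read back on the cluster."""
    if len(ddl.encode("utf-8")) < DIRECT_LIMIT:
        return "execution_context"
    return "dbfs_tmp_upload"

print(replay_strategy("CREATE TABLE t (a INT) USING DELTA"))
# execution_context
```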
Due to the cost factor of Databricks, I've been asked to figure out how to migrate off of the platform. I'm unclear what migration means in the context of this project. Is it migration from one Databricks environment to another or is it migration off of the platform?
Add MLFlow experiments / models for import / export.
Metastore export should support table comments or table properties. This does not migrate today with the show create table output.
I have a customer who only wants to migrate a subset of existing clusters from old env to new env.
Remove the admin user from the PVC user export; otherwise the import will fail with a user-already-exists error.
When running from PowerShell with the following options (i.e. not specifying --reset-exports explicitly):
python $localFolder/db-migration-master/export_db.py --azure --metastore --silent
the script will always ask for input on clearing the export folder. This makes it impossible to automate the run through PowerShell (in this case, through a CD pipeline).
It would be good to be able to explicitly set --reset-exports to false, or have it correctly default to false.
Use the permissions API to add cluster ACLs
Support adding a whitelist for users to be imported into the new env
When exporting with a specific cluster using the --cluster-name argument, and the cluster runs Spark 3.0, the error shown below is generated. This was tested on Azure Databricks. The cluster used is: 7.0 (includes Apache Spark 3.0.0, Scala 2.12).
python3 ./export_db.py --azure --metastore --cluster-name export --profile ws-databricks
ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "AttributeError: databaseName"}

ValueError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594 # but this will not be used in normal cases
-> 1595 idx = self.__fields__.index(item)
   1596 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
<command> in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

<command> in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598     raise AttributeError(item)
   1599 except ValueError:
-> 1600     raise AttributeError(item)
   1601
   1602 def __setattr__(self, key, value):

AttributeError: databaseName
Traceback (most recent call last):
File "./export_db.py", line 151, in <module>
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
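This error is consistent with Spark 3.0 renaming the SHOW DATABASES result column from databaseName to namespace. A hedged compatibility helper (the name is illustrative, not the tool's code):

```python
def database_name(row):
    """Read a database name from a SHOW DATABASES row across Spark
    versions: Spark 2.x calls the column 'databaseName', Spark 3.x
    calls it 'namespace'."""
    for field in ("databaseName", "namespace"):
        try:
            return getattr(row, field)
        except AttributeError:
            continue
    return row[0]  # last resort: positional access

# Stand-in for a Spark 3.0 Row, which raises AttributeError on databaseName:
fake_row = type("Row", (), {"namespace": "default"})()
print(database_name(fake_row))
# default
```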
Have a method to export / import groups into the workspace instead of individual users.
Support Account ID Updates in Instance Profiles.
Instance profiles show up in the following logs:
Add secrets import
For example, using the Databricks CLI, one can download a file or a directory as shown below:
databricks fs cp -r dbfs:/user/hive/warehouse ./mylocalDir
Verify that the policy id changes are supported when migrating clusters via cluster policies.
Some deployments of Databricks need verify=False
Add a complementary option to import a single user home directory that was exported.
When doing an import and the table already exists, an error is generated. The behavior on import of users is different: it automatically creates the users even if they already exist. Could the same be done for the metastore import and other exported items?
Hello, I recently noticed that a space is missing after the USING keyword in the generated DDL; below is an example.
USINGorg.apache.spark.sql.parquet
OPTIONS (
path 'dbfs:/path'
)
PARTITIONED BY (column)
Because of the missing space, the DDL execution fails. Can you please help us with this?
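A sketch of patching the extracted DDL until the generator is fixed; the regex assumes USING only ever appears fused directly to the datasource name:

```python
import re

def fix_using(ddl: str) -> str:
    """Insert the missing space after USING when it is fused to the
    datasource name, e.g. 'USINGorg.apache.spark.sql.parquet'."""
    return re.sub(r"\bUSING(?=\S)", "USING ", ddl)

print(fix_using("USINGorg.apache.spark.sql.parquet"))
# USING org.apache.spark.sql.parquet
```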
Investigate how to export and log Azure libraries in their workspace.
We currently log the workspace libraries that are defined via the workspace export.
user migration:
Export succeeded
python3 export_db.py --profile primary --users --azure
Import failed:
python3 import_db.py --profile secondary --users --azure
Traceback (most recent call last):
File "import_db.py", line 180, in <module>
main()
File "import_db.py", line 41, in main
scim_c.import_all_users_and_groups()
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 402, in import_all_users_and_groups
self.import_groups(group_dir)
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 379, in import_groups
member_id_list.append(current_group_ids[m['display']])
KeyError: '9db832f5-bb53-4cc7-8250-912e676046d1'
Hi Team,
We tried to follow the documented steps for migrating the metastore from our existing workspace onto the new Databricks workspace.
Initially, when we executed the script, it was successful, but we noticed that only partial database and table information was migrated from the old workspace to the new one.
We tried to re-migrate a second time, but we ran into issues while executing the export_db.py script, with the error below.
Creating remote Spark Session
post: https://westeurope.azuredatabricks.net/api/1.2/contexts/create
post: https://westeurope.azuredatabricks.net/api/1.2/commands/execute
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
Traceback (most recent call last):
File "export_db.py", line 132, in <module>
main()
File "export_db.py", line 126, in main
hive_c.export_hive_metastore()
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 161, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, ms_dir)
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 108, in log_all_databases
num_of_dbs = ast.literal_eval(results['data'])
KeyError: 'data'
C:\Users\dsvm\Desktop\db-migration-master>python export_db.py --metastore --azure --profile tpold
Traceback (most recent call last):
File "export_db.py", line 135, in <module>
main()
File "export_db.py", line 25, in main
token = login_args['token']
KeyError: 'token'
add support to export tables with datasource as root dbfs (/usr/hive/db location)
Add permissions / ACLs export and import for notebooks and directories
Notebook ids and directory ids change with the migration of the workspace.
Need to define a lookup table to find the new ids quickly.
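A sketch of such a lookup table, assuming exported objects carry path and object_id fields as in the Workspace API list output; the helper name is illustrative:

```python
def build_id_lookup(old_objects, new_objects):
    """Map old notebook/directory ids to new ids by matching on path,
    which stays stable across the migration while ids do not."""
    by_path = {o["path"]: o["object_id"] for o in new_objects}
    return {o["object_id"]: by_path[o["path"]]
            for o in old_objects if o["path"] in by_path}

old = [{"object_id": 101, "path": "/Users/a/notebook"}]
new = [{"object_id": 999, "path": "/Users/a/notebook"}]
print(build_id_lookup(old, new))
# {101: 999}
```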
Export for Azure created a folder called azure_logs; however, the import looks for a folder called logs.
Here is an example:
% python3 import_db.py --azure --workspace
Import the complete workspace at 2020-09-01 12:59:00.020444
Import on https://adb-5463815377663355.15.azuredatabricks.net
Traceback (most recent call last):
File "import_db.py", line 123, in <module>
main()
File "import_db.py", line 49, in main
ws_c.import_all_workspace_items(archive_missing=False)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 364, in import_all_workspace_items
num_exported_users = self.get_num_of_saved_users(src_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 82, in get_num_of_saved_users
ls = os.listdir(user_home_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/artifacts/Users'
The workaround for me was to rename azure_logs to logs. As a quick fix, should the Azure import default to the same directory the export uses?
There are some gaps in the 1.2 REST API library commands. We can do a best-effort re-creation of the workspace libraries using the supported APIs.
Add cluster policies to cluster client with permissions / ACLs.
Could you add the ability to specify the destination folder instead of always being azure_logs?
Hello,
Just found some inconsistency while extracting a DDL that makes use of control characters.
Below is the DDL that was used to create the table:
CREATE EXTERNAL TABLE db.tab
(
h_ls_hash array<string>,
ls_id STRING,
bin array<STRING>,
Class1 array<int>,
Class1_valueString array<string>,
Class1_valueFrom array<float>,
Class1_valueTo array<float>,
Class2 array<int>,
Class2_valueString array<string>,
Class2_valueFrom array<float>,
Class2_valueTo array<float>,
Class3 array<int>,
Class3_valueString array<string>,
Class3_valueFrom array<float>,
Class3_valueTo array<float>,
load_ts timestamp COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes',
record_source string COMMENT 'EN: Source Name | DE: Quellenname'
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS TEXTFILE
location 'dbfs:/loc';
But when the DDLs were extracted using the migration tool, it resulted in the output below.
CREATE EXTERNAL TABLE `db`.`tab`(`h_ls_hash` ARRAY<STRING>, `ls_id` STRING, `bin` ARRAY<STRING>, `Class1` ARRAY<INT>, `Class1_valueString` ARRAY<STRING>, `Class1_valueFrom` ARRAY<FLOAT>, `Class1_valueTo` ARRAY<FLOAT>, `Class2` ARRAY<INT>, `Class2_valueString` ARRAY<STRING>, `Class2_valueFrom` ARRAY<FLOAT>, `Class2_valueTo` ARRAY<FLOAT>, `Class3` ARRAY<INT>, `Class3_valueString` ARRAY<STRING>, `Class3_valueFrom` ARRAY<FLOAT>, `Class3_valueTo` ARRAY<FLOAT>, `load_ts` TIMESTAMP COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes', `record_source` STRING COMMENT 'EN: Source Name | DE: Quellenname')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim' = '�',
'serialization.format' = '�'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'dbfs:/loc'
TBLPROPERTIES (
'transient_lastDdlTime' = '1606221712'
)
Can someone help me with this?
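A hedged sketch of re-escaping the raw control characters in the extracted DDL so it can be replayed; the helper is illustrative and assumes \uXXXX escapes are acceptable to the target parser:

```python
def escape_control_chars(ddl: str) -> str:
    """Replace literal control characters (e.g. the \\u0001 field
    delimiter inlined as a raw byte) with \\uXXXX escape sequences,
    keeping newlines and tabs intact."""
    return "".join(
        ch if ch.isprintable() or ch in "\n\t" else "\\u%04x" % ord(ch)
        for ch in ddl
    )

print(escape_control_chars("'field.delim' = '\u0001'"))
# 'field.delim' = '\u0001'
```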
Export secrets with their ACLs.
Would like to choose whether or not to retain instance profile access by group when importing users.
I noticed that when doing an export with --metastore, the tool creates a cluster with the name stored in data/azure_cluster.json. However, I would like it to use an existing cluster in my workspace. Is this possible?
I changed the cluster name and version to a cluster I created, as shown below, but I am getting an error. Here is the azure_cluster.json file, followed by the error.
{
"num_workers": 1,
"cluster_name": "export",
"spark_version": "7.0.x-scala2.12",
"spark_conf": {},
"node_type_id": "Standard_F4s_v2",
"ssh_public_keys": [],
"custom_tags": {},
"spark_env_vars": {
"PYSPARK_PYTHON": "/databricks/python3/bin/python3"
},
"autotermination_minutes": 30,
"init_scripts": []
}
######################## ERROR #######################################
python3 ./export_db.py --azure --metastore --debug --profile ws-databricks
https://adb-5463815377663355.15.azuredatabricks.net dapi6fe373fdb9f18c9613840ebefdccde43
Export the metastore configs at 2020-09-15 17:16:17.354008
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Starting export with id 0915-220257-ruins233
post: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/start
Error: Cluster 0915-220257-ruins233 is in unexpected state Running.
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/get
Cluster creation time: 0:00:00.662221
Creating remote Spark Session
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/contexts/create
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/execute
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "AttributeError: databaseName"}

ValueError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594 # but this will not be used in normal cases
-> 1595 idx = self.__fields__.index(item)
   1596 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
<command> in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

<command> in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598     raise AttributeError(item)
   1599 except ValueError:
-> 1600     raise AttributeError(item)
   1601
   1602 def __setattr__(self, key, value):

AttributeError: databaseName
Traceback (most recent call last):
File "./export_db.py", line 151, in <module>
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
PS /Users/saldroubi/Dropbox/git/db-migration>
Metastore import is not working as the DDL needs to be wrapped in a Spark command.
Use permissions api endpoint for job ACLs and re-import them on the new workspace.
Scheduled jobs will be imported in the PAUSED state to reduce impact on the current workspace with scheduled jobs.
Both export / import tools support pausing all jobs and un-pausing all jobs to satisfy a switchover for each environment.
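A sketch of the pause/unpause toggle over a job's settings, assuming the schedule.pause_status field of the Jobs API; the helper name is illustrative:

```python
def set_pause_status(job_settings: dict, paused: bool) -> dict:
    """Toggle a job's schedule between PAUSED and UNPAUSED (the states
    the Jobs API uses) without touching the rest of the settings."""
    settings = dict(job_settings)
    schedule = dict(settings.get("schedule", {}))  # copy; jobs may be unscheduled
    if schedule:
        schedule["pause_status"] = "PAUSED" if paused else "UNPAUSED"
        settings["schedule"] = schedule
    return settings

job = {"name": "nightly", "schedule": {"quartz_cron_expression": "0 0 2 * * ?",
                                       "pause_status": "UNPAUSED"}}
print(set_pause_status(job, True)["schedule"]["pause_status"])
# PAUSED
```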
Hello,
I am using this utility to dump all the DDL from the Databricks cluster. What I observed is that when a DDL is huge, the DDL statement gets truncated.
*** WARNING: skipped 23006 bytes of output ***
USING parquet
OPTIONS (
path 'dbfs:/XX/XXX/XXXXX'
)
PARTITIONED BY (XXX)
Because of the truncation, my DDL statement fails. Is there any parameter that can resolve such errors?
Note: The table has over 700 columns with comments included.