mrchristine / db-migration
Databricks Migration Tools
License: Other
When an Admin becomes the migration owner, the old owner must be switched to CAN_MANAGE
permissions instead of remaining the actual owner.
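A minimal sketch of what this could look like against the Permissions API payload shape; the helper name, ACL structure, and email addresses are illustrative, not part of this tool:

```python
# Sketch: rebuild an access-control list so the previous owner keeps
# CAN_MANAGE after an admin takes over ownership. The entry shape mirrors
# the Databricks Permissions API; names and emails are hypothetical.

def demote_old_owner(acl, old_owner, new_owner):
    """Return a new ACL where old_owner gets CAN_MANAGE and new_owner IS_OWNER."""
    updated = [e for e in acl if e.get("user_name") not in (old_owner, new_owner)]
    updated.append({"user_name": old_owner, "permission_level": "CAN_MANAGE"})
    updated.append({"user_name": new_owner, "permission_level": "IS_OWNER"})
    return updated

acl = [{"user_name": "alice@example.com", "permission_level": "IS_OWNER"}]
print(demote_old_owner(acl, "alice@example.com", "admin@example.com"))
```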
We used this tool to migrate Delta tables. There is an issue with the Delta table import:
the migration tool generates the Delta import command in the format below, but it fails.
CREATE TABLE events
USING DELTA
Error:
org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.

Full traceback:
Py4JJavaError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62 try:
---> 63     return f(*a, **kw)
     64 except py4j.protocol.Py4JJavaError as e:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327     "An error occurred while calling {0}{1}{2}.\n".
--> 328     format(target_id, ".", name), value)
    329 else:

Py4JJavaError: An error occurred while calling o210.sql.
: org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.;
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.com$databricks$sql$transaction$tahoe$commands$CreateDeltaTableCommand$$assertPathEmpty(CreateDeltaTableCommand.scala:186)
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:136)
	at com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:93)
	at com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:428)
	at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:409)
	at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:18)
	at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:55)
The correct command should be
CREATE TABLE events
USING DELTA
LOCATION '/mnt/delta/events'
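A hedged sketch of the fix on the generation side: carry the table location into the emitted DDL. The helper name and inputs are illustrative, not the tool's actual code:

```python
def add_location(ddl: str, location: str) -> str:
    """Append a LOCATION clause to a generated Delta DDL if it lacks one."""
    if "LOCATION" in ddl.upper():
        return ddl  # already has a location clause; leave it alone
    return f"{ddl.rstrip()}\nLOCATION '{location}'"

ddl = "CREATE TABLE events\nUSING DELTA"
print(add_location(ddl, "/mnt/delta/events"))
# CREATE TABLE events
# USING DELTA
# LOCATION '/mnt/delta/events'
```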
Running:
python3 /databricks/driver/src/databricks-migration/export_db.py --profile DEFAULT --azure --libs
gives this error message:
Get: https://eastus2.azuredatabricks.net/api/1.2/libraries/list
Traceback (most recent call last):
File "/databricks/driver/src/databricks-migration/export_db.py", line 128, in <module>
main()
File "/databricks/driver/src/databricks-migration/export_db.py", line 67, in main
lib_c.log_library_details()
File "/databricks/driver/src/databricks-migration/dbclient/LibraryClient.py", line 18, in log_library_details
all_libs = self.get('/libraries/list', version='1.2')
File "/databricks/driver/src/databricks-migration/dbclient/dbclient.py", line 60, in get
raise Exception("Error. GET request failed with code {}\n{}".format(http_status_code, raw_results.text))
Exception: Error. GET request failed with code 400
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 400 </title>
</head>
<body>
<h2>HTTP ERROR: 400</h2>
<p>Problem accessing /api/1.2/libraries/list. Reason:
<pre> The V1 APIs for clusters and libraries are disabled. Please use the V2 APIs.</pre></p>
<hr />
</body>
</html>
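The workspace here has the 1.2 clusters/libraries APIs disabled, so the list call has to move to the 2.0 Libraries API. A sketch of building the replacement URL; the helper is illustrative, the endpoint paths are the documented 2.0 ones:

```python
def libraries_status_endpoint(host, cluster_id=None):
    """Build the 2.0 Libraries API URL replacing the removed 1.2 /libraries/list.

    Without a cluster_id, lists library statuses for all clusters;
    with one, lists statuses for that cluster only.
    """
    if cluster_id:
        return f"{host}/api/2.0/libraries/cluster-status?cluster_id={cluster_id}"
    return f"{host}/api/2.0/libraries/all-cluster-statuses"

print(libraries_status_endpoint("https://eastus2.azuredatabricks.net"))
```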
add documentation on data migration options
We support different export types that can help reduce the export / import time for notebooks.
https://docs.databricks.com/dev-tools/api/latest/workspace.html#exportformat
Supporting the SOURCE notebook format removes output from cells and reduces the total size of the exported objects.
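A sketch of what requesting the SOURCE format might look like when building the workspace-export call; the helper and parameter dict are illustrative, the format values come from the linked docs:

```python
def export_params(path: str, fmt: str = "SOURCE") -> dict:
    """Query parameters for GET /api/2.0/workspace/export.

    SOURCE strips cell output, shrinking the exported object compared
    to HTML/DBC exports.
    """
    allowed = {"SOURCE", "HTML", "JUPYTER", "DBC"}
    if fmt not in allowed:
        raise ValueError(f"format must be one of {sorted(allowed)}")
    return {"path": path, "format": fmt}

print(export_params("/Users/someone/notebook"))
```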
Add support for importing and exporting cluster policies.
During a migration, if the email case sensitivity has changed, then the users defined in the ACLs need to be updated.
Review all ACLs to verify that this is updated.
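One way to sketch the update, assuming exported ACL entries carry a user_name email field; the helper is illustrative and simply lower-cases the addresses before import:

```python
def normalize_acl_emails(acl):
    """Lower-case user emails in exported ACL entries so email case
    changes between workspaces do not break the import."""
    fixed = []
    for entry in acl:
        entry = dict(entry)  # copy; do not mutate the exported log
        if "user_name" in entry:
            entry["user_name"] = entry["user_name"].lower()
        fixed.append(entry)
    return fixed

print(normalize_acl_emails([{"user_name": "John.Doe@Example.com",
                             "permission_level": "CAN_MANAGE"}]))
```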
I pulled the changed code and tried re-importing the problematic DDL, but it looks like a similar error:

During handling of the above exception, another exception occurred:

ParseException Traceback (most recent call last)
<command> in <module>
      2 )
      3 USING delta
----> 4 LOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered' """)

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
    702 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    703 """
--> 704 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    705
    706 @since(2.0)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303 answer = self.gateway_client.send_command(command)
   1304 return_value = get_return_value(
-> 1305     answer, self.gateway_client, self.target_id, self.name)
   1306
   1307 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    100 converted = convert_exception(e.java_exception)
    101 if not isinstance(converted, UnknownException):
--> 102     raise converted
    103 else:
    104     raise

ParseException:
no viable alternative at input 'CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered
(
)'(line 2, pos 2)

== SQL ==
 CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered
 (
 )
--^^^
USING delta
LOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered'
Complete Metastore Import Time: 0:06:29.123450
I am getting an error when trying to export metastore for a database with tables that point to files on Azure Storage account.
Here is the error.
Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure
Within a notebook, I can run the same statement that I find in the failed_metastore.log using the same spark cluster and I don't get an error.
Here is the python statement I can run successfully:
spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]
Out[2]:
CREATE TABLE dw.account_diagnosis_related_group_fact (
  account_diagnosis_related_group_dimension_key BIGINT,
  effective_from_date DATE,
  effective_to_date DATE,
  tenant_key BIGINT,
  account_dimension_key BIGINT,
  account_key BIGINT,
  diagnosis_related_group_dimension_key BIGINT,
  diagnosis_related_group_code STRING,
  relationship_type_code_key BIGINT,
  relationship_type_code STRING,
  source_code_key BIGINT,
  source_code STRING,
  diagnosis_related_group_condition_code_key BIGINT,
  diagnosis_related_group_condition_code STRING,
  diagnosis_related_group_condition_description STRING,
  diagnosis_related_group_length_of_stay_days_count INT,
  diagnosis_related_group_qualifier_code_key BIGINT,
  diagnosis_related_group_qualifier_code STRING,
  diagnosis_related_group_qualifier_description STRING,
  illness_severity_class_code_key BIGINT,
  illness_severity_class_code STRING,
  illness_severity_class_description STRING,
  mortality_risk_class_code_key BIGINT,
  mortality_risk_class_code STRING,
  mortality_risk_class_description STRING,
  arithmetic_average_length_of_stay DECIMAL(5,2),
  geometric_average_length_of_stay DECIMAL(5,2),
  relative_weighting_factor DECIMAL(18,4),
  diagnosis_related_group_sequence BIGINT,
  diagnosis_related_group_billing_indicator INT,
  account_diagnosis_related_group_count INT,
  document_key BIGINT,
  document_dimension_key BIGINT,
  diagnosis_related_group_comparison_indicator INT
)
USING delta
LOCATION 'abfss://[email protected]/edw/prod/dw/account_diagnosis_related_group_fact.delta'
Could you please add an option to exclude a database from the metastore export. For example:
--exclude-database DATABASE Database name to be excluded from the metastore export. Single database name supported.
A friendlier option would be to exclude a comma-delimited list of databases:
--exclude-database DATABASES Database name(s) to be excluded from the metastore export. Comma-delimited database names supported.
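A sketch of how the comma-delimited flavor could be parsed with argparse; the flag wiring here is illustrative, not the tool's actual CLI:

```python
import argparse

# Hypothetical parser showing how a comma-delimited --exclude-database
# flag could be split into a set of database names to skip.
parser = argparse.ArgumentParser()
parser.add_argument("--exclude-database", default="",
                    help="Comma-delimited database name(s) to exclude "
                         "from the metastore export.")

args = parser.parse_args(["--exclude-database", "tmp_db,scratch"])
excluded = {db for db in args.exclude_database.split(",") if db}
print(sorted(excluded))
# ['scratch', 'tmp_db']
```

The export loop would then skip any database whose name is in `excluded`.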
As the code is currently implemented, when exporting data from the metastore, if the cluster JSON spec used in the data/ folder does not have an instance profile with access to the S3 buckets defined for the tables, the DDL export for those tables will fail.
Hi Team,
Kindly let us know if there is a way to dump the extracted metadata of a Databricks workspace into a SQL database.
Is there a utility for this, similar to the Python script we used to import and export the metastore?
Is it possible to export just the database names in a workspace, without the table definitions? I am asking because some databases/tables point to Data Lake Storage Gen2, and I get an error when trying to export a table located on Data Lake Storage Gen2 even though passthrough credentials are enabled.
The error I get is
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Alternatively, is there a way to query the databases that exist in advance, in order to pick which databases to export? I checked the Databricks CLI, but I don't think it has a metastore listing function.
Basically, I only care about tables stored in DBFS, not in Azure Data Lake Storage.
Hello! Is it possible to import users into a new workspace without notifying the users? Thank you.
Need to add paging support for large DDLs.
For files > 1 KB, upload to a DBFS /tmp/ location, read the file locally within Python, then replay it via Spark SQL on the cluster.
Files < 1 KB can be replayed as a DDL string directly via the execution context.
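The size cutoff above can be sketched as a simple dispatch; the names and the 1 KB threshold mirror the description and are otherwise illustrative:

```python
DIRECT_LIMIT = 1024  # bytes; larger DDLs get staged through DBFS /tmp/

def replay_strategy(ddl: str) -> str:
    """Pick how to replay a DDL: small ones inline through the execution
    context, large ones uploaded to DBFS and read back on the cluster."""
    if len(ddl.encode("utf-8")) < DIRECT_LIMIT:
        return "execution_context"
    return "dbfs_tmp_upload"

print(replay_strategy("CREATE TABLE t (a INT) USING DELTA"))
# execution_context
```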
Due to the cost factor of Databricks, I've been asked to figure out how to migrate off of the platform. I'm unclear what migration means in the context of this project. Is it migration from one Databricks environment to another or is it migration off of the platform?
Add MLFlow experiments / models for import / export.
Metastore export should support table comments or table properties. This does not migrate today with the show create table output.
I have a customer who only wants to migrate a subset of existing clusters from old env to new env.
Remove the admin user from the PVC user export; otherwise the import will fail with a user-already-exists error.
When running from PowerShell with the following options (i.e. not specifying --reset-exports explicitly):
python $localFolder/db-migration-master/export_db.py --azure --metastore --silent
the script will always ask for input on clearing the export folder. This makes it impossible to automate the run through PowerShell (in this case, through a CD pipeline).
It would be good to be able to explicitly set --reset-exports to false, or have it correctly default to false.
Use the permissions API to add cluster ACLs
Support adding a whitelist for users to be imported into the new env
When exporting with a specific cluster using the --cluster-name argument, and the cluster runs Spark 3.0, the error shown below is generated. This was tested on Azure Databricks. The cluster used is: 7.0 (includes Apache Spark 3.0.0, Scala 2.12).
python3 ./export_db.py --azure --metastore --cluster-name export --profile ws-databricks
ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "AttributeError: databaseName"}

ValueError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594 # but this will not be used in normal cases
-> 1595 idx = self.__fields__.index(item)
   1596 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
<command> in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

<command> in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598     raise AttributeError(item)
   1599 except ValueError:
-> 1600     raise AttributeError(item)
   1601
   1602 def __setattr__(self, key, value):

AttributeError: databaseName
Traceback (most recent call last):
File "./export_db.py", line 151, in <module>
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
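This error is consistent with Spark 3.0 renaming the SHOW DATABASES result column from databaseName to namespace. A hedged compatibility helper (the name is illustrative, not the tool's code):

```python
def database_name(row):
    """Read a database name from a SHOW DATABASES row across Spark
    versions: Spark 2.x calls the column 'databaseName', Spark 3.x
    calls it 'namespace'."""
    for field in ("databaseName", "namespace"):
        try:
            return getattr(row, field)
        except AttributeError:
            continue
    return row[0]  # last resort: positional access

# Stand-in for a Spark 3.0 Row, which raises AttributeError on databaseName:
fake_row = type("Row", (), {"namespace": "default"})()
print(database_name(fake_row))
# default
```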
Have a method to export / import groups into the workspace instead of individual users.
Support Account ID Updates in Instance Profiles.
Instance profiles show up in the following logs:
Add secrets import
For example, using the Databricks CLI, one can download a file or a directory as shown below:
databricks fs cp -r dbfs:/user/hive/warehouse ./mylocalDir
Verify that the policy id changes are supported when migrating clusters via cluster policies.
Some deployments of Databricks need verify=False
Add a complementary option to import a single user home directory that was exported.
When doing an import and the table already exists, an error is generated. The behavior on import of users is different: it automatically creates the users even if they already exist. Could the same be done for the metastore import and other exported items?
Hello, I recently noticed that a space is missing after the USING keyword in the generated DDL; below is an example.
USINGorg.apache.spark.sql.parquet
OPTIONS (
path 'dbfs:/path'
)
PARTITIONED BY (column)
Because of the missing space, the DDL execution fails. Can you please help us with this?
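A sketch of patching the extracted DDL until the generator is fixed; the regex assumes USING only ever appears fused directly to the datasource name:

```python
import re

def fix_using(ddl: str) -> str:
    """Insert the missing space after USING when it is fused to the
    datasource name, e.g. 'USINGorg.apache.spark.sql.parquet'."""
    return re.sub(r"\bUSING(?=\S)", "USING ", ddl)

print(fix_using("USINGorg.apache.spark.sql.parquet"))
# USING org.apache.spark.sql.parquet
```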
Investigate how to export and log Azure libraries in their workspace.
We currently log the workspace libraries that are defined via the workspace export.
user migration:
Export succeeded
python3 export_db.py --profile primary --users --azure
Import failed:
python3 import_db.py --profile secondary --users --azure
Traceback (most recent call last):
File "import_db.py", line 180, in <module>
main()
File "import_db.py", line 41, in main
scim_c.import_all_users_and_groups()
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 402, in import_all_users_and_groups
self.import_groups(group_dir)
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 379, in import_groups
member_id_list.append(current_group_ids[m['display']])
KeyError: '9db832f5-bb53-4cc7-8250-912e676046d1'
Hi Team,
We tried to follow the documented steps for migrating the metastore from our existing workspace onto the new Databricks workspace.
Initially, when we executed the script, it was successful, but we noticed that only partial database and table information was migrated from the old workspace to the new one.
We tried to re-migrate a second time, but we ran into issues while executing the export_db.py script, with the error below.
Creating remote Spark Session
post: https://westeurope.azuredatabricks.net/api/1.2/contexts/create
post: https://westeurope.azuredatabricks.net/api/1.2/commands/execute
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
Traceback (most recent call last):
File "export_db.py", line 132, in <module>
main()
File "export_db.py", line 126, in main
hive_c.export_hive_metastore()
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 161, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, ms_dir)
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 108, in log_all_databases
num_of_dbs = ast.literal_eval(results['data'])
KeyError: 'data'
C:\Users\dsvm\Desktop\db-migration-master>python export_db.py --metastore --azure --profile tpold
Traceback (most recent call last):
File "export_db.py", line 135, in <module>
main()
File "export_db.py", line 25, in main
token = login_args['token']
KeyError: 'token'
add support to export tables with datasource as root dbfs (/usr/hive/db location)
Add permissions / ACLs export and import for notebooks and directories
Notebook ids and directory ids change with the migration of the workspace.
Need to define a lookup table to find the new ids quickly.
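A sketch of such a lookup table, assuming exported objects carry path and object_id fields as in the Workspace API list output; the helper name is illustrative:

```python
def build_id_lookup(old_objects, new_objects):
    """Map old notebook/directory ids to new ids by matching on path,
    which stays stable across the migration while ids do not."""
    by_path = {o["path"]: o["object_id"] for o in new_objects}
    return {o["object_id"]: by_path[o["path"]]
            for o in old_objects if o["path"] in by_path}

old = [{"object_id": 101, "path": "/Users/a/notebook"}]
new = [{"object_id": 999, "path": "/Users/a/notebook"}]
print(build_id_lookup(old, new))
# {101: 999}
```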
Export for Azure created a folder called azure_logs; however, the import looks for a folder called logs.
Here is an example:
% python3 import_db.py --azure --workspace
Import the complete workspace at 2020-09-01 12:59:00.020444
Import on https://adb-5463815377663355.15.azuredatabricks.net
Traceback (most recent call last):
File "import_db.py", line 123, in <module>
main()
File "import_db.py", line 49, in main
ws_c.import_all_workspace_items(archive_missing=False)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 364, in import_all_workspace_items
num_exported_users = self.get_num_of_saved_users(src_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 82, in get_num_of_saved_users
ls = os.listdir(user_home_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/artifacts/Users'
The workaround for me was to rename azure_logs to logs. As a quick fix, should the Azure import default to the same directory the export uses?
There are some gaps in the 1.2 REST API library commands. We can do a best-effort re-creation of the workspace libraries using the supported APIs.
Add cluster policies to cluster client with permissions / ACLs.
Could you add the ability to specify the destination folder instead of always being azure_logs?
Hello,
Just found some inconsistency while extracting a DDL that makes use of control characters.
Below is the DDL that was used to create the table:
CREATE EXTERNAL TABLE db.tab
(
h_ls_hash array<string>,
ls_id STRING,
bin array<STRING>,
Class1 array<int>,
Class1_valueString array<string>,
Class1_valueFrom array<float>,
Class1_valueTo array<float>,
Class2 array<int>,
Class2_valueString array<string>,
Class2_valueFrom array<float>,
Class2_valueTo array<float>,
Class3 array<int>,
Class3_valueString array<string>,
Class3_valueFrom array<float>,
Class3_valueTo array<float>,
load_ts timestamp COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes',
record_source string COMMENT 'EN: Source Name | DE: Quellenname'
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS TEXTFILE
location 'dbfs:/loc';
But when the DDLs were extracted using the migration tool, it resulted in the output below.
CREATE EXTERNAL TABLE `db`.`tab`(`h_ls_hash` ARRAY<STRING>, `ls_id` STRING, `bin` ARRAY<STRING>, `Class1` ARRAY<INT>, `Class1_valueString` ARRAY<STRING>, `Class1_valueFrom` ARRAY<FLOAT>, `Class1_valueTo` ARRAY<FLOAT>, `Class2` ARRAY<INT>, `Class2_valueString` ARRAY<STRING>, `Class2_valueFrom` ARRAY<FLOAT>, `Class2_valueTo` ARRAY<FLOAT>, `Class3` ARRAY<INT>, `Class3_valueString` ARRAY<STRING>, `Class3_valueFrom` ARRAY<FLOAT>, `Class3_valueTo` ARRAY<FLOAT>, `load_ts` TIMESTAMP COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes', `record_source` STRING COMMENT 'EN: Source Name | DE: Quellenname')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim' = '�',
'serialization.format' = '�'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'dbfs:/loc'
TBLPROPERTIES (
'transient_lastDdlTime' = '1606221712'
)
Can someone help me with this?
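A hedged sketch of re-escaping the raw control characters in the extracted DDL so it can be replayed; the helper is illustrative and assumes \uXXXX escapes are acceptable to the target parser:

```python
def escape_control_chars(ddl: str) -> str:
    """Replace literal control characters (e.g. the \\u0001 field
    delimiter inlined as a raw byte) with \\uXXXX escape sequences,
    keeping newlines and tabs intact."""
    return "".join(
        ch if ch.isprintable() or ch in "\n\t" else "\\u%04x" % ord(ch)
        for ch in ddl
    )

print(escape_control_chars("'field.delim' = '\u0001'"))
# 'field.delim' = '\u0001'
```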
Export secrets with their ACLs.
Would like to choose whether or not to retain instance profile access by group when importing users.
I noticed that when doing an export with --metastore, the tool creates a cluster with the name stored in data/azure_cluster.json. However, I would like it to use an existing cluster in my workspace. Is this possible?
I changed the cluster name and version to a cluster I created, as shown below, but I am getting an error. Here is the azure_cluster.json file, followed by the error.
{
"num_workers": 1,
"cluster_name": "export",
"spark_version": "7.0.x-scala2.12",
"spark_conf": {},
"node_type_id": "Standard_F4s_v2",
"ssh_public_keys": [],
"custom_tags": {},
"spark_env_vars": {
"PYSPARK_PYTHON": "/databricks/python3/bin/python3"
},
"autotermination_minutes": 30,
"init_scripts": []
}
######################## ERROR #######################################
python3 ./export_db.py --azure --metastore --debug --profile ws-databricks
https://adb-5463815377663355.15.azuredatabricks.net dapi6fe373fdb9f18c9613840ebefdccde43
Export the metastore configs at 2020-09-15 17:16:17.354008
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Starting export with id 0915-220257-ruins233
post: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/start
Error: Cluster 0915-220257-ruins233 is in unexpected state Running.
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/get
Cluster creation time: 0:00:00.662221
Creating remote Spark Session
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/contexts/create
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/execute
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "AttributeError: databaseName"}

ValueError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594 # but this will not be used in normal cases
-> 1595 idx = self.__fields__.index(item)
   1596 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
<command> in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

<command> in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598     raise AttributeError(item)
   1599 except ValueError:
-> 1600     raise AttributeError(item)
   1601
   1602 def __setattr__(self, key, value):

AttributeError: databaseName
Traceback (most recent call last):
File "./export_db.py", line 151, in <module>
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
PS /Users/saldroubi/Dropbox/git/db-migration>
Metastore import is not working as the DDL needs to be wrapped in a Spark command.
Use permissions api endpoint for job ACLs and re-import them on the new workspace.
Scheduled jobs will be imported in the PAUSED state to reduce impact on the current workspace with scheduled jobs.
Both export / import tools support pausing all jobs and un-pausing all jobs to satisfy a switchover for each environment.
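A sketch of the pause/unpause toggle over a job's settings, assuming the schedule.pause_status field of the Jobs API; the helper name is illustrative:

```python
def set_pause_status(job_settings: dict, paused: bool) -> dict:
    """Toggle a job's schedule between PAUSED and UNPAUSED (the states
    the Jobs API uses) without touching the rest of the settings."""
    settings = dict(job_settings)
    schedule = dict(settings.get("schedule", {}))  # copy; jobs may be unscheduled
    if schedule:
        schedule["pause_status"] = "PAUSED" if paused else "UNPAUSED"
        settings["schedule"] = schedule
    return settings

job = {"name": "nightly", "schedule": {"quartz_cron_expression": "0 0 2 * * ?",
                                       "pause_status": "UNPAUSED"}}
print(set_pause_status(job, True)["schedule"]["pause_status"])
# PAUSED
```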
Hello,
I am using this utility to dump all the DDL from the Databricks cluster. What I observed is that when a DDL is huge, the DDL statement gets truncated.
*** WARNING: skipped 23006 bytes of output ***
USING parquet
OPTIONS (
path 'dbfs:/XX/XXX/XXXXX'
)
PARTITIONED BY (XXX)
Because of the truncation, my DDL statement fails. Is there any parameter that can resolve such errors?
Note: The table has over 700 columns with comments included.