db-migration's People

Contributors

christophergrant, craigng, dmoore247, koernigo, mrchristine, neil90

db-migration's Issues

Job ACL Import Bug

When an Admin becomes the migration owner, the old owner must be switched to CAN_MANAGE permissions instead of remaining the actual owner.

delta and external tables in metastore import fails

We used this tool to migrate Delta tables. There are issues with the Delta table import:

  1. There is a bug in the migration tool where the location path is missing from the CREATE statement, so table creation fails during metastore migration. The exact details are highlighted in the screenshots.

The migration tool generates the Delta import command in the format below, but it fails:

CREATE TABLE events
  USING DELTA

Error:

org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.;
{'resultType': 'error', 'summary': 'org.apache.spark.sql.AnalysisException: Cannot create table ('`test_db`.`test_table1`'). The associated location ('dbfs:/mnt/mountdbfs1/ods/test_table1') is not empty.;', 'cause': '---------------------------------------------------------------------------\nPy4JJavaError                             Traceback (most recent call last)\n/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)\n     62         try:\n---> 63             return f(*a, **kw)\n     64         except py4j.protocol.Py4JJavaError as e:\n\n/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)\n    327                     "An error occurred while calling {0}{1}{2}.\\n".\n--> 328                     format(target_id, ".", name), value)\n    329             else:\n\nPy4JJavaError: An error occurred while calling o210.sql.\n: org.apache.spark.sql.AnalysisException: Cannot create table (\'`test_db`.`test_table1`\'). The associated location (\'dbfs:/mnt/mountdbfs1/ods/test_table1\') is not empty.;\n\tat com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.com$databricks$sql$transaction$tahoe$commands$CreateDeltaTableCommand$$assertPathEmpty(CreateDeltaTableCommand.scala:186)\n\tat com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:136)\n\tat com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:93)\n\tat com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:428)\n\tat com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)\n\tat scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)\n\tat com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)\n\tat com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:18)\n\tat com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)\n\tat com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:18)\n\tat com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:409)\n\tat com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:18)\n\tat com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:55)\n\tat 

The correct command should include the table location (a workaround sketch for already-exported DDLs follows below):

CREATE TABLE events
  USING DELTA
  LOCATION '/mnt/delta/events'

  2. There are also issues with some external tables after migration that have .seq.gz file extensions; these need to be fixed as well, since they return an empty dataset.
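
A minimal sketch of post-processing that could work around the missing LOCATION clause until the tool is fixed, assuming the exported DDL is available as a plain string and the table's storage path is known from the source metastore (the function and variable names are hypothetical, not part of the tool):

    def add_location_if_missing(ddl: str, table_path: str) -> str:
        # Append a LOCATION clause to a Delta CREATE TABLE statement that lacks one,
        # so the import re-registers the existing data instead of failing on a
        # non-empty directory.
        if "LOCATION" in ddl.upper():
            return ddl
        return ddl.rstrip() + "\n  LOCATION '{}'".format(table_path)

    fixed_ddl = add_location_if_missing(
        "CREATE TABLE events\n  USING DELTA",
        "/mnt/delta/events",
    )
    # fixed_ddl now ends with: LOCATION '/mnt/delta/events'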

Error exporting Libs on Azure

Using python3 /databricks/driver/src/databricks-migration/export_db.py --profile DEFAULT --azure --libs
gives the following error message:

Get: https://eastus2.azuredatabricks.net/api/1.2/libraries/list
Traceback (most recent call last):
  File "/databricks/driver/src/databricks-migration/export_db.py", line 128, in <module>
    main()
  File "/databricks/driver/src/databricks-migration/export_db.py", line 67, in main
    lib_c.log_library_details()
  File "/databricks/driver/src/databricks-migration/dbclient/LibraryClient.py", line 18, in log_library_details
    all_libs = self.get('/libraries/list', version='1.2')
  File "/databricks/driver/src/databricks-migration/dbclient/dbclient.py", line 60, in get
    raise Exception("Error. GET request failed with code {}\n{}".format(http_status_code, raw_results.text))
Exception: Error. GET request failed with code 400
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 400 </title>
</head>
<body>
<h2>HTTP ERROR: 400</h2>
<p>Problem accessing /api/1.2/libraries/list. Reason:
<pre>    The V1 APIs for clusters and libraries are disabled. Please use the V2 APIs.</pre></p>
<hr />
</body>
</html>
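
The message indicates the workspace has the legacy 1.2 clusters/libraries APIs disabled. A minimal sketch of pulling cluster library status through the Libraries API 2.0 all-cluster-statuses endpoint instead (the host and token values are placeholders):

    import requests

    host = "https://eastus2.azuredatabricks.net"   # placeholder workspace URL
    token = "<personal-access-token>"              # placeholder

    # Libraries API 2.0: library status for every cluster in the workspace
    resp = requests.get(
        host + "/api/2.0/libraries/all-cluster-statuses",
        headers={"Authorization": "Bearer " + token},
    )
    resp.raise_for_status()
    for status in resp.json().get("statuses", []):
        print(status["cluster_id"], status.get("library_statuses", []))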

import issue

Tried pulling the changed code and re-importing the DDL from this issue.

But it looks like a similar error occurs:

n\nDuring handling of the above exception, another exception occurred:\n\nParseException Traceback (most recent call last)\n in \n 2 )\n 3 USING delta\n----> 4 LOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered' """)\n\n/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)\n 702 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]\n 703 """\n--> 704 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)\n 705 \n 706 @SInCE(2.0)\n\n/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)\n 1303 answer = self.gateway_client.send_command(command)\n 1304 return_value = get_return_value(\n-> 1305 answer, self.gateway_client, self.target_id, self.name)\n 1306 \n 1307 for temp_arg in temp_args:\n\n/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)\n 100 converted = convert_exception(e.java_exception)\n 101 if not isinstance(converted, UnknownException):\n--> 102 raise converted\n 103 else:\n 104 raise\n\nParseException: \nno viable alternative at input 'CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered (\n )'(line 2, pos 2)\n\n== SQL ==\n CREATE TABLE dtc_monitor.xentry_valid_readouts_clustered (\n )\n--^^^\nUSING delta\nLOCATION 'dbfs:/mnt/Test-databricks/use_cases/tp_reservoir/dev/dtc_monitor/databricks_output/xentry_valid_readouts_clustered''}
Complete Metastore Import Time: 0:06:29.123450

Error when exporting metastore

I am getting an error when trying to export the metastore for a database with tables that point to files on an Azure Storage account.

Here is the error.

Table: account_diagnosis_related_group_fact
post: https://eastus.azuredatabricks.net/api/1.2/commands/execute
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
Get: https://eastus.azuredatabricks.net/api/1.2/commands/status
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Logging failure

Within a notebook, I can run the same statement that I find in failed_metastore.log, using the same Spark cluster, and I don't get an error.

Here is the python statement I can run successfully:

spark.sql("show create table dw.account_diagnosis_related_group_fact").collect()[0][0]

Out[2]: "CREATE TABLE dw.account_diagnosis_related_group_fact (\n account_diagnosis_related_group_dimension_key BIGINT,\n effective_from_date DATE,\n effective_to_date DATE,\n tenant_key BIGINT,\n account_dimension_key BIGINT,\n account_key BIGINT,\n diagnosis_related_group_dimension_key BIGINT,\n diagnosis_related_group_code STRING,\n relationship_type_code_key BIGINT,\n relationship_type_code STRING,\n source_code_key BIGINT,\n source_code STRING,\n diagnosis_related_group_condition_code_key BIGINT,\n diagnosis_related_group_condition_code STRING,\n diagnosis_related_group_condition_description STRING,\n diagnosis_related_group_length_of_stay_days_count INT,\n diagnosis_related_group_qualifier_code_key BIGINT,\n diagnosis_related_group_qualifier_code STRING,\n diagnosis_related_group_qualifier_description STRING,\n illness_severity_class_code_key BIGINT,\n illness_severity_class_code STRING,\n illness_severity_class_description STRING,\n mortality_risk_class_code_key BIGINT,\n mortality_risk_class_code STRING,\n mortality_risk_class_description STRING,\n arithmetic_average_length_of_stay DECIMAL(5,2),\n geometric_average_length_of_stay DECIMAL(5,2),\n relative_weighting_factor DECIMAL(18,4),\n diagnosis_related_group_sequence BIGINT,\n diagnosis_related_group_billing_indicator INT,\n account_diagnosis_related_group_count INT,\n document_key BIGINT,\n document_dimension_key BIGINT,\n diagnosis_related_group_comparison_indicator INT)\nUSING delta\nLOCATION 'abfss://[email protected]/edw/prod/dw/account_diagnosis_related_group_fact.delta'\n"

Add exclude database option from metastore export

Could you please add an option to exclude a database from the metastore export? For example:

--exclude-database DATABASE Database name to be excluded from metastore export. Single database name supported.

A friendlier option would be to accept a comma-delimited list of databases:

--exclude-database DATABASE Database name(s) to be excluded from metastore export. Comma-delimited database names supported.
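
A minimal sketch of how such a flag could be parsed and applied, assuming an argparse-based CLI like the tool's (the option is the one proposed here, not an existing flag):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--exclude-database",
        default="",
        help="Comma-delimited database name(s) to exclude from the metastore export",
    )
    args = parser.parse_args(["--exclude-database", "tmp_db,scratch"])

    # Split the comma-delimited value and drop the excluded names before exporting.
    excluded = set(filter(None, (d.strip() for d in args.exclude_database.split(","))))
    all_dbs = ["dw", "tmp_db", "scratch", "sales"]      # hypothetical example
    dbs_to_export = [db for db in all_dbs if db not in excluded]
    print(dbs_to_export)                                # ['dw', 'sales']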

Import metastore extract onto external SQL server.

Hi Team,

Kindly let us know if there is a way to dump the extracted metadata of a Databricks workspace into a SQL database.

Is there a utility for this, similar to how we imported and exported the metastore using the Python scripts?

Export databases but no tables

Is it possible to export just the database names in a workspace, without the table definitions? I am asking because some databases/tables point to Data Lake Storage Gen2, and I get an error when trying to export a table that is located on Data Lake Storage Gen2 even though Pass Through Credentials are enabled.
The error I get is:
ERROR:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token

Alternatively, is there a way to query the databases that exist in advance, in order to pick which databases to export? I checked the Databricks CLI, but I don't think it has a metadata (metastore) list function.

Basically, I only care about tables that are stored in DBFS, not on Azure Data Lake Storage.
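
A short way to enumerate the database names in advance, from a notebook attached to a cluster on the source workspace, so you can decide which databases to export (a sketch, not a feature of the tool; spark is the session predefined in Databricks notebooks):

    # Run in a notebook on the source workspace to see which databases exist
    db_names = [db.name for db in spark.catalog.listDatabases()]
    print(db_names)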

Add paging support for large metastore DDL

Need to add paging support for large DDLs.

For files > 1 KB, upload to a DBFS /tmp/ location, read the file locally within Python on the cluster, then replay it via Spark SQL.
For files < 1 KB, the DDL string can be replayed directly via the execution context.
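
A minimal sketch of the size-based branching described above, assuming a hypothetical submit_command helper that wraps the existing 1.2 execution-context API; the upload uses the DBFS API 2.0 put endpoint:

    import base64
    import requests

    def replay_ddl(ddl: str, host: str, token: str, submit_command):
        # submit_command(code) is a hypothetical helper that runs a Python snippet
        # on the cluster through the existing 1.2 execution context.
        headers = {"Authorization": "Bearer " + token}
        if len(ddl.encode("utf-8")) < 1024:
            # Small DDL: replay the string directly through the execution context.
            submit_command("spark.sql({!r})".format(ddl))
        else:
            # Large DDL: upload to DBFS first, then read it locally on the cluster.
            requests.post(
                host + "/api/2.0/dbfs/put",
                headers=headers,
                json={
                    "path": "/tmp/migration/ddl.sql",
                    "contents": base64.b64encode(ddl.encode("utf-8")).decode("ascii"),
                    "overwrite": True,
                },
            ).raise_for_status()
            submit_command("spark.sql(open('/dbfs/tmp/migration/ddl.sql').read())")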

Clarification as to what "migration" means for this project

Due to the cost factor of Databricks, I've been asked to figure out how to migrate off of the platform. I'm unclear what migration means in the context of this project. Is it migration from one Databricks environment to another or is it migration off of the platform?

--reset-exports arg defaults to true

When running from PowerShell with the following options (i.e. not specifying --reset-exports explicitly)

python $localFolder/db-migration-master/export_db.py --azure --metastore --silent

The script will always ask for input about clearing the export folder. This makes it impossible to automate through PowerShell (in this case, running it through a CD pipeline).


It would be good to be able to explicitly set --reset-exports to false, or to have it default to false.
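
A minimal sketch of the requested behavior, assuming the tool's argparse-based CLI: make the flag opt-in so that omitting it never prompts (this is the proposal, not the current implementation):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--reset-exports",
        action="store_true",
        default=False,
        help="Delete the existing export folder before exporting (never prompts when omitted)",
    )
    args = parser.parse_args([])     # no flag supplied, e.g. from a CD pipeline
    print(args.reset_exports)        # False, so the export folder is left in place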

Add support for spark 3.0

When running export against a specific cluster via the --cluster-name argument, and the cluster runs Spark 3.0, an error is generated as shown below. This was tested on Azure Databricks. The cluster used is: 7.0 (includes Apache Spark 3.0.0, Scala 2.12).

python3 ./export_db.py --azure --metastore --cluster-name export --profile ws-databricks

ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "<span class="ansi-red-fg">AttributeError: databaseName", "cause": "---------------------------------------------------------------------------\nValueError Traceback (most recent call last)\n/databricks/spark/python/pyspark/sql/types.py in getattr(self, item)\n 1594 # but this will not be used in normal cases\n-> 1595 idx = self.fields.index(item)\n 1596 return self[idx]\n\nValueError: 'databaseName' is not in list\n\nDuring handling of the above exception, another exception occurred:\n\nAttributeError Traceback (most recent call last)\n in \n----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))\n\n in (.0)\n----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))\n\n/databricks/spark/python/pyspark/sql/types.py in getattr(self, item)\n 1598 raise AttributeError(item)\n 1599 except ValueError:\n-> 1600 raise AttributeError(item)\n 1601 \n 1602 def setattr(self, key, value):\n\nAttributeError: databaseName"}

Traceback (most recent call last):
File "./export_db.py", line 151, in
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
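
The error is consistent with Spark 3.0 renaming the SHOW DATABASES result column from databaseName to namespace, which is why the row attribute lookup fails. A sketch of a version-agnostic way to collect the names, accessing the first column positionally instead of by name (spark is the session available on the cluster):

    # Works on both Spark 2.x (databaseName) and Spark 3.x (namespace)
    all_dbs = [row[0] for row in spark.sql("SHOW DATABASES").collect()]
    print(len(all_dbs))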

Support import_home

Add a complementary option to import a single user home directory that was exported.

Error on import hive metastore when table already exists

When doing an import and the table already exists, an error is generated. The behavior when importing users is different: it creates the users even if they already exist. Could the same be done for the metastore import and other exported items?
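
A minimal sketch of one way the import could be made idempotent, rewriting the exported statement to the IF NOT EXISTS form before replaying it (a proposal, not the project's current behavior):

    import re

    def make_idempotent(ddl: str) -> str:
        # Turn CREATE TABLE / CREATE EXTERNAL TABLE into the IF NOT EXISTS form so
        # re-running the import skips tables that already exist instead of erroring.
        return re.sub(
            r"^\s*CREATE\s+(EXTERNAL\s+)?TABLE\s+(?!IF NOT EXISTS)",
            lambda m: m.group(0) + "IF NOT EXISTS ",
            ddl,
            count=1,
            flags=re.IGNORECASE,
        )

    print(make_idempotent("CREATE TABLE events\n  USING DELTA"))
    # CREATE TABLE IF NOT EXISTS events
    #   USING DELTA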

Missing space after the keyword USING in DDL

Hello, I recently noticed that a space is missing after the USING keyword in the DDL. Below is an example.

USINGorg.apache.spark.sql.parquet
OPTIONS (
  path 'dbfs:/path'
)
PARTITIONED BY (column)

Because of the missing space, the DDL execution fails. Can you please help us with this?
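
A minimal sketch of a post-processing fix for DDL files that were already exported with the missing space (a hypothetical helper, not part of the tool):

    import re

    def fix_using_keyword(ddl: str) -> str:
        # Insert the missing space between USING and the data source name, e.g.
        # "USINGorg.apache.spark.sql.parquet" -> "USING org.apache.spark.sql.parquet".
        return re.sub(r"\bUSING(?=\S)", "USING ", ddl)

    print(fix_using_keyword("USINGorg.apache.spark.sql.parquet"))
    # USING org.apache.spark.sql.parquet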

Add Azure Library Export

Investigate how to export and log Azure libraries in their workspace.
We currently log the workspace libraries that are defined via the workspace export.

user import in azure fails with key error.

user migration:

Export succeeded

python3 export_db.py --profile primary --users --azure

Import failed:

python3 import_db.py --profile secondary --users --azure

Traceback (most recent call last):
File "import_db.py", line 180, in
main()
File "import_db.py", line 41, in main
scim_c.import_all_users_and_groups()
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 402, in import_all_users_and_groups
self.import_groups(group_dir)
File "/home/vivek/db-migration/dbclient/ScimClient.py", line 379, in import_groups
member_id_list.append(current_group_ids[m['display']])
KeyError: '9db832f5-bb53-4cc7-8250-912e676046d1'
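
The failure is a plain dictionary lookup on the member's display value. A minimal sketch of a more defensive version of that lookup, which skips and logs members that are not present in the target workspace instead of raising KeyError (a workaround idea using the variable names from the traceback; the sample data is hypothetical):

    # Hypothetical sample data shaped like the variables in import_groups:
    current_group_ids = {"data-engineers": "201"}   # display -> id in target workspace
    members = [
        {"display": "data-engineers"},
        {"display": "9db832f5-bb53-4cc7-8250-912e676046d1"},  # not present in target
    ]

    member_id_list = []
    missing = []
    for m in members:
        member_id = current_group_ids.get(m["display"])
        if member_id is None:
            missing.append(m["display"])
        else:
            member_id_list.append(member_id)

    if missing:
        print("Skipped members not found in the target workspace:", missing)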

Issues while migrating Metastore from old databricks workspace to the new one.

Hi Team,

We tried to follow the document steps for migrating the metastore from our existing workspace on to the new databricks workspace.

Initially, when we executed the script, it was successful, but we noticed that only partial database and table information was migrated from the old workspace to the new one.

We tried to re-migrate a second time, but we ran into issues while executing the export_db.py script, with the error below.

Creating remote Spark Session
post: https://westeurope.azuredatabricks.net/api/1.2/contexts/create
post: https://westeurope.azuredatabricks.net/api/1.2/commands/execute
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
Get: https://westeurope.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
Traceback (most recent call last):
File "export_db.py", line 132, in
main()
File "export_db.py", line 126, in main
hive_c.export_hive_metastore()
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 161, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, ms_dir)
File "C:\migration-tools\db-migration-master\dbclient\HiveClient.py", line 108, in log_all_databases
num_of_dbs = ast.literal_eval(results['data'])
KeyError: 'data'

C:\Users\dsvm\Desktop\db-migration-master>python export_db.py --metastore --azure --profile tpold
Traceback (most recent call last):
File "export_db.py", line 135, in
main()
File "export_db.py", line 25, in main
token = login_args['token']
KeyError: 'token'

Add Notebook / Directory ACLs

Add permissions / ACLs export and import for notebooks and directories

Notebook ids and directory ids change with the migration of the workspace.
Need to define a lookup table to find the new ids quickly.
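
A minimal sketch of building a path-to-object_id lookup table on the new workspace with the Workspace API 2.0 list endpoint, which is one way to resolve the new notebook and directory ids after import (host and token are placeholders):

    import requests

    host = "https://<workspace-url>"        # placeholder
    token = "<personal-access-token>"       # placeholder
    headers = {"Authorization": "Bearer " + token}

    def build_id_lookup(path, lookup=None):
        # Recursively walk the workspace tree and record object_id for every
        # notebook and directory, keyed by its path.
        lookup = {} if lookup is None else lookup
        resp = requests.get(host + "/api/2.0/workspace/list",
                            headers=headers, params={"path": path})
        resp.raise_for_status()
        for obj in resp.json().get("objects", []):
            lookup[obj["path"]] = obj["object_id"]
            if obj["object_type"] == "DIRECTORY":
                build_id_lookup(obj["path"], lookup)
        return lookup

    id_by_path = build_id_lookup("/Users")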

Error when import into azure

Export for Azure creates a folder called azure_logs; however, the import looks for a folder called logs.
Here is an example:
% python3 import_db.py --azure --workspace
Import the complete workspace at 2020-09-01 12:59:00.020444
Import on https://adb-5463815377663355.15.azuredatabricks.net
Traceback (most recent call last):
File "import_db.py", line 123, in
main()
File "import_db.py", line 49, in main
ws_c.import_all_workspace_items(archive_missing=False)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 364, in import_all_workspace_items
num_exported_users = self.get_num_of_saved_users(src_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/WorkspaceClient.py", line 82, in get_num_of_saved_users
ls = os.listdir(user_home_dir)
FileNotFoundError: [Errno 2] No such file or directory: 'logs/artifacts/Users'

The workaround for me was to rename azure_logs to logs. As a quick fix, should the default for the Azure import use the same directory as the export?

DDL extraction issue with control characters

Hello

Just found some inconsistency while extracting a DDL that makes use of control characters.

Below is the DDL that was used to create the table:

	CREATE EXTERNAL TABLE db.tab
	(
	h_ls_hash array<string>,
	ls_id STRING,
	bin array<STRING>,
	Class1  array<int>,
	Class1_valueString  array<string>,
	Class1_valueFrom  array<float>,
	Class1_valueTo  array<float>,
	Class2  array<int>,
	Class2_valueString  array<string>,
	Class2_valueFrom  array<float>,
	Class2_valueTo  array<float>,
	Class3  array<int>,
	Class3_valueString  array<string>,
	Class3_valueFrom  array<float>,
	Class3_valueTo  array<float>,
	load_ts timestamp COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes',
	record_source string COMMENT 'EN: Source Name | DE: Quellenname'
	)
	ROW FORMAT DELIMITED
	FIELDS TERMINATED BY '\u0001'
	STORED AS TEXTFILE
	location 'dbfs:/loc';

but when the DDLs were extracted using the migration tool, the result was the following:

CREATE EXTERNAL TABLE `db`.`tab`(`h_ls_hash` ARRAY<STRING>, `ls_id` STRING, `bin` ARRAY<STRING>, `Class1` ARRAY<INT>, `Class1_valueString` ARRAY<STRING>, `Class1_valueFrom` ARRAY<FLOAT>, `Class1_valueTo` ARRAY<FLOAT>, `Class2` ARRAY<INT>, `Class2_valueString` ARRAY<STRING>, `Class2_valueFrom` ARRAY<FLOAT>, `Class2_valueTo` ARRAY<FLOAT>, `Class3` ARRAY<INT>, `Class3_valueString` ARRAY<STRING>, `Class3_valueFrom` ARRAY<FLOAT>, `Class3_valueTo` ARRAY<FLOAT>, `load_ts` TIMESTAMP COMMENT 'EN: load timestamp | DE: Zeitstempel für das Laden des Datensatzes', `record_source` STRING COMMENT 'EN: Source Name | DE: Quellenname')
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '�',
  'serialization.format' = '�'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'dbfs:/loc'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1606221712'
)

Can someone help me with this?
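
A minimal sketch of one way to repair an extracted DDL that carries a raw control character in its SERDEPROPERTIES, by rewriting it as a printable \uXXXX escape; whether the target parser interprets the escape the same way as FIELDS TERMINATED BY '\u0001' still needs to be verified, so treat this only as a starting point:

    def escape_control_chars(ddl: str) -> str:
        # Replace raw control characters (e.g. \x01 used as field.delim) with
        # their printable \uXXXX form so the statement survives copy and replay.
        return "".join(
            "\\u{:04x}".format(ord(c)) if ord(c) < 0x20 and c not in "\n\t\r" else c
            for c in ddl
        )

    broken = "  'field.delim' = '\x01',"
    print(escape_control_chars(broken))
    # prints:   'field.delim' = '\u0001',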

export metastore using existing cluster?

I noticed that when doing an export with --metastore it creates a cluster with a name stored in the data/azure_cluster.json. However, I would like it to use an existing cluster in my workspace. Is this possible?

I changed the cluster name and version to a cluster I created, as shown below, but I am getting an error. Here is the azure_cluster.json file, followed by the error.

{
  "num_workers": 1,
  "cluster_name": "export",
  "spark_version": "7.0.x-scala2.12",
  "spark_conf": {},
  "node_type_id": "Standard_F4s_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 30,
  "init_scripts": []
}

######################## ERROR #######################################

python3 ./export_db.py --azure --metastore --debug --profile ws-databricks

https://adb-5463815377663355.15.azuredatabricks.net dapi6fe373fdb9f18c9613840ebefdccde43
Export the metastore configs at 2020-09-15 17:16:17.354008
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Starting export with id 0915-220257-ruins233
post: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/start
Error: Cluster 0915-220257-ruins233 is in unexpected state Running.
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/get
Cluster creation time: 0:00:00.662221
Creating remote Spark Session
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/contexts/create
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/execute
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
ERROR:
AttributeError: databaseName
{"resultType": "error", "summary": "<span class="ansi-red-fg">AttributeError: databaseName", "cause": "---------------------------------------------------------------------------\nValueError Traceback (most recent call last)\n/databricks/spark/python/pyspark/sql/types.py in getattr(self, item)\n 1594 # but this will not be used in normal cases\n-> 1595 idx = self.fields.index(item)\n 1596 return self[idx]\n\nValueError: 'databaseName' is not in list\n\nDuring handling of the above exception, another exception occurred:\n\nAttributeError Traceback (most recent call last)\n in \n----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))\n\n in (.0)\n----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))\n\n/databricks/spark/python/pyspark/sql/types.py in getattr(self, item)\n 1598 raise AttributeError(item)\n 1599 except ValueError:\n-> 1600 raise AttributeError(item)\n 1601 \n 1602 def setattr(self, key, value):\n\nAttributeError: databaseName"}

Traceback (most recent call last):
File "./export_db.py", line 151, in
main()
File "./export_db.py", line 137, in main
hive_c.export_hive_metastore(cluster_name=args.cluster_name)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
PS /Users/saldroubi/Dropbox/git/db-migration>

Add Job ACLs and Job Pause Support

Use the Permissions API endpoint for job ACLs and re-import them on the new workspace.

Scheduled jobs will be imported in the PAUSED state to reduce the impact of duplicate schedules on the current workspace.
Both the export and import tools support pausing and un-pausing all jobs to support a switchover between environments.
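
A minimal sketch of copying a job ACL between workspaces with the Permissions API 2.0, assuming the mapping from old job id to new job id already exists (host and token values are placeholders, and the helper below is an illustration rather than the project's implementation):

    import requests

    SRC = {"host": "https://<old-workspace-url>", "token": "<token>"}   # placeholders
    DST = {"host": "https://<new-workspace-url>", "token": "<token>"}   # placeholders

    def _headers(ws):
        return {"Authorization": "Bearer " + ws["token"]}

    def copy_job_acl(old_job_id, new_job_id):
        # Read the ACL of the job on the old workspace.
        acl = requests.get(
            "{}/api/2.0/permissions/jobs/{}".format(SRC["host"], old_job_id),
            headers=_headers(SRC),
        ).json().get("access_control_list", [])

        # Keep only directly granted permissions and rebuild the request entries.
        entries = []
        for entry in acl:
            perm = entry["all_permissions"][0]
            if perm.get("inherited"):
                continue
            principal = {k: entry[k]
                         for k in ("user_name", "group_name", "service_principal_name")
                         if k in entry}
            entries.append({**principal, "permission_level": perm["permission_level"]})

        # Apply them to the corresponding job on the new workspace.
        requests.patch(
            "{}/api/2.0/permissions/jobs/{}".format(DST["host"], new_job_id),
            headers=_headers(DST),
            json={"access_control_list": entries},
        ).raise_for_status()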

DDLs getting truncated when metadata is extracted

Hello,

I am using this utility to dump all the DDLs from the Databricks cluster. What I observed is that when a DDL is huge, the statement gets truncated.

*** WARNING: skipped 23006 bytes of output ***

USING parquet
OPTIONS (
  path 'dbfs:/XX/XXX/XXXXX'
)
PARTITIONED BY (XXX)

Because of the truncation, my DDL statement fails. Is there any parameter that can resolve such errors?

Note: The table has over 700 columns with comments included.
