• Stars: 220
• Rank: 180,422 (Top 4%)
• Language: Python
• License: Other
• Created: over 1 year ago
• Updated: about 1 month ago


Repository Details

Automated migrations to Unity Catalog

UCX by Databricks Labs

Your best companion for upgrading to Unity Catalog. It helps you upgrade all Databricks workspace assets: Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions set at the workspace level, Secret scopes, Notebooks, Directories, Repos, and Files.


See contributing instructions to help improve this project.

Introduction

UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs, and other assets to Unity Catalog.

  1. The upgrade process will first install code, libraries, and workflows into your workspace.
  2. After installation, you will run a series of workflows and examine the output.

UCX leverages the Databricks Lakehouse platform for the upgrade itself. The upgrade process includes creating jobs and notebooks and deploying code and configuration files.

Running the installer deploys the assessment job and several upgrade jobs. The assessment and upgrade jobs are outlined in the custom-generated README.py created by the installer.

The custom-generated README.py, config.yaml, and other assets are placed in a subfolder named .ucx inside your Databricks workspace home folder. See the interactive tutorial.
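
To confirm the assets landed where expected, you can list the .ucx folder with the Databricks CLI. A minimal sketch; the user name below is a placeholder for your own workspace user:

    # List the UCX assets installed in your workspace home folder
    databricks workspace list /Users/<your-user>/.ucx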

Once the custom Databricks jobs are installed, begin by triggering the assessment job. You can find it under your workflows or via the active link in the README.py. When the assessment job completes, review the results in the custom-generated Databricks dashboard (linked from the custom README.py in the workspace folder created for you).
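
If you prefer the CLI over the workspace UI, a minimal sketch for finding and triggering the assessment job; the name filter is an assumption, so check the exact job name in your README.py:

    # Locate the assessment job's ID (the job name may differ per installation)
    databricks jobs list | grep -i assessment
    # Trigger the job by its ID
    databricks jobs run-now <job-id>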

You will need account, Unity Catalog, and workspace administrator privileges to complete the upgrade process. To run the installer, you will need to set up the Databricks CLI and a credential, following these instructions. Additionally, the interim metadata and configuration data processed by UCX will be stored in a Hive Metastore database schema generated at install time.
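
A minimal sketch of the credential setup, assuming OAuth user-to-machine login; replace the host with your own workspace URL:

    # Authenticate the Databricks CLI against your workspace
    databricks auth login --host https://<your-workspace>.cloud.databricks.com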

For questions, troubleshooting, or bug fixes, please contact your Databricks account team or submit an issue to the Databricks UCX GitHub repo.

Installation

Prerequisites

  1. Get trained on UC [free instructor-led training 2x week] [full training schedule]
  2. You will need a desktop computer running Windows, macOS, or Linux. This computer is used to install the UCX toolkit onto the Databricks workspace and will also need the following (a quick verification sketch follows this list):
  • Network access to your Databricks Workspace
  • Network access to the Internet to retrieve additional Python packages (e.g. PyYAML, databricks-sdk, ...) and to access github.com
  • Python 3.10 or later - Windows instructions
  • Databricks CLI with a workspace configuration profile for the workspace - instructions
  • On Windows, a shell environment (Git Bash or WSL)
  3. Within the Databricks Workspace you will need:
  • Workspace administrator access permissions
  • The ability for the installer to upload Python wheel files to DBFS and the Workspace FileSystem
  • A PRO or Serverless SQL Warehouse
  • The Assessment workflow will create legacy "No Isolation Shared" and legacy "Table ACL" job clusters, which are needed to inventory Hive Metastore table ACLs
  • If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read the External HMS document.
  4. [AWS] [Azure] [GCP] Account level Identity Setup
  5. [AWS] [Azure] [GCP] Unity Catalog Metastore Created (per region)
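
A quick sketch for verifying the client-side prerequisites above; the exact output will vary by environment:

    # Python 3.10 or later is required
    python3 --version
    # The Databricks CLI must be installed and on your PATH
    databricks --version
    # Confirm the configured profile can reach your workspace
    databricks current-user me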

Download & Install

We only support installations and upgrades through the Databricks CLI, as UCX requires an installation script to run to ensure all the necessary and correct configurations are in place.

Installing Databricks CLI on macOS

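A minimal sketch using Homebrew, per the Databricks CLI documentation:

    # Add the Databricks tap and install the CLI
    brew tap databricks/tap
    brew install databricks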

Install Databricks CLI via curl on Windows

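A minimal sketch, assuming a shell environment such as Git Bash or WSL and the install script published in the databricks/setup-cli repository:

    # Download and run the Databricks CLI setup script
    curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh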

Install UCX

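With the CLI authenticated against your workspace, UCX installs as a Databricks Labs project:

    # Install UCX into the workspace selected by your CLI profile
    databricks labs install ucx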

Upgrade UCX

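Upgrades go through the same Labs mechanism; a sketch, assuming a recent CLI version:

    # Upgrade an existing UCX installation in place
    databricks labs upgrade ucx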

Uninstall UCX

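Removal works the same way; a sketch:

    # Remove UCX from the workspace
    databricks labs uninstall ucx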


Project Support

Please note that all projects in the /databrickslabs GitHub account are provided for your exploration only and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.

More Repositories

1. dolly (Python, 10,811 stars) - Databricks' Dolly, a large language model trained on the Databricks Machine Learning Platform
2. pyspark-ai (Python, 739 stars) - English SDK for Apache Spark
3. dbx (Python, 440 stars) - 🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management
4. dbldatagen (Python, 313 stars) - Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines
5. tempo (Jupyter Notebook, 306 stars) - API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation
6. mosaic (Jupyter Notebook, 270 stars) - An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets
7. overwatch (Scala, 226 stars) - Capture deep metrics on one or all assets within a Databricks workspace
8. cicd-templates (Python, 202 stars) - Manage your Databricks deployments and CI with code
9. automl-toolkit (HTML, 191 stars) - Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, model interpretability
10. migrate (Python, 186 stars) - Old scripts for one-off ST-to-E2 migrations. Use the "terraform exporter" linked in the readme
11. dlt-meta (Python, 147 stars) - Metadata-driven Databricks Delta Live Tables framework for bronze/silver pipelines
12. dataframe-rules-engine (Scala, 134 stars) - Extensible rules engine for custom DataFrame / Dataset validation
13. discoverx (Python, 105 stars) - A Swiss-Army-knife for your Data Intelligence platform administration
14. geoscan (Scala, 94 stars) - Geospatial clustering at massive scale
15. jupyterlab-integration (HTML, 71 stars) - DEPRECATED: Integrating Jupyter with Databricks via SSH
16. smolder (Scala, 61 stars) - HL7 Apache Spark Datasource
17. feature-factory (Python, 55 stars) - Accelerator to rapidly deploy customized features for your business
18. databricks-sync (Python, 46 stars) - An experimental tool to synchronize a source Databricks deployment with a target Databricks deployment
19. doc-qa (Python, 45 stars)
20. transpiler (Scala, 42 stars) - SIEM-to-Spark Transpiler
21. brickster (R, 40 stars) - R Toolkit for Databricks
22. delta-oms (Scala, 38 stars) - DeltaOMS helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
23. pytester (Python, 35 stars) - Python Testing for Databricks
24. remorph (Scala, 33 stars) - Cross-compiler and Data Reconciler into Databricks Lakehouse
25. splunk-integration (Python, 26 stars) - Databricks Add-on for Splunk
26. dbignite (Python, 24 stars)
27. arcuate (Python, 22 stars) - Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta - a fan-shaped river delta)
28. databricks-sdk-r (R, 19 stars) - Databricks SDK for R (Experimental)
29. tika-ocr (Rich Text Format, 17 stars)
30. sandbox (Go, 16 stars) - Experimental or low-maturity things
31. blueprint (Python, 16 stars) - Baseline for Databricks Labs projects written in Python
32. delta-sharing-java-connector (Java, 13 stars) - A Java connector for delta.io/sharing/ that allows you to easily ingest data on any JVM
33. partner-connect-api (Scala, 12 stars)
34. pylint-plugin (Python, 10 stars) - Databricks Plugin for PyLint
35. lsql (Python, 9 stars) - Lightweight SQL execution wrapper on top of the Databricks SDK
36. waterbear (Python, 8 stars) - Automated provisioning of an industry Lakehouse with enterprise data model