Load data using COPY INTO with Unity Catalog volumes or external locations

This article describes how to use the COPY INTO command to load data from an Azure Data Lake Storage (ADLS) container in your Azure account into a table in Databricks SQL.

The steps in this article assume that your admin has configured a Unity Catalog volume or external location so that you can access your source files in ADLS. If your admin configured a compute resource to use a service principal, see Load data using COPY INTO with a service principal or Tutorial: COPY INTO with Spark SQL instead. If your admin gave you temporary credentials (a Blob SAS token), follow the steps in Load data using COPY INTO with temporary credentials instead.

Databricks recommends using volumes to access files in cloud storage as part of the ingestion process using COPY INTO. For more information about recommendations for using volumes and external locations, see Unity Catalog best practices.

Before you begin

Before you use COPY INTO to load data from a Unity Catalog volume or from a cloud object storage path that's defined as a Unity Catalog external location, you must have the following:

  • The READ VOLUME privilege on a volume or the READ FILES privilege on an external location. For more information about creating volumes, see What are Unity Catalog volumes?. For more information about creating external locations, see Overview of external locations.

  • The path to your source data in the form of a cloud object storage URL or a volume path.

    Example cloud object storage URL: abfss://container@storageAccount.dfs.core.windows.net/raw-data/json.

    Example volume path: /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json.

  • The USE SCHEMA privilege on the schema that contains the target table.

  • The USE CATALOG privilege on the parent catalog.

For more information about Unity Catalog privileges, see Unity Catalog privileges reference.
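As a sketch, an admin could grant these privileges with statements like the following. The principal `data_engineers` and the external location name `my_external_location` are hypothetical; substitute your own securable and principal names:

GRANT USE CATALOG ON CATALOG quickstart_catalog TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA quickstart_catalog.quickstart_schema TO `data_engineers`;
GRANT READ VOLUME ON VOLUME quickstart_catalog.quickstart_schema.quickstart_volume TO `data_engineers`;
GRANT READ FILES ON EXTERNAL LOCATION my_external_location TO `data_engineers`;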

Load data from a volume

To load data from a Unity Catalog volume, you must have the READ VOLUME privilege. Volume privileges apply to all nested directories under the specified volume.

For example, if you have access to a volume with the path /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/, the following commands are valid:

COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;

You can also prefix a volume path with the dbfs: scheme. For example, the following commands are also valid:

COPY INTO landing_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;

Load data using an external location

Note

Databricks recommends using volumes to access files in cloud storage. External locations are supported but are not the recommended approach. See Managed and external tables.

The following example loads data from ADLS into a table using Unity Catalog external locations to provide access to the source data.

COPY INTO my_json_data
FROM 'abfss://container@storageAccount.dfs.core.windows.net/jsonData'
FILEFORMAT = JSON;

External location privilege inheritance

External location privileges apply to all nested directories under the specified location.

For example, if you have access to an external location defined with the URL abfss://container@storageAccount.dfs.core.windows.net/raw-data, the following commands are valid:

COPY INTO landing_table
FROM 'abfss://container@storageAccount.dfs.core.windows.net/raw-data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM 'abfss://container@storageAccount.dfs.core.windows.net/raw-data/json'
FILEFORMAT = JSON;

Permissions on this external location do not grant any privileges on directories above or parallel to the location specified. For example, neither of the following commands is valid:

COPY INTO parent_table
FROM 'abfss://container@storageAccount.dfs.core.windows.net'
FILEFORMAT = PARQUET;

COPY INTO sibling_table
FROM 'abfss://container@storageAccount.dfs.core.windows.net/json-data'
FILEFORMAT = JSON;

Three-level namespace for target tables

You can target a Unity Catalog table using a three-level namespace (<catalog_name>.<schema_name>.<table_name>). Use the USE CATALOG <catalog_name> and USE SCHEMA <schema_name> commands to set the default catalog and schema for your current query or notebook.

The following example loads data from a volume into a table using the three-level namespace:

COPY INTO quickstart_catalog.quickstart_schema.landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

You can also set the default catalog and schema first:

USE CATALOG quickstart_catalog;
USE SCHEMA quickstart_schema;

COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;