Quickstart: Get started with data cubes in Microsoft Planetary Computer Pro

Prerequisites

Set up ingestion source

Before you can begin to ingest data cube data, you'll need to set up an Ingestion Source, which will serve as your credentials to access the Blob Storage account where your assets and STAC Items are stored. You can set up an Ingestion Source using Managed Identity or SAS Token.

Create a data cube collection

Once your Ingestion Source is set up, you can create a Collection for your data cube assets. Steps to create a collection can be followed in Create a STAC Collection with Microsoft Planetary Computer Pro using Python.
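
A STAC Collection is ultimately just a JSON document, so a minimal body can be sketched as a plain dictionary before you submit it through the steps in that article. The id, description, and extents below are placeholders, not values from this quickstart:

```python
# Minimal sketch of a STAC Collection body for a data cube collection.
# The id, description, license, and extents are illustrative placeholders.
import json

collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "era5-datacube",  # hypothetical collection id
    "description": "Sample collection for data cube assets",
    "license": "CC-BY-4.0",
    "extent": {
        # One global bounding box and an open-ended temporal interval
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [["1950-01-01T00:00:00Z", None]]},
    },
    "links": [],
}

print(json.dumps(collection, indent=2))
```

Adjust the extents to match the actual footprint and time range of your data before creating the collection.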

Ingest data cube assets

The ingestion process for data cube data, and other data types, is described in Ingestion Overview. As noted in Data Cube Overview, however, ingestion is the step in Planetary Computer Pro's data handling that differs for these file types. While GRIB2 data and associated STAC Items are ingested just like any other two-dimensional raster file, NetCDF and HDF5 assets undergo further data enrichment. The generation of Kerchunk Manifests is documented in Data Cube Overview; what's important to note is that Kerchunk assets are added to your Blob Storage container alongside the original assets, and an additional cube:variables field is added to the STAC Item JSON. This is important when rendering these data types in the Planetary Computer Pro Explorer.
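
To make the enrichment concrete, the mock STAC Item below shows roughly where the new pieces land after ingestion: a Kerchunk asset alongside the original NetCDF asset, and a cube:variables field in the item properties. The item id, asset keys, and hrefs are illustrative, not output from a real ingestion:

```python
# Mock STAC Item illustrating the enrichment applied to NetCDF/HDF5 assets:
# a Kerchunk asset is added next to the original, and cube:variables
# describes the variables and dimensions in the data cube.
item = {
    "id": "example-item",  # hypothetical item id
    "assets": {
        # Original NetCDF asset (placeholder href)
        "pr": {"href": "https://example.blob.core.windows.net/container/pr.nc"},
        # Kerchunk asset added during ingestion (placeholder href)
        "pr-kerchunk": {"href": "https://example.blob.core.windows.net/container/pr.kerchunk.json"},
    },
    "properties": {
        "cube:variables": {
            "pr": {"dimensions": ["time", "lat", "lon"], "type": "data"},
        }
    },
}

# List each variable and its dimensions
for name, var in item["properties"]["cube:variables"].items():
    print(name, var["dimensions"])
```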

Configure a data cube collection

Configuration of your data cube collection is another step that will look slightly different from that of other data types. You can follow the steps described in Configure a collection with the Microsoft Planetary Computer Pro web interface to configure your data cube collection, but you'll need to be aware of the following differences when building your Render Configuration:

Render configuration for NetCDF and HDF5 assets

Recall that a standard Render Configuration in JSON format looks like this:

[
  {
    "id": "prK1950-06-30",
    "name": "prK1950-06-30",
    "type": "raster-tile",
    "options": "assets=pr-kerchunk&subdataset_name=pr&rescale=0,0.01&colormap_name=viridis&datetime=1950-06-30",
    "minZoom": 1
  }
]

The options field is where you reference the cloud-optimized Kerchunk asset, as opposed to the original asset listed in the STAC Item. You also need to include the subdataset_name argument, which is the name of the variable you want to render.
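
Because options is just an ampersand-delimited query string, you can assemble it programmatically rather than by hand. The sketch below builds the options value from the example above using the standard library; the asset key and variable name mirror the sample Render Configuration:

```python
# Assemble the options query string for a NetCDF/HDF5 render configuration.
# Note: assets points at the Kerchunk asset key, not the original asset.
from urllib.parse import urlencode

params = {
    "assets": "pr-kerchunk",      # Kerchunk asset added during ingestion
    "subdataset_name": "pr",      # variable to render
    "rescale": "0,0.01",
    "colormap_name": "viridis",
    "datetime": "1950-06-30",
}

# safe="," keeps the comma in rescale=0,0.01 from being percent-encoded
options = urlencode(params, safe=",")
print(options)
```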

Render configuration for GRIB2 assets

The options field for the Render Configuration of GRIB2 assets looks similar to the previous example, but you don't need to include the subdataset_name argument. This is because GRIB2 data is already optimally structured and referenced via its Index file. The assets argument, in this case, represents the band, or 2D raster layer, you want to render. Below is an example of a GRIB2 Render Configuration:

[ 
 {
    "id": "render-config-1",
    "name": "Mean Zero-Crossing Wave Period",
    "description": "A sample render configuration. Update `options` below.",
    "type": "raster-tile",
    "options": "assets=data&subdataset_bands=1&colormap_name=winter&rescale=0,10",
    "minZoom": 1
 }
]

Render configuration for Zarr assets

The options field for Zarr assets is similar to NetCDF and HDF5, but there are two important requirements:

  1. Select a single time slice for rendering by using sel=time=....
  2. Reduce data to a 2D output before rendering.

If additional dimensions are not collapsed, rendering can fail with errors such as Source data must be 1 band.

You can read more about the sel parameter in Xarray DataArray.sel. The following is a minimal working Zarr render configuration for a single timestep and 2D output:

[
    {
        "id": "era5-zarr-single-time",
        "name": "ERA5 single timestep",
        "type": "raster-tile",
        "options": "assets=data&subdataset_name=precipitation_amount_1hour_Accumulation&sel=time=2024-01-01&sel_method=nearest&colormap_name=viridis&rescale=0,0.01",
        "minZoom": 12
    }
]
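
Conceptually, sel=time=... with sel_method=nearest picks the one time slice whose timestamp is closest to the requested value, so the tiler receives a single 2D array. The stdlib sketch below illustrates that nearest-match selection; the timestamps are made up:

```python
# Illustration of sel_method=nearest: choose the time slice closest to the
# requested timestamp. The candidate timestamps below are hypothetical.
from datetime import datetime

times = [
    datetime(2023, 12, 31, 12),  # 12 hours before the target
    datetime(2024, 1, 1, 6),     # 6 hours after the target (closest)
    datetime(2024, 1, 2),
]
target = datetime(2024, 1, 1)  # the value passed as sel=time=2024-01-01

# Nearest match by absolute time difference
nearest = min(times, key=lambda t: abs(t - target))
print(nearest.isoformat())  # → 2024-01-01T06:00:00
```

In the real render path, this selection is performed by the tiler against the Zarr store's time coordinate; see Xarray DataArray.sel for the underlying semantics.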

Visualize data cube assets in the Explorer

Once your data cube assets are ingested and configured, you can visualize them in the Planetary Computer Pro Explorer. A step-by-step guide for using the Explorer can be followed in Quickstart: Use the Explorer in Microsoft Planetary Computer Pro.

While Microsoft Planetary Computer Pro includes a tiler that can be used to visualize some data cube assets, there are some caveats to note when it comes to each supported data type.

NetCDF and HDF5 visualization

Not all NetCDF datasets that can be ingested into Microsoft Planetary Computer Pro are compatible with the Planetary Computer Pro visualization tiler. A dataset must have X and Y axes, latitude and longitude coordinates, and spatial dimensions and bounds to be visualized. For example, a dataset in which latitude and longitude are variables, but not coordinates, isn't compatible with Planetary Computer Pro's tiler.

Before attempting to visualize your NetCDF or HDF5 dataset, you can use the following to check whether it meets the requirements.

  1. Install the required dependencies

    pip install "xarray[io]" rioxarray cf_xarray
    
  2. Run the following function:

    import xarray as xr
    import cf_xarray
    import rioxarray
    
    def is_dataset_visualizable(ds: xr.Dataset):
        """
        Test if the dataset is compatible with the Planetary Computer tiler API.
        Raises an informative error if the dataset is not compatible.
        """
        if not ds.cf.axes:
            raise ValueError("Dataset does not have CF axes")
        if not ds.cf.coordinates:
            raise ValueError("Dataset does not have CF coordinates")
        if not {"X", "Y"} <= ds.cf.axes.keys():
            raise ValueError(f"Dataset must have CF X and Y axes, found: {ds.cf.axes.keys()}")
    
        if not {"latitude", "longitude"} <= ds.cf.coordinates.keys():
            raise ValueError("Dataset must have CF latitude and longitude coordinates, "
                             f"actual: {ds.cf.coordinates.keys()}")
    
        if ds.rio.x_dim is None or ds.rio.y_dim is None:
            raise ValueError("Dataset does not have rioxarray spatial dimensions")
    
        if ds.rio.bounds() is None:
            raise ValueError("Dataset does not have rioxarray bounds")
    
        left, bottom, right, top = ds.rio.bounds()
        if left < -180 or right > 180 or bottom < -90 or top > 90:
            raise ValueError("Dataset bounds are not valid; they must be within [-180, 180] and [-90, 90]")
    
        if ds.rio.resolution() is None:
            raise ValueError("Dataset does not have rioxarray resolution")
    
        if ds.rio.transform() is None:
            raise ValueError("Dataset does not have rioxarray transform")
    
        print("✅ Dataset is compatible with the Planetary Computer tiler API.")
    

GRIB2 visualization

GRIB2 assets that have been ingested into Microsoft Planetary Computer Pro can be visualized in the Explorer as long as they have an associated Index file (.idx) stored in the same Blob Storage container. The Index file is generated during ingestion and is required for optimal access and rendering of GRIB2 data.

Zarr visualization

Zarr assets ingested into Microsoft Planetary Computer Pro can be visualized in the Explorer as long as the Render Configuration specifies which variable and time slice to render using the sel parameter in the options field. Failure to do so will result in the Explorer attempting to render all variables and time slices of the Zarr store at once, which will cause the Explorer to crash.

The size of the Zarr store and spatial chunks will also impact performance. You should aim to keep the total size of a Zarr store under 2 GB, and each chunk less than 100 MB for optimal performance of the tiler.
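
A quick back-of-the-envelope check can tell you whether your chunking is in range: multiply the cells per chunk by the bytes per value. The sketch below assumes float32 data and an illustrative chunk shape:

```python
# Estimate chunk size for a hypothetical chunk shape of
# 1 (time) x 1024 (lat) x 1024 (lon) float32 values.
chunk_cells = 1 * 1024 * 1024
bytes_per_cell = 4  # float32

chunk_mb = chunk_cells * bytes_per_cell / (1024 ** 2)
print(chunk_mb)  # → 4.0, comfortably under the 100 MB guidance
```

If the estimate lands near or above 100 MB per chunk, rechunk the store before ingestion.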

Zarr visualization limitations and known issues

Time slider behavior (known limitation)

Note

Time slider behavior for Zarr is currently limited. The Explorer time slider only appears when a temporal dimension is correctly detected during ingestion.

Even when Zarr assets contain time values, ingestion may fail to detect temporal metadata for some datasets. In those cases, the time slider will not render, and you must visualize one timestep at a time in render configuration (for example, sel=time=2024-01-01).

To enable time-aware behavior, your STAC metadata should include a temporal dimension in cube:dimensions with:

  • type: temporal
  • extent
  • step
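
Putting those requirements together, a temporal entry in cube:dimensions might look like the fragment below. The extent and step values are illustrative placeholders, not values required by this quickstart:

```python
# Illustrative cube:dimensions fragment with a temporal dimension.
# The extent and step values are made-up placeholders.
cube_dimensions = {
    "time": {
        "type": "temporal",
        "extent": ["1950-01-01T00:00:00Z", "1950-12-31T00:00:00Z"],
        "step": "P1D",  # ISO 8601 duration: one day between slices
    },
    "lat": {"type": "spatial", "axis": "y", "extent": [-90, 90]},
    "lon": {"type": "spatial", "axis": "x", "extent": [-180, 180]},
}

print(cube_dimensions["time"])
```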

For Zarr source data, follow CF time conventions where possible, for example:

  • standard_name="time"
  • axis="T"

These conventions are necessary for consistent metadata interpretation, but due to current limitations they are not always sufficient to guarantee time slider support for every Zarr dataset.

Kerchunk notes

Kerchunk can improve performance for multidimensional access patterns, but it does not resolve time slider issues when temporal dimensions were not detected during ingestion.

Some Zarr datasets may also fail during index processing with errors such as Index must be monotonic increasing or decreasing.

Roadmap and future support

Current and planned support:

  • Zarr v2: supported today
  • Zarr v3: not supported yet, planned for future support
  • Multi-time Zarr visualization and temporal handling: partial today, with continued improvements planned

Time slider for data cube visualization

If your data cube assets have a temporal component, you can use the time slider in the Explorer to visualize changes over time. The time slider appears automatically if your STAC Items contain assets with a time dimension that has extent and step fields.

Note

For Zarr assets, see Zarr visualization limitations and known issues for current time slider constraints and required render configuration patterns.