
Request for Guidance: Simplifying Data Lineage Visualization in Microsoft Purview

SudhakarReddy Marepalli 40 Reputation points
2026-03-31T20:26:46.9566667+00:00

Hi Team,

We are currently leveraging Microsoft Purview for data cataloging and lineage tracking across our enterprise data platform, which includes multiple layers such as Processed, Refined, and Refined+ (curated) zones.

As part of our implementation, we are observing that the lineage visualization is becoming highly complex and difficult to interpret. The current lineage view resembles a “spider web” structure, with a large number of intermediate tables, column-level mappings, and system-generated connections. This is making it challenging for business users and data stewards to clearly understand the end-to-end data flow across key layers.

Key Challenges:

  • Excessive intermediate nodes (tables, views, transformations) cluttering the lineage view
  • Column-level lineage creating dense and hard-to-read relationships
  • Difficulty in tracing high-level data flow between Processed → Refined → Refined+ layers
  • Reduced usability for business stakeholders due to overly technical lineage representation

Request:

We would like guidance or recommendations on the following:

  1. Is there a way to simplify or abstract lineage views to show only high-level dataset flow (e.g., layer-to-layer lineage)?
  2. Can we filter or suppress intermediate tables/views in lineage visualization?
  3. Is it possible to toggle or disable column-level lineage to reduce clutter?
  4. Are there any best practices or configurations to improve lineage readability for business users?
  5. Does Purview support custom or curated lineage views (e.g., business-friendly lineage vs technical lineage)?

Additional Context:

  • Data sources include Azure Databricks (Unity Catalog) and downstream Power BI datasets
  • Lineage is being captured through automated scans and integrations
  • Our goal is to present clean, domain-level lineage aligned with governance domains and data products

We would greatly appreciate any recommendations, configuration options, or roadmap features that can help improve lineage clarity and usability.

Please let us know if you need any additional details or screenshots to better understand our current setup.

Thank you for your support.

Best regards,

Sudhakar

Microsoft Security | Microsoft Purview

3 answers

  1. James 0 Reputation points
    2026-04-05T14:21:35.9833333+00:00
    The "spider web" challenge you're describing is a common limitation of infrastructure-level lineage capture. When lineage is derived from database activity logs and connector integrations, it captures every intermediate hop rather than the logical data flow.
    
    A few thoughts beyond the excellent Purview-specific suggestions already provided:
    
    **1. SQL-Based Lineage as Complement**
    
    Consider analyzing your SQL transformation logic directly rather than relying solely on Purview's automated capture. Tools that parse your actual SQL statements can show you the logical lineage — which columns feed which transformations — without the intermediate storage hops. This gives you a "curated" view alongside Purview's comprehensive one.
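    As a rough illustration of the idea (not how SQLFlow or any other tool actually works), the sketch below extracts table-level source → target pairs from `INSERT` and `CREATE ... AS SELECT` statements using regular expressions. The function name `table_lineage` and the sample table names are hypothetical. A regex is only a stand-in for a real SQL parser: it treats CTE names as tables, ignores quoting and comments, and will misfire on constructs such as `EXTRACT(YEAR FROM col)`.

```python
import re

# Write targets: INSERT INTO / INSERT OVERWRITE [TABLE] / CREATE [OR REPLACE] TABLE|VIEW.
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+(?:INTO|OVERWRITE)(?:\s+TABLE)?|CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW))\s+([\w.]+)",
    re.IGNORECASE,
)
# Read sources: any identifier following FROM or JOIN (deliberately naive).
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> set:
    """Return (source_table, target_table) pairs found in a ';'-separated SQL script."""
    edges = set()
    for statement in sql.split(";"):
        match = TARGET_RE.search(statement)
        if match is None:
            continue  # plain SELECTs and other statements without a write target
        target = match.group(1).lower()
        for source in SOURCE_RE.findall(statement):
            if source.lower() != target:
                edges.add((source.lower(), target))
    return edges


script = """
INSERT INTO refined.sales
SELECT s.id, c.region FROM processed.sales s JOIN processed.customers c ON s.cid = c.id;
CREATE OR REPLACE VIEW refined_plus.sales_summary AS
SELECT region, COUNT(*) AS orders FROM refined.sales GROUP BY region;
"""
for source, target in sorted(table_lineage(script)):
    print(f"{source} -> {target}")
```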
    
    **2. Zone-Level vs Table-Level Lineage**
    
    Your Processed -> Refined -> Refined+ architecture suggests you want to see data flow at the "zone" or "domain" level, not every table. SQL-based lineage tools let you select which transformation SQL to include, so you can create a focused view for each zone boundary without all the intermediate noise.
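    To make the zone-level idea concrete, here is a minimal sketch that collapses table-level edges into counts of zone-to-zone edges. It assumes each table name starts with a zone prefix; the `p_`, `r_` and `rp_` prefixes and the sample names are hypothetical placeholders for whatever convention your platform uses.

```python
from collections import Counter
from typing import Optional

# Hypothetical prefix -> zone convention. "rp_" is listed before "r_" so that
# Refined+ tables are not mistaken for Refined ones (dicts keep insertion order).
ZONE_PREFIXES = {"rp_": "Refined+", "r_": "Refined", "p_": "Processed"}


def zone_of(table: str) -> Optional[str]:
    """Map 'catalog.schema.p_sales' style names to a zone, or None if unprefixed."""
    name = table.rsplit(".", 1)[-1].lower()
    for prefix, zone in ZONE_PREFIXES.items():
        if name.startswith(prefix):
            return zone
    return None


def zone_edges(table_edges) -> Counter:
    """Count table-level edges per (source_zone, target_zone) pair.

    Edges inside one zone, or touching an unprefixed table, are dropped, so a
    hop through an unprefixed staging table is lost rather than bridged.
    """
    counts = Counter()
    for source, target in table_edges:
        src_zone, tgt_zone = zone_of(source), zone_of(target)
        if src_zone and tgt_zone and src_zone != tgt_zone:
            counts[(src_zone, tgt_zone)] += 1
    return counts


edges = [
    ("main.sales.p_orders", "main.sales.r_orders"),
    ("main.sales.p_customers", "main.sales.r_orders"),
    ("main.sales.r_orders", "main.sales.r_orders_agg"),
    ("main.sales.r_orders", "main.sales.rp_revenue"),
]
for (src_zone, tgt_zone), n in zone_edges(edges).items():
    print(f"{src_zone} -> {tgt_zone}: {n} table-level edge(s)")
```

    The edge counts give a rough sense of how "wide" each zone boundary is without drawing every table.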
    
    **3. Column-Level Lineage Clarity**
    
    If the core challenge is understanding which source columns ultimately affect which target columns, tools like [SQLFlow](https://sqlflow.gudusoft.com) can visualize column-level lineage from your SQL directly. You can paste your transformation queries and see the lineage graph — upstream and downstream — without the Purview integration complexity. It supports Databricks (Spark SQL), Azure SQL, and 20+ other dialects.
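    The question "which source columns ultimately affect this target column?" is a transitive upstream walk over column-level edges. A minimal breadth-first sketch, assuming you already have (source_column, target_column) pairs from whatever tool extracts them; the function name and column names are hypothetical.

```python
from collections import defaultdict, deque


def root_sources(column_edges, target):
    """Return the upstream columns with no further parents that feed `target`."""
    parents = defaultdict(set)
    for source, dest in column_edges:
        parents[dest].add(source)

    seen, roots = {target}, set()
    queue = deque([target])
    while queue:
        column = queue.popleft()
        for parent in parents.get(column, ()):
            if parent in seen:
                continue  # already visited; also guards against cycles
            seen.add(parent)
            if parents.get(parent):
                queue.append(parent)  # intermediate column: keep walking upstream
            else:
                roots.add(parent)  # no parents recorded: treat as an original source
    return roots


edges = [
    ("processed.sales.amount", "refined.sales.net_amount"),
    ("processed.sales.discount", "refined.sales.net_amount"),
    ("refined.sales.net_amount", "refined_plus.kpi.revenue"),
    ("processed.fx.rate", "refined_plus.kpi.revenue"),
]
print(sorted(root_sources(edges, "refined_plus.kpi.revenue")))
```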
    
    This can complement Purview's infrastructure view with a focused, transformation-level view that's easier for both technical and business stakeholders to follow.
    
    (Disclosure: I work at Gudu Software, which makes SQLFlow.)
    
    What's driving the need for simplified lineage — compliance reporting, impact analysis, or business user self-service? That might help narrow down the best approach for your team.
    

  2. Pilladi Padma Sai Manisha 6,430 Reputation points Microsoft External Staff Moderator
    2026-04-01T00:09:49.4933333+00:00

    Hi SudhakarReddy Marepalli,
    Purview’s “everything gets captured” approach can definitely make the graph look like a spider-web. Out of the box there isn’t a single toggle to collapse everything into a clean layer-to-layer view or switch off column-level links in the Studio today, but here are a few patterns and workarounds you can use to give business users a much clearer, high-level picture:

    1. Scope by Collection or Data Product
       • Organize your key “Processed → Refined → Refined+” datasets into separate Purview Collections, or register them as Data Products.
       • When you view lineage scoped to that Collection or Data Product, you see only the assets and relationships inside that slice of your estate, which hides unrelated intermediate tables.
    2. Add Curated (Manual) Lineage
       • Use the “Add lineage” feature in Purview to draw high-level arrows between your zone-level assets (e.g., “Processed Sales → Refined Sales”).
       • Share that curated lineage view with your business stakeholders. It lives alongside the auto-captured graph but shows only the business-friendly flow you care about.
    3. Leverage the REST API (or PyApacheAtlas) + Custom Visuals
       • Pull the full lineage graph via the Purview Lineage REST API (or with the PyApacheAtlas package).
       • In your own script or Power BI report, filter out system-generated assets and suppress column-level edges, leaving just the layer-to-layer hops you want.
    4. Best Practices to Improve Readability
       • Tag or classify your “canonical” zone tables so they stand out in the graph.
       • Adopt naming conventions that make layer transitions explicit (e.g., prefix Processed tables with “P_”, Refined tables with “R_”, etc.).
       • Encourage stakeholders to start their lineage drill-in from the Collection/Data Product level rather than the global catalog.
    5. Column-Level Lineage Toggle
       • There isn’t a UI switch in Purview Studio today to hide column-level relationships; everything captured by scans shows up. If you’d like to see that capability, please file a feature request on the Azure Feedback portal; our product team reviews those regularly.
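    For option 3, the sketch below shows the kind of post-processing involved. It assumes the lineage payload has the Apache Atlas v2 lineage shape (a `guidEntityMap` of entity headers plus a `relations` list of `fromEntityId`/`toEntityId` pairs); verify that against the response you actually receive from the Purview API or PyApacheAtlas. The rule that hides entities whose `typeName` ends in `column`, and the `example_*` type names in the sample, are hypothetical.

```python
def is_column_entity(entity: dict) -> bool:
    """Hypothetical rule: treat any type name ending in 'column' as column-level."""
    return entity.get("typeName", "").lower().endswith("column")


def visible_edges(lineage: dict, hide=is_column_entity) -> set:
    """Return (from, to) qualifiedName pairs, skipping relations that touch hidden entities."""
    entities = lineage.get("guidEntityMap", {})

    def label(guid: str) -> str:
        # Fall back to the GUID if the header has no qualifiedName attribute.
        return entities.get(guid, {}).get("attributes", {}).get("qualifiedName", guid)

    edges = set()
    for relation in lineage.get("relations", []):
        source, target = relation["fromEntityId"], relation["toEntityId"]
        if hide(entities.get(source, {})) or hide(entities.get(target, {})):
            continue
        edges.add((label(source), label(target)))
    return edges


sample = {
    "guidEntityMap": {
        "g1": {"typeName": "example_table", "attributes": {"qualifiedName": "processed.sales"}},
        "g2": {"typeName": "example_process", "attributes": {"qualifiedName": "job.refine_sales"}},
        "g3": {"typeName": "example_table", "attributes": {"qualifiedName": "refined.sales"}},
        "g4": {"typeName": "example_column", "attributes": {"qualifiedName": "processed.sales#amount"}},
    },
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
        {"fromEntityId": "g4", "toEntityId": "g2"},
    ],
}
for source, target in sorted(visible_edges(sample)):
    print(f"{source} -> {target}")
```

    Process nodes are kept in the result, mirroring how Atlas-style lineage links datasets through processes; if you want dataset-to-dataset arrows only, you would additionally bridge over them before rendering.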

    Reference list:

    • Data lineage in Microsoft Purview (concepts and granularity): https://learn.microsoft.com/en-us/azure/purview/concept-data-lineage
    • Manual lineage setup and security baselines: https://learn.microsoft.com/azure/cloud-adoption-framework/data/governance-security-baselines-purview-data-estate-unify-data-platform#1-data-visibility-baseline
    • Purview Lineage REST API and PyApacheAtlas: https://github.com/wjohnson/pyapacheatlas
    • Collections and Data Products in Purview Unified Catalog: https://learn.microsoft.com/purview/unified-catalog-data-products-search
    • Feedback portal for Purview feature requests: https://aka.ms/azure-feedback

    I hope this gives you a roadmap to deliver clean, high-level lineage to your business users. Let me know if you’d like more details on any of these steps!

    Note: This content was drafted with the help of an AI system. Please verify the information before relying on it for decision-making.


  3. Q&A Assist
    2026-03-31T20:27:14.7933333+00:00
    AI-generated content: this answer was created with AI by Q&A Assist.

    Microsoft Purview does not currently provide a built‑in “business‑friendly vs technical” lineage mode, nor rich, per‑user configurable lineage abstractions of the kind you describe. However, the documentation supports several patterns and practices that can be combined to make lineage more readable and aligned to high‑level flows.

    1. Simplifying to high‑level dataset / layer‑to‑layer flow

    Purview’s core lineage model is asset‑to‑asset (datasets, processes, reports, etc.) and is intended to show “data moving from source to destination including how the data was transformed.” This inherently includes intermediate assets and can become complex in large estates.

    To get closer to high‑level, domain or layer‑to‑layer lineage:

    • Use Purview primarily to show lineage between key data products rather than every technical object. The Cloud Adoption Framework recommends focusing on “data products” and using the catalog to make them “discoverable and self-serviceable,” with lineage as supporting context rather than exposing every transformation step.
    • For Fabric and Power BI, consider scanning only the Fabric layer when deep upstream visibility is not required. The “scan only the Fabric layer” option is explicitly recommended where “lineage begins when data enters Fabric” and upstream systems are already governed. This reduces the number of upstream nodes and focuses lineage on curated layers and products.
    • For regulated or critical domains where full provenance is required, keep end‑to‑end lineage, but for less critical domains, limit scans and lineage capture to the main curated zones.
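    One way to make the per‑domain decision explicit is to write the onboarding rule down as a small policy before configuring scans. The sketch below is purely illustrative: the domain names, schema names and transient‑name patterns are hypothetical, unknown domains default to the curated scope by assumption, and it does not call any Purview API; in Purview the equivalent choice is made when you define what each scan covers.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: regulated domains keep full provenance; others onboard
# only the curated zones and skip transient objects.
SCAN_POLICY = {"finance": "full", "marketing": "curated"}
CURATED_SCHEMAS = {"processed", "refined", "refined_plus"}
TRANSIENT_PATTERNS = ("*.stg_*", "*.tmp_*", "*_backup")


def should_onboard(qualified_name: str, domain: str) -> bool:
    """Decide whether an asset named 'catalog.schema.table' belongs in the catalog."""
    if SCAN_POLICY.get(domain, "curated") == "full":
        return True  # keep end-to-end lineage for regulated/critical domains
    name = qualified_name.lower()
    if any(fnmatchcase(name, pattern) for pattern in TRANSIENT_PATTERNS):
        return False  # staging/temp objects only add intermediate nodes
    parts = name.split(".")
    schema = parts[-2] if len(parts) >= 2 else ""
    return schema in CURATED_SCHEMAS


for asset in ["main.refined.sales", "main.refined.stg_sales", "main.landing.raw_events"]:
    print(asset, should_onboard(asset, "marketing"), should_onboard(asset, "finance"))
```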
    2. Reducing clutter from intermediate tables/views

    The guidance emphasizes a mix of automated and manual lineage:

    • “Enable automated lineage where available and close gaps manually where required.”
    • “Where automation isn’t available, add lineage manually in Purview to fill gaps.”

    In practice, this can be used to simplify views:

    • Avoid scanning or onboarding every transient or system‑generated object into the catalog. Focus scanning on stable, governed assets (for example, main Processed/Refined/Refined+ tables, key views, and data products) rather than every staging or temp object.
    • Where automated lineage produces overly detailed chains, consider representing some flows as higher‑level manual lineage between fewer, business‑relevant assets (for example, one curated Processed asset → one Refined asset → one Refined+ data product), instead of exposing all intermediate technical hops.
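    The “fewer, business‑relevant assets” idea amounts to contracting the graph: keep only the assets you consider canonical, and draw an arrow between two of them whenever the captured lineage connects them through non‑canonical nodes only. A minimal sketch, assuming you have the captured edges as (source, target) name pairs and have chosen the key assets yourself; the function name and asset names are hypothetical. The resulting pairs are candidates for the hops you would record as manual lineage.

```python
from collections import defaultdict


def contract(edges, key_assets):
    """Return edges between key assets, bridging paths that run only through non-key nodes."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    result = set()
    for start in key_assets:
        stack, seen = list(children.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in key_assets:
                if node != start:
                    result.add((start, node))  # stop at the first key asset on this path
            else:
                stack.extend(children.get(node, ()))  # walk through intermediate nodes
    return result


edges = [
    ("processed.sales", "stg.sales_clean"),
    ("stg.sales_clean", "tmp.sales_dedup"),
    ("tmp.sales_dedup", "refined.sales"),
    ("refined.sales", "refined_plus.sales_product"),
    ("refined.sales", "tmp.audit"),
]
key = {"processed.sales", "refined.sales", "refined_plus.sales_product"}
for source, target in sorted(contract(edges, key)):
    print(f"{source} -> {target}")
```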
    3. Column‑level lineage and clutter

    The documentation notes that column‑level lineage is only partially available and source‑dependent (for example, for Power BI, “column level lineage and transformations … are only supported when using Azure SQL Database as source”). There is no documented toggle to globally disable column‑level lineage in Purview; column‑level lineage simply appears wherever it is supported and captured.

    To reduce column‑level clutter:

    • Prefer sources and integration patterns where only table‑level lineage is captured, if column‑level detail is not needed for a given domain.
    • For Power BI, be aware of the limitations: some measures are not shown, and sub‑artifact lineage can introduce additional nodes. Where this detail is not useful for business users, consider pointing them to the higher‑level workspace lineage view in Power BI itself (see below) and reserving Purview’s detailed lineage for technical users.
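    If a domain only needs table‑level detail, column‑level edges can also be rolled up before they are shown. A minimal sketch, assuming each column is identified by a dotted name whose last segment is the column (for example `processed.sales.amount`); that naming convention, the function name and the sample names are hypothetical.

```python
from collections import Counter


def roll_up(column_edges) -> Counter:
    """Aggregate (source_column, target_column) edges into table-level edge counts."""
    tables = Counter()
    for source, target in column_edges:
        source_table = source.rsplit(".", 1)[0]  # drop the trailing column segment
        target_table = target.rsplit(".", 1)[0]
        if source_table != target_table:  # ignore derivations within one table
            tables[(source_table, target_table)] += 1
    return tables


edges = [
    ("processed.sales.amount", "refined.sales.net_amount"),
    ("processed.sales.discount", "refined.sales.net_amount"),
    ("refined.sales.net_amount", "refined.sales.net_amount_rounded"),
]
for (source, target), n in roll_up(edges).items():
    print(f"{source} -> {target} ({n} column mappings)")
```

    Keeping the count of underlying column mappings on each table edge preserves a hint of how much detail was folded away.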
    4. Best practices to improve readability for business users

    The “data visibility baseline” guidance for Purview suggests:

    • Use automated lineage where it adds value, but “add lineage manually in Purview to fill gaps.” This can also be interpreted as an opportunity to curate simpler, business‑oriented chains for key data products.
    • Apply a unified sensitivity labeling strategy and business taxonomy. While labels do not simplify the graph structurally, they help users quickly identify important, curated assets (for example, Refined+ / data products) and focus on those nodes in the lineage view.
    • Decide per domain whether to:
      • Scan source systems for full end‑to‑end lineage (more complex graphs, but full provenance), or
      • Scan only the Fabric / analytics layer for simpler, product‑focused lineage.

    For BI scenarios, Power BI’s own lineage view is explicitly designed to help answer high‑level questions like “What happens if I change this data?” and “Why isn’t this report up to date?” It shows “lineage relationships between all the artifacts in a workspace, and all its external dependencies,” focusing on dashboards, reports, semantic models, and dataflows. For business users, this workspace‑level lineage can be easier to understand than the full Purview graph.

    5. Custom or curated lineage views

    The documentation describes Purview’s lineage as:

    • A platform feature that “shows the lineage between datasets created by data processes,” with automation from systems like Data Factory, Data Share, and Power BI.
    • Supporting “custom lineage reporting … via Atlas hooks and REST API.”

    This implies:

    • Custom lineage can be pushed into Purview to represent higher‑level or curated flows (for example, domain‑level or product‑level lineage) instead of, or in addition to, raw technical lineage.
    • However, there is no documented feature for multiple built‑in “view modes” (for example, business vs technical) within the Purview UI itself. Any abstraction is achieved by controlling which assets are registered/scanned and how lineage is modeled (automated vs manual/custom), not by toggling a visualization mode.
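    As an illustration of pushing curated lineage, the sketch below builds a request body in the Apache Atlas v2 entity bulk format: one `Process` entity per hop, whose `inputs` and `outputs` reference existing assets by `typeName` plus `qualifiedName`. It only builds the JSON; sending it (for example to the Data Map’s Atlas‑compatible entity bulk endpoint, or via PyApacheAtlas) is left out. The `curated_lineage://` qualified‑name scheme and the `example_table` type name are hypothetical; use the type names your scans actually registered.

```python
import json


def process_entity(name: str, inputs, outputs, dataset_type: str) -> dict:
    """Build an Atlas v2 'Process' entity that links input datasets to output datasets."""

    def ref(qualified_name: str) -> dict:
        # Reference an existing asset by type + unique attribute instead of by GUID.
        return {"typeName": dataset_type, "uniqueAttributes": {"qualifiedName": qualified_name}}

    return {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": f"curated_lineage://{name}",  # hypothetical naming scheme
            "name": name,
            "inputs": [ref(q) for q in inputs],
            "outputs": [ref(q) for q in outputs],
        },
    }


def chain_payload(chain, dataset_type: str) -> dict:
    """One Process per hop, e.g. Processed -> Refined -> Refined+."""
    hops = zip(chain, chain[1:])
    return {"entities": [process_entity(f"{a}->{b}", [a], [b], dataset_type) for a, b in hops]}


payload = chain_payload(
    ["processed.sales", "refined.sales", "refined_plus.sales_product"], "example_table"
)
print(json.dumps(payload, indent=2))
```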

    For Power BI specifically, there are known limitations in how lineage is captured and displayed in Purview (for example, limited source types, no lineage with dynamic M parameters, some measures not shown). These limitations can indirectly simplify the graph but should be treated as constraints rather than configuration options.

    Summary of actionable approaches:

    • Limit scans and lineage capture to key, curated assets and layers where possible (especially consider the “scan only the Fabric layer” option when appropriate).
    • Use manual/custom lineage to represent simplified, domain‑level flows between Processed → Refined → Refined+ data products, instead of exposing every intermediate technical object.
    • Use Power BI’s workspace lineage view for business‑friendly visualization of BI artifacts, and reserve Purview’s detailed lineage for technical users.
    • Apply consistent sensitivity labels and taxonomy so business users can quickly identify and focus on important curated assets in the lineage graph.
