Muistiinpano
Tämän sivun käyttö edellyttää valtuutusta. Voit yrittää kirjautua sisään tai vaihtaa hakemistoa.
Tämän sivun käyttö edellyttää valtuutusta. Voit yrittää vaihtaa hakemistoa.
This article describes the Microsoft Purview Information Protection scanner Custom Reporting feature (preview). Custom Reporting gives scanner administrators the data they need in the scanner cluster database to build their own reports against scan results—including labeling, protection state, and matched sensitive information types (SITs).
Custom Reporting is available with Microsoft Purview Information Protection client and scanner version 3.2.89.0 or later. Custom Reporting is enabled by an administrator through Scanner feature control.
What Custom Reporting enables
Today, the scanner produces per-scan CSV and TXT reports and stores a limited set of operational data in the cluster database. To build a complete picture of file state—what changed, what's labeled, what's protected, and what kinds of sensitive data exist where—administrators have to combine multiple CSV exports across scan cycles and load them into a separate reporting tool.
Custom Reporting moves that data into the scanner cluster database so that administrators can:
- Query the present and previous label, protection state, and SIT counts for every scanned file in a repository.
- Compute deltas between scans (for example, the change in number of matched SITs per file).
- See which sensitive information types were matched on which files, and how many matches per type.
- Connect the scanner cluster database to a reporting tool of their choice (for example, Power BI, an enterprise reporting warehouse, or a SQL-based dashboarding tool) without first having to stitch CSV exports together.
The scanner remains the source of truth. Custom Reporting writes the additional reporting data to the same shared scanner cluster database on the next scan cycle after the feature is enabled.
Database schema
The reporting tables and columns required by Custom Reporting were first added to the scanner cluster database schema in client version 3.2.57.0. Until Custom Reporting is enabled, the new columns and tables exist but aren't populated. Existing scanner functionality is unchanged.
Note
You don't need to install client version 3.2.57.0 first. Whether you're installing the scanner fresh or upgrading from any earlier version, the scanner deploys the full database schema (including the Custom Reporting tables and columns) on install or upgrade.
When Custom Reporting is enabled by using Scanner feature control, the scanner starts populating the following data on the next scan cycle.
Additions to dbo.ScannerFiles
dbo.ScannerFiles continues to hold one row per scanned file. Custom Reporting populates the following additional columns to capture file state and the delta between the current and previous scan:
| Column | Type | Description |
|---|---|---|
LabelName |
NVARCHAR(MAX), nullable |
Current label name applied to the file. NULL if unlabeled. |
PrevLabelId |
NVARCHAR(MAX), nullable |
Label ID applied at the previous scan, stored as a string. NULL if previously unlabeled. |
PrevLabelName |
NVARCHAR(MAX), nullable |
Label name applied at the previous scan. NULL if previously unlabeled. |
ProtectionState |
NVARCHAR(MAX), nullable |
Current protection state for the file at the end of the scan. |
PrevProtectionState |
NVARCHAR(MAX), nullable |
Protection state recorded at the previous scan. |
ClassificationCount |
INT, default 0 |
Count of sensitive information type matches on the file at the current scan. |
LatestScanSessionId |
UNIQUEIDENTIFIER, nullable |
Identifies the most recent scan session that touched the file. Used to compute deltas across scan cycles. |
FileStatus |
NVARCHAR(MAX), nullable |
Final disposition of the file in the scan cycle (for example, the requires-justification status set by ProcessJob, or Failed for files that couldn't be processed). |
New table: dbo.MatchedClassificationAction
dbo.MatchedClassificationAction is a new table that stores the matched sensitive information types per file per scan. Each row represents one matched SIT for a single file in a single scan session.
| Column | Type | Description |
|---|---|---|
Id |
BIGINT IDENTITY (primary key) |
Surrogate key for the row. |
FilePath |
NVARCHAR(MAX), nullable |
Full path of the file that the SIT was matched on. |
FileHashPath |
BINARY(64), nullable |
Hash of the file path. Joins to dbo.ScannerFiles.HashPath and is indexed for join performance. |
ScanSessionId |
UNIQUEIDENTIFIER, nullable |
Identifies the scan session in which the SIT was matched. Joins to dbo.ScannerFiles.ScanSessionId and is indexed. |
MatchedInformationTypeName |
NVARCHAR(MAX), nullable |
Display name of the matched sensitive information type (for example, U.S. social security number (SSN)). |
MatchedInformationTypeId |
UNIQUEIDENTIFIER, nullable |
GUID of the matched SIT. The same GUID is used in the local Microsoft Purview Information Protection logs for Workload=OnPremisesFileShareScanner. |
MatchedInformationTypeCount |
INT, default 0 |
Number of matches for this SIT in the file. |
ConfidenceScore |
INT, default 0 |
Confidence score of the match. |
dbo.ScanSummary (per-scan totals) is unchanged.
Example questions Custom Reporting can answer
Once Custom Reporting is populating the database, administrators can run queries such as:
- Which repositories have the highest concentration of matched SITs, and how has that concentration changed since the last scan?
- Which files were labeled or relabeled during the most recent scan cycle, and what was the previous label?
- Which files are still unlabeled but contain matches for one or more sensitive information types?
- Which sensitive information types are most prevalent in a given repository, and at what confidence?
- Which files transitioned from unprotected to protected (or the reverse) since the last scan?
Enable Custom Reporting
Custom Reporting is turned on through admin-controlled feature configuration. From any node in the scanner cluster, run:
Set-ScannerConfiguration -FeatureSettings @{CustomReporting="On"}
To enable Custom Reporting at install time on a new scanner node, use the -FeatureSettings parameter with Install-Scanner:
Install-Scanner -SqlServerInstance SQLSERVER1 -Cluster Europe -FeatureSettings @{CustomReporting="On"}
To confirm the current state, run:
Get-ScannerConfiguration
The change takes effect on every node in the cluster on the next scan cycle. No service restart is required.
To stop populating the Custom Reporting columns and table, run:
Set-ScannerConfiguration -FeatureSettings @{CustomReporting="Off"}
Disabling Custom Reporting stops new writes. Data already written to the reporting columns and tables isn't deleted, so the feature can be re-enabled later without data loss.
Plan your scanner cluster database for reporting
When you turn on Custom Reporting, the scanner cluster database stores significantly more data per scan cycle—additional columns per file in dbo.ScannerFiles, plus one row per matched SIT per file per scan in dbo.MatchedClassificationAction. Building reports against the scanner cluster database also adds a read workload that runs alongside the scanner's operational reads and writes.
Before you enable Custom Reporting in production, consider:
- Cluster size and scan volume. Estimate the number of files scanned per cycle, the typical number of SIT matches per file, and how often you scan. Use those numbers to size the database.
- Operational impact. Reporting queries against the same database that the scanner is writing to can compete with the scanner for resources during a scan cycle.
- The reporting workload you actually want. Interactive dashboards, scheduled extracts, and ad-hoc queries each have very different impacts.
For most production deployments, we recommend hosting the scanner cluster database on SQL Server Enterprise so that you can dedicate a read-only replica to reporting. With a SQL Server Always On availability group, you can route reporting traffic to a readable secondary replica so that reporting queries don't compete with the scanner's operational workload on the primary replica.
This separation lets reporting tools (for example, Power BI) connect to the read-only replica and refresh on their own cadence without affecting scan throughput on the primary.
Note
The scanner itself always reads from and writes to the primary database. Only your custom reporting workload should be pointed at a read-only replica.
Limitations during preview
- There is no built-in dashboard delivered with Custom Reporting in this preview. Customers build their own reports against the scanner cluster database.
- Settings configured by admin-controlled feature configuration aren't synchronized with the Microsoft Purview portal. Not every scanner feature will end up being configurable from the portal. For features that are available in the portal and have been configured there, the portal-configured setting takes precedence and prevents updates from PowerShell.