Apache Spark Structured Streaming is a near real-time processing engine that offers end-to-end fault tolerance with exactly-once processing guarantees using familiar Spark APIs. Structured Streaming lets you express computation on streaming data in the same way you express a batch computation on static data. The Structured Streaming engine performs the computation incrementally and continuously updates the result as streaming data arrives.
For a step-by-step tutorial, see Run your first Structured Streaming workload.
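
The incremental model described above can be illustrated without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: `batch_count`, `IncrementalCount`, and the sample micro-batches are hypothetical names and data chosen for illustration. It shows that folding each new micro-batch into a running result gives the same answer as one batch computation over all the data.

```python
from collections import Counter


def batch_count(records):
    """Batch-style computation: count every record in a single pass."""
    return Counter(records)


class IncrementalCount:
    """Keeps a running result and folds in each new micro-batch."""

    def __init__(self):
        self.result = Counter()

    def process(self, micro_batch):
        # Only the newly arrived records are touched; earlier data is not re-read.
        self.result.update(micro_batch)
        return self.result


# Hypothetical stream, already split into micro-batches.
micro_batches = [["a", "b"], ["a"], ["c", "a"]]

incremental = IncrementalCount()
for micro_batch in micro_batches:
    incremental.process(micro_batch)

all_records = [record for micro_batch in micro_batches for record in micro_batch]
same_result = incremental.result == batch_count(all_records)
```

The streaming query and the batch query share one definition of the computation; only the way input arrives differs.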
Read from a data stream
Use Structured Streaming to incrementally ingest data from supported data sources.
| Feature | Description |
|---|---|
| Auto Loader | Incrementally and efficiently process new data files as they arrive in cloud storage. |
| Delta table streaming reads and writes | Use Delta Lake tables as streaming sources and sinks with exactly-once processing guarantees. |
| Standard connectors | Connect to message buses, queues, and enterprise applications using standard connectors. |
| Micro-batch size | Limit input rates to maintain consistent batch sizes and prevent processing delays. |
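
As a rough sketch of how an Auto Loader read with a capped micro-batch size might be configured in PySpark: the helper names and the `/tmp/example/...` paths below are hypothetical, and the code assumes a Databricks runtime where the `cloudFiles` source is available and a `spark` session is passed in by the caller.

```python
def auto_loader_options(file_format, schema_location, max_files_per_trigger):
    """Build reader options for an Auto Loader stream with a capped batch size."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Upper bound on the number of new files picked up per micro-batch.
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def read_new_files(spark, source_path, options):
    """Return a streaming DataFrame over files arriving under source_path."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Hypothetical locations and limit, for illustration only.
options = auto_loader_options("json", "/tmp/example/schema", 100)
```

With an active session, `read_new_files(spark, "/tmp/example/landing", options)` would return the streaming DataFrame. Keeping the options in a plain dictionary makes the rate limit easy to inspect and adjust separately from the reader call.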
Write to a data sink
Configure how Structured Streaming delivers data to target systems.
| Feature | Description |
|---|---|
| Checkpoints | Store processing state to enable fault tolerance and exactly-once delivery semantics. |
| Output mode | Choose among the append, update, and complete output modes for stateful streaming queries. |
| Trigger intervals | Set trigger intervals to balance latency and cost for your processing requirements. |
| Real-time mode in Structured Streaming | Process data for real-time workloads with end-to-end latency as low as five milliseconds. |
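
The checkpoint, output mode, and trigger settings in the table above typically come together in a single write call. The following is a minimal sketch, assuming `df` is an existing streaming DataFrame; the function name, the default interval, and the argument values are hypothetical.

```python
def start_append_query(df, checkpoint_path, target_table, interval="1 minute"):
    """Start a streaming write that appends new rows to target_table."""
    return (
        df.writeStream
        .outputMode("append")                           # emit only new rows
        .option("checkpointLocation", checkpoint_path)  # progress and state for recovery
        .trigger(processingTime=interval)               # start a micro-batch per interval
        .toTable(target_table)                          # returns a StreamingQuery
    )
```

A call such as `start_append_query(df, "/tmp/example/checkpoint", "example_table")` would start the query. Each query should have its own checkpoint location, because the checkpoint records that query's progress.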
Stateful and stateless processing
Stateless queries process rows without retaining state. Stateful queries maintain intermediate state for aggregations, joins, and deduplication.
| Feature | Description |
|---|---|
| Stateless streaming queries | Optimize queries that process data without maintaining intermediate state. |
| Watermarks | Control how long Structured Streaming waits for late-arriving data in stateful operations. |
| Stateful streaming | Manage aggregations, stream-stream joins, and deduplication using stateful operators. |
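
To make the watermark idea concrete, here is a simplified plain-Python simulation. It is not how the engine is implemented: `run_with_watermark`, the integer event times, and the delay of 10 are assumptions for illustration, and the real engine evaluates lateness against window and state boundaries rather than individual rows. The sketch tracks the latest event time seen, subtracts the allowed delay to get the watermark, and drops records in later micro-batches that fall behind it.

```python
def run_with_watermark(micro_batches, delay):
    """Return (accepted, dropped) event times, applying a watermark per micro-batch."""
    max_event_time = None
    watermark = None  # no watermark exists until the first batch has been seen
    accepted, dropped = [], []
    for batch in micro_batches:
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)   # too late: older than the watermark
            else:
                accepted.append(event_time)
        # Advance the watermark only after the batch, from the latest event time seen.
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
            watermark = max_event_time - delay
    return accepted, dropped


# Hypothetical event times arriving out of order across three micro-batches.
batches = [[100, 105], [98, 110], [95, 101, 120]]
accepted, dropped = run_with_watermark(batches, delay=10)
```

In this run the record with event time 95 arrives after the watermark has advanced to 100, so it is dropped, while 98 arrives when the watermark is 95 and is still accepted. A longer delay tolerates later data at the cost of holding state for longer.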
Monitor and manage
Track query performance, apply optimizations, and govern data access for production Structured Streaming workloads.
| Feature | Description |
|---|---|
| Monitor with StreamingQueryListener | Track query progress and performance metrics using the Spark UI and listener API. |
| Govern with Unity Catalog | Configure Unity Catalog for streaming workloads with governance and access control. |
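
As one example of what can be done with query progress metrics, the sketch below flags micro-batches that processed rows more slowly than they arrived. It assumes progress events are available as dictionaries using the field names from the streaming progress JSON (`batchId`, `inputRowsPerSecond`, `processedRowsPerSecond`); the function name and the sample values are hypothetical.

```python
def find_lagging_batches(progress_events):
    """Return batch IDs where the processing rate fell below the input rate."""
    lagging = []
    for progress in progress_events:
        if progress["processedRowsPerSecond"] < progress["inputRowsPerSecond"]:
            lagging.append(progress["batchId"])
    return lagging


# Made-up progress events, for illustration only.
sample_progress = [
    {"batchId": 1, "numInputRows": 500, "inputRowsPerSecond": 50.0, "processedRowsPerSecond": 120.0},
    {"batchId": 2, "numInputRows": 900, "inputRowsPerSecond": 90.0, "processedRowsPerSecond": 60.0},
    {"batchId": 3, "numInputRows": 400, "inputRowsPerSecond": 40.0, "processedRowsPerSecond": 110.0},
]

lagging = find_lagging_batches(sample_progress)
```

A sustained run of lagging batches suggests the query cannot keep up with its input, which is a common signal to revisit the input rate limits or trigger interval.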