Hi zmsoft,
It sounds like your watermark-table updates are tripping over each other in Delta Lake’s optimistic concurrency model. A ConcurrentAppendException means a concurrent job added files to a partition (or to an unpartitioned table) that your job was reading or updating. Here are a few things you can try:
- Partition or shard your watermark table
  • Right now it’s probably unpartitioned (or all jobs write to the same partition), so any concurrent write will conflict.
  • If you can partition by something unique per job (e.g. pipeline name, watermark name, run date), then concurrent jobs will each touch different folders and you won’t get conflicts.
- Make your MERGE/UPDATE conditions fully disjoint
  • Even if the table is partitioned, your MERGE or UPDATE logic must include the partition filter in the predicate.
  • Example (Scala sketch; deltaTable and source are assumed to be defined already):

```scala
// deltaTable is an io.delta.tables.DeltaTable; source is a DataFrame of new watermark values.
deltaTable
  .as("t")
  .merge(
    source.as("s"),
    // Join key plus explicit partition filters (add or drop filters as needed).
    """s.key = t.key
      |AND t.pipeline = '<myPipeline>'
      |AND t.partition_date = '<2024-06-11>'""".stripMargin
  )
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```

  • That way each concurrent job only scans & updates its own partition.
- Add retry + back-off logic
  • Wrap your Delta write in a try/catch for ConcurrentAppendException.
  • On failure, wait a few seconds (or use exponential back-off) and retry.
- (Optional) Implement a lightweight lock/lease
  • Before you update the watermark, write a “lock” row (or file) in a dedicated locking partition.
  • Other jobs see the lock and wait; the job holding the lock deletes it once its update completes.
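For the retry + back-off suggestion, here is a minimal sketch in plain Scala. The object name RetryWithBackoff, the retry helper, and its parameters (maxAttempts, delayMs, isRetryable) are illustrative names of my own, not part of any Delta Lake or Databricks API. It has no Spark dependency, so you can try it outside a cluster.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not a Delta Lake API.
object RetryWithBackoff {

  /** Runs `op`. If it fails with an exception accepted by `isRetryable`
    * and attempts remain, sleeps `delayMs` and tries again with the delay
    * doubled. Any other failure, or the final failed attempt, is rethrown.
    */
  @tailrec
  def retry[T](maxAttempts: Int, delayMs: Long, isRetryable: Throwable => Boolean)(op: => T): T =
    Try(op) match {
      case Success(value) => value
      case Failure(e) if maxAttempts > 1 && isRetryable(e) =>
        Thread.sleep(delayMs)
        retry(maxAttempts - 1, delayMs * 2, isRetryable)(op)
      case Failure(e) => throw e
    }
}
```

In a real job you would wrap the MERGE call in retry(...) and pass a predicate that matches the conflict exception. In open-source Delta Lake that class is io.delta.exceptions.ConcurrentAppendException; please check the exact package for your runtime before relying on it. Adding a small random jitter to the delay can also help when many jobs retry at the same moment.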
Hope one of these patterns helps you avoid the concurrent-modification errors.
Reference docs:
- Optimistic concurrency & conflict exceptions: https://learn.microsoft.com/azure/databricks/optimizations/isolation-level#concurrentappendexception
- Avoid conflicts by partitioning & disjoint conditions: https://learn.microsoft.com/azure/databricks/optimizations/isolation-level#avoid-conflicts-using-partitioning-and-disjoint-command-conditions
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.