
How to fix concurrent modification exception?

zmsoft 695 Reputation points
2026-03-18T07:41:53.2366667+00:00

Hi,

I created a watermark table in Azure Databricks to record the watermark values, which is helpful for incremental data loading.

However, this watermark table will be modified concurrently, which has led to a concurrent modification error. Do you have any suggestions on how to handle this?

Thanks

zmsoft

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.


Answer accepted by question author
  1. SAI JAGADEESH KUDIPUDI 1,600 Reputation points Microsoft External Staff Moderator
    2026-03-20T17:17:15.11+00:00

    Hi zmsoft,
    It sounds like your watermark-table updates are colliding under Delta Lake’s optimistic concurrency model. The error is most likely a ConcurrentAppendException, which is raised when two jobs touch the same files or partitions. Here are a few things you can try:

    1. Partition or shard your watermark table
      • Right now it’s probably unpartitioned (or all jobs write to the same partition), so any concurrent write will conflict.
      • If you can partition by something unique per job (e.g. pipeline name, watermark name, run date), then concurrent jobs will each touch different folders and you won’t get conflicts.
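    2. Make your MERGE/UPDATE conditions fully disjoint
      • Even if the table is partitioned, your MERGE or UPDATE logic must include the partition filter in the predicate.
      • Example (Scala; deltaTable is an existing DeltaTable, source is a DataFrame, and the <...> values are placeholders to replace):

           import io.delta.tables.DeltaTable

           // The merge condition is a single SQL string. The two partition
           // filters confine this job to its own partition, so concurrent
           // jobs working on other partitions touch disjoint files.
           deltaTable
             .as("t")
             .merge(
               source.as("s"),
               """s.key = t.key
                 |AND t.pipeline = '<myPipeline>'
                 |AND t.partition_date = '<2024-06-11>'""".stripMargin
             )
             .whenMatched().updateAll()      // update rows that already exist
             .whenNotMatched().insertAll()   // insert rows that are new
             .execute()
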
      • That way each concurrent job only scans & updates its own partition.
    3. Add retry + back-off logic
      • Wrap your Delta write in a try/catch for ConcurrentAppendException.
      • On failure, wait a few seconds (or use exponential back-off) and retry.
    4. (Optional) Implement a lightweight lock/lease
      • Before you update the watermark, write a “lock” row (or file) in a dedicated locking partition.
      • Other jobs that see the lock wait; the job holding the lock deletes it once its update completes.
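
The retry-and-back-off idea in point 3 can be sketched in plain Scala. This is a minimal sketch rather than a definitive implementation: WatermarkRetry, withRetry, and all parameter names are hypothetical, and the conflict check is passed in as a predicate so the helper does not depend on any Delta class. On Databricks the predicate would presumably test for io.delta.exceptions.ConcurrentAppendException; that class name is an assumption to verify against your runtime.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object WatermarkRetry {
  /** Runs `op`, retrying with exponential back-off while `isConflict`
    * matches the failure. `sleep` is injectable so the helper can be
    * exercised without real waiting. All names here are hypothetical. */
  def withRetry[A](
      maxAttempts: Int,
      initialDelayMs: Long,
      isConflict: Throwable => Boolean,
      sleep: Long => Unit = ms => Thread.sleep(ms)
  )(op: => A): A = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(attempt: Int, delayMs: Long): A =
      Try(op) match {
        case Success(value) => value
        // Retry only on conflicts, and only while attempts remain.
        case Failure(e) if isConflict(e) && attempt < maxAttempts =>
          sleep(delayMs)
          loop(attempt + 1, delayMs * 2) // double the wait each time
        // Non-conflict errors, or attempts exhausted: rethrow.
        case Failure(e) => throw e
      }

    loop(1, initialDelayMs)
  }
}

// Assumed usage on Databricks (not run here):
//   WatermarkRetry.withRetry(5, 1000L,
//     _.isInstanceOf[io.delta.exceptions.ConcurrentAppendException]) {
//     /* the MERGE from point 2, ending in .execute() */
//   }
```

Passing the conflict test as a function keeps the helper independent of the Delta version in use, and a small maxAttempts bounds how long a job can stall behind a busy table.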

    Hope one of these patterns helps you avoid the concurrent-modification errors.


1 additional answer

  1. Q&A Assist
    2026-03-18T07:42:15.7066667+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    Concurrent modification errors on a Databricks “watermark table” are normal when multiple jobs or streams try to read and update the same Delta table at the same time. These are Delta Lake transaction conflicts. The key options are:

    1. Use Delta’s ACID behavior and retry on conflict
      • Errors like DELTA_CONCURRENT_APPEND, DELTA_CONCURRENT_DELETE_READ, DELTA_CONCURRENT_DELETE_DELETE, or DELTA_CONCURRENT_WRITE indicate a transaction conflict.
      • Wrap the watermark update logic in retry logic: if a conflict exception is thrown, re-read the table (getting the latest committed version) and retry the update.
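      • This pattern matches Delta Lake’s optimistic concurrency control: a writer that loses the conflict check has to retry against the latest table version (see the Databricks documentation on isolation levels, write conflicts, and row-level concurrency).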
    2. Reduce conflicts with table design and access pattern
      • Conflicts occur when two operations touch the same data files. Partitioning the watermark table and scoping updates to a single partition can reduce conflicts.
      • For example, partition by a key such as pipeline_name or source_system and ensure each job only updates its own partition. As documented, “a conflict occurs only if the two operations affect the same set of files. To make two sets of files disjoint, partition the table by the same columns used in operation conditions.”
      • Avoid broad UPDATE/DELETE statements that scan the whole table; instead, filter by partition and primary key.
    3. Use row-level concurrency where applicable
      • Databricks row-level concurrency reduces write conflicts for concurrent operations on the same Delta table files.
      • It is most effective when operations are scoped to disjoint sets of rows/files (for example, via partitioning and explicit predicates).
      • Be aware of limitations: complex conditional clauses and some MERGE patterns fall back to normal write-conflict behavior.
    4. Avoid concurrent writers to the same logical record
      • Even with partitioning and row-level concurrency, if multiple jobs try to update the same watermark row (for example, same pipeline_name) at the same time, conflicts are expected.
      • Design so that only one job instance is responsible for updating a given watermark row at any time (for example, one orchestrator job that updates the watermark after all downstream work completes).
    5. For streaming workloads, avoid sharing checkpoint/table writers
      • ConcurrentTransactionException occurs if multiple streaming queries use the same checkpoint and write to the same Delta table concurrently.
      • Ensure only one streaming query instance writes to the watermark table for a given checkpoint location.

    In practice, combine:

    • A schema and partitioning strategy that isolates different pipelines/sources into separate partitions.
    • Explicit predicates in UPDATE/MERGE to keep file sets disjoint.
    • Application-level retry logic on Delta concurrent-write exceptions.
    • A single writer per logical watermark record.
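
To show why the retry has to re-read the table first, here is a small in-memory model of the re-read-and-retry loop. It is only an analogy under stated assumptions, not Delta Lake's API: OptimisticWatermark, Snapshot, Store, and advance are hypothetical names, the "table" is an AtomicReference, and the rule that a watermark never moves backwards is an assumption added for illustration.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object OptimisticWatermark {
  /** A committed snapshot: a version number plus one watermark per pipeline. */
  final case class Snapshot(version: Long, watermarks: Map[String, Long])

  /** In-memory stand-in for the table (hypothetical, not a Delta class).
    * A commit succeeds only if nobody else committed since `readAt`. */
  final class Store {
    private val state = new AtomicReference(Snapshot(0L, Map.empty))
    def read(): Snapshot = state.get()
    def commit(readAt: Snapshot, next: Map[String, Long]): Boolean =
      state.compareAndSet(readAt, Snapshot(readAt.version + 1, next))
  }

  /** Re-reads the latest snapshot on every attempt. Returns true when the
    * stored watermark is at or past `candidate`, false if attempts ran out. */
  @tailrec
  def advance(store: Store, pipeline: String, candidate: Long, attemptsLeft: Int): Boolean = {
    val snap = store.read() // always start from the latest committed state
    val current = snap.watermarks.getOrElse(pipeline, Long.MinValue)
    if (candidate <= current) true // assumed rule: never move backwards
    else if (store.commit(snap, snap.watermarks.updated(pipeline, candidate))) true
    else if (attemptsLeft > 1) advance(store, pipeline, candidate, attemptsLeft - 1)
    else false
  }
}
```

In this model a retry that reused its stale snapshot would fail again every time; re-reading lets it either commit on top of the newer version or discover that the update is no longer needed.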
