union

Return a new DataFrame containing the union of rows in this and another DataFrame.

Syntax

union(other: "DataFrame")

Parameters

Parameter	Type	Description
`other`	DataFrame	Another DataFrame that needs to be unioned.

Returns

DataFrame: A new DataFrame containing the combined rows with corresponding columns.

Notes

This method performs a SQL-style set union of the rows from both DataFrame objects, with no automatic deduplication of elements.

Use the distinct() method to perform deduplication of rows.

The method resolves columns by position (not by name), following the standard behavior in SQL.

Examples

df1 = spark.createDataFrame([(1, 'A'), (2, 'B')], ['id', 'value'])
df2 = spark.createDataFrame([(3, 'C'), (4, 'D')], ['id', 'value'])
df3 = df1.union(df2)
df3.show()
# +---+-----+
# | id|value|
# +---+-----+
# |  1|    A|
# |  2|    B|
# |  3|    C|
# |  4|    D|
# +---+-----+

df1 = spark.createDataFrame([(1, 'A'), (2, 'B'), (3, 'C')], ['id', 'value'])
df2 = spark.createDataFrame([(3, 'C'), (4, 'D')], ['id', 'value'])
df3 = df1.union(df2).distinct().sort("id")
df3.show()
# +---+-----+
# | id|value|
# +---+-----+
# |  1|    A|
# |  2|    B|
# |  3|    C|
# |  4|    D|
# +---+-----+

Palaute

Onko tästä sivusta apua?

Last updated on 2026-04-17