between

Checks whether the column value is between the lower and upper bounds (inclusive).

Syntax

between(lowerBound, upperBound)

Parameters

Parameter   Type             Description
lowerBound  value or Column  The lower-bound value
upperBound  value or Column  The upper-bound value

Returns

Column (boolean)

Examples

Using between with integer values:

df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
df.select(df.name, df.age.between(2, 4)).show()
# +-----+---------------------------+
# | name|((age >= 2) AND (age <= 4))|
# +-----+---------------------------+
# |Alice|                       true|
# |  Bob|                      false|
# +-----+---------------------------+

Using between with string values:

df = spark.createDataFrame([("Alice", "A"), ("Bob", "B")], ["name", "initial"])
df.select(df.name, df.initial.between("A", "B")).show()
# +-----+-----------------------------------+
# | name|((initial >= A) AND (initial <= B))|
# +-----+-----------------------------------+
# |Alice|                               true|
# |  Bob|                               true|
# +-----+-----------------------------------+

Using between with float values:

df = spark.createDataFrame(
    [(2.5, "Alice"), (5.5, "Bob")], ["height", "name"])
df.select(df.name, df.height.between(2.0, 5.0)).show()
# +-----+-------------------------------------+
# | name|((height >= 2.0) AND (height <= 5.0))|
# +-----+-------------------------------------+
# |Alice|                                 true|
# |  Bob|                                false|
# +-----+-------------------------------------+

Using between with date values:

import pyspark.sql.functions as sf
df = spark.createDataFrame(
    [("Alice", "2023-01-01"), ("Bob", "2023-02-01")], ["name", "date"])
df = df.withColumn("date", sf.to_date(df.date))
df.select(df.name, df.date.between("2023-01-01", "2023-01-15")).show()
# +-----+-----------------------------------------------+
# | name|((date >= 2023-01-01) AND (date <= 2023-01-15))|
# +-----+-----------------------------------------------+
# |Alice|                                           true|
# |  Bob|                                          false|
# +-----+-----------------------------------------------+

Using between with timestamp values:

import pyspark.sql.functions as sf
df = spark.createDataFrame(
    [("Alice", "2023-01-01 10:00:00"), ("Bob", "2023-02-01 10:00:00")],
    schema=["name", "timestamp"])
df = df.withColumn("timestamp", sf.to_timestamp(df.timestamp))
df.select(df.name, df.timestamp.between("2023-01-01", "2023-02-01")).show()
# +-----+---------------------------------------------------------+
# | name|((timestamp >= 2023-01-01) AND (timestamp <= 2023-02-01))|
# +-----+---------------------------------------------------------+
# |Alice|                                                     true|
# |  Bob|                                                    false|
# +-----+---------------------------------------------------------+