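I am testing the as-of join (asofJoin) and it works great. Just one problem:
rows in the left DataFrame with a null value in the timestamp column are dropped from the result.
Is there any way of fixing this?
One workaround would be to replace the nulls with some dummy value before joining. In the example below, the left row with ts = None is missing from the output: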
import datetime as dt

from tempo import TSDF

# Assumes an active SparkSession named `spark`.
# Left side: the third row has a null timestamp.
left = spark.createDataFrame(
    [
        [dt.datetime(2021, 1, 1, 10, 30), "x", 1],
        [dt.datetime(2021, 1, 1, 10, 30, 10), "x", 2],
        [None, "x", 3],
        [dt.datetime(2021, 1, 1, 10, 40, 10), "x", 3],
    ],
    "ts timestamp, col1 string, col2 int"
)
right = spark.createDataFrame(
    [
        [dt.datetime(2021, 1, 1, 10, 29), "x", "a"],
        [dt.datetime(2021, 1, 1, 10, 40, 20), "x", "b"],
    ],
    "ts timestamp, col1 string, col3 string"
)
left_ts = TSDF(left, ts_col="ts", partition_cols=["col1"])
right_ts = TSDF(right, ts_col="ts", partition_cols=["col1"])
# Only three rows come back; the null-timestamp row is gone.
left_ts.asofJoin(right_ts).df.show()
+-------------------+----+----+-------------------+----------+
|                 ts|col1|col2|           right_ts|right_col3|
+-------------------+----+----+-------------------+----------+
|2021-01-01 10:30:00|   x|   1|2021-01-01 10:29:00|         a|
|2021-01-01 10:30:10|   x|   2|2021-01-01 10:29:00|         a|
|2021-01-01 10:40:10|   x|   3|2021-01-01 10:29:00|         a|
+-------------------+----+----+-------------------+----------+