In the previous parts, we covered essential date functions such as calculating the difference between dates, converting time zones, and determining leap years.
Part 4 Link:
In this edition, Part 5, we go even deeper into more advanced PySpark date operations.
21. Add Business Days to a Date
Working with business days is an essential task in many industries, especially in finance, where operations are often carried out on business days only. The ability to add or subtract business days from a given date is therefore vital.
Scenario:
Add 10 business days to a start_date, ignoring weekends.
PySpark Code:
from pyspark.sql import SparkSession
import pandas as pd
from pandas.tseries.offsets import BDay

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Business Days Addition Example") \
    .getOrCreate()

# Example DataFrame - wrap the date in a list of tuples so PySpark infers the schema correctly
df = spark.createDataFrame([("2023-05-01",)], ["start_date"])

# Convert to pandas, add business days, then go back to PySpark
pdf = df.toPandas()

# Convert the start_date column to datetime format
pdf["start_date"] = pd.to_datetime(pdf["start_date"])

# Add 10 business days (the BDay offset skips Saturdays and Sundays)
pdf["business_days_added"] = pdf["start_date"] + BDay(10)

# Convert both columns back to strings to avoid timestamp precision issues
pdf["start_date"] = pdf["start_date"].dt.strftime("%Y-%m-%d")
pdf["business_days_added"] = pdf["business_days_added"].dt.strftime("%Y-%m-%d")

# Convert the pandas DataFrame back to a PySpark DataFrame
df = spark.createDataFrame(pdf)

# Show the result
df.show(truncate=False)
This code uses pandas to add 10 business days while skipping weekends. PySpark doesn't have a built-in function for business-day arithmetic, so leveraging pandas (here, the BDay offset) is a practical alternative.
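One caveat: toPandas() pulls the entire DataFrame onto the driver, which won't scale to large datasets. As a minimal sketch under the same weekend-only assumption, the same pandas logic can run distributed inside a pandas UDF instead; the function name add_10_business_days here is hypothetical, and BDay still ignores public holidays (pandas' CustomBusinessDay accepts a holidays list if you need that).

from pyspark.sql.functions import pandas_udf
from pandas.tseries.offsets import BDay
import pandas as pd

@pandas_udf("date")
def add_10_business_days(dates: pd.Series) -> pd.Series:
    # Runs on the executors, one pandas batch at a time; BDay(10) skips weekends only
    return (pd.to_datetime(dates) + BDay(10)).dt.date

df = spark.createDataFrame([("2023-05-01",)], ["start_date"])
df.withColumn("business_days_added", add_10_business_days("start_date")).show(truncate=False)

This keeps the computation distributed across the cluster while reusing the same pandas offset logic, so nothing is collected to the driver.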