Channel: Welcome To TechBrothersIT

PySpark Tutorial: fillna() Function to Replace Null or Missing Values | #PySparkTutorial #PySpark

How to Use fillna() Function in PySpark | Step-by-Step Guide


Author: Aamir Shahzad

Date: March 2025

Introduction

In this tutorial, we will learn how to handle missing or null values in PySpark DataFrames using the fillna() function. Handling missing data is a critical part of data cleaning in data engineering workflows.

Why Use fillna() in PySpark?

  • Replace NULL values in DataFrame columns with specific values.
  • Apply different replacement values to different columns.
  • Clean your dataset before analysis or feeding it into machine learning models.

Step 1: Import SparkSession and Create Spark Session

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PySparkFillnaFunction") \
    .getOrCreate()

Step 2: Create a Sample DataFrame

data = [
    ("Amir Shahzad", "Engineering", 5000),
    ("Ali", None, 4000),
    ("Raza", "Marketing", None),
    (None, "Sales", 4500),
    ("Ali", None, None)
]

columns = ["Name", "Department", "Salary"]

df = spark.createDataFrame(data, schema=columns)

df.show()

Expected Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
| Amir Shahzad|Engineering|  5000|
|          Ali|       null|  4000|
|         Raza|  Marketing|  null|
|         null|      Sales|  4500|
|          Ali|       null|  null|
+-------------+-----------+------+

Step 3: Fill All NULL Values

Chain two fillna() calls to replace NULLs with 'Unknown' in string columns and 0 in numeric columns; each fill value is applied only to columns of a matching type.

df_fill_all = df.fillna("Unknown").fillna(0)

df_fill_all.show()

Expected Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
| Amir Shahzad|Engineering|  5000|
|          Ali|    Unknown|  4000|
|         Raza|  Marketing|     0|
|      Unknown|      Sales|  4500|
|          Ali|    Unknown|     0|
+-------------+-----------+------+

Step 4: Fill NULLs with Column-Specific Values

df_fill_columns = df.fillna({
    "Department": "NA",
    "Salary": 10000
})

df_fill_columns.show()

Expected Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
| Amir Shahzad|Engineering|  5000|
|          Ali|         NA|  4000|
|         Raza|  Marketing| 10000|
|         null|      Sales|  4500|
|          Ali|         NA| 10000|
+-------------+-----------+------+

Step 5: Fill NULLs in a Specific Column Only

df_fill_name = df.fillna("No Name", subset=["Name"])

df_fill_name.show()

Expected Output

+-------------+-----------+------+
|         Name| Department|Salary|
+-------------+-----------+------+
| Amir Shahzad|Engineering|  5000|
|          Ali|       null|  4000|
|         Raza|  Marketing|  null|
|      No Name|      Sales|  4500|
|          Ali|       null|  null|
+-------------+-----------+------+

Conclusion

Handling null and missing values is an essential part of data processing in PySpark. The fillna() function provides a simple and flexible way to replace these values, ensuring your data is clean and ready for further analysis or modeling.
