Pandas Practical Examples and Use Cases
In the previous articles, we introduced the Pandas library and explored its most widely used functions. In this article, we apply those functions to a realistic workflow: cleaning, aggregating, visualizing, and exporting an employee dataset.
1. Data Cleaning and Preprocessing
Let's assume we have a dataset containing information about employees, and we need to clean and preprocess the data before performing further analysis. The dataset has the following columns: 'Name', 'Age', 'Department', 'Salary', and 'Joining Date'.
Load the dataset
import pandas as pd
# Read data from a CSV file
df = pd.read_csv('employee_data.csv')
# Display the DataFrame
print(df.head())
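If you do not have an employee_data.csv file at hand, the loading step can be tried on an in-memory CSV instead. The sketch below is only an illustration: the rows are invented, and the name sample_df is a hypothetical stand-in for df. It deliberately includes one duplicate row and one missing salary so the cleaning steps have something to act on.

```python
import io

import pandas as pd

# Hypothetical sample rows, invented purely for illustration
csv_text = """Name,Age,Department,Salary,Joining Date
Alice,34,Engineering,85000,2015-03-01
Bob,45,Engineering,95000,2010-07-15
Bob,45,Engineering,95000,2010-07-15
Carol,29,Marketing,,2019-11-20
Dan,52,Marketing,70000,2005-01-10
"""

# read_csv accepts any file-like object, so StringIO stands in for a file path
sample_df = pd.read_csv(io.StringIO(csv_text))

# Inspect the size and the inferred column types before cleaning
print(sample_df.shape)
print(sample_df.dtypes)
```

Checking dtypes right after loading is a quick way to spot columns, such as dates, that were read as plain text.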
Remove duplicates and missing values
# Remove duplicate rows
df = df.drop_duplicates()
# Remove rows that have a missing value in any column
df = df.dropna()
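Before dropping rows, it can help to see how many would be affected, and to drop only on the columns that matter for the analysis. The following is a minimal sketch under the assumption that only a missing 'Salary' should disqualify a row; the sample data and the names sample_df and cleaned are hypothetical.

```python
import pandas as pd

# Hypothetical data: one exact duplicate (Bob) and one missing salary (Carol)
sample_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Carol'],
    'Age': [34, 45, 45, 29],
    'Salary': [85000.0, 95000.0, 95000.0, float('nan')],
})

# Count fully duplicated rows and missing values per column
duplicate_count = sample_df.duplicated().sum()
missing_per_column = sample_df.isna().sum()
print(duplicate_count)
print(missing_per_column)

# Drop duplicates, then drop only the rows where 'Salary' is missing
cleaned = sample_df.drop_duplicates().dropna(subset=['Salary'])
print(cleaned)
```

Restricting dropna to a subset of columns keeps rows that are incomplete in fields the analysis never uses.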
Convert 'Joining Date' column to datetime format
# Convert 'Joining Date' to datetime format
df['Joining Date'] = pd.to_datetime(df['Joining Date'])
print(df.head())
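By default, pd.to_datetime raises an error when it meets a value it cannot parse. If the real data might contain malformed dates (an assumption, since the dataset is not shown here), one option is errors='coerce', which turns unparseable entries into NaT so they can be inspected or dropped afterwards. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical date strings, the last of which is malformed
raw_dates = pd.Series(['2015-03-01', '2010-07-15', 'not a date'])

# errors='coerce' converts unparseable values to NaT instead of raising
parsed = pd.to_datetime(raw_dates, errors='coerce')

# Count how many values failed to parse
unparsed_count = parsed.isna().sum()
print(parsed)
print(unparsed_count)
```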
2. Data Aggregation and Analysis
Now that we have cleaned and preprocessed the data, we can perform some data aggregation and analysis tasks.
Calculate average salary by department
# Group by 'Department' and calculate the mean salary
average_salary = df.groupby('Department')['Salary'].mean()
print(average_salary)
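The same groupby can produce several statistics in one pass by passing a list of function names to agg. The sketch below uses invented salaries and a hypothetical sample_df to show the mean, median, and head count per department.

```python
import pandas as pd

# Hypothetical salaries, invented for illustration
sample_df = pd.DataFrame({
    'Department': ['Engineering', 'Engineering', 'Marketing', 'Marketing'],
    'Salary': [85000, 95000, 70000, 60000],
})

# One row per department, one column per statistic
salary_summary = sample_df.groupby('Department')['Salary'].agg(['mean', 'median', 'count'])
print(salary_summary)
```

The median is a useful companion to the mean here, because a few very high salaries can pull the mean upward.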
Find the oldest and youngest employees in each department
# Select the oldest employee per department
# (idxmax returns the first matching row label, so ties yield a single row)
oldest_employee = df.loc[df.groupby('Department')['Age'].idxmax()]
# Select the youngest employee per department (idxmin treats ties the same way)
youngest_employee = df.loc[df.groupby('Department')['Age'].idxmin()]
print(oldest_employee)
print(youngest_employee)
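Note that idxmax and idxmin return one row label per group, so if two employees in a department share the maximum age, only the first is selected. If every tied employee should be kept, one possible approach is to compare each row against its group maximum using transform. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: Alice and Bob are tied as the oldest in Engineering
sample_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Department': ['Engineering', 'Engineering', 'Marketing', 'Marketing'],
    'Age': [45, 45, 29, 52],
})

# transform('max') broadcasts each department's maximum age back to every row
max_age = sample_df.groupby('Department')['Age'].transform('max')

# Keep every row whose age equals its department's maximum
all_oldest = sample_df[sample_df['Age'] == max_age]
print(all_oldest)
```

The same pattern with transform('min') would return all of the youngest employees.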
3. Data Visualization
We can also use Pandas in combination with other libraries, such as Matplotlib, to create visualizations of our data.
Plot the distribution of employee ages
import matplotlib.pyplot as plt
# Plot a histogram of employee ages
df['Age'].plot(kind='hist', bins=10)
plt.xlabel('Age')
plt.title('Employee Age Distribution')
plt.show()
Plot the average salary by department
# Plot a bar chart of average salary by department
average_salary.plot(kind='bar')
plt.xlabel('Department')
plt.ylabel('Average Salary')
plt.title('Average Salary by Department')
plt.show()
4. Data Transformation and Export
After performing the necessary analysis, we can transform the data and export it for further use or reporting.
Add a new column for years of service
# Calculate approximate years of service as of today
# (365.25 days per year accounts for leap years on average)
current_date = pd.Timestamp.now()
df['Years of Service'] = ((current_date - df['Joining Date']).dt.days // 365.25).astype(int)
print(df.head())
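Dividing a day count by a fixed year length only approximates years of service and can be off by one near an anniversary. When exact completed years matter, a calendar-based calculation is one alternative. The sketch below uses a fixed, hypothetical as_of date so the result is reproducible; in practice it could be replaced with pd.Timestamp.now().

```python
import pandas as pd

# Hypothetical joining dates and a fixed reference date (both are assumptions)
joining = pd.Series(pd.to_datetime(['2015-03-01', '2010-07-15']))
as_of = pd.Timestamp('2024-06-30')

# Start from the difference in calendar years
year_diff = as_of.year - joining.dt.year

# True where this year's anniversary has not been reached yet
anniversary_pending = (joining.dt.month > as_of.month) | (
    (joining.dt.month == as_of.month) & (joining.dt.day > as_of.day)
)

# Subtract one year for employees whose anniversary is still ahead
years_of_service = year_diff - anniversary_pending.astype(int)
print(years_of_service)
```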
Export the cleaned and transformed data to a new CSV file
# Write data to a CSV file
df.to_csv('cleaned_employee_data.csv', index=False)
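CSV files store dates as plain text, so the datetime type of 'Joining Date' is not preserved automatically when the exported file is read again. A minimal sketch of a round trip follows, using an in-memory buffer in place of a real file and a hypothetical one-row DataFrame; the parse_dates argument asks read_csv to convert the column back to datetimes.

```python
import io

import pandas as pd

# Hypothetical one-row DataFrame with a datetime column
out_df = pd.DataFrame({
    'Name': ['Alice'],
    'Joining Date': pd.to_datetime(['2015-03-01']),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
out_df.to_csv(buffer, index=False)

# Rewind and read back, parsing 'Joining Date' as a datetime column
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=['Joining Date'])
print(reloaded.dtypes)
```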
Conclusion
In this article, we applied Pandas to a typical workflow on an employee dataset: loading and cleaning the data, aggregating it by department, visualizing the results with Matplotlib, and exporting the transformed data to a new CSV file. The same steps carry over to most tabular data manipulation and analysis tasks in Python.