Excel Consolidation in Python

MoneyTech360
By -
0
Excel Consolidation in Python

Introduction:

This script is a Python program that utilizes the `pandas` library, along with the `os` module, to consolidate data from multiple Excel files (`*.xls` and `*.xlsx`) within a specified directory. The consolidated data is then saved to a new Excel file named "Consolidated_Data.xlsx" within the same directory. The sheet name of the consolidated data is named after the folder's name.
import pandas as pd
import os

# Input the folder path
folder_path = input("Enter the folder path: ")

# List all Excel files in the directory
excel_files = [file for file in os.listdir(folder_path) if file.endswith('.xlsx') or file.endswith('.xls')]

# Create a list to hold dataframes
all_data = []

# Loop through each Excel file and read into a dataframe
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path, engine='openpyxl')  # Using openpyxl engine to read .xlsx files
    all_data.append(df)

# Concatenate all dataframes
merged_data = pd.concat(all_data, ignore_index=True)

# Extract folder name for the sheet name
folder_name = os.path.basename(folder_path)

# Write to a new Excel file with the folder name as sheet name
output_file = os.path.join(folder_path, "Consolidated_Data.xlsx")
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    merged_data.to_excel(writer, sheet_name=folder_name, index=False)

print(f"Data consolidated successfully in {output_file}")

Code Breakdown:


Code

import pandas as pd
import os

Explanation:

- The script begins by importing necessary modules. - `pandas`: A powerful data manipulation library in Python. - `os`: A module providing a portable way of using operating system-dependent functionality.




Code

folder_path = input("Enter the folder path: ")

Explanation:

- The user is prompted to input the path of the folder containing the Excel files to be consolidated.




Code

excel_files = [file for file in os.listdir(folder_path) if file.endswith('.xlsx') or file.endswith('.xls')]

Explanation:

- A list comprehension is used to retrieve all Excel files (both `.xls` and `.xlsx` formats) from the specified directory.




Code

all_data = []

Explanation:

- An empty list, `all_data`, is initialized. This list will be used to store individual dataframes, each representing an Excel file's content.




Code

for file in excel_files:
file_path = os.path.join(folder_path, file)
df = pd.read_excel(file_path, engine='openpyxl')
all_data.append(df)

Explanation:

- The script iterates through each Excel file in the folder. - For each file, it constructs the full path using `os.path.join()`. - The file is then read into a dataframe using the `openpyxl` engine, suitable for reading `.xlsx` files. - The dataframe is appended to the `all_data` list.




Code

merged_data = pd.concat(all_data, ignore_index=True)

Explanation:

- All the individual dataframes stored in the `all_data` list are concatenated (or merged) into a single dataframe called `merged_data`. The `ignore_index=True` argument ensures that the resulting dataframe has a continuous index.




Code

folder_name = os.path.basename(folder_path)

Explanation:

- The folder's name is extracted from the provided path using `os.path.basename()`. This name will be used as the sheet name for the consolidated data in the resulting Excel file.




Code

output_file = os.path.join(folder_path, "Consolidated_Data.xlsx")
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
merged_data.to_excel(writer, sheet_name=folder_name, index=False)

Explanation:

- The path for the output Excel file "Consolidated_Data.xlsx" is constructed within the same directory. - Using the `pd.ExcelWriter()` context manager, the consolidated dataframe (`merged_data`) is written to this Excel file with the sheet name as the folder's name. The `index=False` argument ensures that dataframe's index isn't written to the Excel file.




Code

print(f"Data consolidated successfully in {output_file}")

Explanation:

- A confirmation message is printed, indicating the successful consolidation and providing the path to the new consolidated Excel file.
In summary, this script streamlines the process of consolidating multiple Excel files from a specified directory into a single Excel file, naming the sheet after the folder. It harnesses the power of the `pandas` library for data manipulation and the `os` module for directory operations.

Your feedback and suggestion is very valuable for us.
So please comment your thoughts and request code below



Tags:

Post a Comment

0Comments

Post a Comment (0)

#buttons=(Ok, Go it!) #days=(20)

Our website uses cookies to enhance your experience. Check Now
Ok, Go it!