Introduction:
This script is a Python program that utilizes the `pandas` library, along with the `os` module, to consolidate data from multiple Excel files (`*.xls` and `*.xlsx`) within a specified directory. The consolidated data is then saved to a new Excel file named "Consolidated_Data.xlsx" within the same directory. The sheet name of the consolidated data is named after the folder's name.import pandas as pd
import os
# Input the folder path
folder_path = input("Enter the folder path: ")
# List all Excel files in the directory
excel_files = [file for file in os.listdir(folder_path) if file.endswith('.xlsx') or file.endswith('.xls')]
# Create a list to hold dataframes
all_data = []
# Loop through each Excel file and read into a dataframe
for file in excel_files:
file_path = os.path.join(folder_path, file)
df = pd.read_excel(file_path, engine='openpyxl') # Using openpyxl engine to read .xlsx files
all_data.append(df)
# Concatenate all dataframes
merged_data = pd.concat(all_data, ignore_index=True)
# Extract folder name for the sheet name
folder_name = os.path.basename(folder_path)
# Write to a new Excel file with the folder name as sheet name
output_file = os.path.join(folder_path, "Consolidated_Data.xlsx")
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
merged_data.to_excel(writer, sheet_name=folder_name, index=False)
print(f"Data consolidated successfully in {output_file}")
Code Breakdown:
Code
import pandas as pd
import os
Explanation:
- The script begins by importing necessary modules. - `pandas`: A powerful data manipulation library in Python. - `os`: A module providing a portable way of using operating system-dependent functionality.
Code
folder_path = input("Enter the folder path: ")
Explanation:
- The user is prompted to input the path of the folder containing the Excel files to be consolidated.
Code
excel_files = [file for file in os.listdir(folder_path) if file.endswith('.xlsx') or file.endswith('.xls')]
Explanation:
- A list comprehension is used to retrieve all Excel files (both `.xls` and `.xlsx` formats) from the specified directory.
Code
all_data = []
Explanation:
- An empty list, `all_data`, is initialized. This list will be used to store individual dataframes, each representing an Excel file's content.
Code
for file in excel_files:
file_path = os.path.join(folder_path, file)
df = pd.read_excel(file_path, engine='openpyxl')
all_data.append(df)
Explanation:
- The script iterates through each Excel file in the folder. - For each file, it constructs the full path using `os.path.join()`. - The file is then read into a dataframe using the `openpyxl` engine, suitable for reading `.xlsx` files. - The dataframe is appended to the `all_data` list.
Code
merged_data = pd.concat(all_data, ignore_index=True)
Explanation:
- All the individual dataframes stored in the `all_data` list are concatenated (or merged) into a single dataframe called `merged_data`. The `ignore_index=True` argument ensures that the resulting dataframe has a continuous index.
Code
folder_name = os.path.basename(folder_path)
Explanation:
- The folder's name is extracted from the provided path using `os.path.basename()`. This name will be used as the sheet name for the consolidated data in the resulting Excel file.
Code
output_file = os.path.join(folder_path, "Consolidated_Data.xlsx")
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
merged_data.to_excel(writer, sheet_name=folder_name, index=False)
Explanation:
- The path for the output Excel file "Consolidated_Data.xlsx" is constructed within the same directory. - Using the `pd.ExcelWriter()` context manager, the consolidated dataframe (`merged_data`) is written to this Excel file with the sheet name as the folder's name. The `index=False` argument ensures that dataframe's index isn't written to the Excel file.
Code
print(f"Data consolidated successfully in {output_file}")
Explanation:
- A confirmation message is printed, indicating the successful consolidation and providing the path to the new consolidated Excel file.In summary, this script streamlines the process of consolidating multiple Excel files from a specified directory into a single Excel file, naming the sheet after the folder. It harnesses the power of the `pandas` library for data manipulation and the `os` module for directory operations.