Python lets you do exactly that; no matter how many Excel files you want to combine, you can do it with relative ease. Given its range of libraries and third-party resources, you can import and use Python’s multi-faceted tools to do your bidding.
In this guide, you will need to install and use the Pandas libraries to import data into Python before consolidating it.
Install Pandas Libraries in Python
Pandas is a third-party library that you can install in Python. Some IDEs already have Pandas installed in them.
If you are using an IDE version that doesn’t come with pre-installed Pandas, rest assured, you can install it directly in Python.
Here’s how to install Pandas:
If you’re using Jupyter Notebook, you can install Pandas directly with the PIP command. Mostly, when you have installed Jupyter with Anaconda, there are high chances of already having Pandas available for direct use.
If you cannot call Pandas, you can use the above command to install them directly.
Combining Excel Files With Python
First, you need to create a folder in your preferred location with all the Excel files. Once the folder is ready, you can start writing the code to import the libraries.
You will use two variables in this code:
Pandas: The Pandas library provides the data frames to store Excel files. OS: The library is beneficial for reading data from your machine’s folder
To import these libraries, use these commands:
Import: Python syntax used to import the libraries in Python Pandas: Name of the library pd: Alias given to the library OS: A library to access the system folder
Once you have imported the libraries, create two variables to store the input and output file path. The input file path is needed to access the files’ folder. The output file path is necessary as the combined file will be exported there.
If you are using Python, make sure you change the backslash to forward-slash (\ to /)
Append the / at the end as well to complete the paths.
The folder’s files are available in a list. Create a list to store all the file references of the input folder using the listdir function from the OS library.
If you are unsure of the functions available within a library, you can use the dir function with the library name. For example, to check the exact version of the listdir function, you can use the command as follows:
The output will consist of all the associated functions available within the OS library. The listdir function is one of the many functions available within this library.
Create a new variable to store the input files from the folder.
Print this variable to see the names of the files stored within the folder. All files stored within the folder are displayed once you use the print function.
Next, you need to add a new data frame to store each Excel file. Imagine a data frame as a container for storing data. Here’s the command for creating a data frame.
df: Variable to store the value of the DataFrame pd: Alias for the Pandas library DataFrame: Default syntax for adding a data frame
The input folder has three .xlsx files in this example. The file names are:
To open each file from this folder, you need to run a loop. The loop will run for each of the files in the list created above.
Here’s how you can do it:
Next, it’s necessary to check the extensions of the files since the code will open XLSX files only. To check these files, you can use an If statement.
Use the endswith function for this purpose, as follows:
excel_files: List with all the file values endswith: Function to check the extension of the files (". xlsx"): This string value can change, depending on what you want to search for
Now that you’ve identified the Excel files, you can create a new data frame to read and store the files individually.
df1: New data frame pd: Pandas library read_excel: Function to read Excel files within the Pandas library input_file_path: Path of the folder where the files are stored excel_files: Any variable which is used in the for loop
To start appending the files, you need to use the append function.
Finally, now that the consolidated data frame is ready, you can export it to the output location. In this case, you are exporting the data frame to an XLSX file.
df: Dataframe to export to_excel: Command used to export the data output_file_path: Path defined for storing the output Consolidated_file. xlsx: Name of the consolidated file
Now, let’s look at the final code:
Using Python to Combine Multiple Excel Workbooks
Python’s Pandas is an excellent tool for beginners and advanced users alike. The library is used extensively by developers who want to master Python.
Even if you are a beginner, you can benefit immensely by learning the nuances of Pandas and how the library is used within Python.