Merging Files with Python

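The example below merges multiple PDF documents, each with a different number of pages, into a single file in which all the first pages come first, followed by all the second pages, and so on. It requires the third-party module PyPDF2. The listing uses the PdfFileReader and PdfFileWriter class names, which were removed in PyPDF2 3.0, so an earlier version is needed to run it as written.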

Firstly, the file path is set and checked to confirm that it exists, followed by a second check that the directory contains files. Each file is then opened in turn to find its number of pages, and this figure is stored in a list alongside the file name. Files without a ‘.pdf’ extension are ignored, as are temporary files whose names begin with ‘~’. Whilst doing this, a record is kept of the largest page count found in any single file.

The merge itself loops from the first page up to this maximum and, for each page position, visits every file in the list, checking that the file actually has that page before adding it. This check is what allows PDFs of different lengths to be merged. Once the combined file has been saved, a confirmation message states how many files were merged. Throughout, ‘try-except’ blocks catch the PermissionError raised when a file cannot be opened or saved.
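Before the full listing, the page ordering can be illustrated without any PDF library. The sketch below is not part of the original script: the function name `interleave_order` and the example file names are hypothetical, and it only computes the order in which pages would be added, given a list of (file name, page count) tuples like the one the script builds.

```python
def interleave_order(files):
    """Return (file name, page index) pairs in page-interleaved order.

    `files` is a list of (file name, page count) tuples.
    """
    # The longest document decides how many rounds are needed
    max_pages = max((pages for _, pages in files), default=0)

    order = []
    for page_index in range(max_pages):
        for name, pages in files:
            # Skip documents that have run out of pages
            if page_index < pages:
                order.append((name, page_index))
    return order


# Hypothetical file names and page counts, for illustration only
example = [('a.pdf', 2), ('b.pdf', 1), ('c.pdf', 3)]
for name, page_index in interleave_order(example):
    print(name, 'page', page_index + 1)
```

With these example values, page one of each of the three files is listed first, then page two of a.pdf and c.pdf, and finally page three of c.pdf. Page indexes are zero-based, as they are in the listing.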

# Import required modules
import os
import PyPDF2

# File path
filePath = 'c:\\demo'

# Check to see if the file path exists
if os.path.exists(filePath):

    # Change the current working directory
    os.chdir(filePath)

    # Check if there are any files in the chosen directory
    if len(os.listdir(filePath)) == 0:

        print('There are no files to merge.')

    else:

        # List for file information
        files = []

        # Source PDF file
        pdfFile = None

        # Maximum number of pages
        maxPages = 0

        # Process the files at the path to get information about each file
        for filename in os.listdir(filePath):

            # Check if the file is a PDF document (any letter case), excluding
            # temp files and the output of a previous merge
            if (filename.lower().endswith('.pdf')
                    and not filename.startswith('~')
                    and filename != 'combined.pdf'):

                try:

                    # Open the current PDF and read its page count; the
                    # 'with' block closes the file even if an error occurs
                    with open(filename, 'rb') as pdfFile:

                        pdfReader = PyPDF2.PdfFileReader(pdfFile)
                        numPages = pdfReader.getNumPages()

                    # Assign the number of pages to the maximum if greater
                    # than current value
                    if numPages > maxPages:

                        maxPages = numPages

                    # Add the file information to the list
                    files.append((filename, numPages))

                except PermissionError:

                    # Message stating the file could not be merged
                    print('The file "' + filename + '" cannot be merged.')

        # If there are at least two PDFs to merge, process them
        if maxPages > 0 and len(files) > 1:

            # Writer object for new combined PDF
            pdfWriter = PyPDF2.PdfFileWriter()

            # Keep track of the open source files so they can all be closed
            openFiles = []

            try:

                # Open each PDF once and assign it to a reader object
                pdfReaders = []
                for file, pages in files:

                    pdfFile = open(file, 'rb')
                    openFiles.append(pdfFile)
                    pdfReaders.append((PyPDF2.PdfFileReader(pdfFile), pages))

                # Combine PDFs into one file using the reader list
                # Put all page 1s together, then all page 2s and so on
                for pageIndex in range(maxPages):

                    for pdfReader, pages in pdfReaders:

                        # Check if the current file has the desired page to merge
                        if pageIndex < pages:

                            # Add the page to the new PDF
                            pdfWriter.addPage(pdfReader.getPage(pageIndex))

                # Write the new PDF while the source files are still open
                with open('combined.pdf', 'wb') as pdfCombined:

                    pdfWriter.write(pdfCombined)

                # Feedback that the file merge was successful
                print(str(len(files)) + ' files have been merged successfully.')

            except PermissionError:

                # Display a message stating the merge was unsuccessful
                print('The file merge was unsuccessful.')

            finally:

                # Close the source PDF files to clean up
                for pdfFile in openFiles:

                    pdfFile.close()

        else:

            # Message to state there are fewer than two PDFs to merge
            print('At least two PDF files are needed to merge.')

else:

    # Display a message stating that the file path does not exist
    print('File path does not exist.')
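
The file selection step in the listing above can also be isolated into a small helper, which makes it easier to check on its own. The sketch below is an illustration rather than part of the original script: the function name `select_pdfs` and its `output_name` parameter are hypothetical. It assumes that the extension check should ignore letter case, that the output of an earlier run should not be merged into itself, and that a sorted result is wanted, since `os.listdir` returns names in arbitrary order.

```python
def select_pdfs(names, output_name='combined.pdf'):
    """Return the PDF file names worth merging, in sorted order."""
    selected = []
    for name in names:
        # Accept '.pdf' in any letter case
        is_pdf = name.lower().endswith('.pdf')
        # Names starting with '~' are treated as temporary files
        is_temp = name.startswith('~')
        # Skip the output of a previous merge
        is_output = name.lower() == output_name.lower()
        if is_pdf and not is_temp and not is_output:
            selected.append(name)
    return sorted(selected)


# Hypothetical directory contents, for illustration only
print(select_pdfs(['b.PDF', 'a.pdf', '~tmp.pdf', 'notes.txt', 'combined.pdf']))
```

In the script, the helper could be called as select_pdfs(os.listdir(filePath)) in place of the extension test inside the loop.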