Python and PostgreSQL - Generating Data

When testing the performance of a database, or of an application that uses it, with large volumes of data, it is often more practical to generate the data than to enter it manually.

The following example demonstrates how to generate data for a table called ‘person’, which was used in the examples for selecting, inserting, updating, deleting, importing and exporting data.

First, a connection to the database is established and the number of records to generate is specified. This can be set to any value, as long as the database has enough space to accommodate the records. Several tuples are defined so that a random first name, last name and title can be selected for each record. Each entry in the first name tuple also holds the gender associated with the name so that an appropriate title can be selected. A date range covering ages from 20 to 100 years is specified so that a random date of birth can be generated, and a counter is initialised to track the number of records added.
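As a minimal sketch of the date range described above, the helpers below build the range from a given date and pick a random date inside it. The names `dob_range` and `random_dob` are hypothetical and not part of the script. The sketch assumes that 29 February should fall back to 28 February, because the years 100 or 20 years earlier may not be leap years.

```python
import datetime
import random


def dob_range(today, oldest=100, youngest=20):
    """Return (start, end) dates for people aged youngest to oldest years."""
    # Assumption: map 29 February to 28 February, as the target years
    # may not be leap years and datetime.date() would raise ValueError.
    day = 28 if (today.month, today.day) == (2, 29) else today.day
    start = datetime.date(today.year - oldest, today.month, day)
    end = datetime.date(today.year - youngest, today.month, day)
    return start, end


def random_dob(start, end, rng=random):
    """Pick a date between start and end, both inclusive."""
    return start + datetime.timedelta(days=rng.randint(0, (end - start).days))


start_dob, end_dob = dob_range(datetime.date.today())
print(random_dob(start_dob, end_dob))
```

Working in whole days with `timedelta(days=...)` keeps the result a plain date and makes both ends of the range reachable.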

A ‘for’ loop is used to generate the desired number of records. A random number is generated to select a first name from the corresponding tuple and this is extracted, along with the gender. The same process occurs to extract a random last name. The gender is then used to extract a title from the appropriate tuple. The final random value generated is the date of birth.
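The selection steps above can also be sketched with `random.choice`, which picks an element directly instead of generating an index first. This is an alternative illustration rather than the approach used in the script. The function `random_person`, the shortened tuples and the `TITLES` dictionary are assumptions made for the example.

```python
import random

# Shortened versions of the name tuples, for illustration only.
FNAME = (("Oliver", "M"), ("Noah", "M"), ("Olivia", "F"), ("Sophia", "F"))
LNAME = ("Smith", "Johnson", "Williams", "Jones")

# Titles keyed by gender, replacing the if/else on the gender value.
TITLES = {
    "M": ("Mr", "Dr", "Prof"),
    "F": ("Miss", "Mrs", "Ms", "Dr", "Prof"),
}


def random_person(rng=random):
    """Return (firstname, lastname, title) with a title matching the gender."""
    firstname, gender = rng.choice(FNAME)
    lastname = rng.choice(LNAME)
    title = rng.choice(TITLES[gender])
    return firstname, lastname, title


print(random_person())
```

Keying the titles by gender means that a further title tuple could be added without another branch in the loop.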

Once all the values have been generated, an SQL ‘insert’ statement is constructed and then executed. Feedback is provided as to the number of records added to the database. A ‘try-except-finally’ block is used to catch any errors that may occur and to close the database connection, regardless of whether the data generation is successful or not.
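For larger volumes, one possible variation is to collect the generated rows first and insert them in a single transaction. The sketch below is not the approach used in the script. The function name `insert_people` is hypothetical, and it assumes a DB-API connection such as the one returned by `psycopg2.connect`. It catches `Exception` only to keep the sketch free of imports; with psycopg2, `psycopg2.DatabaseError` would be the narrower choice.

```python
def insert_people(connect, rows):
    """Insert all rows in one transaction and return how many were passed in.

    `connect` is assumed to be a DB-API connection (for example from
    psycopg2); `rows` is a list of (firstname, lastname, title, dob) tuples.
    """
    sql = ("INSERT INTO person (firstname, lastname, title, dob) "
           "VALUES (%s, %s, %s, %s)")
    cursor = connect.cursor()
    try:
        # Run the same parameterised statement once per row.
        cursor.executemany(sql, rows)
        # Commit once, so either all rows are stored or none are.
        connect.commit()
        return len(rows)
    except Exception:
        # Undo any partial work before passing the error on.
        connect.rollback()
        raise
    finally:
        # The cursor is closed whether or not the insert succeeded.
        cursor.close()
```

Committing once at the end avoids a separate transaction per record, at the cost of losing the whole batch if any single row fails.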

import datetime
import psycopg2
import random

# Database.
connect = None

try:

    # Connect to database.
    connect = psycopg2.connect(host='localhost', database='Demo',
                               user='DemoUN', password='DemoPW')

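except psycopg2.DatabaseError as e:

    # Report the unsuccessful connection and stop program execution.
    # SystemExit is raised directly, as quit() is intended for interactive use.
    print("Database connection unsuccessful: " + str(e))
    raise SystemExit(1)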

# Number of records to generate.
recordsToGenerate = 500

# First names.
fname = (
    ("Oliver", "M"), ("Noah", "M"), ("Harry", "M"),
    ("Leo", "M"), ("Charlie", "M"), ("Jack", "M"),
    ("Freddie", "M"), ("Alfie", "M"), ("Archie", "M"),
    ("Theo", "M"), ("Olivia", "F"), ("Sophia", "F"),
    ("Amelia", "F"), ("Emily", "F"), ("Ava", "F"),
    ("Isla", "F"), ("Isabelle", "F"), ("Charlotte", "F"),
    ("Layla", "F"), ("Freya", "F")
)

# Last names.
lname = (
    "Smith", "Johnson", "Williams", "Jones",
    "Brown", "Davis", "Miller", "Wilson",
    "Taylor", "Anderson", "Thomas", "White",
    "Martin", "Thompson", "Robinson", "Clark",
    "Walker", "Young", "Wright", "Hill"
)

# Male titles.
mtitle = (
    "Mr", "Dr", "Prof"
)

# Female titles.
ftitle = (
    "Miss", "Mrs", "Ms", "Dr", "Prof"
)

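# Dates for random date of birth range (ages 20 to 100).
dateToday = datetime.date.today()
# Use 28 February if today is 29 February, as the target years may not be leap years.
dobDay = 28 if (dateToday.month, dateToday.day) == (2, 29) else dateToday.day
startDob = datetime.date(dateToday.year - 100, dateToday.month, dobDay)
endDob = datetime.date(dateToday.year - 20, dateToday.month, dobDay)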

# Record count.
recordCount = 0

try:

    # Cursor to execute query.
    cursor = connect.cursor()

    # Generate specified number of records.
    for i in range(0, recordsToGenerate):

        # Randomly select a first name and associated gender.
        randomNumber = random.randint(0, len(fname) - 1)
        firstname = fname[randomNumber][0]
        gender = fname[randomNumber][1]

        # Randomly select a last name.
        randomNumber = random.randint(0, len(lname) - 1)
        lastname = lname[randomNumber]

        # Randomly select a title based on the gender.
        if gender == "M":

            randomNumber = random.randint(0, len(mtitle) - 1)
            title = mtitle[randomNumber]

        else:

            randomNumber = random.randint(0, len(ftitle) - 1)
            title = ftitle[randomNumber]

        # Randomly select a date of birth.
        dob = startDob + (endDob - startDob) * random.random()

        # Query text, with placeholders for the generated values.
        sqlPersonInfo = (
            "INSERT INTO person (firstname, lastname, title, dob) "
            "VALUES (%s, %s, %s, %s)"
        )

        # Execute query and commit changes.
        cursor.execute(sqlPersonInfo, (firstname, lastname, title, dob))
        connect.commit()

        # Increment the record count.
        recordCount += 1

    # Provide feedback on the number of records added.
    if recordCount == 0:

        print("No new person records added.")

    elif recordCount == 1:

        print(str(recordCount) + " person record added.")

    else:

        print(str(recordCount) + " person records added.")

except psycopg2.DatabaseError as e:

    # Report the error adding person information and exit.
    # The 'finally' block below still runs before the program stops.
    print("Error adding person information: " + str(e))
    raise SystemExit(1)

finally:

    # Close database connection.
    connect.close()