Python and SQLite - Generating Data

When testing the performance of a database, or of an application that uses it, against large volumes of data, it is usually more practical to generate the data than to enter it manually.

The following example demonstrates how to generate data for a table called ‘person’, which was used in the examples for selecting, inserting, updating, deleting, importing and exporting data.

First, a check is made to see whether the database file exists. If it doesn’t, a message is displayed and execution of the program is halted. If the file is found, a connection to the database is established; should the connection fail, a message is displayed and the program again stops. The number of records to generate is then specified. This can be set to any value, as long as there is enough disk space for the database file to accommodate the records. A number of tuples are defined so that a random first name, last name and title can be selected for each record. Each entry in the first name tuple also holds the gender associated with the name so that an appropriate title can be selected. A date range, running from 100 years to 20 years before the current date, is specified to allow for the generation of a random date of birth, and a counter is initialised to keep track of the number of records added.
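
Because the program halts when the database file is missing, it can be handy to create a throwaway database before trying it out. The following is a minimal sketch only. The column types, the ‘id’ column, the helper name ‘create_demo_database’ and the temporary file location are all assumptions made for illustration; only the four column names used by the ‘insert’ statement come from the example itself.

```python
import os
import sqlite3
import tempfile


def create_demo_database(path):
    """Create a database file containing an empty 'person' table.

    The schema is an assumption for illustration: only the four column
    names used by the insert statement are known.
    """
    connect = sqlite3.connect(path)
    try:
        connect.execute(
            "CREATE TABLE IF NOT EXISTS person ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "firstname TEXT, lastname TEXT, title TEXT, dob TEXT)"
        )
        connect.commit()
    finally:
        # Close the connection whether or not the table was created.
        connect.close()


# Hypothetical location; any writable path would do.
demo_path = os.path.join(tempfile.mkdtemp(), "testDB.db")
create_demo_database(demo_path)

# The same existence check used by the generator now succeeds.
print(os.path.isfile(demo_path))
```

Pointing the ‘database’ variable at a file created this way should allow the generator to run without touching any real data.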

A ‘for’ loop is used to generate the desired number of records. On each pass, a random index is generated to select an entry from the first name tuple, from which the first name and its associated gender are extracted. The same process is used to select a random last name. The gender then determines which title tuple a random title is drawn from. The final random value generated is the date of birth, which falls within the date range specified earlier.
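
The selection steps described above can also be expressed with ‘random.choice’, which picks an element directly and avoids the index arithmetic. The sketch below is one possible arrangement rather than the method used in the example: the function name ‘random_person’, the upper-case constants, the shortened name lists and the dictionary of titles keyed by gender are all hypothetical. It also picks the date of birth as a whole number of days within the range.

```python
import datetime
import random

# Shortened sample data (hypothetical); longer tuples work the same way.
FIRST_NAMES = (("Oliver", "M"), ("Noah", "M"), ("Olivia", "F"), ("Ava", "F"))
LAST_NAMES = ("Smith", "Jones", "Taylor", "Brown")
TITLES = {"M": ("Mr", "Dr", "Prof"), "F": ("Miss", "Mrs", "Ms", "Dr", "Prof")}


def random_person(start_dob, end_dob, rng=random):
    """Return a (firstname, lastname, title, dob) tuple of random values."""
    # random.choice picks one element, so no index arithmetic is needed.
    firstname, gender = rng.choice(FIRST_NAMES)
    lastname = rng.choice(LAST_NAMES)
    title = rng.choice(TITLES[gender])

    # Pick a whole number of days within the range (both ends inclusive).
    span_days = (end_dob - start_dob).days
    dob = start_dob + datetime.timedelta(days=rng.randint(0, span_days))
    return firstname, lastname, title, dob


start = datetime.date(1925, 1, 1)
end = datetime.date(2005, 1, 1)
print(random_person(start, end))
```

Passing a seeded ‘random.Random’ instance as ‘rng’ makes the generated values repeatable, which can be useful when comparing test runs.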

Once all the values have been generated, an SQL ‘insert’ statement with a placeholder for each value is constructed and then executed. When the loop has finished, feedback is provided as to the number of records added to the database. A ‘try-except-finally’ block is used to catch any database errors that may occur and to close the database connection, regardless of whether the data generation is successful or not.

import datetime
import os
import random
import sqlite3

# Database.
database = 'c:\\demo\\testDB.db'
connect = None

# Check if database file exists.
if not os.path.isfile(database):

    # Confirm incorrect database location and stop program execution.
    # SystemExit is raised directly because quit() is intended for interactive use.
    print("Error locating database.")
    raise SystemExit(1)

try:

    # Connect to database.
    connect = sqlite3.connect(database)

except sqlite3.DatabaseError as e:

    # Confirm unsuccessful connection, show the reason and quit.
    print("Database connection unsuccessful: " + str(e))
    raise SystemExit(1)

# Number of records to generate.
recordsToGenerate = 500

# First names.
fname = (
    ("Oliver", "M"), ("Noah", "M"), ("Harry", "M"),
    ("Leo", "M"), ("Charlie", "M"), ("Jack", "M"),
    ("Freddie", "M"), ("Alfie", "M"), ("Archie", "M"),
    ("Theo", "M"), ("Olivia", "F"), ("Sophia", "F"),
    ("Amelia", "F"), ("Emily", "F"), ("Ava", "F"),
    ("Isla", "F"), ("Isabelle", "F"), ("Charlotte", "F"),
    ("Layla", "F"), ("Freya", "F")
)

# Last names.
lname = (
    "Smith", "Johnson", "Williams", "Jones",
    "Brown", "Davis", "Miller", "Wilson",
    "Taylor", "Anderson", "Thomas", "White",
    "Martin", "Thompson", "Robinson", "Clark",
    "Walker", "Young", "Wright", "Hill"
)

# Male titles.
mtitle = (
    "Mr", "Dr", "Prof"
)

# Female titles.
ftitle = (
    "Miss", "Mrs", "Ms", "Dr", "Prof"
)

# Dates for random date of birth range.
dateToday = datetime.date.today()
startDob = datetime.date(dateToday.year - 100, dateToday.month, dateToday.day)
endDob = datetime.date(dateToday.year - 20, dateToday.month, dateToday.day)

# Record count.
recordCount = 0

try:

    # Cursor to execute query.
    cursor = connect.cursor()

    # Generate specified number of records.
    for i in range(0, recordsToGenerate):

        # Randomly select a first name and associated gender.
        randomNumber = random.randint(0, len(fname) - 1)
        firstname = fname[randomNumber][0]
        gender = fname[randomNumber][1]

        # Randomly select a last name.
        randomNumber = random.randint(0, len(lname) - 1)
        lastname = lname[randomNumber]

        # Randomly select a title based on the gender.
        if gender == "M":

            randomNumber = random.randint(0, len(mtitle) - 1)
            title = mtitle[randomNumber]

        else:

            randomNumber = random.randint(0, len(ftitle) - 1)
            title = ftitle[randomNumber]

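        # Randomly select a date of birth and convert it to ISO 8601 text
        # (YYYY-MM-DD), as sqlite3's default date adapter is deprecated
        # from Python 3.12.
        dob = (startDob + (endDob - startDob) * random.random()).isoformat()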

        # Query text, with a placeholder for each value.
        sqlPersonInfo = (
            "INSERT INTO person (firstname, lastname, title, dob) "
            "VALUES (?, ?, ?, ?)"
        )

        # Execute query.
        cursor.execute(sqlPersonInfo, (firstname, lastname, title, dob))

        # Increment the record count.
        recordCount += 1

    # Commit all the new records in a single transaction, which is much
    # faster than committing after every insert.
    connect.commit()

    # Provide feedback on the number of records added.
    if recordCount == 0:

        print("No new person records added.")

    elif recordCount == 1:

        print(str(recordCount) + " person record added.")

    else:

        print(str(recordCount) + " person records added.")

except sqlite3.DatabaseError as e:

    # Confirm error adding person information, show the reason and exit.
    print("Error adding person information: " + str(e))
    raise SystemExit(1)

finally:

    # Close database connection.
    connect.close()
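
As a variation on the approach above, the records can be built up in a list first and inserted in one call with ‘executemany’. The sketch below is illustrative only: it uses an in-memory database, a ‘person’ table whose column types are assumed, and three hand-written rows in place of generated ones.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
connect = sqlite3.connect(":memory:")
connect.execute(
    "CREATE TABLE person (firstname TEXT, lastname TEXT, title TEXT, dob TEXT)"
)

# Hypothetical pre-generated rows; dates are stored as ISO 8601 text.
rows = [
    ("Oliver", "Smith", "Mr", "1980-05-17"),
    ("Olivia", "Jones", "Dr", "1992-11-03"),
    ("Freya", "Hill", "Ms", "1975-01-28"),
]

try:
    # One call inserts every row; nothing is saved until commit().
    connect.executemany(
        "INSERT INTO person (firstname, lastname, title, dob) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    connect.commit()
except sqlite3.DatabaseError:
    # Undo the partial batch so the table is left unchanged.
    connect.rollback()
    raise

# Count the rows to confirm what was stored.
record_count = connect.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(record_count, "person records added.")
connect.close()
```

Committing once for the whole batch means that either every record is saved or none is, which may or may not suit a particular test.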