Python Sets

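Sets in Python can contain multiple unique values of varying data types, provided each value is hashable (for example strings and numbers, but not lists or dictionaries). Like lists and tuples, sets are iterable, meaning that it is possible to loop through the values they contain and carry out whatever actions are required. Unlike lists and tuples, however, sets don’t have an index, so individual values cannot be accessed by position.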

The example below creates a set of strings called ‘names’, adds four names, then, using a ‘for’ loop, displays the names in the console.

names = set()

names.add("George")
names.add("Bob")
names.add("Fred")
names.add("Bob")

for name in names:
    print(name)

The output from this will be similar to the following.

Fred
Bob
George

There are a few things to note with this output. Firstly, four names were added to the set but only three names were printed out to the console. As previously stated, sets contain unique values, so if an attempt is made to add a value that already exists, it isn’t added again. No error is produced in this situation.

The second thing to note is that the values from the set are not in the same order as they were added to it. Sets are unordered, so the order of the output cannot be guaranteed and may differ from one run of the program to the next.
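Both points, along with the earlier remark that sets have no index, can be checked directly. The short sketch below rebuilds the same set, uses ‘len’ to confirm that the duplicate was ignored, and shows that attempting to access a value by position raises a ‘TypeError’. The variable ‘index_error’ is introduced here purely for illustration.

```python
names = set()

for name in ["George", "Bob", "Fred", "Bob"]:
    names.add(name)  # the second "Bob" is silently ignored

print(len(names))      # 3
print("Bob" in names)  # True

# Sets have no index, so accessing a value by position is an error.
index_error = None
try:
    names[0]
except TypeError as error:
    index_error = error

print(index_error)
```

A membership test with ‘in’, as used above, is the usual way of asking whether a value is in a set.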

As well as adding values individually after the set has been defined, it is also possible to add values at the same time as it is being defined. This combines the set declaration and the four statements that use the ‘add’ method into a single statement. When populating a set at the same time as the declaration, curly braces are used. It is not possible to create an empty set with empty curly braces, because this would create an empty dictionary instead. This is the reason for using the ‘set’ function in the first example.

names = {"George", "Bob", "Fred", "Bob"}
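The difference between empty curly braces and the ‘set’ function can be seen by checking the type of each result. This is a minimal sketch, and the variable names ‘empty_braces’, ‘empty_set’ and ‘populated’ are chosen only for this illustration.

```python
empty_braces = {}   # empty curly braces create a dictionary
empty_set = set()   # the set function creates an empty set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Curly braces with values inside do create a set; the duplicate is dropped.
populated = {"George", "Bob", "Fred", "Bob"}
print(type(populated))     # <class 'set'>
print(len(populated))      # 3
```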

To add another name to the set, the ‘add’ method is used in exactly the same way as in the first example.

names.add("Andrew")

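It is also possible to add multiple values to a set in one statement using the ‘update’ method. This method accepts one or more iterables, such as sets, lists or tuples, and adds every value they contain. In the statement below, two single-value sets are passed in.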

names.update({"Jim"}, {"John"})
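Because ‘update’ works with any iterable, the same call can mix lists and tuples. The sketch below uses a separate, hypothetical set called ‘team’ so that the ‘names’ set above is left untouched. It also shows a common pitfall: a string is itself an iterable, so passing one directly adds its individual characters rather than the whole string.

```python
team = {"George", "Bob"}

# A list and a tuple passed in a single call; "Bob" is already present.
team.update(["Jim", "John"], ("Bob",))
print(sorted(team))  # ['Bob', 'George', 'Jim', 'John']

# A bare string is iterated character by character.
letters = set()
letters.update("Jim")
print(sorted(letters))  # ['J', 'i', 'm']
```

To add a single string, the ‘add’ method is the safer choice.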

If the names were displayed in the console now, using the same ‘for’ loop as before, the output would contain the following six names, although not necessarily in this order.

Andrew
Fred
George
Bob
John
Jim

As well as being able to add values to an existing set, it is also possible to remove items using the ‘remove’ method.

names.remove("Bob")
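One detail worth knowing is that ‘remove’ raises a ‘KeyError’ if the value is not in the set, whereas the related ‘discard’ method does nothing in that situation. The sketch below demonstrates both on a separate, hypothetical set called ‘guests’; the flag ‘second_remove_failed’ exists only to record what happened.

```python
guests = {"Andrew", "Fred", "George", "Bob", "John", "Jim"}

guests.remove("Bob")       # "Bob" is present, so this succeeds

try:
    guests.remove("Bob")   # "Bob" is no longer present
    second_remove_failed = False
except KeyError:
    second_remove_failed = True

guests.discard("Bob")      # no error, even though "Bob" is absent

print(second_remove_failed)  # True
print(len(guests))           # 5
```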

As the order of values in a set is not guaranteed and sets have no ‘sort’ method like the one available for lists, the values must be copied into a list in order to sort them. One way of doing this is to convert the set to a list using the ‘list’ function and then sort the list, as shown in the example below. The built-in ‘sorted’ function is an alternative that accepts a set directly and returns a new sorted list.

names_list = list(names)
names_list.sort()

for name in names_list:
    print(name)

The output will now show the names in alphabetical order.

Andrew
Fred
George
Jim
John
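The same result can be obtained in one step with the built-in ‘sorted’ function, which accepts any iterable and returns a new list. The sketch below assumes the set holds the five names remaining at this point; ‘sorted_names’ and ‘reversed_names’ are names chosen for illustration.

```python
names = {"Andrew", "Fred", "George", "John", "Jim"}

sorted_names = sorted(names)                  # a new list, A to Z
reversed_names = sorted(names, reverse=True)  # a new list, Z to A

print(sorted_names)    # ['Andrew', 'Fred', 'George', 'Jim', 'John']
print(reversed_names)  # ['John', 'Jim', 'George', 'Fred', 'Andrew']
print(type(names))     # <class 'set'> (the original set is unchanged)
```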

Comparing Sets

There are a number of methods provided that can be used to compare sets. These are outlined below.

Union

The ‘union’ method takes all the values from two or more sets and combines them into another set. The example below shows the use of ‘union’ with two sets, with names from the resulting set output to the console.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}

names3 = names1.union(names2)

for name in names3:
    print(name)

As sets contain only unique values, the names ‘Andrew’ and ‘Bob’ each appear only once in the resulting set.

George
Sam
Sally
Andrew
Bob
Fred

To use the ‘union’ method with more than two sets, the additional sets are separated by commas within the parentheses.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1.union(names2, names3)

for name in names4:
    print(name)

The resulting set now includes the unique values from all three sets.

Jeremy
Andrew
Bob
Craig
Sam
Fred
Sally
George
Robert

As an alternative to the ‘union’ method, the union operator, ‘|’, can be used. The above example can be rewritten as follows to produce the same result.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1 | names2 | names3

for name in names4:
    print(name)
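The method and the operator are not completely interchangeable. The ‘union’ method accepts any iterable as its argument, whereas the ‘|’ operator requires both sides to be sets. The sketch below illustrates this with a hypothetical list called ‘extra_names’; the flag ‘operator_worked’ only records the outcome.

```python
names1 = {"George", "Bob"}
extra_names = ["Sam", "Bob"]  # a list, not a set

# The method happily accepts the list.
combined = names1.union(extra_names)

# The operator raises a TypeError when one side is not a set.
try:
    names1 | extra_names
    operator_worked = True
except TypeError:
    operator_worked = False

print(sorted(combined))  # ['Bob', 'George', 'Sam']
print(operator_worked)   # False
```

The same distinction applies to the other methods and operators described below.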

Intersection

The ‘intersection’ method can be used to find the common values in two or more sets.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1.intersection(names2, names3)

for name in names4:
    print(name)

Here, the names ‘Andrew’ and ‘Bob’ are output to the console as these are the only names common to all three sets.

Andrew
Bob

As with ‘union’, there is an equivalent operator, ‘&’, that can be used instead of the ‘intersection’ method.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1 & names2 & names3

for name in names4:
    print(name)
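When the sets have nothing in common, the result is simply an empty set rather than an error. The sketch below uses two hypothetical sets, ‘group_a’ and ‘group_b’, with no shared names to show what that looks like.

```python
group_a = {"George", "Bob", "Fred", "Andrew"}
group_b = {"Sam", "Sally"}

common = group_a & group_b

print(common)       # set()
print(len(common))  # 0

# An empty set is treated as false in a condition.
if not common:
    print("No names in common")
```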

Difference

The ‘difference’ method returns a new set containing the values from the first set that are not present in any of the other sets passed to it.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1.difference(names2, names3)

for name in names4:
    print(name)

Only two names from the first set are not present in either of the other two.

George
Fred

Again, there is an equivalent operator, ‘-’, that can be used instead of the method.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}
names3 = {"Craig", "Andrew", "Robert", "Bob", "Jeremy"}

names4 = names1 - names2 - names3

for name in names4:
    print(name)
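Unlike union and intersection, the order of the sets matters for a difference, because the result only ever contains values from the set on the left. The sketch below swaps the two sets around to show this; ‘only_in_first’ and ‘only_in_second’ are names chosen for illustration.

```python
names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}

only_in_first = names1 - names2   # values in names1 but not in names2
only_in_second = names2 - names1  # values in names2 but not in names1

print(sorted(only_in_first))   # ['Fred', 'George']
print(sorted(only_in_second))  # ['Sally', 'Sam']
```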

Symmetric Difference

The ‘symmetric_difference’ method returns a new set containing the values that appear in exactly one of two sets, leaving out any values that appear in both.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}

names3 = names1.symmetric_difference(names2)

for name in names3:
    print(name)

Four names appear in only one of the two sets.

George
Sally
Sam
Fred

The equivalent operator that can be used instead of the ‘symmetric_difference’ method is ‘^’.

names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}

names3 = names1 ^ names2

for name in names3:
    print(name)
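One way to think about the symmetric difference is as the union of the two sets with their intersection taken away. The sketch below checks that this gives the same set as the ‘^’ operator; ‘by_operator’ and ‘by_hand’ are names chosen for illustration.

```python
names1 = {"George", "Bob", "Fred", "Andrew"}
names2 = {"Sam", "Sally", "Andrew", "Bob"}

by_operator = names1 ^ names2
by_hand = (names1 | names2) - (names1 & names2)

print(by_operator == by_hand)  # True
print(sorted(by_operator))     # ['Fred', 'George', 'Sally', 'Sam']
```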