A useful program usually needs to interact with the outside world. Such interaction can involve receiving data from or sending data outside the program. Data received from outside is called input data, while the data the program sends outside is called output data. Together, input and output operations are often referred to as I/O (Input/Output) operations.

How do you think you can provide inputs to a computer?

The input may come directly from the user via the keyboard, from an external file, or even from the internet. We can display output directly to the console, which we have been doing so far using the print() function. A program can also output data by storing it in a file or a database, or by sending it over the internet. Let's look into some basic I/O operations in Python.

Input and Output

For basic input and output operations, let's look at how we can get data from the user via the keyboard and output data to the console screen.

Getting User Data using input

Programs that require input data from the user can use the built-in function input(<prompt>). The input function reads a line of input from the keyboard. When a program reaches an input statement, Python pauses the program execution to allow the user to type a line of input. After the user types the characters and presses the Enter key, Python returns the typed characters as a string object.

Let's look at the usage of the input function.

>>> your_name = input()
Primer				 # User Input
>>> your_name        # User Input is stored in the name `your_name`
'Primer'

You can also include an optional prompt argument to the input() function. The prompt argument displays a text string as a prompt to the user before pausing the program execution to read input. We can rewrite the above code listing to include a prompt.

>>> your_name = input("May I know your name? ")
May I know your name? Primer    # Prompt is shown before the input
>>> your_name
'Primer'

The input function always returns a string. If you wish to get a numeric value from the input data, you will need to convert the string to an appropriate type. We can convert the string using built-in functions such as int or float.

>>> your_age = input("May I know your age ? ")
May I know your age ? 16
>>> your_age
'16'					# String
>>> int(your_age)
16						# Integer

We can convert the input directly to the given type by passing it to the built-in functions int or float.

>>> your_birth_year = int(input("May I know your year of birth ? "))
>>> your_birth_year
2001                # Integer

Take a look at the following code.

>>> options = ["A", "B", "C", "D"]
>>> choice = input()
1                    # User Input
>>> choice = __X__
>>> options[choice]
'B'

What is the value of __X__ for which the above code is correct?

  1. int(choice)
  2. int(input)
  3. int('B')
  4. None of the above

Let's understand the input function by creating a small program that asks the user to guess a random integer, providing hints along the way.

from random import randint            # To generate a random integer

secret_number = randint(1, 99)        # A random secret number

while True:
    try:
        guess = int(input("Guess a number between 1 and 99: "))
    except ValueError:                # In case user types strings
        print("Your guess must be a number")
    except KeyboardInterrupt:        # In case user quits abruptly
        print("Quitting the game. Thank you for playing.")
        break
    else:                            # Will run if no exceptions are raised
        if guess < secret_number:
            print("Your guess is low")
        if guess > secret_number:
            print("Your guess is high")
        if guess == secret_number:
            print("You got it correct. It is {}".format(secret_number))
            break
guessing_game.py

The output of the script guessing_game.py is shown below.

Guess a number between 1 and 99: 34
Your guess is high
Guess a number between 1 and 99: 25
Your guess is high
Guess a number between 1 and 99: 10
Your guess is low
Guess a number between 1 and 99: 11
You got it correct. It is 11
Output from guessing_game.py

Do you mind writing a small walk-through of guessing_game.py in your own words?

The guessing_game script is an excellent demonstration of how the input() function can create interactive Python programs. Let's take a look at how it works.

The guessing_game script works in the following way:

  • Python assigns a randomly generated integer between 1 and 99 to secret_number.
  • The while True loop keeps running until the user correctly guesses the secret_number or purposely quits the program by pressing Ctrl+C.
  • The try block gets the input and converts it to an integer using the int function.
  • If the user inputs anything other than a number, Python raises a ValueError, which is caught by the except handler.
  • The else clause only runs when the try block doesn't raise any exceptions.
  • If the user guesses correctly, the program ends.

Now that we have a bit of an idea about getting input data through the input function, let's do an exercise.

Take a look at the following code.

while True:
    try:
        mood = input("What's your mood : ")
        if mood.strip().lower() in ('q', 'exit'):
            print("Thank you for sharing.")
            break
        print(f"Aha ! You are {mood}")
    except KeyboardInterrupt:
        print("Thank you for sharing.")
        break

How can you quit the above script once you have started?

  1. Typing exit and pressing Enter.
  2. Typing Q and pressing Enter.
  3. Typing EXIT and pressing Enter.
  4. All of the above

Reading Command Line Arguments

When we execute a Python script using the command line, Python lets you access the provided command-line arguments using the sys module. To understand how, let's create a small Python script and save it as check_arguments.py.

The check_arguments.py script below is a small script that simply prints out sys.argv to the screen.

import sys
print(sys.argv)
check_arguments.py

If you execute the python script directly, you will receive the following output.

> python check_arguments.py
['check_arguments.py']            # Can be accessed by `sys.argv[0]`

sys.argv is a list containing the command-line arguments passed to a Python script. The first element is the script name; whether it is a full pathname or not depends on the operating system. Let's add some more arguments.

> python check_arguments.py hello world 1 2 3
['check_arguments.py', 'hello', 'world', '1', '2', '3']

All the arguments separated by spaces are presented as a Python list and can be accessed using their respective indexes.
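As an illustrative sketch, the hypothetical script sum_args.py below separates the word arguments from the numeric ones by indexing into such a list (the script name, function, and report format are our own inventions, not part of sys):

```python
import sys

def summarize_args(argv):
    """Build a short report from a list shaped like sys.argv."""
    script_name = argv[0]                                # First element: the script name
    numbers = [int(a) for a in argv[1:] if a.isdigit()]  # Numeric arguments
    words = [a for a in argv[1:] if not a.isdigit()]     # Everything else
    return f"{script_name} got words {words} and numbers summing to {sum(numbers)}"

if __name__ == "__main__":
    print(summarize_args(sys.argv))   # e.g. python sum_args.py hello 1 2 3
```

Passing the list to a function this way also makes the argument handling easy to test without actually running the script from a shell.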

Say we have the following script.

import sys
if len(sys.argv) == 4:
    _, dad, mom, son = sys.argv
    print(f"{dad} is married to {mom} and {son} is their son")

The output of the above script is shown below:

python3 happy_family.py __A__
Luffy is married to Boa and Toffee is their son

What is the value of __A__ in the command shown above?

  1. Luffy Boa Toffee
  2. Boa Luffy Toffee
  3. Toffee Luffy Boa
  4. Luffy Toffee Boa

Now, let's look into how we can read files in Python.

Reading and Writing Files

One common use of programs is to read and write files. Python provides the built-in function open() to open a file for reading as well as writing. The open() function returns a file object with various methods for reading and writing operations. We will look into the file object later. First, let's look into how we can use the open() function.

Opening Files

To start reading files in Python, let us create a plain text file named epictetus.txt and save the following text.

epictetus.txt

How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?

You have been given the principles that you ought to endorse, and you have endorsed them. What kind of teacher, then, are you still waiting for to refer your self-improvement to him?

You are no longer a boy but a full-grown man. If you are careless and lazy now and keep putting things off and always deferring the day after which you will attend to yourself, you will not notice that you are making no progress, but you will live and die as someone quite ordinary.

From now on, then, resolve to live as a grown-up who is making progress, and make whatever you think best a law that you never set aside. And whenever you encounter anything difficult or pleasurable, or highly or lowly regarded, remember that the contest is now: you are at the Olympic Games, you cannot wait any longer, and that your progress is wrecked or preserved by a single day and a single event.

That is how Socrates fulfilled himself by attending to nothing except reason in everything he encountered. And you, although you are not yet a Socrates, should live as someone who at least wants to be a Socrates.

In the same folder, start a Python interpreter. To read the epictetus.txt file in Python, use the following code listing.

>>> text_file = open("epictetus.txt")            # Open the file

If the above code runs and doesn't result in the FileNotFoundError exception, it means Python could successfully open the file.

Opening a file in Python doesn't mean the same thing as opening it in a text editor. The open function returns a file-object, an iterator that knows how to read the text file's data.

How do we test if an object is an iterator?

To check if the text_file is an iterator, let's pass the object to the next() function.

>>> next(text_file)
'How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?\n'

The above code illustrates that the text_file object is an iterator. We will look into more of its methods later on.

The first thing you should know after learning to open a file is how to close the file. In Python, we can close the file using the close() method available on the file-object.

>>> text_file.close()                # File is closed

Once you close the file, invoking next() on the file-object will raise a ValueError.

>>> next(text_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.

In Python, you should always close the file you open.

When the script or application is terminated, Python closes the file on its own. However, there is no guarantee when that will happen. In the meantime, this can cause some unwanted behavior, including resource leaks.

It is in your best interest to always make sure that your code behaves in a way that is well-defined while trying to reduce unwanted behavior.

When you are working with files, there are two main ways that you can use to ensure that the files are closed properly, even when encountering an error.

The first way is to use the try-finally clause to work with files.

text_reader = open('epictetus.txt')
try:
    print(text_reader.readline())      # Operations using the file-object
finally:
    text_reader.close()                # Always executes

In the previous code, we are using the try-finally clause to close an opened file. What happens in the finally clause when the try block encounters an error?

Even if Python encounters an error, it always executes the finally clause. This block ensures that Python always closes the file properly after executing the statements.

The second way to close the file properly is to use the with statement to open a file.

with open('epictetus.txt') as text_reader:
    print(text_reader.readline())      # Operations using the file-object

After Python executes the code inside the with statement, the with statement automatically closes the file.

Similar to the finally clause, the with statement closes files even when Python encounters an error.

Which of the two ways of opening files do you like better, and why?

Although you can use either of the two methods, we recommend using the with statement, as it allows for cleaner code while handling any unexpected errors.

Next, let's look at the file objects.

File Objects

The open() function takes an optional argument mode, specifying how the file is opened, and returns a file-object. The default mode is r, which means open for reading in text mode. The available modes are shown in Table 1.

Table 1: Modes in the open() function
Character    Meaning
r            open for reading (default)
w            open for writing, truncating the file first
rb or wb     open for reading or writing in binary mode

When we earlier opened a file without specifying the mode, Python assumed we wanted to open in r reading mode. Let's read the contents of the file epictetus.txt in Python.

>>> with open('epictetus.txt') as text_reader:    # Same as open('epictetus.txt', 'r')
...     for line in text_reader:
...             line
'How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?\n'
'\n'
...
#### Output Shortened

As the text_reader file-object is an iterator, we can use the for statement to loop over the object. The official documentation describes the file-object as follows:

An object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource. Depending on how it was created, a file object can mediate access to a real on-disk file or another type of storage or communication device (for example, standard input/output, in-memory buffers, sockets, or pipes.). File objects are also called file-like objects or streams.

There are three types of file-objects:

  • raw binary files
  • buffered binary files
  • text files

For this course, we will focus only on file-objects of the text-file type.

But what do you think raw binary files are?

Binary files are files whose format is made up of non-readable characters, usually stored in binary form. Binary files range from image files like JPEGs or GIFs, to audio files like MP3s, to binary document formats like PDF. In Python, files are opened in read mode by default. Now, let's take a look at how to read files.
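As a minimal, self-contained sketch (the file name sample.bin and the bytes are made up for illustration), we can write a few raw bytes with the wb mode and read them back with rb. Note that binary-mode reads return bytes rather than str:

```python
# Write a few raw bytes, then read them back in binary mode.
data = bytes([0x89, 0x50, 0x4E, 0x47])            # First four bytes of a PNG header

with open("sample.bin", "wb") as binary_writer:   # 'wb': write in binary mode
    binary_writer.write(data)

with open("sample.bin", "rb") as binary_reader:   # 'rb': read in binary mode
    header = binary_reader.read(4)                # read() returns bytes, not str

print(header)        # b'\x89PNG'
print(type(header))  # <class 'bytes'>
```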

Reading Files

The file objects have methods relating to reading operations for files.

Table 2: File-object methods
Method         Description
read()         Reads and returns the entire file
readable()     Returns whether the object was opened for reading
readline()     Reads and returns a line
readlines()    Returns a list object with lines from the file

read

The read(size=-1) method reads at most size characters from the file. If you pass no argument, or pass None or -1, the entire file is read and returned as a string.

>>> with open('epictetus.txt') as reader:
...     print(reader.read(30))
...
How long are you going to wait

In the above code listing, we pass 30 as the size argument to specify the number of characters to return.

readable

To check whether we opened the file for reading, we can use the readable() method, which returns either True or False.

>>> with open('epictetus.txt', 'w') as reader:    # Open in `write` mode
...     reader.readable()
...
False

readline

The readline(size=-1) method reads at most size characters from the current line. Python reads the entire line if size is -1, None, or not provided.

>>> with open('epictetus.txt') as reader:
...     reader.readline()
...
'How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?\n'

readlines

The readlines method reads all remaining lines from the file object and returns them as a list object.

>>> with open('epictetus.txt') as reader:
...     reader.readlines()
...
['How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?\n', ..., 'That is how Socrates fulfilled himself by attending to nothing except reason in everything he encountered. And you, although you are not yet a Socrates, should live as someone who at least wants to be a Socrates.']        # List Truncated

For which value of __A__ does the following code return a list object?

>>> with open('epictetus.txt') as reader:
...     type(reader.__A__())
...
<class 'list'>
  1. readlines
  2. readline
  3. readable
  4. read

Iterating over lines

As we mentioned earlier, the file-object is an iterator. So, we can read lines using the iterator itself. We can use a for loop to iterate over the file-object iterator to read the lines.

>>> with open('epictetus.txt') as reader:
...     for line in reader:
...             print(line)
...
How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?

### Lines Truncated

What's the output of the following code?

try:
    reader = open('epictetus.txt', 'w')
    s = ""
    for line in reader:
        s += line
    print(len(s))
finally:
    reader.close()

  1. 1238
  2. 1233
  3. 1235
  4. Raises Error

As we opened the file in writing mode by passing the w mode, Python raises an error when we try to read from it.

Now that we have covered how to read files, let's look into writing into files.

Writing Files

We can write into a file only when we open it in the write mode. If you open a file in the write mode, Python truncates it, removing all the previously stored content. Table 3 lists the methods available for writing to the file.
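A quick sketch (using a throwaway file named scratch.txt) shows that reopening a file in the w mode discards its previous contents:

```python
# Opening an existing file in 'w' mode truncates it: the old contents are gone.
with open("scratch.txt", "w") as writer:
    writer.write("first version\n")

with open("scratch.txt", "w") as writer:   # Reopening in 'w' empties the file first
    writer.write("second version\n")

with open("scratch.txt") as reader:
    contents = reader.read()

print(contents)   # second version
```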

Table 3: Methods available in the write mode
Method               Description
write(text)          Writes text to the stream
writelines(lines)    Writes the sequence lines to the file
writable()           Returns whether the object was opened for writing

write

The write(text) method can be used to write strings to a file. The following code listing illustrates writing to a file. Let's add a serial number to each line in epictetus.txt by writing into a new file numbered_epictetus.txt.

>>> with open('epictetus.txt') as reader:
...     text = reader.readlines()
>>> with open('numbered_epictetus.txt', 'w') as writer:
...        for index, line in enumerate(text):
...            writer.write("{}. {}".format(index + 1, line))
writer.py

The output of writer.py is shown below.

1. How long are you going to wait before you demand the best for yourself and, in no instance, bypass the discriminations of reason?
2. 
3. You have been given the principles that you ought to endorse, and you have endorsed them. What kind of teacher, then, are you still waiting for to refer your self-improvement to him?
4. 
...
Output of writer.py

In the above code,

  • We read all lines from epictetus.txt and name the resulting list text.
  • Then, we open a new file, numbered_epictetus.txt, in the write mode and write into it using the write() method.
  • If you check the directory, you can see that Python creates a new file, numbered_epictetus.txt, with each line's number listed in the first few characters of the line.

We can also work with two different files at the same time while using with statements.

>>> with open('epictetus.txt') as reader, open('numbered_epictetus_2.txt', 'w') as writer:
...        for index, line in enumerate(reader.readlines()):
...            writer.write("{}. {}".format(index + 1, line))

The above code listing produces the same file as we saw earlier.

You might notice that the output numbers the blank lines as well. We need to remove the blank lines.

with open('epictetus.txt') as __A__, open('numbered_epictetus_2.txt', 'w') as __B__:
    text = [line + "\n" for line in reader.readlines() if line != "\n"]

    for index, line in enumerate(__C__):
        writer.write(f"{index+1}.{line}")

The above code removes the lines that contain only a newline character and appends an additional newline to each remaining line for spacing between paragraphs. What are the values of A, B, and C?

  1. A: reader, B: writer and C: text
  2. A: reader, B: writer and C: reader.readlines()
  3. A: writer, B: reader and C: text
  4. A: text, B: writer and C: reader

Similar to the readlines method for reading files, file objects also have writelines. Let's take a look.

writelines

The writelines method writes sequences to a file. Let's look into an example to understand more.

>>> with open('squares_and_cubes.txt', 'w') as writer:
...     a = [(x, x**2, x**3) for x in range(10)]
...     writer.writelines(["The square and cube of {} are {} and {} respectively.\n"
...               .format(x, y, z) for x, y, z in a])

In the above code listing, we pass a list constructed by list comprehension as an argument to the writelines method. Once you have executed the code above and checked the directory, you will find a text file squares_and_cubes.txt with the following text.

The square and cube of 0 are 0 and 0 respectively.
The square and cube of 1 are 1 and 1 respectively.
The square and cube of 2 are 4 and 8 respectively.
The square and cube of 3 are 9 and 27 respectively.
...

The writelines and write methods are the two main ways of writing text to files in Python. Can you think of the main difference between these two methods?

The write() method accepts only a string argument, while the writelines() method takes a sequence of strings.
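A small sketch (with made-up file names) shows that writelines(lines) behaves roughly like write("".join(lines)); note that neither method adds newline characters for you:

```python
lines = ["alpha\n", "beta\n", "gamma\n"]

# writelines writes each string in the sequence, adding no newlines of its own
with open("via_writelines.txt", "w") as writer:
    writer.writelines(lines)

# write takes a single string, so we join the sequence first
with open("via_write.txt", "w") as writer:
    writer.write("".join(lines))

with open("via_writelines.txt") as a, open("via_write.txt") as b:
    same = (a.read() == b.read())

print(same)   # True: both files end up identical
```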

Python can also work with files such as csv files, which we can open in spreadsheet software such as Microsoft Excel, Google Sheets, or LibreOffice Calc. Let's look into how we can do that.

Reading and Writing CSV

A csv file is a type of plain text file that uses a special formatting structure to represent tabular data. CSV stands for Comma-Separated Values.

Let's look at an example of a csv file. In a csv file, each line corresponds to one row of the table, while a comma separates the cells. Because a comma separates each piece of data, the comma is called the delimiter. You can use other delimiters as well, such as a hyphen - or a space.

Suppose we have a table of data shown in the table 4.

Table 4: Sample Data
Name           City        Age
John           Berlin      45
Mark           London      34
Liu            Shanghai    45
Balakrishna    Chennai     33
Sofia          Istanbul    34


We can represent the tabular data in csv format as follows.

person_data.csv

name,city,age
John,Berlin,45
Mark,London,34
Liu,Shanghai,45
Balakrishna,Chennai,33
Sofia,Istanbul,34

Save the file with the text shown above with the name person_data.csv and start an interpreter in the same directory.

Reading CSV File

The csv module exposes a handy function reader(<file-object>) which can parse a given csv file. Let's see it in action.

>>> import csv
>>> with open('person_data.csv') as csv_file:
...        csv_reader = csv.reader(csv_file)
...        print(list(csv_reader))
[['name', 'city', 'age'], ['John', 'Berlin', '45'], ['Mark', 'London', '34'], ['Liu', 'Shanghai', '45'], ['Balakrishna', 'Chennai', '33'], ['Sofia', 'Istanbul', '34']]

In the above code listing,

  • We open the file person_data.csv using the with statement as csv_file.
  • To parse the csv file, we pass csv_file to the reader() function of the csv module.
  • The reader() function returns an iterator, which upon each iteration returns a row of the csv file.

Let's parse and store the data in a Python dictionary using dictionary comprehension.

>>> from csv import reader
>>> with open('person_data.csv') as csv_file:
...    csv_reader = reader(csv_file)
...    person_dict = {
...        name:{"city" : city, "age": int(age)}     # Convert the age to integer
...        for index, (name, city, age) in enumerate(tuple(csv_reader))
...        if index != 0 # Skip first line
...    }
>>> person_dict
{'John': {'city': 'Berlin', 'age': 45}, 'Mark': {'city': 'London', 'age': 34}, 'Liu': {'city': 'Shanghai', 'age': 45}, 'Balakrishna': {'city': 'Chennai', 'age': 33}, 'Sofia': {'city': 'Istanbul', 'age': 34}}

Parsing csv using Python is pretty useful, especially when working with a large amount of data.
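Because reader() returns an iterator, we can also process one row at a time without building the full list in memory, which matters for large files. The sketch below recreates a shortened, two-row person_data.csv so it is self-contained, then scans for the oldest person:

```python
import csv

# Recreate a shortened person_data.csv so this sketch is self-contained
with open("person_data.csv", "w", newline="") as f:
    f.write("name,city,age\nJohn,Berlin,45\nMark,London,34\n")

oldest = ("", 0)
with open("person_data.csv", newline="") as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)                      # Skip the header row
    for name, city, age in csv_reader:    # One row at a time, never the whole file
        if int(age) > oldest[1]:
            oldest = (name, int(age))

print(oldest)   # ('John', 45)
```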

The following script generates text from the person_data.csv file we earlier created.

import csv

with open('person_data.csv') as csv_file:

    lines = [f"{age} year old {name} is from {city}"
                for (name, city, age) in list(__A__)[1:]]        # A

    for line in lines:
        print(line)        # Prints in the form '45 year old John is from Berlin ...'

What is the value of __A__ in the above code?

  1. csv.reader(csv_file)
  2. csv_file
  3. reader(csv_file)
  4. writer(csv_file)

Apart from the reader function, the csv module has a DictReader function, which helps read csv files and store them directly as Python dictionaries. Let's look into it in the next section.

Reading CSV using DictReader

The DictReader(fileobject, fieldnames) maps the information in each row to a dictionary whose keys are provided to the function using an optional fieldnames parameter. The fieldnames parameter is a sequence. If you omit the fieldnames, Python uses the first row of the file-object values as the fieldnames.

Let's read the person_data.csv using the DictReader function of the csv module.

>>> from csv import DictReader
>>> with open('person_data.csv') as csv_file:
...        csv_reader = DictReader(csv_file)
...        for row in csv_reader:
...            print("{} year old {} belongs to {}"
...                  .format(row["age"], row["name"], row["city"]))
45 year old John belongs to Berlin
34 year old Mark belongs to London
45 year old Liu belongs to Shanghai
33 year old Balakrishna belongs to Chennai
34 year old Sofia belongs to Istanbul

In the above code listing, the csv_reader is an iterator resulting from DictReader. Python references each item in a row using its column name. We have not provided the optional fieldnames parameter.

Let's provide a fieldnames parameter to the DictReader function.

>>> with open('person_data.csv') as csv_file:
...     header = next(csv_file)        # Skip first row
...     csv_reader = DictReader(csv_file, fieldnames=["Nom", "Ville", "Âge"])
...     print(list(csv_reader))
[{'Nom': 'John', 'Ville': 'Berlin', 'Âge': '45'}, {'Nom': 'Mark', 'Ville': 'London', 'Âge': '34'}, {'Nom': 'Liu', 'Ville': 'Shanghai', 'Âge': '45'}, {'Nom': 'Balakrishna', 'Ville': 'Chennai', 'Âge': '33'}, {'Nom': 'Sofia', 'Ville': 'Istanbul', 'Âge': '34'}]

When we passed the fieldnames parameter with our custom column names, the DictReader function generates dictionary objects with the specified keys.

Note that we have to skip the first row manually; otherwise, it would have been added to the dictionary object as well.

What is the value of __A__ in the code below?

>>> from csv import DictReader
>>> with open('person_data.csv') as csv_file:
...     header = next(csv_file)        # Skip first row
...     csv_reader = DictReader(csv_file, fieldnames=__A__)    # A
...     for row in csv_reader:
...         print(f"{row['age']} year old {row['first_name']} is from {row['location']}")
45 year old John is from Berlin
34 year old Mark is from London
45 year old Liu is from Shanghai
33 year old Balakrishna is from Chennai
34 year old Sofia is from Istanbul
  1. ["first_name", "location","age"]
  2. ["age", "name","location"]
  3. ["location", "first_name","age"]
  4. ["name", "city","age"]

We can also use the csv module to write csv files in Python. Let's take a look.

Writing a Python List to csv file

Let's say we have the following list of data.

>>> person_details = [
    ['John Doe', 'Berlin', 'Germany', 45],
    ['Mark Waugh', 'London', 'United Kingdom', 34],
    ['Liu Xi', 'Shanghai', 'China', 45],
    ['Balakrishna Ram', 'Chennai', 'India', 33],
    ['Sofia Khan', 'Istanbul', 'Turkey', 34]
]

To save the list object as a csv file, we can use the writer function from the csv module.

>>> with open('person_details.csv', mode='w', newline='') as csv_file:    # newline='' avoids extra blank rows on Windows
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['firstname', 'lastname', 'city', 'country', 'age'])
        for person in person_details:
            csv_writer.writerow([
                person[0].split()[0],  # Firstname
                person[0].split()[1],  # Lastname
                person[1],             # City
                person[2],             # Country
                person[3]              # Age
                ]
            )

The writer function takes the file-object as an argument and returns a writer object exposing a method writerow, which writes a single row to the csv file.

In the above code, we first write the column names in the first row. It is common practice to write the column names in the first row of a csv file. Then, for each item in person_details, we write a row to the csv file.

If you can successfully run the code, you should see a file resembling below.

firstname,lastname,city,country,age
John,Doe,Berlin,Germany,45
Mark,Waugh,London,United Kingdom,34
Liu,Xi,Shanghai,China,45
Balakrishna,Ram,Chennai,India,33
Sofia,Khan,Istanbul,Turkey,34

If you can generate such a file, you have successfully written a Python list object to a csv file.

Let's say we need the data in the csv file in the following format.

Name, Location, Age
John Doe, Berlin-Germany, 45 years old
...

The following code achieves the same.

import csv
person_details = [
    ['John Doe', 'Berlin', 'Germany', 45],
		... ]		# Shortened for brevity

with open('person_details.csv', mode='w') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Name', 'Location', 'Age'])
    for person in person_details:
        csv_writer.writerow([
            person[0],    # Name
            __A__,        # City - Country (A)
            person[3]     # Age
        ])

What is the value of __A__?

  1. f"{person[1]}-{person[2]}"
  2. person[1]
  3. f"{person[2]}-{person[1]}"
  4. person[2]

The writerow method is useful if you want to write sequences of data as rows into a csv file. If you wish to write a sequence of dict objects into csv, the module offers another function. Let's take a look.

Writing a Python Dict to csv file

Let's say we have a list of dictionary objects, as shown below.

>>> persons_info = [{'firstname': 'John',
  'lastname': 'Doe',
  'city': 'Berlin',
  'country': 'Germany',
  'age': 45},
 {'firstname': 'Mark',
  'lastname': 'Waugh',
  'city': 'London',
  'country': 'United Kingdom',
  'age': 34},
 {'firstname': 'Liu',
  'lastname': 'Xi',
  'city': 'Shanghai',
  'country': 'China',
  'age': 45},
]

We can write the dictionary list object to a csv file using the DictWriter function.

>>> from csv import DictWriter
>>> with open('persons_info.csv', 'w', newline='') as csv_file:
...     fieldnames = ['firstname', 'lastname', 'city', 'country', 'age']
...     writer = DictWriter(csv_file, fieldnames=fieldnames)
...     writer.writeheader()
...     for person in persons_info:
...         writer.writerow(person)

The DictWriter function accepts a file-object and fieldnames as parameters. DictWriter uses the fieldnames argument to determine the order of the keys and values from the Python dictionaries and to write the header row. The above code generates the following csv file.

persons_info.csv

firstname,lastname,city,country,age
John,Doe,Berlin,Germany,45
Mark,Waugh,London,United Kingdom,34
Liu,Xi,Shanghai,China,45

We write the following script to generate a csv file.

from csv import DictWriter

pirates = [{"captain": "Luffy", "name": "Zorro", "group": "Strawhat"}]

with open('pirates_info.csv', 'w') as csv_file:
    fieldnames = ['name', 'captain', 'group']
    writer = DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()

    for pirate in pirates:
        writer.writerow(pirate)

What's the order of the values in the data row of the generated csv file?

  1. Luffy, Zorro, Strawhat
  2. Strawhat, Zorro, Luffy
  3. Zorro, Luffy, Strawhat
  4. Strawhat, Luffy, Zorro

As mentioned before, we use the files in the csv formats primarily to exchange data between different applications. There is another commonly used data exchange format called JSON. Let's take a look next.

Reading and Writing json

JSON is a data format widely used to exchange data. JSON stands for JavaScript Object Notation and was inspired by a subset of the JavaScript programming language.

JSON has become a language-agnostic or language-independent data format due to its simplicity. Like files written in csv format, files written in JSON format are both readable by machines and humans. JSON is a format that encodes objects in a string.

Serialization is the process of encoding data into the JSON format, while decoding JSON data back is called deserialization.

Say we have a Python dictionary object.

{"foo": [1, 2, 3], "bar": "Hello"}

You can serialize it into a string in the JSON format.

'{"foo": [1, 2, 3], "bar": "Hello"}'

This string can be stored or sent anywhere using the internet. The receiver can then recover the underlying data by deserialization. For instance, we can decode the above string in JavaScript in the following way. You can try it out in the browser console.

// Try out in your browser console.
> JSON.parse('{"foo": [1, 2, 3], "bar": "Hello"}')
foo: (3) [1, 2, 3]
bar: "Hello"

The Python list object becomes the equivalent array object in JavaScript.

As you can see, the JSON format is quite useful in exchanging data. How does it differ from csv?

CSV and JSON are both forms of structured data.

The primary difference is that CSV is a flat data format: you only need two values, the row number and the column number, to locate any value in the file.

JSON is a hierarchical data format. This means values can be nested underneath each other, so you may need to follow several keys to reach a particular value in the file.
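To make the difference visible, here is a small sketch (the data is invented for illustration): a csv-like table is addressed by row and column numbers, while a nested JSON value is reached by following a chain of keys.

```python
import json

# Flat: a csv-like table addressed by (row, column)
rows = [['firstname', 'city'], ['John', 'Berlin'], ['Liu', 'Shanghai']]
print(rows[2][1])                       # row 2, column 1 -> Shanghai

# Hierarchical: a JSON document addressed by a chain of keys
doc = json.loads('{"person": {"name": "John", "address": {"city": "Berlin"}}}')
print(doc['person']['address']['city'])  # follow three keys -> Berlin
```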

Python has a built-in module, json, which can be used for encoding and decoding the JSON format. Let's look into how we can serialize data.

Serialize Data in JSON

Python's json module has the following functions to serialize data in the JSON format.

Table 5: Functions for serializing data using json module
Function Description
json.dumps(obj) Serialize a Python object to a JSON formatted str
json.dump(obj, fp) Serialize a Python object to a file-like object fp

The json module encodes Python objects to the corresponding JSON types. By default, Python converts objects using the conversion table shown in Table 6.

Table 6: Conversion from Python Objects to JSON
Python JSON
dict object
list, tuple array
str string
int, float number
True true
False false
None null

You can notice in table 6 above that each Python object is converted to a corresponding JSON type. Why do you think the JSON format defines its own data types?

You might recall that JSON is a data exchange format. Most programming languages have their own implementations of lists, dictionaries, and Booleans.

Therefore, to exchange data between programs written in two or more languages, it makes sense to convert them to a common data format (i.e., JSON format) for convenience.

We will start working with the JSON format by converting Python objects into JSON format. Let's look into how we can write JSON to a string.

Writing json to a string using dumps

The dumps function lets you encode a Python object directly into the JSON formatted string.

>>> import json
>>> a = {"foo": (1, 2, 3, None, True), "bar": "Hello"}
>>> json.dumps(a)
'{"foo": [1, 2, 3, null, true], "bar": "Hello"}'

We can see in the above code that the dumps function converts Python objects to the respective JSON types.

For larger JSON strings, it is often useful to print with indenting for better readability, since JSON is a hierarchical data format. Printing with auto-indenting is called pretty printing because, well, it looks pretty.

Let's convert to a JSON and pretty-print the JSON.

>>> person_details
{'name': 'John Doe', 'age': 44, 'family': ['Jane Doe', 'Another Doe', 'Yet Another Doe'], 'preferences': {'color': 'blue', 'music': 'country folk'}}
>>> print(json.dumps(person_details))                # Without indenting
{"name": "John Doe", "age": 44, "family": ["Jane Doe", "Another Doe", "Yet Another Doe"], "preferences": {"color": "blue", "music": "country folk"}}

The json.dumps() function takes an optional argument indent and prints the JSON array elements and object members at the particular indent.

>>> print(json.dumps(person_details, indent=4))        # Pretty print
{
    "name": "John Doe",
    "age": 44,
    "family": [
        "Jane Doe",
        "Another Doe",
        "Yet Another Doe"
    ],
    "preferences": {
        "color": "blue",
        "music": "country folk"
    }
}

Take a look at the following code listing.

>>> import json
>>> gift = __A__(name="RC Car", qty=4)
>>> json.dumps(gift)
'{"name": "RC Car", "qty": 4}'

What is the value of __A__?

  1. dict
  2. tuple
  3. list
  4. set

We looked at how we can generate JSON-formatted strings. Let's look at how we can directly write JSON-formatted strings to a file.

Writing json to a file using dump

The dump function from the json module lets us encode an object as a JSON formatted stream to a file-like object that supports the write operation.

>>> import json
>>> a = {"foo": (1, 2, 3, None, True), "bar": "Hello"}
>>> with open('sample.json', 'w') as writer:
...     json.dump(a, writer)

The above results in creating a file sample.json in the same directory with the following text.

{"foo": [1, 2, 3, null, true], "bar": "Hello"}

The dump and dumps functions use the same conversion table shown in table 6 to convert Python objects to the corresponding JSON format. They also take additional arguments, which we can see in the Python documentation[1] of the module.
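For instance, two of those optional arguments are sort_keys and separators. The following sketch shows their effect:

```python
import json

data = {"b": 1, "a": 2}

# sort_keys orders the object members alphabetically
print(json.dumps(data, sort_keys=True))          # {"a": 2, "b": 1}

# separators produces a compact encoding without spaces
print(json.dumps(data, separators=(',', ':')))   # {"b":1,"a":2}
```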

Let's say we wish to generate the following json file.

{
  "foo": [1, 2, 3, null, true],
  "bar": "Hello"
}

We have written the following code listing, which generates the above file.

import json
sample = {"foo": (1, 2, 3, None, True), "bar": "Hello"}
with open('sample2.json', 'w') as writer:
	json.dump(__A__)

What's the value of A in the above code sample?

  1. sample, writer, indent=2
  2. writer, indent=4, sample
  3. sample2, writer, indent=2
  4. writer, sample, indent=2

We have seen how to serialize Python objects to JSON data format. Let's see how we deserialize JSON objects to obtain python objects.

Deserialize Data in JSON

We can use the functions shown in table 7 to deserialize JSON formatted strings or JSON files.

Table 7: Functions for deserializing JSON formatted strings or files.
Function Description
json.loads(s) Deserialize string s to a Python object
json.load(fp) Deserialize a file-like object fp to Python object

By default, loads and load use the conversion table shown in table 8 to construct Python objects from a JSON formatted file or string.

Table 8: Conversion table from JSON to Python Object
JSON Python
object dict
array list
string str
number(int) int
number(real) float
true True
false False
null None

Compare the deserialization table above with the serialization table shown in table 6. Do you find anything particularly odd?

When we convert a tuple object into JSON, it is converted into an equivalent JSON array object. However, when we convert back the same JSON array into Python, it is converted to a list object instead of a tuple.

Now, let's look into deserialization in a bit more detail.

Deserialize objects from a json string using loads

Let's dump a Python object to a JSON formatted string and load it back using the loads function.

>>> import json
>>> b = {"foo" : [(1.25, 2.4, 3/5), False]}
>>> c = json.dumps(b)        # '{"foo": [[1.25, 2.4, 0.6], false]}'
>>> json.loads(c)
{'foo': [[1.25, 2.4, 0.6], False]}

The loads function converts a JSON formatted string to a Python object using the JSON-to-Python conversion table shown in table 8.

Note that the loads function converts the JSON array to a list object, not back to the original tuple object.
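If the receiving code really needs the original tuple, it has to convert the deserialized list back explicitly; a minimal sketch:

```python
import json

restored = json.loads('{"foo": [1.25, 2.4, 0.6]}')
restored['foo'] = tuple(restored['foo'])   # convert the list back to a tuple
print(restored)                            # {'foo': (1.25, 2.4, 0.6)}
```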

What's the output of the following code?

>>> import json
>>> a = {"foo": [1/3, 2/3, 3/3]}
>>> b = json.dumps(a)
>>> c = json.loads(b)
>>> c == a
  1. True
  2. False

We have learned to deserialize objects from a JSON-formatted string. Let's look into how to deserialize objects from a json file.

Deserialize objects from a json file using load

We can also deserialize python objects directly from a JSON file using the load function. To illustrate, let's create and store a JSON file.

>>> import json
>>> person_details
{'name': 'John Doe', 'age': 44, 'family': ['Jane Doe', 'Another Doe', 'Yet Another Doe'], 'preferences': {'color': 'blue', 'music': 'country folk'}}
>>> with open('person_details.json', 'w') as writer:
...     json.dump(person_details, writer)
...

The json.dump() function serializes the Python object into the file person_details.json. Now, let's load it back and print it on the screen using the json.load() function.

>>> import json
>>> with open('person_details.json') as reader:
...     loaded = json.load(reader)
>>> loaded
{'name': 'John Doe', 'age': 44, 'family': ['Jane Doe', 'Another Doe', 'Yet Another Doe'], 'preferences': {'color': 'blue', 'music': 'country folk'}}

In the above code listing, we deserialize the Python object from the JSON formatted file and name it loaded so we can access it later. As you can see, this can be useful in many scenarios.

There are additional features of the json module that you can read about in the official Python documentation.

In this section, we covered how to write and read files using Python. Next, let's look into how to work with files and directories in Python.

Working with Files and Directories

In this topic, we will look into working with files and directories using Python. Can you think of various operations that we can perform on files and directories?

There are several operations that we perform on files and directories. However, in this topic, we will learn in detail only a few operations.

The following is a list of useful functionalities related to the files and directories that we can perform using Python.

  1. Getting the current working directory
  2. Listing contents of a directory
  3. Creating new directories
  4. Deleting Files and Directories
  5. Copying, Moving and Renaming Files and Directories

Let's start looking into how we can achieve each one of them using Python. Before we do, can you define what a directory is?

Simply put, a directory is a file that acts as a folder for other files and can also contain other directories.

If you run the Python interpreter in a directory, that directory is referred to as the current working directory (cwd) of Python.

Let's look at how we can get the current working directory in Python.

Getting Current Working Directory

We can obtain the current working directory using the getcwd() function from the os module.

>>> import os
>>> os.getcwd()
'/home/primer'            # Output on Unix based OS

The getcwd() function returns a string representing the current working directory. The result will be different for you, depending on your underlying operating system. The file system on Windows differs from that of a Unix-based operating system. Operating systems such as Ubuntu, Arch, Debian, and even macOS are Unix-based operating systems.

A Windows-based OS stores files in folders under different data drives such as C:, D:, or E:. A Unix-based OS is ordered in a tree structure, starting with the root directory denoted by /.

Unix-based OSes use the forward slash / to separate directories, while Windows uses the backslash \. Linux creates a user's home directory at /home/<name>, while on Windows it is usually C:\Users\<name>.
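Because the separator differs between the two families of operating systems, it is safer to build paths with os.path.join, which inserts the separator of the OS the program is running on:

```python
import os.path

# os.path.join inserts the correct separator for the current OS
path = os.path.join('home', 'primer', 'my_directory')
print(path)    # 'home/primer/my_directory' on Unix, 'home\\primer\\my_directory' on Windows
```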

Take the following code listing. What's the output of the last statement?

>>> import os
>>> os.getcwd()
'/home/primer'
>>> len(getcwd().split('/'))
  1. 3
  2. 2
  3. 1
  4. Raises NameError

I hope you got some idea about the different file-systems in different families of OSs. Next, let's look into how to list the content of a directory.

Listing contents of a Directory

We will be working from a directory, my_directory, which has the following contents in it.

my_directory/
|
├── my_world/
|   └── details.json
|
├── expenses/
|   ├── weekly_expenses.csv
|   └── months_expenses.csv
|
└── automating_boring_work.py

Create an empty folder named my_directory and create some files and folders, as shown in the above structure. Start an interpreter inside the folder my_directory.

>>> import os
>>> os.getcwd()
'/home/primer/my_directory'            # On Unix-based

The rest of the topic will assume that you are working in a Python interpreter started from this particular directory.

To list out contents of the current directory, we can use

  • os.listdir(path='.')
  • os.scandir(path='.')

Let's look at each of the functions starting with os.listdir()

Listing directory using os.listdir()

To get the list of files and folders in the current working directory, we can use the os.listdir() function by supplying it with a path string. The os.listdir() returns a list containing entries in the directory given by the path argument.

>>> import os
>>> current_directory = os.getcwd()            # '/home/primer/my_directory'
>>> os.listdir(current_directory)
['automating_boring_work.py', 'my_world', 'expenses']

The os.listdir() has default argument path=".".

The operating system uses the . character to refer to the current directory. At the same time, we can use the .. characters to refer to the parent directory of the current directory.

If we call os.listdir() without any arguments, we will get the same result as above.

>>> import os
>>> os.listdir()        # Same as os.listdir('.')
['automating_boring_work.py', 'my_world', 'expenses']

What will be the output of the following code listing?

>>> import os
>>> os.listdir('.')
['automating_boring_work.py', 'my_world', 'expenses']
>>> os.listdir('..')
  1. Will show entries in the folder primer
  2. Will show entries inside the folder my_directory
  3. Will show entries inside the folder home
  4. Will raise an error

As I mentioned previously, .. is used to go back to the parent directory.

Now that we can list entries in a directory, there are two main ways to list entries of directories present outside or inside the current directory. Let's take a look.

Suppose we want to access the my_world directory.

  • Absolute Path
    The directory is referenced starting from the root folder, such as '/home/primer/my_directory/my_world'
  • Relative Path
    We can reference the file starting from its position relative to the current directory: ./my_world. We can even omit the ./ and reference it simply as my_world, since the directory is a subdirectory of the current directory.

Let's see if both of these give the same result.

>>> import os
>>> os.listdir('/home/primer/my_directory/my_world')        # Absolute Path
['details.json']
>>> os.listdir('./my_world')                                # Relative Path #1
['details.json']
>>> os.listdir('my_world')                                    # Relative Path #2
['details.json']

We can see that both absolute and relative paths provide the same output.

What's the output of the following code?

>>> import os
>>> os.getcwd()
'/home/primer/my_directory'
>>> os.listdir(os.getcwd() + '/' + 'expenses')
  1. ['months_expenses.csv', 'weekly_expenses.csv']
  2. ['weekly_expenses.csv', 'months_expenses.csv']
  3. ['details.json']
  4. Raises Error

Another function that also allows us to list our directories is the scandir from the os module. Let's take a look.

Listing directory using os.scandir()

The os.listdir() function returns a list of items, which can be slow for many operations. A new function, os.scandir(), was introduced in Python 3.5, which returns an iterator instead of a list.

For instance,

>>> import os
>>> os.scandir()        # Current Directory
<posix.ScandirIterator object at 0x7ff2ce925bd0>     # Iterator

The os.listdir() function simply returns the entry names, while os.scandir() returns attributes of the entries as well. The scandir iterator yields an os.DirEntry object for each file or directory entry, which provides information about the entry.

Earlier, we used the open() function with the with statement and mentioned that the with statement automatically closes the file after use. This is because the object returned by open() supports the context manager protocol. When an object supports the context manager protocol, it automatically frees up the resources it holds when the with block is exited.

The os.scandir() function supports the context manager protocol and therefore can be used with the with statement. We will learn more about the context manager protocol in later courses.

Let's take a look at the os.DirEntry objects.

>>> import os
>>> for entry in os.scandir():
...     print(entry)
<DirEntry 'automating_boring_work.py'>        # DirEntry Object
<DirEntry 'my_world'>
<DirEntry 'expenses'>

The attributes and methods of os.DirEntry are shown in table 9; they are useful for obtaining additional information about the entries.

Table 9: Methods and attributes of os.DirEntry object
Methods and Attributes Description
name The entry's base filename, relative to the scandir path argument
path The entry's full path name
is_dir() Return True if this entry is a directory
is_file() Return True if this entry is a file

Let's list the names of all the files and folders present in the current directory and their type.

>>> with os.scandir() as entries:						# with statement can be used
...     for entry in entries:
...         print(f"Entry Name: {entry.name}")			# Print entry name
...         print(f"Entry Path: {entry.path}")			# Print path of the entry
...         print("Entry Type: {}".format("File" if entry.is_file() else "Directory")) # Print whether file
...         print("{}".format("="*30))		# Divider
Entry Name: automating_boring_work.py
Entry Path: ./automating_boring_work.py
Entry Type: File
==============================
Entry Name: my_world
Entry Path: ./my_world
Entry Type: Directory
==============================
Entry Name: expenses
Entry Path: ./expenses
Entry Type: Directory
==============================

The entry.path depends on the path argument provided to the scandir() function. If we pass the current directory's absolute path, we will get the absolute path of each entry as well.

>>> with os.scandir('/home/primer/my_directory') as entries:
...     for entry in entries:
...         print(f'Entry Absolute Path: {entry.path}')
Entry Absolute Path: /home/primer/my_directory/automating_boring_work.py
Entry Absolute Path: /home/primer/my_directory/my_world
Entry Absolute Path: /home/primer/my_directory/expenses

As you can see, scandir is often the better choice when working with files and directories in Python.

What's the output of the following:

>>> import os
>>> os.getcwd()
'/home/primer/my_directory'
>>> with os.scandir() as entries:
...     [entry.name for entry in entries if entry.is_dir() ]
  1. ['my_world', 'expenses']
  2. ['automating_boring_work.py', 'my_world', 'expenses']
  3. ['automating_boring_work.py']
  4. []

Earlier, we represented the tree-structure of our folder with all of its sub-directories listed out. We can replicate the same using scandir(). Let's create a function that lists out the files and sub-directory of a given folder.

Listing all files and sub-directories


We will name our function list_all(), which takes two arguments, path and indent.

>>> def list_all(path=".", indent=0):
...     with os.scandir(path) as entries:
...         for entry in entries:
...             if entry.is_dir():
...                 # Directory
...                 print("{}+ {}".format("\t"*indent, entry.name))
...                 # Recursion
...                 list_all(path=entry.path, indent=indent + 1)
...             else:
...                 # File
...                 print("{}- {}".format("\t"*indent, entry.name))

When we call the function list_all for the current directory, it gives the following result.

>>> list_all()
- automating_boring_work.py
+ my_world
    - details.json
+ expenses
    - weekly_expenses.csv
    - months_expenses.csv

Can you describe in your own words what the function list_all() is doing under the hood?

In the list_all function, we use recursion to get files and directories of each directory. We represent the files using - while representing the directories using the + symbol.

We wrote a custom function, list_all(), to get all the files and directories entries in a given folder. Python provides a function os.walk() that we can use to check out the files and directories. Let's take a look.

Walking through Directories

The function os.walk(top) returns an iterator that yields entry names in the directory tree rooted at top by walking the tree.

Each directory in the tree rooted at top, including top itself, yields a 3-tuple object (dirpath, dirnames, filenames). The os.walk(top, topdown=True) function accepts an optional argument topdown, whose value is True by default.

If the optional argument is not specified or is True, the 3-tuple for a directory is generated before the 3-tuples of its sub-directories. Python generates the directories in a top-down approach.

Let's check out an example.

>>> import os
>>> for dirpath, dirnames, filenames in os.walk('.'):
...    	   # Top-down
...        print(dirpath,dirnames, filenames)
. ['my_world', 'expenses'] ['automating_boring_work.py']
./my_world [] ['details.json'] # Entries in the './my_world'
./expenses [] ['weekly_expenses.csv', 'months_expenses.csv']

In the above code, we can see that os.walk() returns the list of files filenames and directories dirnames in the directory path dirpath. It starts with the path provided, which in the above code is the current directory (.). It then goes into the respective folders and gets their lists of files and directories.
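As a small application of os.walk(), the following sketch counts every file underneath the current directory (the count will, of course, depend on your directory):

```python
import os

total_files = 0
for dirpath, dirnames, filenames in os.walk('.'):
    total_files += len(filenames)   # files directly inside dirpath

print(f'Total files under the current directory: {total_files}')
```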

If the topdown argument is False, the 3-tuple object for the directory top is generated after the 3-tuple objects for all of its subdirectories have been generated. This is the bottom-up approach. Let's take a look.

>>> import os
>>> for dirpath, dirnames, filenames in os.walk('.', topdown=False):    # Bottom-up
...     print(dirpath,dirnames, filenames)
./my_world [] ['details.json']
./expenses [] ['weekly_expenses.csv', 'months_expenses.csv']
. ['my_world', 'expenses'] ['automating_boring_work.py']
# Root directory is listed at last

Can you write down the differences between the bottom-up and top-down approaches as you have understood them so far?

Let's look into how we can create directories using Python.

Making Directories

At some point, you will want to create directories using Python. The os module provides two functions that can help you create directories, shown in table 10.

Table 10: Functions for creating directories
Function Description
os.mkdir() Create a single directory
os.makedirs() Create multiple directories


Let's look into how it works.

Creating a directory using mkdir

The os.mkdir(path) accepts a path argument and creates a directory in the path. If the directory already exists, Python raises the FileExistsError exception.

>>> import os
>>> os.mkdir('my_new_dir')            # Directory Created
>>> os.mkdir('my_new_dir')            # Raises Exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'my_new_dir'

You can use loops to create multiple directories.

Let's create ten new directories within our newly created my_new_dir directory.

>>> for num in range(10):
...        os.mkdir('my_new_dir/dir_{}'.format(num))

If you check in the my_new_dir directory, you will find ten directories with names ranging from dir_0 to dir_9.

The above code requires that the directory my_new_dir exists before it can create other directories.

What is the output of the following?

>>> import os
>>> os.listdir()
['automating_boring_work.py', 'my_world', 'expenses']
>>> os.mkdir('2020/April/10/1200-1300')
  1. Creates successfully
  2. Raises FileNotFoundError
  3. Raises FileExistsError
  4. Raises SyntaxError

Often, you will want to create intermediate-level directories using a single call.

We can do so using os.makedirs. Let's look into that function next.

Creating directories using makedirs

Let's say you want to create directories with the following structure.

2020/
|
└── April/
    |
    └── 10/
        |
        └── 1200-1300/

We can do this using the os.makedirs(name, exist_ok=False).

>>> import os
>>> os.makedirs('2020/April/10/1200-1300')
>>> os.makedirs('2020/April/10/1200-1300') # Will raise FileExistsError

You can check in the current directory that Python has created the directory tree shown above. If you call the makedirs function again with the same set of arguments, it will raise the FileExistsError exception.

If you want to override the exception, the function makedirs accepts exist_ok, which is by default False.

>>> import os
>>> os.makedirs('2020/April/10/1200-1300', exist_ok=True)    # Won't raise Exception

The makedirs function is useful when you want to create intermediate folders such as the ones we created above.

Can you think of the main difference between the makedirs and mkdir function of the os module?
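As a hint, here is a minimal sketch of the difference (the directory names are invented for illustration):

```python
import os

# mkdir can only create the last component; intermediate folders must exist
try:
    os.mkdir('year/month/day')
except FileNotFoundError:
    print('mkdir failed: year/month does not exist yet')

# makedirs creates every missing intermediate directory in one call
os.makedirs('year/month/day')
print(os.path.isdir('year/month/day'))    # True
```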

I guess we have learned how to create directories using Python. For the next trick, let's learn how to delete files and directories using Python.

Deleting Files and Directories

To delete files and directories, Python provides a lot of built-in functions. Let's start with deleting a single file using Python.

Deleting Single Files

To delete single files, Python provides two identical functions in the os module, which are shown in table 11.

Table 11: Functions for deleting a single file
Function Description
os.remove(path) Removes the file path
os.unlink(path) Remove the file path


Both remove(path) and unlink(path) are semantically identical, meaning they have the same functionality. The unlink name comes from the Unix command of the same name used to remove files.

Let's create two empty files and then remove them using Python.

>>> with open('blank_1.txt', 'w') as file1, open('blank_2.txt', 'w') as file2:
...        pass

The above code creates two empty files, blank_1.txt and blank_2.txt in the current directory. Now, let's remove these files.

>>> import os
>>> os.remove('blank_1.txt')        # Removes file
>>> os.unlink('blank_2.txt')        # Removes file

If you successfully execute the code above, Python would have successfully removed the files.

Suppose you try to pass a file-path that doesn't exist or is a directory. In that case, the above two functions will raise FileNotFoundError or IsADirectoryError, respectively. Therefore, while removing files, we need to ensure that:

  • the path exists and,
  • the path is not a directory

We can check both of these conditions using the path sub-module of the os module. The path module has some interesting functions, some of which are shown in table 12.

Table 12: Functions in the path submodule
Function Description
os.path.exists(path) Return True if path refers to an existing path
os.path.isfile(path) Return True if path refers to a file, False if it doesn't exist or is not a file
os.path.isdir(path) Return True if path refers to a directory
os.path.abspath(path) Returns the absolute version of path
os.path.relpath(path, start=os.curdir) Returns the relative file path either from current directory or optional start directory
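The last two functions in table 12 convert between the absolute and relative forms of a path. Since relpath is a purely lexical computation, we can sketch it with made-up paths:

```python
import os.path

# relpath computes the path of the target relative to the start directory
rel = os.path.relpath('/home/primer/my_directory/my_world',
                      start='/home/primer/my_directory')
print(rel)                    # my_world

# abspath resolves a relative path against the current working directory
print(os.path.abspath('.'))   # e.g. /home/primer/my_directory
```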

The functions exists() and isfile() in the os.path module are useful while deleting a given file. Usually, we should check the path using isfile, as it verifies both that the path exists and that it is a file.

import os
file_path = "some_file"
if os.path.isfile(file_path):
    os.remove(file_path)

Both FileNotFoundError and IsADirectoryError are subclasses of OSError, so they can be caught using the OSError exception. We can use the try-except block to safely handle any errors resulting from the remove operation.

import os
file_path = 'some_file'
try:
    os.remove(file_path)
except OSError as e:
    print(f'Error while removing {file_path} : {e.strerror}')

What is the output of the code below?

>>> import os
>>> os.listdir()	# Same directory as earlier
[ 'automating_boring_work.py', 'my_world', 'expenses',]
>>> for file in os.listdir():
    	if os.path.isfile(file):
            os.remove(os.listdir()[0])
>>> len(os.listdir())
  1. 2
  2. 3
  3. 4
  4. 0

There is a difference between deleting empty directories and directories having files in them. Let's check how to remove empty directories in the next section.

Deleting Empty Directories

To delete empty directories, we can use the function os.rmdir(path). When the path argument to the rmdir() function doesn't exist or is not empty, FileNotFoundError or OSError exception is raised, respectively.

We can use the following code to safely remove a directory by handling errors in an except block using OSError exceptions.

import os
dir_path = "some_directory"
try:
    os.rmdir(dir_path)
except OSError as e:
    print(f'Error while removing {dir_path} : {e.strerror}')

When you try to remove a non-empty directory some_directory, the above program prints the following message:

Error while removing some_directory : Directory not empty

Below is a code for removing all the empty directories inside the current working directory.

# DON'T EXECUTE THIS CODE
# YOU MIGHT END UP DELETING IMPORTANT FILES
# AND THEN MAIL ME SAYING I CAUSED IT :(

import os

for dirpath, dirnames, filenames in os.walk('.'):
	for dirname in dirnames:
		try:
			os.rmdir(__A__)							# What's A?
			print(f'Removed empty dir: {__A__}')
		except OSError as e:
			print(f'Error while removing {__A__} : {e.strerror}')

What is the value of A?

  1. dirname
  2. dirpath
  3. filenames
  4. dirnames

We checked out how to delete empty directories. Now, let's look into how to delete non-empty directories in the next section.

Deleting Non-Empty Directories

Python has a built-in shutil module, which contains several functions relating to file collections. To delete the non-empty directories, you can use the rmtree() function in the built-in module shutil.

Let's create some empty files in a folder.

>>> import os
>>> os.mkdir('test_dir')
>>> for tempfile in ["test_dir/file_{}.txt".format(num) for num in range(10)]:
...        with open(tempfile, 'w') as writer:
...            pass

The above code will create a directory test_dir and populate it with some empty files. We can delete the folder test_dir in the following way:

>>> import shutil
>>> shutil.rmtree('test_dir')            # Folder is deleted

Python deletes the folder test_dir and its contents when the shutil.rmtree() function is invoked. We use the shutil.rmtree to delete directories, empty or otherwise.
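The rmtree() function also accepts an optional ignore_errors argument that suppresses errors, such as when the directory doesn't exist:

```python
import shutil

# With ignore_errors=True, rmtree does not raise even if the path is missing
shutil.rmtree('test_dir', ignore_errors=True)   # no FileNotFoundError
```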

The shutil module also offers other functions that we can use to copy, move, and rename files and directories. We will look into that in the next section.

Copying, Moving and Renaming Files and Directories

Often, we need to copy, move, and rename a set of files and folders. The shutil module provides functions for doing that, which I have listed in table 13:

Table 13: Functions for copying, moving and renaming
Function Description
shutil.copy(src, dst) Copies the file src to the file or directory dst.
shutil.copy2(src, dst) Identical to copy() except also attempts to preserve file metadata
shutil.copytree(src, dst) Recursively copy an entire directory tree rooted at src to a directory named dst and return the destination directory
shutil.move(src, dst) Recursively move a file or directory src to another location dst and return the destination directory
shutil.rename(src, dst) Rename the file or directory src to dst

Let's look into each of the functions starting with copying files.

Copying Files

The shutil module provides two functions to copy files: copy and copy2. The copy2() function is identical to copy() except that it also attempts to preserve file metadata such as the creation date or last modified date.
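We can observe the metadata difference using the file's modification time. The sketch below sets an artificial timestamp on a source file with os.utime and then compares the two kinds of copies (the filenames are made up for illustration):

```python
import os
import shutil

# Create a source file with an artificial modification time
with open('meta_src.txt', 'w') as writer:
    writer.write('metadata test')
os.utime('meta_src.txt', (1000000000, 1000000000))   # set access/modify times

shutil.copy('meta_src.txt', 'plain_copy.txt')    # metadata not preserved
shutil.copy2('meta_src.txt', 'meta_copy.txt')    # metadata preserved

print(os.path.getmtime('meta_copy.txt'))     # 1000000000.0
print(os.path.getmtime('plain_copy.txt'))    # current time, not 1000000000.0
```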

Let's create two folders dir_1 and dir_2 and create an empty file test.txt in dir_1 directory.

>>> import os
>>> os.mkdir('dir_1'); os.mkdir('dir_2')
>>> with open('dir_1/test.txt', 'w') as writer:        # Create a file
...        writer.write('Hello World')

We can copy the file test.txt in dir_1 using either shutil.copy() or shutil.copy2().

>>> import shutil
>>> shutil.copy2('dir_1/test.txt', 'dir_2/test_copy.txt')    # same as `shutil.copy`
'dir_2/test_copy.txt'        # Returns path to newly created file

After running the above code, you can check the directory dir_2 to find a newly created file, test_copy.txt. In the shutil.copy(src, dst) function, the argument src should be a file path, while dst can be a file path or a directory.

If you put the dst argument as a directory, the name of the newly created file will be taken from the base filename of the src file path.

>>> import shutil
>>> shutil.copy2('dir_1/test.txt', '.')        # dst is current directory
'./test.txt'                                # file is copied to current directory.

In the above code, we provided the argument dst to be the current directory (.); therefore, we copied the file in the current directory. Let's do an exercise.

Take a look at the code below.

>>> import shutil, os
>>> os.makedirs('dir1/dir2/dir3')
>>> with open('dir1/text1.py', 'w') as writer:
...     writer.write('print("Hello World")')
>>> shutil.copy2(__A__, __B__) 		# What's A and B?
'dir1/dir2/dir3/text1.py'
  1. A: 'dir1/text1.py', B: 'dir1/dir2/dir3'
  2. A: 'dir1/text1.py', B: 'dir1/dir2/dir3/text1_copy.py'
  3. A: 'dir1/dir2/dir3/text1.py', B: 'dir1/text1.py'
  4. A: 'dir1/dir2/dir3', B: 'dir1/text1.py'

We can also copy entire directories using Python. Python's shutil provides a copytree() function to do that. Let's take a look.

Copying Directories

The shutil.copytree(src, dst) function copies an entire directory, along with its content, rooted at path src to a directory specified by path dst. Earlier, we created the directory dir_1 with a file test.txt. Let's copy the entire directory to a new directory, dir_3.

>>> import shutil
>>> shutil.copytree('dir_1', 'dir_3')
'dir_3'                                    # Directory copied

The shutil.copytree(src, dst) function creates a new directory at dst if the directory doesn't exist. If a directory exists at path dst, Python will raise FileExistsError.
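Since Python 3.8, copytree() also accepts a dirs_exist_ok keyword argument that lets you copy into an existing directory instead of raising FileExistsError. A minimal sketch, with made-up directory names:

```python
import os
import shutil

os.makedirs('src_dir', exist_ok=True)       # source tree
os.makedirs('dst_dir', exist_ok=True)       # destination already exists

# Without dirs_exist_ok=True, this raises FileExistsError (Python 3.8+)
result = shutil.copytree('src_dir', 'dst_dir', dirs_exist_ok=True)
print(result)                               # 'dst_dir'
```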

Moving Files and Directories

The shutil.move(src, dst) moves a file or directory at src to another file path specified by dst and returns the path to the newly moved file. If the destination dst is an existing directory, then src is moved inside that directory.

Let's move the entire directory dir_1 inside a dirs folder.

>>> import shutil
>>> shutil.move('dir_1', 'dirs/dir_1') # `dirs` is created if it doesn't exist
'dirs/dir_1'

You can check your files and directories to confirm that dir_1 no longer exists in its old location and has been successfully moved into the dirs directory.

If the destination is on the same filesystem as the source, Python uses os.rename() to move the file or directory. Otherwise, Python copies src to dst and then removes the source.
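As a quick sketch, here's what moving a file into an existing directory looks like (the file and folder names below are made up):

```python
import os
import shutil

os.makedirs('archive', exist_ok=True)            # an existing directory
with open('report.txt', 'w') as writer:          # a file to move
    writer.write('quarterly numbers')

# dst is a directory, so the file is moved inside it
new_path = shutil.move('report.txt', 'archive')
print(new_path)                                  # e.g. 'archive/report.txt'
```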

Let's look into renaming files and directories in the next section.

Rename Files and Directories

The function os.rename(src, dst) can be used to rename files and folders. Let's create a directory with some temporary files.

>>> import os
>>> os.mkdir('logs')
>>> for tempfile in [f'logs/LoG_{num}.txt' for num in range(10)]:
...        with open(tempfile, 'w') as writer:
...            pass
>>> os.listdir('logs')
['LoG_3.txt', 'LoG_1.txt', 'LoG_8.txt', 'LoG_4.txt', 'LoG_0.txt', 'LoG_5.txt', 'LoG_9.txt', 'LoG_6.txt', 'LoG_7.txt', 'LoG_2.txt']

This spelling LoG_x almost hurts the eyes. We need to rename all the LoG_x format files to log_x. Let's rename each file using os.rename(src, dst).

>>> with os.scandir('logs') as entries:
...     for entry in entries:
...         os.rename(entry.path, entry.path.replace('LoG', 'log'))
>>> os.listdir('logs')
['log_2.txt', 'log_9.txt', 'log_5.txt', 'log_0.txt', 'log_1.txt', 'log_8.txt', 'log_7.txt', 'log_4.txt', 'log_3.txt', 'log_6.txt']

That looks much better.

We have finally completed basic operations related to files and directories using Python. In the next section, we will look into working with modules and creating our modules in Python.

Modules

So far, we have been importing standard built-in modules. We can write a custom module too.

Before we look into how we can create our own modules, how would you define a module?

If you write definitions in a file, you can use them in the interactive interpreter or another script by importing them. Such a file is called a module. The name of the file is the module name with the suffix .py.

Let's create a module and add some definitions to it.

We will create a cases module that defines functions to convert strings to the different case styles shown in table 14.

Table 14: Functions for our custom cases module
Case Name | Example | Description
Snake Case | snake_case | Punctuation is removed, spaces are replaced by a single underscore (_), and words are lowercased.
Camel Case | camelCase | Spaces and punctuation are removed, and the first letter of every word except the first is capitalized.
Kebab Case | kebab-case | Punctuation is removed, spaces are replaced by a single hyphen (-), and words are lowercased.


Let's write functions to convert a given string to the different cases and save them in a file named cases.py. This file is a module that we can import using the import keyword followed by the module name cases.

def snake_case(str, sep=" "):
    """
    Converts a given string to snake_case

    Parameters:
        str -- Required string to convert to snake case
        sep -- Optional delimiter for the passed string; defaults to " "
    """
    return "_".join([x.lower() for x in str.strip().split(sep)])


def kebab_case(str, sep=" "):
    """
    Converts a given string to kebab-case

    Parameters:
        str -- Required string to convert to kebab case
        sep -- Optional delimiter for the passed string; defaults to " "
    """
    return "-".join([x.lower() for x in str.strip().split(sep)])


def camel_case(str, sep=" "):
    """
    Converts a given string to camelCase

    Parameters:
        str -- Required string to convert to camel case
        sep -- Optional delimiter for the passed string; defaults to " "
    """
    return "".join([x.lower() if not index else x.capitalize()
                    for index, x in enumerate(str.strip().split(sep))])
cases.py
We wrote the functions for converting strings to camel case, kebab case, and snake case. Can you describe how the camel_case function works in your own words?

In camel_case(), we use a list comprehension to generate a list of words, lowercasing the first word and capitalizing the rest, and pass that list to the str.join() method.
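To see what the comprehension inside camel_case produces, we can evaluate the same expression on a sample sentence outside the function:

```python
words = "The quick brown fox".strip().split(" ")

# index 0 is falsy, so the first word is lowercased;
# every later word is capitalized
parts = [x.lower() if not index else x.capitalize()
         for index, x in enumerate(words)]

print(parts)             # ['the', 'Quick', 'Brown', 'Fox']
print("".join(parts))    # theQuickBrownFox
```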

To import our newly created module cases, we will have to start a Python interpreter in the same directory. Then we can directly import our module cases.

The code below uses the function snake_case() from our newly created module cases.

>>> import cases
>>> string = "The quick brown fox jumps over the lazy dog"
>>> cases.snake_case(string)
'the_quick_brown_fox_jumps_over_the_lazy_dog'        # Snake Case

We can also directly import functions from the module cases using the from keyword.

>>> from cases import kebab_case, camel_case
>>> string = "The quick brown fox jumps over the lazy dog"
>>> kebab_case(string)
'the-quick-brown-fox-jumps-over-the-lazy-dog'
>>> camel_case(kebab_case(string), sep="-")
'theQuickBrownFoxJumpsOverTheLazyDog'

As you can see, we can import definitions from the module we defined earlier and use them.

Suppose you were to create a module and import it into a Python interpreter. How would you proceed?

As you might have guessed, we can simply create a module_name.py and start the interpreter in the same directory to access the module. There are certain other directories as well, where you can store your modules, and you will be able to import them in Python.

In the next section, let's understand how Python imports modules.

We imported the cases module using the import statement.

>>> import cases

When the interpreter executes the import foo statement,

  • it first searches for a built-in module with the name foo.
  • If no such built-in module exists, the interpreter searches for a file named foo.py in a list of directories given by the sys.path.

Let's have a look at the list of the directories provided by sys.path.

>>> import sys
>>> sys.path        # The output will be different for everyone
['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']

The list of directories given by sys.path is initialized from:

  • the directory containing the input script or the current directory when using the interpreter
  • The list of directories given by the environment variable: PYTHONPATH
  • The installation-dependent directory configured at the time of installation

To ensure that Python can import your module, you should put it in one of the directories listed above.
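If your module lives elsewhere, you can also extend the search path at runtime by appending a directory to sys.path (the path below is hypothetical):

```python
import sys

# Any .py files inside this (hypothetical) directory become importable
sys.path.append('/home/primer/my_modules')
print('/home/primer/my_modules' in sys.path)    # True
```

A more permanent option is to add the directory to the PYTHONPATH environment variable.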

Let's say you created a script called math.py with the following content.

pi = "I am pi, the irrational number"

You started an interpreter in the same directory as that of math.py and wrote the following code.

>>> from math import pi
>>> pi

What's the output of the code above?

  1. 3.141592653589793
  2. 'I am pi, the irrational number'
  3. Raises NameError
  4. Raises ValueError

Python first searches the built-in modules and finds the standard math module. Therefore, it imports the standard module instead of our custom math.py module.

Next, let's look more at importing definitions.

Importing Definitions

The cases module had three function definitions. A module can contain definitions of other objects and expressions, and its content is made available with the import statement. We can import and use this content in several ways.

Importing Modules

We can import a module and then access its definitions using dot notation.

>>> import cases
>>> cases.snake_case("Hello World")        # Accessing a definition using dot notation
'hello_world'
>>> cases
<module 'cases' from '/home/primer/Python-I/cases.py'>

We can import several modules in a single import statement by separating them with commas.

>>> import foo, bar, foobar

Importing Modules with a different name

We can import modules with a different name using the as keyword in the import statement.

>>> import cases as c
>>> c
<module 'cases' from '/home/primer/Python-I/cases.py'>
>>> c.snake_case("Hello World")
'hello_world'

In the above instance, we are naming the cases module c. In this case, Python doesn't add the name cases to the current namespace. The imported module has the attributes __name__ and __file__, which give the module's name and path, respectively.

>>> import cases as c
>>> c.__name__
'cases'
>>> c.__file__
'/home/primer/Python-I/cases.py'

Can you think of any particular advantage of importing modules with a different name?

By importing modules with a different name, we can avoid name collisions with other functions of a similar name. Another advantage is convenience, such as from decimal import Decimal as D, which we saw in Chapter 2. Let's look at how to import definitions from modules directly.

Importing definitions from modules directly

We can also import the module's definitions directly using the from keyword in the import statement.

>>> from cases import snake_case, camel_case, kebab_case

In this case, the imported function definitions are directly available as callables in the current namespace.

>>> kebab_case("Hello World")
'hello-world'

Importing every definition from modules

We can use the wild card operator * to import every definition. We can rewrite the above import statement as follows.

>>> from cases import *         # Import everything

The wild card operator * imports all names present in the module apart from those beginning with an underscore (_).

You really shouldn't import every definition using * in your program. Can you think of a particular reason why?

When you import a module along with all its definitions, you overwrite any pre-existing objects with the same names. Importing every definition also limits the names you can assign to new objects.
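We can see such an overwrite with the standard math module, which exports a name e for Euler's number:

```python
e = "my important data"     # our own name `e`

from math import *          # silently rebinds e to math.e

print(e)                    # 2.718281828459045
```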

We don't recommend importing * from a module in your programs, as it leads to poor code readability, although it can be handy in the interactive interpreter.

Can you guess the function used to list the definitions of a module?

List of definitions in a module using dir function

Earlier, we used the dir() function to inspect the namespace. The dir() function returns all the properties and methods of an object, including the built-in ones that are default for an object.

When used on module objects, it returns the list of definitions as well as other attributes. Let's take a look.

>>> import cases
>>> dir(cases)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'camel_case', 'kebab_case', 'snake_case']

The names starting with double underscores (__) are special, or dunder, attributes of the module. We can also see the three functions that can be accessed through the module: camel_case, kebab_case, and snake_case.

Reloading the module

While keeping the interactive interpreter running, let's add another function, say meow_case(), to our cases.py and save it.

...
def meow_case(str):
    return 'Meow'
Modified cases.py

Now, let's re-import the cases module.

...							# Continued from previous session
>>> import cases			# Re-import cases
>>> cases.meow_case('Hello World')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'cases' has no attribute 'meow_case'

For reasons of efficiency, a module is only loaded once per interpreter session. If you make changes to the cases.py file and want to check them in the interactive interpreter, you will need to reload the module or restart the interpreter. To reload the module, use the reload() function from the importlib module.

>>> import cases, importlib
>>> importlib.reload(cases)            # reload the cases module
<module 'cases' from '/home/primer/Python-I/cases.py'>
>>> cases.meow_case('Hello World')
'Meow'

The reload() function only accepts a module object. Therefore, reloading a function that was imported directly will not work, even if its definition has changed.

>>> from cases import camel_case
>>> importlib.reload(camel_case)		# Reloading an imported function will not work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/importlib/__init__.py", line 140, in reload
    raise TypeError("reload() argument must be a module")
TypeError: reload() argument must be a module
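The working pattern is to reload the module object and then import the name again. A sketch using the standard json module (any already-imported module behaves the same way):

```python
import importlib
import json
from json import dumps

# reload() re-executes the module's code and returns the same module object
reloaded = importlib.reload(json)
print(reloaded is json)          # True

# Names imported with `from` must be imported again to pick up any changes
from json import dumps
```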

Executing Modules as Scripts

So far, we have been importing definitions from the cases module. What do you think happens when you execute the cases module directly as a script?

When we execute the cases module as a script, nothing happens, as the cases.py module contains only function definitions.

> python3 cases.py         # Nothing happens

There is a difference between executing a module as a script and importing it from another file.

To understand the difference, let us create a script greeter.py.

greeter.py

def greet(name="World"):
    print(f"Hello {name}")

Let's execute the greeter.py file as a script.

> python3 greeter.py 		# Nothing happens

Let's import the greeter in the interactive interpreter.

# Changing the name of the imported module
>>> import greeter as G
>>> G.__name__
'greeter'				# Original module name

When we execute a Python script directly, Python sets its __name__ to "__main__", irrespective of the filename. The module's name is always accessible inside the module using the __name__ attribute.

Therefore, we can use __name__ == "__main__" condition in an if statement to check if the module is being run in the script mode.

Earlier, we saw that we could access the command-line arguments using the sys.argv list from the sys module. Let's rewrite the greeter module to add some more functionality when executing in script mode.

import sys
def greet(name="World"):
    print(f"Hello {name}")

if __name__ == "__main__":
    try:
        greet(sys.argv[1])

    except IndexError:
        # Exception Handling when no command arguments
        greet()
greeter.py

Statements inside the if block will only execute if you execute the module in the script mode. Let's execute the greeter as a script.

> python3 greeter.py
Hello World

Now our greeter module accepts command-line arguments. Let's try some out.

> python3 greeter.py
Hello World
> python3 greeter.py there
Hello there
> python3 greeter.py "there. What is up?"
Hello there. What is up?

Take a look at the script below.

import sys

def read_file(filename):
    with open(filename) as reader:
        return " ".join(reader.readlines())

if __name__ == "__main__":
    try:
        print(read_file(sys.argv[1]))
    except:
        print("Please enter a valid filename")

What does the above Python script do?

  1. Reads the file name and outputs the number of characters in it
  2. Reads the file given as argument and prints out the content
  3. Reads the file and outputs the number of words in it
  4. Reads the file and outputs the number of characters in it.

We can also execute the standard modules from the command line. To execute standard modules, we need to use the -m flag, followed by the module name.

Earlier, we used the tokenize module to get the tokens generated for a file. Let's tokenize the greeter.py file contents using the following command.

> python3 -m tokenize greeter.py

Executing the above command might result in the following output.

0,0-0,0:            ENCODING       'utf-8'
1,0-1,3:            NAME           'def'
1,4-1,9:            NAME           'greet'
1,9-1,10:           OP             '('
1,10-1,14:          NAME           'name'
1,14-1,15:          OP             '='
1,15-1,22:          STRING         '"World"'
... 				# Shortened for brevity

You can look for other standard modules that can be executed as scripts.

We have looked into how to read files in Python by passing them as command-line arguments. Can you think of an application of such a feature?

One such application is formatting JSON files. Earlier, we indented a JSON file to make it more readable. We can create a custom script to help us generate well-indented JSON files.
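As a sketch, a hypothetical prettify.py script along those lines might look like this (the standard library also provides python3 -m json.tool for the same job):

```python
import json
import sys

def prettify(filename):
    """Return the file's JSON content re-serialized with indentation."""
    with open(filename) as reader:
        return json.dumps(json.load(reader), indent=4)

if __name__ == "__main__":
    try:
        print(prettify(sys.argv[1]))
    except (IndexError, OSError, ValueError):
        # No argument, unreadable file, or invalid JSON
        print("Usage: python3 prettify.py <file.json>")
```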

This also brings an end to our lesson on modules. Now that we have covered the module, let's look into how we can handle a collection of modules or packages in the next section.

Packages

In Python, packages are a way of structuring a collection of modules using dotted module names.

To understand Python packages, we can create a sample package pkg by creating an empty folder named pkg. Create two files, module1.py and module2.py, in the newly created pkg folder with the following content.

def greet():
    print("Hi from module1")
module1.py

def greet():
    print("Hi from module2")
module2.py

Now, your directory structure should look something like this.

pkg
├── module1.py
└── module2.py

Open an interactive interpreter in the same directory as where folder pkg is placed.

>>> import pkg.module1, pkg.module2
>>> pkg.module1.greet()
Hi from module1
>>> pkg.module2.greet()
Hi from module2

In the code listing, we can see that both module1 and module2 define the greet() function; however, using the dot-notation to access each function helps us avoid name clashes.

We can also import the modules directly while giving them different names to avoid name clashes.

>>> from pkg.module1 import greet as greet1
>>> from pkg.module2 import greet as greet2

We can also import the modules directly from the package.

>>> from pkg import module1, module2

Continuing from the previous package pkg we created, which of the following is the correct way to import the greet() function from module1?


  1. from pkg import module1.greet
  2. from pkg.module1 import greet
  3. from pkg from module1 import greet
  4. from pkg import module1 as module.greet

If a file named __init__.py is present in a package directory, Python invokes it when the package or a module in the package is imported.

You can use this for the execution of package initialization code. Let's understand what initialization means next.

Package initialization

Before Python 3.3, an __init__.py file was required to make Python treat a directory containing modules as a package, even if the file was empty. From Python 3.3 onward, the __init__.py file is no longer required.

But it's quite useful in certain scenarios. Let's take a look.

If an __init__.py file is present in the package directory, Python invokes it when we import the package or a module in the package.

In the simplest case, __init__.py can just be an empty file. Let's initialize the pkg package we created earlier by adding an __init__.py file with the following content to the pkg directory.

print(f"Initializing Package {__name__} ")
guest_names = ["Luffy", "Zorro", "Sanji"]
__init__.py

Now our directory tree would look as below.

pkg
├── __init__.py
├── module1.py
└── module2.py

Let's import the pkg package again by opening an interactive interpreter in the same location as the pkg directory.

>>> import pkg
Initializing Package pkg			# __init__.py invoked
>>> pkg.guest_names
['Luffy', 'Zorro', 'Sanji']

When we import the package pkg, the statements inside its __init__.py are automatically executed. We can also access the guest_names list object defined in the __init__.py using dot notation. The pkg package is a namespace.

Names defined in the __init__.py act as global names for the package, which can be accessed by any of the modules inside the package directory.

To illustrate, let's import the guest_names in a new module3.py inside the pkg directory.

from pkg import guest_names

def greet_guests(guests=guest_names):
    for guest in guests:
        print(f"Hello there, {guest}")
module3.py

Now, let's test it out in our interactive interpreter.

# Imported from package-level namespace
>>> from pkg import module3
>>> module3.greet_guests()
Hello there, Luffy
Hello there, Zorro
Hello there, Sanji

If we import package pkg on a freshly opened interpreter and pass it to the dir() function, as shown below, which of the following will be present?

>>> import pkg
>>> dir(pkg)
  1. module1
  2. module2
  3. module3
  4. guest_names

Python doesn't automatically import all the modules in the package when the package is imported.

If we simply import the package pkg, we will not be able to access the modules contained in it.

>>> import pkg
Initializing Package pkg
>>> dir(pkg)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'guest_names']
>>> pkg.module1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pkg' has no attribute 'module1'

The names defined in the __init__.py, however, are automatically imported. We can also automatically import modules by importing them in the __init__.py file.

To automatically import modules from pkg, we will change the __init__.py to look like below.

import pkg.module1, pkg.module2, pkg.module3
print(f"Initializing Package {__name__} ")
guest_names = ["Luffy", "Zorro", "Sanji"]
Modified __init__.py 

Now, let's save the __init__.py file, restart our interpreter, and import the package again.

>>> import pkg
Initializing Package pkg
>>> dir(pkg)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'guest_names', 'module1', 'module2', 'module3', 'pkg']

As you can see, the modules are automatically imported now.

Now, we can run functions inside our package pkg.

>>> pkg.module1.greet()
Hi from module1

The __init__.py lets us automatically import modules into the package namespace but not into the current namespace.

Let's restart our interpreter and import the package pkg again.

>>> import pkg
Initializing Package pkg
>>> dir()						# Current Namespace
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'pkg']

We can check the current namespace by calling the dir() function. As you can see, only the package pkg is present in the namespace. To import all the definitions from the package, we can use the * wildcard operator.

>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'guest_names', 'module1', 'module2', 'module3', 'pkg']

All the modules we have imported in the __init__.py are imported into our current namespace. We can invoke them directly, as shown below.

>>> module1.greet()
Hi from module1

Let's remove the __init__.py entirely, restart the interpreter, and run from pkg import *. Do you think all the modules are still going to be imported into the local namespace? Why or why not?

Python, by default, doesn't implicitly import any of the modules contained in a package.

Therefore, none of the modules in the package pkg are going to be imported.

We can decide which modules will be imported while using the * operator by specifying them in __all__ in the __init__.py. Let's take a look.

Let's again create an __init__.py file with the following code.

guest_names = ["Luffy", "Sanji", "Zorro"]
__all__ = ['module1', 'module2']
Modified __init__.py

Save the file, restart the interpreter, and let's import everything from the package once again.

>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'module1', 'module2']

Since we specified only module1 and module2 in the __all__, module3 and guest_names were not imported into the local namespace.

We can also use __all__ in the module to specify what objects can be imported while using the wildcard operator.

To understand, let's modify the module1.py to look like the below.

def greet():
    print("Hi from module1")

def _secret_number():
    return 42

def not_so_secret_number():
    return 43
pkg/module1.py file

Save the file and restart the interpreter. Let's now use the * wildcard operator to import every object from the module1 module.

>>> from pkg.module1 import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'greet', 'not_so_secret_number']

As you can see, both the greet and not_so_secret_number are imported, but not _secret_number. If we don't specify __all__ in the module, Python imports everything except names starting with an underscore (_).

Let's change our module1 module again to add __all__; your module should look like the code below.

__all__ = ["greet"]

def greet():
    print("Hi from module1")

def _secret_number():
    return 42

def not_so_secret_number():
    return 43
Updated pkg/module1.py file

Let's save the file and restart the interpreter.

>>> from pkg.module1 import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'greet']

As you can see, only the object specified in __all__, i.e., greet(), is imported.

Can you summarise the function of __all__?

We can say that __all__ is used by both packages and modules to specify what to import when import * is invoked.

For a package, when __all__ is not defined, import * imports only the names defined (or imported) in its __init__.py; if there is no __init__.py, it imports nothing.

For a module, when __all__ is not defined, import * imports everything except names starting with an underscore.

We can also nest packages inside other packages to an arbitrary depth.

A nested package is called a sub-package.

Sub-package

Let's create a new directory, main_pkg, with the following structure.

main_pkg
├── sub_pkg1
│   ├── mod1.py
│   └── mod2.py
└── sub_pkg2
    ├── mod3.py
    └── mod4.py

Add the following to each of the modules (mod1 through mod4) above.

def greet():
    print(f"Hello from {__name__}!")

Restart the interpreter and try the following code listing. Importing from sub-packages works similarly, using dot notation.

>>> import main_pkg.sub_pkg1.mod1							# Notation 1
>>> main_pkg.sub_pkg1.mod1.greet()
Hello from main_pkg.sub_pkg1.mod1!

>>> from main_pkg.sub_pkg1 import mod2						# Notation 2
>>> mod2.greet()
Hello from main_pkg.sub_pkg1.mod2!

>>> from main_pkg.sub_pkg2.mod3 import greet				# Notation 3
>>> greet()
Hello from main_pkg.sub_pkg2.mod3!

>>> from main_pkg.sub_pkg2.mod4 import greet as mod4_greet # Notation 4
>>> mod4_greet()
Hello from main_pkg.sub_pkg2.mod4!

You can also add an __init__.py to each sub-package and to the top-level main_pkg package to initialize them.

That brings us to the end of this topic on modules and packages. We have covered most of the useful tools you will require to program using Python.

In the next chapter, we will look over the conventions used in the Python community and how to experience the Zen of Python.


JSON Module: https://docs.python.org/3/library/json.html ↩︎