Intro to type hinting in Python

Monday, January 3 2022

An introduction to type hints and mypy

Why would we want to use type hints in Python? Python has duck typing - the types just work.

my_duck = 42
my_float = 4.20
my_string = "duck"

>>> len(my_string)
# 4

>>> len(my_duck)
# TypeError: object of type 'int' has no len()
'It just works' - Steve Jobs

Python doesn’t need explicit types at all! Having such flexibility is great, but type hints can still be a huge help. In particular, they:

  1. Make your code more readable
  2. Add another layer of error catching

Unfortunately, types can be a little confusing at times. The aim is to get you to this place:

Just vibing
How you'll feel about types by the end of this post

1. More readable code

This is a generic function you might encounter in a pull request review, or when reading a codebase.

def retrieve_sku(
    inventory_uuid,
    start_time,
    log,
    client=None
):
    ...
No type hints, how am I supposed to tell what's going on?

My questions would right away be:

  1. is inventory_uuid an int, str, or uuid.UUID type?
  2. is start_time a date, datetime, or int? Is this a timestamp perhaps?
  3. is log a str? Perhaps a logger of some kind?
  4. What kind of client are we talking about? Is this function going to call an API?

To know the types of the variables this function expects as arguments, or returns, we need to read through the function and make inferences.

Now with type hints:

def retrieve_sku(
    inventory_uuid: str,
    start_time: int,
    log: LoggerContext,
    client: Optional[DataAPI] = None
) -> Optional[SKU]:
    ...
Much clearer

Suddenly this is a lot clearer! Let’s address our previous points:

  1. inventory_uuid is a string. If we’re using uuid.UUID objects, we need to remember to convert them with str(my_uuid).
  2. start_time is an integer. It is still not totally clear what it represents, perhaps a UNIX timestamp? At least we know not to pass in a datetime object. 1
  3. log is a logger of some kind.
  4. The type is DataAPI, so this will be calling an API. In that case we should be careful about writing unit tests for this function: we’ll need to mock out the API call.

The downside is that now this has become more verbose. I think this is a worthwhile tradeoff.

Miller’s law suggests that you can only hold about 7 plus or minus 2 chunks in your short-term memory at a time. Raymond Hettinger, Python OG, talks about this at PyBay 2019. In short: the more information you have easily accessible on a screen, the better. The fewer inferences we have to make about variables and their types, the better. Having type hints reduces the risk of introducing really annoying TypeError bugs, where the wrong types are passed around until an error is suddenly hit. Languages without dynamic typing would raise an error as soon as you changed a variable’s type without an explicit cast.

2. Error catching

This is an example of a (thankfully simple) error I debugged recently. The stacktrace pointed at the following function, which has been generalised:

def get_item_data(inventory_uuid):
    graphql_query = Path(__file__).with_name("foo.graphql").read_text()
    api_response = call_uk_endpoint(inventory_uuid, graphql_query)
    data = api_response.data
    detail = data.get("myQuery")
    return Item(**detail)
No type hints, it's hard to tell what's going on

This was my first time looking at this piece of code, and there’s a lot going on in these six short lines! To name just the first thing I noticed: I guess api_response is a class with a data attribute?

However, when I inspected api_response in my IDE, it showed that the data attribute could be None. Nice! I opened a tiny pull request and patted myself on the back, only to receive another bug the next day: it turns out that the inventory_uuid argument took a UUID type while the call_uk_endpoint function expected a string.

The properly type hinted function looks like:

def get_item_data(inventory_uuid: UUID, logger: Logger) -> Optional[Item]:
    graphql_query = Path(__file__).with_name("foo.graphql").read_text()
    api_response = call_uk_endpoint(str(inventory_uuid), graphql_query)
    data = api_response.data or {}
    detail = data.get("myQuery")
    if not detail:
        logger.warning("No data returned for inventory UUID", extra={"uuid": inventory_uuid})
        return None
    return Item(**detail)
More verbose, but fewer chances to make mistakes

If there had been type hints, we could have just run mypy and it would have told us straight away that inventory_uuid is the wrong type and that the data attribute can be None. Carl Meyer has a great talk where he speaks about how type hints reduce the space of possible errors.
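
As a rough illustration of the two checks described above, here is a minimal sketch that recreates the shape of the bug with stand-ins. ApiResponse, call_endpoint, and get_detail are hypothetical names invented for this example; they are not the real API from the function above.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass
class ApiResponse:
    # Hypothetical stand-in: the payload may be missing entirely.
    data: Optional[dict[str, dict[str, str]]]


def call_endpoint(inventory_uuid: str) -> ApiResponse:
    # Hypothetical stand-in for the real API call; it only accepts a str.
    if inventory_uuid.startswith("0"):
        return ApiResponse(data=None)
    return ApiResponse(data={"myQuery": {"name": "widget"}})


def get_detail(inventory_uuid: UUID) -> Optional[dict[str, str]]:
    # Passing inventory_uuid directly is the kind of mismatch mypy should
    # flag, because call_endpoint is annotated as taking a str.
    response = call_endpoint(str(inventory_uuid))
    # Calling response.data.get(...) without a guard is the second thing
    # mypy should flag, because data is Optional.
    data = response.data or {}
    return data.get("myQuery")
```

Removing the str(...) call or the `or {}` guard and re-running mypy is a quick way to see each complaint for yourself.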

So you’re convinced

Peter Parkerrrrr!
No green goblins in Python

The good:

  1. Type hints can be introduced gradually. You don’t need one massive pull request with them all in.
  2. Type hints are not enforced at runtime. If you make a slight mistake, no one’s going to crucify you.

The bad:

  1. Type hints are not enforced at runtime. You can make mistakes… and they might fly under the radar!

For the rest of this post I’ll assume you’re running at least Python 3.9. Type hints were introduced in Python 3.5, but before 3.9 you had to import most composite data structures from the typing standard library, e.g. from typing import List. From 3.9 onwards you can just use list.
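
To make the difference concrete, here is a small sketch of the same function written in both spellings. The function names are made up for this example.

```python
from typing import Dict, List  # only needed for the pre-3.9 spelling


def count_words_old(words: List[str]) -> Dict[str, int]:
    # Pre-3.9 spelling: capitalised generics imported from typing.
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_words_new(words: list[str]) -> dict[str, int]:
    # Python 3.9+ spelling: subscript the built-in types directly.
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Both versions behave identically at runtime; only the annotations differ.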

How do we start?

Install mypy with

$ pip install mypy
$ mypy
It's that simple.

By default mypy looks for a mypy.ini file (it also reads .mypy.ini, pyproject.toml, and setup.cfg), but I use a pyproject.toml for all my settings.

[tool.mypy]
python_version = "3.9"
files = ['src/']
warn_unused_configs = true
warn_redundant_casts = true
An example pyproject.toml

First things first

Just add the type

def my_simple_func(word: str, num: float) -> int:
    return len(word) * int(num)

What about bigger objects?

def my_func(array: list[str], my_map: dict[str, int]) -> int:
    # nest the type inside a composite object, a list of strings
    # dictionaries are dict[<key>, <value>]
    ...

def other_func(cache: set[str], point: tuple[int, int]) -> int:
    ...

some_var: dict[str, str] = {"foo": "bar"} # you can type hint a variable
some_tuple: tuple[int, ...] = (1, 2, 3) # tuple with many items, all integers

Adding the nested types to composite objects will type hint the inner elements. Use list[str] instead of list. If you use plain list then mypy won’t check the type of any elements inside that list.
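
Here is a minimal sketch of that difference. The two functions are invented for this example, and the comments describe what mypy should do rather than quoting its exact output.

```python
def shout_all(words: list[str]) -> list[str]:
    # mypy knows every element is a str, so .upper() is checked.
    return [word.upper() for word in words]


def shout_all_unchecked(words: list) -> list:
    # A bare list is treated like list[Any]: mypy has nothing to check
    # the elements against, so a caller passing [1, 2] is not flagged.
    return [word.upper() for word in words]


shouted = shout_all(["duck", "goose"])
# shout_all([1, 2]) is the kind of call mypy should reject, whereas
# shout_all_unchecked([1, 2]) passes type checking and fails at runtime.
```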

Postel’s law

One strategy is to follow Postel’s law: be liberal in your inputs and conservative in your outputs.

Your function arguments should be as loose as possible for your function to still work, whereas your outputs should be as strict as possible so others know exactly what to expect.

from collections.abc import Sequence, Mapping, Iterable

def my_func(
    array: Sequence[str],
    my_map: Mapping[str, int],
    cache: Iterable[str]
) -> tuple[int, str]:
    ...
Postel Pat

Sequence, Mapping, and Iterable are abstract base classes. Any object that is one of these implements certain methods, such as __getitem__ (Sequence and Mapping) or __iter__ (Iterable).
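
A small sketch of what the loose input types buy you: the same function accepts lists, tuples, and even a generator. summarise is a made-up function for this example.

```python
from collections.abc import Iterable, Mapping, Sequence


def summarise(
    names: Sequence[str],
    scores: Mapping[str, int],
    seen: Iterable[str],
) -> tuple[int, str]:
    # Iterable only promises one pass, so consume it exactly once.
    seen_names = set(seen)
    # Mapping gives us .get(); Sequence gives us ordering and indexing.
    total = sum(scores.get(name, 0) for name in names)
    unseen = [name for name in names if name not in seen_names]
    first_unseen = unseen[0] if unseen else ""
    return total, first_unseen


from_lists = summarise(["a", "b"], {"a": 1, "b": 2}, ["a"])
from_tuples = summarise(("a", "b"), {"a": 1, "b": 2}, (n for n in ["a"]))
```

The return type stays a precise tuple[int, str], so callers know exactly what they get back.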

null

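You can type hint values that may be None with Optional[<other-type>] or Union[None, <other-type>], and a value that is always None with plain None. Optional[X] is shorthand for Union[X, None], so choosing between them is a matter of preference. Note that Optional is not related to keyword or optional arguments in a function.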

from typing import Optional, Union

def maybe_return(a: int) -> Optional[int]:
    if a < 5:
        return a
    return None

def maybe_return(a: int) -> Union[int, None]:
    if a < 5:
        return a
    return None
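
To illustrate the point that Optional is unrelated to optional arguments, here is a minimal sketch with two made-up functions. The first must always be given an argument, which may be None; the second may be called with no argument, but never accepts None.

```python
from typing import Optional


def describe(count: Optional[int]) -> str:
    # Optional[int] means "int or None"; the argument is still required.
    if count is None:
        return "unknown"
    # After the None check, mypy treats count as a plain int.
    return str(count + 1)


def describe_with_default(count: int = 0) -> str:
    # This argument is optional to pass, but it can never be None.
    return str(count + 1)
```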

In fact you can use Union anywhere, to denote a variable can be one of many types.

from typing import Optional, Union

def my_func(foo: Union[int, str, None, float]) -> dict[Union[str, int], str]:
    ...
Like a Russian nesting doll
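
When a function accepts a Union, the body usually has to work out which member it was given. A minimal sketch, using a made-up to_int helper:

```python
from typing import Union


def to_int(value: Union[int, str, None]) -> int:
    # Each None / isinstance check narrows the Union for mypy.
    if value is None:
        return 0
    if isinstance(value, str):
        return int(value.strip())
    # Only int is left by this point.
    return value
```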

I’m a developer, get me out of here

mypy is throwing errors and you’ve already spent 2 hours 20 minutes debugging what’s going on. It’s time to move on with your life, what do you do?

You can:

  1. Provide Any as the type hint, so that mypy ignores the type of the variable.
  2. Use cast(<type>, <variable>) to make mypy treat the variable as the required type.
  3. Add ‘# type: ignore’ at the end of the line.
from typing import Any

def some_aws_util(rds_client: Any, sqlalchemy: Any) -> None:
    ...

my_map: dict[str, Any] = {"foo": "1", "bar": 2}
I tend to use `Any` for AWS clients rather than fighting to use types

Before you use Any, try using object, which every type derives from. Your code might work with object as a type hint, and it provides some bare-bones type checking that’s better than nothing.

def some_aws_util(rds_client: object, sqlalchemy: object) -> None:
    ...

# Could have just used `Union` here, `object` is good for JSONs
my_map: dict[str, object] = {"foo": "1", "bar": 2}
It's generally easier to use `Any` for AWS clients rather than battling to use types
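
The practical difference between the two can be sketched like this. Both functions are made up for this example.

```python
from typing import Any


def length_any(value: Any) -> int:
    # Any switches checking off: mypy accepts this even if value has no len().
    return len(value)


def length_object(value: object) -> int:
    # object forces us to prove what we have before using it.
    if isinstance(value, (str, list, dict)):
        return len(value)
    return 0
```

With object, mypy should refuse a bare len(value) until the isinstance check narrows the type, which is exactly the safety net that Any throws away.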

You can also use cast to tell mypy to treat a variable as having a certain type. cast does nothing at runtime; it only affects type checking. This is useful if you’re working with JSON, which is hard to type hint right now because of its recursive structure.

from typing import cast

my_val = cast(str, foo["key"])
Go Robinson Crusoe on that variable
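
One thing worth keeping in mind is that cast never converts or validates anything. A minimal sketch, with a made-up payload:

```python
from typing import cast

payload: dict[str, object] = {"count": "42"}

# cast only changes what mypy believes; nothing is converted or checked.
count = cast(int, payload["count"])
```

Here mypy is told that count is an int, but at runtime it is still the string "42". Use int(...) when you need a real conversion.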

And as a last resort, force mypy to ignore that line:

item = my_map["foo"] # type: ignore
Got 99 problems but a type ain't one

If you do use any of these ‘get out of jail free’ methods, add a comment on the line above explaining what you tried and why it didn’t work. It’ll help the next developer.

Type hinting classes

Here’s how you may type hint a class:

class BaseClass:
    def __init__(self, a: int) -> None:
        self.a = a

    def method(self) -> int:
        raise NotImplementedError

class MyClass(BaseClass):
    def method(self) -> int:
        return self.a * 2

# the argument is an instance of BaseClass (or of a subclass)
def some_function(a: BaseClass) -> None:
    ...

To type hint a method that references its own class, use quote marks:

class MyClass(BaseClass):
    def method(self) -> int:
        return self.a * 2

    def __gt__(self, other: 'MyClass') -> bool:
        return self.a > other.a
Not exactly an eternal golden braid

To type hint a class type, rather than an instance of that class, use Type and square brackets. This is useful when you are validating that an object may be one of several Enum classes.

from typing import Type

def some_function(a: Type[MyClass]) -> None:
    ...
If you actually need to do this, you're probably not reading this guide
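
For the Enum use case mentioned above, a minimal sketch might look like this. Colour, Size, and is_valid_value are made-up names for this example.

```python
from enum import Enum
from typing import Type


class Colour(Enum):
    RED = "red"
    BLUE = "blue"


class Size(Enum):
    SMALL = "small"
    LARGE = "large"


def is_valid_value(enum_cls: Type[Enum], raw: str) -> bool:
    # enum_cls is the class itself, so we can iterate over its members.
    return any(member.value == raw for member in enum_cls)
```

The caller passes the class, not a member: is_valid_value(Colour, "red").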

Type hinting functions

You can type hint functions with the Callable type:

from typing import Callable

def my_first_class_func(a: int) -> bool:
    ...

# Callable[[<arg>, <arg>, ...], <return>]
def other_func(fn: Callable[[int], bool]) -> str:
    ...
Let's make a Y-Combinator
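
A slightly fuller sketch of passing a function around, with made-up names:

```python
from typing import Callable


def is_even(number: int) -> bool:
    return number % 2 == 0


def keep_matching(
    predicate: Callable[[int], bool], numbers: list[int]
) -> list[int]:
    # predicate must accept one int and return a bool.
    return [number for number in numbers if predicate(number)]


evens = keep_matching(is_even, [1, 2, 3, 4])
```

Passing a function with a different signature, such as one that takes a str, is the kind of mistake mypy should catch at the call site.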

Type hinting generators

from typing import Generator, Iterable

# Generator[<yield-type>, <send-type>, <return-type>]
def my_generator() -> Generator[int, int, str]:
    for i in range(5):
        yield i
    return "foo"

# Using Iterable is much simpler
def my_generator() -> Iterable[int]:
    for i in range(5):
        yield i
Did you know you could send values to generators? God help me if I ever need to send a value to a generator
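
For the curious, here is a minimal sketch of what the three slots in Generator[<yield-type>, <send-type>, <return-type>] mean in practice. running_total is a made-up example.

```python
from typing import Generator


def running_total() -> Generator[int, int, str]:
    total = 0
    while total < 10:
        # Yields an int (slot 1) and receives an int via send() (slot 2).
        received = yield total
        total += received
    # The return value (slot 3) travels on the StopIteration exception.
    return "done"


gen = running_total()
first = next(gen)       # runs to the first yield
second = gen.send(4)    # total becomes 4
third = gen.send(3)     # total becomes 7
try:
    gen.send(5)         # total becomes 12, so the loop ends
    result = ""
except StopIteration as stop:
    result = stop.value
```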

How do you avoid circular imports?

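Since you’re passing around loads of variables, and often classes you define, it’s fairly easy to get circular imports. You can avoid this by importing those names only when type checking is running. Because the import is skipped at runtime, the annotation has to be quoted, or the module has to start with from __future__ import annotations.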

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.sub_module import my_class

# quoted because my_class is not imported at runtime
def my_function(foo: "my_class") -> int:
    ...
This is supremely inelegant, I wish there was a better way.
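
Here is a self-contained sketch of the same pattern that you can run as-is. It uses decimal.Decimal from the standard library purely as a stand-in for a class from your own module, and double is a made-up function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers take this branch; it is skipped at runtime.
    from decimal import Decimal


def double(amount: "Decimal") -> "Decimal":
    # The quotes stop Python from looking up Decimal when the function
    # is defined, since the name does not exist at runtime.
    return amount * 2
```

Without the quotes (or from __future__ import annotations at the top of the module), defining double would raise a NameError at import time.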

Custom types

You can reduce duplication in your code and increase clarity by providing a custom type (a type alias). I generally do this at module level. As far as I know, there’s no naming convention for custom types.

from typing import Union

CustomType = dict[str, Union[int, float, None]]

def my_function(foo: CustomType) -> None:
    ...
Make it custom, like you're Xzibit
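
The payoff shows up once the same shape appears in more than one signature. A minimal sketch, with made-up alias and function names:

```python
from typing import Union

Number = Union[int, float]
Readings = dict[str, list[Number]]  # hypothetical alias for this example


def average(readings: Readings, sensor: str) -> float:
    # Fall back to an empty list when the sensor is unknown.
    values = readings.get(sensor, [])
    return sum(values) / len(values) if values else 0.0


def sensors(readings: Readings) -> list[str]:
    # Iterating a dict yields its keys, so this returns sorted sensor names.
    return sorted(readings)
```

If the shape of the data changes later, only the alias needs updating.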

That’s all folks!

We have barely scratched the surface of type hints, but this is what I use 99% of the time when adding type hints in my daily work. If you’re interested, you can go deeper with Protocols, overloading, Generics, and nominal vs structural typing, but is it worth your time?

Questions I have

I’m still new to type hints! Some questions I have include:

  1. How do you type hint JSONs? Is there a best-practice?
  2. What do we do about values returned from JSONs that can be many things? This happens so often when you call APIs. cast seems inelegant, but I often have to use ‘# type: ignore’.
  3. Should you type hint tests? The folks at urllib3 did so with fantastic results, but right now I think it’s better to spend more time adding unit tests, and if you use pytest, type hinting fixtures seems clunky.

Final words

  1. Typing is a means to an end, not an end in itself. Don’t spend too much time on it. Life is too short and it’s fine to move on and use Any or ‘# type: ignore’ in tricky situations.
  2. Make sure mypy is actually enabled, and add it to your continuous integration or pre-commit hook.
  3. Installing types for 3rd-party libraries sucks.
  4. Check out Pydantic and Pyre, two nice adjacent projects.
  5. Any questions, feel free to ping me.

Take it easy
Type hints are nice, but not functional - take it easy!

Additional reading

  1. The mypy docs of course.
  2. Docs for the typing standard library
  3. PEPs 483 and 484 which introduce type hints. Reading PEPs is underrated, it’s great to see people explain their reasoning as to why features are introduced.
  4. Cal Paterson on applying types to real world projects
  5. Seth Larson on adding type hints to urllib3.
  6. Dropbox on adding types to 4 million lines of Python
  7. Carl Meyer on type checked Python in the real world.

  1. Yes, start_time is not a great variable name