Dataloaders

In this section we'll see how to use dataloaders to optimize data fetching and prevent the N+1 problem.

We'll also see why we have been using async resolvers to fetch data from the database.

The N+1 problem

As we've seen in the previous section, the following query will result in an N+1 problem:

{
  latestEpisodes {
    title
    podcast {
      title
    }
  }
}
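To make the cost concrete, here is a minimal sketch that counts the queries issued by a naive resolver. The FakeDB class, its methods, and the sample data are hypothetical stand-ins for illustration; they are not the workshop's actual db module.

```python
class FakeDB:
    """Hypothetical in-memory database that counts how many queries it receives."""

    def __init__(self) -> None:
        self.query_count = 0
        self.podcasts = {"1": "Tech Talks", "2": "History Hour"}
        self.episodes = [
            {"title": "Episode A", "podcast_id": "1"},
            {"title": "Episode B", "podcast_id": "2"},
            {"title": "Episode C", "podcast_id": "1"},
        ]

    def find_latest_episodes(self) -> list[dict]:
        self.query_count += 1  # the "1" in N+1
        return self.episodes

    def find_podcast_by_id(self, podcast_id: str) -> str:
        self.query_count += 1  # one more query for every episode: the "N"
        return self.podcasts[podcast_id]


def resolve_latest_episodes(db: FakeDB) -> list[dict]:
    # Naive resolution: each episode looks up its podcast on its own.
    return [
        {
            "title": episode["title"],
            "podcast": {"title": db.find_podcast_by_id(episode["podcast_id"])},
        }
        for episode in db.find_latest_episodes()
    ]
```

With the three sample episodes this issues four queries (one for the list and three for the podcasts), and the count grows with every additional episode.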

Now let's see how we can solve this problem using dataloaders.

Dataloaders

Dataloaders are a technique for batching multiple data fetching operations into a single one. A dataloader collects all the load requests made during the current event loop tick and then executes them as a single query in the next tick, which is why our resolvers need to be async.

Let's use dataloaders to optimize the previous query. To do so we need to create a function to load our data and a dataloader based on that function. Let's start with the function. Go to api/podcasts/dataloaders.py and add the following function:

from strawberry.dataloader import DataLoader

from db import data

from .types import Podcast


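async def load_podcasts(ids: list[str]) -> list[Podcast]:
    # Fetch all the requested podcasts in one call
    db_podcasts = await data.find_podcasts_by_ids(ids)

    # Index the results by id so we can return them in the requested order
    podcasts_by_id = {
        str(podcast.id): Podcast.from_db(podcast) for podcast in db_podcasts
    }

    # Note: an id that was not found raises KeyError here
    return [podcasts_by_id[podcast_id] for podcast_id in ids]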

The API for a dataloader function is pretty simple: it takes a list of ids and returns a list of objects for those ids.

Note: the data must be returned in the same order as the ids, with one result for each id.
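The ordering requirement matters because a database is free to return rows in any order and to skip ids it cannot find. The following is a minimal sketch of the reordering step on plain dictionaries. The order_by_ids helper and the row shape are hypothetical, and mapping a missing id to None is just one possible choice (the function above raises KeyError instead).

```python
from typing import Optional


def order_by_ids(ids: list[str], rows: list[dict]) -> list[Optional[dict]]:
    """Return one entry per requested id, in the same order as `ids`.

    Assumption for this sketch: ids that were not found map to None.
    """
    rows_by_id = {str(row["id"]): row for row in rows}
    return [rows_by_id.get(key) for key in ids]


# Rows come back in a different order than requested, and id "3" is missing.
rows = [{"id": 2, "title": "History Hour"}, {"id": 1, "title": "Tech Talks"}]
ordered = order_by_ids(["1", "2", "3"], rows)
```

Here ordered holds the "Tech Talks" row first, the "History Hour" row second, and None in the last position, so each result lines up with the id that asked for it.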

Then we can create a dataloader with the following code:

podcast_loader = DataLoader(load_fn=load_podcasts)

And that's pretty much it. Let's see how we can use this dataloader in our resolver.

Using dataloaders

Instead of changing our latest_episodes resolver, we'll be working directly with the podcast resolver on the Episode type.

Go to api/podcasts/types.py and change the podcast resolver to the following:

@strawberry.type
class Episode:
    # ... keep the previous fields

    @strawberry.field
    async def podcast(self) -> Podcast:
        from .dataloaders import podcast_loader

        return await podcast_loader.load(str(self.podcast_id))

Now if we run the following query:

{
  latestEpisodes {
    title
    podcast {
      title
    }
  }
}

We'll see that all the podcasts are now fetched with a single database query instead of one query per episode, which means we no longer have the N+1 problem.

How does it work?

Dataloaders work by collecting all the data fetching operations that need to be done and then executing them in a single query in the next event loop tick. When we call podcast_loader.load, we add the data fetching operation to a queue, and the dataloader schedules a task that executes all the queued operations together in the next event loop tick.
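The queue-and-schedule idea can be sketched with plain asyncio. This is a toy illustration under simplifying assumptions, not Strawberry's implementation: TinyLoader, load_titles, and demo are hypothetical names, and the sketch omits caching and error handling.

```python
import asyncio


class TinyLoader:
    """Toy batching loader that illustrates the queue-and-schedule idea."""

    def __init__(self, load_fn):
        self.load_fn = load_fn
        self.queue: list[tuple[str, asyncio.Future]] = []
        self._task = None

    def load(self, key: str) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        if not self.queue:
            # First load of this batch: schedule a single dispatch task.
            # It only starts once the currently running code yields control.
            self._task = loop.create_task(self.dispatch())
        self.queue.append((key, future))
        return future

    async def dispatch(self) -> None:
        # Take everything queued so far and resolve it with one load_fn call.
        batch, self.queue = self.queue, []
        values = await self.load_fn([key for key, _ in batch])
        for (_, future), value in zip(batch, values):
            future.set_result(value)


async def demo() -> tuple[list[str], list[list[str]]]:
    batches: list[list[str]] = []

    async def load_titles(ids: list[str]) -> list[str]:
        batches.append(ids)  # record each batch, i.e. each "query"
        return [f"Podcast {key}" for key in ids]

    loader = TinyLoader(load_titles)
    # Three concurrent loads, as three sibling resolvers would issue them.
    titles = await asyncio.gather(
        loader.load("1"), loader.load("2"), loader.load("1")
    )
    return list(titles), batches
```

Running asyncio.run(demo()) in this sketch yields three titles but only one recorded batch, because all three load calls are queued before the dispatch task gets a chance to run. If the resolvers were synchronous, each one would have to finish its fetch before the next one started, and nothing could be collected into a batch.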

If you want to learn more about dataloaders, you can read the following article: https://xuorig.medium.com/the-graphql-dataloader-pattern-visualized-3064a00f319f