Writing nbdev plugins

How to customize nbdev processors to do what you want

What will this cover?

nbdev can be customized and extended beyond its standard capabilities through a well thought-out and scalable framework. Does your library need to inject custom Quarto markup into certain cells? Or do you want something simpler, such as shortcuts that replace verbose Quarto directives (for example, replacing ::: {.column-margin} with #| margin)?

Writing a custom plugin is the easiest way to achieve this. This tutorial will bring you up to speed on creating your own plugins to expand and simplify your literate-programming experience with nbdev and Quarto.

Specifically, we will build a processor (something that processes a notebook cell) that lets us write a short div directive and have it replaced with the Quarto-specific header (that ::: {.some_annotation}). This particular example is specific to Quarto and runs when building the documentation, but the same technique can be used to add custom behavior during library export as well.

Note: We are using div here as it closely resembles how the related Quarto directives behave: they act like <div>s in HTML code.

This tutorial won’t cover the basics of nbdev. It assumes you already know how to navigate nbdev (what directives are, how export works, etc.).

Getting started, how does nbdev make this easy?

First let’s visualize just what we’re trying to achieve.

Instead of writing the following, which adds "Some text" to the margin (as currently shown off to the side):

Some text

::: {.column-margin}
Some text
:::

We will create a shorter way to write this out, making use of how nbdev and Quarto write their directives.

By the end of this tutorial we will create something that looks like the following:

#| div column-margin

Some text

And this will include cases where a div should be put across multiple cells as well, by specifying a start and an end.

Note: Check out the article layout Quarto documentation to find the best examples of use cases for this custom directive, including the column-margin just shown

This can be achieved in under 50 lines of code!

nbdev lets us create what are called processors (this is how #| export will shove code into modules, for example). These processors are applied to each cell of a notebook and can modify its contents. They can then be wrapped into a module the same way nbdev does for nbdev_export or nbdev_docs. Thanks to the power of writing custom nbdev extensions, going deep into the inner workings of the framework isn’t required!
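
To make the idea concrete, here is a toy model of the "run a processor over every cell and modify it in place" loop. It uses only the standard library and is not how nbdev implements it; `shout_proc` and `run_procs` are hypothetical names used purely for illustration.

```python
from types import SimpleNamespace

def shout_proc(cell):
    "Hypothetical processor: upper-case the text of markdown cells in place"
    if cell.cell_type == "markdown":
        cell.source = cell.source.upper()

def run_procs(cells, procs):
    "Toy driver: apply every processor to every cell, in order"
    for proc in procs:
        for cell in cells:
            proc(cell)
    return cells

# Stand-ins for notebook cells: an object with `cell_type` and `source`
cells = [
    SimpleNamespace(cell_type="markdown", source="some text"),
    SimpleNamespace(cell_type="code", source="x = 1"),
]
run_procs(cells, [shout_proc])
print([c.source for c in cells])
```

Only the markdown cell is changed; the code cell passes through untouched, which is the same filtering pattern the processor in this tutorial relies on.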

Bringing in what we need

We need surprisingly few imports from nbdev itself, just two:

- extract_directives, to read in the list of #| directives written in a cell
- The Processor class, which will actually perform what we want on notebook cells

The rest of the imports are there to make our lives easier, as will be explained later.

from nbdev.process import extract_directives
from nbdev.processors import Processor

from fastcore.basics import listify

from string import Template

Lastly, for testing purposes we’ll use nbdev’s mk_cell function and the NBProcessor class, which let us mock running our processor on a “real” notebook!

from nbdev.processors import mk_cell, NBProcessor

Writing a converter

The first step is creating a quick and easy way to take the nbdev directive we want to use (such as #| div column-margin) and convert it quickly into something quarto will then read (such as ::: {.column-margin}).

We can create a string Template to perform this for us:

_LAYOUT_STR = Template("::: {.$layout}\n${content}\n")

Tip: This doesn’t have to be a string template; I just found this the easiest to use!

_LAYOUT_STR.substitute(
    layout="column-margin",
    content="Some text to go on the sidebar"
)
'::: {.column-margin}\nSome text to go on the sidebar\n'
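
As the tip says, a Template is only one option. A minimal sketch of an equivalent f-string helper follows; `layout_str` is a hypothetical name and not part of nbdev or of the code above.

```python
from string import Template

_LAYOUT_STR = Template("::: {.$layout}\n${content}\n")

def layout_str(layout, content):
    "Hypothetical f-string equivalent of `_LAYOUT_STR.substitute`"
    # Doubled braces produce literal `{` and `}` inside an f-string
    return f"::: {{.{layout}}}\n{content}\n"

templated = _LAYOUT_STR.substitute(layout="column-margin", content="Some text")
formatted = layout_str("column-margin", "Some text")
print(templated == formatted)
```

The Template version avoids the doubled braces, which is arguably easier to read when the output itself is full of curly braces.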

Next we need to write a simple converter that operates at the cell level:

def convert_layout(
    cell:dict, # A single cell from a Jupyter Notebook
    is_multicell=False # Whether the div should be wrapped around multiple cells
):
    "Takes a code cell that contains `div` in the directives and modifies the contents to the proper Quarto format"
    content = cell.source
    code = cell.source.splitlines(True)
    div_ = cell.directives_["div"]
    # Check whether `end` is among the arguments of the div directive
    if "end" in div_:
        # If it is, emit just `:::` when the cell holds a single line,
        # otherwise close the div on its own line after the existing content
        cell.source = ":::" if len(code) == 1 else f'{content}\n:::'
    else:
        # Actually modify the code
        cell.source = _LAYOUT_STR.substitute(layout=" ".join(div_), content=content)
        if not is_multicell: cell.source += ":::"

Let’s go into detail on what’s happening here.

    content = cell.source

The source text of whatever exists in a notebook cell will live in .source.

    code = cell.source.splitlines(True)

Next, I split the content of the cell into individual lines, keeping the newline characters. This lets us check whether a cell contains only #| div end, which means that the div that was started earlier should be closed.
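
For reference, this is how `str.splitlines(True)` behaves on the two kinds of input the check distinguishes. The sample strings are made up for illustration.

```python
end_only = "#| div end"
with_text = "#| div end\nSome trailing text"

# Passing True keeps the line endings on each piece
only_lines = end_only.splitlines(True)
text_lines = with_text.splitlines(True)

print(only_lines)  # a single-element list, so len(...) == 1
print(text_lines)  # two elements, the first still ends in "\n"
```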

    div_ = cell.directives_["div"]

Any directives (comments in a cell marked with #|) are stored in the directives_ attribute as a dictionary. For our particular processor we only care about the div directive.
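
To illustrate the shape of that dictionary, here is a deliberately simplified stand-in for directive parsing. It is not nbdev's extract_directives (which handles more cases); `parse_directives` is a hypothetical helper that only shows how a line such as `#| div column-margin start` maps to a key and a list of arguments.

```python
def parse_directives(source):
    "Hypothetical, simplified directive parser (not nbdev's implementation)"
    directives = {}
    for line in source.splitlines():
        # Directives are assumed to sit at the top of the cell
        if not line.startswith("#|"):
            break
        parts = line[2:].split()
        if not parts:
            continue  # a bare `#|` carries no directive
        name, *args = parts
        directives[name] = args
    return directives

parsed = parse_directives("#| div column-margin start\nA test")
print(parsed)
```

The directive name becomes the key and everything after it becomes the list of arguments, which is why the converter can test for "end" or "start" with a simple membership check.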

    if "end" in div_:
        # If it is, emit just `:::` when the cell holds a single line,
        # otherwise close the div on its own line after the existing content
        cell.source = ":::" if len(code) == 1 else f'{content}\n:::'
    else:
        # Actually modify the code
        cell.source = _LAYOUT_STR.substitute(layout=" ".join(div_), content=content)
        if not is_multicell: cell.source += ":::"

Finally, this last part decides whether to add the closing ::: to the cell or to use _LAYOUT_STR to inject the boilerplate Quarto div markup.
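
The three possible outcomes can be seen side by side in a string-only sketch of the same branching. `to_quarto` is a hypothetical helper that works on plain strings rather than notebook cells, and it assumes the directive line has already been stripped from the body.

```python
def to_quarto(div_args, body, is_multicell=False):
    "Hypothetical string-only version of the branching described above"
    if "end" in div_args:
        # Close a div that an earlier cell opened
        return ":::" if not body else f"{body}\n:::"
    classes = " ".join(div_args)
    opened = f"::: {{.{classes}}}\n{body}\n"
    # A multi-cell div stays open; a later `end` cell closes it
    return opened if is_multicell else opened + ":::"

single = to_quarto(["column-margin"], "Some text")
start = to_quarto(["column-margin"], "Some text", is_multicell=True)
end = to_quarto(["end"], "")
print(single, start, end, sep="\n---\n")
```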

Let’s see it in action:

cell = mk_cell(
    """#| div column-margin
Here is something for the sidebar!""",
    cell_type="markdown"
)

nbdev will pull out those directives and store them in the cell’s directives_ attribute using the extract_directives function:

cell.directives_ = extract_directives(cell, "#")
cell.directives_
{'div': ['column-margin']}

And now we can test out if our convert_layout function works!

convert_layout(cell)
print(cell.source)
::: {.column-margin}
Here is something for the sidebar!
:::

Note: I print cell.source here so that its text looks cleaner and matches what we would see in a Markdown cell.

Looks exactly like we wanted earlier! Great!

How do we tell nbdev to use this and create this Processor class mentioned earlier?

Writing a Processor

The second-to-last step is to create the custom Processor that nbdev uses to apply procs (things that modify the contents of cells). The basic idea is simple: create a class that inherits from Processor, and define any modifications in a cell method, which takes in a cell and modifies it in place.

class LayoutProc(Processor):
    "A processor that will turn `div` based tags into proper quarto ones"
    has_multiple_cells = False  # True while a multi-cell div is open
    def cell(self, cell):
        if cell.cell_type == "markdown" and "div" in cell.directives_:
            div_ = cell.directives_["div"]
            if self.has_multiple_cells and "end" in div_:
                # Close the div opened by an earlier `start` cell
                convert_layout(cell)
                self.has_multiple_cells = False
            else:
                # A trailing `start` opens a div that spans several cells
                is_start = div_[-1] == "start"
                if is_start:
                    self.has_multiple_cells = True
                    div_.remove("start")
                convert_layout(cell, is_start)

How can we test if this will work or not?

A minimal Jupyter Notebook is just a dictionary with a cells key holding a list of notebook cells that follow a special format; we created one such cell above with mk_cell. nbdev has a dict2nb function which lets us quickly convert this minimal idea of a Jupyter Notebook into the real thing.
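
For a sense of what that dictionary looks like on disk, here is a rough standard-library sketch of a small notebook in the nbformat 4 JSON layout. It does not use nbdev at all, and the exact set of keys a given tool expects may differ, so treat it as an approximation.

```python
import json

# Roughly what a tiny .ipynb file contains (nbformat 4 layout)
minimal_nb = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "#| div column-margin\nA test",
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# A notebook file is this dictionary serialized as JSON
text = json.dumps(minimal_nb, indent=1)
roundtrip = json.loads(text)
print(roundtrip["cells"][0]["cell_type"])
```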

Afterwards, we can apply the processor to those cells through the NBProcessor class (what nbdev uses to apply these).

from nbdev.process import NBProcessor, dict2nb
nb = {
    "cells":[
    mk_cell("""#| div column-margin
A test""", "markdown"),
    mk_cell("""#| div column-margin start
A test""", "markdown"),
    mk_cell("""#| div end""", "markdown"),
]}

The mk_cell function will create a cell based on some content and a cell type. The particular extension we’ve built works off Markdown cells, so we set the type as markdown.

The NBProcessor takes in a list of procs (processors) that should be applied, and an opened Jupyter Notebook:

processor = NBProcessor(procs=LayoutProc, nb=dict2nb(nb))

These processors are applied by calling the .process() method:

processor.process()

And now we can see that those cells were changed:

for i in range(3):
    print(f"Before:\n{nb['cells'][i].source}\n")
    print(f"After:\n{processor.nb.cells[i].source}\n")
Before:
#| div column-margin
A test

After:
::: {.column-margin}
A test
:::

Before:
#| div column-margin start
A test

After:
::: {.column-margin}
A test


Before:
#| div end

After:
:::

Great! We’ve successfully created a plugin for nbdev that lets us write Quarto div directives in Markdown cells with far less typing. How can we actually use this in our projects?

How to enable the plugin on your project

This requires two changes to your settings.ini.

First, supposing this code lived in nbdev itself, we would add a special procs key and specify where the processor comes from:

procs = 
    nbdev.extensions:LayoutProc

It follows the format library.module:processor_name.
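
To show what a string in this format encodes, here is a sketch of resolving it to a Python object with the standard library. This is only an illustration of the format and makes no claim about how nbdev itself loads processors; `resolve` is a hypothetical helper, and `string:Template` is used as a stand-in so the example runs without nbdev installed.

```python
from importlib import import_module

def resolve(spec):
    "Hypothetical helper: turn 'package.module:name' into the named object"
    module_name, _, attr = spec.partition(":")
    return getattr(import_module(module_name), attr)

# Stand-in for a multi-line `procs` value, one entry per line
procs_value = "\n    string:Template"
procs = [resolve(spec) for spec in procs_value.split()]
print(procs)
```

The part before the colon is an importable module path and the part after it is the name of the class inside that module.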

If the processor comes from an external library (this processor is based on the one that lives in nbdev-extensions), you should add that library to the requirements of your project:

requirements = nbdev-extensions

And you’re done! Now when calling nbdev_docs or nbdev_preview the processor we just made will be automatically applied to your notebooks and perform this conversion!

Conclusion, nbdev-extensions and a bit about me!

In short, whenever you want to change how a cell looks or behaves, whether when exporting modules, building documentation, or running your own post-processing command, it can be done quickly and efficiently with the Processor class nbdev provides!

If you’re interested in seeing more examples of nbdev extensions and where you can take them, I (Zachary Mueller) have written a library dedicated to them called nbdev-extensions, where I turn any idea that benefits how I approach nbdev into an extension for the world to use.

Thanks for reading!