The functions that transform notebooks into a library

The most important function defined in this module is notebook2script, so you may want to jump to it before scrolling through the rest, which explains the details behind the scenes of the conversion from notebooks to library. The main things to remember are:

  • put an # export flag on each cell you want exported
  • put an # exports flag (for export and show) on each cell you want exported with the source code shown in the docs
  • put an # exporti flag (for export internal) on each cell you want exported without it being added to __all__, and without it showing up in the docs.
  • one cell should contain the # default_exp flag followed by the name of the module that everything should be exported to (with dots for submodules and without the .py extension). If one specific cell needs to be exported to a different module, just indicate it after the # export flag: # export special.module
  • all names on the left-hand side of an assignment, as well as functions and classes, will be exported, and the ones that are not private will be added to __all__ automatically
  • to add something to __all__ if it's not picked automatically, write an exported cell with something like _all_ = ["my_name"] (the single underscores are intentional)

Basic foundations

For bootstrapping nbdev we have a few basic foundations defined in imports, which we test and show here. First, a simple config file class, Config, that reads the content of your settings.ini file and makes it accessible:

Config

Config(cfg_name='settings.ini')

Store the basic information for nbdev to work

create_config("github", "nbdev", user='fastai', path='..', tst_flags='tst', cfg_name='test_settings.ini')
cfg = Config(cfg_name='test_settings.ini')
test_eq(cfg.lib_name, 'nbdev')
test_eq(cfg.git_url, "https://github.com/fastai/nbdev/tree/master/")
test_eq(cfg.lib_path, Path.cwd().parent/'nbdev')
test_eq(cfg.nbs_path, Path.cwd())
test_eq(cfg.doc_path, Path.cwd().parent/'docs')
test_eq(cfg.custom_sidebar, 'False')
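For illustration only, here is a minimal sketch of reading such a file with the standard-library configparser. The [DEFAULT] section, the key names and the %(user)s-style interpolation in git_url are assumptions chosen so the result mirrors the git_url tested above; this is not nbdev's own Config class.

```python
import configparser
import tempfile
from pathlib import Path

# Hypothetical settings.ini: the key names and the git_url template are assumptions
SETTINGS = """\
[DEFAULT]
host = github
lib_name = nbdev
user = fastai
branch = master
git_url = https://github.com/%(user)s/%(lib_name)s/tree/%(branch)s/
lib_path = %(lib_name)s
"""

def read_settings(cfg_path):
    "Return the [DEFAULT] section of an ini file as a plain dict, with %(key)s references resolved."
    cp = configparser.ConfigParser()
    cp.read(cfg_path)
    return dict(cp['DEFAULT'])

with tempfile.TemporaryDirectory() as d:
    cfg_file = Path(d)/'settings.ini'
    cfg_file.write_text(SETTINGS)
    cfg = read_settings(cfg_file)
    # Relative paths in the file are resolved against the folder that contains it
    lib_path = cfg_file.parent/cfg['lib_path']

print(cfg['git_url'])  # https://github.com/fastai/nbdev/tree/master/
print(lib_path.name)   # nbdev
```

ConfigParser's default interpolation is what lets one key (git_url) be built from others (user, lib_name, branch).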

We derive some useful variables to check what environment we're in:

if not os.environ.get("IN_TEST", None):
    assert IN_NOTEBOOK
    assert not IN_COLAB
    assert IN_IPYTHON

Then we have a few utility functions.

last_index

last_index(x, o)

Finds the last index of occurrence of x in o (returns -1 if there is no occurrence)

test_eq(last_index(1, [1,2,1,3,1,4]), 4)
test_eq(last_index(2, [1,2,1,3,1,4]), 1)
test_eq(last_index(5, [1,2,1,3,1,4]), -1)
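A minimal sketch of the behaviour tested above (not necessarily how nbdev implements it): walk the indices from the end, so the first hit is the last occurrence.

```python
def last_index(x, o):
    "Sketch: index of the last occurrence of `x` in the sequence `o`, or -1 if absent."
    # Walk the indices backwards so the first match found is the last occurrence
    for i in range(len(o) - 1, -1, -1):
        if o[i] == x:
            return i
    return -1

print(last_index(1, [1, 2, 1, 3, 1, 4]))  # 4
```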

compose

compose(*funcs, order=None)

Create a function that composes all functions in funcs, passing along remaining *args and **kwargs to all

f1 = lambda o,p=0: (o*2)+p
f2 = lambda o,p=1: (o+1)/p
test_eq(f2(f1(3)), compose(f1,f2)(3))
test_eq(f2(f1(3,p=3),p=3), compose(f1,f2)(3,p=3))
test_eq(f2(f1(3,  3),  3), compose(f1,f2)(3,  3))
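To make the composition order concrete, here is a simplified sketch that leaves out the order argument entirely: functions are applied left to right, and every extra positional or keyword argument is forwarded to each of them, which is why p=3 reaches both f1 and f2 in the tests above.

```python
def compose(*funcs):
    "Sketch: chain `funcs` left to right, forwarding extra *args/**kwargs to each one."
    def _inner(x, *args, **kwargs):
        for f in funcs:
            # Each function receives the previous result plus the same extra arguments
            x = f(x, *args, **kwargs)
        return x
    return _inner

f1 = lambda o, p=0: (o*2) + p
f2 = lambda o, p=1: (o+1) / p
print(compose(f1, f2)(3))  # 7.0
```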

parallel

parallel(f, items, *args, n_workers=None, **kwargs)

Applies func in parallel to items, using n_workers

import time,random

def add_one(x, a=1): 
    time.sleep(random.random()/100)
    return x+a

inp,exp = range(50),range(1,51)
test_eq(parallel(add_one, inp, n_workers=2), list(exp))
test_eq(parallel(add_one, inp, n_workers=0), list(exp))
test_eq(parallel(add_one, inp, n_workers=1, a=2), list(range(2,52)))
test_eq(parallel(add_one, inp, n_workers=0, a=2), list(range(2,52)))
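The contract tested above (results in input order, n_workers=0 meaning "run serially", extra arguments forwarded) can be sketched with concurrent.futures. This sketch uses a thread pool, which is an assumption made for simplicity; for CPU-bound work a process pool would be the usual choice, at the cost of requiring picklable functions.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel(f, items, *args, n_workers=None, **kwargs):
    "Sketch: apply f(item, *args, **kwargs) to every item and return the results in input order."
    if n_workers == 0:
        # Serial fallback, convenient for debugging
        return [f(o, *args, **kwargs) for o in items]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        # Executor.map yields results in input order, even if tasks finish out of order
        return list(ex.map(lambda o: f(o, *args, **kwargs), items))

print(parallel(lambda x, a=1: x + a, range(5), n_workers=2))  # [1, 2, 3, 4, 5]
```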

first

first(x)

First element of x, or None if missing
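A one-line sketch of that behaviour for any iterable (an assumption about the accepted inputs). Note that with this approach an input whose first element is None cannot be told apart from an empty one.

```python
def first(x):
    "Sketch: first element of any iterable `x`, or None when it is empty."
    # next() with a default avoids raising StopIteration on an empty iterable
    return next(iter(x), None)

print(first([3, 4]))  # 3
```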

Reading a notebook

What's a notebook?

A Jupyter notebook is a JSON file behind the scenes. We could just read it with the json module, which returns a nested structure of dictionaries and lists, but there are some small differences between reading the raw JSON and using the tools from nbformat, so we'll use nbformat.
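A rough sketch of one such difference, assuming a notebook saved in the usual nbformat 4 layout: on disk, multi-line fields such as a cell's source are typically stored as a list of lines, and plain json hands them back that way, whereas nbformat's reader joins them into a single string (which is why the source of the cell shown further down is one string). The tiny notebook below is made up for the example.

```python
import json

# A tiny, made-up notebook as it might be stored on disk (nbformat 4)
raw = """{
 "cells": [
  {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [],
   "source": ["# export\\n", "def f(x): return x"]}
 ],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 4
}"""

nb = json.loads(raw)
src = nb['cells'][0]['source']
print(type(src).__name__)  # list: plain json keeps the list of lines

# Roughly what nbformat does on read: join the list of lines back into one string
joined = ''.join(src) if isinstance(src, list) else src
print(repr(joined))        # '# export\ndef f(x): return x'
```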

read_nb

read_nb(fname)

Read the notebook in fname.

fname can be a string or a pathlib.Path object.

test_nb = read_nb('00_export.ipynb')

The root has four keys: cells contains the cells of the notebook, metadata holds information such as the kernel and the version of Python used to execute the notebook, and nbformat and nbformat_minor give the version of nbformat.

test_nb.keys()
dict_keys(['cells', 'metadata', 'nbformat', 'nbformat_minor'])
test_nb['metadata']
{'jupytext': {'split_at_heading': True},
 'kernelspec': {'display_name': 'Python 3',
  'language': 'python',
  'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
  'file_extension': '.py',
  'mimetype': 'text/x-python',
  'name': 'python',
  'nbconvert_exporter': 'python',
  'pygments_lexer': 'ipython3',
  'version': '3.7.7'},
 'toc': {'base_numbering': 1,
  'nav_menu': {},
  'number_sections': True,
  'sideBar': True,
  'skip_h1_title': False,
  'title_cell': 'Table of Contents',
  'title_sidebar': 'Contents',
  'toc_cell': False,
  'toc_position': {},
  'toc_section_display': True,
  'toc_window_display': False}}
f"{test_nb['nbformat']}.{test_nb['nbformat_minor']}"
'4.4'

The cells key then contains a list of cells. Each cell is itself a dictionary with entries such as the type (code or markdown), the source (what is written in the cell) and the outputs (for code cells).

test_nb['cells'][0]
{'cell_type': 'code',
 'execution_count': 1,
 'metadata': {'hide_input': False},
 'outputs': [],
 'source': '# export\nfrom nbdev.imports import *\nfrom fastscript import *'}

Finding patterns

The following functions are used to catch the flags used in the code cells.

check_re

check_re(cell, pat, code_only=True)

Check if cell contains a line with regex pat

pat can be a string or a compiled regex. If code_only=True, this function ignores markdown cells.

cell = test_nb['cells'][0].copy()
assert check_re(cell, '# export') is not None
assert check_re(cell, re.compile('# export')) is not None
assert check_re(cell, '# bla') is None
cell['cell_type'] = 'markdown'
assert check_re(cell, '# export') is None
assert check_re(cell, '# export', code_only=False) is not None
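A minimal sketch of the behaviour tested above, assuming cells are plain dicts with cell_type and source keys. Compiling string patterns with re.MULTILINE is an assumption of this sketch, so that a pattern anchored with ^ can match at the start of any line of the cell.

```python
import re

def check_re(cell, pat, code_only=True):
    "Sketch: first match of `pat` in the cell's source, or None."
    if code_only and cell['cell_type'] != 'code':
        return None
    if isinstance(pat, str):
        # re.MULTILINE lets patterns anchored with ^ or $ match on any line of the cell
        pat = re.compile(pat, re.MULTILINE)
    return pat.search(cell['source'])

cell = {'cell_type': 'code', 'source': '# export\nfrom nbdev.imports import *'}
print(check_re(cell, '^from') is not None)  # True
```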

check_re_multi

check_re_multi(cell, pats, code_only=True)

Check if cell contains a line matching any regex in pats, returning the first match found

cell = test_nb['cells'][0].copy()
cell['source'] = "a b c"
assert check_re(cell, 'a') is not None
assert check_re(cell, 'd') is None
# searching with patterns ['d','b','a'] matches 'b':
# 'd' is not found, and 'a' is never tried
assert check_re_multi(cell, ['d','b','a']).span() == (2,3)
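The "first pattern that matches wins" rule can be sketched as a loop that stops at the first successful search. As before, cells are assumed to be plain dicts; re.search cannot take flags together with an already-compiled pattern, hence the branch.

```python
import re

def check_re_multi(cell, pats, code_only=True):
    "Sketch: try each pattern in `pats` in order and return the first match object, else None."
    if code_only and cell['cell_type'] != 'code':
        return None
    for pat in pats:
        # Strings are searched with re.MULTILINE; compiled patterns keep their own flags
        m = re.search(pat, cell['source'], re.MULTILINE) if isinstance(pat, str) else pat.search(cell['source'])
        if m:
            return m
    return None

cell = {'cell_type': 'code', 'source': 'a b c'}
print(check_re_multi(cell, ['d', 'b', 'a']).span())  # (2, 3)
```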

This function returns a regex object that can be used to find nbdev flags in multiline text. Its parameters are:

  • magic_flag: pass True to find magic flags, False to find comment flags
  • body: regex fragment to match one or more flags
  • n_params: number of flag parameters to match and catch
  • comment: explains what the compiled regex should do
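A sketch of how such a builder could look. The name mk_flag_re, the accepted n_params values (0, 1 or (0, 1) for "optional") and the exact prefixes are assumptions for illustration; the idea is to anchor each flag to a whole line with re.MULTILINE and to use re.VERBOSE so the comment can be embedded in the pattern itself.

```python
import re

def mk_flag_re(magic_flag, body, n_params, comment):
    "Hypothetical builder for a multiline regex matching one nbdev-style flag per line."
    # Magic flags look like `%nbdev_export`, comment flags like `# export`
    prefix = r"%nbdev_" if magic_flag else r"\#\s*"
    if n_params == 0:
        params = ""
    elif n_params == 1:
        params = r"[ \t]+(\S+)"       # exactly one parameter, captured in group 1
    elif n_params == (0, 1):
        params = r"(?:[ \t]+(\S+))?"  # an optional single parameter
    else:
        raise ValueError(f"unsupported n_params: {n_params!r}")
    # re.VERBOSE lets `comment` live inside the pattern as a regex comment
    return re.compile(rf"""
# {comment}
^\s*{prefix}{body}{params}[ \t]*$
""", re.IGNORECASE | re.MULTILINE | re.VERBOSE)

r = mk_flag_re(False, r"export[si]?", (0, 1), "comment export flags")
print(r.search("# export mod.file\nx = 1").group(1))  # mod.file
```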

is_export

is_export(cell, default)

Check if cell is to be exported and returns the name of the module to export it if provided

is_export returns:

  • a tuple ("module name", "external boolean") if the cell is to be exported, where "external boolean" is False for an internal export, or
  • None if the cell will not be exported.

The cells to export are marked with an #export, #exports or #exporti flag, optionally followed by the name of the module we want them exported to. The default module is given in a cell of the form #default_exp bla inside the notebook (usually at the top), but this function needs it to be passed explicitly (the final script reads the whole notebook to find it).

  • a cell marked with # export, # exports or # exporti will be exported to the default module
  • a cell marked with # export special.module, # exports special.module or # exporti special.module will be exported in special.module (located in lib_name/special/module.py)
  • a cell marked with # export will have its signature added to the documentation
  • a cell marked with # exports will additionally have its source code added to the documentation
  • a cell marked with # exporti will not show up in the documentation, and will also not be added to __all__.

cell = test_nb['cells'][0].copy()
test_eq(is_export(cell, 'export'), ('export', True))
cell['source'] = "%nbdev_export\nfrom nbdev.imports import *"
test_eq(is_export(cell, 'export'), ('export', True))
cell['source'] = "# exports"
test_eq(is_export(cell, 'export'), ('export', True))
cell['source'] = "%nbdev_export_and_show"
test_eq(is_export(cell, 'export'), ('export', True))
cell['source'] = "# exporti"
test_eq(is_export(cell, 'export'), ('export', False))
cell['source'] = "%nbdev_export_internal"
test_eq(is_export(cell, 'export'), ('export', False))
cell['source'] = "# export mod"
test_eq(is_export(cell, 'export'), ('mod', True))
cell['source'] = "%nbdev_export mod\nfrom nbdev.imports import *"
test_eq(is_export(cell, 'export'), ('mod', True))
cell['source'] = "# export mod.file"
test_eq(is_export(cell, 'export'), (f'mod{os.path.sep}file', True))
cell['source'] = "%nbdev_export mod.file"
test_eq(is_export(cell, 'export'), (f'mod{os.path.sep}file', True))
cell['source'] = "# exporti mod.file"
test_eq(is_export(cell, 'export'), (f'mod{os.path.sep}file', False))
cell['source'] = "%nbdev_export_internal mod.file"
test_eq(is_export(cell, 'export'), (f'mod{os.path.sep}file', False))
cell['source'] = "# expt mod.file"
assert is_export(cell, 'export') is None
cell['source'] = "# exportmod.file"
assert is_export(cell, 'export') is None
cell['source'] = "# exportsmod.file"
assert is_export(cell, 'export') is None
cell['source'] = "%nbdev_export_and_showmod.file"
assert is_export(cell, 'export') is None
cell['source'] = "%nbdev_export_internalmod.file"
assert is_export(cell, 'export') is None
cell['source'] = "# exporti mod file"
assert is_export(cell, 'export') is None
cell['source'] = "%nbdev_export_internal mod file"
assert is_export(cell, 'export') is None
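The rules exercised by these tests can be condensed into one verbose regex. This is a sketch, not nbdev's implementation: the single combined pattern and its group names are assumptions, and only the flag spellings that appear in the tests above are handled.

```python
import os
import re

# Hypothetical combined pattern: `# export[s|i]` or `%nbdev_export[_and_show|_internal]`,
# optionally followed by exactly one module name, with nothing else on the line
_EXPORT_RE = re.compile(r"""
^\s*(?:\#\s*export(?P<c>[si]?)|%nbdev_export(?P<m>_and_show|_internal)?)
(?:[ \t]+(?P<mod>\S+))?[ \t]*$
""", re.IGNORECASE | re.MULTILINE | re.VERBOSE)

def is_export(cell, default):
    "Sketch: (module, external) for an exported code cell, else None."
    if cell['cell_type'] != 'code':
        return None
    m = _EXPORT_RE.search(cell['source'])
    if m is None:
        return None
    internal = (m.group('c') or '').lower() == 'i' or (m.group('m') or '').lower() == '_internal'
    mod = m.group('mod')
    # Dotted module names become sub-paths, e.g. mod.file -> mod/file
    mod = default if mod is None else mod.replace('.', os.path.sep)
    return mod, not internal

print(is_export({'cell_type': 'code', 'source': '# exporti'}, 'export'))  # ('export', False)
```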

find_default_export

find_default_export(cells)

Find in cells the default export module.

Stops at the first cell containing a # default_exp flag (if there are several) and returns the module name that follows it. Returns None if no cell contains that flag.

test_eq(find_default_export(test_nb['cells']), 'export')
assert find_default_export(test_nb['cells'][2:]) is None
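A minimal sketch of that search, assuming cells are plain dicts and handling only the comment form of the flag (# default_exp name); any magic-style equivalent is left out.

```python
import re

# Hypothetical pattern: `# default_exp` followed by exactly one module name on its own line
_DEFAULT_EXP_RE = re.compile(r"^\s*\#\s*default_exp[ \t]+(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)

def find_default_export(cells):
    "Sketch: module name given by the first `# default_exp` flag in a code cell, or None."
    for cell in cells:
        if cell['cell_type'] != 'code':
            continue
        m = _DEFAULT_EXP_RE.search(cell['source'])
        if m:
            return m.group(1)
    return None

cells = [{'cell_type': 'markdown', 'source': '# default_exp nope'},
         {'cell_type': 'code', 'source': '# default_exp core\nimport os'},
         {'cell_type': 'code', 'source': '# default_exp other'}]
print(find_default_export(cells))  # core
```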

Note: ReTstFlags isn't used during export but is needed by export2html and test.

class ReTstFlags

ReTstFlags(all_flag)

Provides test flag matching regular expressions

Listing all exported objects

The following functions make a list of everything that is exported to prepare a proper __all__ for our exported module.

tst = _re_patch_func.search("""
@patch
@log_args(a=1)
def func(obj:Class):""")
tst, tst.groups()
(<re.Match object; span=(1, 42), match='@patch\n@log_args(a=1)\ndef func(obj:Class)'>,
 ('func', 'Class', None))

export_names

export_names(code, func_only=False)

Find the names of the objects, functions or classes defined in code that are exported.

This function only picks the zero-indented objects on the left side of an =, functions and classes (we don't want class methods, for instance), and excludes private names (those that begin with _) but not dunder names. When func_only=True it only returns function and class names (not the objects).

To work properly with the functionality fastai adds to Python, this function ignores functions decorated with @typedispatch (since they are defined multiple times) and properly unwraps functions decorated with @patch.

test_eq(export_names("def my_func(x):\n  pass\nclass MyClass():"), ["my_func", "MyClass"])

#Indented funcs are ignored (funcs inside a class)
test_eq(export_names("  def my_func(x):\n  pass\nclass MyClass():"), ["MyClass"])

#Private funcs are ignored, dunder are not
test_eq(export_names("def _my_func():\n  pass\nclass MyClass():"), ["MyClass"])
test_eq(export_names("__version__ = 1:\n  pass\nclass MyClass():"), ["MyClass", "__version__"])

#trailing spaces
test_eq(export_names("def my_func ():\n  pass\nclass MyClass():"), ["my_func", "MyClass"])

#class without parenthesis
test_eq(export_names("def my_func ():\n  pass\nclass MyClass:"), ["my_func", "MyClass"])

#object and funcs
test_eq(export_names("def my_func ():\n  pass\ndefault_bla=[]:"), ["my_func", "default_bla"])
test_eq(export_names("def my_func ():\n  pass\ndefault_bla=[]:", func_only=True), ["my_func"])

#Private objects are ignored
test_eq(export_names("def my_func ():\n  pass\n_default_bla = []:"), ["my_func"])

#Objects with dots are privates if one part is private
test_eq(export_names("def my_func ():\n  pass\ndefault.bla = []:"), ["my_func", "default.bla"])
test_eq(export_names("def my_func ():\n  pass\ndefault._bla = []:"), ["my_func"])

#Monkey-patches with @patch are properly renamed
test_eq(export_names("@patch\ndef my_func(x:Class):\n  pass"), ["Class.my_func"])
test_eq(export_names("@patch\ndef my_func(x:Class):\n  pass", func_only=True), ["Class.my_func"])
test_eq(export_names("some code\n@patch\ndef my_func(x:Class, y):\n  pass"), ["Class.my_func"])
test_eq(export_names("some code\n@patch\ndef my_func(x:(Class1,Class2), y):\n  pass"), ["Class1.my_func", "Class2.my_func"])

#Check delegates
test_eq(export_names("@delegates(keep=True)\nclass someClass:\n  pass"), ["someClass"])

#Typedispatch decorated functions shouldn't be added
test_eq(export_names("@patch\ndef my_func(x:Class):\n  pass\n@typedispatch\ndef func(x: TensorImage): pass"), ["Class.my_func"])
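The core of the name-picking rules (zero indentation, def/class names first, then assignment targets, private names dropped but dunders kept) can be sketched with two line-anchored regexes. This simplified version is an assumption-laden illustration: it does not handle @patch or @typedispatch, and it returns functions and classes before objects.

```python
import re

# Hypothetical patterns: zero-indented `def`/`class` names, and zero-indented `name = ...` targets
_DEF_RE = re.compile(r"^(?:async\s+)?(?:def|class)\s+([^\(\s:]+)", re.MULTILINE)
_ASSIGN_RE = re.compile(r"^([^\s=\(]+)\s*=(?!=)", re.MULTILINE)

def _is_private(name):
    "A dotted name is private if any part starts with `_`, except dunders such as `__version__`."
    return any(p.startswith('_') and not (p.startswith('__') and p.endswith('__'))
               for p in name.split('.'))

def export_names(code, func_only=False):
    "Simplified sketch: public top-level function/class names, plus assignment targets unless func_only."
    names = _DEF_RE.findall(code)
    if not func_only:
        names += _ASSIGN_RE.findall(code)
    return [n for n in names if not _is_private(n)]

print(export_names("def my_func ():\n  pass\ndefault_bla = []"))  # ['my_func', 'default_bla']
```

The negative lookahead (?!=) keeps a top-level comparison such as x == 1 from being mistaken for an assignment.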

extra_add

extra_add(code)

Catch adds to __all__ required by a cell with _all_=

Sometimes objects are not picked up automatically for the __all__ of the module, so you will need to add them manually. To do so, create an exported cell with the following code: _all_ = ["name"] (the single underscores are intentional).

Please note:

  • using line breaks between elements in _all_ means they will not be added to __all__
    # these function names won't get added to `__all__`
    _all_ = ['func',
          'func2']
    
  • only the first _all_ in a cell will get picked up
    _all_ = ['func']
    # `func2` won't get added to `__all__`
    _all_ = ['func2']
    
  • but you can have any number of _all_s in a notebook by putting them in different cells.
test_eq(extra_add('_all_ = ["func", "func1", "func2"]'), (["'func'", "'func1'", "'func2'"],''))
test_eq(extra_add('_all_ = ["func",   "func1" , "func2"]'), (["'func'", "'func1'", "'func2'"],''))
test_eq(extra_add("_all_ = ['func','func1', 'func2']\n"), (["'func'", "'func1'", "'func2'"],''))
test_eq(extra_add("_all_ = ['func',\n'func1', 'func2']\n"), ([],"_all_ = ['func',\n'func1', 'func2']\n"))
test_eq(extra_add("_all_ = ['func']\n_all_ = ['func1', 'func2']\n"), (["'func'"],''))
test_eq(extra_add('code\n\n_all_ = ["func", "func1", "func2"]'), (["'func'", "'func1'", "'func2'"],'code'))
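A sketch that reproduces the tested behaviour under stated assumptions: only a one-line _all_ = [...] at the start of a line is recognised, names come from the first such line and are normalised to single-quoted literals, and every one-line _all_ is stripped from the returned code. This is an illustration, not nbdev's implementation.

```python
import re

# Hypothetical pattern: a one-line `_all_ = [...]` assignment at the start of a line
_ALL_RE = re.compile(r"^_all_\s*=\s*\[([^\n\]]*)\][ \t]*$", re.MULTILINE)

def extra_add(code):
    "Sketch: (names from the first one-line `_all_`, code without those lines)."
    m = _ALL_RE.search(code)
    if m is None:
        # Multi-line lists are not recognised, so the code is returned untouched
        return [], code
    raw = [n.strip().strip("'\"") for n in m.group(1).split(',')]
    # Normalise every name to a single-quoted string literal
    names = [f"'{n}'" for n in raw if n]
    return names, _ALL_RE.sub('', code).strip()

print(extra_add('code\n\n_all_ = ["func", "func1"]'))  # (["'func'", "'func1'"], 'code')
```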

relative_import

relative_import(name, fname)

Convert a module name to a name relative to fname

When we write

from lib_name.module.submodule import bla

in a notebook, it needs to be converted to something like

from .module.submodule import bla

or from .submodule import bla, depending on where we are. This function handles that renaming of imports.

Note that imports of the form

import lib_name.module

are left as is, since relative imports are only possible with the from ... import syntax.

test_eq(relative_import('nbdev.core', Path.cwd()/'nbdev'/'data.py'), '.core')
test_eq(relative_import('nbdev.core', Path('nbdev')/'vision'/'data.py'), '..core')
test_eq(relative_import('nbdev.vision.transform', Path('nbdev')/'vision'/'data.py'), '.transform')
test_eq(relative_import('nbdev.notebook.core', Path('nbdev')/'data'/'external.py'), '..notebook.core')
test_eq(relative_import('nbdev.vision', Path('nbdev')/'vision'/'learner.py'), '.')
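The rule behind these tests can be sketched as: locate the top-level package in the file's path, drop the prefix shared with the module name, then emit one dot for the file's own package plus one per directory left to climb out of. Detecting the package from the first component of name is an assumption of this sketch.

```python
from pathlib import Path

def relative_import(name, fname):
    "Sketch: rewrite the absolute module `name` relative to the module file `fname`."
    mods = name.split('.')
    parts = list(Path(fname).parts)
    if mods[0] not in parts:
        # Not a module of this library: leave the import untouched
        return name
    # Keep the path from the last occurrence of the top-level package onwards
    start = len(parts) - 1 - parts[::-1].index(mods[0])
    parts = parts[start:]
    # Drop the package prefix that the module name and the file location share
    while mods and len(parts) > 1 and parts[0] == mods[0]:
        parts, mods = parts[1:], mods[1:]
    # One dot for the file's own package, plus one per directory still to climb out of
    return '.' * len(parts) + '.'.join(mods)

print(relative_import('nbdev.core', Path('nbdev')/'vision'/'data.py'))  # ..core
```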

Create the library

Saving an index

To be able to map functions back to the notebooks they are defined in, we need to store an index. It's stored in the private module _nbdev inside your library, and the following functions are used to define it.

reset_nbdev_module

reset_nbdev_module()

Create a skeleton for _nbdev

get_nbdev_module

get_nbdev_module()

Reads _nbdev

save_nbdev_module

save_nbdev_module(mod)

Save mod inside _nbdev

Create the modules

split_flags_and_code

split_flags_and_code(cell, return_type='list')

Splits the source of a cell into 2 parts and returns (flags, code)

return_type tells us if the tuple returned will contain lists of lines or strings with line breaks.

If no magic flags are found, the first comment line is treated as a flag.

If magic flags are found, the flags part can contain multiple lines.

def _test_split_flags_and_code(expected_flags, expected_code):
    cell = nbformat.v4.new_code_cell('\n'.join(expected_flags + expected_code))
    test_eq((expected_flags, expected_code), split_flags_and_code(cell))
    expected=('\n'.join(expected_flags), '\n'.join(expected_code))
    test_eq(expected, split_flags_and_code(cell, str))
    
_test_split_flags_and_code([
    '#export'],
    ['# TODO: write this function',
    'def func(x): pass'])
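A sketch of the two splitting rules described above. Deciding "magic mode" by whether the first line starts with %nbdev is an assumption of this sketch, as is treating every following line that starts with % as a flag; the second magic line in the example is made up.

```python
def split_flags_and_code(cell, return_type=list):
    "Sketch: split a cell's source into (flags, code); pass return_type=str to get joined strings."
    lines = cell['source'].split('\n')
    if lines and lines[0].lstrip().startswith('%nbdev'):
        # Magic flags: every leading line that starts with `%` is a flag
        i = 0
        while i < len(lines) and lines[i].lstrip().startswith('%'):
            i += 1
    else:
        # Comment flags: only the first line, and only if it is a comment
        i = 1 if lines and lines[0].lstrip().startswith('#') else 0
    flags, code = lines[:i], lines[i:]
    if return_type is str:
        return '\n'.join(flags), '\n'.join(code)
    return flags, code

cell = {'cell_type': 'code', 'source': '#export\n# TODO: write this function\ndef func(x): pass'}
print(split_flags_and_code(cell))  # (['#export'], ['# TODO: write this function', 'def func(x): pass'])
```

Only the first comment line counts as a flag in comment mode, which is why the TODO comment stays with the code.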

create_mod_file

create_mod_file(fname, nb_path)

Create a module file for fname.

A new module file is created each time a notebook has a cell marked with # default_exp. In your collection of notebooks, you should only have one notebook that creates a given module, since modules are re-created each time you do a library build (to ensure the library is clean). Note that any file you create manually will never be overwritten (unless it has the same name as one of the modules defined in a # default_exp cell), so you are responsible for cleaning those up yourself.

nb_path is the notebook that contained the # default_exp cell.
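A sketch of the "re-create from scratch" behaviour: opening the module in write mode truncates it, so stale content from a previous build disappears. The header wording and the empty __all__ line are assumptions for illustration, and mylib is a hypothetical library name.

```python
import tempfile
from pathlib import Path

def create_mod_file(fname, nb_path):
    "Sketch: (re)create an empty module at `fname` that points back to its notebook."
    fname = Path(fname)
    fname.parent.mkdir(parents=True, exist_ok=True)
    # Opening with 'w' truncates, so a rebuild always starts from a clean file
    with open(fname, 'w') as f:
        # Assumed header wording; the exact text nbdev writes may differ
        f.write(f"# AUTOGENERATED! DO NOT EDIT! File to edit: {Path(nb_path).as_posix()}\n\n")
        f.write("__all__ = []\n")

with tempfile.TemporaryDirectory() as d:
    mod = Path(d)/'mylib'/'core.py'
    mod.parent.mkdir()
    mod.write_text("stale content\n")
    create_mod_file(mod, 'nbs/00_core.ipynb')
    text = mod.read_text()

print(text)
```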

create_mod_files

create_mod_files(files, to_dict=False)

Create mod files for default exports found in files

Create module files for all #default_exp tags found in files and return a list containing the names of modules created.

Note: The number of modules returned will be less than the number of files passed in if some files do not contain a #default_exp.

By creating all module files before calling _notebook2script, the order of execution no longer matters, so you can now export to a module defined in a notebook that is run "later".

You might still have problems when

  • converting a subset of notebooks or
  • exporting to a module that does not have a #default_exp yet

in which case _notebook2script will print warnings like:

Warning: Exporting to "core.py" but this module is not part of this build

If you see a warning like this:

  • if the module file (e.g. "core.py") does not exist, you'll get a FileNotFoundError
  • if the module file exists, the exported cell will be written, even if it is already in the module file

add_init

add_init(path)

Add __init__.py in all subdirs of path containing python files if it's not there already

with tempfile.TemporaryDirectory() as d:
    os.makedirs(Path(d)/'a', exist_ok=True)
    (Path(d)/'a'/'f.py').touch()
    os.makedirs(Path(d)/'a/b', exist_ok=True)
    (Path(d)/'a'/'b'/'f.py').touch()
    add_init(d)
    assert not (Path(d)/'__init__.py').exists()
    for e in [Path(d)/'a', Path(d)/'a/b']:
        assert (e/'__init__.py').exists()
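A short sketch of the same rule with os.walk: every directory (including path itself) that directly contains a .py file gets an empty __init__.py, while directories without Python files are left alone. This is an illustration consistent with the test above, not nbdev's code; pkg and docs are made-up folder names.

```python
import os
import tempfile
from pathlib import Path

def add_init(path):
    "Sketch: add an empty __init__.py to each directory under `path` that directly holds .py files."
    for root, dirs, files in os.walk(path):
        if any(f.endswith('.py') for f in files) and '__init__.py' not in files:
            (Path(root)/'__init__.py').touch()

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root/'pkg'/'sub').mkdir(parents=True)
    (root/'pkg'/'f.py').touch()
    (root/'pkg'/'sub'/'g.py').touch()
    (root/'docs').mkdir()
    (root/'docs'/'notes.txt').touch()
    add_init(root)
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob('__init__.py'))

print(created)  # ['pkg/__init__.py', 'pkg/sub/__init__.py']
```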

update_version

update_version()

Add or update __version__ in the main __init__.py of the library
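Where the version number comes from is not shown here (settings.ini is a plausible source, but that is an assumption). The "add or update" step itself can be sketched as a text transformation on the contents of __init__.py; set_version is a hypothetical helper name.

```python
import re

# Matches an existing `__version__ = ...` line anywhere in the file
_VERSION_RE = re.compile(r"^__version__\s*=.*$", re.MULTILINE)

def set_version(init_text, version):
    "Hypothetical helper: return `init_text` with its __version__ line replaced or added."
    line = f'__version__ = "{version}"'
    if _VERSION_RE.search(init_text):
        # A function replacement avoids any backslash processing of `line`
        return _VERSION_RE.sub(lambda m: line, init_text, count=1)
    # No version yet: add it as the first line
    return line + '\n' + init_text

print(set_version('__version__ = "0.1"\nfrom .core import *\n', '0.2'))
```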

update_baseurl

update_baseurl()

Add or update baseurl in _config.yml for the docs

notebook2script

notebook2script(fname=None, silent=False, to_dict=False)

Convert notebooks matching fname to modules

Finds cells starting with #export and puts them into the appropriate module. If fname is not specified, this converts all notebooks in the notebook folder defined in settings.ini whose names don't begin with an underscore. Otherwise fname can be a single filename or a glob expression.

silent suppresses all printed output, and to_dict is used internally to convert the library to a dictionary.
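The file-selection rule described above can be sketched as follows. notebooks_to_convert is a hypothetical helper, and resolving a given fname glob relative to the notebook folder is an assumption of this sketch (nbdev may resolve it differently).

```python
import tempfile
from pathlib import Path

def notebooks_to_convert(nb_dir, fname=None):
    "Hypothetical helper: the notebooks a notebook2script-style run would process."
    nb_dir = Path(nb_dir)
    if fname is None:
        # Every notebook in the folder except those whose name starts with '_'
        files = [f for f in nb_dir.glob('*.ipynb') if not f.name.startswith('_')]
    else:
        # A single file name or a glob expression, resolved inside nb_dir
        files = list(nb_dir.glob(fname))
    return sorted(files)

with tempfile.TemporaryDirectory() as d:
    for name in ['00_core.ipynb', '01_utils.ipynb', '_scratch.ipynb', 'README.md']:
        (Path(d)/name).touch()
    all_nbs = [f.name for f in notebooks_to_convert(d)]
    picked = [f.name for f in notebooks_to_convert(d, '01_*.ipynb')]

print(all_nbs)  # ['00_core.ipynb', '01_utils.ipynb']
print(picked)   # ['01_utils.ipynb']
```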