Parameterisation and scheduling

With JupyterLab, you can use Jupyter Scheduler for parameterised and time-controlled execution of notebooks. For Jupyter Notebooks, papermill is available: it executes a notebook with parameters and can be combined with cron for scheduled runs.

Install

$ pipenv install papermill
Installing papermill…
Adding papermill to Pipfile's [packages]…
✔ Installation Succeeded

Use

  1. Parameterise

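    The first step is to parameterise the notebook. To do this, tag the cell that contains the default parameter values with parameters via View ‣ Cell Toolbar ‣ Tags. When the notebook is executed, papermill inserts the values you pass in a new cell directly after this tagged cell.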

  2. Inspect

    To list the parameters that a notebook accepts, use --help-notebook, for example:

    $ pipenv run papermill --help-notebook docs/notebook/parameterise/input.ipynb
    Usage: papermill [OPTIONS] NOTEBOOK_PATH [OUTPUT_PATH]
    
    Parameters inferred for notebook 'docs/notebook/parameterise/input.ipynb':
      msg: Unknown type (default None)
    
  3. Execute

    There are two ways to run a notebook with parameters:

    • … via the Python API

      The execute_notebook function can be called to execute a notebook with a dict of parameters:

      execute_notebook(INPUT_NOTEBOOK, OUTPUT_NOTEBOOK, DICTIONARY_OF_PARAMETERS)
      

      For example, for input.ipynb:

      In [1]: import papermill as pm
      
      In [2]: pm.execute_notebook(
                  "PATH/TO/INPUT_NOTEBOOK.ipynb",
                  "PATH/TO/OUTPUT_NOTEBOOK.ipynb",
                  parameters=dict(salutation="Hello", name="pythonistas"),
              )
      

      The result is output.ipynb:

      In [1]: salutation = None
              name = None
      
      In [2]: # Parameters
              salutation = "Hello"
              name = "pythonistas"
      
      In [3]: from datetime import date
      
      
              today = date.today()
              print(
                  salutation,
                  name,
                  "– welcome to our event on this " + today.strftime("%A, %d %B %Y"),
              )
      
      Out[3]: Hello pythonistas – welcome to our event on this Monday, 26 June 2023
      
    • … via CLI

      $ pipenv run papermill input.ipynb output.ipynb -p salutation 'Hello' -p name 'pythonistas'
      

      Alternatively, you can pass the parameters in a YAML file, for example params.yaml:

      params.yaml
      salutation: "Hello"
      name: "Pythonistas"
      
      $ pipenv run papermill input.ipynb output.ipynb -f params.yaml
      

      With -b, you can instead pass the parameter values as a base64-encoded YAML string:

      $ pipenv run papermill input.ipynb output.ipynb -b c2FsdXRhdGlvbjogIkhlbGxvIgpuYW1lOiAiUHl0aG9uaXN0YXMi
      

      You can also add a timestamp to the file name:

      $ dt=$(date '+%Y-%m-%d_%H:%M:%S')
      $ pipenv run papermill input.ipynb "output_${dt}.ipynb" -f params.yaml
      

      This creates an output file whose file name contains a timestamp, for example output_2023-06-26_15:57:33.ipynb.

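      Finally, you can use crontab -e to run this automatically at certain times, for example at midnight on the first day of every month. Because cron treats an unescaped % in the command as a newline, each % in the date format must be escaped as \%, and the timestamp has to be computed inline in the same line:

      0 0 1 * * cd ~/jupyter-notebook && pipenv run papermill input.ipynb "output_$(date '+\%Y-\%m-\%d_\%H:\%M:\%S').ipynb" -f params.yaml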
      
  4. Store

    Papermill can read and store notebooks in a number of locations, including Amazon S3, Azure Blob Storage and Azure Data Lake. Its I/O layer is extensible, so handlers for further data stores can be added.
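For step 1 (Parameterise), it can help to see what the tag actually changes in the .ipynb file. The following minimal sketch builds a notebook as a plain JSON-style dict using only the standard library; the helper names code_cell and find_tagged_cell are hypothetical and only illustrate that the tag lives in each cell's metadata under "tags".

```python
from __future__ import annotations

import json


def code_cell(source: str, tags: list[str] | None = None) -> dict:
    """Return a minimal nbformat-4 code cell as a plain dict (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": list(tags or [])},
        "outputs": [],
        "source": source,
    }


def find_tagged_cell(nb: dict, tag: str = "parameters") -> int | None:
    """Return the index of the first cell carrying `tag`, or None."""
    for index, cell in enumerate(nb["cells"]):
        if tag in cell.get("metadata", {}).get("tags", []):
            return index
    return None


# Hypothetical notebook mirroring input.ipynb: defaults first, code afterwards
nb = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        code_cell("salutation = None\nname = None\n", tags=["parameters"]),
        code_cell("print(salutation, name)\n"),
    ],
}

print(json.dumps(nb["cells"][0]["metadata"]))  # {"tags": ["parameters"]}
print(find_tagged_cell(nb))  # 0
```

Writing this dict with json.dump to a file with the .ipynb extension should give a notebook that Jupyter opens with the first cell already tagged.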
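For step 2 (Inspect), the listing printed by --help-notebook is derived from the assignments in the tagged cell. The sketch below is a simplified stand-in using Python's ast module, not papermill's own implementation: it assumes that only top-level `name = value` and `name: type = value` assignments count, and that a type is reported only when an annotation is present, which would explain the "Unknown type" shown for msg. It requires Python 3.9+ for ast.unparse; infer_parameters and describe are hypothetical names.

```python
from __future__ import annotations

import ast


def infer_parameters(source: str) -> dict[str, tuple[str, str]]:
    """Return {name: (type_name, default_source)} for top-level assignments."""
    params: dict[str, tuple[str, str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            # `name: type = default`: the annotation supplies the type
            default = ast.unparse(node.value) if node.value is not None else "None"
            params[node.target.id] = (ast.unparse(node.annotation), default)
        elif (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            # plain `name = default`: no type information is available
            params[node.targets[0].id] = ("Unknown type", ast.unparse(node.value))
    return params


def describe(params: dict[str, tuple[str, str]]) -> str:
    """Format the result roughly like the --help-notebook listing."""
    return "\n".join(
        f"  {name}: {type_name} (default {default})"
        for name, (type_name, default) in params.items()
    )


cell = 'msg = None\nsalutation: str = "Hello"\n'
print(describe(infer_parameters(cell)))
```

Adding annotations to the parameters cell is therefore a cheap way to make the inspection output more informative, assuming papermill's Python translator behaves like this sketch.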
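For step 3 (Execute), the output.ipynb shown above contains an extra "# Parameters" cell directly after the tagged cell. The following sketch reproduces that injection step on a plain notebook dict. It is not papermill's code: it assumes the injected cell is tagged injected-parameters and that a cell injected by an earlier run is replaced rather than duplicated. Values are rendered with repr, so strings appear in single quotes. inject_parameters and code_cell are hypothetical helpers.

```python
from __future__ import annotations

import copy


def code_cell(source: str, tags: list[str] | None = None) -> dict:
    """Return a minimal nbformat-4 code cell as a plain dict (hypothetical helper)."""
    return {"cell_type": "code", "execution_count": None,
            "metadata": {"tags": list(tags or [])}, "outputs": [], "source": source}


def tags_of(cell: dict) -> list:
    return cell.get("metadata", {}).get("tags", [])


def inject_parameters(nb: dict, parameters: dict) -> dict:
    """Return a copy of `nb` with a '# Parameters' cell after the tagged cell."""
    new_nb = copy.deepcopy(nb)
    cells = new_nb["cells"]
    # Drop a cell injected by an earlier run so re-running does not stack them
    cells[:] = [c for c in cells if "injected-parameters" not in tags_of(c)]
    lines = ["# Parameters\n"] + [f"{k} = {v!r}\n" for k, v in parameters.items()]
    injected = code_cell("".join(lines), tags=["injected-parameters"])
    # Insert right after the 'parameters' cell, or at the top if there is none
    position = next((i + 1 for i, c in enumerate(cells) if "parameters" in tags_of(c)), 0)
    cells.insert(position, injected)
    return new_nb


nb = {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": [
    code_cell("salutation = None\nname = None\n", tags=["parameters"]),
    code_cell("greeting = f'{salutation} {name}'\n"),
]}

run = inject_parameters(nb, {"salutation": "Hello", "name": "pythonistas"})
namespace: dict = {}
for cell in run["cells"]:
    exec(cell["source"], namespace)  # run cells top to bottom, as a kernel would
print(namespace["greeting"])  # Hello pythonistas
```

Because the injected cell runs after the defaults, the passed values simply overwrite them; the original cell stays untouched, which keeps the defaults visible in the executed notebook.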
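For the -b option in step 3, the base64 string can be produced from a Python dict with the standard library alone. This sketch assumes a flat mapping of simple scalar values; it relies on the fact that JSON scalars such as "Hello", 3, true and null are also valid YAML. For nested structures a real YAML library would be the better choice. params_to_b64 is a hypothetical name.

```python
import base64
import json


def params_to_b64(params: dict) -> str:
    """Encode a flat mapping as a base64 YAML string for `papermill -b`."""
    # One `key: value` line per parameter; json.dumps yields valid YAML scalars
    yaml_text = "\n".join(f"{key}: {json.dumps(value)}" for key, value in params.items())
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


encoded = params_to_b64({"salutation": "Hello", "name": "Pythonistas"})
print(encoded)  # c2FsdXRhdGlvbjogIkhlbGxvIgpuYW1lOiAiUHl0aG9uaXN0YXMi
print(base64.b64decode(encoded).decode("utf-8"))
```

The first line printed matches the string used in the -b example, since both encode the contents of params.yaml without a trailing newline.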
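The timestamped output name from step 3 can also be built when using the Python API. In the sketch below, timestamped_output reuses the shell example's format string, and run_with_timestamp is a hypothetical wrapper around pm.execute_notebook with the parameters keyword shown earlier; papermill is imported inside the function so that the name helper works without it installed.

```python
from __future__ import annotations

from datetime import datetime


def timestamped_output(stem: str = "output", now: datetime | None = None) -> str:
    """Build an output name such as output_2023-06-26_15:57:33.ipynb."""
    if now is None:
        now = datetime.now()
    # Same format string as the shell example: date '+%Y-%m-%d_%H:%M:%S'
    return f"{stem}_{now.strftime('%Y-%m-%d_%H:%M:%S')}.ipynb"


def run_with_timestamp(input_path: str, parameters: dict) -> str:
    """Execute `input_path` with papermill and return the output file name."""
    import papermill as pm  # imported lazily; only needed when actually executing

    output_path = timestamped_output()
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path


print(timestamped_output(now=datetime(2023, 6, 26, 15, 57, 33)))
# output_2023-06-26_15:57:33.ipynb
```

Note that colons are not allowed in file names on Windows; if the notebooks are shared across platforms, a format such as %Y-%m-%d_%H-%M-%S may be safer.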
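The five fields of the crontab entry in step 3, 0 0 1 * *, mean minute 0, hour 0, day of month 1, any month and any weekday. As a worked example of that schedule, the following sketch computes the next time such an entry would fire strictly after a given moment; next_monthly_run is a hypothetical helper and handles only this one schedule, not general cron syntax.

```python
from datetime import datetime


def next_monthly_run(after: datetime) -> datetime:
    """Next time matching the cron schedule `0 0 1 * *` strictly after `after`."""
    # Midnight on the first day of the current month is never later than `after`
    start_of_month = after.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # So the next run is midnight on the first day of the following month
    if start_of_month.month == 12:
        return start_of_month.replace(year=start_of_month.year + 1, month=1)
    return start_of_month.replace(month=start_of_month.month + 1)


print(next_monthly_run(datetime(2023, 6, 26, 15, 57, 33)))  # 2023-07-01 00:00:00
```

The year rollover is the only special case, because day=1 is valid in every month.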
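For step 4 (Store), a new data store is added by providing an I/O handler for a URL scheme. The sketch below assumes papermill's handler interface as we understand it (objects with read, write, listdir and pretty_path methods, registered via papermill.iorw.papermill_io.register); check the papermill documentation for your version before relying on it. InMemoryHandler and the memory:// scheme are hypothetical and keep notebooks in a dict, which is only useful for illustration and testing.

```python
from __future__ import annotations


class InMemoryHandler:
    """Hypothetical papermill I/O handler that keeps notebooks in a dict."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def read(self, path: str) -> str:
        # Return the notebook JSON stored under `path`
        return self.store[path]

    def write(self, buf: str, path: str) -> None:
        # Store the serialised notebook under `path`
        self.store[path] = buf

    def listdir(self, path: str) -> list[str]:
        # Treat `path` as a prefix and list everything stored below it
        return [key for key in self.store if key.startswith(path)]

    def pretty_path(self, path: str) -> str:
        return path


handler = InMemoryHandler()

try:
    # Assumption: papermill exposes its handler registry as papermill.iorw.papermill_io
    from papermill.iorw import papermill_io

    papermill_io.register("memory://", handler)
except ImportError:
    pass  # papermill not installed; the handler can still be used directly

handler.write('{"cells": []}', "memory://reports/output.ipynb")
print(handler.listdir("memory://reports/"))
```

Once registered, an output path such as memory://reports/output.ipynb passed to pm.execute_notebook should be routed to this handler instead of the local file system.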