Link Search Menu Expand Document

Napkin

Napkin CLI application has a number of commands to develop, debug and execute specs

Usage: napkin [(-v|--verbose) | (-l|--log-level LOG_LEVEL)] 
              [--log-format LOG_FORMAT] COMMAND

Available options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)
  -h,--help                Show this help text

Init:
  init                     Generates Napkin project from a template

Execute:
  run                      Runs Napkin specs
  history                  Set of commands to work with Napkin previous runs
  auth                     Authenticates with Google BigQuery
  list                     Lists available tables on database
  aeda                     Performs AEDA over given relation
  cleanup                  Remove temporary tables with namespaces/datasets
                           specified in a given spec file
  tags                     List tags with their associated tables

Develop:
  validate                 Validates YAML spec file
  dump                     Performs a dry run and stores queries for inspection
  repl                     Drops into Napkin repl
  optimize                 Spec/query semantic improvements (e.g. unused
                           columns)

Documentation:
  haddock                  Opens web page with Napkin haddocks
  docs                     Opens web page with Napkin tutorial

IDE integration:
  yaml-schema              Stores YAML schema in a file
  hie-bios                 Used by Haskell Language Server

Version:
  version                  Prints Git SHA and version of the build

Napkin has number of global options, which are applicable to almost any command:

  • -l, --log-level - Allows to specify log level for the Napkin execution.
  • -v, --verbose - Enables debug logging, equivalent of providing --log-level Debug.
  • --log-format - Allows to choose desired format of the log messages. Currently supported:
    • Server log format uses Katip’s bracketFormat.
    • Simple log format is a simplified version of Katip’s bracketFormat.
    • Json log format uses Katip’s jsonFormat.
  • -h, --help - Displays traditional help message screen for each command. It is usually “safe” to append --help to any Napkin command to know more about options it supports. Napkin will also display the --help screen when it meets the unfamiliar option.

Init

Napkin project is a folder with a set of various files including a spec file in YAML format having lots of fields.

shell

napkin init --help
Usage: napkin init [-t|--template TPL_REF] [-p|--parameter KEY=VALUE] 
                   [-n|--project-name PROJECT_NAME] [-f|--force] 
                   [-c|--current-dir]
  Generates Napkin project from a template

Available options:
  -t,--template TPL_REF    Template location:
                               [[service='github':]user/repo/template-name
                               service = (gitlab|bitbucket|github),
                               or local path/url or :<embedded template name>
                           (default: :basic)
  -p,--parameter KEY=VALUE Set custom template parameter
  -n,--project-name PROJECT_NAME
                           Name of the project to create
                           (default: "napkin-project")
  -f,--force               Overwrite existing files in case of name collision
                           with files from a template
  -c,--current-dir         Use current directory for the project
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Generates Napkin project from a template.
Napkin contains following embedded templates: :basic
You can use custom templates from github, gitlab and bitbucket.
Mustache syntax is used for values interpolation.
Template variables are resolved with process environment variables and -p parameters override them.

  • With -n, or --project-name you can specify the name of the project to create. By default, Napkin will try to create a new directory with this name.
  • -t, --template allows to specify the template (builtin or external) for the Napkin to use.
  • -p, --parameter option allows to override variables within the template with custom values.
  • -f, --force flag will force Napkin to override target directory (see --project-name option) if it already exists or files created in the current directory in case of --current-dir was specified.
  • With -c, --current-dir option, Napkin will use current working directory to write new files into.

If you start a project from scratch then init command could save a few seconds by generating minimum working set.

A generated project is based on a hsfiles template. Napkin has a builtin :basic template, but you can define your own templates and supply an url to a repo with them on github e.g. Template file is a list of mustache snippets for every file in a project tree.

For example, following command will create napkin project with name napkinFromScratch from :basic template:

shell

napkin init -n napkinFromScratch -t :basic

Run

napkin run is used to execute Specs. It provides a handful of options that enable you to adjust how the spec is going to be executed. In a summary, you can:

  • perform a partial run that will execute only a subset of defined tables,
  • perform a dry-run and inspect the execution plan,
  • pass spec arguments,
  • enable CLI UI,
  • override database connection settings.

Lets review the CLI command:

shell

napkin run --help
Usage: napkin run [-s|--spec-file SPEC_YAML] 
                  [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                    (-s|--spec-file SPEC_YAML)] 
                  [--override /json/pointer={"json": "value"}] 
                  [-C|--credentials-file FILE] [-u|--uri URI] 
                  [--credentials-db CREDENTIALS-DB-PATH] 
                  [--callback-http-port CALLBACK_HTTP_PORT] 
                  [-S|--strict-mustache] 
                  [--arg ARG=VALUE | --arg-json JSON | --arg-file FILE] 
                  [-Q|--metadata-connection-string URI] 
                  [--skip-all | --force-all | --only SELECTOR | 
                    --force-only SELECTOR] 
                  [--force SELECTOR | --force-with-downstream SELECTOR | 
                    --force-with-upstream SELECTOR | --enable SELECTOR | 
                    --skip SELECTOR | --skip-with-downstream SELECTOR | 
                    --skip-with-upstream SELECTOR] [-p|--show-progress] 
                  [-r|--dry-run] [--condensed-logging]
  Runs Napkin specs

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -C,--credentials-file FILE
                           Path to database connection credentials file
  -u,--uri URI             Connection URI to create a database connection
                           postgres:/// # use libpq defaults
                           postgres://user@remotehost/dbname
                           redshift://user@remotehost/dbname
                           bigquery://bigquery.googleapis.com/project
                           sqlite:///home/dev/myRepo/mySqlite.sqlite3
  --credentials-db CREDENTIALS-DB-PATH
                           Path to database connection credentials directory
  --callback-http-port CALLBACK_HTTP_PORT
                           Network port number for the napkin Auth server
  -S,--strict-mustache     Strict mustache validation mode
  --arg ARG=VALUE          Argument to be passed to spec
  --arg-json JSON          Spec arguments encoded as JSON object
  --arg-file FILE          Spec arguments stored in JSON file
  -Q,--metadata-connection-string URI
                           An optional alternate connection string for tracking
                           napkin operations. Currently supports SQLite3 and
                           Postgres backends, e.g. sqlite:/some/file.sqlite3 or
                           postgresql://user@host
  --skip-all               Table selector: Skip all tables, use other options to
                           enable selected tables
  --force-all              Table selector: Force-enable update of all tables, use other options to skip selected tables
  --only SELECTOR          Table selector: Disable all tables except specified
  --force-only SELECTOR    Table selector: Disable all tables except specified which will be forced
  --force SELECTOR         Table selector: Force table update
  --force-with-downstream SELECTOR
                           Table selector: Force table update, force downstream tables too
  --force-with-upstream SELECTOR
                           Table selector: Force table update, force upstream tables too
  --enable SELECTOR        Table selector: Don't skip or force table, use the update strategy as specified in the Spec file
  --skip SELECTOR          Table selector: Skip table update
  --skip-with-downstream SELECTOR
                           Table selector: Skip table update, skip downstream tables too
  --skip-with-upstream SELECTOR
                           Table selector: Skip table update, skip upstream tables too
  -p,--show-progress       Show a progress bar if the terminal supports it
  -r,--dry-run             Simulate run and report which tables would've been
                           updated
  --condensed-logging      Reduce logging of non-action tasks such as skipped
                           tables.
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Table selectors and SELECTOR types
  
  Table selectors allow performing partial spec runs. This can be useful during
  spec development. All options are applied in sequence to the execution plan.
  --skip-all, --force-all, --only, and --force-only options can be specified as
  a first table selector only. --only, --force-only, --skip, --enable, --force,
  and --force-with-downstream options accept a SELECTOR, which can be one of:
    * unprocessed table name, as it was defined in the spec,
      e.g. --skip spec:marketing_analytics,
    * processed table name after all preprocessors have been applied,
      e.g. --skip db:development.john_marketing_analytics,
    * table tag, e.g. --enable tag:marketing.
  Table names can be filtered using three matchers:
    * literal table name, e.g. --skip spec:marketing_analytics,
    * table name pattern - an asterisk can be used as a wildcard at the
      beginning or at the end of each table name segment,
      e.g. --force spec:prod_*.marketing_*,
    * table name regex - regex has to be surrounded with /,
      e.g. --force spec:/^marketing.*/,
  Please refer to the User Manual for further reference.
  
  Example:
    napkin run --only tag:marketing --enable spec:sales_totals --skip spec:marketing_mailing_*

CLI UI

  • -p, --show-progress – will enable terminal UI that will display spec execution progress.

Napkin CLI UI

Dry runs

  • -r, --dry-run – will perform a dry run of the Spec. Napkin will always print Spec execution plan prior to executing queries, so this can be used to verify if partial run options have been applied correctly:

shell

napkin run --dry-run --force-only "spec:artist*" --skip "spec:*hashes*"
[2022-01-13 10:30:00][Info] Determining tables for update...
[2022-01-13 10:30:00][Info] Forced tables: none
[2022-01-13 10:30:00][Info] Skipped tables:
- artists_hashes -> development.unknown_artists_hashes
- day_totals -> development.unknown_day_totals
- day_totals_2 -> development.unknown_day_totals_2
- first_run_at -> development.unknown_first_run_at
- long_to_wide -> development.unknown_long_to_wide
- popular_tracks_for_pink_floyd -> development.unknown_popular_tracks_for_pink_floyd
- random -> development.unknown_random
- total_sales_by_year -> development.unknown_total_sales_by_year
- update_daily -> development.unknown_update_daily
- update_on_upstream -> development.unknown_update_on_upstream

[2022-01-13 10:30:00][Info] Unmanaged tables that will be used as an input in this run:
- chinook.Album
- chinook.Artist
- chinook.Track

[2022-01-13 10:30:00][Info] Managed tables that will be used as an input in this run, but will not be updated: none
[2022-01-13 10:30:00][Info] Managed tables that will be updated in this run:
- artist_album_count -> development.unknown_artist_album_count
- artist_album_count_via_mustache -> development.unknown_artist_album_count_via_mustache
- artist_album_count_via_mustache_2 -> development.unknown_artist_album_count_via_mustache_2
- artist_track_count -> development.unknown_artist_track_count
- artist_track_count_2 -> development.unknown_artist_track_count_2
- artist_track_count_view -> development.unknown_artist_track_count_view

[2022-01-13 10:30:00][Info] Estimated runtime: 35s
[2022-01-13 10:30:00][Warning] Dry run, aborting execution.

Partial spec runs

During Spec development it’s convenient to execute only tables the developer is working on. This allows for shorter feedback loop and can reduce costs as well. Napkin provides a number of command-line options that can be used to cherry-pick tables that will be executed during particular run. However, we don’t recommend using them for production runs.

All flags mentioned below will amend the execution plan in their order of appearance.

Initial execution plan:

  • --skip-all will disable execution of all tables. Typically, it’d be combined with other options that will enable or force some tables.
  • --force-all will force execution of all tables. This overrides any update strategy that was set in table definitions.
  • --only will disable execution of all tables and enable tables that fulfill the selector. This brings back any update strategy that was set in table definitions. Synonym of --skip-all --enable SELECTOR.
  • --force-only will disable execution of all tables and force tables that fulfill the selector. This overrides any update strategy that was set in table definitions. Synonym of --skip-all --force SELECTOR.

Updating execution plan:

  • --force will force tables that fulfill the selector. This overrides any update strategy that was set in table definitions.
  • --force-with-downstream and --force-with-upstream will force tables that fulfill the selector as well as all tables depend on them on either direction. This overrides any update strategy that was set in table definitions.
  • --enable will void any changes to execution plan that were introduced with previous option for tables that fulfill the selector and will bring back any update strategy that was set in table definitions.
  • --skip will disable execution of the tables that fulfill the selector.

Table selectors

--only, --force-only, --force, --force-with-downstream, --enable, and --skip options accept a table selector. Napkin will fail when no tables match a selector, which may indicate user error.

  • db:PATTERN – will match against preprocessed table names (with all renaming applied). Wildcards or regular expressions may be used to match multiple tables.
  • spec:PATTERN – will match against unprocessed table names, i.e. the table names as they were specified in the spec file. Wildcards or regular expressions may be used to match multiple tables. If no renaming was done with preprocessors db: and spec: selectors will have the same effect.
  • tag:TAG – will match table tags. Wildcards or regular expressions cannot be used for tags.

Where PATTERN can be:

  • someproject.somedataset.sometable – an exact match.
  • someproject.prod_*.*sales* – a wildcard can be used at the beginning and at the end of each name segment, segments are matched separately, so * will never expand beyond name segment:
    • production.derived.*marketing would match production.derived.new_marketing but not production.derived_update.marketing which a naive regex would otherwise match.
  • /REGEX/ – a regular expression.

Example use cases

  • --force-only spec:top_artists will disable all tables except top_artists table which will be forced,
  • --force-only spec:top_* will disable all tables and force all tables that have prefix top_,
  • --force-only db:*.*.top_* will disable all tables and force all tables that have prefix top_ in all projects and datasets,
  • --skip-all --force-with-downstream tag:monthly will disable all tables and force tables that have a tag monthly plus all tables that depend on them.

Meta arguments

  • --arg ARG=VALUE allows to provide ARG spec argument with the string value of VALUE.
  • --arg-json will accept JSON object as an argument that will be interpreted as key-value store. Values don’t need to be strings, they can be any valid JSON object.
  • --arg-file will read JSON object from a file.

Multiple options --arg* options can be used at the same time and all data provided will be collected.

Other options

  • -u, --uri – can be used to override database connection string. This can be useful, if development and production data sets are located on different DB servers.
  • -C, --credentials-file can be used to provide database credentials manually. Please refer to Connecting to the database page.
  • --credentials-db – can be used to override location of Napkin’s credentials database.
  • --callback-http-port – can be used to override Napkin’s HTTP port used for handling BigQuery authorization.

History

Set of napkin history commands allows displaying information from Napkin’s metadata database in various forms.

When using Napkin frequently, it might be handy to know:

  • When Napkin was executed last time
  • What are the statuses of the tables
  • How many records were updated in each table
  • etc.

Napkin assigns a GUID identifier to each run to uniquely identify it from other napkin run invocations. Along with GUID of the execution, Napkin stores the date and time when particular run command was issued to Napkin.

Lets review the CLI command:

shell

napkin history --help
Usage: napkin history [-s|--spec-file SPEC_YAML] 
                      [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                        (-s|--spec-file SPEC_YAML)] 
                      [--override /json/pointer={"json": "value"}] 
                      [-Q|--metadata-connection-string URI] COMMAND
  Set of commands to work with Napkin previous runs

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -Q,--metadata-connection-string URI
                           An optional alternate connection string for tracking
                           napkin operations. Currently supports SQLite3 and
                           Postgres backends, e.g. sqlite:/some/file.sqlite3 or
                           postgresql://user@host
  -h,--help                Show this help text

Available commands:
  list                     Lists all previous Napkin runs
  show                     Displays Napkin run by ID. Uses latest run if ID
                           wasn't provided
  gantt                    Exports Napkin run into a gantt chart. Uses latest
                           run if ID wasn't provided

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • -s, --spec-file as usual, allows to specify the path to the spec.yaml file.
  • --override allows to override any values in the spec.yaml file according to the JSON pointer specification. Can be helpful in cases, where you need to only override several values from spec.yaml for a particular run.
  • -a, --app-name allows to override app-name parameter from the spec.yaml file.
  • -Q, --metadata-connection-string allows to override the metadata storage location (see metadata_url parameter in the spec.yaml file) used by Napkin. This might be useful for the situations, when you want to store all metadata information in the central location for further analysis or to ‘port’ Napkin history from a computer, which was earlier used to run Napkin.

List

shell

napkin history list --help
Usage: napkin history list [LIMIT]
  Lists all previous Napkin runs

Available options:
  LIMIT                    Number of records to show (default: 10)
  -h,--help                Show this help text

Global options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -Q,--metadata-connection-string URI
                           An optional alternate connection string for tracking
                           napkin operations. Currently supports SQLite3 and
                           Postgres backends, e.g. sqlite:/some/file.sqlite3 or
                           postgresql://user@host
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • LIMIT parameter allows to specify the amount of records for the napkin history list command to display. If there are less invocation of the napkin run in history records, only those records will be displayed.

For example:

shell

napkin history list
Napkin executed run with ID fd4bf846-da9c-4bb4-a1d0-b3c971e2d993 on 2022-01-03 16:51:50 with 10 tables
Napkin executed run with ID 4fd5e08f-c03c-4d6b-b31d-d77ff742f8aa on 2022-01-03 16:51:45 with 17 tables
Napkin executed run with ID 17664cb6-7357-45ad-9523-f36daa605a6f on 2022-01-03 16:51:31 with 10 tables
Napkin executed run with ID a85cc760-d55e-499e-8c00-a67e4a44f55a on 2022-01-03 16:51:16 with 16 tables
Napkin executed run with ID 7a554ca7-61dc-4884-9d06-a4614c9e2827 on 2022-01-03 16:51:00 with 15 tables

Show

shell

napkin history show --help
Usage: napkin history show [GUID]
  Displays Napkin run by ID. Uses latest run if ID wasn't provided

Available options:
  GUID                     GUID of the Napkin run
  -h,--help                Show this help text

Global options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -Q,--metadata-connection-string URI
                           An optional alternate connection string for tracking
                           napkin operations. Currently supports SQLite3 and
                           Postgres backends, e.g. sqlite:/some/file.sqlite3 or
                           postgresql://user@host
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

napkin history show command shows individual table statuses, start and finish execution timestamps, number of affected rows.

shell

napkin history show 7a554ca7-61dc-4884-9d06-a4614c9e2827
Table ns.implicit_aliases_2 started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.alias_shadowing started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.test_1 started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.implicit_aliases started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 0 rows
Table ns.cte_shadowing started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.cte_aliases started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.aliases_where_clause started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 0 rows
Table ns.src_bar started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.src_foo started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.zzz started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 100 rows

  • GUID parameter allows you to specify exact Napkin run to obtain information from. To see list of all runs, try napkin history list command.

If GUID parameter is omitted, napkin history show defaults it to the “last Napkin execution” (the same as in git show).

Gantt

napkin history gantt command allows to export metadata about particular Napkin execution into a human-readable form with help of Highcharts HTML library.

shell

napkin history gantt --help
Usage: napkin history gantt [GUID] [-o|--output-dir DIR]
  Exports Napkin run into a gantt chart. Uses latest run if ID wasn't provided

Available options:
  GUID                     GUID of the Napkin run
  -o,--output-dir DIR      Directory containing OUTPUT files
  -h,--help                Show this help text

Global options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -Q,--metadata-connection-string URI
                           An optional alternate connection string for tracking
                           napkin operations. Currently supports SQLite3 and
                           Postgres backends, e.g. sqlite:/some/file.sqlite3 or
                           postgresql://user@host
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • -o, --output-dir parameter points Napkin to the directory, which should be used for exporting HTML file with a Gantt chart. By default, current working directory will be used.

napkin history gantt command exports metadata of the specified Napkin run to the DIR directory in the form of Gantt chart (HTML page).

shell

napkin history gantt 7a554ca7-61dc-4884-9d06-a4614c9e2827 -o data/
Gantt chart generated: data/gantt_napkin_7a554ca7-61dc-4884-9d06-a4614c9e2827.html

For example:

If GUID parameter is omitted, napkin history gantt defaults it to the “last Napkin execution” (the same as in git show).

Validate

napkin validate command allows to check the whole spec for:

  • Napkin’s ability to correctly parse and process all the sql files.
  • Napkin’s ability to compile and execute parts of the specs defined in Haskell.
  • Validity of the built-in and user-defined preprocessors.
  • Validity of the settings inside spec.yaml file.
  • Correctness of mustache interpolation instructions.
  • Absence of dependency loops across table definitions.

It is a good idea to check for the spec validness before committing the changes to the source code repository (git), so that other team members will not be surprised that napkin run is unable for do it’s job.

shell

napkin validate --help
Usage: napkin validate [-s|--spec-file SPEC_YAML] 
                       [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                         (-s|--spec-file SPEC_YAML)] 
                       [--override /json/pointer={"json": "value"}] 
                       [-i|--interactive] [-r|--rolling] [-S|--strict-mustache] 
                       [--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
  Validates YAML spec file

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  -i,--interactive         Watches for changed files and constantly revalidates
  -r,--rolling             Does not clear the screen in live validation mode
  -S,--strict-mustache     Strict mustache validation mode
  --arg ARG=VALUE          Argument to be passed to spec
  --arg-json JSON          Spec arguments encoded as JSON object
  --arg-file FILE          Spec arguments stored in JSON file
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • In -i, --interactive mode, Napkin constantly monitors all the files and folders, mentioned in spec.yaml file automatically re-validates the spec on any change. It is convenient to have napkin validate --interactive process opened in a separate window, while you are working on the SQL files or the spec.yaml definition.
  • By default, --interactive mode clears the screen before each validation cycle. To switch that feature off, you can use -r, --rolling mode, in which Napkin will constantly append validation status and error messages (in any) to the end of the output.
  • Napkin has number of strict mustache validation checks, which are not enabled by default. Use -S, --strict-mustache flag to enable additional checks (see mustache interpolation for details).

Without --interactive mode, napkin will print the single OK message to the terminal and exit with 0 exit code.

shell

napkin validate -s specs/spec.yaml

output

OK

With --interactive mode, Napkin constantly monitors for the changes in all files mentioned in the spec.yaml and re-validates as soon as it sees a change.

asciicast

Dump

Besides knowing that your spec is “OK” (by using napkin validate command), Napkin also provides a way to see SQL queries, which Napkin would execute on a real database engine in by using napkin run command.

shell

napkin dump --help
Usage: napkin dump [-s|--spec-file SPEC_YAML] 
                   [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                     (-s|--spec-file SPEC_YAML)] 
                   [--override /json/pointer={"json": "value"}] 
                   [--arg ARG=VALUE | --arg-json JSON | --arg-file FILE] 
                   [-o|--output-dir DIR] 
                   [--skip-all | --force-all | --only SELECTOR | 
                     --force-only SELECTOR] 
                   [--force SELECTOR | --force-with-downstream SELECTOR | 
                     --force-with-upstream SELECTOR | --enable SELECTOR | 
                     --skip SELECTOR | --skip-with-downstream SELECTOR | 
                     --skip-with-upstream SELECTOR] 
                   [-e|--exclude-unmanaged-tables] [-S|--strict-mustache] 
                   [--use-spec-names] [-i|--interactive] [-r|--rolling]
  Performs a dry run and stores queries for inspection

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  --arg ARG=VALUE          Argument to be passed to spec
  --arg-json JSON          Spec arguments encoded as JSON object
  --arg-file FILE          Spec arguments stored in JSON file
  -o,--output-dir DIR      Directory containing OUTPUT files (default: "dump")
  --skip-all               Table selector: Skip all tables, use other options to
                           enable selected tables
  --force-all              Table selector: Force-enable update of all tables, use other options to skip selected tables
  --only SELECTOR          Table selector: Disable all tables except specified
  --force-only SELECTOR    Table selector: Disable all tables except specified which will be forced
  --force SELECTOR         Table selector: Force table update
  --force-with-downstream SELECTOR
                           Table selector: Force table update, force downstream tables too
  --force-with-upstream SELECTOR
                           Table selector: Force table update, force upstream tables too
  --enable SELECTOR        Table selector: Don't skip or force table, use the update strategy as specified in the Spec file
  --skip SELECTOR          Table selector: Skip table update
  --skip-with-downstream SELECTOR
                           Table selector: Skip table update, skip downstream tables too
  --skip-with-upstream SELECTOR
                           Table selector: Skip table update, skip upstream tables too
  -e,--exclude-unmanaged-tables
                           Show only managed tables in the DOT graph exclusively
  -S,--strict-mustache     Strict mustache validation mode
  --use-spec-names         Use unprocessed table names as filenames
  -i,--interactive         Watches for changed files and constantly revalidates
  -r,--rolling             Does not clear the screen in live validation mode
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • -o, --output-dir parameter is used to specify output directory to put SQL files into. Directory will also contain dependency diagram (in form of .dot and .pdf files) between the tables.
  • -f, --force flag is used to indicate, that it is OK for Napkin to override target directory.
  • By default, Napkin will add both managed and unmanaged tabled into a dependency diagram. -e, --exclude-unmanaged-tables flag forces Napkin to only include managed tables.
  • with --use-spec-names Napkin will use original table names (as declared in spec.yaml) for the file names in dump output. Otherwise, file names are equal to the table names after applying preprocessors.
  • In -i, --interactive mode, Napkin constantly monitors all the files and folders, mentioned in spec.yaml file automatically re-dumps the spec on any change. It is convenient to have napkin dump --interactive process opened in a separate window, while you are working on the SQL files or the spec.yaml definition.
  • By default, --interactive mode clears the screen before each dump cycle. To switch that feature off, you can use -r, --rolling mode, in which Napkin will constantly append dump status and error messages (in any) to the end of the output.

Usually napkin dump command will create following directory structure:

shell

napkin dump -o dump

output

dump
├── MANIFEST.txt
├─ artist_album_count
│ ├─ 1.sql
│ └─ program.log
├─ artist_album_count_via_mustache
│ ├─ 1.sql
│ └─ program.log
├─ day_totals_1
│ ├─ create
│ │ ├─ 3.sql
│ │ └─ program.log
│ └─ update
│    ├─ 3.sql
│    ├─ 4.sql
│    ├─ 5.sql
│    └─ program.log
├─ dependency_graph.dot
├─ dependency_graph.pdf
├─ dump.json
├─ long_to_wide
│ ├─ 1.sql
│ ├─ 2.sql
│ └─ program.log
└─ popular_tracks_for_pink_floyd
  ├─ 1.sql
  └─ program.log

Some source sql folders, like day_totals_1 will produce multiple execution paths:

  • create folder with a single sql file – query, which Napkin would execute in order to create day_totals_1 table is it is not exist yet.
  • update folder with number of sql files in it – queries Napkin would execute in order to update already existing day_totals_1 table.

Along with output SQL files, napkin dump command also generates dependency diagram between the tables in Graphviz format:

graph

digraph {
    1 [label=Album
      ,shape=box
      ,style=dotted];
    2 [label=Artist
      ,shape=box
      ,style=dotted];
    3 [label=InvoiceLine
      ,shape=box
      ,style=dotted];
    4 [label=Track
      ,shape=box
      ,style=dotted];
    5 [label=artist_album_count
      ,fillcolor="#59a14f"
      ,fontcolor="#ffffff"
      ,style=filled];
    6 [label=artist_album_count_via_mustache
      ,fillcolor="#edc948"
      ,fontcolor="#000000"
      ,style=filled];
    7 [label=artist_album_count_via_mustache_2
      ,fillcolor="#b07aa1"
      ,fontcolor="#ffffff"
      ,style=filled];
    8 [label=artist_hex
      ,fillcolor="#4e79a7"
      ,fontcolor="#ffffff"
      ,style=filled];
    9 [label=artist_track_count
      ,fillcolor="#f28e2b"
      ,fontcolor="#000000"
      ,style=filled];
    10 [label=artist_track_count_2
       ,fillcolor="#e15759"
       ,fontcolor="#ffffff"
       ,style=filled];
    11 [label=artist_track_count_view
       ,fillcolor="#76b7b2"
       ,fontcolor="#000000"
       ,style=filled];
    12 [label=popular_tracks_for_pink_floyd
       ,fillcolor="#59a14f"
       ,fontcolor="#ffffff"
       ,style=filled];
    1 -> 5;
    1 -> 6;
    1 -> 7;
    1 -> 9;
    1 -> 11;
    1 -> 12;
    2 -> 5;
    2 -> 6;
    2 -> 7;
    2 -> 8;
    2 -> 9;
    2 -> 11;
    2 -> 12;
    3 -> 12;
    4 -> 9;
    4 -> 11;
    4 -> 12;
    9 -> 10;
}

In PDF format, this diagram (download) looks like that:

dependency_graph

Optimize

The command combines a set of features for discovery and fixing semantic issues with SQL queries or spec itself.

Unused CTE column

CTE is a named subquery introduced with WITH keyword.

In a complex query comprised of many CTE subqueries a user might forget to remove helper column just after debugging and the column remains in the query and contributes to spec running costs.

table_a2.sql

with cte as (select a, b from t) select b from cte

shell

napkin optimize -s specs/spec.yaml
Table: tbl_b2 (table_a2.sql)
   unused column cte.a

Duplicated CTE

This optimization aims equivalent CTEs defined in the same query.

table_a2.sql

with cte1 as (select a from t where b*a > 0),
     cte2 as (select a from t where a*b > 0)
     select b from t2 where a1 in cte1 or a2 in cte2
Table: tbl_b2 (table_a2.sql)
   duplicated CTE tables: cte1, cte2

By default optimize just prints discovered issues, though it is possible to ask Napkin to resolve the issues automatically. Automatic refactoring doesn’t preserve original SQL formatting. SQL based optimizer is supported only by generic dialect.

shell

napkin optimize --help
Usage: napkin optimize [-s|--spec-file SPEC_YAML] 
                       [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                         (-s|--spec-file SPEC_YAML)] 
                       [--override /json/pointer={"json": "value"}] [--mode ARG]
                       [--selector ARG] [-S|--strict-mustache] 
                       [--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
  Spec/query semantic improvements (e.g. unused columns)

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  --mode ARG               Mode for running optimizations. Could be:
                           show-issues, interactive (default: show-issues)
  --selector ARG           Optimization feature selector. By default all
                           optimizations are on: unused-columns,
                           duplicated-queries. To turn on extra feature use "+"
                           prefix to remove feature "-" prefix and to run just
                           specific features don't use any prefix.
  -S,--strict-mustache     Strict mustache validation mode
  --arg ARG=VALUE          Argument to be passed to spec
  --arg-json JSON          Spec arguments encoded as JSON object
  --arg-file FILE          Spec arguments stored in JSON file
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

  • --mode TODO
  • --selector TODO

Haddock

shell

napkin haddock --help
Usage: napkin haddock 
  Opens web page with Napkin haddocks

Available options:
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Napkin is written in Haskell and currently exposes all internal APIs for external use (as it allows meta-programming, expressing SQL statements programmatically, etc.).

Links to the Napkin Haskell API documentation for each individual version can be found here.

You can always “ask” Napkin CLI for the exact link to the API reference for the currently installed version. It will open a browser with an API docs for the exact git commit SHA Napkin was built from.

shell

napkin haddock

output

Opening "https://napkin-public-storage-bucket.s3.us-east-1.amazonaws.com/haddock/git-hash/a8811868647ccc54e5c716e1a2781fe1c8b2ef9b/index.html" URL in the browser

Docs

shell

napkin docs --help
Usage: napkin docs 
  Opens web page with Napkin tutorial

Available options:
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

As well as for Haddock API reference, Napkin CLI can be asked for a link to this documentation site.

shell

napkin docs
Opening "https://docs.napkin.run/fundamentals" URL in the browser

REPL

napkin repl can be useful when using advanced meta-programming features. When this command is used Napkin will spawn Haskell GHCI session. Spec file has to be provided, so the REPL session will be configured appropriately.

shell

napkin run --help
Usage: napkin repl [-s|--spec-file SPEC_YAML] 
                   [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                     (-s|--spec-file SPEC_YAML)] 
                   [--override /json/pointer={"json": "value"}] 
                   [--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
  Drops into Napkin repl

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  --arg ARG=VALUE          Argument to be passed to spec
  --arg-json JSON          Spec arguments encoded as JSON object
  --arg-file FILE          Spec arguments stored in JSON file
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Yaml-Schema

shell

napkin yaml-schema --help
Usage: napkin yaml-schema OUTPUT
  Stores YAML schema in a file

Available options:
  OUTPUT                   Name of the file to put generated yaml schema
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Many modern text editors and IDEs have (either native or through plugins) support for YAML Schema to enhance the process of editing complex YAML files. Napkin provides a way to export YAML schema to an external file - so it is possible to configure your editor properly.

In addition, if your editor or plugin supports that feature, you can point it to the web link with a latest YAML schema. YAML schemas for previous releases can be found here.

  • OUTPUT parameter is a path to a file to put generate yaml schema for the spec.yaml config.

For example:

shell

napkin yaml-schema $HOME/napkin.schema
Saving YAML schema to a file: /Users/user/napkin.schema

Hie-Bios

shell

napkin hie-bios --help
Usage: napkin hie-bios [-s|--spec-file SPEC_YAML] 
                       [(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
                         (-s|--spec-file SPEC_YAML)] 
                       [--override /json/pointer={"json": "value"}] 
                       [GHC_OPTIONS]
  Used by Haskell Language Server

Available options:
  -s,--spec-file SPEC_YAML Path to the spec yaml file
                           (default: "specs/spec.yaml")
  -y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
                           MatchIndex, Prepend, Append, Replace.
  --override /json/pointer={"json": "value"}
                           Set arbitrary JSON pointer (RFC 6901) in YAML spec
                           with the new value
  GHC_OPTIONS              List of extra GHC arguments to pass
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

napkin hie-bios provides support for advanced meta-programming with Napkin. This command is not supposed to be used directly by the user.

Haskell Language Server sets HIE_BIOS_OUTPUT environment variable, which is used by Napkin as a file name to write all inferred and user-supplied options to. For debug purposes, Napkin defaults file location to hie-bios.txt in the current working directory.

shell

HIE_BIOS_OUTPUT=/dev/tty napkin hie-bios -s specs/spec.yaml
...
-package
containers
-package
napkin
-package
ordered-containers
-package
unordered-containers
-package
text
-package
containers
-package
aeson
...
-XBangPatterns
-XBinaryLiterals
-XConstrainedClassMethods
-XConstraintKinds
-XDeriveDataTypeable
-XDeriveFoldable
...

Usually, it is enough to add the hie.yaml to the root of the Napkin project for the HLS plugin from your editor to load.

hie.yaml

cradle:
  bios:
    shell: napkin hie-bios --spec-file specs/spec.yaml

  • GHC_OPTIONS - List of additional GHC options, which Napkin will pass to the GHC through HLS.

Version

shell

napkin version --help
Usage: napkin version 
  Prints Git SHA and version of the build

Available options:
  -h,--help                Show this help text

Global options:
  -v,--verbose             Print debug log
  -l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
                           Notice, Warning, Error, Critical, Alert, Emergency.
                           (default: Info)
  --log-format LOG_FORMAT  Log line format. Possible values: Simple, Server,
                           Json. (default: Simple)

Napkin can print current version, release data and exact git commit on it was built.

For example:

shell

napkin version
Napkin version: 0.5.11
Git commit hash: a8811868647ccc54e5c716e1a2781fe1c8b2ef9b
Built at: 2022-01-04 10:24:21.398355 UTC

If you want to install a newer version of Napkin – please, follow installation instructions.