Napkin
Napkin CLI application has a number of commands to develop, debug and execute specs
Usage: napkin [(-v|--verbose) | (-l|--log-level LOG_LEVEL)]
[--log-format LOG_FORMAT] COMMAND
Available options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
-h,--help Show this help text
Init:
init Generates Napkin project from a template
Execute:
run Runs Napkin specs
history Set of commands to work with Napkin previous runs
auth Authenticates with Google BigQuery
list Lists available tables on database
aeda Performs AEDA over given relation
cleanup Remove temporary tables with namespaces/datasets
specified in a given spec file
tags List tags with their associated tables
Develop:
validate Validates YAML spec file
dump Performs a dry run and stores queries for inspection
repl Drops into Napkin repl
optimize Spec/query semantic improvements (e.g. unused
columns)
Documentation:
haddock Opens web page with Napkin haddocks
docs Opens web page with Napkin tutorial
IDE integration:
yaml-schema Stores YAML schema in a file
hie-bios Used by Haskell Language Server
Version:
version Prints Git SHA and version of the build
Napkin has number of global options, which are applicable to almost any command:
-
-l
,--log-level
- Allows to specify log level for the Napkin execution. -
-v
,--verbose
- Enables debug logging, equivalent of providing--log-level Debug
. -
--log-format
- Allows to choose desired format of the log messages. Currently supported: -
-h
,--help
- Displays traditional help message screen for each command. It is usually “safe” to append--help
to any Napkin command to know more about options it supports. Napkin will also display the--help
screen when it meets the unfamiliar option.
Init
Napkin project is a folder with a set of various files including a spec file in YAML format having lots of fields.
shell
napkin init --help
Usage: napkin init [-t|--template TPL_REF] [-p|--parameter KEY=VALUE]
[-n|--project-name PROJECT_NAME] [-f|--force]
[-c|--current-dir]
Generates Napkin project from a template
Available options:
-t,--template TPL_REF Template location:
[[service='github':]user/repo/template-name
service = (gitlab|bitbucket|github),
or local path/url or :<embedded template name>
(default: :basic)
-p,--parameter KEY=VALUE Set custom template parameter
-n,--project-name PROJECT_NAME
Name of the project to create
(default: "napkin-project")
-f,--force Overwrite existing files in case of name collision
with files from a template
-c,--current-dir Use current directory for the project
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Generates Napkin project from a template.
Napkin contains following embedded templates: :basic
You can use custom templates from github, gitlab and bitbucket.
Mustache syntax is used for values interpolation.
Template variables are resolved with process environment variables and -p parameters override them.
- With
-n
, or--project-name
you can specify the name of the project to create. By default, Napkin will try to create a new directory with this name. -
-t
,--template
allows to specify the template (builtin or external) for the Napkin to use. -
-p
,--parameter
option allows to override variables within the template with custom values. -
-f
,--force
flag will force Napkin to override target directory (see--project-name
option) if it already exists or files created in the current directory in case of--current-dir
was specified. - With
-c
,--current-dir
option, Napkin will use current working directory to write new files into.
If you start a project from scratch then init command could save a few seconds by generating minimum working set.
A generated project is based on a hsfiles template. Napkin has a builtin :basic
template, but you can define your own templates and supply an url to a repo with them on github e.g. Template file is a list of mustache snippets for every file in a project tree.
For example, following command will create napkin project with name napkinFromScratch
from :basic
template:
shell
napkin init -n napkinFromScratch -t :basic
Run
napkin run
is used to execute Specs. It provides a handful of options that enable you to adjust how the spec is going to be executed. In a summary, you can:
- perform a partial run that will execute only a subset of defined tables,
- perform a dry-run and inspect the execution plan,
- pass spec arguments,
- enable CLI UI,
- override database connection settings.
Lets review the CLI command:
shell
napkin run --help
Usage: napkin run [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[-C|--credentials-file FILE] [-u|--uri URI]
[--credentials-db CREDENTIALS-DB-PATH]
[--callback-http-port CALLBACK_HTTP_PORT]
[-S|--strict-mustache]
[--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
[-Q|--metadata-connection-string URI]
[--skip-all | --force-all | --only SELECTOR |
--force-only SELECTOR]
[--force SELECTOR | --force-with-downstream SELECTOR |
--force-with-upstream SELECTOR | --enable SELECTOR |
--skip SELECTOR | --skip-with-downstream SELECTOR |
--skip-with-upstream SELECTOR] [-p|--show-progress]
[-r|--dry-run] [--condensed-logging]
Runs Napkin specs
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-C,--credentials-file FILE
Path to database connection credentials file
-u,--uri URI Connection URI to create a database connection
postgres:/// # use libpq defaults
postgres://user@remotehost/dbname
redshift://user@remotehost/dbname
bigquery://bigquery.googleapis.com/project
sqlite:///home/dev/myRepo/mySqlite.sqlite3
--credentials-db CREDENTIALS-DB-PATH
Path to database connection credentials directory
--callback-http-port CALLBACK_HTTP_PORT
Network port number for the napkin Auth server
-S,--strict-mustache Strict mustache validation mode
--arg ARG=VALUE Argument to be passed to spec
--arg-json JSON Spec arguments encoded as JSON object
--arg-file FILE Spec arguments stored in JSON file
-Q,--metadata-connection-string URI
An optional alternate connection string for tracking
napkin operations. Currently supports SQLite3 and
Postgres backends, e.g. sqlite:/some/file.sqlite3 or
postgresql://user@host
--skip-all Table selector: Skip all tables, use other options to
enable selected tables
--force-all Table selector: Force-enable update of all tables, use other options to skip selected tables
--only SELECTOR Table selector: Disable all tables except specified
--force-only SELECTOR Table selector: Disable all tables except specified which will be forced
--force SELECTOR Table selector: Force table update
--force-with-downstream SELECTOR
Table selector: Force table update, force downstream tables too
--force-with-upstream SELECTOR
Table selector: Force table update, force upstream tables too
--enable SELECTOR Table selector: Don't skip or force table, use the update strategy as specified in the Spec file
--skip SELECTOR Table selector: Skip table update
--skip-with-downstream SELECTOR
Table selector: Skip table update, skip downstream tables too
--skip-with-upstream SELECTOR
Table selector: Skip table update, skip upstream tables too
-p,--show-progress Show a progress bar if the terminal supports it
-r,--dry-run Simulate run and report which tables would've been
updated
--condensed-logging Reduce logging of non-action tasks such as skipped
tables.
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Table selectors and SELECTOR types
Table selectors allow performing partial spec runs. This can be useful during
spec development. All options are applied in sequence to the execution plan.
--skip-all, --force-all, --only, and --force-only options can be specified as
a first table selector only. --only, --force-only, --skip, --enable, --force,
and --force-with-downstream options accept a SELECTOR, which can be one of:
* unprocessed table name, as it was defined in the spec,
e.g. --skip spec:marketing_analytics,
* processed table name after all preprocessors have been applied,
e.g. --skip db:development.john_marketing_analytics,
* table tag, e.g. --enable tag:marketing.
Table names can be filtered using three matchers:
* literal table name, e.g. --skip spec:marketing_analytics,
* table name pattern - an asterisk can be used as a wildcard at the
beginning or at the end of each table name segment,
e.g. --force spec:prod_*.marketing_*,
* table name regex - regex has to be surrounded with /,
e.g. --force spec:/^marketing.*/,
Please refer to the User Manual for further reference.
Example:
napkin run --only tag:marketing --enable spec:sales_totals --skip spec:marketing_mailing_*
CLI UI
-
-p
,--show-progress
– will enable terminal UI that will display spec execution progress.
Dry runs
-
-r
,--dry-run
– will perform a dry run of the Spec. Napkin will always print Spec execution plan prior to executing queries, so this can be used to verify if partial run options have been applied correctly:
shell
napkin run --dry-run --force-only "spec:artist*" --skip "spec:*hashes*"
[2022-01-13 10:30:00][Info] Determining tables for update...
[2022-01-13 10:30:00][Info] Forced tables: none
[2022-01-13 10:30:00][Info] Skipped tables:
- artists_hashes -> development.unknown_artists_hashes
- day_totals -> development.unknown_day_totals
- day_totals_2 -> development.unknown_day_totals_2
- first_run_at -> development.unknown_first_run_at
- long_to_wide -> development.unknown_long_to_wide
- popular_tracks_for_pink_floyd -> development.unknown_popular_tracks_for_pink_floyd
- random -> development.unknown_random
- total_sales_by_year -> development.unknown_total_sales_by_year
- update_daily -> development.unknown_update_daily
- update_on_upstream -> development.unknown_update_on_upstream
[2022-01-13 10:30:00][Info] Unmanaged tables that will be used as an input in this run:
- chinook.Album
- chinook.Artist
- chinook.Track
[2022-01-13 10:30:00][Info] Managed tables that will be used as an input in this run, but will not be updated: none
[2022-01-13 10:30:00][Info] Managed tables that will be updated in this run:
- artist_album_count -> development.unknown_artist_album_count
- artist_album_count_via_mustache -> development.unknown_artist_album_count_via_mustache
- artist_album_count_via_mustache_2 -> development.unknown_artist_album_count_via_mustache_2
- artist_track_count -> development.unknown_artist_track_count
- artist_track_count_2 -> development.unknown_artist_track_count_2
- artist_track_count_view -> development.unknown_artist_track_count_view
[2022-01-13 10:30:00][Info] Estimated runtime: 35s
[2022-01-13 10:30:00][Warning] Dry run, aborting execution.
Partial spec runs
During Spec development it’s convenient to execute only tables the developer is working on. This allows for shorter feedback loop and can reduce costs as well. Napkin provides a number of command-line options that can be used to cherry-pick tables that will be executed during particular run. However, we don’t recommend using them for production runs.
All flags mentioned below will amend the execution plan in their order of appearance.
Initial execution plan:
-
--skip-all
will disable execution of all tables. Typically, it’d be combined with other options that will enable or force some tables. -
--force-all
will force execution of all tables. This overrides any update strategy that was set in table definitions. -
--only
will disable execution of all tables and enable tables that fulfill the selector. This brings back any update strategy that was set in table definitions. Synonym of--skip-all --enable SELECTOR
. -
--force-only
will disable execution of all tables and force tables that fulfill the selector. This overrides any update strategy that was set in table definitions. Synonym of--skip-all --force SELECTOR
.
Updating execution plan:
-
--force
will force tables that fulfill the selector. This overrides any update strategy that was set in table definitions. -
--force-with-downstream
and--force-with-upstream
will force tables that fulfill the selector as well as all tables depend on them on either direction. This overrides any update strategy that was set in table definitions. -
--enable
will void any changes to execution plan that were introduced with previous option for tables that fulfill the selector and will bring back any update strategy that was set in table definitions. -
--skip
will disable execution of the tables that fulfill the selector.
Table selectors
--only
, --force-only
, --force
, --force-with-downstream
, --enable
, and --skip
options accept a table selector. Napkin will fail when no tables match a selector, which may indicate user error.
-
db:PATTERN
– will match against preprocessed table names (with all renaming applied). Wildcards or regular expressions may be used to match multiple tables. -
spec:PATTERN
– will match against unprocessed table names, i.e. the table names as they were specified in the spec file. Wildcards or regular expressions may be used to match multiple tables. If no renaming was done with preprocessorsdb:
andspec:
selectors will have the same effect. -
tag:TAG
– will match table tags. Wildcards or regular expressions cannot be used for tags.
Where PATTERN can be:
-
someproject.somedataset.sometable
– an exact match. -
someproject.prod_*.*sales*
– a wildcard can be used at the beginning and at the end of each name segment, segments are matched separately, so * will never expand beyond name segment:-
production.derived.*marketing
would matchproduction.derived.new_marketing
but notproduction.derived_update.marketing
which a naive regex would otherwise match.
-
-
/REGEX/
– a regular expression.
Example use cases
-
--force-only spec:top_artists
will disable all tables excepttop_artists
table which will be forced, -
--force-only spec:top_*
will disable all tables and force all tables that have prefixtop_
, -
--force-only db:*.*.top_*
will disable all tables and force all tables that have prefixtop_
in all projects and datasets, -
--skip-all --force-with-downstream tag:monthly
will disable all tables and force tables that have a tagmonthly
plus all tables that depend on them.
Meta arguments
-
--arg
ARG=VALUE
allows to provideARG
spec argument with the string value ofVALUE
. -
--arg-json
will accept JSON object as an argument that will be interpreted as key-value store. Values don’t need to be strings, they can be any valid JSON object. -
--arg-file
will read JSON object from a file.
Multiple options --arg*
options can be used at the same time and all data provided will be collected.
Other options
-
-u
,--uri
– can be used to override database connection string. This can be useful, if development and production data sets are located on different DB servers. -
-C
,--credentials-file
can be used to provide database credentials manually. Please refer to Connecting to the database page. -
--credentials-db
– can be used to override location of Napkin’s credentials database. -
--callback-http-port
– can be used to override Napkin’s HTTP port used for handling BigQuery authorization.
History
Set of napkin history
commands allows displaying information from Napkin’s metadata database in various forms.
When using Napkin frequently, it might be handy to know:
- When Napkin was executed last time
- What are the statuses of the tables
- How many records were updated in each table
- etc.
Napkin assigns a GUID
identifier to each run to uniquely identify it from other napkin run
invocations. Along with GUID
of the execution, Napkin stores the date and time when particular run
command was issued to Napkin.
Lets review the CLI command:
shell
napkin history --help
Usage: napkin history [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[-Q|--metadata-connection-string URI] COMMAND
Set of commands to work with Napkin previous runs
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-Q,--metadata-connection-string URI
An optional alternate connection string for tracking
napkin operations. Currently supports SQLite3 and
Postgres backends, e.g. sqlite:/some/file.sqlite3 or
postgresql://user@host
-h,--help Show this help text
Available commands:
list Lists all previous Napkin runs
show Displays Napkin run by ID. Uses latest run if ID
wasn't provided
gantt Exports Napkin run into a gantt chart. Uses latest
run if ID wasn't provided
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
-
-s
,--spec-file
as usual, allows to specify the path to the spec.yaml file. -
--override
allows to override any values in the spec.yaml file according to the JSON pointer specification. Can be helpful in cases, where you need to only override several values fromspec.yaml
for a particular run. -
-a
,--app-name
allows to overrideapp-name
parameter from the spec.yaml file. -
-Q
,--metadata-connection-string
allows to override the metadata storage location (seemetadata_url
parameter in the spec.yaml file) used by Napkin. This might be useful for the situations, when you want to store all metadata information in the central location for further analysis or to ‘port’ Napkin history from a computer, which was earlier used to run Napkin.
List
shell
napkin history list --help
Usage: napkin history list [LIMIT]
Lists all previous Napkin runs
Available options:
LIMIT Number of records to show (default: 10)
-h,--help Show this help text
Global options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-Q,--metadata-connection-string URI
An optional alternate connection string for tracking
napkin operations. Currently supports SQLite3 and
Postgres backends, e.g. sqlite:/some/file.sqlite3 or
postgresql://user@host
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
-
LIMIT
parameter allows to specify the amount of records for thenapkin history list
command to display. If there are less invocation of thenapkin run
in history records, only those records will be displayed.
For example:
shell
napkin history list
Napkin executed run with ID fd4bf846-da9c-4bb4-a1d0-b3c971e2d993 on 2022-01-03 16:51:50 with 10 tables
Napkin executed run with ID 4fd5e08f-c03c-4d6b-b31d-d77ff742f8aa on 2022-01-03 16:51:45 with 17 tables
Napkin executed run with ID 17664cb6-7357-45ad-9523-f36daa605a6f on 2022-01-03 16:51:31 with 10 tables
Napkin executed run with ID a85cc760-d55e-499e-8c00-a67e4a44f55a on 2022-01-03 16:51:16 with 16 tables
Napkin executed run with ID 7a554ca7-61dc-4884-9d06-a4614c9e2827 on 2022-01-03 16:51:00 with 15 tables
Show
shell
napkin history show --help
Usage: napkin history show [GUID]
Displays Napkin run by ID. Uses latest run if ID wasn't provided
Available options:
GUID GUID of the Napkin run
-h,--help Show this help text
Global options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-Q,--metadata-connection-string URI
An optional alternate connection string for tracking
napkin operations. Currently supports SQLite3 and
Postgres backends, e.g. sqlite:/some/file.sqlite3 or
postgresql://user@host
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
napkin history show
command shows individual table statuses, start and finish execution timestamps, number of affected rows.
shell
napkin history show 7a554ca7-61dc-4884-9d06-a4614c9e2827
Table ns.implicit_aliases_2 started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.alias_shadowing started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.test_1 started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.implicit_aliases started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 0 rows
Table ns.cte_shadowing started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.cte_aliases started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.aliases_where_clause started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 0 rows
Table ns.src_bar started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.src_foo started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 1 rows
Table ns.zzz started at 2022-01-03 16:51:50 and finished at 2022-01-03 16:51:50 successfully affecting 100 rows
-
GUID
parameter allows you to specify exact Napkin run to obtain information from. To see list of all runs, trynapkin history list
command.
If GUID
parameter is omitted, napkin history show
defaults it to the “last Napkin execution” (the same as in git show
).
Gantt
napkin history gantt
command allows to export metadata about particular Napkin execution into a human-readable form with help of Highcharts HTML library.
shell
napkin history gantt --help
Usage: napkin history gantt [GUID] [-o|--output-dir DIR]
Exports Napkin run into a gantt chart. Uses latest run if ID wasn't provided
Available options:
GUID GUID of the Napkin run
-o,--output-dir DIR Directory containing OUTPUT files
-h,--help Show this help text
Global options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-Q,--metadata-connection-string URI
An optional alternate connection string for tracking
napkin operations. Currently supports SQLite3 and
Postgres backends, e.g. sqlite:/some/file.sqlite3 or
postgresql://user@host
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
-
-o
,--output-dir
parameter points Napkin to the directory, which should be used for exporting HTML file with a Gantt chart. By default, current working directory will be used.
napkin history gantt
command exports metadata of the specified Napkin run to the DIR
directory in the form of Gantt chart (HTML page).
shell
napkin history gantt 7a554ca7-61dc-4884-9d06-a4614c9e2827 -o data/
Gantt chart generated: data/gantt_napkin_7a554ca7-61dc-4884-9d06-a4614c9e2827.html
For example:
If GUID
parameter is omitted, napkin history gantt
defaults it to the “last Napkin execution” (the same as in git show
).
Validate
napkin validate
command allows to check the whole spec for:
- Napkin’s ability to correctly parse and process all the sql files.
- Napkin’s ability to compile and execute parts of the specs defined in Haskell.
- Validity of the built-in and user-defined preprocessors.
- Validity of the settings inside spec.yaml file.
- Correctness of mustache interpolation instructions.
- Absence of dependency loops across table definitions.
It is a good idea to check for the spec validness before committing the changes to the source code repository (git), so that other team members will not be surprised that napkin run
is unable for do it’s job.
shell
napkin validate --help
Usage: napkin validate [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[-i|--interactive] [-r|--rolling] [-S|--strict-mustache]
[--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
Validates YAML spec file
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
-i,--interactive Watches for changed files and constantly revalidates
-r,--rolling Does not clear the screen in live validation mode
-S,--strict-mustache Strict mustache validation mode
--arg ARG=VALUE Argument to be passed to spec
--arg-json JSON Spec arguments encoded as JSON object
--arg-file FILE Spec arguments stored in JSON file
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
- In
-i
,--interactive
mode, Napkin constantly monitors all the files and folders, mentioned in spec.yaml file automatically re-validates the spec on any change. It is convenient to havenapkin validate --interactive
process opened in a separate window, while you are working on the SQL files or the spec.yaml definition. - By default,
--interactive
mode clears the screen before each validation cycle. To switch that feature off, you can use-r
,--rolling
mode, in which Napkin will constantly append validation status and error messages (in any) to the end of the output. - Napkin has number of strict mustache validation checks, which are not enabled by default. Use
-S
,--strict-mustache
flag to enable additional checks (see mustache interpolation for details).
Without --interactive
mode, napkin will print the single OK
message to the terminal and exit with 0 exit code.
shell
napkin validate -s specs/spec.yaml
output
OK
With --interactive
mode, Napkin constantly monitors for the changes in all files mentioned in the spec.yaml and re-validates as soon as it sees a change.
Dump
Besides knowing that your spec is “OK” (by using napkin validate
command), Napkin also provides a way to see SQL queries, which Napkin would execute on a real database engine in by using napkin run
command.
shell
napkin dump --help
Usage: napkin dump [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
[-o|--output-dir DIR]
[--skip-all | --force-all | --only SELECTOR |
--force-only SELECTOR]
[--force SELECTOR | --force-with-downstream SELECTOR |
--force-with-upstream SELECTOR | --enable SELECTOR |
--skip SELECTOR | --skip-with-downstream SELECTOR |
--skip-with-upstream SELECTOR]
[-e|--exclude-unmanaged-tables] [-S|--strict-mustache]
[--use-spec-names] [-i|--interactive] [-r|--rolling]
Performs a dry run and stores queries for inspection
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
--arg ARG=VALUE Argument to be passed to spec
--arg-json JSON Spec arguments encoded as JSON object
--arg-file FILE Spec arguments stored in JSON file
-o,--output-dir DIR Directory containing OUTPUT files (default: "dump")
--skip-all Table selector: Skip all tables, use other options to
enable selected tables
--force-all Table selector: Force-enable update of all tables, use other options to skip selected tables
--only SELECTOR Table selector: Disable all tables except specified
--force-only SELECTOR Table selector: Disable all tables except specified which will be forced
--force SELECTOR Table selector: Force table update
--force-with-downstream SELECTOR
Table selector: Force table update, force downstream tables too
--force-with-upstream SELECTOR
Table selector: Force table update, force upstream tables too
--enable SELECTOR Table selector: Don't skip or force table, use the update strategy as specified in the Spec file
--skip SELECTOR Table selector: Skip table update
--skip-with-downstream SELECTOR
Table selector: Skip table update, skip downstream tables too
--skip-with-upstream SELECTOR
Table selector: Skip table update, skip upstream tables too
-e,--exclude-unmanaged-tables
Show only managed tables in the DOT graph exclusively
-S,--strict-mustache Strict mustache validation mode
--use-spec-names Use unprocessed table names as filenames
-i,--interactive Watches for changed files and constantly revalidates
-r,--rolling Does not clear the screen in live validation mode
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
-
-o
,--output-dir
parameter is used to specify output directory to put SQL files into. Directory will also contain dependency diagram (in form of.dot
and.pdf
files) between the tables. -
-f
,--force
flag is used to indicate, that it is OK for Napkin to override target directory. - By default, Napkin will add both managed and unmanaged tabled into a dependency diagram.
-e
,--exclude-unmanaged-tables
flag forces Napkin to only include managed tables. - with
--use-spec-names
Napkin will use original table names (as declared in spec.yaml) for the file names in dump output. Otherwise, file names are equal to the table names after applying preprocessors. - In
-i
,--interactive
mode, Napkin constantly monitors all the files and folders, mentioned in spec.yaml file automatically re-dumps the spec on any change. It is convenient to havenapkin dump --interactive
process opened in a separate window, while you are working on the SQL files or the spec.yaml definition. - By default,
--interactive
mode clears the screen before each dump cycle. To switch that feature off, you can use-r
,--rolling
mode, in which Napkin will constantly append dump status and error messages (in any) to the end of the output.
Usually napkin dump
command will create following directory structure:
shell
napkin dump -o dump
output
dump
├── MANIFEST.txt
├─ artist_album_count
│ ├─ 1.sql
│ └─ program.log
├─ artist_album_count_via_mustache
│ ├─ 1.sql
│ └─ program.log
├─ day_totals_1
│ ├─ create
│ │ ├─ 3.sql
│ │ └─ program.log
│ └─ update
│ ├─ 3.sql
│ ├─ 4.sql
│ ├─ 5.sql
│ └─ program.log
├─ dependency_graph.dot
├─ dependency_graph.pdf
├─ dump.json
├─ long_to_wide
│ ├─ 1.sql
│ ├─ 2.sql
│ └─ program.log
└─ popular_tracks_for_pink_floyd
├─ 1.sql
└─ program.log
Some source sql folders, like day_totals_1
will produce multiple execution paths:
-
create
folder with a singlesql
file – query, which Napkin would execute in order to createday_totals_1
table is it is not exist yet. -
update
folder with number ofsql
files in it – queries Napkin would execute in order to update already existingday_totals_1
table.
Along with output SQL files, napkin dump
command also generates dependency diagram between the tables in Graphviz format:
graph
digraph {
1 [label=Album
,shape=box
,style=dotted];
2 [label=Artist
,shape=box
,style=dotted];
3 [label=InvoiceLine
,shape=box
,style=dotted];
4 [label=Track
,shape=box
,style=dotted];
5 [label=artist_album_count
,fillcolor="#59a14f"
,fontcolor="#ffffff"
,style=filled];
6 [label=artist_album_count_via_mustache
,fillcolor="#edc948"
,fontcolor="#000000"
,style=filled];
7 [label=artist_album_count_via_mustache_2
,fillcolor="#b07aa1"
,fontcolor="#ffffff"
,style=filled];
8 [label=artist_hex
,fillcolor="#4e79a7"
,fontcolor="#ffffff"
,style=filled];
9 [label=artist_track_count
,fillcolor="#f28e2b"
,fontcolor="#000000"
,style=filled];
10 [label=artist_track_count_2
,fillcolor="#e15759"
,fontcolor="#ffffff"
,style=filled];
11 [label=artist_track_count_view
,fillcolor="#76b7b2"
,fontcolor="#000000"
,style=filled];
12 [label=popular_tracks_for_pink_floyd
,fillcolor="#59a14f"
,fontcolor="#ffffff"
,style=filled];
1 -> 5;
1 -> 6;
1 -> 7;
1 -> 9;
1 -> 11;
1 -> 12;
2 -> 5;
2 -> 6;
2 -> 7;
2 -> 8;
2 -> 9;
2 -> 11;
2 -> 12;
3 -> 12;
4 -> 9;
4 -> 11;
4 -> 12;
9 -> 10;
}
In PDF format, this diagram (download) looks like that:
Optimize
The command combines a set of features for discovery and fixing semantic issues with SQL queries or spec itself.
Unused CTE column
CTE is a named subquery introduced with WITH keyword.
In a complex query comprised of many CTE subqueries a user might forget to remove helper column just after debugging and the column remains in the query and contributes to spec running costs.
table_a2.sql
with cte as (select a, b from t) select b from cte
shell
napkin optimize -s specs/spec.yaml
Table: tbl_b2 (table_a2.sql)
unused column cte.a
Duplicated CTE
This optimization aims equivalent CTEs defined in the same query.
table_a2.sql
with cte1 as (select a from t where b*a > 0),
cte2 as (select a from t where a*b > 0)
select b from t2 where a1 in cte1 or a2 in cte2
Table: tbl_b2 (table_a2.sql)
duplicated CTE tables: cte1, cte2
By default optimize just prints discovered issues, though it is possible to ask Napkin to resolve the issues automatically. Automatic refactoring doesn’t preserve original SQL formatting. SQL based optimizer is supported only by generic dialect.
shell
napkin optimize --help
Usage: napkin optimize [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}] [--mode ARG]
[--selector ARG] [-S|--strict-mustache]
[--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
Spec/query semantic improvements (e.g. unused columns)
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
--mode ARG Mode for running optimizations. Could be:
show-issues, interactive (default: show-issues)
--selector ARG Optimization feature selector. By default all
optimizations are on: unused-columns,
duplicated-queries. To turn on extra feature use "+"
prefix to remove feature "-" prefix and to run just
specific features don't use any prefix.
-S,--strict-mustache Strict mustache validation mode
--arg ARG=VALUE Argument to be passed to spec
--arg-json JSON Spec arguments encoded as JSON object
--arg-file FILE Spec arguments stored in JSON file
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Haddock
shell
napkin haddock --help
Usage: napkin haddock
Opens web page with Napkin haddocks
Available options:
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Napkin is written in Haskell and currently exposes all internal APIs for external use (as it allows meta-programming, expressing SQL statements programmatically, etc.).
Links to the Napkin Haskell API documentation for each individual version can be found here.
You can always “ask” Napkin CLI for the exact link to the API reference for the currently installed version. It will open a browser with an API docs for the exact git commit SHA Napkin was built from.
shell
napkin haddock
output
Opening "https://napkin-public-storage-bucket.s3.us-east-1.amazonaws.com/haddock/git-hash/a8811868647ccc54e5c716e1a2781fe1c8b2ef9b/index.html" URL in the browser
Docs
shell
napkin docs --help
Usage: napkin docs
Opens web page with Napkin tutorial
Available options:
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
As well as for Haddock API reference, Napkin CLI can be asked for a link to this documentation site.
shell
napkin docs
Opening "https://docs.napkin.run/fundamentals" URL in the browser
REPL
napkin repl
can be useful when using advanced meta-programming features. When this command is used Napkin will spawn Haskell GHCI session. Spec file has to be provided, so the REPL session will be configured appropriately.
shell
napkin run --help
Usage: napkin repl [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[--arg ARG=VALUE | --arg-json JSON | --arg-file FILE]
Drops into Napkin repl
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
--arg ARG=VALUE Argument to be passed to spec
--arg-json JSON Spec arguments encoded as JSON object
--arg-file FILE Spec arguments stored in JSON file
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Yaml-Schema
shell
napkin yaml-schema --help
Usage: napkin yaml-schema OUTPUT
Stores YAML schema in a file
Available options:
OUTPUT Name of the file to put generated yaml schema
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Many modern text editors and IDEs have (either native or through plugins) support for YAML Schema to enhance the process of editing complex YAML files. Napkin provides a way to export YAML schema to an external file - so it is possible to configure your editor properly.
In addition, if your editor or plugin supports that feature, you can point it to the web link with a latest YAML schema. YAML schemas for previous releases can be found here.
-
OUTPUT
parameter is a path to a file to put generate yaml schema for the spec.yaml config.
For example:
shell
napkin yaml-schema $HOME/napkin.schema
Saving YAML schema to a file: /Users/user/napkin.schema
Hie-Bios
shell
napkin hie-bios --help
Usage: napkin hie-bios [-s|--spec-file SPEC_YAML]
[(-s|--spec-file SPEC_YAML) | (-y|--yaml-merge STRATEGY)
(-s|--spec-file SPEC_YAML)]
[--override /json/pointer={"json": "value"}]
[GHC_OPTIONS]
Used by Haskell Language Server
Available options:
-s,--spec-file SPEC_YAML Path to the spec yaml file
(default: "specs/spec.yaml")
-y,--yaml-merge STRATEGY YAML array merge strategy. Possible values:
MatchIndex, Prepend, Append, Replace.
--override /json/pointer={"json": "value"}
Set arbitrary JSON pointer (RFC 6901) in YAML spec
with the new value
GHC_OPTIONS List of extra GHC arguments to pass
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
napkin hie-bios
provides support for advanced meta-programming with Napkin. This command is not supposed to be used directly by the user.
Haskell Language Server sets HIE_BIOS_OUTPUT
environment variable, which is used by Napkin as a file name to write all inferred and user-supplied options to. For debug purposes, Napkin defaults file location to hie-bios.txt
in the current working directory.
shell
HIE_BIOS_OUTPUT=/dev/tty napkin hie-bios -s specs/spec.yaml
...
-package
containers
-package
napkin
-package
ordered-containers
-package
unordered-containers
-package
text
-package
containers
-package
aeson
...
-XBangPatterns
-XBinaryLiterals
-XConstrainedClassMethods
-XConstraintKinds
-XDeriveDataTypeable
-XDeriveFoldable
...
Usually, it is enough to add the hie.yaml
to the root of the Napkin project for the HLS plugin from your editor to load.
hie.yaml
cradle:
bios:
shell: napkin hie-bios --spec-file specs/spec.yaml
-
GHC_OPTIONS
- List of additional GHC options, which Napkin will pass to the GHC through HLS.
Version
shell
napkin version --help
Usage: napkin version
Prints Git SHA and version of the build
Available options:
-h,--help Show this help text
Global options:
-v,--verbose Print debug log
-l,--log-level LOG_LEVEL Log severity level. Possible values: Debug, Info,
Notice, Warning, Error, Critical, Alert, Emergency.
(default: Info)
--log-format LOG_FORMAT Log line format. Possible values: Simple, Server,
Json. (default: Simple)
Napkin can print current version, release data and exact git commit on it was built.
For example:
shell
napkin version
Napkin version: 0.5.11
Git commit hash: a8811868647ccc54e5c716e1a2781fe1c8b2ef9b
Built at: 2022-01-04 10:24:21.398355 UTC
If you want to install a newer version of Napkin – please, follow installation instructions.