--- title: "The ggsql knitr engine" knitr: opts_chunk: collapse: true comment: "#>" description: | An introduction to using the ggsql knitr engine in Rmarkdown and Quarto vignette: > %\VignetteIndexEntry{The ggsql knitr engine} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} --- ```{r setup} library(ggsql) ``` The main selling point of the ggsql R package is arguably it's knitr engine that allows you to add `{ggsql}` blocks to your Rmarkdown and Quarto documents. While ggsql also provides a Jupyter kernel that Quarto can use, Jupyter only allows a single kernel at a time, so if you wish to mix R and/or Python blocks with ggsql visualizations, then the ggsql knitr engine is the way. This vignette will show you how to set it up and use it. ## Registering the engine Before knitr knows what to do with a `{ggsql}` code chunk you need to register the ggsql engine. Registration happens automatically when ggsql is loaded, so the only thing required is to load the package in the setup chunk: ```{{r setup}} library(ggsql) # register the engine # Other stuff you may want to do set.seed(7) ``` ## The `{ggsql}` chunk Once you've registered the engine you can now begin to add `{ggsql}` blocks to your document containing ggsql queries: ```{{ggsql}} VISUALIZE species AS x FROM ggsql:penguins DRAW bar ``` Which, when rendered will result in: ```{ggsql} VISUALIZE species AS x FROM ggsql:penguins DRAW bar ``` If the code only contains a pure SQL query, then the table is returned: ```{ggsql} SELECT COUNT(*) AS count, species FROM ggsql:penguins GROUP BY species ``` ### Chunk options The ggsql knitr engine responds to all the standard plot related chunk options you are familiar with. However, there are a few exceptions: * `dev` and `dev.args`: ggsql does not use the R graphics device system for rendering * `fig.showtext`: The showtext package works with the graphics device system * `external` and `sanitize`: ggsql does not use the tikz system for rendering In addition to the standards there are two new options you can use: * `connection` * `writer` Both will be described below. The ggsql engine supports inline chunk options, using both the standard SQL comment chararcters (`--`) and the R comment character that many users will find recognizable for chunk options. Either of these, proceeded by the pipe character (`--|` and `#|`) will mark a chunk option line as long as they are the first appearing in the chunk. #### Defining a backend By default, a ggsql chunk will provide an in memory DuckDB writer when used with the knitr engine. This is fine as long as you don't need to interact directly with data from your warehouse. If you need something else you can provide a different connection string in the `connection` chunk option. Chunks that use the same connection will use the same writer, so temporary tables and views will be accessible across chunks. To interact directly with a file based DuckDB database living at `/data/users.db` you'd set it like so: ```{{ggsql}} --| connection: duckdb:///data/users.db VISUALIZE ... ``` #### Choosing a writer ggsql doesn't have the concept of graphics devices as you may be familiar with from R. Instead it has *writers* which are responsible for rendering. Currently we only provide a Vega-Lite writer, but it comes in three flavors in the knitr engine: `vegalite`, `vegalite_svg`, and `vegalite_png`. The former renders to a Vega-Lite spec which is then rendered to the final graphic when the page HTML is rendered. The latter two uses Vega-Lite off-screen to render to SVG and PNG respectively, making the final document fully stand-alone. The default for HTML documents is to use the `vegalite` writer and for PDF to use the `vegalite_png` writer, but you can overwrite this using the `writer` option: ```{{ggsql}} --| writer: vegalite_svg VISUALIZE ... ``` The `writer` option also affects rendering of `Spec` objects (as returned by `ggsql_execute()`) in standard `{r}` code chunks. ## Communicating with R and Python The big selling point of the knitr engine over the ggsql Jupyter kernel is its ability to co-exist with R and Python chunks and use data defined in them. This means you can bring data into R, do some manipulation or statistical analysis that would be difficult to express in SQL, and then use ggsql to directly visualize it. The way this works borrows the syntax from referencing built-in data in ggsql. Above, we used `ggsql:penguins` to refer to the penguins dataset shipped with ggsql. Likewise, you can prefix a table with `r:` and `py:` to refer to data from the documents R and Python chunks respectively: ```{r} gapminder_2002 <- gapminder::gapminder[gapminder::gapminder$year == 2002, ] ``` ```{ggsql} VISUALIZE gdpPercap AS x, lifeExp AS y FROM r:gapminder_2002 DRAW point MAPPING continent AS stroke, pop AS size SETTING fill => null LABEL title => 'Rendering data from R' ``` ### Getting data back from ggsql While there is certainly more use from the above direction of communication it is also possible to use tables or views created in ggsql chunks from R and Python. The way this works mirrors how R and Python communicate with each other. The ggsql knitr engine provides a `sql` object in the R and Python environments respectively which can be indexed to return any table or view available in the ggsql backend. If your document uses multiple ggsql backends the `connection` option of the R/Python chunk determines which backend is used to reference into: ```{ggsql} CREATE VIEW may_airquality AS SELECT * FROM ggsql:airquality WHERE Month = 5 ``` ```{ggsql} CREATE TABLE penguin_count AS SELECT COUNT(*) AS number, species FROM ggsql:penguins GROUP BY species ``` ```{r} names(sql) sql$penguin_count ``` ```{r} #| include: false # ggsql caches builtin datasets (penguins, airquality) as parquet files in # the OS temp dir (Rust's `env::temp_dir()`, i.e. `$TMPDIR` or `/tmp`) and # never cleans them up. Remove them so R CMD check doesn't flag them as # detritus in the temp directory. tmp_root <- Sys.getenv("TMPDIR", unset = "/tmp") unlink(file.path(tmp_root, c("airquality.parquet", "penguins.parquet"))) ```