--- title: "openxlsx2 basic manual" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{openxlsx2 basic manual} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} library(openxlsx2) options(rmarkdown.html_vignette.check_title = FALSE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Welcome to the basic manual to `openxlsx2`. In this manual you will learn how to use `openxlsx2` to import data from xlsx-files to R as well as how to export data from R to xlsx, and how to import and modify these openxml workbooks in R. This package is based on the work of many contributors to `openxlsx`. It was mostly rewritten using `pugixml` and `R6` making use of modern technology, providing a fresh and easy to use R package. Over the years many people have worked on the tricky task to handle xls and xlsx files. Notably `openxlsx`, but there are countless other R-packages as well as third party libraries or calculation software capable of handling such files. Please feel free to use and test your files with other software and or let us know about your experience. Open an issue on github or write us a mail. ### First steps First let's assume that you have a working installation of `openxlsx2` otherwise run the lines below to install the latest CRAN release: ```{r, eval = FALSE} install.packages("openxlsx2") ``` Now we load the library: ```{r, eval = FALSE} library(openxlsx2) ``` ### Handling workbooks The foundation of `openxlsx2` is a workbook object. You can think of this object as a workbook loaded in a spreadsheet software. We import the entire thing. Every sheet, every chart, image, column, formula style, conditional formatting, pivot table and whatever else a spreadsheet file is allowed to carry. Therefore if you have a file that you want to work with, you can load it with: ```{r, eval = FALSE} wb <- wb_load("your_file.xlsx") ``` We usually name workbook objects `wb` in our documentation, but this is no obligation, you can name your workbook object whatever you like to call them. If you do not have a workbook yet, it is possible to create one. In the next line we will use three wrapper functions `wb_workbook()`, `wb_add_worksheet()`, and `wb_add_data()`. The wrapper functions are piped together using the classic `magrittr` pipe operator `%>%`, but similarly you can use the `R`'s native pipe operator `|>`. [^nopipe] We assume that you have a dataset `your_data`, either a vector, a matrix or a data frame and want to write this in a worksheet: ```{r, eval = FALSE} wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_data(x = your_data) ``` [^nopipe]: Basically a pipe operator allows to write code from left to right. Without pipes the code would look like this: ```{r, eval = FALSE} wb <- wb_add_data(wb_add_worksheet(wb_workbook()), x = your_data) ``` Okay, now you have a workbook object, but what have we actually done? Let's work along the pipe syntax: (1) first we have created the workbook object `wb_worbkook()`, (2) next we have assigned it a worksheet `wb_add_worksheet()`, and (3) we have written data onto the worksheet. Let's try this with actual data. We use the `mtcars` dataset. In the code we switch the fictional `your_data` with `mtcars`: ```{r} wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_data(x = mtcars) ``` Let's see what the output looks like: ```{r} wb ``` The output looks a little cryptic, it simply tells the name of the worksheet: `wb_add_worksheet()` created a default worksheet name `"Sheet 1"`. In the code above you can see that we do not use `sheet` to tell `wb_add_data()` where it should write the data. This is because internally we use a waiver `current_sheet()` so that we do not have to write `sheet = "Sheet 1"` whenever we work on the same worksheet. Basically the current sheet is updated whenever a new worksheet is added to the workbook. ```{r} wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_worksheet() %>% wb_add_data(x = mtcars) ``` This will create two sheets `"Sheet 1"` and `"Sheet 2"` and the data will be written to the second sheet. ```{r} wb ``` So how can we access the data on the sheet? Either with `wb_to_df()` our internal handler to read from workbooks (this is the underlying function for `wb_read()` and `read_xlsx()` which are mere aliases for `wb_to_df()`). So lets have a look at the top of the output: ```{r} wb %>% wb_to_df() %>% head() ``` Ah! The output is on the second sheet. We need either `sheet = 2` or `sheet = "Sheet 2"`. We go with the second variant, because the sheet index position and their name might differ. ```{r} wb %>% wb_to_df(sheet = "Sheet 2") %>% head() ``` This looks like the head of the `mtcars` dataset. So we have successfully read from the workbook. Now you want to export the workbook to a file: ```{r, eval = FALSE} wb %>% wb_save(file = "my_first_worksheet.xlsx") ``` Alternatively you can directly open it in a spreadsheet software (if you have one installed): ```{r, eval = FALSE} wb %>% wb_open() ``` Once again, lets try this with the `USPersonalExpenditure` dataset: ```{r, eval = FALSE} wb <- wb_workbook() wb_add_worksheet(wb, sheet = "USexp") wb_add_data(wb, "USexp", USPersonalExpenditure) ``` ```{r, echo = FALSE} wb <- wb_workbook() wb_add_worksheet(wb, sheet = "USexp") wb_add_data(wb, "USexp", USPersonalExpenditure) %>% try() ``` Dang! What did we do? We've added a worksheet, but wait, did we? No, you have to assign wrapper functions to an object for them to have an effect. Wrapper functions do not alter the workbook objects they are executed on. You can check that the workbook has no worksheets: ```{r} wb %>% wb_get_sheet_names() ``` Once we assign a sheet, this changes, and the data was correctly written: ```{r} wb <- wb_workbook() wb <- wb_add_worksheet(wb, sheet = "USexp") wb <- wb_add_data(wb, "USexp", USPersonalExpenditure) wb_get_sheet_names(wb) wb_to_df(wb) ``` Now you're probably thinking, I don't want to assign the workbook object all the time and all the `wb_` functions are a little tedious to type. There is an alternative for you and it is called chaining. Since the workbook is a `R6` object internally, you can make use of chains. Basically every function that starts with `wb_` should have a underlying function of the same name without the prefix. So our data writing example from above can be written as: ```{r} wb <- wb_workbook()$add_worksheet("USexp")$add_data(x = USPersonalExpenditure) wb$to_df() ``` Whether you use wrapper functions or chain functions is up to you and personal preference. There is just one thing to remember, the documentation is exclusively written for the wrapper function. So if you want to know the arguments for the `wb$add_data()` part, you have to lookup the wrapper functions man page `?wb_add_data`. ### Importing as workbook In addition to importing directly from xlsx or xlsm files, `openxlsx2` provides the `wbWorkbook` class used for importing and modifying entire the openxml files in `R`. This `workbook` class is the heart of `openxlsx2` and probably the reason why you are reading this manual in the first place. Importing a file into a workbook looks like this: ```{r} # the file we are going to load file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2") # loading the file into the workbook wb <- wb_load(file = file) ``` The additional options `wb_load()` provides are for internal use: `sheet` loads only a selected sheet from the workbook and `data_only` reads only the data parts from a workbook and ignores any additional graphics or pivot tables. Both functions create workbook objects that can only be used to read data, and we do not recommend end users to use them. Especially not if they intend to re-export the workbook afterwards. Once a workbook is imported, we provide several functions to interact with and modify it (the `wb_to_df()` function mentioned above works the same way for an imported workbook). It is possible to add new sheets and remove sheets, as well as to add or remove data. R-plots can be inserted and also the style of the workbook can be changed, new fonts, background colors and number formats. There is a wealth of options explained in the man pages and the additional style vignette (more vignettes to follow). ## Exporting data ### Exporting data frames or vectors If you want to export a data frame from R, you can use `write_xlsx()` which will create an xlsx file. This file can be tweaked further. See `?write_xlsx` to see all the options. (further explanation and examples will follow). ```{r, eval = FALSE} write_xlsx(x = mtcars, file = "mtcars.xlsx") ``` ### Exporting a `wbWorkbook` Imported workbooks can be saved as xlsx or xlsm files with the wrapper `wb_save()` or with `wb$save()`. Both functions take the filename and an optional `overwrite` option. If the latter is set, an optional guard is provided to check if the file you want to write already exists. But be careful, this is optional. The default is to save the file and replace an existing file. Of course, on Windows, files that are locked (for example, if they were opened by another process) will not be replaced. ```{r, eval = FALSE} # replace the existing file wb$save("mtcars.xlsx") # do not overwrite the existing file try(wb$save("mtcars.xlsx", overwrite = FALSE)) ``` ## `dims`/ `wb_dims()` In `openxlsx2` functions that interact with worksheet cells are using `dims` as argument and require the users to provide these. `dims` are cells or cell ranges in A1 notation. The single argument `dims` hereby replaces `col`/`row`, `cols`/`rows` and `xy`. Since A1 notation is rather simple in the first few columns it might get confusing after the 26. Therefore we provide a wrapper to construct it: ```{r} # various options wb_dims(from_row = 4) wb_dims(rows = 4, cols = 4) wb_dims(rows = 4, cols = "D") wb_dims(rows = 4:10, cols = 5:9) wb_dims(rows = 4:10, cols = "A:D") # same as below wb_dims(rows = seq_len(7), cols = seq_len(4), from_row = 4) # 10 rows and 15 columns from indice B2. wb_dims(rows = 1:10, cols = 1:15, from_col = "B", from_row = 2) # data + col names wb_dims(x = mtcars) # only data wb_dims(x = mtcars, select = "data") # The dims of the values of a column in `x` wb_dims(x = mtcars, cols = "cyl") # a column in `x` with the column name wb_dims(x = mtcars, cols = "cyl", select = "x") # rows in `x` wb_dims(x = mtcars) # in a wb chain wb <- wb_workbook()$ add_worksheet()$ add_data(x = mtcars)$ add_fill( dims = wb_dims(x = mtcars, rows = 1:5), # only 1st 5 rows of x data color = wb_color("yellow") )$ add_fill( dims = wb_dims(x = mtcars, select = "col_names"), # only column names color = wb_color("cyan2") ) # or if the data's first coord needs to be located in B2. wb_dims_custom <- function(...) { wb_dims(x = mtcars, from_col = "B", from_row = 2, ...) } wb <- wb_workbook()$ add_worksheet()$ add_data(x = mtcars, dims = wb_dims_custom())$ add_fill( dims = wb_dims_custom(rows = 1:5), color = wb_color("yellow") )$ add_fill( dims = wb_dims_custom(select = "col_names"), color = wb_color("cyan2") ) ``` ## A note on speed and memory usage The current state of `openxlsx2` is that it is reasonably fast. That is, it works well with reasonably large input data when reading or writing. It may not work well with data that tests the limits of the `openxml` specification. Things may slow down on the R side of things, and performance and usability will depend on the speed and size of the local operating system's CPU and memory. Note that there are at least two cases where `openxlsx2` constructs potentially large data frames (i) when loading, `openxlsx2` usually needs to read the entire input file into pugixml and convert it into long data frame(s), and `wb_to_df()` converts one long data frame into two data frames that construct the output object and (ii) when adding data to the workbook, `openxlsx2` reshapes the input data frame into a long data frame and stores it in the workbook, and writes the entire worksheet into a pugixml file that is written when it is complete. Applying cell styles, date conversions etc. will further slow down the process and finally the sheets will be zipped to provide the xlsx output. Therefore, if you are faced with an unreasonably large dataset, either give yourself enough time, use another package to write the xlsx output (`openxlsx2` was not written with the intention of working with maximum memory efficiency), and by all means use other ways to store data (binary file formats or a database). However, we are always happy to improve, so if you have found a way to improve what we are currently doing, please let us know and open an issue or a pull request.