--- title: "Daily Weather Data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Daily Weather Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(GHCNr) library(terra) # for handling countries geometries ``` # Select GHCNd stations The station inventory file of GHCNd is stored at . The function `stations()` can read from this source or from a local file, specified with `filename`. The inventory can also be downloaded to a file using `download_inventory()`. ```{r download-inventory, eval=FALSE} inventory_file <- download_inventory("~/Downloads/ghcn-inventory.txt") s <- stations(inventory_file, variables = "TMAX") ``` ```{r read-inventory, eval=FALSE} s <- stations(variables = "TMAX") ``` By specifying `variables = "TMAX"` only the stations that recorded that variable are kept. Available variables implemented at the moment are precipitation ("PRCP"), minimum temperature ("TMIN"), and maximum temperature ("TMAX"). Stations spanning a time range can be filtered easily. ```{r filter-by-year, eval=FALSE} s <- s[s$startYear <= 1990, ] s <- s[s$endYear >= 2000, ] s # A tibble: 16,763 × 6 station latitude longitude variable startYear endYear 1 AE000041196 25.3 55.5 TMAX 1944 2024 2 AEM00041194 25.3 55.4 TMAX 1983 2024 3 AEM00041217 24.4 54.7 TMAX 1983 2024 4 AFM00040938 34.2 62.2 TMAX 1973 2020 5 AFM00040948 34.6 69.2 TMAX 1966 2021 6 AFM00040990 31.5 65.8 TMAX 1973 2020 7 AG000060390 36.7 3.25 TMAX 1940 2024 8 AG000060590 30.6 2.87 TMAX 1940 2024 9 AG000060611 28.0 9.63 TMAX 1958 2024 10 AG000060680 22.8 5.43 TMAX 1940 2004 # ℹ 16,753 more rows # ℹ Use `print(n = ...)` to see more rows ``` Spatial filters can also be easily applied. Spatial boundaries of countries can be downloaded from using the `get_countr(couuntry_code = ...)` function, where `country_code` is the ISO3 code. ```{r get-country, eval=FALSE} italy <- get_country("ITA") ``` `get_countries()` can take several ISO3 codes to return a geometry of multiple countries. ```{r spatial-filter, eval=FALSE} s <- filter_stations(s, italy) s # A tibble: 41 × 6 station latitude longitude variable startYear endYear 1 IT000016090 45.4 10.9 TMAX 1951 2024 2 IT000016134 44.2 10.7 TMAX 1951 2024 3 IT000016232 42 15 TMAX 1975 2024 4 IT000016239 41.8 12.6 TMAX 1951 2024 5 IT000016320 40.6 17.9 TMAX 1951 2024 6 IT000016560 39.2 9.05 TMAX 1951 2024 7 IT000160220 46.2 11.0 TMAX 1951 2024 8 IT000162240 42.1 12.2 TMAX 1954 2024 9 IT000162580 41.7 16.0 TMAX 1951 2024 10 ITE00100554 45.5 9.19 TMAX 1763 2008 # ℹ 31 more rows # ℹ Use `print(n = ...)` to see more rows ``` # Download daily timeseries Daily timeseries for a station can be downloaded using the `daily()` function. In addition to the station ID, `daily()` needs start and end dates of the timeseries. These should be provided as strings with the format "YYYY-mm-dd", e.g., "1990-01-01". ```{r daily, eval=FALSE} daily_ts <- daily( station_id = "CA003076680", start_date = paste("2002", "11", "01", sep = "-"), end_date = paste("2024", "04", "22", sep = "-"), variables = "tmax" ) daily_ts ``` ```{r daily-saved, echo=FALSE} daily_ts <- CA003076680[, c("date", "station", "tmax", "tmax_flag")] daily_ts ``` Multiple stations can also be downloaded at once. Too many stations will cause the API to fail. ```{r multidaily, eval=FALSE} daily_ts <- daily( station_id = c("CA003076680", "USC00010655"), start_date = paste("2002", "11", "01", sep = "-"), end_date = paste("2024", "04", "22", sep = "-"), variables = "tmax" ) plot(daily_ts, "tmax") ``` ```{r multidaily-saved, echo=FALSE} daily_ts <- rbind(CA003076680, USC00010655)[, c("date", "station", "tmax", "tmax_flag")] plot(daily_ts, "tmax") ``` Implmented variables are "tmin", "tmax", and "prcp". `daily()` returns a table with the value of the variable chosen and associated flags. ## Remove flagged records Flagged records can be removed using `remove_flagged()`. In `remove_flagged()` the argument `strict` (dafault = `TRUE`) specifies which flags to include. The flags removed are: ```{r flags, echo=FALSE} as.list(GHCNr:::.flags(strict = TRUE)) ``` Setting `strict = FALSE` will only remove the flags: ```{r flags-strict, echo=FALSE} as.list(GHCNr:::.flags(strict = FALSE)) ``` This will also remove the "*_flag=" column. ```{r remove-flagged} daily_ts <- remove_flagged(daily_ts) plot(daily_ts, "tmax") ``` # Temporal coverage Coverage of the timeseries can be calculated using `coverage()`. ```{r daily-coverage} station_coverage <- coverage(daily_ts) station_coverage ``` `period_coverage` calculates the coverage across the whole period, including missing years. The output is a table with coverage by month and year (`monthly_coverage`), by year (`annual_coverage`), and for the whole time period (`period_coverage`). `annual_coverage` is constant within the same year and `year` is always a constant. This table is useful to inspect stations that may have problematic timeseries, such as ```{r low-coverage} unique(station_coverage[ station_coverage$annual_coverage_tmax < .95, c("station", "year", "annual_coverage_tmax") ]) ``` # Monthly and annual timeseries, climatological normals The functions `monthly()`, `annual()` nad `normal()` summarized the weather time series to monthly and annual time series and to climatological normal (long-term averages), respectively. Summaries are calculated as follows: - $T_{min}$ is the minimum daily temperature recorded in the month or the year. - $T_{max}$ is the maximum daily temperature recorded in the month or the year. - $Prcp$ is the cumulative precipitation during the month or year. `NA`s are removed during calculation. ```{r monthly} monthly_ts <- monthly(daily_ts) monthly_ts plot(monthly_ts, "tmax") ``` ```{r annual} annual_ts <- annual(daily_ts) annual_ts plot(annual_ts, "tmax") ```