Neat Data for Presentation

Shiva

require(neatR)
#> Loading required package: neatR

We use R extensively, not just for intensive computation but also for presentation. JavaScript visualization libraries available in R and elegant ways to present data using R Markdown make R a one-stop shop for analytics and data science.

We spend most of our time preparing, cleaning, analyzing and modeling data. However, the last leg of analytics, the presentation of results, often does not get enough attention.

The neatR package helps format results for presentation by providing simple utility functions that cover common use cases.

Formatting dates

We often encounter dates in either mm/dd/yyyy or dd/mm/yyyy format and are left wondering which part is the month and which is the day, especially when no value in the data has a day later than the 12th. An unambiguous approach is to show the date in mmm dd, yyyy format with the day of week, which is easier to grasp.

ndate(Sys.Date() - 3)
#> [1] "Mar 19, 2023 (Sun)"
ndate(Sys.Date() - 1)
#> [1] "Mar 21, 2023 (Tue)"
ndate(Sys.Date())
#> [1] "Mar 22, 2023 (Wed)"
ndate(Sys.Date() + 1)
#> [1] "Mar 23, 2023 (Thu)"
ndate(Sys.Date() + 4)
#> [1] "Mar 26, 2023 (Sun)"
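The ambiguity described above is easy to reproduce in base R. The sketch below is not part of neatR; it only shows that the same string parses to two different dates depending on the assumed format, and that a month-name format avoids the problem. The abbreviations produced by %b and %a depend on the session locale.

```r
# The same text read under two different conventions (base R only).
raw <- "03/04/2023"
as_mdy <- as.Date(raw, format = "%m/%d/%Y")  # read as month/day/year
as_dmy <- as.Date(raw, format = "%d/%m/%Y")  # read as day/month/year

format(as_mdy, "%Y-%m-%d")  # "2023-03-04"
format(as_dmy, "%Y-%m-%d")  # "2023-04-03"

# A month-name format leaves no room for doubt.
# %b (month) and %a (weekday) are locale-dependent abbreviations.
format(as_mdy, "%b %d, %Y (%a)")
```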

To get just the date without the day of week, set display.weekday to FALSE.

ndate(Sys.Date(), display.weekday = FALSE)
#> [1] "Mar 22, 2023"

When we are looking at monthly data, abbreviating the date to mmm’yy is an elegant way to show it and is often helpful for chart labels.

ndate(Sys.Date(), display.weekday = FALSE, is.month = TRUE)
#> [1] "Mar'23"
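For comparison, a similar month label can be built in base R with format(). This is only a sketch of the idea, not how neatR produces its output, and the month abbreviation from %b depends on the session locale.

```r
# Month-start dates for the first quarter of 2023 (base R only).
months <- seq(as.Date("2023-01-01"), by = "month", length.out = 3)

# %b is the abbreviated month name (locale-dependent), %y the two-digit year.
labels <- format(months, "%b'%y")
labels
```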

To see a date in context relative to the current date (for dates within roughly a week before or after today), use the nday function.

nday returns the day of week and, when reference.alias is TRUE, adds an alias based on the current date. It can be used directly on dates or timestamps.

nday(Sys.Date(), reference.alias = FALSE)
#> [1] "Wed"
nday(Sys.Date(), reference.alias = TRUE)
#> [1] "Today, Wed"
nday(Sys.time(), reference.alias = TRUE)
#> [1] "Tomorrow, Wed"

The example below applies the same context to a range of dates around the current date.

x <- seq(Sys.Date() - 10, Sys.Date() + 10, by = '1 day')
nday(x, reference.alias = TRUE)
#>  [1] "Sun"            "Mon"            "Last Tue"       "Last Wed"      
#>  [5] "Last Thu"       "Last Fri"       "Last Sat"       "Last Sun"      
#>  [9] "Last Mon"       "Yesterday, Tue" "Today, Wed"     "Tomorrow, Thu" 
#> [13] "Coming Fri"     "Coming Sat"     "Coming Sun"     "Coming Mon"    
#> [17] "Coming Tue"     "Coming Wed"     "Coming Thu"     "Fri"           
#> [21] "Sat"
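The aliasing shown above can be sketched in base R from the signed day difference between each date and today. The helper relative_alias() is hypothetical and not part of neatR; its cut-offs (2 to 8 days for "Last" and "Coming") are read off the output above and are an assumption about the rule, not the package's documented behavior.

```r
# Hypothetical helper (not part of neatR): label dates relative to `today`.
relative_alias <- function(x, today = Sys.Date()) {
  x <- as.Date(x)
  d <- as.integer(x - as.Date(today))        # signed difference in days
  alias <- rep("", length(d))
  alias[d == 0]            <- "Today, "
  alias[d == -1]           <- "Yesterday, "
  alias[d == 1]            <- "Tomorrow, "
  alias[d >= -8 & d <= -2] <- "Last "        # assumed cut-off
  alias[d >= 2 & d <= 8]   <- "Coming "      # assumed cut-off
  paste0(alias, format(x, "%a"))             # %a is locale-dependent
}

# A fixed reference date keeps the example reproducible.
today <- as.Date("2023-03-22")
relative_alias(today + c(-10, -8, -1, 0, 1, 8, 9), today)
```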

Formatting timestamps

Timestamps are a feature-rich representation of date and time.

ntimestamp(Sys.time())
#> [1] "Mar 22, 2023 09H 12M 09S PM PDT (Wed)"

To format only the date from a timestamp, use the ndate function.

ndate(Sys.time())
#> [1] "Mar 22, 2023 (Wed)"

To extract and format only the time from a timestamp, turn off the date, weekday and timezone components:

ntimestamp(Sys.time(), display.weekday = FALSE,
  include.date = FALSE, include.timezone = FALSE)
#> [1] "09H 12M 09S PM"

Note: Hours are shown in 12-hour clock format with an AM/PM suffix.

Components of time can be toggled on or off based on preference.

ntimestamp(Sys.time(), include.date = FALSE, display.weekday = FALSE,
           include.hours = TRUE,  include.minutes = TRUE,
           include.seconds = FALSE, include.timezone = FALSE)
#> [1] "09H 12M PM"
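Toggling time components amounts to assembling a format string from the pieces that are switched on. The base R sketch below illustrates that idea with a hypothetical helper, time_label(), which is not part of neatR; the AM/PM text produced by %p depends on the session locale.

```r
# Hypothetical helper (not part of neatR): build a 12-hour time label.
time_label <- function(x, hours = TRUE, minutes = TRUE, seconds = TRUE) {
  # Keep only the components that are switched on; NULLs are dropped by c().
  parts <- c(if (hours) "%IH", if (minutes) "%MM", if (seconds) "%SS")
  fmt <- paste(c(parts, "%p"), collapse = " ")   # e.g. "%IH %MM %SS %p"
  format(x, fmt)
}

# A fixed timestamp in UTC keeps the example reproducible.
ts <- as.POSIXct("2023-03-22 21:12:09", tz = "UTC")
time_label(ts)                   # starts with "09H 12M 09S"
time_label(ts, seconds = FALSE)  # starts with "09H 12M"
```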

The timezone can be toggled on or off using the include.timezone parameter.

ntimestamp(Sys.time(), include.timezone = FALSE)
#> [1] "Mar 22, 2023 09H 12M 09S PM (Wed)"

Formatting numbers

We often deal with large numbers that appear in scientific notation, whether in the output of a statistical model or in the raw data itself. nnumber formats numeric data so that it is easy to read.

By default, each number is formatted in the unit that best represents its individual value, as in the example below.

x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
nnumber(x)
#> [1] "10.0"     "100.0"    "1.0 K"    "10.0 K"   "100.0 K"  "1.0 Mn"   "10.0 Mn" 
#> [8] "100.0 Mn" "1.0 Bn"
nnumber(x, digits = 0)
#> [1] "10"     "100"    "1 K"    "10 K"   "100 K"  "1 Mn"   "10 Mn"  "100 Mn"
#> [9] "1 Bn"
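The per-value unit selection can be sketched in base R by looking up each number's magnitude in a table of thresholds. The helper compact_number() is hypothetical and is not how neatR is implemented; it only reuses the unit labels listed in this document.

```r
# Hypothetical helper (not part of neatR): pick a unit for each value.
compact_number <- function(x, digits = 1) {
  thresholds <- c(1, 1e3, 1e6, 1e9, 1e12)
  labels     <- c("", " K", " Mn", " Bn", " Tn")
  # findInterval() returns the index of the largest threshold <= |x|;
  # values below 1 get index 0, so clamp them to the first (unit-less) slot.
  idx <- pmax(findInterval(abs(x), thresholds), 1L)
  paste0(formatC(x / thresholds[idx], format = "f", digits = digits),
         labels[idx])
}

compact_number(c(10, 1000, 1e6, 1e9))
# "10.0"   "1.0 K"  "1.0 Mn" "1.0 Bn"
```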

nnumber can automatically determine the best single unit for displaying all the numbers when unit = 'auto' is set. In the example below, thousands seem to fit most of the numbers best. Any number lower than 0.1K is displayed as ‘<0.1K’ for easier reference.

x <- c(1e6, 99e3, 76e3, 42e3, 12e3, 789, 53)
nnumber(x, unit = 'auto')
#> [1] "1,000 K" "99 K"    "76 K"    "42 K"    "12 K"    "0.8 K"   "0.1 K"
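One way to pick a single unit "based on the majority of the values" is to find each value's own unit and keep the most frequent one. The base R sketch below does exactly that with a hypothetical helper, common_unit(); whether neatR uses this rule is an assumption.

```r
# Hypothetical helper (not part of neatR): most frequent unit across values.
common_unit <- function(x) {
  thresholds <- c(1, 1e3, 1e6, 1e9, 1e12)
  labels     <- c("", "K", "Mn", "Bn", "Tn")
  idx <- pmax(findInterval(abs(x), thresholds), 1L)  # unit index per value
  counts <- tabulate(idx, nbins = length(labels))    # how often each unit occurs
  labels[which.max(counts)]                          # ties go to the smaller unit
}

x <- c(1e6, 99e3, 76e3, 42e3, 12e3, 789, 53)
common_unit(x)  # "K": four of the seven values fall in the thousands
```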

We can also specify the unit in which the number should be formatted:

nnumber(123456789.123456, unit = 'Mn')
#> [1] "123.5 Mn"

The default units are ‘K’ for thousand, ‘Mn’ for million, ‘Bn’ for billion and ‘Tn’ for trillion. The unit labels can be customized using unit.labels, a named list that maps each unit (thousand, million, billion, trillion) to its label.

nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'))
#> [1] "123.5 M"

The example below customizes the labels of all units.

x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
nnumber(x, unit.labels =
          list(thousand = 'K', million = 'M', billion = 'B', trillion = 'T'))
#> [1] "10.0"    "100.0"   "1.0 K"   "10.0 K"  "100.0 K" "1.0 M"   "10.0 M" 
#> [8] "100.0 M" "1.0 B"

Along with the formatted number, we can (optionally) add a prefix or suffix.

nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'),
        prefix = '$ ')
#> [1] "$ 123.5 M"

nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'),
        suffix = ' CAD')
#> [1] "123.5 M CAD"

nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'),
        prefix = '$ ', suffix = ' CAD')
#> [1] "$ 123.5 M CAD"

Sometimes we want to show the number as it is, which can be done by setting unit = ''. The thousand.separator parameter separates the thousands, which makes the number easier to read.

nnumber(123456789.123456, digits = 2, unit = '',
        thousand.separator = ',')
#> [1] "123,456,789.1"

thousand.separator can take any of the following values: ",", ".", "'", " ", "_", "".
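For readers who want to see the same ideas without the package, thousands separators, prefixes and suffixes can be sketched in base R with formatC() and paste0(). This is an illustration only and makes no claim about how nnumber works internally.

```r
x <- 123456789.123456

# Fixed notation with two decimals and a comma as the thousands separator.
plain <- formatC(x, format = "f", digits = 2, big.mark = ",")
plain                        # "123,456,789.12"

# Prefix and suffix are plain string concatenation.
paste0("$ ", plain, " CAD")  # "$ 123,456,789.12 CAD"

# Any single character can act as the separator, e.g. an apostrophe.
formatC(x, format = "f", digits = 0, big.mark = "'")  # "123'456'789"
```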

The unit parameter can take any of the following values:

custom: The unit is chosen separately for each individual value. This is the default value of the unit parameter.

auto: A single unit that best represents the overall data is automatically detected and applied based on majority of the values.

K: The numbers are displayed in thousands.

Mn: The numbers are displayed in millions.

Bn: The numbers are displayed in billions.

Tn: The numbers are displayed in trillions.

If the unit labels are customized via a list, for example unit.labels = list(thousand = 'k'), then the custom string 'k' must be passed as the unit.

Formatting percentages

Percentage data can be stored in two ways: as a decimal fraction or already multiplied by 100. For example, 22.8% can be stored as 0.228 or as 22.8.

npercent(22.8, is.decimal = FALSE)
#> [1] "+22.8%"
npercent(0.228, is.decimal = TRUE)
#> [1] "+22.8%"

By default, is.decimal is set to TRUE and the number of decimal digits is set to 1.

It is also useful to mark a positive percentage with a leading plus sign. This is the default behavior of the npercent function and can be turned off by setting plus.sign to FALSE.

npercent(0.228, plus.sign = TRUE)
#> [1] "+22.8%"
npercent(0.228, plus.sign = FALSE)
#> [1] "22.8%"
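The two storage conventions and the optional plus sign can be sketched in base R with sprintf(). The helper pct_label() and its argument names are hypothetical and only mirror the behavior described above; it is not neatR's implementation.

```r
# Hypothetical helper (not part of neatR): format a percentage label.
pct_label <- function(x, is_decimal = TRUE, digits = 1, plus_sign = TRUE) {
  value <- if (is_decimal) x * 100 else x        # bring both forms to percent
  # Build a format such as "%+.1f%%"; "+" forces a sign on positive numbers.
  fmt <- paste0("%", if (plus_sign) "+" else "", ".", digits, "f%%")
  sprintf(fmt, value)
}

pct_label(0.228)                      # "+22.8%"
pct_label(22.8, is_decimal = FALSE)   # "+22.8%"
pct_label(0.228, plus_sign = FALSE)   # "22.8%"
pct_label(-0.05)                      # "-5.0%"
```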

When percentages are high (especially when calculating growth from time A to time B), the growth is easier to read as a multiple, ‘nX’.

tesla_2017 <- 20
tesla_2023 <- 200
g <- (tesla_2023 - tesla_2017)/tesla_2017
npercent(g, plus.sign = TRUE)
#> [1] "+900.0%"
npercent(g, plus.sign = TRUE, factor.out = TRUE)
#> [1] "+900.0% (9x growth)"
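The arithmetic behind this label is worth spelling out: the multiple shown is the growth itself, not the ratio of the new value to the old one. The base R sketch below recomputes both with illustrative variable names; how neatR rounds the multiple for non-integer growth is not shown in this document.

```r
old_value <- 20
new_value <- 200

growth <- (new_value - old_value) / old_value  # 9, i.e. +900%
ratio  <- new_value / old_value                # 10, new value is 10 times the old

# %g prints 9 without trailing zeros.
label <- sprintf("%+.1f%% (%gx growth)", growth * 100, growth)
label  # "+900.0% (9x growth)"
```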

Formatting strings

Character vectors or strings can be formatted by converting their case, removing special characters, and keeping only English letters and numbers.

The available case conversions are:

lower: converts string to lower case.

upper: converts string to upper case.

title: converts string to title case (first letter of each word is capitalized, except for stop words; based on tools::toTitleCase).

start: converts string to start case (first letter of each word is capitalized and rest of the letters are in lower case).

initcap: converts string to initcap case (first letter of first word is capitalized and rest of the letters are in lower case).
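To make the difference between these cases concrete, here is a base R sketch of each conversion. The helpers start_case() and init_cap() are hypothetical and are not taken from neatR; only tools::toTitleCase is named in the description above.

```r
x <- "all MOdels are wrong"

tolower(x)  # "all models are wrong"
toupper(x)  # "ALL MODELS ARE WRONG"

# Start case: upper-case the first letter of every word.
start_case <- function(s) {
  # \\U upper-cases the letter captured in group 2 (needs perl = TRUE).
  gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(s), perl = TRUE)
}

# Initcap: upper-case only the first letter of the whole string.
init_cap <- function(s) {
  s <- tolower(s)
  paste0(toupper(substr(s, 1, 1)), substring(s, 2))
}

start_case(x)  # "All Models Are Wrong"
init_cap(x)    # "All models are wrong"

# Title case leaves certain short words in lower case.
tools::toTitleCase(tolower(x))
```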

nstring('   All MOdels are wrong.   some ARE useful!!!  â', 
        case = 'title', remove.specials = TRUE)
#> [1] "all Models are Wrong some are Useful â"

To exclude all special characters and retain only numbers and English letters, set the en.only parameter to TRUE.

nstring('   All MOdels are wrong.   some ARE useful!!!  â  ', 
        case = 'title', remove.specials = TRUE, en.only = TRUE)
#> [1] "all Models are Wrong some are Useful"

By default, leading and trailing white spaces are removed and runs of extra white space are reduced to a single space.
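The cleaning steps described in this section (dropping punctuation, optionally keeping only English letters and digits, and normalizing white space) can be sketched in base R with regular expressions. The helper clean_string() is hypothetical and is not neatR's implementation; it does not change case.

```r
# Hypothetical helper (not part of neatR): strip specials and tidy spacing.
clean_string <- function(x, en_only = FALSE) {
  x <- gsub("[[:punct:]]", "", x)                 # drop punctuation
  if (en_only) {
    x <- gsub("[^A-Za-z0-9 ]", "", x)             # keep English letters, digits, spaces
  }
  x <- gsub("\\s+", " ", x)                       # collapse runs of white space
  trimws(x)                                       # remove leading/trailing space
}

s <- "   All MOdels are wrong.   some ARE useful!!!  \u00e2  "
clean_string(s)                  # "All MOdels are wrong some ARE useful â"
clean_string(s, en_only = TRUE)  # "All MOdels are wrong some ARE useful"
```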