This package provides teaching materials for various statistics courses.
The main function of this package is called gh, which can be used to perform several tasks.
Using this function, students can access teaching materials such as interactive apps, R code, data files, and other resources that help in learning statistical concepts. By providing easy access to these materials, the package aims to make the learning process more interactive and engaging.
GitHub allows downloading a repository as a ZIP file; see the Code button in the repository (Download ZIP). mmstat4 works with this ZIP file, but you can also use any of your own ZIP files.
In my courses I assume that all R programs run in a freshly started R, i.e. there are no path dependencies, all necessary libraries are loaded in the R program and so on. My repositories contain not only the example programs for the students, but also the programs I use to generate images and tables, and also the Shiny Apps I show.
ghget
A ZIP file or repository can be stored locally or on the internet. A user-defined key has to be given for the location of the ZIP file:
ghget(dummy="https://github.com/sigbertklinke/mmstat4.dummy/archive/refs/heads/main.zip")
Three repositories are predefined: hu.data, hu.stat and dummy. You can retrieve them via
ghget('dummy')
ghget('hu.stat')
ghget('hu.data')
ghget() # uses hu.data
ghget downloads the ZIP file, saves it to a temporary location and unpacks it. For non-temporary locations, see the FAQ.
After unpacking the ZIP file, unique short names for its files are generated from the path components.
ghget('dummy')
gd <- ghdecompose(ghlist(full.names=TRUE))
head(gd)
#> commonpath uniquepath minpath filename
#> 1 /tmp/RtmpiCpOOd/mmstat4.dummy-main LICENSE
#> 2 /tmp/RtmpiCpOOd/mmstat4.dummy-main README.md
#> 3 /tmp/RtmpiCpOOd/mmstat4.dummy-main data 12411-0006.csv
#> 4 /tmp/RtmpiCpOOd/mmstat4.dummy-main data ArbeitsloseBerlin.csv
#> 5 /tmp/RtmpiCpOOd/mmstat4.dummy-main data BANK2.sav
#> 6 /tmp/RtmpiCpOOd/mmstat4.dummy-main data Preisindex.csv
#> source
#> 1 /tmp/RtmpiCpOOd/mmstat4.dummy-main/LICENSE
#> 2 /tmp/RtmpiCpOOd/mmstat4.dummy-main/README.md
#> 3 /tmp/RtmpiCpOOd/mmstat4.dummy-main/data/12411-0006.csv
#> 4 /tmp/RtmpiCpOOd/mmstat4.dummy-main/data/ArbeitsloseBerlin.csv
#> 5 /tmp/RtmpiCpOOd/mmstat4.dummy-main/data/BANK2.sav
#> 6 /tmp/RtmpiCpOOd/mmstat4.dummy-main/data/Preisindex.csv
The file name is split into four parts. The last two parts, minpath and filename, are used to create short names:
- The short name of /tmp/RtmpXXXXXX/mmstat4.dummy-main/LICENSE is LICENSE. There is no other file named LICENSE in the ZIP file. Therefore, the base name is sufficient to address this file in the ZIP file.
- The short name of /tmp/RtmpXXXXXX/mmstat4.dummy-main/data/BANK2.sav is data/BANK2.sav. There is another file called BANK2.sav in the ZIP file (dbscan/BANK2.sav), but data/BANK2.sav is sufficient to address this file uniquely. Currently, no check is made whether two files with identical basenames are also identical in content.

ghlist("BANK2", full.names=TRUE) # full names
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/BANK2.sav"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/examples/data/cluster/dbscan/BANK2.sav"
ghlist("BANK2") # short names
#> [1] "data/BANK2.sav" "dbscan/BANK2.sav"
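The idea behind the short names can be sketched in base R: start with the base name of every file and prepend parent directories only for names that are still ambiguous. The function short_names below is a hypothetical illustration under that assumption; it is not the code mmstat4 uses internally.

```r
# Hypothetical helper (not part of mmstat4): shortest unique path suffix per file.
short_names <- function(paths) {
  parts <- strsplit(paths, "/", fixed = TRUE)
  depth <- rep(1L, length(paths))          # number of trailing components kept
  suffix <- function(p, k) paste(utils::tail(p, k), collapse = "/")
  repeat {
    cand <- unname(mapply(suffix, parts, depth))
    ambiguous <- cand %in% cand[duplicated(cand)]
    grow <- ambiguous & depth < lengths(parts)
    if (!any(grow)) return(cand)
    depth[grow] <- depth[grow] + 1L        # add one more parent directory
  }
}

paths <- c("mmstat4.dummy-main/LICENSE",
           "mmstat4.dummy-main/data/BANK2.sav",
           "mmstat4.dummy-main/examples/data/cluster/dbscan/BANK2.sav")
sn <- short_names(paths)
print(sn)
```

In this sketch, LICENSE keeps its base name, while the two BANK2.sav files each receive one extra directory component.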
ghopen, ghload, ghsource
The short names (or the full names) can be used to work with the files:
## x <- ghload("data/BANK2.sav") # load data via rio::import
## ghopen("univariate/example_ecdf.R") # open file in RStudio editor
## ghsource("univariate/example_ecdf.R") # execute file via source
ghlist("example_ecdf") # "univariate/" was unnecessary
#> [1] "example_ecdf.R"
ghlist, ghquery
With ghlist you can get a list of unique (short) names for all files in the repository, or for a subset based on a regular expression pattern:
str(ghlist()) # get all short names
#> chr [1:473] "LICENSE" "README.md" "12411-0006.csv" "ArbeitsloseBerlin.csv" ...
ghlist("\\.pdf$") # get all short names of PDF files
#> [1] "Aufgaben.pdf" "Formelsammlung.pdf" "Loesungen.pdf"
With ghquery you can query the list of unique (short) names for all files based on the overlap distance.
ghlist("bnk") # pattern = "bnk"
#> character(0)
ghquery("bnk") # nearest string matching to "bnk"
#> [1] "data/BANK2.sav" "dbscan/BANK2.sav" "dbscan.R" "kernel.R"
#> [5] "dbscan2.R" "linkage.R"
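Why an approximate query finds BANK2.sav where the regular expression fails can be illustrated with a simple overlap distance. The sketch below assumes the distance 1 - |A ∩ B| / min(|A|, |B|), where A and B are the sets of lower-case characters of the two strings. Whether ghquery uses exactly this definition is an assumption, and overlap_dist is a hypothetical helper, not a function of mmstat4.

```r
# Hypothetical helper: overlap distance between the character sets of two strings.
overlap_dist <- function(a, b) {
  A <- unique(strsplit(tolower(a), "")[[1]])
  B <- unique(strsplit(tolower(b), "")[[1]])
  1 - length(intersect(A, B)) / min(length(A), length(B))
}

candidates <- c("BANK2.sav", "dbscan.R", "LICENSE")
d <- vapply(candidates, function(s) overlap_dist("bnk", s), numeric(1))
print(sort(d))   # smaller distance = better match
```

All three letters of "bnk" occur in BANK2.sav, so its distance is zero, although "bnk" is not a substring of the file name.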
ghfile, ghpath, ghdecompose
ghfile tries to find a unique match for a given file and returns the full path. If there is no unique match, an error is returned with some possible matches.
ghdecompose builds a data frame and decomposes the full names of the files into
- commonpath: the path part which is the same for all files,
- uniquepath: the path part that is unique for all files,
- minpath: the minimum path part, so that all files are uniquely addressable,
- filename: the base name of the file, and
- source: the input for shortpath.

The short names for the files are built from the components minpath and filename.
ghpath builds up the short name with various path components from a ghdecompose object.
ghfile('data/BANK2.sav')
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/BANK2.sav"
ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4"))
fnf <- ghlist(full.names=TRUE)
dfn <- ghdecompose(fnf)
head(dfn)
#> commonpath uniquepath minpath filename
#> 1 /tmp/RtmpiCpOOd/mmstat4.dummy data hhberlin.csv
#> 2 /tmp/RtmpiCpOOd/mmstat4.dummy data Preisindex.csv
#> 3 /tmp/RtmpiCpOOd/mmstat4.dummy data BANK2.sav
#> 4 /tmp/RtmpiCpOOd/mmstat4.dummy data 12411-0006.csv
#> 5 /tmp/RtmpiCpOOd/mmstat4.dummy data child_data.sav
#> 6 /tmp/RtmpiCpOOd/mmstat4.dummy data hhD.rda
#> source
#> 1 /tmp/RtmpiCpOOd/mmstat4.dummy/data/hhberlin.csv
#> 2 /tmp/RtmpiCpOOd/mmstat4.dummy/data/Preisindex.csv
#> 3 /tmp/RtmpiCpOOd/mmstat4.dummy/data/BANK2.sav
#> 4 /tmp/RtmpiCpOOd/mmstat4.dummy/data/12411-0006.csv
#> 5 /tmp/RtmpiCpOOd/mmstat4.dummy/data/child_data.sav
#> 6 /tmp/RtmpiCpOOd/mmstat4.dummy/data/hhD.rda
head(ghpath(dfn))
#> 1 2 3 4
#> "hhberlin.csv" "Preisindex.csv" "data/BANK2.sav" "12411-0006.csv"
#> 5 6
#> "child_data.sav" "hhD.rda"
The package comes with two RStudio addins (see under Addins -> MMSTAT4):
- Open a file from a zip file (ghopenAddin), which gives access to the unzipped zip file and opens the selected file in an RStudio editor window.
- Execute a Shiny app from a zip file (ghappAddin), which extracts all directories containing Shiny apps and opens the selected app in a web browser (using the default browser).
Currently there are the following routines to support R code snippets:
Rlibs, which extracts all library and require calls from the R code snippets and returns a frequency table of the packages called.
ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4"))
files <- ghlist(pattern="\\.R$", full.names = TRUE)
head(Rlibs(files), 30)
#>
#> Amelia CHAID DescTools GGally Hmisc
#> 1 1 6 4 1
#> MASS MissingDataGUI NbClust QuantPsyc RColorBrewer
#> 130 1 1 6 2
#> TeachingDemos UsingR VIM additivityTests agricolae
#> 1 1 2 1 5
#> alphahull andrews ape aplpack ash
#> 1 4 1 3 2
#> boot car cluster coin dbscan
#> 4 13 17 1 3
#> deldir devtools e1071 effsize entropy
#> 1 3 5 1 3
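How such a frequency table can be obtained is sketched below with a regular expression in base R. The helper count_libs is hypothetical and only approximates the idea: it works on lines of code rather than files, and it would also count calls that appear inside comments or strings. It is not the implementation of Rlibs.

```r
# Hypothetical sketch: count packages attached via library()/require().
count_libs <- function(lines) {
  pat <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  calls <- unlist(regmatches(lines, gregexpr(pat, lines)))
  pkgs <- sub(pat, "\\2", calls)   # keep only the package name
  table(pkgs)
}

code <- c("library(MASS)",
          "x <- 1",
          "require('boot'); library(\"MASS\")",
          "# no call here")
tab <- count_libs(code)
print(tab)
```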
Rsolo, which checks that each R code snippet runs smoothly in a freshly started R.
# just check the last files from the list
# Note that the R console will show more output (warnings etc.)
Rsolo(files, start=435)
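The requirement "runs in a freshly started R" can be checked by running each script in its own R process. The following sketch does this with Rscript --vanilla and reports which scripts finished with exit status 0. The function run_solo is a hypothetical stand-in that assumes Rscript is available in R.home("bin"); how Rsolo itself starts R is not shown here.

```r
# Hypothetical sketch: run each script in a separate, clean R process.
run_solo <- function(files) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- vapply(files, function(f) {
    # exit status 0 means the script ran without error
    as.integer(system2(rscript, c("--vanilla", shQuote(f)),
                       stdout = FALSE, stderr = FALSE))
  }, integer(1))
  status == 0L
}

good <- tempfile(fileext = ".R"); writeLines("x <- sqrt(2)", good)
bad  <- tempfile(fileext = ".R"); writeLines("stop('missing dependency')", bad)
ok <- unname(run_solo(c(good, bad)))
print(ok)
```

A script that relies on objects or packages left over from an interactive session fails in such a run, which is exactly the kind of path or session dependency the check is meant to reveal.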
Rdups, which checks whether duplicate files can be found, based on checksums.
files <- ghlist(full.names = TRUE)
head(Rdups(files))
#> $c300e8fe6f0bc562256e81670c23d8c0
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/data/BANK2.sav"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/cluster/dbscan/BANK2.sav"
#>
#> $`4efddb6dc6c7ed743221295d55133817`
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/nnet/mincer_nnet3.R"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/nnet/mincer_nnet5.R"
#>
#> $`9f9fe7603aa82f33bbc85a9d32e39d03`
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/cluster/dbscan/app.tmpl"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/mgraphics/scagnostics/app.tmpl"
#>
#> $`0b74b824367df429803599708daf2e2e`
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/subgroup/example_mosaic.R"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/mgraphics/example_mosaic.R"
#>
#> $`8eaa4f89e233ba69fcda053d238699aa`
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/subgroup/example_mosaic_cotabplot.R"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/mgraphics/example_mosaic_cotabplot.R"
#>
#> $`8ed6128aab796148df5e71cbeab547da`
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/subgroup/example_mosaic_graphics.R"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/mgraphics/example_mosaic_graphics.R"
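The output above is a list with one entry per checksum. A comparable result can be produced with tools::md5sum from base R, as in the following sketch. The helper find_dups and the temporary files are made up for illustration; that Rdups uses MD5 is an assumption based on the form of the checksums shown.

```r
# Hypothetical sketch: group files by MD5 checksum, keep groups with more than one file.
find_dups <- function(files) {
  sums <- tools::md5sum(files)               # named by file path
  groups <- split(names(sums), unname(sums))
  groups[lengths(groups) > 1]
}

dir <- tempfile("dups"); dir.create(dir)
f <- file.path(dir, c("a.R", "b.R", "c.R"))
writeLines("x <- 1", f[1])
writeLines("x <- 1", f[2])                   # same content as a.R
writeLines("y <- 2", f[3])
dups <- find_dups(f)
print(lapply(dups, basename))
```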
Note: there is also an error message if the necessary libraries are not installed!
Once you have created your ZIP file, you need to know under which names a specific file can be accessed. In the example we use a ZIP file which comes with the package mmstat4:
ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4"))
ghnames <- ghdecompose(ghlist(full.names=TRUE))
ghnames[58,]
#> commonpath uniquepath minpath filename
#> 58 /tmp/RtmpiCpOOd/mmstat4.dummy examples/data/cluster dbscan BANK2.sav
#> source
#> 58 /tmp/RtmpiCpOOd/mmstat4.dummy/examples/data/cluster/dbscan/BANK2.sav
The shortest possible name is determined by minpath and filename. But all other paths determined by uniquepath, minpath and filename should also work.
For file number 58, the following access names are possible:
- BANK2.sav will not work, since there is more than one file named BANK2.sav in the ZIP file.
- dbscan/BANK2.sav will work, since this is the shortest possible name.
- cluster/dbscan/BANK2.sav, data/cluster/dbscan/BANK2.sav, and examples/data/cluster/dbscan/BANK2.sav will work.

x1 <- ghload("BANK2.sav")
#> Possible matches:
#> data/BANK2.sav
#> dbscan/BANK2.sav
#> Error in ghfile(x): Several files for 'BANK2.sav' found, check matches!
x2 <- ghload("dbscan/BANK2.sav")
x3 <- ghload("cluster/dbscan/BANK2.sav")
x4 <- ghload("data/cluster/dbscan/BANK2.sav")
x5 <- ghload("examples/data/cluster/dbscan/BANK2.sav")
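The valid access names for a file can be enumerated from its uniquepath, minpath and filename components: the short name, plus the short name extended by one more trailing directory of uniquepath at a time. The helper access_names below is a hypothetical sketch of this rule in base R, not a function of mmstat4.

```r
# Hypothetical helper: all access names implied by the ghdecompose components.
access_names <- function(uniquepath, minpath, filename) {
  short <- if (nzchar(minpath)) file.path(minpath, filename) else filename
  parts <- strsplit(uniquepath, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]
  out <- short
  for (k in seq_along(parts)) {
    prefix <- paste(utils::tail(parts, k), collapse = "/")
    out <- c(out, file.path(prefix, short))
  }
  out
}

an <- access_names("examples/data/cluster", "dbscan", "BANK2.sav")
print(an)
```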
Please email me at sigbert@hu-berlin.de. You can also try the current development version of the package from GitHub:
# install.packages("devtools")
devtools::install_github("sigbertklinke/mmstat4")
ghget("dummy", .force=TRUE)
ghget("dummy", .tempdir=FALSE) # install non-temporarily
ghget("dummy", .tempdir="~/mmstat4") # install non-temporarily to ~/mmstat4
ghget("dummy", .tempdir=TRUE) # install again temporarily
Note: If a repository was installed permanently and you switch back to temporary storage, the downloaded files will not be deleted.
ghget("dummy", .tempdir=TRUE)
ghlist(pattern="/(app|server)\\.R$")
ghopen("dbscan") # open the app
csv data files?
ghget("dummy", .tempdir=TRUE)
ghlist(pattern="\\.csv$", ignore.case=TRUE, full.names=TRUE)
#> [1] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/12411-0006.csv"
#> [2] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/ArbeitsloseBerlin.csv"
#> [3] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/Preisindex.csv"
#> [4] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/TelefonDaten.csv"
#> [5] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/haushalte.csv"
#> [6] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/haushalte_berlin.csv"
#> [7] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/hhberlin.csv"
#> [8] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/hhberlin_2017.csv"
#> [9] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/pechstein.csv"
#> [10] "/tmp/RtmpiCpOOd/mmstat4.dummy-main/data/rentcap.csv"
# use mmstat4::ghload for importing
ghlist(pattern="\\.csv$")
#> [1] "12411-0006.csv" "ArbeitsloseBerlin.csv" "Preisindex.csv"
#> [4] "TelefonDaten.csv" "haushalte.csv" "haushalte_berlin.csv"
#> [7] "hhberlin.csv" "hhberlin_2017.csv" "pechstein.csv"
#> [10] "rentcap.csv"
pechstein <- ghload("pechstein.csv")
str(pechstein)
#> 'data.frame': 29 obs. of 3 variables:
#> $ Datum : chr "04.02.00" "01.02.01" "10.11.01" "06.02.02" ...
#> $ Tag : int 34 397 679 767 771 783 1043 1160 1166 1421 ...
#> $ Retikulozyten: chr "2,3" "2,5" "2,45" "2,1" ...
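In the str() output, Datum and Retikulozyten are character columns because the file uses a German date format and decimal commas. A possible conversion in base R is sketched below on a few values copied from that output. It assumes that the dates are written as day.month.two-digit-year, which is consistent with Tag counting the days since 2000-01-01.

```r
# Values copied from the str() output; the conversion itself is an illustration.
datum <- c("04.02.00", "01.02.01")
retikulozyten <- c("2,3", "2,5")

date  <- as.Date(datum, format = "%d.%m.%y")                      # dd.mm.yy
value <- as.numeric(sub(",", ".", retikulozyten, fixed = TRUE))   # decimal comma
print(data.frame(date, value))
```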
The package has three default repositories: dummy, hu.stat, and hu.data.
Repository | Size | ZIP file location |
---|---|---|
dummy | 3 MB | https://github.com/sigbertklinke/mmstat4.dummy/archive/refs/heads/main.zip |
hu.data | 29 MB | https://github.com/sigbertklinke/mmstat4.data/archive/refs/heads/main.zip |
hu.stat | 31 MB | https://github.com/sigbertklinke/mmstat4.stat/archive/refs/heads/main.zip |
dummy is a small subsample of hu.stat and hu.data, intended for examples and test purposes.
Mathematical foundations - Introduction - Basic concepts - Univariate distributions - Parameters of univariate distributions - Bivariate distributions - Parameters of bivariate distributions - Regression analysis - Time series analysis - Index numbers - Probability theory - Random variables - So lügt man mit Statistik - Important distribution models - Sampling theory - Statistical estimation methods - Regression model - Confidence intervals - Statistical tests - Parametric tests - Nonparametric tests
ghget("hu.stat")
ghopen("Statistik.pdf")
ghopen("Aufgaben.pdf")
ghopen("Loesungen.pdf")
ghopen("Formelsammlung.pdf")
General - R - Basics and data generation - Test and estimation theory - Parameter of distributions - Distribution - Transformations - Robust statistics - Missing values - Subgroup analysis - Correlation and association - Multivariate graphics - Principal component analysis - Exploratory factor analysis - Reliability - Cluster analysis - Regression analysis - Linear regression - Nonparametric regression - Classification and regression trees - Neural networks
ghget("hu.data")
ghopen("dataanalysis.pdf")
Introduction - Detection and identification of outliers - Testing the distributional form of variables - Parameter comparisons for independent samples - Appendices A-D, bibliography, index
ghget("hu.data")
ghopen("cs1_roenz.pdf")
Preface - Testing relationships - Regression analysis - Reliability and homogeneity analysis of constructs - Appendices A-H, bibliography, subject index
ghget("hu.data")
ghopen("cs2_roenz.pdf")
Introduction - Generalized linear models (GLM) - Modelling binary data - The multinomial logit model - Modelling multinomial data (log-linear models) - Bibliography, index
ghget("hu.data")
ghopen("glm_roenz.pdf")