CRAN Package Check Results for Package tm

Last updated on 2023-10-01 12:00:52 CEST.

Flavor                             Version  Tinstall  Tcheck  Ttotal  Status  Flags
r-devel-linux-x86_64-debian-clang  0.7-11      38.85   70.69  109.54  NOTE
r-devel-linux-x86_64-debian-gcc    0.7-11      24.46   54.10   78.56  NOTE
r-devel-linux-x86_64-fedora-clang  0.7-11                     151.13  NOTE
r-devel-linux-x86_64-fedora-gcc    0.7-11                     165.90  NOTE
r-devel-windows-x86_64             0.7-11      33.00  139.00  172.00  NOTE
r-patched-linux-x86_64             0.7-11      37.49   69.44  106.93  NOTE
r-release-linux-x86_64             0.7-11      31.46   63.07   94.53  NOTE
r-release-macos-arm64              0.7-11                      50.00  NOTE
r-release-macos-x86_64             0.7-11                      96.00  NOTE
r-release-windows-x86_64           0.7-11      30.00  139.00  169.00  NOTE
r-oldrel-macos-arm64               0.7-11                      47.00  NOTE
r-oldrel-macos-x86_64              0.7-11                      68.00  ERROR
r-oldrel-windows-x86_64            0.7-11      37.00  133.00  170.00  NOTE

Check Details

Version: 0.7-11
Check: package dependencies
Result: NOTE
    Packages suggested but not available for checking:
     'Rcampdf', 'tm.lexicon.GeneralInquirer'
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-debian-gcc, r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-patched-linux-x86_64, r-release-linux-x86_64, r-release-macos-arm64, r-oldrel-macos-arm64, r-oldrel-macos-x86_64

Version: 0.7-11
Check: Rd cross-references
Result: NOTE
    Undeclared packages ‘tm.plugin.dc’, ‘readtext’, ‘graph’, ‘tau’, ‘tokenizers’ in Rd xrefs
Flavor: r-devel-linux-x86_64-fedora-clang

Version: 0.7-11
Check: package dependencies
Result: NOTE
    Packages suggested but not available for checking:
     'Rcampdf', 'Rpoppler', 'tm.lexicon.GeneralInquirer'
Flavors: r-devel-windows-x86_64, r-release-macos-x86_64, r-release-windows-x86_64, r-oldrel-windows-x86_64

Version: 0.7-11
Check: Rd cross-references
Result: NOTE
    Package unavailable to check Rd xrefs: ‘Rpoppler’
Flavor: r-release-macos-x86_64

Version: 0.7-11
Check: examples
Result: ERROR
    Running examples in ‘tm-Ex.R’ failed
    The error most likely occurred in:
    > ### Name: readPDF
    > ### Title: Read In a PDF Document
    > ### Aliases: readPDF
    > ### Keywords: file
    > ### ** Examples
    > uri <- paste0("file://",
    + system.file(file.path("doc", "tm.pdf"), package = "tm"))
    > engine <- if(nzchar(system.file(package = "pdftools"))) {
    + "pdftools"
    + } else {
    + "ghostscript"
    + }
    > reader <- readPDF(engine)
    > pdf <- reader(elem = list(uri = uri), language = "en", id = "id1")
    > cat(content(pdf)[1])
     Introduction to the tm Package
     Text Mining in R
     Ingo Feinerer
     February 5, 2023
    This vignette gives a short introduction to text mining in R utilizing the text mining framework provided by
    the tm package. We present methods for data import, corpus handling, preprocessing, metadata management,
    and creation of term-document matrices. Our focus is on the main aspects of getting started with text mining
    in R—an in-depth description of the text mining infrastructure offered by tm was published in the Journal of
    Statistical Software (Feinerer et al., 2008). An introductory article on text mining in R was published in R
    News (Feinerer, 2008).
    Data Import
    The main structure for managing documents in tm is a so-called Corpus, representing a collection of text
    documents. A corpus is an abstract concept, and there can exist several implementations in parallel. The
    default implementation is the so-called VCorpus (short for Volatile Corpus) which realizes a semantics as known
    from most R objects: corpora are R objects held fully in memory. We denote this as volatile since once the
    R object is destroyed, the whole corpus is gone. Such a volatile corpus can be created via the constructor
    VCorpus(x, readerControl). Another implementation is the PCorpus which implements a Permanent Corpus
    semantics, i.e., the documents are physically stored outside of R (e.g., in a database), corresponding R objects
    are basically only pointers to external structures, and changes to the underlying corpus are reflected to all R
    objects associated with it. Compared to the volatile corpus the corpus encapsulated by a permanent corpus
    object is not destroyed if the corresponding R object is released.
     Within the corpus constructor, x must be a Source object which abstracts the input location. tm provides a
    set of predefined sources, e.g., DirSource, VectorSource, or DataframeSource, which handle a directory, a vector
    interpreting each component as document, or data frame like structures (like CSV files), respectively. Except
    DirSource, which is designed solely for directories on a file system, and VectorSource, which only accepts (char-
    acter) vectors, most other implemented sources can take connections as input (a character string is interpreted
    as file path). getSources() lists available sources, and users can create their own sources.
     The second argument readerControl of the corpus constructor has to be a list with the named components
    reader and language. The first component reader constructs a text document from elements delivered by
    a source. The tm package ships with several readers (e.g., readPlain(), readPDF(), readDOC(), . . . ). See
    getReaders() for an up-to-date list of available readers. Each source has a default reader which can be
    overridden. E.g., for DirSource the default just reads in the input files and interprets their content as text.
    Finally, the second component language sets the texts’ language (preferably using ISO 639-2 codes).
     In case of a permanent corpus, a third argument dbControl has to be a list with the named components
    dbName giving the filename holding the sourced out objects (i.e., the database), and dbType holding a valid
    database type as supported by package filehash. Activated database support reduces the memory demand,
    however, access gets slower since each operation is limited by the hard disk’s read and write capabilities.
     So e.g., plain text files in the directory txt containing Latin (lat) texts by the Roman poet Ovid can be
    read in with following code:
    > txt <- system.file("texts", "txt", package = "tm")
    > (ovid <- VCorpus(DirSource(txt, encoding = "UTF-8"),
    + readerControl = list(language = "lat")))
    Metadata: corpus specific: 0, document level (indexed): 0
    Content: documents: 5
    > VCorpus(URISource(uri, mode = ""),
    + readerControl = list(reader = readPDF(engine = "ghostscript")))
    sh: : command not found
    Error in system2(gs_cmd, c("-dNODISPLAY -q", sprintf("-sFile=%s", shQuote(file)), :
     error in running command
    Calls: VCorpus ... mapply -> <Anonymous> -> <Anonymous> -> pdf_info -> system2
    Execution halted
Flavor: r-oldrel-macos-x86_64
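
The `sh: : command not found` line in the log above is consistent with an empty Ghostscript command being passed to `system2()`, which would happen if no Ghostscript executable is found on that flavor. The log itself does not confirm this. The following is a minimal sketch of how an example could check for a usable engine before reading a PDF, assuming that `tools::find_gs_cmd()` returning an empty string means Ghostscript is unavailable. The helper `pick_pdf_engine()` is hypothetical and not part of tm.

```r
# Hypothetical helper (not part of tm): pick a PDF engine that is usable here.
# Returns "pdftools", "ghostscript", or NA_character_ if neither is available.
pick_pdf_engine <- function() {
  if (nzchar(system.file(package = "pdftools"))) {
    "pdftools"
  } else if (nzchar(tools::find_gs_cmd())) {
    # find_gs_cmd() returns "" when no Ghostscript executable is found
    "ghostscript"
  } else {
    NA_character_
  }
}

engine <- pick_pdf_engine()

if (is.na(engine)) {
  message("No PDF engine available; skipping the readPDF example.")
} else if (requireNamespace("tm", quietly = TRUE)) {
  # Same vignette file as in the failing example
  pdf_path <- system.file(file.path("doc", "tm.pdf"), package = "tm")
  if (nzchar(pdf_path)) {
    reader <- tm::readPDF(engine)
    pdf <- reader(elem = list(uri = paste0("file://", pdf_path)),
                  language = "en", id = "id1")
    print(class(pdf))
  }
}
```

With this guard, a machine that has neither pdftools nor Ghostscript prints a message and skips the read instead of stopping with an error. Whether skipping is appropriate for a package example is a judgment call for the maintainer.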