cleanNLP 3.0

Version 3 of the toolkit introduces significant changes to the package, largely in response to improvements in the underlying NLP annotators.

Users with scripts based on the previous version of cleanNLP will need to modify them to match the new semantics. We believe the small changes required will make the toolkit easier to both install and use.

cleanNLP 2.0

This is a major re-structuring of the cleanNLP package. The primary changes include:

There are also many internal changes, primarily to support the new spaCy version (2.0) and to make the use of udpipe more natural.

cleanNLP 1.10

In this version, the internal mechanisms for running the tokenizers backend have been changed. We now call the stringi functions directly, with options that better mimic those of the spaCy and CoreNLP backends. Although the backend no longer depends on the tokenizers package, we will continue to use the name “tokenizers” for it to maintain backward compatibility.

As part of the change to custom stringi functions, we now also support setting the locale when initializing the tokenizers backend. This provides an easy way to tokenize text in languages for which custom spaCy or CoreNLP models do not yet exist.
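
As a rough illustration of what locale-aware tokenization with stringi looks like, here is a minimal sketch. It is not the code cleanNLP uses internally; the sample text, the locale value, and the choice of `skip_word_none = TRUE` are assumptions made for the example.

```r
# Minimal sketch of locale-aware word tokenization with stringi.
# Not cleanNLP's internal implementation; the text and locale are examples.
text <- "The quick brown fox jumps over the lazy dog."

# Break-iterator options: split on word boundaries using the rules of the
# given locale, skipping pieces that are not words (spaces, punctuation).
opts <- stringi::stri_opts_brkiter(
  type = "word",
  locale = "en_US",
  skip_word_none = TRUE
)

# stri_split_boundaries() returns a list with one character vector per input.
tokens <- stringi::stri_split_boundaries(text, opts_brkiter = opts)[[1]]
print(tokens)
```

Changing `locale` to another ICU locale identifier applies that language's boundary rules without needing a trained model.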

A pre-release of spaCy version 2.0.0 is currently available. It has been tested and runs smoothly with cleanNLP. The new neural network models are significantly faster and more accurate; we suggest migrating to the version 2 series once it becomes stable for production.

cleanNLP 1.8.0

This version contains many internal changes to the way that external libraries are called and referenced in order to comply with goodpractice::gp(). Two important user-facing changes include:

cleanNLP 1.7.0

This update contains several major changes, including: