PubMedMining-vignette

Jeff DIDIER

This package has been created for easy and fast term-based text mining of the PubMed article repository. To find articles relevant to your research topic, you must define two sets of search terms and an output path, then pass them to pubmed_textmining().

The terms are stored as character vectors in the corresponding variables “fixterms” and “pubterms”. The desired output path can be stored in the “output” variable.

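library(PubMedMining)

# Search terms, each supplied as a character vector
fixterms <- c("bike", "downhill")
pubterms <- c("dangerous", "extreme", "injuries")
# Directory in which the result .txt files are written
output <- getwd()  # or "YOUR/DESIRED/PATHWAY"
pubmed_textmining(fixterms, pubterms, output)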

Two kinds of results are generated by the function (.txt files):

Definition of Pointwise Mutual Information (PMI) scoring:
Good collocation pairs have a high PMI because their probability of co-occurrence is only slightly lower than the probability of occurrence of each word on its own. Conversely, a pair of words whose individual probabilities of occurrence are considerably higher than their probability of co-occurrence gets a low PMI score. If PMI = -Inf, no articles were found for the respective collocation pair.
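
For reference, the description above matches the standard textbook definition of PMI for two terms x and y, given below. This is a sketch of the general formula, not a statement of the package's exact computation: the base of the logarithm and the way the probabilities are estimated from PubMed article counts are assumptions not confirmed here.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```

Here p(x) and p(y) are the probabilities of each term occurring on its own, and p(x, y) is the probability of both occurring together. When a pair never co-occurs, p(x, y) = 0 and the logarithm tends to minus infinity, which is consistent with the -Inf score reported when no articles are found for a pair.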