PubMedMining-vignette

Jeff DIDIER

This package provides easy and fast term-based text mining of the PubMed article repository. To find articles relevant to your research topic, you need to:

Identify the main terms of your research focus (here fixterms)
Identify important terms that might revolve around your focus (here pubterms)
(Optional) Define an output location for the result files (default: current working directory)
Have a stable internet connection

The terms are stored as character vectors in the corresponding variables “fixterms” and “pubterms”. The desired output path can be stored in the “output” variable.

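library(PubMedMining)  # assumes the package is installed under this name

fixterms <- c("bike", "downhill")                  # main terms of the research focus
pubterms <- c("dangerous", "extreme", "injuries")  # terms revolving around the focus
output <- getwd()                                  # or "YOUR/DESIRED/PATH"
pubmed_textmining(fixterms, pubterms, output)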

The function generates two kinds of result files (.txt):

PMI scores: for each fixterm, a table of pointwise mutual information scores, one per pubterm
Relevant articles: for each fixterm-pubterm pair, a text file listing the titles and publication years of the relevant articles

Definition of Pointwise Mutual Information (PMI) scoring:
Term pairs that form good collocations have a high PMI, because their probability of co-occurrence is only slightly lower than the probability of occurrence of each individual term. Conversely, a pair of terms whose individual probabilities of occurrence are considerably higher than their probability of co-occurrence receives a low PMI score. If PMI = -Inf, no articles were found for the respective term pair.
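To make the scoring concrete, the standard textbook definition is PMI(x, y) = log( p(x, y) / (p(x) p(y)) ). The sketch below computes it from article counts. It is only an illustration: the helper name pmi, its arguments, the base-2 logarithm and all counts are assumptions made for this example, and the package may estimate the probabilities differently.

```r
# Hypothetical helper, not part of the package: PMI from article counts.
#   n_xy    articles mentioning both terms
#   n_x     articles mentioning the first term
#   n_y     articles mentioning the second term
#   n_total size of the searched corpus
# All counts used below are made up for illustration.
pmi <- function(n_xy, n_x, n_y, n_total) {
  p_xy <- n_xy / n_total  # probability of co-occurrence
  p_x  <- n_x / n_total   # probability of the first term
  p_y  <- n_y / n_total   # probability of the second term
  log2(p_xy / (p_x * p_y))
}

pmi(n_xy = 32, n_x = 64, n_y = 128, n_total = 1024)  # 2: co-occur more often than by chance
pmi(n_xy = 8,  n_x = 64, n_y = 128, n_total = 1024)  # 0: co-occur exactly as often as by chance
pmi(n_xy = 0,  n_x = 64, n_y = 128, n_total = 1024)  # -Inf: the terms never co-occur
```

The last call shows why a score of -Inf appears when no articles are found for a pair: the co-occurrence probability is zero, and the logarithm of zero is -Inf in R.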