Running NLP using NILE

To generate NLP features, use NILE to parse the EMR narrative notes and count the positive mentions of every CUI in the dictionary.

As an example, obtain the “Training: RiskFactors Complete Set 1 MAE” data under “2014 De-identification and Heart Disease Risk Factors Challenge” from the i2b2 NLP Research Data Sets.

Use xml_Utils.java to extract the notes from the downloaded XML files.
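For readers who want to see roughly what the extraction step involves, here is a minimal sketch. It is not the code in xml_Utils.java. It assumes each downloaded file stores the note body in a single TEXT element (typically a CDATA section); check this against your copy of the data. The class name NoteExtractor and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of this repository.
final class NoteExtractor {

    // Assumption: the note body lives in the first <TEXT> element.
    // Returns "" when the document has no such element.
    static String extractText(InputStream xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);
            NodeList nodes = doc.getElementsByTagName("TEXT");
            // getTextContent() concatenates text and CDATA children.
            return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse note XML", e);
        }
    }

    // Convenience overload for XML already held in memory.
    static String extractText(String xml) {
        return extractText(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Prints the note text of each XML file given on the command line.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                System.out.println(extractText(in));
            }
        }
    }
}
```

Run it with one or more XML file paths as arguments to print the plain note text, which can then be saved in whatever input layout your NILE setup expects.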

Then parse these notes with NILE, using the dictionary CAD_dict.txt generated from MetaMap.
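To illustrate the shape of the resulting features (a mention count per CUI for each note), the sketch below uses a naive stand-in for the parsing step. It is not NILE and does not call NILE's API: it does plain case-insensitive substring matching, with no negation detection and no word-boundary checks, so it cannot tell positive mentions from negated ones. The class name CuiMentionCounter, the in-memory term-to-CUI map, and the CUI labels are all hypothetical; the actual layout of CAD_dict.txt is not assumed here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; NILE itself does considerably more.
final class CuiMentionCounter {

    // Counts non-overlapping, case-insensitive occurrences of each
    // dictionary term in the note and sums them per CUI.
    static Map<String, Integer> countMentions(String note,
                                              Map<String, String> termToCui) {
        String text = note.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> entry : termToCui.entrySet()) {
            String term = entry.getKey().toLowerCase(Locale.ROOT);
            if (term.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            int hits = 0;
            int from = text.indexOf(term);
            while (from >= 0) {
                hits++;
                from = text.indexOf(term, from + term.length());
            }
            if (hits > 0) {
                counts.merge(entry.getValue(), hits, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder CUI labels, not real UMLS identifiers.
        Map<String, String> dict = new HashMap<>();
        dict.put("coronary artery disease", "CUI_CAD");
        dict.put("cad", "CUI_CAD");
        dict.put("hypertension", "CUI_HTN");

        String note = "Coronary artery disease noted. "
                    + "History of CAD and hypertension. CAD stable.";
        System.out.println(countMentions(note, dict));
    }
}
```

Several terms can map to the same CUI, so counts are accumulated per CUI, not per term. Because the matching is by substring, a short term such as "cad" would also match inside longer words; a real concept extractor avoids this.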

Results from processing the “Training: RiskFactors Complete Set 1 MAE” data can be found in NLP2014_set1_res.txt.