A simple piece of code that, given a keyword and a desired number of results, returns Title:Abstract pairs in the form of a JSON file. This gives us our pre-processed data from PubMed. I will then parse that data down to the required lengths, as tuples that pair each excerpt with the keyword it contains (a current thought; there might be a better approach). Each excerpt will consist of the sentence containing the keyword together with parts of its surrounding sentences, to better capture the context in which our keyword of interest is being used.
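The parsing step could look roughly like the sketch below. It is an illustration under assumptions, not the final implementation: the function names `split_sentences` and `keyword_contexts`, the `window` parameter, and the naive punctuation-based sentence splitter are all hypothetical, and whole neighbouring sentences stand in for "parts of" the surrounding sentences.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_contexts(abstract, keyword, window=1):
    """Return (keyword, context) tuples for every sentence that mentions the keyword.

    The context is the matching sentence plus up to `window` sentences on each side.
    Matching is case-insensitive and on substrings, so "virus" also matches "viruses".
    """
    sentences = split_sentences(abstract)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    results = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            results.append((keyword, " ".join(sentences[start:end])))
    return results


abstract = (
    "Influenza spreads quickly. The capsid protects the genome. "
    "Vaccines reduce risk. Hosts vary."
)
contexts = keyword_contexts(abstract, "capsid", window=1)
```

A real abstract will contain abbreviations such as "e.g." or "et al." that this splitter breaks on, so a proper sentence tokenizer would likely be needed before relying on the output.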
The idea here is that, using a defined list of keywords, we can build our “viral” language model by collecting where these keywords are used along with their greater context. Prof. Beal has provided a solid (yet extendable) list to start.