- Part I. General Background (40 minutes).
- Relevance of natural language data for bioinformatics and biomedicine.
- Basic features of biomedical language data.
- Overview of current document repositories for life science.
- Introduction to the main tasks in bioinformatics and systems biology.
- Part II. Applications – part I (60 minutes).
- Applications in Biology: Bio-entity recognition.
- Labeling text with bio-entities.
- Dictionary lookup approaches.
- Heuristics and rules for finding gene/protein mentions.
- The machine learning approach: CRFs and deep learning.
- Applications in Biology: Bio-entity normalization and grounding.
- Linking text to database records.
- Strategies for disambiguating.
- Practical importance of protein mention normalization.
- Applications in Biology: Interaction extraction.
- Manual curation of interactions and databases.
- Extracting protein interactions automatically from text.
- Overview and illustration of some existing systems.
Coffee Break [15 minutes]
- Part II. Applications – part II, cont. (40 minutes).
- Analyzing function of genes, gene lists and clusters using literature mining.
- Deriving gene function and annotations from literature.
- Characterizing gene lists and clustering.
- Association of genes and chemicals to diseases through text mining.
- Text mining to discover disease associated genes, proteins and mutations.
- Extracting mutations and epigenetic characteristics.
- Linking mutations to proteins.
- Text mining in epigenetics.
- Automatically finding protein locations in text.
- Building domain-specific terminological resources for biomedicine and life sciences.
- NLP to extract technical terms for building lexical resources in biology.
- Knowledge discovery and pathways.
- Biomedical text mining efforts beyond English.
- Part III. Case study (20 minutes).
- Associating protein sequence variants (mutations) to diseases through text mining.