Outline

  1. Part I. General Background (40 minutes).
    • Relevance of natural language data for bioinformatics and biomedicine.
    • Basic features of biomedical language data.
    • Overview of current document repositories for life science.
    • Introduction to the main tasks in bioinformatics and systems biology.
  2. Part II. Applications – part I (60 minutes).
    1. Applications in Biology: Bio-entity recognition.
      • Labeling text with bio-entities.
      • Dictionary lookup approaches.
      • Heuristics and rules for finding gene/protein mentions.
      • The machine learning approach: CRFs and deep learning.
    2. Applications in Biology: Bio-entity normalization and grounding.
      • Linking text to database records.
      • Strategies for disambiguating.
      • Practical importance of protein mention normalization.
    3. Applications in Biology: Interaction extraction.
      • Manual curation of interactions and databases.
      • Extracting protein interactions automatically from text.
      • Introduction to and illustration of some existing systems.

Coffee Break [15 minutes]

  1. Part II. Applications – part II, cont. (40 minutes).
    1. Analyzing the function of genes, gene lists, and clusters using literature mining.
      • Deriving gene function and annotations from literature.
      • Characterizing gene lists and clustering.
    2. Association of genes and chemicals to diseases through text mining.
      • Text mining to discover disease-associated genes, proteins, and mutations.
    3. Extracting mutations and epigenetic characteristics.
      • Linking mutations to proteins.
      • Text mining in epigenetics.
    4. Automatically finding protein locations in text.
    5. Building domain-specific terminological resources for biomedicine and life sciences.
      • NLP to extract technical terms for building lexical resources in biology.
    6. Knowledge discovery and pathways.
    7. Biomedical text mining efforts beyond English.
  2. Part III. Case study (20 minutes).
    1. Associating protein sequence variants (mutations) to diseases through text mining.