Course Description

Rapid developments in bio- and information technology are changing the way that biomedical scientists interact with data. Traditionally, data were the end result of laborious experimentation, and their interpretation mostly involved careful thought and background knowledge. Today, data are increasingly generated much earlier in the scientific workflow and are much larger in scale. Also, before the data can be interpreted, extensive computational processing is often necessary. Thus, the data deluge in biomedicine now requires mining and modeling on a large scale, i.e., biomedical data science.

This course aims to equip students with some of the concepts and skills relevant to biomedical data science, with an emphasis on bioinformatics, a sub-discipline of this broader field, through examples of mining and modeling of genomic and proteomic data. More specifically, bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, mining of functional genomics data sets, and machine learning approaches for data integration.

Course Survey

If you are taking the class, please fill this out by the first day of class (Jan 18th): https://forms.gle/un4731Na28jA7yPk6

Overall Flow of the Class

(Module = Group of Lectures)

  • Introduction
  • Module on "the Data" (Genomic, Proteomic & Structural Data), introducing the main data sources (their properties, where to access them, etc.). This module also includes discussion of databases and knowledge representation issues.
  • Module on Mining (Alignment & variant calling necessary for personal genomics; Basic multi-omics calculations; Supervised & unsupervised mining approaches towards multi-omic data; Networks)
  • Module on Molecular Modeling

Lectures

  • MW 1:00 - 2:15 PM, BASS305. All lectures will be recorded. Recordings will be available in Canvas a few minutes after each lecture.

Discussion Section

Session   | Time                | Location
Section 1 | Thurs 1:00-2:00 PM  | YSB352
Section 2 | Fri 10:00-11:00 AM  | BASS405
Section 3 | Fri 10:00-11:00 AM  | YSB352
Section 4 | Fri 1:00-2:00 PM    | YSB352

Different headings for this class (5 variants)

  • CBB 752 / CPSC 752 - Grad. with programming

    • This graduate-level version of the course consists of lectures, in-class tests, discussion section, programming assignments, and a final programming project.
  • MB&B 752 / MCDB 752 - Grad. without programming

    • This graduate-level version of the course consists of lectures, in-class tests, discussion section, written problem sets, and a final project (a semi-computational section plus a literature survey). Unlike CBB 752, no programming is required.
  • MB&B 753b3 / MB&B 754b4 - Modules

    • For graduate students, the course can be broken up into two "modules" (each counting 0.5 credit towards the MB&B course requirement):
    • 753 - Biomedical Data Science: Mining (1st half of term)
    • 754 - Biomedical Data Science: Modeling (2nd half of term)
    • Each module consists of lectures, in-class tests, written problem sets, and a final graduate-level written project that is half the length of the full course's final project.
  • MB&B 452 / MCDB 452 - Undergrad.

    • This undergraduate version of the course consists of lectures, in-class tests, discussion section, written problem sets, and a final project (a semi-computational section plus a literature survey). The programming assignments from CBB 752 can be substituted for the written work by permission of the instructor.
  • S&DS 352 - Undergrad.

    • This undergraduate version of the course consists of lectures, in-class tests, discussion section, programming assignments, and a final programming project.
  • Auditing

    • Auditing is allowed, but we would strongly prefer that you register for the class.

Prerequisites

The course is geared towards CBB graduate students, as well as advanced undergraduates and graduate students wishing to learn about the types of large-scale quantitative analysis that whole-genome sequencing and other forms of large-scale biological data will make possible. It would also be suitable for students from other fields, such as computer science, statistics, or physics, who want to learn about an important new biological application for computation.

Students should have:

  • A basic knowledge of biochemistry and molecular biology.
  • A knowledge of basic quantitative concepts, such as single-variable calculus and basic probability & statistics, along with basic programming skills.

These prerequisites can be fulfilled by MBB 200 and Mathematics 115, or by permission of the instructor.

Class materials

There is no textbook for this class. PPT slides will be available after the lectures. We recommend Biochemistry by Lubert Stryer for the biochemistry prerequisite.

Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

In-class tests: Quiz

  • There will be a quiz covering the 1st half of the course.
  • There will be a quiz covering the 2nd half of the course.

Quizzes will comprise simple questions that you should be able to answer from the lectures plus the main readings.

For reference, please refer to the previous Quiz Archive.

Programming Assignments (Req'd for CBB and CS grad. students)

  • There will be two homework assignments. We will try to promote reproducible research and the use of a version control system, specifically GitHub, to facilitate homework submission.

Non-programming Assignments

  • There will be two equivalent homework assignments, intended particularly for MB&B and MCDB students without a programming background. The programming part will be replaced with assignments involving the use of web-based tools or essay questions.

Pages from previous years

Class data dump

  • Syllabus and class info dump in single PDF file: PDF
  • Class poster: pdf