Bioinformatics Program

Summary

Many biologists with traditional undergraduate training have excellent laboratory skills.  In the modern age of Big Data, however, they increasingly need computational skills to analyze the huge data files generated in the graduate research laboratory.  The CMSE Bioinformatics Program offers modular courses in basic computation and then prepares students to take advanced courses in the bioinformatics or computational area that matches their research needs.

 

Modules Summary

The bioinformatics modules are a set of introductory courses that help life science students learn basic skills in computation and bioinformatics.  Each module is worth 1 graduate credit, runs for one month, and uses a flipped classroom format (students watch video lectures online as homework and then come to class to solve problems and ask questions).  Postdocs, staff, visiting scholars, faculty, and other MSU affiliates who are not students can audit these modules for a fee; to register, contact the Bioinformatics Program Coordinator at ablackpz@msu.edu.  Junior- and senior-level undergraduates who wish to take the modules should also contact the Coordinator to discuss their eligibility.  Undergraduates pursuing the CMSE minor who wish to take the modules should contact the CMSE Undergraduate Director.

 

Course Descriptions

CMSE890:301 Programming Foundations
This module is for biologists who are absolute beginners in programming.  Using R/RStudio, students will learn fundamental programming concepts such as operators, data structures, and control structures, along with graphing in base R and ggplot2, advanced data wrangling, tidying data, advanced string handling, and producing publication-quality markdown documents.  All examples use scientific data.  There are no prerequisites, but students must have a basic laptop with administrator privileges and be comfortable using the Internet and word-processing programs.  Instructor: Alexis Black Pyrkosz. Next offering: Fall 2019.

CMSE890:302 Statistical Analysis and Visualization of Biological Data
This module is for biologists who need to perform basic statistical analyses on their data.  Using R/RStudio, students will learn standard statistics topics such as summary statistics, common probability distributions and density functions, sampling distributions, confidence intervals, p-values, hypothesis testing, Type I/II errors, ANOVA, regression, and PCA.  All examples use scientific data.  Prerequisite: CMSE890:301 or equivalent experience in R.  Instructor: Alexis Black Pyrkosz. Next offering: Fall 2019.

CMSE890:303 Intro to Data Handling: Unix and Python
This module is for scientists who have huge files of research data that they need to handle and analyze.  Using the MSU HPCC (High Performance Compute Cluster), students will learn basic Unix navigation, transferring data to HPCCs, command-line text editors, sed/grep/awk, bash scripting, loading software on HPCCs, submitting jobs to the cluster, managing cluster resources, writing simple Python scripts, common Python script structures for data handling, debugging code, using Python modules, and Jupyter notebooks.  All examples use biological data.  Students who do not already have an account on the MSU HPCC will be given one for the duration of the class.  There is no prerequisite, but prior programming experience or co-enrollment in CMSE890:301 is strongly recommended.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2020.

CMSE890:304 Genomics and Intro to Sequencing Analysis
This module is for biologists who need to get started with genome sequencing for research.  Topics include current sequencing technologies, quality control of raw data, read mapping, alignment validation, single nucleotide variant calling, structural variants, and genome assembly.  All examples use real sequencing data.  Prerequisite: CMSE890:303 or equivalent experience in bash scripting, HPCC usage, and Python.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2020.

CMSE890:305 RNA-Seq
This module is for biologists who need to get started with RNA-Seq data analysis.  Topics include experimental design considerations, alignment, Gene Ontology (GO) terms and clustering, and network analysis.  All examples use real sequencing data.  Prerequisites: CMSE890:304 (or equivalent experience with basic genomics and HPCC usage) and CMSE890:302 (or equivalent experience with statistical programming in R).  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2020.

CMSE890:306 Metagenomics
This module is for biologists who need to get started with metagenomics.  Topics include amplicon sequencing, defining operational taxonomic units, taxonomic classification, diversity analysis, shotgun metagenome analysis, targeted gene assembly, and metagenome-assembled genomes.  All examples use real sequencing data.  Prerequisites: CMSE890:303 and 304, or equivalent experience with the HPCC, Unix, and genomic analyses.  Instructor: Alexis Black Pyrkosz with materials provided by Ashley Shade.  Next offering: Spring 2020.

CMSE890:309 Classical Sequence Analysis
This course is for biologists who need to get started with basic sequence analysis.  Topics include BLAST, multiple sequence alignment, sequence identity and similarity metrics, unique or representative sets, clustering, phylogenetic trees, hidden Markov models, and Monte Carlo methods.  Prerequisites: CMSE890:302 and 303, or equivalent experience with statistics in R and Unix on the HPCC.  Instructor: Alexis Black Pyrkosz.  Next offering: Spring 2020.

CMSE890:310 Gaps, Errors and Missteps in Statistical Data Analysis
This is an advanced short course designed to: a) discuss common misunderstandings & typical errors in the practice of statistical data analysis, and b) provide a mental toolkit for critical thinking about, and inquiry into, analytical methods and results. Classes will involve lectures, discussions, hands-on exercises, and homework on concepts critical to the day-to-day use and consumption of quantitative/computational techniques. Topics include: P-values & p-hacking • Multiple hypothesis correction • Estimation of error & uncertainty • Statistical power & sample size calculation • Pseudoreplication, confounding variables, & batch effects • Circular analysis, regression to the mean, & stopping rules • Confirmation & survivorship bias • Summary statistics, measuring associations, & model abuse • Visualization challenges • Researcher degrees of freedom, data sharing/hiding, & reproducible research. Prerequisites: This is not an introductory course in statistics or programming. We will assume: 1) familiarity with basic statistics & probability, and 2) the ability to do basic data wrangling, analyses, & visualization using R or Python. Strongly recommended MSU courses: CMSE 201, CMSE890:301 or 303, and CMSE890:302.  Instructor: Arjun Krishnan (arjun@msu.edu). Website: https://github.com/krishnanlab/teaching/tree/master/2018-fall_statgaps.  Next offering: Fall 2019.

CMSE890:311 Linear Algebra for Life Sciences
This module is for scientists who will be modeling biological processes using linear functions of several variables. The material includes methods for solving systems of linear equations in many variables, both by hand and with self-written computer algorithms. Also included are eigenvalues, eigenvectors, and the singular value decomposition of matrices. Finding and using eigenvalues, singular values, and eigenvectors of matrices is of fundamental importance in quantitative science, and these topics will be covered through theory, examples, and computer algorithms.  Prerequisite: CMSE890:301 or equivalent programming experience.  Instructor: Peter Bates (Math Department).  Next offering: Fall 2019.

CMSE890:312 Applied Calculus
This module is for students progressing toward systems biology or machine learning.  It begins with a review of single-variable calculus, followed by visualization techniques for shapes in higher dimensions. Vector notation will be introduced and functions of several variables will be analyzed. Partial derivatives and gradients will be defined and used. Vector fields and gradient descent of energy functions will be discussed. Line, surface, and volume integrals will be introduced. Examples will be given, and computation in MATLAB or Python will be included.  Prerequisite: CMSE890:301 or equivalent programming experience.  Instructor: Peter Bates (Math Department).  Next offering: Fall 2019.

 

Other CMSE Bioinformatics Courses

CMSE491/890 Computational Medicine
This course provides a survey of quantitative and computational techniques in contemporary biomedical research, based on diverse large-scale data. Course components include: 1) Lectures introducing biomedical questions, critical datasets, and statistical/machine learning techniques. 2) Real-world case studies guided by discussions of recent seminal papers in data-driven biomedical research. 3) Group projects that apply computational methodology to open biomedical questions.  Expected learning outcomes: 1) Identify biomedical problems that can be addressed using large-scale biological datasets; 2) Formulate quantitative models; 3) Explain the algorithms behind computational tools for biomedicine; 4) Understand and practice modern machine learning concepts and methodology; and 5) Integrate big-data resources to answer medically related questions using custom-written software.  Next offering: Fall 2018.  Instructor: Jianrong Wang.  Prerequisites: CMSE 201 and two semesters of introductory biology (LB 144 and 145, OR BS 161 and 162, OR BS 181H and 182H, OR equivalent). Statistics at the level of STT 231 is strongly recommended.

CMSE410/890 Bioinformatics and Computational Biology
This course is an introduction to contemporary topics in bioinformatics and computational biology, which combine large-scale data and modern analytical techniques to gain biological/biomedical insights. For each topic, we will raise the major biological & biomedical questions, explore the relevant molecular/genomics/biomedical datasets, and discuss the statistical, probabilistic, & machine-learning concepts underlying state-of-the-art approaches. Students will learn how to formulate problems for quantitative inquiry, design computational projects, understand and think critically about data & methods, communicate research findings, perform reproducible research, and practice open science. Students will apply all of these skills by carrying out a project, presenting it in class, and submitting a report at the end of the course. Instructor: Arjun Krishnan (arjun@msu.edu). Prerequisites: CMSE 201 or CMSE 301-304 or equivalent programming experience, and two semesters of introductory biology (LB 144 and 145 OR BS 161 and 162 OR BS 181H and 182H, or equivalent). Statistics at the level of STT 231 is strongly recommended. Website: https://github.com/krishnanlab/teaching/tree/master/2019-spring_compbio

Certificate in Bioinformatics Tools
In progress

Advanced Courses in Bioinformatics/Computational Biology/Computation

CMSE801 Intro to Computational Modeling
Computational models are very useful tools to understand the world around us. Over the course of this semester, we will explore the tools required to design, build and utilize computational models that are applied to many problems from a wide range of scientific disciplines. The main goal is to learn the skills necessary to apply computational techniques to a dataset of interest. Creating models to describe and understand systems (in the physical, life, or social sciences, or in engineering) is the driving principle of this course.  Next offering: Fall 2019.  Instructor: Devin Silvia.  Prerequisite: one semester of calculus.

CMSE802 Methods of Computational Modeling
Computational science uses computers to solve problems, simulate phenomena, and create knowledge.  Over the course of this semester, we will explore various aspects of computational science.  We will learn standard modeling methods and tools, as well as programming (in Python), code-management, and basic data science techniques.  These techniques and skills will be applied to the student's own research.  This is a project-focused course with topics primarily driven by the research interests and needs of the students.  Students will work with the instructors to come up with specific learning goals and objectives relating to using computational science techniques to solve problems related to their research area.  All students will be expected to present their work to their peers with a final goal of distilling what they have learned in the form of example codes and training materials that can be shared with future students.  Some results may be submitted to conferences and journals.  This course is taught using a flipped classroom format similar to CMSE201/801.  Class examples and assignments will be done using the Python programming language.  Prerequisite: CMSE801.  Instructor: Dirk Colbry.  Next offering: Fall 2019.

CMSE490 Image Processing
In this course, we intend to develop and explore tools that assist researchers in analyzing their scientific image datasets. To do this, we focus on the computational representation of images and the types and classes of algorithms that have been developed for scientific analysis.  Next offering: TBA. Instructor: Dirk Colbry.  Prerequisite: CMSE890:301 or 303 or prior programming experience.

BME891:003 Dynamical Modeling of Biological Systems
"All biology is computational biology," or so claimed the title of a 2017 article in the journal PLOS Biology. Decide for yourself by taking this new course, which will introduce you to fundamental ideas in computational modeling of biological systems and their response to environmental stress. Starting with simple models of gene regulation, we will move on to "systems biology" approaches to model cell signaling networks, signal amplification in protein kinase cascades, and the intricacies of the cell cycle. You will learn how cells in a noisy microenvironment "decide" among multiple possible fates, and why even genetically identical cells exhibit cell-to-cell variability in gene expression. We will also discuss mathematical models of spatial pattern formation in animal and plant systems, including branching morphogenesis, and multicellular "virtual tissue" models. This course, which combines theory with hands-on computer modeling, will be of interest to graduate and advanced undergraduate students in various biomedical science disciplines, as well as to engineers, physicists, and computer scientists interested in modeling biological systems.  Next offering: Fall 2019.  Instructor: Sudin Bhattacharya.  Prerequisite: Familiarity with basic concepts in biology and differential equations.

ANSI890 Introduction to Quantitative Genetics
Quantitative trait variation is pervasive in nature; it can be found among individuals in populations of virtually all life forms. For many quantitative traits, in particular production traits of plants and animals and disease risks in humans, genetics contributes a significant part of the variation. Quantitative genetics is the discipline that deals with how the heritable (genetic) part of quantitative trait variation originates, changes dynamically, and is passed on to future generations. This is important for the genetic improvement of food animals and crops, the development of diagnoses and treatments for genetic diseases, and the understanding of evolution.

This course covers the basics of quantitative genetics and is highly recommended for students who seek advanced and/or professional studies in genetics or employment opportunities in the breeding industry. Topics include the life cycle and properties of mutations; population parameters of genetic effects and phenotypes and their properties (means, variances, breeding values, heritability); dynamics of mutations and genetic variation (genetic drift, selection); and quantitative genetics in the molecular era (mapping, prediction).

This course covers the general principles of quantitative genetics, which apply across species (animals, plants, model organisms, humans, etc.).

Next offering: Fall 2019 and every Fall thereafter (as ANS804 starting 2020). Instructor: Wen Huang. Course website: https://qgg-lab.github.io/teaching.html