Bioinformatics Program


Many biologists with traditional undergraduate training have excellent laboratory skills.  But in the modern age of Big Data, they increasingly find themselves needing computational skills with which to analyze huge data files in the graduate research laboratory.  The CMSE Bioinformatics Program offers modular courses in basic computation and then transitions students so they can take advanced courses in the bioinformatics or computational area that matches their research needs.

Modules Summary

The bioinformatics modules are a set of introductory courses that help life science students learn basic skills in computation and bioinformatics.  Each module is worth 1 graduate credit, runs for one month, and uses a flipped-classroom format (students watch video lectures online for homework and then come to class to solve problems and ask questions).  Postdocs, staff, visiting scholars, faculty, and other MSU affiliates who are not students can audit these modules for a fee.  Contact the Bioinformatics Program Coordinator to register.  Undergraduates at the junior/senior level who wish to take the modules should also contact the Coordinator to discuss their eligibility.

Course Descriptions

CMSE890:301 Programming Foundations
This module is for biologists who are absolute beginners in programming.  Using R/RStudio, students will learn fundamental programming concepts like operators, data structures, control structures, graphing with base R and ggplot2, advanced data wrangling, tidying data, advanced string handling, and producing markdown documents of publishable quality.  All examples use scientific data.  There is no prerequisite other than that students have a basic laptop with administrator privileges and are able to use the Internet and word processing programs.

CMSE890:302 Statistical Analysis and Visualization of Biological Data
This module is for biologists who need to perform basic statistical analyses on their data.  Using R/RStudio, students will learn standard statistics topics like summary statistics, common probability distributions and density functions, sampling distributions, confidence intervals, p-values, hypothesis testing, Type I/II errors, ANOVA, regression, and PCA.  All examples use scientific data.  Prerequisite: CMSE890:301 or equivalent experience in R.

CMSE890:303 Intro to Data Handling: Unix and Python
This module is for scientists who have huge files of research data that they need to handle and analyze.  Using the MSU HPCC (High Performance Compute Cluster), students will learn basic Unix navigation, transferring data to HPCCs, command-line text editors, sed/grep/awk, bash scripting, loading software on HPCCs, submitting jobs to the cluster, managing cluster resources, writing simple Python scripts, common Python script structures for data handling, debugging code, using Python modules, and Jupyter notebooks.  All examples use biological data.  Students who do not already have an account on the MSU HPCC will be given one for the duration of the class.  No prerequisite, but prior programming experience or coenrollment in CMSE890:301 is strongly recommended.

CMSE890:304 Genomics and Intro to Sequencing Analysis
This module is for biologists who need to get started with genome sequencing for research.  Topics include current sequencing technologies, quality control of raw data, read mapping, alignment validation, single nucleotide variant calling, structural variants, and genome assembly.  All examples use real sequencing data.  Prerequisite: CMSE890:303 or equivalent experience in bash scripting, HPCC usage, and Python.

CMSE890:305 RNA-Seq
This module is for biologists who need to get started with RNA-Seq data analysis.  Topics include experimental design considerations, alignment, GO terms and clustering, and network analysis.  All examples use real sequencing data.  Prerequisites: CMSE890:304 or equivalent experience with basic genomics and HPCC usage, and CMSE890:302 or equivalent statistics programming experience in R.

CMSE890:306 Metagenomics
This module is for biologists who need to get started with metagenomics.  Topics include amplicon sequencing, defining operational taxonomic units, taxonomic classification, diversity analysis, shotgun metagenome analysis, targeted gene assembly, and metagenome-assembled genomes.  All examples use real sequencing data.  Prerequisites: CMSE890:303 and 304 or equivalent experience with HPCC, Unix, and genomic analyses.

Other CMSE Bioinformatics Courses

CMSE491/890 Computational Medicine
This course provides a survey of quantitative and computational techniques in contemporary biomedical research, based on diverse large-scale data. Course components include: 1) Lectures to introduce biomedical questions, critical datasets, and statistical/machine learning techniques. 2) Real-world case studies guided by discussions of recent seminal papers in data-driven biomedical research. 3) Group projects that use computational methodology to answer open biomedical questions.  Expected learning outcomes: 1) Identify the biomedical problems to be addressed using large-scale biological datasets; 2) Formulate quantitative models; 3) Explain the algorithms behind computational tools for biomedicine; 4) Understand and practice modern machine learning concepts and methodology; and 5) Integrate big-data resources to answer medically related questions using custom-written software.  Next offering: Fall 2018.  Instructor: Jianrong Wang.  Prerequisites: CMSE 201 and two semesters of introductory biology (LB 144 and 145, OR BS 161 and 162, OR BS 181H and 182H, OR equivalent). Statistics at the level of STT 231 is strongly recommended.

CMSE491/890 Bioinformatics and Computational Biology
This course is an introduction to contemporary topics in bioinformatics and computational biology, dealing with combining large-scale data and modern analytical techniques to gain biological/biomedical insights. In each topic, we will raise the major biological & biomedical questions, explore the relevant molecular/genomics/biomedical datasets, and discuss the statistical, probabilistic, & machine-learning concepts underlying the state-of-the-art approaches. Students will learn how to formulate problems for quantitative inquiry, design computational projects, understand and think critically about data & methods, communicate research findings, perform reproducible research, and practice open science. Students will apply all of these by carrying out a project, presenting it in class, and submitting a report at the end of the course. Instructor: Arjun Krishnan. Prerequisites: CMSE 201 or CMSE 301-304 or equivalent with programming experience and two semesters of introductory biology (LB 144 and 145 OR BS 161 and 162 OR BS 181H and 182H, or equivalent). Statistics at the level of STT 231 is strongly recommended.

Certificate in Bioinformatics Tools
In progress

Advanced Courses in Bioinformatics/Computational Biology/Computation

BMB 961-301 Gaps, Missteps, and Errors in Statistical Data Analysis
This is an advanced short course designed to: a) Discuss common misunderstandings & typical errors in the practice of statistical data analysis, and b) Provide a mental toolkit for critical thinking and enquiry of analytical methods and results. Classes will involve lectures, discussions, hands-on exercises, and homework about concepts critical to the day-to-day use and consumption of quantitative/computational techniques. Topics include: P-value & p-hacking • Multiple hypothesis correction • Estimation of error & uncertainty • Statistical power & sample size calculation • Pseudoreplication, confounding variables, & batch effects • Circular analysis, regression to the mean, & stopping rules • Confirmation & survivorship bias • Summary statistics, measuring associations, & model abuse • Visualization challenges • Researcher degrees of freedom, data sharing/hiding, & reproducible research. Prerequisites: This is not an introductory course in statistics or programming. We will assume: 1) Familiarity with basic statistics & probability. 2) Ability to do basic data wrangling, analyses, & visualization using R or Python. Strongly recommended MSU courses: CMSE 201 and CMSE 890 Sec 301-or-303 and Sec 302.  Instructor: Arjun Krishnan.

CMSE801 Intro to Computational Modeling
Computational models are very useful tools to understand the world around us. Over the course of this semester, we will explore the tools required to design, build and utilize computational models that are applied to many problems from a wide range of scientific disciplines. The main goal is to learn the skills necessary to apply computational techniques to a dataset of interest. Creating models to describe and understand systems (in the physical, life, or social sciences, or in engineering) is the driving principle of this course.  Next offering: Spring 2019.  Instructor: Jianrong Wang.  Prerequisite: one semester of calculus.

CMSE490 Image Processing
In this course, we intend to develop and explore tools that assist researchers in analyzing their scientific image datasets. To do this, we focus on the computational representation of images and the types and classes of algorithms that have been developed for scientific analysis.  Next offering: to be determined.  Instructor: Dirk Colbry.  Prerequisite: CMSE890:301 or 303 or prior programming experience.

PLB 810 Theories and Practices in Bioinformatics 
Course goals: 1) Identify biological questions that can be addressed using large-scale datasets. 2) Explain how basic bioinformatics tools and their algorithms work and apply them to solve problems. 3) Handle large-scale omics datasets with basic computer programming to address your research questions. 4) Interact with high performance computing resources to accelerate the pace of your computational research.  Next offering: Fall 2019 (fall of even years). Instructor: Shinhan Shiu. Prerequisites: 1) Introductory undergraduate-level statistics courses covering distributions and probability. 2) A biology course covering basic genetics, macromolecules, evolution, energy metabolism, genetic materials, and signal transduction.  Recommended: A genetics course with a genomics emphasis (e.g. MMG433, BMB 961, or PLB 802).  You DO NOT need to have any experience in programming or computer science.

BME891:003 Dynamical Modeling of Biological Systems
"All biology is computational biology", or so claimed the title of a 2017 article in the journal PLOS Biology. Decide for yourself by taking this new course, which will introduce you to fundamental ideas in computational modeling of biological systems and their response to environmental stress. Starting with simple models of gene regulation, we will move on to "systems biology" approaches to model cell signaling networks, signal amplification in protein kinase cascades, and the intricacies of the cell cycle. You will learn how cells in a noisy microenvironment "decide" among multiple possible fates, and why even genetically identical cells exhibit cell-to-cell variability in gene expression. We will also discuss mathematical models of spatial pattern formation in animal and plant systems, including branching morphogenesis, and multicellular "virtual tissue" models. This course, which combines theory with hands-on computer modeling, will be of interest to graduate and advanced undergraduate students in various biomedical science disciplines, as well as engineers, physicists, and computer scientists interested in modeling biological systems.  Next offering: Fall 2019.  Instructor: Sudin Bhattacharya.  Prerequisite: Familiarity with basic concepts in biology and differential equations.