Bioinformatics Program
Summary
Many biologists with traditional undergraduate training have excellent laboratory skills. But in the modern age of Big Data, they increasingly find themselves needing computational skills with which to analyze huge data files in the graduate research laboratory. The CMSE Bioinformatics Program offers modular courses in basic computation and then transitions students so they can take advanced courses in the bioinformatics or computational area that matches their research needs.
Modules Summary
The bioinformatics modules are a set of introductory courses that help life science students learn basic skills in computation and bioinformatics. Each module is worth one graduate credit, runs for one month, and uses a flipped-classroom format (students watch video lectures online for homework and then come to class to solve problems and ask questions). Postdocs, staff, visiting scholars, faculty, and other MSU affiliates who are not students can audit these modules for a fee. Contact the Bioinformatics Program Coordinator at ablackpz@msu.edu to register. Undergraduates at the junior/senior level who wish to take the modules should also contact the Coordinator to discuss their eligibility. Undergraduates who are taking the CMSE minor and wish to take the modules should contact the CMSE Undergraduate Director.
Course Descriptions
CMSE890:301 Programming Foundations
This module is for biologists who are absolute beginners in programming. Using R/RStudio,
students will learn fundamental programming concepts like operators, data structures,
control structures, graphing with basic R and ggplot2, advanced data wrangling, tidying
data, advanced string handling, and producing markdown documents of publishable quality.
All examples use scientific data. There is no prerequisite other than that students have
a basic laptop with administrator privileges and are able to use the Internet and word
processing programs. Instructor: Alexis Black Pyrkosz. Next offering: Spring 2025
and Summer 2025.
CMSE890:302 Statistical Analysis and Visualization of Biological Data
This module is for biologists who need to perform basic statistical analyses on their
data. Using R/RStudio, students will learn standard statistics topics like summary
statistics, common probability distributions and density functions, sampling distributions,
confidence intervals, p-values, hypothesis testing, Type I/II errors, ANOVA, regression,
and PCA. All examples use scientific data. Prerequisite: CMSE890:301 or equivalent
experience in R. Instructor: Alexis Black Pyrkosz. Next offering: Spring 2025 and
Summer 2025.
CMSE890:303 Intro to Data Handling: Unix and Python
This module is for scientists who have huge files of research data that they need
to handle and analyze. Using the MSU HPCC (High Performance Compute Cluster), students
will learn basic Unix navigation, transferring data to HPCCs, command-line text editors,
sed/grep/awk, bash scripting, loading software on HPCCs, submitting jobs to the cluster,
managing cluster resources, writing simple Python scripts, common Python script structures
for data handling, debugging code, using Python modules, and Jupyter notebooks. All
examples use biological data. Students who do not already have an account on the
MSU HPCC will be given one for the duration of the class. No prerequisite, but prior
programming experience or coenrollment in CMSE890:301 is strongly recommended. Instructor:
Alexis Black Pyrkosz. Next offering: Spring 2025.
CMSE890:304 Genomics and Intro to Sequencing Analysis
This module is for biologists who need to get started with genome sequencing for research.
Topics include current sequencing technologies, quality control of raw data, read
mapping, alignment validation, single nucleotide variant calling, structural variants,
and genome assembly. All examples use real sequencing data. Prerequisite: CMSE890:303
or equivalent experience in bash scripting, HPCC usage, and Python. Instructor: Alexis
Black Pyrkosz. Next offering: Spring 2025.
CMSE890:305 RNA-Seq
This module is for biologists who need to get started with RNA-Seq data analysis.
Topics include experimental design considerations, alignment, GO Terms and clustering,
and network analysis. All examples use real sequencing data. Prerequisites: CMSE890:304
or equivalent experience with basic genomics and HPCC usage as well as CMSE890:302
or equivalent statistics programming experience in R. Instructor: Alexis Black Pyrkosz.
Next offering: Spring 2025.
CMSE890:306 Metagenomics
This module is for biologists who need to get started with Metagenomics. Topics include
amplicon sequencing, defining operational taxonomic units, taxonomic classification,
diversity analysis, shotgun metagenome analysis, and metagenome-assembled genomes.
All examples use real sequencing data. Prerequisites: CMSE890:303 and 304 or equivalent
experience with HPCC, Unix, and genomic analyses. Instructor: Alexis Black Pyrkosz.
Next offering: Fall 2025 (tentative).
CMSE890:309 Classical Sequence Analysis
This course is for biologists who need to get started with basic sequence analysis.
Topics include BLAST, hidden Markov models, multiple sequence alignment, phylogenetic
trees, and combining sequence with structure. Prerequisites: CMSE890:301 (required)
and CMSE890:302 (strongly suggested), or equivalent experience with statistics in R. Instructor:
Alexis Black Pyrkosz. Next offering: Fall 2025 (tentative).
CMSE890:310 Gaps in Data Analysis
This course contains topics that do not fit into regular data analysis courses but
are useful for people who will do data analysis professionally. Topics include data
collection bias, short-term data management plans (hardware and software), long-term
data management plans (NIH FAIR and TRUST), legal considerations for sensitive data,
secure computing techniques in industry, and model abuse. Prerequisites: CMSE890:302
or equivalent experience with statistics in R. Instructor: Alexis Black Pyrkosz.
Next offering: Spring 2025.
Other CMSE Bioinformatics Courses
CMSE410/890 Bioinformatics and Computational Biology
This course is an introduction to contemporary topics in bioinformatics and computational
biology, dealing with combining large-scale data and modern analytical techniques
to gain biological/biomedical insights. In each topic, we will raise the major biological
& biomedical questions, explore the relevant molecular/genomics/biomedical datasets,
and discuss the statistical, probabilistic, & machine-learning concepts underlying
the state-of-the-art approaches. Students will learn how to formulate problems for
quantitative inquiry, design computational projects, understand and think critically
about data & methods, communicate research findings, perform reproducible research,
and practice open science. Students will apply all of these by carrying out a project,
presenting it in class, and submitting a report at the end of the course. Next
offering: Spring 2022. Instructor: Arjun Krishnan (arjun@msu.edu). Prerequisites: CMSE 201 or CMSE 301-304 or equivalent with programming experience
and two semesters of introductory biology (LB 144 and 145 OR BS 161 and 162 OR BS
181H and 182H, or equivalent). Statistics at the level of STT 231 is strongly recommended.
Website: https://github.com/
CMSE411/890 Computational Medicine
This course provides a survey of quantitative and computational techniques in contemporary
biomedical research, based on diverse large-scale data. Course components include:
1) Lectures to introduce biomedical questions, critical datasets, and statistical/machine
learning techniques. 2) Real-world case-studies guided by discussions of recent seminal
papers of data-driven biomedical research. 3) Group projects to utilize computational
methodology to answer open biomedical questions. Expected learning outcomes: 1) Identify
the biomedical problems to be addressed using large-scale biological datasets; 2)
Formulate quantitative models; 3) Explain the algorithms behind computational tools
for biomedicine; 4) Understand and practice modern machine learning concepts and methodology;
and 5) Integrate big-data resources to answer medically related questions using custom-written
software. Next offering: TBD. Instructor: Jianrong Wang. Prerequisites: CMSE 201
and two semesters of introductory biology (LB 144 and 145, OR BS 161 and 162, OR BS
181H and 182H, OR equivalent). Statistics at the level of STT 231 is strongly recommended.
Advanced Courses in Bioinformatics/Computational Biology/Computation
CMSE801 Intro to Computational Modeling
Computational models are very useful tools to understand the world around us. Over the course of this semester, we will explore the tools required to design, build and utilize computational models that are applied to many problems from a wide range of scientific disciplines. The main goals of the course are to learn the skills necessary to apply computational techniques to a dataset of interest and create models to describe and understand systems (in the physical, life, or social sciences, or in engineering). Next offering: Fall 2021. Instructor: S. Shiu. Prerequisite: one semester of calculus.
CMSE802 Methods of Computational Modeling
Computational science uses computers to solve problems, simulate phenomena, and create
knowledge. Over the course of this semester, we will explore various aspects of computational
science. We will learn standard modeling methods and tools, as well as programming
(in Python), code-management, and basic data science techniques. These techniques
and skills will be applied to the student's own research. This is a project-focused
course with topics primarily driven by the research interests and needs of the students.
Students will work with the instructors to come up with specific learning goals and
objectives relating to using computational science techniques to solve problems related
to their research area. All students will be expected to present their work to their
peers with a final goal of distilling what they have learned in the form of example
codes and training materials that can be shared with future students. Some results
may be submitted to conferences and journals. This course is taught using a flipped
classroom format similar to CMSE201/801. Class examples and assignments will be done
using the Python programming language. Recommended Prerequisite: CMSE801. Instructor:
H. Yu. Next offering: Fall 2021.
BME891:003 Dynamical Modeling of Biological Systems
"All biology is computational biology", or so claimed the title of a 2017 article
in the journal PLoS Biology. Decide for yourself by taking this new course, which
will introduce you to fundamental ideas in computational modeling of biological systems
and their response to environmental stress. Starting with simple models of gene regulation,
we will move on to "systems biology" approaches to model cell signaling networks,
signal amplification in protein kinase cascades, and the intricacies of the cell cycle.
You will learn how cells in a noisy microenvironment "decide" among multiple possible
fates, and why even genetically identical cells exhibit cell-to-cell variability in
gene expression. We will also discuss mathematical models of spatial pattern formation
in animal and plant systems including branching morphogenesis, and multicellular "virtual
tissue" models. This course, which combines theory with hands-on computer modeling,
will be of interest to graduate and advanced undergraduate students in various biomedical
science disciplines, and engineers, physicists and computer scientists interested
in modeling biological systems. Next offering: Fall 2019. Instructor: Sudin Bhattacharya.
Prerequisite: Familiarity with basic concepts in biology and differential equations.
ANS804 Introduction to Quantitative Genetics
Quantitative trait variation is pervasive in nature; it can be found among individuals
in populations of virtually all life forms. For many quantitative traits, including,
in particular, production in plants and animals and disease risk in humans, genetics
contributes a significant part. Quantitative genetics is the discipline that deals
with how the heritable (genetic) part of quantitative trait variation originates,
changes dynamically, and is passed on to future generations. This is important for genetic
improvement of food animals and crops, development of diagnosis and treatments of
genetic diseases, and understanding evolution.
This course covers the basics of quantitative genetics and is highly recommended for students who seek advanced and/or professional studies in genetics and employment opportunities in the breeding industry. Topics include the life cycle and properties of mutations, population parameters of genetic effects and phenotypes and their properties (means, variances, breeding values, heritability), dynamics of mutations and genetic variation (genetic drift, selection), quantitative genetics in the molecular era (mapping, prediction).
This course covers the general principles of quantitative genetics and should be applicable across species (animals, plants, model organisms, humans, etc.).
Next Offering: Fall 2020 and every Fall thereafter. Instructor: Wen Huang. Course website: https://qgg-lab.github.io/teaching.html