Bioinformatics Program

Summary

Many biologists with traditional undergraduate training have excellent laboratory skills.  But in the modern age of Big Data, they increasingly find themselves needing computational skills with which to analyze huge data files in the graduate research laboratory.  The CMSE Bioinformatics Program offers modular courses in basic computation and then transitions students so they can take advanced courses in the bioinformatics or computational area that matches their research needs.

Modules Summary

The bioinformatics modules are a set of introductory courses that help life science students learn basic skills in computation and bioinformatics.  Each module is worth 1 graduate credit, runs for one month, and uses a flipped-classroom format (students watch video lectures online for homework and then come to class to solve problems and ask questions).  Postdocs, staff, visiting scholars, faculty, and other MSU affiliates who are not students can audit these modules for a fee.  Contact the Bioinformatics Program Coordinator at ablackpz@msu.edu to register.  Undergraduates at the junior/senior level who wish to take the modules should also contact the Coordinator to discuss their eligibility.  Undergraduates who are taking the CMSE minor and wish to take the modules should contact the CMSE Undergraduate Director.

Course Descriptions

CMSE890:301 Programming Foundations
This module is for biologists who are absolute beginners in programming.  Using R/RStudio, students will learn fundamental programming concepts like operators, data structures, control structures, graphing with basic R and ggplot2, advanced data wrangling, tidying data, advanced string handling, and producing markdown documents of publishable quality.  All examples use scientific data.  There is no prerequisite, but students must have a basic laptop with administrator privileges and be able to use the Internet and word processing programs.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024, Summer 2024, and Fall 2024.

CMSE890:302 Statistical Analysis and Visualization of Biological Data
This module is for biologists who need to perform basic statistical analyses on their data.  Using R/RStudio, students will learn standard statistics topics like summary statistics, common probability distributions and density functions, sampling distributions, confidence intervals, p-values, hypothesis testing, Type I/II errors, ANOVA, regression, and PCA.  All examples use scientific data.  Prerequisite: CMSE890:301 or equivalent experience in R.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024, Summer 2024, and Fall 2024.

CMSE890:303 Intro to Data Handling: Unix and Python
This module is for scientists who have huge files of research data that they need to handle and analyze.  Using the MSU HPCC (High Performance Compute Cluster), students will learn basic Unix navigation, transferring data to HPCCs, command-line text editors, sed/grep/awk, bash scripting, loading software on HPCCs, submitting jobs to the cluster, managing cluster resources, writing simple Python scripts, common Python script structures for data handling, debugging code, using Python modules, and Jupyter notebooks.  All examples use biological data.  Students who do not already have an account on the MSU HPCC will be given one for the duration of the class.  No prerequisite, but prior programming experience or co-enrollment in CMSE890:301 is strongly recommended.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024 and Fall 2024.

CMSE890:304 Genomics and Intro to Sequencing Analysis
This module is for biologists who need to get started with genome sequencing for research.  Topics include current sequencing technologies, quality control of raw data, read mapping, alignment validation, single nucleotide variant calling, structural variants, and genome assembly.  All examples use real sequencing data.  Prerequisite: CMSE890:303 or equivalent experience in bash scripting, HPCC usage, and Python.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024 and Fall 2024.

CMSE890:305 RNA-Seq
This module is for biologists who need to get started with RNA-Seq data analysis.  Topics include experimental design considerations, alignment, GO Terms and clustering, and network analysis.  All examples use real sequencing data.  Prerequisites: CMSE890:304 or equivalent experience with basic genomics and HPCC usage as well as CMSE890:302 or equivalent statistics programming experience in R.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024 (will not be offered in Fall 2024).

CMSE890:306 Metagenomics
This module is for biologists who need to get started with metagenomics.  Topics include amplicon sequencing, defining operational taxonomic units, taxonomic classification, diversity analysis, shotgun metagenome analysis, targeted gene assembly, and metagenome-assembled genomes.  All examples use real sequencing data.  Prerequisites: CMSE890:303 and 304 or equivalent experience with HPCC, Unix, and genomic analyses.  Instructor: Alexis Black Pyrkosz with materials provided by Ashley Shade.  Next offering: Fall 2024.

CMSE890:309 Classical Sequence Analysis
This course is for biologists who need to get started with basic sequence analysis.  Topics include BLAST, Hidden Markov models, multiple sequence alignment, phylogenetic trees, and combining sequence with structure.  Prerequisites: CMSE890:301 is required; CMSE890:302, or equivalent experience with statistics in R, is strongly suggested.  Instructor: Alexis Black Pyrkosz.  Next offering: Fall 2024.

CMSE890:310 Gaps in Data Analysis
This course contains topics that do not fit into regular data analysis courses but are useful for people who will do data analysis professionally. Topics include data collection bias, short-term data management plans (hardware and software), long-term data management plans (NIH FAIR and TRUST), legal considerations for sensitive data, secure computing techniques in industry, and model abuse.  Prerequisites: CMSE890:302 or equivalent experience with statistics in R.  Instructor: Alexis Black Pyrkosz. Next offering: Spring 2024 (will not be offered in Fall 2024).

Other CMSE Bioinformatics Courses

CMSE410/890 Bioinformatics and Computational Biology
This course is an introduction to contemporary topics in bioinformatics and computational biology, dealing with combining large-scale data and modern analytical techniques to gain biological/biomedical insights. In each topic, we will raise the major biological & biomedical questions, explore the relevant molecular/genomics/biomedical datasets, and discuss the statistical, probabilistic, & machine-learning concepts underlying the state-of-the-art approaches. Students will learn how to formulate problems for quantitative inquiry, design computational projects, understand and think critically about data & methods, communicate research findings, perform reproducible research, and practice open science. Students will apply all of these by carrying out a project, presenting it in class, and submitting a report at the end of the course. Next offering: Spring 2022. Instructor: Arjun Krishnan (arjun@msu.edu). Prerequisites: CMSE 201 or CMSE 301-304 or equivalent with programming experience and two semesters of introductory biology (LB 144 and 145 OR BS 161 and 162 OR BS 181H and 182H, or equivalent). Statistics at the level of STT 231 is strongly recommended. Website: https://github.com/krishnanlab/teaching/tree/master/2019-spring_compbio

CMSE411/890 Computational Medicine
This course provides a survey of quantitative and computational techniques in contemporary biomedical research, based on diverse large-scale data. Course components include: 1) Lectures to introduce biomedical questions, critical datasets, and statistical/machine learning techniques. 2) Real-world case-studies guided by discussions of recent seminal papers of data-driven biomedical research. 3) Group projects to utilize computational methodology to answer open biomedical questions.  Expected learning outcomes: 1) Identify the biomedical problems to be addressed using large-scale biological datasets; 2) Formulate quantitative models; 3) Explain the algorithms behind computational tools for biomedicine; 4) Understand and practice modern machine learning concepts and methodology; and 5) Integrate big-data resources to answer medically related questions using custom-written software.  Next offering: TBD.  Instructor: Jianrong Wang.  Prerequisites: CMSE 201 and two semesters of introductory biology (LB 144 and 145, OR BS 161 and 162, OR BS 181H and 182H, OR equivalent). Statistics at the level of STT 231 is strongly recommended.

Advanced Courses in Bioinformatics/Computational Biology/Computation

CMSE801 Intro to Computational Modeling
Computational models are very useful tools to understand the world around us. Over the course of this semester, we will explore the tools required to design, build and utilize computational models that are applied to many problems from a wide range of scientific disciplines. The main goals of the course are to learn the skills necessary to apply computational techniques to a dataset of interest and create models to describe and understand systems (in the physical, life, or social sciences, or in engineering).  Next offering: Fall 2021.  Instructor: S. Shiu.  Prerequisite: one semester of calculus.

CMSE802 Methods of Computational Modeling
Computational science uses computers to solve problems, simulate phenomena, and create knowledge.  Over the course of this semester, we will explore various aspects of computational science.  We will learn standard modeling methods and tools, as well as programming (in Python), code-management, and basic data science techniques.  These techniques and skills will be applied to the student's own research.  This is a project-focused course with topics primarily driven by the research interests and needs of the students.  Students will work with the instructors to come up with specific learning goals and objectives relating to using computational science techniques to solve problems related to their research area.  All students will be expected to present their work to their peers with a final goal of distilling what they have learned in the form of example codes and training materials that can be shared with future students.  Some results may be submitted to conferences and journals.  This course is taught using a flipped classroom format similar to CMSE201/801.  Class examples and assignments will be done using the Python programming language.  Recommended Prerequisite: CMSE801.  Instructor: H. Yu.  Next offering: Fall 2021.

BME891:003 Dynamical Modeling of Biological Systems
"All biology is computational biology", or so claimed the title of a 2017 article in the journal PLOS Biology. Decide for yourself by taking this new course, which will introduce you to fundamental ideas in computational modeling of biological systems and their response to environmental stress. Starting with simple models of gene regulation, we will move on to "systems biology" approaches to model cell signaling networks, signal amplification in protein kinase cascades, and the intricacies of the cell cycle. You will learn how cells in a noisy microenvironment "decide" among multiple possible fates, and why even genetically identical cells exhibit cell-to-cell variability in gene expression. We will also discuss mathematical models of spatial pattern formation in animal and plant systems, including branching morphogenesis, and multicellular "virtual tissue" models. This course, which combines theory with hands-on computer modeling, will be of interest to graduate and advanced undergraduate students in various biomedical science disciplines, as well as engineers, physicists, and computer scientists interested in modeling biological systems.  Next offering: Fall 2019.  Instructor: Sudin Bhattacharya.  Prerequisite: Familiarity with basic concepts in biology and differential equations.

ANS804 Introduction to Quantitative Genetics
Quantitative trait variation is pervasive in nature; it can be found among individuals in populations of virtually all life forms. For many quantitative traits, particularly production traits of plants and animals and disease risk in humans, genetics contributes a significant part of the variation. Quantitative genetics is the discipline that deals with how the heritable (genetic) part of quantitative trait variation originates, dynamically changes, and passes on to future generations. This is important for genetic improvement of food animals and crops, development of diagnostics and treatments for genetic diseases, and understanding evolution.

This course covers the basics of quantitative genetics and is highly recommended for students who seek advanced and/or professional studies in genetics and employment opportunities in the breeding industry. Topics include the life cycle and properties of mutations, population parameters of genetic effects and phenotypes and their properties (means, variances, breeding values, heritability), dynamics of mutations and genetic variation (genetic drift, selection), and quantitative genetics in the molecular era (mapping, prediction).

This course covers the general principles of quantitative genetics, which should be applicable across species (animals, plants, model organisms, humans, etc.).

Next Offering: Fall 2020 and every Fall thereafter. Instructor: Wen Huang. Course website: https://qgg-lab.github.io/teaching.html