Programming for Genomics, Spring 2014

Instructor: John McCutcheon
Time, place, credits, CRN: Tuesdays and Thursdays 2:10-3:30pm, HS 114, 3 credits, 33403
Required textbook: Practical Computing for Biologists, 2011
Optional textbook: Processing: A Programming Handbook for Visual Designers and Artists, 2007
Suggested text editor: TextWrangler
Suggested supplementary readings for winter break, before class starts:
Software Carpentry's lectures on the UNIX shell
My introduction to UNIX
Software Carpentry's lectures on regular expressions
Course description:
Cheap, easily accessible DNA sequencing has transformed biology. For a couple of thousand dollars, individual research groups can generate more data in a week than the entire Human Genome Project generated in ten years. However, while the generation of these large data sets is routine, the analyses of these data are not. This course is aimed at teaching students the skills required to manage, analyze, and display large genomic data sets.

Most of the course will be taught in the UNIX environment using the programming language Python. Students will write one new program a week on average, using problem areas and data sets from genomics as examples. We will cover the use of regular expressions (pattern matching), data types and structures, program control using logic and loops, reading and writing files, and Python modules for biology. In the last few weeks of the course, we will explore the visualization of large data sets using Processing, a programming language originally designed for visual artists.

This course will be difficult and time-consuming. Learning to express ideas in computer code is often challenging, and almost always takes a significant intellectual investment. The rewards, however, are considerable, and (I believe) reach far beyond the ability to handle modern genomic data.

This course is intended for advanced undergraduates or graduate students. The prerequisite is either completion of BIOB 486 (Genomics) or consent of the instructor. Students already working with genomic data are encouraged to bring these data to the course. Practical Computing for Biologists is the only required text; Processing: A Programming Handbook for Visual Designers and Artists is optional.

The red and yellow image in the header of this page was created in the 2014 version of the class by Kristen Cook, a UM undergraduate, using Python and Processing. She visualized the quality values for an Illumina lane (redder is lower quality, yellower is higher quality). This allowed Kristen to see a strong edge effect in quality scores, represented by a red streak across the top of the image.
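
For a rough sense of the first step in a visualization like this, here is a minimal Python sketch that decodes a FASTQ quality string and maps each score to a color between red and yellow. It is not Kristen's code. The function names, the Phred+33 encoding, the maximum score of 40, and the linear color ramp are all assumptions made for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding; older Illumina files may use an offset of 64.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Return the average Phred score of one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def quality_to_rgb(score, max_score=40):
    """Map a score to a color on a red-to-yellow ramp (hypothetical mapping).

    A score of 0 gives pure red (255, 0, 0); max_score or above gives
    pure yellow (255, 255, 0). Scores outside the range are clamped.
    """
    clamped = max(0, min(score, max_score))
    green = int(255 * clamped / max_score)
    return (255, green, 0)


if __name__ == "__main__":
    read_quality = "!5I"  # made-up quality string, not real data
    for score in phred_scores(read_quality):
        print(score, quality_to_rgb(score))
```

The resulting (red, green, blue) tuples could then be passed to a drawing tool such as Processing, with one colored point per read or per position on the lane.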