PH525x series – Biomedical Data Science

The unprecedented advance in digital technology during the second half of the 20th century has produced a measurement revolution that is transforming science. In the life sciences, data analysis is now part of practically every research project. Genomics, in particular, is being driven by new measurement technologies that permit us to observe certain molecular entities for the first time. These observations are leading to discoveries analogous to the identification of microorganisms and the other breakthroughs made possible by the invention of the microscope. Prime examples of these technologies are microarrays and next-generation sequencing.

Scientific fields that have traditionally relied upon simple data analysis techniques have been turned on their heads by these technologies. In the past, for example, researchers would measure the transcription levels of a single gene of interest. Today, it is possible to measure all 20,000+ human genes at once. Advances such as these have brought about a shift from hypothesis-driven to discovery-driven research. However, interpreting the information extracted from these massive and complex datasets requires sophisticated statistical skills, because it is easy to be fooled by patterns that arise by chance. This has greatly elevated the importance of statistics and data analysis in the life sciences.
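
As a rough illustration of how chance alone can produce apparent patterns at this scale, the following Python sketch simulates 20,000 genes with no true difference between two groups and counts how many nonetheless pass a conventional significance threshold. The group size of five, the pooled two-sample t statistic, and the function names are assumptions chosen for illustration; they are not taken from the course material.

```python
import random
import statistics


def t_statistic(a, b):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def count_chance_hits(n_genes=20_000, n_per_group=5, critical=2.306, seed=1):
    """Count genes that look 'significant' when no real difference exists.

    Every measurement is drawn from the same standard normal distribution,
    so any gene that passes the threshold is a false positive.
    The default critical value 2.306 is the two-sided 5% cutoff for a
    t distribution with 8 degrees of freedom (two groups of five).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(t_statistic(group_a, group_b)) > critical:
            hits += 1
    return hits


hits = count_chance_hits()
print(f"{hits} of 20000 genes with no true effect pass the 5% threshold")
```

With a 5% threshold, roughly 5% of these genes (on the order of 1,000) are expected to look significant even though none of them differs between the groups, which is why analyses of genome-wide data need methods that account for the number of comparisons being made.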