Open course: Data Analysis

Foundations of Data Analysis

This is a hands-on course with a data lab that teaches fundamental statistical topics such as descriptive statistics, inferential testing, and modeling.

About this Course

In a world that’s full of data, we have many questions: How long do animals in a shelter have to wait until they are adopted? Can we model the growth of internet usage in a country? Do films with a more adult rating make more money than films with other ratings?

Luckily, the world is also full of data to help us answer those questions. This course will walk through the basics of statistical thinking – starting with an interesting question. Then, we’ll learn the correct statistical tool to help answer our question of interest – using R and hands-on Labs. Finally, we’ll learn how to interpret our findings and develop a meaningful conclusion.

This course will consist of instructional videos for statistical concepts broken down into manageable chunks – each followed by some guided questions to help your understanding of the topic. Most weeks, the instructional section will be followed by tutorial videos for using R, which we’ll then apply to a hands-on Lab where we will answer a specific question using real-world datasets.

We’ll cover basic descriptive statistics in our first Unit, learning how to visualize and summarize data. The second Unit will be a modeling investigation in which we’ll learn about linear, exponential, and logistic functions. We’ll learn how to interpret and use those functions with a little bit of pre-calculus (but we’ll keep it very basic). Finally, in the third Unit, we’ll learn about inferential statistical tests such as the t-test, ANOVA, and chi-square.

This course is intended to have the same “punch” as a typical introductory undergraduate statistics course, with an added twist of modeling. It is also intentionally designed to be sequential, with each new piece building on the previous topics. Once they have completed the course, students should feel comfortable using basic statistical techniques to answer their own questions about their own data, using a widely available statistical software package (R).



