Reshaping Data with tidyr
Transform almost any dataset into a tidy format to make analysis easier.
Course Description
Data in the wild can be scary. When confronted with a complicated and messy dataset, you may find yourself wondering: where do I even start? The tidyr package allows you to wrangle such beasts into nice and tidy datasets. Inaccessible values stored in column names will be put into rows, JSON files will become data frames, and missing values will never go missing again. You’ll practice these techniques on a wide range of messy datasets, learning along the way how many dogs the Soviet Union sent into space and which bird is most popular in New Zealand. With the tidyr package in your tidyverse toolkit, you’ll be able to transform almost any dataset into a tidy format, which will pay off during the rest of your analysis.
What You’ll Learn
Tidy Data
You’ll be introduced to the concept of tidy data, which is central to this course. In the first two lessons, you’ll jump straight into the action by separating messy character columns into tidy variables and observations ready for analysis. In the final lesson, you’ll learn how to overwrite and remove missing values.
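As a rough illustration of the ideas in this chapter, the sketch below splits one character column into two variables and then handles the resulting missing values. The data frame `drinks` and its values are made up for this example and are not course material.

```r
library(tidyr)

# Hypothetical data: one character column that hides two variables
drinks <- data.frame(
  drink = c("latte", "tea"),
  size_price = c("large_4.5", NA)
)

# Split "size_price" on the underscore into two columns;
# convert = TRUE turns the price strings into numbers
tidy <- separate(drinks, size_price,
                 into = c("size", "price"),
                 sep = "_", convert = TRUE)

# Overwrite missing values in one column ...
filled <- replace_na(tidy, list(size = "unknown"))

# ... or remove rows that have a missing price
complete_rows <- drop_na(tidy, price)
```

Whether to overwrite or drop missing values depends on the analysis; this sketch only shows that both options exist.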
Expanding Data
Values can often be missing in your data, and sometimes entire observations are absent too. In this chapter, you’ll learn how to complete your dataset with these missing observations. You’ll add observations with zero values to counted data, expand time series to a full sequence of intervals, and more!
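A minimal sketch of what completing a dataset can look like, assuming tidyr's complete() and full_seq() functions. The data frames `counts` and `sales` are invented for illustration.

```r
library(tidyr)

# Hypothetical counted data: the 2020/kea combination was never recorded
counts <- data.frame(
  year = c(2019, 2019, 2020),
  species = c("kiwi", "kea", "kiwi"),
  n = c(3, 1, 5)
)

# Add every year/species combination, filling absent counts with zero
completed <- complete(counts, year, species, fill = list(n = 0))

# Hypothetical time series with a gap at 2019
sales <- data.frame(year = c(2018, 2020, 2021), sold = c(10, 7, 9))

# Expand to the full sequence of years; the new row gets NA for "sold"
full_years <- complete(sales, year = full_seq(year, period = 1))
```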
From Wide to Long and Back
This chapter is all about pivoting data from wide to long format and back again using the pivot_longer() and pivot_wider() functions. You’ll need these functions when variables are hidden in messy column names or when variables are stored in rows instead of columns. You’ll learn about space dogs, nuclear bombs, and planet temperatures along the way.
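The round trip between wide and long format might look roughly like this. The data frame `wide` and its column names are assumptions made for the example.

```r
library(tidyr)

# Hypothetical wide data: the year variable is hidden in the column names
wide <- data.frame(
  site = c("A", "B"),
  y2019 = c(1, 4),
  y2020 = c(2, 5)
)

# Wide to long: move the year out of the column names into its own column
long <- pivot_longer(wide,
                     cols = starts_with("y"),
                     names_to = "year",
                     names_prefix = "y",
                     values_to = "count")

# Long to wide: spread the years back out into one column each
back <- pivot_wider(long,
                    names_from = year,
                    values_from = count,
                    names_prefix = "y")
```

Here `long` has one row per site and year, while `back` has the same columns as the original `wide` data.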
Rectangling Data
In the final chapter, you’ll learn how to turn nested data structures such as JSON and XML files into tidy, rectangular data. This skill will enable you to process data from web APIs. You’ll also learn how nested data structures can be used to write elegant modeling pipelines that produce tidy outputs.
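As a small sketch of rectangling, the example below flattens a list column of the kind you might get after parsing JSON. The nested list is written by hand here instead of being read from a file or API, and all names and values are placeholders.

```r
library(tidyr)

# Hypothetical nested data: each row holds a named list, as parsed JSON often does
missions <- tibble::tibble(
  mission = c("m1", "m2"),
  crew = list(
    list(name = "dog_a", weight_kg = 5),
    list(name = "dog_b", weight_kg = 6)
  )
)

# Turn each element of the nested list into its own column
flat <- unnest_wider(missions, crew)
```

The result is a regular rectangular table with the columns `mission`, `name`, and `weight_kg`.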