Grammar of Data Wrangling

Lecture 5

Dr. Elijah Meyer

Duke University
STA 199 - Summer 2023

2023-09-12

Checklist

– Clone AE-04

– AEs will be graded starting today

– HW-1 is due at 11:59 PM tonight! Note: being unable to render a PDF is not an excuse for late work.

– HW-2 released

– Keep checking Slack

– Lab-1 due Wednesday at 11:59 PM.

How to turn in AEs via GitHub

– Commit and push

  1. Save your qmd (rendering is optional for AEs).

  2. Check the box next to each document in the Git tab (this is called “staging” the changes). Commit the changes you made using a simple and informative message.

  3. Use the green arrow to push your changes to your repo on GitHub.

  4. Check your repo on GitHub and see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!

Wrap up Plotting (for now)

– Let the types of variables dictate the plot

– Informative title

– Axes should be labeled

– Careful consideration of aesthetic choices (like color)
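As a minimal sketch of the checklist above, the plot below uses the `mpg` data set that ships with ggplot2. The data set, the variable choices, and the viridis palette are assumptions made here for illustration, not part of the lecture.

```r
library(ggplot2)

# displ and hwy are both quantitative, so a scatterplot fits the variable types.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(
    title = "Highway mileage vs. engine displacement",  # informative title
    x = "Engine displacement (liters)",                 # labeled axes
    y = "Highway miles per gallon",
    color = "Drive train"
  ) +
  scale_color_viridis_d()  # a colorblind-friendly discrete palette

p
```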

Match the variables to plots

– 1 categorical variable, 1 quantitative variable

– 2 categorical variables

– 2 quantitative variables

geom_histogram

geom_point

geom_bar

geom_boxplot
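One common way to pair these variable types with geoms is sketched below, again using ggplot2's built-in `mpg` data as an assumed example. Other pairings can be reasonable too; note that `geom_histogram` is typically used for a single quantitative variable.

```r
library(ggplot2)

# 1 categorical + 1 quantitative: side-by-side boxplots
p_box <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# 2 categorical: bar chart, with the second variable mapped to fill
p_bar <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()

# 2 quantitative: scatterplot
p_point <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# 1 quantitative on its own: histogram
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```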

Patterns in plots

Scatterplot

  • Strength

  • Direction

  • Linear or non-linear

  • Outliers

  • Correlation: the strength and direction of a linear relationship, ranging from -1 to 1
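To make the correlation idea concrete, here is a small sketch using base R's `cor()` on the built-in `mtcars` data. The data set and the two variables are assumptions chosen for illustration.

```r
# wt (weight) and mpg (miles per gallon) are both quantitative.
r <- cor(mtcars$wt, mtcars$mpg)

# Pearson's r always falls between -1 and 1.
# Its sign gives the direction; its magnitude gives the strength.
r
sign(r)  # -1 here: heavier cars tend to have lower mpg
```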

Histogram

  • Shape

  • Center (ish)

  • Skew

  • Outliers

Boxplot

  • Shape

  • Center (ish)

  • Skew

  • Outliers

Bar Graphs

  • Relative Pattern

dplyr

Sometimes we need to manipulate the data

– To make certain plots

– To create summary statistics

– To model data
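A rough sketch of what this kind of manipulation can look like with dplyr follows. The specific verbs (`filter`, `mutate`, `group_by`, `summarize`), the built-in `mtcars` data, and the column names `kpl`, `mean_mpg`, and `n` are choices made here for illustration, not taken from the lecture.

```r
library(dplyr)

# Keep only some rows and add a derived column,
# e.g. to prepare data for a plot or a model.
heavy_cars <- mtcars |>
  filter(wt > 3) |>
  mutate(kpl = mpg * 0.425)  # approximate kilometers per liter

# Create summary statistics for each group.
mpg_by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg), n = n())

mpg_by_cyl
```

Each verb takes a data frame and returns a new one, so the steps chain together with the pipe.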

Goals for today

– Practice with dplyr functions

ae-04