JJ Ying

Overview

This activity will focus on doing basic positional and filtering joins using some built-in data sets in R to prepare you to be able to do spatial joins later in the workshop using georeferenced data sources. Data joins commonly fit within a larger work-flow in tidyverse—here are the slides & narrative for review.

Logistics

For this activity, you will need the following packages:

library( knitr )
library( tidyverse )
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.3     ✓ purrr   0.3.4
✓ tibble  3.1.2     ✓ dplyr   1.0.6
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   1.4.0     ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library( kableExtra )

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows
library( nycflights13 )

and for reference purposes, here are links to both the slides and the narrative for this topic.

Activity Questions

  1. As a percentage of total number of flights, are some airlines more prone to canceling flights than others?
flights %>% 
  mutate( no_flight = is.na(air_time) ) %>%
  select( no_flight, carrier ) %>%
  group_by( carrier ) %>%
  summarize( Canceled = sum( no_flight == TRUE ),
             Good = sum( no_flight == FALSE )) %>%
  mutate( `Cancel Rate` = Canceled / Good ) %>%
  left_join( airlines, by="carrier") %>%
  select( Carrier = name,
          `Cancel Rate`) %>%
  arrange( -`Cancel Rate`) %>%
  kable( digits = 3, 
         caption = "The frequency of flight cancelations by airline.") %>%
  kable_styling( full_width=TRUE )
The frequency of flight cancelations by airline.
Carrier Cancel Rate
Mesa Airlines Inc.  0.105
SkyWest Airlines Inc.  0.103
Endeavor Air Inc.  0.067
ExpressJet Airlines Inc.  0.060
Envoy Air 0.054
US Airways Inc.  0.036
AirTran Airways Corporation 0.027
American Airlines Inc.  0.024
Southwest Airlines Co.  0.019
United Air Lines Inc.  0.015
JetBlue Airways 0.011
Delta Air Lines Inc.  0.009
Virgin America 0.009
Alaska Airlines Inc.  0.007
Frontier Airlines Inc.  0.006
Hawaiian Airlines Inc.  0.000
  1. Are you more likely to be delayed if you fly on a big plane or a small plane?
flights %>%
  group_by( tailnum ) %>%
  summarize( AveDelay = mean( arr_delay, na.rm=TRUE) ) %>% 
  left_join( planes, by="tailnum" ) %>%
  select( seats, AveDelay ) %>%
  group_by( seats ) %>%
  summarize( delay = mean( AveDelay, na.rm=TRUE) ) %>%
  mutate( seats = as.numeric( seats ) ) %>% 
  filter( !is.na(delay),
          !is.na(seats) ) -> df 

summary( df )
     seats           delay         
 Min.   :  2.0   Min.   :-10.2500  
 1st Qu.: 19.0   1st Qu.:  0.5252  
 Median :146.0   Median :  4.6666  
 Mean   :156.9   Mean   :  8.6665  
 3rd Qu.:256.2   3rd Qu.: 10.4785  
 Max.   :450.0   Max.   :120.0000  
df %>%
  ggplot( aes(seats, delay) ) + 
    geom_point() + 
    stat_smooth(method="lm", formula="y ~ x")

fit.df <- lm( delay ~ seats, data=df)
summary( fit.df )

Call:
lm(formula = delay ~ seats, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-27.862  -9.235  -2.199   5.170  95.362 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.11540    4.08539   0.028    0.978  
seats        0.05449    0.02029   2.686    0.010 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.74 on 46 degrees of freedom
Multiple R-squared:  0.1356,    Adjusted R-squared:  0.1168 
F-statistic: 7.215 on 1 and 46 DF,  p-value: 0.01002
  1. Is there any correlation between any of the aspects of weather at the airports and departure delays? For this I’m going to assume that a departure delay of over 30 minutes is significant.
flights %>%
  left_join(weather) %>%
  select( -(year:sched_dep_time),
          -(arr_time:time_hour) ) %>%
  filter( !is.na(dep_delay) ) %>%
  gather( feature, value, temp:visib, factor_key = TRUE ) %>%
  mutate( delayed = dep_delay > 30 )  -> df
Joining, by = c("year", "month", "day", "origin", "hour", "time_hour")
df %>%
  group_by( feature ) %>%
  summarize( on_time = mean( value[ df$delayed==FALSE], na.rm=TRUE),
             delayed = mean( value[ df$delayed==TRUE], na.rm=TRUE),
             t_test = t.test(value[ df$delayed==FALSE], 
                             value[ df$delayed==TRUE])$statistic,
             P = t.test(value[ df$delayed==FALSE], 
                             value[ df$delayed==TRUE])$p.value ) %>%
  kable( digits=3,
         caption = "Relationship between environmenal features measured for flights that had ") %>%
  kable_styling( full_width = TRUE )
Relationship between environmenal features measured for flights that had
feature on_time delayed t_test P
temp 56.642 59.272 -28.757 0
dewp 40.807 45.692 -49.755 0
humid 58.359 64.284 -58.578 0
wind_dir 202.420 198.768 7.313 0
wind_speed 10.946 11.734 -28.076 0
wind_gust 25.030 25.756 -11.824 0
precip 0.003 0.010 -29.477 0
pressure 1018.212 1015.851 58.490 0
visib 9.356 8.893 39.489 0
  1. Which carriers have planes that had over 100 flights in 2013?
flights %>%
  filter( !is.na(air_time) ) %>%
  group_by(carrier, tailnum ) %>%
  summarize( flights = length( distance ), .groups="keep" ) %>%
  filter( flights >= 100 ) %>%
  left_join(airlines, by="carrier") %>%
  group_by( carrier ) %>%
  summarize( Planes = length(tailnum) ) %>%
  left_join( airlines, by="carrier" ) %>%
  arrange( -Planes ) %>%
  select( Carrier = name, `Number of Flights` = Planes) %>%
  knitr::kable( caption="Number of planes by carrier that have over 100 flights in 2013.") %>%
  kableExtra::kable_styling(full_width = FALSE)
Number of planes by carrier that have over 100 flights in 2013.
Carrier Number of Flights
United Air Lines Inc.  317
ExpressJet Airlines Inc.  270
Delta Air Lines Inc.  209
JetBlue Airways 187
Envoy Air 80
US Airways Inc.  47
Endeavor Air Inc.  41
American Airlines Inc.  28
Virgin America 26
LS0tCnRpdGxlOiAiRGF0YSBKb2lucyBOYXJyYXRpdmUiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCjxjZW50ZXI+ClshW0pKIFlpbmddKGh0dHBzOi8vdW5zcGxhc2guY29tL3Bob3Rvcy9QRHhZZlhWbEsyTS9kb3dubG9hZD9mb3JjZT10cnVlJnc9NjQwKV0oaHR0cHM6Ly91bnNwbGFzaC5jb20vcGhvdG9zL1BEeFlmWFZsSzJNP3V0bV9zb3VyY2U9dW5zcGxhc2gmdXRtX21lZGl1bT1yZWZlcnJhbCZ1dG1fY29udGVudD1jcmVkaXRTaGFyZUxpbmspCjwvY2VudGVyPgoKIyMgT3ZlcnZpZXcKClRoaXMgYWN0aXZpdHkgd2lsbCBmb2N1cyBvbiBkb2luZyBiYXNpYyBwb3NpdGlvbmFsIGFuZCBmaWx0ZXJpbmcgam9pbnMgdXNpbmcgc29tZSBidWlsdC1pbiBkYXRhIHNldHMgaW4gYFJgIHRvIHByZXBhcmUgeW91IHRvIGJlIGFibGUgdG8gZG8gc3BhdGlhbCBqb2lucyBsYXRlciBpbiB0aGUgd29ya3Nob3AgdXNpbmcgZ2VvcmVmZXJlbmNlZCBkYXRhIHNvdXJjZXMuICBEYXRhIGpvaW5zIGNvbW1vbmx5IGZpdCB3aXRoaW4gYSBsYXJnZXIgd29yay1mbG93IGluIHRpZHl2ZXJzZS0tLWhlcmUgYXJlIHRoZSBbc2xpZGVzXSguLi90aWR5dmVyc2Uvc2xpZGVzLmh0bWwpICZhbXA7IFtuYXJyYXRpdmVdKC4uL3RpZHl2ZXJzZS9uYXJyYXRpdmUubmIuaHRtbCkgZm9yIHJldmlldy4KCiMjIExvZ2lzdGljcwoKRm9yIHRoaXMgYWN0aXZpdHksIHlvdSB3aWxsIG5lZWQgdGhlIGZvbGxvd2luZyBwYWNrYWdlczoKCmBgYHtyfQpsaWJyYXJ5KCBrbml0ciApCmxpYnJhcnkoIHRpZHl2ZXJzZSApCmxpYnJhcnkoIGthYmxlRXh0cmEgKQpsaWJyYXJ5KCBueWNmbGlnaHRzMTMgKQpgYGAKCmFuZCBmb3IgcmVmZXJlbmNlIHB1cnBvc2VzLCBoZXJlIGFyZSBsaW5rcyB0byBib3RoIHRoZSBbc2xpZGVzXShzbGlkZXMuaHRtbCkgYW5kIHRoZSBbbmFycmF0aXZlXShuYXJyYXRpdmUubmIuaHRtbCkgZm9yIHRoaXMgdG9waWMuCgojIyBBY3Rpdml0eSBRdWVzdGlvbnMKCjEuIEFzIGEgcGVyY2VudGFnZSBvZiB0b3RhbCBudW1iZXIgb2YgZmxpZ2h0cywgYXJlIHNvbWUgYWlybGluZXMgbW9yZSBwcm9uZSB0byBjYW5jZWxpbmcgZmxpZ2h0cyB0aGFuIG90aGVycz8KCmBgYHtyfQpmbGlnaHRzICU+JSAKICBtdXRhdGUoIG5vX2ZsaWdodCA9IGlzLm5hKGFpcl90aW1lKSApICU+JQogIHNlbGVjdCggbm9fZmxpZ2h0LCBjYXJyaWVyICkgJT4lCiAgZ3JvdXBfYnkoIGNhcnJpZXIgKSAlPiUKICBzdW1tYXJpemUoIENhbmNlbGVkID0gc3VtKCBub19mbGlnaHQgPT0gVFJVRSApLAogICAgICAgICAgICAgR29vZCA9IHN1bSggbm9fZmxpZ2h0ID09IEZBTFNFICkpICU+JQogIG11dGF0ZSggYENhbmNlbCBSYXRlYCA9IENhbmNlbGVkIC8gR29vZCApICU+JQogIGxlZnRfam9pbiggYWlybGluZXMsIGJ5PSJjYXJyaWVyIikgJT4lCiAgc2VsZWN0KCBDYXJyaWVyID0gbmFtZSwKICAgICAgICAgIGBDYW5jZWwgUmF0ZWApICU+JQogIGFycmFuZ2UoIC1gQ2FuY2VsIFJhdGVgKSAlPiUKICBrYWJsZSggZGlnaXRzID0gMywgCiAgICAgICAgIGNhcHRpb24gPSAiVGhlIGZyZXF1ZW5jeSBvZiBmbGlnaHQgY2FuY2VsYXRpb25zIGJ5IGFpcmxpbmUuIikgJT4lCiAga2FibGVfc3R5bGluZyggZnVsbF93aWR0aD1UUlVFICkKYGBgCgoyLiBBcmUgeW91IG1vcmUgbGlrZWx5IHRvIGJlIGRlbGF5ZWQgaWYgeW91IGZseSBvbiBhIGJpZyBwbGFuZSBvciBhIHNtYWxsIHBsYW5lPwoKCmBgYHtyfQpmbGlnaHRzICU+JQogIGdyb3VwX2J5KCB0YWlsbnVtICkgJT4lCiAgc3VtbWFyaXplKCBBdmVEZWxheSA9IG1lYW4oIGFycl9kZWxheSwgbmEucm09VFJVRSkgKSAlPiUgCiAgbGVmdF9qb2luKCBwbGFuZXMsIGJ5PSJ0YWlsbnVtIiApICU+JQogIHNlbGVjdCggc2VhdHMsIEF2ZURlbGF5ICkgJT4lCiAgZ3JvdXBfYnkoIHNlYXRzICkgJT4lCiAgc3VtbWFyaXplKCBkZWxheSA9IG1lYW4oIEF2ZURlbGF5LCBuYS5ybT1UUlVFKSApICU+JQogIG11dGF0ZSggc2VhdHMgPSBhcy5udW1lcmljKCBzZWF0cyApICkgJT4lIAogIGZpbHRlciggIWlzLm5hKGRlbGF5KSwKICAgICAgICAgICFpcy5uYShzZWF0cykgKSAtPiBkZiAKCnN1bW1hcnkoIGRmICkKYGBgCgpgYGB7cn0KZGYgJT4lCiAgZ2dwbG90KCBhZXMoc2VhdHMsIGRlbGF5KSApICsgCiAgICBnZW9tX3BvaW50KCkgKyAKICAgIHN0YXRfc21vb3RoKG1ldGhvZD0ibG0iLCBmb3JtdWxhPSJ5IH4geCIpCmBgYAoKYGBge3J9CmZpdC5kZiA8LSBsbSggZGVsYXkgfiBzZWF0cywgZGF0YT1kZikKc3VtbWFyeSggZml0LmRmICkKYGBgCgozLiBJcyB0aGVyZSBhbnkgY29ycmVsYXRpb24gYmV0d2VlbiBhbnkgb2YgdGhlIGFzcGVjdHMgb2Ygd2VhdGhlciBhdCB0aGUgYWlycG9ydHMgYW5kIGRlcGFydHVyZSBkZWxheXM/ICBGb3IgdGhpcyBJJ20gZ29pbmcgdG8gYXNzdW1lIHRoYXQgYSBkZXBhcnR1cmUgZGVsYXkgb2Ygb3ZlciAzMCBtaW51dGVzIGlzIHNpZ25pZmljYW50LgoKYGBge3IgY2FjaGU9VFJVRX0KZmxpZ2h0cyAlPiUKICBsZWZ0X2pvaW4od2VhdGhlcikgJT4lCiAgc2VsZWN0KCAtKHllYXI6c2NoZWRfZGVwX3RpbWUpLAogICAgICAgICAgLShhcnJfdGltZTp0aW1lX2hvdXIpICkgJT4lCiAgZmlsdGVyKCAhaXMubmEoZGVwX2RlbGF5KSApICU+JQogIGdhdGhlciggZmVhdHVyZSwgdmFsdWUsIHRlbXA6dmlzaWIsIGZhY3Rvcl9rZXkgPSBUUlVFICkgJT4lCiAgbXV0YXRlKCBkZWxheWVkID0gZGVwX2RlbGF5ID4gMzAgKSAgLT4gZGYKCmRmICU+JQogIGdyb3VwX2J5KCBmZWF0dXJlICkgJT4lCiAgc3VtbWFyaXplKCBvbl90aW1lID0gbWVhbiggdmFsdWVbIGRmJGRlbGF5ZWQ9PUZBTFNFXSwgbmEucm09VFJVRSksCiAgICAgICAgICAgICBkZWxheWVkID0gbWVhbiggdmFsdWVbIGRmJGRlbGF5ZWQ9PVRSVUVdLCBuYS5ybT1UUlVFKSwKICAgICAgICAgICAgIHRfdGVzdCA9IHQudGVzdCh2YWx1ZVsgZGYkZGVsYXllZD09RkFMU0VdLCAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICB2YWx1ZVsgZGYkZGVsYXllZD09VFJVRV0pJHN0YXRpc3RpYywKICAgICAgICAgICAgIFAgPSB0LnRlc3QodmFsdWVbIGRmJGRlbGF5ZWQ9PUZBTFNFXSwgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgdmFsdWVbIGRmJGRlbGF5ZWQ9PVRSVUVdKSRwLnZhbHVlICkgJT4lCiAga2FibGUoIGRpZ2l0cz0zLAogICAgICAgICBjYXB0aW9uID0gIlJlbGF0aW9uc2hpcCBiZXR3ZWVuIGVudmlyb25tZW5hbCBmZWF0dXJlcyBtZWFzdXJlZCBmb3IgZmxpZ2h0cyB0aGF0IGhhZCAiKSAlPiUKICBrYWJsZV9zdHlsaW5nKCBmdWxsX3dpZHRoID0gVFJVRSApCmBgYAoKCjQuIFdoaWNoIGNhcnJpZXJzIGhhdmUgcGxhbmVzIHRoYXQgaGFkIG92ZXIgMTAwIGZsaWdodHMgaW4gMjAxMz8KCmBgYHtyfQpmbGlnaHRzICU+JQogIGZpbHRlciggIWlzLm5hKGFpcl90aW1lKSApICU+JQogIGdyb3VwX2J5KGNhcnJpZXIsIHRhaWxudW0gKSAlPiUKICBzdW1tYXJpemUoIGZsaWdodHMgPSBsZW5ndGgoIGRpc3RhbmNlICksIC5ncm91cHM9ImtlZXAiICkgJT4lCiAgZmlsdGVyKCBmbGlnaHRzID49IDEwMCApICU+JQogIGxlZnRfam9pbihhaXJsaW5lcywgYnk9ImNhcnJpZXIiKSAlPiUKICBncm91cF9ieSggY2FycmllciApICU+JQogIHN1bW1hcml6ZSggUGxhbmVzID0gbGVuZ3RoKHRhaWxudW0pICkgJT4lCiAgbGVmdF9qb2luKCBhaXJsaW5lcywgYnk9ImNhcnJpZXIiICkgJT4lCiAgYXJyYW5nZSggLVBsYW5lcyApICU+JQogIHNlbGVjdCggQ2FycmllciA9IG5hbWUsIGBOdW1iZXIgb2YgRmxpZ2h0c2AgPSBQbGFuZXMpICU+JQogIGtuaXRyOjprYWJsZSggY2FwdGlvbj0iTnVtYmVyIG9mIHBsYW5lcyBieSBjYXJyaWVyIHRoYXQgaGF2ZSBvdmVyIDEwMCBmbGlnaHRzIGluIDIwMTMuIikgJT4lCiAga2FibGVFeHRyYTo6a2FibGVfc3R5bGluZyhmdWxsX3dpZHRoID0gRkFMU0UpCmBgYAoKCg==