Exploring Air Freight: Visualizing the Journey of Airborne Cargo Using R
I have been watching airplanes fly by all of my life. I have always found myself looking at the airplanes and wondering who or what is on those airplanes. I have been ordering stuff online for all of my adult life. I realize that the majority of items I order do not come from nearby and probably have undergone their own journeys to arrive in possession.
As a student in the Masters in Environmental Data Science program at the University of California Santa Barbara, I was given the opportunity to create some data visualizations on any topic of my choosing so I decided I would answer my mostly life long question about airplanes and shipping. I wanted to know where are planes coming from and going to? How much do we actually ship via freight? How has this changed over time? This is a guide on how I created the visuals I used to answer these questions.
Definitions and Background
Background on data
The data comes from the Department of Transportation’s Air Carriers: T-100 International Segment for US carriers only data. This contains information on all international flights to and from the United States, focusing primarily on freight volume (in pounds) and departure/destination details. While the dataset included IATA airport codes (3-letter location identifiers), it lacked geographic coordinates. I added this information by using the airportr package and merging the datasets.
Data loading
Code
# Function for reading in CSVsread <-function(year) { file_name <-paste0("flights", year, ".csv") file_path <-here("blog_posts", "2025-3-14-air-freight-viz", "data", file_name) # Read and filter dataset df <-read.csv(file_path) %>%filter(FREIGHT !=0) %>%clean_names()return(df)}
Reading in the data via the function
Code
# Read in data via functionfreight2024 <-read(2024)freight2023 <-read(2023)freight2022 <-read(2022)freight2021 <-read(2021)freight2020 <-read(2020)freight2019 <-read(2019)freight2018 <-read(2018)freight2017 <-read(2017)freight2016 <-read(2016)freight2015 <-read(2015)freight2014 <-read(2014)freight2013 <-read(2013)freight2012 <-read(2012)freight2011 <-read(2011)freight2010 <-read(2010)freight2009 <-read(2009)freight2008 <-read(2008)freight2007 <-read(2007)freight2006 <-read(2006)freight2005 <-read(2005)freight2004 <-read(2004)
Code
# Create function to join datasetsjoin_freight_data <-function(...) {Reduce(full_join, list(...))}
Code
# Use the function to join all datasetsfreight_all <-join_freight_data( freight2004, freight2005, freight2006, freight2007, freight2008, freight2009, freight2010, freight2011, freight2012, freight2013, freight2014, freight2015, freight2016, freight2017, freight2018, freight2019, freight2020, freight2021, freight2022, freight2023, freight2024)
Code
air_coords <- airports %>%clean_names()
The Visualizations:
The Map
The first visualization I wanted to create was a map showing which airports had the largest amount of freight in pounds traveling between them. While the map adequately visualized the the connections between the airports, it should be noted that it does not accurately show the route the planes took. For example, a plane flying from Anchorage Alaska to South Korea is most likely going to fly over the pacific ocean, not over the Atlantic, Africa and the rest of Asia.
Data cleaning and combining for the map
Code
# Rename origin and dest columnsfreight_map_data <- freight_all %>%rename(iata_origin = origin) %>%rename(iata_dest = dest)
# Duplicate and rename lat longsair_coords_rn$lat_dest = air_coords_rn$latitudeair_coords_rn$long_dest = air_coords_rn$longitude
Code
# Rename original lat longsair_duocoords <- air_coords_rn %>%rename(lat_origin = latitude) %>%rename(long_origin = longitude)
Joining the data
This is where I joined the edited freight data with the airport coordinate data.
Code
# Join freight_all with air_duocoords to get destination coordinatesfreight_coords <- freight_map_data %>%left_join(air_duocoords %>%select(iata_dest, lat_dest, long_dest), by ="iata_dest") %>%# Join again to get origin coordinatesleft_join(air_duocoords %>%select(iata_origin, lat_origin, long_origin), by ="iata_origin")
Code
airport_routes <- freight_coords %>%group_by(iata_origin, iata_dest) %>%mutate(freight_shipped =sum(freight, na.rm =TRUE)) %>%# Sum freight shipped for each routeungroup() %>%# Remove grouping after adding the freight countdistinct(iata_origin, iata_dest, .keep_all =TRUE) %>%# Keep only unique routesarrange(desc(freight_shipped))
Code
# Filtering for the top 20 routestop_20_routes <- airport_routes %>%arrange(desc(freight_shipped)) %>%head(20)
Plotting the map
To create the map, I used ggplot2. I used geom_curve to ensure the lines between the airports were curved and not just straight.
Code
# Define an aviation-themed color paletteaviation_pal <-c("#1E3A8A", "#d2232aff", "#e369a2ff", "#93C5FD", "#6366F1", "#f68b21ff", "#FACC15", "#F97316", "#64748B", "#94A3B8", "#CBD5E1", "#E5E7EB") # Get world map dataworld_map <-map_data("world")# Create ggplot map_pounds <-ggplot() +# Add world map background with aviation themegeom_polygon(data = world_map, aes(x = long, y = lat, group = group),fill ="#CBD5E1", color ="#CBD5E1") +# Light gray-blue for land with slightly darker borders# Add curved lines for routesgeom_curve(data = top_20_routes, aes(x = long_origin, y = lat_origin, xend = long_dest, yend = lat_dest),color ="#3B82F6", # Medium blue for flight pathscurvature =0.2, size = .5, alpha =0.8) +# Add points for origin airportsgeom_point(data = top_20_routes, aes(x = long_origin, y = lat_origin, color ="Origin"), size =1) +# Add points for destination airportsgeom_point(data = top_20_routes, aes(x = long_dest, y = lat_dest, color ="Destination"), size =1) +# Theme settings matching the bump chart styletheme_minimal() +theme(axis.text =element_blank(),axis.ticks =element_blank(),panel.grid =element_blank(),legend.position =c(.48, .013),legend.justification =c("center"),legend.direction ="horizontal",legend.text =element_text(size =10, face ="bold"),plot.title =element_text(size =14, face ="bold"),plot.margin =margin(20, 50, 20, 50) ) +# Add the legend with colors from aviation palettescale_color_manual(values =c("Origin"="#EAB308", # Gold for origin points (resembles takeoff)"Destination"="#F97316"), # Orange for destination points (resembles landing)name =" ") +labs(title ="Top Air-Freight Connections", x ="", y ="")# Store the plot in a variableroute_map <- map_poundsroute_map
I enjoy looking at the map and imagining which planes might be coming from different areas.
The bump chart
The data I am using includes information from 2004 all the way to 2024. For the basic map made above, I used information from all the years. However, where things are coming from and going to has almost certainly changed over the past 20 years. This is where I thought using a bump chart could effectively show how things have shifted over the past several years.
Code
# Get the amount of freight based on the non us country other_countries_col <- freight_all %>%mutate(other_country =ifelse(origin_country_name =="United States", dest_country_name, origin_country_name)) %>%group_by(other_country, year) %>%summarize(total_freight =sum(freight, na.rm =TRUE))
Code
# Rank the number one countries by yearcountry_rank_by_year <- other_countries_col %>%filter(year >=2019) %>%select(year, other_country, total_freight) %>%group_by(year) %>%mutate(rank =row_number(desc(total_freight)) ) %>%ungroup() %>%arrange(rank, year)
The reasoning behind using a bump chart for this visualization is not only because it can display the information accurately, but also because it gives a sense of flow and travel. One of the key themes I want on my visualization is a sense of movement and forward flow.
After creating the bump chart, my next goal was to create a timeline that showed how the amount of freight in pounds has changed over the past 20 years. When I was thinking of how to put all of the elements together, I originally wanted the bump chart and the timeline to focus on the same amount of years. However, when I lengthened the time frame of the bump chart, the information got harder to follow and it lost its use in conveying information. When I tried to shorten the timeline, certain events like the freight recession, became less noticeable. Ultimately, I decided it was better and more interesting if the plots focused on different years.
Code
# Convert year and month into a proper date formatfreight_all <- freight_all %>%mutate(date =make_date(year, month, 1))
Plotting the timeline involved making some of the harder visual decisions. I wanted to show the lines for each year, but I was afraid that would take away from the general cohesiveness of all the elements put together. Ultimately, I think it worked out but it was challenging for me to figure out how I could make it cohesive.
Code
timeline <-ggplot(freight_month, aes(x = date, y = total_freight)) +geom_smooth(color ="#1E3A8A", size =1, span =0.1, se =FALSE) +scale_x_date(date_labels ="%Y", date_breaks ="24 months") +scale_y_continuous(labels = scales::label_number(scale =1e-6, suffix ="M lb", big.mark =",") ) +labs(title ="Pounds of Air-Freight Moved To and From The USA",x ="",y ="") +theme_minimal() +theme(plot.title =element_text(size =12, face ="bold", hjust =0.5), # Match bump chart titleaxis.text.x =element_text(size =10, face ="bold"), axis.text.y =element_text(size =10, face ="bold"), panel.grid.major.y =element_blank(), panel.grid.minor.y =element_blank(),panel.grid.minor.x =element_blank(),plot.margin =margin(20, 50, 20, 50) # Ensure proper spacing ) print(timeline)
The final visuals
Once all those pieces were finished, I used google slides to put to final pieces together.