5. Data Frames

Data frames are the premier data structure for working with tabular data. In Julia, the DataFrames.jl package provides data frames and functions to work with them. Besides reading this chapter, you can learn more about how to use the package from the official documentation and cheat sheet.

This chapter uses the Airline On-Time Performance Data Set introduced in Section 4.

5.1. Inspecting

When you load a data set, it’s a good idea to inspect it to make sure it was loaded correctly and contains the data you expect. Julia and DataFrames.jl provide several functions that are helpful for inspecting data frames:

  • describe to get a summary

  • first, last to get the first or last n rows

  • nrow, ncol, size, ndims to get dimension information

  • names to get column names

  • typeof, eltype to get types
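As a minimal sketch of the functions listed above that are not demonstrated elsewhere in this section, the following uses a small, hypothetical `flights` data frame (its columns and values are made up for illustration and merely resemble the air data):

```julia
using DataFrames

# Hypothetical toy data standing in for the much larger air data frame.
flights = DataFrame(
    Year = [2023, 2023, 2023, 2023],
    DayofMonth = [2, 3, 14, 21],
    Origin = ["BDL", "BDL", "LGA", "LGA"],
    DepDelay = [-3.0, missing, 35.0, 6.0],
)

n_rows = nrow(flights)       # number of rows
n_cols = ncol(flights)       # number of columns
dims = size(flights)         # (rows, columns) as a tuple
n_dims = ndims(flights)      # a data frame always has 2 dimensions
tail = last(flights, 2)      # the last 2 rows, as a data frame
summary_df = describe(flights)  # one summary row per column
```

By default, `describe` returns a data frame with one row per column, including the number of missing values and the element type of each column.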

Let’s take a look at the first 5 rows of the air data to refresh our memory:

first(air, 5)
5×110 DataFrame (first 7 of 110 columns shown)
 Row │ Year   Quarter  Month  DayofMonth  DayOfWeek  FlightDate  Reporting_Airline
     │ Int64  Int64    Int64  Int64       Int64      Date        String3
─────┼─────────────────────────────────────────────────────────────────────────────
   1 │  2023        1      1           2          1  2023-01-02  9E
   2 │  2023        1      1           3          2  2023-01-03  9E
   3 │  2023        1      1           4          3  2023-01-04  9E
   4 │  2023        1      1           5          4  2023-01-05  9E
   5 │  2023        1      1           6          5  2023-01-06  9E

The size function returns the number of rows and columns in a data frame (you can use nrow and ncol to get these individually):

size(air)
(538837, 110)

The names function returns the names of a data frame’s columns:

names(air)
110-element Vector{String}:
 "Year"
 "Quarter"
 "Month"
 "DayofMonth"
 "DayOfWeek"
 "FlightDate"
 "Reporting_Airline"
 "DOT_ID_Reporting_Airline"
 "IATA_CODE_Reporting_Airline"
 "Tail_Number"
 "Flight_Number_Reporting_Airline"
 "OriginAirportID"
 "OriginAirportSeqID"
 ⋮
 "Div4LongestGTime"
 "Div4WheelsOff"
 "Div4TailNum"
 "Div5Airport"
 "Div5AirportID"
 "Div5AirportSeqID"
 "Div5WheelsOn"
 "Div5TotalGTime"
 "Div5LongestGTime"
 "Div5WheelsOff"
 "Div5TailNum"
 "Column110"

Tip

By default, Julia tries to make printed output fit on the screen. This is unhelpful when you want to see all of a particular data structure. You can use the print function to make Julia show everything. For instance, try this code:

print(names(air))
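Displayed data frames are truncated in the same way. As a hedged sketch, the `show` method in DataFrames.jl accepts `allrows` and `allcols` keyword arguments to turn the truncation off. The data frame `wide` below is hypothetical and exists only to have more columns than fit on a screen:

```julia
using DataFrames

# Hypothetical data frame with 200 one-element columns named col1 ... col200.
wide = DataFrame([Symbol("col", i) => [i] for i in 1:200])

# Print every column instead of truncating to the screen width.
show(wide, allcols = true)
```

Use `allrows = true` in the same way to print every row. For a data frame as large as the air data, printing everything can produce a very large amount of output.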

One way to characterize a data frame is by the types of elements in its columns. In a Julia data frame, columns are generally Vectors, and the eltype function gets the element type(s) of a Vector. To get the element types for all columns, use the eachcol function to get an iterator over the columns, and then broadcast eltype over the iterator:

eltype.(eachcol(air))
110-element Vector{Type}:
 Int64
 Int64
 Int64
 Int64
 Int64
 Dates.Date
 String3
 Int64
 String3
 String7
 Int64
 Int64
 Int64
 ⋮
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing
 Missing

We can make this result easier to read by putting it in a data frame with the column names (and possibly other summary information). The constructor function DataFrame makes a new data frame:

air_types = DataFrame(name = names(air), type = eltype.(eachcol(air)))
air_types
110×2 DataFrame (85 rows omitted)
 Row │ name                             type
     │ String                           Type
─────┼──────────────────────────────────────────
   1 │ Year                             Int64
   2 │ Quarter                          Int64
   3 │ Month                            Int64
   4 │ DayofMonth                       Int64
   5 │ DayOfWeek                        Int64
   6 │ FlightDate                       Date
   7 │ Reporting_Airline                String3
   8 │ DOT_ID_Reporting_Airline         Int64
   9 │ IATA_CODE_Reporting_Airline      String3
  10 │ Tail_Number                      String7
  11 │ Flight_Number_Reporting_Airline  Int64
  12 │ OriginAirportID                  Int64
  13 │ OriginAirportSeqID               Int64
  ⋮  │                ⋮                    ⋮
  99 │ Div4LongestGTime                 Missing
 100 │ Div4WheelsOff                    Missing
 101 │ Div4TailNum                      Missing
 102 │ Div5Airport                      Missing
 103 │ Div5AirportID                    Missing
 104 │ Div5AirportSeqID                 Missing
 105 │ Div5WheelsOn                     Missing
 106 │ Div5TotalGTime                   Missing
 107 │ Div5LongestGTime                 Missing
 108 │ Div5WheelsOff                    Missing
 109 │ Div5TailNum                      Missing
 110 │ Column110                        Missing
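A type summary like this makes it easy to spot columns that contain nothing but missing values. The sketch below repeats the idea on a small, hypothetical `flights` data frame. It also uses the two-argument form `names(df, T)`, which returns the names of columns whose element type is a subtype of `T`:

```julia
using DataFrames

# Hypothetical toy data: Div5Airport holds only missing values.
flights = DataFrame(
    Year = [2023, 2023],
    Origin = ["BDL", "LGA"],
    DepDelay = [-3.0, missing],
    Div5Airport = [missing, missing],
)

# One row per column: its name and element type.
flight_types = DataFrame(name = names(flights), type = eltype.(eachcol(flights)))

# Columns whose element type is exactly Missing, so they carry no data.
empty_cols = names(flights, Missing)

# Columns that allow missing values (these may still hold data).
maybe_missing = filter(n -> Missing <: eltype(flights[!, n]), names(flights))
```

Here `empty_cols` contains only `"Div5Airport"`, while `maybe_missing` also includes `"DepDelay"`, whose element type is `Union{Missing, Float64}`.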

5.2. Indexing

Data frames use square brackets [ ] for indexing (like most other data structures in Julia). Since data frames are two-dimensional, two indexes are required. The following subsections describe different kinds of indexes you can use, as well as some other ways to get data out of a data frame.

5.2.1. By Position

You can use integer arguments to select elements by position. For example, to extract the value in row 2, column 1 of the air data frame:

air[2, 1]
2023

The first argument is the row index, while the second is the column index.

As with other data structures, you can also use an array of indexes to select multiple values. For instance, to get rows 1, 3, and 1 again from column 5:

air[[1, 3, 1], 5]

You can also use a slice to select a range of values. For instance, to select the values in the first 3 rows, column 5:

air[1:3, 5]
3-element Vector{Int64}:
 1
 2
 3

Tip

You can use the end keyword in a slice to mean the last element. The end keyword can be combined with arithmetic operators. For example, to get the last 2 rows, column 5:

air[end-1:end, 5]
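The positional forms described so far can be tried side by side on a small data frame. This is a sketch on hypothetical data (the `flights` data frame and its values are made up), not on the air data itself:

```julia
using DataFrames

# Hypothetical toy data with 5 rows and 2 columns.
flights = DataFrame(
    DayOfWeek = [1, 2, 3, 4, 5],
    Origin = ["BDL", "BDL", "LGA", "LGA", "DLH"],
)

single = flights[2, 1]            # one value: row 2, column 1
picked = flights[[1, 3, 1], 1]    # rows 1, 3, and 1 again; a Vector
ranged = flights[1:3, 1]          # rows 1 through 3; a Vector
tail = flights[end-1:end, 1]      # the last two rows; a Vector
corner = flights[end, end]        # last row, last column
block = flights[1:2, 1:2]         # several rows and columns; a DataFrame
```

Selecting a single column yields a `Vector`, while selecting several columns at once yields a new data frame.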

In DataFrames.jl, there are two different ways to indicate that you want all of the elements along a dimension. A : selects all elements and returns a copy, while a !, which is only allowed as the row index, selects all rows without copying. For a single column, ! gives you the column vector stored in the data frame itself (a reference rather than a copy). It’s generally safer to use a copy (especially if you’re going to modify the data), but more CPU- and memory-efficient to skip the copy. Here are examples of both:

year = air[!, 1]
year_copy = air[:, 1]
538837-element Vector{Int64}:
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
    ⋮
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023
 2023

Caution

The ! index is only valid in the row position, and it always selects every row. To refer to a subset of rows without copying them, use a view instead.
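To make the difference between `:` and `!` concrete, here is a minimal sketch on a hypothetical `flights` data frame. The identity operator `===` checks whether two names refer to the same object in memory:

```julia
using DataFrames

# Hypothetical toy data.
flights = DataFrame(Year = [2023, 2023, 2023], DayofMonth = [2, 3, 4])

year_ref = flights[!, 1]     # the stored column itself, no copy
year_copy = flights[:, 1]    # a newly allocated copy

same_object = year_ref === flights.Year     # true
copy_is_same = year_copy === flights.Year   # false

year_copy[1] = 1999    # changes only the copy
year_ref[2] = 2000     # changes the data frame as well
```

After these assignments, `flights.Year` is `[2023, 2000, 2023]`: the write through `year_ref` is visible in the data frame, while the write to `year_copy` is not.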

Tip

More generally, you can use the @view macro to get a view based on indexing even when you don’t want all elements along an axis. For example:

@view air[1, 1]

You can combine indexing with assignment (=) to reassign specific elements of a data frame. Note that if you reassign elements of a view, the elements will also change in the original data frame.
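The following sketch shows both kinds of assignment on a hypothetical `flights` data frame: direct assignment to elements of the data frame, and assignment through a view created with `@view`:

```julia
using DataFrames

# Hypothetical toy data.
flights = DataFrame(
    DayofMonth = [2, 3, 14, 21],
    DepDelay = [-3.0, -5.0, 35.0, 6.0],
)

flights[1, :DepDelay] = 0.0            # reassign a single element
flights[2:3, :DayofMonth] = [30, 31]   # reassign several elements of one column

# A view of the rows with a positive delay (rows 3 and 4 of flights).
late = @view flights[flights.DepDelay .> 0, :]

# Row 1 of the view is row 3 of flights, so this writes through to flights.
late[1, :DepDelay] = 99.0
```

Afterwards `flights.DepDelay` is `[0.0, -5.0, 99.0, 6.0]`, even though the last assignment was made through `late`.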

5.2.2. By Name

You can use String arguments to select elements by name. For instance, to select row 1 of the Year column:

air[1, "Year"]
2023

You can also use Symbol arguments to select elements by name. In Julia, you can write a literal Symbol by putting a colon : in front of text. For instance, to select row 1 of the Year column:

air[1, :Year]
2023

Indexing with Symbols is generally a little faster than indexing with Strings, so use Symbols when possible.

As with positional indexes, you can use arrays of indexes to select multiple elements. For example:

air[1:3, [:Year, :Month, :DayofMonth]]
3×3 DataFrame
 Row │ Year   Month  DayofMonth
     │ Int64  Int64  Int64
─────┼──────────────────────────
   1 │  2023      1           2
   2 │  2023      1           3
   3 │  2023      1           4
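Beyond arrays of names, DataFrames.jl accepts a few other column selectors. The sketch below, on a hypothetical `flights` data frame, shows that Strings and Symbols are interchangeable, and also tries the `Not` selector and a regular expression. These two selectors go slightly beyond what this section covers, so treat them as optional extras:

```julia
using DataFrames

# Hypothetical toy data.
flights = DataFrame(
    Year = [2023, 2023],
    Month = [1, 1],
    DayofMonth = [2, 3],
    Origin = ["BDL", "LGA"],
)

same_value = flights[1, "Year"] == flights[1, :Year]   # both name the same column

by_strings = flights[:, ["Year", "Month"]]   # an array of Strings works too
without_origin = flights[:, Not(:Origin)]    # every column except Origin
month_like = flights[:, r"Month"]            # columns whose names match the regex
```

The regular expression `r"Month"` matches both `Month` and `DayofMonth`, so `month_like` has two columns.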

Selection by name is primarily used for columns, since rows usually don’t have names. If you want to select an entire column, there are two more ways to do it besides [ ]: property access (.NAME) and the select function. As an example, here are three ways to select the entire DayofMonth column:

air[:, "DayofMonth"]
air.DayofMonth
select(air, :DayofMonth)
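538837×1 DataFrame (538812 rows omitted)
    Row │ DayofMonth
        │ Int64
────────┼────────────
      1 │          2
      2 │          3
      3 │          4
      4 │          5
      5 │          6
      6 │          7
      7 │         14
      8 │         21
      9 │         28
     10 │          9
     11 │         10
     12 │         11
     13 │         12
   ⋮    │     ⋮
 538826 │          2
 538827 │          2
 538828 │          2
 538829 │          2
 538830 │          2
 538831 │          2
 538832 │          2
 538833 │          2
 538834 │          2
 538835 │          2
 538836 │          2
 538837 │          2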

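Note that the first two return Vectors, while select returns a one-column data frame. Also, air.DayofMonth returns the stored column without copying (like air[!, :DayofMonth]), whereas air[:, "DayofMonth"] returns a copy.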
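A short sketch on a hypothetical `flights` data frame makes the return types of the three approaches explicit:

```julia
using DataFrames

# Hypothetical toy data.
flights = DataFrame(DayofMonth = [2, 3, 4], Origin = ["BDL", "BDL", "LGA"])

by_index = flights[:, "DayofMonth"]        # a Vector (copied)
by_property = flights.DayofMonth           # a Vector (the stored column, not copied)
by_select = select(flights, :DayofMonth)   # a DataFrame with one column

index_is_vector = by_index isa Vector{Int}                    # true
property_is_stored = by_property === flights[!, :DayofMonth]  # true
select_is_frame = by_select isa DataFrame                     # true
```

Which form to use depends on what comes next: a `Vector` is convenient for computation on a single column, while a data frame keeps the column name attached.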

5.2.3. By Condition

The indexing operator [ ] also accepts arrays of Boolean values, to facilitate getting elements based on a condition. For example, suppose you want to get all rows where DayofMonth is less than 15. You can test for these rows with this condition:

air.DayofMonth .< 15
538837-element BitVector:
 1
 1
 1
 1
 1
 1
 1
 0
 0
 1
 1
 1
 1
 ⋮
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1

And the code to actually get the rows is:

air[air.DayofMonth .< 15, :]
240261×110 DataFrame (first 7 of 110 columns shown; 240236 rows omitted)
    Row │ Year   Quarter  Month  DayofMonth  DayOfWeek  FlightDate  Reporting_Airline
        │ Int64  Int64    Int64  Int64       Int64      Date        String3
────────┼─────────────────────────────────────────────────────────────────────────────
      1 │  2023        1      1           2          1  2023-01-02  9E
      2 │  2023        1      1           3          2  2023-01-03  9E
      3 │  2023        1      1           4          3  2023-01-04  9E
      4 │  2023        1      1           5          4  2023-01-05  9E
      5 │  2023        1      1           6          5  2023-01-06  9E
      6 │  2023        1      1           7          6  2023-01-07  9E
      7 │  2023        1      1          14          6  2023-01-14  9E
      8 │  2023        1      1           9          1  2023-01-09  9E
      9 │  2023        1      1          10          2  2023-01-10  9E
     10 │  2023        1      1          11          3  2023-01-11  9E
     11 │  2023        1      1          12          4  2023-01-12  9E
     12 │  2023        1      1          13          5  2023-01-13  9E
     13 │  2023        1      1           1          7  2023-01-01  9E
   ⋮    │   ⋮       ⋮       ⋮        ⋮           ⋮          ⋮               ⋮
 240250 │  2023        1      1           2          1  2023-01-02  UA
 240251 │  2023        1      1           2          1  2023-01-02  UA
 240252 │  2023        1      1           2          1  2023-01-02  UA
 240253 │  2023        1      1           2          1  2023-01-02  UA
 240254 │  2023        1      1           2          1  2023-01-02  UA
 240255 │  2023        1      1           2          1  2023-01-02  UA
 240256 │  2023        1      1           2          1  2023-01-02  UA
 240257 │  2023        1      1           2          1  2023-01-02  UA
 240258 │  2023        1      1           2          1  2023-01-02  UA
 240259 │  2023        1      1           2          1  2023-01-02  UA
 240260 │  2023        1      1           2          1  2023-01-02  UA
 240261 │  2023        1      1           2          1  2023-01-02  UA
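Conditions can be combined with the broadcast operators `.&` (and) and `.|` (or). One thing to watch for is missing values: comparing `missing` with a number yields `missing`, which is not a valid Boolean index. The sketch below uses a hypothetical `flights` data frame and handles this by replacing `missing` with `false` through `coalesce`, which is one possible approach among several:

```julia
using DataFrames

# Hypothetical toy data with one missing delay.
flights = DataFrame(
    DayofMonth = [2, 14, 21, 9],
    Origin = ["BDL", "LGA", "LGA", "DLH"],
    DepDelay = [-3.0, missing, 6.0, 35.0],
)

early = flights.DayofMonth .< 15       # BitVector with one entry per row
from_lga = flights.Origin .== "LGA"

both = flights[early .& from_lga, :]            # rows meeting both conditions
either = flights[early .| from_lga, [:Origin]]  # either condition, Origin column only

# Replace missing comparison results with false before indexing.
delayed = coalesce.(flights.DepDelay .> 0, false)
late = flights[delayed, :]
```

Here `both` has a single row (day 14 from LGA), `either` has all four rows, and `late` keeps the two rows with a known positive delay.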

DataFrames.jl also provides a subset function as an alternative way to get subsets. The first argument to the subset function is the data frame, while the second argument is a Pair that describes a condition. In Julia, a Pair is a helper data structure that pairs two pieces of information, and can be created with the => operator. In this case, the Pair should pair column name(s) with a test function to apply to the column(s). The test function receives each whole column as an array, unless you wrap it in ByRow, which applies it to one element at a time. For example, the anonymous function x -> x .< 15 tests whether the elements of an array are less than 15, so you can get all rows where DayofMonth is less than 15 with this code:

subset(air, :DayofMonth => x -> x .< 15)
# Or: subset(air, :DayofMonth => ByRow(x -> x < 15))
240261×110 DataFrame (first 7 of 110 columns shown; 240236 rows omitted)
    Row │ Year   Quarter  Month  DayofMonth  DayOfWeek  FlightDate  Reporting_Airline
        │ Int64  Int64    Int64  Int64       Int64      Date        String3
────────┼─────────────────────────────────────────────────────────────────────────────
      1 │  2023        1      1           2          1  2023-01-02  9E
      2 │  2023        1      1           3          2  2023-01-03  9E
      3 │  2023        1      1           4          3  2023-01-04  9E
      4 │  2023        1      1           5          4  2023-01-05  9E
      5 │  2023        1      1           6          5  2023-01-06  9E
      6 │  2023        1      1           7          6  2023-01-07  9E
      7 │  2023        1      1          14          6  2023-01-14  9E
      8 │  2023        1      1           9          1  2023-01-09  9E
      9 │  2023        1      1          10          2  2023-01-10  9E
102023111132023-01-119E203639EN135EV462912953129530431703LGANew York, NYNY36New York2210577105770530577BGMBinghamton, NYNY36New York222129220435.035.01.022100-215946.0225023263.02228232961.061.01.042200-22590.0missing0.059.085.036.01.0147.010.00.026.00.035.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
112023111242023-01-129E203639EN197PQ462912953129530431703LGANew York, NYNY36New York2210577105770530577BGMBinghamton, NYNY36New York2221292341132.0132.01.082100-215916.02357355.0222840132.0132.01.082200-22590.0missing0.059.059.038.01.0147.0150.00.00.00.082.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
122023111352023-01-139E203639EN915XJ462912953129530431703LGANew York, NYNY36New York2210577105770530577BGMBinghamton, NYNY36New York2221292124-5.00.00.0-12100-215961.0222522593.02228230234.034.01.022200-22590.0missing0.059.098.034.01.0147.010.00.034.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
13202311172023-01-019E203639EN906XJ463011337113370531337DLHDuluth, MNMN27Minnesota6313487134870231650MSPMinneapolis, MNMN27Minnesota635105166.06.00.000001-055955.06116446.062665024.024.01.010600-06590.0missing0.076.094.033.01.0144.016.00.018.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240250202311212023-01-02UA19977UAN13138110511618116180231703EWRNewark, NJNJ34New Jersey2113204132040231454MCOOrlando, FLFL12Florida3315001809189.0189.01.0121500-155920.01829204014.018002054174.0174.01.0111800-18590.00.0180.0165.0131.01.0937.040.00.078.00.096.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240251202311212023-01-02UA19977UAN488UA110411298112980630194DFWDallas/Fort Worth, TXTX48Texas7411292112920230325DENDenver, COCO8Colorado821337140023.023.01.011300-135915.0141514406.0144614460.00.00.001400-14590.00.0129.0106.085.01.0641.03missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240252202311212023-01-02UA19977UAN35260110311292112920230325DENDenver, COCO8Colorado8210721107210230721BOSBoston, MAMA25Massachusetts139509577.07.00.000900-095927.0102415334.015411537-4.00.00.0-11500-15590.00.0231.0220.0189.01.01754.08missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240253202311212023-01-02UA19977UAN813UA110211066110660631066CMHColumbus, OHOH39Ohio4414771147710432457SFOSan Francisco, CACA6California91708706-2.00.00.0-10700-075919.07259214.0933925-8.00.00.0-10900-09590.00.0325.0319.0296.01.02120.09missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240254202311212023-01-02UA19977UAN871UA110113502135020233502MTJMontrose/Delta, COCO8Colorado8211618116180231703EWRNewark, NJNJ34New Jersey211525161348.048.01.031500-155913.01626214525.02126221044.044.01.022100-21590.00.0241.0237.0199.01.01795.0844.00.00.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240255202311212023-01-02UA19977UAN839UA110014869148690334614SLCSalt Lake City, UTUT49Utah8711292112920230325DENDenver, COCO8Colorado8214421824222.0222.01.0121400-145936.01900200917.016112026255.0255.01.0121600-16590.00.089.0122.069.01.0391.0265.00.033.00.0157.0180711.011.00missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240256202311212023-01-02UA19977UAN884UA109711697116970632467FLLFort Lauderdale, FLFL12Florida3313930139300830977ORDChicago, ILIL17Illinois41702657-5.00.00.0-10700-075921.071885811.0923909-14.00.00.0-10900-09590.00.0201.0192.0160.01.01182.05missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240257202311212023-01-02UA19977UAN877UA109511618116180231703EWRNewark, NJNJ34New Jersey2113342133420733342MKEMilwaukee, WIWI55Wisconsin4582583611.011.00.000800-085932.09089589.0100110076.06.00.001000-10590.00.0156.0151.0110.01.0725.03missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240258202311212023-01-02UA19977UAN73270109311292112920230325DENDenver, COCO8Colorado8212892128920832575LAXLos Angeles, CACA6California917457527.07.00.000700-075949.08419327.092393916.016.01.010900-09590.00.0158.0167.0111.01.0862.047.00.09.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240259202311212023-01-02UA19977UAN433UA109213930139300830977ORDChicago, ILIL17Illinois4114635146350231714RSWFort Myers, FLFL12Florida331400150464.064.01.041400-145918.0152218524.01802185654.054.01.031800-18590.00.0182.0172.0150.01.01120.050.00.054.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240260202311212023-01-02UA19977UA109214635146350231714RSWFort Myers, FLFL12Florida3311618116180231703EWRNewark, NJNJ34New Jersey211905missingmissingmissingmissingmissing1900-1959missingmissingmissingmissing2159missingmissingmissingmissingmissing2100-21591.0A0.0174.0missingmissing1.01068.05missingmissingmissingmissingmissingmissingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing
240261202311212023-01-02UA19977UAN68823108612266122660331453IAHHouston, TXTX48Texas7414771147710432457SFOSan Francisco, CACA6California9118142044150.0150.01.0101800-185921.0210523067.020352313158.0158.01.0102000-20590.00.0261.0269.0241.01.01635.070.00.0158.00.00.0missingmissingmissing0missingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissingmissing

5.3. Grouping & Aggregating#

Aggregation is especially useful when combined with grouping. You can split the rows of a data frame into groups with the groupby function. Its first argument is the data frame and its second is the column (or vector of columns) to group by. For example, to group the air data by Year:

groupby(air, :Year)

GroupedDataFrame with 1 group based on key: Year

First Group (538837 rows): Year = 2023
10 columns and 538812 rows omitted
    Row  Year   Quarter  Month  DayofMonth  DayOfWeek  FlightDate  Reporting_Airline  ⋯
         Int64  Int64    Int64  Int64       Int64      Date        String3            ⋯
      1  2023   1        1      2           1          2023-01-02  9E                 ⋯
      2  2023   1        1      3           2          2023-01-03  9E                 ⋯
      3  2023   1        1      4           3          2023-01-04  9E                 ⋯
      4  2023   1        1      5           4          2023-01-05  9E                 ⋯
      5  2023   1        1      6           5          2023-01-06  9E                 ⋯
      6  2023   1        1      7           6          2023-01-07  9E                 ⋯
      7  2023   1        1      14          6          2023-01-14  9E                 ⋯
      8  2023   1        1      21          6          2023-01-21  9E                 ⋯
      9  2023   1        1      28          6          2023-01-28  9E                 ⋯
     10  2023   1        1      9           1          2023-01-09  9E                 ⋯
     11  2023   1        1      10          2          2023-01-10  9E                 ⋯
     12  2023   1        1      11          3          2023-01-11  9E                 ⋯
     13  2023   1        1      12          4          2023-01-12  9E                 ⋯
      ⋮
 538826  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538827  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538828  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538829  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538830  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538831  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538832  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538833  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538834  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538835  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538836  2023   1        1      2           1          2023-01-02  UA                 ⋯
 538837  2023   1        1      2           1          2023-01-02  UA                 ⋯
(only the first 7 columns are shown here)
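
Since the air data covers a single year, grouping by Year yields only one group. The following minimal sketch shows grouping that produces several groups. It uses a small made-up data frame (the name flights and its values are illustrative assumptions, not part of the air data) and also shows grouping by more than one column and retrieving individual groups:

```julia
using DataFrames

# Hypothetical toy data standing in for a few columns of the air data.
flights = DataFrame(
    DayOfWeek = [1, 1, 2, 2, 2],
    Origin    = ["LGA", "BDL", "LGA", "LGA", "BDL"],
    DepDelay  = [-5.0, 12.0, missing, 3.0, 40.0],
)

# Group by a single column, or pass a vector of columns to group by several.
by_day = groupby(flights, :DayOfWeek)
by_day_origin = groupby(flights, [:DayOfWeek, :Origin])

# length gives the number of groups.
n_day_groups = length(by_day)
n_day_origin_groups = length(by_day_origin)

# Each group is a view of the matching rows in the parent data frame.
first_group = by_day[1]

# A group can also be looked up by the values of its grouping columns.
monday_lga = by_day_origin[(DayOfWeek = 1, Origin = "LGA")]
```

Grouping does not copy the data: each group refers back to rows of the original data frame, so creating a GroupedDataFrame is cheap even for large tables.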

Once the data is grouped, you can use the combine function to apply a function to each group (or to columns within each group) and collect the results in a new data frame. For instance, to count the rows, and therefore the flights, for each day of the week:

combine(groupby(air, :DayOfWeek), nrow)
7×2 DataFrame
 Row  DayOfWeek  nrow
      Int64      Int64
   1  1          90875
   2  2          86270
   3  3          68901
   4  4          72392
   5  5          72554
   6  6          61150
   7  7          86695
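
As a small variation, the count column can be given a more descriptive name by passing nrow => :name to combine, and the result can then be sorted like any other data frame. This is a sketch on made-up data; the flights data frame and the column name n_flights are illustrative assumptions:

```julia
using DataFrames

# Hypothetical toy data: one row per flight.
flights = DataFrame(DayOfWeek = [1, 1, 2, 2, 2, 7])

# Count the rows in each group and store the counts in a column named n_flights.
counts = combine(groupby(flights, :DayOfWeek), nrow => :n_flights)

# Order the days from most to fewest flights.
busiest_first = sort(counts, :n_flights, rev = true)
```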

You can also pass combine a Pair of the form column => function as the second argument to apply a function to a specific column within each group. For example, to get the mean departure delay by day of the week, skipping missing values:

using Statistics

combine(groupby(air, :DayOfWeek), :DepDelay => x -> mean(skipmissing(x)))
7×2 DataFrame
 Row  DayOfWeek  DepDelay_function
      Int64      Float64
   1  1          14.9877
   2  2          12.0788
   3  3          26.5502
   4  4          10.8908
   5  5          7.71696
   6  6          6.40796
   7  7          11.9266
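
The result column above is named DepDelay_function because the function applied is anonymous. One way to get a clearer name is to extend the Pair to the form column => function => new_name. The sketch below illustrates this on made-up data (the flights data frame and the names MeanDepDelay and n are illustrative assumptions). Note the parentheses around the anonymous function, which keep the trailing => out of the function body:

```julia
using DataFrames
using Statistics

# Hypothetical toy data with one missing delay.
flights = DataFrame(
    DayOfWeek = [1, 1, 2, 2, 2],
    DepDelay  = [-5.0, 15.0, missing, 3.0, 9.0],
)

by_day = groupby(flights, :DayOfWeek)

# column => function => new_name gives the result column an explicit name.
mean_delay = combine(
    by_day,
    :DepDelay => (x -> mean(skipmissing(x))) => :MeanDepDelay,
)

# Function composition (∘) is an equivalent way to write the same function.
# Several aggregations can be requested in a single combine call.
summary_by_day = combine(
    by_day,
    nrow => :n,
    :DepDelay => (mean ∘ skipmissing) => :MeanDepDelay,
)
```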

The DataFrames.jl documentation provides more examples of ways you can use groupby and combine.

5.4. Helper Packages#

The DataFrames.jl programming interface may seem awkward or excessively verbose if you’re coming to Julia from R or Python. The community is aware of this, and as a result several packages now provide more familiar programming interfaces. In particular:

  • DataFramesMeta.jl provides a macro interface similar to R’s dplyr package.

  • Pandas.jl is a wrapper around Python’s pandas package.
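
To give a flavor of the macro interface, here is a rough sketch of the grouped mean from Section 5.3 written with DataFramesMeta.jl. It assumes a reasonably recent version of the package is installed, and it uses a small made-up data frame (flights and MeanDepDelay are illustrative names):

```julia
using DataFrames
using DataFramesMeta
using Statistics

# Hypothetical toy data with one missing delay.
flights = DataFrame(
    DayOfWeek = [1, 1, 2, 2, 2],
    DepDelay  = [-5.0, 15.0, missing, 3.0, 9.0],
)

# @by groups and aggregates in one step; columns are referred to as symbols.
by_day = @by(flights, :DayOfWeek, :MeanDepDelay = mean(skipmissing(:DepDelay)))

# The same computation as a pipeline, in the spirit of dplyr's pipe.
# @chain passes each result as the first argument of the next step.
piped = @chain flights begin
    groupby(:DayOfWeek)
    @combine :MeanDepDelay = mean(skipmissing(:DepDelay))
end
```

Both forms return an ordinary DataFrame, so they can be mixed freely with the DataFrames.jl functions described in this chapter.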