EC7422: Data Science for Economic Analysis
The Institute for Evaluation of Labour Market and Education Policy (IFAU)
April 16, 2026
if / else controls decisions
Vectors
From Vectors To Tables
Subsetting Tables
Working With Table Columns
Categories
Tidy Data
Basic Checks
Quick Inspection Plots
If you try to mix values, R will force them to a common type:
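A minimal sketch of this coercion, using illustrative values: a number, a string and a logical end up as text, while numbers and logicals end up numeric.

```r
# Mixing types in c(): R coerces every element to one common type
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"

# Logicals combined with numbers become 1 (TRUE) and 0 (FALSE)
nums <- c(1, TRUE, FALSE)
nums           # 1 1 0
class(nums)    # "numeric"
```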
Warning in c(1, 2, 3) > c(1, 3): longer object length is not a
multiple of shorter object length
[1] FALSE FALSE TRUE
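The warning comes from recycling: R repeats the shorter vector until it matches the longer one. A small sketch with illustrative values shows when R stays silent and when it warns.

```r
# Length 4 vs length 2: 4 is a multiple of 2, so recycling is silent
even <- c(1, 2, 3, 4) > c(1, 3)   # compares 1>1, 2>3, 3>1, 4>3
even                               # FALSE FALSE TRUE TRUE

# Length 3 vs length 2: R still recycles, but warns (warning suppressed here)
res <- suppressWarnings(c(1, 2, 3) > c(1, 3))
res                                # FALSE FALSE TRUE
```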
is.na() to check for missing values
%in% to check membership
[1] TRUE TRUE FALSE
Often clearer than chaining many | comparisons:
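A sketch of both checks on illustrative values (the codes and rates are made up for the example):

```r
codes <- c("0180", "1280", "0114")   # illustrative municipality codes

# %in% asks, for each element, "is it in this set?"
in_set <- codes %in% c("0180", "1280")
in_set                                # TRUE TRUE FALSE

# The same result with chained | comparisons
in_set_or <- codes == "0180" | codes == "1280"
identical(in_set, in_set_or)          # TRUE

# is.na() flags missing values element by element
rates <- c(6.2, NA, 7.1)
is.na(rates)                          # FALSE TRUE FALSE
```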
ifelse() builds a new vector
The if / else pattern requires its condition to be a single logical value (one TRUE or FALSE). To apply the same logic to many values at once, use ifelse(), the vectorized alternative:
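A side-by-side sketch; the threshold of 10 is an assumption chosen for illustration. Recent R versions (4.2 and later) stop with an error if the condition given to if has more than one element.

```r
rate <- 12.2

# if / else: the condition must be a single TRUE or FALSE
if (rate > 10) {
  label <- "high"
} else {
  label <- "low"
}
label                                   # "high"

# ifelse(): the condition is a logical vector, and the result
# has one element per element of that vector
rates <- c(6.2, 12.2, 17.7)
labels <- ifelse(rates > 10, "high", "low")
labels                                  # "low" "high" "high"
```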
[]
Use : to create a sequence of integers:
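A sketch of positional subsetting with illustrative values; note that R counts positions from 1.

```r
idx <- 2:4                              # the integers 2, 3, 4

rates <- c(6.2, 7.1, 10.5, 9.9, 18.0)   # illustrative values
middle <- rates[idx]                    # 7.1 10.5 9.9
rates[c(1, 5)]                          # 6.2 18.0
rates[-1]                               # a negative index drops that position
```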
Use a same-length logical vector to keep matching elements:
Or let a condition evaluate to a logical vector on the fly:
This is what we use to filter table rows!
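A sketch of both forms with illustrative values: first an explicit logical vector, then a condition that builds it.

```r
rates <- c(6.2, 12.2, 17.7, 9.9)        # illustrative values

# A same-length logical vector keeps the TRUE positions
kept <- rates[c(FALSE, TRUE, TRUE, FALSE)]
kept                                    # 12.2 17.7

# A condition produces that logical vector for us
rates > 10                              # FALSE TRUE TRUE FALSE
high <- rates[rates > 10]
high                                    # 12.2 17.7
```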
[1] 2025 3844 6084
$name
[1] "Alice"
$age
[1] 24
$passed
[1] TRUE
[1] "list"
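The output above is what printing a named list looks like. A sketch that would produce it; the object name student is hypothetical.

```r
# A list holds elements of different types, each under its own name
student <- list(name = "Alice", age = 24, passed = TRUE)
student               # prints $name, $age and $passed as above
student$name          # "Alice"
student[["age"]]      # 24
class(student)        # "list"
```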
From Vectors To Tables
A table is really just a collection of equal-length vectors. Let’s build one:
data.frame() is the built-in way of doing this
municipality_code year unemployment_rate
1 0180 2023 6.2
2 1280 2023 7.1
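One way to build the table above, assuming the three vectors are created first and then passed to data.frame() without explicit names (the object name df is hypothetical):

```r
# Three equal-length vectors, one per column
municipality_code <- c("0180", "1280")   # codes stored as text
year <- c(2023, 2023)
unemployment_rate <- c(6.2, 7.1)

# data.frame() takes the column names from the variable names
df <- data.frame(municipality_code, year, unemployment_rate)
df
```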
Note how the naming is implicit.
data.frame is just a named equal-length list:
'data.frame': 2 obs. of 3 variables:
$ municipality_code: chr "0180" "1280"
$ year : num 2023 2023
$ unemployment_rate: num 6.2 7.1
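A quick sketch confirming the list structure, using the same two-row table (df is a hypothetical name):

```r
df <- data.frame(
  municipality_code = c("0180", "1280"),
  year = c(2023, 2023),
  unemployment_rate = c(6.2, 7.1)
)

is.list(df)           # TRUE: a data.frame is a list underneath
length(df)            # 3: one list element per column
names(df)             # the column names are the list names
lengths(unclass(df))  # every column has length 2
```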
Matrices are also rectangular, but they hold one common type:
municipality_code year unemployment_rate
[1,] "0180" "2023" "6.2"
[2,] "1280" "2023" "7.1"
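A matrix like the one above can come from as.matrix() on the same table. Because one column is text, the numbers are converted to text too. A sketch (df and m are hypothetical names):

```r
df <- data.frame(
  municipality_code = c("0180", "1280"),
  year = c(2023, 2023),
  unemployment_rate = c(6.2, 7.1)
)

# as.matrix() needs one common type; with a character column present,
# the numeric columns are turned into text as well
m <- as.matrix(df)
m
typeof(m)        # "character"
m[1, "year"]     # "2023": a string, not a number
```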
'data.frame': 2320 obs. of 4 variables:
$ municipality_code: int 114 115 117 120 123 125 126 127 128 136 ...
$ municipality_name: chr "Upplands Väsby" "Vallentuna" "Österåker" "Värmdö" ...
$ year : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ unemployment_rate: num 11.9 6.1 7.2 7.3 13.7 5.6 12 17.7 9.7 11.9 ...
Also try View(panel_df) in VS Code for an interactive viewer.
A database concept: the key is a set of columns that uniquely identifies one observation
In panel_df, the key is municipality_code + year
'data.frame': 2320 obs. of 4 variables:
$ municipality_code: int 114 115 117 120 123 125 126 127 128 136 ...
$ municipality_name: chr "Upplands Väsby" "Vallentuna" "Österåker" "Värmdö" ...
$ year : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ unemployment_rate: num 11.9 6.1 7.2 7.3 13.7 5.6 12 17.7 9.7 11.9 ...
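A sketch of how to check that a set of columns really is a key, on a small panel built from values in these notes (panel and key are hypothetical names):

```r
# Two municipalities observed in two years
panel <- data.frame(
  municipality_code = c(180L, 180L, 1280L, 1280L),
  year = c(2022L, 2023L, 2022L, 2023L),
  unemployment_rate = c(10.5, 10.5, 18.0, 17.7)
)

key <- panel[c("municipality_code", "year")]

# The key is valid if no combination of its columns appears twice
anyDuplicated(key) == 0                  # TRUE
nrow(unique(key)) == nrow(panel)         # TRUE, the same check another way
```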
'data.frame': 2320 obs. of 4 variables:
$ municipality_code: chr "0114" "0115" "0117" "0120" ...
$ municipality_name: chr "Upplands Väsby" "Vallentuna" "Österåker" "Värmdö" ...
$ year : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ unemployment_rate: num 11.9 6.1 7.2 7.3 13.7 5.6 12 17.7 9.7 11.9 ...
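One way to turn the integer codes into the zero-padded text codes shown above, assuming every code has four digits. With the real data this would be panel_df$municipality_code <- sprintf("%04d", panel_df$municipality_code).

```r
codes <- c(114L, 115L, 180L, 1280L)   # integer codes lose their leading zero

# "%04d" pads each integer with zeros to a width of four digits
codes_chr <- sprintf("%04d", codes)
codes_chr                             # "0114" "0115" "0180" "1280"
```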
Subsetting Tables
$ pulls the column out, then [] subsets that vector like before
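A sketch on a small illustrative table (df and rates are hypothetical names):

```r
df <- data.frame(
  municipality_code = c("0114", "0115", "0117", "0120"),
  unemployment_rate = c(11.9, 6.1, 7.2, 7.3)
)

rates <- df$unemployment_rate   # $ returns a plain numeric vector
rates[1:2]                      # 11.9 6.1
df$unemployment_rate[rates > 7] # 11.9 7.2 7.3, same vector rules as before
```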
[] keeps the table; [[]] and $ pull out the vector
[] returns a one-column table
[[]] returns the vector inside
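A sketch comparing the three forms on a small illustrative table:

```r
df <- data.frame(
  municipality_code = c("0114", "0115"),
  year = c(2016L, 2016L)
)

one_col <- df["year"]     # [ ] keeps the table: a data.frame with one column
class(one_col)            # "data.frame"

vec1 <- df[["year"]]      # [[ ]] pulls out the vector inside
vec2 <- df$year           # $ does the same with a literal column name
class(vec1)               # "integer"
identical(vec1, vec2)     # TRUE
```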
municipality_name year
1 Upplands Väsby 2016
2 Vallentuna 2016
3 Österåker 2016
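A table like the one above comes from selecting rows and columns in one call, df[rows, columns]. A sketch on a small illustrative table (codes instead of names, to keep the example short):

```r
df <- data.frame(
  municipality_code = c("0114", "0115", "0117", "0120"),
  year = c(2016L, 2016L, 2016L, 2016L),
  unemployment_rate = c(11.9, 6.1, 7.2, 7.3)
)

# First three rows, two named columns
sub <- df[1:3, c("municipality_code", "year")]
sub
dim(sub)    # 3 2
```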
Works just like vector filtering, but keeps the table shape:
municipality_code municipality_name year unemployment_rate
1757 0180 Stockholm 2022 10.5
1772 0380 Uppsala 2022 10.2
1857 1280 Malmö 2022 18.0
2047 0180 Stockholm 2023 10.5
2062 0380 Uppsala 2023 9.9
2147 1280 Malmö 2023 17.7
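A sketch of row filtering on a small panel built from the values above; the condition goes before the comma, and leaving the part after the comma empty keeps every column:

```r
panel <- data.frame(
  municipality_code = c("0180", "0380", "1280", "0180", "0380", "1280"),
  year = c(2022L, 2022L, 2022L, 2023L, 2023L, 2023L),
  unemployment_rate = c(10.5, 10.2, 18.0, 10.5, 9.9, 17.7)
)

# Rows for two municipalities in 2023; all columns kept
selected <- panel[panel$year == 2023 & panel$municipality_code %in% c("0180", "1280"), ]
selected
```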
Leave the part after the comma empty to keep all columns
Working With Table Columns
We can create a new column by assigning to a name that doesn’t exist yet:
municipality_name unemployment_rate high_unemployment
1 Upplands Väsby 11.9 TRUE
2 Vallentuna 6.1 FALSE
3 Österåker 7.2 FALSE
4 Värmdö 7.3 FALSE
5 Järfälla 13.7 TRUE
6 Ekerö 5.6 FALSE
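A sketch of both steps on illustrative rows. The threshold of 10, the cutoff year 2020 and the label "recent" are assumptions for the example; the notes do not state which values were used.

```r
df <- data.frame(
  municipality_code = c("0114", "0115", "0117", "0120", "0123"),
  year = c(2016L, 2016L, 2016L, 2016L, 2016L),
  unemployment_rate = c(11.9, 6.1, 7.2, 7.3, 13.7)
)

# Assigning to a column name that does not exist yet creates it
# (hypothetical threshold: 10)
df$high_unemployment <- df$unemployment_rate > 10

# ifelse() turns a condition into category labels
# (hypothetical cutoff year and "recent" label)
df$period <- ifelse(df$year < 2020, "older", "recent")
df
```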
ifelse() can build categories
Categories
factor() creates categorical data
A factor stores integer codes and attaches a label (a level) to each code
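A sketch showing the integer codes and the labels side by side; the category values are hypothetical.

```r
period <- c("older", "recent", "older", "recent")   # hypothetical labels

f <- factor(period, levels = c("older", "recent"))
levels(f)        # "older" "recent": the attached labels
as.integer(f)    # 1 2 1 2: the integer codes stored underneath
table(f)         # counts per category
```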
Tidy Data
Tidy data is a standard way of organizing tables that makes them easier to work with: each variable is a column, each observation is a row, and each value is a cell. The principles are simple, and they apply to almost any kind of data.
municipality_code year unemployment_rate
1 0180 2022 10.5
2 0180 2023 10.5
3 0380 2022 10.2
4 0380 2023 9.9
5 1280 2022 18.0
6 1280 2023 17.7
Both tables contain the same information, but one is easier to work with:
municipality_code unemployment_rate_2023
1 0180 10.5
2 0380 9.9
3 1280 17.7
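A sketch of moving between the two shapes with base R's reshape(), using the long table above. In the wide shape the year is stored inside column names; in the long (tidy) shape it is an ordinary variable.

```r
long <- data.frame(
  municipality_code = c("0180", "0180", "0380", "0380", "1280", "1280"),
  year = c(2022L, 2023L, 2022L, 2023L, 2022L, 2023L),
  unemployment_rate = c(10.5, 10.5, 10.2, 9.9, 18.0, 17.7)
)

# Long to wide: one row per municipality, one column per year
wide <- reshape(long, direction = "wide",
                idvar = "municipality_code", timevar = "year", sep = "_")
wide

# In the long table, picking a year is just a row filter
long[long$year == 2023, ]
```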
Basic Checks
Before filtering or plotting, check the boring structure first:
[1] "municipality_code" "municipality_name" "year"
[4] "unemployment_rate" "high_unemployment" "period"
'data.frame': 2320 obs. of 6 variables:
$ municipality_code: chr "0114" "0115" "0117" "0120" ...
$ municipality_name: chr "Upplands Väsby" "Vallentuna" "Österåker" "Värmdö" ...
$ year : int 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ unemployment_rate: num 11.9 6.1 7.2 7.3 13.7 5.6 12 17.7 9.7 11.9 ...
$ high_unemployment: logi TRUE FALSE FALSE FALSE TRUE FALSE ...
$ period : chr "older" "older" "older" "older" ...
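A few more structure checks, sketched on a small illustrative table:

```r
df <- data.frame(
  municipality_code = c("0114", "0115"),
  year = c(2016L, 2016L),
  unemployment_rate = c(11.9, 6.1)
)

names(df)            # column names
dim(df)              # rows and columns: 2 3
sapply(df, class)    # one type per column, a compact alternative to str()
```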
R will treat the column
Summary statistics catch implausible values quickly:
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.00 9.80 12.20 12.49 14.80 23.40
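A sketch pairing summary() with an explicit range check, on illustrative values. The rule that a rate in percent lies between 0 and 100 is the assumption being checked.

```r
rates <- c(11.9, 6.1, 7.2, 7.3, 13.7, 5.6)   # illustrative values

summary(rates)     # min, quartiles, mean and max in one call

# Explicit plausibility check: rates in percent must lie in [0, 100]
ok <- all(rates >= 0 & rates <= 100, na.rm = TRUE)
stopifnot(ok)      # stops with an error if any value is out of range
```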
Then count missingness explicitly:
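A sketch of counting missing values per column on an illustrative table with one gap:

```r
df <- data.frame(
  municipality_code = c("0114", "0115", "0117"),
  unemployment_rate = c(11.9, NA, 7.2)
)

colSums(is.na(df))                   # missing values per column
mean(is.na(df$unemployment_rate))    # share missing in one column
```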
municipality_code municipality_name year unemployment_rate high_unemployment period
1 0114 Upplands Väsby 2016 11.9 TRUE older
3 0117 Österåker 2016 7.2 FALSE older
4 0120 Värmdö 2016 7.3 FALSE older
1.1 0114 Upplands Väsby 2016 11.9 TRUE older
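The row named 1.1 is a repeat of row 1: R appends .1 when a row name would otherwise occur twice. A sketch of catching such repeats through the key municipality_code + year, on illustrative rows:

```r
df <- data.frame(
  municipality_code = c("0114", "0117", "0120"),
  year = c(2016L, 2016L, 2016L),
  unemployment_rate = c(11.9, 7.2, 7.3)
)

# Repeating row 1 mimics a duplicated observation
dup <- df[c(1, 2, 3, 1), ]
rownames(dup)                      # "1" "2" "3" "1.1"

# Flag rows whose key has already appeared
dup_key <- duplicated(dup[c("municipality_code", "year")])
dup_key                            # FALSE FALSE FALSE TRUE
dup[dup_key, ]                     # the offending row(s)
```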
Quick Inspection Plots
Simple plots are useful long before polished visualization:
hist(x)
boxplot(x)
plot(x, y)
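A sketch of the three calls on illustrative values. pdf(NULL) sends the output to a null device so the code also runs without a screen; drop it and dev.off() when working interactively.

```r
rates <- c(11.9, 6.1, 7.2, 7.3, 13.7, 5.6, 12.0, 17.7)   # illustrative values
years <- c(2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017)

pdf(NULL)                  # null device: nothing is written to disk

h <- hist(rates)           # distribution of one variable
boxplot(rates ~ years)     # one box per year
plot(years, rates)         # scatter plot of two variables

dev.off()
```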