That’s it - we won’t be covering any new GIS techniques in this part. Instead, it is designed as a sandbox where you can put all your new skills to the test by playing around with an entirely different kind of linguistic dataset.
I’ve downloaded the entire database from The World Atlas of Language Structures Online (WALS), which is freely available on their GitHub repo (although in a format that requires lots of pre-processing - luckily I’ve done the hard work for you!)
Load it in using read_csv() (remember you can download the datasets needed for this workshop here):
wals <- read_csv("data/wals_clean.csv")
Let’s get a quick feel for the structure and content of this dataset before you go off and do your own thing with it:
head(wals)
## # A tibble: 6 × 11
## lang_id lang feature_id feature feature_val_id feature_val feature_val_desc
## <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 abi Abipón 100A Alignm… 472 Accusative Accusative alig…
## 2 abk Abkhaz 100A Alignm… 473 Ergative Ergative alignm…
## 3 abn Arabana 100A Alignm… 471 Neutral Neutral alignme…
## 4 abu Abun 100A Alignm… 471 Neutral Neutral alignme…
## 5 ace Acehne… 100A Alignm… 474 Active Active alignment
## 6 acm Achuma… 100A Alignm… 472 Accusative Accusative alig…
## # ℹ 4 more variables: macro_area <chr>, family <chr>, latitude <dbl>,
## # longitude <dbl>
It has over 76000 rows, because it’s in ‘long format’, i.e. each language~feature pair is on its own row:
wals %>% nrow()
## [1] 76477
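To make ‘long format’ concrete, here is a minimal sketch that reshapes a tiny made-up table into wide format. The toy rows and the names toy_long and toy_wide are hypothetical and are not taken from WALS; only the column names lang, feature and feature_val mirror the real dataset.

```r
library(dplyr)
library(tidyr)

# Toy long-format data: a hypothetical miniature, not real WALS rows
toy_long <- tibble(
  lang = c("Lang A", "Lang A", "Lang B"),
  feature = c("Feature 1", "Feature 2", "Feature 1"),
  feature_val = c("x", "y", "z")
)

# Wide format: one row per language, one column per feature.
# Language~feature pairs with no row in the long data become NA.
toy_wide <- toy_long %>%
  pivot_wider(names_from = feature, values_from = feature_val)

toy_wide
```

In the wide version, Lang B gets an NA under Feature 2 because it has no row for that feature in the long data.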
There are 193 different typological features:
wals %>%
  select(feature) %>%
  unique() %>%
  nrow()
## [1] 193
Let’s look at 10 random features as an example:
unique(wals$feature) %>% sample(10)
## [1] "NegSVO Order"
## [2] "Order of Subject and Verb"
## [3] "Asymmetrical Case-Marking"
## [4] "Suppletion in Imperatives and Hortatives"
## [5] "Red and Yellow"
## [6] "Nasal Vowels in West Africa"
## [7] "Relationship between the Order of Object and Verb and the Order of Adposition and Noun Phrase"
## [8] "Question Particles in Sign Languages"
## [9] "The Position of Negative Morphemes in Object-Initial Languages"
## [10] "Polar Questions"
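Because sample() draws at random, you will get a different set of features each time you run that line. If you want a draw you can reproduce later, one option is to call set.seed() first. A minimal sketch, using a hypothetical vector of feature names and an arbitrary seed:

```r
# Hypothetical feature names, standing in for unique(wals$feature)
features <- c("Feature 1", "Feature 2", "Feature 3", "Feature 4", "Feature 5")

# Setting the same seed before each draw gives the same sample
set.seed(42)
first_draw <- sample(features, 3)

set.seed(42)
second_draw <- sample(features, 3)

identical(first_draw, second_draw)
```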
And there are over 2600 languages (although not all languages have a value for all features):
wals %>%
  select(lang) %>%
  unique() %>%
  nrow()
## [1] 2662
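Since coverage varies from language to language, it can be handy to check how many features each language actually has a value for. A minimal sketch on made-up data (the toy rows and the column name n_features are hypothetical):

```r
library(dplyr)

# Toy long-format data: hypothetical languages and features
toy <- tibble(
  lang = c("Lang A", "Lang A", "Lang A", "Lang B"),
  feature = c("F1", "F2", "F3", "F1")
)

# One row per language with the number of features recorded for it,
# sorted from best to worst covered
coverage <- toy %>%
  count(lang, name = "n_features", sort = TRUE)

coverage
```

The same count() call should work on wals itself; counting by lang_id instead of lang would be a safer choice if two languages ever share a name.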
What other info do we have?
head(wals)
## # A tibble: 6 × 11
## lang_id lang feature_id feature feature_val_id feature_val feature_val_desc
## <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 abi Abipón 100A Alignm… 472 Accusative Accusative alig…
## 2 abk Abkhaz 100A Alignm… 473 Ergative Ergative alignm…
## 3 abn Arabana 100A Alignm… 471 Neutral Neutral alignme…
## 4 abu Abun 100A Alignm… 471 Neutral Neutral alignme…
## 5 ace Acehne… 100A Alignm… 474 Active Active alignment
## 6 acm Achuma… 100A Alignm… 472 Accusative Accusative alig…
## # ℹ 4 more variables: macro_area <chr>, family <chr>, latitude <dbl>,
## # longitude <dbl>
The actual value of a feature is in the feature_val column (with a further description of that value in feature_val_desc). We have the macro_area of the language, and its family, as well as some geographic information of course (in latitude and longitude).
Remember this is just a plain csv, so we first need to convert it into a spatial-type object so that we can perform geospatial operations on it.
#convert from plain dataframe into sf object
wals <- wals %>%
  st_as_sf(coords = c("longitude", "latitude"))
#set CRS
st_crs(wals) <- 4326
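The two steps above can also be done in one go, because st_as_sf() accepts a crs argument. The sketch below shows this on a made-up data frame (the languages and coordinates are hypothetical) and then runs one simple geospatial operation, st_distance(), to show what the conversion buys you. EPSG 4326 is the WGS 84 longitude/latitude system.

```r
library(sf)

# Toy data: hypothetical languages with made-up coordinates
toy <- data.frame(
  lang = c("Lang A", "Lang B"),
  longitude = c(0, 10),
  latitude = c(50, 50)
)

# Convert to sf and set the CRS in a single call
toy_sf <- st_as_sf(toy, coords = c("longitude", "latitude"), crs = 4326)

# Check that the CRS was registered
st_crs(toy_sf)$epsg

# Pairwise distances between the points, in metres
dists <- st_distance(toy_sf)
dists
```

Note that longitude comes first in coords, matching the x/y order that sf expects.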
As an example, let’s plot the distribution of the ‘Uvular Consonants’ feature:
wals %>%
  filter(feature == 'Uvular Consonants') %>%
  mapview(zcol = 'feature_val', label = 'lang')
We can also plot a static map using ggplot:
world %>%
  ggplot() +
  geom_sf() +
  geom_sf(data = filter(wals, feature == 'Uvular Consonants'), aes(colour = feature_val)) +
  theme_void() +
  theme(legend.position = 'bottom')
Exercise
Now it’s time to explore! Here are some ideas of things to look at:
I’ve already prepared some datasets you can load in for Q3 - in the data folder you’ll find the following csv files:
elevation_stats.csv: column for average elevation in metres (avg_elevation) - downloaded from here
carrots_turnips.csv: columns for total production of carrots/turnips in tonnes (production_tons), production per person in kg (production_pp_kg), total acreage in hectares (acreage_hectare) and yield in kg per hectare (yield_kg_hc) - downloaded from here
land_stats.csv: columns for proportion of arable land (arable_land_prop), proportion of crop cover (crop_cover_prop) and proportion of forest cover (forest_cover_prop) - downloaded from here
So many options… the world (atlas of language structures) is your oyster!
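If you want to combine one of these csv files with the language points, one possible route is a spatial join followed by an ordinary join. The sketch below is only an illustration under several assumptions: the two square "countries", the language points and the elevation values are all made up, and the key column name country is hypothetical (check what the csv files actually use). Only avg_elevation is a column name taken from the list above.

```r
library(sf)
library(dplyr)

# Helper to build a square polygon (hypothetical shapes, not real borders)
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two made-up countries as an sf object
countries <- st_sf(
  country = c("Country A", "Country B"),
  geometry = st_sfc(square(0, 0, 10), square(20, 0, 10), crs = 4326)
)

# Two made-up language points, one inside each square
langs <- st_as_sf(
  data.frame(
    lang = c("Lang A", "Lang B"),
    longitude = c(5, 25),
    latitude = c(5, 5)
  ),
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Made-up country-level statistics, keyed by the assumed country column
elevation <- tibble(
  country = c("Country A", "Country B"),
  avg_elevation = c(100, 2000)
)

# Step 1: attach to each point the country polygon it falls in
# Step 2: attach the country-level statistic by name
langs_elev <- langs %>%
  st_join(countries) %>%
  left_join(elevation, by = "country")

langs_elev
```

The result is still an sf object, so it can go straight into mapview() or geom_sf().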