educabr 0.1.0
First public release. Initial set of harmonised long-run series on Brazilian education plus a bundled Shiny dashboard.
Datasets
- enrollment_kang_fgv: 6,238 rows. Brazilian school enrollment counts and gross rates by stage (EF1, EF2, EF, EM, ES), 1933-2010 at national level and 1955-2010 at UF level, with a breakdown by colour/race for 1960-2010. Per-paper source attribution (kang_paese_felix_2021, kang_menetrier_2024, kang_menetrier_comim_2024).
- enrollment_tertiary: 1,341 rows. Brazilian tertiary enrollment 1907-2024 compiled across seven primary sources: IBGE Estatísticas do Século XX, Durham (2005), Maduro Junior (2007), Kang/Paese/Felix (2021), INEP CENSUP Synopsis (1995-2008), INEP CENSUP Microdata (2009-2024), and the INEP CENSUP Power BI panel. Multiple sources per year-network are kept on purpose to support cross-source comparison. Includes 25 reconstructed total rows (is_derived = TRUE) that fill the 2000-2008 transition period where INEP published in-person and EAD enrollment separately.
- schooling_kang_fgv: 2,287 rows. Mean years of schooling for the adult population, 1925-2015 (BR), 1950-2015 (region, UF), with sex and race breakdowns at BR level (Walter & Kang 2024).
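The reconstructed total rows can be pictured with a small base-R sketch. It assumes the totals are the sum of the in-person and EAD figures for each year, which these notes imply but do not state. The column names, modality labels, and numbers below are hypothetical toy values, not INEP data.

```r
# Toy figures only (hypothetical, not real INEP data).
by_modality <- data.frame(
  year     = c(2000L, 2000L, 2001L, 2001L),
  modality = c("in_person", "ead", "in_person", "ead"),
  value    = c(10, 2, 12, 3),
  stringsAsFactors = FALSE
)

# Assumption: a derived total is the per-year sum across modalities.
totals <- aggregate(value ~ year, data = by_modality, FUN = sum)
totals$modality   <- "total"
totals$is_derived <- TRUE

print(totals)
```

Flagging such rows with is_derived = TRUE lets users drop them when they only want figures exactly as published.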
API
- get_enrollment(): long-format access to enrollment series with filters for level, network, institution_type, modality, year, geo_level/geo, dimension, indicator, source, include_derived. Returns the canonical schema with English labels (lang = "en") or PT-BR labels (lang = "pt").
- get_schooling(): long-format access to the mean-years-of-schooling series with filters for year, geo_level/geo, dimension, source, lang.
- run_dashboard(): launches the bundled Shiny dashboard locally.
- educabr_cite(): builds bibentry objects (or APA-style prose, or BibTeX) for any of the harmonised data sources, driven by the controlled vocabulary in inst/dict/vocabularies/sources.yaml.
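A minimal usage sketch of the functions listed above. The argument names are taken from this list; the argument values ("EM", "BR", "UF", a year vector) and the positional source key passed to educabr_cite() are assumptions and may not match the accepted vocabulary. The block does nothing if educabr is not installed.

```r
# Sketch only: argument names come from the release notes; the values
# passed here are assumed, not confirmed against the package.
if (requireNamespace("educabr", quietly = TRUE)) {
  library(educabr)

  # Long-format enrollment series with PT-BR labels.
  em_br <- get_enrollment(
    level     = "EM",        # assumed stage code
    geo_level = "BR",        # assumed geo-level code
    year      = 1990:2010,   # assumed to accept a vector of years
    lang      = "pt"
  )
  str(em_br)

  # Mean years of schooling with English labels.
  schooling_uf <- get_schooling(geo_level = "UF", lang = "en")
  str(schooling_uf)

  # Citation for one of the source keys named under Datasets.
  # Passing the key as the first positional argument is an assumption.
  print(educabr_cite("kang_paese_felix_2021"))
}
```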
Dashboard
- Three-theme navbar (English UI): Enrollment, Tertiary Education, Educational Attainment.
- Tertiary Education tab features multi-source comparison with interaction-based colour palette (each source × modality combination gets a unique colour shade), shape-by-source, and linetype-by-modality encoding.
- “View R code” button on every tab generates a self-contained, copy-pasteable R snippet (educabr + ggplot2) that reproduces the current chart locally.
Schema
- Canonical tidy-long schema documented in inst/dict/schema.yaml with primary-key constraints, year domain, controlled vocabularies for factor levels, and conventions for missing values.
- 13 primary source entries documented in inst/dict/vocabularies/sources.yaml with DOIs, URLs, and coverage metadata.
- PT-BR labels for every factor level in inst/dict/i18n.yaml.
Build pipeline
- data-raw/01_build_enrollment_kang_fgv.R: Kang/FGV-IBRE 2023 compilation (4 xlsx files → enrollment_kang_fgv.rda).
- data-raw/02_build_schooling_kang_fgv.R: Walter & Kang 2024 series (1 xlsx file → schooling_kang_fgv.rda).
- data-raw/03_build_enrollment_tertiary.R: multi-source tertiary compilation, with canonicalisation of 69 raw source strings into 7 canonical keys and 4 composite derived-row keys, plus exact-duplicate deduplication.
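The canonicalisation and deduplication step in the tertiary build can be sketched in base R as a lookup table followed by an exact-duplicate filter. This is an illustration under stated assumptions, not the actual build script: the raw strings, canonical keys, and values below are hypothetical.

```r
# Hypothetical raw rows; the real pipeline maps 69 raw source strings.
raw <- data.frame(
  year   = c(1995L, 1995L, 1996L, 1996L),
  source = c("INEP - Sinopse", "inep sinopse", "Durham 2005", "Durham 2005"),
  value  = c(100, 100, 120, 120),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from raw source string to canonical key.
canonical_key <- c(
  "INEP - Sinopse" = "inep_synopsis",
  "inep sinopse"   = "inep_synopsis",
  "Durham 2005"    = "durham_2005"
)

raw$source <- unname(canonical_key[raw$source])

# Fail loudly if any raw string has no mapping.
stopifnot(!anyNA(raw$source))

# Drop rows that are identical in every column.
clean <- raw[!duplicated(raw), , drop = FALSE]
rownames(clean) <- NULL

print(clean)
```

Canonicalising before deduplicating matters: two rows that differ only in how the source was spelled become exact duplicates only after the mapping is applied.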
Tests
- 9 tests for get_enrollment() (core filters and pivots).
- 9 tests for get_schooling().
- 7 tests for the tertiary-specific arguments (institution_type, modality, include_derived, composite source keys, loader normalisation of legacy datasets).
