Nation-Building Research

The NBP Glossary

Here we explain the key terms used across the NBP Country Profiles, including how we define them and how they are applied in our research.

Core group

Core groups have access to political power and use it to align the state’s identity with their own. Their members have consistently held dominant political positions, and state institutions promote the group’s cultural markers—such as language, religion, collective memory, and symbols—across the country. A core group need not constitute a demographic majority. We assigned core-group status based on whether (a) other datasets (GROWUp/EPR and AMAR) classified the group as politically dominant, (b) the constitution’s preamble treated the group as a carrier of the state, and (c) the group’s language and/or religion was recognized as the state’s official language and/or religion.

Racialized group

Racialized groups are those to whom external actors—including states and other ethnic groups—attribute shared physical traits (e.g., skin color, hair texture, or facial features). The NBP Dataset combines attention to the legacies of colonial racial domination with coding strategies used in other group-based datasets (AMAR and EPR). We classified a group as racialized if (a) its name referred to phenotype or a distinct world region, (b) AMAR or EPR classified it as originating from a distinct world region, and/or (c) it appeared in colonial-era racial classification schemes.
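The three criteria above are joined by "and/or", so any single one is enough. A minimal sketch of that rule, assuming each criterion has already been reduced to a yes/no flag; the class and field names are hypothetical and are not variables from the NBP Dataset:

```python
from dataclasses import dataclass


@dataclass
class RacializationEvidence:
    """Yes/no evidence for one group. Field names are hypothetical."""
    name_refers_to_phenotype_or_region: bool  # criterion (a)
    amar_or_epr_distinct_region_origin: bool  # criterion (b)
    in_colonial_racial_scheme: bool           # criterion (c)


def is_racialized(evidence: RacializationEvidence) -> bool:
    # "and/or" in the definition: any one criterion is sufficient.
    return (
        evidence.name_refers_to_phenotype_or_region
        or evidence.amar_or_epr_distinct_region_origin
        or evidence.in_colonial_racial_scheme
    )
```

A group that meets only criterion (b) would therefore still be classified as racialized under this reading.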

Indigenous group

Indigenous groups are communities that, in line with the approach of the International Work Group for Indigenous Affairs (IWGIA), self-identify as indigenous, trace their descent to populations present before conquest/colonization or marginalized during modern state formation, and retain at least some distinct social, economic, cultural, or political institutions. We coded a group as indigenous if it was identified as such in IWGIA country summaries, annual reports, or other relevant IWGIA publications.

Self-determination movement

Self-determination movements are political organizations that claim to represent a particular group and that actively seek increased self-determination from the state, including demands for national independence, union with another state, or internal autonomy. Defined this way, the concept captures the organized, on-the-ground presence of such movements rather than the broader prevalence of nationalist sentiment within the population. We coded self-determination movements by matching organizations in the Self-Determination Movements Dataset (Sambanis et al. 2018) to groups in the NBP dataset.

Education

When tracing state policies in the domain of education, the NBP dataset is primarily concerned with mass schooling. Following UNESCO’s definitions, we distinguish between public schools (run or governed by public authorities) and private schools (run by non-governmental actors). We explicitly exclude international/consular and other specialized schools that do not reflect the “average” public-school experience. Building on the same framework, we also differentiate among levels of education: primary (foundational literacy and numeracy with minimal specialization), secondary (building on primary and preparing for work or further study), and higher education (post-secondary study or research provided by state-approved institutions).

Language education policy

To track language education policies, we distinguish between languages of instruction and language classes. A language of instruction is the language used to teach multiple subjects (not merely a single subject taught in that language); bilingual or multilingual systems may have more than one. Language classes are courses focused on language acquisition; we restricted our attention to those offered as part of compulsory schooling for children and adolescents, excluding adult-education language courses.

Our coding is de jure and based on relevant laws and curricula. For public schools, we documented whether a given language was used as a language of instruction or taught in a language class and, for each, recorded where it applied (nationwide vs only in part of the country), for which time period, and at which level (public primary vs public secondary). For language classes specifically, we also coded whether instruction was mandatory or optional. For private schools, we coded whether language education faced any restrictions or bans, including instances where private schools were illegal. If restrictions existed, we coded whether there was a fixed list of permitted languages of instruction/language classes, or specific banned languages for instruction/language classes.

Religious instruction

We coded whether public schools provided any religious instruction and, for each religion or denomination, the relevant level (primary/secondary) and where it applied (nationwide vs only in part of the country), periodizing changes based on shifts in laws or curricula. We distinguished between classes that were mandatory for all students, mandatory for some (e.g., required for most but with opt-outs), and purely optional. When curricula referred only to a broad religion (e.g., “Christian education”), we treated this as covering all denominations within that religion, but when only one denomination was taught (e.g., Catholicism), we coded “no” for other denominations within the same faith. For private schools, we tracked whether religious instruction was permitted for any faith, limited to certain faiths/denominations, prohibited, or impossible because private schools could not operate legally, drawing on private-school regulations unless the official curriculum was legally binding.

Bans and restrictions

In the NBP Dataset, religious and language restrictions refer to formal state-imposed limits (laws or executive orders). We further distinguish between bans and restrictions based on scope: a ban applies when a practice or symbol was prohibited in all public spaces, whereas a restriction applies when it was prohibited only in some places or at some times. For language, we assessed restrictions in three domains: speaking publicly (public statements in public spaces, excluding private settings), naming (rules for place names, organizations, and personal names), and media use (broadcasting and newspapers). For religion, we coded bans/restrictions in four domains: wearing religious symbols and clothing, accessing places of worship, religious publications, and conversion and proselytizing.
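The scope-based distinction between a ban and a restriction can be sketched as a small decision rule. This is an illustration only: the `Scope` enum, the `formal` flag, and the returned labels are hypothetical and do not reproduce the dataset's actual variable names or values.

```python
from enum import Enum


class Scope(Enum):
    """Where a prohibition applied. Hypothetical categories."""
    NOWHERE = "not prohibited"
    SOME_PLACES_OR_TIMES = "prohibited only in some places or times"
    ALL_PUBLIC_SPACES = "prohibited in all public spaces"


def classify_limit(formal: bool, scope: Scope) -> str:
    # Only formal state-imposed limits (laws or executive orders) count.
    if not formal or scope is Scope.NOWHERE:
        return "none"
    if scope is Scope.ALL_PUBLIC_SPACES:
        return "ban"
    return "restriction"
```

The same rule would apply within each language or religion domain, so a single group could face a ban in one domain and a restriction in another.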

Affirmative action

The NBP Dataset approaches affirmative action through the lens of education, centering on state-mandated preferential access that is explicitly based on linguistic, religious, or ethnoracial group membership. We code affirmative action policies by determining whether they grant members of a particular group preferential access to higher education (via quotas, lowered admission thresholds, or bonus points on entrance exams) and/or group-based quotas for teacher recruitment. For each applicable policy, we recorded the time period and whether the policy operated nationwide or only in part of the country. Our focus on education means parliamentary quota systems, broad poverty-alleviation programs, or special funding schemes are not considered.

Spatial segregation

Spatial segregation refers to legally mandated, group-based physical separation of an ethnic group (or most of its members) from the core group or other groups, typically through policies implemented, and if necessary enforced, by the state. This definition excludes de facto segregation (e.g., ethnic enclaves) and policies that merely encourage voluntary separation (e.g., housing incentives or localized provision of group-specific services). We coded whether spatial segregation policies were in place and, if so, which form they took (ghettos, reservations/reserves/homelands, internment camps, or other specified arrangements) and the period(s) during which they were in place. In addition, we documented whether a group was required to attend distinct public schools (nationwide or only in parts of the country) and periodized these policies using formal laws and regulations rather than de facto educational sorting.

State-sponsored violence

State-sponsored violence refers to the use of armed force carried out by any state agency (e.g., military, police, or special security forces) that results in the intentional killing of at least 100 of a group’s non-combatant members during a given year. We included only those cases where victims were unarmed civilians targeted as members of a discrete ethnic group and the state was the perpetrator. The NBP Dataset does not consider non-group-specific repression or routine disproportionate policing that reflects systemic racism rather than exclusionary nation-building. For every period of state-sponsored violence we additionally coded whether the violence met the threshold for mass violence (1,000+ civilian deaths in any given year) and whether it occurred during a civil war or an international war. Empirically, we began by screening the core mass-atrocity datasets (Anderton’s compilation, TMK Events 1946–2020, and Ulfelder & Valentino 1945–2006) and then used additional secondary sources such as the UCDP encyclopedia.
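The two death thresholds can be illustrated with a per-year check. This is a sketch under stated assumptions: the function name, argument names, and returned labels are hypothetical, and how individual years are grouped into coded periods is not specified in this glossary and is left out.

```python
STATE_VIOLENCE_MIN_DEATHS = 100    # at least 100 non-combatants killed in a year
MASS_VIOLENCE_MIN_DEATHS = 1_000   # 1,000+ civilian deaths in a year


def code_violence_year(
    civilian_deaths: int,
    state_is_perpetrator: bool,
    targeted_as_ethnic_group: bool,
) -> str:
    """Return a hypothetical label for one group-year."""
    # Non-group-specific repression and non-state perpetrators are excluded.
    if not (state_is_perpetrator and targeted_as_ethnic_group):
        return "not coded"
    if civilian_deaths < STATE_VIOLENCE_MIN_DEATHS:
        return "not coded"
    if civilian_deaths >= MASS_VIOLENCE_MIN_DEATHS:
        return "mass violence"
    return "state-sponsored violence"
```

Under this reading, a year with 250 targeted civilian deaths counts as state-sponsored violence but falls short of the mass-violence threshold.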

Forced relocation

Forced relocation is defined as a systematic, government-sponsored policy to remove an ethnic, racial, religious, or national group without individual legal review and without recognizing a right to return. We coded forced relocation to include targeted movements such as deportations, resettlements, and population transfers that originate from state policy and may displace people internally or across borders. In practice, we first checked the Government-Sponsored Mass Expulsion Dataset (GSME) for relevant cross-border expulsions and then consulted secondary sources to identify internal displacements. An event was coded only when the state systematically removed a discrete group because of shared group characteristics (not incidental displacement from violence, famine, or natural disasters) and met an annual 1,000-person threshold. For cross-border cases, we attributed responsibility to the government creating the removal policy or conditions (and coded both states in population exchanges).
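The coding conditions for a forced-relocation event can be sketched as a single check. The record type and field names below are hypothetical, and treating "met an annual 1,000-person threshold" as "1,000 or more persons in a year" is an assumption.

```python
from dataclasses import dataclass

FORCED_RELOCATION_MIN_PERSONS = 1_000  # assumed to mean 1,000 or more per year


@dataclass
class RelocationEvent:
    """One candidate event. Field names are hypothetical."""
    persons_removed_in_year: int
    systematic_state_policy: bool
    targeted_for_group_characteristics: bool  # False for incidental displacement
    crosses_border: bool


def is_forced_relocation(event: RelocationEvent) -> bool:
    # All three conditions must hold for the event to be coded.
    return (
        event.systematic_state_policy
        and event.targeted_for_group_characteristics
        and event.persons_removed_in_year >= FORCED_RELOCATION_MIN_PERSONS
    )


def first_source_to_check(event: RelocationEvent) -> str:
    # GSME was screened for cross-border expulsions; internal
    # displacements were identified from secondary sources.
    return "GSME" if event.crosses_border else "secondary sources"
```

Displacement caused by famine, for example, would fail the group-targeting condition regardless of how many people were affected.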

Cultural eradication

Cultural eradication refers to state policies that attempt to destroy a group’s culture and identity, in whole or in part, through coercive measures targeting (a) intangible practices (e.g., language, traditions, religion) and/or (b) tangible cultural structures (e.g., places of worship, heritage sites, artifacts). Policies we coded as indicative of the former include the creation of resocialization camps, the forced removal or transfer of children, forced religious conversion, and resettlement and transmigration (including demographic dilution). Policies we coded as indicative of the latter include the destruction of cultural heritage sites, the destruction of cultural artifacts such as books or artworks, and other coercive measures targeting material culture.

 

© 2025 NBP | Built by Jan Schlebusch in Quarto | Funded by the European Research Council under the Horizon 2020 research and innovation programme (Grant No 864333)