The Ethical Imperative: Data Science in Sacred Domains and the Principle of “Do No Harm”
Introduction
We live in an era where data is frequently described as the “new oil,” a resource to be extracted, refined, and monetized. However, when data science intersects with “sacred domains”—cultural heritage, religious practices, indigenous knowledge, and tight-knit community identities—the metaphor of extraction becomes dangerous. In these contexts, data is not merely a collection of variables; it is a repository of identity, history, and communal belonging.
When data scientists apply traditional analytical frameworks to these sensitive areas without caution, they risk dismantling the very social fabrics they aim to study or serve. Prioritizing the principle of “do no harm” to community cohesion is not just a moral ideal; it is a technical and social necessity for sustainable, respectful innovation.
Key Concepts
Sacred Domains: These are spaces—physical, digital, or social—that hold deep existential or symbolic value for a group of people. This includes burial grounds, indigenous genealogy, religious rituals, and cultural linguistic data.
Data Colonialism: The practice of collecting, aggregating, and utilizing data from communities—often without meaningful consent or benefit-sharing—in ways that mirror historical colonial patterns of exploitation. It treats communal knowledge as public domain data, ignoring ownership and spiritual significance.
Social Cohesion: The strength of relationships and the sense of solidarity among members of a community. In the context of data science, this is threatened when algorithmic profiling or public data exposure leads to stigmatization, the commodification of private traditions, or the erosion of group trust.
Step-by-Step Guide to Ethical Data Practice
- Consultation Before Collection: Before a single byte is scraped or surveyed, engage in meaningful dialogue with community leaders. Ask not just “what data do we need,” but “what are the risks if this data becomes public or is processed through an automated system?”
- Implement Data Sovereignty Models: Move beyond the traditional individual “user consent” model toward communal control, in which the community retains the right to access, store, and authorize the use of its data. This recognizes that the data belongs to the collective, not just to the individuals who provided it.
- Contextual Privacy Assessment: Conduct a “Community Impact Assessment.” Unlike a standard DPIA (Data Protection Impact Assessment), this examines how the data might be misused by third parties to disrupt local social norms or provoke division within the group.
- Value-Aligned Algorithms: Ensure that the success metrics for your algorithms reflect the community’s values, not just corporate or efficiency targets. An algorithm that optimizes for efficiency while ignoring the social nuance of a ritual is likely to cause harm, however well it scores on its own metrics.
- Transparent Exit Strategies: What happens to the data if the project ends? Establish a “digital stewardship” plan where data is either returned, archived under community control, or permanently deleted to prevent future misuse.
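As an illustration, the communal-control and exit-strategy steps above could be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: the names `ExitAction`, `CommunityRecord`, and `StewardshipRegistry` are hypothetical, and the sketch assumes that a community governance body, rather than the project team, is the party calling `authorize` and `revoke`.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExitAction(Enum):
    """What happens to a record when the project ends (hypothetical names)."""
    RETURN = "return to community"
    ARCHIVE = "archive under community control"
    DELETE = "permanently delete"


@dataclass
class CommunityRecord:
    record_id: str
    payload: str
    exit_action: ExitAction
    # Purposes approved by the community's governance body, not by the
    # individual contributor and not by the project team (assumption).
    approved_purposes: set = field(default_factory=set)


class StewardshipRegistry:
    """Holds records and refuses any use the community has not authorized."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.record_id] = record

    def authorize(self, record_id, purpose):
        self._records[record_id].approved_purposes.add(purpose)

    def revoke(self, record_id, purpose):
        self._records[record_id].approved_purposes.discard(purpose)

    def access(self, record_id, purpose):
        record = self._records[record_id]
        # Default deny: anything not explicitly approved is refused.
        if purpose not in record.approved_purposes:
            raise PermissionError(f"{purpose!r} is not approved for {record_id}")
        return record.payload

    def close_project(self):
        """Return the agreed disposition per record and drop local copies."""
        plan = {rid: rec.exit_action for rid, rec in self._records.items()}
        self._records.clear()
        return plan


registry = StewardshipRegistry()
registry.add(CommunityRecord("r1", "oral history transcript", ExitAction.ARCHIVE))
registry.authorize("r1", "language revitalization")
print(registry.access("r1", "language revitalization"))
print(registry.close_project())
```

The design choice worth noting is default deny: a purpose that nobody has approved raises `PermissionError`, and approval is recorded per purpose rather than granted once for all uses. A real system would also need authentication, audit logging, and verified deletion, none of which this sketch attempts.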
Examples and Case Studies
Case Study 1: The Misuse of Genetic Ancestry Databases. Many indigenous groups have faced the unauthorized use of their genetic data. When researchers publish genomic data derived from tribal populations without consent, the result can be both a legal and a spiritual crisis. For example, some studies have published findings that contradicted the oral histories and origin stories of the communities involved, causing intense internal division and trauma. Treating this data as a merely biological asset, rather than a sacred one, harmed the cohesion of these groups.
Case Study 2: Digital Mapping of Religious Sites. A tech startup once attempted to map “untapped” tourist attractions by scraping social media tags of religious ceremonies. The project turned private, sacred rituals into public tourist data, and the resulting influx of uncontrolled crowds eroded both the sanctity of the ceremonies and the cohesion of the local community. The data was accurate from a technical perspective, but its application was socially destructive.
Common Mistakes
- The “Open Data” Fallacy: The assumption that all data should be “open” or “accessible.” In sacred domains, transparency can be a form of vulnerability. Not all cultural knowledge is meant for external consumption or aggregation.
- Ignoring Epistemic Diversity: Assuming that the data scientist’s worldview is universal. Data scientists often view the world through a reductionist lens, failing to see the sacred complexity that a community has maintained for centuries.
- Short-term Project Horizons: Treating a community as a “dataset” to be mined for a specific paper or product, then leaving them to deal with the long-term consequences of that data existing in the wild.
- Power Asymmetry: Relying on legalistic consent forms that residents or community elders may not fully understand, thereby failing to secure “informed” consent in any meaningful way.
Advanced Tips
To truly uphold the “do no harm” principle, consider adopting Participatory Action Research (PAR). In this model, community members are not just “subjects” of data; they are co-researchers. They participate in defining the research questions, interpreting the data, and deciding which insights should be shared externally.
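Furthermore, consider differential privacy techniques, which add calibrated statistical noise so that aggregate results reveal little about any one individual. However, understand that technical anonymity is often insufficient, especially in small, tight-knit communities: the noise required for a meaningful guarantee can swamp the statistics of a small group, and the guarantee protects individuals rather than the group as a whole. If a published result can still be recognized as describing the community, the risk to its cohesion remains. Technical safeguards must therefore be paired with policy-based guardrails regarding who can access the finalized, aggregated results.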
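To make the differential privacy point concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query, using only the Python standard library. The function names `laplace_noise` and `private_count`, the example data, and the chosen `epsilon` are illustrative assumptions; a production system should rely on a vetted differential privacy library rather than hand-rolled noise.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical community of 40 people; 12 take part in a given practice.
participants = [True] * 12 + [False] * 28
noisy = private_count(participants, lambda p: p, epsilon=0.1)
print(f"noisy count: {noisy:.1f} (noise scale: {1.0 / 0.1:.0f})")
```

With `epsilon=0.1` the noise scale is 10, which is a quarter of this hypothetical community's size, so the released count may say very little. Raising `epsilon` makes the count more useful but weakens the guarantee, and neither setting prevents readers from recognizing which community the statistic describes.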
The most advanced data science is that which recognizes its own limits. Sometimes, the most ethical decision is not to collect the data at all.
Conclusion
The digitization of sacred domains presents a profound paradox. We have the technical capacity to document, preserve, and analyze the world in ways previously unimaginable, yet this power brings the risk of stripping communities of their agency and their most precious, non-quantifiable traditions.
Data science in sacred domains must evolve from a model of extraction to one of stewardship. By prioritizing community cohesion, practicing meaningful consent, and respecting the limits of what should be digitized, data scientists can ensure that their work supports rather than undermines the vital, fragile ecosystems of human culture. Ultimately, the success of a project should be measured not by the accuracy of the model, but by the health and autonomy of the community from which the data was drawn.




