Francis Bacon's claim that knowledge is power was a description of a relationship, not an observation about epistemology: to know something is to be positioned over it. When a person becomes a behavioral profile, the profile does not passively record them but begins to act on them. Insurance rates, credit decisions, content recommendations, policing patterns: these are responses to what the data says the person is, which is fundamentally different from who the person actually is. That gap between the record and the self is constitutive, not a technical limitation to be eventually resolved.
Paul Ricoeur, in Oneself as Another, distinguishes between two modes of identity: idem and ipse. Idem is the identity of pattern and habit that persists through time, the regularity that makes a person predictable, systematic, legible. It is what a database is designed to capture. Ipse is something altogether different: the identity constituted by the capacity to make and keep promises, to say "I will" across an open future and act upon it. It cannot be stored, because it consists solely of an individual’s potential. Data captures a fragment of the idem identity and cannot touch ipse at all. By treating the idem fragment as an adequate representation of the person, the data apparatus locks itself into a fundamentally backward-facing understanding: who this person has been, with no structural place for who they might become.
James C. Scott adds the concept of metis: local, embodied knowledge that cannot be reduced to formal rules or captured in a record, because it is constituted by context and relationship. High-modernist legibility projects, from colonial land surveys to smart city dashboards, have characteristically destroyed metis not out of malice but out of the structural requirement that knowledge be transmissible and standardized.
We frame data as a resource: who owns it, who profits from it, how it should be regulated. Shoshana Zuboff calls this surveillance capitalism, and she argues that human experience is claimed as raw material for behavioral prediction. Modern platforms no longer make money by selling products to users but by selling predictions about users to third parties. Nick Couldry and Ulises Mejias extend this diagnosis beyond the market in their concept of data colonialism: "Data colonialism combines the old logics of colonialism, the appropriation of bodies and territories, and emerging practices of data relations." The value generated flows right out of the community in which it was produced.
Helen Nissenbaum's contextual integrity framework offers a different way to see information. Medical data shared with a doctor flows appropriately to treating physicians and community care; it does not flow appropriately to insurers. Transit data shared to optimize routes flows appropriately to urban planners; it does not flow appropriately to immigration enforcement. Surveillance earns its name through the rupture of contextual expectation, through the violation of what information was understood to mean in its original relation. Nissenbaum's framework implies that the core wrong of data colonialism is this rupture at scale: data generated in one relational context is extracted and deployed in another without consent, accountability, or return. The idem record is taken from the context in which it was constituted and stripped of the metis that gave it meaning.
Denmark's Landspatientregisteret, the National Patient Registry, has collected data on every hospital contact in the country since 1977. It covers an entire population, links to other national registries, and generates one of the richest longitudinal health datasets in the world. It has been used to identify cancer clusters, study the long-term effects of medications, understand the social determinants of disease, and document outcomes that improve care across the system. The data stays within the health context; the value generated circulates within it as population health intelligence available to researchers, policymakers, and eventually to the patients who generated it.
Under the Landspatientregisteret, access to patient-level data requires approval from the Danish Data Protection Agency and from the Scientific Ethics Committees. This requirement obliges researchers to embed their data request in a social and relational context, institutionalizing metis. Research applications are reviewed against public benefit criteria that are written into law rather than determined by the institution holding the data. The Landspatientregisteret operationalizes contextual integrity at the level of bureaucratic architecture. The norms of the health context are set out in principle, then enforced through institutional structure. And the feedback loops run in directions that would be structurally impossible in a commercial data apparatus: Danish researchers have used the registry to demonstrate that certain antidepressants increase cardiac risk in elderly patients, findings that generated new prescribing guidelines. In doing so, the Landspatientregisteret institutionalizes an approach that protects ipse and metis by design.
I believe that a quiet, unobservable life, away from data aggregation, is no longer possible. The question of whether to collect any data at scale has already been settled. On its face, the intuition that drives data minimization is sound: collect less, reduce exposure, limit the surface area for harm. But minimization, as a governing ethical principle, can only conceal, and what it conceals from the parties it is meant to shut out it also conceals from everyone else. If demographic data cannot be collected, disparate impact cannot be detected or remedied. The gap between the system's self-representation and its actual effects, the idem of the institution as against its claims about its own ipse, can only be documented through collection.
To build accountability, institutions need metis-first systems that are built to serve the community. Accurate data aggregation that respects ipse and metis has to contend with both community distrust and the competitive pressure of companies that have optimized for extraction and profit.
The Home Mortgage Disclosure Act, enacted in the United States in 1975, required lenders to report geographic data on their mortgage lending; amendments in 1989 added demographic data on individual applicants. When researchers and journalists began analyzing that data in the late 1980s and early 1990s, the results were unambiguous: the Atlanta Journal-Constitution's "Color of Money" series and the Federal Reserve Bank of Boston's 1992 study both documented systematic patterns of denial that mapped almost exactly onto racial geography. Neither investigation would have been possible without the data the lending industry had resisted collecting. When fintech lenders began arguing, decades later, that their algorithmic underwriting was racially neutral and that demographic collection requirements were therefore unnecessary, the structure of the argument was identical to a privacy claim. Making the data disappear does not eliminate the discrimination; it makes the discrimination invisible while the industry quietly benefits from the discretion that invisibility affords.
Applying the idem/ipse/metis framework requires us to uplift communities instead of condensing them. The "who" someone is remains irreducibly different from the "what" they have been. Examples like the Landspatientregisteret show that this is possible: it organizes itself around the community rather than against it, building a system that treats the person as more than their record. Seen through ipse and metis, blind data aggregation erases the very community that the data is meant to represent.
Key Sources
- Francis Bacon, Meditationes Sacrae (1597) — "knowledge is power"
- Paul Ricoeur, Oneself as Another (1990) — the idem/ipse distinction between habitual pattern-identity and narrative self-constituted through promise and commitment
- Shoshana Zuboff, The Age of Surveillance Capitalism (2019) — human behavioral experience as raw material extracted for prediction products sold to third parties
- Nick Couldry and Ulises A. Mejias, "Data Colonialism: Rethinking Big Data's Relation to the Contemporary Subject," Television & New Media (2019) — the structural parallel between nineteenth-century colonial appropriation and contemporary data extraction
- James C. Scott, Seeing Like a State (1998) — metis as local, embodied practical knowledge destroyed by high-modernist legibility projects
- Helen Nissenbaum, Privacy in Context (2010) — contextual integrity as the framework for appropriate information flow based on the norms of the originating social context
- Landspatientregisteret (Danish National Patient Registry, est. 1977) — longitudinal national health registry as a model of democratically governed, contextually bounded data collection