
Data dictionary · v2.4 · current

Every entity, every field, published.

OMOP Common Data Model 5.4 with our eight-axis Master Equation extensions. Every field documents its provenance, deidentification method, and re-identification risk class. No surprises in the enclave.

Overview

The CH research dataset is a single coherent OMOP CDM 5.4 instance, extended with eight tables specific to the Master Equation. Every patient, every encounter, every observation lives in this schema. There is no parallel "internal" schema — researchers see the same model our own analysts use, with sensitive fields deidentified per the policy on this page.

Entities
28 tables
20 OMOP CDM 5.4 standard tables + 8 CH-specific extensions for axis tracking, consent, and chain-pinned events.
Fields
412 documented
Every field has type, allowed values, lineage, deid method, and an example. 89 fields are CH-specific extensions.
Vocabularies
14 standardized
SNOMED CT, RxNorm, LOINC, ICD-10-CM, CPT-4, HCPCS, NDC, ATC, MedDRA, Race & Ethnicity (OMB), CHRX (CH internal Rx codes), CHAX (CH axis events), CHCO (CH consent codes), CHPT (CH provider type).
Update cadence
Daily ETL
Data lands in the enclave at 03:00 UTC. Schema migrations announced 90 days in advance via the Researcher Bulletin (RSS available).

Standards we follow

We don't invent vocabularies when something credible already exists. Where we extend, we extend transparently — the CH-prefixed namespaces below are the only places we add anything custom.

OMOP CDM
v5.4 · OHDSI
Person-centric relational schema for observational health research. Same model used by 600+ academic medical centers.
FHIR R4
HL7 · v4.0.1
Source-of-truth at the clinical edge. We map FHIR resources → OMOP for the registry. FHIR endpoints available for clinic-side reads.
SNOMED CT
US Edition · 2025-09
Conditions, procedures, anatomical sites, qualifier values. Updated monthly from the NLM release.
RxNorm
2025-10
All medication exposures map to RxNorm RxCUIs. Brand and generic preserved as separate codes per OHDSI convention.
LOINC
v2.78
Laboratory results, vital signs, survey instruments. Every measurement has a LOINC code and a UCUM unit.
ICD-10-CM
FY2026
Billing codes preserved alongside SNOMED for claims-based research. Auto-mapped via the OMOP standard concept map.
USCDI v4
ONC · 2025
Every USCDI v4 element is represented. We support USCDI+ for behavioral-health-specific elements when applicable.
CHAX (CH internal)
v2.4
8-axis Master Equation event vocabulary. 312 codes spanning measurable axis movement (e.g. CHAX:PO-001 = anchor compliance).

Entities

The 28 tables in the registry. OMOP standard tables are kept as-is (so existing OHDSI tooling works); CH-prefixed tables are extensions documented on this page.

| Table | Purpose | Contents | Source | Fields | Rows (live) |
|---|---|---|---|---|---|
| person | Demographic anchor for every participant. | One row per participant | OMOP | 18 | 1.2k |
| observation_period | When a person was actively observed. | Spans of follow-up | OMOP | 5 | 3.4k |
| visit_occurrence | Encounter-level rollup (visit, telehealth, ER, etc.) | One row per encounter | OMOP | 17 | 8.7k |
| condition_occurrence | Coded conditions (SNOMED + ICD-10). | Diagnoses, problem-list entries | OMOP | 15 | 22.1k |
| drug_exposure | Medications dispensed or administered. | Rx + admin records | OMOP | 22 | 18.4k |
| procedure_occurrence | Procedures performed (CPT/HCPCS/SNOMED). | Procedure records | OMOP | 14 | 11.2k |
| measurement | Quantitative results: labs, vitals, instruments. | Numeric or coded results | OMOP | 23 | 142.8k |
| observation | Qualitative or non-result clinical facts. | Notes, social history, qualifiers | OMOP | 20 | 38.6k |
| death | Date and cause of death. | One row per decedent | OMOP | 8 | 14 |
| device_exposure | Implants, wearables, durable medical equipment. | Device-level records | OMOP | 12 | 2.8k |
| specimen | Specimens collected (blood, tissue, etc.) | One per draw | OMOP | 11 | 9.4k |
| cost | Encounter and item-level cost records. | Charges, paid amounts | OMOP | 14 | 28.3k |
| payer_plan_period | Insurance coverage spans. | One per coverage span | OMOP | 9 | 2.1k |
| care_site | Clinic, hospital, or virtual visit site. | Reference table | OMOP | 8 | 147 |
| provider | Clinicians and care-team members. | Reference table | OMOP | 11 | 312 |
| location | Geographic location (county-level only post-deid). | Reference table | OMOP | 6 | 847 |
| note | Clinical notes (free text, redacted). | Encounter notes | OMOP | 10 | 8.1k |
| note_nlp | NLP-derived structured facts from notes. | Auto-extracted | OMOP | 14 | 42.6k |
| fact_relationship | Cross-table relationships (e.g. drug↔condition). | Relationship graph | OMOP | 5 | 18.9k |
| concept / vocabulary / etc. | OMOP standardized vocabulary tables. | Reference | OMOP | | |
| ch_axis_snapshot | Per-day 8-axis vector for every participant. | Daily axis values | CH | 14 | 438k |
| ch_axis_event | Discrete events that moved an axis. | Event log (CHAX-coded) | CH | 12 | 86.4k |
| ch_consent | Active consent record (research / sharing / data classes). | Consent state | CH | 16 | 3.2k |
| ch_consent_event | Consent-state change log (chain-pinned). | Audit trail | CH | 11 | 11.8k |
| ch_data_share | Per-study data-share grants and revocations. | One per share | CH | 13 | 412 |
| ch_token_event | HCR/HCC issuance to participants (deidentified). | Earnings ledger refs | CH | 9 | 94.2k |
| ch_protocol | Care-protocol catalog (versioned, signed). | Reference | CH | 10 | 187 |
| ch_chain_event | Chain-pinned audit events (notes, consent, prescribing). | Tamper-evident log | CH | 7 | 218k |

person

The demographic anchor. Every participant has exactly one row. Direct identifiers (name, SSN, MRN, address, phone, email) are not in this table — they live in the operational system, never in the research enclave.

person — 18 fields

OMOP CDM 5.4 standard · spec ↗
| Field | Type | Source / vocabulary | Deid | Notes |
|---|---|---|---|---|
| person_id | int64 | Synthetic surrogate | SHIFT | Stable across queries within one study; rotated between studies. |
| gender_concept_id | int | OMOP Gender | PASS | Pass-through. 5 standard concepts. |
| year_of_birth | int | YYYY | SHIFT | Date-shifted ±90 days for the patient (year stable for ages 89 and under). |
| month_of_birth | int | 1-12 | BIN | Quarter only after deid (1, 4, 7, 10). |
| day_of_birth | int | | SUPPR | Suppressed (set to 15 of birth month). |
| birth_datetime | timestamp | | SUPPR | Suppressed entirely. Use year_of_birth. |
| race_concept_id | int | OMB Race & Ethnicity | PASS | Pass-through. Self-reported. |
| ethnicity_concept_id | int | OMB Race & Ethnicity | PASS | Pass-through. Self-reported. |
| location_id | int | FK → location | BIN | County-level only. ZIP code suppressed. |
| provider_id | int | FK → provider | SHIFT | Provider IDs surrogated per study. |
| care_site_id | int | FK → care_site | SHIFT | Care-site IDs surrogated per study. |
| person_source_value | string | | SUPPR | Original MRN suppressed. |
| gender_source_value | string | | PASS | Self-described, no PII. |
| race_source_value | string | | PASS | Self-reported text. |
| ethnicity_source_value | string | | PASS | Self-reported text. |
| ch_consent_state | enum | CHCO | PASS | CH ext. active / paused / withdrawn / pending. |
| ch_axis_baseline_id | int | FK → ch_axis_snapshot | PASS | CH ext. Baseline 8-axis vector at enrollment. |
| ch_data_classes | enum[] | CHCO | PASS | CH ext. Which data classes patient consented to share. |

observation

Qualitative facts that aren't measurements, conditions, or procedures — social history, family history, qualifiers, structured questionnaire responses. CHAX-coded axis events also surface here as observations of class CH-AXIS.

observation — selected fields (20 total)

OMOP CDM 5.4 standard + 2 CH ext.
| Field | Type | Source / vocabulary | Deid | Notes |
|---|---|---|---|---|
| observation_id | int64 | Synthetic surrogate | SHIFT | Per-study surrogate. |
| person_id | int64 | FK → person | SHIFT | Stable within study. |
| observation_concept_id | int | SNOMED, LOINC, CHAX | PASS | Standardized concept. |
| observation_date | date | | SHIFT | Patient-level date shift ±90 days. |
| observation_datetime | timestamp | | SHIFT | Same shift as date; time-of-day binned to nearest hour. |
| value_as_string | text | | REDACT | NER-redacted free text. Names, locations, MRNs removed. |
| value_as_number | numeric | | PASS | Numeric values pass through. |
| value_as_concept_id | int | Vocab | PASS | Coded value. |
| qualifier_concept_id | int | SNOMED qualifier | PASS | |
| unit_concept_id | int | UCUM | PASS | |
| provider_id | int | FK | SHIFT | |
| visit_occurrence_id | int64 | FK | SHIFT | |
| ch_axis | enum | PO/NM/ER/SC/RS/ES/TA/PV | PASS | CH ext. Which axis this observation moved. |
| ch_axis_delta | numeric | | PASS | CH ext. Signed delta on that axis. |
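
As a rough illustration of how ch_axis and ch_axis_delta might be read together, the sketch below sums the signed deltas per axis over a few made-up rows. Only the two field names and the axis codes come from the table above; the helper name net_axis_movement and the sample values are hypothetical.

```python
from collections import defaultdict

# Axis codes as listed in the ch_axis enum.
AXES = ("PO", "NM", "ER", "SC", "RS", "ES", "TA", "PV")


def net_axis_movement(rows):
    """Sum ch_axis_delta per ch_axis, skipping rows with no axis attribution."""
    totals = defaultdict(float)
    for row in rows:
        axis = row.get("ch_axis")
        if axis is None:
            continue  # ordinary observation, not an axis event
        if axis not in AXES:
            raise ValueError(f"unknown axis code: {axis!r}")
        totals[axis] += row["ch_axis_delta"]
    return dict(totals)


# Synthetic rows for illustration only (not real data).
rows = [
    {"ch_axis": "PO", "ch_axis_delta": 1.5},
    {"ch_axis": "PO", "ch_axis_delta": -0.5},
    {"ch_axis": "ES", "ch_axis_delta": 2.0},
    {"ch_axis": None, "ch_axis_delta": None},
]
print(net_axis_movement(rows))
```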

measurement

Quantitative test results: labs, vital signs, calculated indices, instrument scores. Largest OMOP standard table by row count; the CH extension table ch_axis_snapshot is larger. Numeric values pass through unchanged; dates are patient-shifted; provider/care-site IDs are surrogated.

measurement — abbreviated

23 fields total
| Field | Type | Source | Deid | Notes |
|---|---|---|---|---|
| measurement_concept_id | int | LOINC | PASS | Standard LOINC concept. |
| measurement_date | date | | SHIFT | ±90 days, patient-stable. |
| value_as_number | numeric | | PASS | Pass-through. |
| unit_concept_id | int | UCUM | PASS | |
| range_low / range_high | numeric | | PASS | Lab-reported reference ranges. |
| measurement_source_value | string | | REDACT | Source-system text; NER-redacted. |
| ch_axis_attribution | enum | PO/NM/ER/SC/RS/ES/TA/PV | PASS | CH ext. Which axis (if any) this measurement updates. |

drug_exposure

Medications dispensed, prescribed, or administered. Includes inpatient administration records, outpatient prescriptions, and pharmacy fill events. Standard RxNorm coding throughout.

drug_exposure — abbreviated

22 fields total
| Field | Type | Source | Deid | Notes |
|---|---|---|---|---|
| drug_concept_id | int | RxNorm RxCUI | PASS | |
| drug_exposure_start_date | date | | SHIFT | Patient date shift. |
| drug_exposure_end_date | date | | SHIFT | Same shift. |
| days_supply | int | | PASS | |
| quantity | numeric | | PASS | |
| refills | int | | PASS | |
| drug_type_concept_id | int | OMOP type | PASS | Rx written / dispensed / administered / inferred. |
| stop_reason | string | | REDACT | NER-redacted free text. |
| ch_42cfr2_flag | bool | CHCO | SUPPR | CH ext. SUD-treatment Rx is suppressed unless the study has Tier-2+ IREB approval and the participant has given 42 CFR Part 2 redisclosure consent. |
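
The suppression rule on ch_42cfr2_flag can be read as a simple gate. The sketch below is one possible reading, not the enclave's actual logic: it assumes the IREB tier is available as an integer, that "Tier-2+" means tier 2 or higher, and that redisclosure consent is a boolean. The function and parameter names are hypothetical.

```python
def part2_row_visible(ch_42cfr2_flag: bool, ireb_tier: int,
                      redisclosure_consent: bool) -> bool:
    """Return True if a drug_exposure row may be shown to the study.

    Assumption: rows without the Part 2 flag are never gated by this rule;
    flagged rows need BOTH a tier of 2 or higher AND redisclosure consent.
    """
    if not ch_42cfr2_flag:
        return True
    return ireb_tier >= 2 and redisclosure_consent


# Flagged row, Tier-1 study: suppressed even with consent.
print(part2_row_visible(True, ireb_tier=1, redisclosure_consent=True))
# Flagged row, Tier-2 study with consent: visible.
print(part2_row_visible(True, ireb_tier=2, redisclosure_consent=True))
```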

condition_occurrence

Diagnoses recorded by clinicians, problem-list entries, claims-coded conditions. SNOMED CT is the standard concept; ICD-10-CM source is preserved.

condition_occurrence — abbreviated

15 fields total
| Field | Type | Source | Deid | Notes |
|---|---|---|---|---|
| condition_concept_id | int | SNOMED CT | PASS | |
| condition_start_date | date | | SHIFT | Patient date shift. |
| condition_end_date | date | | SHIFT | Same shift. |
| condition_status_concept_id | int | OMOP status | PASS | Active, resolved, ruled-out, etc. |
| condition_source_value | string | ICD-10-CM | PASS | Original code preserved. |
| stop_reason | string | | REDACT | NER-redacted text. |
| ch_sensitive_class | enum | CHCO | PASS | CH ext. none / BH / SUD / HIV / genomic / repro; drives suppression rules. |

ch_axis_snapshot

The signature CH extension. One row per participant per day, holding their 8-axis vector and overall CH score. This is what makes CH research data structurally different from claims and chart data — every patient has a longitudinal time-series of their lived health, not just billing events.

  • PO · Physical Origin: Anchor work, baseline movement, restorative habits.
  • NM · Nourishment: Diet quality, meal cadence, hydration.
  • ER · Effort & Recovery: Training load, sleep, recovery markers.
  • SC · Social Connection: Quality and frequency of meaningful contact.
  • RS · Rhythm & Sleep: Circadian alignment, sleep-stage architecture.
  • ES · Emotional State: Mood, affect, stress regulation.
  • TA · Thought & Attention: Cognitive engagement, focus, learning.
  • PV · Purpose & Vision: Sense of meaning, agency, longitudinal direction.

ch_axis_snapshot — 14 fields

CH extension · v2.4
| Field | Type | Range | Deid | Notes |
|---|---|---|---|---|
| snapshot_id | int64 | | SHIFT | Surrogate. |
| person_id | int64 | FK | SHIFT | |
| snapshot_date | date | | SHIFT | ±90 day patient shift. |
| axis_po | numeric | 0–100 | PASS | Physical Origin score. |
| axis_nm | numeric | 0–100 | PASS | Nourishment score. |
| axis_er | numeric | 0–100 | PASS | Effort & Recovery score. |
| axis_sc | numeric | 0–100 | PASS | Social Connection score. |
| axis_rs | numeric | 0–100 | PASS | Rhythm & Sleep score. |
| axis_es | numeric | 0–100 | PASS | Emotional State score. |
| axis_ta | numeric | 0–100 | PASS | Thought & Attention score. |
| axis_pv | numeric | 0–100 | PASS | Purpose & Vision score. |
| ch_score | numeric | 0–100 | PASS | Composite Master-Equation score. |
| data_completeness | numeric | 0–1 | PASS | Fraction of expected signals present that day. |
| computation_version | string | semver | PASS | ME computation version (e.g. 2.4.0). Pin to this for reproducibility. |
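
To make the "pin to computation_version" advice concrete, here is a minimal sketch that loads a few synthetic rows into an in-memory SQLite table and averages one axis per participant. It only keeps rows from a single computation version with reasonable completeness. The column subset, the 0.5 completeness threshold, and the sample values are assumptions for illustration; the enclave's actual query interface may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ch_axis_snapshot (
        person_id INTEGER,
        snapshot_date TEXT,
        axis_rs REAL,
        ch_score REAL,
        data_completeness REAL,
        computation_version TEXT
    )
    """
)

# Synthetic rows (not real data).
rows = [
    (1, "2025-01-01", 60.0, 70.0, 0.90, "2.4.0"),
    (1, "2025-01-02", 64.0, 72.0, 0.80, "2.4.0"),
    (1, "2025-01-03", 10.0, 20.0, 0.20, "2.4.0"),  # sparse day, filtered out
    (2, "2025-01-01", 50.0, 55.0, 0.95, "2.3.1"),  # other version, filtered out
    (2, "2025-01-02", 52.0, 58.0, 0.95, "2.4.0"),
]
conn.executemany("INSERT INTO ch_axis_snapshot VALUES (?, ?, ?, ?, ?, ?)", rows)

PINNED_VERSION = "2.4.0"   # pin one ME computation version per analysis
MIN_COMPLETENESS = 0.5     # hypothetical threshold, choose per protocol

result = conn.execute(
    """
    SELECT person_id,
           COUNT(*)      AS n_days,
           AVG(axis_rs)  AS mean_rs,
           AVG(ch_score) AS mean_score
    FROM ch_axis_snapshot
    WHERE computation_version = ? AND data_completeness >= ?
    GROUP BY person_id
    ORDER BY person_id
    """,
    (PINNED_VERSION, MIN_COMPLETENESS),
).fetchall()
print(result)
```

Filtering on a single computation_version avoids mixing scores produced by different versions of the Master Equation computation in one average.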

Deidentification methods

Five methods cover every field. The deid column on every entity table tells you which one applies. Combined methods comply with HIPAA Safe Harbor (45 CFR § 164.514(b)(2)) and meet the more stringent Expert Determination standard for the cohorts we publish.

PASS
Pass-through

Field is non-identifying by construction. Coded vocabularies, numeric values without obvious dates, axis scores. Passed unchanged.

SHIFT
Date / ID shift

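Dates are shifted by a random offset within ±90 days that is fixed per patient. IDs are rotated to per-study surrogates. Because every date for a patient moves by the same offset, time intervals between that patient's events are preserved.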
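
The interval-preserving property can be seen in a small sketch. The page does not say how the offset is generated; the version below derives it from a keyed hash (HMAC-SHA256) of a patient key so that it is stable without storing a lookup table. That derivation, the secret, and the function names are all assumptions for illustration, not the production method.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 90


def patient_offset_days(person_key: str, secret: bytes) -> int:
    """Derive a stable offset in [-90, +90] days from a keyed hash (assumed scheme)."""
    digest = hmac.new(secret, person_key.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1  # 181 possible offsets
    return int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS


def shift_date(d: date, person_key: str, secret: bytes) -> date:
    """Apply the patient's single offset to one of that patient's dates."""
    return d + timedelta(days=patient_offset_days(person_key, secret))


secret = b"demo-only-secret"  # placeholder; never hard-code a real key
start, end = date(2025, 3, 1), date(2025, 3, 15)
shifted_start = shift_date(start, "patient-A", secret)
shifted_end = shift_date(end, "patient-A", secret)

# The gap between the two events is the same before and after shifting.
print((end - start).days, (shifted_end - shifted_start).days)
```

Both printed values are 14: the absolute dates move, but the 14-day gap between the two events does not.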

BIN
Binning

Continuous values bucketed (e.g. ZIP → county; age > 89 → "90+"; hour-of-day → nearest 4-hour bin). Reduces identifiability while preserving research value.
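
A minimal sketch of three of the binning rules mentioned on this page: age top-coding at "90+", month-to-quarter binning as used for month_of_birth (1, 4, 7, 10), and 4-hour time-of-day bins. Labeling each hour bin by its start hour is an assumption, since the page does not say how bins are labeled, and the function names are hypothetical.

```python
def bin_age(age: int) -> str:
    """Top-code ages above 89 as '90+'; other ages are returned unchanged."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "90+" if age > 89 else str(age)


def bin_month_to_quarter_start(month: int) -> int:
    """Map a month (1-12) to the first month of its quarter: 1, 4, 7, or 10."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return ((month - 1) // 3) * 3 + 1


def bin_hour(hour: int, width: int = 4) -> int:
    """Map an hour (0-23) to the start hour of its bin (assumed labeling)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return (hour // width) * width


print(bin_age(72), bin_age(93))
print([bin_month_to_quarter_start(m) for m in (1, 3, 4, 12)])
print([bin_hour(h) for h in (0, 3, 4, 23)])
```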

REDACT
Redaction (free text)

NER-based PHI redaction across 18 HIPAA Safe Harbor identifiers + 12 CH-specific. Replaced with placeholder tokens ([NAME-1], [LOC-1]). Reviewed by output-review for k-anonymity.
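
The placeholder-token step can be sketched separately from entity detection. The example below does not perform NER: it takes entity spans as input (here found by plain string search as a stand-in) and replaces each with a numbered token such as [NAME-1]. Reusing the same token when the same text recurs is an assumption about the numbering scheme, and the function names are hypothetical.

```python
def find_spans(text, surface, label):
    """Stand-in for an NER step: locate every occurrence of a known string."""
    spans, pos = [], text.find(surface)
    while pos != -1:
        spans.append((pos, pos + len(surface), label))
        pos = text.find(surface, pos + 1)
    return spans


def apply_placeholders(text, entities):
    """Replace non-overlapping (start, end, label) spans with [LABEL-n] tokens."""
    counters = {}   # label -> how many distinct entities seen so far
    assigned = {}   # (label, lowercased text) -> token already issued
    pieces, cursor = [], 0
    for start, end, label in sorted(entities):
        key = (label, text[start:end].lower())
        if key not in assigned:
            counters[label] = counters.get(label, 0) + 1
            assigned[key] = f"[{label}-{counters[label]}]"
        pieces.append(text[cursor:start])
        pieces.append(assigned[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Synthetic note text (not real data).
note = "Maria saw Dr. Chen in Austin. Maria will return."
entities = (
    find_spans(note, "Maria", "NAME")
    + find_spans(note, "Chen", "NAME")
    + find_spans(note, "Austin", "LOC")
)
redacted = apply_placeholders(note, entities)
print(redacted)
```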

SUPPR
Suppression

Field is removed entirely or set to a fixed null value. Used for direct identifiers (MRN, full DOB), 42 CFR Part 2-protected fields without explicit redisclosure consent, and other high-risk classes.

Re-identification risk

Deidentified ≠ anonymized. We treat re-identification risk as ongoing. Three layers protect against attempts:

1 · Output review (every result)

Every result you export from the enclave is reviewed before release. Cells with a cohort size below 11 are suppressed automatically. Counts within ±2 of small cohorts are jittered. Any output that violates k-anonymity for any combination of demographic axes is held for human review.
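
For a pre-flight check on your own outputs, the small-cell rule can be approximated as below. This sketch only covers the "cohort below 11" suppression; it does not implement the jitter or the human-review step, whose details are not specified here. The function names, the choice of None as the suppression marker, and the sample rows are assumptions.

```python
from collections import Counter

MIN_CELL = 11  # cells with fewer participants than this are suppressed


def cell_counts(rows, keys):
    """Count rows per combination of the given demographic columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def suppress_small_cells(counts, min_cell=MIN_CELL):
    """Replace counts below min_cell with None (assumed suppression marker)."""
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Synthetic rows (not real data): 12 in one cell, 3 in another.
rows = [{"sex": "F", "age_band": "40-49"}] * 12 + [{"sex": "M", "age_band": "40-49"}] * 3
released = suppress_small_cells(cell_counts(rows, ["sex", "age_band"]))
print(released)
```

Running a check like this on the synthetic sandbox before export can show which cells of a planned table would be blanked out.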

2 · Linkage attack monitoring

We monitor for query patterns consistent with linkage attempts (e.g. repeatedly slicing by ZIP + DOB + sex). The IREB is alerted and the study can be paused for review.

3 · Annual re-identification audit

Each year, an external auditor (currently Privacy Analytics) attempts re-identification on a sample of our published outputs using public datasets. Last audit (2025-08): 0 of 47 records re-identifiable at 95% confidence. Report posted on the security page.

What you cannot do, even in the enclave

The enclave software environment enforces these prohibitions. Violation attempts are logged and may result in study termination.

  • Cross-link to external identifying datasets you bring in (file uploads are scanned).
  • Generate outputs that uniquely identify cohorts smaller than 11.
  • Use unprotected free-text fields to derive direct identifiers.
  • Combine date-shifted dates with external timeline data to reverse the shift.
  • Attempt to fingerprint participants via rare-condition combinations beyond what your protocol justifies.

Provenance & lineage

Every record in the registry has a provenance chain. You can trace any row back through ETL → operational system → originating clinic / device / patient self-report. The provenance fields are visible in the enclave and are required for any publication.

Provenance fields (on every fact table)

Each table includes a standard set of provenance columns: source_system (which clinic, device, or app produced the row), extraction_timestamp (when ETL ran), etl_version (semver of the ETL job), chain_event_id (FK into ch_chain_event for chain-pinned operations), and quality_flags (bitset for known data-quality issues).
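
Because quality_flags is a bitset, a row can carry several data-quality markers at once. The sketch below shows one way to decode such a value with Python's enum.IntFlag. The bit assignments and flag names are invented for illustration; the real meanings are not listed on this page and would need to come from the schema bundle.

```python
from enum import IntFlag


class QualityFlag(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    MISSING_UNIT = 1
    OUT_OF_RANGE = 2
    DUPLICATE_SOURCE = 4


def decode_quality_flags(value: int):
    """Return the names of all flags whose bit is set in value."""
    return [flag.name for flag in QualityFlag if value & flag]


print(decode_quality_flags(5))
print(decode_quality_flags(0))
```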

Chain-pinned events

A subset of high-trust events — consent changes, prescription writes, signed clinical notes, AI override decisions, axis-token issuance — are pinned to the CH Chain. Their chain_event_id resolves to a public hash you can verify independently. We use this for the kind of facts where "we say it happened" is not enough.

Downloads & SDKs

All artifacts free, no application required.

Schema bundle
JSON Schema · DBML · OMOP DDL
Full schema in three formats. Includes deid policy, vocabulary refs, and example synthetic rows.

Download schema bundle (4.1 MB) →

Synthetic-data sandbox
10k synthetic rows
Match-schema synthetic dataset for code validation. Same shape as real data, no real participants.

Open sandbox →

CH-OMOP Python SDK
pip install ch-research
Helpers for axis cohort building, date-shift handling, and output-review pre-flight.

PyPI · GitHub →

Vocabulary changelog
RSS · Email · Slack
Subscribe to schema and vocabulary changes. 90-day notice on breaking changes; immediate notice on additive ones.

Subscribe →

Ready to design a study?

Open the synthetic sandbox before you apply.

Validate your cohort logic and analytic plan against the real schema with synthetic rows. No commitment, no review queue. When the code runs clean, your application is half-written.