Research Summary

What we have, how complete it is, and where the data comes from. Updated 2026-03-18.

Database Overview

Total entities: 4,101 (1,880 compounds + 959 materials + 1,262 products)
Compound schemas: v4.0.0, 3-tier classified (ORG 1,626 / INO 172 / MIX 82)
Material schemas: v4.0.0, 5-tier classified (STR 189 / SFC 111 / CHM 310 / ENV 196 / ADV 153)
Product schemas: v4.0.0, 8-tier classified (HOM 270 / BDY 155 / CHD 155 / FOD 113 / SPE 169 / OUT 75 / PET 63 / WER 67)
Fragrance catalog: 2,325 ingredients across 29 chemical classes
API tests: 23,149 passing, 0 failures
Synonyms: 193,597 across 1,880 compounds (avg ~106 per compound)

Compound Coverage

Field | Coverage | Notes
CAS number | 97.2% | 35 are class compounds (no CAS exists)
PubChem CID | 88.1% | Ceiling: classes/mixtures have no CID
DTXSID (EPA) | 95.9% | 51 unmappable class compounds
Wikidata QID | 94.8% | 29 remaining have no Wikidata entry
UNII (FDA) | 87.6% | PubChem batch enriched
ChEBI ID | 76.2% | PubChem batch enriched
GHS data | 66.0% | 814 with hazard codes + statements + signal word
EDC classification | 100% | 1,002 positive classifications across 14 research passes
IARC group | 44.8% | 398 classified; 490 not evaluated by IARC
Prop 65 | 100% | 984 complete + 249 evidence-based
LD50 | 100% | 603 complete + 65 evidence-based + 565 insufficient
Genotoxicity | 100% | 555 complete + 77 evidence-based + 601 insufficient
Skin/Eye | 100% | 476 complete + 240 evidence-based + 517 insufficient
Bioactivity | 100% | 496 complete + 212 evidence-based + 525 insufficient
Alternatives | 100% | 554 with known substitutions + 13 category rules

Material Coverage (v4.0.0)

Field | Coverage | Notes
description | 100% | All 959, derived from story (Era A/B) or existing (Era C)
overall_risk_level | 100% | low 25, moderate 132, high 375, very_high 427
hazard_profile | 100% | primary + manufacturing + use_phase + end_of_life
safety_contexts | 100% | Per-context risk assessments
safety_summary | 100% | Primary concerns + sensitive populations
behavior (4 sub-keys) | 100% | Leaching, degradation, offgassing, thermal
identifiers | 100% | Aliases, trade names, CAS numbers
regulatory (by_jurisdiction) | 100% | US EPA, US OSHA, EU, other

Product Coverage (v4.0.0)

Field | Coverage | Notes
description | 100% | All 514
overall_risk_level | 97.2% | low 89, moderate 18, high 165, very_high 5 (8 unknown)
formulation | 100% | Key ingredients, preservative system, fragrance system
compound_composition | 96.5% | 275 with compound cross-refs
materials | 86.7% | 247 with common/concerning/preferred material refs
exposure | 100% | Canonical 7-field shape (routes, contact types, users, etc.)
consumer_guidance | 90.2% | Red flags, green flags, what to ask, alternatives
regulatory | 90.2% | Applicable regulations, certifications, labeling

Data Sources

Source | Data Type | Coverage
EPA CompTox (CTX) | DTXSID, ToxVal, ToxCast bioactivity, CPDat, physical properties | 1,182 compounds
PubChem (PUG-View + REST) | CID, InChIKey, SMILES, GHS, synonyms, physical properties, xrefs | 1,086 compounds
IARC Monographs | Carcinogenicity groups 1/2A/2B/3 | 398 compounds
California Prop 65 | Listed compounds with endpoints | 1,880 compounds assessed
ECHA REACH | SVHC candidates, restriction entries | 634 EC numbers
Wikidata SPARQL | QIDs, multilingual aliases | 1,169 compounds
ChEMBL | Drug/trade names, bioactivity data | 378 compounds
ChEBI | Biochemical nomenclature, ontology | 940 compounds
FDA (UNII) | Unique Ingredient Identifiers | 1,080 compounds
NHANES (via CTX) | Biomonitoring prevalence | 27 detected in population

All data is cross-referenced and sourced, with no unsupported claims. Every field either has data, is stamped as "ceiling" (searched and confirmed not to exist), or is marked "not yet searched."
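
To make the three field states concrete, here is a minimal Python sketch of how a single field could be recorded and how a coverage percentage could be derived from such records. The names FieldStatus, FieldRecord, and coverage are hypothetical and do not come from the database's code. Counting only fields that have data toward coverage is also an assumption, not the documented formula.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class FieldStatus(Enum):
    """The three field states named in the text (identifiers are hypothetical)."""
    HAS_DATA = "has_data"
    CEILING = "ceiling"                    # searched, confirmed not to exist
    NOT_YET_SEARCHED = "not_yet_searched"


@dataclass(frozen=True)
class FieldRecord:
    """One field of one entity, with its status and (if present) its source."""
    status: FieldStatus
    value: Optional[str] = None
    source: Optional[str] = None

    def __post_init__(self) -> None:
        has_data = self.status is FieldStatus.HAS_DATA
        # A field with data must carry both a value and a source.
        if has_data and (self.value is None or self.source is None):
            raise ValueError("a field with data needs a value and a source")
        # A ceiling or unsearched field must not carry a value.
        if not has_data and self.value is not None:
            raise ValueError("only fields with data may carry a value")


def coverage(records: Iterable[FieldRecord]) -> float:
    """Percentage of records that have data (assumed definition of coverage)."""
    records = list(records)
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.status is FieldStatus.HAS_DATA)
    return round(100.0 * filled / len(records), 1)


# Example: three compounds with a CAS number and one class compound without.
example = [
    FieldRecord(FieldStatus.HAS_DATA, "50-00-0", "PubChem"),
    FieldRecord(FieldStatus.HAS_DATA, "71-43-2", "PubChem"),
    FieldRecord(FieldStatus.HAS_DATA, "67-56-1", "PubChem"),
    FieldRecord(FieldStatus.CEILING),
]
print(coverage(example))  # 75.0
```

Under this assumed definition, a "ceiling" field lowers coverage even though nothing more can be found for it, which matches how the compound table reports CAS coverage below 100% with class compounds noted as the reason.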

Methodology

The HQ Safety Database uses a tiered data quality model. Every field for every entity is classified as:

Compound safety assessments use 5D vectorization: regulatory classifications from IARC, EPA, EFSA, NTP, and Prop 65 are vectorized across magnitude, confidence, and consensus dimensions. Risk levels are synthesized per exposure context (human adult, child, infant, pregnant, dog, cat, aquatic life, etc.).
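
The general idea of turning several regulatory classifications into one per-context risk level can be sketched as follows. This is an illustration only and not the database's actual algorithm: the 0 to 1 scales, the consensus formula, the context multipliers, and the thresholds are all assumptions, and only the three dimensions named above (magnitude, confidence, consensus) are modelled. The risk level labels reuse the overall_risk_level values shown in the coverage tables.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Sequence


@dataclass(frozen=True)
class Classification:
    """One regulatory classification, reduced to numbers (hypothetical scales)."""
    source: str        # e.g. "IARC" or "Prop 65"
    magnitude: float   # 0.0 (no concern) to 1.0 (highest concern)
    confidence: float  # 0.0 (weak evidence) to 1.0 (strong evidence)


def consensus(items: Sequence[Classification]) -> float:
    """1.0 when all sources agree on magnitude, lower as they diverge."""
    if len(items) < 2:
        return 1.0
    # The population std. dev. of values in [0, 1] is at most 0.5,
    # so the result stays within [0, 1].
    return 1.0 - 2.0 * pstdev([c.magnitude for c in items])


def weighted_magnitude(items: Sequence[Classification]) -> float:
    """Confidence-weighted mean of the magnitudes."""
    total = sum(c.confidence for c in items)
    if total == 0:
        return 0.0
    return sum(c.magnitude * c.confidence for c in items) / total


# Hypothetical multipliers: more sensitive contexts scale the score up.
CONTEXT_FACTOR = {"human_adult": 1.0, "child": 1.2, "infant": 1.4}


def risk_level(items: Sequence[Classification], context: str) -> str:
    """Map the scaled score onto the four risk labels (assumed thresholds)."""
    score = min(1.0, weighted_magnitude(items) * CONTEXT_FACTOR[context])
    if score < 0.25:
        return "low"
    if score < 0.5:
        return "moderate"
    if score < 0.75:
        return "high"
    return "very_high"


# Made-up input values, for illustration only.
sample = [
    Classification("IARC", magnitude=0.8, confidence=1.0),
    Classification("Prop 65", magnitude=0.4, confidence=1.0),
]
print(risk_level(sample, "human_adult"))  # high
print(risk_level(sample, "infant"))       # very_high
```

The same set of classifications yields a different level per exposure context, which is the behavior the paragraph above describes. A real implementation would also need to decide how consensus feeds into the final score, which this sketch computes but leaves unused.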