What we have, how complete it is, and where the data comes from. Updated 2026-03-18.

| Field | Coverage | Notes |
|---|---|---|
| CAS number | 97.2% | 35 are class compounds (no CAS exists) |
| PubChem CID | 88.1% | Ceiling: classes/mixtures have no CID |
| DTXSID (EPA) | 95.9% | 51 unmappable class compounds |
| Wikidata QID | 94.8% | 29 remaining have no Wikidata entry |
| UNII (FDA) | 87.6% | PubChem batch enriched |
| ChEBI ID | 76.2% | PubChem batch enriched |
| GHS data | 66.0% | 814 with hazard codes + statements + signal word |
| EDC classification | 100% | 1,002 positive classifications across 14 research passes |
| IARC group | 44.8% | 398 classified; 490 not evaluated by IARC |
| Prop 65 | 100% | 984 complete + 249 evidence-based |
| LD50 | 100% | 603 complete + 65 evidence-based + 565 insufficient |
| Genotoxicity | 100% | 555 complete + 77 evidence-based + 601 insufficient |
| Skin/Eye | 100% | 476 complete + 240 evidence-based + 517 insufficient |
| Bioactivity | 100% | 496 complete + 212 evidence-based + 525 insufficient |
| Alternatives | 100% | 554 with known substitutions + 13 category rules |

| Field | Coverage | Notes |
|---|---|---|
| description | 100% | All 959 — derived from story (Era A/B) or existing (Era C) |
| overall_risk_level | 100% | low 25, moderate 132, high 375, very_high 427 |
| hazard_profile | 100% | primary + manufacturing + use_phase + end_of_life |
| safety_contexts | 100% | Per-context risk assessments |
| safety_summary | 100% | Primary concerns + sensitive populations |
| behavior (4 sub-keys) | 100% | Leaching, degradation, offgassing, thermal |
| identifiers | 100% | Aliases, trade names, CAS numbers |
| regulatory (by_jurisdiction) | 100% | US EPA, US OSHA, EU, other |

| Field | Coverage | Notes |
|---|---|---|
| description | 100% | All 514 |
| overall_risk_level | 97.2% | low 89, moderate 18, high 165, very_high 5 (8 unknown) |
| formulation | 100% | Key ingredients, preservative system, fragrance system |
| compound_composition | 96.5% | 275 with compound cross-refs |
| materials | 86.7% | 247 with common/concerning/preferred material refs |
| exposure | 100% | Canonical 7-field shape (routes, contact types, users, etc.) |
| consumer_guidance | 90.2% | Red flags, green flags, what to ask, alternatives |
| regulatory | 90.2% | Applicable regulations, certifications, labeling |

| Source | Data Type | Coverage |
|---|---|---|
| EPA CompTox (CTX) | DTXSID, ToxVal, ToxCast bioactivity, CPDat, physical properties | 1,182 compounds |
| PubChem (PUG-View + REST) | CID, InChIKey, SMILES, GHS, synonyms, physical properties, xrefs | 1,086 compounds |
| IARC Monographs | Carcinogenicity groups 1/2A/2B/3 | 398 compounds |
| California Prop 65 | Listed compounds with endpoints | 1,880 compounds assessed |
| ECHA REACH | SVHC candidates, restriction entries | 634 EC numbers |
| Wikidata SPARQL | QIDs, multilingual aliases | 1,169 compounds |
| ChEMBL | Drug/trade names, bioactivity data | 378 compounds |
| ChEBI | Biochemical nomenclature, ontology | 940 compounds |
| FDA (UNII) | Unique Ingredient Identifiers | 1,080 compounds |
| NHANES (via CTX) | Biomonitoring prevalence | 27 detected in population |

All data is cross-referenced and sourced, with no unsupported claims. Every field either has data, is stamped "ceiling" (searched and confirmed not to exist), or is marked "not yet searched."
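
The three field states described above can be sketched as a small classifier plus a coverage calculation. This is a minimal illustration, not the database's actual code: the `FieldStatus` enum, the `_ceiling` key on a record, and both function names are hypothetical and chosen only for this example.

```python
from enum import Enum
from typing import Any, Mapping, Sequence


class FieldStatus(Enum):
    HAS_DATA = "has_data"
    CEILING = "ceiling"                    # searched, confirmed not to exist
    NOT_YET_SEARCHED = "not_yet_searched"


def _is_empty(value: Any) -> bool:
    """Treat None and empty containers/strings as 'no data'."""
    if value is None:
        return True
    return isinstance(value, (str, list, dict, tuple, set)) and len(value) == 0


def classify_field(record: Mapping[str, Any], field: str) -> FieldStatus:
    """Return the status of one field on one record.

    Assumption: a record lists its confirmed-absent fields under the
    hypothetical key "_ceiling".
    """
    if not _is_empty(record.get(field)):
        return FieldStatus.HAS_DATA
    if field in record.get("_ceiling", ()):
        return FieldStatus.CEILING
    return FieldStatus.NOT_YET_SEARCHED


def coverage_percent(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """Percentage of records that have data for `field`, to one decimal."""
    if not records:
        return 0.0
    with_data = sum(
        1 for r in records if classify_field(r, field) is FieldStatus.HAS_DATA
    )
    return round(100.0 * with_data / len(records), 1)


# Three toy records: one with data, one stamped "ceiling", one untouched.
records = [
    {"cas_number": "50-00-0"},
    {"cas_number": None, "_ceiling": ["cas_number"]},
    {},
]
print([classify_field(r, "cas_number").value for r in records])
print(coverage_percent(records, "cas_number"))
```

In this sketch, "ceiling" records stay in the denominator, so a field whose only gaps are confirmed non-existent values still reports below 100%. Whether the real coverage figures are computed that way is an assumption.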
The HQ Safety Database uses a tiered data quality model: every field for every entity is assigned a data quality tier.

Compound safety assessments use 5D vectorization: regulatory classifications from IARC, EPA, EFSA, NTP, and Prop 65 are vectorized across dimensions that include magnitude, confidence, and consensus. Risk levels are then synthesized per exposure context (human adult, child, infant, pregnant, dog, cat, aquatic life, etc.).
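
To make the vectorization idea concrete, here is a rough sketch covering only the three dimensions named above (magnitude, confidence, consensus). Everything in it is an assumption for illustration: the 0 to 1 scales, the confidence-weighted average, the spread-based consensus measure, the per-context weights, and the thresholds are all hypothetical and are not taken from the HQ Safety Database.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass(frozen=True)
class Classification:
    source: str        # e.g. "IARC", "EPA", "EFSA", "NTP", "Prop 65"
    magnitude: float   # hypothetical severity score in [0, 1]
    confidence: float  # hypothetical evidence strength in [0, 1]


@dataclass(frozen=True)
class RiskVector:
    magnitude: float
    confidence: float
    consensus: float


def vectorize(items: Sequence[Classification]) -> Optional[RiskVector]:
    """Collapse several agency classifications into one vector."""
    total_conf = sum(c.confidence for c in items)
    if not items or total_conf == 0:
        return None
    # Confidence-weighted mean severity.
    magnitude = sum(c.magnitude * c.confidence for c in items) / total_conf
    confidence = mean(c.confidence for c in items)
    # Population std dev of values in [0, 1] is at most 0.5, so this maps
    # full agreement to 1.0 and maximal disagreement to 0.0.
    consensus = 1.0 - 2.0 * pstdev(c.magnitude for c in items)
    return RiskVector(magnitude, confidence, consensus)


# Hypothetical sensitivity multipliers per exposure context.
CONTEXT_WEIGHT = {"human_adult": 1.0, "child": 1.25, "infant": 1.5}


def risk_level(vector: RiskVector, context: str) -> str:
    """Map a vector to low/moderate/high/very_high for one context."""
    score = min(1.0, vector.magnitude * CONTEXT_WEIGHT[context])
    if score < 0.25:
        return "low"
    if score < 0.5:
        return "moderate"
    if score < 0.75:
        return "high"
    return "very_high"


example = vectorize([
    Classification("IARC", magnitude=0.6, confidence=1.0),
    Classification("Prop 65", magnitude=0.6, confidence=0.5),
])
if example is not None:
    for ctx in CONTEXT_WEIGHT:
        print(ctx, risk_level(example, ctx))
```

The same vector can yield different levels in different contexts, which is the point of synthesizing risk per exposure context rather than storing a single label per compound.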