
Chemical Identification Systems: Research Summary

What exists, who runs it, what people love/hate, and where HQ fits

🔢 Chemical Identifier Systems

CAS Registry Number [Commercial]

American Chemical Society (ACS) • Since 1965 • 290M+ substances

  • Universal standard: everyone uses it
  • Unique, unambiguous identifier
  • Works across languages
  • Commercial: $6+ per lookup
  • CAS RN is trademarked IP
  • Can't verify without paying
  • Errors propagated because no one checks
  • Doesn't cover mixtures/undefined compositions
  • ACS told Wikipedia not to verify numbers (!)
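One free sanity check is available even without a paid lookup: the last digit of a CAS RN is a check digit computed from the other digits. The sketch below assumes the usual rule (weight the digits 1, 2, 3, ... from the right, sum, take the result modulo 10); the function name is hypothetical. It only catches malformed or mistyped numbers and cannot confirm that a number belongs to a particular substance.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if the string is well-formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check


print(cas_check_digit_ok("1071-83-6"))  # glyphosate CAS RN used on this page
print(cas_check_digit_ok("1071-83-5"))  # same number with a wrong last digit
```

The first call prints True and the second prints False.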

PubChem CID [Free]

NIH / National Library of Medicine • Since 2004 • 115M+ compounds

  • Completely free and open
  • Government-backed (NIH mandate)
  • Structure data freely available
  • Actively maintained
  • Deliberately moved away from CAS
  • Only covers compounds with structures
  • Multiple CIDs can map to same CAS
  • Less universal recognition than CAS

InChI / InChIKey [Open Standard]

IUPAC • Since 2005 • Computable from structure

  • Free, non-proprietary
  • Can be computed; doesn't need an authority
  • Structure information encoded in ID
  • InChIKey is web-searchable (27 chars)
  • Not human-readable
  • Can't handle all chemistry (clays, polymers)
  • Tautomers cause issues
  • InChIKey collisions possible (rare)
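The fixed 27-character layout makes a cheap format check possible before storing a key. This is a minimal sketch, assuming the standard layout of 14 letters, a hyphen, 10 letters, a hyphen, and 1 letter; the function name is hypothetical. It does not compute a key from a structure (that needs the InChI software) and cannot detect a well-formed key that belongs to a different compound.

```python
import re

# 14 uppercase letters, hyphen, 10 uppercase letters, hyphen, 1 uppercase letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Return True if the string has the 27-character InChIKey layout."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


print(looks_like_inchikey("XDDAORKBJWWYJS-UHFFFAOYSA-N"))  # glyphosate key
print(looks_like_inchikey("XDDAORKBJWWYJS-UHFFFAOYSA"))    # truncated key
```

The first call prints True and the second prints False.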

EC Number (ECHA) [Free]

European Chemicals Agency • EU regulation (REACH)

  • Free to access
  • Required for EU market
  • Good regulatory data
  • EU-only coverage
  • Not all substances have EC numbers
  • Less global recognition

"We (including Wikipedia) should now switch from using CAS numbers to using PubChem IDs wherever possible... PubChem has deliberately moved away from CAS because CAS numbers are IP." - Peter Murray-Rust, Cambridge chemist, 2008

"~120,000 of the 350,000+ chemicals in commercial products were too poorly described to link to a CAS number or their identities were withheld as trade secrets." - Environ. Sci. Technol. research

🛒 Consumer Safety Databases

EWG Skin Deep [Biased]

Environmental Working Group • 88K+ products

  • Free consumer access
  • Large product database
  • Raised awareness of ingredient safety
  • Easy-to-understand ratings (1-10)
  • 80% of toxicologists say EWG overstates risks
  • Ratings for "data: none" ingredients
  • Chemically identical compounds rated differently
  • Natural bias: natural ingredients rated better
  • Doesn't update with new research
  • Pay-to-play "EWG Verified" program
  • Amazon affiliate links on "dangerous" products

Safety Data Sheets (SDS) [Free]

Manufacturer-required • GHS standard (16 sections)

  • Legally required for hazardous chemicals
  • Standardized format (since 2012)
  • Comprehensive information
  • Manufacturer-specific data
  • Hard to find: buried in manufacturer sites
  • Written for industrial/workplace use
  • Not consumer-friendly language
  • 41% of SDSs in one study didn't mention combustibility
  • Many consumer products don't have SDS
  • Not designed for pets/children contexts

PubChem [Free]

NIH • Comprehensive compound data

  • Free, authoritative, government-backed
  • Chemical/physical properties
  • Hazard information
  • Literature citations
  • Designed for researchers, not consumers
  • Technical language
  • No product-level data
  • No pet-specific information

"A decade ago, George Mason University surveyed ~1000 members of the Society of Toxicology. 80% felt that EWG overstated the risks of chemicals." - The Eco Well, citing toxicologist survey

📊 Comparison Matrix

System        Free?            Authority     Consumer-friendly  Pet data  Products
CAS           No ($6+/lookup)  High          No                 No        No
PubChem       Yes              High          Somewhat           No        No
InChI         Yes              High          No                 No        No
ECHA          Yes              High          Somewhat           No        No
EWG           Yes              Low (biased)  Yes                No        Yes
SDS/MSDS      Yes              High          No                 No        Some
HQ Safety DB  Yes              High (cited)  Yes (goal)         Yes       Yes

🕳️ The Gap HQ Fills

What exists:

What doesn't exist:

🔧 HQ Nomenclature Approach

Layer     HQ ID      Cross-references
Compound  hq-c-0001  CAS, PubChem CID, InChIKey, ECHA EC
Material  hq-m-0001  Resin codes, ASTM standards
Product   hq-p-0001  None (our category, our ID)

Why our own IDs + crossrefs?

Proposed Schema Addition:

{
  "hq_id": "hq-c-0001",
  "name": "Glyphosate",
  
  "crossrefs": {
    "cas": "1071-83-6",
    "pubchem_cid": "3496",
    "inchi_key": "XDDAORKBJWWYJS-UHFFFAOYSA-N",
    "echa_ec": "213-997-4"
  },
  
  "identity": { ... },
  "safety": { ... }
}
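
A record shaped like the proposed schema could be checked before it is written to disk. The following is only a sketch under stated assumptions: the validator name is hypothetical, the ID pattern assumes four digits after the layer letter (as in every ID shown here), and the set of allowed crossref keys is taken from the example above. The elided identity and safety objects are left out of the sample.

```python
import json
import re

# Assumption: four digits after the layer letter (c, m, or p).
HQ_ID_PATTERN = re.compile(r"hq-[cmp]-\d{4}")
# Crossref keys taken from the proposed schema.
KNOWN_CROSSREFS = {"cas", "pubchem_cid", "inchi_key", "echa_ec"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    hq_id = record.get("hq_id")
    if not isinstance(hq_id, str) or not HQ_ID_PATTERN.fullmatch(hq_id):
        problems.append(f"bad hq_id: {hq_id!r}")
    if not record.get("name"):
        problems.append("missing name")
    crossrefs = record.get("crossrefs", {})
    if not isinstance(crossrefs, dict):
        return problems + ["crossrefs must be an object"]
    for key, value in crossrefs.items():
        if key not in KNOWN_CROSSREFS:
            problems.append(f"unknown crossref key: {key}")
        elif not isinstance(value, str) or not value:
            problems.append(f"crossref {key} must be a non-empty string")
    return problems


record = json.loads("""
{
  "hq_id": "hq-c-0001",
  "name": "Glyphosate",
  "crossrefs": {
    "cas": "1071-83-6",
    "pubchem_cid": "3496",
    "inchi_key": "XDDAORKBJWWYJS-UHFFFAOYSA-N",
    "echa_ec": "213-997-4"
  }
}
""")
print(validate_record(record))  # prints []
```

Returning a list of problems rather than raising on the first one lets a batch job report every issue in a file at once.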
        

✓ Resolved Decisions

Q: Should HQ IDs be in filenames? RESOLVED

✓ Option A: hq-c-0001.json (machine-friendly, unambiguous)
Option B: glyphosate.json (human-friendly, HQ ID inside file only)

The human-readable name lives inside the file, not in the filename.

Q: Sequential vs structured IDs? RESOLVED

✓ Option A: hq-c-0001 (simple sequence, just order of entry)
Option B: hq-c-herb-0001 (category embedded)
Option C: hq-c-2025-0001 (year embedded)

Simple sequential. Categories change, years complicate lookups. Registry tracks assignments.
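
Sequential allocation can be sketched as "highest assigned number in the layer, plus one". The function name and the in-memory list are hypothetical stand-ins for the registry, and the sketch assumes that numbers freed by gaps are never reused.

```python
import re

# Layer letter (c, m, or p) followed by a four-digit sequence number.
HQ_ID_PATTERN = re.compile(r"hq-([cmp])-(\d{4})")


def next_hq_id(assigned: list[str], layer: str) -> str:
    """Return the ID after the highest number already assigned in the layer."""
    highest = 0
    for hq_id in assigned:
        match = HQ_ID_PATTERN.fullmatch(hq_id)
        if match and match.group(1) == layer:
            highest = max(highest, int(match.group(2)))
    # Gaps below the highest number are left alone (assumption).
    return f"hq-{layer}-{highest + 1:04d}"


assigned = ["hq-c-0001", "hq-c-0002", "hq-c-0004", "hq-m-0001"]
print(next_hq_id(assigned, "c"))  # prints hq-c-0005
print(next_hq_id(assigned, "p"))  # prints hq-p-0001
```

Taking the maximum rather than counting entries keeps the next ID correct even when earlier IDs have been retired.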

Q: How to handle "same compound, different form"? RESOLVED

Glyphosate acid vs glyphosate isopropylamine salt: same thing?
Option A: Same HQ ID, different CAS in crossrefs array
✓ Option B: Separate HQ IDs, linked via hierarchy

The parent compound (hq-c-0001, glyphosate) is referenced by its children via hierarchy.parent.
Children have their own CAS numbers but set inherits_safety: true to take safety data from the parent.
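
The parent/child rule could be resolved at read time roughly as follows. This is a sketch, not the actual loader: the function name is hypothetical, the placement of inherits_safety at the top level of a record is an assumption, and the safety payload is a placeholder rather than real data.

```python
from typing import Optional


def resolve_safety(hq_id: str, registry: dict[str, dict]) -> Optional[dict]:
    """Follow hierarchy.parent while inherits_safety is true."""
    seen = set()
    current: Optional[str] = hq_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental parent cycles
        record = registry.get(current)
        if record is None:
            return None
        if not record.get("inherits_safety", False):
            return record.get("safety")
        current = record.get("hierarchy", {}).get("parent")
    return None


registry = {
    "hq-c-0001": {
        "name": "Glyphosate",
        "safety": {"note": "placeholder"},  # not real safety data
    },
    "hq-c-0002": {
        "name": "Glyphosate isopropylamine salt",
        "hierarchy": {"parent": "hq-c-0001"},
        "inherits_safety": True,
    },
}
print(resolve_safety("hq-c-0002", registry))  # prints {'note': 'placeholder'}
```

Resolving at read time means a correction to the parent's safety data reaches every child without rewriting their files.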

📊 Current Registry State

As of March 2026

Active Compounds: 888 (+ 295 aliases = 1,183 files). Next: hq-c-1226
Materials: 262 (hq-m-0001 to hq-m-0262). Next: hq-m-0263
Products: 237 active (with gaps). Next: hq-p-0362

Key Coverage Milestones

Field                                   Coverage  Notes
hazard_profile, found_in, alternatives  100%      All 888 active compounds
regulatory.classifications              95%       ~43 at ceiling (no classifiable language)
GHS hazard data                         80%       Ceiling: mixtures/classes lack single SDS
identity (formula + SMILES)             78%       Ceiling: classes/mixtures can't have structure
dose_response.ld50                      68%       PubChem exhausted; CTX API pending

First Family: Glyphosate (still canonical example)

hq-c-0001  Glyphosate (acid)           CAS 1071-83-6      [parent]
    ├── hq-c-0002  IPA salt            CAS 38641-94-0     [alias → hq-c-0001]
    ├── hq-c-0003  Potassium salt      CAS 70901-12-1     [alias → hq-c-0001]
    └── hq-c-0004  Ammonium salt       CAS 114370-14-8    [alias → hq-c-0001]
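
A CAS lookup over this family might map every CAS number, including those of the salts, to the canonical parent ID. The tuple layout and function name below are hypothetical, and whether a lookup should return the parent or the alias's own ID is an assumption made for the sake of the sketch.

```python
# (hq_id, label, cas, alias_of) rows copied from the family listing.
FAMILY = [
    ("hq-c-0001", "Glyphosate (acid)", "1071-83-6", None),
    ("hq-c-0002", "IPA salt", "38641-94-0", "hq-c-0001"),
    ("hq-c-0003", "Potassium salt", "70901-12-1", "hq-c-0001"),
    ("hq-c-0004", "Ammonium salt", "114370-14-8", "hq-c-0001"),
]


def build_cas_index(rows):
    """Map each CAS number to the canonical (non-alias) HQ ID."""
    index = {}
    for hq_id, _label, cas, alias_of in rows:
        # An alias row points at its parent; a parent row points at itself.
        index[cas] = alias_of or hq_id
    return index


cas_index = build_cas_index(FAMILY)
print(cas_index["38641-94-0"])  # prints hq-c-0001
print(cas_index["1071-83-6"])   # prints hq-c-0001
```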