Facts JSON Schema

PHI-safe analytical output schema for the explain command and Python SDK.

Purpose

The ExactEDI Facts JSON is a deterministic, PHI-safe summary of an EDI file designed for:

  1. LLM Contract - Safe input for AI explanation layers (ExactEDI Insights)
  2. Analytics Pipeline - Structured data for reporting and dashboards
  3. Quality Assurance - Validation status and anomaly detection

PHI Safety Constraints

The Facts JSON is designed to be HIPAA Safe Harbor compliant by explicitly excluding:

  • Patient names, DOB, addresses
  • Medical Record Numbers (MRN)
  • Social Security Numbers
  • Subscriber/Member IDs
  • Account numbers
  • Free-text fields (notes, descriptions)
  • Raw EDI content

Safe to include:

  • Control numbers (interchange, group, transaction)
  • Payer and provider identifiers (NPI, organization names)
  • Service dates (not patient DOB)
  • Procedure and diagnosis codes (ICD-10, CPT, HCPCS)
  • Monetary amounts (charges, payments)
  • Structural metadata (counts, envelope info)

Schema Fields

Root Object

Field                 Type    Description
schema_version        string  Schema version (e.g., "1.0.0")
engine_version        string  ExactEDI Engine version that produced this output
file_metadata         object  File identification and metadata
delimiters            object  X12 delimiter specification
envelope_counts       object  ISA/GS/ST structure counts
transaction_counts    object  Transaction type breakdown
validation_summary    object  Error/warning summary
cas_summaries         array   CAS adjustment summaries (835 only)
claim_service_counts  object  High-level claim/service counts
structural_anomalies  object  Detected anomalies

file_metadata

Field            Type     Description
sha256_hash      string   SHA-256 hash of file content for integrity
file_size_bytes  integer  File size in bytes
source_filename  string   Original filename (no path)
parse_timestamp  string   ISO 8601 UTC timestamp of parse
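
As an illustration of the parse_timestamp field, the sketch below formats a time value as an ISO 8601 UTC string. The function name iso8601_utc, the second-level precision, and the trailing "Z" are assumptions made for this example; the engine's exact timestamp format is not specified here.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Format a time_t as an ISO 8601 UTC string such as "2024-01-01T12:00:00Z".
// Assumption: second precision with a trailing "Z".
std::string iso8601_utc(std::time_t t) {
    // std::gmtime returns a pointer to shared static storage, so this
    // sketch is not thread-safe.
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr) {
        return {};
    }
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
    return std::string(buf, n);
}
```

Passing 0 yields the Unix epoch, "1970-01-01T00:00:00Z".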

delimiters

Field                 Type    Description
element_separator     string  Element separator (typically "*")
component_separator   string  Component separator (typically ":")
segment_terminator    string  Segment terminator (typically "~")
repetition_separator  string  Repetition separator from ISA11

envelope_counts

Field              Type     Description
interchange_count  integer  Number of ISA/IEA pairs
group_count        integer  Number of GS/GE pairs
transaction_count  integer  Number of ST/SE pairs
total_segments     integer  Total segment count in file

transaction_counts

Field           Type     Description
claim_837p      integer  Professional claim count
claim_837i      integer  Institutional claim count
claim_837d      integer  Dental claim count
remittance_835  integer  Remittance advice count
other           integer  Unsupported/unknown transaction types
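
One way to sort transactions into these buckets is to combine ST01 with the implementation reference in ST03. The sketch below assumes 005010 implementation guide identifiers (X222, X223, X224); the enum and function names are hypothetical, and routing an 837 with an unrecognized ST03 to "other" is an assumption of this example, not documented engine behavior.

```cpp
#include <string_view>

enum class TransactionKind {
    Claim837P,
    Claim837I,
    Claim837D,
    Remittance835,
    Other
};

// True when `s` begins with `prefix` (C++17 stand-in for starts_with).
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

TransactionKind classify_transaction(std::string_view st01,
                                     std::string_view st03) {
    if (st01 == "835") {
        return TransactionKind::Remittance835;
    }
    if (st01 == "837") {
        // Addenda suffixes such as "A1" or "A2" follow the 10-character
        // base identifier, so only the prefix is compared.
        if (has_prefix(st03, "005010X222")) return TransactionKind::Claim837P;
        if (has_prefix(st03, "005010X223")) return TransactionKind::Claim837I;
        if (has_prefix(st03, "005010X224")) return TransactionKind::Claim837D;
    }
    // Assumption: anything else, including an 837 with an unknown ST03,
    // falls into the "other" bucket.
    return TransactionKind::Other;
}
```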

validation_summary

Field          Type           Description
error_count    integer        Total error count
warning_count  integer        Total warning count
error_codes    array[string]  Unique error codes encountered
warning_codes  array[string]  Unique warning codes encountered

Common error codes:

  • ISA_NOT_FIRST - ISA is not the first segment
  • MISSING_SE - Transaction missing closing SE
  • MISSING_GE - Group missing closing GE
  • MISSING_IEA - Interchange missing closing IEA
  • ST_SE_MISMATCH - ST02 != SE02
  • GS_GE_MISMATCH - GS06 != GE02
  • ISA_IEA_MISMATCH - ISA13 != IEA02
  • SEGMENT_COUNT_MISMATCH - SE01 != actual count
  • TRANSACTION_COUNT_MISMATCH - GE01 != actual count
  • GROUP_COUNT_MISMATCH - IEA01 != actual count
  • NESTED_ST - ST inside ST without SE
  • NESTED_GS - GS inside GS without GE
  • NESTED_ISA - ISA inside ISA without IEA

cas_summaries (835 only)

Array of CAS group summaries. Only present for 835 remittance files.

Field          Type    Description
group_code     string  CAS group code (CO, CR, OA, PI, PR)
group_name     string  Human-readable group name
reason_counts  object  Map of reason_code -> occurrence count

CAS Group Codes:

  • CO - Contractual Obligations
  • CR - Correction and Reversal
  • OA - Other Adjustments
  • PI - Payor Initiated Reductions
  • PR - Patient Responsibility
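
The reason_counts map can be built by reading the group code from CAS01 and the reason codes from CAS02, CAS05, CAS08, and so on (every third element). The following is a minimal sketch under that assumption; split, CasCounts, and add_cas_segment are hypothetical names, not part of the engine's API.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Split `text` on `sep`, keeping empty fields so element positions
// stay aligned with the X12 element numbers.
std::vector<std::string> split(std::string_view text, char sep) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(sep, start);
        if (end == std::string_view::npos) {
            parts.emplace_back(text.substr(start));
            return parts;
        }
        parts.emplace_back(text.substr(start, end - start));
        start = end + 1;
    }
}

// group_code -> (reason_code -> occurrence count)
using CasCounts = std::map<std::string, std::map<std::string, int>>;

// Add one CAS segment (without its terminator) to the running totals.
void add_cas_segment(std::string_view segment, char element_sep,
                     CasCounts& counts) {
    const std::vector<std::string> el = split(segment, element_sep);
    if (el.size() < 3 || el[0] != "CAS" || el[1].empty()) {
        return;  // not a usable CAS segment
    }
    // Reason codes sit at CAS02, CAS05, CAS08, ...; the amount and
    // quantity elements in between are ignored.
    for (std::size_t i = 2; i < el.size(); i += 3) {
        if (!el[i].empty()) {
            ++counts[el[1]][el[i]];
        }
    }
}
```

Only codes are counted; adjustment amounts are skipped here to keep the sketch short.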

claim_service_counts

Heuristic counts derived from segment IDs rather than full loop parsing. They approximate, and may not exactly match, the true claim and service-line totals.

Field                 Type     Description
claim_count           integer  Approximate claim count (CLM/CLP segments)
service_line_count    integer  Approximate service lines (SV1/SV2/SVC)
diagnosis_code_count  integer  Unique diagnosis codes found
procedure_code_count  integer  Unique procedure codes found
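
A segment-ID heuristic of this kind might look like the sketch below. It counts CLM/CLP and SV1/SV2/SVC segments and collects unique procedure codes from the first element of SV1 and SVC, assumed to be a composite such as "HC:99213". The names are hypothetical, and diagnosis codes and SV2 procedure codes are left out for brevity.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <string_view>

struct ClaimServiceCounts {
    int claim_count = 0;
    int service_line_count = 0;
    std::set<std::string> procedure_codes;  // unique codes seen
};

ClaimServiceCounts count_claims(std::string_view data, char element_sep,
                                char component_sep, char segment_term) {
    constexpr std::size_t npos = std::string_view::npos;
    ClaimServiceCounts out;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t end = data.find(segment_term, pos);
        if (end == npos) end = data.size();
        std::string_view seg = data.substr(pos, end - pos);
        pos = end + 1;
        // Skip line breaks that some files place after each terminator.
        while (!seg.empty() && (seg.front() == '\r' || seg.front() == '\n')) {
            seg.remove_prefix(1);
        }
        const std::size_t id_end = seg.find(element_sep);
        const std::string_view id = seg.substr(0, id_end);
        if (id == "CLM" || id == "CLP") ++out.claim_count;
        if (id == "SV1" || id == "SV2" || id == "SVC") ++out.service_line_count;
        // The second component of the first element is taken as the code.
        if ((id == "SV1" || id == "SVC") && id_end != npos) {
            std::string_view first = seg.substr(id_end + 1);
            first = first.substr(0, first.find(element_sep));
            const std::size_t c = first.find(component_sep);
            if (c != npos) {
                std::string_view code = first.substr(c + 1);
                code = code.substr(0, code.find(component_sep));
                if (!code.empty()) out.procedure_codes.emplace(code);
            }
        }
    }
    return out;
}
```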

structural_anomalies

Field                         Type           Description
has_envelope_errors           boolean        Missing/mismatched envelope pairs
has_control_number_mismatch   boolean        Mismatched control numbers
has_segment_count_mismatch    boolean        SE01/GE01/IEA01 count errors
has_unsupported_transactions  boolean        Non-837/835 transaction types
anomaly_descriptions          array[string]  Brief anomaly descriptions
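
Three of these flags can be derived from the error codes listed under validation_summary. The grouping below is an assumption made for illustration (the engine's actual diagnostic categories are not documented here), and the struct and function names are hypothetical.

```cpp
#include <string>
#include <vector>

struct StructuralAnomalies {
    bool has_envelope_errors = false;
    bool has_control_number_mismatch = false;
    bool has_segment_count_mismatch = false;
};

// Assumed mapping from error codes to anomaly flags.
StructuralAnomalies flags_from_codes(
    const std::vector<std::string>& error_codes) {
    StructuralAnomalies a;
    for (const std::string& code : error_codes) {
        if (code == "ST_SE_MISMATCH" || code == "GS_GE_MISMATCH" ||
            code == "ISA_IEA_MISMATCH") {
            a.has_control_number_mismatch = true;
        } else if (code == "SEGMENT_COUNT_MISMATCH" ||
                   code == "TRANSACTION_COUNT_MISMATCH" ||
                   code == "GROUP_COUNT_MISMATCH") {
            a.has_segment_count_mismatch = true;
        } else if (code.rfind("MISSING_", 0) == 0 ||
                   code.rfind("NESTED_", 0) == 0 ||
                   code == "ISA_NOT_FIRST") {
            // rfind(prefix, 0) == 0 is a prefix test.
            a.has_envelope_errors = true;
        }
    }
    return a;
}
```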

How ExactEDI Engine Populates Facts

The ExactEDI Engine populates the Facts JSON deterministically through these steps:

  1. File Metadata

    • Hash is computed from raw file bytes (SHA-256)
    • File size read from filesystem
    • Timestamp is UTC at parse completion
  2. Delimiter Detection

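    • Read from fixed positions in the 106-byte ISA segment (0-based byte offsets): element separator at 3, component separator (ISA16) at 104, segment terminator at 105
    • ISA11 (offset 82) provides the repetition separator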
  3. Envelope Tracking

    • EnvelopeTracker state machine processes ISA/IEA, GS/GE, ST/SE
    • Counts accumulated as segments processed
    • Diagnostics emitted for structural violations
  4. Transaction Type Detection

    • ST01 examined for transaction identifier
    • Implementation reference (ST03) may refine type (837P vs 837I)
  5. CAS Extraction (835 only)

    • CAS segments scanned for group code + reason codes
    • Counts aggregated by group and reason
  6. Claim/Service Counting (Heuristic)

    • CLM segments counted for 837
    • CLP segments counted for 835
    • SV1/SV2 (837) and SVC (835) counted for service lines
    • Codes extracted from composite elements when parseable
  7. Anomaly Detection

    • Based on validation diagnostics
    • Flags set based on diagnostic categories
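
Step 3 above can be pictured as a small state machine. The class below is a simplified, hypothetical stand-in for EnvelopeTracker, written only to show how pair counts and a few of the diagnostics (ISA_NOT_FIRST, NESTED_*, MISSING_*) could fall out of open/close tracking. It does not check control numbers or SE01/GE01/IEA01 counts.

```cpp
#include <string>
#include <string_view>
#include <vector>

struct EnvelopeCounts {
    int interchange_count = 0;
    int group_count = 0;
    int transaction_count = 0;
    int total_segments = 0;
};

// Hypothetical, simplified stand-in for the engine's EnvelopeTracker.
class SketchEnvelopeTracker {
public:
    // Feed segment IDs in file order.
    void on_segment(std::string_view id) {
        if (++counts_.total_segments == 1 && id != "ISA") emit("ISA_NOT_FIRST");
        if (id == "ISA")      open_envelope(in_isa_, "NESTED_ISA");
        else if (id == "GS")  open_envelope(in_gs_, "NESTED_GS");
        else if (id == "ST")  open_envelope(in_st_, "NESTED_ST");
        else if (id == "SE")  close_envelope(in_st_, counts_.transaction_count);
        else if (id == "GE")  close_envelope(in_gs_, counts_.group_count);
        else if (id == "IEA") close_envelope(in_isa_, counts_.interchange_count);
    }

    // Call once at end of input to report envelopes left open.
    void finish() {
        if (in_st_) emit("MISSING_SE");
        if (in_gs_) emit("MISSING_GE");
        if (in_isa_) emit("MISSING_IEA");
    }

    const EnvelopeCounts& counts() const { return counts_; }
    const std::vector<std::string>& diagnostics() const { return diagnostics_; }

private:
    void emit(const char* code) { diagnostics_.emplace_back(code); }

    void open_envelope(bool& is_open, const char* nested_code) {
        if (is_open) emit(nested_code);  // previous envelope never closed
        is_open = true;
    }

    void close_envelope(bool& is_open, int& pair_count) {
        if (is_open) ++pair_count;  // count only complete open/close pairs
        is_open = false;
    }

    EnvelopeCounts counts_;
    std::vector<std::string> diagnostics_;
    bool in_isa_ = false;
    bool in_gs_ = false;
    bool in_st_ = false;
};
```

Counting on close rather than on open keeps the counts aligned with the "pairs" wording in envelope_counts: an ISA with no IEA produces a diagnostic but does not increase interchange_count.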

Example Usage

#include <string>

#include <exactedi/output/facts_schema.hpp>

// Build facts from parsed data.
exactedi::facts::ExactEDIFacts facts;
facts.schema_version = exactedi::facts::FACTS_SCHEMA_VERSION;
facts.engine_version = "1.0.0";

// Populate from EnvelopeTracker and ValidationResult.
// `tracker` is an EnvelopeTracker that has already processed the file.
facts.envelope_counts.interchange_count = tracker.interchanges().size();
facts.envelope_counts.group_count = tracker.groups().size();
facts.envelope_counts.transaction_count = tracker.transactions().size();
facts.envelope_counts.total_segments = tracker.segment_count();

// Serialize to JSON; passing `true` requests pretty-printed output.
std::string json = facts.to_json_string(true);

Versioning

The schema follows semantic versioning:

  • Major: Breaking changes to field names or structure
  • Minor: New optional fields added
  • Patch: Documentation or clarification updates

Consumers should check schema_version and handle unknown fields gracefully.
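
A consumer-side version check might look like the following sketch. It accepts any schema whose major version matches the one the consumer was built for, on the assumption that minor and patch changes are backward compatible as described above. SemVer, parse_semver, and can_consume are hypothetical names.

```cpp
#include <cstdio>
#include <optional>
#include <string>

struct SemVer {
    int major_part = 0;
    int minor_part = 0;
    int patch_part = 0;
};

// Parse "MAJOR.MINOR.PATCH"; returns std::nullopt on malformed input.
std::optional<SemVer> parse_semver(const std::string& text) {
    SemVer v;
    const int matched = std::sscanf(text.c_str(), "%d.%d.%d", &v.major_part,
                                    &v.minor_part, &v.patch_part);
    if (matched != 3 || v.major_part < 0 || v.minor_part < 0 ||
        v.patch_part < 0) {
        return std::nullopt;
    }
    return v;
}

// A consumer built for `supported_major` accepts any minor/patch level,
// since those only add optional fields or clarify documentation.
bool can_consume(const std::string& schema_version, int supported_major) {
    const std::optional<SemVer> v = parse_semver(schema_version);
    return v.has_value() && v->major_part == supported_major;
}
```

With a supported major version of 1, "1.4.2" is accepted while "2.0.0" and the malformed "1.0" are rejected.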