This document describes the SQLite database schema used by ExactEDI's import feature.
Overview
The import feature provides persistent storage for parsed X12 data in a local SQLite database. All stored data is PHI-safe and contains only structural metadata, statistical summaries, and validation results. Patient identifiers, provider names, and other protected health information are never persisted.
Purpose
The database enables:
- Historical analysis of EDI file characteristics over time
- Error trending across multiple file imports
- Transaction statistics for billing and remittance patterns
- Adjustment code analysis for 835 remittance files
- Deduplication detection via SHA-256 file hashing
Typical Use Cases
Quality Monitoring Track validation error rates over time to identify deteriorating data quality or partner configuration issues.
Statistical Analysis Aggregate transaction volumes, charge amounts, and payment patterns across multiple file imports for trending and forecasting.
Adjustment Code Reporting Analyze 835 remittance adjustment codes (CAS segments) to identify common denial reasons and payment adjustments.
File Provenance Maintain an audit trail of processed files with cryptographic hashes to detect duplicate imports or file modifications.
Schema Version
Current schema version: 1
Schema migrations are applied automatically when opening a database with run_migrations: true (default).
Table Definitions
schema_version
Tracks applied schema migrations.
| Column | Type | Constraints | Description |
|---|---|---|---|
version | INTEGER | PRIMARY KEY | Schema version number |
applied_at | TEXT | NOT NULL | ISO 8601 timestamp when migration was applied |
Example:
version | applied_at
--------|---------------------
1 | 2026-01-13T19:30:00Z
import_runs
Records metadata for each file import session.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique import run identifier |
source_file | TEXT | NOT NULL | Original file path (filename only, no directory) |
sha256_hash | TEXT | SHA-256 hash of file contents (hex-encoded, 64 characters) | |
file_size_bytes | INTEGER | NOT NULL | File size in bytes |
imported_at | TEXT | NOT NULL | ISO 8601 timestamp of import |
interchange_count | INTEGER | NOT NULL, DEFAULT 0 | Number of ISA/IEA envelopes |
group_count | INTEGER | NOT NULL, DEFAULT 0 | Number of GS/GE envelopes |
transaction_count | INTEGER | NOT NULL, DEFAULT 0 | Number of ST/SE transactions |
segment_count | INTEGER | NOT NULL, DEFAULT 0 | Total segment count |
error_count | INTEGER | NOT NULL, DEFAULT 0 | Number of validation errors |
warning_count | INTEGER | NOT NULL, DEFAULT 0 | Number of validation warnings |
is_valid | INTEGER | NOT NULL, DEFAULT 0 | 1 if file passed validation, 0 otherwise |
Indexes:
idx_import_runs_sourceonsource_fileidx_import_runs_hashonsha256_hash
Notes:
source_filestores only the filename, not the full path, to avoid exposing directory structuressha256_hashenables duplicate detection:SELECT * FROM import_runs WHERE sha256_hash = ?is_validis a boolean stored as INTEGER (0 or 1) per SQLite conventions
Example:
id | source_file | sha256_hash | file_size_bytes | is_valid
----|-------------------|----------------------------------|-----------------|----------
1 | claims_batch.x12 | a3f5e2... | 524288 | 1
2 | remit_20260113.x12| 7b9c1d... | 1048576 | 1
tx_summary
Stores per-transaction statistics. One row per ST/SE transaction set.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique transaction record ID |
run_id | INTEGER | NOT NULL, FOREIGN KEY | References import_runs(id) |
transaction_type | TEXT | NOT NULL | Transaction set identifier: "837P", "837I", "837D", "835", etc. |
control_number | TEXT | NOT NULL | ST02/SE02 control number |
start_segment | INTEGER | NOT NULL | Zero-indexed segment position of ST |
end_segment | INTEGER | NOT NULL | Zero-indexed segment position of SE |
segment_count | INTEGER | NOT NULL | Number of segments in transaction (SE01 value) |
is_complete | INTEGER | NOT NULL, DEFAULT 0 | 1 if SE segment present, 0 if missing |
counts_match | INTEGER | NOT NULL, DEFAULT 0 | 1 if SE01 matches actual segment count, 0 if mismatch |
claim_count | INTEGER | NOT NULL, DEFAULT 0 | Heuristic claim count (number of CLM/CLP segments) |
service_line_count | INTEGER | NOT NULL, DEFAULT 0 | Service line count (SV1/SV2 segments for 837, SVC for 835) |
total_charge | REAL | NOT NULL, DEFAULT 0.0 | Sum of billed amounts |
total_payment | REAL | NOT NULL, DEFAULT 0.0 | Sum of paid amounts (835 only) |
Indexes:
idx_tx_summary_runonrun_ididx_tx_summary_typeontransaction_type
Foreign Key Constraints:
run_idreferencesimport_runs(id)withON DELETE CASCADE
Notes:
transaction_typeis extracted from ST01 and refined using BHT segment data (e.g., "837P" vs "837I")is_completeandcounts_matchenable validation error analysis- Financial fields (
total_charge,total_payment) are stored as REAL (floating-point) - For 837 files,
total_paymentis typically 0.0 (claims, not remittances)
Example:
id | run_id | transaction_type | control_number | claim_count | total_charge
---|--------|------------------|----------------|-------------|-------------
1 | 1 | 837P | 0001 | 25 | 12500.00
2 | 1 | 837P | 0002 | 30 | 18750.50
3 | 2 | 835 | 0001 | 120 | 0.00
errors
Validation errors and warnings encountered during file processing.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique error record ID |
run_id | INTEGER | NOT NULL, FOREIGN KEY | References import_runs(id) |
severity | TEXT | NOT NULL | "error" or "warning" |
code | TEXT | NOT NULL | Error code identifier (e.g., "MISSING_SE", "SEGMENT_COUNT_MISMATCH") |
message | TEXT | NOT NULL | Human-readable error description |
segment_index | INTEGER | NOT NULL, DEFAULT 0 | Zero-indexed segment position where error occurred |
segment_tag | TEXT | Segment identifier (e.g., "ST", "SE", "CLM") |
Indexes:
idx_errors_runonrun_ididx_errors_codeoncode
Foreign Key Constraints:
run_idreferencesimport_runs(id)withON DELETE CASCADE
Notes:
severitydistinguishes between blocking errors and informational warningscodeenables programmatic error filtering and trendingsegment_indexcorresponds to the zero-indexed segment position in the file
Example:
id | run_id | severity | code | message | segment_index
---|--------|----------|--------------------------|------------------------------------------|---------------
1 | 1 | error | MISSING_SE | Transaction 0001 is missing SE | 47
2 | 1 | error | SEGMENT_COUNT_MISMATCH | SE01=45, actual=44 | 48
3 | 2 | warning | CONTROL_NUMBER_GAP | Gap in control numbers: 0002 -> 0004 | 150
adjustments_835
Aggregated adjustment data from 835 remittance CAS segments. Only populated for 835 transaction types.
| Column | Type | Constraints | Description |
|---|---|---|---|
id | INTEGER | PRIMARY KEY AUTOINCREMENT | Unique adjustment record ID |
run_id | INTEGER | NOT NULL, FOREIGN KEY | References import_runs(id) |
group_code | TEXT | NOT NULL | CAS group code: "CO", "CR", "OA", "PI", "PR" |
group_name | TEXT | NOT NULL | Human-readable group name |
reason_code | TEXT | NOT NULL | CARC reason code (e.g., "1", "45", "97") |
occurrence_count | INTEGER | NOT NULL, DEFAULT 0 | Number of times this adjustment appeared |
total_amount | REAL | NOT NULL, DEFAULT 0.0 | Sum of adjustment amounts |
Indexes:
idx_adjustments_runonrun_ididx_adjustments_groupon(group_code, reason_code)
Foreign Key Constraints:
run_idreferencesimport_runs(id)withON DELETE CASCADE
Notes:
- Adjustment codes are aggregated per import run (not per transaction or claim)
group_codevalues:- "CO" - Contractual Obligation
- "CR" - Correction and Reversals
- "OA" - Other Adjustments
- "PI" - Payer Initiated Reductions
- "PR" - Patient Responsibility
group_nameis populated from the CAS segment lookup tablereason_codeis the CARC (Claim Adjustment Reason Code)
Example:
id | run_id | group_code | group_name | reason_code | occurrence_count | total_amount
---|--------|------------|-----------------------------|----|------------------|-------------
1 | 2 | PR | Patient Responsibility | 1 | 45 | 3250.00
2 | 2 | CO | Contractual Obligation | 45 | 120 | 8900.50
3 | 2 | CR | Correction and Reversals | 97 | 3 | 150.00
Table Relationships
Entity-Relationship Diagram
import_runs (1) ----< (N) tx_summary
|
+-------------< (N) errors
|
+-------------< (N) adjustments_835
Referential Integrity
All child tables use ON DELETE CASCADE foreign key constraints:
- Deleting an
import_runsrecord automatically deletes all associatedtx_summary,errors, andadjustments_835records - SQLite foreign key enforcement is enabled via
PRAGMA foreign_keys = ON(set automatically by ExactEDI)
Query Implications
Joining import runs with transactions:
SELECT r.source_file, t.transaction_type, COUNT(*) as tx_count
FROM import_runs r
JOIN tx_summary t ON t.run_id = r.id
GROUP BY r.source_file, t.transaction_type;
Finding all errors for a specific file:
SELECT e.code, e.message, e.segment_index
FROM import_runs r
JOIN errors e ON e.run_id = r.id
WHERE r.source_file = 'claims_batch.x12';
Common Query Examples
Import History
List all imports with summary statistics:
SELECT
id,
source_file,
imported_at,
transaction_count,
segment_count,
error_count,
CASE WHEN is_valid = 1 THEN 'Valid' ELSE 'Invalid' END as status
FROM import_runs
ORDER BY imported_at DESC;
Find duplicate files by hash:
SELECT sha256_hash, COUNT(*) as import_count, GROUP_CONCAT(source_file) as files
FROM import_runs
WHERE sha256_hash IS NOT NULL
GROUP BY sha256_hash
HAVING COUNT(*) > 1;
Transaction Analysis
Count transactions by type:
SELECT transaction_type, COUNT(*) as count, SUM(claim_count) as total_claims
FROM tx_summary
GROUP BY transaction_type
ORDER BY count DESC;
Average charges per claim by transaction type:
SELECT
transaction_type,
COUNT(*) as transaction_count,
ROUND(AVG(total_charge), 2) as avg_charge,
ROUND(SUM(total_charge), 2) as total_charges
FROM tx_summary
WHERE claim_count > 0
GROUP BY transaction_type;
Identify incomplete transactions:
SELECT r.source_file, t.control_number, t.transaction_type
FROM tx_summary t
JOIN import_runs r ON r.id = t.run_id
WHERE t.is_complete = 0 OR t.counts_match = 0;
Error Analysis
Most common validation errors:
SELECT code, COUNT(*) as occurrence_count, severity
FROM errors
GROUP BY code, severity
ORDER BY occurrence_count DESC
LIMIT 10;
Error rates by file:
SELECT
r.source_file,
r.segment_count,
COUNT(e.id) as error_count,
ROUND(COUNT(e.id) * 1.0 / r.segment_count * 100, 2) as error_rate_pct
FROM import_runs r
LEFT JOIN errors e ON e.run_id = r.id AND e.severity = 'error'
GROUP BY r.id
ORDER BY error_rate_pct DESC;
Error trends over time:
SELECT
DATE(imported_at) as import_date,
COUNT(DISTINCT r.id) as file_count,
SUM(r.error_count) as total_errors,
ROUND(AVG(r.error_count * 1.0 / r.segment_count) * 100, 2) as avg_error_rate
FROM import_runs r
WHERE imported_at >= DATE('now', '-30 days')
GROUP BY DATE(imported_at)
ORDER BY import_date;
Adjustment Analysis (835 Remittances)
Top denial reasons by total amount:
SELECT
group_code,
group_name,
reason_code,
occurrence_count,
ROUND(total_amount, 2) as total_amount,
ROUND(total_amount / occurrence_count, 2) as avg_amount
FROM adjustments_835
ORDER BY total_amount DESC
LIMIT 20;
Patient responsibility breakdown:
SELECT
reason_code,
occurrence_count,
ROUND(total_amount, 2) as total_amount,
ROUND(total_amount * 100.0 / SUM(total_amount) OVER (), 2) as pct_of_total
FROM adjustments_835
WHERE group_code = 'PR'
ORDER BY total_amount DESC;
Adjustment summary by group code:
SELECT
group_code,
group_name,
SUM(occurrence_count) as total_occurrences,
ROUND(SUM(total_amount), 2) as total_amount
FROM adjustments_835
GROUP BY group_code, group_name
ORDER BY total_amount DESC;
Combined Analysis
File processing summary with error details:
SELECT
r.source_file,
r.imported_at,
r.transaction_count,
r.segment_count,
CASE WHEN r.is_valid = 1 THEN 'Valid' ELSE 'Invalid' END as status,
COUNT(DISTINCT e.code) as unique_error_types,
r.error_count as total_errors
FROM import_runs r
LEFT JOIN errors e ON e.run_id = r.id AND e.severity = 'error'
GROUP BY r.id
ORDER BY r.imported_at DESC;
837 vs 835 comparison:
SELECT
SUBSTR(transaction_type, 1, 3) as tx_family,
COUNT(*) as transaction_count,
SUM(claim_count) as total_claims,
SUM(service_line_count) as total_service_lines,
ROUND(SUM(total_charge), 2) as total_charges,
ROUND(SUM(total_payment), 2) as total_payments
FROM tx_summary
GROUP BY tx_family;
Performance Optimization
Index Usage
All common query patterns are covered by existing indexes:
- Lookups by
source_file: Usesidx_import_runs_source - Duplicate detection by hash: Uses
idx_import_runs_hash - Transaction filtering by type: Uses
idx_tx_summary_type - Error code trending: Uses
idx_errors_code - Adjustment grouping: Uses
idx_adjustments_group
Query Performance Tips
Use covering indexes when possible:
-- Good: Uses idx_tx_summary_type
SELECT transaction_type, COUNT(*) FROM tx_summary GROUP BY transaction_type;
-- Better: Add needed columns to reduce row lookups
CREATE INDEX idx_tx_type_charge ON tx_summary(transaction_type, total_charge);
Avoid full table scans on large datasets:
-- Inefficient for large tables
SELECT * FROM tx_summary WHERE total_charge > 1000;
-- Better: Filter by indexed run_id first
SELECT * FROM tx_summary
WHERE run_id IN (SELECT id FROM import_runs WHERE imported_at > '2026-01-01')
AND total_charge > 1000;
Use EXPLAIN QUERY PLAN to verify index usage:
EXPLAIN QUERY PLAN
SELECT * FROM errors WHERE code = 'MISSING_SE';
-- Should show: SEARCH errors USING INDEX idx_errors_code (code=?)
Write-Ahead Logging (WAL)
WAL mode is enabled by default (journal_mode=WAL), which provides:
- Better concurrency: Readers don't block writers
- Improved performance for write-heavy workloads
- Atomic commits across multiple tables
Trade-off: Creates additional -wal and -shm files alongside the database file.
Checkpoint control:
-- Manually checkpoint WAL to main database
PRAGMA wal_checkpoint(TRUNCATE);
Vacuum and Maintenance
After deleting many import runs:
-- Reclaim disk space and rebuild indexes
VACUUM;
-- Analyze table statistics for query planner
ANALYZE;
Schema Migration
ExactEDI automatically manages schema migrations when opening a database. The current schema version is tracked in the schema_version table.
Manual Schema Inspection
Check current schema version:
SELECT version, applied_at FROM schema_version ORDER BY version DESC LIMIT 1;
View table structure:
.schema import_runs
.schema tx_summary
.schema errors
.schema adjustments_835
List all indexes:
SELECT name, tbl_name, sql
FROM sqlite_master
WHERE type = 'index' AND tbl_name NOT LIKE 'sqlite_%';
Future Schema Versions
Schema migrations are applied sequentially. When schema version 2 is released, databases at version 1 will automatically upgrade when opened with run_migrations: true.
Migration scripts preserve all existing data and maintain backward compatibility for read operations.
Data Retention
ExactEDI does not automatically delete old import records. Implement retention policies as needed:
Delete imports older than 90 days:
DELETE FROM import_runs
WHERE imported_at < DATE('now', '-90 days');
-- CASCADE deletes associated tx_summary, errors, and adjustments_835 records
Archive old data:
-- Export to separate database
ATTACH DATABASE 'archive_2025.db' AS archive;
INSERT INTO archive.import_runs
SELECT * FROM import_runs
WHERE imported_at BETWEEN '2025-01-01' AND '2025-12-31';
-- Repeat for tx_summary, errors, adjustments_835
-- Delete from main database
DELETE FROM import_runs
WHERE imported_at BETWEEN '2025-01-01' AND '2025-12-31';
DETACH DATABASE archive;
Limitations
No PHI Storage
By design, the database schema excludes all PHI fields:
Not stored:
- Patient names, dates of birth, SSNs
- Provider names and addresses (only NPIs)
- Specific diagnosis codes (only counts)
- Specific procedure codes (only counts)
- Claim-level detail beyond summary statistics
Stored (PHI-safe):
- Transaction counts and types
- Financial totals and averages
- Validation error types and frequencies
- Adjustment code aggregations
Statistical Summaries Only
The schema stores aggregated transaction-level data, not segment-level detail. To analyze individual claims or service lines, parse the original X12 file using the parse or explain commands.
No Real-Time Queries
The database is populated during import operations. Changes to source X12 files are not reflected in the database unless re-imported.
Integration Examples
Python
import sqlite3
conn = sqlite3.connect('analytics.db')
cursor = conn.cursor()
# Find all 837P transactions with high charges
cursor.execute('''
SELECT r.source_file, t.control_number, t.total_charge
FROM tx_summary t
JOIN import_runs r ON r.id = t.run_id
WHERE t.transaction_type = '837P' AND t.total_charge > 10000
ORDER BY t.total_charge DESC
''')
for row in cursor.fetchall():
print(f"{row[0]}: Transaction {row[1]} - ${row[2]:,.2f}")
conn.close()
Shell Script
#!/bin/bash
# Daily import and error report
db_path="analytics.db"
today=$(date +%Y-%m-%d)
# Import today's files
for file in /data/incoming/*.x12; do
exactedi import "$file" --db "$db_path"
done
# Generate error report
sqlite3 "$db_path" <<EOF
SELECT
code,
COUNT(*) as count,
GROUP_CONCAT(DISTINCT source_file) as files
FROM errors e
JOIN import_runs r ON r.id = e.run_id
WHERE DATE(r.imported_at) = '$today'
AND severity = 'error'
GROUP BY code
ORDER BY count DESC;
EOF
Support
For schema-related questions or feature requests:
- Professional tier: support@exactedi.com
- Enterprise tier: Use dedicated support channel
- Include database schema version and query examples when reporting issues