Leveraging TypeScript: How We Made a 60-Year-Old File Format Usable
Aug 11, 2023
Some Background
Electronic Data Interchange (EDI) is a digital symphony (albeit an extremely out of tune one) of structured data communication. Established for businesses, it facilitates the electronic exchange of standard business documents — ranging from purchase orders to invoices to health insurance claims — between organizations. When introduced, EDI was a transformative tool, ushering businesses away from paper trails and into the streamlined corridors of digital transmission. Now, it is a widely used nightmare of a file format.
Translating this intricate and asinine file type into something intuitively usable can be a dev’s proverbial Gordian Knot. My job has primarily been to untie it.
A Glimpse into the Past: The Genesis of EDI
Electronic Data Interchange (EDI), a term known to evoke equal parts admiration and apprehension, originated in the 1960s as a method for exchanging business documents in a standardized electronic format. The idea was novel: instead of businesses sending paper documents to each other (like purchase orders and invoices), they could transmit these directly from one server to another. This not only led to faster transactions, reduced paperwork, and decreased errors, but also heralded a new era of digital integration.
The Architecture of EDI: A Product of Its Time
The structure of EDI, like many tech solutions, was shaped by the constraints of its era. When EDI first graced the scene in the 1960s, data storage wasn't the abundant and affordable resource we know today. It was a prized commodity, a costly affair. Computers of that age had limited storage capabilities, often measured in mere kilobytes or megabytes — a stark contrast to the terabytes and cloud storage we have at our disposal now.
Given the premium on storage, EDI had to be efficient and concise. This constraint led to its densely packed format, where every bit of data had to justify its place. As we moved into an age of digital abundance, with storage costs plummeting and capacities soaring, the foundational structure of EDI remained, carrying with it the vestiges of a time when every byte was precious.
As industries evolved, so did the complexity of their transactions. Health care, being one of the most intricate sectors, developed its own set of EDI standards. File specifications, called “guides,” like 837I and 837P (health care claims), 834 (benefit enrollment and maintenance), and 270 (health care eligibility and benefit inquiry) became commonplace. These standards, although essential, also introduced layers of complexity to the EDI world.
Lost in Translation: Challenges of Ingesting Complex EDI Objects
The introduction of these schemas, while vital for standardized communication, came with its own set of problems:
Complexity: Each EDI schema has a unique structure with its nested loops, segments, and data elements. This structural intricacy makes it difficult to decipher, validate, and process these documents.
Schema Evolution: Like all standards, EDI schemas evolve. Over time, fields get deprecated, new ones are introduced, and the structure might change, requiring continual updates to the ingestion methods.
Lack of Typing: The original EDI formats were not designed with modern programming languages in mind, so they often lack the strict definitions that we type junkies love, making automatic data validation a challenge.
1 Dimension Short: EDI is a "flat" structure, meaning it doesn't inherently support nested hierarchies like modern data formats (think JSON or XML). However, EDI tries to emulate nesting using loops and segments. This emulation demands strict adherence to an EDI guide for accurate parsing. Without the guide, determining where a nested loop begins or ends is more or less impossible.
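To make that guide dependence concrete, here is a minimal sketch of rebuilding nesting from flat segments. The guide table, loop names, and sample segments are hypothetical and drastically simplified (real guides decide loop boundaries on far more than a segment ID); the only point is that the hierarchy lives in the guide, not in the file.

```typescript
// Hypothetical, drastically simplified guide: which segment ID opens which
// loop, and which loop that loop must sit inside. Not a real 837 guide.
interface LoopRule {
  loop: string;
  parent: string; // "root" for top-level loops
}

const guide = new Map<string, LoopRule>([
  ["HL", { loop: "billing_provider", parent: "root" }],
  ["CLM", { loop: "claim", parent: "billing_provider" }],
  ["LX", { loop: "service_line", parent: "claim" }],
]);

interface Segment {
  id: string;
  elements: string[];
}

interface LoopNode {
  loop: string;
  parent: LoopNode | null;
  segments: Segment[];
  children: LoopNode[];
}

// Split raw EDI into segments ("~" terminator) and elements ("*" separator).
function tokenize(edi: string): Segment[] {
  return edi
    .split("~")
    .map((raw) => raw.trim())
    .filter((raw) => raw.length > 0)
    .map((raw) => {
      const [id = "", ...elements] = raw.split("*");
      return { id, elements };
    });
}

// Rebuild the hierarchy. Nothing in the segments themselves says where a
// loop ends; only the guide's parent rules do.
function nest(segments: Segment[]): LoopNode {
  const root: LoopNode = { loop: "root", parent: null, segments: [], children: [] };
  let current: LoopNode = root;

  for (const segment of segments) {
    const rule = guide.get(segment.id);
    if (rule) {
      // Close open loops until we are back inside the required parent.
      while (current.loop !== rule.parent) {
        const parent = current.parent;
        if (parent === null) break;
        current = parent;
      }
      const node: LoopNode = { loop: rule.loop, parent: current, segments: [], children: [] };
      current.children.push(node);
      current = node;
    }
    // Segments that open no loop belong to whichever loop is currently open.
    current.segments.push(segment);
  }
  return root;
}

// Made-up sample: one billing provider loop holding two claim loops.
const tree = nest(
  tokenize("HL*1~NM1*85*2*CLINIC~CLM*A1*100~LX*1~SV1*60~LX*2~SV1*40~CLM*A2*50~LX*1~SV1*50~"),
);
console.log(tree.children.length);
```

Delete the `guide` map and the second `LX` could just as plausibly belong to the first service line as to the claim, which is exactly the ambiguity described above.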
Given these limitations, the pursuit to tame the EDI beast, especially in a type-safe environment like TypeScript, is not just a coding exercise. It’s a quest for clarity, elegance, and reliability.
A Look Under the Hood
Before we delve into how we handled EDI's complexities, it's beneficial to get a firsthand look at an actual EDI file.
Wait, that’s actually not so bad! On initial inspection, the EDI sample may strike one as reasonably concise. Spanning just 34 lines, it resists the convolution often associated with deeply nested structures like JSON. There's even a hint of intuitiveness to its layout, allowing for some discernment of fields based solely on their values. Hold onto that thought.
EDI's Pseudo-Simplicity: Warning — Jump Scare
At a glance, the EDI structure above appears fairly linear, almost deceptively straightforward. The codes, while unfamiliar, aren't wrapped in convoluted syntax.
Now, let's transpose this structure to JSON. As you'll see, this translation takes our relatively neat EDI blueprint and converts it into a multi-level architectural marvel.
Prepare yourself: what was once a 34-line EDI sample will unfold into a 260-line JSON, revealing the intricate nesting and relationships hidden within the initial EDI structure.
Release the Beast
Horrifying.
The JSON format offers a detailed view of the data hierarchy, but its depth also illustrates the challenges devs face. The nesting, the pseudo-arrays, and the subtleties of relationships can make it a nightmare to navigate. It's a prime example of why understanding both the source and the transformed structure is crucial, and, more practically, why we type it to high heaven.
It’s well worth noting that this is an extremely simple claim. There are thousands upon thousands of optional fields that aren’t present in this JSON, including some unbelievably specific gems. Here are some favorites:
spinal_manipulation_service_information_CR2
ambulance_drop_off_postal_zone_or_zip_code_03
patient_snoring_intensity_modifier_08
(okay this one is a joke, but is so believable)
The Wrangling
Above, I showed the EDI and JSON for the same claim. If you’ve ever worked with EDI, you should’ve raised an eyebrow at that. That translation is no small feat, and one that I thankfully didn’t have to do in-house.
We were lucky enough to come across Stedi (clever name, eh?) which has done a lot of heavy lifting for us. It’s an entire company devoted to EDI, namely dragging it into the 21st century, and it’s what we use to go from plain EDI to JSON. Getting EDI to JSON is only half the battle though — once the hidden complexity is revealed in JSON, you have to actually manage it.
I firmly believe that TypeScript is widely underutilized. A language written with tons of developer-accessible typing features is commonly riddled with `as any`s and exclamation marks. The easy way out is convenient at first, but then projects scale. The types break down and people are left battling their own systems and scratching their heads as to why they bothered with TypeScript in the first place.
We take an alternate approach: skip the `as any`s, and work relatively hard up front to ensure that we have types that actually benefit the developer. The path we take to doing so varies case by case, but in the EDI case it is as follows.
1. Schemas
Stedi allows for exporting of EDI Guides as JSON Schemas, which is our entry point to a robust type system for the JSON we pull from those files. They’re exported then immediately converted to Zod schemas. The 837P Zod schema is around 15,000 lines, some of which is shown below:
2. Basic Types
Zod schemas are converted to TypeScript types.
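As a rough illustration of steps 1 and 2, here is a minimal sketch using Zod's `z.object`, `z.infer`, and `safeParse`. The schema fragment is handwritten and hypothetical: the real schemas are generated from the guide, and the field names below are assumptions in the style of the guide's naming, not excerpts from it.

```typescript
import { z } from "zod";

// Hypothetical fragment in the style of a generated guide schema.
// The field names and their types are assumptions for illustration.
const claimInformationSchema = z.object({
  patient_control_number_01: z.string(),
  total_claim_charge_amount_02: z.number(),
  place_of_service_code_01: z.string().optional(),
});

// Step 2: derive the static TypeScript type from the runtime schema,
// so the validator and the type cannot drift apart.
type ClaimInformation = z.infer<typeof claimInformationSchema>;

// Downstream code gets autocomplete and compile-time checks on the fields.
function describeClaim(claim: ClaimInformation): string {
  return `${claim.patient_control_number_01}: ${claim.total_claim_charge_amount_02}`;
}

// safeParse validates unknown JSON at runtime without throwing.
const result = claimInformationSchema.safeParse({
  patient_control_number_01: "A1",
  total_claim_charge_amount_02: 100,
});

if (result.success) {
  console.log(describeClaim(result.data));
}
```

The appeal of this arrangement is that one definition serves both as the runtime validator and as the source of the compile-time type.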
3. Generic Types
A lot of the methods that pass around JSON-ified EDI are very general. Methods to send EDI, to store records of transmissions, to send acknowledgment files, etc. have no need to access any guide-specific fields, and should accept generalized types. But many of those pass the JSON to other methods that do need to access format-specific fields.
One case study in the utility of generic types for EDI is the 837 schema. Encapsulated in the 837 file type are the 837I and 837P subtypes, both of which are claim files (“I” for Institutional, “P” for Professional). The vast majority of the time they are processed identically, but occasionally there is logic that conditions on the subtype.
Our code evolves quickly. We may need to integrate with a new partner or route certain claims to different places on short notice, and methods that originally only accessed fields constant across both the I and P subtypes may need to access something subtype-specific. I wanted all of that complexity wrapped up neatly into one type, so I made the 837 type a generic that optionally takes a subtype parameter. The definition is below.
The meat of this generic is in the definition of `Schema837`, and is what allows for all three of the following usages:
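A minimal sketch of what such a generic and its three usages might look like. `Schema837` and `Schema837SubType` are the names used in this post; everything else (the reduced `Schema837I` and `Schema837P` shapes, the `heading` container, the `claims` field, the lookup interface, and the sample values) is a hypothetical stand-in for the much larger generated types.

```typescript
type Schema837SubType = "Institutional" | "Professional";

// Hypothetical, tiny stand-ins for the generated guide types. The `heading`
// container and the `claims` field are assumptions for illustration.
interface Schema837I {
  heading: { version_release_or_industry_identifier_03: string };
  claims: { patient_control_number_01: string }[];
}

interface Schema837P {
  heading: { implementation_guide_version_name_03: string };
  claims: { patient_control_number_01: string }[];
}

// Lookup from subtype name to its schema type (one possible encoding).
interface SchemaBySubType {
  Institutional: Schema837I;
  Professional: Schema837P;
}

// T must be a subtype name and defaults to the whole union. Indexing with a
// union yields a union, so a bare Schema837 is Schema837I | Schema837P.
type Schema837<T extends Schema837SubType = Schema837SubType> = SchemaBySubType[T];

// Usage 1: Institutional only.
const institutional: Schema837<"Institutional"> = {
  heading: { version_release_or_industry_identifier_03: "005010X223A2" },
  claims: [{ patient_control_number_01: "A1" }],
};

// Usage 2: Professional only.
const professional: Schema837<"Professional"> = {
  heading: { implementation_guide_version_name_03: "005010X222A1" },
  claims: [{ patient_control_number_01: "B1" }, { patient_control_number_01: "B2" }],
};

// Usage 3: no parameter, so either subtype is accepted. Only fields shared
// by both subtypes are accessible without narrowing first.
function countClaims(schema: Schema837): number {
  return schema.claims.length;
}

console.log(countClaims(institutional), countClaims(professional));
```

A distributive conditional type would work equally well here; the lookup interface is just one way to get the same union-by-default behavior.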
There are really only two components of the generic, and both are critical. Having the passed-in type, `T`, extend `Schema837SubType` is what allows for the first two usages (namely, the ability to specify an 837 subtype by passing in the subtype as a string). Having it default to the entire `Schema837SubType` is what allows for the third usage, which is an “or” with the first two.
4. Type Guards
The 837 generic makes two extremely useful type guards elegant. For those who are less familiar with the inner workings of the TypeScript ecosystem, a type guard is a function whose return type tells the compiler that, when it returns true, its argument has a more specific type, which extends TypeScript’s native inference. This is probably best explained through an example, so here are the two 837 guards.
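What follows is a sketch rather than our production code. The guard names `isInstitutional837` and `isProfessional837`, the reduced schema shapes, the `heading` container, and the `in` checks are all assumptions made for illustration; only `Schema837`, `Schema837SubType`, `demonstrateTypeInference`, and the two version field names come from this post.

```typescript
type Schema837SubType = "Institutional" | "Professional";

// Hypothetical, tiny stand-ins for the generated guide types.
interface Schema837I {
  heading: { version_release_or_industry_identifier_03: string };
}

interface Schema837P {
  heading: { implementation_guide_version_name_03: string };
}

interface SchemaBySubType {
  Institutional: Schema837I;
  Professional: Schema837P;
}

type Schema837<T extends Schema837SubType = Schema837SubType> = SchemaBySubType[T];

// The `schema is ...` return type is what makes these type guards: a true
// result narrows the argument's type at the call site.
function isInstitutional837(schema: Schema837): schema is Schema837<"Institutional"> {
  return "version_release_or_industry_identifier_03" in schema.heading;
}

function isProfessional837(schema: Schema837): schema is Schema837<"Professional"> {
  return "implementation_guide_version_name_03" in schema.heading;
}

function demonstrateTypeInference(schema: Schema837): void {
  if (isInstitutional837(schema)) {
    // Narrowed to Schema837<"Institutional">, so this access type-checks.
    console.log(schema.heading.version_release_or_industry_identifier_03);
  }

  // Outside the guard, schema may still be Professional, so the compiler
  // rejects the access. The directive below documents that expected error.
  // @ts-expect-error: property is missing on the Professional heading
  console.log(schema.heading.version_release_or_industry_identifier_03);
}

demonstrateTypeInference({
  heading: { version_release_or_industry_identifier_03: "005010X223A2" },
});
```

Removing the `@ts-expect-error` line is the quickest way to see the compiler complain about the unchecked access.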
Both of these functions accept a schema of type `Schema837`, and assert whether it is of type `Schema837<"Institutional">` or `Schema837<"Professional">` via the `is` keyword.
Funnily enough, you can see one example of fields that differ between Institutional and Professional 837s in the type guards. Institutional claims hold their specific guide version in a field called `version_release_or_industry_identifier_03`, whereas Professional claims put it in `implementation_guide_version_name_03`, and those fields are undefined in the opposite schema type. In the dummy function `demonstrateTypeInference`, the Institutional guide field is accessed once after checking the schema against the type guard, and once without checking it. The latter is caught by the linter!
The error message is rather unsightly (as many type error messages are), but does its job nonetheless. It states:
And on the eighth day He said “Let there be types.”