In 2020, a team of volunteers gathered to take on a long-standing eDiscovery challenge: how to better enable cross-platform email duplicate identification.
The result of the group’s efforts is the EDRM Message Identification Hash (EDRM MIH), released by EDRM on January 11, 2023. Details about the project are at the EDRM DupeID page.
Reveal supports EDRM MIH, and we recently added instructions for using it to our User Documentation site.
The EDRM Duplicate Identification Project was the brainchild of Beth Patterson, the director and founder of ESPconnect and an industry stalwart from Australia. To hear directly from Beth about her role in the DupeID project, check out the eDiscovery Leaders Live discussion with her earlier this year.
Beth first floated the idea for the project in 2020. The project kicked off in earnest in March 2021, with over a dozen participants. As the project progressed, the team expanded to over 20 members coming from all the constituencies you would expect: software companies, service providers, law firms, and corporate. They also came from across the globe: Australia, Finland, Japan, Israel, the UK, and the US.
The group’s initial mission was to develop a best practice specification for hashing electronic documents and data to identify exact duplicates. The hope was that this could improve efficiencies and generate significant cost savings. Over time we narrowed our focus, aiming for something more immediately achievable than our initial objective.
After more than two years of evaluation, experimentation, and testing, in January 2023 EDRM released version 1.0 of the EDRM Message Identification Hash Specification. That specification defines a process to identify duplicate email messages across platforms.
The EDRM MIH is an MD5 hash value. That value is generated from the Message-ID header field of an RFC-compliant email message.
There are some qualifiers for any tool that creates EDRM MIH values:
The EDRM MIH is intended to be used for cross-platform deduplication. Even so, there are situations where the EDRM MIH may not be appropriate to use. The project’s deliverables (described below) call out 10 examples and discuss each in greater detail:
The project’s first set of deliverables is the EDRM Email Duplicate Identification Toolkit, designed to facilitate cross-platform identification of duplicate email messages. The Toolkit has six components, all available from the EDRM DupeID page. The components are:
Earlier this year, Reveal added support for EDRM MIH to Reveal 11. This means:
Now, Reveal has added an article about the specification to our support system, the Reveal 11 Knowledgebase. The article, EDRM Message ID Hash (EDRM MIH), provides step-by-step instructions for how to use the specification in Reveal 11.
As our platform processes email messages, it looks at each message’s Message-ID header line to see whether that line contains a valid Message-ID value. It is looking for text with this format:
"<" id-left "@" id-right ">"
Let's start with the following Message-ID header line from a hypothetical email message:
Message-ID: <CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com>
If we compare the content of the header line with the required format, we see that the content conforms to the format:
If the platform finds information in the required format, it passes the bracketed value to its EDRM MIH generator. In this example, the value it passes looks like this:
<CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com>
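To make the format check concrete, here is a minimal Python sketch of how a tool might test a header line against the "<" id-left "@" id-right ">" pattern and pull out the bracketed value. It is an illustration only, not Reveal's implementation: the function name `extract_message_id` is hypothetical, and the pattern is a simplified reading of the RFC 5322 `msg-id` grammar that accepts only dot-atom text on each side of the "@" and ignores obsolete forms, comments, and domain literals.

```python
import re
from typing import Optional

# "atext" characters from RFC 5322: letters, digits, and a set of symbols.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: one or more atext runs separated by single dots.
_DOT_ATOM = rf"{_ATEXT}+(?:\.{_ATEXT}+)*"
# Simplified msg-id (an assumption of this sketch): "<" id-left "@" id-right ">"
_MSG_ID = re.compile(rf"<({_DOT_ATOM})@({_DOT_ATOM})>")


def extract_message_id(header_line: str) -> Optional[str]:
    """Return the bracketed Message-ID value, or None if the line has no valid one."""
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "message-id":
        return None  # not a Message-ID header line
    match = _MSG_ID.fullmatch(value.strip())
    return match.group(0) if match else None


line = "Message-ID: <CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com>"
print(extract_message_id(line))
```

A line that lacks the angle brackets or the "@" separator returns `None`, which corresponds to the case where no EDRM MIH should be generated.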
Next, the platform uses that information to generate an EDRM MIH hash value. With this example, the hash value obtained is:
1de319c276884bd0c9e2f1621ada26cc
Finally, the platform adds this hash value to the EDRM_MSGID field for the email message.
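For readers who want to see the hashing step in code, the following Python sketch computes an MD5 digest of a Message-ID value with the standard `hashlib` module. Several details here are assumptions rather than statements of the specification: the function name `edrm_mih` is hypothetical, the value is encoded as UTF-8, and the `keep_brackets` switch exists only because this article does not spell out whether the angle brackets are part of the hashed input. Check the EDRM MIH Specification for the exact input rules before relying on values produced this way.

```python
import hashlib


def edrm_mih(message_id: str, keep_brackets: bool = True) -> str:
    """Return the MD5 hex digest of a Message-ID value.

    Assumption: the exact bytes to hash (brackets, encoding) come from the
    EDRM MIH Specification; this sketch exposes the bracket choice as a flag.
    """
    value = message_id.strip()
    if not keep_brackets and value.startswith("<") and value.endswith(">"):
        value = value[1:-1]  # drop the enclosing "<" and ">"
    return hashlib.md5(value.encode("utf-8")).hexdigest()


digest = edrm_mih("<CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com>")
print(digest)  # a 32-character lowercase hexadecimal string
```

Because the same input bytes always produce the same digest, two tools that follow the same input rules will arrive at the same value for the same message, which is what makes cross-platform comparison possible.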
Note that the EDRM MIH may only be generated if an email has a valid Message-ID value that has not been altered in any way. Where more than one Message-ID value is contained within an email, the MIH must be generated using only the first Message-ID value declared in the parent email message headers.
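The "first Message-ID in the parent headers" rule can be sketched with Python's standard `email` package. This is a hedged illustration, not how Reveal processes mail: the function name `first_message_id` is hypothetical, and the sketch assumes the message is available as RFC 5322 text. It reads only the top-level headers, so Message-ID values belonging to attached or embedded messages are never considered.

```python
from email import message_from_string
from typing import Optional


def first_message_id(raw_email: str) -> Optional[str]:
    """Return the first Message-ID value in the top-level headers, or None."""
    msg = message_from_string(raw_email)
    # get_all() matches header names case-insensitively and keeps the order
    # in which the headers appear; it returns None when there are no matches.
    values = msg.get_all("Message-ID")
    if not values:
        return None
    return values[0].strip()


raw = (
    "Message-ID: <first@example.com>\n"
    "Subject: Example\n"
    "Message-ID: <second@example.com>\n"
    "\n"
    "Body text\n"
)
print(first_message_id(raw))
```

In this example only `<first@example.com>` would be a candidate for hashing; the second value is ignored.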
For detailed steps, go to the article in the Reveal 11 Knowledgebase.
EDRM MIH values are available for a variety of uses, including as part of load files generated for document productions.
Today, we discussed how Reveal has incorporated the new EDRM Message Identification Hash into Reveal 11. If your organization is interested in learning more about how Reveal maintains its position as a leader, with its AI-powered end-to-end legal document review platform, contact us.