TERIS partner Nuix, and EDRM, the leading standards organization for the eDiscovery and information governance market, announced yesterday that they republished the EDRM Enron PST Data Set after cleansing it of private, health and personal financial information. Nuix and EDRM have also published the methodology Nuix's staff used to identify and remove more than 10,000 high-risk items at nuix.com/enron.
The EDRM Enron data set is an industry-standard collection of email data that the legal profession has used for many years for electronic discovery training and testing. It was sourced from the Federal Energy Regulatory Commission's investigation into collapsed energy firm Enron. In early 2012, the EDRM Enron PST Data Set and the EDRM Enron Data Set v2 became an Amazon Web Services Public Data Set, making them a valuable public resource for researchers across a variety of disciplines.
"Recently, we have been working closely with Nuix to cleanse the data set of private information about the company's former employees and make the cleansed data set readily available to the community," said George Socha and Tom Gelbmann, co-founders of EDRM. "These efforts help to protect the privacy of hundreds of individuals and we encourage anyone who finds private data that we did not remove to notify us."
Using a series of investigative workflows on the EDRM Enron PST Data Set, Nuix consultants Matthew Westwood-Hill and Ady Cassidy identified more than 10,000 items including:
- 60 items containing credit card numbers, including departmental contact lists that each contained hundreds of individual credit cards
- 572 containing Social Security or other national identity numbers (thousands of individuals' identity numbers in total)
- 292 containing individuals' dates of birth
- 532 containing information of a highly personal nature such as medical or legal matters.
Many items contained multiple instances and types of information. This included departmental contact list spreadsheets with dates of birth, credit card numbers, Social Security numbers, home addresses and other private details of dozens of staff members. 
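To make the kind of sweep described above concrete, here is a minimal Python sketch of pattern-based detection. It is not the methodology Nuix published; the regular expressions, the function names `luhn_valid` and `find_high_risk`, and the sample text are all illustrative assumptions. It flags candidate credit card numbers that pass the Luhn checksum and strings laid out like US Social Security numbers.

```python
import re

# Assumption: candidate card numbers are 13 to 16 digits, optionally
# separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Assumption: Social Security numbers appear in the NNN-NN-NNNN layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_high_risk(text: str) -> dict:
    """Collect Luhn-valid card candidates and SSN-shaped strings from text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"credit_cards": cards, "ssns": SSN_RE.findall(text)}


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111; SSN 123-45-6789; ref 1234 5678 9012 3456"
    print(find_high_risk(sample))
```

The checksum step matters because plain digit patterns match phone numbers, invoice references and similar noise; a real sweep would still need human review of the hits, and patterns for dates of birth or medical content are harder to express this way.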
The investigative team also clearly demonstrated that these items did not stay within the Enron firewall. For example, some staff emailed "convenience copies" of documents containing private data to their personal addresses.
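A check for data leaving the firewall can be sketched in a similar spirit. The example below is an assumption-laden illustration rather than the investigators' actual workflow: the `INTERNAL_DOMAINS` set and the `external_recipients` helper are hypothetical names, and the sample addresses are invented. It parses recipient headers with Python's standard `email.utils.getaddresses` and returns any address whose domain is not on the internal list.

```python
from email.utils import getaddresses

# Assumption: the organization's own mail domains are known in advance.
INTERNAL_DOMAINS = {"enron.com"}


def external_recipients(header_values, internal_domains=INTERNAL_DOMAINS):
    """Return recipient addresses whose domain is not an internal domain.

    header_values is a list of raw To/Cc/Bcc header strings.
    """
    external = []
    for _name, addr in getaddresses(header_values):
        if "@" not in addr:
            continue  # skip empty or unparseable entries
        domain = addr.rpartition("@")[2].lower()
        if domain not in internal_domains:
            external.append(addr)
    return external


if __name__ == "__main__":
    to_header = "Jane Doe <jane.doe@enron.com>, jdoe@example.net"
    print(external_recipients([to_header]))
```

Combining a check like this with a content scan is what lets a reviewer single out messages that both contain private data and were sent to an outside address.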
"Nuix and our partners have conducted sweeps for private and credit card data for dozens of corporate customers and we are yet to encounter a data set that did not include some inappropriately stored personal, financial or health information," said Eddie Sheehy, CEO of Nuix. "The increasing burden of privacy and data breach regulations, combined with the strict requirements of credit card companies, make this an unacceptable business risk.
"Using the methodology we are publishing alongside the cleansed EDRM Enron data, organizations can identify private and financial data, find out if it has been emailed outside the firewall and take immediate steps to remediate the risks involved."
Nuix is currently applying the same methodology to the EDRM Enron Data Set v2, which it will also republish at nuix.com/enron.
Nuix will host a Twitter chat to discuss the release of the cleansed EDRM Enron PST Data Set on Thursday, May 23rd, from 2pm to 3pm ET. Nuix experts will describe the process of identifying unsecured financial, health and personally identifiable information in corporate data. Follow the hashtag #NuixChat and send in your questions beforehand to @nuix.
Special Blog Post By Peter vR Sternkopf, TERIS CTO
In the very early days of eDiscovery, the market was defined by several independent software providers, utilized for the most part directly by law firms. As the technology and market have changed, several of these smaller providers have either vanished or been consolidated into larger companies. The challenges of remaining sustainable include the variation in clientele (service providers, corporations, and law firms) as well as the rapidly changing industry demands. Despite these challenges, and out of over 200 software applications, there are several notable software providers currently dominating the eDiscovery market.
Platform Solutions
Exterro Fusion
Exterro’s integrated, all-in-one e-discovery platform, Exterro Fusion, enables legal and IT teams to manage the most daunting complexities and challenges often encountered in the convergence of eDiscovery and information governance. Historically the main competition for PSS Atlas (IBM), Exterro similarly provides management tools for discovery workflow as well as legal holds. With the addition of further capabilities in modular format, Exterro offers a more comprehensive solution than point-based software.
Autonomy IDOL Server
This branch of HP, focused specifically on the eDiscovery market, is considered one of the primary providers within the industry for enterprise clientele.
Viewpoint (by Lateral Data, a Xerox company)
Viewpoint software is an all-in-one e-Discovery platform that covers the core components of the electronic discovery lifecycle, including Collection, Early Case Assessment, Processing, Analysis, Review and Production.
ESI Processing
Nuix
Nuix's eDiscovery software specializes in Early Case Assessment (ECA), with a focus on extremely fast metadata and text extraction and indexing, an analysis toolset, first-pass review, production and export. Its powerful technology is geared toward normalizing an organization’s unstructured data stores, making them instantly searchable and more manageable.
eCapture (by IPRO)
eCapture can handle virtually any electronic file type and email store, and offers powerful searching, de-duping, filtering and compound document handling. It is designed to handle large eDiscovery projects with speed and scalability, using a distributed, multi-threaded system with centralized management.
Venio FPR
Venio FPR provides a single source for data culling, processing, ECA analytics, review, and production.
AccessData ECA
AD ECA is a stand-alone processing, culling and filtering solution. It focuses on the processing and first-pass review stages of the e-discovery life cycle and comes in both software and appliance formats.
Cloud Discovery
X1 Social Discovery
X1 Social Discovery is designed to effectively address social media content from the leading social media networking sites such as Facebook, Twitter and LinkedIn. In addition, it can crawl, capture and instantly search content from any website. Unlike archiving and image capture solutions, X1 Social Discovery provides for a matter-centric workflow from search and collection through production in searchable native format, while preserving critical metadata not possible through image capture, printouts, or raw data archival of RSS feeds.
NextPoint
The Cloud Preservation service automatically crawls your web properties at chosen intervals, building an archive of html source code and resources, high quality snapshots, and a robust full-text search index. The service makes it a breeze to go back in time with all of your websites, blogs, Twitter accounts and Facebook fan pages to search content, preview the site, and export data.
Audio Discovery
Nexidia
Nexidia makes audio recordings easily reviewable and replaces the costly process of manual transcription and human listening with an efficient alternative. Nexidia's patented technology quickly indexes large volumes of recorded audio using phoneme patterns, providing higher accuracy than dictionary-dependent, speech-to-text based audio search. Advanced query technology and metadata integration allow reviewers to drill directly into the audio content.
Content Analysis
Equivio
Specifically geared toward providing technology-assisted review, this company shows promise in the predictive coding market and may expand this base in the future.
Content Analyst Analytical Technology (CAAT)
The CAAT platform is a dynamic suite of technologies known as Text Analytics. It provides organization tools for classification and email analysis; concept search; and other text analytics capabilities that automate most of the human activity traditionally associated with using unstructured data.
ESI Archiving/Management
Enterprise Vault (by Symantec)
An enterprise communication and ESI archiving system that connects to Exchange, Domino/Lotus, SharePoint, file shares, PST/NSF archives, Database/ERP systems, IM systems and BlackBerry SMS/PIN. Enterprise Vault enables users to store, manage, and discover unstructured information across the organization.
StoredIQ Information Intelligence Platform
The technology can discover, index, tag, classify, and act on unstructured and semi-structured data from a variety of different sources including file servers, email, archives, document management systems, and tape. Early case assessment can be performed prior to collection.
Data Indexing
Lucene (Apache)
While suitable for any application that requires full-text indexing and searching capability, Lucene has been widely recognized for its utility in implementing Internet search engines and local, single-site searching, and it is especially well suited to eDiscovery software implementations.
dtSearch
The dtSearch Engine lets developers quickly add dtSearch’s proven “industrial-strength” searching, and file format and other data support, to applications.
Review Hosting
Relativity (by kCura)
As a feature-rich review platform, Relativity provides image and native file review, diverse coding options, flexible workflow capabilities, integrated productions, foreign language support, text analytics, and visual data analysis.
Xera (by iCONECT)
XERA is a review platform designed with advanced Web technologies focused on the review, analysis, and production of electronically stored information (ESI). With the release of XERA, iCONECT delivers a new standard in legal review applications for law firms and corporations that expect to interact with technology in an intuitive and efficient manner.
Clearwell (a Symantec Company)
Clearwell, purchased by Symantec last year, was one of the first companies in this market to offer very niche-specific software. Its Clearwell E-Discovery Platform was purpose-built for eDiscovery.
Honorable Mentions
There are also several up-and-comers who are turning their focus more intently toward eDiscovery, and may soon give the top contenders a run for their money. These include:
Microsoft: Already the major software leader for just about every other use, Microsoft continues to add more eDiscovery functionality into its products, such as SharePoint and Exchange. While it’s unclear whether a firm entrance into the discovery market is in mind, the company seems poised and ready to enter that market at will.
Google: With the advent of Google Vault, a program which offers archival and discovery functionality for Google Apps users, it’s very possible that more specific discovery software may be released in the future. Google is also extremely budget-friendly, with Google Vault running at a cost of just $50 per user per year.
IBM: Providing a range of product functionality, IBM (and its acquired PSS Atlas) offers just about everything from evidence collection and assessment to legal hold issuances. There is a primary focus on defensibility, reducing legal vulnerability and discovery costs simultaneously.
In addition, there is of course a multitude of vendors and service providers who use their own proprietary software. As the growing trend of eDiscovery continues, the list of industry leaders will only become even more impressive.
For more information and comparison tools, visit the eDJ Tech Matrix.