Data Scientist. You've heard the term. It seems that everyone has one: every large corporation, major law firm, legal services provider, and eDiscovery software company. At Reveal, we have a whole team. But who are these people, and what do data scientists do?
Data science is a comparatively new discipline. The term "data scientist" was only coined in 2008. In simple terms, as noted on the University of Wisconsin Data Science Program website, "a data scientist’s job is to analyze data for actionable insights."
Data science follows a life cycle. As you might expect with any new discipline, descriptions of that life cycle vary widely and sometimes conflict. UC Berkeley School of Information, for example, offers a model with five stages:
Others describe the process somewhat differently, such as this 10-step framework from Master’s in Data Science—Your Guide to Data Science Graduate Programs in 2021, where the authors note that "though no two data scientists will come up with precisely the same steps for their work, most data science projects follow a similar trajectory and will have at least some steps in common with other data science efforts."
There are many types of data science job titles, such as those laid out in a list from Projectpro:
Data scientists have varying and overlapping areas of expertise. Here are some of the skill sets you might see on a data scientist's LinkedIn profile, drawn from this list published on Medium:
The field of data science spans a wide range of industries and career paths. Data scientists work in the corporate world, health care, entertainment, government, and, of course, legal. According to Springboard, the three industries employing the most people in data scientist roles are finance (which includes banks, investment firms, insurance firms, and the real estate sector), professional services, and information technology.
A sampling of other industries employing data scientists includes:
Data scientists work on an ever-widening range of projects and provide services to all manner of stakeholders. They scrub data, investigate it, visualize it, organize it into clusters, and apply machine learning to it. They might use data and modeling to identify crime hotspots and predict law enforcement needs in a city.
Data scientists have been active at the federal level. They used address data to help respond to the devastation in Puerto Rico caused by Hurricane Irma and Hurricane Maria. They helped build software that lets the National Nuclear Security Administration (NNSA) respond proactively to emerging infrastructure needs by recommending building component repairs and replacements at the most opportune time. They put together a prototype through the Census Bureau’s Opportunity Project to better assess where volunteers should direct litter-clearing efforts.
Data scientists help build dashboards that allow teams to work together more effectively by, for example, visually tracking, displaying, and analyzing key performance indicators.
Whatever industry they work in and whatever project they work on, data scientists are likely to deliver results that do some combination of the following, among many others:
To land a position as a data scientist, it helps to have a relevant Bachelor's or Master's degree, such as one in computer science, mathematics, IT, statistics, or another related field. Work experience is always useful, as are other capabilities such as strong problem-solving skills, the ability to work both individually and as part of a team, an understanding of data collection and analysis, and strong verbal and visual communication skills. Programming skills in widely used languages such as Python and SQL, along with experience with Hadoop, are also valuable in this field.
Reveal has a strong data science team; as far as we know, it is the most robust in the industry. It really has two parts: the data science team itself and the AI engineering team.
Reveal's data science team is led by Dr. Irina Matveeva, Chief Data Scientist & Head of Machine Learning. One of a small number of women to lead a data science team, Dr. Matveeva is responsible for Reveal’s data science organization and applying machine learning and natural language processing approaches throughout the Reveal platform. She is an Adjunct Professor at the Illinois Institute of Technology (IIT) and has nearly a decade of both practical and academic experience in natural language processing. Dr. Matveeva received her Ph.D. from the University of Chicago. She co-chaired the TextGraphs workshops in 2012, 2011, 2008, and 2007, and is a reviewer for multiple prestigious journals and publications.
Reveal's AI engineering team is led by Dr. David Lewis, Executive Vice President, AI Research, Development, & Ethics. Dr. Lewis is responsible for artificial intelligence research, development, and ethics issues throughout Reveal's software and services. Prior to joining Reveal, he held positions at Brainspace, AT&T Labs, Bell Labs, and the University of Chicago, along with co-founding a machine learning startup and consulting on numerous legal cases. He received his Ph.D. from the University of Massachusetts at Amherst. Dr. Lewis was elected a Fellow of the American Association for the Advancement of Science in 2006, and in 2017 he and W. A. Gale won the ACM SIGIR Test of Time Award for the invention of uncertainty sampling.
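Uncertainty sampling, the technique recognized by that award, is one of the foundational methods in active learning. In rough terms: train a classifier on the documents labeled so far, score the documents not yet labeled, and send to a reviewer the ones the classifier is least sure about (an estimated probability of relevance closest to 0.5), then repeat. The Python below is a minimal sketch of that idea under simple assumptions, not a description of how Reveal's products implement it; `train` and `oracle` are hypothetical stand-ins for model fitting and human review.

```python
from typing import Any, Callable, Sequence

# Hypothetical type: a scorer maps a document to an estimated
# probability that the document is relevant.
Scorer = Callable[[Any], float]


def select_uncertain(probabilities: Sequence[float], k: int) -> list[int]:
    """Return the indices of the k items whose probability is closest to 0.5,
    i.e. the items the current classifier is least certain about.
    Ties keep their original order because sorted() is stable."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


def uncertainty_sampling(pool, seed, train, oracle, batch_size, rounds):
    """Run a simple pool-based uncertainty sampling loop.

    pool   -- unlabeled documents
    seed   -- initial list of (document, label) pairs (may be empty)
    train  -- hypothetical: fits a model on labeled pairs, returns a Scorer
    oracle -- hypothetical: a human reviewer returning True/False for a doc
    """
    labeled = list(seed)
    unlabeled = list(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        score: Scorer = train(labeled)              # retrain on all labels so far
        probs = [score(doc) for doc in unlabeled]   # score the remaining pool
        picked = set(select_uncertain(probs, batch_size))
        # Send the least-certain documents to the reviewer for labeling.
        labeled += [(unlabeled[i], oracle(unlabeled[i])) for i in sorted(picked)]
        unlabeled = [d for i, d in enumerate(unlabeled) if i not in picked]
    return train(labeled), labeled
```

For example, `select_uncertain([0.95, 0.52, 0.10, 0.47, 0.70], 2)` returns `[1, 3]`, the two scores nearest 0.5. In a real review workflow, the selection rule, batch size, and stopping criterion would all be design decisions; this sketch fixes the number of rounds only to keep it short.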
Most of Reveal's data science team members (I am including both the data science and AI engineering teams) have been with the team for years. Most started as interns and have broadened and deepened their expertise as we have grown our data science capabilities. They are responsible for, among other things, the AI capabilities in Reveal Review, Reveal AI, and Brainspace.
In addition to the skills they have learned on the job, our data science team members bring a wealth of academic experience, with credentials including PhD, Master of Science, Bachelor of Science, Bachelor of Engineering, and Bachelor of Technology degrees from the University of Chicago; the University of Massachusetts Amherst; the Illinois Institute of Technology; Michigan State University; the University of Mumbai; Sinhgad Academy of Engineering; and Jaypee University of Information Technology.
Having such a robust data science team has enabled Reveal to build a platform powered by cutting-edge artificial intelligence and machine learning. You can see the results in Brainspace's visual analytics, in how active learning is woven into Review, and in the platform's ability to work deftly with images, foreign language content, and audio files. And you can hear it in what our customers have to say about us: "Intuitive yet robust artificial intelligence", "Best in class legal technology", "Truly outstanding range of AI-driven products".
With the power and technical skills of its data science team, Reveal continues to make the platform even stronger, working every day to develop compelling solutions to the next round of challenges facing legal.
I recently chatted with Irina and Dave on eDiscovery Leaders Live, a weekly program hosted by ACEDS and sponsored by Reveal where I chat with leaders in eDiscovery and related fields. During the session, Irina, Dave and I focused on artificial intelligence in eDiscovery. We started with efforts to use AI to deliver “simple” solutions to complex problems and talked about the importance of holding the AI discussion at the right level.
Irina gave us a little background on DLA Piper’s Aiscenion, which her team worked on. Dave and then Irina offered their thoughts on the “every case is special” objection and how to respond to it. We also looked at whether a PhD or MS is needed to make effective use of AI in discovery and how much of the AI plumbing attorneys really need to understand.
Irina and Dave shared information about their teams and what those folks do, a discussion that morphed into learning about AI more generally. Finally, at my request, Dave and then Irina peered into their crystal balls to offer thoughts about where AI might take discovery in the future. The video of the session is available on ACEDS social media platforms, and the video and transcript of our discussion are available here.