I prepared this case study for MarkLogic, a technology company specializing in building solutions for efficient data management.

BIG DATA AS A SCIENCE

UK Chemistry Society Unlocks 170 Years’ Worth of Data with the Power & Flexibility of MarkLogic

When the Royal Society of Chemistry found themselves struggling to manage millions of buried data files, they partnered with the MarkLogic Corporation to build a new solution. Using MarkLogic’s Enterprise NoSQL database, the RSC has made over a century’s worth of information accessible to entrepreneurs, educators, and researchers around the world.

Company Overview

Founded over a century and a half ago in the United Kingdom, the Royal Society of Chemistry (RSC) is the world’s oldest, largest organization dedicated to furthering awareness of the chemical sciences. A conglomeration of four renowned bodies—The Chemical Society, The Society of Analytical Chemistry, The Royal Institute of Chemistry, and The Faraday Society—the RSC currently has over 47,500 global members.

To strengthen knowledge of the profession and science of chemistry, the RSC holds conferences, meetings, and public events, and also publishes industry-renowned books, journals, and databases.

The Challenge

It’s a tall order to manage a single year’s worth of data—so how about 170 of them? Since the 1840s, the RSC has gathered millions of images, science data files, and articles from more than 200,000 authors. All of that information was stored in countless formats and multiple locations, and it was growing by the day.

To make matters more complex, the RSC is in the process of buying the rights to The Merck Index. Widely considered the worldwide authority on chemistry information, this renowned reference book has been used by industry professionals for over 120 years. As part of this acquisition, the RSC will be tasked with publishing and maintaining the Index in an online format.

In 2010, largely due to the huge growth of social media and digital formats, the RSC launched an initiative to make its data more accessible, fluid, and mobile.

David Leeming, project manager for RSC, sums up the Society’s goal: “We needed an integrated repository that would make all of our content accessible online to anyone—from teachers to businesses to researchers. The key was finding the right technology.”

The Solution

After evaluating several major providers, the RSC chose MarkLogic as the best platform for their needs. The database offers many key benefits, among them the ability to store content as XML documents.

The Society’s information is widely varied (books, emails, manuals, tweets, metadata, and more) and rarely follows a single structure, which makes it a poor fit for the fixed tables of a traditional relational database. MarkLogic’s document-based model is a much better fit. The RSC can simply load its information as-is, without having to conform to a rigid format.
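To make the document-based idea concrete, here is a minimal sketch in Python. It does not use MarkLogic or its API; it relies only on the Python standard library, and the file names, element names, and sample records are hypothetical stand-ins rather than the RSC's real content. It shows a book chapter and a journal article, each with its own structure, being loaded as-is and searched together.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records; the element names are illustrative only
# and are not the RSC's real schema.
RAW_DOCUMENTS = {
    "chapter-1.xml": (
        "<chapter><title>Acids and Bases</title>"
        "<book>Intro Chemistry</book></chapter>"
    ),
    "article-7.xml": (
        "<article><title>Catalysis in Acids</title>"
        "<journal>Example Journal</journal>"
        "<doi>10.0000/example</doi></article>"
    ),
}


def load_as_is(raw_documents):
    """Parse each document without forcing it into a shared set of columns."""
    return {uri: ET.fromstring(xml) for uri, xml in raw_documents.items()}


def search(store, term):
    """Return the URIs of documents whose text contains the term,
    regardless of how each document is structured."""
    term = term.lower()
    return sorted(
        uri
        for uri, root in store.items()
        if any(term in text.lower() for text in root.itertext())
    )


store = load_as_is(RAW_DOCUMENTS)
print(search(store, "acids"))    # matches both the chapter and the article
print(search(store, "journal"))  # matches only the article
```

Neither record had to be reshaped to match the other before loading, which is the property the document model offers here. A production database adds indexing, security, and scale on top of this basic idea.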

As Leeming points out, “A book chapter is very different from a journal article. A relational database can’t combine the two. MarkLogic is flexible enough to handle all types of unstructured content in a single delivery mechanism, from spreadsheets and images to videos and social media comments.”

MarkLogic also enables logical associations between different types of content. Each image, video, and article is automatically tagged, allowing users to find, understand, and process the information they need. As shown in Figure 1 below, searching RSC publications is now a quick, intuitive process.

With the greater data accessibility afforded by the new MarkLogic database, the RSC’s publishing division has become much more productive. “We can now publish three times as many journals and four times as many articles as we did in 2006,” says Leeming. “And we now have the ability to build new educational programs to spread chemistry knowledge among more people.”

Since implementing the integrated MarkLogic database, the RSC has already seen a 30 percent increase in article views, a 70 percent traffic boost on its educational websites, and a spike in research activity in India, China, and Brazil.

The new platform will also be a significant benefit in the acquisition of The Merck Index. “We’re eagerly looking forward to developing The Merck Index for the digital future,” says Dr. James Milne, RSC Publishing Executive Director. The MarkLogic database will help to ensure the publication’s smooth, successful transition to an online format.

Although the integrated data repository has been the biggest game-changer, the schema-less MarkLogic technology has unlocked many other opportunities as well. Leveraging MarkLogic’s NoSQL database, the RSC has launched many new research journals, mobile applications, social media forums, and applications for children.

Dr. Robert Parker, Chief Executive of the RSC, sums up the major role MarkLogic has played in this successful transition. “Using MarkLogic’s big data platform has allowed us to open up the world of chemistry to a much wider and more mobile audience, while increasing the volume and quality of the research that we publish.”
