Elsevier Makes Science Research Accessible in Milliseconds with Neo4j

The world’s largest publisher of academic peer-reviewed content has built a content indexing infrastructure with Neo4j’s graph to unlock new insights, link authors with readers, and advance science – across 780 billion search requests each year.

  • Reduced infrastructure requirements: from 200 nodes to a single cluster of fewer than 10
  • Response time for queries: under 300 milliseconds
  • Number of queries in real time: 200,000 per minute

Elsevier is one of the driving forces behind advancing and disseminating scientific knowledge. It is the world’s largest publisher of academic literature, responsible for more than 2,700 digitized journals and reference works including The Lancet and Gray’s Anatomy. Elsevier’s challenge is maintaining a trusted, accurate, comprehensive content database while also ensuring its information is easily accessible to the millions of academic researchers and institutions who rely on these insights.

Elsevier’s search engine receives 780 billion search requests each year from site visitors scouring the 90 million documents Elsevier stores. 95% of these searches are structured queries – highly specific, intensive searches with many parameters, such as finding all papers by an author, or all citations referencing a paper.
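As a rough illustration of what such structured queries look like, here is a minimal Python sketch over a toy in-memory graph. The paper IDs, author names, and function names are all hypothetical, and this is not Elsevier's or Neo4j's implementation; it only shows the shape of the two example lookups mentioned above.

```python
from collections import defaultdict

# Hypothetical toy data: (paper, author) pairs and (citing, cited) pairs.
AUTHORED = [("p1", "Ada"), ("p2", "Ada"), ("p2", "Ben"), ("p3", "Ben")]
CITES = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]

# Index each relationship by the node a query starts from.
papers_by_author = defaultdict(set)
for paper, author in AUTHORED:
    papers_by_author[author].add(paper)

cited_by = defaultdict(set)
for citing, cited in CITES:
    cited_by[cited].add(citing)


def papers_of(author):
    """Return all papers written by the given author."""
    return sorted(papers_by_author.get(author, set()))


def citations_of(paper):
    """Return all papers that cite the given paper."""
    return sorted(cited_by.get(paper, set()))


print(papers_of("Ada"))    # ['p1', 'p2']
print(citations_of("p1"))  # ['p2', 'p3']
```

Both lookups start at a node and follow its relationships directly, which is the access pattern a graph database is built around.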

One of Elsevier’s core products is Scopus, an indexing tool that draws from the 90-million strong database of abstracts and citations. Scopus can quickly find relevant and authoritative research, identify experts, and provide access to reliable data, all of which are fundamental to the progression of science.

Scopus’ search is powered by Neo4j’s graph technology, but that wasn’t always the case. Elsevier’s old search engine technology was incapable of delivering the results required of it at today’s scope and scale, and as a hub for much of the world’s scientific knowledge, Elsevier sought to re-platform. To do that, it turned to graph. 

Lightning-Fast Results Are the New Standard

For medical researchers, part of the scientific process involves exploring relevant, existing literature. An author must first seek and discover knowledge to determine what is already known about a given topic. This process is continuous and is critical to the publication, sharing, and review of scientific research.

For years, Elsevier relied exclusively on a traditional text-based search engine to enable authors and researchers to scour its platform. But the search engine simply could not support the growing demand for structured data queries at pace. More people were searching for more complex, tailored results, but the system was not capable of making the necessary connections between data points to fulfill these intricate searches. The real challenge lay in establishing the connections between the data points and handling the complexity of those connections; the solution lay in the insights hidden in those relationships.

Unlocking those insights with a traditional relational database is complex because, by design, such databases must infer relationships from a mountain of rows and columns, and the task becomes almost impossible as the data grows more connected. Graph database technology, by contrast, treats the connections between data points (nodes) as first-class elements and gives them the same weight as the data points themselves. This lets users explore relationships and patterns across billions of data connections quickly, opening new ways to solve challenges like the one faced by Elsevier.

Getting the connected data found in Scopus content to a searchable level was also arduous. First, the data was moved to a cloud platform, then to an in-memory data structure store. There, all the citations and citation counts were computed, and the results were brought back into the search engine so they could be made searchable. This was an immensely resource-intensive, inefficient way of doing things. Surely there was a better solution?
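To make the citation-count step concrete, the following Python sketch shows the computation itself on hypothetical data. It says nothing about how Elsevier's pipeline was actually built; the function name and edge format are assumptions for illustration.

```python
from collections import Counter


def citation_counts(cites):
    """Count how many distinct papers cite each paper.

    `cites` is an iterable of (citing_paper, cited_paper) pairs;
    duplicate pairs are counted once.
    """
    return Counter(cited for _, cited in set(cites))


# Hypothetical citation edges, including one duplicate.
edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p3", "p2")]
counts = citation_counts(edges)

print(counts["p1"], counts["p2"], counts["p3"])  # 2 1 0
```

In graph terms, a paper's citation count is simply the number of incoming citation relationships on its node, so it can be read off the graph rather than recomputed in a separate store.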

“We knew we wanted to put this knowledge and research into a structured data store,” says Elsevier’s VP of Product Management Erik Schwartz. “We knew about the relational power of graph. We also thought we could use graph to improve the search experience. That was the hypothesis.”

“We were looking at about three to four different graph providers,” he continues. “But we partnered with Neo4j because we knew we would have a high number of users, making an even higher number of searches. For those important runtime heuristics, we needed to hit certain performance thresholds, like sub-300 ms response times for queries. Only Neo4j could help us hit them.”

Ultimately, the switch to a search engine powered by graph was driven by a desire to meet these thresholds and metrics with an efficient, scalable solution that could handle the enormous number of queries Elsevier receives and deliver rich, linked results. 

“Working with Neo4j was an incredibly collaborative experience,” continues Erik. “They just rolled up their sleeves, jumped in, and did whatever it took to make this work.” 

Real-time Science Requires Real-time Data

However, it’s not just about response times. With its graph-powered search engine, Elsevier has unlocked other efficiencies, some of which the team did not even conceive of when starting out. “There was this intuition that we could use the graph for other use cases,” Erik says. 

His hunch was quickly proven correct. “What we learned over time was that we were able to use the graph for other use cases. And that’s where the ‘aha’ moment happened.”

One such use case relates to the article submission process in Elsevier’s Editorial Manager, a workflow tool that helps researchers submit their manuscripts, manage peer and editorial reviews, and align their content with a suitable journal.

Publishing a manuscript also involves finding peers to review the article. When an author submits a paper, the system looks at relevant content and recommends a set of authors to whom the submitter can reach out for peer review. 

Now, thanks to graph, the tool also lets users search by keyword and determine whether there is a conflict of interest among potential reviewers. The system traverses the co-author and co-employment knowledge graphs, looks for any relationships or overlaps within the last three to five years, and flags potential conflicts to the user.
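A conflict check of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Editorial Manager logic: the `Link` record, the `conflicts` function, the names, and the years are all hypothetical, and only direct (one-hop) relationships are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    a: str
    b: str
    kind: str  # "co-author" or "co-employment"
    year: int  # most recent year the relationship was active


def conflicts(author, candidates, links, current_year, window_years=5):
    """Map each candidate reviewer to the kinds of recent links to `author`."""
    earliest = current_year - window_years
    found = {}
    for link in links:
        if link.year < earliest:
            continue  # relationship is older than the look-back window
        if link.a == author:
            other = link.b
        elif link.b == author:
            other = link.a
        else:
            continue  # link does not involve the submitting author
        if other in candidates:
            found.setdefault(other, []).append(link.kind)
    return found


LINKS = [
    Link("Ada", "Ben", "co-author", 2022),
    Link("Ada", "Cy", "co-employment", 2015),
    Link("Dee", "Ada", "co-employment", 2021),
]
result = conflicts("Ada", {"Ben", "Cy", "Dee", "Eve"}, LINKS, current_year=2024)
print(result)  # {'Ben': ['co-author'], 'Dee': ['co-employment']}
```

Here Cy is not flagged because the shared employment falls outside the five-year window, and Eve has no link to the author at all.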

“We just couldn’t do this before,” explains Erik.

Reduction in Infrastructure Requirements with Neo4j

“We’ve run some tests that show we can handle up to 200,000 queries per minute in our production environment,” Erik explains. “We feel we can really put confidence into this system as we scale up.”

To the team’s surprise, however, the new technology behind the faster response times also reduced Elsevier’s costs and hardware footprint. With Neo4j’s solution in place, a single cluster of fewer than 10 nodes now replaces the 200 nodes required previously.

The new graph database also boosts search engine visibility for content on Elsevier’s ScienceDirect platform, the reading platform for Elsevier’s 2,700 journals, where researchers can access academic content and run full-text searches. Author profiles are easier to find on Google; Elsevier’s graph solution thus further improves the discovery of peers and collaborators, enables easier hiring and recruitment, and ultimately makes it easier for academics to get the grants they need. It also creates opportunities for researchers to work together to solve cross-disciplinary challenges in science.

Looking ahead, Erik sees more use cases for their improved search engine. “Recently, our team has become involved in a number of generative AI solutions which allow us to leverage our search investment,” he says.

“We want customers to be able to ask natural language queries so the model needs to understand the academic language context. That’s where we’re going next, adding a generative AI layer on top of our content.”

“But this is very much the tip of the iceberg. The possibilities are practically endless and Elsevier is extremely excited to find new ways to work with the solution we’ve developed with Neo4j.”

Get in touch

Curious about what insights you could unlock for your business with graph-powered solutions? Let’s talk – reach out, and we’ll get in touch.