Practice Nebula Graph on Boss Zhipin, a Chinese Recruitment Platform

Business Background

Boss Zhipin, a Chinese recruitment platform, relies on large-scale graph storage and graph mining in its security and risk control systems. It initially built its own high-availability (HA) Neo4j cluster to meet these needs. However, Neo4j did not work well for real-time analysis because it could not keep up with a daily increase of 1 billion relationships.

We first adopted Dgraph to meet our needs. After half a year of tricky workarounds and meetings with the Dgraph team, we finally made up our minds to migrate to Nebula Graph, a database that fits our scenarios better. This post won't cover benchmarks because there are plenty of them on the forum. Instead, we will share some of the technical criteria behind our selection, plus a comparison of the two databases, which I think you will find more interesting.

Data Migration From JanusGraph to Nebula Graph – Practice at 360 Finance

When it comes to graph data processing, we have experience with several graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database. I described the migration in detail in my article “Migrate tens of billions of graph data into JanusGraph (only in Chinese)”. As the data size and the number of business calls grew, a new problem appeared: each query took too long. In some business scenarios, a single query took up to 10 seconds, and as the data size increased, a more complicated single query needed two or three seconds. These problems seriously affected the performance of the entire business process and the development of related businesses.

JanusGraph's architecture makes a single query inherently time-consuming. The core reason is that it depends on an external storage backend, which JanusGraph cannot control well. Our production environment uses an HBase cluster, which means that not all queries can be pushed down to the storage layer for processing. Instead, the data has to be read from HBase into the JanusGraph Server's memory and filtered there.
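
To make the cost of missing pushdown concrete, here is a minimal Python sketch. It is a toy model only: `ToyStore`, `scan`, `scan_with_filter`, and the `weight` property are hypothetical names invented for illustration, and they do not correspond to the JanusGraph or HBase APIs. The sketch simply counts how many rows have to leave the storage layer when the filter runs in the query server's memory versus inside the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    weight: int


class ToyStore:
    """Hypothetical stand-in for an external storage layer (not a real HBase API)."""

    def __init__(self, edges):
        self._edges = list(edges)
        self.rows_transferred = 0  # rows shipped from storage to the query server

    def scan(self, src):
        """No pushdown: return every edge of `src`; the caller filters later."""
        rows = [e for e in self._edges if e.src == src]
        self.rows_transferred += len(rows)
        return rows

    def scan_with_filter(self, src, predicate):
        """Pushdown: the predicate is evaluated inside the storage layer."""
        rows = [e for e in self._edges if e.src == src and predicate(e)]
        self.rows_transferred += len(rows)
        return rows


def query_without_pushdown(store, src, min_weight):
    # All edges are pulled into the server's memory, then filtered there.
    return [e for e in store.scan(src) if e.weight >= min_weight]


def query_with_pushdown(store, src, min_weight):
    # Only matching edges ever leave the storage layer.
    return store.scan_with_filter(src, lambda e: e.weight >= min_weight)


def compare(n_edges=1000, min_weight=990):
    """Run both strategies on identical data; report result equality and rows moved."""
    edges = [Edge("a", f"v{i}", i) for i in range(n_edges)]
    plain, pushed = ToyStore(edges), ToyStore(edges)
    r1 = query_without_pushdown(plain, "a", min_weight)
    r2 = query_with_pushdown(pushed, "a", min_weight)
    return r1 == r2, plain.rows_transferred, pushed.rows_transferred
```

Both strategies return the same edges, but in this toy setup the first one moves every edge of the vertex across the storage boundary, while the second moves only the matches. For a vertex with many edges and a selective filter, that difference is the kind of overhead described above.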