Graphs

Graph Storage and Processing


We develop systems and algorithms for large scale graph processing and explore novel applications for graph-structured data.

Smooth Kronecker Generator

As there are a dearth of truly large, publicly available data sets, synthetic graph generators are critical for evaluating the scalability of graph processing systems. We introduced the Smooth Kronecker generator, which produces Kronecker graphs whose fundamental graph characteristics, such as degree distribution, better reflect real world data.

Papers
  1. [Smooth Kronecker: Solving the Combing Problem in Kronecker Graphs ]
Code
  1. [Smooth Kronecker Repository]
Scalable Graph Partitioning

Graph partitioning plays a pivotal role in various distributed graph processing applications, including graph analytics, graph neural network training, and distributed graph databases. A “good” graph partitioner reduces workload execution time, worker imbalance, and network overhead. Graphs that require distributed settings are often too large to fit in the main memory of a single machine. This challenge renders traditional in-memory graph partitioners infeasible, leading to the emergence of streaming solutions. Streaming partitioners produce lower-quality partitions, because they work from partial information and must make premature decisions before they have a complete view of a vertex’s neighborhood.

We introduce CUTTANA, a streaming graph partitioner that partitions massive graphs (Web/Twitter scale) with superior quality compared to existing streaming solutions. CUTTANA uses a novel buffering technique that prevents the premature assignment of vertices to partitions and a scalable coarsening and refinement technique that enables a complete graph view, improving the intermediate assignment made by a streaming partitioner. We implemented a parallel version for CUTTANA that offers nearly the same partitioning latency as existing streaming partitioners.

Papers
  1. [CUTTANA: Scalable Graph Partitioning for Faster Distributed Graph Databases and Analytics]
Code
  1. [CUTTANA Repository]
Hybrid Transactional and Analytical Processing (HTAP) Graph Databases

The rise in the amount of graph data that are being generated has prompted an increased interest in storing and processing these datasets. For over a decade, researchers have been developing myriad graph processing systems in an effort address new application demands and improve efficiency. Most graph-structured data persists in a database, but most whole-graph analysis takes place in special purposem, in-memory graph processing systems. This leads to a pipeline where analysis requires extracting data from a database, ingesting it into a graph processing system, and then running an analytical task. If the graph is larger than the memory of a single compute node, the pipeline typically adds a partitioning step, so that the graph can be distributed to a cluster, whose aggregate memory is large enough to hold the graph. This pipeline repeats whenever there is new data to analyze.

We now see databases catering to graph-structured data. Unfortunately, they do not offer performance near that of graph processing systems for whole-graph analytics. We present FlexoGraph, a best of both worlds approach; Flexograph is a single-node graph-processing layer built atop a robust, production quality key-value store that provides performance comparable to that of graph processing systems when the graph fits in memory and better than a graph database when the graph exceeds memory.

Code
  1. [FlexoGraph Repository]
On-Going Work
  1. Developing high-performance storage structures to facilitate both point queries and large-scale analytics.
  2. Building frameworks that allow such systems to adapt to variations in hardware platform, workload, and input data.
People
arrow_back Back

Systopia lab is supported by a number of government and industrial sources, including Cisco Systems, the Communications Security Establishment Canada, Intel Research, the National Sciences and Engineering Research Council of Canada (NSERC), Network Appliance, Office of the Privacy Commissioner of Canada, and the National Science Foundation (NSF).