Scaling Thousands of Concurrent Data Grid Rows with Cell-Based Virtualization in React

As web applications grow in complexity, the amount of data they need to manage also increases. This data is often presented in data grids, which allow users to view, edit, and manipulate rows of data. However, rendering thousands of rows in a data grid can cripple performance and lead to an unusable interface. In this article, we'll explore a technique called "cell-based virtualization" to smoothly handle tens of thousands of concurrent data grid rows in a React-based web application.

The Problem: Rendering Large Data Sets

A naive data grid implementation might load all the data and render all the rows and cells at once. This works fine for small data sets, but as the number of rows grows, the interface slows to a crawl.
Some major performance issues include:

The browser needing to lay out and render tens of thousands of DOM elements
Heavy memory usage to store all the row and cell data
Expensive data processing for features like sorting, filtering, and aggregation

For example, a grid with 50,000 rows and 5 columns would need to render 250,000 cell elements on top of the row markup, JavaScript memory overhead per record, event handlers, and more. This taxes the browser and leads to severe lag when scrolling and interacting.

Virtualization Basics

Virtualization techniques only render a small subset of rows that are currently visible, usually with some overflow on either side. As the user scrolls, rows are seamlessly rendered or removed as needed.
This helps performance by:

Reducing the number of DOM elements the browser must manage and redraw
Lowering memory usage by avoiding unused data
Decreasing the data processing load for grid features

However, traditional virtualization operates at the row level – entire rows are rendered or removed as a block. This still requires layout and rendering of hundreds or thousands of cells at a time.
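As a rough illustration of row-level virtualization, the sketch below computes which rows to render from the scroll position. It assumes every row has the same fixed height; the function name `getVisibleRowRange`, its parameters, and the default overscan of 5 rows are hypothetical and not taken from any particular grid library.

```javascript
// Minimal sketch, assuming fixed-height rows and a scroll position inside the content.
// All names and the default overscan value are illustrative assumptions.
function getVisibleRowRange({ scrollTop, viewportHeight, rowHeight, rowCount, overscan = 5 }) {
  if (rowCount === 0) return { first: 0, last: -1 };
  // Index of the first row that intersects the top edge of the viewport
  const firstVisible = Math.floor(scrollTop / rowHeight);
  // Index of the last row that intersects the bottom edge of the viewport
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  // Extend the window by the overscan buffer and clamp it to the data set
  const first = Math.max(0, firstVisible - overscan);
  const last = Math.min(rowCount - 1, lastVisible + overscan);
  return { first, last };
}

const range = getVisibleRowRange({
  scrollTop: 3000,
  viewportHeight: 600,
  rowHeight: 30,
  rowCount: 50000,
});
// range.first is 95 and range.last is 124: 30 rows are rendered instead of 50,000
```

Every row inside that range is still rendered in full, including cells that are scrolled out of view horizontally.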

Introducing Cell Virtualization

Cell virtualization takes things to the next level by only rendering the currently visible cells, rather than full rows. As the user scrolls, individual cells are rendered precisely where they need to be in the viewport.
For example, given a grid with 10,000 rows and 5 columns, traditional row virtualization might render 100 full rows, or 500 cells, whether or not every cell is in view. In contrast, cell virtualization could render just the 150 cells that are currently visible.
This further optimizes:

The number of DOM elements, by avoiding unused markup
Memory usage by skipping unused data
Scroll smoothness by reducing paint areas
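The comparison above comes down to simple arithmetic, sketched here. The helper name `countRenderedCells` and its parameters are hypothetical, and the split of the 150 visible cells into 50 rows by 3 columns is an assumption made purely for illustration; the article only gives the totals.

```javascript
// Number of cell elements in the DOM under each rendering strategy.
// bufferedRows: rows kept mounted by row virtualization (visible rows plus overscan).
// visibleRows / visibleCols: rows and columns actually inside the viewport (assumed values).
function countRenderedCells({ totalRows, totalCols, bufferedRows, visibleRows, visibleCols }) {
  return {
    naive: totalRows * totalCols,              // every cell of every row
    rowVirtualized: bufferedRows * totalCols,  // full rows only
    cellVirtualized: visibleRows * visibleCols // only cells in view
  };
}

const counts = countRenderedCells({
  totalRows: 10000,
  totalCols: 5,
  bufferedRows: 100,
  visibleRows: 50,
  visibleCols: 3,
});
// counts is { naive: 50000, rowVirtualized: 500, cellVirtualized: 150 }
```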

A cell-virtualized data grid also intelligently reuses existing cell elements as much as possible. As you scroll, most new cells can be efficiently swapped into place rather than created from scratch. This caching avoids expensive DOM placement and reflows.
Here is a simplified version of cell reuse logic in React:


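// Cache of cell React elements, keyed by "row:col"
const cellCache = new Map();

function renderCell(row, col, data) {
  const key = `${row}:${col}`;
  // See if we already have this cell
  const cached = cellCache.get(key);
  // Reuse the cached element only if it was built from the same data.
  // React elements are immutable, so they must not be mutated in place.
  if (cached && cached.props.data === data) {
    return cached;
  }
  // Render new cell. The original markup was lost in extraction;
  // Cell is a hypothetical component standing in for it.
  const cellElement = <Cell key={key} row={row} col={col} data={data} />;
  // Save to cache
  cellCache.set(key, cellElement);
  return cellElement;
}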

Reusing elements avoids recreating identical markup. This optimization becomes significant at scale, across tens of thousands of records.

Scaling to 100,000 Rows

As a real-world test case, we implemented a React data grid using cell-based virtualization with the following parameters:

100,000 rows
5 columns
1,000 pixel height
Dynamic data (filtering, sorting)

Even with this much raw data, cell virtualization provided a smooth 60 FPS scrolling experience. Memory usage remained reasonable for such a large dataset, and DOM elements were optimized by reusing cells.
Some numbers from Chrome DevTools:

1,000 visible rows – Only rows in viewport were rendered
5,000 visible cells – Individual cells rendered as needed
60 FPS – Consistently smooth scrolling
50 MB memory – Expected size without optimization
1.5 MB memory – Actual usage with cell virtualization

Compare this to a naive rendering approach, which would likely have crashed the browser!

Implementation Details

Here is a high-level outline of how cell-based virtualization can be implemented:

Determine visible row range

Listen to scroll events
Calculate first and last visible row index based on row heights

Calculate visible cell range

Determine horizontal position
Iterate through visible row range
Identify first and last visible cell in each row

Here is sample logic in React:


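// The get*Index helpers are assumed to derive indices from the scroll position
function updateVisibleCells() {
  const topRow = getTopVisibleRowIndex();
  const bottomRow = getBottomVisibleRowIndex();
  const visibleCells = [];
  for (let row = topRow; row <= bottomRow; row++) {
    // First and last visible cell (column index) within this row
    const firstCell = getTopVisibleCellIndex(row);
    const lastCell = getBottomVisibleCellIndex(row);
    for (let cell = firstCell; cell <= lastCell; cell++) {
      // Collect the cell so it can be rendered
      visibleCells.push({ row, cell });
    }
  }
  return visibleCells;
}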
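The sample above calls index helpers that the article does not define. One possible way to compute the same information is sketched below, assuming fixed-height rows and a known width for each column. Under those assumptions the visible column range is identical for every row, so it is computed once. The names `buildColumnOffsets` and `getVisibleCellRange` are hypothetical.

```javascript
// Hypothetical helper: cumulative left edge of each column.
// offsets[i] is the left edge of column i; the final entry is the total grid width.
function buildColumnOffsets(columnWidths) {
  const offsets = [0];
  for (const width of columnWidths) {
    offsets.push(offsets[offsets.length - 1] + width);
  }
  return offsets;
}

// Hypothetical helper: inclusive row and column index ranges that intersect the viewport.
function getVisibleCellRange({
  scrollTop, scrollLeft, viewportWidth, viewportHeight,
  rowHeight, rowCount, columnOffsets,
}) {
  const colCount = columnOffsets.length - 1;
  const topRow = Math.max(0, Math.floor(scrollTop / rowHeight));
  const bottomRow = Math.min(
    rowCount - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1
  );

  // Skip columns whose right edge is at or before the left edge of the viewport
  let firstCol = 0;
  while (firstCol < colCount - 1 && columnOffsets[firstCol + 1] <= scrollLeft) {
    firstCol++;
  }
  // Extend while the next column starts before the right edge of the viewport
  let lastCol = firstCol;
  while (lastCol < colCount - 1 && columnOffsets[lastCol + 1] < scrollLeft + viewportWidth) {
    lastCol++;
  }
  return { topRow, bottomRow, firstCol, lastCol };
}

const columnOffsets = buildColumnOffsets([100, 200, 150, 120, 180]);
const visible = getVisibleCellRange({
  scrollTop: 600, scrollLeft: 120, viewportWidth: 300, viewportHeight: 300,
  rowHeight: 30, rowCount: 100000, columnOffsets,
});
// visible is { topRow: 20, bottomRow: 29, firstCol: 1, lastCol: 2 }
```

The linear scan over columns is adequate for a handful of columns; a grid with hundreds of columns would more likely use a binary search over the offsets.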

Populate container

Reuse existing cell elements
Create missing cells
Position elements correctly
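The three steps above can be sketched as a diff between the cells currently mounted and the cells in the new window, plus an absolutely positioned style for each cell. This is a minimal illustration under the assumption of fixed-height rows and precomputed column offsets; `diffCells` and `cellStyle` are hypothetical names, and the `"row:col"` key format is an arbitrary choice.

```javascript
// Hypothetical helper: decide which cells to reuse, create, or remove.
// mountedKeys is a Set of "row:col" keys currently in the container;
// range holds inclusive row and column bounds of the new visible window.
function diffCells(mountedKeys, range) {
  const next = new Set();
  for (let row = range.topRow; row <= range.bottomRow; row++) {
    for (let col = range.firstCol; col <= range.lastCol; col++) {
      next.add(`${row}:${col}`);
    }
  }
  const reuse = [];
  const create = [];
  const remove = [];
  for (const key of next) {
    if (mountedKeys.has(key)) reuse.push(key);
    else create.push(key);
  }
  for (const key of mountedKeys) {
    if (!next.has(key)) remove.push(key);
  }
  return { reuse, create, remove };
}

// Hypothetical helper: absolute position and size of a cell inside the scroll container.
// columnOffsets[i] is the left edge of column i, with one extra entry for the total width.
function cellStyle(row, col, rowHeight, columnOffsets) {
  return {
    position: "absolute",
    top: row * rowHeight,
    left: columnOffsets[col],
    width: columnOffsets[col + 1] - columnOffsets[col],
    height: rowHeight,
  };
}

const mounted = new Set(["0:0", "0:1", "1:0", "1:1"]);
const plan = diffCells(mounted, { topRow: 1, bottomRow: 2, firstCol: 0, lastCol: 1 });
// plan.reuse is ["1:0", "1:1"], plan.create is ["2:0", "2:1"], plan.remove is ["0:0", "0:1"]
```

In React, the object returned by `cellStyle` could be passed as a cell's `style` prop, where unitless numbers for `top`, `left`, `width`, and `height` are treated as pixels.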

Smooth scrolling

Debounce scroll handler
Schedule updates with requestAnimationFrame
Throttle data calls
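One common way to keep scroll handling cheap is to coalesce scroll events so that the visible range is recalculated at most once per animation frame. The sketch below shows that per-frame throttling (it is not a debounce, which would wait for scrolling to pause). The name `createFrameThrottle` is hypothetical, and the scheduler is passed in as a parameter, an assumption made so the logic can also run outside a browser.

```javascript
// Coalesces many scroll events into at most one update per animation frame.
// `schedule` defaults to the browser's requestAnimationFrame and is injectable for testing.
function createFrameThrottle(onFrame, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = false;
  let latest = null;
  return function handleScroll(position) {
    latest = position;      // always remember the most recent scroll position
    if (pending) return;    // an update is already scheduled for the next frame
    pending = true;
    schedule(() => {
      pending = false;
      onFrame(latest);      // one update per frame, using the latest position
    });
  };
}

// Browser usage (container is assumed to be the grid's scrollable element):
// const onScroll = createFrameThrottle((pos) => updateVisibleCells(pos));
// container.addEventListener(
//   "scroll",
//   () => onScroll({ top: container.scrollTop, left: container.scrollLeft }),
//   { passive: true }
// );
```

Intermediate scroll positions are dropped on purpose: only the position at the time the frame runs affects what is rendered.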

With these virtualization practices in place, the grid stays responsive even with 100K concurrent records!

Next Steps

Cell-based data grid virtualization opens the door to managing large, real-time datasets in web UIs. Some ideas for taking things further:

Incremental loading – Fetch additional data as the user scrolls down
Remote data – Integrate with large cloud data sources
Column virtualization – Only render visible columns
Dynamic columns – Reordering, resizing, etc.
Immutable data – For snapshots, time-travel debugging, etc.

If you need to visualize or interact with huge numbers of records, give cell virtualization a try! Proper virtualization technique can make the difference between an unusable, laggy interface and a buttery-smooth user experience.

