From Bottleneck to Breakthrough: Engineering 8x Faster Data Processing
Large file processing can be a daunting task, often leading to bottlenecks in computation, memory, or I/O. This post outlines a structured approach to solving these challenges, moving from naive methods to an elegant, high-performance solution that leverages concurrent programming principles.

Problem Definition ⚠️

The challenge is to efficiently read, process, and write a large text file without running out of memory or being slowed by I/O bottlenecks. A naive approach - loading too much data at once and writing from multiple threads - risks exhausting available RAM and causing performance-killing lock contention. The goal is to design a solution that fully utilizes CPU cores while keeping memory usage predictable and stable.

Prerequisites and Assumptions 📋

  • The input is a massive text file (e.g., 8 GB) where each line can be processed independently.
  • The machine has enough RAM to hold working buffers (e.g., 12 GB in our test scenario).
  • A multi-core CPU is available (e.g., 6 cores), allowing parallel processing.
  • We'll use a high-level, concurrent programming approach - concepts shown here are inspired by languages like Rust, but the principles apply universally.

Solutions to the Bottleneck 🚀

1. Sequential Processing: The Naive Approach

The most basic method is to process the file line by line in a single thread. This approach is safe and memory-efficient but incredibly slow. The single thread spends a significant amount of time waiting for disk I/O operations, leaving most of the CPU idle. It's like having a single chef in a large kitchen, handling every step of a recipe from start to finish on their own.
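
As a point of reference, the sequential baseline can be sketched in a few lines of Rust. This is a minimal illustration, not the code behind the measurements in this post: `process_line` is a hypothetical stand-in for the real per-line work, and `process_sequential` is a name chosen here for the example.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

// Hypothetical stand-in for the real per-line work (an assumption).
fn process_line(line: &str) -> String {
    line.to_uppercase()
}

// Reads `input` line by line, transforms each line, and writes the result.
// Only one line is held in memory at a time, and everything runs on the
// calling thread. Returns the number of lines processed.
fn process_sequential<R: Read, W: Write>(input: R, output: W) -> io::Result<u64> {
    let reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);
    let mut count = 0u64;
    for line in reader.lines() {
        let line = line?;
        writeln!(writer, "{}", process_line(&line))?;
        count += 1;
    }
    writer.flush()?;
    Ok(count)
}
```

Passing a `std::fs::File` for both `input` and `output` gives the file-to-file version. Memory stays flat, but only one core ever does any work.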

2. Parallel Processing with Shared Locks

To utilize multiple CPU cores, we can process the file in batches, but this introduces a new bottleneck. As multiple threads (our chefs) try to write their results to a shared output file (the main pot), they have to take turns. The shared file is protected by a Mutex (mutual exclusion) lock.

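  • Initial Failure: Trying to load the entire 8 GB file into memory can cause an out-of-memory (OOM) crash. In-memory strings often require 2–3x the raw file size due to encoding and data structure overhead; at that ratio an 8 GB file needs roughly 16–24 GB, well beyond the 12 GB available in our test scenario.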
  • Refined Parallelism: To avoid memory issues, we can process the file in batches. A manageable portion of the file is read into memory, and then multiple threads process this data. However, the bottleneck shifts to the writing phase. If every thread tries to acquire the lock to write its processed data line by line, they spend more time waiting for the lock to become available than they do on actual work. This is known as lock contention.
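
The contended version can be sketched as follows, assuming Rust's standard `Mutex` and scoped threads. The names `read_batch`, `process_batch_locked`, and `process_parallel_locked`, and the `batch_size` and `workers` parameters, are hypothetical and chosen for this illustration. Note that lines from different threads interleave, so output order is not preserved.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the real per-line work (an assumption).
fn process_line(line: &str) -> String {
    line.to_uppercase()
}

// Reads up to `batch_size` lines; an empty Vec means end of input.
fn read_batch<R: BufRead>(reader: &mut R, batch_size: usize) -> io::Result<Vec<String>> {
    reader.lines().take(batch_size.max(1)).collect()
}

// Splits one in-memory batch across `workers` threads. Every thread writes
// through the same mutex-protected output, taking the lock once per line.
fn process_batch_locked<W: Write + Send>(
    batch: &[String],
    workers: usize,
    output: &Mutex<W>,
) -> io::Result<()> {
    let workers = workers.max(1);
    let per_worker = ((batch.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| -> io::Result<()> {
        let handles: Vec<_> = batch
            .chunks(per_worker)
            .map(|chunk| {
                scope.spawn(move || -> io::Result<()> {
                    for line in chunk {
                        let processed = process_line(line);
                        // Contention point: one lock acquisition per line.
                        let mut out = output.lock().unwrap();
                        writeln!(out, "{}", processed)?;
                    }
                    Ok(())
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap()?;
        }
        Ok(())
    })
}

// Drives the batches: read a batch, process it in parallel, repeat.
fn process_parallel_locked<R: Read, W: Write + Send>(
    input: R,
    output: W,
    batch_size: usize,
    workers: usize,
) -> io::Result<W> {
    let mut reader = BufReader::new(input);
    let output = Mutex::new(output);
    loop {
        let batch = read_batch(&mut reader, batch_size)?;
        if batch.is_empty() {
            break;
        }
        process_batch_locked(&batch, workers, &output)?;
    }
    Ok(output.into_inner().unwrap())
}
```

Memory is now bounded by the batch size, but every processed line pays for a lock acquisition, which is where the contention described above comes from.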

3. Dynamic Batching with Buffered Writes

This approach builds on the previous one by optimizing the writing phase. Each processing thread buffers its output in a local memory buffer. Once a thread finishes a "chunk" of data, it acquires the file lock just once to write its entire buffered output to the disk.

  • This significantly reduces lock contention because the lock is acquired once per chunk instead of once per line, so threads spend far less time queuing for it.
  • It's like having multiple chefs work on different parts of a recipe, but they only wait in line to use the main pot once to add their complete ingredients.
  • However, if multiple threads finish at the same time, their local buffers can accumulate in memory while they wait, potentially leading to another OOM crash.
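
A minimal sketch of the buffered variant, again in Rust with the standard library only. `process_batch_buffered` and its parameters are hypothetical names for this example. The only change from a per-line locking worker is that each thread fills a local `String` and takes the lock a single time.

```rust
use std::io::{self, Write};
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the real per-line work (an assumption).
fn process_line(line: &str) -> String {
    line.to_uppercase()
}

// Each worker processes its chunk into a thread-local buffer, then takes
// the output lock exactly once to write the whole buffer.
fn process_batch_buffered<W: Write + Send>(
    batch: &[String],
    workers: usize,
    output: &Mutex<W>,
) -> io::Result<()> {
    let workers = workers.max(1);
    let per_worker = ((batch.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| -> io::Result<()> {
        let handles: Vec<_> = batch
            .chunks(per_worker)
            .map(|chunk| {
                scope.spawn(move || -> io::Result<()> {
                    // No lock is held while processing.
                    let mut local = String::new();
                    for line in chunk {
                        local.push_str(&process_line(line));
                        local.push('\n');
                    }
                    // One lock acquisition per chunk.
                    let mut out = output.lock().unwrap();
                    out.write_all(local.as_bytes())
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap()?;
        }
        Ok(())
    })
}
```

Lines within a chunk stay together in the output, but chunks land in whatever order the threads win the lock. Every waiting thread also keeps its full `local` buffer alive until it gets the lock, which is the memory risk noted above.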

Figure (chunk.png): Batch-chunk parallelism with buffered writes

4. The MPSC Channel Architecture: Decoupling I/O 🧠

The most elegant and efficient solution is to completely decouple the processing from the I/O. This is achieved using a Multi-Producer, Single-Consumer (MPSC) channel, which lets many parts of a program (the producers) send messages to one dedicated part (the consumer) that receives and processes them.

  • Dedicated Writer Thread: A single, dedicated thread is spawned whose only job is to handle writing to the output file. Because it's the sole entity that writes to the file, the file itself needs no lock, which eliminates contention on the output.
  • Processing Threads as Producers: The processing threads, acting as producers, no longer write to the file. Instead, they process their chunks of data and then send their completed output buffers through the MPSC channel to the writer thread.
  • Controlled Flow with Backpressure: A key feature of this approach is using a bounded channel. This means the channel has a limited capacity (e.g., 100 chunks). If the writer thread falls behind and the channel becomes full, the processing threads will automatically pause until space becomes available. This backpressure mechanism prevents memory from accumulating uncontrollably and ensures the entire pipeline operates with predictable memory usage.
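
The three pieces above can be sketched with Rust's standard `std::sync::mpsc::sync_channel`, which creates a bounded channel whose `send` blocks while the channel is full. This is an illustrative sketch under stated assumptions, not the code behind the numbers in this post: `process_mpsc` and its `batch_size`, `workers`, and `capacity` parameters are hypothetical names, and for simplicity the reader waits for one batch's producers to finish before reading the next batch.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-in for the real per-line work (an assumption).
fn process_line(line: &str) -> String {
    line.to_uppercase()
}

fn process_mpsc<R: Read, W: Write + Send>(
    input: R,
    output: &mut W,
    batch_size: usize,
    workers: usize,
    capacity: usize,
) -> io::Result<()> {
    let workers = workers.max(1);
    let mut reader = BufReader::new(input);
    // Bounded channel: at most `capacity` finished buffers can be queued.
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);

    thread::scope(|scope| -> io::Result<()> {
        // Single consumer: the only thread that touches the output.
        let writer = scope.spawn(move || -> io::Result<()> {
            let mut out = BufWriter::new(output);
            for buf in rx {
                out.write_all(&buf)?;
            }
            out.flush()
        });

        loop {
            let batch: Vec<String> = reader
                .by_ref()
                .lines()
                .take(batch_size.max(1))
                .collect::<io::Result<_>>()?;
            if batch.is_empty() {
                break;
            }
            let per_worker = ((batch.len() + workers - 1) / workers).max(1);
            // Producers: process a chunk, then hand the buffer to the writer.
            thread::scope(|producers| {
                for chunk in batch.chunks(per_worker) {
                    let tx = tx.clone();
                    producers.spawn(move || {
                        let mut local = Vec::new();
                        for line in chunk {
                            local.extend_from_slice(process_line(line).as_bytes());
                            local.push(b'\n');
                        }
                        // Blocks while the channel is full (backpressure).
                        // A send error means the writer has stopped; its
                        // own error is reported when it is joined below.
                        let _ = tx.send(local);
                    });
                }
            });
        }
        // Closing the last sender lets the writer's loop finish.
        drop(tx);
        writer.join().unwrap()
    })
}
```

In this sketch, peak memory is roughly one input batch plus at most `capacity + workers` output buffers, since a producer cannot queue more work once the channel is full. Chunks still arrive in completion order; if the output must match the input order, one option is to send an index with each buffer and have the writer reorder.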

Figure (mpsc.png): MPSC channel for decoupled I/O

This architecture is like a highly efficient kitchen: chefs (processing threads) place their finished dishes on a conveyor belt (the MPSC channel), and a dedicated manager (the writer thread) handles all the storage, creating a smooth and continuous flow.

Performance Comparison Summary 📊

The table below summarizes the performance metrics for each approach, highlighting the trade-offs between speed, memory usage, and CPU utilization.

Table (summary-table.png): Performance comparison of the approaches

Conclusion 🎉

The journey from a slow, crash-prone script to a highly optimized pipeline demonstrates the power of a well-designed concurrent architecture. By strategically addressing each bottleneck - first memory, then CPU utilization, and finally I/O contention - we can achieve extraordinary performance gains. The Batch + MPSC approach is the clear winner, reducing processing time from over 32 minutes to just under 4 minutes. This robust pattern is a foundational tool for any developer tackling large-scale data processing challenges.

© Copyright 2025 Surya Software Systems Pvt. Ltd. All Rights Reserved