Node.js · Batch Processing · MySQL · Performance · Backend

Batch Processing That Cut Memory Use by 16x

Loading 30K CSV rows in one pass consumed over 2 GB per upload, and two concurrent uploads crashed the server. Processing in chunks of 3,000 with per-batch transactions dropped memory to about 128 MB and brought out-of-memory errors to zero.

Sep 10, 2025 · 7 min read

The upload endpoint received CSV files with 15K to 30K rows. The first implementation loaded the entire file into memory, parsed every row, and ran the operations in one pass.

At 30K records, that approach consumed over 2GB of memory per upload. With concurrent uploads from two or three brands, the server ran out of memory and crashed. Operations teams lost data. Files had to be re-uploaded.

Problem

Loading 30K records into a JavaScript array at once means keeping all 30K objects in memory simultaneously. Each record has multiple fields. Add validation state, deduplication maps, and intermediate processing objects, and you are at 2GB or more per upload.

The real issue was concurrency. A single 30K upload was borderline tolerable. Two concurrent uploads hit 4GB and the server fell over. Three concurrent uploads happened regularly during month-end processing.

There was also a correctness problem. The original code ran all inserts in one pass without transaction boundaries. A crash halfway through left partial data in the database. Reconciling that manually took hours.

Figure: Memory spike during uncontrolled full-file upload

Solution

The fix was chunked processing with a batch size of 3,000 records.

Instead of loading the entire file, the pipeline reads in groups of 3000, processes each group within a transaction, and moves to the next. The maximum memory footprint at any point is roughly 3000 records plus the operation queues for that batch.
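The grouping step can be sketched as a small generator. This is a minimal illustration, not the production code: `inBatches` and `fakeRows` are hypothetical names, and the row shape is made up. The point is that the consumer only ever holds one batch at a time.

```javascript
// Hypothetical helper: groups any iterable of rows into arrays of at most `size`.
// Only the current batch is held here; earlier batches can be garbage collected
// once the consumer drops its reference to them.
function* inBatches(rows, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  let batch = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === size) {
      yield batch;
      batch = []; // start a fresh array instead of reusing the yielded one
    }
  }
  if (batch.length > 0) yield batch; // final partial batch
}

// Lazy stand-in for a parsed CSV source (assumption: rows arrive one at a time).
function* fakeRows(count) {
  for (let i = 1; i <= count; i++) yield { id: i, amount: i * 10 };
}

const BATCH_SIZE = 3000;
const batchSizes = [];
for (const batch of inBatches(fakeRows(7500), BATCH_SIZE)) {
  batchSizes.push(batch.length); // a real pipeline would validate and write here
}
console.log(batchSizes); // [ 3000, 3000, 1500 ]
```

For a file on disk, the same shape should work as an async generator consumed with `for await`, fed by a streaming CSV parser. Splitting on newlines alone would break on quoted fields that contain line breaks.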

For lookups within a batch I used a JavaScript Map instead of array operations such as find, which scan the array on every call. Building a Map keyed by existing record ID before the loop makes each lookup O(1) on average. At 3000 records per batch, this keeps per-batch processing time predictable.
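As a rough illustration of the lookup pattern: with an array, checking each of 3,000 incoming rows against 3,000 existing records can take up to 9 million comparisons per batch, while a Map needs one pass to build and one lookup per row. The function name `planBatch`, the `id` and `amount` fields, and the insert/update split are assumptions for the sake of the example.

```javascript
// Hypothetical shapes: existing rows come from the database,
// incoming rows come from the current CSV batch.
function planBatch(existingRecords, incomingRows) {
  // Build the index once: O(n) up front, then O(1) average per lookup.
  const existingById = new Map(existingRecords.map((rec) => [rec.id, rec]));

  const toInsert = [];
  const toUpdate = [];
  for (const row of incomingRows) {
    const current = existingById.get(row.id);
    if (current === undefined) {
      toInsert.push(row); // no existing record with this id
    } else if (current.amount !== row.amount) {
      toUpdate.push(row); // existing record whose value changed
    }
    // unchanged rows are skipped
  }
  return { toInsert, toUpdate };
}

const existing = [{ id: 1, amount: 100 }, { id: 2, amount: 200 }];
const incoming = [
  { id: 1, amount: 100 }, // unchanged
  { id: 2, amount: 250 }, // changed
  { id: 3, amount: 300 }, // new
];
const plan = planBatch(existing, incoming);
console.log(plan.toInsert.map((r) => r.id)); // [ 3 ]
console.log(plan.toUpdate.map((r) => r.id)); // [ 2 ]
```

One thing to watch: Map keys are compared without type coercion, so an ID parsed from CSV as the string '1' will not match a numeric 1 from the database. Normalizing IDs to one type before building the Map avoids silent misses.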

The transaction boundary wraps each logical upload as one unit. If anything fails after the upload starts, the entire operation rolls back. No partial data. Operations teams can safely re-upload without a cleanup step.
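The rollback behavior can be sketched with a small wrapper. This assumes a connection object that exposes promise-returning `beginTransaction()`, `commit()`, `rollback()`, and `query()` methods, as a `mysql2/promise` connection does. `withTransaction`, `uploadAll`, the fake connection, and the `records` table are all hypothetical and exist only so the example runs without a database.

```javascript
// Hypothetical helper: runs `work` inside a transaction on `conn`.
async function withTransaction(conn, work) {
  await conn.beginTransaction();
  try {
    const result = await work(conn);
    await conn.commit();
    return result;
  } catch (err) {
    await conn.rollback(); // undo everything written since beginTransaction()
    throw err; // let the caller report the failed upload
  }
}

// In-memory stand-in for a connection so the sketch runs without a database.
function createFakeConnection() {
  const committed = [];
  let pending = [];
  return {
    committed,
    async beginTransaction() { pending = []; },
    async query(_sql, params) { pending.push(...params[0]); },
    async commit() { committed.push(...pending); pending = []; },
    async rollback() { pending = []; },
  };
}

// Writes every batch through the same transaction (table and columns are made up).
async function uploadAll(conn, batches) {
  return withTransaction(conn, async (tx) => {
    for (const batch of batches) {
      if (batch.some((row) => row.amount < 0)) throw new Error('invalid row');
      await tx.query('INSERT INTO records (id, amount) VALUES ?', [
        batch.map((row) => [row.id, row.amount]),
      ]);
    }
  });
}

async function demo() {
  const conn = createFakeConnection();
  await uploadAll(conn, [[{ id: 1, amount: 10 }], [{ id: 2, amount: 20 }]]);
  const afterGood = conn.committed.length;

  // The second batch is invalid, so the first batch must not be committed either.
  const failed = await uploadAll(conn, [[{ id: 3, amount: 30 }], [{ id: 4, amount: -1 }]])
    .then(() => false, () => true);
  return { afterGood, failed, afterBad: conn.committed.length };
}

demo().then((r) => console.log(r)); // { afterGood: 2, failed: true, afterBad: 2 }
```

The helper is agnostic about scope. Passing the whole batch loop as `work` gives all-or-nothing behavior for the upload, at the cost of a longer-lived transaction. Passing a single batch keeps each transaction short, but batches committed before a failure stay in the database.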

Figure: Chunked batch processing with per-batch transaction boundaries

Result

Memory per upload dropped from over 2GB to about 128MB. That is a 16x reduction.

Concurrent uploads from multiple brands are now stable. Three simultaneous uploads that used to crash the server now complete without incident.
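The arithmetic behind those figures, worked through as a quick check. It uses only the numbers quoted in this post and assumes binary units (1 GB = 1024 MB). Since the original footprint was "over 2GB", the results are lower bounds.

```javascript
// All values in MB. Binary units are an assumption; with decimal units
// (2000 MB) the ratio would come out slightly under 16.
const GB = 1024;

const fullFilePerUpload = 2 * GB; // "over 2GB", so a lower bound
const batchedPerUpload = 128;     // "about 128MB"
const concurrentUploads = 3;      // the month-end case

const reduction = fullFilePerUpload / batchedPerUpload;
const fullFilePeak = concurrentUploads * fullFilePerUpload;
const batchedPeak = concurrentUploads * batchedPerUpload;

console.log(reduction);    // 16
console.log(fullFilePeak); // 6144 (MB, about 6 GB)
console.log(batchedPeak);  // 384 (MB)
```

Three batched uploads together need well under a quarter of what a single full-file upload used to consume.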

Out-of-memory errors went to zero after the batching change.

The transaction boundary also eliminated the partial-data problem. Either an upload completes fully or it does not affect the database at all.

Eduardo Manlangit Jr.

@wardvisual · 🇵🇭 Dasmarinas City, Cavite PH

Full-stack engineer. Business systems, database optimization, and operations software.