JavaScriptPerformanceData StructuresNode.jsOptimization

Replacing Array.find() with Map for O(1) Lookups in Data Pipelines

Array.find() inside a loop that runs 20K times means up to 400 million comparisons — O(n²). The insert queries were fast; the JavaScript between them was the bottleneck. Switching to JavaScript Map for O(1) lookups made the CPU cost negligible.

Apr 5, 20265 min read

Inside a loop that runs 20K times, the difference between O(1) and O(n) lookups is not theoretical. It is seconds of actual runtime that users wait for.

The deduplication and classification steps of the upload pipeline had this problem. I replaced array operations with JavaScript Map objects and the processing bottleneck went away.

Problem

The original deduplication code used Array.prototype.find() to check if an item already existed in the processed list before adding it.

array.find() is O(n). It scans from the first element until it finds a match or reaches the end. For the first record, it checks 0 existing items. For the 10000th record, it checks up to 9999 items. The average across 20K records is checking 10000 items per lookup.

20K lookups at O(n) where n grows with each iteration is O(n2). At 20K records that is up to 400 million comparisons just for deduplication.

The same pattern appeared in the classification step, where the code checked whether an incoming record already existed in the database by scanning a fetched array.

I noticed this when profiling the upload endpoint. The insert queries were fast. The JavaScript processing between queries was slow. CPU-side bottleneck, not I/O-side.

Profiler output showing slow array.find loop at 20K records

Solution

Replace the arrays with Map objects before the loop. Use the natural key as the Map key.

For deduplication: build a Map keyed by merchandise_id before the upload loop starts. When a new record comes in, Map.get() tells you in O(1) whether it is already in the processed set. If it is, merge the quantities. If not, set the new entry.

For classification: after batch-fetching existing database records, build a Map keyed by the composite key of parent_id and merchandise_id. For each incoming record, Map.get() returns the existing row if it exists. O(1) per lookup regardless of how many existing records there are.

The Map setup is O(n). The lookups are O(1). Total complexity for n records is O(n) instead of O(n2).

Result

At 20K records, the deduplication step went from several seconds of CPU time to negligible.

The classification step dropped similarly.

End-to-end upload time improvements came from a combination of batched SQL and O(1) lookups. The SQL batching reduced database round trips. The Map lookups removed the JavaScript processing bottleneck.

If you have a loop that calls array.find() or array.includes() inside it and the array grows with each iteration, replace it with a Map. It is one of the highest-leverage changes you can make in data processing code.

Found this useful?

Share it with someone who'd appreciate it.

https://wardvisual.com/blogs/javascript-map-o1-lookup-performance

Eduardo Manlangit Jr.

@wardvisual · 🇵🇭 Dasmarinas City, Cavite PH

Full-stack engineer. Business systems, database optimization, and operations software.