How does malloc work?

malloc is a API call implemented by the operating system when we want to receive a block of memory to put data from the heap. The heap is the general pool of memory that programs can pull memory from and is different from the stack. It’s worth reading about the heap and stack if you haven’t come across these concepts.

Typically malloc could be implemented using a linked list of free memory blocks.

So when a program requests a block of say 16 bytes, then the operating system has to go and walk the list and find a block of memory of that size or chop it off a larger block.

One of the things that can happen that hurts performance is “memory fragmentation” - everything gets fragmented into small awkward sized chunks which makes it difficult to find the block size required.

The operating system might in this situation need to defragment the memory by collating smaller pieces into larger chunks etc. - it can all get very slow.

What are things that a program could do which could make it hard for the operating system to manage the heap?

Well one example is imagine you are appending a bunch of small strings into a much larger string - say a million.

The operating system is then being asked to de-allocate and re-allocate a large string which keeps on getting larger. This can things run like molasses as operating system struggles to find continuous blocks as size progressively increases.

This is why in languages like Lua and Javascript which have immutable strings the trick of appending many small strings into an array and then joining them at the end can be a big performance enhancement.

It’s also why C++ libraries for strings and vectors typically prefer to grow their size using a “geometric” function like 200% rather than incrementally increasing their size.