Heap32Next is a Windows kernel function used by OpenSSL's rand_win to generate random data via traversing a heap list.
You'd expect such a function to have O(1) performance, wouldn't you? Nope - not on Win7, at least. The performance of Heap32Next on any heap entry in heap list is dependent upon the summed number of heap entries in all heap lists.
Here is time taken plotted against the number of heap objects:
Here's one relevant stack trace:
ntdll.dll!_RtlpWalkHeap@12() + 0x3f bytes
ntdll.dll!_RtlpQueryExtendedInformationHeap@16() + 0x4f5 bytes
ntdll.dll!_RtlpQueryExtendedInformationAllHeaps@12() + 0xe5 bytes
ntdll.dll!_RtlpQueryExtendedHeapInformation@12() + 0xe7 bytes
ntdll.dll!_RtlQueryHeapInformation@20() + 0x1bc76 bytes
ntdll.dll!_RtlQueryProcessHeapInformation@4() + 0x288 bytes
ntdll.dll!_RtlQueryProcessDebugInformation@12() + 0x11492 bytes
kernel32.dll!_Heap32Next@4() + 0x4d bytes
randwin.exe!RAND_poll() Line 231 + 0x9 bytes C++
randwin.exe is my test harness, and sampling indicates that the vast majority of time is spent in RtlpWalkHeap.
Edit: this bug has been filed with Microsoft and OpenSSL.