Even so, a million pairs of numbers shouldn't use more than ~32MB, so you may be creating unnecessary variables and/or need to scope things so the GC knows when it can throw them away. (I just checked and the image you posted only has 0.61 million pixels in total, including the blacks, so something isn't right.)
Actually, the dots in your image are ~7px in size, and the gaps are bigger - you can resize down to ~100px wide without merging them, which means you can fit coords into a byte or two (so 1-2MB of memory for a million), and at the same time simplify the threshold/contour/filter process.
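To put rough numbers on that, here's a minimal sketch using Python's built-in `array` module. It assumes the resized image is no more than 256px on either side, so every coordinate fits in an unsigned byte; the names `PAIRS`, `xs` and `ys` are just for illustration, not anything from your code.

```python
from array import array

PAIRS = 1_000_000  # hypothetical number of (x, y) pairs

# "B" = unsigned byte (0-255); only valid if the resized image is
# at most 256px wide and tall (an assumption, check your own sizes).
xs = array("B", bytes(PAIRS))
ys = array("B", bytes(PAIRS))

# Raw payload size: one byte per coordinate, two per pair.
packed_bytes = len(xs) * xs.itemsize + len(ys) * ys.itemsize
print(f"{packed_bytes / 1_000_000:.1f} MB for {PAIRS:,} pairs")
```

A list of tuples of Python ints would be far larger than this, because every int and tuple is a separate object with its own overhead.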
For the angle/scaling issue, you can use the axes - the two ends of the horizontal axis should have the same y, and the two ends of the vertical axis the same x - if they don't, the image needs rotating. Without axes... well, assuming the squares are supposed to be square, you can probably still get OpenCV to detect rotation/skewing and do the appropriate calculations/transformation.
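As a rough sketch of the axis check, assuming you've already found the two end points of the horizontal axis as (x, y) pixel coordinates (the function name and the sample points here are made up for illustration):

```python
import math

def axis_rotation_degrees(start, end):
    """Angle in degrees between the line start->end and the image's x direction.

    0 means the axis is level. Image coordinates have y pointing down,
    so a positive result means the right-hand end sits lower on screen.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical end points of a slightly tilted horizontal axis.
angle = axis_rotation_degrees((10, 200), (410, 207))
print(f"axis is off by {angle:.2f} degrees")
```

Whatever angle comes out is the amount to rotate by (in the opposite direction) before measuring anything else; the same idea works for the vertical axis with x and y swapped.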
Of course, any time you find yourself nesting loops you should take a step back and check what you're doing, especially if it's more than two levels deep and iterating over the same data - there'll almost certainly be a different approach worth considering. I'm pretty sure detecting corner/outermost points doesn't require any nesting.
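For example, here's one way to get the outermost points without any nested loops. It's only a sketch, assuming the points are (x, y) tuples; `outermost_points` is a hypothetical helper, not something from OpenCV.

```python
def outermost_points(points):
    """Return the left-, right-, top- and bottom-most points.

    Each min/max call is a single flat pass over the data, so the
    total work grows linearly with the number of points.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    return {
        "left": min(points, key=lambda p: p[0]),
        "right": max(points, key=lambda p: p[0]),
        "top": min(points, key=lambda p: p[1]),     # smallest y (image y points down)
        "bottom": max(points, key=lambda p: p[1]),  # largest y
    }

# Hypothetical dot centres.
extremes = outermost_points([(5, 40), (90, 42), (50, 3), (48, 95), (50, 50)])
print(extremes)
```

Where several points tie for an extreme, this just returns the first one it meets, which may or may not be what you want for corners.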
It can't be used to analyse C/C++, which isn't surprising since it'll be a separate process and different memory structure.
Might be able to use tools from NirSoft or SysInternals to do that, if necessary, but probably not to the same degree of detail/interactivity.