Can you post an example of the starting point and the goal, and what you've got so far?
My (possibly naive) approach would be to extract the coordinates of all dot positions from the template and work out their ratios relative to the centre point, e.g. the top two dots are 1 unit up and 0.836 to either side, the bottom outer dots are 1.62 and 1.11 units away from the centre, and so on.
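Something like this is what I have in mind (Python/OpenCV 4.x) - a rough sketch only, where the template file name, the centre coordinates and the choice of "unit" are all placeholders you'd swap for your own:

```python
import cv2
import numpy as np

# Hypothetical inputs - replace with your actual template and centre point
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
cx, cy = 320.0, 240.0  # assumed centre of the template, in pixels

# Dark dots on a light background -> invert so the dots become white blobs
_, binary = cv2.threshold(template, 0, 255,
                          cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Centroid of each dot, expressed as an offset from the centre point
offsets = []
for c in contours:
    m = cv2.moments(c)
    if m["m00"] == 0:
        continue
    offsets.append((m["m10"] / m["m00"] - cx, m["m01"] / m["m00"] - cy))
offsets = np.array(offsets)

# Pick a "unit" (here: the vertical distance to the topmost dot, matching the
# "top two dots are 1 unit up" example) and express everything as ratios of it
unit = abs(offsets[:, 1].min())
ratios = offsets / unit
print(ratios)
```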
I'm assuming the axes are consistently present, so you can do edge detection and/or a separate template to detect where the centre point is - otherwise I'd probably start by looking in the cleanest corner (bottom left) and trying each dot found until enough of them match to identify where the centre would be.
Once you've got an anchor point (the centre) and have identified enough dots that match the template ratios, you can calculate how many pixels are in a unit and resize appropriately.
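As a sketch of that last step (the function and argument names are made up - it assumes you already have the matched dot centroids and their template ratios as parallel arrays of centre-relative offsets):

```python
import cv2
import numpy as np

def rescale_to_template(img, matched_px, matched_ratios, target_unit_px):
    """Resize img so that one template 'unit' maps to target_unit_px pixels.

    matched_px     - Nx2 array of dot centroids found in img, offsets from centre (px)
    matched_ratios - Nx2 array of the corresponding template ratios (units)
    The two arrays are assumed to be in the same order.
    """
    px_dist = np.linalg.norm(matched_px, axis=1)
    unit_dist = np.linalg.norm(matched_ratios, axis=1)
    px_per_unit = np.median(px_dist / unit_dist)   # median: robust to the odd bad match

    scale = target_unit_px / px_per_unit
    new_size = (int(round(img.shape[1] * scale)),
                int(round(img.shape[0] * scale)))
    return cv2.resize(img, new_size, interpolation=cv2.INTER_AREA)
```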
Dunno if that's the best approach, and it's also assuming OpenCV can actually extract the coords of dots (it can - via thresholding plus contours, or its built-in blob detector), but seems reasonable enough?
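A minimal sketch of the blob-detector route, with a hypothetical file name and area limits that are pure guesses you'd tune to your actual dot size:

```python
import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

params = cv2.SimpleBlobDetector_Params()
params.filterByColor = True
params.blobColor = 0          # dark dots on a light background
params.filterByArea = True
params.minArea = 10           # rough guesses for ~7px dots - tune these
params.maxArea = 100

detector = cv2.SimpleBlobDetector_create(params)
keypoints = detector.detect(img)

coords = [kp.pt for kp in keypoints]   # list of (x, y) dot centres
print(len(coords), coords[:5])
```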
Even so, a million pairs of numbers shouldn't use more than ~32MB, so you may be creating unnecessary variables and/or need to scope things so the GC knows when it can throw them away. (I just checked, and the image you posted only has about 0.61 million pixels in total, including the black ones, so something isn't right.)
Actually, the dots in your image are ~7px across and the gaps between them are bigger - you can resize down to ~100px wide without the dots merging, at which point each coordinate fits in a byte or two (so 1-2MB of memory for a million of them), and the threshold/contour/filter process gets simpler at the same time.
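Roughly like this (again only a sketch - the file name is made up, and uint8 only works while both dimensions stay under 256):

```python
import cv2
import numpy as np

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Shrink so the whole width fits in ~100px; INTER_AREA averages, so isolated
# ~7px dots survive as darker pixels instead of vanishing
scale = 100.0 / img.shape[1]
small = cv2.resize(img, (100, int(round(img.shape[0] * scale))),
                   interpolation=cv2.INTER_AREA)

# Threshold and take the coordinates of every dark pixel directly - at this
# size each dot is roughly one pixel, so no contour step is needed
_, binary = cv2.threshold(small, 0, 255,
                          cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
ys, xs = np.nonzero(binary)

# uint8 is enough while width and height are both < 256; use uint16 otherwise
coords = np.column_stack((xs, ys)).astype(np.uint8)
print(coords.shape, coords.nbytes, "bytes")
```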
For the angle/scaling issue, you can use the axes - the two ends of the horizontal axis should share the same y, and the ends of the vertical axis the same x - if they don't, the image needs rotating. Without axes... well, assuming the squares are supposed to be square, you can probably still get OpenCV to detect the rotation/skew and apply the appropriate transformation.
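For the axis-based correction, something along these lines - it assumes you've already located the two endpoints of the horizontal axis by whatever means (longest contour, Hough lines, ...), so those arguments are illustrative:

```python
import math
import cv2

def deskew_using_axis(img, left_end, right_end):
    """Rotate img so the axis running from left_end to right_end is horizontal.

    left_end / right_end are (x, y) endpoints of the horizontal axis -
    however you found them; the names are just for illustration.
    """
    dx = right_end[0] - left_end[0]
    dy = right_end[1] - left_end[1]
    angle = math.degrees(math.atan2(dy, dx))   # 0 if both ends share the same y

    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderValue=255)     # assumes a light background
```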
Of course, any time you find yourself nesting loops you should take a step back and check what you're doing, especially if it's more than two levels deep over the same data - there will almost certainly be a different approach worth considering. Detecting corner/outermost points, for instance, doesn't require any nesting.
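For example, once the dot centres are in an Nx2 array, the outermost ones fall out of single argmin/argmax calls - no loops at all:

```python
import numpy as np

# coords: Nx2 array of (x, y) dot centres - e.g. from the blob-detection sketch
coords = np.array([(12, 80), (95, 78), (50, 5), (48, 92)], dtype=np.int32)

leftmost   = coords[coords[:, 0].argmin()]
rightmost  = coords[coords[:, 0].argmax()]
topmost    = coords[coords[:, 1].argmin()]   # smallest y = highest in the image
bottommost = coords[coords[:, 1].argmax()]

print(leftmost, rightmost, topmost, bottommost)
```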