LDraw.org Discussion Forums

Full Version: Memory issues in C++
I was reading about the LDView bugs and spotted this statement.

Travis Cobbs Wrote:Also, no matter how much memory you have, LDView is a 32-bit app, so it will crash if it ever needs more than 2GB (or perhaps 3GB, but I don't think so).

I've been wondering why some code I'd written kept crashing and throwing std::bad_alloc and it's very likely to be this.

So... do any of the C++ gurus around here know how to bypass the problem in general? I've got some numerical code that needs to store and manipulate massive matrices and to run on 32bit systems as well as 64bit.

Tim

PS. Helping the forum live up to its C++ query predictions ;-)
First of all, if you haven't already, do a Google search for "out of core solver". That's the name of the class of general solutions to your problem.
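As a toy illustration of the out-of-core idea (not any particular solver): keep the matrix on disk and stream it through a fixed-size block buffer, so RAM usage is bounded by the block size rather than the matrix size. The file layout and function names here are invented for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Helper: build a test matrix on disk one row at a time, so the
// full matrix is never resident in memory.
void write_matrix(const char* path, std::size_t n, double value) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    std::vector<double> row(n, value);
    for (std::size_t i = 0; i < n; ++i)
        std::fwrite(row.data(), sizeof(double), n, f);
    std::fclose(f);
}

// Toy out-of-core pass over an n x n matrix of doubles stored
// row-major on disk: only block_rows rows are in RAM at any moment.
double out_of_core_sum(const char* path, std::size_t n, std::size_t block_rows) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0.0;
    std::vector<double> block(block_rows * n);  // the only large buffer
    double total = 0.0;
    for (std::size_t row = 0; row < n; row += block_rows) {
        std::size_t rows = std::min(block_rows, n - row);
        std::size_t got = std::fread(block.data(), sizeof(double), rows * n, f);
        total = std::accumulate(block.begin(),
                                block.begin() + static_cast<long>(got), total);
    }
    std::fclose(f);
    return total;
}
```

A real out-of-core solver does the same thing with factorisation passes instead of a simple sum, but the principle is identical: disk holds the data, RAM holds one working block.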

Having said that, you may be able to give yourself a little breathing room if you're only a little bit too big. On Windows, there is a setting you can apply to your app that bumps the maximum from 2GB up to 3GB, as long as the OS itself is configured to allow for that. In Microsoft VC++, the linker option is /LARGEADDRESSAWARE, found in the System section of the Linker project options in the IDE. My understanding is that an app compiled with this option gets 3GB max when run on a 32-bit system and 4GB max when run on a 64-bit system; without this option, 2GB is the max.

Note that there is another problem with 32-bit apps, which is that they also have a 32-bit address space. This means that memory fragmentation inside the app can also have a big impact, because in order to allocate a single array that is 250MB in size, your app needs 250MB of contiguous space inside its address space. If you perform a lot of allocations and deallocations of big things, you'll quickly get to a position where, even though the app isn't using anywhere close to 2GB (or 3GB if the option is set), a large memory allocation will still fail. You can mitigate this problem to some degree by allocating your large arrays up front and never deallocating them, reusing them instead.
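The last suggestion above might look like this minimal sketch: one large buffer grabbed once at startup and handed out repeatedly, so the big contiguous allocation happens before the address space gets fragmented. The `ScratchBuffer` name is invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// One big allocation made up front and reused for every solve, so the
// large contiguous block is claimed before fragmentation sets in.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t max_elems) : buf_(max_elems) {}

    // Hand out a view of the front of the buffer; no new allocation
    // happens here, so this can't fail in a fragmented address space.
    double* acquire(std::size_t elems) {
        return elems <= buf_.size() ? buf_.data() : nullptr;
    }

    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<double> buf_;  // never shrunk, never freed until exit
};
```

Sized for the worst case once, this trades a little wasted RAM for immunity to fragmentation-induced `std::bad_alloc`.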
If you really need more than 2GB (most software doesn't, but programmers seem to get lazier and lazier these days :-) ) ...

The obvious solution would be to develop for 64-bit OSes only.

An alternative would be using some kind of swap file, although this would need decent management classes and could be extremely slow if done inefficiently (I'm having EMS flashbacks here :-) ).

Something else I have been thinking of but never really tried is using multiple program instances: if the system runs a 64-bit OS, every 32-bit process has its OWN 2GB limit, so using multiple instances would give you 'n' times 2GB. This approach would need a sophisticated way of communicating between processes and dividing the data.
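The multiple-instances idea can be sketched with plain fork() on POSIX: each child gets its own address space (and, under a 64-bit OS, its own per-process limit) and sends a partial result back through a pipe. This is a minimal sketch with most error handling omitted, not production IPC.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <numeric>
#include <vector>

// Fork one worker per slice of the data; each child computes its
// partial result in its own address space and writes it to a pipe.
double parallel_sum(const std::vector<double>& data, int workers) {
    int fd[2];
    if (pipe(fd) != 0) return 0.0;
    std::size_t chunk = data.size() / workers;
    for (int w = 0; w < workers; ++w) {
        if (fork() == 0) {  // child: own copy-on-write address space
            std::size_t lo = w * chunk;
            std::size_t hi = (w == workers - 1) ? data.size() : lo + chunk;
            double part = std::accumulate(data.begin() + lo,
                                          data.begin() + hi, 0.0);
            write(fd[1], &part, sizeof part);  // report partial result
            _exit(0);
        }
    }
    double total = 0.0, part;
    for (int w = 0; w < workers; ++w) {  // parent: combine partials
        read(fd[0], &part, sizeof part);
        total += part;
    }
    while (wait(nullptr) > 0) {}  // reap all children
    return total;
}
```

In the real 32-bit scenario each worker would allocate its own slice of the matrix rather than inherit it, but the fork/pipe plumbing is the same.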

my 2cts
Travis Cobbs Wrote:First of all, if you haven't already, do a Google search for "out of core solver". That's the name of the class of general solutions to your problem.

Excellent thanks. Always very helpful to have a proper name for the problem/solution.

Travis Cobbs Wrote:Having said that, you may be able to give yourself a little breathing room if you're only a little bit too big. On Windows, there is a setting you can apply to your app that bumps the maximum from 2GB up to 3GB, as long as the OS itself is configured to allow for that. In Microsoft VC++, the linker option is /LARGEADDRESSAWARE, found in the System section of the Linker project options in the IDE. My understanding is that an app compiled with this option gets 3GB max when run on a 32-bit system and 4GB max when run on a 64-bit system; without this option, 2GB is the max.

Note that there is another problem with 32-bit apps, which is that they also have a 32-bit address space. This means that memory fragmentation inside the app can also have a big impact, because in order to allocate a single array that is 250MB in size, your app needs 250MB of contiguous space inside its address space. If you perform a lot of allocations and deallocations of big things, you'll quickly get to a position where, even though the app isn't using anywhere close to 2GB (or 3GB if the option is set), a large memory allocation will still fail. You can mitigate this problem to some degree by allocating your large arrays up front and never deallocating them, reusing them instead.

Hmmm. It's actually a Linux program right now using GCC, although I'm hoping it can be made portable, so I'll keep that in mind (which is why I didn't mention specifics in my first post). I do a lot of allocating and deallocating of small arrays before I need the large array, so allocating the large arrays up front may be a good way around it.

Thanks heaps for your help. I won't get a chance to try it for a week or so but I'll report on solutions when I do.

Tim
Roland Melkert Wrote:If you really need more than 2GB (most software doesn't, but programmers seem to get lazier and lazier these days :-) ) ...

The obvious solution would be to develop for 64-bit OSes only.

An alternative would be using some kind of swap file, although this would need decent management classes and could be extremely slow if done inefficiently (I'm having EMS flashbacks here :-) ).

Something else I have been thinking of but never really tried is using multiple program instances: if the system runs a 64-bit OS, every 32-bit process has its OWN 2GB limit, so using multiple instances would give you 'n' times 2GB. This approach would need a sophisticated way of communicating between processes and dividing the data.

my 2cts

Hi Roland,

Yes, I really do need that much. I'm writing physics code (true computation!), so I need 3x3 tensors of 3D representations of complex values, which means (3N^3)^2 elements where N is largish.

With the current code I can save myself one (somewhat easily) or two (not as easily) very large arrays, but that only gains me a small increase in N.
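For concreteness, a tiny helper for the storage estimate implied above, assuming one dense (3N^3) x (3N^3) matrix of `std::complex<double>` (16 bytes per element):

```cpp
#include <complex>
#include <cstddef>

// Bytes needed for the dense (3N^3) x (3N^3) matrix of complex
// doubles described above: (3*N^3)^2 elements, 16 bytes each.
std::size_t matrix_bytes(std::size_t n) {
    std::size_t dim = 3 * n * n * n;  // the matrix is dim x dim
    return dim * dim * sizeof(std::complex<double>);
}
```

Even N = 10 gives a 3000x3000 matrix, i.e. 9,000,000 elements and 144MB for a single copy, so it's easy to see how quickly a 2GB limit is reached once a solver needs workspace copies on top of that.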

The idea of parallelising/forking is interesting. If I can find a parallelisable eigendecomposition routine it might be a good solution in general.

And yeah... swap files... sloooooow access, annoying to deal with... but they may have to be the solution :-(

Thanks,

Tim
If you're trying to solve 10,000 equations with 10,000 unknowns, then one single instance of the 10,000x10,000 matrix uses 800MB (or 400MB if you use single-precision floating point). And, while out-of-core solvers can deal with the problem by not keeping everything in memory, they are a fair bit slower than in-core solvers which simply load up the matrices and have at it, since they keep scratch files on the hard disk to keep track of all the data they don't have in memory.

So, any time a problem requires "massive matrices" (Tim's words) to solve, it is far simpler (and faster) to solve if the matrices themselves can be wholly loaded into RAM.
Yes, exactly. In fact they're complex doubles, which makes matters 2x worse. Presently I crash out when I hit 2500x2500, although I'm sure I can get this higher by culling some waste and using some matrix relationships.

I do have another way around the problem which would be more amenable to swap files (exploiting FFTs and using Arnoldi diagonalisation), but I suspect it will be slower in practice until I get really, really large, so I'm canvassing all interim ideas to see what I can squeeze out.

Tim
As long as the problem can be tackled by divide-and-conquer, i.e., not all values are needed at all times and the problem can instead be partitioned, I would first think of splitting the solver into a separate executable, which then gets spawned multiple times, each instance limited to 2GB. Your .exe could even spawn itself, so you still have just one executable program on the hard disk, running in multiple instances and communicating via shared memory. But I fear it is not so simple to split your problem into smaller 2GB portions, otherwise you would have done that already. Best regards.
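For the self-spawning variant described above, a minimal POSIX sketch: an anonymous shared mapping created before fork() is visible to both the parent and its forked copy, so the worker can deposit its result directly in shared memory (shm_open would be the equivalent for unrelated processes). The function name is illustrative only.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent and a forked copy of itself share one mapped region; the
// child writes its result there and the parent reads it back.
double shared_memory_roundtrip(double value) {
    // MAP_SHARED | MAP_ANONYMOUS: the region is inherited across
    // fork() and writes are visible to both processes.
    void* mem = mmap(nullptr, sizeof(double), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 0.0;
    double* slot = static_cast<double*>(mem);
    *slot = 0.0;
    if (fork() == 0) {      // child: the "worker instance"
        *slot = value * 2;  // stand-in for its share of the real work
        _exit(0);
    }
    wait(nullptr);          // parent waits for the worker to finish
    double result = *slot;
    munmap(mem, sizeof(double));
    return result;
}
```

Unlike the pipe approach, shared memory avoids copying large arrays between processes, which matters when the data itself is what blows the 2GB budget.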
Steffen Wrote:But I fear it is not so simple to split your problem into smaller 2GB portions, otherwise you would have done that already. Best regards.

No, not easy at all. Eigendecomposition is very hard to parallelise.

Tim
just being curious, can you tell us a little more about the application you're using this for?
are you solving humanity's energy problem?
or computing in reverse the path all atoms have travelled from today back to the big bang?
nuclear fusion?
interstellar travel?