LDraw.org Discussion Forums

Full Version: Memory issues in C++
I was reading about the LDView bugs and spotted this statement.

Travis Cobbs Wrote:Also, no matter how much memory you have, LDView is a 32-bit app, so it will crash if it ever needs more than 2GB (or perhaps 3GB, but I don't think so).

I've been wondering why some code I'd written kept crashing and throwing std::bad_alloc and it's very likely to be this.

So... do any of the C++ gurus around here know how to bypass the problem in general? I've got some numerical code that needs to store and manipulate massive matrices and to run on 32bit systems as well as 64bit.

Tim

PS. Helping the forum live up to its C++ query predictions ;-)
First of all, if you haven't already, do a Google search for "out of core solver". That's the name of the class of general solutions to your problem.
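As a toy illustration of the out-of-core idea (not any particular solver): keep the matrix on disk and stream it through a fixed-size block buffer, so RAM usage is bounded by the block size rather than the matrix size. The file layout and function names here are invented for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Helper: build a test matrix on disk one row at a time, so the
// full matrix is never resident in memory.
void write_matrix(const char* path, std::size_t n, double value) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    std::vector<double> row(n, value);
    for (std::size_t i = 0; i < n; ++i)
        std::fwrite(row.data(), sizeof(double), n, f);
    std::fclose(f);
}

// Toy out-of-core pass over an n x n matrix of doubles stored
// row-major on disk: only block_rows rows are in RAM at any moment.
double out_of_core_sum(const char* path, std::size_t n, std::size_t block_rows) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0.0;
    std::vector<double> block(block_rows * n);  // the only large buffer
    double total = 0.0;
    for (std::size_t row = 0; row < n; row += block_rows) {
        std::size_t rows = std::min(block_rows, n - row);
        std::size_t got = std::fread(block.data(), sizeof(double), rows * n, f);
        total = std::accumulate(block.begin(),
                                block.begin() + static_cast<long>(got), total);
    }
    std::fclose(f);
    return total;
}
```

A real out-of-core solver does the same thing with factorisation passes instead of a simple sum, but the principle is identical: disk holds the data, RAM holds one working block.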

Having said that, you may be able to give yourself a little breathing room if you're only a little bit too big. On Windows, there is a setting you can apply to your app that bumps the maximum from 2GB up to 3GB, as long as the OS itself is configured to allow for that. In Microsoft VC++, the linker option is /LARGEADDRESSAWARE, found in the System section of the Linker project options in the IDE. My understanding is that an app compiled with this option gets 3GB max when run on a 32-bit system and 4GB max when run on a 64-bit system; without this option, 2GB is the max.

Note that there is another problem with 32-bit apps, which is that they also have a 32-bit address space. This means that memory fragmentation inside the app can also have a big impact, because in order to allocate a single array that is 250MB in size, your app needs 250MB of contiguous space inside its address space. If you perform a lot of allocations and deallocations of big things, you'll quickly get to a position where, even though the app isn't using anywhere close to 2GB (or 3GB if the option is set), a large memory allocation will still fail. You can mitigate this problem to some degree by allocating your large arrays up front and never deallocating them, reusing them instead.
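The last suggestion above might look like this minimal sketch: one large buffer grabbed once at startup and handed out repeatedly, so the big contiguous allocation happens before the address space gets fragmented. The `ScratchBuffer` name is invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// One big allocation made up front and reused for every solve, so the
// large contiguous block is claimed before fragmentation sets in.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t max_elems) : buf_(max_elems) {}

    // Hand out a view of the front of the buffer; no new allocation
    // happens here, so this can't fail in a fragmented address space.
    double* acquire(std::size_t elems) {
        return elems <= buf_.size() ? buf_.data() : nullptr;
    }

    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<double> buf_;  // never shrunk, never freed until exit
};
```

Sized for the worst case once, this trades a little wasted RAM for immunity to fragmentation-induced `std::bad_alloc`.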
If you really need more than 2GB (most software doesn't, but programmers seem to get lazier and lazier these days :-) ) ...

The obvious solution would be to develop for 64-bit OSes only.

An alternative would be using some kind of swap file, although this would need decent management classes and could be extremely slow if done inefficiently (I'm having EMS flashbacks here :-) ).

Something else I have been thinking of but never really tried is using multiple program instances: if the system runs a 64-bit OS, every 32-bit process has its OWN 2GB limit, so using multiple instances would give you 'n' times 2GB. This approach would need a sophisticated way of communicating between processes and dividing the data.
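The multiple-instances idea can be sketched with plain fork() on POSIX: each child gets its own address space (and, under a 64-bit OS, its own per-process limit) and sends a partial result back through a pipe. This is a minimal sketch with most error handling omitted, not production IPC.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <numeric>
#include <vector>

// Fork one worker per slice of the data; each child computes its
// partial result in its own address space and writes it to a pipe.
double parallel_sum(const std::vector<double>& data, int workers) {
    int fd[2];
    if (pipe(fd) != 0) return 0.0;
    std::size_t chunk = data.size() / workers;
    for (int w = 0; w < workers; ++w) {
        if (fork() == 0) {  // child: own copy-on-write address space
            std::size_t lo = w * chunk;
            std::size_t hi = (w == workers - 1) ? data.size() : lo + chunk;
            double part = std::accumulate(data.begin() + lo,
                                          data.begin() + hi, 0.0);
            write(fd[1], &part, sizeof part);  // report partial result
            _exit(0);
        }
    }
    double total = 0.0, part;
    for (int w = 0; w < workers; ++w) {  // parent: combine partials
        read(fd[0], &part, sizeof part);
        total += part;
    }
    while (wait(nullptr) > 0) {}  // reap all children
    return total;
}
```

In the real 32-bit scenario each worker would allocate its own slice of the matrix rather than inherit it, but the fork/pipe plumbing is the same.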

my 2cts
Travis Cobbs Wrote:First of all, if you haven't already, do a Google search for "out of core solver". That's the name of the class of general solutions to your problem.

Excellent thanks. Always very helpful to have a proper name for the problem/solution.

Travis Cobbs Wrote:Having said that, you may be able to give yourself a little breathing room if you're only a little bit too big. On Windows, there is a setting you can apply to your app that bumps the maximum from 2GB up to 3GB, as long as the OS itself is configured to allow for that. In Microsoft VC++, the linker option is /LARGEADDRESSAWARE, found in the System section of the Linker project options in the IDE. My understanding is that an app compiled with this option gets 3GB max when run on a 32-bit system and 4GB max when run on a 64-bit system; without this option, 2GB is the max.

Note that there is another problem with 32-bit apps, which is that they also have a 32-bit address space. This means that memory fragmentation inside the app can also have a big impact, because in order to allocate a single array that is 250MB in size, your app needs 250MB of contiguous space inside its address space. If you perform a lot of allocations and deallocations of big things, you'll quickly get to a position where, even though the app isn't using anywhere close to 2GB (or 3GB if the option is set), a large memory allocation will still fail. You can mitigate this problem to some degree by allocating your large arrays up front and never deallocating them, reusing them instead.

Hmmm. It's actually a Linux program right now using GCC, although I'm hoping it can be made portable, so I'll keep that in mind (which is why I didn't mention specifics in my first post). I do a lot of allocating and deallocating of small arrays before I need the large array, so allocating the large arrays up front may be a good way around it.

Thanks heaps for your help. I won't get a chance to try it for a week or so but I'll report on solutions when I do.

Tim
Roland Melkert Wrote:If you really need more than 2GB (most software doesn't, but programmers seem to get lazier and lazier these days :-) ) ...

The obvious solution would be to develop for 64-bit OSes only.

An alternative would be using some kind of swap file, although this would need decent management classes and could be extremely slow if done inefficiently (I'm having EMS flashbacks here :-) ).

Something else I have been thinking of but never really tried is using multiple program instances: if the system runs a 64-bit OS, every 32-bit process has its OWN 2GB limit, so using multiple instances would give you 'n' times 2GB. This approach would need a sophisticated way of communicating between processes and dividing the data.

my 2cts

Hi Roland,

Yes, I really do need that much. I'm writing physics code (true computation!), so I need 3x3 tensors of 3D representations of complex values, which means (3N^3)^2 elements where N is largish.

With the current code I can save myself one (somewhat easily) or two (not as easily) very large arrays, but that only gains me a small increase in N.
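For concreteness, a tiny helper for the storage estimate implied above, assuming one dense (3N^3) x (3N^3) matrix of `std::complex<double>` (16 bytes per element):

```cpp
#include <complex>
#include <cstddef>

// Bytes needed for the dense (3N^3) x (3N^3) matrix of complex
// doubles described above: (3*N^3)^2 elements, 16 bytes each.
std::size_t matrix_bytes(std::size_t n) {
    std::size_t dim = 3 * n * n * n;  // the matrix is dim x dim
    return dim * dim * sizeof(std::complex<double>);
}
```

Even N = 10 gives a 3000x3000 matrix, i.e. 9,000,000 elements and 144MB for a single copy, so it's easy to see how quickly a 2GB limit is reached once a solver needs workspace copies on top of that.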

The idea of parallelising/forking is interesting. If I can find a parallelisable eigendecomposition routine it might be a good solution in general.

And yeah... swap files... sloooooow access, annoying to deal with... but they may have to be the solution :-(

Thanks,

Tim
If you're trying to solve 10,000 equations with 10,000 unknowns, then one single instance of the 10,000x10,000 matrix uses 800MB (or 400MB if you use single-precision floating point). And, while out-of-core solvers can deal with the problem by not keeping everything in memory, they are a fair bit slower than in-core solvers which simply load up the matrices and have at it, since they keep scratch files on the hard disk to keep track of all the data they don't have in memory.

So, any time a problem requires "massive matrices" (Tim's words) to solve, it is far simpler (and faster) to solve if the matrices themselves can be wholly loaded into RAM.
Yes, exactly. In fact they're complex doubles, which makes matters 2x worse. Presently I crash out when I hit 2500x2500, although I'm sure I can get this higher by culling some waste and using some matrix relationships.

I do have another way around the problem which would be more amenable to swap files (exploiting FFTs and using Arnoldi diagonalisation), but I suspect it will be slower in practice until I get really, really large, so I'm canvassing all interim ideas to see what I can squeeze out.

Tim
As long as the problem can be tackled by divide-and-conquer, i.e., not all values are needed at all times and the problem can instead be partitioned, I would first think of splitting the solver into a separate executable, which then gets spawned multiple times, each instance limited to 2GB. Your .exe could even spawn itself, so you still have just one executable program on the hard disk, running in multiple instances and communicating via shared memory. But I fear it is not so simple to split your problem into smaller 2GB portions, otherwise you would have done that already. Best regards.
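For the self-spawning variant described above, a minimal POSIX sketch: an anonymous shared mapping created before fork() is visible to both the parent and its forked copy, so the worker can deposit its result directly in shared memory (shm_open would be the equivalent for unrelated processes). The function name is illustrative only.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent and a forked copy of itself share one mapped region; the
// child writes its result there and the parent reads it back.
double shared_memory_roundtrip(double value) {
    // MAP_SHARED | MAP_ANONYMOUS: the region is inherited across
    // fork() and writes are visible to both processes.
    void* mem = mmap(nullptr, sizeof(double), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 0.0;
    double* slot = static_cast<double*>(mem);
    *slot = 0.0;
    if (fork() == 0) {      // child: the "worker instance"
        *slot = value * 2;  // stand-in for its share of the real work
        _exit(0);
    }
    wait(nullptr);          // parent waits for the worker to finish
    double result = *slot;
    munmap(mem, sizeof(double));
    return result;
}
```

Unlike the pipe approach, shared memory avoids copying large arrays between processes, which matters when the data itself is what blows the 2GB budget.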
Steffen Wrote:But I fear it is not so simple to split your problem into smaller 2GB portions, otherwise you would have done that already. Best regards.

No, not easy at all. Eigendecomposition is very hard to parallelise.

Tim
just being curious, can you tell us a little more about the application you're using this for?
are you solving humanity's energy problem?
or computing in reverse the path all atoms have travelled from today back to the big bang?
nuclear fusion?
interstellar travel?