LDraw.org Discussion Forums

Full Version: T-Junctions re-visited
I can see this being very useful, and I agree that it should be relatively easy to update a small number of files (primitives, mostly). However, care does have to be taken, and there may in fact be one more useful setting, which would be semi-flat. (I'm not sure of a good term for this setting.) A box primitive that doesn't have edge lines on all edges would be semi-flat, as would similar rect primitives. The box/rect itself is always flat, but it may adjoin curved surfaces, in which case the surface normals for those curved surfaces should be pulled to match the box's/rect's surface normals.

Take a part like 33243, for example, and look at the first rect3.dat in the file. This rect3.dat demonstrates what I mean by semi-flat. The correct surface normals for the rect itself are fixed (perpendicular to the rectangle). However, the curved surface that meets it then needs to have its normals adjusted to match those where the two meet.
Hi Travis,

I think your "semi-flat" category is a sub-set of my "smooth" category...the key here is that it looks flat until you look at the neighbors, right?

My smooth category basically means "participates in smoothing after flattening". To this end, it's not really necessary for a program to pre-smooth the part; the part is going into the smoothing process later with neighbors considered, and the smoothing algorithms I've seen tossed around don't benefit from pre-smoothing some vertices.

So while semi-flat may be semantically different (e.g. "participates in smoothing after flattening, but flat on its own") I'm not sure it's a huge win to define it. There's certainly no harm in defining it if we want to be more specific.

But I definitely agree that rect3 is _not_ a candidate for "FLAT" tagging even though it is, ironically, flat. Because it is missing an edge line, we must be pessimistic and assume that another part will join, and 33243.dat is a perfect example of such a case.

By comparison, rect.dat could be tagged flat. Because the quad is bounded on four sides by solid lines, we can safely assume that it was meant to be flat shaded.

Yes, it is a sub-set of your "smooth" category. I proposed it as being useful because it can improve smoothing results (as opposed to improving smoothing performance, which is what FLAT and SMOOTH_ATOMIC do). In my example, without semi-flat (or something similar), the normals at the top of the rect3 would be pulled half-way up to meet the normals on the cylinder. This would cause that bottom face to visually disappear into the cylinder, when in fact that face is flat on the actual part.

I'm not sure that adding a fourth category to your proposal is the best way to handle this case, but it seemed like it would be straightforward to do so.
Ben Supnik Wrote:Yep - this is exactly what I saw when I had triangulation on. But I think in theory a weighting by arc-angle could fix this. Two of the three triangles at a given vertex will share 90 degrees of "span" and the other will have 90 to itself - by this weighting, normals will go back to vertical. I don't know what the effect would be on other shapes.

When implementing LDView's smoothing, I actually considered using the un-normalized cross product as the input to the average calculation. Since the magnitude of the cross product is proportional to the area of the triangle, I think this would result in correct surface normals in the case of a cylinder. However, in order for it to work, quads would in fact have to be split into triangles, and my smoothing code acts on the quads.
Note: using un-normalized cross products as input gives higher weight to bigger triangles at the smoothed point. Conceptually, I think this is actually better behavior than simply calculating the average of the normalized surface normals.

However, I realize that my statement in the above post is wrong. It will still result in improper smoothing, because the single triangle will only have half the weight of the other two. The problem is that the "weight" ideally needs to be based on the surface area of the triangle involved, plus that of all adjacent co-planar triangles.
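To make the weighting idea concrete, here is a rough sketch (illustrative only, not LDView's actual code) of feeding un-normalized cross products into the average, so that bigger triangles get proportionally more influence:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
};

// The un-normalized cross product's magnitude is twice the triangle's
// area, so summing these weights each face normal by triangle area.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

Vec3 faceNormalUnnormalized(const std::array<Vec3, 3>& t) {
    return cross(t[1] - t[0], t[2] - t[0]);
}

Vec3 normalize(const Vec3& v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Area-weighted smoothed normal at a vertex shared by these triangles.
Vec3 smoothNormal(const std::vector<std::array<Vec3, 3>>& tris) {
    Vec3 sum{0, 0, 0};
    for (const auto& t : tris)
        sum = sum + faceNormalUnnormalized(t);
    return normalize(sum);
}
```

For coplanar triangles this just reproduces the face normal; the interesting case is a fan of unequal triangles around a cylinder vertex, where the larger faces pull the average their way.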
Oh - I understand - it's an orthogonal purpose: to improve the quality of normal generation. (My "flat" statement was just meant as a cheap way to save CPU time by no-oping the smooth on a known flat piece.)

A few thoughts:
1. It may make sense to keep meta extensions that improve processing speed separate from meta extensions that improve normal quality; any time we add logic to make the normals look nicer, it may conflict with all other such extensions, so knowing every extension that can change the normal will be necessary for sanity.

2. If you simplify my scheme by not special-casing all-flat geometry, I've created basically "smooth-self" and "smooth-with-parent". It turns out that you can compute that cheaply and correctly on the fly for at least a few key primitives; when building the connected mesh, a mesh is smooth-self if it has zero edges that are 'open' (meaning connected to neither a line nor another triangle). stud.dat and all groups and rect.dat correctly parse as "sealed" - rect2.dat and rect3.dat parse as unsealed.

Parsing for complex parts like animals can give false "unsealed" results, but this doesn't matter - the key is to identify that heavily used primitives like studs are entirely self-sealed and don't need to be smoothed with their parents.

This means that not only can we create a faster t-junction fixer without meta commands, but the results of the seal test could be used to generate a candidate list of parts to get the META command should we choose to implement it.

3. We've had at least two other proposals for improving normal quality besides noting flat-but-shared normals: allowing the normal to come from a higher-level primitive (specified via a meta command) and allowing explicit normals for some polygons. This makes me think that normal quality, while important, might be orthogonal to the overall algorithmic problems (e.g. given no meta commands, can we come up with decent normals quickly).
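In sketch form, the "sealed" test from point 2 could look like this (simplified, with integer coordinates to keep equality exact; not Bricksmith's actual code):

```cpp
#include <array>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Vertex = std::array<int, 3>;           // integer coords keep equality exact
using Edge = std::pair<Vertex, Vertex>;

Edge canonicalEdge(const Vertex& a, const Vertex& b) {
    return a < b ? Edge{a, b} : Edge{b, a};  // undirected: sort the endpoints
}

// A mesh is "sealed" (smooth-self) if no triangle edge is 'open', i.e.
// every edge is shared with another triangle or covered by a line.
bool isSealed(const std::vector<std::array<Vertex, 3>>& tris,
              const std::vector<std::array<Vertex, 2>>& lines) {
    std::map<Edge, int> faceCount;
    for (const auto& t : tris)
        for (int i = 0; i < 3; ++i)
            ++faceCount[canonicalEdge(t[i], t[(i + 1) % 3])];
    std::map<Edge, bool> hasLine;
    for (const auto& l : lines)
        hasLine[canonicalEdge(l[0], l[1])] = true;
    for (const auto& entry : faceCount)
        if (entry.second < 2 && !hasLine[entry.first])
            return false;                    // lone triangle side, no edge line
    return true;
}
```

A quad split into two triangles with edge lines on its four outer edges comes back sealed; drop the lines and it is unsealed, exactly the rect.dat vs rect3.dat distinction.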

What are you two doing to me? I've just rewritten the vertex processing code for LDCad this weekend ;-)

In regards to meta-ing things up:

I like the opt-out approach Ben is suggesting, but I also would like to do some more coding/testing to see how far the real-time (brute force) approach can be pushed before adding any new metas.

One thing we might be overlooking is the fact that parts also need to be BFC certified for reliable smoothing. This seems to be the biggest problem in my current setup.
LDView auto-flips the smoothed normals to deal with non-BFC geometry. It seems to work fine.
How do you know which normal in the 'to be averaged' set must be used as guidance? You could end up flipping all of them the wrong way.
Maybe the main problem with 3626bpn1.dat is that the conditional lines in the face are missing?
Bit of a side track, but with all this real-time processing, when does something actually qualify as being 'too slow'?

For example, what's an acceptable time for the 48x48 baseplate (4186.dat, 553,000 vertices)? My current version takes 45 seconds to do unique point tests and angle-based smoothing.

This is just about acceptable (to me), but I still need to include type 2 lines during the smoothing tests and optionally do the whole T-junction thing beforehand (although I'm probably putting a limit on part size for that, e.g. <25000 vertices).

In short, when will things be fast enough for general use, taking into account that these waiting times only occur once during the program's run (even once per installation if you consider disk caching the results)?
Roland Melkert Wrote:What are you two doing to me? I've just rewritten the vertex processing code for LDCad this weekend ;-)

Yep - I suspect that I will have written smoothing for Bricksmith completely twice over by the time this is all done. But ... such research is probably necessary to fully understand the problem; you and anyone else on the leading edge are essentially doing that research.

Quote:I like the opt-out approach Ben is suggesting, but I also would like to do some more coding/testing to see how far the real-time (brute force) approach can be pushed before adding any new metas.

Agreed -- a perf optimization META is pretty useless if there isn't a perf problem. I think at the rate we are rapid-prototyping and the rate that spec changes are approved, we'll have good test apps in plenty of time. :-)

Quote:One thing we might be overlooking is the fact parts also need to be BFC certified for reliable smoothing. This seems to be the biggest problem in my current setup.

I can confirm what Travis says: neither BFC certification in the model data nor BFC awareness in the renderer is necessary to smooth. This was a good thing for me...BrickSmith's rendering path ignores BFC completely. :-) (And generally it's nice that smoothing isn't dependent on BFC cert being 100%.)

I prefer to think of the problem in terms of 1-sided and 2-sided geometry. A BFC-aware program processing BFC cert geo is essentially creating a series of one-sided triangles - the program can use the BFC invert stack to reverse triangle winding so that the end data set is entirely CCW. A BFC-unaware program draws everything two-sided and must cope with back-facing normals.* I mention this because a 2-sided lighting program must somehow be able to cope with wrong-facing normals already, so having some normals face backward without smoothing is not a problem -- in fact, every normal faces the wrong way half the time as you rotate the model.

Two triangles face in opposite directions if they have a common edge whose vertices go in the same order in both triangles. So...um...

  / | \
/ 1|2 \

If you have triangle ABC and triangle CBD, then the common edge BC is in opposite order between the two tris (BC in the first and CB in the second). Thus you know they are wound in the same direction for a given plane, and the crease is measured via the dot product of the CCW normal of each. (In other words, this is the 'sane' case.)

But if you have triangles ABC and BCD, now the edge BC is in the same order in both triangles...one is CCW and one is CW (for a given viewpoint). In this case you need to do one of two things:

- One-sided case: treat them as not connected - they're not visually connected or
- Two-sided case: treat them as flipped.

So in BrickSmith's case, when I find this common edge I mark 1 and 2 as "flipped neighbors", not "regular neighbors." When I calculate the normal of B1 (that is, vertex B, for triangle 1), I flip the normal of 2 to get a normal facing in the direction that 1's face normal was facing, but smoothed. Then when I go back and calculate B2 (that is, vertex B for triangle 2), I flip the normal of 1 to get a normal facing in the direction that 2's normal was facing, but smoothed.

Since the smoothing function is the same for B1 and B2, I get the same normal, but in opposite directions, and the lighting looks continuous since the lighting is two-sided.

My indexing-consolidation pass runs _after_ all normals are calculated (so that I only 'merge' 100% smoothed vertices). In this case, B1 and B2 both go in my VBO separately because their normals are different.

This case happens most often in things like mirrored sw minifig heads - every 'seam' at the mirror line is a flipped neighbor case. If I were to track BFC cert and keep an eye on the matrix inverts, I could have simply taken the mirror-half of the minifig head and flipped the order of all vertices.
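In sketch form, the shared-edge order test looks something like this (with triangles as index triples; illustrative only, not Bricksmith's actual code):

```cpp
#include <array>
#include <cassert>

// Windings are consistent when the shared edge appears as a->b in one
// triangle and b->a in the other; the same order in both means a
// flipped neighbor (or, for one-sided geometry, "not connected").
enum class EdgeMatch { None, Consistent, Flipped };

bool hasDirectedEdge(const std::array<int, 3>& t, int a, int b) {
    for (int i = 0; i < 3; ++i)
        if (t[i] == a && t[(i + 1) % 3] == b)
            return true;
    return false;
}

EdgeMatch classify(const std::array<int, 3>& t1, const std::array<int, 3>& t2) {
    for (int i = 0; i < 3; ++i) {
        int a = t1[i], b = t1[(i + 1) % 3];
        if (hasDirectedEdge(t2, b, a)) return EdgeMatch::Consistent;
        if (hasDirectedEdge(t2, a, b)) return EdgeMatch::Flipped;
    }
    return EdgeMatch::None;
}
```

ABC vs CBD comes back Consistent; ABC vs BCD comes back Flipped, which is the case that gets marked as flipped neighbors.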

* Allen originally solved this with a clever trick: the lighting environment is mirrored around the XY plane, so back-facing triangles are lit from behind and appear correct. With the new shader-based renderer, I simply flip the normal in-shader if it is facing the wrong way in screen space.
That's really a tough call. However, LDView's existing curve smoothing takes up a significant fraction of the load time for any given model. I just did tests with a 10179 Millennium Falcon model, and when I turned curve smoothing off, the model loaded in around 4.5 seconds. When I turned curve smoothing on, it took around 15 seconds. In both cases I was starting from a freshly launched LDView, so the application itself didn't have any caches. However, I had already loaded the model prior to the tests, so the disk cache almost certainly had all the parts in it.

My current curve smoothing code is based purely on detection of conditional lines between surfaces that are intended to be smooth, which is not something you are doing, and not something any other app is likely to do (since detection of edge lines to indicate creases, along with an angle threshold, is almost certainly a better solution).

My curve smoothing takes even longer if primitive substitution is disabled in LDView, because LDView then has to do curve detection and smoothing on curved primitives. With primitive substitution enabled, curve smoothing is completely skipped for primitives, since they already have accurate surface normals.

If an app chooses to cache calculated things (like surface normals) to disk in an intelligent fashion, then that puts a much looser constraint on the required performance of the surface normal calculation algorithm. Having said that, intelligent caching isn't exactly a trivial feature to implement.

So, I'm not really sure what a good answer to your question is. However, 45 seconds seems like an awfully long time to load a single part (although that's mitigated by the fact that the vast majority of parts should take dramatically less time).
I do the unique tests first, optional (pending) T-Junction fixes come second, and the final step takes care of generating normals (and thus extra vertices). This setup is the result of my rewrites this weekend; it gives me the most playing room for generating extra lookup tables etc. in the early stages.

The way I do smoothing at the moment would require all kinds of extra tests (like the BC/CB ones you mentioned) for it to work correctly on non-BFC parts. So at the moment I'm assuming everything is BFC. When the results on BFC-ed parts are 'perfect', I'll take a look at those pesky NOCERTIFY ones to see what extra work is needed.

I know at the part level whether something is BFC (the whole mesh is indeed CCW in such a case) or not, so in the end I could basically add two algorithms for handling smoothing: one for BFC (faster) and one for non-BFC (probably much slower).

Even with this, non-BFC parts don't actually look much worse at the moment; they only suffer from a dark/light spot here and there due to the mirroring you mentioned.
Thanks for the info,

Yes, the 45 secs is at the very top; most stuff loads in the blink of an eye. I do have one more thing to try (extra lookup tables) that could dramatically lower the time for large parts (but might slightly increase it for smaller ones). The added stuff will bring it up again, though.

On the other hand, I like tinkering with these kinds of things more than the actual drawing of models etc :-)
Depending on your code structure, I think adding non-BFC support should be relatively cheap.

If you do not use an index structure to detect common edges, it will be a 2x cost - you'll have to try every edge twice, once in each direction.

If you do use an index structure to detect common edges*, it will be less than 2x, because the index lookup (the expensive part of the test) won't need repeating.

So I think that once you get into heavy optimization the non-BFC 'cost' will melt away a bit. :-) But your 'get it working first' approach is good I think....

* The idea is that with an index, you can easily find "all other vertices Vs that have the same location as V0" in something like logarithmic time, rather than linearly trying every vertex. For two tris to have a common edge, they will have a matched vertex via the index, regardless of winding - it is just a question of: once you find one matched vertex, do you go CW or CCW to find the other?

I use per-index processing, so a nice hash table should make a huge difference. Haven't added that yet though; I don't want to spend heaps of time debugging a glitch only to discover a bug in one of the support list classes :-)

In the mean time, feast your eyes on this:

[Image: smoothTest2a.png]

LDCad smoothing mark 2.0 :-)

This version also uses edges (type 2 only) to limit smoothing, and the general processing now preserves the quads. I disabled the T-Junction stuff for now.

This pretty much looks identical to LDView when using the same ldr, although LDView loads it faster (for now ;-) )

Next weekend I'm hoping to crank up its performance to acceptable levels.
For reference I use a std::map for my vertex lookups. My key is a custom class that uses integers to store (x * 100, y * 100, z * 100), and implements operator<. See the TREVertexKey class here:


You could do something similar with your adjusted vertices. The std::map class uses a balanced binary tree, so has a lookup performance of O( log n ). Insertion is also O( log n ), but the balancing/heap allocation means that actual time to insert is longer than to do lookups. To be honest, I wouldn't know where to start to write a good hash function for a 3D vector to store them in a hash table.
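The key idea, sketched out (illustrative only, not the actual TREVertexKey source): quantize each coordinate to an integer and compare lexicographically.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical sketch in the spirit of TREVertexKey: quantize each
// coordinate (scale by 100, round half away from zero) and compare
// lexicographically so the key can serve as a std::map key.
struct VertexKey {
    long x, y, z;

    VertexKey(float fx, float fy, float fz)
        : x(quantize(fx)), y(quantize(fy)), z(quantize(fz)) {}

    static long quantize(float v) {
        // The cast truncates toward zero, so offset by -0.5 for negatives.
        return (long)(v * 100.0f + (v < 0.0f ? -0.5f : 0.5f));
    }

    bool operator<(const VertexKey& o) const {
        return std::tie(x, y, z) < std::tie(o.x, o.y, o.z);
    }
};
```

As for hashing: since the quantization already collapses nearby floats to exact integers, one simple route would be to mix the three quantized integers (e.g. XOR with shifts) - the hash never has to see raw floats at all.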
Hi Travis,

Unless I'm misunderstanding something, I think your rounding algorithm has a mistake. It will round 0.018 to 1 rather than 2 as desired. I would have thought TRE_VERTEX_KEY_ROUNDUP should just be 0.5f.

Also, for negative numbers it appears that (long) may round up rather than down, so that

#define iround(x) (x<0)?((int)(x-0.5)):((int)(x+0.5))

is better
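A tiny demonstration of why the sign check matters - the cast truncates toward zero, so plain +0.5 rounds negatives the wrong way:

```cpp
#include <cassert>

// Sign-aware rounding: offset away from zero before truncating.
#define iround(x) ((x) < 0 ? (int)((x) - 0.5) : (int)((x) + 0.5))

// The naive version, correct for positive values only.
inline int naiveRound(double x) { return (int)(x + 0.5); }
```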

Quote:In the mean time, feast your eyes on this:
All thumbs up!
Hi Travis,

I'm using a related technique: I have a single array of vertices, but they are sorted by a "lexicographic" comparison function (that is, sort by X; if X is equal, sort by Y; if Y is equal, sort by Z).

This gives you performance similar to the balanced binary tree: N log N to build the index (due to sort time) and log-time lookups via a binary search.

The big thing you lose with the array is cheap insertion of "more stuff." But in the case of my code, I know the total vertex count before I even start working (because I have the unsmoothed directive and can count up the triangles and quads and lines) so this is acceptable.

The other thing I discovered is that, given a vertex in the array, it is a bit faster to find the range of equal vertices by linear search (E.g. just walk in both directions in the array) than to do a binary search of the whole sorted array. I chalk this up to locality - more cache hits if the search stays close by. I've been meaning to do a 'step' search - that is, start in the array, walk in units of N vertices, then walk back by N/2, etc. If we start N at a small number like 16 or so, we can find groups of equal value vertices in only a few steps. But...performance profiles indicate that it's a small optimization at this point; the algorithm is currently dominated by the 'big sort' at the beginning.

I'm currently sorting the _entire_ part together -- lines and tris and quads. When I had the lines separate it was faster (it's quicker to sort two small things than one big thing) but I want to have one 'pile' of vertices for snapping things together, so I think one big pile it is.
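In sketch form (with std::array's built-in lexicographic operator< standing in for the comparison function; illustrative only):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::array<float, 3>;  // operator< is already lexicographic

// Build the index with one O(N log N) sort...
std::vector<Point> buildIndex(std::vector<Point> verts) {
    std::sort(verts.begin(), verts.end());
    return verts;
}

// ...then find the run of vertices equal to a query point with an
// O(log N) binary search; the run is contiguous thanks to the sort.
std::size_t countEqual(const std::vector<Point>& index, const Point& p) {
    auto range = std::equal_range(index.begin(), index.end(), p);
    return (std::size_t)(range.second - range.first);
}
```

In real code you would walk outward from a found element instead of calling equal_range, which is the cache-locality trick described above applied to the same sorted array.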

I think the big win is going to be recursion optimizations, e.g. smooth the stud, recognize in the smoothing process that the stud is "sealed" from other parts and doesn't need to be resmoothed when it is included, then smooth the parent part without the studs. I base this on the fact that (1) my code can detect that the stud is sealed and (2) small parts optimize almost instantly and (3) the big baseplates minus studs have similar vertex counts to small parts. It's a big refactor to actually get the code working this way though; I'm going to finish T junction repair first.

That cockpit looks much better with the creases 'locked'. :-) :-)

Here's a BrickSmith science experiment:

[Image: shiny_parts.png]

(Parts scaled to get them all to fit on screen.)
That looks so nice. Thanks!
Thanks for the heads-up. I will investigate. When posting my link, I glanced at the code, and the rounding behavior didn't really look right to me either, but I didn't take the time to try to figure out what my original intent had been. (I think perhaps it should have been used as (v.x + ROUNDUP) * PRECISION, but in that case, just using v.x * PRECISION + 0.5 would have been easier. And your comment about negative numbers being broken is obvious in retrospect.) The more I think about it, though, the more I wonder if rounding even makes sense. I'm not using the values stored in the keys for anything other than lookups, so I'm not sure that rounding even adds any value.
Hi Travis,

If you have a multi-map, then the only property you need on keys is that if (a==b) then (a' == b') for key-gen - that is, all equal keys have to encode the same way.

If you are using a non-multi-map then your map is auto-consolidating equal points and you must ensure that if a != b then a' != b'.

In particular, the 'iterative' rounding algo suggested somewhere else in this now massively-out-of-control thread requires that rounding be done by a series of distance comparisons...any kind of "just round the numbers" algo fails because two nearby points that are just on opposite sides of the 'fall line' will round to different 'snapped' grids.

I had this kind of 'grid' rounding in my first version of smoothing and it can fail pretty badly when key vertices round away from each other and not to each other. I'm poking at iterative rounding this week.
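A small demonstration of the 'fall line' failure, with a hypothetical 0.01 grid: two points well within tolerance of each other snap to different cells, while farther-apart points happily merge:

```cpp
#include <cassert>
#include <cmath>

// Plain "just round" snapping to a 0.01 grid.
long snapToGrid(double v) { return (long)std::floor(v * 100.0 + 0.5); }

// What an iterative, distance-based merge would check instead.
bool withinEpsilon(double a, double b, double eps) {
    return std::fabs(a - b) <= eps;
}
```

Here 0.0049 and 0.0051 are only 0.0002 apart but land in different cells, while 0.001 and 0.004 are fifteen times farther apart and land in the same cell - the grid's verdict has nothing to do with actual distance.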

Thinking about this some more, I think that what my current code is designed to do is reduce the number of occurrences of floating point inaccuracies causing problems (with a bug of not checking the sign first). If the number is 1.03, that might end up in floating point as 1.02999999, or 1.03000001. Ideally, I'd like both of these to produce the same key value. My current code multiplies by 100 to produce 102.999999 or 103.000001, then adds .005, which produces 103.004999 or 103.005001, both of which get converted to 103. (I think this works out for negative numbers also, but the result would be -102, instead of -103 as normally expected. Given their use as keys, where the actual values don't really matter, this probably doesn't hurt anything.)

The expectation here is that LDraw files start out with fixed point coordinates, and very often, the transformed coordinate will also be very close to fixed point values (due to transformation matrices often being simple translations, or 90-degree increment rotations). So I wanted to prevent floating point inaccuracies from separating two points from each other that were expected to be together.

It could well be that all of this was completely wrong-headed, and doesn't in fact do the job that it's intended to do when dealing with real-world parts.
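Sketched out (my paraphrase of the description above, not the actual LDView source), the key computation behaves like this:

```cpp
#include <cassert>

// Scale by 100, add the .005 offset, truncate: nearby floats that
// straddle a canonical 0.01 value collapse to the same key.
long vertexKey(double v) { return (long)(v * 100.0 + 0.005); }
```

It indeed maps 1.02999999 and 1.03000001 to the same key, and maps -1.03 to -102 rather than -103 - harmless for pure lookups, as noted.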
Thanks. My message below (which I was writing when you posted) is my explanation of how I think the apparently broken rounding of my keys is actually intentionally designed to hopefully avoid many instances of two keys getting pulled apart that should have been together. I'm not sure how well it works, though.
Travis Cobbs Wrote:The expectation here is that LDraw files start out with fixed point coordinates, and very often, the transformed coordinate will also be very close to fixed point values (due to transformation matrices often being simple translations, or 90-degree increment rotations). So I wanted to prevent floating point innacuracies from separating two points from each other that were expected to be together.

Ah - I think that the expectation is correct for untransformed locations but likely to be incorrect for transformed ones.

For the untransformed point what you've basically said is: users shouldn't use more than 0.001 LDU, so I'll set up a grid of that spacing, with the gridlines at the half-way points between the canonical values, then bucket. And that makes sense.

But once the transform _stack_ is applied, how do you know where the 'grid lines' in the theoretical 1/1000th LDU grid are? Ignoring scaling, I think that just rotations will hose you. The result of the rotations may be totally misaligned with the grid, and thus two points that were near an even milli-LDU (mLDU?? :-)) and would round together now span a 'split point' (the points half-way between the canonical values where rounding has to go one way or another).

In other words, I think for untransformed parts your code works but for transformed ones some small sub-set of points that should have been locked together will be pulled apart.

Very nice, I like the metal look. And environment mapping? Show-off :-)
Travis Cobbs Wrote:You could do something similar with your adjusted vertices. The std::map class uses a balanced binary tree, so has a lookup performance of O( log n ). Insertion is also O( log n ), but the balancing/heap allocation means that actual time to insert is longer than to do lookups. To be honest, I wouldn't know where to start to write a good hash function for a 3D vector to store them in a hash table.

This is the reason I do the unique tests first, because that way I only have to work with the single integer index values for the smoothing and T-Junction fix.

I'm also using pointer lists so insertions only have to move 4 bytes per vertex, instead of 12 or more. It also helps memory fragmentation during all the prep work. When all is done I generate a 'flat' GLfloat array for OpenGL and free all the lookup lists.

The smoothing itself could probably benefit from some additional hashed lookups. I was planning to write custom classes for those based on my base list classes. Part of my hobby with LDCad is writing all these support classes myself, so I'm hardly using std::vector etc.
Yeah - I grabbed a cube map of some sky + clouds off the web; since the lighting is already in shader, doing a cube map lookup is sort of a cheap gag. :-) The math for the reflection calcs (they're slightly different from regular normal mapping) can be found here:


In the long term I'm thinking: diffuse and specular lighting can be built entirely off of cube maps, by doing a pre-processing stage on an environment map. If you look at this:

[Image: 10_irradiance_02a.jpg]

The right two images can be used to replace lighting equations, and they are entirely derived from the left image. The advantages of this technique are:

- The lighting looks 'the same' between metal parts and non-metal parts, because the diffuse/specular lighting came from the reflection of the room.

- The lighting model is complex, soft, and diffuse. I think most of our experience viewing Lego bricks is indoors, so a fair amount of the light has bounced off of the ceiling or white walls, or come from a window (an 'area light' for our purposes), etc. So a cube map may get us to an accurate 'room' environment faster than creating a gajillion point GL lights.

While talking about smoothing you must also consider the quality of the smoothing.
You can have really GOOD smoothing at the cost of loading time, or skip smoothing entirely and get short timings.
In between, you can make various trade-offs between these parameters...

In SR3DBuilder the 4186.dat part loads in 3.5 seconds on a 2GHz Centrino 2 processor, and the quality is a good enough compromise.

I can give some useful hints about performance on that part:
- using indexed primitives reduces the count from about 330,000 vertices to about 110,000
- caching already optimized parts in memory allows loading the same part in less than 0.1 sec
- to average the normals at a vertex, I consider the angle between the current vertex normal already computed and the new one I'm going to test. If the angle between them is <45° then the new normal value will be averaged; otherwise a new vertex in the same position is added with the new normal value. This is not the best quality in some cases (part 32197), but is good enough in most of them
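The 45° rule in the last point can be sketched as a simple dot-product test on unit normals (an illustrative reconstruction, not SR3DBuilder's actual code):

```cpp
#include <cassert>
#include <cmath>

struct Normal { double x, y, z; };  // assumed unit length

double dot(const Normal& a, const Normal& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Average the incoming face normal into the running vertex normal when
// the angle between them is under the threshold; otherwise the caller
// adds a duplicate vertex carrying the new normal (a hard crease).
bool shouldAverage(const Normal& current, const Normal& incoming,
                   double maxAngleDeg = 45.0) {
    const double kPi = 3.14159265358979323846;
    return dot(current, incoming) > std::cos(maxAngleDeg * kPi / 180.0);
}
```

For unit normals the dot product is the cosine of the angle between them, so "angle < 45°" becomes "dot > cos 45° ≈ 0.707" with no trig per vertex.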
The method you describe is about the same as the first version I tried; it indeed makes a day and night difference compared to flat shading.

My current version also includes edges to prevent unwanted rounding (e.g. the 33° slope). The raw version of this took over 40 secs for the 48x48 baseplate, but after some optimizations I managed to get that down to about 4 secs. For 'normal' parts the loading is hardly noticeable (especially since I'm loading multiple parts at once in the background, so you can continue working using other parts etc)

I'm curious: what do you mean by caching in memory? Are you re-prepping parts for different colors or something, or do you mean disk caching in between program runs?
Caching in memory means that I read the part definition once from the HD, and do the calculation of normals, smoothing etc. once.
Then I store the worked part in memory, and every time I need it I read the worked part from the memory copy instead of doing all the work again.
Another memory saving approach could be to render the same part in all the instances it occurs in the model, giving it a different color and transformation matrix, but in this case you need to change the vertex buffer to render from at each part render, causing a performance reduction. Maybe using only one vertex buffer containing all required part definitions and playing with the starting index to render from gives better results, but you need to recompute normals for every part, so I'm not so sure it is fast enough...