Miscellaneous Coding Stuff
home

Order Invariant Linear Transforms:

Have you ever needed to find a linear transform of a set of data points when the sample data is not ordered? You could sort both the raw data and the target data, then do the standard least-squares linear fit, but there's an easy way which does not rely on the order of the data. In order to map “Raw” to “Target”, you just need to compute the average and the standard deviation of both sets of data, then use the following equations.

Gain = +- StDev(Target) / StDev(Raw)

Offset = Average(Target) – Gain*Average(Raw)

Note that the gain may have to be negated; you lose the sign of gain as standard deviations are always positive, but you lose that information when you sort both data sets too. Also note that if you are using Excel to compute the standard deviation, you should use STDEVP(), not STDEV().

A Simple Least Squared Error Formulation:

WIP

A* is a really nifty tool for path-finding. It is used quite a bit in games, but there are some uses for A* which I haven't seen implemented

• multi-agent scheduling (done it)

• state space is a std::bitset

• state space has inventory count information

• state space has for each operator a

• current job ID (integer)

• # ticks remaining till job is done (also an integer)

• the scheduler can assign a job to an idle individual

• or the scheduler can advance the tick count by 1 (“>wait[<] time passes...”)

• minimax-type path-finding for multiple “teams” (idea only, untested)

• instead of a single “g” cost, there is one for every team

• if it is team #0's turn, g_total = g – g – g...etc.

• Compression (untested)

• the trick is getting a good heuristic

• and a good representation of the work to date

• state could be a series of [compression type,#bytes]

Making A* Scalable:

You can also make A* very scalable (for an example of a huge problem domain: for my multi-agent-scheduling application checks 10.55 million nodes (that's the number of “next best node”s pulled off queue, there was 3.5 million nodes in the open/closed hash_map, and 0.46 million nodes left in the HOT queue), in 39 seconds on my Athlon 64 3000+).

• use a fast hash-map for the node list (both open and closed in a single map)

• makes changing a node from open to closed very quick

• I'm using a Google dense_hash_map type approach (power-of-2 size, easy to grow, requires a good hashing function)

• use a heap-on-top queue for finding the next lowest cost open node.

• Nodes are stored in “buckets”

• The buckets are stored in a std::set, so they automatically remain sorted (via the “split_point” value)

• I automatically break up a given “bucket” any time the number of nodes in it exceeds a limit.

• only the top-most bucket (with the most promising open nodes) is kept in a heap

• whenever the top bucket is depleted (or split), the new best bucket is converted to a heap

• popping out the most likely next node is extremely quick (as it's kept in a heap of a limited size)

• finding a node to update it's cost is also much faster, as you can jump straight to the bucket in log(N) time (as it is stored in a set), then you only have to search a limited number of nodes to find the one to update. When you update the node, you have to delete it and re-insert it into the whole mess, which takes care of updating its bucket, and whether that bucket is heap-ified or not.

The Generic Tree:

Especially in computer graphics rendering circles, there are a bunch of hierarchical bounding structures in use (KD, Octree, BIH, etc.), with many new ones continuing to appear. Many of these have a very similar fundamental base structure. Instead of re-writing a bunch of code for each tree type, I wrote some C++ template code for the base tree type.

• It keeps a std::vector of BoundingType nodes

• the BoundingType is one of the template parameters

• it can be an Axis-Aligned Bounding Box, or a Bounding Sphere, or whatever)

• and also a std::vector of “DataType”,

• which can be anything from 8 values and a bounding box used in a Signed Distance Field Octree,

• to a batch of triangles for your new raytracer.

• Splitting any give node into child nodes is the job of a functor, thus making the creation of different tree types trivial

• the BoundingType and the DataType and the new generic_tree all derive from a common collision_interface.

• This means you can nest trees of different types, and calling ray_intersect() on the parent will automatically work it's way down the tree(s).

• calling ray_intersect() on the tree

• starts at the root node, and checks its bounding volume for collision.

• If it collides the root node is added to a heap, storing the node ID and the hit point

• this hit point if by definition the most optimistic hit point that could happen for any of the primitives or other bounding volumes beneath this node

• the collision test in non-recursive...it just keeps popping the next closes bounding volume off the heap

• if the newly popped bounding volume is farther away then an already confirmed hit, we're done

• if this node is a parent, check each child node for intersection, adding them to the heap if the were hit

• if the node was a final leaf (i.e. Not a parent), then check for intersection on the DataType stored for that node