CulDeVu
http://djtaylor.me/
Sun, 09 Jul 2017 00:55:46 +0000

<h1>Tools for Photosensitive Epilepsy I: Flashing</h1>

<p>In 2009, engineers from the University of Wisconsin released a piece of software called <a href="http://trace.wisc.edu/peat/">PEAT</a>, which has since been used in medicine as a tool to diagnose patients with photosensitive epilepsy. This is a condition in which strobing lights cause seizures, and it currently has no cure. Triggers include lightning, flashing sirens, flickering lights, and other similar strobe effects. Other triggers that seem unrelated, but are also very cyclic in nature, include rapidly changing images, and even certain stationary stripe and checkerboard patterns. This particular form of epilepsy affects roughly 1 in every 10,000 people, and around 3.5 million people worldwide.</p>
<p>Despite these numbers, sufferers largely have to take precautions on their own. Nearly every movie contains lightning or camera flashes, and videos online are often far worse. With the exception of UK television broadcasts (which are regulated), there are no real preventative measures against epileptic events.</p>
<p>The next little mini-series of posts talks about my development of a sort of real-time epilepsy protector: a program that watches the current output of a computer monitor, detects when epileptic events may be possible, and blocks the offending region of the screen from further flashing. During development, all suggestions are very welcome, as I’m probably going to go about everything in the exact wrong way.</p>
<p>Without further ado, let’s jump right in.</p>
<h1 id="w3c-definition">W3C definition</h1>
<p>The W3C spec for Web Accessibility is the spec that PEAT is implemented from. It defines two ways that an epileptic event can occur: general flashes and red flashes. For the rest of this post I’ll only be dealing with general flashes, as I want to get the general case out of the way first.</p>
<p>So the <a href="https://www.w3.org/WAI/GL/WCAG20/#general-thresholddef">actual spec</a> is kinda confusing. It mixes units, defines thresholds for passing instead of failing, and tries to mix approximations and measurements. I’m going to take some creative liberties to fix these issues, adding new terms for clarity without losing the meaning of the actual spec.</p>
<p>Taking it all apart and extracting the actual meaning, an epileptic event can be triggered when:</p>
<def>
There are more than 3 “general flashes” in a 1 second interval
</def>
<p>A <strong>general flash</strong> is defined as:</p>
<def>
25% of the pixels in a 0.024 steradian area on the screen, relative to the viewer, all experience the same “frame flash” at the same time
</def>
<p>A <strong>frame flash</strong> is defined per pixel as what happens when:</p>
<def>
There is a decrease followed by an increase, or an increase followed by a decrease in the histogram of luminance for a single pixel. Each increase and decrease satisfy the following:
<ul>
<li>The minimum luminance in the histogram over the time-span of the increase or decrease is less than 0.8</li>
<li>The amount of increase or decrease of luminance in the histogram must be at least 10% of the maximum luminance in the “range” of the increase or decrease</li>
</ul>
</def>
<p>So an epileptic event can be triggered when many pixels agree that a flash happened at the same time. The W3C spec says “concurrently,” so I’m taking that to mean “the same frame,” even though differing by 1 frame probably isn’t discernible to the eye.</p>
<p>So you could create a function that takes only a histogram of luminance for a single pixel and get out the frame flashes for that pixel. In fact, all we need to know about the frame flashes of each pixel is when they occur, so that we can determine how many happen concurrently. This is the only thing the program stores about them.</p>
<p>There are other details in the W3C spec that I’m ignoring:</p>
<ul>
<li>Very “fine” transitions, like 1x1 px checkerboards alternating colors every frame, are acceptable, because the individual transitions are not discernible to the human eye. This is a complicated one to deal with, for many reasons. However, if you ignore it, you fall prey to detecting flashes in videos of light fog moving, where a lot of the transparency comes from stochastic opacity, or from grain filters. I might address this later.</li>
<li>It gives some approximations of the areas on the screen you should be looking at, for a viewing distance of 22-26 inches from a 15-17 inch screen at a resolution of 1024x768. This is far too specific to generalize to any screen, so I’m going to do all the calculations myself.</li>
</ul>
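<p>As a sketch of how those calculations might go (this is my own back-of-the-envelope math, not from the spec): for a small, roughly on-axis patch of screen, solid angle is approximately area divided by squared viewing distance, so a 0.024 steradian region can be converted to a patch size in pixels:</p>

```cpp
#include <cmath>

// Rough sketch, assuming a small on-axis patch: solid angle ~ area / d^2,
// so a 0.024 sr region at viewing distance d (meters) covers roughly
// A = 0.024 * d^2 square meters of screen. pixelPitch is the physical size
// of one pixel in meters (screen width divided by horizontal resolution).
int sideLengthInPixels(double viewingDistance, double pixelPitch)
{
    double area = 0.024 * viewingDistance * viewingDistance; // m^2
    double side = std::sqrt(area);                           // square patch, m
    return (int)(side / pixelPitch + 0.5);
}
```

<p>The small-angle approximation breaks down toward the edges of large screens, but it’s in the spirit of the spec’s own approximations.</p>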
<p>I’m also imposing my own restrictions on the definitions above. I have no idea if they’re correct, but they make the algorithms more approachable (or rather, they make the problem less ambiguous):</p>
<ul>
<li>The time-span of the increases and decreases of each frame flash cannot overlap between frame flashes in a single pixel</li>
<li>A general flash is counted if 25% <strong>or more</strong> of a 0.024 steradian viewing area experiences a common frame flash. The spec didn’t specify, but this is the only thing that makes sense.</li>
<li>I’m only going to be testing fixed areas of 0.024 steradians, instead of trying to find every area that satisfies the conditions in the spec.</li>
</ul>
<p>So, just as a review of everything, the general pipeline of figuring out if an epileptic event is possible in the past second:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>(Sample phase) Create histogram of luminance over the past second of every sampled pixel on the screen
(Frame Flash phase) For every pixel:
    Create a list of all frame flashes by looking at the peaks of their histogram
(Gather phase) For each frame:
    Try to find a 0.024 steradian area where 25% or more pixels experienced a frame flash on that frame
    If such an area is found, increment the general flash counter by 1
If the general flash counter is greater than 3, an epileptic event is possible
</code></pre>
</div>
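<p>The Sample phase needs a luminance value for every pixel. The WCAG spec defines relative luminance in terms of linearized sRGB channels; a sketch of that conversion:</p>

```cpp
#include <cmath>

// WCAG relative luminance from gamma-encoded sRGB channels in [0, 1].
// Each channel is linearized first, then weighted by its perceptual
// contribution. The constants come from the WCAG 2.0 definition.
double linearize(double c)
{
    return (c <= 0.03928) ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

double relativeLuminance(double r, double g, double b)
{
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}
```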
<p>Since epileptic events are possible when the general flash counter is greater than 3, protective measures are going to be triggered when the counter reaches 3. Also, I don’t just want to know whether an epileptic event is possible; what I want to do is block it out.</p>
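<p>A minimal sketch of that trigger logic, assuming some upstream code reports each general flash with a timestamp (the names here are my own):</p>

```cpp
#include <deque>

// Record the timestamp of each general flash, drop flashes older than one
// second, and report danger once 3 land inside the sliding window.
struct FlashWindow
{
    std::deque<double> times; // timestamps in seconds, oldest first

    // returns true if protective measures should kick in
    bool addGeneralFlash(double t)
    {
        times.push_back(t);
        while (!times.empty() && times.front() < t - 1.0)
            times.pop_front();
        return times.size() >= 3;
    }
};
```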
<p>In order to break everything up into multiple parts, I’ll tackle each stage of the algorithm in order of what needs clarification the most. I’ll cover the Frame Flash Detection phase here, and the Sample and Gather phases in another post.</p>
<h1 id="frame-flash-detection">Frame Flash Detection</h1>
<p>So what we need to do is, given a histogram of luminance for a single pixel over the course of 1 second, find every frame that contains a frame flash. The big problem is that, most of the time, flashes happen over the course of several frames. So we need a way to not only find the flashes but also associate each entire flash with a single frame number.</p>
<p>Recall the definition of a frame flash, re-written here:</p>
<def>
There is a decrease followed by an increase, or an increase followed by a decrease in the histogram of luminance for a single pixel. Each increase and decrease satisfy the following:
<ul>
<li>The minimum luminance in the histogram over the time-span of the increase or decrease is less than 0.8</li>
<li>The amount of increase or decrease of luminance in the histogram must be at least 10% of the maximum luminance in the “range” of the increase or decrease</li>
</ul>
</def>
<p>This definition took me 2 or 3 different tries to implement correctly, and I may still be wrong. I blame how cryptic the W3C spec is about this.</p>
<p>So this is going to be implemented exactly as it’s stated. We are going to keep a record of all of the increases and decreases that satisfy the above constraints, and then match up pairs of adjacent increases and decreases. As an example, consider this histogram of luminance:</p>
<p><img src="http://djtaylor.me/assets/2016-03-31-nopelepsy-01/graph.png" style="width:49%" />
<img src="http://djtaylor.me/assets/2016-03-31-nopelepsy-01/grouping.png" style="width:49%" /></p>
<p>In order to find increases and decreases, we’re going to look at every local extremum, compare it to the last recorded sample, and see if the difference fulfills the above constraints. If it does, we record the result and use that extremum as the sample to compare future samples against. The code goes something like this:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>int extrema[NUM_SAMPLES];
transition pairs[NUM_SAMPLES];
int numPairs = 0;

float lastExtrema = lumHistogram[(sampleCounter + 1) % NUM_SAMPLES];
for (int i = 1; i < NUM_SAMPLES - 1; ++i)
{
    // indices of the previous, current, and next samples in the ring buffer
    int prevI = ...;
    int currI = ...;
    int postI = ...;

    float x_prev = lumHistogram[prevI];
    float x_curr = lumHistogram[currI];
    float x_post = lumHistogram[postI];

    float dx1 = x_curr - x_prev;
    float dx2 = x_curr - x_post;

    // test for local extrema
    if ((dx1 > 0) == (dx2 > 0) && (dx1 != 0) && (dx2 != 0))
    {
        float dx = x_curr - lastExtrema;

        // transition too small to count (the 10% threshold)
        if (-0.1 <= dx && dx <= 0.1)
            continue;
        // both endpoints brighter than 0.8: not a valid transition
        if (lastExtrema > 0.8 && x_curr > 0.8)
            continue;

        lastExtrema = x_curr;

        if (dx1 < 0)
            pairs[numPairs] = DECREASE;
        else
            pairs[numPairs] = INCREASE;
        extrema[numPairs] = i;
        ++numPairs;
    }
}
</code></pre>
</div>
<p>The <code class="highlighter-rouge">extrema[numPairs] = i;</code> part is what stores the “frame” of each transition. When it comes time to look for adjacent increases and decreases, you can just pick one of the two “frames” and call that the exact frame of the flash, taking care of that particular problem.</p>
<p>Now that we have the frame flashes, we need to store them in a way that makes the info easy for the next stage, the Gather stage, to use. I’m going to cover that next time, but the general way I’m doing it is by making a giant array of distributions, with each distribution corresponding to a single Gather stage chunk. The relevant part of the array is passed to the function so that each sampled pixel can contribute to it. This makes the Gather stage really simple.</p>
<p>The nice thing about this method of finding frame flashes is that it doesn’t matter if you traverse backwards or forwards through the histogram; as long as you’re consistent then it’ll work well.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Next time I’ll go over how the Gather and Sample stages work. Until then, I’ll be continuously working on the project. There are still some to-do’s, even with just the Frame Flash phase. Some of these include:</p>
<ul>
<li>Multithreading each sampled pixel, or maybe porting to the GPU. I don’t want to port to CUDA because that technology isn’t very universal</li>
<li>Analyzing the pixels adjacent to the current pixel, so as to maybe combat things like grain filters and clouds triggering the detector.</li>
<li>More stability improvements. The program doesn’t pick up on a lot of really obvious cases, especially if they happen in small areas, like epilepsy gifs on Google. I feel like if I could sample at a finer granularity, that problem would go away. This ties back into multithreading, because the extra throughput would make that finer granularity feasible to process.</li>
<li>For the whole program, minimizing to tray and some other polishing touches. Hopefully these will get resolved in some way or another sometime soon. The current project will be hosted as soon as I’m confident that Nopelepsy v0.1.0 is okay to show the world.</li>
</ul>
<p>Thanks for reading, and stay tuned for next time!</p>
Thu, 31 Mar 2016 00:00:00 +0000
http://djtaylor.me/blog/2016/nopelepsy-01/
Tags: epilepsy, math

<h1>Microfacet Importance Sampling: For Dummies</h1>

<p>Despite how important a topic good importance sampling is in the area of global illumination, it’s usually left out of all common-English conversations and tutorials about path tracing (with the exception of the Lambertian case, which is a good introduction to importance sampling). If you want to get into microfacet importance sampling, multiple importance sampling, or even, God forbid, Metropolis light transport, you’re on your own. And, probably for a good reason. At some point in a path tracing project, you WILL have to butt heads with the math. And believe me, the head that the math rears is ugly. Beautiful in design, but large and ugly all the same.</p>
<p>I’m going to try to correct that, at least for microfacet importance sampling. I’ll present just enough math to get across what <em>needs</em> to be understood to implement everything correctly, with minimal pain.</p>
<h1 id="choosing-the-microfacet-terms">Choosing the Microfacet Terms</h1>
<p>The Cook-Torrance Microfacet BRDF for specular reflections comes in the form</p>
<script type="math/tex; mode=display">f_r(\textbf{i}, \textbf{o}, \textbf{n}) = \frac{
F(\textbf{i}, \textbf{h}) G(\textbf{i}, \textbf{o}, \textbf{h}) D(\textbf{h})
}{ 4 \vert \textbf{i} \cdot \textbf{n} \vert \vert \textbf{o} \cdot \textbf{n} \vert }</script>
<p>where <script type="math/tex">F</script> is the Fresnel term, <script type="math/tex">D</script> is the microfacet distribution term, and <script type="math/tex">G</script> is the microfacet shadowing term.</p>
<p>This is great and all, as it allows you to swap different terms in and out as much as you want. Realtime graphics use that property to simplify a BRDF into something pretty but efficient, by mixing and matching models with terms that cancel out.</p>
<p>There is danger in this, though. Choosing terms purely because they are less complex, and therefore more efficient, can lead to completely wrong results. As a case in point, let me show the difference between the implicit Smith G and the Beckmann Smith G. For reference:</p>
<script type="math/tex; mode=display">% <![CDATA[
G_{Smith}(\textbf{i}, \textbf{o}, \textbf{h}) = G_1(\textbf{i}, \textbf{h}) G_1(\textbf{o}, \textbf{h}) \\~\\
G_{1, Implicit}(\textbf{v}, \textbf{h}) = \vert \textbf{v} \cdot \textbf{n} \vert \\~\\
G_{1, Beckmann}(\textbf{v}, \textbf{h}) \approx
\begin{cases}
\frac{3.535 a + 2.181 a^2}{1 + 2.276 a + 2.577 a^2} & \text{if } a < 1.6 \\
1 & \text{otherwise}
\end{cases} \\
\text{with } a = ( \alpha_b \tan \theta_v)^{-1} %]]></script>
<p>For future reference, the Beckmann <script type="math/tex">G_1</script> listed is an approximation to the actual Beckmann shadowing term. The <script type="math/tex">\alpha_b</script> is a tunable parameter that controls the roughness of the material. I usually just set that term to the Cook-Torrance roughness value.</p>
<p>These are the results, after importance sampling for 500 samples:</p>
<p><img src="http://djtaylor.me/assets/2016-02-02-microfacet-dummies/implicitG.png" style="width:49%" />
<img src="http://djtaylor.me/assets/2016-02-02-microfacet-dummies/beckmannG.png" style="width:49%" /></p>
<table style="width:100%">
<tr>
<td style="width:50%; padding:5px"> Left: Implicit G<sub>1</sub>, roughness 0.05, 500 samples of importance sampling </td>
<td style="width:50%; padding:5px"> Right: Beckmann G<sub>1</sub>, roughness 0.05, 500 samples of importance sampling</td>
</tr>
</table>
<p><br /></p>
<p>As you can see, the Implicit G suffers greatly for its simplicity. While it calculates color noticeably faster, it does so by ignoring the cases that need extra work to represent, like really low roughness values. It approximates the mid-range fairly well, but in doing so butchers the tails. So take this example to heart: for a path tracer, don’t be afraid to sacrifice a little speed for correct color, especially since people who use path tracers are willing to wait for good images.</p>
<p>So. What terms should be chosen? Well, my personal favorite <script type="math/tex">D</script> and <script type="math/tex">G</script> are the Beckmann ones, so this post is going to be based off of those. A more complete description of the <em>whys</em> of all of this can be found <a href="https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf">here</a>, but as titled, this is the “For Dummies” version, and re-stating everything found there would be a massive waste of time.</p>
<p>The Fresnel term is taken from Cook-Torrance:</p>
<script type="math/tex; mode=display">\eta = \frac{ 1 + \sqrt{F_0} }{ 1 - \sqrt{F_0} } \\~\\
c = \textbf{v} \cdot \textbf{h} \\~\\
g = \sqrt{ \eta^2 + c^2 - 1 } \\~\\
F_{Cook-Torrance}(\mathbf{v}, \mathbf{h}) = \frac{1}{2} \left( \frac{g - c}{g + c} \right)^2 \left( 1 + \left( \frac{ (g + c)c - 1 }{ (g - c)c+ 1 } \right)^2 \right)</script>
<p>The Distribution term is from Beckmann:</p>
<script type="math/tex; mode=display">D(m) = \frac{ \chi^+ (\textbf{m} \cdot \textbf{n}) }
{ \pi \alpha^2_b \cos^4 \theta_m }
e^{ \frac{ -\tan^2 \theta_m }{ \alpha^2_b } }</script>
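<p>As code, a sketch of that distribution, taking <script type="math/tex">\cos \theta_m = \textbf{m} \cdot \textbf{n}</script> directly and folding the <script type="math/tex">\chi^+</script> term in:</p>

```cpp
#include <cmath>

// Beckmann microfacet distribution, from the equation above.
// cosThetaM = dot(m, n); alphaB is the Beckmann roughness.
double beckmannD(double cosThetaM, double alphaB)
{
    if (cosThetaM <= 0.0)
        return 0.0; // the chi+ term: back-facing microfacets contribute nothing

    double c2 = cosThetaM * cosThetaM;
    double tan2 = (1.0 - c2) / c2; // tan^2(theta_m)
    double c4 = c2 * c2;
    const double PI = 3.14159265358979323846;
    return std::exp(-tan2 / (alphaB * alphaB)) / (PI * alphaB * alphaB * c4);
}
```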
<p>The Geometry term is a little trickier. It involves the <script type="math/tex">\text{erf}(x)</script> function, which can be very expensive to calculate. There’s an approximation of it, though, that’s very good, shown again here:</p>
<script type="math/tex; mode=display">% <![CDATA[
G_{Smith}(\textbf{i}, \textbf{o}, \textbf{h}) = G_1(\textbf{i}, \textbf{h}) G_1(\textbf{o}, \textbf{h}) \\~\\
G_1(\textbf{v}, \textbf{m}) \approx
\chi^+ ( \frac{ \textbf{v} \cdot \textbf{m} }{ \textbf{v} \cdot \textbf{n} } )
\begin{cases}
\frac{3.535 a + 2.181 a^2}{1 + 2.276 a + 2.577 a^2} & \text{if } a < 1.6 \\
1 & \text{otherwise}
\end{cases} \\
\text{with } a = ( \alpha_b \tan \theta_v)^{-1} %]]></script>
<p>The only things left to clarify are the <script type="math/tex">\alpha_b</script> term, which is just the roughness of the material, and the <script type="math/tex">\chi^+</script> function, which enforces positivity:</p>
<script type="math/tex; mode=display">% <![CDATA[
\chi^+(a) =
\begin{cases}
1, & a > 0 \\
0, & otherwise
\end{cases} %]]></script>
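<p>Putting those pieces together, here’s a sketch of the <script type="math/tex">G_1</script> rational approximation (the <script type="math/tex">\chi^+</script> sidedness factor is left to the caller):</p>

```cpp
#include <algorithm>
#include <cmath>

// Rational approximation to the Beckmann Smith G1, from the equations above.
// cosThetaV = dot(v, n); alphaB is the Beckmann roughness.
double beckmannG1(double cosThetaV, double alphaB)
{
    double sinTheta = std::sqrt(std::max(0.0, 1.0 - cosThetaV * cosThetaV));
    double tanTheta = sinTheta / cosThetaV;
    double a = 1.0 / (alphaB * tanTheta); // at normal incidence a -> infinity

    if (a >= 1.6)
        return 1.0; // no self-shadowing in this regime
    return (3.535 * a + 2.181 * a * a) / (1.0 + 2.276 * a + 2.577 * a * a);
}
```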
<h1 id="importance-sampling">Importance Sampling</h1>
<p>Importance sampling the microfacet model shown above involves a lot of integrals. So many that I’ll probably end up getting LaTex-pinky from typing all the brackets. So I’m just going to give the short version.</p>
<p>Because it’d be damn near impossible to try to importance sample the entire microfacet model, with all of its terms together, the strategy is to importance sample the distribution term to get a microfacet normal, and then transform all vectors from being relative to the normal to being relative to the origin.</p>
<p>Given 2 uniform random variables, <script type="math/tex">\zeta_1 \text{ and } \zeta_2</script>, generate a microfacet normal with the angles (relative to the surface normal):</p>
<script type="math/tex; mode=display">\theta_m = \arctan \sqrt{- \alpha^2_b \log(1 - \zeta_1) } \\
\phi_m = 2 \pi \zeta_2</script>
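<p>Those two formulas translate almost directly into code; a sketch (the struct name is my own):</p>

```cpp
#include <cmath>

struct MicrofacetAngles { double thetaM, phiM; };

// Map two uniform random numbers in [0, 1) to the spherical angles of a
// Beckmann-distributed microfacet normal, relative to the surface normal.
MicrofacetAngles sampleBeckmann(double zeta1, double zeta2, double alphaB)
{
    MicrofacetAngles a;
    a.thetaM = std::atan(std::sqrt(-alphaB * alphaB * std::log(1.0 - zeta1)));
    a.phiM = 2.0 * 3.14159265358979323846 * zeta2;
    return a;
}
```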
<p>Note how the <script type="math/tex">\phi_m</script> term can spin freely around the normal; this is the power that sampling only the <script type="math/tex">D</script> function has. It follows, then, that by reversing the age-old vector-reflectance equation, you get the outbound (or inbound, or whatever) light ray:</p>
<script type="math/tex; mode=display">\textbf{o} = 2 \vert \textbf{i} \cdot \textbf{m} \vert \textbf{m} - \textbf{i}</script>
<p>Now all that’s left to do, with the exception of a few caveats that I will get to in a minute, is to define the probability distribution function for this setup:</p>
<script type="math/tex; mode=display">p_o(\textbf{o}) = \frac{ D(\textbf{m}) \vert \textbf{m} \cdot \textbf{n} \vert }
{4 \vert \textbf{o} \cdot \textbf{m} \vert }</script>
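<p>A sketch of those last two equations, with a bare-bones vector type (the names are my own; <code class="highlighter-rouge">D</code> is the Beckmann distribution value already evaluated at the sampled <script type="math/tex">\textbf{m}</script>):</p>

```cpp
#include <cmath>

struct vec3 { double x, y, z; };

double dot(const vec3& a, const vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// o = 2 |i . m| m - i : reflect the incoming ray about the microfacet normal.
vec3 reflectAbout(const vec3& i, const vec3& m)
{
    double d = 2.0 * std::fabs(dot(i, m));
    return { d * m.x - i.x, d * m.y - i.y, d * m.z - i.z };
}

// p_o(o) = D(m) |m . n| / (4 |o . m|), straight from the pdf above.
double pdfOutbound(double D, const vec3& m, const vec3& n, const vec3& o)
{
    return D * std::fabs(dot(m, n)) / (4.0 * std::fabs(dot(o, m)));
}
```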
<p>There! That’s all that’s needed! If all you want to do is render really shiny metals, at least. However, there’s some unfinished business we have to attend to concerning what to do with the un-reflected light, the amount that the Fresnel term takes away. First, though, let’s look at some images.</p>
<h1 id="intermediate-results">Intermediate Results</h1>
<p><img src="http://djtaylor.me/assets/2016-02-02-microfacet-dummies/nearMirror.png" alt="Purely specular floor" />
<br />
<em>[Above: Sponza atrium with a floor with Cook-Torrance roughness of 0.001, 500 samples per pixel]</em></p>
<p>What’s impressive about the above image is that it isn’t a mirror BRDF. It’s the microfacet BRDF described above, importance sampled. This amount of image clarity for a floor with a roughness that small, and at only 500 samples, is something that previously would only have been achieved with a special case in the code for materials like that. However, with importance sampling, you get it all for free!</p>
<h1 id="extending-it-with-lambertian-diffuse">Extending it With Lambertian Diffuse</h1>
<p>The Cook-Torrance BRDF is really nice, but it’s only a specular BRDF after all. We need to fuse it with a Lambertian BRDF to finish everything up nicely. The form the BRDF takes after the fusion is:</p>
<script type="math/tex; mode=display">f_r(\textbf{i}, \textbf{o}, \textbf{n}) =
f_d * (1 - F(\textbf{i}, \textbf{h})) +
\frac{
F(\textbf{i}, \textbf{h}) G(\textbf{i}, \textbf{o}, \textbf{h}) D(\textbf{h})
}{ 4 \vert \textbf{i} \cdot \textbf{n} \vert \vert \textbf{o} \cdot \textbf{n} \vert }</script>
<p>This makes a lot of sense too. A Fresnel term describes the ratio between reflected and transmitted signals, and a diffuse reflection is just a really <a href="http://djtaylor.me/assets/2016-02-02-microfacet-dummies/lambert.png">short-range refraction</a>.</p>
<p>Now there’s an issue, and it’s a big one. The density function generated above describes the distribution of outbound rays based off of the <script type="math/tex">D</script> microfacet distribution. To convert between a microfacet distribution and an outbound ray distribution, it undergoes a change of variables via a Jacobian. That Jacobian is analytically derived from the microfacet BRDF definition itself, so to integrate the diffuse term into it (a term which has no microfacets but still changes the shape of the BRDF drastically) it’d have to be re-derived.</p>
<p>And this is a recurring problem. There are papers describing every way to couple this with that. Some of them are about coupling matte diffuse with microfacet specular, some describe how to couple transmissive with specular, some specular with specular. And at the end of the day, these are all specialized cases of different material properties coexisting on a surface. Until it’s taken care of, the problem of mixing and matching BRDFs will never go away. I don’t know how other path tracers deal with this issue, but I won’t stand for it.</p>
<p>For my path tracer, I chose to go the way that <a href="https://www.solidangle.com/arnold/">Arnold</a> does, by using the idea of infinitesimal layers of materials. Each layer of the material can, by using the Fresnel function we already have, reflect an amount of light, and then transmit the rest down to the next layer down. This will happen for every layer down until it hits the final layer: either a diffuse or transmission layer, which will use the rest of the remaining energy. Thus, the problem of coupling BRDFs was solved by thinking about the problem a little differently.</p>
<p>If you think about it, the above algorithm can be simplified using Russian Roulette, and so the final algorithm becomes something like this:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>Generate random number r1
for each layer in material:
    determine the new half-vector (or microfacet normal)
    f = Fresnel function using that half-vector
    if (r1 > f):
        r1 -= f
        continue
    calculate outbound ray
    generate BRDF for current layer and inbound and outbound rays
    generate pdf for outbound ray
    sample and accumulate as normal
</code></pre>
</div>
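<p>The layer-selection part of that loop can be sketched as plain control flow (a sketch under the assumption that each layer’s Fresnel reflectance has already been evaluated for its half-vector; names are my own):</p>

```cpp
#include <vector>

// Russian Roulette over material layers: each layer reflects with
// probability equal to its Fresnel reflectance, otherwise passes the
// remaining energy down. The last (diffuse or transmissive) layer takes
// whatever is left. Returns the index of the chosen layer.
int pickLayer(std::vector<double> fresnels, double r1)
{
    for (int layer = 0; layer < (int)fresnels.size() - 1; ++layer)
    {
        if (r1 < fresnels[layer])
            return layer;            // this layer reflects the sample
        r1 -= fresnels[layer];       // transmit down to the next layer
    }
    return (int)fresnels.size() - 1; // the base layer uses the rest
}
```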
<p>The only thing left to do is get rid of the Fresnel term from the microfacet BRDF and the <script type="math/tex">(1 - F)</script> from in front of the Lambertian term: they won’t be of any more use to us right there, as that calculation is handled implicitly now. The Fresnel function now serves as a Russian Roulette mechanism to switch between layers, and is no longer needed to scale the energy of the microfacet BRDF, as the <script type="math/tex">D</script> and <script type="math/tex">G</script> terms are perfectly capable of doing exactly that on their own.</p>
<p>One more pic to show it in action:</p>
<p><img src="http://djtaylor.me/assets/2016-02-02-microfacet-dummies/diffAndSpec.png" alt="Coupled diffuse and specular" />
<br />
<em>[Above: Floor with layered specular and diffuse. 500 samples per pixel]</em></p>
<p>That concludes my write-up of Microfacet Importance Sampling: For Dummies edition. With luck, hopefully implementing it will be as enjoyable for you as it was for me. Happy rendering, and may your dividends be forever non-zero.</p>
Tue, 02 Feb 2016 00:00:00 +0000
http://djtaylor.me/blog/2016/microfacet-dummies/
Tags: global-illumination, math

<h1>Mathematics of Origami: Flat Folding</h1>

<p>So. Back at this.</p>
<p>First things first. You can safely disregard most everything I said last post. All of that was very observation-based, and is actually incorrect in almost every way. Which might seem strange, because most of it made sense. For clarity and completeness, though, here’s a list of why everything last post was wrong:</p>
<ol>
<li>The idea that a fold is a collection of fold segments in order starts falling apart after you do certain folds, like petal folds. Keeping track of where they are and which way they’re going would be way too hard, and you could conceivably connect everything under one fold, since all origami crease diagrams are Hamiltonian.</li>
<li>The idea that folds only reflect is another screwy one. It falls apart when you have more than 4 fold segments touching a vertex</li>
<li>The fold tree was a nice idea, but constructions using it can only create a subset of all origami, so it’s not what I’m looking for</li>
</ol>
<p>Notable things that I’m keeping un-debunked:</p>
<ol>
<li>Fold segments are straight lines that represent part of a fold. They come in 2 flavors: mountain and valley. The ends of the segments either end at an edge of the paper or at the end of another fold segment. These places of union are called vertexes and the graph of all fold segments on the paper, including the edges of the paper, is called a fold diagram</li>
<li>All flat folded origami is non-directed and complete</li>
<li>If you sum up the number of mountain and valley fold segments coming out of any vertex, they will always differ by two. I don’t really know why it’s like that, but it definitely is. (This turns out to be known as Maekawa’s theorem.)</li>
</ol>
<p>Aight. I think that’s enough talking about the past. Math time.</p>
<h1 id="flat-folding">Flat Folding</h1>
<p>First of all: Flat folding. In a physical sense, this is a type of origami that can be folded flat, pressed between the pages of a book. So, for example, a paper crane without the wings.</p>
<p>So if you make a flat folded origami, and you take a pin and poke a hole completely through it, through every layer, and open it up, you’ll see that there are multiple holes in the paper. In essence, many points on the origami paper map to the same point in 3D space. These points also map perfectly to reflections along the axes of the creases themselves. Here’s a picture to illustrate:</p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/illu1.jpg" alt="" />
<br />
<em>[in this image, the red points all map to the same point in 3D space, and the green points all map to the same point as well]</em></p>
<p>Though that was probably pretty obvious. However, I discovered a couple facts that I think are not so obvious. The first is that, for flat folded origami, when you reflect a point around creases one by one, like in the above image, if the path you take ever ends up back in the face that you started with, you’ll have the exact same point you started with. In other words, no matter what face you’re viewing from, the original point you start out with will always be equal to itself, relative to its starting face. Second is that this property is not present in all fold diagrams. See the images below for further explanation.</p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/illu2.jpg" alt="" />
<br />
<em>[as you can see in this image, even though the point can take two very different paths back on this fold diagram, you will still always end up at the same point again]</em></p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/illu3.jpg" alt="" />
<br />
<em>[in this example, however, the path that the point took back to the original face did not correspond to the same starting point. Likewise, you cannot possibly fold a piece of paper along these lines]</em></p>
<p>Well, if only certain fold diagrams have this property, how do we tell which ones do? How do we prove correctness? It’s hard to tell just by looking at all the folds without physically folding it yourself, so it’s useful to look at a really simple case: paths going clockwise around a point where fold segments intersect. Because reflections around multiple edges rooted at the same point preserve radial distance [see image], you only have to keep track of the angles of the fold segments and the angle of v. In this case, the transformation around each fold segment, in order, should result in the same point, i.e. the same angle plus 2 * pi.</p>
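<p>To make the bookkeeping concrete, here’s a sketch of the transformation used below: reflecting an angle <script type="math/tex">\theta</script> about a fold segment at angle <script type="math/tex">a</script> gives <script type="math/tex">2a - \theta</script>, and rotating a point all the way around a vertex just composes one such reflection per fold segment:</p>

```cpp
#include <vector>

// Reflect the angle theta about a fold segment at angle a (both measured
// around the vertex): T(a, theta) = 2a - theta.
double T(double a, double theta) { return 2.0 * a - theta; }

// Compose one reflection per fold segment, in order around the vertex.
// For a flat-foldable vertex this should return theta + 2*pi.
double rotateAround(const std::vector<double>& edgeAngles, double theta)
{
    for (double a : edgeAngles)
        theta = T(a, theta);
    return theta;
}
```

<p>For example, four fold segments at right angles (0, pi/2, pi, 3pi/2) bring any starting angle back to itself plus 2 * pi, exactly as the derivation below predicts.</p>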
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/illu4.jpg" alt="" /></p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/illu5.jpg" alt="" /></p>
<p>From that:</p>
<script type="math/tex; mode=display">T(e_n, T(e_{n-1}, T(..., T(e_2, T(e_1, pt))...))) = pt</script>
<p>Converting edges to angle representation:</p>
<p><script type="math/tex">= T(a_n, T(a_{n-1}, T(..., T(a_2, T(a_1, a_1 - \epsilon))...))) = a_1 - \epsilon + 2 \pi</script>
<script type="math/tex">= 2 a_n - (2 a_{n-1} - 2(a_{n-2} - ... (2 a_2 - 2(a_1 - (a_1 - \epsilon))) ...)) = a_1 - \epsilon + 2 \pi</script></p>
<p>For even n:</p>
<p><script type="math/tex">2 (a_n - a_{n-1} + a_{n-2} - ... + a_2 - a_1) + a_1 - \epsilon = a_1 - \epsilon + 2 \pi</script>
<script type="math/tex">= a_n - a_{n - 1} + a_{n-2} - ... + a_2 - a_1 = \pi</script></p>
<p>For odd n:</p>
<p><script type="math/tex">2 (a_n - a_{n-1} + a_{n-2} - ... - a_2 + a_1) - a_1 + \epsilon = a_1 - \epsilon + 2 \pi</script>
<script type="math/tex">= a_n - a_{n - 1} + a_{n-2} - ... - a_2 = - \epsilon + \pi</script></p>
<p>The case of n being odd can be thrown out because, as you can see, it’s only true for specific angles (specific offsets epsilon from <script type="math/tex">a_1</script>), but the definition above states that it should be true for EVERY angle around the vertex.</p>
<p>(as a side note, this is a variant of something called Kawasaki’s Theorem. I proved all of these things myself before figuring out that these things were already known. Oh well.)</p>
<p>This is a very nice equation. It allows us to solve for any missing angle if one is needed, and also acts as a really simple validator. You can check each vertex one by one with this equation to check the simple case of correctness-by-rotating-around. However, that is a very simple case. Paths can take any form, but they all need to exhibit the property listed above, not just paths spinning around vertexes.</p>
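<p>That validator is only a few lines of code; a sketch, assuming the vertex’s edge angles are already sorted:</p>

```cpp
#include <cmath>
#include <vector>

// Check the per-vertex condition derived above: with edge angles sorted
// ascending, the alternating sum a_n - a_{n-1} + ... + a_2 - a_1 must equal
// pi, and n must be even (the odd case was thrown out).
bool kawasakiHolds(const std::vector<double>& sortedAngles, double eps = 1e-9)
{
    size_t n = sortedAngles.size();
    if (n % 2 != 0)
        return false;

    double alternatingSum = 0.0;
    for (size_t i = 0; i < n; ++i)
        alternatingSum += (i % 2 == 0 ? -1.0 : 1.0) * sortedAngles[i];

    return std::fabs(alternatingSum - 3.14159265358979323846) < eps;
}
```

<p>For instance, a vertex with fold segments at 0, pi/2, pi, and 3pi/2 passes, while one with segments at 0, 0.5, pi, and 3.5 does not.</p>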
<p>However, every vertex is connected to the others by the very fold segments that we’re rotating around. When you transfer a pt from one vertex to another by just following the fold segment connecting the two, pt doesn’t change (pt stays in the same face; only the origin changes). Since all fold diagrams are complete anyway, you can connect any vertex to any other by following the path from one to the next, rotating around each vertex as you go. Thus, it can be proven that a fold diagram can be flat folded by simply checking each vertex against the above theorem.</p>
<p>As a caveat, this doesn’t ensure that there are no self-intersections. I don’t think there is a way to both achieve interactive framerates and also ensure no self-intersection, but I’ll get back to you on that.</p>
<p>Yup. So that’s that.</p>
<h1 id="implementation">Implementation</h1>
<p>So the implementation of this is fairly easy.</p>
<p>It’s tempting to try to implement a flat folded origami interpreter by trying to represent each fold segment. However, if you go down that path, it leads to lots of trees connecting faces (which you have to build from the edges) and a lot of searching, and it’s just not fun. Believe me, that was what my first implementation was like.</p>
<p>The way I’m doing everything now is with a soup of vertexes, each holding a vector of the angles, sorted counter-clockwise, of all the edges connecting it to other vertexes. Finding the other side of a fold segment given one end requires a raycast along the fold segment’s angle relative to the vertex. This can be accelerated with a spatial structure, or, since the origami changes only rarely in our case, we can just spend the time to bake the connections into each vertex and then go about our merry way.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>struct vertex
{
    dvec2 pt;
    vector<double> angles; // sorted counter-clockwise over [0, 2*PI)
    vector<int> bakedConnections; // angles[i] leads to verts[bakedConnections[i]]
    int angleToTraverseTree = 0;

    vertex(dvec2 p)
    {
        pt = p;
    }

    void sortAngles()
    {
        std::sort(angles.begin(), angles.end());
    }

    void removeDuplicates()
    {
        // get rid of duplicates. im not worried about performance here
        for (int a = 1; a < angles.size(); ++a)
        {
            double diff = abs(angles[a] - angles[a - 1]);
            if (diff < epsilon)
            {
                angles.erase(angles.begin() + a);
                --a;
            }
        }

        // catch duplicates that wrap around 0-2pi: the first and last
        // angles are the same crease if they differ by almost exactly 2*PI
        double diff = abs(angles[angles.size() - 1] - angles[0]);
        if (abs(diff - 2 * PI) < epsilon)
        {
            angles.erase(angles.begin() + angles.size() - 1);
        }
    }
};
vector<vertex> verts;
</code></pre>
</div>
<p>Besides the manual labor of adding, and sometimes iteratively adding, angles to vertexes, there’s only one more thing that’s critical to the implementation: taking this structure and actually folding the paper. To do this you first plot out the route any individual point will take across vertexes (because that only allows rotations and trading of origins, which preserves the origami property). If you do this and follow each point in the piece of paper to the same vertex and the same face, you’ll have the completed origami.</p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/fold_tree.jpg" alt="Fold tree" /></p>
<div class="highlighter-rouge"><pre class="highlight"><code>void creaseTraversalTree()
{
    vector<int> processed;
    vector<int> leftToProcess;

    for (vertex& v : verts)
        v.angleToTraverseTree = 0;
    for (int i = 1; i < verts.size(); ++i)
        leftToProcess.push_back(i);
    processed.push_back(0); // vertex 0 is the root of the traversal tree

    while (leftToProcess.size() != 0)
    {
        // find an unprocessed vertex with an edge leading into the tree
        vector<int>::iterator leftIter;
        for (leftIter = leftToProcess.begin(); leftIter != leftToProcess.end(); ++leftIter)
        {
            int vId = *leftIter;
            bool found = false;
            for (int angleId = 0; angleId < verts[vId].angles.size(); ++angleId)
            {
                int closest = getClosestVertexAlongDirection(verts[vId].pt, verts[vId].angles[angleId]);
                if (closest == -1)
                    continue;
                vector<int>::iterator iter = std::find(leftToProcess.begin(), leftToProcess.end(), closest);
                if (iter == leftToProcess.end())
                {
                    // closest is already in the tree, so hang this vertex off of it
                    verts[vId].angleToTraverseTree = angleId;
                    processed.push_back(vId);
                    leftToProcess.erase(leftIter);
                    found = true;
                    break;
                }
            }
            if (found)
                break;
        }
    }
}
</code></pre>
</div>
<div class="highlighter-rouge"><pre class="highlight"><code>dvec2 fold(dvec2 ptOriginal)
{
    dvec2 pt = ptOriginal;
    int vId = getAnyVertexInFace(pt);
    double angle, rad;
    int angleAt = getBiggestAngleLessThan(vId, pt);

    while (1)
    {
        // express pt in polar coordinates relative to the current vertex
        {
            dvec2 ptRelV = pt - verts[vId].pt;
            angle = atan2(ptRelV.y, ptRelV.x);
            if (angle < 0)
                angle += 2 * PI;
            rad = length(ptRelV);
        }

        vertex& curV = verts[vId];

        // reflect pt about each crease between angleAt and the traversal edge
        int shiftSign = (curV.angleToTraverseTree > angleAt) ? 1 : -1;
        int n = (int)curV.angles.size();
        for (int i = angleAt; i != curV.angleToTraverseTree; i += shiftSign)
        {
            int correctedI = ((i % n) + n) % n; // wrap, safe for negative i
            angle = 2 * curV.angles[correctedI] - angle;
        }
        pt = rad * dvec2(cos(angle), sin(angle)) + curV.pt;

        if (vId == 0) // reached the root of the traversal tree
            break;

        assert(curV.bakedConnections.size() != 0);
        int newVert = curV.bakedConnections[curV.angleToTraverseTree];

        // find the new angleAt to preserve the rotation
        double angleToFind = rotClamp(curV.angles[curV.angleToTraverseTree], PI);
        int i;
        bool found = false;
        for (i = 0; i < verts[newVert].angles.size(); ++i)
        {
            if (abs(angleToFind - verts[newVert].angles[i]) < epsilon)
            {
                found = true;
                break;
            }
        }
        if (!found)
            printf("something funky going on with vertex %d and angle %d\n", vId, angleAt);

        angleAt = (i - 1 + verts[newVert].angles.size()) % verts[newVert].angles.size();
        vId = newVert;
    }
    return pt;
}
</code></pre>
</div>
<h1 id="conclusion">Conclusion</h1>
<p>So, here are some screenshots of everything in action:</p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/screen1.png" alt="" />
<br />
<em>[creases in a piece of paper with the corresponding folded points, generated as described above]</em></p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/screen2.png" alt="" />
<br />
<em>[the internal data structure of the above paper]</em></p>
<p><img src="http://djtaylor.me/assets/2015-12-15-flat-folding/screen3.png" alt="" />
<br />
<em>[a finished folded crane, without the folded wings]</em></p>
<p>Also, since I’m not that good at explaining things clearly, <a href="https://github.com/CulDeVu/OrigamiConverter">here’s the link</a> to the GitHub for the project. If you’re interested in looking at it, it probably explains everything a bit more clearly, if you can get past the messy one-file codebase.</p>
<p>Well, that should be everything! Next step: autocompleting origami, and maybe non-planar folds!</p>
<p>:D</p>
Tue, 15 Dec 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/flat-folding/
http://djtaylor.me/blog/2015/flat-folding/
origami, math
Origami and the Associated Maths
<p>So this is going to be another one of those write-it-out-so-that-I-can-better-understand-it sort of things. Just a heads up. This IS a journal, after all, and functions as advertisement second.</p>
<p>So. Origami and mathematics. We know that the two are related because, if you fold a piece of paper the same way twice, you’ll get two of the same shape, so there are rules governing it. And some smart people have tried figuring out those rules. There are a bunch of them, but they all fall short for what I’m wanting to do. So here it goes:</p>
<p>Research in 1992, and again in 2002, produced a total of 7 axioms for flat folding paper, detailing exactly which basic operations can be performed with paper. These were laid out in much the same way that Euclid’s axioms of geometry were. This is basically how most of the research I’ve been able to turn up on the internet goes: interesting constructions that, using the axioms, prove something about something else.</p>
<p>What I’m wanting to do, however, is a lot different. I’m wanting to be able to take a collection of creases and their associated information, and turn it into a model of the finished product. A kind of crease instruction to finished origami model converter if you will. To do this I’ve tried a couple methods, like physics based, relaxation and constraint based, and they’ve failed for one reason or another. I’ve settled on doing something sort of finite-element based: given a point on the original piece of paper and all the folds, where will that point be located in 3D space?</p>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu1.jpg" alt="" />
<br />
<em>[turn up for trackpad art]</em></p>
<p>Unfortunately, there doesn’t seem to be any information on the topic, so I guess I’ll have to do it. Which is good, because I really wanted an interesting problem to work on. The rest of the entry will be about the stuff I’ve concluded so far, and the questions I have.</p>
<h1 id="types-of-folds">Types of folds</h1>
<p>Folds are what happens when you crease the paper in a straight line, and either fold the other side down flat or turn it some arbitrary angle around the crease. I’ll be using the terms “fold” and “crease” interchangeably.</p>
<p>Folds can be created that do one of two things:</p>
<ol>
<li>start at one edge of a piece of paper and end at another</li>
<li>loop back in on themselves</li>
</ol>
<p>(a fold can also start and stop on the same point on the edge of a piece of paper, but I’ll ignore that case and lump it in with case #1)</p>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu2.jpg" alt="" /></p>
<p>As you can see in the above examples, folds can be segmented into multiple parts going different directions, but are still connected end to end. I’ll call these fold segments. They come in 2 flavors:</p>
<ol>
<li>mountain</li>
<li>valley</li>
</ol>
<p>These types mean essentially nothing by themselves. They represent a direction to fold relative to your viewing angle. What’s important is that they are opposites, and reversing every single fold segment will result in the exact same origami (this is equivalent to turning the piece of paper over). In this journal, all mountain folds are shown as dotted lines in red, and all valley folds are shown as dashed lines in blue, unless i’m trying to point something out.</p>
<h1 id="things-you-can-do-with-folds">Things you can do with folds</h1>
<p>Origami is all about layering paper folds, one after another, in order, to create something. The “in order” part is important. You can’t create the head and tail of a paper crane (the last steps) before you make all the other folds. However, folds are not always dependent on all of the previous ones. Sometimes folds are made purely to get a reference point for some other fold later on down the line. Some folds cancel out the symmetry of previous folds (more on that later). Further, some steps can be done in parallel, like the 2 wings of a paper plane, because they are independent of each other.</p>
<p>When you make a fold, sometimes the mountain fold that you thought you made is actually a bunch of mountain-valley segments. To clear this up, I’ll call the starting fold segment the fold seed.</p>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu3.jpg" alt="" />
<br />
<em>[the green fold segment is the fold seed. The purple fold segments were calculated from other information about symmetry and dependency]</em></p>
<p>There are 4 types of fold seeds that I’ll support for right now. They are:</p>
<ol>
<li>mountain</li>
<li>valley</li>
<li>inside-reverse (more on this one later)</li>
<li>outside-reverse (the opposite of inside-reverse)</li>
</ol>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu4.jpg" alt="" />
<br />
<em>[from left to right: mountain, valley, inside-reverse, outside-reverse. The green is the seed, and this shows what happens when each fold interacts with a single dependent]</em></p>
<p>I chose these specifically because they are the types of fold seeds that are needed (with the exception of #4) to create a paper crane. I figured that if I can’t create an origami interpreter that can interpret the quintessential piece of origami, I’ve failed. The first two are really obvious; the latter two I’m saving for a later time, as there are certain issues that make them more complicated.</p>
<p>You’ve likely noticed that in all of the pictures you’ve seen so far there’s a lot of reflection about lines going on. I’ll call this reflecting about lines “dependency,” because of its ordering properties. When a fold segment is dependent on another fold segment, it reflects about the other line and changes from mountain to valley or vice versa. Here’s a pic:</p>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu5.jpg" alt="" /></p>
<p>An important thing to note about this dependency is that the fold being reflected about <em>has</em> to be done first. The only exception to the rule is fold segments at 90 degree angles on the paper, which can be done in either order (which I think says something very interesting about these types of folds).</p>
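<p>The reflection itself is plain 2D mirroring about the parent crease: in polar form about the crease’s origin, an angle a maps to 2c - a, where c is the crease direction. A minimal sketch (the types and names here are my own, for illustration only):</p>

```cpp
#include <cmath>

struct dvec2 { double x, y; };

// Reflect a point about the line through `origin` with direction angle
// `lineAngle`. This is the operation a dependent fold segment undergoes:
// positions mirror about the parent crease, which is also why the segment's
// mountain/valley type flips.
dvec2 reflectAboutLine(dvec2 p, dvec2 origin, double lineAngle)
{
    double rx = p.x - origin.x;
    double ry = p.y - origin.y;

    // in polar form, reflection maps an angle a to 2*lineAngle - a
    double a = std::atan2(ry, rx);
    double r = std::sqrt(rx * rx + ry * ry);
    double a2 = 2 * lineAngle - a;

    return { origin.x + r * std::cos(a2), origin.y + r * std::sin(a2) };
}
```

Reflecting (1, 1) about the x-axis gives (1, -1), and about the y-axis gives (-1, 1), as expected.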
<p>All of the above info has led me to use a tree to describe origami folds. The parent links of the tree represent dependency, while the nodes represent fold seeds.</p>
<p><img src="http://djtaylor.me//assets/2015-11-09-origami-01/illu6.jpg" alt="" /></p>
<p>For mountain and valley fold seeds, this representation encompasses everything that needs to be said about an origami: from parallelism to dependency to ordering to direction.</p>
<h1 id="other-fun-facts">Other fun facts</h1>
<p>Here are just some neat things that happen to be true. They’re essential to a complete implementation of an origami interpreter, but not essential to a complete understanding of what’s happening. In no particular order:</p>
<ul>
<li>Everything talked about above has to do with planar folds, folds that create something that ends up lying down flat (180 degree folds). Non-planar folds act the exact same way but have an additional constraint: they can have no children in the dependency tree.</li>
<li>For planar folds, at every vertex where fold segments intersect, the number of valley fold segments and the number of mountain fold segments always differ by exactly 2. For vertexes where fold segments and paper edges intersect, all bets are off. Non-planar fold segments can be treated like paper edges for the purpose of this count. This is an easy way to tell whether a fold is valid or not.</li>
<li>For mountain and valley fold seeds only, if a fold seed is dependent on a parent, which is dependent on a grandparent, then the fold seed is also dependent on the grandparent. This is not true for inside-reverse or outside-reverse folds.</li>
<li>Folds that do not touch their parent fold in the dependency tree can be turned into 2 separate folds without any issues. For non-planar folds, it gets a bit more complicated. This is the only time self-intersection of paper is a problem, and I think I can safely ignore this case as it’s fairly easy to work around. I think.</li>
<li>Nodes on the same layer of the dependency tree that are not dependent on each other can be done in parallel, or in any order.</li>
<li>Each node on the dependency tree must be unique, otherwise it’d be kind of hard to do anything with it or conclude anything from it.</li>
</ul>
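<p>The mountain-valley counting rule in the second bullet is known as Maekawa’s theorem, and it makes for a one-liner validity check at interior vertexes. A sketch (the enum is my own assumed representation):</p>

```cpp
#include <cstdlib>
#include <vector>

enum FoldType { MOUNTAIN, VALLEY };

// Maekawa check at an interior vertex of a flat-foldable crease pattern:
// the number of mountain and valley fold segments must differ by exactly 2.
bool maekawaHolds(const std::vector<FoldType>& creases)
{
    int mountain = 0, valley = 0;
    for (FoldType t : creases)
        (t == MOUNTAIN ? mountain : valley)++;
    return std::abs(mountain - valley) == 2;
}
```

So three mountains and one valley pass, while two of each fail.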
<h1 id="yup">Yup.</h1>
<p>Well that’s about it. That’s all I’m really semi-confident about at the moment. There are a bunch of things I still have to look into and figure out before all is said and done though.</p>
<p>Tune in next time to find out what any of this has to do with finite element transformations, what the deal is with inside- and outside-reverse fold seeds and why they throw a kink in everything, and how any of this will help creating a fold to origami interpreter.</p>
Mon, 09 Nov 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/origami-01/
http://djtaylor.me/blog/2015/origami-01/
math, origami
Fluid dynamics update
<p>I’m not really going to talk about much math-y stuff this time. I’ve covered too much ground since my last entry to be able to remember all the pitfalls I ran into. So this time I’m only really going to be talking about the general algorithm, how I deal with the pitfalls of the marker particle method, and the ongoing translation to CUDA. And probably share some of my Mario Maker levels. I’m addicted to stars :(</p>
<p>So! Let’s do this!</p>
<iframe width="640" height="390" src="https://www.youtube.com/embed/0xBAqjxHUkA" frameborder="0" allowfullscreen=""></iframe>
<p>Sorry for the video being kinda shaky. OBS is having a hard time capturing details and YouTube thinks everything ever is a camcorder, so it looks a little strange. It gets across the idea, though. Note that there is still no surface tension, which would account for just about everything else strange looking with the simulation.</p>
<h1 id="the-general-algorithm">The General Algorithm</h1>
<p>So the algorithm for simulating a Eulerian fluid goes as follows:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>init particles
init velocity field, type field, etc
for each frame:
    apply external forces to velocity field
    clear type field
    update particle positions and apply to type field
    extrapolate the velocity field
    find particles to re-position
    re-position those particles
    build pressure matrices
    solve the pressure equation
    apply the pressure to the velocity field
    enforce boundary conditions
    advect the velocity field
</code></pre>
</div>
<p>The velocity field is the standard MAC staggered grid, cause why not, and because it’s useful for pressure computation and velocity extrapolation. The extrapolation and pressure step are straight out of academic papers so it’s standard stuff.</p>
<p>The marker particles are the part I don’t really know about. I just advect the marker particles through the velocity field like anything else, and wherever they end up defines where the fluid exists or doesn’t. This is pretty much the simplest method of defining the interior vs exterior of a fluid, so it has a lot of pitfalls, but I’ll get to those in a minute. The issue, though, is that hardly anyone talks about this method (because of the many, many issues it has); they use something called level sets instead. I’ve tried implementing level sets several times, and marker particles are just so much simpler in every way.</p>
<h1 id="marker-particles">Marker Particles</h1>
<p>So the biggest pitfall of marker particles is that, due to numerical inaccuracy and a bunch of other issues, they tend to bunch up a lot in certain places: namely, places where the divergence of the velocity field is still positive even after the incompressibility pressure solve. You’d think that, wherever the divergence is positive, it’d be negative somewhere nearby, canceling out, but gravity constantly pulling down on the fluid makes that not always true. What ends up happening is a net loss of volume, from full to nearly zero in a couple of minutes.</p>
<p>So, instead of using level sets like a sane person, I decided to force the fluid to conserve volume. The way I do this is pretty straightforward, and is the “find/re-position particles” part of the above algorithm. Basically,</p>
<ol>
<li>iterate over every particle in the grid and see if there are already more than a certain number of particles (max density, if you will) in its grid cell. If there are, I mark it to be re-positioned</li>
<li>iterate over every grid cell and see if there are fewer than some other number of particles (min density), and if there are, start pulling from the re-position pool and fill in those spots</li>
</ol>
<p>This is rather finicky in practice. For example, if I set minDensity equal to maxDensity, you see a lot of artifacts from there not being enough particles in the re-position pool to accommodate everything (because of the overall positive divergence I mentioned). A happy medium, I found, is setting maxDensity to anywhere between 2 and 3 times minDensity. Sure, this DOES lead to volume loss by a factor of whatever multiple you choose, but it’s much better than having visually disgusting simulations.</p>
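<p>The two passes above can be sketched like this (a simplified, grid-only version under assumed types and names; my actual implementation works on the full particle list with positions):</p>

```cpp
#include <vector>

// Density clamp for marker particles. Each cell stores the indices of the
// particles currently inside it. Cells over maxDensity give up their surplus
// particles to a pool, which then tops up cells under minDensity until the
// pool runs dry (it can run dry, because of the net positive divergence).
void repositionParticles(std::vector<std::vector<int>>& cells,
                         int minDensity, int maxDensity)
{
    std::vector<int> pool;

    // pass 1: pull surplus particles out of overfull cells
    for (auto& cell : cells)
        while ((int)cell.size() > maxDensity)
        {
            pool.push_back(cell.back());
            cell.pop_back();
        }

    // pass 2: refill underfull cells from the pool
    for (auto& cell : cells)
        while ((int)cell.size() < minDensity && !pool.empty())
        {
            cell.push_back(pool.back());
            pool.pop_back();
        }
}
```

With maxDensity = 3 and minDensity = 1, a cell with 5 particles drops to 3 and an empty cell picks one up from the pool.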
<p><img src="http://djtaylor.me/assets/2015-09-29-fluid-dyanamic-update/marker.jpg" alt="Marker particles" /></p>
<p>To be fair, the simulation without re-positioning looks a lot more fluid and water-like. However, conserving volume is too important to give up. Oh well.</p>
<h1 id="cuda">CUDA</h1>
<p>I’ve been translating small pieces of the simulation to run on the GPU via CUDA. I’ve gotten the “update particle” portion completely ported, which is just simple advection and setting the type fields. The really neat one is the pressure solver.</p>
<p>In order to find the pressure in each cell, long story short, you need to construct a matrix equation something like this:</p>
<script type="math/tex; mode=display">\Bigg[
\text{pain and suffering}
\Bigg]
\Bigg[
\text{pressure}
\Bigg] =
\Bigg[
\text{divergence}
\Bigg]</script>
<p>, and then solve for pressure. On the CPU I used Gauss-Seidel to solve it, but I have no idea how to do it on the GPU. Luckily, there’s a library called cusp that implemented everything for me!</p>
<div class="highlighter-rouge"><pre class="highlight"><code>struct cuspTriple
{
int row, col;
float amount;
};
cusp::array1d<float, cusp::host_memory> pressure(mapW * mapH);
void project()
{
cusp::array1d<float, cusp::host_memory> b(mapW * mapH);
{
float scale = rho / dt;
for (int y = 0; y < mapH; ++y)
{
for (int x = 0; x < mapW; ++x)
{
int index = y * mapW + x;
b[index] = scale * (u->at(x + 1, y) - u->at(x, y) +
v->at(x, y + 1) - v->at(x, y));
}
}
}
vector<cuspTriple> data;
{
for (int y = 0; y < mapH; ++y)
{
for (int x = 0; x < mapW; ++x)
{
float scale = 1;
int n = 0;
if (x > 0)
{
if (type[y * mapW + x - 1] != SOLID)
{
if (type[y * mapW + x - 1] == WATER)
{
cuspTriple t;
t.row = y * mapW + x;
t.col = y * mapW + x - 1;
t.amount = 1;
data.push_back(t);
}
++n;
}
}
if (y > 0) {
if (type[(y - 1) * mapW + x] != SOLID)
{
if (type[(y - 1) * mapW + x] == WATER)
{
cuspTriple t;
t.row = y * mapW + x;
t.col = (y - 1) * mapW + x;
t.amount = 1;
data.push_back(t);
}
++n;
}
}
if (x < mapW - 1) {
if (type[y * mapW + x + 1] != SOLID)
{
if (type[y * mapW + x + 1] == WATER)
{
cuspTriple t;
t.row = y * mapW + x;
t.col = y * mapW + x + 1;
t.amount = 1;
data.push_back(t);
}
++n;
}
}
if (y < mapH - 1) {
if (type[(y + 1) * mapW + x] != SOLID)
{
if (type[(y + 1) * mapW + x] == WATER)
{
cuspTriple t;
t.row = y * mapW + x;
t.col = (y + 1) * mapW + x;
t.amount = 1;
data.push_back(t);
}
++n;
}
}
cuspTriple t;
t.row = y * mapW + x;
t.col = y * mapW + x;
t.amount = -n;
data.push_back(t);
}
}
}
cusp::coo_matrix<int, float, cusp::host_memory> A(mapW * mapH, mapW * mapH, data.size());
{
for (int i = 0; i < data.size(); ++i)
{
A.row_indices[i] = data[i].row;
A.column_indices[i] = data[i].col;
A.values[i] = data[i].amount;
}
}
cusp::default_monitor<float> monitor(b, 600, 0.01, 0);
cusp::precond::diagonal<float, cusp::host_memory> M(A);
cusp::krylov::cg(A, pressure, b, monitor, M);
}
void applyPressure()
{
float scale = dt / (rho);
for (int y = 0; y < mapH; y++)
{
for (int x = 0; x < mapW; x++)
{
if (type[y * mapW + x] != WATER)
continue;
float p = pressure[y * mapW + x];
u->at(x, y) -= scale * p;
u->at(x + 1, y) += scale * p;
v->at(x, y) -= scale * p;
v->at(x, y + 1) += scale * p;
}
}
}
</code></pre>
</div>
<p>This is the version of the GPU solver I have right now. It has a ton of theoretical std::vector overhead (I don’t know how well the CUDA compiler optimizes it out), so the next thing I’m going to test is whether or not over-approximating the number of non-zero elements in the sparse matrix will still be efficient.</p>
<p>Also, the GPU version is around 3-4 times faster than the CPU version! Now, on the other hand, it’s only 3 or 4 times faster than the CPU version. That’s with copying data back and forth from the GPU, so I give it a little bit of a break, but not much. I’m extremely confident that I could easily squeeze much more efficiency out of it, so that will have to be something I’ll deal with at a later date, after I port a bit more to the GPU.</p>
<p>In short, kill me now.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Could someone please give me stars in Super Mario Maker? I’m addicted and I need all you have. One of my levels’ ID is 04B7-0000-0069-DB69 and from there you should be able to access the 3 levels I’ve uploaded. They have < 20% clear rate on each of them for some strange reason, even though I think they’re pretty easy (and fun).</p>
<p>Anyway, thanks for reading!</p>
Tue, 29 Sep 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/fluid-dyanamic-update/
http://djtaylor.me/blog/2015/fluid-dyanamic-update/
fluid-dynamics, math
Factored BRDF Representation
<p>It’s been quite a while.</p>
<p>So I’ve been working on life/jobs/family/etc the past couple months. A couple weeks ago, however, I decided to implement a type of arbitrary BRDF sampling based on factoring large matrices, found here: <a href="http://gfx.cs.princeton.edu/proj/brdf/brdf.pdf">http://gfx.cs.princeton.edu/proj/brdf/brdf.pdf</a></p>
<p>The idea is to convert the BRDF into a series of matrix representations in such a way that</p>
<p>\[ f_r(\omega_i, \omega_o)(\omega_i \cdot \textbf{n}) \approx
\sum_{l=1}^{L}{ F_l(\omega_o) u_l(\theta_h) v_l(\phi_h) } \]</p>
<p>, where \( F_l \), \( u_l \), and \( v_l \) are massive matrices that depend on the outgoing direction, half angle azimuth, and half angle zenith about the surface normal, respectively. Now that the functions for half angle azimuth and zenith are separated out, it’s trivial to construct a probability density function based on each. From then on, it’s the whole uniform random number -> CDF -> random ray thing like we’re used to. The PDF is defined by</p>
<p>\[
pdf(\theta_h, \phi_h \vert \omega_o) =
\left(
\sum_{l=1}^{L}{\frac{ F_l(\omega_o) u_l(\theta_h) v_l(\phi_h)}{\sum_{j=1}^{L}{F_j(\omega_o)}}}
\right)
\frac{1}{4 (\omega_i \cdot \omega_h) }
\]</p>
<p>, where that last term is the Jacobian between half angle representation and normal paths.</p>
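<p>The “uniform random number -> CDF -> random ray” step for one of the tabulated 1D factors (say, \( u_l \) over the theta bins) boils down to building a cumulative table once and inverting it per sample. A minimal sketch (the struct and its binning are my own assumptions, not code from the paper):</p>

```cpp
#include <cstddef>
#include <vector>

// Discrete CDF over tabulated, non-negative weights (e.g. one factored 1D
// term binned over its angle). Assumes at least one weight with positive sum.
struct DiscreteCdf
{
    std::vector<double> cdf; // cdf[i] = P(bin <= i); cdf.back() == 1

    explicit DiscreteCdf(const std::vector<double>& weights)
    {
        double total = 0;
        for (double w : weights) total += w;
        double running = 0;
        for (double w : weights)
        {
            running += w / total;
            cdf.push_back(running);
        }
    }

    // map a uniform xi in [0, 1) to the first bin whose cdf exceeds xi,
    // via binary search (this is the CDF inversion)
    int sample(double xi) const
    {
        size_t lo = 0, hi = cdf.size() - 1;
        while (lo < hi)
        {
            size_t mid = (lo + hi) / 2;
            if (cdf[mid] <= xi) lo = mid + 1;
            else hi = mid;
        }
        return (int)lo;
    }
};
```

With weights {1, 1, 2}, a uniform number below 0.25 picks bin 0, below 0.5 picks bin 1, and the rest pick bin 2, so bins are chosen in proportion to their weight.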
<h1 id="pretty-pictures-and-opinions">Pretty Pictures and Opinions</h1>
<p><img src="http://djtaylor.me/assets/2015-07-20-factored-brdf/3bounce.jpg" alt="3 bounce comparison" />
<em>[Left: factored BRDF representation, Right: Path traced reference, 200 samples each]</em></p>
<p>On to my personal thoughts. (Just as a warning, I’ve checked and re-checked my math a hundred times, so I don’t think anything is wrong. If it is, though, be sure to sufficiently embarrass me on the Internet)</p>
<p>So. The big issue is that, while the representation is extremely good at importance sampling shiny materials, it does so at the expense of accurately representing more Lambertian materials. The effect can be seen when only one bounce is traced:</p>
<p><img src="http://djtaylor.me/assets/2015-07-20-factored-brdf/1bounce.jpg" alt="1 bounce comparison" />
<em>[Left: 256 u terms, 128 v terms, Middle: path traced reference, Right: 512 u terms, 256 v terms, all images rendered with 200 samples]</em>
(btw, the blue ball is the only thing in the picture that the new material is applied to. In case you couldn’t tell.)</p>
<p>This issue arises because of the representation’s discrete nature. To get a good enough representation of the BRDF to sample it in this manner, you have to throw more and more samples at the BRDF. A pure mirror works very well in this representation, and diffuse materials work less well, because the representation prefers more jagged BRDFs so that it can sample the correct lobe. Otherwise, it has to make approximations as to what the CDF is, in the same way that a Euler integrator has issues because its regular sampling doesn’t capture the essence of the function. If that makes sense. A graph of what this looks like might be something like</p>
<p><img src="http://djtaylor.me//assets/2015-07-20-factored-brdf/illu.jpg" alt="" /></p>
<p>I could show some close-ups like in normal graphics papers, but there’s no real point as the differences are so massive.</p>
<p>Welp. Hope you liked it. The factored representation is still fantastic if you want to store measured BRDF data, but it doesn’t suit my needs for an actual path tracer. Oh well. I’m bound to find some other fancy graphics paper to try to implement soon.</p>
Mon, 20 Jul 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/factored-brdf/
http://djtaylor.me/blog/2015/factored-brdf/
global-illumination, math
Planetary Biomes
<p>Not many images or videos this time around, unfortunately. It’s mostly updates on what I’m up to and what I’m working on.</p>
<p>So, first off, the non-euclidean stretchy game thingy. Turns out that some people in my art class had taken an interest in it when I was working on it in class, and they offered to help offload some of the content generation, like making art, animations, level design, etc. This is actually exactly what I need in order to continue working on it, so you’ll be seeing some gosh-darn pretty images in the future, after finals are over! Look forward to it!</p>
<p>In the meantime though, as I always need something to procrastinate on homework with, I’ve been playing around with a space flying simulation, one where you can get really close to the ground!</p>
<p><img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/head1.jpg" style="width: 49%;" />
<img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/head2.jpg" style="width: 49%;" /></p>
<p>It looks a little rough right now, I know. Atmospheric scattering and physically correct lighting have taken a back seat to getting the game mechanics up and working. As you can see, though, there are biomes on the planet that try to correspond to how an actual planet’s biomes work. With the exception of icecaps and rivers and stuff that have to have a little more processing thrown at them, the biomes are roughly there. I need to work on smoothing them out and getting them to be believable colors, but I’ll work on that when I start doing close-up stuff.</p>
<p>So, down to the nitty gritty technical details of everything.</p>
<h2 id="biomes-n-stuff">Biomes ‘n Stuff</h2>
<p>The game world is represented in a sort of fixed-point-esque system using doubles, in hopes of expanding to more than just a single solar system. However, everything drawn on the GPU uses floats, particularly because I don’t want to require GLSL version 400. So the planet and sun that you see right now are done entirely with raytraced impostors.</p>
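<p>One common way to bridge that doubles-on-CPU / floats-on-GPU split (my own illustration of the idea, not code from this project) is to subtract the camera position in double precision and only then cast to float, so the precision is spent near the viewer:</p>

```cpp
// Keep world positions in double precision; convert to float only after
// re-centering on the camera. The subtraction happens in doubles, so the
// small camera-relative result survives the float cast with little error.
struct dvec3 { double x, y, z; };
struct fvec3 { float x, y, z; };

fvec3 toCameraRelative(dvec3 worldPos, dvec3 cameraPos)
{
    return { (float)(worldPos.x - cameraPos.x),
             (float)(worldPos.y - cameraPos.y),
             (float)(worldPos.z - cameraPos.z) };
}
```

A point 1.5 units from a camera that is 10^12 units from the origin comes out as exactly 1.5f, where a direct double-to-float cast of the world position would have lost that offset entirely.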
<p>So how do the biomes work? Well, back in Freshman year of high school I took biology, and I remember learning about a chart that determines all of the different biomes we see on earth by plotting them on a graph of average rainfall vs average temperature. Here’s a picture of it:</p>
<p><img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/biome_graph.jpg" style="width: 60%;" /></p>
<p>So, that’s pretty easy. For the average temperature, I took inspiration from the earth:</p>
<p><img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/temp.jpg" style="width: 60%;" /></p>
<p>I approximate it by a dot product plus offset by some value noise.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// takes normalized position rel sphere
// returns a value on [0, 1] representing the range [-15, 30] Celsius
float getTemperature(vec3 p)
{
vec3 randTempSeed = p + vec3(12, 11, 10);
vec3 upVec = vec3(0, 1, 0);
float temp = dot(upVec, p);
if (temp < 0)
temp = -temp;
temp = 1.0 - temp + valueNoise3D(randTempSeed.x, randTempSeed.y, randTempSeed.z, 6) * 0.2;
temp = clamp(temp, float(0), float(1));
return temp;
}
</code></pre>
</div>
<p>The average rainfall on the earth doesn’t look like it has any discernible pattern, so I just threw some value noise at it and called it a day. All together, here’s the snippet that determines biomes:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// takes normalized position rel sphere
// returns a value on [0, 1] representing the range [-15, 30] Celsius
float getTemperature(vec3 p)
{
vec3 randTempSeed = p + vec3(12, 11, 10);
vec3 upVec = vec3(0, 1, 0);
float temp = dot(upVec, p);
if (temp < 0)
temp = -temp;
temp = 1.0 - temp + valueNoise3D(randTempSeed.x, randTempSeed.y, randTempSeed.z, 6) * 0.2;
temp = clamp(temp, float(0), float(1));
return temp;
}
// takes normalized position rel sphere
// returns a value on [0, 1] representing the range [0, max amount of precipitation]
float getPrecipitation(vec3 p)
{
p = p + vec3(8, 12, 19);
float precip = (valueNoise3D(p.x, p.y, p.z, 8) + 1.0) / 2.0;
return precip;
}
int getBiome(vec3 p)
{
float temp = getTemperature(p);
float precip = getPrecipitation(p);
if (temp < 0.25)
return BIOME_TUNDRA;
if (precip / temp < 0.15)
return BIOME_SUBTROPIC;
if (temp < 0.75 && precip / temp < 0.5)
return BIOME_GRASSLAND;
if (temp < 0.3)
return BIOME_TIAGA;
if (temp < 0.75 && precip < 0.5)
return BIOME_DECIDUOUS;
if (temp < 0.75)
return BIOME_TEMERATE_RAINFOREST;
if (precip < 0.25)
return BIOME_SAVANNA;
if (precip < 0.5)
return BIOME_TROPICAL_FOREST;
return BIOME_TROPICAL_RAINFOREST;
return 0;
}
</code></pre>
</div>
<p>And that’s that!</p>
<h2 id="translating-it-to-c">Translating it to C++</h2>
<p>So, now that I have a rudimentary system in place for generating biomes for GLSL impostors, I need to translate it over to C++ so that I can build a mesh for when you get up close.</p>
<p>The issue is that I’ll probably end up changing things, like finding ways to make the biomes more natural (more rainfall around ocean edges, etc.), and I’d have to translate every single change over to C++ as well.</p>
<p>The solution I came up with is to use the power of GLM plus a little simple parsing to get the C++ compiler to compile GLSL.</p>
<p>Here’s how it works:</p>
<p>My shaders get run through a very simple parser that looks for #include statements and recursively copies and pastes those files into the body of the shader (I call these files .glh). This serves a double purpose. First, it makes a large GLSL codebase much cleaner and more plug-and-play, which helps with prototyping. Second, the separate header files can also, if GLM is willing, be parsed by the C++ compiler, as long as they contain only functions and their definitions.</p>
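<p>Here’s a rough sketch of what that expander could look like (function and file names are my own; the real parser in the project may differ), just to show how little machinery the trick actually needs:</p>

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical sketch of the include expander described above:
// recursively paste the contents of #include'd .glh files into
// the shader source, mimicking the C preprocessor.
std::string expandIncludes(const std::string& path)
{
    std::ifstream file(path);
    std::ostringstream out;
    std::string line;
    const std::string directive = "#include \"";
    while (std::getline(file, line))
    {
        size_t pos = line.find(directive);
        if (pos != std::string::npos)
        {
            size_t start = pos + directive.size();
            size_t end = line.find('"', start);
            // Paste the referenced file in place, expanding its
            // own #include lines as we go.
            out << expandIncludes(line.substr(start, end - start)) << '\n';
        }
        else
        {
            out << line << '\n';
        }
    }
    return out.str();
}
```

<p>A real implementation would also want to guard against circular includes and resolve paths relative to the including file; this sketch trusts the shader files to be well-behaved.</p>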
<p>This way, I can import all of my biome creation code into C++ by saying something like:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>using glm::max;
using glm::min;
using glm::clamp;
using glm::normalize;
typedef uint32_t uint;
#include "planetImpostor.glh"
</code></pre>
</div>
<p>and BOOM! I’ve got the biome determination code transferred.</p>
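<p>As a minimal illustration of why this works (with a tiny hand-rolled <code>vec3</code> standing in for GLM, and a made-up <code>getLatitudeFactor</code> function, so it runs anywhere): any .glh file containing only free functions is valid in both languages once the vector types and built-ins exist on the C++ side.</p>

```cpp
// Minimal stand-in for glm::vec3 / GLSL's built-in vec3.
// (Illustration only; the real project uses GLM, which mirrors
// the GLSL API far more completely.)
struct vec3
{
    float x, y, z;
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

float dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// This function is written in the shared GLSL/C++ subset: it would
// compile unchanged inside a shader (using GLSL's built-in vec3 and
// dot) and compiles here as ordinary C++.
float getLatitudeFactor(vec3 p)
{
    vec3 upVec = vec3(0, 1, 0);
    float t = dot(upVec, p);
    if (t < 0.0)
        t = -t;
    return 1.0 - t; // 1 at the equator, 0 at the poles
}
```

<p>The same principle scales up to the full biome header: as long as it sticks to this shared subset, one file serves both the impostor shader and the C++ mesh builder.</p>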
<p>Here are some more boring screens:</p>
<p><img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/foot1.jpg" alt="biome screenshot" /></p>
<p><img src="http://djtaylor.me/assets/2015-04-25-planetary-biomes/foot2.jpg" alt="biome screenshot" /></p>
<p>Welp, I hope that interested you guys! I’ll get back to you in a week and catch you up with everything if finals don’t get too rough.</p>
<p>Thanks for reading!</p>
Sat, 25 Apr 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/planetary-biomes/
http://djtaylor.me/blog/2015/planetary-biomes/procedural-generationmathgraphicsHBR: Human Based Rendering<p>Heyy all!</p>
<p>So the graphics professor at school here was talking to me and he was like “you know what would be a cool idea?” Turns out, it WAS a cool idea.</p>
<p>So! I present to you:</p>
<h1 id="hbr-human-based-rendering-the-photon-mapping-of-the-next-generation">HBR: Human Based Rendering (the photon mapping of the next generation)</h1>
<p>The general idea is to either get the user that’s looking at a screen to plot the photons themselves, or somehow crowd-source rendering an environment by manually tracing out light paths.</p>
<p>It’s a novel technique, you see. Instead of you offloading the work to the GPU, the GPU is offloading the work to you. The former has been explored many times, but the other side of the symmetry has not. While HBR is not a silver bullet, we believe that, if used in conjunction with more conventional GPU usage, HBR will become a powerful tool in a graphics programmer’s toolbox.</p>
<p><img src="http://djtaylor.me/assets/2015-01-31-HBR/illu1.jpg" alt="" />
<br />
<img src="http://djtaylor.me/assets/2015-01-31-HBR/illu1.jpg" alt="" />
<br />
<em>[all screenshots above were rendered with real humans!]</em></p>
<p>This has many advantages over normal rendering techniques. A few:</p>
<ul>
<li>Aesthetically Accurate over Physically Accurate: the end result of a render will look like how the user thinks the environment should look. This weight on personal stylistic choice builds the illusion that they’re crafting the environment themselves. This is why we believe that HBR, the act of manually placing and tracing photons, will increase immersion in games.</li>
<li>GPU Empathy: as previously stated, instead of you offloading the work to the GPU, the GPU is offloading work to you. When you have to manually render each frame yourself, you start being a little more okay with the thought of awful looking graphics.</li>
<li>Cheap for Static Geometry: this just comes with photon mapping. A pre-computation is all you need to cull offscreen photons, and BOOM! you’re realtime.</li>
</ul>
<p>:D</p>
<p>Jesting aside, it would actually be really cool to have a crowdsourced rendered cornell box, where people come to place photons and render the scene themselves, and see how far off of the ground truth it is.</p>
Sat, 31 Jan 2015 00:00:00 +0000
http://djtaylor.me/blog/2015/HBR/
http://djtaylor.me/blog/2015/HBR/global-illuminationmath