Light Ray Tracing

In todays post I am going to run a ray tracing algorithm, that in 126 lines of code is able to generate this picture:


The code is found here (here) and was created by user rossant.

I will assume the audience understands the basic motivation behind the ray tracing algorithm, which is well explained on this wikipedia page.


I will establish some of the important variables used in this code. These variables are defined many times throughout, but are always used to represent the same concepts.

ToL-a normal vector which points toward the light source in the frame.

ToO- a normal vector which points towards the Observer in the frame.

RayO, or O- The origin of the ray starting at the observer.

RayD or D- The direction of the ray which starts at the observer.

N- the normal vector of the object.

M-the point in space that we are considering.

Our Shading Model

Before we kick off, I want to really quickly go into the “science” behind the code. The two types of shading used in the model are summarized below. These shading relate to light which diffuses, and light which refracts in a specular way.

Lambert shading (diffuse).

This is for materials that scatter light, like a rug. It might seem like this type of data needs information from the observer, because the amount of surface area being hit depends on the observer, and how close the observer is to the main angle of incident obviously depends on the observer. Doing the math, one sees that these two effect perfectly cancel each other out, making it so that we only need information about how directly the light source hits the object.

Therefore it reduces to a simple dot product which does not depend on the observer.

AddToColor += DiffuseScalar * max( dot(NormalToSurface, NormalTowardsLight), 0) * RGBOfObject

 Blinn-Phong Shading (specular).

This is for shiny materials that act like mirrors. Just from experience most would guess that this does depend on the viewer. Given a simple light source, a mirror works best if the normal vector to the mirror is normalize(ToO+ ToL), which would put the origin and the light at a perfect angle of incidence. In this shading model, we compare how close the surface is to putting us at a perfect incidence, and raise this to a power for quicker drop off.

col_ray += (ScalarConstant * max(dot(NormalToSurface, normalize(NormalTowardsLight + NormalTowardsObserver)), 0) ^ specular_k) * colorOfLight


Setting the Scene

Though the code is in a different order, to me it makes sense to start by explaining the functions which creates the objects in the scene.   There is a sphere class and a plane class. These are both defined by the functions below.

Screen Shot 2017-05-18 at 9.26.40 AMScreen Shot 2017-05-18 at 9.26.45 AM

A sphere is defined by its radius and center, and a plane is defined by a point on it and the normal vector. The diffuse, specular and reflection are constants which will lated be used to scale the weight of the shading methods described above.

Screen Shot 2017-05-18 at 9.27.06 AM

Now we simply define a scene.

Screen Shot 2017-05-18 at 9.26.52 AM

Screen Shot 2017-05-18 at 9.27.00 AM


Some Helpful Functions

Now that we set everything up, lets define some nice functions for working with them.

Our first goal is to make the function intersect which simply tells us, for a given object in the set, how far away our first intersect is from it. This is obviously really important for vision, as the first object this “ray” intersects in the first object that we will see.

There are only two classes of objects in the image, spheres and planes, so we do this for both of them, and then string them together.

First we do this for a sphere, the inputs we need are simply the radius R and center C in order to proceed.

Screen Shot 2017-05-18 at 8.55.36 AM

Same principle for a plane defined by a point P in it, and a vector N tangent to it.

Screen Shot 2017-05-18 at 8.56.03 AM

Now we simply make a wrapper which is defined on the objects in the scene, which are only spheres and planes.

Screen Shot 2017-05-18 at 8.56.14 AM

After this we define some more helper functions,

Screen Shot 2017-05-18 at 8.56.38 AM

Screen Shot 2017-05-18 at 8.56.55 AM



The trace_ray function

If you can believe it we are ready to write an algorithm, trace_ray which takes in a ray shot for the observer, and returns what it hits, the point, M, where it hits, the normal  vector, N, at the point that is hits its first object, and the color of the ray.

Screen Shot 2017-05-18 at 8.57.27 AM

First things first, we want to find the object that it hits. We simply find the object that minimizes the distance, t, from the observer.

Screen Shot 2017-05-18 at 8.57.37 AM

Now we get the info we need to compute the output. Our distance, t, to the nearest object is useful for defining M.

Screen Shot 2017-05-18 at 8.58.33 AM

Now that we have everything nicely defined, we can begin calculating the color. First we check to see if any light hits the object at all by checking iff something blocks it from the light. After that we simply apply our color algorithms explained in the beginning, including a constant vector for brightening.

Screen Shot 2017-05-18 at 8.58.41 AM


Setting Up the Ray Tracing Algorithm

We are finally ready to go through the heart of this algorithm.

w and h are meant to be the number of pixels in the width and the height of our screen. They are set to be 300 and 400 respectively. Q will be explained later. img will become the image posted above.

Screen Shot 2017-05-18 at 9.40.49 AM

Now we define the screen we are looking through by detailing its two corners. we define the screen so that the ratio of the height of the square to the width of the square is almost equal to number of pixels in height versus the number in the width.

I thought this was weird and got rid of the .25 terms, the result was indistinguishable to me.

Screen Shot 2017-05-18 at 9.27.18 AM

The Ray Tracing Algorithm

Behold, the ray tracing algorithm, it is explained below.

Screen Shot 2017-05-18 at 9.27.26 AM


Our entire algorithm is a nested for loop, iterating through the x value of a pixel and the y value. we add a cheeky percent calculating step for the user.

Screen Shot 2017-05-18 at 10.07.22 AM

Now we do some setup. We use Q, which is the pixel value, to create D, which a vector that givens us the direction the camera is pointing in.

Screen Shot 2017-05-18 at 10.04.51 AM

Now we can begin looping though the rays for any given pixel.

For the first layer of depth, we simply shoot a ray in the direction of the pixel, and update its color obtained from our tracing algorithm. We only update the value if the current object is reflective. We decide that the newest ray will start where our last one ended, and be in the direction where it will be reflected across N as expected. This can be seen with simple linear algebra.

Screen Shot 2017-05-18 at 10.04.58 AM



Mitigation of Power in Saint Benedict’s Monasteries

One of the most egregious problems for an intentional community is that of resolving conflicts. Difficult questions of fairness, authority and autonomy must be addressed for a group to be able to solve conflicts effectively, and failure to address these problems can be catastrophic. The Rule of Saint Benedict, one of the most popular and successful models for monastic living, deals with issues of conflict resolution by essentially promoting a benevolent dictator, the abbot, who had the final word in virtually all disputes between the monks. In any governing system with highly centralized power one can imagine considerable tension between the authority and the subdued, in this case the abbot and the monks respectively. St. Benedict views both the abbot and the monks as fallible and many of his rules are devoted to de-escalating the power struggle between these two imperfect actors by insuring the monks’ blind obedience, restricting the power of the abbot, and explicitly stating the most potentially controversial  rules.

Right after the introduction in The Rule of Saint Benedict, Benedict declares that “[The Abbot] must not deviate from the Lord’s teachings” and that “[he is] accountable for his monks on Judgment Day” (Ch. 2). This is a big responsibility for a mere mortal, and at no point does Benedict attempt to hide the potential inadequacy of an abbot to perfectly fill the role. The Rule is riddled with advice to the abbot on potential abuses of power such as favoring particular monks (Ch. 2) or not supplying basic necessities (Ch. 33), and the whole sixty-fourth chapter is devoted to the protocol of a “wicked” abbot and how to overthrow him. In these chapters, Benedict makes it clear that his designated power figure, the abbot, has clear limitations, and so devotes The Rule to designing a system to mitigate the relationship between abbot, a human power figure standing in for the divine, and the monks.

As in so many authoritarian systems, one of the best tools to avoid trouble between the controllers and the controlled is to promote obedience as a duty and a virtue.  Benedict’s Rule deeply stress the importance of obedience in order to mitigate the risk of interpersonal conflicts arising, claiming that the brothers must cheerfully do literally any tasks assigned by an authority figure, even those of which that are seemingly impossible (Ch. 68) and they must even be deferential to each other (Ch. 71). An entire chapter is devoted to describing a “ladder to heaven”, which details that maintaining humility and obedience is the only way to ascend the heavenly ladder (Ch. 7). The lack of autonomy that the monks must accept is shocking: the abbot has the final say in virtually every decision brought before him. The document does state that young people should be involved in the decision making process, but only to learn the way the mighty abbot makes his decisions (Ch. 3). It is even stated that the monks should not emphatically defend any particular position of their own (Ch. 3). One can see how the system as described by Benedict would be virtually free of conflict, as in it conflict seems to be almost illegal.

Looking at this proposed culture of extreme deference, a natural question arises: How can Benedict be sure that the abbot does not take advantage of his followers? Benedict deals with this problem by restricting the power of the abbot. Of course the most important restriction on the abbot’s power is that he can get overthrown (Ch. 64), but there is more: Though the abbot’s word is final, it seems to be restricted to a rather petty domain. Reading through the list of the abbot’s responsibilities it becomes clear that much of his power seems to be over lowly maintenance tasks such as deciding on the dress (Ch. 55) and keeping inventory (Ch. 32). The abbot must concern himself with such trivialities because most of the important decisions have already been made explicitly by Benedict in the document, restricting the abbot’s domain of power. The entire day’s schedule has been prearranged in the document, from a morning of prayer (Ch. 8) to a night-time of silence (Ch. 42). Even meals have all been planned to the last detail, taking in account the age of the monks and the season (Ch. 41).

Ironically, power restrictions can also be of benefit to the abbot by alleviating him of blame. Resentment is a natural response to an authority figure in an environment as demanding as a monastery. Benedict brilliantly mitigates this potential downfall by explicitly stating the protocol for the tasks most likely to generate resentment. For example, if a monk is late for dinner and must “humiliate” himself in front of his peers as punishment (Ch. 44), he cannot resent the abbot because the abbot is only enforcing the rules for lateness which Benedict has pre-decided. Benedict also explicitly states that the monks must perform manual labor for a grueling five hours each day, so the abbot may not be blamed for the difficult schedule (Ch. 48). These rules, while alleviating the abbot of power, also protect him from resentment from monks under his watch.

The struggle between the abbot and the monks more generally reflects the tension between rights of the state and rights of the people in any society, and Benedict’s ability to carefully balance this tension is what helped his monastery model pass the test of time. While the monks have the power to overthrow the abbot, the abbot has control over almost everything not already decided by Benedict. In The Rule of Saint Benedict, Benedict cultivates a culture of obedience to a leader without endangering the autonomy of the followers by limiting the leader’s powers to a small domain. By explicitly stating the most controversial rules, Benedict protects the leader from potential resentment. These measured practices allowed Benedictine Monasteries to tap into the efficiency of a central authority without marginalizing any of the actors involved.


Benedict, Saint. The Rule of Saint Benedict. Italy, c. 530 CE. Republished by Fordham University: Medieval Sourcebook, 1996. Accessed 25 Nov. 2016. <;

Motivation for Ito Integral


Lets say we are writing a graphics software for a differential equation for a system that progresses like:


But if we wanted to write a graphics software that modeled this, we need to make this model computer readable, so we change it to:


This is no cheap trick, look at the solutions to the discrete time Navier-Stokes, they are gorgeous!

Now, maybe we want to insert some randomness into our graphics, we are going to want a discrete differential equation that looks like this:


So you plug this into your graphics software, and you love the about the output, you want to prove theorems about this process, so you have the instinct to make it continuous again, but alas, how do we deal with this error term?

Its actually no big deal! We assume in the continuous case the noise came from some noise function e(t), and that noise_n=e(n)-e(n-1). So we randomly grab a noise function and if we assume our function integrable we get that Y=f(t,Y)+e(t). and this should do the trick. At this point, since it has already been randomly selected, we can just treat it as garden variety and solve this differential equation as would any other.

But this does not work in all cases, for example, how dramatic the error we are experiencing will be proportional to some function of the system. So we might get


This is where we will need Ito integrals.

The Ito Integral Definition

Now I am going to define the Ito integral, and we will see how it solves our problem. If we are given a continuous process W, on event space X, and continuous stochastic process g(x,t) be an adapted process as well define \int g(x,t) dW=lim \sum_1^{\infty} g(x,t_{i-1})(W_{i}-W_{i-1}). Where are limit is taken under partitions of t as the mesh goes to 0.

I hate to break this to you, but this is not well defined. You might be thinking, of course it is, why this is just a Riemann integral, but remember. Riemann integrals are defined assuming that you are integrating against a function of bounded variation.

In fact variation is so high that the forward difference will give you a different answer than the backwards difference. If we had differenced the other way we would have obtained the Stratonovich Integral, which behaves entirely differently There are a couple of reasons we choose to use the forward difference. We use the forward difference because it makes the sum, and therefore the limit a martingale in the discrete case, and therefore in the general case. Another argument is that if we are integrating out holding of a stock versus its price the forward differencing is a lot more appropriate.

So back to the variation issue, one instinct might be then to restrict this definition to functions of bounded variation, but many important random processes including the Weiner Process would not fit the bill, so this is not our way forward.

So we need this integral to be better behaved if we want to proceed. We decide to make W a semi-martingale, because it corresponds nicely to our intuition of noise, and with that assumption we can build and extend the functions on which the integral is well defined on, much like in the Lebesgue definition of an integral. Google “Ito Integral existence proof” for more information, but the proof is very extensive.

With this definition we define an Ito process X_t to be such that X_0+\int_0^{\infty} \sigma dB+\int u_s ds, where B, \sigma are stochastic processes, that are integrable w.r.t Ito and Lebesgue respectively .  Keep in mind that these integrals have two different definitions. We see that differential equations like these  can remedy a certain set of the issues raised above.

The Business Cycle

We are going to use a dynamical systems argument to derive the business cycle, assuming only the shape of the investment and saving curve, as well as a few simple economic arguments. Fancier arguments have been made, but I will try to keep the assumptions to a minimum as I am a strong believer in not being precisely inaccurate.

Investment and Capital

Capital relative to the economy, or k is the amount of useful stuff in an economy, like buildings, hammers, computers. Investment, or I is money that is spent on capital.  So one might think k'=I but actually, some of the capital depreciates, so we get k'=I-I_0. We talk about k relative to expectations, so that k might depreciate, we assume our differential equation still holds.

For our purposes, the investment is a function of the income and the capital expense. The more income, the more money for investment so we have a positive correlation. As far as capital is concerned we get a negative correlation- there is no reason to buy more capital if the economy is already stocked.  A picture is displayed below for investment with capital k_1<k_2<k_3 (along with the saving function which we will use later).


Phase space

The income being in equilibrium is the same as saying that the aggregate expenditure is equal to the aggregate income. Aggregate expenditure is consumption and investment, while aggregate income is consumption and savings. By algebra aggregate income equaling aggregate expenditure is the same as  savings equaling investment. So income will be in equilibrium or y'=0 if S=I.

Therefore we can use the picture above to find how y'=0 looks in phase space, for each k simply find where I(y,k)=S.


We see that k' is displayed as upward sloping in the phase space. This can be justified by seeing where I=I_0 in the first graph.

Capital stock and income both fluctuate, so we know that the model each quadrant will be unique. So we only need to discuss one quadrant, lets choose the top. If the capital stock is extremely low, but the income is huge, the we expect the income to shrink and the capital stock to grow. So the arrows are correct.

We see now how this economy would be subject to business cycles especially under the whim of random fluctuations.


Varian goes on to prove that with dynamical systems theory that there is a unique limit point, that the point is unstable, and therefore has an orbit. I am choosing to omit this argument, results from dynamical systems are very subtle, and in my opinion our cute model cannot hold up to this level of nuance.

But, the arguments Varian uses are interesting in their own right and can be found here.

The Bathtub Vortex

We can understand the water vortex in two parts. The first being: Why does water spin around the drain, and the second being: Why does spinning around a drain cause a tornado shape.

For the first half, we can think of opening the drain as a force which pulls all of the water particles towards the center. Since in general, angular momentum is preserved, if a volume of fluid approaches the drain with angular momentum, it must speed up to preserve its angular momentum. A similar affect can be seen when dropping a penny into a penny vortex.

After many collisions, the water chooses a direction to spin, this is similar to the reasoning behind all of the planets going around the sun in the same direction. If there is not enough angular momentum to begin with, the collisions will cancel out and the water will not make a vortex, this can be shown by symmetry.

Now for the tornado shape. We assume that the fluid is fairly irrotational and incompressible, except perhaps very close to the center. Then a circle path integral around the vortex will not depend on radius of the vortex, as long as we are not too close to it, by Stokes Theorem. Since the path is proportional to the average velocity times the radius, and this value is constant, we get that the average velocity is inversely proportional to the radius. 

Lastly we look to Bernoulli’s Equation for incompressible flows, which  says that at the surface of the water, at atmospheric pressure, that the energy of the water must be constant. We know that the kinetic energy grows as a radius shrinks from the last paragraph, so the potential energy, must shrink to keep up with it. Therefore the height of the water must shrink as we approach the drain.

Neural-Inspired Hardware and Non-Electric Factories.

I’ll start this post with a video of an awesome wind-powered sawmill. You don’t need to watch the whole thing to get the picture.


I have always loved the ideas of factories like this, factories that can be powered by wind, rain, or any circular motion. The only problem with factories like these, is that you can only do really boring stuff, like sawing wood, or churning butter. If you want to do something more complicated than that, you would need to pay a smart person to design it for you, and get speciality parts shipped in.

In comes Neural Networks, kind-of. This post is about a hardware version of a Neural Network, that can achieve any kind of motion that you wish. Weather you want a patty to be flipped every thirty seconds, or plants to be watered every third day.

Also, I’d like to add that this is totally a loose idea, so if anyone has any ideas for improvements, applications, or issues, please talk to me.

First I’ll tell you what the thing does, and then I’ll tell you how it works.

What It Does

Basically you can get a big non-electric arm like this.



to do the same thing over and over again like the arms in this factory,


and be powered by a windmill, rainfall or any other non electric power.

This is for all of you hippies dreaming of rain powered farms, home-grown inventors that want to compete on an economy of scale, or chefs who want to automate recipes. And it is dirt cheap.

Alright, enough of sounding like an infomercial, lets get our hands dirty.

How It Works

Lets assume our arm is the one from the video above. That means that having this arm do what we want, is equivalent to pushing the hydraulic buttons in and out at the right time. Lets say that we have recorded the values the the valves need to be at as a time series, in this case from time to R^4.

Now we have to find a way to turn circular motion into linear motion that matches our time series. This is where my design begins to be heavily inspired by Neural Networks.

First one has to understand linkages. Check out this demo: linkage demo, and play with changing the lengths.

As you can see, the linkage takes boring circular motion and turns it into complicated motion. By changing the values of the leg lengths we get a totally different curves.

If you experimented with different leg values, you might have noticed that some did not work. I solved this by making my own linkage, which, as long as all the bars are sufficiently large will always work. It looks like this:



The white dot in the picture are stationary. The white rectangle is the crank,  the red lines are just metal, and the filled in circles are pivots. The point of this thing is that the leg lengths have a huge range or workable motion, unlike the linkage in the demo.

You might be wondering? Well, what motion does it make? The answer is: Whatever you want! This design is so complicated, and with so many degrees of freedom that by changing the various leg lengths you can get as complicated a motion as you want. With the right combination of leg lengths, this thing will do the exact motion you want.

For those machine learning people out there this is basically a hardware version of a Neural Network

Anyways, if you hook the end of this up to a pulley or something, you easily have something that controls you arm.



We could hook four differently parameterized verisons of these to our arm and get whatever motion we want!

So now the only question left is: How do I get the values of the legs so that it makes the movement I want. This is where the machine learning comes in.

I think the most simple way to solve this is by using gradient decent. gradient-descent

We start by randomly initializing the leg lengths in a computer software. All we do, is check how far we are from having the motion we recorded in our time series, and nudge the values just a bit in the direction that fixes it the most. This is done by heading in the direction where the error is steepest. This is extremely effective, and is the engine behind a Neural Network. This algorithm can be made very fast, because the data needed is closed form equation and therefore it should not require a simulation model.

After we find the values, we simply build the linkage, which is dirt cheap, and ship the part.


Thanks for reading this. If you have any ideas, improvements, or suggestions, definitely reach out.


Fourier Transform and Basis-ish Vectors

Prereq: Hilbert spaces and familiarity with Fourier Transforms.

Proofs are great, they are the heart of mathematics, and answer the most important mathematical question :”why is this true?”. The bummer about proofs is they often cannot answer the equally interesting question of “why do I expect this to be true?”. One such set of proofs are of Parseval’s Theorem and the Fourier Inversion Theorem.

I highly recommend looking at the proofs of these for function spaces on abelian groups in Folland’s Harmonic analysis, the proofs do not require you read the whole book up to that point, and are interesting in their own right. But in todays post I will be looking more at the “why do we expect these to be true side of things”, by convincing you that these ought to hold on the Fourier transform from unit circle to the integers, the integers to the unit circle, and from R to R with an original argument.

The case we will work with first is the Fourier Transform from the unit circle to the natural numbers,  in this case the proof explains both “why?” and “why do we expect?”.

On the circle, we see that the Fourier Transform is simply the projection of L2 onto the orthonormal basis of f_n(x)= e^{ix*n}. Consider these two well known properties of orthonormal basis vectors b_i in Hilbert spaces :

<f,g>=\sum <g,b_i><b_i,f>

f=\sum <f,b_i>b_i

We can see that in this case, these theorems are equivalent to Parseval’s and Fourier inversion respectively, when we take <f,g> to be the L2 Hilbert space on the natural numbers.

Now we will move onto the Fourier Transfrom from the natural numbers to the unit interval. We will now only be answer “why do we expect” and not “why”.

Take some number, r, we know that it corresponds to the set of orthonormal vectors f_n(x)=e^{i*x*r*n}. Now, if these were complete, we could simply use the explanation above to justify Parseval’s and Inversion, but they are not. Instead we cheat, and see that though the above theorems might not be true for an incomplete set, but they will become truer as we lower r, and therefore get more and more orthonormal function. We can see in the limit that the sums becomes the Riemann’s definition of an integral, and we get Parseval’s and Inversion in the limit.

So the basic idea is that we can get more and more basis functions, and make both <f,g>=\sum <g,b_i><b_i,f> and f=\sum <f,b_i>b_i true and truer, until we get what we are used to.

Keep in mind that the above is very far from a proof, and is really just good for intuition. We make a very similar argument the next example.

Now we treat the Fourier Transform from R to R.  We can consider the set of orthonormal functions corresponding to the double r, c, f_n(x)= e^{i*x*n*r}Char(|x|<c)/c. Which are basically e^{i*x*n*r} but restricted to interval (-c,c) and normalized. We notice that as we shrink r, to get more functions, and grow c, to make each function more useful, we get that the above theorems will be truer and truer. And again, in the limit the theorems above become the Parseval’s and the Inversion Theorem.

These are very far from proofs, and some people would dismiss this as not real mathematics. And it really isn’t. But to me, seeing the Fourier Transform as a projection onto a growing orthonormal set of vectors illuminates a lot of it’s most interesting qualities.