<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>eDesign.nl &#187; Algorithms</title>
	<atom:link href="http://www.edesign.nl/category/programming/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.edesign.nl</link>
	<description>Thoughts and concepts on software development</description>
	<lastBuildDate>Fri, 08 Apr 2011 13:35:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Math behind a world sunlight map</title>
		<link>http://www.edesign.nl/2009/05/14/math-behind-a-world-sunlight-map/</link>
		<comments>http://www.edesign.nl/2009/05/14/math-behind-a-world-sunlight-map/#comments</comments>
		<pubDate>Thu, 14 May 2009 00:59:29 +0000</pubDate>
		<dc:creator>Jurgen</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Image processing]]></category>

		<guid isPermaLink="false">http://www.edesign.nl/?p=434</guid>
		<description><![CDATA[My neighbour has a map of the world on the wall. You can see it from the street in front of his house. It has a backlight but that only illuminates half of the map. The transition from day to night is shaped like a sine wave most of the time. It actually is a [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-436" href="http://www.edesign.nl/2009/05/14/math-behind-a-world-sunlight-map/full-17572/"><img class="alignleft size-thumbnail wp-image-436" title="World sunlight map fraction" src="http://www.edesign.nl/wp-content/uploads/2009/05/full-17572-150x150.jpg" alt="World sunlight map fraction" width="150" height="150" /></a>My neighbour has a map of the world on the wall. You can see it from the street in front of his house. It has a backlight but that only illuminates half of the map. The transition from day to night is shaped like a sine wave most of the time. It actually is a <a href="http://www.geochronusa.com/new_geo/" target="_blank">physical world sunlight map</a>. Of course, you can <a href="http://www.die.net/earth/" target="_blank">simulate this </a>with a computer too. There even is an <a href="http://www.daylightmap.com/index.php" target="_blank">instance using Google maps</a>.</p>
<p>Just as many roads lead to Rome, there are several ways to build this simulation. One could model the sun, the earth and maybe more, and start <a href="http://en.wikipedia.org/wiki/Ray_tracing_(graphics)" target="_blank">ray tracing</a>. This approach would include solar eclipses but puts a heavy load on the processor, because the number of calculations involved in ray tracing is quite high. The approach I describe in full in this article is close to it. Using <a href="http://en.wikipedia.org/wiki/Euclidean_vector" target="_blank">vectors</a> pointing from a sphere (<a href="http://en.wikipedia.org/wiki/Earth" target="_blank">earth</a>) to a point (<a href="http://en.wikipedia.org/wiki/Sun" target="_blank">sun</a>) I map a <a href="http://en.wikipedia.org/wiki/Mercator_projection" target="_blank">Mercator projected map</a> of the world onto the sphere. The challenges are the yearly orbit of the earth around the sun and its 24 hour spin around an axis tilted by 23.5°.<span id="more-434"></span></p>
<p>About a decade ago I made a similar program in high school. I didn&#8217;t know vectors and <a href="http://en.wikipedia.org/wiki/Trigonometry" target="_blank">trigonometry</a> then as I do now. Back then I took the map of the world and, for each latitude (the lines parallel to the equator), calculated the sunset and sunrise times for that latitude. This way I knew where to start painting night over the daylight map I had. The result was pretty accurate and an <a href="http://users.electromagnetic.net/bu/astro/sunrise-set.php" target="_blank">algorithm to determine those solar times</a> is not hard to implement. This time I used another, more advanced approach to the problem.</p>
<h2>Space</h2>
<p>First let us define a mathematical 3D space, with three axes: <img src="http://www.edesign.nl/wp-content/cache/tex_9dd4e461268c8034f5c8564e155c67a6.png" align="absmiddle" class="tex" alt="x" /> (pointing from left to right), <img src="http://www.edesign.nl/wp-content/cache/tex_415290769594460e2e485922904f345d.png" align="absmiddle" class="tex" alt="y" /> (pointing up) and <img src="http://www.edesign.nl/wp-content/cache/tex_fbade9e36a3f36d3d676c1b808451dd7.png" align="absmiddle" class="tex" alt="z" /> (pointing into your monitor). The directions are given to help your imagination. Each axis points from small (negative infinity) to big (positive infinity), and they intersect at <img src="http://www.edesign.nl/wp-content/cache/tex_80c93660da9d0792e674ba1085c32ace.png" align="absmiddle" class="tex" alt="O_{x,y,z} = (0, 0, 0)" />.</p>
<h2>Earth</h2>
<p>Define a sphere centered on point <img src="http://www.edesign.nl/wp-content/cache/tex_f186217753c37b9b9f958d906208506e.png" align="absmiddle" class="tex" alt="O" />, with a radius of 1. This will be the earth. The nice thing is that, because the earth is centered on <img src="http://www.edesign.nl/wp-content/cache/tex_f186217753c37b9b9f958d906208506e.png" align="absmiddle" class="tex" alt="O" /> and has radius 1, every point <img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" /> (an x, y, z coordinate) on its surface is automatically the same as the unit vector pointing orthogonally away from the surface at that point (the surface normal). This comes in handy when we do calculations later on.</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_33380c0b9e4cdcdf3e81f1180aea86ba.png" title="\vec{p} = (x, y, z)" class="etex" alt="\vec{p} = (x, y, z)" /></center></p>
<h2>Sun</h2>
<p>We don&#8217;t need the sun itself for this method. We only need the direction to it from the earth. So define a vector <em>d</em> which points in the direction of the sun from <img src="http://www.edesign.nl/wp-content/cache/tex_f186217753c37b9b9f958d906208506e.png" align="absmiddle" class="tex" alt="O" />. To do this we must use sine and cosine. At time <img src="http://www.edesign.nl/wp-content/cache/tex_1f48e973d6a9075dbaaf41a9e85f034e.png" align="absmiddle" class="tex" alt="t = 0" /> the direction of the sun will be <img src="http://www.edesign.nl/wp-content/cache/tex_4967dfa815448036b242326491623bad.png" align="absmiddle" class="tex" alt="(1, 0, 0)" />, which is a vector pointing to the right. At <img src="http://www.edesign.nl/wp-content/cache/tex_7fcd6686b2518bb9eefd38d7531edfba.png" align="absmiddle" class="tex" alt="t = 12" /> (hours) it should be pointing left: <img src="http://www.edesign.nl/wp-content/cache/tex_d12de2b341d428af7c1eeab87422c070.png" align="absmiddle" class="tex" alt="(-1, 0, 0)" />. At 6 o&#8217;clock am and pm the direction is <em>(0, 0, 1)</em> and <em>(0, 0, -1)</em> respectively, because the sun moves in the plane of the <img src="http://www.edesign.nl/wp-content/cache/tex_9dd4e461268c8034f5c8564e155c67a6.png" align="absmiddle" class="tex" alt="x" /> and <img src="http://www.edesign.nl/wp-content/cache/tex_fbade9e36a3f36d3d676c1b808451dd7.png" align="absmiddle" class="tex" alt="z" /> axes. This tells us that the direction <img src="http://www.edesign.nl/wp-content/cache/tex_954322810f486ac24f0d70218627d2a0.png" align="absmiddle" class="tex" alt="\vec{d}" /> to the sun can be given as follows (<img src="http://www.edesign.nl/wp-content/cache/tex_e358efa489f58062f10dd7316b65649e.png" align="absmiddle" class="tex" alt="t" /> should be scaled to a value from 0 to 1, so we need to divide <img src="http://www.edesign.nl/wp-content/cache/tex_e358efa489f58062f10dd7316b65649e.png" align="absmiddle" class="tex" alt="t" /> by 24):</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_3150abfca1663b9bdd3db560ded4268b.png" title="\vec{d} =(\cos{\frac{2\pi t}{24}}, 0, \sin{\frac{2\pi t}{24}})" class="etex" alt="\vec{d} =(\cos{\frac{2\pi t}{24}}, 0, \sin{\frac{2\pi t}{24}})" /></center></p>
<p>With this approach we actually make the sun rotate around the earth.</p>
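<p>As a minimal sketch, the direction formula can be written in Python as below. The function name <em>sun_direction</em> is made up for this illustration, and the time is assumed to be given in hours from 0 to 24.</p>

```python
import math

def sun_direction(t_hours):
    """Unit vector d pointing from the earth's centre to the sun (no tilt yet)."""
    angle = 2 * math.pi * t_hours / 24  # one full turn per 24 hours
    # The sun circles the earth in the plane of the x and z axes; y stays 0.
    return (math.cos(angle), 0.0, math.sin(angle))
```

<p>For example, <em>sun_direction(0)</em> points to the right along the x axis and <em>sun_direction(12)</em> points to the left.</p>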
<h2>Tilted earth</h2>
<div id="attachment_459" class="wp-caption alignleft" style="width: 310px"><a rel="attachment wp-att-459" href="http://www.edesign.nl/2009/05/14/math-behind-a-world-sunlight-map/north_season/"><img class="size-medium wp-image-459" title="Earth tilted axis" src="http://www.edesign.nl/wp-content/uploads/2009/05/north_season-300x165.jpg" alt="Earth tilted axis" width="300" height="165" /></a><p class="wp-caption-text">Tilted orbit of the spinning earth around the sun. The method discussed here turns it around, as it models the sun orbiting the earth.</p></div>
<p>The above formula is still not complete. The axis around which the earth is spinning is not orthogonal to the plane defined by the orbit of the earth around the sun. It is tilted 23.5°. This is how we get the four seasons on earth of course. To model this a nice trick can be applied to <em>d</em>. We know that on or close to June 21st the sun is right above the 23.5° latitude (<a href="http://en.wikipedia.org/wiki/Tropic_of_Cancer" target="_blank">Tropic of Cancer</a>), the <a href="http://en.wikipedia.org/wiki/Solstice">summer solstice</a>. From there the latitude directly below the sun follows a cosine with a period of one year, going down to -23.5° and back again. June 21st is the 172nd day of a non-leap year. The standard cosine function has the form:</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_4059e79313887878e46828ae448a06bc.png" title="y = a \cdot \cos{(bx + c)} + d" class="etex" alt="y = a \cdot \cos{(bx + c)} + d" /></center></p>
<p>We can simply fill this in. The <img src="http://www.edesign.nl/wp-content/cache/tex_0cc175b9c0f1b6a831c399e269772661.png" align="absmiddle" class="tex" alt="a" /> must be 23.5, <img src="http://www.edesign.nl/wp-content/cache/tex_92eb5ffee6ae2fec3ad71c777531578f.png" align="absmiddle" class="tex" alt="b" /> is <img src="http://www.edesign.nl/wp-content/cache/tex_744d20bf0129464fb19fee5dd435a613.png" align="absmiddle" class="tex" alt="\frac{2\pi}{365}" />, <img src="http://www.edesign.nl/wp-content/cache/tex_4a8a08f09d37b73795649038408b5f33.png" align="absmiddle" class="tex" alt="c" /> is -172 times <img src="http://www.edesign.nl/wp-content/cache/tex_92eb5ffee6ae2fec3ad71c777531578f.png" align="absmiddle" class="tex" alt="b" /> (a shift of 172 days) and <img src="http://www.edesign.nl/wp-content/cache/tex_8277e0910d750195b448797616e091ad.png" align="absmiddle" class="tex" alt="d" /> equals 0 (this offset is unrelated to the direction vector of the same name). If you define <img src="http://www.edesign.nl/wp-content/cache/tex_415290769594460e2e485922904f345d.png" align="absmiddle" class="tex" alt="y" /> as the observable tilt (the real tilt is always 23.5°) and <img src="http://www.edesign.nl/wp-content/cache/tex_9dd4e461268c8034f5c8564e155c67a6.png" align="absmiddle" class="tex" alt="x" /> as the day of the year:</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_9fd4ce29cabb03153f7a603bfe765419.png" title="observableTilt = 23.5 \cos{(\frac{2\pi}{365} ({dayOfYear} - 172))}" class="etex" alt="observableTilt = 23.5 \cos{(\frac{2\pi}{365} ({dayOfYear} - 172))}" /></center></p>
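<p>The observable tilt formula translates almost literally into code. This is a small sketch; the function name <em>observable_tilt</em> is chosen for this example, and the result is in degrees.</p>

```python
import math

def observable_tilt(day_of_year):
    """Observable tilt in degrees: +23.5 on day 172, -23.5 half a year later."""
    return 23.5 * math.cos(2 * math.pi / 365 * (day_of_year - 172))
```

<p>Around the equinoxes, a quarter of a year before or after day 172, the value passes through 0.</p>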
<p>This observable tilt is the angle between the real direction to the sun and the already calculated direction <em>d</em>. To calculate the vector (<img src="http://www.edesign.nl/wp-content/cache/tex_415290769594460e2e485922904f345d.png" align="absmiddle" class="tex" alt="y" /> component only, pointing up or down) which should be added to <img src="http://www.edesign.nl/wp-content/cache/tex_954322810f486ac24f0d70218627d2a0.png" align="absmiddle" class="tex" alt="\vec{d}" /> to get the real direction you can use the tangent. Because <img src="http://www.edesign.nl/wp-content/cache/tex_954322810f486ac24f0d70218627d2a0.png" align="absmiddle" class="tex" alt="\vec{d}" /> has length 1, the length of this vertical vector is the tangent of the observable tilt.</p>
<p><center><em>tiltCorrection</em> = (0, tan(2π · <em>observableTilt</em> / 360), 0)</center></p>
<p>Then the real direction vector <img src="http://www.edesign.nl/wp-content/cache/tex_0b41e62b40d2668f0de148e45d42f78e.png" align="absmiddle" class="tex" alt="\vec{rd}" /> is the addition of <img src="http://www.edesign.nl/wp-content/cache/tex_7c54a32d6ac073f230f089ae1703bfc5.png" align="absmiddle" class="tex" alt="\vec{tiltCorrection}" /><em> </em>to <img src="http://www.edesign.nl/wp-content/cache/tex_954322810f486ac24f0d70218627d2a0.png" align="absmiddle" class="tex" alt="\vec{d}" />:</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_ac1ecf22584f2bbba1b86a580ce3b2c7.png" title="\vec{rd} = \vec{tiltCorrection} + \vec{d}" class="etex" alt="\vec{rd} = \vec{tiltCorrection} + \vec{d}" /></center></p>
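<p>A possible sketch of the complete direction calculation follows. It reads the tilt correction as a purely vertical vector, as described in the text (only the y component changes); that reading, and the name <em>real_sun_direction</em>, are assumptions of this example.</p>

```python
import math

def real_sun_direction(t_hours, day_of_year):
    """Direction rd to the sun: daily rotation plus the seasonal tilt correction."""
    angle = 2 * math.pi * t_hours / 24
    d = (math.cos(angle), 0.0, math.sin(angle))  # unit vector in the x/z plane
    tilt_deg = 23.5 * math.cos(2 * math.pi / 365 * (day_of_year - 172))
    # d has length 1, so a vertical offset of tan(tilt) lifts it by exactly that angle.
    correction_y = math.tan(2 * math.pi * tilt_deg / 360)
    return (d[0], d[1] + correction_y, d[2])
```

<p>The result is not of length 1. That does not matter for the day or night decision, which only looks at the sign of the dot product, but it should be normalized before comparing against fixed thresholds.</p>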
<h2>Calculating illumination and mapping</h2>
<p>Using the <a href="http://mathworld.wolfram.com/DotProduct.html" target="_blank">dot product</a> (or <a href="http://www.netcomuk.co.uk/~jenolive/vect6.html" target="_blank">scalar product</a> or <a href="http://en.wikipedia.org/wiki/Dot_product" target="_blank">inner product</a>) you can calculate the angle between two vectors. Or, if the vectors are both normalized (length = 1) it is the projection of the one vector onto the other. So if we calculate the dot product for every normal vector from the surface of the earth (<img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" />) with <img src="http://www.edesign.nl/wp-content/cache/tex_0b41e62b40d2668f0de148e45d42f78e.png" align="absmiddle" class="tex" alt="\vec{rd}" /> we should get some meaningful results.</p>
<p><center><img src="http://www.edesign.nl/wp-content/plugins/easy-latex/cache/tex_934bbf22f5024754ad96d504e678bdab.png" title="illumination = \vec{rd} \cdot \vec{p}" class="etex" alt="illumination = \vec{rd} \cdot \vec{p}" /></center></p>
<p>If <img src="http://www.edesign.nl/wp-content/cache/tex_f47cdb7d31a3b19cebc72d46123a33b7.png" align="absmiddle" class="tex" alt="illumination &lt; 0" /> the sun is in the opposite direction, so <img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" /> is pointing to the night side of the earth. If <img src="http://www.edesign.nl/wp-content/cache/tex_f68036c925fc7246cbd239e8cd6c41e0.png" align="absmiddle" class="tex" alt="illumination \geq 0" /> that piece of earth receives sunlight.</p>
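<p>The day or night test is a one-line dot product. A minimal sketch, with vectors represented as plain tuples and the helper names <em>dot</em> and <em>is_day</em> invented for this example:</p>

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def is_day(p, rd):
    """True when the surface normal p faces the sun direction rd."""
    return dot(rd, p) >= 0
```

<p>With the sun at <em>rd = (1, 0.4, 0)</em>, the point <em>(1, 0, 0)</em> is in daylight and the opposite point <em>(-1, 0, 0)</em> is in the night.</p>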
<p>We have a 2D map (Mercator projection) that we need to map onto the 3D sphere, to determine the points for which we calculate dot products. To avoid confusion, let&#8217;s call <em>x</em> and <img src="http://www.edesign.nl/wp-content/cache/tex_415290769594460e2e485922904f345d.png" align="absmiddle" class="tex" alt="y" /> from the 2D map <img src="http://www.edesign.nl/wp-content/cache/tex_7b774effe4a349c6dd82ad4f4f21d34c.png" align="absmiddle" class="tex" alt="u" /> and <img src="http://www.edesign.nl/wp-content/cache/tex_9e3669d19b675bd57058fd4664205d2a.png" align="absmiddle" class="tex" alt="v" />. We can iterate over each <img src="http://www.edesign.nl/wp-content/cache/tex_7b774effe4a349c6dd82ad4f4f21d34c.png" align="absmiddle" class="tex" alt="u" /> and, for each <img src="http://www.edesign.nl/wp-content/cache/tex_7b774effe4a349c6dd82ad4f4f21d34c.png" align="absmiddle" class="tex" alt="u" />, over each <img src="http://www.edesign.nl/wp-content/cache/tex_9e3669d19b675bd57058fd4664205d2a.png" align="absmiddle" class="tex" alt="v" />, and map the pair to a <img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" /> in 3D. To do this we need to determine phi (<img src="http://www.edesign.nl/wp-content/cache/tex_1ed346930917426bc46d41e22cc525ec.png" align="absmiddle" class="tex" alt="\phi" />, the angle measured from the north pole, derived from the vertical coordinate <img src="http://www.edesign.nl/wp-content/cache/tex_9e3669d19b675bd57058fd4664205d2a.png" align="absmiddle" class="tex" alt="v" />) and theta (<img src="http://www.edesign.nl/wp-content/cache/tex_2554a2bb846cffd697389e5dc8912759.png" align="absmiddle" class="tex" alt="\theta" />, the longitude, derived from the horizontal coordinate <img src="http://www.edesign.nl/wp-content/cache/tex_7b774effe4a349c6dd82ad4f4f21d34c.png" align="absmiddle" class="tex" alt="u" />). 
In the iteration (<img src="http://www.edesign.nl/wp-content/cache/tex_e862d31aae219d30f927d5835bb1aefa.png" align="absmiddle" class="tex" alt="maxV" /> is the height of the map and <img src="http://www.edesign.nl/wp-content/cache/tex_ce778d1af59137ea75e31062e9b5071a.png" align="absmiddle" class="tex" alt="maxU" /> is the width of the map):</p>
<p><center>φ = π · (<em>v</em> / <em>maxV</em>)</center><br />
<center>θ = 2π · (<em>u</em> / <em>maxU</em>)</center></p>
<p>Now <img src="http://www.edesign.nl/wp-content/cache/tex_7b774effe4a349c6dd82ad4f4f21d34c.png" align="absmiddle" class="tex" alt="u" /> and <img src="http://www.edesign.nl/wp-content/cache/tex_9e3669d19b675bd57058fd4664205d2a.png" align="absmiddle" class="tex" alt="v" /> map to <img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" /> <a href="http://en.wikipedia.org/wiki/Sphere#Equations_in_R3" target="_blank">as follows</a>:</p>
<p><center><em>p</em><sub>x</sub> = sin φ · cos θ<br /><em>p</em><sub>y</sub> = cos φ<br /><em>p</em><sub>z</sub> = sin φ · sin θ</center></p>
<p>Now we can determine for each coordinate <img src="http://www.edesign.nl/wp-content/cache/tex_f406f917636fcaa71ae578800a9700f5.png" align="absmiddle" class="tex" alt="(u, v)" /> on the map whether it should be displayed as day or night.</p>
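<p>The sketch below walks over every map pixel and marks it as day or night. It assumes the standard spherical parametrisation with the y axis pointing up: φ runs from 0 at the top row to π at the bottom row, θ makes one full turn over the map width, and the y component is cos φ. It also assumes the map rows are spaced linearly in latitude and samples each pixel at its centre. The names <em>map_to_sphere</em> and <em>night_mask</em> are made up for this example.</p>

```python
import math

def map_to_sphere(u, v, max_u, max_v):
    """Map a map coordinate (u, v) to a point p on the unit sphere."""
    phi = math.pi * v / max_v        # angle from the north pole, 0..pi
    theta = 2 * math.pi * u / max_u  # angle around the y axis, 0..2*pi
    return (math.sin(phi) * math.cos(theta),
            math.cos(phi),
            math.sin(phi) * math.sin(theta))

def night_mask(rd, max_u, max_v):
    """Rows of booleans, True where the pixel lies on the night side."""
    mask = []
    for v in range(max_v):
        row = []
        for u in range(max_u):
            p = map_to_sphere(u + 0.5, v + 0.5, max_u, max_v)  # pixel centre
            illumination = sum(a * b for a, b in zip(rd, p))
            row.append(illumination < 0)
        mask.append(row)
    return mask
```

<p>The mask can then be used to paint the night image over the daylight image, pixel by pixel.</p>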
<h2>Fun with shading</h2>
<p>Because you know the angle the sunlight makes with the earth&#8217;s surface you can add some shading by making it increasingly dark towards the edges of the day. If the dot product of <img src="http://www.edesign.nl/wp-content/cache/tex_5808829302c573af69fc6aa7f83b41e6.png" align="absmiddle" class="tex" alt="\vec{p}" /> and <img src="http://www.edesign.nl/wp-content/cache/tex_0b41e62b40d2668f0de148e45d42f78e.png" align="absmiddle" class="tex" alt="\vec{rd}" /> is smaller than 0.1 (<img src="http://www.edesign.nl/wp-content/cache/tex_a45c2c2471bdd7da560847792785984c.png" align="absmiddle" class="tex" alt="illumination &lt; 0.1" />) the point is in dusk or dawn and you could mix day and night as a gradient to make the transition between them look smoother. Another gimmick is the reflection of the sun: if the product is greater than, let&#8217;s say, 0.95 (<img src="http://www.edesign.nl/wp-content/cache/tex_0d4df00bd8936b0588687977c7e6cde1.png" align="absmiddle" class="tex" alt="illumination &gt; 0.95" />) the sun is approximately straight above that coordinate and you could make it whiter so it looks like the sun is reflecting off the map.</p>
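<p>One way to sketch this shading is shown below. The thresholds 0.1 and 0.95 come from the text; the linear gradient, the colour blending and all function names are assumptions of this example. The thresholds only make sense when both vectors have length 1, so the sun direction is normalized first.</p>

```python
import math

def normalise(vec):
    """Scale a 3D vector to length 1."""
    length = math.sqrt(sum(c * c for c in vec))
    return tuple(c / length for c in vec)

def shade(illumination, dusk=0.1, glare=0.95):
    """Return (day_weight, highlight) for one pixel.

    day_weight goes from 0.0 (night) to 1.0 (full day), with a linear
    gradient between 0 and the dusk threshold; highlight marks the
    spot where the sun is almost straight overhead.
    """
    if illumination <= 0:
        day_weight = 0.0
    elif illumination < dusk:
        day_weight = illumination / dusk
    else:
        day_weight = 1.0
    return day_weight, illumination > glare

def blend(day_rgb, night_rgb, day_weight):
    """Mix the day and night colours of a pixel according to day_weight."""
    return tuple(round(n + (d - n) * day_weight)
                 for d, n in zip(day_rgb, night_rgb))
```

<p>A pixel with an illumination of 0.05 would get an even mix of its day and night colour.</p>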
<h2>Demo and discussion</h2>
<p>A <a href="http://www.edesign.nl/examples/sunlightmap/" target="_blank">demonstration of this method</a> is available at the edesign example site. A number of assumptions were made and restrictions set while creating this demo.</p>
<div class="wp-caption aligncenter" style="width: 510px"><a href="http://www.edesign.nl/examples/sunlightmap/"><img title="Live World Sunlight map" src="/examples/sunlightmap/map.php" alt="" width="500" height="250" /></a><p class="wp-caption-text">Live sunlight map generated using the algorithm discussed in this article. Click the image to go to the example page where you can override current time.</p></div>
<ul>
<li>I assumed the solar rays are parallel. Strictly speaking this is wrong, but the distance from the sun to the earth is so large that the rays are parallel by approximation, which is good enough for this simulation.</li>
<li>Because of this assumption, the night &#8216;starts&#8217; exactly halfway around the world, while in reality it does not. The sun is far bigger than the earth and therefore some rays reach the shadow half of the earth (near the limit depicted in the first image). The approximation eliminates this too.</li>
<li>Timing is not very accurate. Rough estimates are used for both the 24 hour clock and the solstices.</li>
<li>A perfect sphere is taken as the world. No flattening is modelled (the polar radius should be smaller than the equatorial radius).</li>
<li>The &#8216;fun with shading&#8217; should implement something to make land mass not reflect sunlight.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.edesign.nl/2009/05/14/math-behind-a-world-sunlight-map/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Textual difference detector</title>
		<link>http://www.edesign.nl/2009/05/07/textual-difference-detector/</link>
		<comments>http://www.edesign.nl/2009/05/07/textual-difference-detector/#comments</comments>
		<pubDate>Thu, 07 May 2009 15:55:13 +0000</pubDate>
		<dc:creator>Jurgen</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://www.edesign.nl/?p=397</guid>
		<description><![CDATA[Today I uploaded my textual difference detector to the eDesign examples. This is an example application demonstrating the theory of applying the Levenshtein algorithm to detect differences between two versions of the same text. Also, the &#8216;Find the differences&#8216; post is updated with a link to this example.
This example takes two texts as input and [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-398" href="http://www.edesign.nl/2009/05/07/textual-difference-detector/comparedifflarge/"><img class="alignleft size-thumbnail wp-image-398" title="comparedifflarge" src="http://www.edesign.nl/wp-content/uploads/2009/05/comparedifflarge-150x150.jpg" alt="comparedifflarge" width="150" height="150" /></a>Today I uploaded my <a href="http://www.edesign.nl/examples/levenshtein/" target="_blank">textual difference detector</a> to the eDesign examples. This is an example application demonstrating the theory of applying the <a href="http://www.edesign.nl/2009/04/12/find-the-differences/" target="_self">Levenshtein algorithm</a> to detect differences between two versions of the same text. Also, the &#8216;<a href="http://www.edesign.nl/2009/04/12/find-the-differences/" target="_self">Find the differences</a>&#8216; post is updated with a link to this example.</p>
<p>This example takes two texts as input and outputs one merged text marked with what was deleted and what was added. <a href="http://www.edesign.nl/examples/levenshtein/" target="_blank">Take a look</a> and feel free to download the <a href="http://www.edesign.nl/examples/levenshtein/levenshtein.zip">source code</a>. This also includes the <a href="http://www.edesign.nl/examples/levenshtein/levenshtein.zip">Levenshtein algorithm source code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.edesign.nl/2009/05/07/textual-difference-detector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Challenge Hash</title>
		<link>http://www.edesign.nl/2009/05/05/challenge-hash/</link>
		<comments>http://www.edesign.nl/2009/05/05/challenge-hash/#comments</comments>
		<pubDate>Tue, 05 May 2009 07:47:51 +0000</pubDate>
		<dc:creator>Jurgen</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.edesign.nl/?p=202</guid>
		<description><![CDATA[The Internet is a crowd and everybody in it can potentially hear what you say. Methods have been developed to prevent this and ensure identity, integrity and authenticity. Often these three can be seen as properties of encryption. Encryption implies the possibility of decryption. Passwords are precious things you don&#8217;t want others to decrypt and [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-59" href="http://www.edesign.nl/2009/05/05/challenge-hash/hide_a_key/"><img class="alignleft size-medium wp-image-59" title="Hide a key" src="http://www.edesign.nl/wp-content/uploads/2009/03/hide_a_key-300x199.jpg" alt="Hide a key" width="167" height="110" /></a>The Internet is a crowd and everybody in it can potentially hear what you say. Methods have been developed to prevent this and ensure <a href="http://www.edesign.nl/2009/04/24/security-basics/" target="_self">identity, integrity and authenticity</a>. Often these three can be seen as properties of encryption. Encryption implies the possibility of decryption. Passwords are precious things you don&#8217;t want others to decrypt and read. With a technique called challenge hashing you don&#8217;t need to have any worries about it. Challenge hashing is a technique used to verify a password on site B which was sent from site A without sending the password in plain text. This article covers how.<span id="more-202"></span></p>
<h2>Hash</h2>
<p>A <a href="http://en.wikipedia.org/wiki/Cryptographic_hash_function" target="_blank">cryptographic hash function</a> is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value. The data to be encoded is often called the &#8220;message&#8221;, and the hash value is sometimes called the message digest or simply digest. (<a href="http://en.wikipedia.org/wiki/Cryptographic_hash_function" target="_blank">Wikipedia</a>, retrieved May 2009). In other words, a hash (digest) is the result of applying a hashing function to a certain input (password, file, etc.).</p>
<h2>Challenge</h2>
<p>The <a href="http://en.wikipedia.org/wiki/Challenge-response_authentication" target="_blank">challenge</a> is a question presented to a party who needs to provide the correct answer. A common form of this algorithm is where the challenge is asking for the password and the valid response is the correct password. <a href="http://recaptcha.net/" target="_blank">CAPTCHAs</a> are also a well-known implementation.</p>
<h2>One step further</h2>
<p>Combining these two gives an intuitive way to keep a password secret while it travels over a publicly accessible network, yet still usable for authentication checks.</p>
<p>A system has stored user information (username, password, email, etc.) in a database and has the password stored as an MD5 hash. MD5 is the name of the hashing function used here; other hashing functions exist. When a user requests a login prompt, the server generates a random string (the challenge) and sends it along with the login prompt. It also stores the string in the session of that request.</p>
<p>The user enters his username and password and hits &#8216;login&#8217;. Just before submitting, a client side script is triggered which calculates the MD5 hash of the password, concatenates the challenge to the digest and hashes that result. This is submitted as the &#8216;password&#8217; in code.</p>
<p>Now the server has to verify the password. As there is no way to reverse the MD5 digest, the coded password is matched against the database in a special way. The database needs to concatenate the previously generated challenge to the stored digests and calculate the MD5 hash of that. When the result is the same as the submitted coded password the login is successful.</p>
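<p>The exchange described above can be sketched in a few lines. Python stands in here for both the client script and the server; the function names are hypothetical, and the sketch assumes the server checks the single stored digest belonging to the submitted username. MD5 is used only because this article uses it.</p>

```python
import hashlib
import secrets

def md5_hex(text):
    """MD5 digest of a string as lowercase hexadecimal text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def new_challenge():
    """Server side: random string sent with the login prompt and kept in the session."""
    return secrets.token_hex(16)

def client_response(password, challenge):
    """Client side: hash the password, append the challenge, hash again."""
    return md5_hex(md5_hex(password) + challenge)

def server_verify(stored_digest, challenge, response):
    """Server side: repeat the second step on the stored digest and compare."""
    expected = md5_hex(stored_digest + challenge)
    return secrets.compare_digest(expected, response)
```

<p>The plain password never leaves the client, and because every login prompt gets a fresh challenge, a captured response cannot be replayed against a later prompt.</p>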
<h2>Discussion</h2>
<p>A downside to this technique is that it requires processing capacity on the database server, as password digests need to be hashed on every login attempt. The worst case (most processing time) is when such an attempt fails or only the last candidate matches, as every password in the database then needs to be checked. Therefore this approach does not scale well to systems aiming for masses of users.</p>
<p>Client scripting must be available. This is not really a critical downside as e.g. JavaScript is common, but you can not assume everybody supports it.</p>
<h2>Demonstration</h2>
<p>A <a href="http://www.edesign.nl/examples/challengehash/" target="_blank">demonstration of challenge hashing</a> is available in JavaScript and PHP for you to investigate.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.edesign.nl/2009/05/05/challenge-hash/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sudoku Logic &#8211; part I</title>
		<link>http://www.edesign.nl/2009/05/01/sudoku-logic-part-i/</link>
		<comments>http://www.edesign.nl/2009/05/01/sudoku-logic-part-i/#comments</comments>
		<pubDate>Fri, 01 May 2009 12:51:25 +0000</pubDate>
		<dc:creator>Jurgen</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://www.edesign.nl/?p=262</guid>
		<description><![CDATA[If you haven&#8217;t heard of Sudoku puzzles (数独, sūdoku) you&#8217;ve either been sleeping under a rock or been space traveling for quite a while. These 9×9 square puzzles originating from around 1900 became an international hit in 2005. Sudokus appear in newspapers, online and special sudoku puzzle books all-over-the-world. And as if that is not [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-299" href="http://www.edesign.nl/2009/05/01/sudoku-logic-part-i/vinyl-leolan-puzzle-large/"><img class="alignleft size-thumbnail wp-image-299" title="vinyl-leolan-puzzle-large" src="http://www.edesign.nl/wp-content/uploads/2009/05/vinyl-leolan-puzzle-large-150x150.jpg" alt="vinyl-leolan-puzzle-large" width="150" height="150" /></a>If you haven&#8217;t heard of <a href="http://en.wikipedia.org/wiki/Sudoku" target="_blank">Sudoku puzzles</a> <span style="font-weight: normal;">(<span class="t_nihongo_kanji" lang="ja" xml:lang="ja">数独</span><span class="t_nihongo_comma" style="display: none;">,</span>, 								<em><span class="t_nihongo_romaji">sūdoku</span></em>) </span>you&#8217;ve either been sleeping under a rock or been space traveling for quite a while. These 9×9 square puzzles originating from around 1900 became an international hit in 2005. Sudokus appear in <a href="http://www.nytimes.com/ref/crosswords/sudoku/easy.html" target="_blank">newspapers</a>, <a href="http://www.websudoku.com/" target="_blank">online</a> and <a href="http://www.sudoku.nl/default.aspx?id=7fcb013e-72cf-41d2-bc44-6756bf842126" target="_blank">special sudoku puzzle books</a> <a href="http://www.sudokuweb.nl/" target="_blank">all</a>-<a href="http://www.e-sudoku.fr/" target="_blank">over</a>-<a href="http://www.sudokukryss.se/" target="_blank">the</a>-<a href="http://www.sudoku.name/" target="_blank">world</a>. And as if that is not yet enough <a href="http://en.wikipedia.org/wiki/Sudoku#Recent_popularity" target="_blank">Sudoku TV shows</a> and all kinds of <a href="http://en.wikipedia.org/wiki/Sudoku#Variants" target="_blank">variants</a> of the puzzle are made. One can solve a sudoku using logic only. Because of this <a href="http://eric4ever.googlepages.com/sudoku.html" target="_blank">computational algorithms</a> to solve every possible Sudoku must exist. 
This is part one in the series on such algorithms.<span id="more-262"></span></p>
<p>We must speak of multiple algorithms (strategies), because one solver consists of multiple <a href="http://www.scanraid.com/Strategy_Families" target="_blank">solve strategies</a>. The simplest strategy, which is not covered in this article, is a brute-force or trial-and-error &#8216;attack&#8217; on the given input. This is not logic and could take quite a while, as there are <a href="http://forums.whirlpool.net.au/forum-replies-archive.cfm/351170.html#r5116245" target="_blank">70759827985602812313600000000 possibilities</a> to a default 9×9 sudoku, not even considering <a href="http://nl.wikipedia.org/wiki/Sudoku#varianten" target="_blank">variations (in Dutch)</a> such as 16×16 hex or jigsaw sudokus. The strategies discussed by <a href="http://www.scanraid.com/" target="_blank">Andrew Stuart</a> are based on the default 9×9 sudoku. Some of them are limited to those puzzles, and some block-only strategies overlap with row and column strategies. What I want to try in this article is to program a more generic solution, in which a few generically implemented strategies cover all of those special cases. I think this is possible if you imagine a sudoku board not as a set of cells divided into rows, columns and blocks, but as a set of equal sections which might have one or more cells in common. This way it will also be easy to implement variants simply by extending the model for a sudoku, resulting in e.g. the X-sudoku or hypersudoku, while strategies need to be implemented only once.</p>
<h2>Sudoku model</h2>
<p>First of all, a model is needed which represents every possible sudoku. That model should include the character set (numbers 1 to 9), the cells (81) and the sections (27). The defaults for a 9×9 sudoku are given in parentheses. Unless mentioned otherwise, every example concerns such a default 9×9 sudoku.</p>
<p>In the model, each cell is identified by a number (0-80). A <em>cells</em> array keeps an array of possibilities for every cell. On initialization, every possibility is set to true, as everything is still possible. While solving the sudoku more and more of them will toggle to false, finally leaving only one set to true.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$possibilities</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
<span style="color: #000000; font-weight: bold;">false</span><span style="color: #339933;">,</span>                <span style="color: #666666; font-style: italic;">//false if unsolved, if solved integer of solution (redundant*)</span>
<span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">true</span> <span style="color: #666666; font-style: italic;">//1 - 9 are possible (true)</span>
<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">81</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
   <span style="color: #000088;">$cells</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$possibilities</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>* The first key in the &#8216;possibilities&#8217; array is redundant: it could be determined by the other keys (if only one is true, it is solved and has that value). But this way, the array keys match the value they represent and it might increase speed by avoiding determining if the cell is solved or not over and over again.</p>
<p>The sections are probably the most important piece of the model, as they define how cells relate to each other. There are 27 sections, each containing <em>nPossibilities</em> (9) cells: the rows, the columns and the blocks. For example, the top row, the left column and the top left block, with their cell identifiers:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$section</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$section</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">3</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">4</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">6</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">7</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">8</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//top row</span>
<span style="color: #666666; font-style: italic;">//... 8 more ...</span>
<span style="color: #000088;">$section</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">9</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">18</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">27</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">36</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">45</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">54</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">63</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">72</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//left column</span>
<span style="color: #666666; font-style: italic;">//... 8 more ...</span>
<span style="color: #000088;">$section</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">9</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">10</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">11</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">18</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">19</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">20</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//top left block</span>
<span style="color: #666666; font-style: italic;">//... 8 more ...</span></pre></div></div>
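
<p>The 27 sections do not have to be typed out by hand. As a minimal sketch, assuming the row-by-row cell numbering used above (cell id = 9 × row + column), a hypothetical helper <em>buildSections()</em> (my name, not part of the original code, and needing PHP 7 for intdiv()) could generate them:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Hypothetical helper: builds the 27 sections of a default 9x9 sudoku,
// assuming cell id = 9 * row + column.
function buildSections() {
    $rows = array();
    $columns = array();
    $blocks = array();
    foreach (range(0, 8) as $r) {
        foreach (range(0, 8) as $c) {
            $cell = 9 * $r + $c;
            $rows[$r][] = $cell;
            $columns[$c][] = $cell;
            // blocks are numbered 0-8, left to right, top to bottom
            $blocks[3 * intdiv($r, 3) + intdiv($c, 3)][] = $cell;
        }
    }
    // same order as above: 9 rows, then 9 columns, then 9 blocks
    return array_merge($rows, $columns, $blocks);
}

$section = buildSections();
echo count($section), "\n";          // 27
echo implode(', ', $section[18]);    // 0, 1, 2, 9, 10, 11, 18, 19, 20</pre></div></div>

<p>A variant such as the X-sudoku would then only need two extra sections (the diagonals) appended to this list.</p>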

<p>Also, some default actions (methods) can be added to this model (class): one to set a cell to a value, one to set a possibility to false, and one to count the possible locations for a digit in a section. They can call each other to implement recursion:</p>
<ul>
<li>setCellValue() calls removeFromCell() for the same value on all other cells that share a section with that cell</li>
<li>removeFromCell() calls countPossibleLocations() for that value on every section in which that cell occurs</li>
<li>if countPossibleLocations() finds exactly one possible location, setCellValue() is called for that value on that location</li>
</ul>
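
<p>The three methods above could be sketched roughly as follows. This is not the code of the actual implementation: the class name, the properties and the extra <em>possibleLocations()</em> helper are assumptions of mine, and there is no detection of contradictions in an invalid puzzle.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// A minimal sketch of the model class; the three method names follow the list
// above, everything else (class name, properties, possibleLocations) is assumed.
class Sudoku {
    public $cells = array();    // cell id => array(0 => solution or false, 1..9 => bool)
    public $sections = array(); // 27 lists of 9 cell ids
    public $changed = true;

    public function __construct() {
        $possibilities = array_merge(array(false), array_fill(0, 9, true));
        foreach (range(0, 80) as $cell) {
            $this->cells[$cell] = $possibilities;
        }
        foreach (range(0, 8) as $i) {
            $row = array();
            $column = array();
            $block = array();
            foreach (range(0, 8) as $j) {
                $row[] = 9 * $i + $j;
                $column[] = 9 * $j + $i;
                $block[] = 27 * intdiv($i, 3) + 3 * ($i % 3) + 9 * intdiv($j, 3) + $j % 3;
            }
            array_push($this->sections, $row, $column, $block);
        }
    }

    public function setCellValue($cell, $value) {
        if ($this->cells[$cell][0] === $value) {
            return; // already set: this ends the recursion
        }
        $this->cells[$cell][0] = $value;
        $this->changed = true;
        // assumption: a solved cell also drops its other possibilities
        foreach (range(1, 9) as $digit) {
            if ($digit !== $value) {
                $this->removeFromCell($cell, $digit);
            }
        }
        // the value can no longer occur in cells sharing a section with this one
        foreach ($this->sections as $section) {
            if (in_array($cell, $section)) {
                foreach ($section as $other) {
                    if ($other !== $cell) {
                        $this->removeFromCell($other, $value);
                    }
                }
            }
        }
    }

    public function removeFromCell($cell, $value) {
        if (!$this->cells[$cell][$value]) {
            return; // was already impossible
        }
        $this->cells[$cell][$value] = false;
        $this->changed = true;
        foreach ($this->sections as $section) {
            if (in_array($cell, $section)) {
                $locations = $this->possibleLocations($section, $value);
                if (count($locations) === 1) {
                    $this->setCellValue($locations[0], $value);
                }
            }
        }
    }

    // cells of a section in which $value is still possible
    public function possibleLocations($section, $value) {
        $locations = array();
        foreach ($section as $cell) {
            if ($this->cells[$cell][$value]) {
                $locations[] = $cell;
            }
        }
        return $locations;
    }

    public function countPossibleLocations($section, $value) {
        return count($this->possibleLocations($section, $value));
    }
}

$sudoku = new Sudoku();
foreach (range(0, 7) as $i) {
    $sudoku->setCellValue($i, $i + 1); // top row: 1 to 8
}
echo $sudoku->cells[8][0]; // 9</pre></div></div>

<p>In the usage example the last cell of the top row is never set explicitly. The recursion finds that 9 has only one possible location left in that row and fills it in.</p>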
<h2>Solving loop</h2>
<p>Once you have defined the model and filled in some initial digits, the algorithm needs a loop that searches for the solution. I have added a &#8216;changed&#8217; property to the sudoku model. This boolean is initially set to true. The first line in the while(sudoku.changed) loop sets this boolean to false. Next, while iterating over every section and trying to find solutions, every change (setting a cell or deleting a number from a cell) sets this boolean back to true, causing another iteration to occur. The loop therefore ends as soon as a full pass over all sections changes nothing.</p>
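
<p>A sketch of such a loop, under the assumption that the model is an object with a &#8216;changed&#8217; property and a &#8216;sections&#8217; list, and that every strategy is a callable which sets &#8216;changed&#8217; whenever it alters a cell. The function name <em>solve()</em> and the toy model used to run it are made up for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Sketch of the solving loop. $sudoku is assumed to be an object with a boolean
// 'changed' property and a 'sections' array; $strategies is a list of callables
// that receive ($sudoku, $section) and set $sudoku->changed when they alter a cell.
function solve($sudoku, array $strategies) {
    $sudoku->changed = true;
    $passes = 0;
    while ($sudoku->changed) {
        $sudoku->changed = false;
        foreach ($sudoku->sections as $section) {
            foreach ($strategies as $strategy) {
                $strategy($sudoku, $section);
            }
        }
        $passes++;
    }
    return $passes;
}

// Toy model: pretend two deductions are still to be made.
$toy = new stdClass();
$toy->sections = array(array(0, 1, 2));
$toy->work = 2;
$dummy = function ($sudoku, $section) {
    if ($sudoku->work > 0) {
        $sudoku->work--;
        $sudoku->changed = true;
    }
};
$passes = solve($toy, array($dummy));
echo $passes; // 3: two passes that change something, one that does not</pre></div></div>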
<p>Now we have a setup where we can add solve strategies to the loop. In this article, the first strategy will be explained.</p>
<h2>Simple elimination strategy</h2>
<p>The first strategy to implement in the section iteration is elimination, which is pretty straightforward. As a digit can only occur once in a section (row, column or block), every other cell in that section which still has this digit set to true can toggle it to false. This iterative process continues for as long as it resolves a cell: that is, for as long as it leaves some cell with a single possibility because all its other possibilities have been set to false.</p>
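
<p>For a single section, elimination could look like the sketch below. It works on a plain copy of the cells array instead of the model class, and the function name <em>eliminate()</em> is my own choice for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Sketch of simple elimination on one section.
// $cells: cell id => array(0 => solution or false, 1..9 => bool)
function eliminate(array $cells, array $section) {
    do {
        $changed = false;
        // a solved digit is no longer possible in the other cells of the section
        foreach ($section as $solved) {
            $digit = $cells[$solved][0];
            if ($digit === false) {
                continue;
            }
            foreach ($section as $other) {
                if ($other !== $solved) {
                    if ($cells[$other][$digit]) {
                        $cells[$other][$digit] = false;
                        $changed = true;
                    }
                }
            }
        }
        // an unsolved cell with a single remaining possibility is solved
        foreach ($section as $cell) {
            if ($cells[$cell][0] === false) {
                $left = array_keys(array_slice($cells[$cell], 1, 9, true), true, true);
                if (count($left) === 1) {
                    $cells[$cell][0] = $left[0];
                    $changed = true;
                }
            }
        }
    } while ($changed);
    return $cells;
}

$section = range(0, 8); // the top row
$cells = array_fill(0, 9, array_merge(array(false), array_fill(0, 9, true)));
foreach (range(0, 7) as $i) {
    $cells[$i][0] = $i + 1; // the top row holds 1 to 8, the last cell is still open
}
$cells = eliminate($cells, $section);
echo $cells[8][0]; // 9</pre></div></div>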
<p>In Sudoku Logic &#8211; part II more strategies will be discussed. It will also include some sudoku extensions to simulate sudoku variants, and a live demonstration of an implementation of this algorithm in PHP.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.edesign.nl/2009/05/01/sudoku-logic-part-i/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Find the differences</title>
		<link>http://www.edesign.nl/2009/04/12/find-the-differences/</link>
		<comments>http://www.edesign.nl/2009/04/12/find-the-differences/#comments</comments>
		<pubDate>Sun, 12 Apr 2009 10:34:00 +0000</pubDate>
		<dc:creator>Jurgen</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://www.edesign.nl/?p=103</guid>
		<description><![CDATA[Comparing files is something developers do every once in a while. For example, comparing configuration files to see what is different in the other environment or compare programming files to see what has changed in the source code. Implementations of text comparison algorithms are therefore widespread and used in several fields. For instance, in blogs [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-167" href="http://www.edesign.nl/2009/04/12/find-the-differences/spot-differences-city-picture/"><img class="alignleft size-thumbnail wp-image-167" src="http://www.edesign.nl/wp-content/uploads/2009/04/spot-differences-city-picture-150x150.jpg" alt="Spot differences city picture" width="150" height="150" /></a>Comparing files is something developers do every once in a while. For example, comparing configuration files to see what is different in the other environment, or comparing source files to see what has changed in the code. Implementations of text comparison algorithms are therefore widespread and used in several fields. For instance, in blogs and content management systems one might need to know what was altered in an update of a text (in <a href="http://wordpress.org/" target="_blank">CMS-like systems</a>), or a programmer in a team would like to see what changed in the source code (<a href="http://subversion.tigris.org/" target="_blank">svn</a>). Also a lot of (combined) search, spell checking, speech recognition and plagiarism detection software compares texts (strings) in a certain way. This article covers the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Levenshtein distance algorithm</a> and how to use it to indicate alterations to texts.<span id="more-103"></span></p>
<p>There are several ways to compare texts and find <a href="http://en.wikipedia.org/wiki/Diff" target="_blank">differences</a> and <a href="http://en.wikipedia.org/wiki/Category:String_similarity_measures" target="_blank">similarity scores</a>. For this article the similarity scores are not relevant, because these scores are just numbers. We are interested in what is added, deleted or substituted in the transformation from text <em>A</em> to text <em>B</em>. In other words, we would like to mark the minimal number of primitive operations needed to transform <em>A</em> to <em>B</em>. To do this we&#8217;ll need the basics of a classic computer science problem: the <a href="http://en.wikipedia.org/wiki/Longest_common_subsequence_problem" target="_blank">longest common subsequence problem</a>. The technique described below, which is closely related to this problem, is the Levenshtein distance algorithm. The measure was introduced by <a href="http://www.keldysh.ru/departments/dpt_10/lev.html" target="_blank">Vladimir Levenshtein</a> and, unlike the <a href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a>, it also handles texts of different lengths. The result of Levenshtein&#8217;s algorithm is exactly the minimal number of operations, but you can use the intermediate result of this algorithm (the matrix it builds) to determine what parts of the text were added, deleted or substituted. Originally this is done per character, but with a little tweak this can be changed to work per word, line or paragraph. When you&#8217;ve read this article you will know how this works, and how the demo referred to at the end works. This demo takes two texts as input and outputs what was added, deleted and replaced.</p>
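
<p>As a side note, PHP ships a built-in levenshtein() function. It works per character and returns only the score, which is exactly why it is not enough for marking changes. The per-word tweak starts with splitting the texts into words. The snippet below is only an illustration, and the variable names are mine:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// The built-in function compares characters and returns just a number.
$distance = levenshtein('kitten', 'sitting');
echo $distance, "\n"; // 3

// For a per-word comparison the texts are first split into words.
$wordsA = explode(' ', 'The brown dog jumped away from the sprinkler');
$wordsB = explode(' ', 'The dog ran towards the green sprinkler');
echo count($wordsA), ' and ', count($wordsB), " words\n"; // 8 and 7 words</pre></div></div>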
<p>For this article and the demo I used <a href="http://www.google.nl/search?q=levenshtein" target="_blank">search results</a> for some inspiration. A nice explanation already out there is this <a href="http://www.merriampark.com/ld.htm" target="_blank">description of the Levenshtein algorithm</a>, as well as the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Wiki page on it</a>. For this article, let&#8217;s use two sample lines: &#8220;The brown dog jumped away from the sprinkler&#8221; and &#8220;The dog ran towards the green sprinkler&#8221;. Now we want to know which words were added, deleted or replaced in the transition from the first sentence to the second. To do this, let&#8217;s take a closer look at how the iterative process of the Levenshtein algorithm is executed.</p>
<ol>
<li>The first step is to construct a matrix of <em>m</em>+1 rows by <em>n</em>+1 columns, where <em>n</em> is the number of words in the first line and <em>m</em> the number of words in the second line.</li>
<li>Secondly, fill the first row (from left to right) with zero to <em>n</em> and the first column (from top to bottom) with zero to <em>m</em>.</li>
<li>Now for each word of the first line, evaluate each word of the second line. If the two words match, the cost is 0, otherwise it&#8217;s 1.</li>
<li>Fill out the cell for that pair of words with the minimum of:
<ul>
<li>The value of the cell above + 1</li>
<li>The left neighbour cell value + 1</li>
<li>The above left cell value + cost</li>
</ul>
</li>
</ol>
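
<p>The steps above can be sketched as follows. This is a minimal illustration rather than the source of the demo: the function name <em>levenshteinMatrix()</em> is my own, and it uses a cost of 1 for differing words, as in the worked example that follows.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Sketch: word-level Levenshtein matrix.
// Columns follow the words of $a (first line), rows the words of $b (second line).
function levenshteinMatrix(array $a, array $b) {
    $n = count($a);
    $m = count($b);
    $matrix = array();
    foreach (range(0, $m) as $row) {
        foreach (range(0, $n) as $col) {
            if ($row === 0) {
                $matrix[$row][$col] = $col;     // first row: 0 to n
            } elseif ($col === 0) {
                $matrix[$row][$col] = $row;     // first column: 0 to m
            } else {
                $cost = ($a[$col - 1] === $b[$row - 1]) ? 0 : 1;
                $matrix[$row][$col] = min(
                    $matrix[$row - 1][$col] + 1,          // cell above
                    $matrix[$row][$col - 1] + 1,          // left neighbour
                    $matrix[$row - 1][$col - 1] + $cost   // above left
                );
            }
        }
    }
    return $matrix;
}

$a = explode(' ', 'The brown dog jumped away from the sprinkler');
$b = explode(' ', 'The dog ran towards the green sprinkler');
$matrix = levenshteinMatrix($a, $b);
echo $matrix[count($b)][count($a)]; // 5, the value in the lower right cell</pre></div></div>

<p>Note that the comparison is case sensitive, so &#8220;The&#8221; and &#8220;the&#8221; count as different words.</p>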
<p>For the Levenshtein distance it stops right here, when the iteration is completed. The distance is the value in the lower right cell. Here we are not interested in the Levenshtein distance itself, but in the matrix we&#8217;ve just constructed. The lowest cost route from bottom right to top left reveals information on what words have been added, deleted and/or substituted. To show how, we need to construct the matrix with the algorithm above.</p>
<table border="0">
<tbody>
<tr>
<th></th>
<th></th>
<th>The</th>
<th>brown</th>
<th>dog</th>
<th>jumped</th>
<th>away</th>
<th>from</th>
<th>the</th>
<th>sprinkler</th>
</tr>
<tr>
<th></th>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th>The</th>
<td>1</td>
<td>*</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>dog</th>
<td>2</td>
<td>**</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>ran</th>
<td>3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>towards</th>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>the</th>
<td>5</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>green</th>
<td>6</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>sprinkler</th>
<td>7</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>For the first cell (*), the words &#8220;The&#8221; versus &#8220;The&#8221; are equal, so the cost is 0. Now the minimum of the cell above + 1 (2), the cell to the left + 1 (2) and the above left + cost (0) is the latter one, so this cell gets the value 0.</p>
<p>The cell underneath it (**) has cost 1 (&#8220;The&#8221; versus &#8220;dog&#8221;) and gets a value equal to the minimum of the cell above + 1 (1), the cell to the left + 1 (3) and the above left + cost (2), which is 1.</p>
<p>Continue to fill this out and the table will look like this:</p>
<table border="0">
<tbody>
<tr>
<th></th>
<th></th>
<th>The</th>
<th>brown</th>
<th>dog</th>
<th>jumped</th>
<th>away</th>
<th>from</th>
<th>the</th>
<th>sprinkler</th>
</tr>
<tr>
<th></th>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th>The</th>
<td>1</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<th>dog</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>ran</th>
<td>3</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>towards</th>
<td>4</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>the</th>
<td>5</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<th>green</th>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<th>sprinkler</th>
<td>7</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>5</td>
</tr>
</tbody>
</table>
<p>Now we need to find the lowest cost path from the bottom right to the zero in the top left. To do this, simply jump to the cell with the lowest value adjacent to the current cell (to the left, above or diagonal). Jumping diagonally is only allowed if the words are the same (column and row). If two or more candidates have the same lowest value, the priority is to try diagonal first, then either left or above. So, from the bottom right 5 we start the route to the diagonally adjacent 5 (because &#8216;sprinkler&#8217; equals &#8216;sprinkler&#8217;). From that 5 the next step is the lower 4 above it (&#8216;green&#8217; and &#8216;the&#8217; differ, so a diagonal jump is not allowed), then the diagonally adjacent 4, etc&#8230; The route table will look like this:</p>
<table border="0">
<tbody>
<tr>
<th></th>
<th></th>
<th>The</th>
<th>brown</th>
<th>dog</th>
<th>jumped</th>
<th>away</th>
<th>from</th>
<th>the</th>
<th>sprinkler</th>
</tr>
<tr>
<th></th>
<td><span style="color: #ff8401;">0</span></td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<th>The</th>
<td>1</td>
<td><span style="color: #ff8401;">0</span></td>
<td><span style="color: #ff8401;">1</span></td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<th>dog</th>
<td>2</td>
<td>1</td>
<td>1</td>
<td><span style="color: #ff8401;">1</span></td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>ran</th>
<td>3</td>
<td>2</td>
<td>2</td>
<td><span style="color: #ff8401;">2</span></td>
<td><span style="color: #ff8401;">2</span></td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>towards</th>
<td>4</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td><span style="color: #ff8401;">3</span></td>
<td><span style="color: #ff8401;">3</span></td>
<td><span style="color: #ff8401;">4</span></td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<th>the</th>
<td>5</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td><span style="color: #ff8401;">4</span></td>
<td>5</td>
</tr>
<tr>
<th>green</th>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td><span style="color: #ff8401;">5</span></td>
<td>5</td>
</tr>
<tr>
<th>sprinkler</th>
<td>7</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td>6</td>
<td><span style="color: #ff8401;">5</span></td>
</tr>
</tbody>
</table>
<p>After the route is calculated, every step in it tells something about the operations needed (from top left to bottom right).</p>
<ul>
<li>Every diagonal step (not increasing the score) tells us nothing happened. E.g. the first step from 0 to 0 tells us &#8220;The&#8221; from the first line stays &#8220;The&#8221; in the second line.</li>
<li>Every horizontal step means a word is deleted. E.g. the step from 0 to 1 tells us &#8220;brown&#8221; was deleted at this point.</li>
<li>Every vertical step means a word is added. E.g. the step from 4 to 5 tells us &#8220;green&#8221; was added at this point.</li>
<li>Every diagonal step having the score increased means a word is substituted (added and deleted) at this point. E.g. a diagonal step from 2 to 3 would tell us &#8220;away&#8221; is substituted by &#8220;towards&#8221;. This is an illegal operation in the detection of addition and deletion of words, which is why the route above only moves diagonally when the words are equal.</li>
</ul>
<p>This way a text indicating the operations can be constructed:</p>
<p>The <span style="color: #ff0000;">brown</span> dog <span style="color: #ff0000;">jumped</span> <span style="color: #008000;">ran</span> <span style="color: #ff0000;">away</span> <span style="color: #008000;">towards</span> <span style="color: #ff0000;">from</span> the <span style="color: #008000;">green</span> sprinkler.</p>
<p>Red indicates deletion, green indicates insertion, and an adjacent red and green pair indicates substitution.</p>
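
<p>Walking the route can be sketched like this. Again this is an illustration and not the demo source: the function names and the &#8216;keep&#8217;, &#8216;delete&#8217; and &#8216;add&#8217; labels are made up, and ties are resolved in the order diagonal, left, above. Deleted words are printed with a minus sign and added words with a plus sign, so the order within an adjacent pair may differ from the coloured line above.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Same matrix as described in the steps above (cost 1 for differing words).
function buildMatrix(array $a, array $b) {
    $matrix = array();
    foreach (range(0, count($b)) as $row) {
        foreach (range(0, count($a)) as $col) {
            if (min($row, $col) === 0) {
                $matrix[$row][$col] = max($row, $col);
                continue;
            }
            $cost = ($a[$col - 1] === $b[$row - 1]) ? 0 : 1;
            $matrix[$row][$col] = min($matrix[$row - 1][$col] + 1,
                $matrix[$row][$col - 1] + 1, $matrix[$row - 1][$col - 1] + $cost);
        }
    }
    return $matrix;
}

// Walks from the bottom right to the top left and returns the operations
// in reading order, each as array(operation, word).
function traceRoute(array $matrix, array $a, array $b) {
    $row = count($b);
    $col = count($a);
    $ops = array();
    while ($row + $col > 0) {
        $candidates = array(); // filled in order of priority
        if (min($row, $col) > 0) {
            if ($a[$col - 1] === $b[$row - 1]) {
                $candidates['keep'] = $matrix[$row - 1][$col - 1]; // diagonal
            }
        }
        if ($col > 0) {
            $candidates['delete'] = $matrix[$row][$col - 1];       // left
        }
        if ($row > 0) {
            $candidates['add'] = $matrix[$row - 1][$col];          // above
        }
        // array_search() returns the first key holding the lowest value
        $move = array_search(min($candidates), $candidates);
        if ($move === 'add') {
            array_unshift($ops, array('add', $b[$row - 1]));
            $row--;
        } else {
            array_unshift($ops, array($move, $a[$col - 1]));
            $col--;
            if ($move === 'keep') {
                $row--;
            }
        }
    }
    return $ops;
}

$a = explode(' ', 'The brown dog jumped away from the sprinkler');
$b = explode(' ', 'The dog ran towards the green sprinkler');
$ops = traceRoute(buildMatrix($a, $b), $a, $b);
$marks = array('keep' => '', 'delete' => '-', 'add' => '+');
$out = array();
foreach ($ops as $op) {
    $out[] = $marks[$op[0]] . $op[1];
}
$text = implode(' ', $out);
echo $text;
// The -brown dog +ran -jumped +towards -away -from the +green sprinkler</pre></div></div>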
<p>Of course some optimizations can be performed. The above does give a good indication of what happened to the text, and with a larger text than just these two lines the value of marking changes this way becomes even more obvious. But because only the primitive operations are detected at the word level, word groups are not taken into account. In this example, for instance, the result would be better if it marked &#8220;jumped away from&#8221; as replaced by &#8220;ran towards&#8221;, instead of marking each separate word as it does now:</p>
<p>The <span style="color: #ff0000;">brown</span> dog <span style="color: #008000;">ran towards</span> <span style="color: #ff0000;">jumped away from</span> the <span style="color: #008000;">green</span> sprinkler.</p>
<p>This operation is not that hard to implement: simply merge each run of consecutive add and delete operations into a single substitute operation.</p>
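
<p>A sketch of that merge step, assuming the operations are available as a list of array(operation, word) pairs with the hypothetical labels &#8216;keep&#8217;, &#8216;delete&#8217; and &#8216;add&#8217;. The list for the example sentences is written out by hand here so the snippet runs on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// Merges every run of consecutive add/delete operations into one operation.
// A run with both deleted and added words becomes
// array('substitute', deleted words, added words).
function mergeOperations(array $ops) {
    $merged = array();
    $deleted = array();
    $added = array();
    $ops[] = array('keep', null); // sentinel that flushes the last run
    foreach ($ops as $op) {
        if ($op[0] === 'delete') {
            $deleted[] = $op[1];
        } elseif ($op[0] === 'add') {
            $added[] = $op[1];
        } else {
            if ($deleted and $added) {
                $merged[] = array('substitute', implode(' ', $deleted), implode(' ', $added));
            } elseif ($deleted) {
                $merged[] = array('delete', implode(' ', $deleted));
            } elseif ($added) {
                $merged[] = array('add', implode(' ', $added));
            }
            $deleted = array();
            $added = array();
            if ($op[1] !== null) {
                $merged[] = $op; // the unchanged word itself
            }
        }
    }
    return $merged;
}

$ops = array(
    array('keep', 'The'), array('delete', 'brown'), array('keep', 'dog'),
    array('add', 'ran'), array('delete', 'jumped'), array('add', 'towards'),
    array('delete', 'away'), array('delete', 'from'), array('keep', 'the'),
    array('add', 'green'), array('keep', 'sprinkler'),
);
$merged = mergeOperations($ops);
echo $merged[3][1], ' => ', $merged[3][2]; // jumped away from => ran towards</pre></div></div>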
<p>An <a href="http://www.edesign.nl/examples/levenshtein/" target="_blank">implementation of this algorithm</a>, with the optimization patch suggested here, is <a href="http://www.edesign.nl/examples/levenshtein/" target="_blank">now available as an example</a>. Source code (PHP) is available as well.</p>
<p>And about the featured picture on top: there are <a href="http://www.smart-kit.com/s749/birds-eye-view-can-you-spot-all-12-differences/" target="_blank">12 differences to spot</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.edesign.nl/2009/04/12/find-the-differences/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
