Just as many roads lead to Rome, there are multiple ways to build this simulation. One could model the sun, the earth and maybe more, and start ray tracing. This approach would include solar eclipses but puts a heavy load on the processor, because the number of calculations involved in ray tracing is quite high. The way I chose, and describe fully in this article, is close to it. Using vectors pointing from a sphere (the earth) to a point (the sun), I map a Mercator projected map of the world onto the sphere. The challenges involved are the yearly orbit of the earth around the sun and its 24-hour spin around an axis tilted 23.5°.

About a decade ago I made a similar program in high school. I didn’t know vectors and trigonometry then as I do now. Back then I had a map of the world where, for each line of latitude (the ones parallel to the equator), I calculated the sunrise and sunset times. This way I knew where to start painting night over the daylight map I had. The result was pretty accurate, and an algorithm to determine those solar times is not hard to implement. This time I used another, more advanced approach to the problem.

First let us define a mathematical 3D space with three axes: *x* (pointing from left to right), *y* (pointing up) and *z* (pointing into your monitor). The pointing directions are given to help your imagination. The axes point from small (negative infinity) to big (positive infinity) and intersect at the origin (0, 0, 0).

Define a sphere centered on the origin (0, 0, 0), with a radius of 1. This will be the earth. The funny thing is that, because the earth is centered on the origin and has radius 1, every point (x, y, z coordinate) on the earth is automatically the same as the unit vector pointing orthogonally away from the earth at that point (the surface normal). This comes in handy when we do calculations later on.

We don’t need the sun itself for this method. We only need the direction to it from the earth. So define a vector *S* which points in the direction of the sun from the origin. To do this we must use sine and cosine. At one fixed hour of the day the direction to the sun is a unit vector pointing to the right; twelve hours later it should point to the left. The same goes for 6 o’clock am and pm, which give the two opposite directions along the *z* axis. This gives us the insight that the direction to the sun, called *d* from here on, can be written with a cosine and a sine of the hour of the day (the hour should be scaled to a value from 0 to 1, so we need to divide by 24).

With this approach we actually make the sun rotate around the earth.
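The time-of-day rotation can be sketched as follows. The function name `flatSunDirection` and the choice that 12:00 corresponds to the vector pointing right are assumptions made for illustration; the demo may use a different offset or sign.

```php
<?php
// Sketch: direction to the sun in the x-z plane, ignoring the axial tilt.
// Assumption (not from the article): at 12:00 the sun is at (1, 0, 0).
function flatSunDirection(float $hour): array
{
    // Scale the hour to a fraction of a day, then to an angle in radians.
    $angle = 2 * M_PI * ($hour - 12) / 24;

    // The sun "rotates around the earth" about the y axis.
    return [cos($angle), 0.0, sin($angle)];
}
```

With this convention 0:00 gives a vector pointing left, and 6:00 and 18:00 give the two opposite directions along the *z* axis.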

The above formula is still not complete. The axis around which the earth spins is not orthogonal to the plane defined by the orbit of the earth around the sun. It is tilted 23.5°. This is of course how we get the four seasons on earth. To model this, a nice trick can be applied to *d*. We know that on or close to June 21st, the summer solstice, the sun is right above the 23.5° latitude (the Tropic of Cancer). From there it follows a sinusoid with a period of a year down to -23.5° and back again. June 21st is the 172nd day of the year. The standard cosine function has the form:

$$f(x) = A \cos\big(B (x + C)\big) + D$$

We can simply fill this out. $A$ must be 23.5, $B$ is $2\pi/365$ (a period of one year), $C$ is -172 and $D$ equals 0. If $\alpha$ is the observable tilt in degrees (because the real tilt will always be 23.5°) and $n$ is the day of the year:

$$\alpha = 23.5 \cos\left(\frac{2\pi}{365}(n - 172)\right)$$
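As a small sketch, the observable tilt can be computed like this. The function name `observableTilt` is hypothetical, and a year of 365 days is assumed.

```php
<?php
// Sketch: observable tilt of the sun in degrees for a given day of the year.
// Assumptions: 365 days per year, maximum (23.5 degrees) on day 172 (June 21st).
function observableTilt(int $dayOfYear): float
{
    return 23.5 * cos(2 * M_PI / 365 * ($dayOfYear - 172));
}
```

Day 172 gives the full 23.5°, about half a year later the value is close to -23.5°, and around the equinoxes it is close to 0.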

This observable tilt is the angle between the real direction to the sun and the already calculated direction *d*. To calculate the vector $v$ (*y* component only, pointing up or down) which should be added to *d* to get the real direction, you can use the tangent. Because *d* has length 1, the length of $v$ is the tangent of the observable tilt $\alpha$:

$$v = (0, \tan\alpha, 0)$$

Then the real direction vector *S* is the sum of $v$ and *d*:

$$S = d + v$$
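A minimal sketch of this step is shown below. The function name `sunDirection` is hypothetical, and the result is normalized here (an addition not spelled out in the text) so that later dot products can be read as cosines.

```php
<?php
// Sketch: add the tilt to the flat direction d and normalize the result.
// $d is assumed to be a unit vector in the x-z plane, $tiltDegrees the observable tilt.
function sunDirection(array $d, float $tiltDegrees): array
{
    // The y component to add is tan(tilt), because d has length 1.
    $s = [$d[0], $d[1] + tan(deg2rad($tiltDegrees)), $d[2]];

    // Normalize so that |S| = 1.
    $length = sqrt($s[0] ** 2 + $s[1] ** 2 + $s[2] ** 2);

    return [$s[0] / $length, $s[1] / $length, $s[2] / $length];
}
```

For a flat direction (1, 0, 0) and a tilt of 23.5°, the resulting vector makes an angle of 23.5° with the *x*-*z* plane.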

Using the dot product (or scalar product or inner product) you can calculate the angle between two vectors: if both vectors are normalized (length = 1), it equals the cosine of the angle between them, which is also the length of the projection of one vector onto the other. So if we calculate the dot product of every normal vector of the earth’s surface with *S*, we should get some meaningful results.

If the dot product is negative, the sun is in the other direction, so the normal points to the night side of the earth. If it is positive, that piece of earth receives sunlight.
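The day or night test can be sketched in a few lines. The helper names `dot` and `isDay` are hypothetical; both vectors are assumed to be 3-element arrays.

```php
<?php
// Sketch: dot product of two 3D vectors given as [x, y, z] arrays.
function dot(array $a, array $b): float
{
    return $a[0] * $b[0] + $a[1] * $b[1] + $a[2] * $b[2];
}

// A surface point is lit when its normal points towards the sun.
function isDay(array $normal, array $sun): bool
{
    return dot($normal, $sun) > 0;
}
```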

We have a 2D map (Mercator projection) that we need to map onto the 3D sphere, in order to determine which normal vectors we need to calculate dot products for. To avoid confusion with the 3D *x* and *y*, the coordinates on the 2D map get their own names. We can iterate over every column and every row of the map and map each pixel to a point in 3D. To do this we need to determine phi ($\varphi$, the longitude) and theta ($\theta$, the latitude) from the pixel position, the height of the map and the width of the map.

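Now $\varphi$ and $\theta$ map to a point (x, y, z) on the sphere through the usual sine and cosine conversion from spherical coordinates.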
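One possible version of this mapping is sketched below. The names `pixelToNormal`, `$u`, `$v`, `$w` and `$h` are hypothetical, and so are the conventions: longitude runs from -π to π across the width, and latitude is assumed to vary linearly from π/2 (top row) to -π/2 (bottom row). That linear latitude corresponds to an equirectangular layout; a true Mercator map would need the inverse Mercator formula for the latitude instead.

```php
<?php
// Sketch: map pixel (u, v) of a w-by-h map to a unit normal on the sphere.
function pixelToNormal(int $u, int $v, int $w, int $h): array
{
    $phi   = ($u / $w) * 2 * M_PI - M_PI;   // longitude, assumed range [-pi, pi)
    $theta = M_PI / 2 - ($v / $h) * M_PI;   // latitude, assumed linear in v

    // Spherical to Cartesian, with y pointing up.
    return [
        cos($theta) * cos($phi),
        sin($theta),
        cos($theta) * sin($phi),
    ];
}
```

With these conventions the center pixel of the map maps to (1, 0, 0) and the top row maps to the north pole (0, 1, 0).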

Now we can determine for each coordinate on the map whether it should be displayed as day or night.

Because you know the angle the sunlight makes with the earth’s surface, you can add some shading by making it increasingly dark towards the edges of the day. If the dot product of the normal and *S* is smaller than 0.1 (the sun is less than about 6° above the horizon), the point is in dusk or dawn and you could mix day and night as a gradient to make the transition between them look smoother. Another nice touch is the reflection of the sun: if the product is greater than, let’s say, 0.95 (the sun is within about 18° of directly overhead), the sun is approximately orthogonally above that coordinate and you could make it more white so it looks like the sun is reflecting in the map.
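A sketch of such shading for a single color channel follows. The function names and the linear blends are assumptions; the thresholds 0.1 and 0.95 are the ones mentioned above, and the gradient is assumed to run from a dot product of 0 to 0.1.

```php
<?php
// Sketch: how much of the day image to show (0 = night, 1 = full day).
function dayWeight(float $dot): float
{
    return max(0.0, min(1.0, $dot / 0.1));
}

// Sketch: strength of the white "reflection" near the subsolar point.
function highlight(float $dot): float
{
    return max(0.0, min(1.0, ($dot - 0.95) / 0.05));
}

// Blend one color channel (0-255) of the night and day images.
function mixChannel(int $night, int $day, float $dot): int
{
    $color = $night + ($day - $night) * dayWeight($dot);
    $color = $color + (255 - $color) * highlight($dot);

    return (int) round($color);
}
```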

A demonstration of this method is available at the edesign example site. A number of assumptions were made and restrictions set while creating this demo.

- I assumed the solar rays are parallel. Strictly speaking that is wrong, but because the distance from the sun to the earth is huge they are parallel by approximation, which is good enough for this simulation.
- Because of this assumption, the night ‘starts’ exactly at the boundary of the half of the world that is in the shade, while in reality this is different. The sun is far bigger than the earth, and therefore some rays should be able to reach the shadow half of the earth (near the limit depicted in the first image). By approximation I eliminated this too.
- Timing is not very accurate. Rough estimates are used for the 24h clock as well as for the solstices.
- A perfect sphere is taken as the world. No flattening is modelled (the polar radius should be smaller than the equatorial radius).
- The ‘fun with shading’ should implement something to make land mass not reflect sunlight.

This example takes two texts as input and outputs one merged text, marked with what was deleted and what was added. Take a look and feel free to download the source code. This also includes the Levenshtein algorithm source code.

A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value. The data to be encoded is often called the “message”, and the hash value is sometimes called the message digest or simply digest. (Wikipedia, retrieved May 2009). In other words, a hash (digest) is the result of applying a hashing function to a certain input (password, file, etc.).

The challenge is a question presented to a party who needs to provide the correct answer. A common form of this algorithm is where the challenge asks for the password and the valid response is the correct password. CAPTCHAs are another well-known implementation.

When you combine these two, an intuitive way emerges of keeping a password secret while it is sent across a publicly accessible area, and still having it valid for authentication checks.

A system has stored user information (username, password, email, etc.) in a database and has the password stored as an MD5 hash. MD5 is the name of the hashing function used here; there are others. When a user requests a login prompt, the server generates a random string (the challenge) and sends it along with the login prompt. It also stores the string in the session of that request.

The user enters his username and password and hits ‘login’. Just before submitting, a client-side script is triggered which calculates the MD5 hash of the password, concatenates the challenge to the digest and hashes that result. This is submitted as the coded ‘password’.

Now the server has to verify the password. As there is no way to reverse the MD5 digest, the coded password is matched against the database in a special way. The database needs to concatenate the previously generated challenge to the stored digests and calculate the MD5 hash of that. When the result is the same as the submitted coded password, the login is successful.
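The exchange described above can be sketched in PHP as follows. All function names are hypothetical, the client-side step is written in PHP only to keep the sketch in one language, and the stored digest is assumed to have been looked up for the submitted username. `random_bytes()` and `hash_equals()` are standard PHP functions used here as a convenience; the demo may generate and compare strings differently.

```php
<?php
// Server, step 1: create a random challenge (to be kept in the session).
function createChallenge(): string
{
    return bin2hex(random_bytes(16));
}

// Client: what the script in the browser computes before submitting.
function clientResponse(string $password, string $challenge): string
{
    return md5(md5($password) . $challenge);
}

// Server, step 2: repeat the calculation on the stored MD5 digest and compare.
function verifyResponse(string $storedDigest, string $challenge, string $response): bool
{
    $expected = md5($storedDigest . $challenge);

    return hash_equals($expected, $response);
}

$storedDigest = md5('secret');        // as kept in the database
$challenge    = createChallenge();    // sent along with the login prompt
$loginOk      = verifyResponse($storedDigest, $challenge, clientResponse('secret', $challenge));
```

The plain password and its bare digest never cross the wire; only the challenge and the combined hash do.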

A downside to this technique is that database server processing capacity is required, as password digests need to be hashed on every login attempt. When the coded password is matched against the whole table, the worst case (most processing time) is a failed attempt or a success on the last row, because every stored digest has to be hashed. Such a system is not really scalable to masses of users. Restricting the comparison to the row of the submitted username keeps the work to a single hash per attempt.

Client scripting must be available. This is not really a critical downside, as e.g. JavaScript is common, but you cannot assume everybody supports it.

A demonstration of challenge hashing is available in JavaScript and PHP for you to investigate.

We must speak of multiple algorithms (strategies), because one solving algorithm consists of multiple solve strategies. The simplest strategy, not included in this article, is a brute force or trial and error ‘attack’ on the given input. This is not logic and could take quite a while, as there are 70759827985602812313600000000 possibilities for a default 9×9 sudoku, not considering variations (in Dutch) such as 16×16 hex or jigsaw sudokus. The strategies discussed by Andrew Stuart are based on the default 9×9 sudoku; some are limited to those puzzles, and some block-only strategies overlap with row and column strategies. What I want to try in this article is programming a more generic solution in which a few strategy implementations cover all of these. I think this is possible if you imagine a sudoku board not as a set of cells divided into rows, columns and blocks, but as a set of equal sections which may have one or more cells in common. This way it is also easy to implement variants, such as the X-sudoku or hypersudoku, simply by extending the model for a sudoku, and strategies need to be implemented only once.

First of all, a model is needed which represents every possible sudoku. That model should include the character set (numbers 1 to 9), the cells (81) and the sections (27). Defaults for a 9×9 sudoku are filled out between brackets. If not mentioned otherwise, every example will concern such a default 9×9 sudoku.

In the model, each cell is identified by a number (0-80). The cells array keeps an array of possibilities for every cell. On initialization, every possibility is set to true, as everything is still possible. While solving the sudoku, more and more will toggle to false, finally leaving only one set to true.

    $possibilities = array(
        false, // false if unsolved, if solved integer of solution (redundant*)
        true, true, true, true, true, true, true, true, true // 1 - 9 are possible (true)
    );
    for ($i = 0; $i < 81; $i++) {
        $cells[$i] = $possibilities;
    }

* The first key in the ‘possibilities’ array is redundant: it could be determined by the other keys (if only one is true, it is solved and has that value). But this way, the array keys match the value they represent and it might increase speed by avoiding determining if the cell is solved or not over and over again.

The sections are probably the most important piece of the model, as they define how cells relate to each other. There are 27 sections, each containing *nPossibilities* (9) cells. These are the rows, columns and blocks. As an example, here are the top row, the left column and the top left block respectively, with their cell identifiers:

    $section = array();
    $section[] = array(0, 1, 2, 3, 4, 5, 6, 7, 8); // top row
    //... 8 more ...
    $section[] = array(0, 9, 18, 27, 36, 45, 54, 63, 72); // left column
    //... 8 more ...
    $section[] = array(0, 1, 2, 9, 10, 11, 18, 19, 20); // top left block
    //... 8 more ...
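Rather than typing out all 27 sections, they could be generated. The sketch below is one way to do that for the default 9×9 board; the function name `buildSections` is hypothetical and the order (rows, then columns, then blocks) is an assumption.

```php
<?php
// Sketch: build the 27 sections (rows, columns, blocks) of a 9x9 sudoku.
function buildSections(): array
{
    $sections = [];

    for ($r = 0; $r < 9; $r++) {              // 9 rows
        $row = [];
        for ($c = 0; $c < 9; $c++) {
            $row[] = $r * 9 + $c;
        }
        $sections[] = $row;
    }

    for ($c = 0; $c < 9; $c++) {              // 9 columns
        $column = [];
        for ($r = 0; $r < 9; $r++) {
            $column[] = $r * 9 + $c;
        }
        $sections[] = $column;
    }

    for ($b = 0; $b < 9; $b++) {              // 9 blocks of 3x3
        $top   = intdiv($b, 3) * 3;           // first row of the block
        $left  = ($b % 3) * 3;                // first column of the block
        $block = [];
        for ($i = 0; $i < 9; $i++) {
            $block[] = ($top + intdiv($i, 3)) * 9 + $left + $i % 3;
        }
        $sections[] = $block;
    }

    return $sections;
}
```

A variant such as the X-sudoku would only need two more entries (the diagonals) appended to the same list.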

Also, some default actions (methods) can be added to this model (class): one to set a cell to a value, one to set a possibility to false and one to count the possible locations for a digit in a section. They can call each other to implement recursion:

- setCellValue() calls removeFromCell() for all other cells for the same value
- removeFromCell() calls countPossibleLocations() for that value on every section in which that cell occurs
- countPossibleLocations() calls setCellValue() for that value if the count equals one
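The three methods and their mutual calls could look roughly like the sketch below. The class name `Sudoku`, the constructor and the exact bookkeeping are assumptions; only the method names and the call chain come from the list above. In addition to that list, the sketch also removes the other values from the solved cell itself through removeFromCell(), so those removals get counted as well.

```php
<?php
class Sudoku
{
    public $cells = [];      // cell id => [0 => solved value or false, 1..n => still possible?]
    public $sections = [];   // list of lists of cell ids
    public $changed = true;

    public function __construct(array $sections, int $nCells = 81, int $nPossibilities = 9)
    {
        $this->sections = $sections;
        $possibilities = array_fill(1, $nPossibilities, true);
        $possibilities[0] = false;
        $this->cells = array_fill(0, $nCells, $possibilities);
    }

    public function setCellValue(int $cell, int $value): void
    {
        if ($this->cells[$cell][0] !== false) {
            return;                                   // already solved, stops the recursion
        }
        $this->cells[$cell][0] = $value;
        $this->changed = true;
        for ($v = 1; $v < count($this->cells[$cell]); $v++) {
            if ($v !== $value) {
                $this->removeFromCell($cell, $v);     // the cell can hold nothing else
            }
        }
        foreach ($this->sections as $section) {
            if (!in_array($cell, $section, true)) {
                continue;
            }
            foreach ($section as $other) {
                if ($other !== $cell) {
                    $this->removeFromCell($other, $value);
                }
            }
        }
    }

    public function removeFromCell(int $cell, int $value): void
    {
        if (!$this->cells[$cell][$value]) {
            return;                                   // nothing to remove
        }
        $this->cells[$cell][$value] = false;
        $this->changed = true;
        foreach ($this->sections as $section) {
            if (in_array($cell, $section, true)) {
                $this->countPossibleLocations($section, $value);
            }
        }
    }

    public function countPossibleLocations(array $section, int $value): int
    {
        $locations = [];
        foreach ($section as $cell) {
            if ($this->cells[$cell][$value]) {
                $locations[] = $cell;
            }
        }
        if (count($locations) === 1) {
            $this->setCellValue($locations[0], $value);   // only one place left
        }
        return count($locations);
    }
}
```

The early returns in setCellValue() and removeFromCell() are what keep the mutual recursion from running forever. The sketch does no consistency checking on invalid input.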

Once you have defined the model and filled out some initial digits, the algorithm should include a loop searching for the solution. I have added a ‘changed’ property to the sudoku model. This boolean is initially set to true. The first line in the while(sudoku.changed) loop sets this boolean to false. Next, while iterating over every section trying to find solutions, every change (setting or deleting a number from a cell) sets this boolean to true, causing another iteration to occur (a kind of recursion).

Now we have a setup where we can add solve strategies to the loop. In this article, the first strategy will be explained.

The first strategy to implement in the section iteration is elimination. This is pretty straightforward. As a digit can only occur once in a section (row, column or block), once a cell is solved all other cells in that section that still have this digit set to true can toggle it to false. This process repeats for as long as it leaves new cells with a single remaining possibility, all their other possibilities having been set to false.
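A compact, standalone sketch of the loop with the elimination strategy is given below. To keep it short it uses a different representation than the model above: each cell holds a plain list of candidate digits instead of an array of booleans. The function name `eliminate` is hypothetical.

```php
<?php
// Sketch: repeat elimination over all sections until nothing changes.
// $cells: cell id => list of candidate digits; $sections: lists of cell ids.
function eliminate(array $cells, array $sections): array
{
    $changed = true;
    while ($changed) {
        $changed = false;                         // first line of the loop
        foreach ($sections as $section) {
            foreach ($section as $solved) {
                if (count($cells[$solved]) !== 1) {
                    continue;                     // not solved (yet)
                }
                $digit = reset($cells[$solved]);
                foreach ($section as $other) {
                    if ($other === $solved) {
                        continue;
                    }
                    $key = array_search($digit, $cells[$other], true);
                    if ($key !== false) {
                        unset($cells[$other][$key]);
                        $changed = true;          // forces another pass
                    }
                }
            }
        }
    }

    return array_map('array_values', $cells);     // re-index the candidate lists
}
```

For a single section of three cells with candidates [1], [1, 2] and [1, 2, 3], the loop reduces them to [1], [2] and [3].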

In Sudoku Logic – part II more strategies will be discussed. Also some sudoku extensions, to simulate sudoku variants and a live demonstration of an implementation of this algorithm in PHP will be included.

There are several ways to compare texts and find differences and similarity scores. For this article the similarity scores are not relevant, because these scores are just numbers. We are interested in what is added, deleted or substituted in the transformation from text *A* to text *B*. In other words, we would like to mark the minimal number of primitive operations needed to transform *A* into *B*. To do this we’ll need the basics of a classic computer science problem: the longest common subsequence problem. The technique described below, which is derived from this problem, is the Levenshtein distance algorithm. The algorithm was developed by Vladimir Levenshtein to replace the Hamming distance. The result of Levenshtein’s algorithm is exactly the minimal number of operations, but you can use the unpolished result of this algorithm to determine what parts of the text were added, deleted or substituted. Originally this is done per character, but with a little tweak this can be changed to work at the word, line or paragraph level. When you’ve read this article you will know how this works, and how the demo referred to at the end works. This demo takes two texts as input and outputs what was added, deleted and replaced.

For this article and the demo I used search results for some inspiration. A nice explanation already out there is this description of the Levenshtein algorithm, as well as the Wiki page on it. For this article, let’s use two sample lines: “The brown dog jumped away from the sprinkler” and “The dog ran towards the green sprinkler”. Now we want to know which words were added, deleted or replaced in the transition from the first sentence to the second. To do this, let’s take a closer look at how the iterative process of the Levenshtein algorithm is executed.

- The first step is to construct a matrix of *n*+1 by *m*+1, where *n* is the number of words in the first line and *m* the number of words in the second line.
- Secondly, fill the first row and column with (from top left to bottom or right) zero to respectively *n* and *m*.
- Now for each *n*, evaluate each *m*. If the evaluated word of *n* matches *m*, the cost is 0, otherwise it’s 1.
- Fill out cell (*n*,*m*) with the minimum of:
  - The value of the cell above + 1
  - The left neighbour cell value + 1
  - The above left cell value + cost
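The steps above can be sketched as a word-level matrix builder. The function name `levenshteinMatrix` is hypothetical; words are compared case-sensitively and split on single spaces, which is an assumption about how the demo tokenizes.

```php
<?php
// Sketch: build the Levenshtein matrix for two lists of words.
// Columns follow the first line ($a), rows follow the second line ($b).
function levenshteinMatrix(array $a, array $b): array
{
    $n = count($a);
    $m = count($b);
    $d = [];

    for ($i = 0; $i <= $m; $i++) {
        $d[$i][0] = $i;                     // first column: 0..m
    }
    for ($j = 0; $j <= $n; $j++) {
        $d[0][$j] = $j;                     // first row: 0..n
    }

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$j - 1] === $b[$i - 1]) ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,         // cell above + 1
                $d[$i][$j - 1] + 1,         // left neighbour + 1
                $d[$i - 1][$j - 1] + $cost  // above left + cost
            );
        }
    }

    return $d;
}

$a = explode(' ', 'The brown dog jumped away from the sprinkler');
$b = explode(' ', 'The dog ran towards the green sprinkler');
$matrix = levenshteinMatrix($a, $b);
echo $matrix[count($b)][count($a)], "\n";   // prints 5, the Levenshtein distance
```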

For the Levenshtein distance it stops right here, when the iteration is completed. The distance is the value in the lower right cell. Here we are not interested in the Levenshtein distance itself, but in the matrix we’ve just constructed. The lowest cost route from bottom right to top left reveals information on what words have been added, deleted and/or substituted. To show how, we need to construct the matrix with the algorithm above.

|  |  | The | brown | dog | jumped | away | from | the | sprinkler |
|---|---|---|---|---|---|---|---|---|---|
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| The | 1 | * |  |  |  |  |  |  |  |
| dog | 2 | ** |  |  |  |  |  |  |  |
| ran | 3 |  |  |  |  |  |  |  |  |
| towards | 4 |  |  |  |  |  |  |  |  |
| the | 5 |  |  |  |  |  |  |  |  |
| green | 6 |  |  |  |  |  |  |  |  |
| sprinkler | 7 |  |  |  |  |  |  |  |  |

For the first cell (*), the words “The” versus “The” are equal, so the cost is 0. Now the minimum of the cell above + 1 (2), the cell to the left + 1 (2) and the above left + cost (0) is the latter one.

The cell underneath it (**) has cost 1 (“The” versus “dog”) and gets a value equal to the minimum of the cell above + 1 (1), the cell to the left + 1 (3) and the above left + cost (2), which is 1.

Continue to fill this out and the table will look like this:

|  |  | The | brown | dog | jumped | away | from | the | sprinkler |
|---|---|---|---|---|---|---|---|---|---|
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| The | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| dog | 2 | 1 | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| ran | 3 | 2 | 2 | 2 | 2 | 3 | 4 | 5 | 6 |
| towards | 4 | 3 | 3 | 3 | 3 | 3 | 4 | 5 | 6 |
| the | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
| green | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| sprinkler | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 5 |

Now we need to find the lowest cost path from the bottom right to the zero in the top left. To do this, simply jump to the adjacent cell (to the left, above or diagonally up-left) with the lowest value. Jumping diagonally is only allowed if the words are the same (column and row). If two or more have the same (lower) value, the priority is to try diagonal first, then either left or above. So, from the bottom right 5 we start the route to the diagonally adjacent 5 (because ’sprinkler’ equals ’sprinkler’). From that 5 the next step would be the lower 4 above it, then the diagonally adjacent 4, etc. The route table will look like this (the cells on the route are shown in bold):

|  |  | The | brown | dog | jumped | away | from | the | sprinkler |
|---|---|---|---|---|---|---|---|---|---|
|  | **0** | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| The | 1 | **0** | **1** | 2 | 3 | 4 | 5 | 6 | 7 |
| dog | 2 | 1 | 1 | **1** | **2** | 3 | 4 | 5 | 6 |
| ran | 3 | 2 | 2 | 2 | 2 | **3** | 4 | 5 | 6 |
| towards | 4 | 3 | 3 | 3 | 3 | 3 | **4** | 5 | 6 |
| the | 5 | 4 | 4 | 4 | 4 | 4 | 4 | **4** | 5 |
| green | 6 | 5 | 5 | 5 | 5 | 5 | 5 | **5** | 5 |
| sprinkler | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | **5** |

After the route is calculated, every step in it tells something about the operations needed (from top left to bottom right).

- Every diagonal step (not increasing the score) tells us nothing happened. E.g. the first step from 0 to 0 tells us “The” from the first line stays “The” in the second line.
- Every horizontal step means a word is deleted. E.g. the step from 0 to 1 tells us “brown” was deleted at this point.
- Every vertical step means a word is added. E.g. the step from 4 to 5 tells us “green” was added at this point.
- Every diagonal step that increases the score means a word is substituted (added and deleted) at this point. E.g. the step from 2 to 3 tells us “away” is substituted by “ran”. This is an illegal operation when only additions and deletions of words are detected.
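Reading the operations off the matrix can be sketched as a backtrace from the bottom right cell. The function name `diffWords` and the tie-breaking order (equal words first, then a vertical step, then a horizontal step, substitution last) are assumptions. Other orders give other routes of the same minimal cost, so the result need not match the route shown above step for step.

```php
<?php
// Sketch: word-level diff by backtracing the Levenshtein matrix.
// Returns a list of ['keep' | 'add' | 'delete', word] pairs.
function diffWords(array $a, array $b): array
{
    $n = count($a);
    $m = count($b);
    $d = [];
    for ($i = 0; $i <= $m; $i++) { $d[$i][0] = $i; }
    for ($j = 0; $j <= $n; $j++) { $d[0][$j] = $j; }
    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$j - 1] === $b[$i - 1]) ? 0 : 1;
            $d[$i][$j] = min($d[$i - 1][$j] + 1, $d[$i][$j - 1] + 1, $d[$i - 1][$j - 1] + $cost);
        }
    }

    $ops = [];
    $i = $m;
    $j = $n;
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0 && $a[$j - 1] === $b[$i - 1]) {
            array_unshift($ops, ['keep', $a[$j - 1]]);      // diagonal, same score
            $i--;
            $j--;
        } elseif ($i > 0 && $d[$i][$j] === $d[$i - 1][$j] + 1) {
            array_unshift($ops, ['add', $b[$i - 1]]);       // vertical step
            $i--;
        } elseif ($j > 0 && $d[$i][$j] === $d[$i][$j - 1] + 1) {
            array_unshift($ops, ['delete', $a[$j - 1]]);    // horizontal step
            $j--;
        } else {
            // diagonal step with increased score: substitution = add + delete
            array_unshift($ops, ['add', $b[$i - 1]], ['delete', $a[$j - 1]]);
            $i--;
            $j--;
        }
    }

    return $ops;
}
```

Whatever route is taken, the kept and deleted words together spell the first line, and the kept and added words together spell the second.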

This way a text indicating the operations can be constructed:

The ~~brown~~ dog ~~jumped~~ **ran**~~away~~ **towards**~~from~~ the **green** sprinkler.

Struck-through words are deletions (red in the colored rendering), bold words are insertions (green), and an insertion directly followed by a deletion indicates a substitution.

Of course some optimizations can be performed. The above does give a good indication of what happened to the text. Imagine a larger text than just these lines, and the relevance of marking changes this way becomes more obvious. But because only the primitive operations are detected at the word level, word groups are not taken into account. In this example, for instance, the result would be better if “jumped away from” were marked as replaced by “ran towards”, instead of each separate word as it is now:

The ~~brown~~ dog **ran towards**~~jumped away from~~ the **green** sprinkler.

This operation is not that hard to implement: simply replace subsequent differing operations by a single substitute operation.
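One way to sketch this grouping step: collect every run of operations between two kept words and emit it as a single delete, add or replace. The function name `groupOperations` and the shape of the operation list (pairs of type and word) are assumptions for illustration.

```php
<?php
// Sketch: merge runs of consecutive add/delete operations into one group.
// $ops is a list of ['keep' | 'add' | 'delete', word] pairs.
function groupOperations(array $ops): array
{
    $groups  = [];
    $deleted = [];
    $added   = [];

    // Emit whatever was collected since the last kept word.
    $flush = function () use (&$groups, &$deleted, &$added) {
        if ($deleted && $added) {
            $groups[] = ['replace', implode(' ', $deleted), implode(' ', $added)];
        } elseif ($deleted) {
            $groups[] = ['delete', implode(' ', $deleted)];
        } elseif ($added) {
            $groups[] = ['add', implode(' ', $added)];
        }
        $deleted = [];
        $added   = [];
    };

    foreach ($ops as [$type, $word]) {
        if ($type === 'keep') {
            $flush();
            $groups[] = ['keep', $word];
        } elseif ($type === 'delete') {
            $deleted[] = $word;
        } else {
            $added[] = $word;
        }
    }
    $flush();

    return $groups;
}
```

Fed with the word-by-word operations of the example, the run between “dog” and “the” becomes a single replace of “jumped away from” by “ran towards”.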

An implementation of this algorithm, with the optimization patch suggested here, is now available as an example. Source code (PHP) is available as well.

And about the featured picture on top: there are 12 differences to spot.
