Spot the diff
Wikipedia or wikis in general have a really really cool feature, the history tab. It shows the differences between two versions of a page down to the word, you can easily see what has been added, what has been removed and what has been changed. Behind all this is a diff algorithm. The diff algorithm used by Domino Wiki was written by John Resig and released under a creative commons license, it works really well but it is written in Javascript. This is great for web applications and it does work in the Notes client as Javascript to some extent, but I want a version in LotusScript.
Step one is to understand the algorithm.
Firstly, there are some redundant functions. Function randomColor() and function diffString2() can both go, they are a distraction. Function escape() is useful on the web, but irrelevant in the notes client so that can go. Function diffString() and function diff() are the ones that we need to examine.
There are no comments in the code and lots of short cryptic variable names, so lets rewrite it with comments to explain what it is up to.
function diffString( o, n ) {
//o and n are strings, the old string and the new string
//first trim the ends of the strings s+$ means whitespace (s) followed by end of input ($)
o = o.replace(/s+$/, ''); n = n.replace(/s+$/, '');
//now call the diff function and put the result in a variant called out. More on this later.
//the parameters of the diff function are two arrays, the question mark is a kind of shorthand if statement
//if the string is "" then it passes in an empty array [] else it does an @explode on the strings and passes in that.
var out = diff(o == "" ? [] : o.split(/s+/), n == "" ? [] : n.split(/s+/) ); var str = "";
// it is now going to make an array of all the whitespaces in the original strings
//this allows it to put back the correct type of whitespace (space, tab, newline etc.) in the correct position
//personally I don't care very much. I would be happy with all whitespaces ending up as " " in the output
var oSpace = o.match(/s+/g);
if (oSpace == null) {
oSpace = ["n"];
} else {
oSpace.push("n");
}
var nSpace = n.match(/s+/g);
if (nSpace == null) {
nSpace = ["n"];
} else {
nSpace.push("n");
}
// out is an object with various properties, we will work out the exact structure later.
if (out.n.length == 0) {
//if the new string is empty then the whole of the old string is in <del> tags
for (var i = 0; i < out.o.length; i++) {
str += '<del>' + escape(out.o[i]) + oSpace[i] + "</del>";
}
} else {
if (out.n[0].text == null) {
//this is another test for the new string being empty so more <del> tags
for (n = 0; n < out.o.length && out.o[n].text == null; n++) {
str += '<del>' + escape(out.o[n]) + oSpace[n] + "</del>";
}
}
//now we loop through word by word in the new string
for ( var i = 0; i < out.n.length; i++ ) {
if (out.n[i].text == null) {
//the line above looks a bit wrong, but the .text property of out.n[i] doesn't do what you think it does.
//if out.n[i].text is null then that means that the word in the new string has been inserted
str += '<ins>' + escape(out.n[i]) + nSpace[i] + "</ins>";
} else {
var pre = ""; //now append any words from the old list that followed the current word from the new list
//these are stored in the row property of out.n[i]
for (n = out.n[i].row + 1; n < out.o.length && out.o[n].text == null; n++ ) {
pre += '<del>' + escape(out.o[n]) + oSpace[n] + "</del>";
}
now we add the words that appeared in both lists. out.n[i] won't be populated for inserted words.
str += " " + out.n[i].text + nSpace[i] + pre;
}
}
}
//and hand back one string with markup for differences to the caller.
return str; }
I hope the comments help a bit, but it doesn’t totally make sense yet until we understand the structure of the “out” object. I used the sample text from John Resig’s article:
diffString(“The red brown fox jumped over the rolling log.”,“The brown spotted fox leaped over the rolling log);
after running it gives output like this:
The <del>red</del> brown <ins>spotted</ins> fox <del>jumped</del> <ins>leaped</ins> over the rolling log.
I used a stringify function to convert the “out” object to JSON code, so here is what the object looks like as JSON:
{
“o”:[
{"text":"The","row":0},
"red",
{"text":"brown","row":1},
{"text":"fox","row":3},
"jumped",
{"text":"over","row":5},
{"text":"the","row":6},
{"text":"rolling","row":7},
"log."
],
“n”:[
{"text":"The","row":0},
{"text":"brown","row":2},
"spotted",
{"text":"fox","row":3},
"leaped",
{"text":"over","row":5},
{"text":"the","row":6},
{"text":"rolling","row":7},
"log"
]
}
so out has 2 properties, n and o. They are both arrays of things, the things can either be scalar text or arrays.
The arrays have a property called “text” and a property called “row”
That is enough for now, I will dig into the diff function next which builds this out object, then I will try to reimplement the whole thing in Lotuscript.
