The whole Enchilada

A sequence of thought

Document oriented programming

I've changed the sort(<) semantics to allow for (multi)set operations.

The new sort operator takes two operands L1 and L2, sorts them, and then takes their (multiset) union and difference. The sorting step is ommited if L1 and L2 are already sorted (because Enchilada caches the sort property for every list).

Here is an example of the new sort operator(<):

[0;4;2;2;2;3;5;1;5;7] [2;5;5;1;9] <

gives:

[0;1;1;2;2;2;2;3;4;5;5;5;5;7;9] [0;2;2;3;4;7]

If L1 and L2 share a lot of structure, the union and difference can be computed very efficiently. For instance, if L1 and L2 are equal, their difference will be the empty list.

The new sort operator opens a doorway to an efficient document management system. Consider the following 'document':

D1=[{H};{e};{l};{l};{o};{""};{w};{o};{r};{l};{d}]

Let's say we want to insert 'sunny' in between 'Hello' and 'world'.

[{H};{e};{l};{l};{o};{""};{w};{o};{r};{l};{d}]
[{s};{u};{n};{n};{y};{""}] 6 {L1 L2 i=L1 i & L2 i - L1 + + +}

gives:

D2=[{H};{e};{l};{l};{o};{""};{s};{u};{n};{n};{y};{""};{w};{o};{r};{l};{d}]

Now if we want to take the difference between D1 and D2 we could use the unix-style diff command. However, unix-diffs don't scale to terabyte lists because every list element has to be considered.

Another approach to this is to uniquely label each element in a document, with all labels in ascending order. Here's where dual lists come in handy:

D3=[0={H};1={e};2={l};3={l};4={o};5={""};6={w};7={o};8={r};9={l};10={d}]

we insert 'sunny':

D4=[0={s};1={u};2={n};3={n};4={y};5={""}]

.. in between 'Hello' and 'world'. We do this by prefixing all the labels of D4 with a label of D3, after the point of insertion. This way we maintain the necessary order:

D5=D3 D4 6 {L1 L2 i=L1 i & {p=p p ` 1 & : # :} L2 * | i - L1 + + +}

D5=
[0={H};1={e};2={l};3={l};4={o};5={""}
 5 0={s};5 1={u};5 2={n};5 3={n};5 4={y}
 5 5={""};6={w};7={o};8={r};9={l};10={d}]

Note that D5's labels are still unique and sorted. Also note the usage of the max(|) operator to prefix the labels of 'sunny'. Because everything is ordered and unique we use the sort(<) operator the take the difference between D3 (the original document) and D5 (the modified document):

D5 D3 < {union diff=diff}

 [0={H};1={e};2={l};3={l};4={o};5={""};5 0={s};5 1={u};5 2={n}
 5 3={n};5 4={y};5 5={""};6={w};7={o};8={r};9={l};10={d}] 
 [0={H};1={e};2={l};3={l};4={o};5={""}
  6={w};7={o};8={r};9={l};10={d}] < {union diff=diff}

gives the difference:

[5 0={s};5 1={u};5 2={n};5 3={n};5 4={y};5 5={""}]

Because D5 shares a lot of structure with D3, the difference between them can be computed much faster than in the case of mutable lists.

Now Imagine millions of (distributed) edits on a document. How do you determine all the edits and differences in the most efficient way? Enchilada's dual lists and operators are carefully crafted to give an answer to that difficult question.

You can find the lastest release (0.5.6) by following the documentation link.

Immutable papersPerformance & runtime

Write a comment

New comments have been disabled for this post.