Lorefnon | Blog | Integrating Google diff-match-patch with AutoMerge.Text

This post outlines the use of Google’s diff-match-patch library to generate patches that can be translated to operations against an AutoMerge.Text CRDT for situations when deriving the operations directly from user interactions is cumbersome.

Automerge is a so-called Conflict-Free Replicated Data Type (CRDT), which allows concurrent changes on different devices to be merged automatically without requiring any central server.

Automerge.Text provides experimental support for collaborative text editing. Under the hood, text is represented as a list of characters, which is edited by inserting or deleting individual characters. Compared to using a regular JavaScript array, Automerge.Text offers better performance.

However, using the AutoMerge.Text API directly is somewhat cumbersome in practice:

newDoc = Automerge.change(currentDoc, doc => {

doc.text = new Automerge.Text()

doc.text.insertAt(0, 'h', 'e', 'l', 'l', 'o')

doc.text.deleteAt(0)

doc.text.insertAt(0, 'H')

})

It is of course possible to track every keypress in a text field and translate them to insert/delete operations, but this requires handling of a lot of corner cases like selection tracking, handling clipboard paste etc.

Many javascript text editor components (eg. Quill, Draft etc.) provide deltas in the change events and we can translate them into AutoMerge operations.

An alternative to the above is using Google’s diff-match-patch library to compute patches that represent the change between the source document and changed document, and we can easily translate these patches to changes in AutoMerge Text data structure.

This comes in particularly handly when we don’t have control over the editor component, and deriving change deltas from user interactions is not very straightforward.

import DiffMatchPatch from "diff-match-patch";

import AutoMerge from "automerge";

// Typically we would get the changedText after some interactions in our UI components:

const sourceText = "The angel shall fall";

const changedText = "Lucifer shall rise";

// Initialize AutoMerge wrapper document

let doc = AutoMerge.init();

doc = AutoMerge.change(doc, doc => {

doc.text = new AutoMerge.Text();

doc.text.insertAt(0, ...sourceText.split(""));

});

const dmp = new DiffMatchPatch();

// Compute the diff:

const diff = dmp.diff_main(sourceText, changedText);

// diff is simply an array of binary tuples representing the change

// [[-1,"The ang"],[1,"Lucif"],[0,"e"],[-1,"l"],[1,"r"],[0," shall "],[-1,"fall"],[1,"rise"]]

// This cleans up the diff so that the diff is more human friendly.

dmp.diff_cleanupSemantic(diff);

// [[-1,"The angel"],[1,"Lucifer"],[0," shall "],[-1,"fall"],[1,"rise"]]

const patches = dmp.patch_make(sourceText, diff);

// A patch object wraps the diffs along with some change metadata:

// [{

// "diffs":[[-1,"The angel"],[1,"Lucifer"],[0," shall "],[-1,"fall"], [1,"rise"]],

// "start1":0,

// "start2":0,

// "length1":20,

// "length2":18

// }]

// We can use the patch to derive the changedText from the sourceText

console.log(dmp.patch_apply(patches, sourceText)[0]); // "Lucifer shall rise"

// Now we translate these patches to operations against AutoMerge.Text instance:

doc = AutoMerge.change(doc, doc => {

patches.forEach(patch => {

let idx = patch.start1;

patch.diffs.forEach(([operation, changeText]) => {

switch (operation) {

case 1: // Insertion

doc.text.insertAt(idx, ...changeText.split(""));

case 0: // No Change

idx += changeText.length;

break;

case -1: // Deletion

for (let i = 0; i < changeText.length; i++) {

doc.text.deleteAt(idx);

}

break;

}

});

console.log(doc.text.join("")); // "Lucifer shall rise"

Gaurab Paul

Polyglot software developer & consultant passionate about web development, distributed systems and open source technologies

Tags