Location>code7788 >text

Implementing Rich Text Editor #3 from Zero - Linear Data Structure Model Based on Delta

Popularity:974 ℃/2025-04-22 10:46:14

The design of the data model is the core foundation of the editor, which directly affects the selection model,DOMDesign of modules such as model and state management. For examplequillThe selection model inindex + lenexpression, andslateIn the middle, it isanchor + focusThese are all based on the design of data models. Therefore, our rich text editor that is implemented from zero needs to start with the design of the data model, and then we can gradually implement other modules.

  • Open source address:/WindRunnerMax/BlockKit
  • Online editing:/BlockKit/
  • Project Notes:/WindRunnerMax/BlockKit/blob/master/

Articles on implementing rich text editor projects from scratch:

  • Feeling like nothing, I'm ready to try to write a rich text editor from scratch
  • Implementing Rich Text Editor from Zero #2 - Editor Architecture Design Based on MVC Mode
  • Implementing Rich Text Editor #3 from Zero - Linear Data Structure Model Based on Delta

Delta

In previous architectural designs, we have mentioned that the flat data structure and independent subcontracting design we implement can be more conveniently processed whether it is operated in the editor or data analysis on the server side. In comparison, nested data structures can be better alignedDOMExpression, however, the operation of data becomes more complicated.

Therefore, in the design of data structure, we are based onquillofdeltaThe structure has been transformed. The most important part is to transform it intoimmutableThe implementation of the editor is actually a state that is maintained rather than itselfdeltastructure. And simplifies the expression of the entire data model, making it complexinsertandAttributeIf the type is reduced, the complexity of its operation logic will also be reduced.

deltaIt is a simple and powerful format used to describe the content of the document and its changes. This format is basedJSON, not only easy to read, but also easy to analyze by machine. By usingdeltaThe content described can accurately describe the content and format information of any rich text document, avoidingHTMLcommon ambiguity and complexity in.

deltaIt consists of a series of operations that describe changes made to the document. Common operations includeinsertdeleteretain. It should be noted that these operations do not depend on the specific index position, which always describes the changes in the current index position and can beretainTo move the pointer position.

deltaIt can represent both the entire document and the changes made to the document. Then here we willdeltaThe main class objects and related operation logic are described, especially in the editor's actual application scenarios, as well as the main transformations and related type declarations.

insert

insertThe method is to insert the data intodeltaThis is the operationdelta. When describing the entire document content, the content of the entire data should beinsert. The first parameter is the text content to be inserted, and the second parameter is an optional attribute object that describes the format information of the text.

const delta = new Delta();
("123").insert("567", { a: "1" });
// [{"insert":"123"},{"insert":"567","attributes":{"a":"1"}}]

OriginalinsertParameters are object typesEmbedStructure, this structure can be expressedImageVideoMentionetc. non-text structure data, and attributesAttributeMapThe parameters areRecord<string, unknown>Type, which is used to describe complex attribute values.

Here we have streamlined it,insertParameters are only supportedstringType, and specificschemaThen it is defined when the editor is initialized, and the format information is collected fromAttrsDescription in. andAttributeMapChange toRecord<string, string>type, and can avoid suchCloneDeepisEqualetc. for the implementation of complex data structures.

ActuallyEtherPadZhong means generalAttributethat is[string, string]Type, we also used a similar design here. In this infrastructure design, we recommend placing the attribute value flat onattributesIn attributes, instead of using a single attribute value askey, nested all attribute values ​​invaluemiddle.

export interface AttributeMap {
  [key: string]: string;
}

deltaThe entire name is usually used to describe changes, so in addition to describing the entire document content, it is of course also possible to describe changes in the document content. However, the application changes need to be usedcompose, Let's look at the description of this method later.

const delta1 = new Delta().insert("123");
const delta2 = new Delta().insert("456");
(delta2); // [{"insert":"456123"}]

delete

deleteThe method describes the length of the deleted content. Since the above definition, our current content is all text and embedded in the original data definition.EmbedThe length of1

const delta = new Delta().insert("123");
(new Delta().delete(1)); // [{"insert":"23"}]

Actually, this is a relatively interesting thing, throughdeleteWhen describing the change, it is impossible to know what exactly was deleted. Then in this case,invertWhen additional data is needed to constructinsertoperate. Similar scenes inOT-JSONThe content description is written directly to theop, so it can be directly based onopCome oninvertoperate.

const delta = new Delta().insert("123");
const del = new Delta().delete(1);
const invert1 = (delta); // [{"insert":"1"}]
const delta2 = (del); // [{"insert":"23"}]
(invert1); // [{"insert":"123"}]

retain

retainThe method describes the length of the content reserved, in other words, this operation can be used to move the pointer.

const delta = new Delta().insert("123");
(new Delta().retain(1).insert("a")); // [{"insert":"1a23"}]

at the same timeretainOperations can also be used to modify the property value of the content. In addition, if you want to delete a certain property, you only need tovalueSet as""Just do it.

const delta = new Delta().insert("123");
const d2 = new Delta().retain(1).retain(1, { "a": "1" });
const d3 = (d2); // [{"insert":"1"},{"insert":"2","attributes":{"a":"1"}},{"insert":"3"}]
(new Delta().retain(1).retain(1, { "a": "" })); // [{"insert":"123"}]

push

pushThe method is as mentioned aboveinsertdeleteretainThe basic method of dependency is to push content intodeltaMaintained in the array. The very important part of the implementation here isopMerge of the attribute value, when the attribute value is the same, it needs to be merged into a singleop

const delta = new Delta();
({ insert: "123" }).push({ insert: "456" }); // [{"insert": "123456"}]

Of course it's not justinsertThe operations will be merged, fordeleteretainThe same goes for the operation. The merge operation here is based onopofattributesattribute value, ifattributesDifferent attribute values ​​will be considered differentop, will not be merged automatically.

const delta = new Delta();
({ delete: 1 }).push({ delete: 1 }); // [{"delete": 2}]
const delta2 = new Delta();
({ retain: 1 }).push({ retain: 1 }); // [{"retain": 2}]
const delta3 = new Delta();
({ retain: 1 }).push({ retain: 1, attributes: { a: "1"} }); // [{"retain": 1}, {"retain": 1, "attributes": {"a": "1"}}]

slice

sliceThe method is used to interceptdeltaThis method is based onopoflengthThe attribute value is intercepted.

const delta = new Delta().insert("123").insert("456", {a: "1"});
(2, 4); // [{"insert":"3"},{"insert":"4","attributes":{"a":"1"}}]

eachLine

eachLineMethods are used to iterate the entire line by rowdelta, Our overall data structure is linear, but the editorDOMIt needs to be divided by row, so it is based on\nIt is a relatively common operation to divide the lines.

This method is very important for the initialization of the editor. After the initialization is completed, our changes need to be implemented based on the state, rather than through this method every time. Here we have also transformed it, originaleachLineMethod is not to carry\nnode.

const delta = new Delta().insert("123\n456\n789");
((line, attributes) => {
  (line, attributes);
});
// [{insert:"123"},{insert:"\n"}] {}
// [{insert:"456"},{insert:"\n"}] {}
// [{insert:"789"},{insert:"\n"}] {}

diff

diffMethods are used to compare twodeltaThe difference between this method is actually based on plain textmyers diffTo achieve it. BydeltaConvert to plain text, indiffAfterwards, we continue to select shorter operating parts to achievedeltaBetweendiff

In fact, in our implementation, we can completelydiffThe method is independent, the only references to the external one herefast-diffrely. existquillmiddlediffis necessary because it is a completely uncontrollable input method, and the input of text depends on theDOMTextdiffto implement, and our input is dependentbeforeinputSemi-controlled input of events, so they are not strongly dependentdiff

const delta1 = new Delta().insert("123");
const delta2 = new Delta().insert("126");
(delta2); // [{"retain":2},{"insert":"6"},{"delete":1}]

chop

chopMethod for cutting the endretainOperation, when there is the endretainAnd when there is no attribute operation, it itself has no meaning, so this method can be called to check and remove.

const delta = new Delta().insert("123").retain(1);
(); // [{"insert":"123"}]

compose

composeMethods can be used to transfer twodeltaMerge into onedeltaSpecifically,BofdeltaOperations apply toAofdeltaOn, the return is a new onedeltaObject. Of course, in our implementation, the originalDeltaClass rewrittencomposeMethod, doneimmutable

composeThere are many application scenarios in the editor, such as input events, content pasting, historical operations, etc., similar to the editorapplyMethod, equivalent to changing the application content.

const delta1 = new Delta().insert("123");
const delta2 = new Delta().retain(1).delete(1);
(delta2); // [{"insert":"13"}]

invert

invertThe method is todeltaThis method is very important in historical operations because it is itselfundoIt is necessary to reverse the current operation. In addition, in realizingOToflocal-csmiddle,invertIt is also a very important method.

It is worth noting that the above mentioneddeleteOperation andretainThe original content will not be recorded when the operation is executed, soinvertIt needs originaldeltaAs a data source, please note thatdeltaIt was the originaldelta, notinvertThe latterdelta

const delta = new Delta().insert("123");
const del = new Delta().delete(1);
const invert1 = (delta); // [{"insert":"1"}]
const delta2 = (del); // [{"insert":"23"}]
(invert1); // [{"insert":"123"}]

concat

concatMethods can connect twodeltaGo to the new onedeltamiddle. This operation iscomposedifferent,composeYesBThe operation ofAUp, andconcatThen it isBThe operation is added toAAfter the operation.

const delta1 = new Delta().insert("123");
const delta2 = new Delta().insert("456");
(delta2); // [{"insert":"456123"}]
(delta2); // [{"insert":"123456"}]

transform

transformThe method is to implement the operationOTThe basis of collaboration, even if collaborative editing is not implemented, this part of the implementation will be required in the history operation module in the editor. Suppose we have users nowA[uid:1]And usersB[uid:2], at this time weuidDefine priority,AThe operation priority is higher thanB, and the current document content is12

If it is in collaboration,b'=(b)It means, assumptionaandbAll from the samedraftBranched out, thenb'It's assumptionaIt has been applied, at this timebNeed to be inaTransformed fromb'Only directly apply, we can also understand it astransformSolvedaOperation is correctbThe impact of operation.

Then let's assumeAexist12The rear position has been insertedAcharacter,Bexist12The rear position has been insertedBcharacter. If you perform a coordinated operation, then the two are equivalent to inserting characters at the same position at the same time. If you apply them directly without performing operation transformation, the data of the two will conflict.AThe data is12BA,andBThe data is12AB, so you need to convert first and then apply.

// User A
const base = new Delta().insert("12");
const delta = new Delta().retain(2).insert("A");
let current = (delta); // 12A
// Accept Remote B
const remote = new Delta().retain(2).insert("B");
// ob1=OT(oa, ob)
const remote2 = (remote, true); // [{"retain":3},{"insert":"B"}] 
current = (remote2); // 12AB
// User B
const base = new Delta().insert("12");
const delta = new Delta().retain(2).insert("B");
let current = (delta); // 12B
// Accept Remote A
const remote = new Delta().retain(2).insert("A");
// oa2=OT(ob, oa)
const remote2 = (remote, false); // [{"retain":2},{"insert":"A"}] 
current = (remote2); // 12AB

transformPosition

transformPositionThe method is used to convert the specified position. The main scenario of this method is to transform the position of the selection/cursor in the editor, for example, the cursor is at this time.1Later, the structuredeltaexist1If the content has been added before, the cursor needs to follow the movement.

const delta = new Delta().retain(5).insert("a");
(4); // 4
(5); // 6

OpIterator

OpIteratorThe class defines an iterator for iteratingdeltaIn-houseopoperate. Iterators are used in large quantitiesdiffcomposetransformIn other methods, it is important to note that the iterator is callednextNo crossing timeop, even if passedlengthGreater than the current oneoplength.

const delta = new Delta()
  .insert("Hello", { bold: "true" })
  .insert(" World", { italic: "true" });
  .retain(3);
(2); // { insert: "He", attributes: { bold: "true" } }
(10); // { insert: "llo", attributes: { bold: "true" } }

EtherPad

EtherPadIt is also a very excellent collaborative editor, and the built-in data structure is also linear, and the overall description of the document is calledClientVars, data structure changes are calledChangeSet. The implementation of the collaborative algorithm isEasySync, and the documentation also provides a detailed description of how to perform server scheduling.

ClientVars/ChangesetAlso based onJSONThe data format is used to describe the content of the document and its changes. But it doesn't look likeDeltaSo clearly expressed,JSONThe structure is mainlyAttributePoolInside, the expression of text content is a plain text structure.

Document description

Document content is usedClientVarsThe data structure ofapoolText attribute pool,textText content,attribsAttribute description. In the following example, we describe the content of the title, bold, italic, plain text, so the contents of this are shown below.

({
  initialAttributedText: {
    text: "short description\n*Heading1\ntext\n*Heading2\nbold italic\nplain text\n\n",
    attribs: "*0|1+i*0*1*2*3+1*0|2+e*0*4*2*3+1*0|1+9*0*5+4*0+1*0*6+6*0|2+c|1+1",
  },
  apool: {
    numToAttrib: {
      "0": ["author", "a.XYe86foM7oYgmpuu"],
      "1": ["heading", "h1"],
      "2": ["insertorder", "first"],
      "3": ["lmkr", "1"],
      "4": ["heading", "h2"],
      "5": ["bold", "true"],
      "6": ["italic", "true"],
    },
    nextNum: 7,
  },
});

This content looks more complicated directly, and of course it is actually quite complicated.apoolIt is a property pool where all decorations for text content are stored here, that is, in itnumToAttribProperty stored[string, string]value,nextNumIt is the next index to be placed.textIt is the content of plain text, which is equivalent to the content of the plain text of the document at this time.attribsIt is based ontextplain text content and obtainapoolThe attribute value in the decorative text content is equivalent to the encoding.

thereforeattribsNeed to be taken out separately to analyze.*nIndicates taking the firstnApplying attributes to text usually requires cooperation|nand+nUse attributes,|nIndicates impactnOK, only for\nThe attribute needs to be used.+nIt means taking outnThe number of characters is equivalent toretainOperations can not only move the pointer, but also use it to hold property changes. It is particularly important to note that|mIt will not appear alone, it will always be with+nExpress them together, express themnExist in charactersmA newline, and the last applied character must be\n

also,EasySyncThe numbers inside are all36Category, so here+i/+eNone of them are special symbols, they need to be used0-9To represent the numbers0-9characters, and10-35That meansa-z,For example+ithat isi - a = 8 => 8 + 10 = 18

  • *0Indicates taking outauthorattributes,|1+iIt means it is applied toiLength is18, the characters areshort description\n, due to its inclusion\nThen define|1
  • *0*1*2*3Indicates before removal4attributes,+1Indicates applying it1characters, i.e.*Characters, inEasySyncThis character at the beginning of the line bears the line attribute, not placed\nmiddle.
  • *0Indicates taking outauthorattributes,|2+eIt means it has been appliedeLength is14, the characters areHeading1\ntext\n, which contains two\nThen define|2
  • *0*4*2*3Indicates the removal of relevant attributes.+1Indicates applying it1characters, i.e.*Characters represent line attribute content.
  • *0|1+9Indicates taking outauthorattributes,+9Indicates applying it9characters, i.e.Heading2\n, at the end is\nThen define|1
  • *0*5+4Indicates the removal of bold attributes, and apply4characters, i.e.bold
  • *0+1Indicates taking outauthorproperties, application1A single character is a space.
  • *0*6+6Indicates the removal of italics and other properties, apply6characters, i.e.italic
  • *0|2+cIndicates the removal of relevant attributes and apply12One character is\nplain text\nThere are two\nThen define|2
  • |1+1Indicates the end\nAttributes, inEasySyncThis character at the end of the line needs to be defined separately.

Change description

OTThe reference atomic implementation of operation transformation isinsertdeleteretainThree operations, thenChangeSetThe content description is naturally similar, but the data change description is not the samedeltaThe structure is so clear, but a set of data structure descriptions are specially designed.

Documents are initially created or importedClientVars, and then every time the document content is modified, it will be generatedChangeSet. For the above three operations, three symbols correspond to=expressretain+expressinsert-expressdelete, the combination of these three symbols can describe changes in the document content, in addition to additional definitions:

  • Z: The first letter isMagicNumber, denoted as a sign bit.
  • :N: The original content length of the document isN
  • >N: The final document length will be longer than the original document lengthN
  • <N: The final document length will be shorter than the original document lengthN
  • +N: Actual execution of the operation means that it has been insertedNcharacters.
  • -N: Actual operation means that the operation has been deletedNcharacters.
  • =N: Actual execution of the operation means that the operation is retainedNcharacters, move pointer or apply attributes.
  • |N: It means it has affectedNline, consistent with the above document description, needs to be+/-/=NThe length of the operation includesNindivual\n, and the end operation must be\n. The end of the document\nIf you need to express it, you must use it|1=1express.
  • *I: Indicates the application attribute,IforapoolThe index value in a+=or|There could be any number before*operate.
  • $: Indicates the end symbol, used to markOperationEnd of part.
  • char bank: for storageinsertOperation of specific character contents and use them in sequence when performing the insertion operation.

The same example as above already exists in the current documentexist text\n\nThe text content of the above content is then pasted into the document, thenUser ChangeSetThe changes are described as follows:

({
  changeset:
    "Z:c>1t|1=b*0|1+i*0*1*2*3+1*0|2+e*0*4*2*3+1*0|1+9*0+5*0*5+6*0|1+1*0+a$short description\n*Heading1\ntext\n*Heading2\nbold italic\nplain text",
  apool: {
    numToAttrib: {
      "0": ["author", "a.XYe86foM7oYgmpuu"],
      "1": ["heading", "h1"],
      "2": ["insertorder", "first"],
      "3": ["lmkr", "1"],
      "4": ["heading", "h2"],
      "5": ["italic", "true"],
    },
    nextNum: 6,
  },
});
  • ZexpressMagicNumber, that is, the symbol bit.
  • cIndicates that the original content of the document is12,Right nowexist text\n\nContent length.
  • >1tIt means that the final document will be longer than the original content1t36Priority conversion1tfor64, specificallychar bankindex.
  • |1=bIndicates that the length of the moving pointer isb, convert length to11, the text content isexist text\n, at the end\ndefinition|1
  • *0|1+iIndicates fromapoolTake out0Properties, applicationiConvert length to18, the text content isshort description\n, at the end\ndefinition|1
  • *0*1*2*3+1Indicates taking out4attributes applied to1, the text content is*, specifically, it is the start mark of the line attribute.
  • *0|2+eIndicates taking out0Properties, applicationeConvert length to14, the text content isHeading1\ntext\n, at the end\nAnd includes two\ndefinition|2
  • *0*4*2*3+1Indicates taking out4attributes applied to1, the text content is*, also is the start mark of the row attribute.
  • *0|1+9Indicates taking out0Attribute, the application length is9, the text content isHeading2\n, at the end\ndefinition|1
  • *0+5Indicates taking out0Attribute, the application length is5, the text content isbold
  • *0*5+6It indicates that the properties such as italics are removed, and the application length is6, the text content isitalic
  • *0|1+1Indicates taking out0Attribute, the application length is1, at the end\nThen define|1, the text content is\n
  • *0+aIndicates taking out0Attribute, the application length isaRight now10, the text content isplain text
  • $Indicates the end symbol, and subsequent content symbols arechar bank, the last\nUsually no representation is needed, even if it is expressed, it is required.|1=1Individually indicated.

Slate

slateThe data structure and the design of the selection are almost completely alignedDOMThe structure, and the data structure design is not independent, and is also based onJSONThe structure of , very similar to the low-code structural design. The operation transformation is directly inslateCore modulesTransformand the implementation of position-related operation transformation is scattered inPointPathin object.

[
  {
    type: "paragraph",
    children: [
      { text: "This is editable " },
      { text: "rich", bold: true },
      { text: " text." },
    ],
  },
  { type: "block-quote", children: [{ text: "A wise quote." }] },
];

Operation

Also based onOTImplement operation transformation algorithm, linear data structure only requiresinsertdeleteretainThree basic operations can be achieved, andslateThe middle is realized9Atomic operations are used to describe changes, including text processing, node processing, selection transformation operations, etc.

  • insert_node: Insert node.
  • insert_text: Insert text.
  • merge_node: Merge nodes.
  • move_node: Mobile node.
  • remove_node: Remove node.
  • remove_text: Remove text.
  • set_node: Set the node.
  • set_selection: Set selection.
  • split_node: Split nodes.

In fact, it's okay to only implement the application, the correspondinginverttransformIt will be more complicated. existslateIn-houseinverseRelated operations areImplemented, position-relatedtransformexistThere are related implementations in this.

In fact, these operations are usually not called directly in the editor.slateThese most basic operations are encapsulated and implementedTransformsModule. Many specific operations are implemented in this module, such asinsertNodesliftNodesmergeNodesmoveNodesremoveNodesWait, the operation here is far more than9Types.

  • insertFragment: Insert a fragment of a node at the specified location.
  • insertNodes: Insert node at the specified location.
  • removeNodes: Delete nodes at the location specified in the document.
  • mergeNodes: Merge with the previous node of the same level at a certain node.
  • splitNodes: Split nodes at a specified location in a node.
  • wrapNodes: Wrap a layer of nodes at a specified location in a node.
  • unwrapNodes: Unpack a layer of wrapping node at a specified location in a node.
  • setNodes: Set node properties at a specified location in a node.
  • unsetNodes: Cancel the node attribute at a specified location in a node.
  • liftNodes: Raise a layer of node at a specified location in a node.
  • moveNodes: Move the node at the specified location in the document.
  • collapse: Collapses the selection into caret.
  • select: Actively set the selection position.
  • deselect: Cancel the selection location.
  • move: Move the selection location.
  • setPoint: Sets the one-sided position of the selection.
  • setSelection: Set the new selection location.
  • delete: Delete the selection content.
  • insertText: Insert text in the selection position.
  • transform: On the editorimmutableExecuteop

OT-JSON

Similarly, inOT-JSON(json0)Implemented11In rich text scenesSubTypeIf you still need to expand, then naturally more operations are needed to describe the changes. Therefore, in fact,JSONNested data formats to describe content changes are much more complicated than linear operations.

existslateThe basics of encapsulating the editor by itselfopIf it is inOT-JSONPackaging based onTransformsIf so, for implementationOTThe collaboration will be more convenient.ShareDBAll collaborative frameworks need to be referencedOTTypesThe definition of . Of course, basedCRDTThe implementation of collaboration looks easier to handle.

  • {p:[path], na:x}: In the specified path[path]Add valuexValue.
  • {p:[path,idx], li:obj}: On the list[path]Index ofidxInsert object beforeobj
  • {p:[path,idx], ld:obj}: From the list[path]Index ofidxDelete the object inobj
  • {p:[path,idx], ld:before, li:after}: Use objectsafterReplacement list[path]IndicesidxObject ofbefore
  • {p:[path,idx1], lm:idx2}: Add the list[path]Indicesidx1The object to the indexidx2place.
  • {p:[path,key], oi:obj}: toward the path[path]Add keys to the object inkeyand objectsobj
  • {p:[path,key], od:obj}: From the path[path]Delete key in object inkeyand valueobj
  • {p:[path,key], od:before, oi:after}: Use objectsafterReplace path[path]Middle keykeyObject ofbefore
  • {p:[path], t:subtype, o:subtypeOp}: For path[path]The object application type intSuboperation ofo, subtype operation.
  • {p:[path,offset], si:s}: On the path[path]Offset of stringoffsetInsert strings ats, use subtypes internally.
  • {p:[path,offset], sd:s}: From the path[path]Offset of stringoffsetDelete stringss, use subtypes internally.

Summarize

The design of data structure is very important. For the editor, the design of data structure directly affects the selection model,DOMDesign of modules such as model and state management. Here we talked about a lot of data structure design.DeltaChangesetlinear structure,SlateThe nested structure of each data has its own design and considerations.

Then after selecting the data structure, you can implement various modules of the editor on this basis. Next, we will start from the data model, design the representation of the selection model, and then synchronize the browser selection and the editor selection model on this basis. The selection model is used as the goal of the operation to realize the basic operations of the editor, such as insertion, deletion, formatting and other operations.

One question every day

  • /WindRunnerMax/EveryDay

refer to

  • /slab/delta/blob/main/src/
  • /slab/delta/blob/main/src/
  • /ether/etherpad-lite/tree/develop/doc/public/easysync
  • /ether/etherpad-lite/blob/develop/src/static/js/
  • /ether/etherpad-lite/blob/develop/src/static/js/
  • /ianstormtaylor/slate/blob/main/packages/slate/src/interfaces/
  • /ianstormtaylor/slate/blob/main/packages/slate/src/interfaces/transforms/