We are the Kangaroo Cloud Stack UED Team, committed to building excellent one-stop data middleware products. We maintain a spirit of craftsmanship, explore front-end practices, and accumulate and share our experience with the community.
This article was written by Wen Chang.
Preface
In a Web IDE, displaying logs in the console is a critical feature. Monaco Editor, a powerful code editor, provides a rich set of features and a flexible API for "decorating" content, which makes it well suited to building a log viewer. Here's an example.
In addition to real-time logs, there are scenarios where you need to view historical logs. As shown in the figure below:
Monarch
Monarch is the syntax highlighting library that ships with Monaco Editor. It lets us define syntax highlighting for a custom language using a JSON-like syntax. We won't cover all of it here, only the parts used in this article.
A language definition is basically a JSON value that describes various properties of the language. Some of the general properties are listed below:
- tokenizer
(required, object with states) This defines the tokenization rules. It is the core component Monaco Editor uses for syntax highlighting and parsing: it breaks the input text into tokens so that the editor can perform syntax highlighting, error checking, and other editing functions based on them.
- ignoreCase
(optional, default: false) Is the language case-insensitive? Regular expressions in the tokenizer use this attribute to do case-(in)sensitive matching, as do the tests in the cases construct.
- brackets
(optional, array of bracket definitions) The tokenizer uses this to define brace matching easily; for more details, see the @brackets and bracket sections of the Monarch documentation. Each bracket definition is an array of 3 elements, or an object, describing the open brace, the close brace, and the token class. The default definitions are as follows:
[ ['{','}','delimiter.curly'],
  ['[',']','delimiter.square'],
  ['(',')','delimiter.parenthesis'],
  ['<','>','delimiter.angle'] ]
tokenizer
The tokenizer property describes how lexical analysis is performed and how the input is converted into tokens. Each token is given a CSS class name used for rendering in the editor. The built-in CSS tokens include:
identifier entity constructor
operators tag namespace
keyword info-token type
string warn-token predefined
error-token invalid
comment debug-token
regexp
constant attribute
delimiter .[curly,square,parenthesis,angle,array,bracket]
number .[hex,octal,binary,float]
variable .[name,value]
meta .[content]
Of course, you can also define custom CSS tokens by injecting them through a theme definition, as follows.
import * as monaco from "monaco-editor";

monaco.editor.defineTheme("vs", {
  base: "vs",
  inherit: true,
  rules: [
    {
      token: "token-name",
      foreground: "#117700",
    },
  ],
  colors: {},
});
A tokenizer consists of an object that describes its states. The initial state of the tokenizer is the first state defined in that object. What does this mean? In the example below, root is the first state defined, so it is the initial state. Likewise, if afterIf and root swapped places, afterIf would be the initial state.
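import * as monaco from "monaco-editor";

// The language id must be registered before a tokens provider is set for it
monaco.languages.register({ id: 'myLanguage' });
monaco.languages.setMonarchTokensProvider('myLanguage', {
  tokenizer: {
    root: [
      // Rules for the initial state
      // Listed first so that the general \w+ rule below does not shadow it;
      // identifies the keyword and moves to the next state
      [/^if$/, { token: 'keyword', next: 'afterIf' }],
      [/\d+/, 'number'], // recognize numbers
      [/\w+/, 'keyword'], // recognize keywords
    ],
    afterIf: [
      // Handle what comes after the if statement
      [/\s+/, ''], // ignore whitespace
      [/[\w]+/, 'identifier'], // identify the identifier
      // Return to the initial state
      [/;$/, { token: '', next: 'root' }],
    ],
  },
});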
How does Monaco obtain the first state defined by the tokenizer?
class MonarchTokenizer {
  ...
  public getInitialState(): languages.IState {
    const rootState = MonarchStackElementFactory.create(null, this._lexer.start!);
    return MonarchLineStateFactory.create(rootState, null);
  }
  ...
}
Getting an initial state is done through getInitialState. As you can see from the code, the initial state is identified through the property this._lexer.start. How is this property assigned a value?
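function compile() {
  ...
  for (const key in json.tokenizer) {
    if (json.tokenizer.hasOwnProperty(key)) {
      // The first key visited becomes the initial state
      if (!lexer.start) {
        lexer.start = key;
      }
      const rules = json.tokenizer[key];
      lexer.tokenizer[key] = new Array();
      addRules('tokenizer.' + key, lexer.tokenizer[key], rules);
    }
  }
  ...
}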
When compile parses the language definition object passed in by setMonarchTokensProvider, it takes the first key read out as the initial state. You may wonder, is it guaranteed that the first attribute written when defining the object will be the first one read when it is read?
In JavaScript, the order of object properties follows specific rules:
- Integer keys: if a property name is an integer-like string (e.g. "1", "2"), these properties are listed in ascending numeric order.
- String keys: for non-integer string keys, properties are listed in the order they were added to the object.
- Symbol keys: properties keyed by a Symbol are listed in the order they were added to the object.
Therefore, when a for...in loop iterates over the properties of an object, the order is as follows:
- First, all the integer keys, in ascending order.
- Then, all the string keys, in the order they were added.
- Symbol keys are not visited by for...in at all. Methods that do return them, such as Reflect.ownKeys, list them last, in the order they were added.
An example.
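A minimal sketch of this behaviour follows. The object and its key names are made up for illustration; the loop records the order in which for...in visits the keys, and the first key visited is the one a first-key-wins loop like compile would treat as the initial state.

```typescript
// Hypothetical state table: the integer-like keys are written last on purpose.
const stateTable: Record<string, string[]> = {
  root: [],
  afterIf: [],
  "2": [],
  "1": [],
};

// Collect the keys in the order for...in visits them.
const visitedKeys: string[] = [];
for (const key in stateTable) {
  visitedKeys.push(key);
}

// Mirrors the "first key wins" logic: the first visited key becomes the start state.
const startState = visitedKeys[0];

console.log(visitedKeys); // [ '1', '2', 'root', 'afterIf' ]
console.log(startState); // 1
```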
The above example shows that although "1" and "2" are written later, they are still sorted and output first, followed by the string keys according to the order in which they are added. Therefore, if possible, do not use integer keys to define state names.
When a tokenizer is in a certain state, only the rules of that state are tried. Rules are matched in order, and the action of the first rule that matches determines the token type. Later rules are not attempted, so it is important to order the rules efficiently, for example by putting whitespace and identifiers first.
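The effect of rule order can be shown with a small sketch. This is not Monarch itself: firstMatch is a hypothetical helper that only mimics the first-match-wins behaviour described above, and the rule lists are invented for the example.

```typescript
type OrderedRule = [RegExp, string];

// Return the token of the first rule whose regex matches at the start of the input.
function firstMatch(rules: OrderedRule[], input: string): string | undefined {
  for (const [regex, token] of rules) {
    // Anchor the pattern so it must match at position 0, as a tokenizer would.
    const anchored = new RegExp("^(?:" + regex.source + ")", regex.flags);
    if (anchored.test(input)) {
      return token;
    }
  }
  return undefined;
}

// The general rule comes first, so the specific `if` rule is never reached.
const shadowedRules: OrderedRule[] = [
  [/\w+/, "identifier"],
  [/if\b/, "keyword"],
];

// The specific rule comes first, so `if` is classified as a keyword.
const orderedRules: OrderedRule[] = [
  [/if\b/, "keyword"],
  [/\w+/, "identifier"],
];

console.log(firstMatch(shadowedRules, "if x")); // identifier
console.log(firstMatch(orderedRules, "if x")); // keyword
```

Both lists contain the same rules; only their order decides which token the input receives.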
How do you define a state?
Each state is defined as an array of rules for matching inputs, and the rules can have the following form:
- [regex, action]
Shorthand for { regex: regex, action: action }.
- [regex, action, next]
Shorthand for { regex: regex, action: action{ next: next } }.
import * as monaco from "monaco-editor";

monaco.languages.setMonarchTokensProvider('myLanguage', {
  tokenizer: {
    root: [
      // [regex, action]
      [/\d+/, 'number'],
      /**
       * [regex, action, next]
       * shorthand for [/\w+/, { token: 'keyword', next: '@pop' }]
       */
      [/\w+/, 'keyword', '@pop'],
    ],
  },
});
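One way to read the two shorthand forms is as a normalization step. The sketch below is an illustration only: expandRule and its types are hypothetical names, not part of Monarch, and it covers just string and { token, next } actions.

```typescript
type RuleAction = string | { token: string; next?: string };
type ShortRule = [RegExp, RuleAction, string?];

interface FullRule {
  regex: RegExp;
  action: { token: string; next?: string };
}

// Expand the array shorthands into the explicit { regex, action } form.
function expandRule([regex, action, next]: ShortRule): FullRule {
  // A bare string action is shorthand for { token: string }.
  const base = typeof action === "string" ? { token: action } : { ...action };
  // A third array element is shorthand for adding `next` to the action.
  return { regex, action: next === undefined ? base : { ...base, next } };
}

const plainRule = expandRule([/\d+/, "number"]);
const ruleWithNext = expandRule([/\w+/, "keyword", "@pop"]);

console.log(plainRule.action); // { token: 'number' }
console.log(ruleWithNext.action); // { token: 'keyword', next: '@pop' }
```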
The regex is a regular expression, and the action can be one of the following:
- string
Shorthand for { token: string }.
- [action1, ..., actionN]
An array of N actions. This is only allowed if the regular expression consists of exactly N groups (i.e. parenthesized parts). Example:
[/(\d)(\d)(\d)/, ['string', 'string', 'string']]
- { token: tokenClass }
The tokenClass can be either a built-in CSS token or a custom token. Some special token classes are also defined:
- "@rematch"
Backs up the input and invokes the tokenizer again. This only works when the state changes (otherwise we would recurse forever), so it is usually used together with the next attribute. It is useful, for example, when you are in a particular tokenizer state and want to leave it when you see certain end markers, without consuming them while in that state. Example:
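import * as monaco from "monaco-editor";

monaco.languages.setMonarchTokensProvider('myLanguage', {
  tokenizer: {
    root: [
      // After a number, enter the word state
      [/\d+/, 'number', 'word'],
    ],
    word: [
      // On a digit, back up without consuming it and return to root
      [/\d/, '@rematch', '@pop'],
      [/[^\d]+/, 'string'],
    ],
  },
});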
What does the state flow diagram of this language look like?
As you can see, when defining a state you should make sure it has an exit, i.e. a rule that transfers to another state. Otherwise the tokenizer can get stuck, matching the input against the rules of that state forever.
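The root/word example above can be traced with a small simulation. This is a simplified model of a state-stack tokenizer written for this article, not Monarch's real implementation; simulate, SimRule, and SimToken are hypothetical names, and only "@rematch", "@pop", and plain state names are handled.

```typescript
type SimRule = [RegExp, string, string?];
type SimStates = Record<string, SimRule[]>;

interface SimToken {
  text: string;
  type: string;
  state: string;
}

function simulate(states: SimStates, start: string, input: string): SimToken[] {
  const stack: string[] = [start];
  const tokens: SimToken[] = [];
  let pos = 0;
  let steps = 0; // guard against definitions that never consume input
  while (pos < input.length && steps++ < 1000) {
    const state = stack[stack.length - 1];
    const rest = input.slice(pos);
    let matched = false;
    for (const [regex, type, next] of states[state]) {
      const m = new RegExp("^(?:" + regex.source + ")").exec(rest);
      if (!m || m[0].length === 0) continue;
      matched = true;
      if (type !== "@rematch") {
        // Normal rule: emit a token and consume the matched text.
        tokens.push({ text: m[0], type, state });
        pos += m[0].length;
      }
      // For "@rematch" nothing is consumed; only the state change below applies.
      if (next === "@pop") {
        if (stack.length > 1) stack.pop();
      } else if (next) {
        stack.push(next);
      }
      break;
    }
    if (!matched) pos++; // skip a character that no rule recognises
  }
  return tokens;
}

const simStates: SimStates = {
  root: [[/\d+/, "number", "word"]],
  word: [
    [/\d/, "@rematch", "@pop"],
    [/[^\d]+/, "string"],
  ],
};

const trace = simulate(simStates, "root", "12ab3").map((t) => `${t.state}:${t.type}:${t.text}`);
console.log(trace); // [ 'root:number:12', 'word:string:ab', 'root:number:3' ]
```

The digit 3 is seen in the word state, but because of "@rematch" it is left unconsumed and is tokenized by root after the pop.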
- "@pop"
Used as the value of next: pops the tokenizer stack to return to the previous state.
- "@push"
Used as the value of next: pushes the current state onto the stack and continues in the current state.
import * as monaco from "monaco-editor";

monaco.languages.setMonarchTokensProvider('myLanguage', {
  tokenizer: {
    root: [
      // Enter the function state when a start token is matched
      [/^\s*function\b/, { token: 'keyword', next: '@function' }],
    ],
    function: [
      // Matching rules in the function state
      // An opening brace pushes the current state again
      [/^\s*\{/, { token: '', next: '@push' }],
      [/[^}]+/, 'statement'],
      // A closing brace pops back to the previous state
      [/^\s*\}/, { token: '', next: '@pop' }],
    ],
  },
});
- $n
The nth group of the matched input, or $0 for the whole matched input.
- $Sn
The nth part of the state name. For example, for the state @tag.foo, $S0 is the full state name (i.e. tag.foo), $S1 is tag, and $S2 is foo.
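The $Sn substitution can be pictured as simple string replacement on a dotted state name. The helper below is a hypothetical sketch of that idea, not Monarch's own code.

```typescript
// Expand $S0, $S1, $S2, ... in a pattern using a dotted state name such as "log.error".
function expandStateRefs(pattern: string, stateName: string): string {
  const parts = stateName.split(".");
  return pattern.replace(/\$S(\d+)/g, (_match: string, n: string) => {
    const index = Number(n);
    // $S0 is the full state name; $Sn (n >= 1) is the nth dot-separated part.
    return index === 0 ? stateName : parts[index - 1] ?? "";
  });
}

console.log(expandStateRefs("$S0", "tag.foo")); // tag.foo
console.log(expandStateRefs("$S1", "tag.foo")); // tag
console.log(expandStateRefs("$S2-token", "log.error")); // error-token
```

The last call shows the pattern used later in this article: entering a state named log.error lets a rule emit error-token through "$S2-token".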
Real-time logs
Basic usage of Monaco Editor is not the focus of this article and is not covered here. The main reason for implementing the log viewer with Monaco Editor is to give different types of logs different highlighting.
In real-time logging, there are different types of logs, such as: info, error, warning, and so on.
import moment from 'moment';

/**
 * Log constructor
 * @param {string} log Log content
 * @param {string} type Log type
 */
export function createLog(log: string, type = '') {
  let now = moment().format('HH:mm:ss');
  // Use a fixed timestamp in tests so that the output is deterministic
  if (process.env.NODE_ENV === 'test') {
    now = 'test';
  }
  return `[${now}] <${type}> ${log}`;
}
From the function above you can see that each log line starts with [xx:xx:xx], followed immediately by <log type> (log types: info, error, warning).
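To make the line format concrete, here is a small sketch that splits one such line into its parts. parseLogLine and the sample line are assumptions made for illustration; the two patterns have the same shape as the date and log-type rules used in the language definition that follows.

```typescript
// Same shape as the date rule and the log-type rule used by the tokenizer.
const datePattern = /^\[[0-9]{2}:[0-9]{2}:[0-9]{2}\]/;
const typePattern = /<(\w+)>/;

// Hypothetical helper: split one log line into timestamp, type and message.
function parseLogLine(line: string): { time: string; type: string; message: string } | null {
  const date = datePattern.exec(line);
  const type = typePattern.exec(line);
  if (!date || !type) return null;
  // Everything after the <type> marker is the log content.
  const message = line.slice(type.index + type[0].length).trim();
  return { time: date[0], type: type[1], message };
}

// A line in the format produced by createLog, with a fixed timestamp.
const sampleLine = "[12:30:45] <error> connection refused";
const parsedLine = parseLogLine(sampleLine);

console.log(parsedLine);
// { time: '[12:30:45]', type: 'error', message: 'connection refused' }
```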
Register a custom language, realTimeLog, as the language for real-time logs.
The rules are simple. Two rules are defined in root: one matches the log date and one matches the log type. When a known log type is matched, the matched text is given the corresponding token, and next moves the tokenizer into the log state, carrying the matched group with it ($1 refers to group 1 of the regular expression). In the log state, the log content is matched and given the token for that type until the termination condition (the next log date) is met.
import { languages } from "monaco-editor/esm/vs/editor/";
import { LanguageIdEnum } from "./constants";
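// The enum member name REALTIME_LOG is assumed here; it should hold the language id "realTimeLog".
languages.register({ id: LanguageIdEnum.REALTIME_LOG });
languages.setMonarchTokensProvider(LanguageIdEnum.REALTIME_LOG, {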
keywords: ["error", "warning", "info", "success"],
date: /\[[0-9]{2}:[0-9]{2}:[0-9]{2}\]/,
tokenizer: {
root: [
[/@date/, "date-token"],
[
/<(\w+)>/,
{
cases: {
"$1@keywords": { token: "$1-token", next: "@log.$1" },
"@default": "string",
},
},
],
],
log: [
[/@date/, { token: "@rematch", next: "@pop" }],
[/.*/, { token: "$S2-token" }],
],
},
});
// ===== Log Style =====
export const realTimeLogTokenThemeRules = [
{
token: "date-token",
foreground: "#117700",
},
{
token: "error-token",
foreground: "#ff0000",
fontStyle: "bold",
},
{
token: "info-token",
foreground: "#999977",
},
{
token: "warning-token",
foreground: "#aa5500",
},
{
token: "success-token",
foreground: "#669600",
},
];
State flow diagram:
Normal logs
Normal logs differ slightly from real-time logs: the log type is not displayed, so there is no start/end identifier for the Monarch highlighting rules to match. We therefore need an identifier that is not shown in the text but can still mark the start and end.
Such a thing exists: the zero-width character, which takes up no width yet can still be matched.
Zero-width characters (ZWCs) are characters that occupy zero width in text and are usually used for specific text-processing or encoding purposes. They are invisible, but they can still affect how a program processes the text.
Use zero-width characters to create identifiers for the different log types.
// Use zero-width characters as identifiers for the different log types.
// Escapes are used so that the invisible characters cannot be lost in the source.
// U+200B
const ZeroWidthSpace = '\u200b';
// U+200C
const ZeroWidthNonJoiner = '\u200c';
// U+200D
const ZeroWidthJoiner = '\u200d';
// Start/end identifiers for the different log types, matched by the Monarch rules
const jobTag = {
  info: `${ZeroWidthSpace}${ZeroWidthNonJoiner}${ZeroWidthSpace}`,
  warning: `${ZeroWidthNonJoiner}${ZeroWidthSpace}${ZeroWidthNonJoiner}`,
  error: `${ZeroWidthJoiner}${ZeroWidthNonJoiner}${ZeroWidthJoiner}`,
  success: `${ZeroWidthSpace}${ZeroWidthNonJoiner}${ZeroWidthJoiner}`,
};
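As a rough illustration of how such tags might be applied, the sketch below wraps a message in the start/end tag of its type and checks that the visible text is unchanged. wrapLog is a hypothetical helper that does not appear in the original code; the tag table repeats the layout of jobTag.

```typescript
const ZWSP = "\u200b"; // zero width space
const ZWNJ = "\u200c"; // zero width non-joiner
const ZWJ = "\u200d"; // zero width joiner

// Same tag layout as jobTag.
const logTags = {
  info: `${ZWSP}${ZWNJ}${ZWSP}`,
  warning: `${ZWNJ}${ZWSP}${ZWNJ}`,
  error: `${ZWJ}${ZWNJ}${ZWJ}`,
  success: `${ZWSP}${ZWNJ}${ZWJ}`,
};

// Hypothetical helper: surround a message with the start/end tag of its type.
function wrapLog(type: keyof typeof logTags, message: string): string {
  return `${logTags[type]}${message}${logTags[type]}`;
}

const taggedLine = wrapLog("error", "task failed");
// Stripping the three zero-width characters recovers what the user sees.
const visibleText = taggedLine.replace(/[\u200b-\u200d]/g, "");

console.log(taggedLine.length); // 17
console.log(visibleText); // task failed
console.log(/\u200d\u200c\u200d/.test(taggedLine)); // true
```

The tagged line is six characters longer than the message, yet it renders identically, and the error pattern used by the tokenizer still finds its marker.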
The syntax highlighting rules are then written in the same way as for real-time logs.
import { languages } from "monaco-editor/esm/vs/editor/";
import { LanguageIdEnum } from "./constants";
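// The enum member name NORMAL_LOG is assumed here; it should hold the language id for normal logs.
languages.register({ id: LanguageIdEnum.NORMAL_LOG });
languages.setMonarchTokensProvider(LanguageIdEnum.NORMAL_LOG, {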
info: /\u200b\u200c\u200b/,
warning: /\u200c\u200b\u200c/,
error: /\u200d\u200c\u200d/,
success: /\u200b\u200c\u200d/,
tokenizer: {
root: [
[/@success/, { token: "success-token", next: "@log.success" }],
[/@error/, { token: "error-token", next: "@log.error" }],
[/@warning/, { token: "warning-token", next: "@log.warning" }],
[/@info/, { token: "info-token", next: "@log.info" }],
],
log: [
[
/@info|@warning|@error|@success/,
{ token: "$S2-token", next: "@pop" },
],
[/.*/, { token: "$S2-token" }],
],
},
});
// ===== Log Style =====
export const normalLogTokenThemeRules = [
{
token: "error-token",
foreground: "#BB0606",
fontStyle: "bold",
},
{
token: "info-token",
foreground: "#333333",
fontStyle: "bold",
},
{
token: "warning-token",
foreground: "#EE9900",
},
{
token: "success-token",
foreground: "#669600",
},
];
State flow diagram:
Other
Supporting a elements in Monaco Editor
Monaco Editor does not natively support inserting HTML elements into its content. It only supports highlighting links and opening them with cmd + click. However, there may still be a need for something that behaves like an a element.
After looking through the Monaco Editor API, linkProvider comes close to meeting the need, but it is still not quite enough.
Here is an introduction:
In Monaco Editor, linkProvider is an interface for providing link functionality. It lets developers attach links to specific text or code ranges in the editor. When the user hovers over or clicks one, a specific action can be performed, such as opening a document or jumping to a definition.
Usage:
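import * as monaco from "monaco-editor";

const linkProvider: monaco.languages.LinkProvider = {
  provideLinks: function (model, token) {
    // Return an object whose links property is an array of links
    return {
      links: [
        {
          range: new monaco.Range(1, 1, 1, 5), // range of the link
          url: '', // linked URL
          tooltip: 'Click to access examples', // hover tooltip
        },
      ],
    };
  },
};
monaco.languages.registerLinkProvider('javascript', linkProvider);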
It is registered per language and does not affect other languages. provideLinks is triggered when the text content changes.
This API suggests the following approach:
- When generating text, wrap the link in #link#${JSON.stringify(attrs)}#link#, where attrs is an object containing the attributes of the a element.
- Before the text is passed to Monaco Editor, parse it: use a regular expression to find each a element mark, replace the mark text with the link text from attrs, and record the index of the replaced link text within the content. Use the model's getPositionAt to convert those indexes into editor positions (start/end line and column) and build a Range.
- Use a container to collect the Link information for the corresponding log. Once it is passed to the linkProvider, the link text is recognized and highlighted as a link.
- Bind a click handler to the editor instance with onMouseDown. If the clicked position falls inside one of the collected Links, trigger the externally provided custom link click event.
An implementation along these lines:
- Generate the a element mark.
interface IAttrs {
attrs: Record<string, string>;
props: {
innerHTML: string;
};
}
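/**
 * Build the mark text for an a element
 * @param attrs attributes and inner text of the a element
 * @returns the mark string wrapped in #link# delimiters
 */
export function createLinkMark(attrs: IAttrs) {
  return `#link#${JSON.stringify(attrs)}#link#`;
}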
- Parsing text content
// Assumes editor and Range are imported from monaco-editor
getLinkMark(value: string, key?: string) {
  if (!value) return value;
  const links: ILink[] = [];
  const logRegexp = /#link#/g;
  const splitPoints: any[] = [];
  let indexObj = logRegexp.exec(value);
  /**
   * 1. Match the start/end #link# tags with the regular expression, in pairs
   */
  while (indexObj) {
    splitPoints.push({
      index: indexObj.index,
      0: indexObj[0],
      1: indexObj[1],
    });
    indexObj = logRegexp.exec(value);
  }
  /**
   * 2. Process the log content and collect link information based on the tag positions from step 1
   */
  /** l is the start tag, r is the end tag */
  let l = splitPoints.shift();
  let r = splitPoints.shift();
  /** Number of characters removed so far by string replacement */
  let cutLength = 0;
  let processedString = value;
  /** Collected link information */
  const collections: [number, number, string, ILink['attrs']][] = [];
  while (l && r) {
    const infoStr = value.slice(l.index + r[0].length, r.index);
    const info = JSON.parse(infoStr);
    /**
     * A space is appended manually: with no content after the link, clicking the blank area
     * behind it would place the cursor on the link, so the clicked range would fall inside
     * the link's range and trigger the custom click event
     */
    const splitStr = info.props.innerHTML + ' ';
    /** Replace '#link#{"attrs":{"href": "xxx"}, "props":{"innerHTML": "logDownload"}}#link#' with the text in innerHTML */
    processedString =
      processedString.slice(0, l.index - cutLength) +
      splitStr +
      processedString.slice(r.index + r[0].length - cutLength);
    collections.push([
      /** Start position of the link */
      l.index - cutLength,
      /** End position of the link */
      l.index + splitStr.length - cutLength - 1,
      /** Link address */
      info.attrs.href,
      /** Used in workflows: click to open the subtask tab */
      info.attrs,
    ]);
    /** Record the length difference between the replacement text and the original text */
    cutLength += infoStr.length - splitStr.length + r[0].length * 2;
    l = splitPoints.shift();
    r = splitPoints.shift();
  }
  /**
   * 3. Process the collected link information
   */
  const model = editor.createModel(processedString, 'xxx');
  for (const [start, end, url, attrs] of collections) {
    const startPosition = model.getPositionAt(start);
    const endPosition = model.getPositionAt(end);
    links.push({
      range: new Range(
        startPosition.lineNumber,
        startPosition.column,
        endPosition.lineNumber,
        endPosition.column
      ),
      url,
      attrs,
    });
  }
  model.dispose();
  return processedString;
}
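The plan above relies on the model's getPositionAt to turn a character offset into a line and column. The sketch below is a simplified stand-in for that conversion, written for illustration only: positionAt is a hypothetical helper, positions are 1-based, and only "\n" line breaks are handled.

```typescript
interface LinePosition {
  lineNumber: number;
  column: number;
}

// Convert a character offset into a 1-based line number and column.
function positionAt(content: string, offset: number): LinePosition {
  const clamped = Math.max(0, Math.min(offset, content.length));
  const before = content.slice(0, clamped);
  // Index of the last line break before the offset, or -1 on the first line.
  const lastBreak = before.lastIndexOf("\n");
  const lineNumber = before.split("\n").length;
  return { lineNumber, column: clamped - lastBreak };
}

const logContent = "first line\nsee link a here";
const linkOffset = logContent.indexOf("link a");
const linkPosition = positionAt(logContent, linkOffset);

console.log(linkOffset); // 15
console.log(linkPosition); // { lineNumber: 2, column: 5 }
```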
- Use a container to store the parsed links
const value = `This is a string of text with links: ${createLinkMark({
props: {
innerHTML: 'link a'
},
attrs: {
href: ''
}
})}`
const links = getLinkMark(value)
- Register LinkProvider with stored links
languages.registerLinkProvider('taskLog', {
provideLinks() {
return { links: links || [] };
},
});
- Bind the custom event
onMouseDown is triggered whenever you click on something in the editor. In the handler you can get the Range of the clicked position, loop over all the collected Links, and check whether the clicked Range falls inside one of them. The containsRange method determines whether one Range is inside another.
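useEffect(() => {
  // editorRef is an assumed name for a ref holding the editor instance;
  // links is the container of collected Links
  const disposable = editorRef.current?.onMouseDown((e) => {
    const curRange = e.target.range;
    if (curRange) {
      const link = links.find((item) => {
        return (item.range as Range)?.containsRange(curRange);
      });
      if (link) {
        onLinkClick?.(link);
      }
    }
  });
  return () => {
    disposable?.dispose();
  };
}, []);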
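The containment check used in the click handler can be sketched without the editor. The code below is a simplified illustration, not Monaco's implementation: SimpleRange reuses the field names of Monaco's range objects, and the link and click positions are invented.

```typescript
interface SimpleRange {
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

// True when position (lineA, colA) is at or before position (lineB, colB).
function isBeforeOrEqual(lineA: number, colA: number, lineB: number, colB: number): boolean {
  return lineA < lineB || (lineA === lineB && colA <= colB);
}

// Simplified containment test: `inner` must start and end within `outer`.
function containsRange(outer: SimpleRange, inner: SimpleRange): boolean {
  const startsInside = isBeforeOrEqual(
    outer.startLineNumber, outer.startColumn,
    inner.startLineNumber, inner.startColumn,
  );
  const endsInside = isBeforeOrEqual(
    inner.endLineNumber, inner.endColumn,
    outer.endLineNumber, outer.endColumn,
  );
  return startsInside && endsInside;
}

// A collected link on line 2, columns 5 to 11, and two click positions.
const linkRange: SimpleRange = { startLineNumber: 2, startColumn: 5, endLineNumber: 2, endColumn: 11 };
const clickOnLink: SimpleRange = { startLineNumber: 2, startColumn: 7, endLineNumber: 2, endColumn: 7 };
const clickAfterLink: SimpleRange = { startLineNumber: 2, startColumn: 12, endLineNumber: 2, endColumn: 12 };

console.log(containsRange(linkRange, clickOnLink)); // true
console.log(containsRange(linkRange, clickAfterLink)); // false
```

A click produces a collapsed range (start equals end), so the check reduces to asking whether the click position lies between the link's start and end.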
Cons: when logs are printed in real time, a link is not highlighted the moment it appears. It takes a short while before it is rendered as a link.
References
- Monarch
Finally
Welcome to follow the [Kangaroo Cloud Digital Stack UED team]~
The Kangaroo Cloud Digital Stack UED team continues to share its technical work with developers and has contributed to the following open source projects. Stars are welcome.
- Big Data Distributed Task Scheduler - Taier
- Lightweight Web IDE UI framework - Molecule
- SQL Parser Project for Big Data - dt-sql-parser
- Kangaroo Cloud Digital Stack front-end team code review engineering practices document - code-review-practices
- A faster, more flexible and easier to use module packager - ko
- A component testing library for antd - ant-design-testing