JSON.parse is an API we use all the time in front-end development, but if we wanted to implement one of our own, how would we go about it? Today we will try writing a JSON parser from scratch to understand how it works internally.
JSON Syntax
JSON is a syntax for serializing objects, arrays, numbers, strings, booleans, and null. The syntax rules are as follows (a small example follows the list):
- Data is represented as name/value pairs.
- Objects are wrapped in curly braces ({}); each name is followed by a : (colon), and name/value pairs are separated by , (comma).
- Arrays are wrapped in square brackets ([]), with values separated by , (comma).
- A JSON value can be: a number (integer or floating point), a string (in double quotes), a boolean (true or false), an array (in square brackets), an object (in curly braces), or null.
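For example, the following small document (the field names here are made up purely for illustration) touches every value type listed above:
{
  "name": "demo",
  "version": 1,
  "stable": true,
  "deprecated": null,
  "tags": ["parser", "json"]
}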
Implementing the Parser
A parser generally goes through the following stages: lexical analysis, syntax analysis, conversion, and code generation.
Lexical Analysis
From the JSON syntax we can see that JSON has the following types, each with its own characteristic tokens:
Type | Characteristic Tokens |
---|---|
Object | "{" ":" "," "}" |
Array | "[" "," "]" |
String | '"' |
Number | "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" |
Boolean | "true" "false" |
Null | "null" |
So, by traversing the JSON string and matching each character against these features, we can produce the corresponding tokens. The lexical analysis code is as follows:
// lexical analysis
const TokenTypes = {
  OPEN_OBJECT: '{',
  CLOSE_OBJECT: '}',
  OPEN_ARRAY: '[',
  CLOSE_ARRAY: ']',
  STRING: 'string',
  NUMBER: 'number',
  TRUE: 'true',
  FALSE: 'false',
  NULL: 'null',
  COLON: ':',
  COMMA: ',',
}
class Lexer {
  constructor(json) {
    this._json = json
    this._index = 0
    this._tokenList = []
  }
  createToken(type, value) {
    return { type, value: value || type }
  }
  getToken() {
    // walk the whole string and collect tokens one by one
    while (this._index < this._json.length) {
      const token = this.bigbang()
      this._tokenList.push(token)
    }
    return this._tokenList
  }
  bigbang() {
    const key = this._json[this._index]
    switch (key) {
      case ' ':
        // skip whitespace and read the next token
        this._index++
        return this.bigbang()
      case '{':
        this._index++
        return this.createToken(TokenTypes.OPEN_OBJECT)
      case '}':
        this._index++
        return this.createToken(TokenTypes.CLOSE_OBJECT)
      case '[':
        this._index++
        return this.createToken(TokenTypes.OPEN_ARRAY)
      case ']':
        this._index++
        return this.createToken(TokenTypes.CLOSE_ARRAY)
      case ':':
        this._index++
        return this.createToken(TokenTypes.COLON)
      case ',':
        this._index++
        return this.createToken(TokenTypes.COMMA)
      case '"':
        return this.parseString()
    }
    // number
    if (this.isNumber(key)) {
      return this.parseNumber()
    }
    // true false null
    const result = this.parseKeyword(key)
    if (result.isKeyword) {
      return this.createToken(TokenTypes[result.keyword])
    }
  }
  isNumber(key) {
    return key >= '0' && key <= '9'
  }
  parseString() {
    // skip the opening quote, then read up to the closing quote
    this._index++
    let key = ''
    while (this._index < this._json.length && this._json[this._index] !== '"') {
      key += this._json[this._index]
      this._index++
    }
    // skip the closing quote
    this._index++
    return this.createToken(TokenTypes.STRING, key)
  }
  parseNumber() {
    let key = ''
    while (this._index < this._json.length && '0' <= this._json[this._index] && this._json[this._index] <= '9') {
      key += this._json[this._index]
      this._index++
    }
    return this.createToken(TokenTypes.NUMBER, Number(key))
  }
  parseKeyword(key) {
    let isKeyword = false
    let keyword = ''
    switch (key) {
      case 't':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'true'
        keyword = 'TRUE'
        break
      case 'f':
        isKeyword = this._json.slice(this._index, this._index + 5) === 'false'
        keyword = 'FALSE'
        break
      case 'n':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'null'
        keyword = 'NULL'
        break
    }
    // 'TRUE'/'NULL' have 4 characters and 'FALSE' has 5, matching the source keyword lengths
    this._index += keyword.length
    return {
      isKeyword,
      keyword,
    }
  }
}
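Assuming the lexer sketched above, we can do a quick sanity check on a tiny input; the token list it returns looks like this:
// Quick check of the lexer on a tiny input
const lexer = new Lexer('{"ok": true}')
console.log(lexer.getToken())
// [
//   { type: '{', value: '{' },
//   { type: 'string', value: 'ok' },
//   { type: ':', value: ':' },
//   { type: 'true', value: 'true' },
//   { type: '}', value: '}' }
// ]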
Syntax Analysis
Syntax analysis is the process of traversing the tokens, extracting their syntactic information, and constructing an object called an AST (Abstract Syntax Tree). Before the syntax analysis proper, we first define a class for each JSON syntax feature to record the information of the corresponding node in the AST.
class NumericLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class StringLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class BooleanLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class NullLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class ArrayExpression {
  constructor(type, elements) {
    this.type = type
    this.elements = elements || []
  }
}
class ObjectExpression {
  constructor(type, properties) {
    this.type = type
    this.properties = properties || []
  }
}
class ObjectProperty {
  constructor(type, key, value) {
    this.type = type
    this.key = key
    this.value = value
  }
}
Next we perform the syntax analysis itself: traverse the tokens, check each token's type, create the corresponding node, and assemble the AST (Abstract Syntax Tree). The code is as follows:
// syntax analysis
class Parser {
  constructor(tokens) {
    this._tokens = tokens
    this._index = 0
  }
  jump() {
    this._index++
  }
  getValue() {
    const value = this._tokens[this._index].value
    this._index++
    return value
  }
  parse() {
    const type = this._tokens[this._index].type
    const value = this.getValue()
    switch (type) {
      case TokenTypes.OPEN_ARRAY:
        const array = this.parseArray()
        this.jump() // skip over ]
        return array
      case TokenTypes.OPEN_OBJECT:
        const object = this.parseObject()
        this.jump() // skip over }
        return object
      case TokenTypes.STRING:
        return new StringLiteral('StringLiteral', value)
      case TokenTypes.NUMBER:
        return new NumericLiteral('NumericLiteral', Number(value))
      case TokenTypes.TRUE:
        return new BooleanLiteral('BooleanLiteral', true)
      case TokenTypes.FALSE:
        return new BooleanLiteral('BooleanLiteral', false)
      case TokenTypes.NULL:
        return new NullLiteral('NullLiteral', null)
    }
  }
  parseArray() {
    const _array = new ArrayExpression('ArrayExpression')
    while (true) {
      // parse one element, then continue only if a comma follows
      const value = this.parse()
      _array.elements.push(value)
      if (this._tokens[this._index].type !== TokenTypes.COMMA) break
      this.jump() // skip over ,
    }
    return _array
  }
  parseObject() {
    const _object = new ObjectExpression('ObjectExpression')
    while (true) {
      // parse a key, skip the colon, parse the value, then continue only if a comma follows
      const key = this.parse()
      this.jump() // skip over :
      const value = this.parse()
      const property = new ObjectProperty('ObjectProperty', key, value)
      _object.properties.push(property)
      if (this._tokens[this._index].type !== TokenTypes.COMMA) break
      this.jump() // skip over ,
    }
    return _object
  }
}
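If we run the parser over the tokens from the earlier '{"ok": true}' example, the tree it returns has roughly the following shape:
// Roughly the AST produced for the input '{"ok": true}'
const ast = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'ok' },
      value: { type: 'BooleanLiteral', value: true }
    }
  ]
}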
Conversion
Syntax analysis gives us the AST; the conversion phase can then add, delete, and modify tree nodes to produce a new AST. For a plain JSON parser this phase can simply pass the tree through unchanged, as the sketch below illustrates.
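As an illustration only (this helper is not part of the parser above, and its name is made up), here is what a small transform could look like: a pass that removes every object property whose value is null:
// Example transform (illustrative only): drop object properties whose value is a NullLiteral
function stripNullProperties(node) {
  if (node.type === 'ObjectExpression') {
    // keep only the properties whose value is not a NullLiteral node
    node.properties = node.properties.filter(
      property => property.value.type !== 'NullLiteral'
    )
    // recurse into the remaining property values
    node.properties.forEach(property => stripNullProperties(property.value))
  }
  if (node.type === 'ArrayExpression') {
    // recurse into array elements
    node.elements.forEach(element => stripNullProperties(element))
  }
  return node
}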
Code Generation
The code generation phase traverses the converted AST and, based on the syntax information of each node, produces the final output. The code is as follows:
// code generation
class Generate {
  constructor(tree) {
    this._tree = tree
  }
  getResult() {
    // walk the AST from the root and build the final value
    let result = this.getData(this._tree)
    return result
  }
  getData(data) {
    if (data.type === 'ArrayExpression') {
      let result = []
      data.elements.forEach(item => {
        let element = this.getData(item)
        result.push(element)
      })
      return result
    }
    if (data.type === 'ObjectExpression') {
      let result = {}
      data.properties.forEach(item => {
        let key = this.getData(item.key)
        let value = this.getData(item.value)
        result[key] = value
      })
      return result
    }
    if (data.type === 'ObjectProperty') {
      return this.getData(data.value)
    }
    if (data.type === 'NumericLiteral') {
      return data.value
    }
    if (data.type === 'StringLiteral') {
      return data.value
    }
    if (data.type === 'BooleanLiteral') {
      return data.value
    }
    if (data.type === 'NullLiteral') {
      return data.value
    }
  }
}
Usage
function JsonParse(json) {
  const lexer = new Lexer(json)
  const tokens = lexer.getToken() // get the token list
  const parser = new Parser(tokens)
  const tree = parser.parse() // build the syntax tree
  const generate = new Generate(tree)
  const result = generate.getResult() // generate the final value
  return result
}
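Trying it out on a small input (the output below is roughly what Node prints, shown as a comment):
const result = JsonParse('{"list": [1, 2, 3], "ok": true, "name": "parser"}')
console.log(result)
// → { list: [ 1, 2, 3 ], ok: true, name: 'parser' }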
Summary
So far we have implemented a simple JSON parser. From this exercise we can summarize the steps for building such a parser: first, study the syntax of the target input and extract its features; then run lexical analysis against those features to produce tokens; next, run syntax analysis over the tokens to build an AST (Abstract Syntax Tree); then add, delete, and modify AST nodes to produce a new AST; and finally traverse the AST to generate the target value we need.
Finally
Welcome to follow the [Kangaroo Cloud Digital Stack UED team]~
The Kangaroo Cloud Digital Stack UED team continues to share its technical work with the developer community. We have open-sourced the following projects, and stars are welcome:
- Big Data Distributed Task Scheduler - Taier
- Lightweight Web IDE UI framework - Molecule
- SQL Parser Project for Big Data - dt-sql-parser
- Kangaroo Cloud Digital Stack front-end team code review engineering practices document - code-review-practices
- A faster, more flexible and easier to use module packager - ko
- A component testing library for antd - ant-design-testing