JSON.parse is an API we use all the time in front-end development, but if we wanted to implement one of our own, how would we go about it? Today we will try writing a JSON parser from scratch to understand how it works internally.
JSON Syntax
JSON is a syntax for serializing objects, arrays, numbers, strings, booleans, and null. The syntax rules are as follows (a small example follows the list):
- Data is represented as name/value pairs.
- Objects are wrapped in curly braces ({}); each name is followed by a : (colon), and name/value pairs are separated by , (comma).
- Arrays are wrapped in square brackets ([]), with values separated by , (comma).
- A JSON value can be: a number (integer or floating point), a string (in double quotes), a boolean (true or false), an array (in square brackets), an object (in curly braces), or null.
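For example, the following small document (the field names here are made up purely for illustration) touches every value type listed above:
{
  "name": "demo",
  "version": 1,
  "stable": true,
  "deprecated": null,
  "tags": ["parser", "json"]
}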
Implementing the Parser
A parser generally goes through the following stages: lexical analysis, syntax analysis, conversion, and code generation.
Lexical Analysis
From the JSON syntax we can see that JSON has the following types, each with its own characteristic tokens:
Type | Characteristic Tokens |
---|---|
Object | "{" ":" "," "}" |
Array | "[" "," "]" |
String | '"' |
Number | "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" |
Boolean | "true" "false" |
Null | "null" |
So, by traversing the JSON string and matching each character against these features, we can produce the corresponding tokens. The lexical analysis code is as follows:
// lexical analysis
const TokenTypes = {
  OPEN_OBJECT: '{',
  CLOSE_OBJECT: '}',
  OPEN_ARRAY: '[',
  CLOSE_ARRAY: ']',
  STRING: 'string',
  NUMBER: 'number',
  TRUE: 'true',
  FALSE: 'false',
  NULL: 'null',
  COLON: ':',
  COMMA: ',',
}
class Lexer {
  constructor(json) {
    this._json = json
    this._index = 0
    this._tokenList = []
  }
  createToken(type, value) {
    return { type, value: value || type }
  }
  getToken() {
    // walk the whole string and collect tokens one by one
    while (this._index < this._json.length) {
      const token = this.bigbang()
      this._tokenList.push(token)
    }
    return this._tokenList
  }
  bigbang() {
    const key = this._json[this._index]
    switch (key) {
      case ' ':
        // skip whitespace and read the next token
        this._index++
        return this.bigbang()
      case '{':
        this._index++
        return this.createToken(TokenTypes.OPEN_OBJECT)
      case '}':
        this._index++
        return this.createToken(TokenTypes.CLOSE_OBJECT)
      case '[':
        this._index++
        return this.createToken(TokenTypes.OPEN_ARRAY)
      case ']':
        this._index++
        return this.createToken(TokenTypes.CLOSE_ARRAY)
      case ':':
        this._index++
        return this.createToken(TokenTypes.COLON)
      case ',':
        this._index++
        return this.createToken(TokenTypes.COMMA)
      case '"':
        return this.parseString()
    }
    // number
    if (this.isNumber(key)) {
      return this.parseNumber()
    }
    // true false null
    const result = this.parseKeyword(key)
    if (result.isKeyword) {
      return this.createToken(TokenTypes[result.keyword])
    }
  }
  isNumber(key) {
    return key >= '0' && key <= '9'
  }
  parseString() {
    // skip the opening quote, then read up to the closing quote
    this._index++
    let key = ''
    while (this._index < this._json.length && this._json[this._index] !== '"') {
      key += this._json[this._index]
      this._index++
    }
    // skip the closing quote
    this._index++
    return this.createToken(TokenTypes.STRING, key)
  }
  parseNumber() {
    let key = ''
    while (this._index < this._json.length && '0' <= this._json[this._index] && this._json[this._index] <= '9') {
      key += this._json[this._index]
      this._index++
    }
    return this.createToken(TokenTypes.NUMBER, Number(key))
  }
  parseKeyword(key) {
    let isKeyword = false
    let keyword = ''
    switch (key) {
      case 't':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'true'
        keyword = 'TRUE'
        break
      case 'f':
        isKeyword = this._json.slice(this._index, this._index + 5) === 'false'
        keyword = 'FALSE'
        break
      case 'n':
        isKeyword = this._json.slice(this._index, this._index + 4) === 'null'
        keyword = 'NULL'
        break
    }
    // 'TRUE'/'NULL' have 4 characters and 'FALSE' has 5, matching the source keyword lengths
    this._index += keyword.length
    return {
      isKeyword,
      keyword,
    }
  }
}
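Assuming the lexer sketched above, we can do a quick sanity check on a tiny input; the token list it returns looks like this:
// Quick check of the lexer on a tiny input
const lexer = new Lexer('{"ok": true}')
console.log(lexer.getToken())
// [
//   { type: '{', value: '{' },
//   { type: 'string', value: 'ok' },
//   { type: ':', value: ':' },
//   { type: 'true', value: 'true' },
//   { type: '}', value: '}' }
// ]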
Syntax Analysis
Syntax analysis is the process of traversing the tokens, extracting their syntactic information, and constructing an object called an AST (Abstract Syntax Tree). Before the syntax analysis proper, we first define a class for each JSON syntax feature to record the information of the corresponding node in the AST.
class NumericLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class StringLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class BooleanLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class NullLiteral {
  constructor(type, value) {
    this.type = type
    this.value = value
  }
}
class ArrayExpression {
  constructor(type, elements) {
    this.type = type
    this.elements = elements || []
  }
}
class ObjectExpression {
  constructor(type, properties) {
    this.type = type
    this.properties = properties || []
  }
}
class ObjectProperty {
  constructor(type, key, value) {
    this.type = type
    this.key = key
    this.value = value
  }
}
Next we perform the syntax analysis itself: traverse the tokens, check each token's type, create the corresponding node, and assemble the AST (Abstract Syntax Tree). The code is as follows:
// syntax analysis
class Parser {
  constructor(tokens) {
    this._tokens = tokens
    this._index = 0
  }
  jump() {
    this._index++
  }
  getValue() {
    const value = this._tokens[this._index].value
    this._index++
    return value
  }
  parse() {
    const type = this._tokens[this._index].type
    const value = this.getValue()
    switch (type) {
      case TokenTypes.OPEN_ARRAY:
        const array = this.parseArray()
        this.jump() // skip over ]
        return array
      case TokenTypes.OPEN_OBJECT:
        const object = this.parseObject()
        this.jump() // skip over }
        return object
      case TokenTypes.STRING:
        return new StringLiteral('StringLiteral', value)
      case TokenTypes.NUMBER:
        return new NumericLiteral('NumericLiteral', Number(value))
      case TokenTypes.TRUE:
        return new BooleanLiteral('BooleanLiteral', true)
      case TokenTypes.FALSE:
        return new BooleanLiteral('BooleanLiteral', false)
      case TokenTypes.NULL:
        return new NullLiteral('NullLiteral', null)
    }
  }
  parseArray() {
    const _array = new ArrayExpression('ArrayExpression')
    while (true) {
      // parse one element, then continue only if a comma follows
      const value = this.parse()
      _array.elements.push(value)
      if (this._tokens[this._index].type !== TokenTypes.COMMA) break
      this.jump() // skip over ,
    }
    return _array
  }
  parseObject() {
    const _object = new ObjectExpression('ObjectExpression')
    while (true) {
      // parse a key, skip the colon, parse the value, then continue only if a comma follows
      const key = this.parse()
      this.jump() // skip over :
      const value = this.parse()
      const property = new ObjectProperty('ObjectProperty', key, value)
      _object.properties.push(property)
      if (this._tokens[this._index].type !== TokenTypes.COMMA) break
      this.jump() // skip over ,
    }
    return _object
  }
}
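If we run the parser over the tokens from the earlier '{"ok": true}' example, the tree it returns has roughly the following shape:
// Roughly the AST produced for the input '{"ok": true}'
const ast = {
  type: 'ObjectExpression',
  properties: [
    {
      type: 'ObjectProperty',
      key: { type: 'StringLiteral', value: 'ok' },
      value: { type: 'BooleanLiteral', value: true }
    }
  ]
}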
Conversion
Syntax analysis gives us the AST; the conversion phase can then add, delete, and modify tree nodes to produce a new AST. For a plain JSON parser this phase can simply pass the tree through unchanged, as the sketch below illustrates.
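As an illustration only (this helper is not part of the parser above, and its name is made up), here is what a small transform could look like: a pass that removes every object property whose value is null:
// Example transform (illustrative only): drop object properties whose value is a NullLiteral
function stripNullProperties(node) {
  if (node.type === 'ObjectExpression') {
    // keep only the properties whose value is not a NullLiteral node
    node.properties = node.properties.filter(
      property => property.value.type !== 'NullLiteral'
    )
    // recurse into the remaining property values
    node.properties.forEach(property => stripNullProperties(property.value))
  }
  if (node.type === 'ArrayExpression') {
    // recurse into array elements
    node.elements.forEach(element => stripNullProperties(element))
  }
  return node
}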
Code Generation
The code generation phase traverses the converted AST and, based on the syntax information of each node, produces the final output. The code is as follows:
// code generation
class Generate {
  constructor(tree) {
    this._tree = tree
  }
  getResult() {
    // walk the AST from the root and build the final value
    let result = this.getData(this._tree)
    return result
  }
  getData(data) {
    if (data.type === 'ArrayExpression') {
      let result = []
      data.elements.forEach(item => {
        let element = this.getData(item)
        result.push(element)
      })
      return result
    }
    if (data.type === 'ObjectExpression') {
      let result = {}
      data.properties.forEach(item => {
        let key = this.getData(item.key)
        let value = this.getData(item.value)
        result[key] = value
      })
      return result
    }
    if (data.type === 'ObjectProperty') {
      return this.getData(data.value)
    }
    if (data.type === 'NumericLiteral') {
      return data.value
    }
    if (data.type === 'StringLiteral') {
      return data.value
    }
    if (data.type === 'BooleanLiteral') {
      return data.value
    }
    if (data.type === 'NullLiteral') {
      return data.value
    }
  }
}
Usage
function JsonParse(json) {
  const lexer = new Lexer(json)
  const tokens = lexer.getToken() // get the token list
  const parser = new Parser(tokens)
  const tree = parser.parse() // build the syntax tree
  const generate = new Generate(tree)
  const result = generate.getResult() // generate the final value
  return result
}
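Trying it out on a small input (the output below is roughly what Node prints, shown as a comment):
const result = JsonParse('{"list": [1, 2, 3], "ok": true, "name": "parser"}')
console.log(result)
// → { list: [ 1, 2, 3 ], ok: true, name: 'parser' }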
Summary
So far we have implemented a simple JSON parser. From this exercise we can summarize the steps for building such a parser: first, study the syntax of the target input and extract its features; then run lexical analysis against those features to produce tokens; next, run syntax analysis over the tokens to build an AST (Abstract Syntax Tree); then add, delete, and modify AST nodes to produce a new AST; and finally traverse the AST to generate the target value we need.
Finally
Welcome to follow the [Kangaroo Cloud Digital Stack UED team]~
The Kangaroo Cloud Digital Stack UED team continues to share its technical work with the developer community. We have open-sourced the following projects, and stars are welcome:
- Big Data Distributed Task Scheduler - Taier
- Lightweight Web IDE UI framework - Molecule
- SQL Parser Project for Big Data - dt-sql-parser
- Kangaroo Cloud Digital Stack front-end team code review engineering practices document - code-review-practices
- A faster, more flexible and easier to use module packager - ko
- A component testing library for antd - ant-design-testing