go-etl tools
As the author of go-etl, I want to give a shout-out to this compact data synchronization tool: it performs extremely well when synchronizing millions of rows, usually completing the job in a few minutes.
1. What does it do?
go-etl is a data synchronization tool set. It currently supports synchronizing data between mainstream relational databases such as MySQL, Postgres, Oracle, SQL Server, and DB2, as well as csv and xlsx files. It is extremely good at synchronizing millions of rows, usually finishing in a few minutes.
2. How to get it?
You can download the 64-bit binaries for Windows or Linux from the Latest Releases page.
As shown in the figure, the first file is the Linux binary without DB2 support, the second is the Linux version, and the third is the Windows version.
3. How to use it?
The go-etl datax binary works out of the box; the only real difficulty is writing the import configuration file. To configure it properly, we first need to understand how the tool works.
3.1 Principles of operation
As shown in the figure above, go-etl datax synchronizes data from a source data source to a destination data source.
- Reader: the data collection module. It collects data from the source and sends it to the framework.
- Writer: the data writing module. It continuously fetches data from the framework and writes it to the destination.
- Framework: connects the reader and the writer, serving as the data transmission channel between them, and handles core technical issues such as buffering, flow control, concurrency, and data conversion.
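These three roles map directly onto the configuration file shown in the next section: the reader and writer each get a block under job.content, while setting.speed.channel controls how many concurrent channels the framework runs. A minimal skeleton, with the reader and writer bodies omitted:
{
    "job": {
        "content": [
            {
                "reader": {},
                "writer": {}
            }
        ],
        "setting": {
            "speed": {
                "channel": 4
            }
        }
    }
}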
3.2 Configuring data synchronization files
Given how it works, we need to configure the reader and the writer so that go-etl datax can do its job correctly. Taking csv to mysql as an example, here is the complete configuration file; you only need to focus on the reader and writer sections.
{
"core" : {
"container": {
"job":{
"id": 1,
"sleepInterval":100
}
}
},
"job":{
"content":[
{
"reader":{
"name": "csvreader",
"parameter": {
"path":[""],
"encoding":"utf-8"
}
},
"writer":{
"name": "mysqlwriter",
"parameter": {
"username": "root",
"password": "123456",
"writeMode": "insert",
"column": ["*"],
"connection": {
"url": "tcp(192.168.15.130:3306)/mysql",
"table": {
"db":"source",
"name":"split"
}
},
"batchTimeout": "1s",
"batchSize":1000
}
},
"transformer":[]
}
],
"setting":{
"speed":{
"byte":0,
"record":1024,
"channel":4
}
}
}
}
Look at the reader first. Notice that the reader's name is csvreader, which indicates the type of the source data source; for mysql, for example, the reader would be mysqlreader. Then look at the reader's parameters: path is the location of the csv file to read, and encoding is the character set of the csv file.
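For comparison, a mysqlreader block would plausibly mirror the mysqlwriter parameters shown above (username, password, column, and a connection with url and table). This sketch is an assumption pieced together from that symmetry, not copied from the go-etl documentation, so check the project's docs for the authoritative parameter list:
"reader": {
    "name": "mysqlreader",
    "parameter": {
        "username": "root",
        "password": "123456",
        "column": ["*"],
        "connection": {
            "url": "tcp(192.168.15.130:3306)/mysql",
            "table": {
                "db": "source",
                "name": "split"
            }
        }
    }
}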
Now the writer. Notice that the writer's name is mysqlwriter, which indicates the type of the destination data source; for csv, for example, the writer would be csvwriter. Then look at the parameters that need to be configured for the writer: username, password, and the url and table under connection. The username and password need no explanation; what should be emphasized here is the url. Its basic format is tcp(ip:port)/db, where ip:port is the IP address and port of the mysql database, and db is the database to connect to by default. table describes the table to be written.
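Symmetrically, a csvwriter for the reverse direction would presumably take the same parameters as the csvreader (path and encoding). The out.csv path here is a placeholder, and the block as a whole is an assumption based on that symmetry rather than a confirmed example from the go-etl documentation:
"writer": {
    "name": "csvwriter",
    "parameter": {
        "path": ["out.csv"],
        "encoding": "utf-8"
    }
}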
3.3 Run the program to import data
Save the above configuration as config.json (the default file name the datax binary looks for), put it in the same directory as datax and the data files to be imported, then execute the following command from the command line on Windows or from a terminal on Linux:
datax
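If the configuration file has another name or lives in another directory, datax can be pointed at it explicitly with the -c flag (the same flag described in section 3.4.2); the path below is a placeholder:
datax -c /path/to/config.json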
3.4 Batch Write Data
3.4.1 Configuration wizard file for source and destination
The source-destination configuration wizard file is a csv file; each line is configured as follows:
path[table],path[table]
Each column can be a file path or a table name. Note that every table should carry a database name or schema name, which is set in the data source configuration file.
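For example, a wizard file that imports two csv files into two mysql tables could look like this (paths and table names are made up for illustration):
/data/user.csv,user
/data/order.csv,order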
3.4.2 Batch Generation of Data Configuration Sets and Execution Scripts
Execute the following command from the command line on Windows or from a terminal on Linux:
datax -c tools/testData/ -w tools/testData/
- -c specifies the data source configuration file
- -w specifies the source-destination configuration wizard file
Running this generates, in the directory of the data source configuration file, one configuration set per line of the source-destination wizard file; the sets are named after the specified data source configuration file with a numeric suffix, i.e. <name>1.json, <name>2.json, ..., <name>[n].json.
In addition, an execution script (run.bat on Windows or run.sh on Linux) is generated in the current directory.
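For instance, if the specified data source configuration file were tools/testData/mysql.json and the wizard file had two lines, the generated layout would look roughly like this (names are illustrative, derived from the naming rule above):
tools/testData/mysql1.json
tools/testData/mysql2.json
run.bat (or run.sh on Linux)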
3.4.3 Running scripts
On Windows, just execute the generated script from the command line; on Linux, just execute it from a terminal.
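Assuming the scripts are named run.bat and run.sh as noted above, the invocation is simply:
run.bat
on Windows, or
./run.sh
on Linux.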