go-etl tools
As the author of go-etl, I want to give a shout-out to this compact data synchronization tool: it performs extremely well when synchronizing millions of rows, usually completing the job in a few minutes.
1. What does it do?
go-etl is a data synchronization tool set. It currently supports synchronizing data between mainstream relational databases such as MySQL, Postgres, Oracle, SQL Server, and DB2, as well as csv and xlsx files. It is extremely good at synchronizing millions of rows, usually finishing in a few minutes.
2. How to get it?
You can download the 64-bit binaries for Windows or Linux from the Latest Releases page.
As shown in the figure, the first file is the Linux binary without DB2 support, the second is the Linux version, and the third is the Windows version.
3. How to use it?
The go-etl datax binary works out of the box; the only real difficulty is writing the import configuration file. To configure it properly, we first need to understand how the tool works.
3.1 Principles of operation
As shown in the figure above, go-etl datax synchronizes data from a source data source to a destination data source.
- Reader: the data collection module. It collects data from the source and sends it to the framework.
- Writer: the data writing module. It continuously fetches data from the framework and writes it to the destination.
- Framework: connects the reader and the writer, serving as the data transmission channel between them, and handles core technical issues such as buffering, flow control, concurrency, and data conversion.
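These three roles map directly onto the configuration file shown in the next section: the reader and writer each get a block under job.content, while setting.speed.channel controls how many concurrent channels the framework runs. A minimal skeleton, with the reader and writer bodies omitted:
{
    "job": {
        "content": [
            {
                "reader": {},
                "writer": {}
            }
        ],
        "setting": {
            "speed": {
                "channel": 4
            }
        }
    }
}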
3.2 Configuring data synchronization files
Given how it works, we need to configure the reader and the writer so that go-etl datax can do its job correctly. Taking csv to mysql as an example, here is the complete configuration file; you only need to focus on the reader and writer sections.
{
"core" : {
"container": {
"job":{
"id": 1,
"sleepInterval":100
}
}
},
"job":{
"content":[
{
"reader":{
"name": "csvreader",
"parameter": {
"path":[""],
"encoding":"utf-8"
}
},
"writer":{
"name": "mysqlwriter",
"parameter": {
"username": "root",
"password": "123456",
"writeMode": "insert",
"column": ["*"],
"connection": {
"url": "tcp(192.168.15.130:3306)/mysql",
"table": {
"db":"source",
"name":"split"
}
},
"batchTimeout": "1s",
"batchSize":1000
}
},
"transformer":[]
}
],
"setting":{
"speed":{
"byte":0,
"record":1024,
"channel":4
}
}
}
}
Look at the reader first. Notice that the reader's name is csvreader, which indicates the type of the source data source; for mysql, for example, the reader would be mysqlreader. Then look at the reader's parameters: path is the location of the csv file to read, and encoding is the character set of the csv file.
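For comparison, a mysqlreader block would plausibly mirror the mysqlwriter parameters shown above (username, password, column, and a connection with url and table). This sketch is an assumption pieced together from that symmetry, not copied from the go-etl documentation, so check the project's docs for the authoritative parameter list:
"reader": {
    "name": "mysqlreader",
    "parameter": {
        "username": "root",
        "password": "123456",
        "column": ["*"],
        "connection": {
            "url": "tcp(192.168.15.130:3306)/mysql",
            "table": {
                "db": "source",
                "name": "split"
            }
        }
    }
}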
Now the writer. Notice that the writer's name is mysqlwriter, which indicates the type of the destination data source; for csv, for example, the writer would be csvwriter. Then look at the parameters that need to be configured for the writer: username, password, and the url and table under connection. The username and password need no explanation; what should be emphasized here is the url. Its basic format is tcp(ip:port)/db, where ip:port is the IP address and port of the mysql database, and db is the database to connect to by default. table describes the table to be written.
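Symmetrically, a csvwriter for the reverse direction would presumably take the same parameters as the csvreader (path and encoding). The out.csv path here is a placeholder, and the block as a whole is an assumption based on that symmetry rather than a confirmed example from the go-etl documentation:
"writer": {
    "name": "csvwriter",
    "parameter": {
        "path": ["out.csv"],
        "encoding": "utf-8"
    }
}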
3.3 Run the program to import data
Save the above configuration as config.json (the default file name the datax binary looks for), put it in the same directory as datax and the data files to be imported, then execute the following command from the command line on Windows or from a terminal on Linux:
datax
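If the configuration file has another name or lives in another directory, datax can be pointed at it explicitly with the -c flag (the same flag described in section 3.4.2); the path below is a placeholder:
datax -c /path/to/config.json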
3.4 Batch Write Data
3.4.1 Configuration wizard file for source and destination
The source-destination configuration wizard file is a csv file; each line is configured as follows:
path[table],path[table]
Each column can be a file path or a table name. Note that every table should carry a database name or schema name, which is set in the data source configuration file.
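For example, a wizard file that imports two csv files into two mysql tables could look like this (paths and table names are made up for illustration):
/data/user.csv,user
/data/order.csv,order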
3.4.2 Batch Generation of Data Configuration Sets and Execution Scripts
Execute the following command from the command line on Windows or from a terminal on Linux:
datax -c tools/testData/ -w tools/testData/
- -c specifies the data source configuration file
- -w specifies the source-destination configuration wizard file
Running this generates, in the directory of the data source configuration file, one configuration set per line of the source-destination wizard file; the sets are named after the specified data source configuration file with a numeric suffix, i.e. <name>1.json, <name>2.json, ..., <name>[n].json.
In addition, an execution script (run.bat on Windows or run.sh on Linux) is generated in the current directory.
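For instance, if the specified data source configuration file were tools/testData/mysql.json and the wizard file had two lines, the generated layout would look roughly like this (names are illustrative, derived from the naming rule above):
tools/testData/mysql1.json
tools/testData/mysql2.json
run.bat (or run.sh on Linux)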
3.4.3 Running scripts
On Windows, just execute the generated script from the command line; on Linux, just execute it from a terminal.
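Assuming the scripts are named run.bat and run.sh as noted above, the invocation is simply:
run.bat
on Windows, or
./run.sh
on Linux.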