- Overview
- Installation
  - Installation Overview
  - CentOS Compilation and Installation
- Usage
  - Command line
  - Configuration file migration
  - Possible errors encountered
  - More configuration references
- Summary
- References
Overview
pgloader is a data synchronization tool used to migrate data from other sources into PostgreSQL. It supports migration from the following sources:
- Files: CSV, Fixed Format, Postgres COPY text format, DBF, IXF
- Database systems: SQLite, MySQL, MS SQL Server, PostgreSQL, Redshift
Application scenario
Use pgloader whenever you need to import data into PostgreSQL, for example during a data migration.
Installation
Installation Overview
There are many installation options; see https://pgloader.readthedocs.io/en/latest/.
Unfortunately, no pre-built binary is provided for CentOS, so it has to be compiled and installed manually.
CentOS Compilation and Installation
Download the latest source code from the official repository:
https://github.com/dimitri/pgloader
Place the source under /usr/local; this article uses the following path as an example:
[root@bogon pgloader-3.6.9]# pwd
/usr/local/pgloader-3.6.9
If you downloaded a source tarball, unpack it first with the following command:
tar -zxvf pgloader-3.6.9.tar.gz
Grant script execution privileges:
cd /usr/local/pgloader-3.6.9
chmod -R 777 *
Run the build, which downloads the required dependencies and then compiles pgloader:
make pgloader
If the build fails with a hint like ("libssl.so.1.1" "libssl.so.1.0.0" "libssl.so.3" "libssl.so") not found or similar,
you need to install OpenSSL first:
yum -y install openssl openssl-devel
Copy the compiled program to the system executable directory /usr/local/bin/:
cp /usr/local/pgloader-3.6.9/build/bin/pgloader /usr/local/bin/
Check to see if it's installed:
[root@bogon home]# pgloader --version
pgloader version "3.6.7~devel"
compiled with SBCL 2.2.5
Usage
There are two common ways to use pgloader:
- via the command line
- via a migration configuration file
Command line
Consider the following command line:
pgloader mysql://user:password@ip:3306/dbName postgresql://user:password@ip:5432/dbName
- Migrates the schema and data of the database named dbName from MySQL to PostgreSQL.
- pgloader here is the /usr/local/bin/pgloader executable installed above.
- It is followed by the connection strings for MySQL and PostgreSQL, separated by a space.
- An account with write access is required; root is recommended. A concrete invocation is sketched after this list.
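A minimal sketch of a concrete run, assuming a local MySQL and a local PostgreSQL that both hold a database called shop_db; the hosts, credentials, and database name are all placeholders:
pgloader mysql://root:[email protected]:3306/shop_db \
         postgresql://postgres:[email protected]:5432/shop_db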
Configuration file migration
The other way is to write a migration configuration file and then execute it with the pgloader command.
The following configuration file synchronizes only the ramble_doc table of MySQL's source_db database to the target_db database in PostgreSQL; executing it creates a new table named ramble_doc in PostgreSQL and loads the data into it.
LOAD DATABASE
FROM mysql://root:[email protected]:3306/source_db
INTO postgresql://postgres:[email protected]:5432/target_db
INCLUDING ONLY TABLE NAMES matching 'ramble_doc';
- LOAD DATABASE: indicates a migration from a database.
- FROM: connection string of the source database.
- INTO: connection string of the target database.
- INCLUDING ONLY TABLE NAMES matching: include only the matching tables.
- The semicolon at the end is mandatory.
- The configuration file is sensitive to formatting such as indentation; how to execute it is shown below.
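To run the migration, save the commands to a file and pass it to pgloader; the file name migrate.load below is an arbitrary choice:
pgloader migrate.load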
The following configuration file synchronizes all tables of MySQL's source_db database to PostgreSQL's target_db database, including both table structure and data.
LOAD DATABASE
FROM mysql://root:[email protected]:3306/source_db
INTO postgresql://postgres:[email protected]:5432/target_db
WITH batch rows = 10000, batch size = 200MB, prefetch rows = 5000, workers = 4, concurrency = 3
;
- WITH: introduces additional parameters, separated by commas. Common options include whether to copy the data or only the schema, whether to drop the tables before writing data, and so on; a sketch follows this list.
- batch rows: the number of rows inserted into PostgreSQL per batch; the default is 25k.
- batch size: the maximum data volume of each batch; set it to avoid memory overflow.
- prefetch rows: the number of rows read from MySQL per batch; the default is 1000.
- workers: the number of worker threads.
- concurrency: the number of concurrent tasks.
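As a hedged illustration of those options, the load file below drops and recreates the tables and copies only the schema, without data; include drop, create tables, create indexes, reset sequences, and schema only are documented pgloader options, while the connection strings are placeholders:
LOAD DATABASE
FROM mysql://root:[email protected]:3306/source_db
INTO postgresql://postgres:[email protected]:5432/target_db
WITH include drop, create tables, create indexes, reset sequences, schema only
;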
Possible errors encountered
memory overflow
The error message reads:
Heap exhausted during garbage collection: 64 bytes available, 80 requested.
The solution is to tune the batch and concurrency parameters; the right values depend on the data volume of the source database and on the hardware, so expect some trial and error. A possible starting point is sketched below.
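For instance, assuming the defaults exhaust the heap on a modest machine, one might start with smaller batches and less parallelism and scale up from there; all values are illustrative and the connection strings are placeholders:
LOAD DATABASE
FROM mysql://root:[email protected]:3306/source_db
INTO postgresql://postgres:[email protected]:5432/target_db
WITH batch rows = 5000, batch size = 64MB, prefetch rows = 1000, workers = 2, concurrency = 1
;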
More configuration references
The official documentation gives a fuller example:
LOAD DATABASE
FROM mysql://root@localhost/sakila
INTO postgresql://localhost:54393/sakila
WITH include drop, create tables, create indexes, reset sequences,
workers = 8, concurrency = 1,
multiple readers per thread, rows per range = 50000
SET PostgreSQL PARAMETERS
maintenance_work_mem to '128MB',
work_mem to '12MB',
search_path to 'sakila, public, "$user"'
SET MySQL PARAMETERS
net_read_timeout = '120',
net_write_timeout = '120'
CAST type bigint when (= precision 20) to bigserial drop typemod,
type date drop not null drop default using zero-dates-to-null,
-- type tinyint to boolean using tinyint-to-boolean,
type year to integer
MATERIALIZE VIEWS film_list, staff_list
-- INCLUDING ONLY TABLE NAMES MATCHING ~/film/, 'actor'
-- EXCLUDING TABLE NAMES MATCHING ~<ory>
-- DECODING TABLE NAMES MATCHING ~/messed/, ~/encoding/ AS utf8
-- ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
-- ALTER TABLE NAMES MATCHING ~/_list$/ SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
SET SCHEMA 'mv'
ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
ALTER TABLE NAMES MATCHING ~/./ SET (fillfactor='40')
ALTER SCHEMA 'sakila' RENAME TO 'pagila'
BEFORE LOAD DO
$$ create schema if not exists pagila; $$,
$$ create schema if not exists mv; $$,
$$ alter database sakila set search_path to pagila, mv, public; $$;
https://pgloader.readthedocs.io/en/latest/ref/
Summary
pgloader is a database migration tool; a little time spent learning how to use it pays off doubly when migrating a database, and it is usually more complete and reliable than a hand-written migration script.
References
- Official website: https://pgloader.readthedocs.io/en/latest/
- GitHub: https://github.com/dimitri/pgloader