
Pgloader Minimalist Tutorial


Contents
  • Introduction
  • Installation
    • Installation Overview
    • CentOS Compilation and Installation
  • Usage
    • Command line
    • Configuration file migration
    • Possible errors encountered
    • More configuration references
  • Summary
  • References

Introduction

pgloader is a data migration tool used to load data from other sources into PostgreSQL. It supports migration from the following sources:

  • Files: CSV, Fixed Format, Postgres COPY text format, DBF, IXF
  • Database systems: SQLite, MySQL, MS SQL Server, PostgreSQL, Redshift

Application scenario

Whenever you need to import data into PostgreSQL, for example during a data migration.

Installation

Installation Overview

There are many installation options; see /en/latest/ in the official documentation.

Unfortunately, no precompiled binary is provided for CentOS, so you need to compile and install it manually.

CentOS Compilation and Installation

Go to the official website to download the latest source code:

/dimitri/pgloader

Place the source code under /usr/local; this article uses the following path as an example:


[root@bogon pgloader-3.6.9]# pwd
/usr/local/pgloader-3.6.9

If you downloaded the source tarball, extract it first (the file name below assumes version 3.6.9, matching the directory above):


tar -zxvf pgloader-3.6.9.tar.gz

Grant script execution privileges:


cd /usr/local/pgloader-3.6.9
chmod -R 777 *

Execute the script that downloads the dependencies:




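The exact script name depends on the source version; in the upstream pgloader repository the CentOS dependency bootstrap script is typically named bootstrap-centos7.sh (an assumption, so check which bootstrap-*.sh scripts ship with your copy of the source):


# assumed script name from the upstream repository; installs the build dependencies via yum
cd /usr/local/pgloader-3.6.9
sh bootstrap-centos7.sh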
Perform the compilation:


make  pgloader

If the build fails with hints that shared libraries such as "libssl.so.1.1", "libssl.so.1.0.0" or "libssl.so.3" cannot be found,
you need to install OpenSSL first:


yum -y install openssl openssl-devel

Copy the compiled binary to the system executable directory /usr/local/bin/:


cp  /usr/local/pgloader-3.6.9/build/bin/pgloader        /usr/local/bin/

Check to see if it's installed:


[root@bogon home]# pgloader --version
pgloader version "3.6.7~devel"
compiled with SBCL 2.2.5

Usage

There are two common ways to use pgloader:

  • Via the command line
  • Via a migration configuration file

Command line

A typical command line looks like this:

pgloader mysql://user:password@ip:3306/dbName postgresql://user:password@ip:5432/dbName

  • Migrates the structure and data of the database named dbName from MySQL to PostgreSQL.
  • pgloader is the /usr/local/bin/pgloader executable installed above.
  • It is followed by the connection information for MySQL and PostgreSQL, separated by a space.
  • An account with write access is required; root is recommended. A concrete example with placeholder values is shown below.
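For illustration only, here is what an invocation might look like with hypothetical hosts and credentials; the --with flag passes the same options that a command file's WITH clause accepts (replace every value with your own):


pgloader --with "prefetch rows = 10000" \
         mysql://root:secret@192.168.1.10:3306/source_db \
         postgresql://postgres:secret@192.168.1.20:5432/target_db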

Configuration file migration

The other way is to write a migration configuration file (a load command file) and then execute that file with the pgloader command, as shown after the first example below.

The following configuration file synchronizes only the ramble_doc table from the source_db database in MySQL to the target_db database in PostgreSQL; after execution it creates a table named ramble_doc in PostgreSQL and loads its data.

LOAD DATABASE
     FROM     mysql://root:password@ip:3306/source_db
     INTO     postgresql://postgres:password@ip:5432/target_db

INCLUDING ONLY TABLE NAMES matching 'ramble_doc' ;

  • LOAD DATABASE: indicates a migration from a database source.
  • FROM: source database connection information.
  • INTO: target database connection information.
  • INCLUDING ONLY TABLE NAMES matching: include only the matching tables.
  • The semicolon at the end is required.
  • The configuration file must follow the expected layout, including indentation.
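Assuming the file above is saved as ramble_doc.load (a hypothetical file name), it is executed by passing it to pgloader:


pgloader ramble_doc.load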

The following configuration file synchronizes all the tables in MySQL's source_db database to PostgreSQL's target_db database, including both table structure and data.

LOAD DATABASE
     FROM     mysql://root:password@ip:3306/source_db
     INTO     postgresql://postgres:password@ip:5432/target_db
     WITH batch rows = 10000, batch size = 200MB, prefetch rows = 5000, workers = 4, concurrency = 3
;

  • WITH: can be followed by additional parameters, separated by commas. Common options include whether to synchronize data or only the structure, whether to drop tables before writing data, and so on; see the sketch after this list.
  • batch rows: the number of rows inserted into PostgreSQL per batch when synchronizing data; the default is 25,000.
  • batch size: the maximum data size of each batch; set this parameter to avoid memory overflow.
  • prefetch rows: the number of rows read from MySQL per batch when synchronizing data; the default is 1000.
  • workers: the number of worker threads.
  • concurrency: the number of tables processed concurrently.
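For example, a sketch of a structure-only migration that drops and recreates the target tables might look like the following; the option names come from pgloader's WITH clause, and the hosts and credentials are placeholders:


LOAD DATABASE
     FROM     mysql://root:password@ip:3306/source_db
     INTO     postgresql://postgres:password@ip:5432/target_db
     WITH schema only, include drop, create tables, create indexes
;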

Possible errors encountered

Memory overflow

The error message is reported as:


Heap exhausted during garbage collection: 64 bytes available, 80 requested.

The solution is to tune the batch parameters and the concurrency; this requires experimentation based on the data volume of the source database and the available hardware, as in the sketch below.
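For instance, one might start by reducing the batch, prefetch and concurrency settings in the WITH clause (the values here are only starting points to adjust for your environment):


     WITH batch rows = 5000, batch size = 64MB, prefetch rows = 1000, workers = 2, concurrency = 1

If that is still not enough, the pgloader command also accepts a --dynamic-space-size option to enlarge the memory available to the underlying SBCL runtime.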

More configuration references

The official website gives an example:


LOAD DATABASE
     FROM      mysql://root@localhost/sakila
     INTO postgresql://localhost:54393/sakila

 WITH include drop, create tables, create indexes, reset sequences,
      workers = 8, concurrency = 1,
      multiple readers per thread, rows per range = 50000

  SET PostgreSQL PARAMETERS
      maintenance_work_mem to '128MB',
      work_mem to '12MB',
      search_path to 'sakila, public, "$user"'

  SET MySQL PARAMETERS
      net_read_timeout  = '120',
      net_write_timeout = '120'

 CAST type bigint when (= precision 20) to bigserial drop typemod,
      type date drop not null drop default using zero-dates-to-null,
      -- type tinyint to boolean using tinyint-to-boolean,
      type year to integer

 MATERIALIZE VIEWS film_list, staff_list

 -- INCLUDING ONLY TABLE NAMES MATCHING ~/film/, 'actor'
 -- EXCLUDING TABLE NAMES MATCHING ~<ory>
 -- DECODING TABLE NAMES MATCHING ~/messed/, ~/encoding/ AS utf8
 -- ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
 -- ALTER TABLE NAMES MATCHING ~/_list$/ SET SCHEMA 'mv'

 ALTER TABLE NAMES MATCHING ~/_list$/, 'sales_by_store', ~/sales_by/
  SET SCHEMA 'mv'

 ALTER TABLE NAMES MATCHING 'film' RENAME TO 'films'
 ALTER TABLE NAMES MATCHING ~/./ SET (fillfactor='40')

 ALTER SCHEMA 'sakila' RENAME TO 'pagila'

 BEFORE LOAD DO
   $$ create schema if not exists pagila; $$,
   $$ create schema if not exists mv;     $$,
   $$ alter database sakila set search_path to pagila, mv, public; $$;
   

For the full reference, see /en/latest/ref/

Summary

pgloader is a database migration tool. Spend a little time learning how to use it and it will pay off handsomely when migrating a database; it is usually more complete and reliable than a hand-written migration script.

References

  • Official website: /en/latest/
  • GitHub: /dimitri/pgloader