
sqlserver2pgsql

This is a migration tool to convert a Microsoft SQL Server Database into a PostgreSQL database, as automatically as possible.

It is written in Perl and has received a fair amount of testing.

It does three things:

  • convert a SQL Server schema to a PostgreSQL schema
  • produce a Pentaho Data Integrator (Kettle) job to migrate all the data from SQL Server to PostgreSQL (optional)
  • produce an incremental version of this job, to migrate only what has changed in the database since the previous run (it is created together with the migration job)

Please drop me a word (on GitHub) if you use this tool; feedback is great. I also like pull requests :)

Notes, warnings:

This tool will never be completely finished. For now, it works with all the SQL Server databases I have had to migrate, as well as those of anyone who asked for help in an issue. If it doesn't work with yours, feel free to modify it, send me patches, or send me the SQL dump from your SQL Server database along with the problem you are facing. I'll try to improve the code, but I need that SQL dump. Create an issue on GitHub!

It won't migrate PL procedures; the languages are too different.

I usually only test this script under Linux. It also works on Windows (I had to use it there once) and on any Unix system.

You'll need to install a few things to make it work. See INSTALL.md.

Install

See https://github.com/dalibo/sqlserver2pgsql/blob/master/INSTALL.md

Usage

OK, I have installed Kettle and Java, I have sqlserver2pgsql.pl. What do I do now?

You'll need several things:

  • The connection parameters to the SQL Server database: IP address, port, username, password, database name, instance name if not default
  • Access to an empty PostgreSQL database (where you want your migrated data)
  • A text file containing a SQL dump of the SQL Server database

To get this SQL dump, follow this procedure in SQL Server Management Studio:

  • Right-click on the database you want to export
  • Select Tasks/Generate Scripts
  • Click "Next" on the welcome screen (if it hasn't already been desactivated)
  • Select your database
  • In the list of things you can export, just change "Script Indexes" from False to True, then "Next"
  • Select Tables then "Next"
  • Select the tables you want to export (or select all), then "Next"
  • Script to file, choose a filename, then "Next"
  • Select Unicode encoding (who knows… maybe someone has put accents in object names, or in comments)
  • Finish

You'll get a file containing a SQL script. Copy it to the server you want to run sqlserver2pgsql.pl from.

If you just want to convert this schema, run:

./sqlserver2pgsql.pl -f sqlserver_sql_dump   \
                     -b output_before_script \
                     -a output_after_script  \
                     -u output_unsure_script

The sqlserver2pgsql Perl script processes your raw SQL dump "sqlserver_sql_dump" and produces these three scripts (a sketch of their typical content follows the list):

  • output_before_script: contains what is needed to import data (types, tables and columns)

  • output_after_script: contains the rest (indexes, constraints)

  • output_unsure_script: contains objects that we attempt to migrate but cannot guarantee, such as views
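
As a rough illustration, here is the kind of statement you can expect in each script (the table, index and view names below are hypothetical, not produced by the tool):

  -- output_before_script: types, tables and columns
  CREATE TABLE public.customers (id integer NOT NULL, name varchar(100));

  -- output_after_script: indexes and constraints
  ALTER TABLE public.customers ADD CONSTRAINT customers_pkey PRIMARY KEY (id);
  CREATE INDEX customers_name_idx ON public.customers (name);

  -- output_unsure_script: objects converted on a best-effort basis, such as views
  CREATE VIEW public.recent_customers AS SELECT id, name FROM public.customers;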

-conf uses a configuration file. All options below can also be set there. Command-line options override configuration-file options. There is an example of such a file (example_conf_file).

You can also use the -i, -num and/or -nr options:

-i : Generate an "ignore case" schema, using citext, to emulate MSSQL's case insensitive collation. It will create citext fields, with check constraints. This type is slower on string comparison operations.
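
For instance, here is a minimal sketch of what a case-insensitive column can look like on the PostgreSQL side (table and column names are hypothetical; the exact DDL the script emits may differ):

  -- citext requires the extension in the target database
  CREATE EXTENSION IF NOT EXISTS citext;
  -- a SQL Server varchar(50) column becomes citext, with a check constraint keeping the length limit
  CREATE TABLE example (
      name citext CHECK (char_length(name) <= 50)
  );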

-nr : Don't convert the dbo schema to public. By default, this conversion is done, as it converts MSSQL's default schema (dbo) to PostgreSQL's default schema (public)

-relabel_schemas is a list of schemas to remap. The syntax is: source1=>dest1;source2=>dest2. Don't forget to quote this option, or the shell might alter it. There is a default dbo=>public remapping, which can be cancelled with -nr. Use double quotes instead of single quotes on Windows.
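
A hypothetical example (the schema names are invented for illustration), keeping the default dbo=>public mapping and remapping two more schemas:

  ./sqlserver2pgsql.pl -f sqlserver_sql_dump   \
                       -b output_before_script \
                       -a output_after_script  \
                       -u output_unsure_script \
                       -relabel_schemas 'sales=>sales_hist;hr=>human_resources'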

-num : Converts numeric(xxx,0) to the appropriate smallint, integer or bigint. It won't keep the constraint on the size (precision) of the numeric. smallint, integer and bigint are much faster than numeric, and these columns are often used only as surrogate keys, so the precision is often not important.

-keep_identifier_case: don't convert the dump to all lower case. This is not recommended, as you'll have to put every identifier (column, table…) in double quotes…
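
To illustrate the difference (with hypothetical names):

  -- with -keep_identifier_case, mixed-case identifiers must be double-quoted in every query
  SELECT "CustomerName" FROM "Customers";
  -- default behaviour: identifiers are lower-cased and can be written unquoted
  SELECT customername FROM customers;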

-camel_to_snake: convert object names (table, column, index...) from CamelCase to snake_case. Only do this if you are willing to change all your queries (or if you use an ORM, for instance).

-col_map_file: specifies an output text file containing the SQL Server and PostgreSQL schema, table and column names (one line per column).

-col_map_file_header: add a header line to the col_map_file (no header by default).

-col_map_file_delimiter: specify a field delimiter for the col_map_file (TAB by default).

-validate_constraints=yes/after/no: for foreign keys. If yes (default): foreign keys are created as valid in the after script. If no: they are created as not valid (enforced only for new rows). If after: they are created as not valid, but the statements to validate them are put in the unsure file.
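
The underlying PostgreSQL statements look like this (standard syntax; the table and constraint names are hypothetical):

  -- validate_constraints=no or after: the key is created as NOT VALID,
  -- so it is checked only for new and updated rows
  ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
      FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;
  -- validate_constraints=after: this statement goes to the unsure script,
  -- to be run once the existing data has been checked
  ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;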

If you want to also import data:

./sqlserver2pgsql.pl -b before.sql -a after.sql -u unsure.sql -k kettledir \ 
    -sd source -sh 192.168.0.2 -sp 1433 -su dalibo -sw mysqlpass \
    -pd dest -ph localhost -pp 5432 -pu dalibo -pw mypgpass -f sql_server_schema.sql

-k is the directory where you want to store the kettle xml files (there will be one for each table to copy, plus the one for the job)

You'll also need to specify the connection parameters. They will be stored inside the Kettle files (in cleartext, so don't make this directory public):

  • -sd : SQL Server database
  • -sh : SQL Server host
  • -si : SQL Server instance
  • -sp : SQL Server port (usually 1433)
  • -su : SQL Server username
  • -sw : SQL Server password
  • -pd : PostgreSQL database
  • -ph : PostgreSQL host
  • -pp : PostgreSQL port
  • -pu : PostgreSQL username
  • -pw : PostgreSQL password
  • -pforce_ssl : force an SSL connection to your PostgreSQL database (ssl=on must be set on the PostgreSQL server)
  • -f : the SQL Server structure dump file
  • -ignore_errors : ignore insert errors (not advised; you'll need to examine Kettle's logs, and it will be slower)

  • -pi : the parallelism used in Kettle jobs to read from SQL Server (1 by default; the JDBC driver frequently errors out with larger values)
  • -po : the parallelism used in Kettle jobs to write to PostgreSQL: this many sessions will be used to insert into PostgreSQL (8 by default)
  • -sort_size=100000 : sort size to use for incremental jobs (default is 10000, to try to stay on the safe side, see below)
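
For example, to generate the jobs with a single reader, four writer sessions and a smaller sort size (same connection parameters as above; the values are chosen only for illustration):

  ./sqlserver2pgsql.pl -b before.sql -a after.sql -u unsure.sql -k kettledir \
      -sd source -sh 192.168.0.2 -sp 1433 -su dalibo -sw mysqlpass \
      -pd dest -ph localhost -pp 5432 -pu dalibo -pw mypgpass -f sql_server_schema.sql \
      -pi 1 -po 4 -sort_size=50000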

We don't sort in databases for two reasons: the sort order (collation for strings for example) can be different between SQL Server and PostgreSQL, and we don't want to stress the servers more than needed anyway. But sorting a lot of data in Java can generate a Java Out of Heap Memory error.

If you get Out of Memory errors, raise the Java heap memory (in the kitchen script) as much as you can. If you still have the problem, reduce this sort size. You can also try reducing parallelism; having one or two sorts instead of 8 will of course consume less memory.

The last problem is that if the sort_size is small, Kettle is going to generate a very large number of temporary files, and then read them back sorted. So you may hit the "too many open files" limit of your system (1024 by default on Linux, for instance). You'll have to do some tuning here:

  • First, use as much Java memory as you can: set the JAVAXMEM environment variable to 4096 (megabytes) or more if you can afford it. The more the better.
  • If you still get Out of Memory errors, use a smaller sort size, until the sorts succeed (decrease it tenfold each time, for example). You'll obviously lose some performance.
  • If you then get the "too many open files" error, raise the maximum number of open files. In most Linux distributions, this means editing /etc/security/limits.conf and adding
@userName soft nofile 65535
@userName hard nofile 65535

(replace userName with your user name). Log in again, and verify with ulimit -n that you are now allowed to open 65535 files. You may also have to raise the maximum number of open files on the system: echo the new value to /proc/sys/fs/file-max.
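
A minimal sketch of the checks described above (the 200000 value for file-max is only an example):

  # after logging in again, check the per-process limit
  ulimit -n
  # if needed, raise the system-wide limit as root (the value is only an example)
  echo 200000 > /proc/sys/fs/file-max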

You'll need a lot of temporary space on disk to do these sorts...

You can also edit only the offending transformation with Spoon (Kettle's GUI), so that only this one is slowed down.

When Kettle crashes on one of these problems, the temporary files aren't removed. They are usually in /tmp (or in your temp directory on Windows) and start with out_. Don't forget to remove them.

-use_pk_if_possible=0/1/public.table1,myschema.table2: enables the generation of jobs that do the sorts in the databases (an ORDER BY in the SELECT part of Kettle's table inputs).

1 asks to try for all tables, or you can give a list of tables (if, for example, you cannot make those tables work with a reasonable sort size). In any case, sqlserver2pgsql will only do sorts in the database if the primary key can be guaranteed to sort the same way in PostgreSQL and SQL Server, that is, if the key is made only of numeric and date/timestamp types. If not, the standard, Kettle-sorting incremental job will be generated.
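
A hypothetical example, asking for database-side sorts on two specific tables only (the table names are the placeholders from the option description above):

  ./sqlserver2pgsql.pl -conf example_conf_file -f mydatabase_dump.sql \
      -use_pk_if_possible=public.table1,myschema.table2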

Now you've generated everything. Let's do the import:

  # Run the before script (creates the tables)
  psql -U mypguser mypgdatabase -f name_of_before_script
  # Run the kettle job:
  cd my_kettle_installation_directory
  ./kitchen.sh -file=full_path_to_kettle_job_dir/migration.kjb -level=detailed
  # Run the after script (creates the indexes, constraints...)
  psql -U mypguser mypgdatabase -f name_of_after_script

If you want to dig deeper into the Kettle job, you can use kettle_report.pl to display each table's transfer performance (you'll need to redirect kitchen's output to a file). Then, if needed, you'll be able to modify the Kettle job to optimize it, using Spoon, Kettle's GUI.
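
A sketch of that workflow; the exact way kettle_report.pl is invoked below is an assumption, so check the script's built-in usage before relying on it:

  # capture kitchen's output to a file...
  ./kitchen.sh -file=full_path_to_kettle_job_dir/migration.kjb -level=detailed > kitchen.log 2>&1
  # ...then analyse it with kettle_report.pl (invocation assumed, see the script's usage)
  ./kettle_report.pl kitchen.log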

You can also give a try to the incremental job:

./kitchen.sh -file=full_path_to_kettle_job_dir/incremental.kjb -level=detailed

This one is highly experimental. I need your feedback! :) You should only run an incremental job on an already loaded database.

It may fail for a variety of reasons, mainly out of memory errors. If you have unique constraints other than the primary key, the series of queries generated by sqlserver2pgsql may generate conflicting updates. So test it several times before the migration day if you really want to try this method. The "normal" method is safer, but of course you'll start from scratch and have those long index builds at the end.

By the way, to be able to insert data into all tables, it deactivates triggers at the beginning and activates them back at the end of the job. So, if the job fails, those triggers won't be reactivated.
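
If you need to clean up after a failed run, here is a minimal sketch in standard PostgreSQL, assuming the triggers were disabled with ALTER TABLE ... DISABLE TRIGGER (the table name below is hypothetical):

  -- list the tables whose triggers were left disabled
  SELECT DISTINCT tgrelid::regclass FROM pg_trigger WHERE tgenabled = 'D';
  -- re-enable the triggers on an affected table (repeat for each table)
  ALTER TABLE public.mytable ENABLE TRIGGER ALL;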

You can also use a configuration file if you like:

./sqlserver2pgsql.pl -conf example_conf_file -f mydatabase_dump.sql

There is an example configuration file provided. You can also mix the configuration file with command-line options; command-line options take priority over values set in the configuration file.

FAQ

See https://github.com/dalibo/sqlserver2pgsql/blob/master/FAQ.md

Licence

GPL v3 : http://www.gnu.org/licenses/gpl.html
