  • Stars
    14,952
  • Rank 1,854 (Top 0.04 %)
  • Language
    Java
  • License
    Other
  • Created over 6 years ago
  • Updated about 2 months ago

Repository Details

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.

DataX

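DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool and platform widely used within Alibaba Group. DataX provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, and Databend.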

DataX Commercial Edition

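Alibaba Cloud DataWorks Data Integration is the commercial product built by the DataX team on Alibaba Cloud. It aims to provide fast, stable data movement between a rich set of heterogeneous data sources in complex network environments, along with data synchronization solutions for complex business scenarios. It currently serves nearly 3,000 cloud customers and synchronizes more than 3 trillion records per day. DataWorks Data Integration supports 50+ data sources for offline synchronization and offers solutions such as whole-database migration, batch migration to the cloud, incremental synchronization, and synchronization of sharded databases and tables. Real-time synchronization was added in 2020, supporting any read/write combination across 10+ data sources. It also provides one-click full and incremental synchronization from sources such as MySQL and Oracle to Alibaba Cloud big data engines such as MaxCompute and Hologres.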

For the commercial edition, see: https://www.aliyun.com/product/bigdata/ide

Features

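As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In principle, the DataX framework can support data synchronization for any type of data source. The DataX plugin system also works as an ecosystem: each newly integrated data source can immediately interoperate with all existing data sources.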
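
The Reader/Writer abstraction described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not DataX's actual plugin API: the names `SyncSketch`, `Reader`, `Writer`, `sync`, `ListReader`, and `ListWriter` are hypothetical, and a record is modeled simply as a list of column values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the Reader/Writer idea; these are NOT DataX's real interfaces.
class SyncSketch {
    /** Reader plugin: pulls records from a source. A record is a list of column values. */
    interface Reader {
        Iterator<List<String>> read();
    }

    /** Writer plugin: pushes one record to a target. */
    interface Writer {
        void write(List<String> record);
    }

    /** The framework knows only the two interfaces, so any Reader pairs with any Writer. */
    static int sync(Reader reader, Writer writer) {
        int count = 0;
        Iterator<List<String>> it = reader.read();
        while (it.hasNext()) {
            writer.write(it.next());
            count++;
        }
        return count;
    }

    /** Illustrative in-memory source standing in for a real data source. */
    static final class ListReader implements Reader {
        private final List<List<String>> rows;
        ListReader(List<List<String>> rows) { this.rows = rows; }
        @Override public Iterator<List<String>> read() { return rows.iterator(); }
    }

    /** Illustrative in-memory target standing in for a real data source. */
    static final class ListWriter implements Writer {
        final List<List<String>> received = new ArrayList<>();
        @Override public void write(List<String> record) { received.add(record); }
    }

    public static void main(String[] args) {
        ListReader source = new ListReader(Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob")));
        ListWriter target = new ListWriter();
        int copied = sync(source, target);
        System.out.println("records copied: " + copied);
    }
}
```

In this sketch, supporting a new data source means writing one new `Reader` or `Writer`. It then works with every existing plugin on the other side, which is the interoperability property the paragraph describes.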

Detailed Introduction to DataX

See: DataX-Introduction

Quick Start

Download: DataX download link
See: Quick Start

Support Data Channels

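DataX now has a fairly comprehensive plugin system. Mainstream RDBMS databases, NoSQL stores, and big data computing systems are all integrated. The currently supported data sources are listed in the table below. For details, see: DataX Data Source Reference Guide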

Type  Data source  Reader  Writer  Docs

RDBMS (relational databases)
    MySQL
    Oracle
    OceanBase
    SQLServer
    PostgreSQL
    DRDS
    Kingbase
    Generic RDBMS (supports all relational databases)
Alibaba Cloud data warehouse storage
    ODPS
    ADB
    ADS
    OSS
    OCS
    Hologres
    AnalyticDB For PostgreSQL
Alibaba Cloud middleware
    datahub (docs: read, write)
    SLS (docs: read, write)
Graph databases
    Alibaba Cloud GDB
    Neo4j
NoSQL data storage
    OTS
    Hbase0.94
    Hbase1.1
    Phoenix4.x
    Phoenix5.x
    MongoDB
    Cassandra
Data warehouse storage
    StarRocks (docs: read)
    ApacheDoris
    ClickHouse
    Databend
    Hive
    kudu
    selectdb
Unstructured data storage
    TxtFile
    FTP
    HDFS
    Elasticsearch
Time series databases
    OpenTSDB
    TSDB
    TDengine

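Alibaba Cloud DataWorks Data Integration

All of DataX's existing capabilities have been merged into Alibaba Cloud Data Integration, which is more efficient and secure than DataX and offers advanced features that DataX lacks. Data Integration can be understood as a fully upgraded commercial edition of DataX that provides enterprises with stable, reliable, and secure data transfer services. Compared with DataX, Data Integration has the following notable features:

Real-time synchronization support:

A greatly expanded range of data sources for offline synchronization: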

I Want to Develop a New Plugin

See: DataX Plugin Development Guide

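Important Release Notes

DataX plans to ship monthly iterative updates going forward, and pull requests from anyone interested are welcome. The contents of each monthly update will be described below.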

Project Members

Core contributors: 言柏, 枕水, 秋奇, 青砾, 一斅, 云时

Thanks to 天烬, 光戈, 祁然, 巴真, and 静行 for their contributions to DataX.

License

This software is free to use under the Apache License.

Please report issues to us promptly. See: DataxIssue

Enterprise Users of Open-Source DataX

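Long-term hiring. Contact email: [email protected]
[Java development positions]
Title: Senior Java Development Engineer / Expert / Senior Expert
Work experience: 2+ years
Education: bachelor's degree (if you are truly capable, none of these are hard requirements)
Expected level: P6/P7/P8

Job description:
    1. Design and develop the Alibaba Cloud big data platform (数加);
    2. Develop big data products for government and enterprise customers;
    3. Use large-scale machine learning algorithms to uncover relationships in data, and explore product applications of data mining in real-world scenarios;
    4. One-stop big data development platform
    5. Big data task scheduling engine
    6. Task execution engine
    7. Task monitoring and alerting
    8. Synchronization of massive heterogeneous data

Job requirements:
    1. 3+ years of Java web development experience;
    2. Familiarity with the core Java technology stack, including the JVM, class loading, threads, concurrency, I/O resource management, and networking;
    3. Proficiency with common Java frameworks and a keen awareness of new ones; a deep understanding of object orientation, design principles, encapsulation, and abstraction;
    4. Familiarity with HTML/HTML5 and JavaScript; familiarity with SQL;
    5. Strong execution, with excellent teamwork and professional dedication;
    6. A deep understanding of design patterns and their use cases is a plus;
    7. Strong problem analysis and problem-solving skills, solid hands-on ability, and a strong drive for technical excellence are preferred;
    8. Hands-on project and product experience with high concurrency, high stability and availability, high performance, or big data processing is preferred;
    9. Experience with technical solutions for big data products, cloud products, or middleware is preferred.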

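User Support:

The DingTalk group is currently affected by some restriction policies. If you have questions, please submit an Issue here first. DataX developers and the community will answer Issues regularly, and a richer knowledge base will also help later users.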

More Repositories

1

arthas

Alibaba Java Diagnostic Tool Arthas
Java
34,428
star
2

easyexcel

A fast, concise Java tool for processing Excel files that avoids out-of-memory errors on large files
Java
30,946
star
3

p3c

Alibaba Java Coding Guidelines pmd implements and IDE plugin
Kotlin
29,294
star
4

nacos

an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
Java
28,956
star
5

canal

Alibaba's MySQL binlog incremental subscription and consumption component
Java
27,786
star
6

druid

A database connection pool built for monitoring, from the Alibaba Cloud computing platform DataWorks (https://help.aliyun.com/document_detail/137663.html) team
Java
27,644
star
7

spring-cloud-alibaba

Spring Cloud Alibaba provides a one-stop solution for application development for the distributed solutions of Alibaba middleware.
Java
27,254
star
8

fastjson

FASTJSON 2.0.x has been released, faster and more secure, recommend you upgrade.
Java
25,603
star
9

flutter-go

A helper app for Flutter developers, with demos and Chinese documentation for 140+ commonly used Flutter components
Dart
23,552
star
10

Sentinel

A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
Java
21,947
star
11

weex

A framework for building Mobile cross-platform UI
C++
18,204
star
12

ice

🚀 ice.js: The Progressive App Framework Based On React
TypeScript
17,771
star
13

ARouter

💪 A routing framework that helps componentize Android apps
Java
14,228
star
14

lowcode-engine

An enterprise-class low-code technology stack with scale-out design
TypeScript
13,869
star
15

hooks

A high-quality & reliable React Hooks library.
TypeScript
13,377
star
16

tengine

A distribution of Nginx with some advanced features
C
12,583
star
17

vlayout

Project vlayout is a powerful LayoutManager extension for RecyclerView. It provides a group of layouts for RecyclerView, making it able to handle complicated cases where grid, list, and other layouts appear in the same RecyclerView.
Java
10,804
star
18

formily

📱🚀 🧩 Cross Device & High Performance Normal Form/Dynamic(JSON Schema) Form/Form Builder -- Support React/React Native/Vue 2/Vue 3
TypeScript
10,716
star
19

COLA

🥤 COLA: Clean Object-oriented & Layered Architecture
Java
9,964
star
20

ali-dbhub

Migrated to a new repository; this version is no longer maintained
8,454
star
21

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
C++
8,307
star
22

atlas

A powerful Android Dynamic Component Framework.
Java
8,120
star
23

rax

🐰 Rax is a progressive framework for building universal application. https://rax.js.org
JavaScript
7,979
star
24

otter

Alibaba's distributed database synchronization system (for data centers located in China and the US)
Java
7,967
star
25

anyproxy

A fully configurable http/https proxy in NodeJS
JavaScript
7,726
star
26

fish-redux

An assembled flutter application framework.
Dart
7,341
star
27

AndFix

AndFix is a library that offers hot-fixes for Android apps.
C++
6,954
star
28

flutter_boost

FlutterBoost is a Flutter plugin which enables hybrid integration of Flutter for your existing native apps with minimum efforts
Dart
6,832
star
29

x-render

🚴‍♀️ Alibaba: an easy-to-use form / table / chart solution for admin and back-office applications
TypeScript
6,765
star
30

transmittable-thread-local

📌 TransmittableThreadLocal (TTL), the missing Java™ std lib(simple & 0-dependency) for framework/middleware, provide an enhanced InheritableThreadLocal that transmits values between threads even using thread pooling components.
Java
6,750
star
31

jvm-sandbox

Real-time non-invasive AOP framework container based on JVM
Java
6,601
star
32

BizCharts

Powerful data visualization library based on G2 and React.
TypeScript
6,066
star
33

freeline

A super fast build tool for Android, an alternative to Instant Run
Java
5,497
star
34

UltraViewPager

UltraViewPager is an extension for ViewPager to provide multiple features in a single ViewPager.
Java
5,004
star
35

jetcache

JetCache is a Java cache framework.
Java
4,774
star
36

AliSQL

AliSQL is a MySQL branch originated from Alibaba Group. Fetch document from Release Notes at bottom.
C++
4,689
star
37

AliOS-Things

A highly scalable operating system for the IoT domain. See the official site for more information: https://www.aliyun.com/product/aliosthings
C
4,540
star
38

dexposed

dexposed enables 'god' mode for a single Android application.
Java
4,483
star
39

QLExpress

QLExpress is a powerful, lightweight, dynamic language for the Java platform aimed at improving developers’ productivity in different business scenes.
Java
4,361
star
40

BeeHive

🐝 BeeHive is a solution for iOS Application module programs, it absorbed the Spring Framework API service concept to avoid coupling between modules.
Objective-C
4,286
star
41

HandyJSON

A handy swift json-object serialization/deserialization library
Swift
4,185
star
42

x-deeplearning

An industrial deep learning framework for high-dimension sparse data
PureBasic
4,185
star
43

butterfly

🦋 Butterfly: a JavaScript/React/Vue2 diagramming library focused on flow layout.
JavaScript
4,168
star
44

Tangram-Android

Tangram is a modular UI solution for building native page dynamically including Tangram for Android, Tangram for iOS and even backend CMS. This project provides the sdk on Android.
Java
4,107
star
45

coobjc

coobjc provides coroutine support for Objective-C and Swift. We added await methods, generators, and an actor model, as in C#, JavaScript, and Kotlin. For convenience, we added coroutine categories for some Foundation and UIKit APIs in the cokit framework, such as NSFileManager, JSON, NSData, and UIImage. We also added tuple support in coobjc.
Objective-C
4,014
star
46

jstorm

Enterprise Stream Process Engine
Java
3,917
star
47

dragonwell8

Alibaba Dragonwell8 JDK
Java
3,826
star
48

LuaViewSDK

A cross-platform framework for building native, dynamic, and swift user interfaces; a powerful, lightweight, and flexible solution for dynamic client-side UI
Objective-C
3,707
star
49

f2etest

F2etest is a complete multi-browser compatibility testing solution for front-end developers, testers, product managers, and similar roles.
JavaScript
3,562
star
50

Alink

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Java
3,479
star
51

GGEditor

A visual graph editor based on G6 and React
TypeScript
3,405
star
52

fastjson2

🚄 FASTJSON2 is a Java JSON library with excellent performance.
Java
3,353
star
53

cobar

a proxy for sharding databases and tables
Java
3,207
star
54

macaca

Automation solution for multiple platforms.
3,159
star
55

designable

🧩 Make everything designable 🧩
TypeScript
3,120
star
56

GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba | 一站式图计算系统
C++
3,103
star
57

lightproxy

💎 Cross platform Web debugging proxy
TypeScript
3,063
star
58

pont

🌉 A data service layer solution
TypeScript
3,016
star
59

euler

A distributed graph deep learning framework.
C++
2,849
star
60

beidou

🌌 Isomorphic framework for server-rendered React apps
JavaScript
2,736
star
61

sentinel-golang

Sentinel Go enables reliability and resiliency for Go microservices
Go
2,684
star
62

pipcook

Machine learning platform for Web developers
TypeScript
2,497
star
63

kiwi

🐤 Kiwi: an end-to-end solution for internationalization and translation
TypeScript
2,489
star
64

yugong

Alibaba's data migration and synchronization tool for moving off Oracle (full and incremental; supported targets: MySQL/DRDS)
Java
2,480
star
65

tsar

Taobao System Activity Reporter
C
2,446
star
66

jvm-sandbox-repeater

A Java server-side recording and playback solution based on JVM-Sandbox
Java
2,395
star
67

ChatUI

The UI design language and React library for Conversational UI
TypeScript
2,383
star
68

TProfiler

TProfiler is a performance profiling tool suitable for long-term use in production environments
Java
2,377
star
69

tidevice

tidevice can be used to communicate with iPhone device
Python
2,310
star
70

higress

Cloud Native API Gateway
Go
2,257
star
71

tair

A distributed key-value storage system developed by Alibaba Group
C++
2,128
star
72

dubbo-spring-boot-starter

Dubbo Spring Boot Starter
Java
2,099
star
73

RedisShake

redis-shake is a tool for synchronizing data between two redis databases. It supports highly flexible synchronization and migration needs.
Go
2,077
star
74

uirecorder

UI Recorder is a multi-platform UI test recorder.
JavaScript
2,052
star
75

LVS

A distribution of Linux Virtual Server with some advanced features. It introduces a new packet forwarding method - FULLNAT other than NAT/Tunneling/DirectRouting, and defense mechanism against synflooding attack - SYNPROXY.
C
1,947
star
76

EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
Python
1,946
star
77

AliceMind

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Python
1,910
star
78

alpha

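Alpha is an Android asynchronous startup framework built on PERT graphs. It is simple, efficient, and full-featured. An app usually has a lot of work to do at startup, and to start faster we try to run that work concurrently. Some of these tasks may depend on others, however, so we also need to guarantee that they run in the correct order. Alpha is designed for exactly this: users define their own tasks, declare each task's dependencies, and add them to a Project. The framework then runs the tasks concurrently and in order, and reports the results.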
HTML
1,873
star
79

GCanvas

A lightweight cross-platform graphics rendering engine. https://alibaba.github.io/GCanvas
C
1,857
star
80

Tangram-iOS

Tangram is a modular UI solution for building native page dynamically, including Tangram for Android, Tangram for iOS and even backend CMS. This project provides the sdk on iOS platform.
Objective-C
1,857
star
81

testable-mock

A different approach to writing mocks that makes unit testing simpler
Java
1,800
star
82

LazyScrollView

An iOS ScrollView to resolve the problem of reusability in views.
Objective-C
1,775
star
83

compileflow

🎨 Core business process engine of the Alibaba Halo platform, well suited to trade scenarios; a high-performance process orchestration engine
Java
1,705
star
84

SREWorks

Cloud Native DataOps & AIOps Platform
Java
1,696
star
85

EasyCV

An all-in-one toolkit for computer vision
Python
1,677
star
86

MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. It supports migration and synchronization needs, and further enables disaster recovery and active-active deployments.
Go
1,648
star
87

xquic

XQUIC Library released by Alibaba is a cross-platform implementation of QUIC and HTTP/3 protocol.
C
1,604
star
88

mdrill

For ad hoc analysis of hundreds of billions of records
Java
1,538
star
89

lowcode-demo

An enterprise-class low-code technology stack with scale-out design
TypeScript
1,536
star
90

ilogtail

Fast and Lightweight Observability Data Collector
C++
1,529
star
91

EasyRec

A framework for large scale recommendation algorithms.
Python
1,488
star
92

clusterdata

cluster data collected from production clusters in Alibaba for cluster management research
Jupyter Notebook
1,477
star
93

havenask

C++
1,463
star
94

async_simple

Simple, light-weight and easy-to-use asynchronous components
C++
1,455
star
95

Virtualview-Android

A light way to build UI in custom XML.
Java
1,454
star
96

kt-connect

A toolkit for Integrating with your kubernetes dev environment more efficiently
Go
1,453
star
97

tb_tddl

1,410
star
98

react-intl-universal

Internationalize React apps. Not only for Component but also for Vanilla JS.
JavaScript
1,316
star
99

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Python
1,292
star
100

graph-learn

An Industrial Graph Neural Network Framework
C++
1,252
star