• Stars
    star
    210
  • Rank 187,585 (Top 4 %)
  • Language
    Scala
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated almost 2 years ago

Repository Details

Unified SQL Analytics Engine Based on SparkSQL

XSQL is a multi-datasource query engine designed to be easy to use and stable to run. First, XSQL lets you read data from NoSQL databases with standard SQL, so big data engineers can concentrate on the data rather than on the API of each particular data source. Second, XSQL puts effort into optimizing the execution plan of each SQL statement and monitoring its running status, which keeps users' jobs running healthier.

https://qihoo360.github.io/XSQL/

Features

  • XSQL currently supports eight built-in data sources: Hive, MySQL, Elasticsearch, MongoDB, Kafka, HBase, Redis, and Druid.
  • XSQL uses a three-layer metadata architecture, datasource-database-table, to organize data. This provides a unified view of many data sources, so business analysis that combines offline and online data is no longer difficult.
  • The main idea of XSQL is "SQL Everything". SQL decouples programs from the API of any concrete data source, so DBAs can upgrade a data source without having to consider how to migrate old tasks. More importantly, data analysts prefer SQL to special-purpose APIs.
  • XSQL uses YARN cluster resources only when necessary. This is useful in scenarios such as using spark-xsql as a substitute for an RDBMS client. We call this Pushdown Query: it lets XSQL respond to DDL and simple queries with millisecond-level latency while saving as many cluster resources as possible.
  • XSQL uses a different solution from routing, so it parses each SQL statement only once.
  • XSQL caches metadata at runtime but does not manage metadata itself, because metadata synchronization may cause unnecessary trouble. This makes XSQL easy to deploy and operate.
  • XSQL provides a whitelist/blacklist properties file to cover special usage scenarios in which metadata must be carefully authorized.
  • XSQL currently runs on Spark 2.3 and Spark 2.4. The XSQL jars are placed in an isolated directory, so XSQL does not affect your existing Spark programs unless you use our tool bin/spark-xsql. You can therefore try XSQL on your existing Spark distribution, and everything else will keep working as before.
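
To make the datasource-database-table layering more concrete, here is a minimal Scala sketch of how a dotted table name could be split into the three layers, and how a query that touches only one data source could be flagged as a pushdown candidate. This is an illustration only, not XSQL's actual implementation: `QualifiedNames`, `TableRef`, `parse`, and `singleDatasource` are hypothetical names, and the "single data source" rule is an assumption made for the example.

```scala
// Illustrative model only; this is NOT XSQL's real resolver or planner.
object QualifiedNames {
  // A table reference in the three-layer datasource.database.table scheme.
  // Missing layers are None and would fall back to session defaults.
  final case class TableRef(
      datasource: Option[String],
      database: Option[String],
      table: String)

  // Split a dotted name into at most three layers, right-aligned on the table.
  // The -1 limit keeps trailing empty parts so "a.b." is rejected.
  def parse(name: String): Either[String, TableRef] =
    name.split("\\.", -1).toList.map(_.trim) match {
      case parts if parts.exists(_.isEmpty) =>
        Left(s"empty name part in '$name'")
      case t :: Nil =>
        Right(TableRef(None, None, t))
      case db :: t :: Nil =>
        Right(TableRef(None, Some(db), t))
      case ds :: db :: t :: Nil =>
        Right(TableRef(Some(ds), Some(db), t))
      case _ =>
        Left(s"too many name parts in '$name'")
    }

  // Assumed rule for this sketch: a query is a pushdown candidate only when
  // every referenced table resolves to the same data source.
  def singleDatasource(refs: Seq[TableRef], defaultDs: String): Boolean =
    refs.nonEmpty &&
      refs.map(_.datasource.getOrElse(defaultDs)).distinct.size == 1
}
```

For instance, `QualifiedNames.parse("mysql_ds.sales.orders")` yields a reference with all three layers filled in, while `QualifiedNames.parse("orders")` leaves the data source and database to the defaults. The names `mysql_ds`, `sales`, and `orders` are made up for the example.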

Quick Start

Build Environment Requirements

  • JDK 1.8+

Build XSQL:

  1. To get started with XSQL, you can build it yourself. First, clone the repository:

    git clone https://github.com/Qihoo360/XSQL
    

    You can also download a pre-built XSQL package from the Release Pages.

  2. To create an XSQL distribution from the source code, similar to the release package on the Release Pages, use build-plugin.sh in the root directory of the project. For example:

    XSQL/build-plugin.sh
    

    This produces a .tgz file named like xsql-[project.version]-plugin-spark-[spark.version].tgz in the root directory of the project.

    To create an XSQL distribution that resembles a native Spark distribution, use build.sh in the root directory of the project. For example:

    XSQL/build.sh
    

    This produces a .tgz file named like xsql-[project.version]-bin-spark-[spark.version].tgz in the root directory of the project.
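
The two build scripts differ only in the middle part of the archive name (plugin or bin). As a small sketch of that naming scheme, the following Scala snippet parses an archive name into its parts, which could be handy in a deployment script that has to decide where to extract the archive. `XsqlArtifact`, `Artifact`, and `parse` are hypothetical helper names and are not part of XSQL.

```scala
// Hypothetical helper, not shipped with XSQL.
object XsqlArtifact {
  // kind is either "plugin" or "bin", matching the two build scripts.
  final case class Artifact(
      projectVersion: String,
      kind: String,
      sparkVersion: String)

  // Mirrors xsql-[project.version]-[plugin|bin]-spark-[spark.version].tgz
  private val Pattern = """xsql-(.+)-(plugin|bin)-spark-(.+)\.tgz""".r

  // Returns None when the file name does not follow the naming scheme.
  def parse(fileName: String): Option[Artifact] = fileName match {
    case Pattern(projectVersion, kind, sparkVersion) =>
      Some(Artifact(projectVersion, kind, sparkVersion))
    case _ =>
      None
  }
}
```

With the release name used later in this README, `XsqlArtifact.parse("xsql-0.6.0-plugin-spark-2.4.3.tgz")` identifies a plugin build of project version 0.6.0 for Spark 2.4.3.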

Runtime Environment Requirements

  • JDK 1.8+

  • Hadoop 2.7.2+

  • Spark 2.4.x

Installing XSQL:

  1. Build the XSQL tarball xsql-[project.version]-[plugin|bin]-spark-[spark.version].tgz by following the steps above, or download it from the Release Pages.

  2. If Spark is already installed on your machine, use the plugin version, which is 30 MB+ in size. Otherwise, install the bin version, which is about 300 MB+ in size and therefore far larger than the plugin version.

    Whichever version you choose, extract it first. The bin version goes into your software directory. For example:

    tar xvf xsql-0.6.0-bin-spark-2.4.3.tgz -C "/path/of/software"

    The plugin version is extracted into a different destination, your Spark home directory:

    tar xvf xsql-0.6.0-plugin-spark-2.4.3.tgz -C "/path/of/sparkhome"

  3. XSQL needs connection information (such as the URL and credentials) for each data source. You can configure this in xsql.conf under the conf directory. We provide a template file to help you configure XSQL. For example:

    mv conf/xsql.conf.template conf/xsql.conf
    

    Here is an example MySQL configuration:

    spark.xsql.datasources                     default
    spark.xsql.default.database                real_database
    spark.xsql.datasource.default.type         mysql
    spark.xsql.datasource.default.url          jdbc:mysql://127.0.0.1:2336
    spark.xsql.datasource.default.user         real_username
    spark.xsql.datasource.default.password     real_password
    spark.xsql.datasource.default.version      5.6.19
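
The configuration above follows a `spark.xsql.datasource.<name>.<property>` key pattern, where `<name>` is one of the entries listed in spark.xsql.datasources. The Scala sketch below shows one way to read such a file and group the settings by data source, for example to sanity-check a config before starting XSQL. It is a simplified illustration, not XSQL's own loader: `XsqlConf`, `parse`, and `byDatasource` are hypothetical names, and treating lines that start with `#` as comments is an assumption borrowed from Spark's properties-file style.

```scala
// Hypothetical helper for inspecting xsql.conf; not XSQL's real config loader.
object XsqlConf {
  private val Prefix = "spark.xsql.datasource."

  // Parse "key <whitespace> value" lines, skipping blank lines and
  // (by assumption) lines starting with '#'.
  def parse(text: String): Map[String, String] =
    text.split("\n").toSeq
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .flatMap { line =>
        line.split("\\s+", 2) match {
          case Array(key, value) => Some(key -> value.trim)
          case _                 => None // key without a value: ignored
        }
      }
      .toMap

  // Group spark.xsql.datasource.<name>.<property> entries by <name>.
  def byDatasource(conf: Map[String, String]): Map[String, Map[String, String]] =
    conf.toSeq
      .flatMap { case (key, value) =>
        if (key.startsWith(Prefix)) {
          val rest = key.stripPrefix(Prefix)
          rest.indexOf('.') match {
            case i if i > 0 =>
              Some((rest.substring(0, i), rest.substring(i + 1), value))
            case _ => None
          }
        } else None
      }
      .groupBy(_._1)
      .map { case (name, entries) =>
        name -> entries.map(e => e._2 -> e._3).toMap
      }
}
```

Applied to the MySQL example, `XsqlConf.byDatasource(XsqlConf.parse(text))("default")` would contain the `type`, `url`, `user`, `password`, and `version` entries, while top-level keys such as spark.xsql.datasources stay in the flat map only.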
    

Running XSQL:

  1. If you are familiar with spark-sql, we provide an improved shell tool, bin/spark-xsql. Start XSQL in CLI mode with the following command:

    $SPARK_HOME/bin/spark-xsql

    Enter any SQL or HiveQL statement at the prompt:

    spark-xsql> show datasources;
    
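  2. If you are familiar with the Dataset API, our Scala API is a good place to start. For example:

    import org.apache.spark.sql.SparkSession

    // enableXSQLSupport() comes from the XSQL build of Spark
    val spark = SparkSession
      .builder()
      .enableXSQLSupport()
      .getOrCreate()
    // Print the data sources configured in xsql.conf
    spark.sql("show datasources").show()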

FAQ

Connect to more datasource

Advanced Configuration

XSQL Specific Query Language

Contact Us

Mailing lists: for developers, [email protected]; for users, [email protected]. Add yourself by emailing the list.

QQ group for Chinese users: 838910008

More Repositories

1

RePlugin

RePlugin - A flexible, stable, easy-to-use Android Plug-in Framework
Java
7,261
star
2

Atlas

A high-performance and stable proxy for MySQL, it is developed by Qihoo's DBA and infrastructure team
C
4,650
star
3

wayne

Kubernetes multi-cluster management and publishing platform
TypeScript
3,706
star
4

evpp

A modern C++ network library for developing high performance network services in TCP/UDP/HTTP protocols.
C++
3,564
star
5

ArgusAPM

Powerful, comprehensive (Android) application performance management platform. 360's online mobile performance monitoring platform
Java
2,673
star
6

safe-rules

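A detailed C/C++ coding standard guide, compiled by the 360 Quality Engineering Department, applicable to desktop, server-side, and embedded software systems.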
2,363
star
7

Quicksql

A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Java
2,057
star
8

poseidon

A search engine which can hold 100 trillion lines of log data.
Go
1,966
star
9

QConf

Qihoo Distributed Configuration Management System
C++
1,865
star
10

hbox

AI on Hadoop
Java
1,727
star
11

phptrace

A tracing and troubleshooting tool for PHP scripts.
C
1,677
star
12

mysql-sniffer

mysql-sniffer is a network traffic analyzer tool for mysql, it is developed by Qihoo DBA and infrastructure team
C
845
star
13

huststore

High-performance Distributed Storage
C
823
star
14

doraemon

Doraemon is a Prometheus based monitor system
JavaScript
655
star
15

logkafka

Collect logs and send lines to Apache Kafka
C++
500
star
16

zeppelin

A Scalable, High-Performance Distributed Key-Value Platform
C++
399
star
17

tensornet

C++
316
star
18

qbusbridge

The Apache Kafka Client SDK
C++
292
star
19

360zhinao

360zhinao
Python
274
star
20

WatchAD2.0

WatchAD2.0 is a log analysis and monitoring system for domain threats
CSS
206
star
21

zendAPI

The C++ wrapper of zend engine
C++
183
star
22

mongosync

mongosync is simple && useful tool to sync data between mongo replicaSet, it is developed by Qihoo's DBA and infrastructure team
C++
154
star
23

artdumper

A tool for dumping dex files from oat files
C++
138
star
24

influx-proxy

influxdb HA
Go
128
star
25

kmemcache

linux kernel memcache server
C
126
star
26

XLearning-XDML

extremely distributed machine learning
Scala
123
star
27

simcc

A simple C++ common base library used in Qihoo 360
C++
116
star
28

nemo

A library that provides multiple data structures, such as map, hash, list, and set. These data structures are built on rocksdb as the storage layer for Pika https://github.com/OpenAtomFoundation/pika .
C++
115
star
29

ngx_http_subrange_module

Split one big HTTP/Range request into multiple subrange requests
C
107
star
30

blackwidow

A library implements REDIS commands(Strings, Hashes, Lists, Sorted Sets, Sets, Keys, HyperLogLog) based on rocksdb, as the storage layer for Pika https://github.com/OpenAtomFoundation/pika .
C++
99
star
31

QNAT

C
88
star
32

Mario

A Library that make the write from synchronous to asynchronous.
C++
78
star
33

Luwak

Extracts MITRE ATT&CK TTP information from unstructured threat reports using pre-trained language models
Python
68
star
34

mpic

A C++ embedded library of multiple processes framework developed and used at Qihoo360.
C++
50
star
35

nemo-rocksdb

Add TTL feature on rocksdb, and compatible with rocksdb
C++
44
star
36

dgl-operator

The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes
Go
44
star
37

ironwill

Useful iOS components for your project. Robust and useful Objective-C code that can be used directly in your iOS app.
Objective-C
37
star
38

elog

A erlang log nif
C++
28
star
39

rust-jsonnet

rust-jsonnet - The Google Jsonnet( operation data template language) for rust
Rust
24
star
40

zeppelin-gateway

Object Gateway Provide Applications with a RESTful Gateway to zeppelin
C++
23
star
41

zeppelin-client

Client Library for zeppelin
C++
21
star
42

luajit-jsonnet

The Google Jsonnet( operation data template language) for Luajit
C++
16
star
43

HTTPSLayer

PHP
16
star
44

CReSS

Cross-model Retrieval between 13C NMR Spectrum and Structure
Python
15
star
45

wayne-backend-plugins

Wayne backend plugins
Go
13
star
46

gpstall

Stall Postgres' insert command
C++
8
star
47

cloud-website

360 cloud official website
PHP
8
star
48

wayne-frontend-plugins

Wayne UI Plugins
TypeScript
7
star
49

SEEChat

SEEChat (一见), a multimodal dialogue model
Python
5
star
50

wiki

wiki for qihoo infrastructure team
2
star
51

se-office

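The se-office extension provides a full-featured office productivity suite based on open standards, with browser-based preview and editing of Office documents.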
JavaScript
1
star