  • Language
    Java
  • License
    MIT License
  • Created almost 6 years ago
  • Updated about 2 years ago


Repository Details

A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources




Quicksql is a SQL query product that can query a single datastore or correlate data across multiple datastores. It supports relational databases, non-relational databases, and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a single SQL query in Quicksql can join or union data from multiple datastores. For example, if part of your data is stored in Elasticsearch and the rest in Hive, you can analyze both with one unified SQL query. Most importantly, QSQL does not depend on any intermediate compute engine: users only need to focus on the data and a unified SQL grammar to complete their statistics and analysis.


Architecture

An architecture diagram gives a quick overview of how Quicksql works.

[Figure: Quicksql architecture diagram]

QSQL architecture consists of three layers:

  • Parsing Layer: parses, validates, and optimizes SQL statements, splits mixed (multi-source) SQL, and finally generates a query plan;

  • Computing Layer: routes the query plan to a specific execution plan, which is then translated into executable code for the given storage or engine (such as an Elasticsearch JSON query or Hive HQL);

  • Storage Layer: prepares, extracts, and stores data.
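To make the hand-off between the Parsing and Computing layers more concrete, here is a minimal Java sketch. It assumes the parser has already extracted the table names from a query and that a catalog maps each table to the data source that owns it. The class LayerSketch and its methods splitBySource and route are hypothetical names for illustration only, not Quicksql's real API; the real Parsing Layer works on a full relational plan rather than a list of table names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Parsing -> Computing layer hand-off; not Quicksql's actual classes.
final class LayerSketch {

    // Parsing layer (simplified): group the tables referenced by a query by the data source
    // that owns them. A real parser would also validate and optimize a full relational plan.
    static Map<String, List<String>> splitBySource(List<String> tables, Map<String, String> catalog) {
        Map<String, List<String>> fragments = new LinkedHashMap<>(); // keeps first-seen source order
        for (String table : tables) {
            String source = catalog.get(table);
            if (source == null) {
                throw new IllegalArgumentException("Unknown table: " + table);
            }
            fragments.computeIfAbsent(source, s -> new ArrayList<>()).add(table);
        }
        return fragments;
    }

    // Computing layer (simplified): each fragment becomes one sub-query for its own source;
    // when there is more than one fragment, the partial results still have to be combined.
    static List<String> route(Map<String, List<String>> fragments) {
        List<String> steps = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : fragments.entrySet()) {
            steps.add(e.getKey() + " <- " + String.join(", ", e.getValue()));
        }
        if (fragments.size() > 1) {
            steps.add("combine results");
        }
        return steps;
    }
}
```

For the cross-source join example later in this README, the tables es_raw.profile, hive_db.employee and hive_db.action would be split into one Elasticsearch fragment and one Hive fragment, followed by a step that combines the two results.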

Basic Features

In the vast majority of cases, we want to use a single language for data analysis without having to worry about anything unrelated to the analysis itself. Quicksql was built for exactly this.

The goal of Quicksql is to provide three functions:

1. Unify all structured data queries under a single SQL grammar

  • Only Use SQL

In Quicksql, you can query Elasticsearch like this:

SELECT state, pop FROM geo_mapping WHERE state = 'CA' ORDER BY state

Or even an aggregation query:

SELECT approx_count_distinct(city), state FROM geo_mapping GROUP BY state LIMIT 10

No more struggling with mismatched brackets in JSON queries ;)
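For comparison, the sketch below builds by hand the Elasticsearch query DSL that roughly corresponds to the first statement above. EsDslSketch.toDsl is a hypothetical helper that supports only one equality filter and one ascending sort key, and it assumes state is mapped as a keyword field so that a term query matches 'CA' exactly. It does not show how Quicksql itself generates Elasticsearch queries.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: hand-builds the Elasticsearch query DSL for a statement shaped like
//   SELECT <columns> FROM <index> WHERE <field> = '<value>' ORDER BY <sortField>
// Only one equality filter and one ascending sort key are supported.
final class EsDslSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toDsl(List<String> columns, String filterField, String filterValue, String sortField) {
        String source = columns.stream().map(EsDslSketch::quote).collect(Collectors.joining(","));
        return "{"
                + "\"_source\":[" + source + "],"                                             // projected columns
                + "\"query\":{\"term\":{" + quote(filterField) + ":" + quote(filterValue) + "}}," // exact-match filter
                + "\"sort\":[{" + quote(sortField) + ":\"asc\"}]"                             // ORDER BY ... ASC
                + "}";
    }
}
```

With these assumptions, EsDslSketch.toDsl(List.of("state", "pop"), "state", "CA", "state") returns {"_source":["state","pop"],"query":{"term":{"state":"CA"}},"sort":[{"state":"asc"}]}.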

  • Eliminate Dialects

Previously, a statement with the same semantics had to be rewritten in the dialect of each engine, for example:

SELECT * FROM geo_mapping                       -- MySQL Dialect
LIMIT 10 OFFSET 10                              
SELECT * FROM geo_mapping                       -- Oracle Dialect
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY          

In Quicksql, relational databases no longer need separate dialects. You can query any engine with Quicksql's grammar, like this:

SELECT * FROM geo_mapping LIMIT 10 OFFSET 10    -- Run Anywhere
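The per-dialect rewriting that Quicksql removes can be sketched in a few lines of Java. The PaginationDialects class and its Dialect enum are hypothetical and cover only LIMIT/OFFSET pagination for the two dialects shown above; the Oracle form assumes Oracle Database 12c or later, where the OFFSET/FETCH row-limiting clause is available.

```java
// Hypothetical illustration of the per-dialect pagination rewriting that a unified grammar hides.
// Not part of Quicksql; only MySQL and Oracle (12c+) are covered.
final class PaginationDialects {

    enum Dialect { MYSQL, ORACLE }

    static String paginate(String baseQuery, Dialect dialect, int limit, int offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        switch (dialect) {
            case MYSQL:
                return baseQuery + " LIMIT " + limit + " OFFSET " + offset;
            case ORACLE:
                // Row-limiting clause introduced in Oracle Database 12c.
                return baseQuery + " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```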

2. Break down the isolation between different data sources

Suppose you want to join tables that live in different engines, or in different clusters. This is usually difficult.

However, in Quicksql, you can query like this:

SELECT * FROM
    (SELECT * FROM es_raw.profile AS profile        -- index.type on Elasticsearch
        WHERE note IS NOT NULL) AS es_profile
INNER JOIN
    (SELECT * FROM hive_db.employee AS emp          -- database.table on Hive
    INNER JOIN hive_db.action AS act                -- database.table on Hive
    ON emp.name = act.name) AS tmp
ON es_profile.prefer = tmp.prefer
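Once each side of such a query has been fetched from its own source, the rows still have to be joined somewhere. The Java sketch below shows one common technique, an in-memory hash join on a single key column (here prefer). It assumes both inputs fit in memory and represents rows as plain maps; FederatedJoinSketch is a hypothetical name, and this is not a description of how Quicksql actually executes joins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory hash join of rows already fetched from two different
// sources (e.g. an Elasticsearch index and a Hive table) on one shared column.
final class FederatedJoinSketch {

    static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left,
                                               List<Map<String, Object>> right,
                                               String key) {
        // Build phase: index the right-hand rows by their join key.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            Object k = row.get(key);
            if (k != null) { // SQL equality never matches NULL keys
                index.computeIfAbsent(k, x -> new ArrayList<>()).add(row);
            }
        }
        // Probe phase: for each left row, emit one merged row per matching right row.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object k = l.get(key);
            if (k == null) {
                continue;
            }
            for (Map<String, Object> r : index.getOrDefault(k, List.of())) {
                Map<String, Object> merged = new LinkedHashMap<>(l);
                merged.putAll(r); // on column-name clashes, the right-hand value wins
                out.add(merged);
            }
        }
        return out;
    }
}
```

Building the hash table on one input and probing it with the other keeps the join roughly linear in the total number of rows, instead of comparing every pair of rows.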

3. Choose the most appropriate way to execute the query

A query that spans multiple engines can be executed in many different ways. Quicksql aims to combine the strengths of each engine to find the most appropriate one.
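One simple way to picture this choice is a rule that looks at where the inputs of a join live and how large they are. The Java sketch below is purely illustrative: the StrategySketch class, its three strategies, the row estimates and the inMemoryLimit threshold are all hypothetical, and Quicksql's actual planner may weigh entirely different factors.

```java
// Hypothetical cost rule for a two-input join; not Quicksql's real planner.
// Row counts are caller-supplied estimates and inMemoryLimit is an arbitrary threshold.
final class StrategySketch {

    enum Strategy { PUSH_DOWN, JOIN_IN_MIDDLEWARE, SHIP_SMALLER_SIDE }

    static Strategy choose(String leftSource, long leftRows,
                           String rightSource, long rightRows,
                           long inMemoryLimit) {
        if (leftSource.equals(rightSource)) {
            // Both tables live in one engine: let that engine run the whole join.
            return Strategy.PUSH_DOWN;
        }
        if (leftRows + rightRows <= inMemoryLimit) {
            // Small enough to fetch both sides and join them locally.
            return Strategy.JOIN_IN_MIDDLEWARE;
        }
        // Otherwise move only the smaller input to the engine holding the larger one.
        return Strategy.SHIP_SMALLER_SIDE;
    }
}
```

Shipping the smaller side assumes the engine holding the larger table can accept the smaller one, for example as a temporary table; where that is not possible, a real planner would need another fallback.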

Getting Started

For instructions on building Quicksql from source, see Getting Started.

Reporting Issues

If you find any bugs or have any better suggestions, please file a GitHub issue.

If the issue is approved, a committer will add a [QSQL-ID] label before the issue description so that the issue can be matched to the corresponding commit. For example:

[QSQL-1002]: Views generated after splitting logical plan are redundant.
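If you want to check a title or commit message against this convention locally, a small regular expression is enough. The IssueLabel helper below is hypothetical and not part of the Quicksql build; it assumes the exact "[QSQL-<number>]: <description>" shape shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Quicksql) that extracts the issue number from a
// message written as "[QSQL-1002]: description".
final class IssueLabel {

    private static final Pattern LABEL = Pattern.compile("^\\[QSQL-(\\d+)\\]:\\s*(.+)$");

    static Optional<Integer> issueNumber(String message) {
        Matcher m = LABEL.matcher(message);
        // matches() requires the whole message to follow the pattern.
        return m.matches() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }
}
```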

Contributing

We welcome contributions.

If you are interested in Quicksql, you can download the source code from GitHub and build it by running the following Maven command in the project root directory:

mvn -DskipTests clean package

If you are planning to make a large contribution, talk to us first! It helps to agree on the general approach. Open an issue on GitHub for your proposed feature.

Fork the GitHub repository, and create a branch for your feature.

Develop your feature and test cases, and make sure that mvn install succeeds. (Run extra tests if your change warrants it.)

Commit your change to your branch.

If your change has multiple commits, use git rebase -i master to squash them into a single commit and to bring your code up to date with the latest changes on the main line.

Then push your commit(s) to GitHub and create a pull request from your branch to the QSQL master branch. Update the corresponding issue to reference your pull request, and a committer will review your changes.

The pull request may need to be updated (after its submission) for two main reasons:

  1. You identified a problem after submitting the pull request;
  2. The reviewer requested further changes.

In order to update the pull request, you need to commit the changes in your branch and then push the commit(s) to GitHub. You are encouraged to use regular (non-rebased) commits on top of previously existing ones.

Join us

Slack Github QQ
