

SQLova

  • SQLova is a neural semantic parser that translates natural language utterances into SQL queries. The name originates from the name of our department: Search & QLova (Search & Clova).

Authors

Abstract

  • We present a new state-of-the-art semantic parsing model that translates a natural language (NL) utterance into a SQL query.
  • The model is evaluated on WikiSQL, a semantic parsing dataset consisting of 80,654 (NL, SQL) pairs over 24,241 tables from Wikipedia.
  • We achieve 83.6% logical form accuracy and 89.6% execution accuracy on the WikiSQL test set.

The model in a nutshell

Results (Updated at Jan 12, 2019)

Model     | Dev logical form accuracy | Dev execution accuracy | Test logical form accuracy | Test execution accuracy
SQLova    | 81.6 (+5.5)^              | 87.2 (+3.2)^           | 80.7 (+5.3)^               | 86.2 (+2.5)^
SQLova-EG | 84.2 (+8.2)*              | 90.2 (+3.0)*           | 83.6 (+8.2)*               | 89.6 (+2.5)*
  • ^: Compared to current SOTA models that do not use execution guided decoding.
  • *: Compared to current SOTA.
  • The order of where conditions is ignored in measuring logical form accuracy in our model.
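
The note on where-condition order can be illustrated with a minimal sketch. It assumes the WikiSQL query layout, a dict with sel, agg, and conds (a list of [column, operator, value] triples). The function names are hypothetical, lower-casing the values is an assumption of this sketch, and this is not the repository's own evaluation code.

```python
def _normalize_conds(conds):
    """Return where-conditions as a sorted list so that their order does not matter."""
    # Values are compared as lower-cased strings (an assumption of this sketch).
    return sorted((col, op, str(val).lower()) for col, op, val in conds)


def same_logical_form(pred, gold):
    """Compare two WikiSQL-style queries, ignoring where-condition order."""
    return (
        pred["sel"] == gold["sel"]
        and pred["agg"] == gold["agg"]
        and _normalize_conds(pred["conds"]) == _normalize_conds(gold["conds"])
    )


# Made-up example: same conditions, listed in a different order.
gold = {"sel": 3, "agg": 0, "conds": [[0, 0, "Seoul"], [2, 1, 1990]]}
pred = {"sel": 3, "agg": 0, "conds": [[2, 1, 1990], [0, 0, "seoul"]]}
print(same_logical_form(pred, gold))  # True
```

Sorting the conditions before comparing them is one simple way to make the match independent of order; comparing them as sets would serve equally well when no condition is repeated.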

Source code

Requirements

  • Python 3.6 or higher.
  • PyTorch 0.4.0 or higher.
  • CUDA 9.0
  • Python libraries: babel, matplotlib, defusedxml, tqdm
  • Example
    • Install Miniconda
    • conda install pytorch torchvision -c pytorch
    • conda install -c conda-forge records==0.5.2
    • conda install babel
    • conda install matplotlib
    • conda install defusedxml
    • conda install tqdm
  • The code has been tested on Tesla M40 GPU running on Ubuntu 16.04.4 LTS.
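
Before training, it may help to confirm that the requirements listed above are importable. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the module list assumes that PyTorch is imported as torch and that the records package keeps its install name.

```python
import importlib.util
import sys

# Import names assumed from the requirements list above.
REQUIRED_MODULES = ["torch", "records", "babel", "matplotlib", "defusedxml", "tqdm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 6)):
    """Check the running interpreter against the stated minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.6 or higher:", python_is_recent_enough())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules only shows that the packages can be located; it does not check their versions or the CUDA setup.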

Running code

  • Run python3 train.py --seed 1 --bS 16 --accumulate_gradients 2 --bert_type_abb uS --fine_tune --lr 0.001 --lr_bert 0.00001 --max_seq_leng 222 in a terminal.
    • --seed 1: Set the seed of the random number generator. The accuracy changes by a few percent depending on the seed.
    • --bS 16: Set the batch size to 16.
    • --accumulate_gradients 2: Make the effective batch size 16 * 2 = 32.
    • --bert_type_abb uS: Use the Uncased-Base BERT model. Use uL for Uncased-Large BERT.
    • --fine_tune: Train BERT. Without this, only the sequence-to-SQL module is trained.
    • --lr 0.001: Set the learning rate of the sequence-to-SQL module to 0.001.
    • --lr_bert 0.00001: Set the learning rate of the BERT module to 0.00001.
    • --max_seq_leng 222: Set the maximum number of input tokens for BERT.
  • The model should reach ~79% logical form accuracy (lx) on the dev set after ~12 hrs (~10 epochs). Higher accuracy can be obtained by training longer, selecting a different seed, using the Uncased-Large BERT model, or using execution-guided decoding.
  • Add the --EG argument when running train.py to use execution-guided decoding.
  • Whenever a higher logical form accuracy is obtained on the dev set, the following three files are saved in the current folder:
    • model_best.pt: the checkpoint of the sequence-to-SQL module.
    • model_bert_best.pt: the checkpoint of the BERT module.
    • results_dev.jsonl: JSON file for the official evaluation.
  • Shallow-Layer and Decoder-Layer models can be trained similarly (train_shallow_layer.py, train_decoder_layer.py).
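
The effective batch size produced by --bS 16 together with --accumulate_gradients 2 can be checked with a small, framework-free sketch. It uses a toy one-parameter model y = w * x with a mean squared error loss. All names and data are made up for illustration, and this is not how train.py is written.

```python
def mean_gradient(w, batch):
    """Gradient with respect to w of the mean squared error of y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, micro_batches):
    """Average the per-micro-batch gradients, as gradient accumulation does."""
    steps = len(micro_batches)
    total = 0.0
    for batch in micro_batches:
        # Each micro-batch gradient is scaled by 1/steps before being summed,
        # so the result matches one gradient over all examples.
        total += mean_gradient(w, batch) / steps
    return total


# Toy data: 32 (x, y) pairs split into 2 micro-batches of 16 examples each.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]
micro_batches = [data[:16], data[16:]]

w = 0.5
full = mean_gradient(w, data)
accumulated = accumulated_gradient(w, micro_batches)
print(abs(full - accumulated) < 1e-9)  # True
```

The equality holds because the micro-batches have the same size. With unequal sizes, each micro-batch gradient would need to be weighted by its share of the examples.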

Evaluation on WikiSQL DEV set

  • To calculate logical form and execution accuracies on the dev set using the official evaluation script:
    • Download the original WikiSQL dataset.
    • tar xvf data.tar.bz2
    • Move the extracted files under $HOME/data/WikiSQL-1.1/data
    • Set the paths in evaluation_ws.py. This file is the original evaluation.py script with path information added. Alternatively, use the original evaluation.py and set the file paths yourself.
    • Run python3 evaluation_ws.py in a terminal.
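
The difference between the two metrics reported by the evaluation can be sketched with an in-memory SQLite table. The table, the queries, and the use of string equality as a stand-in for logical form matching are all assumptions made for illustration. The official script compares structured queries and uses the WikiSQL databases.

```python
import sqlite3

# A made-up table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Kim", "Seoul", 30), ("Lee", "Busan", 25), ("Park", "Seoul", 12)],
)

# Made-up (gold, predicted) query pairs.
pairs = [
    # Identical queries: correct under both metrics.
    ("SELECT name FROM players WHERE points = 30",
     "SELECT name FROM players WHERE points = 30"),
    # Different queries that return the same rows:
    # an execution match but not a logical form match.
    ("SELECT name FROM players WHERE points = 30",
     "SELECT name FROM players WHERE points > 25"),
    # Different queries with different results: wrong under both metrics.
    ("SELECT COUNT(name) FROM players WHERE team = 'Seoul'",
     "SELECT COUNT(name) FROM players WHERE team = 'Busan'"),
]


def run(sql):
    """Execute a query and return its rows as a sorted list."""
    return sorted(conn.execute(sql).fetchall())


logical_form_hits = sum(gold == pred for gold, pred in pairs)
execution_hits = sum(run(gold) == run(pred) for gold, pred in pairs)

logical_form_accuracy = logical_form_hits / len(pairs)
execution_accuracy = execution_hits / len(pairs)
print(logical_form_accuracy, execution_accuracy)
```

In this toy set one of three predictions matches in logical form and two of three match in execution result, which shows why execution accuracy is never lower than logical form accuracy when identical queries give identical results.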

Evaluation on WikiSQL TEST set

  • Uncomment lines 550-557 of train.py to load test_loader and test_table.
  • For the test(...) function, use test_loader and test_table instead of dev_loader and dev_table.
  • Save the output of test(...) with the save_for_evaluation(...) function.
  • Evaluate with evaluation_ws.py as before.

Load pre-trained SQLova parameters

  • Pretrained SQLova model parameters are uploaded in the release. To start from them, uncomment lines 562-565 and set the paths.

Code base

  • Pretrained BERT models were downloaded from the official repository.
  • The BERT code is from huggingface-pytorch-pretrained-BERT.
  • The sequence-to-SQL model started from the source code of SQLNet and was significantly rewritten while keeping SQLNet's basic column-attention and sequence-to-set structure.

Data

  • The data is annotated using annotate_ws.py, which is based on annotate.py from the WikiSQL repository. The tokens of the natural language query, and the start and end indices of the where-conditions on the natural language tokens, are annotated.
  • Pre-trained BERT parameters can be downloaded from the official BERT repository and converted to a PyTorch file using the following script. You need to install both PyTorch and TensorFlow and change BERT_BASE_DIR to your data directory.
    cd sqlova
    export BERT_BASE_DIR=data/uncased_L-12_H-768_A-12
    python bert/convert_tf_checkpoint_to_pytorch.py \
        --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
        --bert_config_file    $BERT_BASE_DIR/bert_config.json \
        --pytorch_dump_path     $BERT_BASE_DIR/pytorch_model.bin 
  • bert/convert_tf_checkpoint_to_pytorch.py comes from an earlier version of huggingface-pytorch-pretrained-BERT. The current version of pytorch-pretrained-BERT is not compatible with the BERT model used in this repo because of a difference in variable names (in LayerNorm). See this for details.
  • For convenience, the annotated WikiSQL data and the PyTorch-converted pre-trained BERT parameters are available here.
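
The where-condition annotation described at the start of this section, start and end indices on the natural language tokens, can be sketched as follows. This is a simplified illustration: whitespace tokenization, case-insensitive matching, an inclusive end index, and the function name find_value_span are all assumptions, and annotate_ws.py may work differently.

```python
def find_value_span(question_tokens, value_tokens):
    """Return inclusive (start, end) token indices of the value, or None if absent."""
    n, m = len(question_tokens), len(value_tokens)
    if m == 0:
        return None
    lowered_question = [tok.lower() for tok in question_tokens]
    lowered_value = [tok.lower() for tok in value_tokens]
    # Slide a window of the value's length over the question tokens.
    for start in range(n - m + 1):
        if lowered_question[start:start + m] == lowered_value:
            return start, start + m - 1
    return None


# Made-up question and where-condition value.
question = "What position does the player from South Korea play ?".split()
print(find_value_span(question, "south korea".split()))  # (6, 7)
```

Only the first occurrence is returned. A value that appears more than once in the question, or that is tokenized differently from the question text, would need extra handling.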

License

Copyright 2019-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
