TypeSQL
Source code accompanying our NAACL 2018 paper: TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation
03/20/2022: We open-sourced a simple but SOTA model (just T5) for the task! Please check out our code in the UnifiedSKG repo!!
Environment Setup
- The code uses Python 2.7 and PyTorch 0.2.0 (GPU).
- Install Python dependencies:
pip install -r requirements.txt
- Install PyTorch 0.2.0:
conda install pytorch=0.2.0 cuda91 -c pytorch
Replace cuda91 with whichever CUDA version you have.
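Before training, it can help to confirm that the environment matches the versions above. The sketch below is not part of the repo (the name `environment_report` is hypothetical): it reports the interpreter version and, if PyTorch is importable, `torch.__version__` and whether `torch.cuda.is_available()` sees a GPU. It runs under both Python 2.7 and Python 3.

```python
import platform
import sys


def environment_report():
    """Collect interpreter and PyTorch details relevant to this README."""
    report = {
        "python": platform.python_version(),
        "python_2_7": sys.version_info[:2] == (2, 7),
        "torch": None,
        "cuda_available": None,
    }
    try:
        import torch  # only present once PyTorch is installed
    except ImportError:
        return report  # leave the torch fields as None
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```

A `torch` value starting with `0.2` and `cuda_available: True` suggest the setup matches what the training scripts expect.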
Download Data and Embeddings
- Download the zipped data file from Google Drive and put it in the root directory.
- Download the pretrained GloVe embeddings and the paraphrase embedding para-nmt-50m/data/paragram_sl999_czeng.txt. Put the unzipped glove and para-nmt-50m folders in the root directory.
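Both the GloVe vectors and the paraphrase file are plain-text embedding files. The sketch below is a hypothetical helper, not the repo's own loader: it assumes the usual GloVe text layout (a word followed by `dim` space-separated floats on each line) and checks that the paths named above are in place under the root directory. The default `dim=300` and the `vocab` filter are assumptions; the exact GloVe file name is not given here, so only the `glove` folder is checked.

```python
import io
import os

# Paths named in this README, relative to the repo root.
EXPECTED_PATHS = [
    "glove",
    os.path.join("para-nmt-50m", "data", "paragram_sl999_czeng.txt"),
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(root, p))]


def load_text_embeddings(path, dim=300, vocab=None):
    """Load a GloVe-style text file into a dict of word -> list of floats.

    The last `dim` fields of a line are the vector; everything before them is
    the word, which tolerates the rare tokens that contain spaces. Lines that
    are too short are skipped. If `vocab` is given, other words are dropped.
    """
    embeddings = {}
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip().split(" ")
            if len(fields) < dim + 1:
                continue  # header or malformed line
            word = " ".join(fields[:-dim])
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = [float(x) for x in fields[-dim:]]
    return embeddings
```

Passing the training vocabulary as `vocab` keeps memory use down, since these files contain far more words than the questions and tables use.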
Train Models
- To use knowledge graph types:
mkdir saved_model_kg
python train.py --sd saved_model_kg
- To use DB content types:
mkdir saved_model_con
python train.py --sd saved_model_con --db_content 1
Test Models
- Test Model with knowledge graph types:
python test.py --sd saved_model_kg
- Test Model with DB content types:
python test.py --sd saved_model_con --db_content 1
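Each test command repeats the flags of its training command. A minimal Python driver, assuming it runs from the repo root with the Python 2.7 environment active, can derive both commands from one setting so `--sd` and `--db_content` stay in sync. `SETTINGS`, `build_command`, and `train_then_test` are hypothetical names; only the script names, directory names, and flags come from this README.

```python
import os
import subprocess

# The two settings documented in this README: knowledge graph types and
# DB content types. The dictionary itself is a hypothetical helper.
SETTINGS = {
    "kg": {"save_dir": "saved_model_kg", "extra": []},
    "con": {"save_dir": "saved_model_con", "extra": ["--db_content", "1"]},
}


def build_command(script, setting, python="python"):
    """Return the argv list for train.py or test.py under one setting."""
    cfg = SETTINGS[setting]
    return [python, script, "--sd", cfg["save_dir"]] + cfg["extra"]


def train_then_test(setting, dry_run=True):
    """Train and then test one setting; with dry_run, only return the commands."""
    commands = [build_command("train.py", setting), build_command("test.py", setting)]
    if dry_run:
        return commands
    save_dir = SETTINGS[setting]["save_dir"]
    if not os.path.isdir(save_dir):
        os.makedirs(save_dir)  # same effect as `mkdir saved_model_...`
    for cmd in commands:
        subprocess.check_call(cmd)  # raises, and so stops, at the first failing step
    return commands
```

For example, `train_then_test("con", dry_run=False)` creates `saved_model_con` if needed, then runs training followed by testing with `--db_content 1` on both.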
Get Data Types
- Get a Google Knowledge Graph Search API Key by following the link
- Search the knowledge graph to get entities:
python get_kg_entities.py [Google freebase API Key] [input json file] [output json file]
- Use the detected knowledge graph entities and DB content to group questions and create type attributes in the data files:
python data_process_test.py --tok [output json file generated at step 2] --table TABLE_FILE --out OUTPUT_FILE [--data_dir DATA_DIRECTORY] [--out_dir OUTPUT_DIRECTORY]
python data_process_train_dev.py --tok [output json file generated at step 2] --table TABLE_FILE --out OUTPUT_FILE [--data_dir DATA_DIRECTORY] [--out_dir OUTPUT_DIRECTORY]
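The type-annotation steps can be pictured with a toy example. The sketch below is not `get_kg_entities.py` or `data_process_*.py`, and it is written for Python 3 (the repo itself targets 2.7). It builds a request for the public Knowledge Graph Search endpoint (`https://kgsearch.googleapis.com/v1/entities:search`, with `query`, `key`, and `limit` parameters), reads entity types from the response's `itemListElement` list, and then labels single question tokens by column-name match, KG entity type, or number shape. The label names, their precedence, and all function names are assumptions for illustration; n-gram matching and DB-content matching are left out.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the Google Knowledge Graph Search API.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

INT_RE = re.compile(r"^-?\d+$")
FLOAT_RE = re.compile(r"^-?\d+\.\d+$")
YEAR_RE = re.compile(r"^(1[89]|20)\d\d$")  # four-digit years 1800-2099


def build_kg_url(query, api_key, limit=1):
    """Build a Knowledge Graph Search request URL for one phrase."""
    return KG_ENDPOINT + "?" + urlencode({"query": query, "key": api_key, "limit": limit})


def extract_types(response):
    """Map lowercased entity name -> its @type list, minus the generic 'Thing'."""
    found = {}
    for item in response.get("itemListElement", []):
        entity = item.get("result", {})
        name = entity.get("name")
        if name:
            found[name.lower()] = [t for t in entity.get("@type", []) if t != "Thing"]
    return found


def search_entities(query, api_key, limit=1):
    """Network call; needs a valid API key."""
    with urlopen(build_kg_url(query, api_key, limit)) as resp:
        return extract_types(json.loads(resp.read().decode("utf-8")))


def tag_question(tokens, columns, kg_types=None):
    """Assign one coarse, hypothetical type label per question token.

    Precedence: column-name word, then KG entity type, then number shape.
    """
    kg_types = kg_types or {}
    column_words = {w.lower() for col in columns for w in col.split()}
    tags = []
    for tok in tokens:
        low = tok.lower()
        if low in column_words:
            tags.append("COLUMN")
        elif kg_types.get(low):
            tags.append(kg_types[low][0].upper())  # first KG type, e.g. CITY
        elif YEAR_RE.match(low):
            tags.append("YEAR")
        elif INT_RE.match(low):
            tags.append("INTEGER")
        elif FLOAT_RE.match(low):
            tags.append("FLOAT")
        else:
            tags.append("NONE")
    return tags
```

Checking column names before KG types means a word like "player" is read as a column reference even if the knowledge graph also knows it as an entity; that ordering is a design choice of this sketch.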
Acknowledgement
The implementation is based on SQLNet. Please cite it too if you use this code.