  • Stars: 709
  • Rank: 63,401 (Top 2%)
  • Language: Python
  • License: GNU Affero Genera...
  • Created over 6 years ago
  • Updated over 6 years ago

Repository Details

Dataset for couplets: a database of 700,000 couplets.

This project fetches couplets from 冯重朴_梨味斋散叶's blog (冯重朴_梨味斋散叶_的博客).

This dataset contains more than 700,000 couplets.

Run the spider:

scrapy runspider sina_spider.py

The spider stores the fetched data in ./output/.

Download the data

An already fetched and cleaned dataset is available that can be used directly with the seq2seq model. You can download it here.

The downloaded data contains 5 files:

  1. train/in.txt: The input side of the couplets. Each line is one input; words are separated by spaces.
  2. train/out.txt: The output side of the couplets. Each line is the output for the corresponding line in in.txt; words are separated by spaces.
  3. test/in.txt: Same format as train/in.txt, but with less data.
  4. test/out.txt: Same format as train/out.txt, but with less data.
  5. vocabs: The vocabulary file. <s> and <\s> are added as the first entries; they are used when training the seq2seq model.
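The file layout above can be read with a few lines of Python. This is a minimal sketch, not code from the seq2seq-couplet project: it assumes UTF-8 text, one vocab entry per line in vocabs, and that <s> and <\s> are the first two entries in that order. The helper names (read_tokens, load_pairs, load_vocab, encode) are hypothetical.

```python
import os
from typing import Dict, List, Tuple

# Special tokens that the README says are placed first in the vocabs file.
# The spelling "<\s>" is copied from the README as-is.
START = "<s>"
END = r"<\s>"


def read_tokens(path: str) -> List[List[str]]:
    """Read a file with one sequence per line and space-separated words."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f.read().splitlines()]


def load_pairs(split_dir: str) -> List[Tuple[List[str], List[str]]]:
    """Pair line i of in.txt with line i of out.txt in one split directory."""
    inputs = read_tokens(os.path.join(split_dir, "in.txt"))
    outputs = read_tokens(os.path.join(split_dir, "out.txt"))
    if len(inputs) != len(outputs):
        raise ValueError(
            f"{split_dir}: {len(inputs)} inputs but {len(outputs)} outputs"
        )
    return list(zip(inputs, outputs))


def load_vocab(path: str) -> Dict[str, int]:
    """Map each vocab entry to its line index (assumes one entry per line)."""
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    # Assumed order: <s> first, then <\s>.
    if words[:2] != [START, END]:
        raise ValueError(f"expected {START} and {END} as the first two entries")
    return {word: index for index, word in enumerate(words)}


def encode(
    tokens: List[str], vocab: Dict[str, int], add_markers: bool = False
) -> List[int]:
    """Turn words into vocab ids, optionally wrapping them in <s> ... <\\s>."""
    ids = [vocab[token] for token in tokens]  # KeyError if a word is missing
    if add_markers:
        ids = [vocab[START]] + ids + [vocab[END]]
    return ids
```

For example, after extracting the download into the current directory, load_pairs("train") and load_vocab("vocabs") would return the training pairs and the word-to-id map. Whether the input side, the output side, or both should be wrapped with <s> and <\s> depends on the model, so the sketch leaves that as an option.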

More Repositories

  1. seq2seq-couplet (Python, 5,488 stars): Play couplets with a seq2seq model. Compose couplets with deep learning.
  2. del_gmail (Python, 25 stars): Python script to delete mails from Gmail that match a keyword.
  3. bard (Java, 21 stars): A Java web framework that is easy to use and easy to extend.
  4. reddit-desktop (JavaScript, 15 stars): Reddit client for big screens.
  5. tax_counter (Python, 15 stars): Calculate individual income tax in China. 2019 individual income tax calculator.
  6. scala2grpc (Scala, 10 stars): An SBT plugin that makes it non-invasive to use gRPC with Scala.
  7. twitter2mastodon (Python, 10 stars): A tool to import posts from Twitter to Mastodon.
  8. smart-ua-switcher (JavaScript, 10 stars): A Chrome extension to set the User-Agent based on URL rules.
  9. redis_lease (Shell, 8 stars): Redis get/set/del with lease.
  10. linkgame (CoffeeScript, 4 stars): Link game written with create.js. http://linkgame.binwang.me/
  11. erlang_module (CoffeeScript, 4 stars): Popular Erlang modules. http://erlang-modules.binwang.me/
  12. fuckgfw (Shell, 4 stars): Shadowsocks config for OpenWRT.
  13. web_benchmark (Scala, 4 stars): A simple web service that aims to be simple and fast.
  14. wb14123.github.com (HTML, 4 stars): My blog.
  15. logical_foundations_exercise (HTML, 3 stars): Exercises for Logical Foundations.
  16. blog (HTML, 3 stars): My blog source code.
  17. tla-cache (TLA, 3 stars): Use TLA+ to verify cache consistency for different algorithms.
  18. k3s-vm-cluster (3 stars)
  19. redis-benchmark (Scala, 3 stars): Benchmark for rediscala.
  20. eroop (Elixir, 2 stars): Experiment on combining OOP with Erlang's actor model.
  21. gitlab-ci-multi-runner-docker (Go, 2 stars): gitlab-ci-multi-runner with a Docker client built in. https://gitlab.com/gitlab-org/gitlab-ci-multi-runner
  22. tensorflow (C++, 1 star): Computation using data flow graphs for scalable machine learning.
  23. matnn (MATLAB, 1 star): Learn neural networks with MATLAB.
  24. docker-shadowsocks (1 star): Docker container for shadowsocks.
  25. bard-doc-ui (JavaScript, 1 star): Web UI for documents auto-generated by Bard.
  26. scala-stream-demo (Scala, 1 star)
  27. Mr.White (Python, 1 star): A robot.
  28. auto_tsung (Shell, 1 star): Auto-deploy a tsung environment to many machines.