howl-anderson/rasa_chinese

Stars
119
Rank 297,930 (Top 6 %)
Language
Python
License
Apache License 2.0
Created almost 4 years ago
Updated over 1 year ago



rasa_chinese: a Rasa component extension package dedicated to the Chinese language, providing many Chinese-specific components

rasa_chinese

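rasa_chinese is a Rasa component extension package built specifically for the Chinese language. It provides a number of components tailored to Chinese.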

This package is endorsed by Rasa: the official Rasa blog recommends it to Chinese-language Rasa users: https://rasa.com/blog/non-english-tools-for-rasa/

Installation

pip install rasa_chinese

Components currently included

LanguageModelTokenizer

A tokenizer component based on HuggingFace's transformers.

Usage in the pipeline:

pipeline:
  - name: "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"

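The tokenization used by LanguageModelTokenizer must be identical to that of LanguageModelFeaturizer.

If you specify parameters for LanguageModelFeaturizer in the pipeline, you must also set the same parameters for LanguageModelTokenizer, as shown below: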

pipeline:
  - name: "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
    # The following parameters must exactly match those of LanguageModelFeaturizer
    model_name: "roberta"
    model_weights: "roberta-base"
  - name: LanguageModelFeaturizer
    model_name: "roberta"
    model_weights: "roberta-base"
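A mismatch between the two components is easy to miss, so it can help to check the parsed pipeline before training. The following is a minimal sketch and is not part of rasa_chinese or Rasa. `find_mismatches` and the constants are hypothetical names. The pipeline is assumed to be a list of dicts, like the one you would get by loading `config.yml` with a YAML parser.

```python
from typing import Any, Dict, List

# Hypothetical helper for illustration only; not an API of rasa_chinese or Rasa.
TOKENIZER = "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
FEATURIZER = "LanguageModelFeaturizer"
SHARED_KEYS = ("model_name", "model_weights")


def find_mismatches(pipeline: List[Dict[str, Any]]) -> List[str]:
    """Return the shared keys whose values differ between tokenizer and featurizer."""
    by_name = {component["name"]: component for component in pipeline}
    tokenizer = by_name.get(TOKENIZER, {})
    featurizer = by_name.get(FEATURIZER, {})
    # A key set on only one side also counts as a mismatch, because the
    # README requires both components to be configured identically.
    return [key for key in SHARED_KEYS if tokenizer.get(key) != featurizer.get(key)]


pipeline = [
    {"name": TOKENIZER, "model_name": "roberta", "model_weights": "roberta-base"},
    {"name": FEATURIZER, "model_name": "roberta", "model_weights": "roberta-base"},
]
print(find_mismatches(pipeline))  # → []
```

An empty list means the two components agree on the keys checked here. Any other component parameters you set would need the same treatment.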

unlocking-the-power-of-llms

Use prompts and chains to turn ChatGPT into a remarkable productivity tool! Unlocking the power of LLMs.

Chinese_models_for_SpaCy

Models for spaCy that support Chinese

Jupyter Notebook

hanzi_char_featurizer

A Chinese character featurizer that extracts pronunciation and glyph features from Chinese characters for use in deep learning

hanzi_chaizi

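A Chinese character decomposition library that breaks characters down into radicals and components, which can serve as glyph features in machine learning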

tools_for_corpus_of_people_daily

Tools for processing the People's Daily corpus

WeatherBot

A Rasa-based chatbot that answers weather queries in Chinese, with a web UI

MicroTokenizer

A micro Chinese tokenizer that implements a comprehensive range of algorithms
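The description does not say which segmentation algorithms MicroTokenizer implements. As an illustration of one classic dictionary-based approach, here is a minimal forward maximum matching sketch. The function name and the toy vocabulary are hypothetical and do not come from MicroTokenizer's API.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in vocab:
                tokens.append(word)
                i += size
                break
    return tokens


vocab = {"南京", "南京市", "长江", "长江大桥", "市长", "大桥"}
print(forward_max_match("南京市长江大桥", vocab))  # → ['南京市', '长江大桥']
```

Greedy matching is fast, but it cannot resolve every ambiguity. This is one reason tokenizers also offer statistical methods such as HMM- or CRF-based tagging.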

ATIS_dataset

The ATIS (Airline Travel Information System) Dataset

seq2annotation

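A general-purpose sequence labeling library built on TensorFlow and PaddlePaddle (currently including BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) for sequence labeling tasks such as Chinese word segmentation (tokenization), part-of-speech (POS) tagging, and named entity recognition (NER).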
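Sequence labeling frames NER as predicting one tag per character, and those tags must then be decoded back into entities. The tagging scheme seq2annotation uses is not stated here. Assuming the common BIO scheme, a small decoder might look like this (the function name is hypothetical):

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans with an exclusive end index."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" flushes a trailing span
        prefix, _, tag_label = tag.partition("-")
        # Close the open span on "O", on a new "B-", or on an "I-" with a different label.
        if label is not None and (prefix != "I" or tag_label != label):
            spans.append((label, start, i))
            label = None
        # Open a new span on "B-"; an "I-" with no open span is treated leniently as a start.
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = tag_label, i
    return spans


text = "王小明在北京工作"
tags = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
for label, start, end in bio_to_spans(tags):
    print(label, text[start:end])  # prints "PER 王小明", then "LOC 北京"
```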

MITIE_Chinese_Wikipedia_corpus

Pre-trained Wikipedia corpus by MITIE

chinese-wikipedia-corpus-creator

Corpus creator for Chinese Wikipedia

MicroRegEx

A micro regular expression engine

Jupyter Notebook

Chinese_tokenizer_benchmark

Benchmark for Chinese tokenizers

rasa_contrib

rasa_contrib is an add-on package for Rasa. It provides some useful, powerful additional components.

NLU_benchmark_dataset

Benchmark datasets for Natural Language Understanding (NLU)

corpus_dataset_for_Chinese_NLP

Corpus datasets for Chinese NLP

four_corner_method

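Data and tools for the Chinese Four-Corner Method (四角号码), which encodes Chinese characters into glyph-based codes that can serve as glyph features in machine learning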

scel2txt

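A tool that converts Sogou cell dictionaries (.scel files) into plain text. The extracted vocabularies can be used for data generation and dictionary features in deep learning.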

vimapt

A package manager for vim (VimApt => Vim's Advanced Package Tools)

tf_crf_layer

CRF (Conditional Random Field) layer for TensorFlow 1.x with many powerful features

rasa_chinese_service

Service package for rasa_chinese

MicroCompiler

A micro compiler project providing LL/LR/LALR syntax parsers

WeatherBot_Action

Action server for WeatherBot

WeatherBot_UI

WebChat UI (HTML pages) for WeatherBot

PaddleTokenizer

A DNN-based Chinese tokenizer implemented with PaddlePaddle

MicroHMM

A micro Python package for HMMs (Hidden Markov Models)
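MicroHMM's API is not shown here. As a generic illustration of the central HMM decoding step, here is a minimal Viterbi sketch in plain Python. The function signature and the toy two-state model (the textbook healthy/fever example) are assumptions, not MicroHMM's interface.

```python
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: probability of the best path that ends in state s at step t.
    best: List[Dict[str, float]] = [
        {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    ]
    # back[t][s]: the previous state on that best path.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Follow the back-pointers from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3}, "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
print(viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p))
# → ['Healthy', 'Healthy', 'Fever']
```

Multiplying raw probabilities underflows on long sequences. Practical implementations usually add log probabilities instead.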

MicroWeatherBot_CN

A simple, micro Chinese weather-query demo bot built on Rasa 1.x

WeatherBot_Core

entity2embedding

A Python package for word2vec

MicroWeatherBot_EN

A simple, micro English weather-query demo bot built on Rasa 1.x

q_learning_demo

Show how Q-learning works from scratch
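For readers who want the core idea in a few lines, here is a minimal tabular Q-learning sketch. It is not taken from the repository. The corridor environment, the hyperparameters, and all names are hypothetical choices for illustration.

```python
import random
from typing import Dict, Tuple

# Toy 1-D corridor (hypothetical environment): states 0..4, state 4 is the goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left, move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Move within the corridor; reaching the goal gives reward 1 and ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> Dict[Tuple[int, int], float]:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(greedy_policy)  # → [1, 1, 1, 1]
```

The update bootstraps from the best next action, regardless of which action the exploring policy actually takes next. This is what makes Q-learning off-policy.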

Jupyter Notebook

PaddleNER

basic_weather_bot_server

MicroCPUID

A micro tool based on assembly language to detect and display CPU information

SDMdata

ner_offline_evaluate

howl-anderson.github.io

howl-anderson

hanzi_char_lookup_feature

A dictionary-based method for providing additional per-character features, commonly used in deep-learning-based NER
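The repository's exact feature scheme is not described here. The sketch below assumes a common variant that marks, for every character, whether it begins (`B-`) or continues (`I-`) a lexicon word of a given type. The function name and the lexicon are hypothetical.

```python
from typing import Dict, List, Set


def lookup_features(text: str, lexicon: Dict[str, Set[str]], max_len: int = 4) -> List[Set[str]]:
    """For each character, collect tags like 'B-LOC' / 'I-LOC' from lexicon words covering it."""
    features: List[Set[str]] = [set() for _ in text]
    # Enumerate every substring up to max_len characters and look it up.
    for start in range(len(text)):
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            for label in lexicon.get(text[start:end], set()):
                features[start].add(f"B-{label}")
                for i in range(start + 1, end):
                    features[i].add(f"I-{label}")
    return features


lexicon = {"北京": {"LOC"}, "北京大学": {"ORG"}}
text = "我在北京大学"
for char, feats in zip(text, lookup_features(text, lexicon)):
    print(char, sorted(feats))
```

Overlapping matches are kept rather than resolved, so 北 carries both `B-LOC` and `B-ORG`. In a neural NER model these sets would typically be multi-hot encoded and concatenated with the character embeddings.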

AIMLChatRobot

how_Chinese_tokenizer_works

google-io-keras-vae

Code for Google IO 2021 Modern Keras design patterns session

Jupyter Notebook

MicroTagger

A micro Python library for part-of-speech (POS) tagging

Assignment_for_Natural_Language_Processing_with_Deep_Learning_CS224n_By_Stanford_University

Assignment for CS224n: Natural Language Processing with Deep Learning By Stanford University

sdmvspecies

SDMvspecies is an R package for creating virtual species (virtual or artificial data) for species distribution modelling (SDM)