  • Stars: 241
  • Rank: 167,643 (Top 4 %)
  • Language: Python
  • License: Other
  • Created about 4 years ago
  • Updated 6 months ago


Repository Details

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

NLP Profiler


A simple NLP library that allows profiling datasets with one or more text columns.

Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.

In short: think of it as calling describe() on a pandas data frame, or running Pandas Profiling on it, but for datasets containing text columns rather than the usual columnar datasets.

What do you get from the library?

  • Pass in a Pandas dataframe and the name of its text column.
  • You get back a new dataframe with various features about the parsed text per row.
    • High-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, ease of readability check, etc.
    • Low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.
  • Descriptive statistics can then be drawn from the numerical data in the resulting dataframe by calling describe() on it.
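To make the low-level/granular features above concrete, here is a minimal standard-library sketch of per-row counts. It is not the library's implementation: the function name granular_features, the key names, and the sample texts are hypothetical, and the emoji count is a rough heuristic (Unicode category "So") that also matches symbols such as the copyright sign.

```python
import unicodedata


def granular_features(text):
    """Return a few illustrative low-level counts for one piece of text."""
    return {
        "characters_count": len(text),
        # Whitespace-separated tokens; punctuation stays attached to words.
        "words_count": len(text.split()),
        # Heuristic: count characters in Unicode category "So" (Symbol, other).
        "emoji_count": sum(1 for ch in text if unicodedata.category(ch) == "So"),
    }


# Hypothetical sample rows; emojis are written as escapes for portability.
rows = [
    "I love this library \U0001F600",      # ends with a grinning face
    "Not great.",
    "Fine, I guess \U0001F44D\U0001F44D",  # ends with two thumbs-up signs
]
features = [granular_features(text) for text in rows]
for text, row_features in zip(rows, features):
    print(text, row_features)
```

Each dictionary corresponds to one row of the kind of per-row feature table described above.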

See screenshots under the Jupyter section and also under Screenshots for further illustrations.

Under the hood it makes use of a number of libraries that are popular in the AI and ML communities, but we can extend its functionality by replacing or adding other libraries as well.

A simple notebook has been provided to illustrate the usage of the library.

Please join the Gitter.im community to say "hello", share your feedback, and have fun with us.

Note: this is a new endeavour and may have rough edges; in its current version, NLP_Profiler is probably NOT capable of doing many things. Many of these gaps are opportunities we can work on and plug as we go along using it. Please provide constructive feedback to help improve the library. We recently did just that for scaling to larger datasets.

Requirements

  • Python 3.7.x or higher.
  • Dependencies described in the requirements.txt.
  • For high-level analysis, including grammar checks:
    • faster processor
    • higher RAM capacity
    • working disk-space of 1 to 3 GBytes (depending on the dataset size)
  • (Optional)
    • Jupyter Lab (on your local machine).
    • Google Colab account.
    • Kaggle account.
    • Grammar check functionality:
      • Internet access
      • Java 8 or higher
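Two of the requirements above can be checked from Python before installing. The sketch below is an illustration only: the function name check_requirements and its key names are hypothetical, and the Java check confirms only that a java executable is on the PATH, not that it is version 8 or higher.

```python
import shutil
import sys


def check_requirements():
    """Report whether the interpreter and Java prerequisites look satisfied."""
    return {
        # The README asks for Python 3.7.x or higher.
        "python_3_7_or_higher": sys.version_info >= (3, 7),
        # Only needed for the optional grammar check; version is not verified.
        "java_on_path": shutil.which("java") is not None,
    }


print(check_requirements())
```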

Getting started

Installation

For Conda/Miniconda environments:

conda config --set pip_interop_enabled True
pip install "spacy >= 2.3.0,<3.0.0"         # in case spacy is not present
python -m spacy download en_core_web_sm

### now perform any of the below pathways/options

For Kaggle environments:

pip uninstall typing      # this can cause issues on Kaggle hence removing it helps

Then follow any of the remaining installation steps, but avoid using -U with pip install, since it can also cause issues on Kaggle.

From PyPI:

pip install -U nlp_profiler

From the GitHub repo:

pip install -U git+https://github.com/neomatrix369/nlp_profiler.git@master

From the source:

For library development purposes, see the Developer guide.

Usage

import nlp_profiler.core as nlpprof

new_text_column_dataset = nlpprof.apply_text_profiling(dataset, 'text_column')

or

from nlp_profiler.core import apply_text_profiling

new_text_column_dataset = apply_text_profiling(dataset, 'text_column')

See Notebooks section for further illustrations.
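The calls above return a new dataframe, which can then be summarised with describe(). The sketch below shows that workflow under stated assumptions: pandas is installed, the helper profile_and_summarise and the sample data are hypothetical, and stand_in_profiler is a toy substitute (not the library's implementation) so that the snippet runs even before nlp_profiler is installed.

```python
import pandas as pd


def profile_and_summarise(dataset, column, profiler):
    """Run profiler(dataset, column), then summarise its numeric columns."""
    profiled = profiler(dataset, column)
    summary = profiled.describe()  # numeric columns only, by default
    return profiled, summary


def stand_in_profiler(dataset, column):
    # Hypothetical stand-in with the same call shape as apply_text_profiling.
    result = dataset[[column]].copy()
    result["characters_count"] = result[column].str.len()
    return result


dataset = pd.DataFrame({"text_column": ["Hello world", "Text profiling is fun!"]})
profiled, summary = profile_and_summarise(dataset, "text_column", stand_in_profiler)
print(summary)
```

With the library installed, apply_text_profiling from nlp_profiler.core can be passed in place of stand_in_profiler.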

Developer guide

See the Developer guide to learn how to build, test, and contribute to the library.

Demo and presentations

Look at a short demo of the NLP Profiler library at one of these:

  • Demo of the NLP Profiler library (Abhishek talks #6); the rest of the talk and the slides are also available.
  • Demo of the NLP Profiler library (NLP Zurich talk); the rest of the talk and the slides are also available.

Notebooks

After successful installation of the library, RESTART Jupyter kernels or Google Colab runtimes for the changes to take effect.
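After restarting, one way to confirm that the running kernel can see the package is to ask the import system for it without actually importing it. This is a minimal sketch; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Prints False if the kernel has not picked up the installation yet.
print(is_importable("nlp_profiler"))
```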

See Notebooks for usage and further details.

Screenshots

See Screenshots

Credits and supporters

See CREDITS_AND_SUPPORTERS.md

Changes

See CHANGELOG.md

License

Refer to the licensing (and warranty) policy.

Contributing

Contributions are welcome!

Please have a look at the CONTRIBUTING guidelines.

Please share it with the wider community (and get credited for it)!



More Repositories

  1. awesome-ai-ml-dl (Jupyter Notebook, 1,446 stars): Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
  2. awesome-graal (Shell, 355 stars): A curated list of awesome resources for Graal, GraalVM, Truffle and related topics
  3. refactoring-developer-habits (130 stars): Refactor developer habits: among many such habits when writing or maintaining code
  4. chatbot-conversations (Java, 35 stars): Chatbot conversations: a demo application how two (or more) chatbots can talk to each other, the logic used to build Eliza (along with an NLP model) has been used to power the chatbots.
  5. java-10-and-beyond (Java, 20 stars): Examples and exercises based on some of the features of Java 10 (GA and Early Access builds)
  6. learning-path-index (Jupyter Notebook, 16 stars): A repo with data files, assets and code supporting and powering the Learning Path Index Project
  7. LambdaExamples (Java, 15 stars): LambdaExamples - a collection of examples and resources to learning about Lambdas in Java 8
  8. SalarySlipKata (Java, 9 stars): SalarySlipKata - implementation of generating Salary Slips Kata, for UK companies, in Java via multiple iterations
  9. RESTAPIUnifier (Java, 8 stars): RESTAPIUnifier - brings together all the APIs of various formats under one roof!
  10. GildedRoseKata (Java, 4 stars): A number of solutions to the Gilded Rose Kata implemented in Java using various refactoring methods and Java testing tools
  11. dl4j-nlp-cuda-example (Java, 3 stars): A git repository containing an NLP example using DL4J (cuda) in Java
  12. speech-to-text (3 stars): Find your Speech-to-text resources for various platforms
  13. RefactoringSpecifications (Java, 3 stars): Refactoring Specifications - codebase with an example of spec to code
  14. code-butler-app (JavaScript, 2 stars): A FB messenger app that serves and helps learning coding easier and fun
  15. nlp-java-jvm-example (Jupyter Notebook, 2 stars): A repo with NLP examples of libraries/packages/framework written in Java/JVM
  16. hello-world-github-actions (Shell, 1 star): Hello World repo to test out GitHub Actions
  17. Fluent-Specs (Java, 1 star): Java library for BDD-style unit-level specifications
  18. PatchReviewUtilities (Java, 1 star): Patch review utilities that are a little more complex than shell scripts
  19. ShoppingCartKata (Java, 1 star): ShoppingCartKata - implementation of the shopping cart kata in Java via multiple iterations
  20. OpenJDKProductivityTool (Java, 1 star): The OpenJDK productivity tool - a reviewers and contributors handy tool, to speed up delivery of patches!
  21. OwnTestFrameworkConstraint (C#, 1 star): Tic Tac Toe Kata, extracting my own test framework
  22. SICP-Section-1.1-2.2 (Scheme, 1 star): SICP-Secton-1.1