  • Stars: 1,645
  • Rank: 28,407 (Top 0.6%)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created about 8 years ago
  • Updated almost 4 years ago

Repository Details

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Text Analytics with Python - 2nd Edition

A Practitioner's Guide to Natural Language Processing

Text analytics can be a bit overwhelming and frustrating at times with the unstructured and noisy nature of textual data and the vast amount of information available. "Text Analytics with Python" is a book packed with 674 pages of useful information based on techniques, algorithms, experiences and various lessons learnt over time in analyzing text data. This repository contains datasets and code used in this book. I will also be adding various notebooks and bonus content here from time to time. Keep watching this space!

About the book

Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. This second edition has gone through a major revamp and introduces several significant changes and new topics based on the recent trends in NLP.

You’ll see how to use the latest state-of-the-art frameworks in NLP, coupled with machine learning and deep learning models for supervised sentiment analysis powered by Python to solve actual case studies. Start by reviewing Python for NLP fundamentals on strings and text data and move on to engineering representation methods for text data, including both traditional statistical models and newer deep learning-based embedding models. Improved techniques and new methods around parsing and processing text are discussed as well.

Text summarization and topic models have been overhauled so the book showcases how to build, tune, and interpret topic models in the context of an interesting dataset on NIPS conference papers. Additionally, the book covers text similarity techniques with a real-world example of movie recommenders, along with sentiment analysis using supervised and unsupervised techniques. There is also a chapter dedicated to semantic analysis where you’ll see how to build your own named entity recognition (NER) system from scratch. While the overall structure of the book remains the same, the entire code base, modules, and chapters have been updated to the latest Python 3.x release.

Edition: 2nd
Pages: 674
Language: English
Book Title: Text Analytics with Python
Book Subtitle: A Practitioner's Guide to Natural Language Processing
Publisher: Apress (a part of Springer)
Print ISBN: 978-1-4842-4353-4
Online ISBN: 978-1-4842-4354-1
DOI: 10.1007/978-1-4842-4354-1
Copyright: Dipanjan Sarkar

With this book you will:

  • Understand NLP and text syntax, semantics and structure
  • Discover text cleaning and feature engineering strategies
  • Learn and implement text classification and text clustering
  • Understand and build text summarization and topic models
  • Learn about the promise of deep learning and transfer learning for NLP
  • Implement hands-on examples based on Python and several popular open source libraries in NLP and text analytics, such as the natural language toolkit (nltk), gensim, scikit-learn, spaCy, keras and tensorflow

More Repositories

1

practical-machine-learning-with-python

Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Jupyter Notebook
2,263
star
2

hands-on-transfer-learning-with-python

Deep learning simplified by transferring prior learning using the Python deep learning ecosystem
Jupyter Notebook
826
star
3

tensorflow2-crash-course

A quick crash course in understanding the essentials of TensorFlow 2 and the integrated Keras API
Jupyter Notebook
227
star
4

training-fine-tuning-large-language-models-workshop-dhs2024

This repository will contain all the presentations, content, hands-on notebooks for a full day Generative AI workshop on Training, Fine-tuning Large Language Models for the DataHack Summit 2024 conference.
Jupyter Notebook
221
star
5

data_science_for_all

Code and resources for my blog and articles to share Data Science and AI knowledge and learnings with everyone
Jupyter Notebook
203
star
6

nlp_essentials

Essential and fundamental aspects of Natural Language Processing with hands-on examples and case studies
Jupyter Notebook
171
star
7

art_of_data_visualization

The art of effective visualization of multi-dimensional data
Jupyter Notebook
153
star
8

nlp_workshop_odsc_europe20

Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage machine learning, deep learning and deep transfer learning to learn and solve popular tasks using NLP including NER, Classification, Recommendation / Information Retrieval, Summarization, Language Translation, Q&A and Topic Models.
Jupyter Notebook
133
star
9

learning-social-media-analytics-with-r

This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
R
118
star
10

BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark

This repository contains code files, specifically IPython notebooks, for the assignments in the course "Introduction to Big Data with Apache Spark" by UC Berkeley and Databricks on edX
Jupyter Notebook
114
star
11

adversarial-learning-robustness

Contains materials for workshops pertaining to adversarial robustness in deep learning.
Jupyter Notebook
86
star
12

deep_transfer_learning_nlp_dhs2019

Contains the code and deck for the presentation on Applying Deep Transfer Learning for NLP in Analytics Vidhya's DataHack Summit 2019
Jupyter Notebook
81
star
13

nlp_crash_course_plugin20

Contains relevant notebooks for the hands-on NLP workshop for the Analytics India Magazine Plugin Conference - 2020 Edition
Jupyter Notebook
71
star
14

nlp_workshop_dhs18

Contains code and presentation for our full day workshop, 'Getting Started with Natural Language Processing'. This is created for the purpose of being presented in Analytics Vidhya's DataHack Summit 2018. Authors: Dipanjan Sarkar & Raghav Bali
Jupyter Notebook
65
star
15

improving-RAG-systems-dhs2024

This repository will contain the presentation and python jupyter notebooks for the DataHack Summit 2024 conference talk, Improving Real-world Retrieval Augmented Generation Systems, focusing on the key challenges and practical solutions of how to solve them
Jupyter Notebook
65
star
16

adv_nlp_workshop_odsc_europe22

Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage deep learning and deep transfer learning to solve popular tasks in NLP including Classification, Information Retrieval, Sentiment Analysis, Search Engines, Clustering, Paraphrase Mining, Summarization, Language Translation, Q&A systems
Jupyter Notebook
48
star
17

convolutional_neural_networks_essentials

Contains presentation deck and notebooks showcasing fundamental concepts and hands-on examples for Convolutional Neural Networks
Jupyter Notebook
43
star
18

nlp_workshop_odsc19

Contains all tutorials and hands-on examples for the ODSC 2019 Workshop
Jupyter Notebook
37
star
19

live-manning-nlpconf20

Papers, code and slides for my session at the live@manning NLP conference, 2020 covering my talk on Deep Transfer Learning for Natural Language Processing
Jupyter Notebook
36
star
20

explainable_artificial_intelligence

Slides, code and resources for model interpretation methods in machine learning and deep learning
Jupyter Notebook
31
star
21

BerkeleyX-CS190.1x-Scalable-Machine-Learning

This repository contains code files, specifically IPython notebooks, for the assignments in the course "Scalable Machine Learning" by UC Berkeley and Databricks on edX
30
star
22

feature_engineering_session_dhs18

Contains code and presentation for my interactive hack session, 'Effective Feature Engineering: A Structured Approach to Building Better Machine Learning Models' where we look at two interesting case studies on how to effectively leverage feature engineering and use a structured approach to build good machine learning models. This is created for the purpose of being presented in Analytics Vidhya's DataHack Summit 2018
Jupyter Notebook
29
star
23

transformers_nlp_essentials

Contains slides and hands-on tutorials for understanding and implementing Transformers in Natural Language Processing. Uses the HuggingFace Transformers framework in the hands-on tutorials.
Jupyter Notebook
26
star
24

practical_nlp_workshop_gids20

Contains relevant notebooks for the hands-on NLP workshop for the GIDS AIML Conference - 2020 Edition
Jupyter Notebook
23
star
25

low_code_machine_learning_pycaret_workshop_2022

This workshop was done as a part of the 1729 conference organized by Fractal Analytics and Analytics Vidhya. Key content covered was hands-on notebooks leveraging PyCaret to compare, build, tune, evaluate and interpret machine learning models
Jupyter Notebook
21
star
26

nlp_workshop_iisc19

Hands-on examples showcasing popular NLP applications
Jupyter Notebook
19
star
27

adversarial_learning_tfug2020

Contains the slides and hands-on tutorials showcasing adversarial learning on convolutional neural networks to build robust vision models
Jupyter Notebook
17
star
28

transfer-learning-in-action

This repository contains detailed tutorials and notebooks as an accompaniment to our new book 'Transfer Learning in Action' which will be a live work-in-progress with additions over time.
Jupyter Notebook
16
star
29

stanford-statistical-learning

Slides, material and solutions of the popular Statistical Learning course from Stanford's own Hastie & Tibshirani. Join me on my journey to finally try and complete this course after leaving it mid-way at least 3-4 times due to other commitments!
Jupyter Notebook
15
star
30

MyShinyApps

Shiny is a web application framework for R. This repo contains all the web apps developed by me using R and Shiny.
R
12
star
31

ml_model_deployment_example

A simple example to showcase machine learning model deployment with an API
Jupyter Notebook
10
star
32

Digital-image-steganography

This project successfully implements an encoder-decoder system where we can hide a secret image inside another image and retrieve it secretly later using the decoder only.
MATLAB
9
star
33

flask-api-tutorials

This repository contains the RESTful APIs developed showing the capabilities of Flask and how to modularize the same code using some advanced features of Flask
Python
5
star
34

adv_nlp_workshop_odsc_europe23

Extensive tutorials for the Advanced NLP Session in Open Data Science Conference Europe 2023. We will leverage deep transfer learning, notably transformers to solve popular tasks in NLP including Classification, Information Retrieval, Sentiment Analysis, Search Engines, Clustering, Paraphrase Mining, Summarization, Language Translation, Q&A systems
Jupyter Notebook
5
star
35

student-information-system

This project has an entire template for managing students and related information pertaining to them in any University.
JavaScript
4
star
36

deeplearning.ai-generative-ai-courses

This repository will contain all the exercises, tutorials and python jupyter notebooks for all the DeepLearning.AI courses on Generative AI, ChatGPT, LangChain, LLMs and more
Jupyter Notebook
4
star
37

online-movie-booking

This project consists of a template which can be used for handling operations related to typical online movie booking websites.
JavaScript
3
star
38

pyslate

This is a text translation library implemented in Python. It uses the Google Translate API for language detection and translation.
Python
3
star
39

text-analytics-python-improvements

This is a temporary repository for working on improvements for my book 'Text Analytics with Python'
3
star
40

Temperature-Aware-Linux

A temperature-aware application for creating a temperature-aware variant of the linux OS
Perl
3
star
41

duke-cloud-computing-for-data-coursera

Will contain all the necessary code examples for the Duke Cloud Computing Specialization on Coursera
Jupyter Notebook
3
star
42

adv_nlp_workshop_odsc_apac2323

Extensive tutorials for the Advanced NLP Session in Open Data Science Conference APAC 2023. We will leverage deep transfer learning, notably transformers and LLMs like ChatGPT to solve popular tasks in NLP including Classification, Information Retrieval, Sentiment Analysis, Search Engines, Summarization, Language Translation, Q&A systems
Jupyter Notebook
3
star
43

dipanjanS.github.io

My website
CSS
2
star
44

tag-me

The main objective was to design machine learning algorithms for tagging/classifying a collection of images.
2
star
45

TweetSense

Sentiment analysis of live Twitter Stream
Python
2
star
46

railway-reservation-system

This was one of my projects when I had just started with Java, so it obviously has a lot of bugs and scope for improvement. However, it implements the working of a typical railway reservation system
Java
2
star
47

practicalML-course-project

This detailed analysis has been performed to fulfill the requirements of the course project for the course Practical Machine Learning offered by the Johns Hopkins University on Coursera
2
star
48

datasciencecoursera

This repo is specially created for all the work done by me as a part of Coursera's Data Science Specialization.
R
2
star
49

gf

1
star
50

simple-chat-application

A simple application implemented in C, where a client can chat with the server and send messages to each other.
C
1
star
51

dipanjanS

Profile Repo
1
star
52

deep-learning-cloud-gpu-setup-guide

Learn how to start using deep learning frameworks leveraging python, jupyter notebooks in the cloud using GPUs with a ready-reference guide
1
star
53

dataprod-project-pitch

Slidify presentation for Coursera's Developing Data Products course project
JavaScript
1
star
54

play2048usingR

Now you can make the computer play the very popular game 2048 for you by using R!
1
star
55

noaa-stormdatabase-analysis

Here, we will explore and analyse the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.
1
star