  • Stars: 102
  • Rank: 334,306 (Top 7%)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated 12 months ago

Automated tool for data storytelling

data-storyteller

📱 Data Storyteller 📉

ONE STOP SOLUTION FOR ALL YOUR DATA NEEDS

Introduction

As per Gartner [2], the analytics and business intelligence platform market has transitioned from the visual data discovery era to the augmented era. Data and analytics leaders/administrators should begin piloting capabilities and competencies that enable the “augmented consumer”.

With advances in technology, organisations today can make data-driven decisions and base their strategy, planning and forecasts on them. However, many business users do not have time to analyse the data and extract noteworthy insights. There is also a gap between the output a tool produces and how a business user can use that output to interpret the data. In addition, building business insights from data requires considerable domain knowledge, and not every user is a business expert. Given a snapshot of data, we would like to build a system that can tell a story from it. The story is automated in the sense that it is driven by the data, the context and personal preferences. This solves both problems: the difficulty of using the tool, and the need to guide the user with data-driven intelligence when making business decisions. The entire approach is driven by outcomes and effectiveness.

Tool Description

Data Storyteller is an AI-based tool that takes a data set, identifies patterns in it, interprets the results and then produces an output story that a business user can understand in context. It can proactively analyse data on behalf of users and generate smart feeds using natural language generation techniques, which business users can then consume with very little effort. The application was built with non-expert users in mind and is therefore easy to use and understand. It also uses a multipage Streamlit implementation built from class-based pages.
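As a rough illustration of how a class-based multipage app can be organised, here is a minimal sketch of a page registry. It is not the project's multipage.py: the names `MultiPage`, `add_page`, `run` and `UploadPage` are hypothetical, and pages return strings so that the sketch runs without Streamlit installed.

```python
from typing import Callable, Dict


class MultiPage:
    """Keeps an ordered registry of pages, each rendered by a callable."""

    def __init__(self) -> None:
        # Dicts preserve insertion order, so pages appear in the order added.
        self.pages: Dict[str, Callable[[], object]] = {}

    def add_page(self, title: str, render: Callable[[], object]) -> None:
        """Register a page under a unique navigation title."""
        if title in self.pages:
            raise ValueError(f"Duplicate page title: {title!r}")
        self.pages[title] = render

    def run(self, selected_title: str) -> object:
        """Render the page the user picked in the navigation widget."""
        # In a Streamlit app the title would typically come from a widget such as
        # st.sidebar.selectbox("App Navigation", list(self.pages)).
        return self.pages[selected_title]()


class UploadPage:
    """Hypothetical class-based page; a real one would call st.* functions."""

    def render(self) -> str:
        return "upload page"


if __name__ == "__main__":
    app = MultiPage()
    app.add_page("Upload Data", UploadPage().render)
    print(app.run("Upload Data"))  # prints "upload page"
```

Because each page exposes a single render callable, you can add a module without changing the navigation code.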

Features

Given data or analytics output, the tool can:

  • turn the data into interactive data stories.
  • generate deep insights, infer patterns and help with business decisions.
  • provide personalization profiles, which could be represented as metadata describing what would interest a given user.
  • generate reports that a business user can understand, through an interactive and intuitive interface.
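The README does not describe how the stories are generated. The sketch below illustrates one simple approach, template-based natural language generation, which turns a numeric column into a one-sentence summary. This technique is an assumption and is not necessarily what the tool uses. The name `narrate_column` is hypothetical.

```python
import statistics
from typing import Sequence


def narrate_column(name: str, values: Sequence[float]) -> str:
    """Describe the overall trend and peak of a numeric series in one sentence."""
    if len(values) < 2:
        raise ValueError("need at least two values to describe a trend")
    first, last = values[0], values[-1]
    # Pick a template phrase based on the direction of change.
    if last > first:
        trend = f"rose from {first:g} to {last:g}"
    elif last < first:
        trend = f"fell from {first:g} to {last:g}"
    else:
        trend = f"stayed flat at {first:g}"
    peak = max(values)
    # Periods are reported 1-based because that reads naturally to business users.
    peak_period = list(values).index(peak) + 1
    mean = statistics.mean(values)
    return f"{name} {trend}, averaging {mean:.2f} and peaking at {peak:g} in period {peak_period}."


if __name__ == "__main__":
    # prints "Sales rose from 100 to 150, averaging 115.00 and peaking at 150 in period 4."
    print(narrate_column("Sales", [100, 120, 90, 150]))
```

A production system would choose among many such templates based on the detected patterns. A single fixed template just keeps the idea visible.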

Module-Wise Description

The application uses a class-based multipage Streamlit implementation, which can be viewed in the multipage.py file. The UI of the application is shown below. The application is divided into several modules, each of which is described in the following sections.

UI of the application

📌 Data Upload

This module handles data upload and accepts CSV and Excel files. As soon as the data is uploaded, the module creates a copy of it so that the data does not have to be read multiple times. It also saves the column names and their data types and displays them to the user. These column types are needed at later stages.
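The following is a minimal sketch of the "parse once, then record column types" step. It uses only the standard library and CSV input so that it runs anywhere. Given the stack, the real app presumably uses pandas and also accepts Excel files. The tagging rules, the `MAX_CATEGORIES` threshold and all function names are assumptions chosen for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# Hypothetical cut-off: non-numeric columns with at most this many distinct
# values are tagged "Categorical"; anything more varied is tagged "Object".
MAX_CATEGORIES = 10


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float (e.g. "3", "0.5", "1e3")."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_column_types(
    rows: List[Dict[str, Optional[str]]], max_categories: int = MAX_CATEGORIES
) -> Dict[str, str]:
    """Tag every column as "Numerical", "Categorical" or "Object"."""
    metadata: Dict[str, str] = {}
    if not rows:
        return metadata
    for column in rows[0]:
        # Ignore blank cells and cells missing from short rows.
        values = [v for v in (row.get(column) for row in rows) if v not in (None, "")]
        if values and all(_is_number(v) for v in values):
            # A 0/1 flag lands here too, which is why users may need to
            # override the tag afterwards.
            metadata[column] = "Numerical"
        elif len(set(values)) <= max_categories:
            metadata[column] = "Categorical"
        else:
            metadata[column] = "Object"
    return metadata


def load_csv(text: str) -> Tuple[List[Dict[str, Optional[str]]], Dict[str, str]]:
    """Parse the uploaded CSV once, returning the rows and their column metadata."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows, infer_column_types(rows)
```

Returning the parsed rows together with the metadata mirrors the idea of keeping one in-memory copy, so later pages never need to re-read the upload.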

📌 Change Metadata

Once the column types are saved in the metadata, the user is given the option to change them, so that the automatic column tagging can be overridden if needed. For example, a binary column of 0s and 1s may be tagged as numerical, and the user might want to retag it as categorical. The three available data types are:

  • Numerical
  • Categorical
  • Object

The correction happens immediately and is saved at that moment.
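The following is a minimal sketch of the override step. It assumes that the metadata is stored as a JSON file mapping column names to one of the three types above. The storage format, the file path and `change_column_type` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Dict

ALLOWED_TYPES = ("Numerical", "Categorical", "Object")


def change_column_type(metadata_path: Path, column: str, new_type: str) -> Dict[str, str]:
    """Override one column's tag and persist the metadata straight away."""
    metadata: Dict[str, str] = json.loads(metadata_path.read_text())
    if column not in metadata:
        raise KeyError(f"Unknown column: {column!r}")
    if new_type not in ALLOWED_TYPES:
        raise ValueError(f"Type must be one of {ALLOWED_TYPES}, got {new_type!r}")
    metadata[column] = new_type
    # Write immediately so later pages always see the corrected type.
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

Validating against a fixed tuple keeps the three-type vocabulary in one place. Rejecting unknown columns catches typos before they reach the saved file.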

📌 Machine Learning

This section automates the machine learning process. The user selects the X (feature) and y (target) variables, and the application does everything else. The user specifies which columns to use and then selects the type of problem: regression or classification. The application trains multiple models and saves the best one as a binary .sav file, to be used later for inference. The accuracy (for classification) or R2 score (for regression) is displayed immediately, while the model runs in the background.
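The following is a minimal sketch of automated model selection with scikit-learn, which is part of the stated stack. Several details are assumptions: which models are tried, ranking them by 5-fold cross-validated accuracy or R2, and writing the .sav file with `pickle`. `train_best_model`, `CANDIDATES` and `SCORING` are hypothetical names.

```python
import pickle
from typing import Callable, Dict, Tuple

from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical candidate models; the README does not list the ones the app tries.
CANDIDATES: Dict[str, Dict[str, Callable[[], BaseEstimator]]] = {
    "classification": {
        "logistic_regression": lambda: LogisticRegression(max_iter=1000),
        "decision_tree": lambda: DecisionTreeClassifier(random_state=0),
        "random_forest": lambda: RandomForestClassifier(random_state=0),
    },
    "regression": {
        "linear_regression": LinearRegression,
        "decision_tree": lambda: DecisionTreeRegressor(random_state=0),
        "random_forest": lambda: RandomForestRegressor(random_state=0),
    },
}
# Metric reported to the user for each task type.
SCORING = {"classification": "accuracy", "regression": "r2"}


def train_best_model(X, y, task: str, model_path: str = "model.sav") -> Tuple[str, float]:
    """Cross-validate every candidate, refit the best on all data and pickle it."""
    if task not in CANDIDATES:
        raise ValueError(f"task must be one of {list(CANDIDATES)}, got {task!r}")
    scores = {
        name: cross_val_score(make(), X, y, cv=5, scoring=SCORING[task]).mean()
        for name, make in CANDIDATES[task].items()
    }
    best_name = max(scores, key=scores.get)
    best_model = CANDIDATES[task][best_name]().fit(X, y)
    # In this sketch the .sav file is simply a pickle of the fitted estimator.
    with open(model_path, "wb") as fh:
        pickle.dump(best_model, fh)
    return best_name, float(scores[best_name])
```

To use the model later for inference, call `pickle.load` on the same file. Only unpickle files you trust, because unpickling can execute arbitrary code.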

📌 Data Visualization

📌 Y-Parameter Optimization

Technology Stack

  1. Python
  2. Streamlit
  3. Pandas
  4. Scikit-Learn
  5. Seaborn

How to Run

  • Clone the repository
  • Set up a virtual environment
$ python3 -m venv env
  • Activate the virtual environment
$ source env/bin/activate
  • Install dependencies using
$ pip install -r requirements.txt
  • Run Streamlit
$ streamlit run app.py

Other Content

Video Walkthrough

Presentation

🤝 How to Contribute? [3]

  • Take a look at the existing issues or create your own!
  • Wait for an issue to be assigned to you before you start working on it.
  • Fork the repo and create a branch for any issue that you are working on.
  • Create a pull request, which will be promptly reviewed, with suggestions added to improve it.
  • Add screenshots to help us understand what your script does.

👨‍💻 Contributors ✨

  • Prakhar Rathi
  • Manav Prabhakar
  • Salil Sxena

References

[1] SAP Hackathon: https://sap-code.hackerearth.com/challenges/hackathon/sap-code/custom-tab/data-4-storytelling/#Data%204%20Storytelling (used for the README.md introduction)

[2] Gartner: https://www.gartner.com/en/documents/3982132

[3] Soumyajit Behera: https://github.com/soumyajit4419/MedHub_360

Contact

For any feedback or queries, please reach out to [email protected].

Note: This project is for educational purposes only; no plagiarism is intended.
