• Stars
    star
    2
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created about 4 years ago
  • Updated about 4 years ago

Repository Details

This project applies machine learning algorithms to air-quality sensor data. A **Purple Air** device, which measures air quality and in particular particulate matter in the atmosphere, was used to collect data from September 2019 to January 2020. The data had been collected, but no one was making sense of it, which gave me the idea to dig into it and draw out insights.

More Repositories

1

WhatsappChatAnalysis

Visualize a WhatsApp group using Dash-Plotly.
Jupyter Notebook
4
star
2

Jumia-WebScrapper

This is a Python scraper that uses bs4 to scrape the Jumia online shop. It is one of the first projects I started after completing my Python fundamentals course from Chaptr Global and Data Quest.
Jupyter Notebook
2
star
3

DataminingPDF

This project shows how to mine PDFs and extract organized data for analysis.
Jupyter Notebook
2
star
4

GrokkingTheCoddingInterview

My step-by-step journey to mastering DS.
Python
1
star
5

Rohianon

1
star
6

BikeSharing

This is a collaborative machine learning project I worked on with Shraddah.
1
star
7

StreamLit_app

Covid19 Streamlit Application.
Python
1
star
8

LoanPrediction

This application predicts whether a customer's credit will be approved. It is deployed as a Streamlit web application.
HTML
1
star
9

FeatureScalingTechniques

Jupyter Notebook
1
star
10

ImageClassificationProjects

This is a collection of all the image classification projects I have done.
Jupyter Notebook
1
star
11

Tweeter-Mining-101.

A beginner's guide to text mining using R, drawing on the community's vast range of NLP packages such as tm and wordcloud.
Jupyter Notebook
1
star
12

Prohack-Challenge

“Beeep…Beeeep….Beeeep… Hooomans*, are you there?...”

This very strange transmission is coming from your narrowband radio signal receiver, pointed towards one of the farthest galaxies. It’s early morning, and you are sitting in your radio observatory high in the mountains. For the last 10 years you’ve been a Chief Data Scientist in one of the best astrophysics research teams in the world. You are enjoying a quiet time with a cup of coffee and reviewing the data reports from last night when this strange sound arrives. You almost spill your coffee in surprise. “Am I dreaming?” is your first thought as you move closer towards the speaker and listen…

“Beep…Beeeep….Beeeep… To all Hooomans who can hear us – we need your help”

You lean closer and grab a notebook and a pencil – you don’t really trust computers when it comes to such important tasks as taking notes from a radio transmission. You start recording everything that the strange voice from light years away is saying.

“… We need serious Data Science help and we know you Hooomans are the best at it…. We are an intergalactic species which has almost achieved singularity and the highest possible levels of development. We travel fast through space and explore other galaxies”

“The only essence that we consume is energy, measured in DSML units… Our populace is widespread and we live across many different star clusters and galaxies. What we need now is to optimize our well-being across all those galaxies… We have a lot of data but our computers and methods are too weak – we urgently need your data science knowledge to help us”

“Only two steps prevent us from achieving singularity:
· To understand what makes us better off. Our elders used the composite index to measure our well-being performance, but this knowledge has disappeared in the sands of time. Use our data and train your model to predict this index with the highest possible level of certainty.
· To achieve the highest possible level of well-being through optimized allocation of additional energy. We have discovered a star of an unusually high energy of 50000 zillion DSML. We have agreed between ourselves that
  · no one galaxy will consume more than 100 zillion DSML and
  · at least 10% of the total energy will be consumed by galaxies in need, with existence expectancy index below 0.7.
Think of our galaxies as your “countries” (or how you call them??) and our population as citizens. We have similar healthcare and well-being characteristics as you, Hooomans”

“We are sending all the data to you right now. Let the data be with you, Hoomans… … …”

Transmission suddenly ends. You put your notebook and pencil away and start thinking. You really want to help this species optimize their well-being. You open up Python and upload the dataset from the narrowband radio signal receiver. It will be another great day at the observatory today.

* The intergalactic species probably meant to say “humans” here, but we will never know for sure.

Description Data Received

The solutions are evaluated on two criteria: predicted future Index values and allocated energy from a newly discovered star.

1) Index predictions are evaluated using the RMSE metric.

2) Energy allocation is also evaluated using the RMSE metric and has a set of known factors that need to be taken into account.

Every galaxy has a certain limited potential for improvement in the index, described by the following function:

Potential for increase in the Index = -np.log(Index+0.01)+3

The likely index increase, which depends on the potential for improvement and on extra energy availability, is described by the following function:

Likely increase in the Index = extra energy * Potential for increase in the Index **2 / 1000

There are also several constraints:
· in total there are 50000 zillion DSML available for allocation;
· no galaxy at a point in time should be allocated more than 100 zillion DSML or less than 0 zillion DSML;
· galaxies with a low existence expectancy index (below 0.7) should be allocated at least 10% of the total energy available in the foreseeable future.

3) The leaderboard is based on a combined scaled metric: 80% prediction task RMSE + 20% optimization task RMSE * lambda, where lambda is a normalizing factor.

4) The leaderboard is 80% public and 20% private.

5) The submission should be in the following format:

Variable    Description
Index       Unique index from the test dataset, in ascending order
pred        Prediction for the index of interest
opt_pred    Optimal energy allocation
Jupyter Notebook
1
star
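
The two formulas and the allocation constraints quoted in the Prohack-Challenge description above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the repository: the function names, the constant names, and the tolerance parameter are assumptions made here, and the "at least 10%" rule is read as 10% of the full 50000 zillion DSML budget, which is one possible interpretation of the wording.

```python
import numpy as np

# Constants taken from the challenge text; the names are hypothetical.
TOTAL_ENERGY = 50_000.0    # zillion DSML available for allocation
MAX_PER_GALAXY = 100.0     # upper bound on energy for a single galaxy
LOW_EXPECTANCY = 0.7       # threshold on the existence expectancy index
LOW_SHARE = 0.10           # share reserved for galaxies below the threshold


def potential_for_increase(index):
    """Potential for increase in the Index = -np.log(Index + 0.01) + 3."""
    return -np.log(np.asarray(index, dtype=float) + 0.01) + 3


def likely_increase(extra_energy, index):
    """Likely increase = extra energy * potential ** 2 / 1000."""
    energy = np.asarray(extra_energy, dtype=float)
    return energy * potential_for_increase(index) ** 2 / 1000


def allocation_is_feasible(allocation, existence_expectancy, tol=1e-9):
    """Check a candidate allocation against the stated constraints.

    Assumption: the 10% rule is measured against TOTAL_ENERGY, not
    against the energy actually allocated.
    """
    allocation = np.asarray(allocation, dtype=float)
    expectancy = np.asarray(existence_expectancy, dtype=float)

    # Each galaxy receives between 0 and 100 zillion DSML.
    within_bounds = np.all(
        (allocation >= -tol) & (allocation <= MAX_PER_GALAXY + tol)
    )
    # The total allocated energy does not exceed the budget.
    within_budget = allocation.sum() <= TOTAL_ENERGY + tol
    # Galaxies below the expectancy threshold get at least 10% of the budget.
    low_total = allocation[expectancy < LOW_EXPECTANCY].sum()
    enough_for_low = low_total >= LOW_SHARE * TOTAL_ENERGY - tol

    return bool(within_bounds and within_budget and enough_for_low)
```

A check like `allocation_is_feasible` could be run on the `opt_pred` column before submitting, since an allocation that breaks a constraint is unlikely to score well on the optimization task however good the index predictions are.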