A guide to building awesome machine learning projects.

Guide to Awesome Machine Learning Projects

As a content creator and educator, I am constantly looking for awesome, useful projects to share with the broader community. I am not the only one doing this. Lots of people share fun projects that they find interesting and useful. This is how projects go viral and gain visibility. From my observation, a few components make certain machine learning projects stand out from the rest. If your goal is to build a portfolio or create impactful and unique projects for the community, here are a few areas you can focus on to make your projects compelling.

Purpose

Building projects is sometimes the easy part. Creating strong messaging around a project is perhaps the most difficult part, given the large number of projects fighting for attention these days. One of the first things you should do before starting a machine learning project is to identify what makes it impactful and unique, and what its main purpose really is. This could be a well-written impact statement or just a short explanation of why the project matters. Is the project just about educating others about a particular machine learning method/feature? Or is it more specific, like solving a challenging and unique problem using a new technique? Tell your audience about the purpose of your project. Build that connection and motivate your project. Build good messaging around it. You are not selling; you are informing and educating.

Usability

I like projects that are usable and quickly accessible. What does this mean? Imagine you have developed a new text classification approach and want others to better understand how useful it is. An example notebook with hundreds of lines of code is probably not the most usable and accessible form for it. Instead, provide not only the notebook but also a complete library that others can easily install on their computers to explore your project. Python allows you to do this easily, but other languages work just as well. Make sure to provide instructions on how to use the project/library (we will talk more about this in an upcoming section). In fact, I implore you to be more ambitious and create an online demo to accompany the project. Later on, I will talk about visibility and how demos can help. These tips all go hand in hand. The easier you make it for someone to use your project, the quicker they find out how impactful and useful it is. Quick adoption is a huge return on your investment.
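As a minimal sketch of what "a library rather than just a notebook" can look like, the example below wraps a toy text classifier behind one importable function and a small command-line entry point. The names `classify`, `KEYWORDS`, and `main` are hypothetical, and the keyword rule is only a stand-in for whatever model your project actually ships.

```python
"""toy_classifier: a hypothetical single-module library."""
import argparse

# Assumption: a keyword lookup stands in for a real trained model.
KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "tech": {"python", "model", "gpu"},
}


def classify(text: str) -> str:
    """Return the label whose keywords overlap most with the text, or "unknown"."""
    tokens = set(text.lower().split())
    best_label, best_score = "unknown", 0
    for label, words in KEYWORDS.items():
        score = len(tokens & words)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def main(argv=None) -> int:
    """Command-line entry point so users can try the project without writing code."""
    parser = argparse.ArgumentParser(description="Classify a piece of text.")
    parser.add_argument("text", help="text to classify")
    args = parser.parse_args(argv)
    print(classify(args.text))
    return 0
```

With a layout like this, a user can call `classify("The team scored a goal")` after importing the module, or run `main(["some text"])` from a script, instead of reading through a long notebook to find the one cell that matters.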

Accessibility

Not only should your project be usable to stand out, it also has to be highly accessible to be successful. What do I mean by that? One good example is the online demo I mentioned earlier, which makes it easy for others to access your project. But there are other important things you should be thinking about. Very often we ignore the fact that not all users have the same means or ways to access our projects. Think about other ways to make your project more accessible. Things like translations, metrics, visualizations, and audio recordings are also important to consider. For instance, some users may not be comfortable reading what your project is about (maybe because of a disability or a lack of technical expertise), so you could record an audio/video clip that briefly and clearly explains your project. The more you increase the accessibility of your project, the more potential it has to become highly impactful and gain the visibility you want.

Uniqueness

Nowadays, it is simply not enough to build a useful project that users find interesting to play with for a few minutes. If you want your project to stick, you should focus from the start on a unique problem that your project aims to solve. This should already be clear if you addressed the “Purpose” section of this guide. There are so many similar projects that it is really hard for yours to stand out. For instance, I cannot tell you how many image classifiers I have come across, potentially thousands of them. I am always looking for a surprise factor in these projects. If I came across an image classifier that offers interpretability features, that’s something I would be willing to explore a bit further, since there are not so many of these online. Ideally, you want to set your project objectives before starting and make sure to conduct extensive research to identify the key and unique ways it contributes to the community.

Presentation

One of the main problems with machine learning projects these days is that developers forget to address presentation. I think it’s an easily missed opportunity. You should always be thinking about how you present your project to an audience. In addition to all the tips I have discussed so far, you need to think about how you want to package and present your projects. For instance, if you are publishing your project on GitHub, which you should definitely do, you can improve its presentation by including a clean, clear, and concise README file. I am not exaggerating when I say that the majority of machine learning projects I come across put little or no effort into presentation, and in fact don’t even include a README. That’s bad! It doesn’t say good things about the seriousness and professionalism you are trying to convey with your projects. I may be going out on a limb here, but most of the successful machine learning projects I have come across have excellent, well-written README files, along with other touches that improve the presentation of the project.
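One way to make the README advice concrete is a small self-check before you publish. The sketch below scans README text for a few headings and reports which ones are missing. The section names in `RECOMMENDED_SECTIONS` and the Markdown-style `#` headings are assumptions chosen for illustration, not a standard.

```python
import re

# Assumption: an illustrative checklist; adjust it to your own project.
RECOMMENDED_SECTIONS = ("installation", "usage", "contributing")


def missing_readme_sections(readme_text: str) -> list[str]:
    """Return the recommended sections that have no matching heading."""
    # Collect heading titles from lines such as "## Usage".
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}\s+(.+)$", readme_text, flags=re.MULTILINE)
    ]
    return [
        section
        for section in RECOMMENDED_SECTIONS
        if not any(section in heading for heading in headings)
    ]
```

An empty result only means the headings exist; whether the README is actually clean, clear, and concise still takes a human read.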

Maintenance/Contributions

The truth of the matter is that the majority of machine learning projects eventually die. Your goal is to make your projects interesting enough that others start to care about their sustainability. Only the best projects survive, and you just never know where yours will take you. With so many open-source enthusiasts out there, there is a good opportunity to attract collaborators to help keep building and maintaining your project. Make sure you provide information about maintenance cycles and future improvements. Try to provide guidance on how others can contribute to your projects, even if it is just to improve a certain function. Try not to ask for minor improvements like editing your README file. This doesn't encourage good practice in the community. Ideally, you want to provide guidance about the major improvements needed, such as optimizing the speed at which data is read.

When I think about maintenance, I think you should not only provide regular updates about your projects but also help the community by responding to issues and questions. Typically, when I find a project that was last modified 5 months ago and has several unanswered open issues, this tells me a lot about its maintenance and projected sustainability. I will think hard about sharing a project like this because it’s probably outdated already. If you think it makes sense, create a free Slack or Discord group where people can reach out and ask questions directly.
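The rule of thumb above can be written down as a quick check you run against your own project. This is only a sketch: the function name `looks_unmaintained` and both thresholds are assumptions, with 150 days approximating "5 months" and 3 standing in for "several".

```python
from datetime import date


def looks_unmaintained(
    last_updated: date,
    today: date,
    unanswered_issues: int,
    max_age_days: int = 150,  # assumed approximation of "5 months"
    max_unanswered: int = 3,  # assumed reading of "several"
) -> bool:
    """Flag a project that is both stale and has several unanswered open issues."""
    age_days = (today - last_updated).days
    return age_days >= max_age_days and unanswered_issues >= max_unanswered
```

Both conditions must hold, so a quiet but finished project with no open questions is not flagged.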

Documentation

Documentation is a huge part of the messaging and packaging of your project. What’s the point of publishing a project if there are no instructions on how to use it? Given all the sections I discussed before, at this point you start to notice a pattern. Messaging is huge! It’s not easy. You have to be clear and concise in your messaging. People looking for interesting projects spend less than 30 seconds on your project, and if they don’t see neat documentation or something else that hooks them, it’s sad news for you and your project. Even if you consider your project to be a small one, you should think about how you expect others to use it and provide guidance around that. For example, if you have built a complete Python library, try to provide clear and easy examples of how to use the library, including how to install it, how to run it, and what the expected inputs/outputs look like. If you are building an API, you need to clearly explain all the functionalities and behaviors. In some cases, you may even need to provide a documentation website, but for most small projects this is probably not necessary. Regardless, you should definitely consider full examples that guide the user from start to finish. In my opinion, notebooks are great, but they are not a good way to provide documentation for your machine learning projects.
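One lightweight way to document expected inputs and outputs in a Python library is to put runnable examples in the docstring and check them with the standard `doctest` module. The sketch below assumes a hypothetical helper, `normalize_label`; the helper itself is not important, only the pattern of pairing each input with the output a user should expect.

```python
import doctest


def normalize_label(label: str) -> str:
    """Convert a raw label into a lowercase, underscore-separated name.

    Input: any string, possibly with extra whitespace or mixed case.
    Output: the cleaned label.

    >>> normalize_label("  Very Positive ")
    'very_positive'
    >>> normalize_label("NEUTRAL")
    'neutral'
    """
    return "_".join(label.strip().lower().split())


def run_documented_examples() -> int:
    """Run the docstring examples above and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        normalize_label.__doc__,
        {"normalize_label": normalize_label},  # names visible to the examples
        "normalize_label",
        None,
        0,
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed
```

Because the examples are executed, the documentation cannot silently drift away from what the code does, which is exactly the kind of trust a first-time visitor needs.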

Searchability/Visibility

Not only do we want our machine learning projects to stand out, but we also want them to be easily accessible and searchable. The great thing about the internet is that there are many easy ways to build more visibility for your project. Besides making your projects more presentable, think about ways you can improve their searchability/visibility. You can share a GitHub repo with your friends in a group chat or Slack group. Just make sure you have a great README and have already thought about and addressed all of the components I wrote about here before sharing your project. Write a nice blog post about your project and publish it. Share on websites like Reddit, Made with ML, Hacker News, and Twitter. The more places you share your projects, the more visibility you give them, and the more searchable/visible they become.

That’s it! Hope you find this guide helpful. I am going to regularly maintain it as I come across more ideas on how to improve your machine learning projects. I also welcome any feedback (just open an issue). Feel free to fork this repo and use this guide as a checklist for your next big machine learning project. Wish you all the best!

If you wish to hear more about my advice and tips, including different ML-related guides and topics, connect with me on Twitter or follow my blog.


How can you contribute to this guide?

  • Add more components that, in your experience, help projects stand out
  • It would be great to add more examples to each section
