OpenML: Open Machine Learning
Welcome to the OpenML GitHub page!
Who are we?
We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans.
What is OpenML?
Want to learn about OpenML or get involved? Please do, and get in touch if you have questions or comments!
- Getting started:
  - Check out the OpenML Website to get a first impression of what OpenML is.
  - The OpenML Documentation page gives a detailed introduction to OpenML's features, as well as to its different APIs and integrations, so that everyone can work with their favorite tool.
- How to contribute: https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md
- Citation and Honor Code: https://www.openml.org/terms
- Communication / Contact: https://github.com/openml/OpenML/wiki/Communication-Channels
OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem that you can readily integrate into your existing processes, code and environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.
As an open science platform, OpenML provides important benefits for the science community and beyond.
Benefits for Science
Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and analyze scientific data. Indeed, any shared idea, question, observation or tool may be noticed by someone who has just the right expertise to spark new ideas, answer open questions, reinterpret observations or reuse data and tools in unexpected new ways. Therefore, sharing research results and collaborating online as a (possibly cross-disciplinary) team enables scientists to quickly build on and extend the results of others, fostering new discoveries.
Moreover, ever larger studies become feasible as a lot of data are already available. Questions such as “Which hyperparameter is important to tune?”, “Which is the best known workflow for analyzing this data set?” or “Which data sets are similar in structure to my own?” can be answered in minutes by reusing prior experiments, instead of spending days setting up and running new experiments.
Benefits for Scientists
Scientists can also benefit personally from using OpenML. For example, they can save time, because OpenML assists in many routine and tedious duties: finding data sets, tasks, flows and prior results, setting up experiments and organizing all experiments for further analysis. Moreover, new experiments are immediately compared to the state of the art without always having to rerun other people’s experiments.
Another benefit is that linking one’s results to those of others has a large potential for new discoveries (see, for instance, Feurer et al. 2015; Post et al. 2016; Probst et al. 2017), leading to more publications and more collaboration with other scientists all over the world.
Finally, OpenML can help scientists to reinforce their reputation by making their work (published or not) visible to a wide group of people and by showing how often one’s data, code and experiments are downloaded or reused in the experiments of others.
Benefits for Society
OpenML also provides a useful learning and working environment for students, citizen scientists and practitioners. Students and citizen scientists can easily explore the state of the art and work together with top minds by contributing their own algorithms and experiments. Teachers can challenge their students by letting them compete on OpenML tasks or by reusing OpenML data in assignments. Finally, machine learning practitioners can explore and reuse the best solutions for specific analysis problems, interact with the scientific community or efficiently try out many possible approaches.
Get involved
OpenML has grown into quite a big project. We could use many more hands to help us out.
- You want to contribute?: Awesome! Check out our wiki page on how to contribute or get in touch. There may be unexpected ways in which you could help. We are open to any ideas.
- You want to support us financially?: YES! Getting funding through conventional channels is very competitive, and we are happy about every small contribution. Please send an email to [email protected]!
GitHub organization structure
OpenML's code is distributed across different repositories to simplify development. Please see their individual READMEs and issue trackers if you would like to contribute. These are the most important ones:
- openml/OpenML: The OpenML web application, including the REST API.
- openml/openml-python: The Python API, to talk to OpenML from Python scripts (including scikit-learn).
- openml/openml-r: The R API, to talk to OpenML from R scripts (including mlr).
- openml/java: The Java API, to talk to OpenML from Java programs.
- openml/openml-weka: The WEKA plugin, to talk to OpenML from the WEKA toolbox.
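To give a rough feel for what "talking to OpenML" through the REST API can look like, here is a minimal sketch using only the Python standard library. It rests on assumptions not stated in this README: that the public REST API serves JSON under `https://www.openml.org/api/v1/json`, that a dataset description lives at `/data/<id>`, and that the response wraps the description in a `data_set_description` object. The helper names are hypothetical and are not part of any OpenML client. For real work, the language-specific APIs listed above are the better choice.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the public REST API (JSON flavour).
BASE_URL = "https://www.openml.org/api/v1/json"


def dataset_url(dataset_id: int) -> str:
    """Build the URL of a dataset description (hypothetical helper)."""
    return f"{BASE_URL}/data/{int(dataset_id)}"


def parse_description(payload: str) -> dict:
    """Extract the description object from a JSON response body.

    Assumption: the description is wrapped in "data_set_description".
    """
    return json.loads(payload)["data_set_description"]


def fetch_description(dataset_id: int, timeout: float = 30.0) -> dict:
    """Download and parse the description of one dataset (needs network access)."""
    with urlopen(dataset_url(dataset_id), timeout=timeout) as response:
        return parse_description(response.read().decode("utf-8"))
```

Calling `fetch_description(61)` (the ID is only an example) would download the metadata of that dataset as a dictionary. Keeping URL construction and parsing separate from the network call makes both easy to check offline.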