Tuesday, April 23, 2013

Some artificial datasets for machine learning

Prompted by a question on StackOverflow, I wrote a set of Matlab functions that generate a variety of artificial datasets for testing machine learning methods. Which classifier does well on the Two Spirals problem? What causes some classifiers to fail on this or that problem? It's often useful to have datasets on hand that are challenging for an algorithm even though the pattern is quite clear to a human, and artificial datasets provide just that.

Below is some example output of the six functions with default parameters. They can all be customized with regard to the number of instances, noise, scale, etc. You can download the functions (including a demo) from my website.
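To give a rough idea of what such a generator involves, here is a minimal sketch of a Two Spirals dataset. It is not the Matlab code from the download: it is written in plain Python so it runs without any toolbox, and the function name `two_spirals` and the parameters `n`, `noise`, `turns`, `scale` and `seed` are my own assumptions, chosen only to mirror the kind of customization described above.

```python
import math
import random


def two_spirals(n=500, noise=0.03, turns=1.5, scale=1.0, seed=None):
    """Illustrative two-spirals generator (hypothetical API, not the Matlab one).

    Returns (points, labels): a list of (x, y) tuples and a list of 0/1 labels.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n):
        label = i % 2  # alternate classes so both arms get about n/2 points
        # Taking the square root puts fewer points near the crowded center.
        t = math.sqrt(rng.random())
        radius = t * scale
        # The second arm is the first one rotated by 180 degrees.
        angle = t * turns * 2.0 * math.pi + math.pi * label
        # Gaussian jitter; noise is a standard deviation in the same units as scale.
        x = radius * math.cos(angle) + rng.gauss(0.0, noise)
        y = radius * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x, y))
        labels.append(label)
    return points, labels
```

Calling `two_spirals(n=1000, noise=0.05)` returns 1000 points split evenly between the two arms. Raising `noise` blurs the gap between the arms, and raising `turns` winds them more tightly; both make the problem harder for a classifier while the pattern stays obvious to the eye.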