Neural Network Mushroom Classification


This network classifies mushrooms as edible, conditionally edible, or poisonous. An artificial neural network (NN) determines whether a given species of mushroom is poisonous based on the mushroom’s physical features. The project illustrates how to use a genetic algorithm for feature selection on the mushroom dataset, and compares the classification result with that of a standard neural network and that of a standard neural network whose inputs are compressed by an auto-associative network.

A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection and belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover, and selection. Here the GA is used with a hall-of-fame selection method, and it determines which subset of features has the best fitness as the input for mushroom type classification.

GAs were first proposed by John Holland as a means of finding good solutions to problems that were otherwise computationally intractable. Holland's Schema Theorem and the related building block hypothesis provided a theoretical and conceptual basis for the design of efficient GAs. GAs also proved straightforward to implement because of their highly modular nature. As a consequence, the field grew quickly and the technique was successfully applied to a wide range of practical problems in science, engineering, and industry. GA theory is an active and growing area, with a range of approaches being used to describe and explain phenomena not anticipated by earlier theory.
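The feature-selection loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the text does not give the population size, operator rates, or selection scheme, so the values below, the binary tournament, and the names run_ga and toy_fitness are all assumptions. In the real setting the fitness of a feature mask would be the accuracy of a classifier trained on the selected features. Here a made-up toy fitness stands in for it so the sketch runs on its own.

```python
import random

# Hypothetical stand-in for "train a classifier on the selected features and
# return its accuracy": features 0, 3 and 5 are treated as informative, and
# every selected feature costs a small penalty.
INFORMATIVE = (0, 3, 5)

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

def run_ga(fitness, n_features, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, hof_size=3, seed=0):
    """Evolve 0/1 feature masks; return the hall of fame as (score, mask), best first."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    hall_of_fame = []

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]

        # Hall of fame: keep the best distinct masks seen in any generation.
        seen = {tuple(mask) for _, mask in hall_of_fame}
        for score, mask in scored:
            if tuple(mask) not in seen:
                hall_of_fame.append((score, mask[:]))
                seen.add(tuple(mask))
        hall_of_fame.sort(key=lambda pair: pair[0], reverse=True)
        del hall_of_fame[hof_size:]

        def pick():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        # Hall-of-fame members are copied unchanged into the next generation.
        next_population = [mask[:] for _, mask in hall_of_fame]
        while len(next_population) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    return hall_of_fame

if __name__ == "__main__":
    for score, mask in run_ga(toy_fitness, n_features=8):
        print(round(score, 2), mask)
```

A mask such as [1, 0, 0, 1, 0, 1, 0, 0] means "use features 0, 3 and 5 only". Copying the hall of fame into each new generation guarantees that the best mask found so far is never lost to crossover or mutation.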


Data Set

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poison oak and ivy.

The mushroom dataset provides 8124 instances of mushrooms corresponding to 23 mushroom species, each described by 22 physical traits. These traits describe the shape, size, and color of different parts of the mushroom, such as the stalk, gill, and cap. Some features, such as odor and habitat, are also described. The values of these features are discrete labels, and each value is represented by a single-letter code, usually the first letter of the label, for example, bell = b. The dataset was chosen because it has a respectable number of instances and attributes. The neural network classifier is trained on the training set and tested on the testing set. Instances in the testing set are not fed into the classifier before the testing phase begins.
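Because the feature values are discrete letters, they have to be turned into numbers before a neural network can use them. One common choice is one-hot encoding. The sketch below shows it together with a held-out split, under stated assumptions: the text does not say which encoding or split ratio the project uses, the function names are hypothetical, and the sample rows are made up for illustration (three columns only) rather than taken from the dataset.

```python
import random

# Made-up rows with three single-letter columns; not real dataset records.
SAMPLE_ROWS = [
    ["b", "n", "g"],
    ["x", "f", "u"],
    ["x", "n", "g"],
    ["f", "a", "d"],
    ["b", "f", "u"],
]

def one_hot_encode(rows):
    """Encode rows of labels as 0/1 vectors with one slot per (column, label)."""
    n_cols = len(rows[0])
    # Sorted list of the distinct labels that occur in each column.
    vocab = [sorted({row[c] for row in rows}) for c in range(n_cols)]
    encoded = []
    for row in rows:
        vec = []
        for c, label in enumerate(row):
            vec.extend(1 if label == known else 0 for known in vocab[c])
        encoded.append(vec)
    return encoded, vocab

def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle indices once, then cut them into disjoint train and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_test = int(round(len(items) * test_fraction))
    test = [items[i] for i in indices[:n_test]]
    train = [items[i] for i in indices[n_test:]]
    return train, test

if __name__ == "__main__":
    encoded, vocab = one_hot_encode(SAMPLE_ROWS)
    train, test = split_train_test(encoded, test_fraction=0.2)
    print(vocab)
    print(len(train), "training rows,", len(test), "testing rows")
```

The split is made once, before any training, so the testing rows stay unseen until the testing phase.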


The mushroom dataset is retrieved from the UCI Machine Learning Repository (National Science Foundation). It contains descriptions of mushroom samples from 23 species of gilled mushrooms in the Agaricus and Lepiota family. In terms of edibility, each mushroom is labeled as either edible or poisonous.

The dataset has 22 features, and some groups of features are likely to be correlated. For example, stalk color above the ring and stalk color below the ring may be correlated, since both describe the stalk color of a given type of mushroom. With a genetic algorithm, one of these two properties could be pruned if a model trained and tested on only the other properties yields better accuracy and performance than a model that uses both as input features. Because the features are diverse and some of them are potentially correlated with each other, the dataset is well suited to feature selection with a genetic algorithm. This dataset was originally donated to the UCI Machine Learning Repository.
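One way to check whether two categorical features, such as the two stalk colors, carry overlapping information is Cramér's V, which ranges from 0 (no association) to 1 (one feature determines the other). The sketch below is only a diagnostic added for illustration. The text does not say the project computes it, the function name cramers_v is hypothetical, and the example columns are made up.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V between two equally long sequences of categorical labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)

    # Chi-squared statistic: compare observed pair counts with the counts
    # expected if the two features were independent.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

if __name__ == "__main__":
    # Made-up columns standing in for stalk color above and below the ring.
    above = ["w", "w", "p", "p", "g", "g"]
    below = ["w", "w", "p", "p", "g", "g"]
    print(cramers_v(above, below))
```

A value close to 1 suggests that one of the two features is redundant, which is the kind of pair a feature-selection run may prune.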


How it works

Please fill in the information about the mushroom and submit the form:
Cap Shape
Cap Color
Cap Surface
Gill Attachment
Gill Spacing
Gill Size
Gill Color
Stalk Shape
Stalk Root
Stalk Surface Above Ring
Stalk Surface Below Ring
Stalk Color Above Ring
Stalk Color Below Ring
Veil Type
Veil Color
Ring Number
Ring Type
Spore Print Color