Data mining classification is the process of grouping items based on key characteristics. Techniques include nearest neighbor classification, decision tree learning, naive Bayes classification, neural networks, and support vector machines. Naive Bayes classification is based on probability, neural networks are loosely modeled on the human brain, and support vector machines separate categories with a boundary that can be pictured on a scatterplot. Beyond classification, data mining also uses clustering, regression, and rule learning.
Data mining classification is a stage in the data mining process. It is used to group items based on certain key characteristics. There are several techniques used for data mining classification, including nearest neighbor classification, decision tree learning, and support vector machines.
Data mining is a method used by researchers to extract patterns from data. Typically a representative sample is chosen from the data pool and then manipulated and analyzed to find patterns. In addition to data mining classification, researchers can also use clustering, regression, and rule learning to analyze data.
There are several algorithms that can be used in data mining classification. Nearest neighbor classification is one of the simplest. It relies on a training set, which is a collection of examples whose correct groups are already known. To classify a new item, the computer finds the training example closest in value to that item and assigns the item to the same group.
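As a rough illustration of the nearest neighbor idea, here is a minimal sketch in Python. The training set, its two measurements per item, and the group names are all hypothetical, and "closest in value" is assumed to mean straight-line (Euclidean) distance, which the article does not specify.

```python
import math

def nearest_neighbor(training_set, point):
    """Return the group of the training example closest to `point`."""
    _, closest_label = min(
        training_set,
        key=lambda example: math.dist(example[0], point),  # Euclidean distance (assumed)
    )
    return closest_label

# Hypothetical training set: (height_cm, weight_kg) -> known group
TRAINING_SET = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

print(nearest_neighbor(TRAINING_SET, (152.0, 52.0)))  # small
print(nearest_neighbor(TRAINING_SET, (178.0, 80.0)))  # large
```

Practical versions usually look at several neighbors and take a vote, but the single-neighbor case is enough to show the principle.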
Decision tree learning uses a branching model to classify data. The computer asks a series of questions about each data item. If the answer to the first question is true, question 2a is asked. If the answer is false, question 2b is asked. When mapped out, this method forms a tree of branching paths, with each path ending in a classification.
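The branching can be sketched in a few lines of Python. Note the assumptions: this tree is written by hand purely for illustration, whereas decision tree learning would choose the questions automatically from training data, and the questions and class names are hypothetical.

```python
def classify(node, item):
    """Follow the 'yes' or 'no' branch at each question until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if node["question"](item) else "no"
        node = node[branch]
    return node  # a leaf is simply the class name

# Hypothetical hand-built tree (a learned tree would have the same shape)
TREE = {
    "question": lambda animal: animal["has_feathers"],        # question 1
    "yes": {
        "question": lambda animal: animal["can_fly"],         # question 2a
        "yes": "flying bird",
        "no": "flightless bird",
    },
    "no": {
        "question": lambda animal: animal["lives_in_water"],  # question 2b
        "yes": "fish",
        "no": "land animal",
    },
}

penguin = {"has_feathers": True, "can_fly": False, "lives_in_water": False}
print(classify(TREE, penguin))  # flightless bird
```

Which second question gets asked depends entirely on the answer to the first, which is what gives the model its tree shape.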
Naive Bayes classification is based on probability. It asks a series of questions about each data item and then uses the answers to determine the probability that the item belongs to a particular classification. This differs from decision tree learning because the answer to the first question doesn’t influence which question will be asked next. Every question is asked of every item, and each answer is treated as independent of the others.
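The probability-based approach above might look something like the following sketch. The example messages, the yes/no questions, and the class names are hypothetical, and the add-one smoothing (which keeps an answer never seen in training from forcing a probability to zero) is an addition not mentioned in the article.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count each class and, per class, how often each answer was given."""
    class_counts = Counter(label for _, label in examples)
    answer_counts = defaultdict(Counter)  # (label, question) -> Counter of answers
    for answers, label in examples:
        for question, answer in answers.items():
            answer_counts[(label, question)][answer] += 1
    return class_counts, answer_counts

def class_probabilities(model, answers):
    """Return the probability of each class given the answers."""
    class_counts, answer_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # how common the class is overall
        for question, answer in answers.items():
            # Add-one smoothing, assuming every question has two possible answers
            score *= (answer_counts[(label, question)][answer] + 1) / (count + 2)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

# Hypothetical training examples: answers to two yes/no questions -> class
EXAMPLES = [
    ({"has_link": True, "known_sender": False}, "spam"),
    ({"has_link": True, "known_sender": False}, "spam"),
    ({"has_link": False, "known_sender": True}, "not spam"),
    ({"has_link": True, "known_sender": True}, "not spam"),
]

model = train(EXAMPLES)
print(class_probabilities(model, {"has_link": True, "known_sender": False}))
```

Every question contributes to the score of every class, and the order in which the questions are asked makes no difference to the result.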
More complicated methods of data mining classification include neural networks and support vector machines. These methods are computer-based models that would be difficult to work through by hand. Neural networks are often used in artificial intelligence programming because they are loosely modeled on the human brain. A neural network passes information through a series of connected nodes that detect patterns and then classify the information.
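To make the idea of passing information through nodes concrete, here is a very small sketch. The weights are picked by hand so the result can be checked, whereas a real neural network learns its weights from training data, and the on/off step activation is a simplifying assumption. The network answers whether exactly one of two inputs is switched on.

```python
def step(value):
    """A node 'fires' (outputs 1) only when its total input is positive."""
    return 1 if value > 0 else 0

def node(inputs, weights, bias):
    """Weigh each input, add them up with the bias, and decide whether to fire."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def tiny_network(x1, x2):
    # First layer: two nodes that each detect a simple pattern
    either_on = node([x1, x2], [1.0, 1.0], -0.5)  # fires if at least one input is 1
    both_on = node([x1, x2], [1.0, 1.0], -1.5)    # fires only if both inputs are 1
    # Output node combines the patterns: "either, but not both"
    return node([either_on, both_on], [1.0, -1.0], -0.5)

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, tiny_network(*inputs))
# (0, 0) 0
# (0, 1) 1
# (1, 0) 1
# (1, 1) 0
```

No single node can make this classification alone. It only becomes possible when the output node combines the patterns found by the first layer.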
Support vector machines use training samples to build a model that categorizes information. The model is usually pictured as a scatterplot in which a boundary separates the categories with as wide a gap as possible. As new information is entered into the machine, it is plotted on the graph and assigned to the category on whose side of the boundary it falls. In its basic form, this method distinguishes between only two categories. Problems with more categories are usually handled by combining several two-category machines.
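The sketch below shows one way such a model could be built for two categories. It is an illustration under stated assumptions, not a definitive implementation: the training samples are hypothetical and centered on the origin to keep the example short, the boundary is assumed to be a straight line, and it is fitted by sub-gradient descent on the hinge loss, which is one common approach among several. The function names and parameter values are likewise illustrative.

```python
def train_linear_svm(samples, learning_rate=0.01, regularization=0.01, epochs=200):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:  # label is +1 or -1
            margin = label * (w[0] * x1 + w[1] * x2 + b)
            # Regularization shrinks w, which widens the gap between categories
            w = [wi * (1 - learning_rate * regularization) for wi in w]
            if margin < 1:
                # Sample is inside the gap or on the wrong side: move the boundary
                w[0] += learning_rate * label * x1
                w[1] += learning_rate * label * x2
                b += learning_rate * label
    return w, b

def predict(model, point):
    """Report which side of the boundary the new point falls on."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Hypothetical training samples: two clusters on opposite sides of the origin
SAMPLES = [
    ((-2.0, -1.0), -1), ((-1.0, -2.0), -1), ((-2.0, -2.0), -1),
    ((2.0, 1.0), 1), ((1.0, 2.0), 1), ((2.0, 2.0), 1),
]

model = train_linear_svm(SAMPLES)
print(predict(model, (-3.0, -3.0)))  # -1
print(predict(model, (3.0, 3.0)))    # 1
```

Only the samples that sit inside the gap or on the wrong side of the boundary change the model during training, which is why the finished boundary is determined by the samples nearest to it.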