Data mining is the process of extracting valuable, usable information from large data sets; in other
words, it turns raw data into useful information. Data mining is now used in almost every field. It
automatically searches large volumes of data to identify relationships, patterns, and trends that can
help companies formulate future strategies. It is especially useful for companies with a large
customer base, which can use it to draw inferences about customer behavior and preferences. Data
mining uses mathematical algorithms to segment the data, which makes it possible to analyze the data
and evaluate the possible alternatives. Data mining is also commonly referred to as Knowledge
Discovery in Databases (KDD).
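As an illustration of segmenting data with a mathematical algorithm, the sketch below groups customers with k-means clustering, one common segmentation algorithm chosen here only as an example. The customer records (annual spend, number of visits), the choice of two segments, and the function name kmeans are all hypothetical, and the deterministic initialization is a simplification for readability.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm (a minimal sketch)."""
    # Simplified, deterministic initialization: the first k points are the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers as (annual spend, visits per year).
customers = [(120, 2), (900, 25), (150, 3), (880, 30), (100, 1), (950, 28)]
labels, centroids = kmeans(customers, k=2)
```

Here the customers split into an occasional, low-spend segment and a frequent, high-spend segment; real data would usually be scaled first so that no single variable dominates the distance.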
Process of Data Mining
The data mining process consists of the following steps:
Problem Definition
Data Gathering and Preparation
Model Building and Evaluation
Knowledge Deployment
Problem Definition involves understanding the objectives and purpose of the project in order to
define a path forward.
Data Gathering and Preparation involves collecting the data and exploring the variables identified
while defining the purpose of the project, so that the data is ready for modeling.
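A minimal sketch of this step, assuming the relevant variables have already been chosen: records missing a value for any of them are dropped, and each variable is then summarized for exploration. The records, the variable names age and monthly_spend, and the helpers prepare and explore are hypothetical; dropping incomplete records is only one simple preparation strategy (filling in missing values is a common alternative).

```python
from statistics import mean, median

# Hypothetical raw customer records; None marks a missing value.
raw_records = [
    {"age": 34, "monthly_spend": 120.0},
    {"age": None, "monthly_spend": 80.0},
    {"age": 45, "monthly_spend": None},
    {"age": 29, "monthly_spend": 95.0},
    {"age": 52, "monthly_spend": 210.0},
]

def prepare(records, variables):
    """Keep only the records that have a value for every variable of interest."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

def explore(records, variables):
    """Summarize each variable so its range and typical value can be checked."""
    summary = {}
    for v in variables:
        values = [r[v] for r in records]
        summary[v] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
        }
    return summary

variables = ["age", "monthly_spend"]
clean = prepare(raw_records, variables)
summary = explore(clean, variables)
```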
Model Building and Evaluation is the central step of data mining. It involves selecting and applying
various modeling techniques and calibrating their parameters to optimal values. This step can also
reduce the data to a smaller set, which is easier to work with. Choosing the right algorithm for the
task is key. For example, association rules are created by analyzing the data for frequent patterns
and relationships. Other data mining techniques include sequence (or path) analysis, clustering, and
forecasting.
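The association-rule idea can be sketched with two standard measures: the support of a rule (the fraction of transactions containing both items) and its confidence (how often the right-hand item appears in transactions that contain the left-hand item). This is a brute-force sketch limited to item pairs, not the Apriori algorithm or any particular tool's implementation; the shopping-basket transactions, the thresholds, and the function name association_rules are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

def association_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    # How many transactions contain each item, and each unordered pair of items.
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough to produce a rule
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = association_rules(transactions)
```

With these baskets, the rule butter -> bread has confidence 1.0 because every basket containing butter also contains bread, while butter and milk appear together often enough to be frequent but not consistently enough to pass the confidence threshold.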
Knowledge Deployment is the final step of data mining. The results are applied to the real-world
situation, and the models can also be applied to new data.
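To illustrate applying a model to new data, the sketch below fits a simple forecasting model (a least-squares linear trend) to historical values and then uses the fitted model to predict values for periods it has not seen. The monthly sales figures and the helper names fit_linear_trend and predict are hypothetical; a real deployment would typically also store the fitted model and monitor its predictions over time.

```python
def fit_linear_trend(ys):
    """Least-squares fit of y = intercept + slope * t for time steps t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    # Slope = covariance(t, y) / variance(t); the line passes through the means.
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return intercept, slope

def predict(model, ts):
    """Apply a fitted (intercept, slope) model to new time steps."""
    intercept, slope = model
    return [intercept + slope * t for t in ts]

# Hypothetical monthly sales for months 0-3 (model building on historical data).
history = [100, 110, 120, 130]
model = fit_linear_trend(history)

# Deployment: apply the same model to months 4 and 5, which were not in the history.
forecast = predict(model, [4, 5])
```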
Some of the software tools available for data mining:
Weka
RapidMiner
Orange
KNIME
DataMelt
Rattle
By
Divya Bisht