Data Mining
Data mining is often used as a synonym for Knowledge Discovery in Databases (KDD), which refers to the nontrivial extraction of implicit, previously unknown, and potentially useful information from data stored in databases.
Strictly speaking, however, Data Mining is only one step within the overall KDD process. The application's goal determines which of two major Data Mining goals applies: verification or discovery. Verification tests a hypothesis the user already has about the data, while discovery automatically finds new, interesting patterns.
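To make the two goals concrete, here is a minimal Python sketch over a small, made-up set of shopping baskets. The item names, the use of rule confidence as the test, and the 0.7 threshold are illustrative assumptions rather than a standard method: verification checks one rule the user proposes, while discovery enumerates every item pair and reports all rules that pass the same threshold.

```python
from itertools import combinations

# Hypothetical transaction data, for illustration only.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


def verify(antecedent, consequent, min_conf, transactions):
    """Verification: test a single hypothesis supplied by the user."""
    return confidence(antecedent, consequent, transactions) >= min_conf


def discover(min_conf, transactions):
    """Discovery: search every item pair for rules nobody asked about."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):  # try the rule in both directions
            c = confidence(x, y, transactions)
            if c >= min_conf:
                rules.append((x, y, c))
    return rules


print(verify("bread", "butter", 0.7, TRANSACTIONS))
print(discover(0.7, TRANSACTIONS))
```

Here the user's hypothesis "bread implies butter" holds (confidence 0.75), and discovery additionally surfaces rules such as "jam implies bread" and "milk implies butter" that the user never stated.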
There are four major data mining tasks: clustering, classification, regression, and association (summarization). Clustering identifies groups of similar records in data that has no predefined labels. Classification learns rules that can be applied to assign new data to known classes. Regression finds a function that models the data with minimal error. Association looks for relationships between variables. Next, a specific data mining algorithm is selected; depending on the goal, this might be linear regression, logistic regression, a decision tree, or Naive Bayes. The algorithm then searches for patterns of interest in one or more symbolic forms. Finally, the resulting models are evaluated by their predictive accuracy or their understandability.
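As a rough illustration of two of these tasks, the sketch below implements plain k-means (Lloyd's algorithm) for clustering one-dimensional points and an ordinary least-squares line fit for regression. The function names, the toy data, and the choice of k-means as the clustering method are assumptions made for this example; in practice you would normally use a tested library.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Clustering: plain k-means (Lloyd's algorithm) on 1-D data from given initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = a + b*x (minimizes the sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], [0.0, 10.0])
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(centroids, clusters)
print(intercept, slope)
```

On this toy input, k-means separates the points into two groups with centroids 1.5 and 8.5, and the least-squares fit recovers the line y = 1 + 2x.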
KDD Process Steps
The knowledge discovery in databases (KDD) process includes the following steps:
- Goal identification: Develop an understanding of the application domain and the relevant prior knowledge, and identify the goal of the KDD process from the customer's perspective.
- Creating a target data set: Selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
- Data cleaning and preprocessing: Basic operations include removing noise where appropriate, collecting the information needed to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.
- Data reduction and projection: Finding useful features to represent the data, depending on the goal of the task. Dimensionality reduction or transformation methods can reduce the effective number of variables under consideration, or invariant representations of the data can be found.
- Matching process objectives: Matching the goals of the KDD process defined in step 1 to a particular data mining method, for example summarization, classification, regression, or clustering.
- Modeling and exploratory analysis and hypothesis selection: Choosing the data mining algorithm(s) and selecting the method or methods to be used to search for data patterns. This includes deciding which models and parameters may be appropriate (e.g., models for categorical data are different from models for vectors over the real numbers) and matching a particular data mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than in its predictive capabilities).
- Data Mining: Searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data mining method by carrying out the preceding steps properly.
- Presentation and evaluation: Interpreting the mined patterns, possibly returning to any of steps 1 through 7 for further iterations. This step may also involve visualizing the extracted patterns and models, or visualizing the data given the extracted models.
- Taking action on the discovered knowledge: Using the knowledge directly, incorporating it into another system for further action, or simply documenting it and reporting it to stakeholders. This step also includes checking for and resolving potential conflicts with previously believed (or previously extracted) knowledge.
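The steps above can be traced end to end in a deliberately tiny Python sketch. Everything in it is hypothetical: the customer records, the field layout, dropping incomplete rows as the cleaning strategy, and a one-level decision tree (a "decision stump" on a single threshold) as the mining algorithm. Evaluation uses predictive accuracy on records held out from training.

```python
# Hypothetical raw records: (customer_id, age, income, bought); None marks a missing field.
RAW = [
    (1, 25, 30_000, 0),
    (2, 47, 82_000, 1),
    (3, 35, None, 0),
    (4, 52, 95_000, 1),
    (5, 23, 28_000, 0),
    (6, 41, 70_000, 1),
    (7, 30, 40_000, 0),
    (8, 58, 99_000, 1),
]


def create_target(records):
    """Creating a target data set: drop the identifier, keep candidate variables."""
    return [(age, income, label) for (_cid, age, income, label) in records]


def clean(rows):
    """Data cleaning: in this sketch, simply discard rows with a missing field."""
    return [r for r in rows if None not in r]


def project(rows, feature_index):
    """Data reduction and projection: keep a single feature plus the label."""
    return [(r[feature_index], r[-1]) for r in rows]


def learn_stump(pairs):
    """Data mining: pick the threshold t for 'label = 1 if x >= t' with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t, _ in pairs:
        acc = sum((x >= t) == bool(y) for x, y in pairs) / len(pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def evaluate(t, pairs):
    """Evaluation: predictive accuracy of the learned rule on held-out pairs."""
    return sum((x >= t) == bool(y) for x, y in pairs) / len(pairs)


rows = clean(create_target(RAW))        # 7 complete rows remain
pairs = project(rows, feature_index=1)  # keep income only
train, test = pairs[:5], pairs[5:]
threshold = learn_stump(train)
accuracy = evaluate(threshold, test)
print(f"rule: bought if income >= {threshold}; held-out accuracy = {accuracy:.2f}")
```

On this toy data the learned rule is "bought if income >= 70000", which classifies both held-out records correctly; with so few records, that score says nothing about real-world performance. Had it been poor, the process would loop back to an earlier step, for example projecting onto age instead of income.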
Difference between KDD and Data Mining
Although the terms KDD and Data Mining are often used interchangeably, they refer to two related but slightly different concepts.
KDD is the overall process of extracting knowledge from data, while Data Mining is a step inside the KDD process, which deals with identifying patterns in data.
Data Mining is only the application of a specific algorithm, chosen according to the overall goal of the KDD process.
KDD is an iterative process: evaluation measures can be enhanced, mining can be refined, and new data can be integrated and transformed to obtain different and more appropriate results.