Issues in Integrating a Data Mining System with a Data Warehouse

 

Data Mining System Architecture

A Data Mining system can be integrated with a Database or Data Warehouse system using one of these methods:
  • No Coupling
  • Loose Coupling
  • Semi-Tight Coupling
  • Tight Coupling 

No Coupling

No coupling means that a Data Mining system does not use any function of a Database or Data Warehouse system.

It may fetch data from a particular source (such as a file system), process data using some data mining algorithms, and then store the mining results in another file.
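As a rough illustration of this file-in, file-out flow, the sketch below reads transactions from a CSV file, counts item occurrences, and writes the result to another file. The file names, the sample data, and the function mine_item_counts are assumptions made up for this example; item counting stands in for whatever mining algorithm is actually used.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def mine_item_counts(in_path: Path, out_path: Path) -> Counter:
    """Count how many transactions contain each item (hypothetical helper)."""
    counts = Counter()
    with in_path.open(newline="") as f:
        for row in csv.reader(f):
            # With no coupling, cleaning is the mining program's own job:
            # trim whitespace, normalize case, drop empty fields.
            items = {item.strip().lower() for item in row if item.strip()}
            counts.update(items)
    # Results go to another plain file, not to a database.
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        for item, n in sorted(counts.items()):
            writer.writerow([item, n])
    return counts


# Made-up sample data, written to a temporary directory.
workdir = Path(tempfile.mkdtemp())
src = workdir / "transactions.csv"
dst = workdir / "item_counts.csv"
src.write_text("milk,bread\nbread, Butter\nmilk,bread,butter\n")

item_counts = mine_item_counts(src, dst)
print(dst.read_text())
```

Everything here, from locating the file to cleaning the values, is done by the mining program itself, which is where the drawbacks of this approach come from.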

Drawbacks:

First, a Database/Data Warehouse system provides a great deal of flexibility and efficiency in storing, organizing, accessing, and processing data.

Without such a system, a Data Mining system may spend a substantial amount of time finding, collecting, cleaning, and transforming data.

Second, Database and Data Warehouse systems already implement many tested, scalable algorithms and data structures. A Data Mining system with no coupling cannot reuse them and has to provide its own.
 

Loose Coupling

Loose coupling means that a Data Mining system uses some facilities of a Database or Data Warehouse system: it fetches data from a data repository managed by these systems, performs data mining, and then stores the mining results either in a file or in a designated place in a Database or Data Warehouse.

Loose coupling is better than no coupling because the Data Mining system can fetch any portion of the data stored in Databases or Data Warehouses by using query processing, indexing, and other system facilities.
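A minimal sketch of this fetch, mine, store-back cycle, using Python's built-in sqlite3 module as a stand-in for the Database. The table names purchases and pair_counts, the sample rows, and the choice of pair counting as the mining step are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "milk", "north"), (1, "bread", "north"),
        (2, "milk", "north"), (2, "bread", "north"), (2, "butter", "north"),
        (3, "milk", "south"), (3, "butter", "south"),
    ],
)

# 1. The database selects the portion of data to mine.
rows = conn.execute(
    "SELECT basket_id, item FROM purchases WHERE region = ? "
    "ORDER BY basket_id, item",
    ("north",),
).fetchall()

# 2. Mining happens in the application's own memory, outside the database.
baskets = {}
for basket_id, item in rows:
    baskets.setdefault(basket_id, []).append(item)

pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))

# 3. Results are stored in a designated table in the database.
conn.execute("CREATE TABLE pair_counts (item_a TEXT, item_b TEXT, support INTEGER)")
conn.executemany(
    "INSERT INTO pair_counts VALUES (?, ?, ?)",
    [(a, b, n) for (a, b), n in pair_counts.items()],
)
conn.commit()
```

Only steps 1 and 3 involve the database. Step 2 pulls every selected row into application memory, so the database's data structures and optimizer play no part in the mining itself.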

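Drawbacks:

Loosely coupled systems find it difficult to achieve high scalability and good performance with large data sets, because the mining step itself runs outside the Database/Data Warehouse system and does not exploit its data structures or query optimization methods.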

Semi-Tight Coupling - Enhanced Data Mining Performance

Semi-tight coupling means that besides linking a Data Mining system to a Database/Data Warehouse system, efficient implementations of a few essential data mining primitives (identified by the analysis of frequently encountered data mining functions) can be provided in the Database/Data Warehouse system.

These primitives can include sorting, indexing, aggregation, histogram analysis, multi-way join, and pre-computation of some essential statistical measures, such as sum, count, max, min, and standard deviation.

This design will enhance the performance of Data Mining systems.
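To make the idea of primitives concrete, the sketch below lets the database compute the aggregates and hands only one summary row per group to the mining side. It uses sqlite3 as a stand-in. The sales table and its rows are invented for the example, and because SQLite has no built-in standard deviation function, the population standard deviation is derived here from the database-computed count, sum, and sum of squares.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 10.0), ("north", 20.0), ("north", 30.0),
        ("south", 5.0), ("south", 15.0),
    ],
)

# The database scans the data once and returns one summary row per region.
query = """
    SELECT region, COUNT(*), SUM(amount), MIN(amount), MAX(amount),
           SUM(amount * amount)
    FROM sales
    GROUP BY region
"""

summary = {}
for region, n, total, lo, hi, sum_sq in conn.execute(query):
    mean = total / n
    # Population variance from the precomputed aggregates: E[x^2] - (E[x])^2.
    # Clamp at zero to guard against tiny negative rounding errors.
    variance = max(sum_sq / n - mean * mean, 0.0)
    summary[region] = {
        "count": n, "sum": total, "min": lo, "max": hi,
        "std": math.sqrt(variance),
    }

print(summary)
```

The mining code never sees the individual rows, so the amount of data moved out of the database stays small however large the table grows.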

Tight Coupling - A Uniform Information Processing Environment

Tight coupling means that a Data Mining system is smoothly integrated into the Database/Data Warehouse system. 

The data mining subsystem is treated as one functional component of the information system.

Data mining queries and functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing methods of a Database or Data Warehouse system.
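A real tightly coupled system cannot be reproduced in a few lines, but a loose analogy is a mining function registered inside the query engine and invoked through SQL like any built-in operator. The sketch below does this with sqlite3's create_aggregate. The aggregate name most_frequent, the class MostFrequent, and the sample table are assumptions for illustration, and SQLite does not optimize the query based on mining analysis the way a tightly coupled system would.

```python
import sqlite3
from collections import Counter


class MostFrequent:
    """User-defined aggregate: the most common value in each group."""

    def __init__(self):
        self.counts = Counter()

    def step(self, value):
        # Called by the query engine once per row in the group.
        if value is not None:
            self.counts[value] += 1

    def finalize(self):
        # Called once per group; ties are broken alphabetically.
        if not self.counts:
            return None
        return min(self.counts, key=lambda v: (-self.counts[v], v))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("most_frequent", 1, MostFrequent)

conn.execute("CREATE TABLE purchases (region TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("north", "milk"), ("north", "bread"), ("north", "milk"),
        ("south", "butter"), ("south", "butter"), ("south", "bread"),
    ],
)

# The function is called from SQL and runs inside the query engine.
result = dict(
    conn.execute(
        "SELECT region, most_frequent(item) FROM purchases GROUP BY region"
    )
)
print(result)
```

From the user's side there is a single system and a single query language, which is the point of a uniform information processing environment.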
