DMQL

     DMQL (Data Mining Query Language)

            Data mining is the process of extracting and processing the data a user needs from a mass of unprocessed raw data. Aggregating these datasets into a summarized form helps solve problems in finance, marketing, and many other fields. As data volumes grow, data mining has become a fast-growing field with applications in many of the industries we depend on. Considerable research and development has gone into the field, and many data mining systems have been introduced. Because data mining involves numerous processes and functions, it needs a well-developed user interface. Such interfaces already exist for relational systems, and Han, Fu, Wang, et al. proposed the Data Mining Query Language (DMQL) to play a similar role for data mining and to encourage further system development and research in the field. DMQL is not a standard language. It is a derived language that serves as a general query language for data mining tasks. DMQL is implemented in the DBMiner system, where it is used to collect data from multiple layers of databases.


DMQL is based on Structured Query Language (SQL), which is a relational query language. Its design takes the following considerations into account:

  • Data mining request: A data mining task must specify the datasets it applies to, in the form of a data mining request. For example, because a user can ask for any specific part of a dataset in the database, the data miner can use a database query to retrieve the relevant datasets before mining begins. If that specific data cannot be collected directly, the data miner retrieves a superset from which the required data can be derived. This shows why data mining needs a query language, with data retrieval as one of its subtasks. Relevant data cannot be extracted from huge datasets by hand, so many automated methods have been developed, but these sometimes fail to collect the data the user asked for. With DMQL, a command can retrieve specific datasets from the database, which gives the user the desired result and makes the system easier to understand.
  • Background knowledge: Prior knowledge of the datasets and their relationships in a database helps in mining the data. Knowing these relationships, or any other useful information, eases extraction and aggregation. For instance, a concept hierarchy over the datasets makes it easier to collect the desired data, which improves both the efficiency and the accuracy of the process. With a known hierarchy, the data can be generalized with ease.
  • Generalization: Data in a data warehouse that has not been generalized is often in the form of unprocessed primitive integrity constraints, loosely associated multi-valued datasets, and their dependencies. Generalization through the query language turns this raw data into a precise abstraction. It also supports collecting and aggregating data at multiple levels. For large databases, generalization plays a major role in producing useful results at a conceptual level.
  • Flexibility and interaction: To avoid collecting less desirable or unwanted data from databases, suitable thresholds must be specified. This keeps mining flexible and lets the user interact with the process. Such threshold values can be supplied as part of a data mining query.
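The four considerations above can be illustrated with a small Python sketch. It is not DMQL and does not reflect how any real data mining system is implemented. The `sales` table, the `CITY_TO_COUNTRY` hierarchy, and all function names are hypothetical, chosen only for this example. A relational query selects the task-relevant rows, a concept hierarchy generalizes each city to its country, and a threshold discards groups that are too small.

```python
import sqlite3
from collections import Counter

# Hypothetical concept hierarchy (background knowledge): city -> country.
CITY_TO_COUNTRY = {
    "Chennai": "India",
    "Mumbai": "India",
    "Delhi": "India",
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Oslo": "Norway",
}


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [
            ("Chennai", 120.0),
            ("Mumbai", 80.0),
            ("Delhi", 200.0),
            ("Toronto", 150.0),
            ("Vancouver", 95.0),
            ("Oslo", 60.0),
            ("Chennai", 10.0),  # excluded when min_amount is 50
        ],
    )
    return conn


def load_relevant_rows(conn, min_amount):
    """Data mining request: fetch only the task-relevant rows with SQL."""
    cur = conn.execute(
        "SELECT city, amount FROM sales WHERE amount >= ?", (min_amount,)
    )
    return cur.fetchall()


def generalize_and_filter(rows, hierarchy, threshold):
    """Generalize city to country, then keep the countries whose share
    of the relevant rows is at least `threshold`."""
    counts = Counter(hierarchy[city] for city, _amount in rows)
    total = sum(counts.values())
    return {
        country: n / total
        for country, n in counts.items()
        if n / total >= threshold
    }


if __name__ == "__main__":
    conn = build_demo_db()
    rows = load_relevant_rows(conn, min_amount=50)
    print(generalize_and_filter(rows, CITY_TO_COUNTRY, threshold=0.25))
```

Raising the threshold prunes more of the generalized groups, and lowering it keeps more of them, which is the kind of interactive control the last consideration describes.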

The four parameters (primitives) of a data mining query:

  • The first parameter is the relevant dataset, fetched from the database by a relational query. Specifying this primitive retrieves the relevant data.
  • The second parameter is the type of information to be extracted. This primitive covers generalization, association, classification, characterization, and discrimination rules.
  • The third parameter is the hierarchy of datasets, also called the generalization relation or background knowledge, as described earlier in the design of DMQL.
  • The final parameter is the quality of the data collected, represented by a specific threshold value that depends on the type of rule used in data mining.
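As a rough illustration of how the four parameters fit together, the sketch below bundles them into one Python object. This is not DMQL syntax. The `MiningRequest` class, its field names, and the rule that thresholds lie between 0 and 1 are all assumptions made for this example. Only the list of rule types comes from the text above.

```python
from dataclasses import dataclass, field

# Rule types named in the second parameter above.
RULE_TYPES = {
    "generalization",
    "association",
    "classification",
    "characterization",
    "discrimination",
}


@dataclass
class MiningRequest:
    """Hypothetical container for the four parameters of a mining query."""

    relevant_data_query: str  # 1. relational query selecting the data
    rule_type: str            # 2. kind of information to extract
    concept_hierarchies: dict = field(default_factory=dict)  # 3. background knowledge
    thresholds: dict = field(default_factory=dict)           # 4. threshold values

    def __post_init__(self):
        if self.rule_type not in RULE_TYPES:
            raise ValueError(f"unknown rule type: {self.rule_type!r}")
        # Assumption for this sketch: every threshold is a fraction in [0, 1].
        for name, value in self.thresholds.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold {name!r} must be in [0, 1]")

    def summary(self):
        """Return a readable, one-line-per-parameter description."""
        hierarchies = ", ".join(sorted(self.concept_hierarchies)) or "none"
        pairs = [f"{k}={v}" for k, v in sorted(self.thresholds.items())]
        thresholds = ", ".join(pairs) or "none"
        return "\n".join(
            [
                f"data: {self.relevant_data_query}",
                f"rules: {self.rule_type}",
                f"hierarchies: {hierarchies}",
                f"thresholds: {thresholds}",
            ]
        )


if __name__ == "__main__":
    request = MiningRequest(
        relevant_data_query="SELECT city, item FROM sales WHERE amount >= 50",
        rule_type="association",
        concept_hierarchies={"location": {"Chennai": "India"}},
        thresholds={"support": 0.05, "confidence": 0.7},
    )
    print(request.summary())
```

Which thresholds make sense depends on the rule type. The `support` and `confidence` names used here are the usual ones for association rules and are given only as an example.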
