JUNE 2008

Code: AT19                                       Subject: DATA WAREHOUSING AND DATA MINING

Time: 3 Hours                                                                                                     Max. Marks: 100

 

NOTE: There are 9 Questions in all.

·      Question 1 is compulsory and carries 20 marks. The answer to Q.1 must be written in the space provided for it in the answer book supplied and nowhere else.

·      Out of the remaining EIGHT Questions answer any FIVE Questions. Each question carries 16 marks.

·      Any required data not explicitly given, may be suitably assumed and stated.

 

 

Q.1         Choose the correct or best alternative in the following:                                       (2x10)

                                      

       a.      Which one of the following does not involve a typical use of the information from a data warehouse by an enterprise?

 

               (A)  To increase customer focus.

               (B)  To focus on market economy.

               (C)  To analyse operations to enhance profit.

               (D)  To manage the customer relation and make environmental corrections.

 

       b.     Which one of the following statements is false?

 

               (A)  OLTP is the acronym of online transaction processing.

               (B)  CLDS is a classic data-driven development life cycle.

               (C)  To do “drill down”, it is necessary to be able to do slicing and dicing on data.

               (D)  The shorter the cycle of the feedback loop, the more successful the warehouse effort.

 

       c.      Which one of the following is not a preprocessing step for preparing the data for classification and prediction?

 

               (A)  Data Transformation                      

               (B) Data cleaning

               (C)  Data clustering                               

               (D) Relevance analysis

 

       d.     Which one of the following is not a part of the data-driven methodology for operational development?

 

               (A)  Algorithmic analysis and processing

               (B)  Operational systems and processing

               (C)  DSS and processing

               (D)  Heuristic component

 

       e.      What is created in association with metadata on inclusion of external data in the data warehouse?

 

               (A)  Data Mart                                       (B)  Notification data

               (C)  External reference                           (D)  Structure of data

 

       f.      Which one of the following formulae is used to compute the support of an association rule A ⇒ B?

 

               (A)  P(A|B)                                            (B)  P(B|A)

               (C)  P(A ∪ B)                                       (D)  P(A ∩ B)

 

       g.      On which system is OLTP performed?

 

               (A)  Data warehouse systems

               (B)  Decision support systems

               (C)  Statistical database systems

               (D)  Operational database systems

 

       h.      Which one of the following is a method for data compression?

 

               (A)  Smoothing                                      (B)  Principal Component Analysis

               (C)  Regression                                      (D)  Sampling

 

       i.       Which one of the following is a technique for data smoothing usually applied for data cleaning and sometimes for data discretization?

 

               (A)  Histogram analysis                          (B)  Segmentation

               (C)  Binning                                           (D)  None of the above

 

       j.      Which one of the following is not used in an EIS?

 

               (A)  Trend analysis

               (B)  Problem monitoring

               (C)  Systems Programming                    

               (D)  Key performance indicator monitoring

 

 

 

Answer any FIVE Questions out of EIGHT Questions.

Each question carries 16 marks.

 

Q.2  a.    Define a data warehouse elaborating its key features. How do the organizations benefit from it?           (6+2)

 

        b.    What are the features of external/unstructured data that pose problems while storing it in the data warehouse? Describe an effective technique for handling unstructured data.                                (8)

 

 

Q.3  a.    What are the major features that differentiate OLTP from OLAP?                           (6)

 

        b.    What is a data cube? The weather bureau has about 10,000 probes which are scattered throughout various land and sea locations across the country to collect data such as air pressure and temperature at each hour. All the data have to be stored at a central office of the bureau. Give a 4-D view clearly mentioning the dimensions of the data collected at the central office.                                                                (6)

 

        c.    Define and illustrate a Decision Tree.                                                                      (4)

 

Q.4  a.    Use diagrams to explain the path of migration from corporate data model to a DSS. (4)

 

        b.    Define k-itemset. Explain the join and prune steps and the terminating condition of the Apriori algorithm.  (2+10)

 

 

Q.5  a.    Define Concept hierarchy. Which of the OLAP operations use the concept hierarchy? Illustrate using examples for each.                                                                                                 (8)

 

        b.    Illustrate using an example the role of drill-down analysis in EIS.                            (8)

 

 

 

Q.6  a.    Why is the Entity-Relationship data model not the best model for a data warehouse? What are the forms/schemas of the multidimensional model? Justify the suitability of any two schemas for a data warehouse. (8)

 

       b.    Define data cleaning. Explain the basic methods for data cleaning.                           (8)

 

 

Q.7  a.    Use an example to illustrate the problems in creating a base of data for EIS. What are the advantages of designing the data warehouse as a basis for EIS? Use a diagram to illustrate if needed.  (8)

 

        b.    What is a data cube measure? List the categories of measures based on the kind of aggregate functions used in computing a data cube. Let variance be computed by using the formula $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$, where $\bar{x}$ is the average of the $x_i$'s. To which category does the variance belong?                            (8)

 

 

Q.8  a.    Why is the feedback loop important for the success of a data warehouse implementation?     (4)

 

        b.    Differentiate between a migration plan and a methodology.                                      (4)

 

       c.    What are the two focal components of monitoring a data warehouse environment? Point out four important results achieved by monitoring the data.                                                                  (8)

 

 

Q.9  a.    List the technological challenges in a migration plan. While migrating to a data warehouse, which elements of a data model need to be changed?                                                                         (8)

 

        b.    Briefly describe the three problems with naturally evolving architecture.                   (8)