AMIETE – IT (OLD SCHEME)

 

Code: AT19                                       Subject: DATA WAREHOUSING AND DATA MINING

DECEMBER 2009

Time: 3 Hours                                                                                                     Max. Marks: 100

 

 

NOTE: There are 9 Questions in all.

·      Question 1 is compulsory and carries 20 marks. Answer to Q.1 must be written in the space provided for it in the answer book supplied and nowhere else.

·      Out of the remaining EIGHT Questions answer any FIVE Questions. Each question carries 16 marks.

·      Any required data not explicitly given may be suitably assumed and stated.

 

 

Q.1       Choose the correct or the best alternative in the following:                                (2 × 10)

 

       a.      Monitoring data warehouse data determines which of the following:

 

               (A) if a reorganization needs to be done

               (B) if an index is poorly structured

               (C) available remaining space

               (D) All of the above

 

       b.     PCA is a technique used for

 

               (A) Mining Patterns                                 (B) Compressing Data

               (C) Integrating Data                                 (D) Cleaning Data

 

       c.      ETL does not include

 

               (A) Finding data                                      (B) Deleting data

               (C) Integrating data                                  (D) Placing data in warehouse

 

       d.     Find the odd one out:

 

               (A) ROLAP                                            (B) TOLAP

               (C) MOLAP                                           (D) HOLAP

 

       e.      Metadata is

 

               (A) Data out of main data

               (B) Data about data

               (C) Separating primary and secondary data

               (D) Partitioning of data

 

       f.      Independent data marts:

 

               (A)  Do not provide a platform for reusability

               (B)  Do not provide a basis for reconciliation of data

               (C)  Do not provide a basis for a single set of legacy interface programs

               (D)  All of these.

    


       g.      VSAM stands for

 

              (A) Virtual System Assisted Monitoring

               (B) Virtual Storage Access Method

               (C) Virtual System And Maintenance

               (D) None of these

 

      h.      Types of data at the heart of an architected environment are:

 

               (A) Primitive data                                    (B) Derived data

               (C) Both (A) and (B)                               (D) None of the above

 

       i.       A process model does not contain:

 

               (A) Data flow diagram.                            (B) Structure Chart.

               (C) Flow Chart.                                       (D) HIPO Chart.

 

       j.      In the classical operational environment:

 

               (A) Production environment exists.

               (B) Testing environment exists

               (C) Both of the above exist

               (D) None of the above

 

 

Answer any FIVE Questions out of EIGHT Questions.

Each question carries 16 marks.

 

 

  Q.2     a.   What is “Extract Processing”? Give two reasons why extract programs became popular. Explain the challenges of a naturally evolving architecture.                                                                                    (8)

 

             b.   What is data mining?  How does mining differ from traditional database access?                 (8)

 

  Q.3     a.   What is Event Mapping?  Explain with the help of a suitable example.                  (8)

 

             b.   How much detailed data is needed to run an EIS/DSS environment? Explain.           (8)

 

  Q.4           What is the starting point for the migration plan? Give four reasons for excluding derived data and DSS data from the corporate data model and the mid-level model. What are the criteria for finding the best source of existing data?                                                            (16)

 

  Q.5     a.   Data integration is more important in a data warehouse than in an operational system. Explain.                                                                     (8)

 

             b.   Explain the following OLAP operations with an example each.                     (4 × 2)

 

                   (i)  Pivot                                              (ii)  Slice and Dice
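The difference between the two operations in Q.5 (b) can be illustrated with a minimal Python sketch over a toy sales cube held in a dictionary. The cube, its dimensions (location, quarter, item), the figures and the function names are all hypothetical; a real OLAP server performs these operations on its own storage structures, not on a Python dict.

```python
from collections import defaultdict

# Hypothetical 3-D sales cube: (location, quarter, item) -> units sold.
# All names and figures are illustrative only.
CUBE = {
    ("Delhi", "Q1", "TV"): 120, ("Delhi", "Q1", "PC"): 80,
    ("Delhi", "Q2", "TV"): 150, ("Delhi", "Q2", "PC"): 95,
    ("Mumbai", "Q1", "TV"): 200, ("Mumbai", "Q1", "PC"): 60,
    ("Mumbai", "Q2", "TV"): 170, ("Mumbai", "Q2", "PC"): 110,
}
DIMS = ("location", "quarter", "item")


def slice_cube(cube, dim, value):
    """Slice: fix ONE dimension to a single value, giving a sub-cube."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, criteria):
    """Dice: restrict two or more dimensions to chosen sets of values."""
    idx = {DIMS.index(d): set(vals) for d, vals in criteria.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in allowed for i, allowed in idx.items())}


def pivot(cube, row_dim, col_dim):
    """Pivot: view the data as a 2-D table of row_dim x col_dim,
    summing over the remaining dimension."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, units in cube.items():
        table[key[r]][key[c]] += units
    return {row: dict(cols) for row, cols in table.items()}
```

For example, `pivot(CUBE, "item", "location")` gives `{"TV": {"Delhi": 270, "Mumbai": 370}, "PC": {"Delhi": 175, "Mumbai": 170}}`; swapping the two arguments rotates the same totals into the other orientation, which is the essence of a pivot.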

 

       

 

 

  Q.6     a.   What are the features of external / unstructured data that pose problems while storing it in the data warehouse? Describe an effective technique for handling unstructured data.                         (8)

 

             b.   Why does every structure in the data warehouse contain the time element?         (4)

 

             c.   Explain how an EIS is supported by a data warehouse.                                       (4)

 

  Q.7     a.   Write an algorithm to generate a decision tree from a given set of training data.           (8)
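A compact Python sketch of the basic greedy, top-down (ID3-style) induction scheme, assuming categorical attributes and information gain as the attribute selection measure; gain ratio or the Gini index would be equally valid choices. The function names and the six training tuples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(D) = -sum(p_i * log2 p_i) over the class labels in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j)."""
    n = len(rows)
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr], []).append(lab)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a node (attr, {value: subtree})."""
    if len(set(labels)) == 1:              # all tuples in one class: leaf
        return labels[0]
    if not attrs:                          # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(r[best] for r in rows):   # one branch per observed value
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs], remaining)
    return (best, branches)


def classify(tree, row):
    """Follow branches until a leaf; unseen values raise KeyError."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical training tuples and their class labels.
ROWS = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]
TREE = build_tree(ROWS, LABELS, ["outlook", "windy"])
```

On these tuples outlook is chosen at the root (gain of about 0.67 bits against about 0.08 for windy), and only the rain branch needs a further split on windy.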

 

             b.   Write the Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.                       (8)
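A minimal Python sketch of the level-wise Apriori search, assuming an absolute minimum support count. For brevity the join step unions any two frequent (k-1)-itemsets that together form a k-itemset, instead of the usual lexicographic prefix join; after the prune step this yields the same candidate set. The function name and the transaction database are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_sup is an absolute support count (number of transactions).
    """
    db = [frozenset(t) for t in transactions]
    # First scan: count single items to obtain L1.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]            # join step
                # Prune step (Apriori property): every (k-1)-subset
                # of a candidate must itself be frequent.
                if len(union) == k and all(
                        frozenset(sub) in level
                        for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count the support of each surviving candidate in the database.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical transaction database (one list of items per transaction).
SAMPLE_DB = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
```

With a minimum support count of 2, `apriori(SAMPLE_DB, 2)` yields two frequent 3-itemsets, {I1, I2, I3} and {I1, I2, I5}, and no frequent 4-itemset, because the only 4-item candidate has the infrequent subset {I1, I3, I5}.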

 

  Q.8     a.   Describe the 3-4-5 rule of segmentation with the help of an example.                        (8)
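One level of the 3-4-5 rule can be sketched in Python as follows, assuming integer bounds LOW and HIGH (for example, the 5th and 95th percentiles of a monetary attribute) and a most-significant-digit unit of at least 10 so that the integer arithmetic stays exact. The function name is hypothetical; a full implementation would also adjust the end intervals to the actual minimum and maximum and apply the rule recursively to each interval.

```python
def three_four_five(low, high):
    """Partition [low, high] into 3, 4 or 5 'natural' intervals (one level).

    Assumes integer bounds; returns a list of (start, end) pairs.
    """
    # Unit of the most significant digit (msd) of the larger magnitude.
    unit = 10 ** (len(str(max(abs(low), abs(high)))) - 1)
    lo = (low // unit) * unit              # round LOW down at the msd
    hi = -((-high) // unit) * unit         # round HIGH up at the msd
    distinct = (hi - lo) // unit           # distinct values at the msd
    if distinct in (3, 6, 9):
        weights = [1, 1, 1]                # 3 equal-width intervals
    elif distinct == 7:
        weights = [2, 3, 2]                # 3 intervals grouped 2-3-2
    elif distinct in (2, 4, 8):
        weights = [1, 1, 1, 1]             # 4 equal-width intervals
    elif distinct in (1, 5, 10):
        weights = [1, 1, 1, 1, 1]          # 5 equal-width intervals
    else:
        raise ValueError(f"rule not defined for {distinct} msd values")
    total = sum(weights)
    bounds, acc = [lo], 0
    for w in weights:
        acc += w
        bounds.append(lo + (hi - lo) * acc // total)
    return list(zip(bounds, bounds[1:]))
```

For instance, `three_four_five(-159876, 1838761)` rounds the range to (-1,000,000, 2,000,000], which covers 3 distinct msd values, and returns `[(-1000000, 0), (0, 1000000), (1000000, 2000000)]`.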

 

             b.   What are multifeature cubes? What are the advantages of multifeature cubes?           (8)

 

  Q.9          Write short notes on any FOUR:                                                              (4 × 4)

 

                   (i)   4GL technology

                   (ii)  Data warehouse implementation

                   (iii) Data Reduction

                   (iv) Clustering

                   (v)  Feedback loop
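For the short note on clustering, k-means, one widely used partitioning method, can be sketched in Python as below. The choice of k-means, the function names and the sample points are assumptions for illustration only.

```python
import math
import random


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid (Euclidean)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, clusters) after iterative k-means refinement."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # k distinct starting points
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster;
        # keep the old centroid if its cluster became empty.
        new = [tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:                 # no centroid moved: converged
            break
        centroids = new
    return centroids, assign(points, centroids)


# Hypothetical 2-D points forming two well-separated groups.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
```

On these points `kmeans(POINTS, 2)` separates the two groups, with centroids at about (1.33, 1.33) and (8.33, 8.33). The result of k-means in general depends on k and on the starting centroids, which is why the seed is a parameter here.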