1. Data quality can be assessed in terms of accuracy, completeness, and consistency. Propose two other dimensions of data quality.
2. How is a quantile-quantile plot different from a quantile plot?
3. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.
4. Using the data for age given in Exercise 2, answer the following:
a. Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate your steps.
b. Comment on the effect of this technique for the given data.
c. How might you determine outliers in the data?
d. What other methods are there for data smoothing?
5. Robust data loading poses a challenge in database systems because the input data are often dirty. In many cases, an input record may have several missing values, and some records may be contaminated (i.e., with some data values out of range or of a different data type than expected). Work out an automated data cleaning and loading algorithm so that erroneous data are marked and contaminated data are not mistakenly inserted into the database during data loading.
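
As a starting point for Exercise 5, the validate-then-load idea might be sketched as follows. This is one possible approach, not a reference solution. The schema, the attribute names (age, income), the range limits, and the helper names check_record and load are all hypothetical and chosen only for illustration. The sketch also assumes that a record with any missing value is set aside rather than repaired.

```python
from dataclasses import dataclass, field

# Hypothetical schema: attribute name -> (expected type, minimum, maximum).
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}


@dataclass
class LoadResult:
    clean: list = field(default_factory=list)     # records safe to insert
    rejected: list = field(default_factory=list)  # (record, problems) pairs


def check_record(record):
    """Return a list of problems found in one raw record (a dict)."""
    problems = []
    for name, (typ, lo, hi) in SCHEMA.items():
        raw = record.get(name)
        # Missing value: attribute absent or blank.
        if raw is None or str(raw).strip() == "":
            problems.append(f"{name}: missing")
            continue
        # Wrong data type: value cannot be converted to the expected type.
        try:
            value = typ(raw)
        except (TypeError, ValueError):
            problems.append(f"{name}: wrong type")
            continue
        # Out of range: value converts but falls outside the allowed interval.
        if not lo <= value <= hi:
            problems.append(f"{name}: out of range")
    return problems


def load(records):
    """Split records into clean ones and rejected ones marked with reasons."""
    result = LoadResult()
    for record in records:
        problems = check_record(record)
        if problems:
            result.rejected.append((record, problems))
        else:
            result.clean.append(record)
    return result
```

Only the records in clean would be passed on to the database insert; the rejected list keeps each erroneous record together with its marks so that it can be reviewed or repaired later. A fuller answer would also decide whether records with missing values should be loaded with nulls or imputed instead of being set aside.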