DATA ANALYSIS - FULL QUESTION SET WITH DETAILED ANSWERS

Examination / Assignment / Revision Material

• All sections included: MCQ, Short Answer, Structured, Long/Essay, Practical, Analytical, True/False
• Detailed answers with clear definitions, explanations and real-life examples
• Suitable for Certificate, Diploma and early Undergraduate level students
• Topics: Data Analysis Lifecycle, Tabular Data, Preprocessing, Linear & Polynomial Graphs, Curves, Visualization

SECTION A: MULTIPLE CHOICE QUESTIONS (10 × 2 = 20 marks)

1. Which of the following is the first stage of the Data Analysis Lifecycle?
A. Data Visualization
B. Data Collection
C. Data Cleaning
D. Model Evaluation
Answer: B. Data Collection
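Explanation: Of the stages listed, data collection comes first: raw data must be gathered from various sources before it can be cleaned, visualized or evaluated. Fuller versions of the lifecycle (see Question C1) place problem definition before collection.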
Real-life example: Supermarket collecting purchase receipts and loyalty card data.
2. Which stage involves removing missing values and correcting errors?
A. Data Collection
B. Data Preprocessing
C. Data Interpretation
D. Deployment
Answer: B. Data Preprocessing
Explanation: Preprocessing cleans and prepares data for analysis.
Real-life example: Hospital fixing incomplete patient blood pressure readings before research.
3. Data stored in rows and columns is best described as:
A. Unstructured data
B. Semi-structured data
C. Tabular data
D. Streaming data
Answer: C. Tabular data
Explanation: Classic spreadsheet format, and the most common structure for analysis.
Real-life example: School timetable or monthly sales report.
4. Which software is commonly used to organize data in table form?
A. MS Word
B. MS Excel
C. PowerPoint
D. Paint
Answer: B. MS Excel
Explanation: Excel is a spreadsheet program built to store and manipulate data in rows and columns.
Real-life example: Small shops tracking daily sales in Excel sheets.
5. A linear graph represents a relationship where:
A. Data changes randomly
B. Data follows a curve
C. Data changes at a constant rate
D. Data cannot be predicted
Answer: C. Data changes at a constant rate
Explanation: A straight line means the value changes by the same amount for every unit step (constant rate of change).
Real-life example: Taxi fare increasing steadily with distance travelled.
6. Polynomial graphs are mainly used when data:
A. Is constant
B. Is linear
C. Shows curved trends
D. Has no pattern
Answer: C. Shows curved trends
Explanation: Polynomials are flexible enough to fit curves.
Real-life example: Sales of ice-cream during the year (slow in winter, peak in summer).
7. Which of the following is NOT a data preprocessing task?
A. Normalization
B. Removing duplicates
C. Data visualization
D. Handling missing values
Answer: C. Data visualization
Explanation: Visualization comes after cleaning; its purpose is presentation.
Real-life example: You clean data first, then make dashboard charts.
8. Which chart is best for showing trends over time?
A. Pie chart
B. Line chart
C. Histogram
D. Bar chart
Answer: B. Line chart
Explanation: Connected points show continuous change very clearly.
Real-life example: Daily temperature graph or monthly revenue trend.
9. Outliers in data are values that:
A. Appear frequently
B. Are missing
C. Are far from other values
D. Are duplicated
Answer: C. Are far from other values
Explanation: Significantly different from the rest of the dataset.
Real-life example: One person earning 10 million while others earn 30–80 thousand.
10. Data preprocessing improves:
A. Data size only
B. Data accuracy and quality
C. Data storage
D. Data collection speed
Answer: B. Data accuracy and quality
Explanation: Main goal = make data trustworthy for decision making.
Real-life example: Cleaned customer data → better targeted advertising.

SECTION B: SHORT ANSWER QUESTIONS (10 × 4 = 40 marks)

1. Define data analysis.
Answer:
Data analysis is the systematic process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw meaningful conclusions, and support better decision-making.

Real-life example: A restaurant analyzing customer orders to know which dishes are most popular and should be promoted.
2. What is meant by the data analysis lifecycle?
Answer:
The data analysis lifecycle is a structured, repeatable sequence of steps that describes how to handle a data project from beginning to end, from collecting data to using the results in real life.

Real-life example: A mobile money company following clear steps to detect fraud patterns and protect customers.
3. List any four stages of the data analysis lifecycle.
Answer:
1. Data Collection
2. Data Preprocessing / Cleaning
3. Data Exploration / Analysis / Modeling
4. Interpretation, Visualization & Deployment

Real-life example: Weather bureau collects data → cleans it → analyzes patterns → makes forecast maps for TV/news.
4. What is tabular data?
Answer:
Tabular data is information organized in a table format with rows (individual records/observations) and columns (variables/attributes/features).

Real-life example: Class attendance register: each row is one student, and the columns show name, admission number and days present.
5. Mention two advantages of using Excel for data analysis.
Answer:
1. Very user-friendly with many built-in functions and formulas
2. Quick creation of charts, pivot tables and basic statistical summaries

Real-life example: A small grocery shop owner tracks daily sales and quickly sees which products sell most using an Excel pivot table.
6. What is data preprocessing?
Answer:
Data preprocessing is the preparation stage where raw data is cleaned, corrected, transformed and organized so it becomes suitable and reliable for analysis.

Real-life example: Removing fake/incomplete online survey responses before studying customer satisfaction.
7. State two examples of data preprocessing techniques.
Answer:
1. Handling missing values (fill with average, median or delete row)
2. Normalization / Standardization (scaling values to same range)

Real-life example: In a university admission system, exam marks from different boards are scaled to the same range for fair comparison.
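
The scaling idea in this example can be sketched in a few lines of Python. This is only an illustration, assuming two hypothetical exam boards that mark out of 100 and out of 800; the function name `min_max_scale` and the sample marks are made up.

```python
def min_max_scale(values, lowest, highest):
    """Rescale each value to the 0-1 range using the scale's known bounds."""
    span = highest - lowest
    return [(v - lowest) / span for v in values]


# Hypothetical marks from two boards with different maximum scores.
board_a_marks = [45, 70, 90]      # marked out of 100
board_b_marks = [360, 560, 720]   # marked out of 800

scaled_a = min_max_scale(board_a_marks, 0, 100)
scaled_b = min_max_scale(board_b_marks, 0, 800)

print(scaled_a)
print(scaled_b)
```

After scaling, a mark of 360 out of 800 and a mark of 45 out of 100 land on the same value, so students from both boards can be compared directly.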
8. What is a linear graph?
Answer:
A linear graph is a straight-line graph that shows a constant rate of change between two variables (equation form: y = mx + c, where m is the slope and c is the value of y when x = 0).

Real-life example: Cost of printing photos: more photos means a higher total cost, at a fixed rate per photo.
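
The photo-printing example can be written out as y = mx + c in Python. The prices here are assumptions chosen for illustration (500 TZS per photo as m, and a 1,000 TZS one-off service fee as c), and `linear_cost` is a hypothetical helper name.

```python
def linear_cost(photos, rate_per_photo, fixed_fee=0):
    """y = m*x + c, with m = rate_per_photo and c = fixed_fee."""
    return rate_per_photo * photos + fixed_fee


# Assumed prices, for illustration only.
RATE = 500    # TZS per photo (the slope m)
FEE = 1000    # one-off service fee in TZS (the intercept c)

costs = [linear_cost(n, RATE, FEE) for n in range(1, 6)]
print(costs)

# A linear relationship has a constant rate of change:
# each extra photo adds exactly the same amount to the total.
differences = [b - a for a, b in zip(costs, costs[1:])]
print(differences)
```

Every entry in `differences` equals the rate per photo, which is what makes the graph a straight line.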
9. What is a curve in data analysis?
Answer:
A curve is a non-linear line on a graph that shows changing rates of increase or decrease (acceleration/deceleration).

Real-life example: Learning curve: progress is slow at the beginning, then becomes faster, then slows again as you master the skill.
10. Why is data visualization important?
Answer:
Data visualization makes complex information much easier to understand quickly, helps discover hidden patterns, and communicates findings effectively to others.

Real-life example: Using colored maps during elections to show which areas support which candidate, which is much faster than reading numbers.

SECTION C: STRUCTURED QUESTIONS (3 questions)

Question C1 – Data Analysis Lifecycle (12 marks)

a) Explain the meaning of the Data Analysis Lifecycle.
b) Describe **each stage** of the Data Analysis Lifecycle.
c) Explain why the lifecycle is important in real-world data analysis.
Answer:

a) The Data Analysis Lifecycle is a systematic, step-by-step framework that describes how to manage a data project from start to finish. It is usually iterative (you can go back to previous steps).

b) Main stages (most common version):
1. Problem Definition / Business Understanding – Understand what question needs answering
2. Data Collection / Acquisition – Gather the necessary raw data
3. Data Preprocessing / Cleaning – Fix errors, missing values, duplicates, format issues
4. Exploratory Data Analysis (EDA) – Understand patterns, distributions, relationships
5. Modeling / Advanced Analysis – Apply statistical or machine learning techniques
6. Evaluation – Check if results are correct and useful
7. Interpretation & Communication – Explain findings with visuals
8. Deployment / Action – Put insights into real-world use

c) Importance:
• Prevents costly mistakes
• Makes work systematic and repeatable
• Saves time in the long run
• Increases trust in results

Real-life example: A mobile network company uses the lifecycle to detect network problems → clean call drop data → find patterns → improve coverage in weak areas.

Question C2 – Data in Table Form (12 marks)

a) Explain how data is represented in table form.
b) Draw an example of a data table with at least 5 rows and 3 columns.
c) Explain how Excel helps in organizing and analyzing tabular data.
Full Answer:

a) How data is represented in table form:
Tabular data is organized like a grid:
• Rows represent individual records or cases (e.g. one customer, one student, one transaction)
• Columns represent different attributes or variables (e.g. name, age, score, date, amount)
• First row is usually the header (column names)
This structure makes data easy to read, sort, filter, search and analyze.

b) Example table (5 rows + header):
Product ID | Product Name    | Price (TZS)
P001       | Maize Flour 2kg | 8,500
P002       | Rice 5kg        | 22,000
P003       | Cooking Oil 1L  | 6,200
P004       | Sugar 2kg       | 7,800
P005       | Beans 1kg       | 4,500

c) How Excel helps in organizing and analyzing tabular data:
Excel is extremely powerful for tabular data because it provides:
• Sorting & filtering (find highest/lowest prices quickly)
• Formulas & functions (SUM, AVERAGE, IF, VLOOKUP, etc.)
• Pivot Tables (summarize large data instantly)
• Charts & graphs (visualize trends)
• Data validation & conditional formatting (highlight errors/outliers)
• Freeze panes & tables (easy navigation)

Real-life example: A shopkeeper in Dar es Salaam enters daily sales in Excel, uses a pivot table to see the best-selling items per week, and creates a bar chart for the supplier meeting.
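
For readers who want to see the same operations outside Excel, here is a rough Python sketch of sorting, filtering and the SUM/AVERAGE functions applied to the product table from part (b). The dictionary keys (`id`, `name`, `price_tzs`) and the 8,000 TZS cut-off are choices made for this illustration.

```python
# The product table from part (b), one dictionary per row.
products = [
    {"id": "P001", "name": "Maize Flour 2kg", "price_tzs": 8500},
    {"id": "P002", "name": "Rice 5kg", "price_tzs": 22000},
    {"id": "P003", "name": "Cooking Oil 1L", "price_tzs": 6200},
    {"id": "P004", "name": "Sugar 2kg", "price_tzs": 7800},
    {"id": "P005", "name": "Beans 1kg", "price_tzs": 4500},
]

# Sorting (like Excel's Sort): most expensive product first.
by_price = sorted(products, key=lambda row: row["price_tzs"], reverse=True)

# Filtering (like Excel's Filter): names of items priced under 8,000 TZS.
cheap = [row["name"] for row in products if row["price_tzs"] < 8000]

# Functions (like SUM and AVERAGE) over the price column.
total = sum(row["price_tzs"] for row in products)
average = total / len(products)

print(by_price[0]["name"])
print(cheap)
print(total, average)
```

Each row is one record and each key is one column, which mirrors the rows-and-columns structure described in part (a).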

Question C3 – Charts and Graphs (14 marks)

a) Define data visualization.
b) Explain the difference between bar charts and line charts.
c) Describe situations where a linear graph is preferred.
d) Explain why polynomial graphs are useful in real-life data analysis.
Full Answer:

a) Definition of data visualization:
Data visualization is the graphical representation of information and data using visual elements like charts, graphs, maps, and infographics to make complex data easier to understand and interpret.

b) Difference between bar charts and line charts:
• Bar charts: Use rectangular bars (height/length shows value). Best for comparing categories (discrete data).
• Line charts: Connect data points with lines. Best for showing trends over continuous time/sequence.

Real-life comparison: Bar chart → compare sales of different phone brands in one month
Line chart → show how phone sales changed every day for one month.

c) Situations where linear graph is preferred:
When the relationship between variables is expected to be constant/proportional (straight line).
Examples:
• Distance travelled vs. time at constant speed
• Total cost vs. number of items (fixed price per item)
• Temperature rise in an oven with steady heating

d) Why polynomial graphs are useful in real-life:
Many real-world phenomena are not linear: they accelerate, slow down, reach a maximum, and so on.
Polynomial curves (especially quadratic, cubic) can fit these non-linear patterns very well.
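
The idea that a polynomial can follow a curved pattern where a straight line cannot can be illustrated with a short sketch. It assumes made-up monthly sales figures generated from a known quadratic, so that the fit can be checked. The helper names `fit_polynomial` and `sum_squared_error` are illustrative, and the fit is ordinary least squares written with the standard library only.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for
    y = c0 + c1*x + ... + c_degree*x**degree.
    """
    n = degree + 1
    # Normal equations: A[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / A[i][i]
    return coeffs


def sum_squared_error(xs, ys, coeffs):
    """Total squared gap between the data and the fitted curve."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Made-up seasonal sales: slow start, mid-year peak, then a drop.
months = list(range(1, 12))
sales = [80 - 2 * (m - 6) ** 2 for m in months]

line = fit_polynomial(months, sales, degree=1)    # straight line
curve = fit_polynomial(months, sales, degree=2)   # quadratic

line_error = sum_squared_error(months, sales, line)
curve_error = sum_squared_error(months, sales, curve)
print(line_error, curve_error)
```

On this data the straight line leaves a large error, while the quadratic follows the points almost exactly. Real sales figures would be noisier, so the gap would be smaller.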

Examples:
• Crop yield vs. fertilizer amount (increases fast, then levels off)
• Car speed during acceleration & braking
• Sales of seasonal products (slow start → peak → drop)
• Growth of the user population in the early stages of a new product/market

SECTION D: LONG ANSWER / ESSAY QUESTIONS (Any 2 × 15 = 30 marks)

1. Explain the complete Data Analysis Lifecycle with clear examples at each stage.

Answer:

The Data Analysis Lifecycle is a structured process that guides analysts from identifying a problem to implementing solutions. Most practical versions have 6–8 stages.

Typical complete stages with examples (supermarket chain example):

1. Problem Definition / Business Understanding
Understand the goal. Example: Management wants to know why sales dropped in December.

2. Data Collection / Acquisition
Gather relevant data. Example: Sales records, customer feedback, weather data, competitor prices.

3. Data Preprocessing / Cleaning
Fix errors, missing values, duplicates, wrong formats. Example: Correct wrong dates, remove duplicate transactions, fill missing prices.

4. Exploratory Data Analysis (EDA)
Understand patterns with statistics & visuals. Example: Discover that most of the sales drop happened on rainy days.

5. Modeling / Advanced Analysis
Apply statistical/ML models. Example: Build regression model to see impact of rain on sales.

6. Evaluation
Check model accuracy and usefulness. Example: The model predicts correctly about 85% of the time → judged good enough to act on.

7. Interpretation & Communication
Explain findings clearly. Example: Present dashboard showing rainy days reduce sales by 32%.

8. Deployment / Action
Implement recommendations. Example: Offer discounts on rainy days, increase indoor marketing.

The process is iterative: you may return to previous steps when new insights appear.

2. Discuss the role of data preprocessing in data analysis. Include: Handling missing data, Removing duplicates, Normalization, Outlier detection.

Answer:

Data preprocessing is usually the most time-consuming and most important stage because poor quality data leads to poor quality results ("Garbage In → Garbage Out").

Key tasks and their roles:

• Handling missing data
Missing values can break many algorithms.
Methods: delete the row, fill with mean/median/mode, or use advanced imputation.
Example: In mobile money transactions, a missing amount → replace with that customer's average transaction.

• Removing duplicates
Duplicates distort statistics and waste storage.
Example: Same customer registered twice → remove one entry to avoid counting twice.

• Normalization / Standardization
Brings features to the same scale (very important for many machine learning algorithms).
Example: Comparing salary (millions) and age (tens): normalize both to the 0–1 range.

• Outlier detection
Extreme values can heavily skew results.
Methods: z-score, IQR rule, domain knowledge.
Example: In salary data of employees, one 500 million salary (CEO) is an outlier → may remove or analyze separately.

Overall importance: Careful preprocessing often improves model accuracy substantially, although the size of the gain depends on the dataset and the model.
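
The four tasks above can be chained together in a small Python sketch. The transaction records are made up, and several choices here are assumptions rather than fixed rules: duplicates are removed first so they do not influence the fill value, gaps are filled with the median, and outliers are found with the 1.5 × IQR rule using `statistics.quantiles` with the inclusive method.

```python
from statistics import median, quantiles

# Made-up records: (transaction_id, amount). None marks a missing amount.
raw = [
    ("T1", 30), ("T2", 45), ("T2", 45), ("T3", None),
    ("T4", 50), ("T5", 40), ("T6", 500), ("T7", 35),
]

# 1. Remove duplicates: keep the first record seen for each transaction id.
seen = set()
unique = []
for txn_id, amount in raw:
    if txn_id not in seen:
        seen.add(txn_id)
        unique.append((txn_id, amount))

# 2. Handle missing data: fill gaps with the median of the known amounts.
known = [amount for _, amount in unique if amount is not None]
fill_value = median(known)
filled = [(t, fill_value if a is None else a) for t, a in unique]

# 3. Detect outliers with the IQR rule (beyond 1.5 * IQR from the quartiles).
amounts = [a for _, a in filled]
q1, _, q3 = quantiles(amounts, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [(t, a) for t, a in filled if a < low or a > high]
typical = [(t, a) for t, a in filled if low <= a <= high]

# 4. Normalize the remaining amounts to the 0-1 range (min-max scaling).
lo = min(a for _, a in typical)
hi = max(a for _, a in typical)
normalized = [(t, (a - lo) / (hi - lo)) for t, a in typical]

print(outliers)
print(normalized)
```

Here the outlier is set aside before scaling. If it were left in, min-max scaling would squeeze all the ordinary amounts into a narrow band near 0.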

SECTION E: PRACTICAL / APPLICATION QUESTIONS

Question E1

Given the student scores dataset:
Student | Test 1 | Test 2 | Test 3
A       | 65     | 70     | 75
B       | 55     | 60     | 62
C       | 80     | 85     | 90
D       | 45     | 50     | 55

a) Identify the type of data shown above.
b) Calculate the average score for each student.
c) Suggest an appropriate chart to represent this data and explain why.
Answers:

a) Tabular data (rows = students, columns = variables)
Also quantitative / numerical data (marks are numbers)

b) Averages:
• A: (65+70+75)/3 = 70.00
• B: (55+60+62)/3 = 59.00
• C: (80+85+90)/3 = 85.00
• D: (45+50+55)/3 = 50.00

c) Best chart: Grouped Bar Chart or Radar/Spider Chart
Reason: Allows easy comparison of performance across multiple tests for each student.
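
The averages in part (b) can be reproduced with a few lines of Python. This is just one way to do the arithmetic; the dictionary layout is a choice made for this sketch.

```python
# Scores from the Question E1 table: student -> [Test 1, Test 2, Test 3].
scores = {
    "A": [65, 70, 75],
    "B": [55, 60, 62],
    "C": [80, 85, 90],
    "D": [45, 50, 55],
}

# Average = sum of the marks divided by the number of tests.
averages = {student: sum(marks) / len(marks) for student, marks in scores.items()}

for student, avg in averages.items():
    print(f"{student}: {avg:.2f}")
```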

Question E2

Using the same dataset:
a) Explain how you would preprocess the above data before analysis.
b) Identify any possible outliers and justify.
c) Describe how a linear graph could be used to analyze performance trends.
Answers:

a) Preprocessing steps:
• Check missing values → none present
• Check duplicates → none
• Check data types → all numeric (good)
• Check range → all between 0–100 (valid)
• Optional: Normalize if comparing with other subjects (not necessary here)

b) Possible outlier: Student C (scores in the 80–90 range), while the others score between 45 and 75.
Justification: C's scores are noticeably higher than the group average (~66).
Decision: Keep if C is a genuine high performer; investigate if a recording error is suspected.
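
One hedged way to put a number on "noticeably higher" is a z-score check on the student averages from Question E1. The threshold of 2 is a common rule of thumb, not a fixed rule, and with only four students the z-score cannot become very large, so this is a rough screen rather than a formal test.

```python
from statistics import mean, pstdev

# Student averages from Question E1 (b).
averages = {"A": 70.0, "B": 59.0, "C": 85.0, "D": 50.0}

values = list(averages.values())
group_mean = mean(values)
group_sd = pstdev(values)   # population standard deviation

# z-score: how many standard deviations each average is from the group mean.
z_scores = {s: (avg - group_mean) / group_sd for s, avg in averages.items()}

THRESHOLD = 2.0   # assumed cut-off; a rule of thumb only
flagged = [s for s, z in z_scores.items() if abs(z) > THRESHOLD]
furthest = max(z_scores, key=lambda s: abs(z_scores[s]))

for student, z in z_scores.items():
    print(f"{student}: z = {z:.2f}")
print("Flagged:", flagged)
print("Furthest from the mean:", furthest)
```

Student C is the furthest from the group mean but stays under the threshold, which fits the decision to keep the scores unless an error is suspected.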

c) Linear graph usage:
Plot the test number (1, 2, 3) on the x-axis and the score on the y-axis, with one line per student.
If a student's line is straight and rising → consistent improvement.
Example: Student A gains 5 marks on each test, a steady improvement (linear trend).
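
The trend for each student can also be summarized numerically by the slope of the best-fit straight line through their three scores. The sketch below uses the ordinary least-squares slope formula; `slope` is an illustrative helper name, and the test for a perfectly straight line (equal gains between consecutive tests) is a simplification that suits this small dataset.

```python
def slope(xs, ys):
    """Least-squares slope of the best-fit straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


tests = [1, 2, 3]
scores = {
    "A": [65, 70, 75],
    "B": [55, 60, 62],
    "C": [80, 85, 90],
    "D": [45, 50, 55],
}

# Average marks gained per test for each student.
trends = {student: slope(tests, marks) for student, marks in scores.items()}

# A line is perfectly straight when every test-to-test gain is the same.
gains = {s: [b - a for a, b in zip(m, m[1:])] for s, m in scores.items()}
steady = [s for s, g in gains.items() if len(set(g)) == 1]

print(trends)
print(steady)
```

A positive slope means improvement. Students whose gains are equal from test to test lie exactly on a straight line, while Student B improves but at a slowing rate.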

SECTION F: ANALYTICAL / CRITICAL THINKING QUESTIONS (4 × 6 = 24 marks)

1. Why is data preprocessing considered the most critical stage in data analysis?
Most time-consuming (often 60–80% of project time).
Directly determines quality of results.
"Garbage In → Garbage Out": even the best model fails on bad data.
Real example: Wrong conclusions in medical research due to unclean patient data.
2. What problems may arise if raw data is analyzed without preprocessing?
• Wrong/misleading conclusions
• Biased models
• Algorithm errors or crashes
• Wasted resources (time & money)
• Loss of trust in analysis
Example: A bank approves bad loans because duplicate customers distorted the credit risk model.
3. How do curves and polynomial models help in predicting future trends?
Real-world data rarely changes at constant rate.
Curves capture acceleration, saturation, decline.
Polynomial models fit these patterns better than straight lines.
Example: Predicting electricity demand, which rises sharply in the evening and then drops.
4. Explain how poor data quality can affect decision-making.
Leads to wrong decisions → financial loss, safety issues, missed opportunities.
Example: Company expands to wrong region because unclean sales data showed false high demand.

SECTION G: TRUE OR FALSE (6 × 2 = 12 marks)

1. Data preprocessing is optional in data analysis.
False – Almost always essential
2. Excel supports data visualization.
True – Charts, pivot charts, sparklines
3. Linear graphs always show curved relationships.
False – They show straight-line relationships
4. Polynomial graphs can model complex trends.
True – Especially quadratic, cubic, etc.
5. The data analysis lifecycle ends with data collection.
False – It continues through analysis, visualization, deployment
6. Outliers should always be removed from the dataset.
False – Depends on context; sometimes they are valuable information

Reference Book: N/A

Author name: SIR H.A.Mwala Work email: biasharaboraofficials@gmail.com