Fundamentals of Data Visualization — Interview Questions & Answers | Meritshot Interview Guides

Data Visualization Fundamentals

1. What is data visualization and why is it important?

Data visualization is the graphical representation of information and data using charts, graphs, maps, and other visual elements to make patterns, trends, and insights easier to understand. People can usually spot a trend, cluster, or outlier in a well-designed chart much faster than in a table of numbers, which makes visualization essential for communicating analytical findings (the often-quoted figure that the brain processes visuals "60,000 times faster than text" has no traceable research source and is best left out of an interview answer). It transforms raw numbers into compelling stories, helps stakeholders make faster, better-informed decisions, surfaces patterns invisible in tabular data, and democratises data by enabling non-technical audiences to understand complex analyses. Effective visualization is the final step in the data analysis pipeline, where analytical value is delivered to the business.

2. What are the main types of charts and when should each be used?

Bar chart: comparing discrete categories (sales by region, revenue by product). Line chart: showing trends over time (monthly revenue, stock prices). Pie/donut chart: showing part-to-whole relationships with few categories (market share, budget allocation — use sparingly, max 5 categories). Scatter plot: showing relationships/correlations between two continuous variables. Histogram: showing the distribution of a continuous variable. Box plot: comparing distributions across groups and identifying outliers. Heatmap: showing patterns across two categorical dimensions (correlation matrix, activity by day/hour). Area chart: cumulative totals over time. Waterfall chart: explaining how a value changes through additions and subtractions.

3. What is the difference between a chart and a dashboard?

A chart (or visualisation) is a single graphical representation of one aspect of data — a bar chart showing sales by region or a line chart showing revenue over time. A dashboard is a collection of multiple charts, KPIs, and data elements assembled on a single page to provide a comprehensive overview of a business domain or process. Dashboards are designed for at-a-glance monitoring — executives check dashboards to quickly assess business health. Effective dashboards follow a layout hierarchy: summary KPIs at the top, supporting charts below. They include filters/slicers for interactivity and are optimised for the decision-making cadence (daily operational vs. monthly strategic).

4. What is the data-ink ratio?

The data-ink ratio (Edward Tufte's term, often misquoted as the "data-to-ink ratio") is the proportion of a graphic's "ink" (visual elements) that represents actual data. The principle is to maximise that proportion and minimise non-data ink (decorative elements, heavy gridlines, borders, 3D effects). A high data-ink ratio means almost every visual element encodes meaningful information. Practical applications: remove gridlines or make them very faint, eliminate chart borders, remove background fills, delete data labels when axes are sufficient, use direct labels instead of legends when possible, and avoid 3D charts (they distort data perception). Chartjunk (Tufte's term for visual clutter) reduces clarity and should be eliminated.

5. What is a misleading chart and how do you avoid creating one?

Misleading charts distort the viewer's perception of the data. Common examples: a truncated Y-axis that does not start at zero (makes small differences look large; often acceptable for line charts, misleading for bar charts, where bar length encodes the value), 3D charts (depth distorts area perception), dual Y-axes with different scales (implying false correlations), cherry-picked time ranges, inappropriate chart types (pie charts with too many slices), filled area charts for non-cumulative data, bubble charts whose area is not proportional to the value (for example, because the radius was scaled instead), and colour that suggests meaning that isn't there. To avoid them: always start bar/column chart axes at zero, label axes clearly, include context (time period, sample size), and choose chart types appropriate for the data type.
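
The bubble-chart point can be made concrete with a little arithmetic. The sketch below is one way to size bubbles so that area, not radius, is proportional to the value; the function name bubble_radius, the max_radius parameter, and the sample values are illustrative assumptions, not part of any plotting library.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius whose circle AREA is proportional to value.

    Area grows with the square of the radius, so the radius must grow
    with the square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


values = [25, 50, 100]  # illustrative numbers only
radii = [bubble_radius(v, max(values)) for v in values]
areas = [math.pi * r ** 2 for r in radii]

# The bubble for 50 has half the area of the bubble for 100.
print([round(a / areas[-1], 2) for a in areas])
```

Scaling the radius linearly instead would make the bubble for 100 look four times as large as the bubble for 50, which is the distortion described above.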

6. What are the pre-attentive attributes in data visualization?

Pre-attentive attributes are visual properties that the human brain processes automatically (within 200ms) before conscious attention — they "pop out" immediately. They include: colour (hue, saturation, intensity), size (length, area, width), shape, position (on a common scale — the most accurate), orientation, texture, motion, and enclosure (grouping by proximity or containment). Effective visualization uses pre-attentive attributes to direct viewer attention to the most important elements. For example, using a bright orange bar against grey bars to highlight the current year's performance. Overusing pre-attentive attributes (too many colours, sizes, and shapes) creates visual noise that overwhelms rather than guides.

7. What is the difference between quantitative and categorical data in visualisation?

Quantitative (numerical) data represents measurable values — revenue, temperature, age. It is shown with: bar charts (comparisons), line charts (trends), scatter plots (relationships), histograms (distributions), and box plots (distributions with quartiles). Categorical (nominal) data represents groups or labels without inherent order — product categories, regions, departments. It is shown with: bar charts, pie charts, grouped bars. Ordinal data has ordered categories — satisfaction ratings (Likert scale), education level. Temporal data (dates, times) is a special quantitative type shown with line charts, area charts, and Gantt charts. Choosing the right encoding for the data type is the first principle of effective chart design.

8. What is colour theory in data visualization?

Colour in visualization encodes meaning and must be used intentionally. Sequential colour palettes (single hue, varying lightness) encode ordered numeric data — light to dark representing low to high. Diverging palettes (two hues meeting at a neutral midpoint) encode data with a meaningful centre (positive/negative, above/below average). Categorical palettes use distinct hues for unordered categories (max 8-10 distinct colours). Best practices: use colour consistently (the same colour always means the same thing), ensure sufficient contrast for accessibility, test for colour blindness (8% of men are red-green colour blind — use Coblis or Viz Palette), avoid using red/green together, and limit the colour palette (3-5 colours maximum for most charts).
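
As a rough illustration of a sequential palette, the sketch below generates a single-hue ramp from light to dark using only the standard library. The function name and its default hue, saturation, and lightness values are assumptions chosen for the example; HLS lightness is not perceptually uniform, so a tested palette (for example, a ColorBrewer scheme) is usually the safer choice in real work.

```python
import colorsys


def sequential_palette(n, hue=0.6, saturation=0.6, lightest=0.9, darkest=0.3):
    """Return n hex colours of one hue, ordered from light (low) to dark (high)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    colours = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0  # 0.0 for the first colour, 1.0 for the last
        lightness = lightest + t * (darkest - lightest)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colours.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colours


palette = sequential_palette(5)
print(palette)  # five hex strings, lightest first
```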

9. What is the audience consideration in data visualization?

The audience determines every design decision. Executive audience: needs summary KPIs, trend indicators, and exception-based alerts, with no technical detail, minimal chart types, and a clean, fast-scanning layout. Analytical audience: needs drill-down capability, multiple dimensions, statistical detail, and the ability to explore data freely. Operational audience: needs real-time or near-real-time monitoring, status indicators, and alerting. Consumer audience: needs simplified, engaging visuals with clear context and annotations. Always ask: what decision does this visualisation need to support? What question should the viewer be able to answer in under 10 seconds? Design for the viewer least familiar with the data, not for yourself.

10. What is the difference between exploratory and explanatory visualization?

Exploratory visualisation is used by the analyst during the discovery phase to understand the data — generating many quick, rough charts to find patterns, anomalies, and insights. The audience is the analyst themselves. These charts prioritise speed and flexibility over aesthetics. Explanatory visualisation is used to communicate findings to others — it is polished, annotated, and designed to guide the viewer to a specific insight or conclusion. The analyst removes all exploratory charts except those supporting the key message and adds titles, annotations, and context. Most visualisations produced for internal stakeholders or publications are explanatory; most produced during analysis are exploratory.

Chart Design Principles

11. What is a small multiple (faceted chart)?

Small multiples (Tufte's term) divide a complex visualisation into a grid of smaller, identical charts — one for each category or segment — sharing the same axes and scales. This enables direct visual comparison across panels. Example: 12 identical bar charts side by side (one per month) allow immediate comparison of monthly patterns. In Python: seaborn's FacetGrid or matplotlib's subplots. In ggplot2 (R): facet_wrap(). In Tableau: place a dimension on Columns or Rows to create small multiples. Small multiples are more effective than animation or multiple series in a single cluttered chart when comparing many groups. They scale well — 4×4 grids are readable; beyond that, use a different approach.
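
A minimal small-multiples sketch with Matplotlib's subplots is shown here. The region names and monthly values are made up for illustration, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# Made-up monthly values for four hypothetical regions.
data = {
    "North": [5, 7, 6, 9],
    "South": [3, 4, 6, 5],
    "East": [8, 6, 7, 7],
    "West": [2, 5, 4, 6],
}
months = ["Jan", "Feb", "Mar", "Apr"]

# sharex/sharey give every panel the same axes, which is what makes
# the panels directly comparable.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (region, values) in zip(axes.flat, data.items()):
    ax.bar(months, values, color="grey")
    ax.set_title(region)
fig.tight_layout()
```

Without the shared y-axis, each panel would rescale to its own data and the visual comparison across panels would no longer be valid.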

12. When should you use a pie chart vs. a bar chart?

Use a pie chart only when: showing part-to-whole relationships, with no more than about five slices, when the percentages add to 100%, and when the relative proportions (not exact values) are the message. Avoid pie charts when: exact values matter (bar charts allow more accurate comparison), there are many small slices, two slices are nearly equal (very hard to judge visually), or comparing across multiple time periods. Bar charts are almost always a better choice: they encode values as position and length along a common baseline (among the most accurate visual encodings), while pie charts rely on angle and area (less accurate). If you must use a pie chart, start slices at 12 o'clock, order by size, and include percentage labels directly on slices.

13. What is a box plot and what information does it convey?

A box plot (box-and-whisker plot) summarises a distribution around its five-number summary: minimum, Q1 (25th percentile), median (Q2, 50th percentile), Q3 (75th percentile), and maximum. The box spans from Q1 to Q3 (the IQR, or interquartile range). The line inside the box is the median. In the common Tukey convention, whiskers extend to the most extreme data points within 1.5×IQR of the box edges, so they reach the true minimum and maximum only when there are no outliers. Data points beyond the whiskers are plotted individually as outliers. Box plots are excellent for comparing distributions across multiple groups (e.g., salary by department, test scores by school) in a compact format. They reveal skewness, spread, median differences, and outliers simultaneously.
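
The quantities a box plot draws can be computed directly. The sketch below uses the standard library's statistics.quantiles with the "inclusive" method and the 1.5×IQR whisker rule; plotting tools differ in their quartile conventions, so their numbers may not match exactly. The function name box_plot_stats and the sample data are illustrative.

```python
import statistics


def box_plot_stats(values):
    """Compute quartiles, whisker ends, and outliers (needs at least 2 values)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points that lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this sample, Q1 is 4, the median is 6, and Q3 is 8, so the fences sit at -2 and 14; the upper whisker stops at 9 and the value 30 is drawn as an outlier.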

14. What is a heatmap and when is it most effective?

A heatmap uses colour intensity to represent values in a matrix, where rows and columns represent two categorical variables. Common uses: correlation matrix (relationships between variables), website activity by hour of day × day of week, cross-tabulation of two dimensions (region × product sales), gene expression analysis, and confusion matrices. The colour scale encodes value magnitude. Heatmaps are effective when: you have two categorical dimensions, exact values matter less than patterns, and you have many cells (a bar chart would be cluttered). Hierarchical clustering can be applied to rows/columns to group similar patterns. Always include a colour scale legend and consider annotating individual cells with values for small heatmaps.

15. What is a waterfall chart?

A waterfall chart shows how a starting value is built up or reduced by a series of positive and negative contributions, ending at a final value. Example uses: profitability bridge (Revenue → Gross Profit → EBITDA → Net Income), explaining variance between actual and budget, and showing cumulative impact of multiple factors. The first and last bars are typically grounded at zero; intermediate bars float based on the prior cumulative total. Colour distinguishes positive (typically blue/green) from negative (red/orange) contributions. Waterfall charts are essential in financial reporting, management consulting presentations, and business performance reviews. They are natively available in Excel and Power BI; in Tableau they are typically built from Gantt bars and a running-total calculation rather than chosen from Show Me.
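
The "floating bar" logic can be sketched as a small calculation: each intermediate bar starts at the lower of the running totals before and after the change. The function name waterfall_bars and the sample figures are illustrative, and the sketch assumes Python 3.8 or later for accumulate's initial argument.

```python
from itertools import accumulate


def waterfall_bars(start, changes):
    """Return (bottom, height) pairs: start bar, one bar per change, end bar."""
    # Running total before the first change and after each subsequent one.
    running = list(accumulate(changes, initial=start))
    bars = [(0, start)]  # the first bar is grounded at zero
    for before, after in zip(running, running[1:]):
        # A floating bar covers the gap between consecutive running totals.
        bars.append((min(before, after), abs(after - before)))
    bars.append((0, running[-1]))  # the final bar is grounded at zero
    return bars


bars = waterfall_bars(100, [30, -50, 20])
print(bars)  # [(0, 100), (100, 30), (80, 50), (80, 20), (0, 100)]
```

The sign of each change is not stored in the (bottom, height) pair, so a real chart would keep it alongside to pick the positive or negative colour.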

16. What is the Gestalt principle in data visualization?

Gestalt principles describe how the human brain organises visual elements into meaningful wholes. Key principles for visualization: Proximity (elements close together are perceived as a group — use spacing to group related chart elements), Similarity (elements that look similar are perceived as related — use consistent colour coding), Continuity (the eye follows smooth paths — use connected lines for trends), Closure (the brain fills in incomplete shapes — use reference lines instead of full grids), and Figure-Ground (distinguish the important element from background — use contrast and whitespace). Applying Gestalt principles reduces cognitive load and makes visualisations intuitive without requiring additional explanation.

17. What is the difference between a stacked bar and a grouped bar chart?

A stacked bar chart places the sub-category values on top of each other within each bar, showing both the total and the composition. It is best for showing part-to-whole relationships while still comparing totals across categories. However, only the bottom segment has a common baseline, making it hard to compare non-bottom segments. A grouped (clustered) bar chart places sub-category bars side by side, making it easier to compare individual sub-categories across groups but harder to see the total. A 100% stacked bar chart normalises all bars to the same height, showing only proportional composition without absolute values. Choose: stacked for total + composition, grouped for precise sub-category comparison.
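
The normalisation behind a 100% stacked bar is simple arithmetic, sketched below. The function name to_percent_stack and the sales figures are made up for illustration.

```python
def to_percent_stack(groups):
    """Convert absolute sub-category values to percentages of each bar's total."""
    result = {}
    for group, parts in groups.items():
        total = sum(parts.values())
        if total == 0:
            raise ValueError(f"group {group!r} has a zero total")
        result[group] = {name: 100 * value / total for name, value in parts.items()}
    return result


# Illustrative data: two regions, two products.
sales = {
    "North": {"A": 30, "B": 70},
    "South": {"A": 10, "B": 30},
}
shares = to_percent_stack(sales)
print(shares)
```

North sells more of product A in absolute terms (30 against 10), yet the two regions have similar compositions (30% against 25%), which is exactly the information a 100% stacked chart keeps and the absolute totals it discards.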

18. What is a treemap?

A treemap displays hierarchical data as nested rectangles, where each rectangle's area is proportional to its value. Parent nodes contain child nodes, using colour and size to encode additional dimensions. Treemaps efficiently use space to show relative sizes within a hierarchy — for example, market capitalisation by sector and company, or storage usage by folder and file. They are effective for large hierarchical datasets where you want to see the big picture and identify dominant items. Limitations: difficult to compare non-adjacent items, labels become unreadable for small rectangles, and the hierarchy is harder to follow than a tree diagram. Treemaps are available in Tableau, Power BI, D3.js, and Plotly.

19. What is a KPI card and when should it be used?

A KPI (Key Performance Indicator) card displays a single important metric prominently — typically showing the current value, a comparison to a target or prior period, and a trend indicator (arrow, sparkline, traffic light). Used at the top of dashboards to provide an immediate summary of business health before the viewer explores detailed charts. Design principles: use large, readable numbers; include the unit and time period; show the change (absolute and percentage) vs. target or prior period; use colour to indicate status (green = on target, red = below threshold); add a sparkline for trend context. Too many KPI cards reduce their impact — prioritise the 3-5 most important metrics for each audience.

20. What is a Gantt chart?

A Gantt chart is a horizontal bar chart showing a project schedule — tasks on the Y-axis, time on the X-axis, with bars representing the duration of each task. It shows task start/end dates, dependencies (which tasks must complete before others can start), resource assignments, and project milestones. Gantt charts are essential for project management, communicating timelines to stakeholders, and identifying the critical path (the sequence of dependent tasks determining the project's minimum duration). In data work, Gantt charts visualise pipeline run schedules, data processing workflows, and sprint plans. Tools: Microsoft Project, Jira (roadmap view), Smartsheet, Python's Plotly, and Excel.

Storytelling with Data

21. What is the narrative structure for data storytelling?

Effective data stories follow a narrative arc: (1) Context — what is the business situation or question? Who is the audience? (2) Rising tension — what does the data reveal? What is surprising, concerning, or noteworthy? (3) Climax/Insight — the central finding, the "so what" moment. (4) Resolution — the recommended action or decision. The Big Idea framework (Nancy Duarte) distills the story to one sentence with a specific audience, a unique point of view, and stakes for the audience if they don't act. Data stories must answer "so what?" — not just "here is what happened" but "here is what you should do about it."

22. What is annotation in data visualization?

Annotations add contextual text and markers directly to charts, guiding the viewer to the most important elements and explaining what they are seeing. Types: callout labels (pointing to a specific data point — "COVID-19 impact"), reference lines (average, target, threshold), trend line labels, region highlights (shading a recession period), and title-as-insight (chart title stating the insight, not just the measure: "North Region Missed Q4 Target by 18%"). Annotations reduce the cognitive load of interpreting charts by providing context directly where it is needed. Well-annotated charts can often stand alone without verbal explanation — critical for self-service dashboards and reports distributed via email.

23. What is the hierarchy of visual encodings in terms of accuracy?

Cleveland and McGill's research established that visual encodings vary in accuracy for quantitative comparisons, from most to least accurate: (1) Position on a common scale (bar charts — most accurate), (2) Position on identical but non-aligned scales (small multiples), (3) Length, (4) Angle and slope, (5) Area, (6) Volume, (7) Colour saturation/hue (least accurate). This explains why bar charts (using position/length) are more accurate for comparison than pie charts (using angle and area) or bubble charts (using area/volume). When precision matters, choose encodings high on this hierarchy. When communicating proportional relationships at a glance, lower encodings like area and colour are acceptable.

24. What is the slide layout principle for data presentations?

The "assertion-evidence" structure places the insight as the slide title (not just a topic label) and uses the chart as evidence supporting the assertion. Example: instead of "Quarterly Revenue" as a title, use "Revenue Grew 23% YoY in Q4, Driven by Enterprise Segment." This respects the viewer's time, makes the point before they have to interpret the chart, and helps presenters focus on the "so what" rather than description. In a presentation with 10 data slides, each slide title should tell the story in isolation — the audience should understand the key messages by reading only the titles in sequence. This is the standard in consulting and investment banking presentations.

25. What is the difference between correlation and causation in data visualisation?

Correlation is a statistical relationship between two variables — they move together. Scatter plots and correlation matrices show correlation. Causation means one variable directly causes changes in the other — this requires controlled experiments or causal inference methods, not just visual patterns. A classic example: ice cream sales and drowning rates correlate (both rise in summer) — the shared cause is hot weather, not a causal relationship between ice cream and drowning. In visualisations: never label a correlation as causation, always provide alternative explanations, annotate with sample size and context, and distinguish observational data from experimental data. Misleading causal implications from correlational charts are a common form of statistical manipulation.

Python Visualization

26. What is Matplotlib and what is it used for?

Matplotlib is Python's foundational plotting library, providing complete control over every visual element of a chart. fig, ax = plt.subplots() creates a figure and axes. ax.bar(categories, values), ax.plot(x, y), ax.scatter(x, y). While verbose compared to higher-level libraries, Matplotlib enables precise customisation — exact colours, font sizes, tick positions, axis limits, annotations. It is the backend for Seaborn, Pandas plotting, and many other libraries. plt.savefig('chart.png', dpi=300, bbox_inches='tight') exports for reports. Matplotlib is used when you need a specific, publication-quality chart that higher-level libraries can't produce with simple calls.

27. What is Seaborn and how does it extend Matplotlib?

Seaborn is a statistical data visualisation library built on Matplotlib that provides a high-level interface for attractive statistical graphics with less code. Key strengths: beautiful default themes (sns.set_theme()), built-in statistical aggregation (automatically computes means and confidence intervals), and charts designed for statistical exploration. Key charts: sns.boxplot(), sns.violinplot(), sns.heatmap(df.corr(), annot=True), sns.pairplot(df) (pairwise scatter plots for all numeric columns), sns.FacetGrid() for small multiples, sns.regplot() (scatter with regression line), sns.countplot(), and sns.histplot() or sns.displot() for distributions (these replace the deprecated sns.distplot()). On recent pandas versions, call df.corr(numeric_only=True) if the DataFrame contains non-numeric columns. Seaborn handles pandas DataFrames natively and is the standard for EDA visualisation in data science.

28. What is Plotly and when is it preferred over Matplotlib?

Plotly is an interactive visualisation library that produces browser-rendered charts with hover tooltips, zoom, pan, and click interactions. import plotly.express as px; fig = px.bar(df, x='category', y='value', color='segment'); fig.show(). Plotly Express is the high-level API (similar to Seaborn for Matplotlib). Plotly Graph Objects provides lower-level control. Dash is a framework for building interactive analytical web applications using Plotly. Plotly is preferred when: delivering interactive reports or dashboards (Jupyter notebooks, web apps), exploring data with hover details, or building web-based analytics. Matplotlib is preferred for static, publication-quality charts in PDF reports and academic papers.

29. What is the difference between Matplotlib's pyplot interface and the object-oriented interface (`plt.subplots()`), and which should you use?

Matplotlib has two interfaces. The pyplot (state-machine) interface uses calls such as plt.plot() and plt.xlabel() that act on an implicit "current" figure and axes. The object-oriented (OO) interface works on explicit figure and axes objects, usually created with plt.subplots() (itself a pyplot function): fig, axes = plt.subplots(2, 2, figsize=(12, 8)). The OO interface is the recommended approach. Calling methods on an axes object (ax.bar(), ax.set_xlabel()) is unambiguous about which axes you are modifying, which is essential when working with multiple subplots. The pyplot interface is simpler for a single quick plot but becomes confusing with multiple subplots because it implicitly tracks the current axes. Best practice: use fig, ax = plt.subplots() for all but the simplest single-chart scripts. In production code, the OO interface is more maintainable and testable.
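
A short sketch of the object-oriented style follows, with two axes addressed by name. The data values and titles are illustrative, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# One figure with two explicitly named axes.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Every call names the axes it modifies, so there is no hidden "current" state.
ax_left.bar(["A", "B", "C"], [3, 7, 5])
ax_left.set_title("Bar on the left axes")
ax_left.set_ylabel("Value")

ax_right.plot([1, 2, 3, 4], [2, 3, 5, 8], marker="o")
ax_right.set_title("Line on the right axes")
ax_right.set_xlabel("Step")

fig.tight_layout()
```

Because the axes are ordinary objects, they can be passed to helper functions or inspected in tests, which is much harder when a script relies on the implicit current axes.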

30. What is Altair and what makes it different?

Altair is a declarative Python visualisation library based on the Vega-Lite grammar of graphics. You describe what the chart should show (data, encoding, mark type) rather than how to draw it. alt.Chart(df).mark_bar().encode(x='category:N', y='value:Q', color='segment:N'). The :N (Nominal), :O (Ordinal), :Q (Quantitative), :T (Temporal) type annotations tell Altair how to encode each field; for pandas DataFrames Altair can usually infer the type, so the annotations are optional. Sensible axes and scales are chosen automatically, while tooltips and pan/zoom are opt-in (for example via the tooltip encoding channel and .interactive()). Altair produces Vega-Lite JSON specifications rendered in browsers. It is excellent for concise, repeatable chart definitions and is well-suited for notebooks and sharing visualisations as self-contained JSON specs.

Tableau & Power BI

31. What is the difference between a dimension and a measure in Tableau?

A dimension is a categorical field, typically a string, date, or boolean, used to categorise, group, and filter data. Dimensions appear in headers, colour legends, and filters. A measure is a quantitative field, typically a number, that can be aggregated (SUM, AVG, COUNT). Measures appear on axes and as colour/size encodings. Tableau automatically classifies each field as a dimension or a measure based on its data type. Pill colour indicates something different: blue pills are discrete and green pills are continuous, so a dimension such as a continuous date can be green and a discrete measure can be blue. Converting a field between dimension and measure, or between discrete and continuous, changes how Tableau renders the chart. Calculated fields ([Revenue] / [Units]) create new measures. Understanding this distinction is the foundation of all Tableau chart building.

32. What is a calculated field in Tableau?

A calculated field creates a new field derived from existing fields using Tableau's calculation syntax. Types: basic arithmetic ([Revenue] - [Cost]), string functions (UPPER([Name])), date functions (DATEDIFF('month', [Start Date], [End Date])), logical (IF [Score] >= 60 THEN 'Pass' ELSE 'Fail' END), aggregate calculations (SUM([Revenue]) - SUM([Cost])), and LOD (Level of Detail) expressions ({FIXED [Region]: SUM([Sales])}). LOD expressions are among the most powerful features because they compute at a specified granularity regardless of the current view: {FIXED [Customer]: MIN([Order Date])} computes the first order date per customer. Calculated fields enable data transformation, custom metrics, and complex analytics within Tableau without modifying the source data.

33. What are LOD expressions in Tableau?

LOD (Level of Detail) expressions compute aggregate calculations at a specific level independently of the current view's level of detail. Three types: FIXED — computes at the specified dimensions regardless of view filters (except context filters and data source filters): {FIXED [Region]: SUM([Sales])}. INCLUDE — computes at the view's level plus additional dimensions: {INCLUDE [Customer ID]: AVG([Order Value])}. EXCLUDE — computes at a coarser level by removing dimensions from the view: {EXCLUDE [Month]: SUM([Sales])} computes annual total even in a monthly view. LOD expressions enable: cohort analysis, ratio to total, first/last purchase date, customer segmentation, and any calculation requiring a different granularity than the current chart.
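
As a loose analogy only (this is plain Python, not Tableau syntax, and it ignores Tableau's filter order of operations), a FIXED expression behaves like computing an aggregate per group and attaching it to every row. The function name fixed_sum and the sample rows are hypothetical.

```python
from collections import defaultdict

# Made-up rows at region x month granularity.
rows = [
    {"region": "North", "month": "Jan", "sales": 100},
    {"region": "North", "month": "Feb", "sales": 150},
    {"region": "South", "month": "Jan", "sales": 80},
    {"region": "South", "month": "Feb", "sales": 70},
]


def fixed_sum(rows, dims, measure):
    """Sum `measure` per combination of `dims`, then return that total for each row.

    Rough analogue of {FIXED [dims]: SUM([measure])}.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return [totals[tuple(row[d] for d in dims)] for row in rows]


# Region total repeated on every row, regardless of the month-level detail.
region_totals = fixed_sum(rows, ["region"], "sales")
# A ratio-to-total calculation, one of the use cases listed above.
share_of_region = [row["sales"] / total for row, total in zip(rows, region_totals)]
print(region_totals)
print(share_of_region)
```

The region total stays the same on every monthly row, which is what lets a ratio-to-total or cohort calculation work at a granularity different from the chart's.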

34. What are the key chart types available in Tableau?

Tableau's Show Me panel offers: bar chart, stacked bar, line chart, area chart, scatter plot, dual-axis chart (two measures on separate axes), pie chart, treemap, heat map, highlight table (coloured text table), symbol map and filled map (geographic), box-and-whisker plot, bullet chart (KPI vs. target), Gantt chart, and histogram. Tableau also supports custom charts built via calculated fields and dual-axis combinations: dumbbell chart (comparing two points per category), waterfall chart, funnel chart, and Marimekko chart. Tableau's Viz Extensions and custom shapes enable further customisation. The Show Me panel guides chart type selection based on the fields dragged to the view.

35. What is the difference between a live connection and an extract in Tableau?

A live connection queries the data source directly every time the workbook is opened or a filter is applied — data is always current but depends on source availability and query speed. An extract creates a compressed, columnar snapshot of the data in Tableau's .hyper format — stored locally or on Tableau Server. Extracts are faster for large datasets (especially with many calculations), work offline, and can be scheduled to refresh automatically. Extracts support incremental refresh (only new rows are added, not a full rebuild). Use live connections for real-time operational dashboards; use extracts for analytical dashboards with large datasets where speed and reliability matter more than absolute freshness.

36. What is Power BI's data connectivity and transformation pipeline?

Power BI's data pipeline: Power Query (data ingestion and transformation — ETL), Data Model (relationships between tables, calculated columns, measures in DAX), and Visualisations (interactive reports and dashboards). Data flows through: Get Data (connect to 100+ sources) → Power Query Editor (transform — filter, reshape, clean) → Load to Model (Apply changes) → Define relationships (Manage Relationships) → Create DAX measures → Build visualisations → Publish to Power BI Service. Power Query transformations are repeatable and auditable. The Data Model supports star schema design (fact tables connected to dimension tables). Understanding this pipeline is fundamental to diagnosing performance issues and building scalable reports.

37. What are slicers in Power BI?

Slicers are on-canvas filter visuals that allow report consumers to interactively filter the data shown in the other visuals on a page (or on several pages, when synced). They can filter by: dropdown list, checkbox list, date range picker, relative date (last 7 days, this quarter), slider (numeric range), and search box. Slicers sync across pages with Sync Slicers (View → Sync Slicers). Slicer interactions with visuals are controlled via Format → Edit Interactions. Slicers replace manual report filtering and allow end users to explore data without modifying the report itself. Best practices: place slicers at the top or left of a report page, use clear labels, and group related slicers.

38. What is cross-filtering and cross-highlighting in Power BI?

Cross-filtering: clicking a data point in one visual filters other visuals on the page to show only data related to the selected item. For example, clicking "North Region" in a map filters all charts to show only North Region data. Cross-highlighting: clicking a data point in one visual highlights the corresponding portion in other visuals while keeping all data visible (highlighted portion is bright, others are faded). The default interaction can be changed via Format → Edit Interactions (select each target visual and choose filter, highlight, or none). Cross-filtering is more useful for focused analysis; cross-highlighting is better for seeing proportional context.

39. What is the difference between a report and a dashboard in Power BI?

A Report in Power BI is a multi-page interactive document with charts, slicers, and rich visualisations built from a single dataset. Reports support full interactivity (cross-filtering, drill-through, drill-down, Q&A). A Dashboard in Power BI is a single-page canvas in the Power BI Service assembled from pinned tiles, which can come from different reports and datasets, giving a high-level, at-a-glance view. Dashboards support data alerts (notify when a KPI crosses a threshold) but have less interactivity than reports. In practice: build detailed reports for exploration, then pin key KPIs and charts to a dashboard for executive monitoring.

40. What is drill-through in Power BI?

Drill-through allows users to navigate from a summary visual to a detail page filtered to a specific dimension value. Set it up by creating a detail page and adding the dimension field to the Drill-through filters well. Users right-click a data point in any visual that uses that field and select "Drill through → [Detail Page Name]" to navigate. The detail page opens showing only data for the selected dimension value (e.g., drilling through on "North Region" opens a page showing all North Region transactions). Drill-through is the standard pattern for separating summary and detail views without cluttering the main dashboard. A Back button is automatically added to the detail page to return to the origin page.

Advanced Visualization

41. What is D3.js?

D3.js (Data-Driven Documents) is a JavaScript library for creating custom, interactive, web-based data visualisations by binding data directly to DOM elements and transforming them using HTML, SVG, and CSS. It provides full control over every visual element with a declarative data binding approach. D3 is used when standard chart libraries can't produce the required visualisation — custom network graphs, geographic projections, animated transitions, scrollytelling, and novel chart types. Learning curve is steep (requires JavaScript, SVG, and web development knowledge) but the output is unmatched in flexibility. Major news organisations (NYT, FT, Bloomberg) use D3 for their interactive data journalism pieces.

42. What is a choropleth map and when is it used?

A choropleth map colours geographic regions (countries, states, counties) based on a statistical variable — for example, income by county, COVID vaccination rates by country, or election results by state. Colour intensity encodes the variable's magnitude. Key considerations: use sequential colour scales for one-directional data, diverging scales for data with a meaningful midpoint, and normalise by population (rate rather than count) to avoid misleading patterns in densely populated regions. Tools: Tableau's Filled Map, Power BI Map visual, Python's Plotly choropleth, Folium, and D3's geoPath. Limitations: large geographic regions can dominate visually regardless of their data values — consider cartograms (area distorted by data value) for fairer representation.
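
The normalisation step is worth seeing with numbers. In the sketch below, the region names, case counts, and populations are invented purely for illustration, and rate_per_100k is a hypothetical helper.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Invented figures: a large metro area and a small rural area.
regions = {
    "Metro": {"cases": 5000, "population": 2_000_000},
    "Rural": {"cases": 600, "population": 100_000},
}
rates = {
    name: rate_per_100k(r["cases"], r["population"]) for name, r in regions.items()
}
print(rates)  # {'Metro': 250.0, 'Rural': 600.0}
```

Coloured by raw count, the metro area would look far worse; coloured by rate, the rural area stands out, which is the more meaningful comparison between regions of different size.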

43. What is a network graph in data visualization?

A network graph (node-link diagram) represents entities (nodes) and their relationships (edges). Node size can encode importance (degree, betweenness centrality). Edge thickness encodes relationship strength. Colour encodes community or category. Applications: social network analysis (who knows whom), knowledge graphs, supply chain dependencies, system architecture diagrams, and recommendation systems. Force-directed layouts (D3's d3-force) organise nodes so connected nodes cluster together. Tools: NetworkX + Matplotlib (Python), Gephi (standalone), Sigma.js and Vis.js (web). Network graphs become unreadable with many nodes ("hairball" problem) — use filtering, community detection, or egocentric views for large networks.
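As a rough sketch of how node size can be derived from degree, the following uses only the standard library. The edge list, the names, and the size formula are hypothetical; a library such as NetworkX would normally handle the graph structure and layout.

```python
from collections import defaultdict

# Hypothetical "who knows whom" edge list: (person, person, interaction_count).
edges = [
    ("Asha", "Ben", 5),
    ("Asha", "Chen", 2),
    ("Asha", "Dev", 1),
    ("Ben", "Chen", 3),
    ("Dev", "Esi", 4),
]

# Undirected adjacency map: node -> {neighbour: edge weight}.
adjacency = defaultdict(dict)
for a, b, weight in edges:
    adjacency[a][b] = weight
    adjacency[b][a] = weight

# Degree = number of distinct neighbours; used here to encode importance.
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}


def node_size(deg, base=100, step=150):
    """Illustrative size encoding: marker size grows linearly with degree."""
    return base + step * deg


sizes = {node: node_size(d) for node, d in degree.items()}


def incident_edges(centre):
    """Simplest egocentric view: keep only the edges that touch `centre`."""
    return [(a, b, w) for a, b, w in edges if centre in (a, b)]


print(degree)
print(incident_edges("Dev"))
```

Filtering to the edges around one node, as `incident_edges` does, is one way to tame the hairball problem when the full graph is too dense to read.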

44. What is a Sankey diagram?

A Sankey diagram visualises flows between nodes, where the width of the flow arrows is proportional to the flow quantity. Applications: energy flows (Sankey's original use — showing energy losses in steam engines), website user journeys (from landing page through funnel stages), financial flows (budget allocation from departments to expenses), and migration flows. Nodes on the left represent sources; nodes on the right represent destinations. Connecting flows show how quantities are distributed. Sankey diagrams make it easy to see where the majority of flow goes and where losses or drops occur. Tools: Plotly's Sankey trace, D3 Sankey plugin, Tableau (via extensions), Power BI (third-party visual).
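Most Sankey tools expect flows as index-based node and link lists rather than as named pairs. The sketch below, using a hypothetical website funnel with made-up numbers, builds that structure in the shape Plotly's Sankey trace accepts (node labels plus parallel source, target, and value lists).

```python
# Hypothetical funnel flows: (source stage, target stage, number of users).
flows = [
    ("Landing page", "Product page", 1000),
    ("Landing page", "Exit", 400),
    ("Product page", "Checkout", 300),
    ("Product page", "Exit", 700),
]

# Assign each distinct node label an integer index, in first-seen order.
labels = []
index = {}
for source, target, _ in flows:
    for name in (source, target):
        if name not in index:
            index[name] = len(labels)
            labels.append(name)

# Links refer to nodes by index; value sets the width of each flow.
sankey = {
    "node": {"label": labels},
    "link": {
        "source": [index[s] for s, _, _ in flows],
        "target": [index[t] for _, t, _ in flows],
        "value": [v for _, _, v in flows],
    },
}

# Share of Product page visitors who drop out, i.e. where the flow is lost.
out_of_product = sum(v for s, _, v in flows if s == "Product page")
exit_share = 700 / out_of_product

print(sankey)
print(f"Product page drop-off: {exit_share:.0%}")
```

If Plotly is installed, the two dictionaries can be passed as `go.Sankey(node=sankey["node"], link=sankey["link"])` inside a `go.Figure`, where `go` is `plotly.graph_objects`.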

45. What is a bump chart?

A bump chart shows how rankings change over time — particularly useful for comparing ordinal positions across multiple entities over a period. The Y-axis shows rank (inverted — rank 1 at top); lines connect each entity's rank across time periods. Bump charts reveal which entities are rising, falling, or stable in relative position. Examples: sales ranking of products by quarter, country ranking by Olympic medals by year, team standings over a season. They are more readable than multiple line charts when the story is about relative position rather than absolute values. Add distinctive colours and direct labels to each line for readability. Available in Tableau and Python (manual with Matplotlib/Plotly).
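Because bump charts are usually built manually in Python, the main preparatory step is converting values into per-period ranks. A minimal sketch follows; the product names and sales figures are invented, and ties are broken alphabetically as an arbitrary choice.

```python
# Hypothetical quarterly sales by product (illustrative numbers only).
sales = {
    "Q1": {"Alpha": 120, "Beta": 95, "Gamma": 80},
    "Q2": {"Alpha": 110, "Beta": 130, "Gamma": 90},
    "Q3": {"Alpha": 100, "Beta": 125, "Gamma": 140},
}


def rank_period(values):
    """Rank entities by value, highest first (rank 1 = best); ties broken by name."""
    ordered = sorted(values, key=lambda name: (-values[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


periods = list(sales)

# One rank series per entity, aligned with `periods`: this is what each line plots.
ranks = {}
for period in periods:
    for name, rank in rank_period(sales[period]).items():
        ranks.setdefault(name, []).append(rank)

for name, series in ranks.items():
    print(name, series)
```

Each series can then be drawn as a line against `periods`, with the Y-axis inverted so that rank 1 sits at the top (in Matplotlib, for instance, via `ax.invert_yaxis()`).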

46. What is responsive design in data visualization?

Responsive data visualisation automatically adapts its layout, scale, and complexity to the screen size and device: desktop, tablet, or mobile. Principles: prioritise content (on mobile, show fewer KPIs and larger text), use simplified chart types on small screens (avoid complex scatter plots with many labels), stack layouts vertically on mobile (side-by-side panels stack, and vertical column charts often read better as horizontal bars so category labels stay legible), scale fonts and touch targets for finger interaction, and use progressive disclosure (show summary on mobile, details on tap). Power BI supports mobile layout view; Tableau has a device designer. SVG-based charts (D3, Plotly) can scale with their container, but this usually has to be enabled: in D3, set a viewBox on the SVG or redraw on resize; in Plotly, turn on the responsive config option. Ensure axes and labels scale proportionally.
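The principles above amount to choosing layout settings from the viewport width. The following sketch shows one way to express that; the breakpoints (600 and 1024 pixels), the setting names, and the panel names are all assumptions for illustration and do not correspond to any BI tool's API.

```python
def layout_for(width_px):
    """Pick layout settings for a viewport width (breakpoints are illustrative)."""
    if width_px < 600:  # phone: single column, fewer KPIs, larger text
        return {"columns": 1, "max_kpis": 3, "font_px": 16, "show_detail_table": False}
    if width_px < 1024:  # tablet
        return {"columns": 2, "max_kpis": 4, "font_px": 14, "show_detail_table": False}
    return {"columns": 3, "max_kpis": 6, "font_px": 12, "show_detail_table": True}


def arrange(panels, columns):
    """Split an ordered list of panels into rows of at most `columns` panels."""
    return [panels[i:i + columns] for i in range(0, len(panels), columns)]


# Hypothetical panels, ordered from most to least important.
panels = ["KPIs", "Trend", "Map", "Table"]

for width in (375, 768, 1440):
    settings = layout_for(width)
    visible = panels if settings["show_detail_table"] else panels[:-1]
    print(width, arrange(visible, settings["columns"]))
```

Keeping the panels in priority order means that stacking on a phone preserves the intended reading sequence, and hiding the detail table is a simple form of progressive disclosure.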

47. What is a connected scatter plot?

A connected scatter plot links observations in a scatter plot with lines, showing how a point moves through the two-dimensional space over time or across a sequence. Unlike a regular scatter plot, which shows positions with no indication of their order, the connecting lines reveal the trajectory. A famous related example is Hans Rosling's "wealth and health of nations" animated bubble chart (available through Gapminder), in which trails can trace each country's path over time. Connected scatter plots work best with 5-30 data points per series; with more, the lines become a tangled mess. Label key points (start, end, notable events) to provide context. They are used in economics (Phillips curve, GDP vs. debt trajectory), sports analytics, and any two-variable time series comparison.

48. What is the principle of chart titles and labels?

Chart titles should answer "what should I notice?"; insight titles are more effective than topic titles. "Revenue Declined 15% in H2 2024" is better than "Revenue by Quarter." Axis labels should always include units: "Revenue (₹ millions)" not "Revenue." Axis tick labels should be readable: rotate labels only when necessary, prefer horizontal labels, and abbreviate thousands/millions (₹1.2M not ₹1,200,000). Direct labels (on or next to data points) are better than legends when there are only a few series (roughly 3-5), because they eliminate the back-and-forth between legend and chart. Legends should be placed to the right of or below the chart, never above. Avoid label clutter: reserve callout annotations for the most important points only.
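Two of these labelling rules are easy to automate. The helpers below are a hedged sketch: `abbreviate` and `insight_title` are hypothetical names, the K/M/B suffixes and one-decimal rounding are one possible convention, and the revenue figures are invented.

```python
def abbreviate(value, currency="₹"):
    """Format a number with a K/M/B suffix and at most one decimal place."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            scaled = value / threshold
            # Drop a trailing ".0" so 5000 becomes "5K" rather than "5.0K".
            text = f"{scaled:.1f}".rstrip("0").rstrip(".")
            return f"{currency}{text}{suffix}"
    return f"{currency}{value:g}"


def insight_title(metric, old, new, period):
    """Build an insight-style title from the percentage change between two values."""
    change = (new - old) / old * 100
    if change > 0:
        direction = "Rose"
    elif change < 0:
        direction = "Declined"
    else:
        return f"{metric} Was Flat in {period}"
    return f"{metric} {direction} {abs(change):.0f}% in {period}"


print(abbreviate(1_200_000))
print(insight_title("Revenue", 200, 170, "H2 2024"))
```

Generating the title from the data keeps it in step with the chart when the numbers refresh, though an analyst should still check that the computed headline is the insight worth highlighting.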

49. What is visual hierarchy in dashboard design?

Visual hierarchy guides the viewer's eye in the intended sequence — from most to least important information. Techniques: size (larger elements are perceived as more important — use large KPI numbers for headline metrics), position (top-left is the natural starting point in left-to-right languages — place the most important element there), contrast (high contrast elements draw attention — use bold colour for key metrics), whitespace (surrounding an element with space makes it more prominent), and grouping (related elements placed together form a visual unit). A well-designed dashboard has a clear Z-pattern or F-pattern reading flow: headline KPIs → key chart → secondary charts → detail tables. Visual hierarchy reduces the time to insight.

50. What is the iterative design process for creating effective dashboards?

The iterative dashboard design process: (1) Define requirements — identify the audience, key decisions to support, and questions to answer. (2) Sketch wireframes — low-fidelity sketches of layout and chart types before building. (3) Build a prototype — a basic version with representative data to validate the design with stakeholders. (4) Gather feedback — share with intended users and observe how they interpret and use the dashboard. (5) Iterate — refine based on feedback (simplify cluttered areas, fix misleading encodings, add missing context). (6) User acceptance testing — confirm the dashboard supports the target decisions. (7) Deploy and monitor — track usage, gather ongoing feedback, and update as business needs evolve. Great dashboards are never built in a single pass.
