CS171 Midterm
Michelle Borkin - March 3, 2009
This midterm should take you 90 to 120 minutes. Of course you can spend much more time on it. But if you find yourself tinkering with details, e.g., measuring circle diameters to get the proportions exactly right, then you are wasting your efforts. We are looking to see if you can apply the principles discussed in class to critique and design static and interactive visualizations.
You will be graded based on how precise you are about your data types, visual encodings, perceptual principles, and interaction techniques. Use the terms that were introduced in class as much as possible. We will also grade the overall impression of your visualization design. But as with any creative activity, there are no right or wrong answers -- just better or worse choices. Be thoughtful, be concise, be creative -- and above all, have fun.
It goes without saying that all submitted work has to be your own and that you may not get help from anyone but the TFs, in keeping with our academic honesty policy and the Wheaton Honor Code.
Part 1: Visualization Critique (25 points)
Take a look at the visualization above, which has three distinct parts (migration, major metros, and urban sprawl), and write a visualization critique. Your analysis should address these questions for each of the three parts:
- Who is the intended audience?
- What tasks does it enable?
- List all the data types this visualization represents (quantitative, ordinal, nominal)
- How is each data type visually encoded?
- What design principles best describe why it is good / bad?
- What perceptual principles and color design rules are followed / violated? You may refer to the Color Rules in the downloads area by number (e.g., "R1, R4, and R6 were violated.")
- Why do you like / dislike this visualization?
Your critiques will be graded based on the thoroughness of your evaluation, and your understanding of the design principles used (or, perhaps not used effectively!).
Migration
[Note: In all the sections, I list all the data types that should be included in #3 instead as part of #4.]
- Who is the intended audience?
The intended audience for this and the other parts of the visualization is the general public reading Newsweek magazine.
- What tasks does it enable?
These "migration" graphics are supposed to enable the viewer to understand where the primary
migration routes are, and how the populations of various states and regions of the country are being
affected in terms of people/population numbers.
- List all the data types this visualization represents (quantitative, ordinal, nominal)
The various components of this visualization represent quantitative (e.g. population percent changes/distributions),
ordinal (e.g. increasing binned ranges of population density),
and nominal (e.g. specific states or regions) data types.
- How is each data type visually encoded?
The quantitative data is encoded as length (bar length) for number of people migrated (x1000) in a horizontal bar graph, in a table (printed number) to represent number of new immigrants (x million) per state,
as color/grey-scale to represent percent of total US population in the shaded maps, as length for percent of state populations in the horizontal bar graph (bar length),
latitude/longitude as position in the maps, and as small multiples of US map for time (years).
The ordinal data is encoded as a tabular list with numbers for rank in the table of population increases by state,
as decreasing lightness from light grey to black (plus red) for population percent of US in increasing binned ranges for map shading,
and as small multiples for chronologically ordered years in increasing order for small multiples maps.
The nominal data is encoded as text representing migration routes in the first bar chart, and
as text representing specific states in the table and the bar graphs in the lower right.
- What design principles best describe why it is good / bad?
For this section, I'm going to break down and analyze each graphical component separately.
First, though, a quick general comment on the poorly done page layout: in the "migration" section of the visualization,
the pieces are not cohesively ordered, and arranging the graphics in a horizontal "L" shape makes the section very discontinuous.
Starting at the top left, the first horizontal bar chart represents numbers of people migrating from one region to another.
Although the attempt was to optimize the data-ink ratio, completely eliminating any label/scale for the variable axis was a bad choice (no way to confirm they are all starting at zero).
I think it was a bad design choice to put the "From Northeast" text within the top bar since it doesn't match the rest of the bars. I also find it hard to comprehend each of the migratory routes,
and would have preferred a map of the US with arrows indicating directions with the numbers and some other encoding (such as color/luminance?) to represent how many people migrated.
It should also be noted that only select migration routes are displayed - other directions, such as migration to the northeast, are omitted and may change the trends/conclusions (or at least an explanation is needed as to why only certain routes are included).
Finally, the migration directions are different from the region terminology used/shown in the small multiples map (discussed below) thus it is hard to properly draw comparisons between these graphics.
To the right is a table listing states with the most migration. It is the case that sometimes using a table instead of a graphical representation for data is a good idea.
I think it works ok in this graphic if the point is to convey the order in which the states are ranked. However, for better visual comparison of the numbers,
an ordered bar chart would have been more effective.
The next graphic is a nice example of "small multiples" where, instead of animation, static images over time show the progression of where the US is most densely populated.
One bad design principle is that the multiples are not evenly spaced in time (one step is 50 years and the other 30 years).
In the last set of bar graphs, the top five "magnet" and "sticky" states are presented. The changes are so small that the bars barely look different - a table would have probably been a better choice to encode the data.
- What perceptual principles and color design rules are followed / violated? You may refer to the Color Rules in the downloads area by number (e.g., "R1, R4, and R6 were violated.")
Starting with the first graphic in the upper left, the designers were trying to draw your attention to the fact that there was a large amount of immigration to the south.
However, choosing such a strong pop-out color as red against a muted background creates too strong an effect. It distracts you from the rest of the data, and a more subtle color choice would have been better.
In terms of color rules, R1 is not effectively used.
For the table, the perceptual rules were adequately followed with easy to read text.
The only thing that could have possibly helped is having alternate shaded lines and only one column to make reading-off of the data easier and more distinct.
The small multiples are color-coded with a segmented color scale with decreasing lightness to represent increased population density (this is good since it uses the acute luminance channel).
The red is also used with its "pop out" effect to draw the viewer's attention to the most densely populated regions. This seems a little too drastic to me, and adjusting the color scale so black represented the highest density bin would have been better
(the red would not look the "darkest" if printed in black-and-white).
In terms of color rules, R1 and R3 are not effectively used and R5 is violated (in terms of 0-5 and 6-10% bins).
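The black-and-white concern above can be checked with a quick calculation. The sketch below is illustrative only: the swatch values are hypothetical stand-ins (the actual print colors in the Newsweek graphic are not known), pure sRGB red is assumed for the top bin, and luminance is computed with the standard sRGB transfer function and Rec. 709 weights.

```python
def srgb_to_linear(c8):
    """Convert one 8-bit sRGB channel to linear light (sRGB transfer function)."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance (0 = black, 1 = white) using Rec. 709 weights."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Hypothetical swatches standing in for the map's bins (assumed, not measured).
swatches = {
    "light grey": (204, 204, 204),
    "mid grey": (128, 128, 128),
    "dark grey": (64, 64, 64),
    "black": (0, 0, 0),
    "red (top bin)": (255, 0, 0),
}

# List the swatches from lightest to darkest, as a grayscale print would order them.
for name, rgb in sorted(swatches.items(),
                        key=lambda kv: relative_luminance(kv[1]),
                        reverse=True):
    print(f"{name:14s} luminance = {relative_luminance(rgb):.3f}")
```

Under these assumed swatches, pure red has a luminance of about 0.21, well above black's 0, so it would not read as the darkest bin once color is removed.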
For the last two bar graphs, the red pops out significantly and distracts from the other visuals on the page. A dark grey would have been sufficient since the red is not specifically encoding any trends (and it is inconsistent with most of the other graphics, where red is used to indicate "south" but here it is used arbitrarily).
In terms of color rules, R1 is not effectively used.
- Why do you like / dislike this visualization?
Overall, I don't like these visualizations for the abusive use of the color red, the poor layout, and
the poor encoding choices. However, I do like the use of small multiples on the US maps.
Major Metros
- Who is the intended audience?
The intended audience for this and the other parts of the visualization is the general public reading Newsweek magazine.
- What tasks does it enable?
This piece of the visualization is supposed to enable one to see what aspects of US economy and population
are most concentrated in urban portions of the US.
- List all the data types this visualization represents (quantitative, ordinal, nominal)
This portion of the visualization represents quantitative (e.g. percent of economic sectors),
and nominal (e.g. economic sectors or industries) data types.
- How is each data type visually encoded?
The quantitative data is encoded as the area (or radius - unclear) of each circle representing the percentage of a sector in urban areas,
as text below the circles, as text in the tables at the bottom showing percent share of an industry,
and latitude/longitude as position on the map.
The nominal data is encoded as grey-scale shades on the circles for each sector category,
and circular markers (with numbers in the middle) corresponding to industry names in the tables to show
corresponding city positions on the map.
- What design principles best describe why it is good / bad?
For the top circular diagram, this is a bad visualization due to its "lie factor" - the varying sizes of the circle areas
do not proportionally correspond to the varying change/effect in the actual data. Also, humans are bad at judging variation in area
so a bar graph would have been a more appropriate choice for this data.
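To make the "lie factor" idea concrete, here is a minimal sketch using Tufte's definition (size of the effect shown in the graphic divided by the size of the effect in the data). The two percentages are hypothetical and were not read off the Newsweek circles; the point is only to show what happens if a circle's radius, rather than its area, is scaled linearly with the value.

```python
import math


def effect_size(first, second):
    """Relative change from first to second."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (effect_size(shown_first, shown_second)
            / effect_size(data_first, data_second))


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical data values (assumed for illustration only).
small, large = 12.0, 24.0

# Case 1: radius scaled linearly with the value, so area grows with the square.
lf_radius = lie_factor(small, large, circle_area(small), circle_area(large))

# Case 2: area scaled linearly with the value (radius proportional to sqrt(value)).
lf_area = lie_factor(small, large,
                     circle_area(math.sqrt(small)),
                     circle_area(math.sqrt(large)))

print(f"radius-scaled circles: lie factor = {lf_radius:.2f}")
print(f"area-scaled circles:   lie factor = {lf_area:.2f}")
```

With these assumed numbers, doubling the value while scaling the radius quadruples the area, giving a lie factor of 3, whereas scaling the area gives a lie factor of 1. Even in the second case, the perceptual problem of judging area remains, which is why a bar graph is still the safer choice.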
For the bottom map with corresponding tables, the markers overlap too much in the New York metro area,
having duplicate numbers for both table's markers is confusing, and no context is given to these numbers.
- What perceptual principles and color design rules are followed / violated? You may refer to the Color Rules in the downloads area by number (e.g., "R1, R4, and R6 were violated.")
For the top circular diagram, the pop out red is too bright/saturated and a toned-down color would have been just as effective without overwhelming the graphic.
In this case since we are not dealing with ordinal data, a progressive grey scale is not necessary and having distinct colors to represent each of the sectors would have made identifying groups easier
(the subtle grey shades are hard to distinguish). So in terms of our color rules, R1 is inappropriately used, R4 should have been used, R5 was violated, and R12/R13 were not appropriately used.
In the bottom graphic, the red color was not appropriately chosen since there is no reason to make the "service capitals" pop out more than the "manufacturing capitals".
So in terms of our color rules, R1 is inappropriately used.
- Why do you like / dislike this visualization?
In summary, I don't like the use of circles to compare percentage values (a bar chart would have been much more effective),
I don't like the colors/shades used for categorizing the circular groups, and I did not like the use of circles with numbers/colors
to match industries in the tables to their mapped cities.
Urban Sprawl
- Who is the intended audience?
The intended audience for this and the other parts of the visualization is the general public reading Newsweek magazine.
- What tasks does it enable?
This portion of the visualization is supposed to enable a viewer to compare percent populations of a metropolitan area,
and see the increase in suburban residents.
- List all the data types this visualization represents (quantitative, ordinal, nominal)
This visualization represents quantitative (e.g. percent of population), ordinal (e.g. distance from city center in "urban" terminology), and nominal (e.g. "urban", "suburban", etc.) data.
- How is each data type visually encoded?
For the quantitative data, percent of the population is encoded in the equivalent percent area of the pie chart (and as text in the labels),
and time (years) is encoded using small multiples of the pie chart.
The ordinal data is encoded as shaded/colored concentric rings around the city center (identifying which term is furthest from the center).
The nominal data is encoded in the use of the regional terms such as "urban" or "suburban", and is encoded both in the concentric circular areas in the lower left
and as labels on the pie charts (and color-coded regions).
- What design principles best describe why it is good / bad?
The location graphic on the left isn't too bad in terms of design principles. It serves as a good "sketch" to describe the location terminology. However, it does violate multiple color/perceptual rules (see next section).
The pie charts on the right were a poor choice to encode the data in this scenario. There is almost no discernible difference between the pie wedges so there is no easy visual way to compare the small multiples.
It should also be noted that the time-steps for the small multiples are not even increments (10 and 7 years). A cluster bar chart or a line graph would have shown the trends more effectively.
- What perceptual principles and color design rules are followed / violated? You may refer to the Color Rules in the downloads area by number (e.g., "R1, R4, and R6 were violated.")
For the location graphic on the left, the red pop out is too harsh - they are trying to draw your attention/link to the suburban wedges in the pie charts but there is too much pop out.
Picking a simple segmented grey scale labeling scheme going from dark to light (dark inherently being associated with higher density) would have been more appropriate. Also, one cannot read the
white text in the middle of the urban circle - not enough contrast for legibility. In terms of our color rules, R1 is not used effectively.
For the pie charts, red is used as pop out to draw one's attention to the increasing suburban population, but the poor encoding choices make this trend barely visible.
Also, the smallest wedges are hard to discern due to their small size and a more appropriate color choice could have made them more visible. Also along the same color-size illusion,
the red wedges probably appear larger or more significant due to the red label. In terms of color rules, R1 is not used effectively, and R4, R5, and R6 are all violated.
- Why do you like / dislike this visualization?
In summary, I think the notion of having a "sketch" to explain the terminology with an accompanying color-linked graphic is a good idea.
However, the use of color was not well done (as described above) and the small multiples pie charts are not effective (aside from the differing time steps)
due to the lack of visible change in the wedge slices and inability to discern the smaller wedges. A more effective encoding than wedge area of a circle
would have been either some form of bar chart or a line graph.
Part 2: Visualization Redesign (25 points)
Select two of the parts of the visualization in Part 1 and redesign them with principles of excellence in mind. These two redesigned visualizations should be linked in some way. You can use any tools available to you (i.e. Photoshop, pen and paper, markers, paint, PowerPoint, Keynote, Matlab, etc.) to generate your redesigns. You do not have to worry about getting all the quantitative information correct, and you may use " ... " to indicate more of the same. When choosing colors you may refer to the Color Brewer scales. These redesigns are for a color print publication such as Newsweek, and would be presented together.
In your write-up, include the redesigns as a digital image (again, you can digitize print versions using a scanner or digital camera), and a thorough explanation of your design process, data type and visual encoding decisions, and the perceptual and design principles you followed. Make a case as to why your redesign is better than the original.
[Note: Quantitative and geographic data is not accurate in this sketch. Alaska and Hawaii should be included, too. Also, unless
otherwise indicated, all the data presented is for the same year (e.g. 2000). The US maps were based on images from TheUS50.com]
For this part of the midterm, I decided to redesign and link the "Migration" and "Major Metros" portions of the original visualization.
My first priority was combining the migration data (which was poorly represented as a bar chart in the original) with spatial data (i.e. US map).
This can be seen in the upper left quadrant of my redesign. I also wanted to tie-in the "Major Metro Areas..." map to this migration map to make any
trends in migration toward particular cities or industries visible to the viewer. I chose a map with state boundaries to make it more recognizable for the viewer
with simple black-and-white outlines for good contrast. The "Major Metro" cities that are listed in the upper right table are represented by dark markers
and city names (not all included in my sketch) so one can more easily match the table data with geographic locations. Other major metro areas that
one might care about in terms of the migration data are represented as light markers (thus not overly distracting or popping out).
For the migration directional arrows, I decided to color-code them to levels (low, medium, and high) of people migrating. Please note I only
included a few sample arrows - with the real data, one would have to more precisely choose the most relevant arrows to display.
I took the pink sequential 3-class color scale from Color Brewer. (Quick note - I chose all my color scales for my redesigns on the midterm to be color blind friendly, and appropriate for color printing and monitor/projector display.)
This map makes the migration data much easier to interpret, and provides a link both to the
industry table and to the small multiples below.
In the box below the large US map is a series of small multiples representing population density distribution for the US over 60 years.
This small multiples graphic is able to provide the "narrative" for this visualization, and help put the US map above in some context.
I chose to pick equal-increment time steps for my series for more accurate comparisons. Even though my sketch is "blocky" in its coloring,
my intention would be to color-code based on population density within each county (regions or states may not provide the whole picture and not properly show cities).
I chose a simple sequential 5-class color scale from Color Brewer. I also added a black arrow and box to clearly indicate that the data in the large US map above
corresponds to that particular step in time.
To the right of the large US map, I placed the tabular data with each of the select major industries listed. First of all in terms of the data, before I turned
this into a real visualization, I would consider picking more significant cities or industries. I would also consider picking a variable other than "percent share"
that gave more context, such as "percent change". Instead of using the confusing double-numbered markers, I decided to encode just using the city with a dark marker indicating that it corresponds
to one of the table entries, thus the table entries should be listed in some intelligible way such as by region. I have also added subtle gray backgrounds to
every other line to make the table more legible.
In the lower left of the visualization one finds the other table data. I chose to leave this data as a table since the numbers were too similar in value and seeing these small changes would not encode well in a graphic.
Instead, I chose to improve the tables by making every other line have a grey background, and by putting all the ranked states in single columns.
Finally, in the lower right corner is my redesign of the circular-plots representing "urban share of the US economy". I chose to encode the data
as a bar chart rather than circles since seeing the difference in height would be easier to determine than difference in area, and the visible trends would be more linearly
related to the actual trends. I made sure to put a bar's width in-between each bar, arrange the bars in descending order after the first pop-out "12% land use" bar,
and chose a qualitative 4-class color scheme for the categorical encoding. In this graph, as well as the other parts of the visualization, I tried to optimize my
data-ink ratio as well as avoid chart junk.
In summary, my redesign is better than the original because it more effectively links the data and applies our color design rules more effectively.
The linking is seen with the migration arrows being combined with the metro regions' map, and with the time series being placed just below the large map for easy comparison as well as giving context by connecting the small multiple to the large map.
In terms of design, this redesign has a better, more concise layout, uses color more effectively (e.g. color scales that more effectively use luminance), and encodes the data more effectively (e.g. bar chart instead of circular plot).
Part 3: Interaction Design (25 points)
Now turn the visualization redesign of Part 2 into an interactive visualization using the simple interaction sketch technique described in class. You can use any part of your redesign in Part 2 by cutting and pasting it. Indicate interaction techniques using the terms introduced in class, such as brushing and linking, zooming, animated transitions, etc.
You can use any tools available to you (i.e. Photoshop, pen and paper, markers, paint, PowerPoint, Keynote, Matlab, etc.) to generate your redesign. As shown in class, it may be most effective to use slides to show interactions since you can refer to them by number (e.g., "if you click here then jump to slide 5") and since you can add notes to each slide. Be creative -- there is no right or wrong way to sketch interactions! Your interaction sketch should be included in your write-up as either a series of annotated images, or, (preferably) as a series of slides, exported to pdf from PowerPoint or Keynote. For the latter option, include a link in your write-up to the pdf file.
In your interaction sketch, add a thorough explanation of your design process, decisions, and the interaction principles you followed. Make a case as to why your redesign is better than the original and why it is better than your static visualization in Part 2.
Note: Click on the above image to open the "interactive" PDF in Acrobat. (The first page is the static version,
the second page has a "snapshot" of the interactive version, and on page 3 you can "click" to interact with the visualization.)
For this redesign, my primary goal was to make all the data as "linked" as possible so as to make data exploration easy for the user.
Although the amount of time it takes for the viewer to look at and explore the data is increased, it is worth it for the increased knowledge.
The primary design principle applied here is "linking and brushing" where if the user clicks on portions of the map or row(s) of a table in order to filter the data,
they then see the corresponding data in the other tables and map. This interaction allows the user to interactively explore and mine the data.
The user can also gain more detailed information (since each county would be colored) in the map by using the "zoom" (magnifying glass) mode for the cursor. Although not indicated in my sketch,
a useful add-on feature for the zoom would be a little "context" window that would pop-up after a certain level of magnification so one can see where they are on the US map.
Another set of options that gives the user control over the map's data is the check-boxes next to both the "Migration" and "US Population" color scales. These allow the user to
turn on or off these particular overlays.
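The "linking and brushing" behavior described above can be sketched as a single selection model that every view reads from. This is only a rough illustration of the idea, not the design in the interactive PDF: the class names, the placeholder cities and industries, and the dark/light marker styling are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class City:
    name: str
    state: str
    industry: str


@dataclass
class LinkedViews:
    """Shared selection model: the map and the tables read the same selection."""
    cities: list
    selected_states: set = field(default_factory=set)

    def brush(self, states):
        """Called when the user clicks states on the map or rows in a table."""
        self.selected_states = set(states)

    def clear(self):
        """Remove the selection so every view shows all of the data again."""
        self.selected_states = set()

    def table_rows(self):
        """Rows the industry table should show (all rows when nothing is brushed)."""
        if not self.selected_states:
            return list(self.cities)
        return [c for c in self.cities if c.state in self.selected_states]

    def map_markers(self):
        """Marker style per city: dark if linked to the selection, light otherwise."""
        return {c.name: ("dark" if c.state in self.selected_states else "light")
                for c in self.cities}


# Placeholder data (hypothetical; not taken from the Newsweek tables).
cities = [
    City("City A", "MA", "Industry X"),
    City("City B", "CA", "Industry Y"),
    City("City C", "MA", "Industry Z"),
]

views = LinkedViews(cities)
views.brush({"MA"})  # the user brushes Massachusetts on the map
print([c.name for c in views.table_rows()])
print(views.map_markers())
```

Because both views derive their state from the same selection, a brush in one place is reflected everywhere without the views needing to know about each other.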
One of the key improvements in this interactive version is the ability to interact with the time dimension. By including the variation of this data over time,
we are able to gain better context for the data, include more data than can be encoded in a short series of small multiples, and see the "story" of how the
data trends change over time. The user can either play it as an animation with the "play" button, slide the ruler at their own pace or to specific years, or
manually enter a year in the top text field. The data in all the other tables and graph would also automatically update/change as the years change.
Also, if a user were to have data "selected" then they would maintain their highlighted status as the animation played so a user could follow specific cities or states.
In summary, this interactive version of my redesign is better than the static version since it allows one to more effectively explore the narrative over time, to zoom and interact
with the map, choose which overlays to display on the map, and to filter and highlight data that is linked between the visualizations. In addition, this is better than the original Newsweek version
both because of these interactive capabilities and because of the improved data encoding methods and color choices (as described in the Part 2 write-up).
Part 4a: Data Types (10 points)
From the questionnaire in HW 0 we collected -- among others -- the following data items for all CS 171 students:
- What kind of computer(s) do you own?
- What operating system(s) do you run on your computer(s)?
- What is the resolution of your (primary) computer's monitor?
- How long have you been programming?
- How often do you write code?
- What is your primary programming language?
- What other languages do you know?
- What computer science courses have you already taken, if any?
- Overall, how comfortable are you with programming?
Look at the questionnaire for more details. For each of the following tasks,
select the data items that are required to answer the question, and
transform each data item into the appropriate data type (nominal,
ordinal, or quantitative). You can refer to the data item by number,
e.g., "a) 1 = nominal, 3 = ..."
- What is the relationship between operating system and numbers of years of programming experience?
- Do students who are more comfortable with programming have bigger monitors?
- What programming languages do experienced programmers use?
- How many students with less than a year programming experience took CS 50?
- Do students that write code often and that are comfortable with programming own a laptop?
- What is the relationship between operating system and numbers of years of programming experience?
2 = nominal, and 4 = ordinal (and quantitative)
- Do students who are more comfortable with programming have bigger monitors?
9 = ordinal (and quantitative), and 3 = nominal [3 could also be interpreted as quantitative (total # square pixels)]
- What programming languages do experienced programmers use?
4 = ordinal (and quantitative), 6 = nominal (assuming "use" indicates their primary language, and not all languages that they know)
- How many students with less than a year programming experience took CS 50?
4 = ordinal, and 8 = nominal
- Do students that write code often and that are comfortable with programming own a laptop?
5 = ordinal, 9 = ordinal, and 1 = nominal
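The note above that item 3 (monitor resolution) can be read either as nominal or as quantitative can be illustrated with a small transformation. This is a sketch under assumptions: the "WIDTHxHEIGHT" answer format and the sample responses are hypothetical, since the questionnaire's actual format is not shown here.

```python
def total_pixels(resolution):
    """Turn a 'WIDTHxHEIGHT' string into a pixel count (a quantitative value)."""
    cleaned = resolution.lower().replace(" ", "")
    width, height = (int(part) for part in cleaned.split("x"))
    return width * height


# Hypothetical responses (assumed format, made-up values).
responses = ["1280x800", "1024x768", "1920x1200", "1280x800"]

# Nominal view: resolutions are category labels, so they can only be counted.
counts = {}
for r in responses:
    counts[r] = counts.get(r, 0) + 1

# Quantitative view: pixel counts can be ordered and averaged.
pixels = [total_pixels(r) for r in responses]

print(counts)
print(sorted(pixels), sum(pixels) / len(pixels))
```

Treating the item as nominal supports questions like "which resolution is most common?", while the quantitative form supports questions about "bigger" monitors, as in the second task.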
Part 4b: Visual Encodings and Visualization Types (15 points)
For each task of Part 4a, choose an appropriate visualization type of the ones available in Many Eyes
(e.g., "stacked bar graph"). Choose effective visual encodings for each
data type you chose in Part 4. You may use the terminology used by
Mackinlay (e.g., "1 nominal -> encode with hue and position").
In your write-up, describe how the visual encodings are used in the
visualization with words. You do not need to submit sketches, unless
you feel it is easier to explain your choices that way.
- What is the relationship between operating system and numbers of years of programming experience?
Assuming the differences between operating systems are large enough to be visible as differences in circle size, I would use a "bubble chart"
to encode the given data. I would have different circles with distinct colors to encode "operating system", and the size of the
circle would encode the average number of years of programming experience.
- Do students who are more comfortable with programming have bigger monitors?
In this case, I would use a bar chart for my visualization. I would encode the monitor sizes as nominal categories for each bar,
and I would encode the average "comfort" rating for each monitor size along the y-axis.
- What programming languages do experienced programmers use?
In this case, I would first sort through the data and only look at the responses from those who said they had more than 3 years
of programming experience. I would then make a tag cloud where I would encode the number of experienced users who said they used a
particular language as their "primary" language to the size of the word. Thus the most used/mentioned programming language
would be the largest word in the visualization.
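The filtering and sizing steps in the tag cloud answer above could look roughly like this. The survey rows are made up, and the linear count-to-font-size mapping with a minimum size is an assumption for illustration; Many Eyes applies its own scaling.

```python
from collections import Counter

# Hypothetical survey rows: (years of programming experience, primary language).
responses = [
    (5, "Java"), (0.5, "Python"), (4, "C++"), (7, "Java"),
    (3.5, "Python"), (2, "Java"), (10, "Java"),
]

EXPERIENCED_YEARS = 3  # "more than 3 years", as in the answer above


def tag_sizes(rows, min_pt=12, max_pt=48):
    """Map each primary language to a font size proportional to its count."""
    counts = Counter(lang for years, lang in rows if years > EXPERIENCED_YEARS)
    if not counts:
        return {}
    top = max(counts.values())
    # The most-mentioned language gets max_pt; the rest scale down linearly,
    # but never below min_pt so that rare languages stay legible.
    return {lang: max(min_pt, max_pt * n / top) for lang, n in counts.items()}


sizes = tag_sizes(responses)
for lang, pt in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang}: {pt:.0f}pt")
```

Note that the two respondents with 3 years or less are dropped before counting, so the cloud reflects only the experienced group.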
- How many students with less than a year programming experience took CS 50?
I would use a bar chart in this case where the x-axis bins would encode the number of months/years experience based on
the survey form's pull-down menu, and I would encode the number of students who responded that they took CS 50 as the
height of each bar.
- Do students that write code often and that are comfortable with programming own a laptop?
I would use a "matrix chart" for this visualization in which how often one writes code is encoded in the x-axis bins,
the comfort-level ratings are encoded in the y-axis bins, and the matrix elements would be circles (unless the magnitude of variation is not great
enough in which case bars would be used) whose area would represent the number of students who said they own a laptop.
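The aggregation behind the matrix chart above can be sketched as follows. The rows, the frequency labels, and the 1 to 5 comfort scale are hypothetical, since the questionnaire's actual answer options are not listed here. The radius helper makes the circle's area, not its radius, proportional to the count.

```python
import math
from collections import defaultdict

# Hypothetical rows: (how often they write code, comfort rating, owns a laptop).
rows = [
    ("daily", 5, True), ("daily", 5, True), ("daily", 4, False),
    ("weekly", 3, True), ("weekly", 3, True),
    ("weekly", 3, True), ("weekly", 3, True),
    ("rarely", 1, False), ("rarely", 2, True),
]


def laptop_counts(rows):
    """Count laptop owners in each (frequency, comfort) cell of the matrix."""
    cells = defaultdict(int)
    for frequency, comfort, has_laptop in rows:
        if has_laptop:
            cells[(frequency, comfort)] += 1
    return dict(cells)


def circle_radius(count, area_per_student=1.0):
    """Radius chosen so that circle area is proportional to the count."""
    return math.sqrt(count * area_per_student / math.pi)


cells = laptop_counts(rows)
for (frequency, comfort), n in sorted(cells.items()):
    print(f"{frequency:7s} comfort={comfort}: {n} students, "
          f"radius={circle_radius(n):.2f}")
```

A cell with four times as many students gets a circle with twice the radius, which keeps the area encoding consistent with the counts.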