All posts by Yimin Wang

Effective Data Visualization

One of the most critical steps in crafting an effective presentation is the proper visualization of supporting data.  A good picture will both engage the audience and highlight the salient features embedded within the data.  As a scientist, I have sat through numerous talks with poorly executed visualizations: tables overcrowded with numbers (most of which are irrelevant), plots containing too many overlapping and indistinguishable lines, indecipherable figures, and so on.  The result was, almost uniformly, an audience that was uninterested and unwilling to believe the conclusions because they couldn’t understand any of the data being presented.  Even though there are plenty of examples of what not to do, it can be hard to find positive examples, especially for presenting the complex data or results that come out of business analysis.  While looking for ways to develop my visualization skills, I recently encountered this blog post: http://www.targetprocess.com/articles/visual-encoding.html, which has a few good rules of thumb for creating presentations:

  • Humans are better at comparing relative areas, so if you want to map data to a shape, map the data to the shape’s area.
  • Use no more than a dozen colors to encode categories effectively.  If there are more, it becomes difficult to differentiate them.
  • A diverging color scale should have different colors for positive and negative values.
  • A planar chart is best for representing simple two variable data sets.
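
The first rule can be made concrete with a little arithmetic.  If a value is drawn as a circle, the radius should grow with the square root of the value so that the area, not the radius, is proportional to the data.  Here is a minimal sketch in Python; the function name, the max_radius parameter, and the sample values are my own illustration and do not come from the linked post:

```python
import math

def radius_for_value(value, max_value, max_radius=50.0):
    """Return a circle radius whose AREA is proportional to value.

    The largest value (max_value) is drawn with radius max_radius.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    # Area ~ radius**2, so radius must scale with sqrt(value).
    return max_radius * math.sqrt(value / max_value)

# Made-up sample values, purely for illustration.
values = [10, 40, 90]
radii = [radius_for_value(v, max(values)) for v in values]
```

Scaling the radius directly with the value would instead make a value four times larger look sixteen times larger by area.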

As an example of how good data visualization can help build a story and highlight underlying trends and relationships, here are what I think are two effective uses of a particular visualization type called a heat map.  In a heat map, data are plotted on a plane (usually projected onto an image of some particular area) and colored according to frequency or some other variable.  This representation quickly highlights where the important areas (geographic or other) are on the map.
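
The counting step behind a frequency heat map can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the function name, the grid size, and the sample coordinates are made up, and it is not how either of the examples below was actually built.

```python
def bin_points(points, width, height, nx, ny):
    """Count how many (x, y) points fall into each cell of an nx-by-ny grid.

    The plotted area spans 0..width horizontally and 0..height vertically.
    Returns a list of ny rows, each holding nx counts.
    """
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (0 <= x <= width and 0 <= y <= height):
            continue  # skip points outside the plotted area
        # Points exactly on the far edge go into the last cell.
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        grid[row][col] += 1
    return grid

# Made-up sample coordinates, purely for illustration.
shots = [(1.0, 1.0), (1.5, 0.5), (9.0, 9.0), (5.0, 5.0), (1.2, 1.8)]
counts = bin_points(shots, width=10.0, height=10.0, nx=2, ny=2)
```

Each count would then be mapped to a color and drawn over the background image, for example with a plotting routine such as matplotlib’s imshow.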

From The New York Times, a good visualization of basketball players’ shooting and scoring patterns: http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html

And finally, a visualization of where all the action is in a typical World of Tanks game on the Abbey map: http://www.vbaddict.net/heatmaps/abbey/12

 

Let the data speak for itself

I’ve been interested in developing models and using data to drive business decisions, so I recently read “Doing Data Science”, which is available at http://www.amazon.com/Doing-Data-Science-Straight-Frontline/dp/1449358659/.  The book contains a fair bit of math, which might make it seem daunting, but I believe it’s worth the read since the authors offer some interesting insights into how to incorporate data analysis and modelling into solving business problems.  There are two sections in particular that I found useful.  The first is on exploratory data analysis, which is the process by which you start to construct a solution to your problem.  As the authors state, “Exploratory data analysis (EDA) is often relegated to chapter 1 (by which we mean the ‘easiest’ and lowest level) of standard introductory statistics textbooks and then forgotten about for the rest of the book… But EDA is a critical part of the data science process…”  One of the challenges for me, especially when facing a (messy) business problem, is figuring out what is relevant to the issue, and I think the framework laid out in this book for doing EDA gives me a good structure for approaching this step.  This involves asking what information might be available to help me find correlations with the desired business result, as well as developing strategies for teasing out those correlations.  Related to this is the chapter on extracting meaning from data, where the authors make the point that asking more questions and gathering more information doesn’t necessarily lead to a better outcome or model if the data you are gathering are not relevant to the problem at hand.

The book also includes a number of useful vignettes about the real-life application (and misapplication) of data-driven business decisions.  For instance, here is an example from IBM where they wanted to find potential customers for their online business service:

At IBM, the target was to predict companies that would be willing to buy “websphere” solutions.  The data was transaction data and crawled potential company websites.  The winning model showed that if the term “websphere” appeared on the company’s website, then it was a great candidate for the product.  What happened?  Remember, when considering a potential customer, by definition that company wouldn’t have bought websphere yet (otherwise IBM wouldn’t be trying to sell to it); therefore no potential customer would have websphere on its site, so it’s not a predictor at all…  Doing simple sanity checking to make sure things are what you think they are can sometimes get you much further in the end…
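
A sanity check of the kind described above can be very simple.  The sketch below is my own illustration, not anything from the book or from IBM: the function names, field names, and rows are all invented.  It flags a feature that shows up among known buyers in the training data but never appears among the prospects the model is supposed to score.

```python
def feature_rate(rows, feature):
    """Fraction of rows in which the given boolean feature is true."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[feature]) / len(rows)

def looks_leaky(training_rows, prospect_rows, feature, label="bought"):
    """Flag a feature seen among buyers but absent from every prospect."""
    buyers = [r for r in training_rows if r[label]]
    return (feature_rate(buyers, feature) > 0
            and feature_rate(prospect_rows, feature) == 0)

# Invented rows, purely for illustration.
training = [
    {"mentions_product": True, "has_it_team": True, "bought": True},
    {"mentions_product": True, "has_it_team": True, "bought": True},
    {"mentions_product": False, "has_it_team": True, "bought": False},
    {"mentions_product": False, "has_it_team": False, "bought": False},
]
prospects = [
    {"mentions_product": False, "has_it_team": True},
    {"mentions_product": False, "has_it_team": False},
    {"mentions_product": False, "has_it_team": True},
]

leaky = looks_leaky(training, prospects, "mentions_product")
usable = not looks_leaky(training, prospects, "has_it_team")
```

A feature flagged this way is not necessarily useless, but it deserves a second look before the model is trusted.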

Adventures in Miscommunication

In my work in science, there is often pressure to achieve “breakthrough” results in order to continue to receive grant funding and to publish in high-impact journals.  As a consequence, there is sometimes a tendency, not to falsify data outright, but to prune it so as to cast experiments in the most favorable light.  Minimizing this kind of data manipulation requires effective communication of core scientific principles at all levels of a research team; team leaders, however, need to be especially careful that they are sending the right message.  In “Business Adventures” by John Brooks, a book I discovered through an interesting review article by Bill Gates in the Wall Street Journal, there is an example of miscommunication of business ethics throughout General Electric’s entire hierarchy that seemed particularly relevant.  Although the book is several decades old (the series of articles on which it was based appeared from 1959 to 1969), it is “as much about the strengths and weaknesses of leaders in challenging circumstances as it is about the particulars of one business or another,” as Bill Gates writes in his review.  In the case of GE, the communication of the company policy against price-fixing with competitors began to be accompanied by an unmistakable wink from some executives, and this eventually became so ingrained in the corporate culture that even a direct order from an upper-level executive not to engage in price-fixing was ignored.  In his conclusions, Brooks offers the following scenario, in which he describes how effective communication requires you to consider not only what you are saying, but precisely how you are conveying it to your audience:

Suppose, purely as hypothesis, that the owner of a company who orders his subordinates to obey the antitrust laws has such poor communication with himself that he does not really know whether he wants the order to be complied with or not.  If his order is disobeyed, the resulting price-fixing may benefit his company’s coffers; if it is obeyed, then he has done the right thing.  In the first instance, he is not personally implicated in any wrongdoing, while in the second he is positively implicated in rightdoing.  What, after all, can he lose?  It is perhaps reasonable to suppose that such an executive might communicate his uncertainty more forcefully than his order.

The review is available here: http://www.gatesnotes.com/Books/Business-Adventures, and the book is available either from the Emory library or Amazon.