Visualizing Predictor Effects With Partial Dependence Plots

 

Article describing the use of partial dependence to uncover predictor effects.

Although basic data mining models have good predictive performance, they are essentially “black box” methods, which makes it difficult to extract information about predicted patterns of occurrence. As described in the Project Feeder Watch Exploratory Analysis Article, partial dependence functions (PDFs) summarize the effect of predictors on the probability of occurrence after accounting for the average effect of all other predictors. It is this "statistical control" that makes partial dependence functions well suited to exploring and visualizing the effects of predictors based on observational data. In this section we demonstrate how partial dependence plots can be used to

  1. visualize the additive effects of winter season and within-season date on the occurrence of Eastern House Finch, and
  2. visualize the interacting effects of winter season and within-season date that describe the irruptive winter migrations of American Goldfinch.
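
Before turning to these two examples, the averaging that defines a partial dependence function can be made concrete with a minimal sketch. It is not the code used to produce the figures in this article: the function `partial_dependence_1d`, the toy model `toy_predict`, and the predictor names `date` and `effort` are all hypothetical and chosen only for illustration. The idea is to force the focal predictor to each grid value in every observed row, leave the other predictors as observed, and average the model's predictions.

```python
from statistics import mean

def partial_dependence_1d(predict, rows, focal, grid):
    """Average prediction with the focal predictor forced to each grid value.

    predict: callable taking a dict of predictor values, returning a probability
    rows:    list of dicts holding the observed predictor values
    focal:   name of the focal predictor
    grid:    focal-predictor values at which to evaluate the function
    """
    curve = []
    for value in grid:
        # Overwrite the focal predictor in every row; all other predictors
        # keep their observed values, so their effect is averaged out.
        preds = [predict({**row, focal: value}) for row in rows]
        curve.append(mean(preds))
    return curve

# Hypothetical stand-in for a fitted model: additive in date and effort.
def toy_predict(row):
    return 0.2 + 0.002 * row["date"] + 0.05 * row["effort"]

# Hypothetical observations (not Project FeederWatch data).
rows = [
    {"date": 10, "effort": 1},
    {"date": 40, "effort": 3},
    {"date": 70, "effort": 2},
]

pd_date = partial_dependence_1d(toy_predict, rows, "date", [0, 50, 100])
print(pd_date)
```

Because the toy model is additive, the resulting curve has the same slope in `date` as the model itself, shifted by the average contribution of `effort`.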

Visualizing Additive Effects with Partial Dependence Plots

Partial dependence plots are a useful tool for discovering a model’s additive structure. For example, the partial dependence plots of season for House Finch (Figure 1a) show declining prevalence in BCR 13, where mycoplasmal conjunctivitis caused mortality, while at the same time House Finches were still colonizing and expanding their range in the southeastern lowlands of BCR 27. Figure 1b shows another contrast between these two regions: in mid-winter, House Finches are less prevalent at bird feeders in the more northern BCR 13 and more prevalent at bird feeders further south. This latter result may indicate the extent to which House Finches migrate from the northern parts of their range into more southern regions over winter.

Figure 1: a) Inter-seasonal trends and b) Intra-seasonal trends for BCR 13 (blue) and 27 (red). 

Partial dependence functions will best represent the nature of the influence of a predictor on the predicted response when the effects of the predictor are nearly additive or multiplicative. In the next section we describe how partial dependence functions may be computed for small subsets of predictors to visualize the effects of interacting predictors.
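
The reason additive and multiplicative effects are represented well can be seen from a short derivation. The notation is an assumption introduced here rather than taken from the article: x_S denotes the focal predictor(s), x_C the remaining nuisance predictors, f the fitted model, and N the number of observations.

```latex
% Partial dependence of f on the focal predictors x_S,
% averaging over the observed nuisance values x_{iC}:
\bar{f}_S(x_S) = \frac{1}{N} \sum_{i=1}^{N} f(x_S, x_{iC})

% Additive case: f(x) = g(x_S) + h(x_C)
\bar{f}_S(x_S) = g(x_S) + \frac{1}{N} \sum_{i=1}^{N} h(x_{iC})

% Multiplicative case: f(x) = g(x_S)\, h(x_C)
\bar{f}_S(x_S) = g(x_S) \cdot \frac{1}{N} \sum_{i=1}^{N} h(x_{iC})
```

In both cases the average over the nuisance predictors does not depend on x_S, so the partial dependence function recovers the shape of g up to an additive or a multiplicative constant, respectively. When f cannot be separated in this way, no such guarantee holds.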

Visualizing Interactions with Partial Dependence Functions

Partial dependence plots need to be interpreted with caution because sometimes two or more predictors will act in synergy to affect birds' presence (a "statistical interaction"). When this happens, the partial dependence plots of the individual predictors participating in the interaction will tend to be “flattened”, obscuring their influence on the response. Practically, the presence of a significant interaction means that biologically realistic predictions cannot be made unless all of the interacting predictors are varied together in realistic ways. Similarly, realistic representations of interacting effects must be based on partial dependence functions in which all the interacting predictors are varied together; that is, all interacting predictors must simultaneously be considered “focal” predictors.

During the last decade, the American Goldfinch has exhibited irruptive winter migrations from the Ontario Boreal Hardwood Forests, BCR 12, migrating south during the fall of odd-numbered years, with less pronounced southward migrations during even-numbered years. Thus, we expect winter intra-seasonal occurrence patterns to change depending on the year, that is, a statistical interaction between intra-seasonal date and year. To visualize this interaction between date and year, it is necessary to plot a two-dimensional partial dependence plot, where both date and season are fixed at specific values while averaging over the remaining nuisance predictors. Figure 2 shows that Goldfinches are less prevalent at bird feeders in mid-winter during odd, or irruptive, years (blue), while Goldfinches are stable, if not increasingly prevalent, at bird feeders through the winter season during even years (red).

Figure 2: Partial dependence of Date and Season on American Goldfinch occurrence in BCR 12. Irruptive years are plotted with blue trajectories. Non-irruptive years are plotted with red trajectories. The one-dimensional partial dependence plot of Date (green) is the average over all eleven seasons.
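
A two-dimensional partial dependence function of the kind plotted in Figure 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis behind the figure: `partial_dependence_2d`, the toy model `toy_predict`, the nuisance predictor `effort`, and all numeric values are hypothetical. The toy model simply mimics a mid-winter dip in odd seasons and a mid-winter rise in even seasons.

```python
from statistics import mean

def partial_dependence_2d(predict, rows, focal_a, grid_a, focal_b, grid_b):
    """Average prediction with two focal predictors fixed jointly.

    Returns a dict mapping (value_a, value_b) to the mean prediction
    over all rows, with the nuisance predictors left as observed.
    """
    surface = {}
    for a in grid_a:
        for b in grid_b:
            preds = [predict({**row, focal_a: a, focal_b: b}) for row in rows]
            surface[(a, b)] = mean(preds)
    return surface

# Hypothetical stand-in for a fitted model with a date-by-season interaction.
def toy_predict(row):
    sign = -1 if row["season"] % 2 == 1 else 1      # odd seasons dip, even seasons rise
    mid_winter = 1 - abs(row["date"] - 50) / 50     # 0 at either end, 1 at day 50
    return 0.5 + sign * 0.2 * mid_winter + 0.02 * row["effort"]

# Hypothetical observations; "effort" plays the role of a nuisance predictor.
rows = [
    {"season": 1, "date": 10, "effort": 1},
    {"season": 2, "date": 40, "effort": 2},
    {"season": 3, "date": 70, "effort": 3},
]

surface = partial_dependence_2d(
    toy_predict, rows, "date", [0, 50, 100], "season", [1, 2]
)

# One trajectory per season, analogous to the coloured lines in Figure 2.
for season in (1, 2):
    print(season, [round(surface[(d, season)], 3) for d in (0, 50, 100)])
```

Each season gets its own trajectory over date, which is what allows opposite mid-winter patterns to remain visible rather than cancelling out.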

If we instead plot the one-dimensional partial dependence plot of Date on the presence of American Goldfinch, we lose information about the effect of Date and about the interaction between date and season. The one-dimensional partial dependence plot of intra-seasonal “Date”, the green trajectory in Figure 2, is relatively flat, suggesting little variation in winter season occupancy. The substantial effects of Date have been concealed by averaging over all 11 seasons.

In general, a flat partial dependence plot does not, by itself, indicate the lack of a predictor effect. One must be certain that the focal predictor does not strongly interact with any of the nuisance predictors before such a conclusion may be drawn.
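
The flattening effect can be reproduced with a small, self-contained sketch. Everything here is hypothetical: the toy model `predict`, the helper names, and the balanced set of four seasons are assumptions chosen so that opposite date effects cancel exactly. Comparing the spread of the one-dimensional curve with the spread of the curves computed for individual seasons is one simple, informal way to check whether a flat plot is hiding an interaction.

```python
from statistics import mean

# Hypothetical stand-in for a fitted model: the date effect reverses
# sign between odd and even seasons.
def predict(row):
    sign = -1 if row["season"] % 2 == 1 else 1
    mid_winter = 1 - abs(row["date"] - 50) / 50   # 0 at either end, 1 at day 50
    return 0.5 + sign * 0.2 * mid_winter

grid = [0, 25, 50, 75, 100]
# Hypothetical observations: two odd and two even seasons, equally sampled.
rows = [{"season": s, "date": d} for s in (1, 2, 3, 4) for d in grid]

def date_curve(rows, grid, fixed=None):
    """Partial dependence on date; `fixed` optionally pins other predictors."""
    fixed = fixed or {}
    return [mean([predict({**row, **fixed, "date": d}) for row in rows]) for d in grid]

def spread(curve):
    """Range of a curve: a crude measure of how flat it is."""
    return max(curve) - min(curve)

averaged = date_curve(rows, grid)                     # season averaged out
odd_year = date_curve(rows, grid, {"season": 1})      # season held at an odd year
even_year = date_curve(rows, grid, {"season": 2})     # season held at an even year

print("averaged over seasons:", spread(averaged))
print("odd season only:      ", spread(odd_year))
print("even season only:     ", spread(even_year))
```

In this toy setting the curve averaged over seasons is essentially flat, while the curves for a single odd or even season each vary substantially with date, even though date matters in every season.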