I’ve made quite a few blog posts about neural networks and some of the diagnostic tools that can be used to ‘demystify’ the information contained in these models. Frankly, I’m kind of sick of writing about neural networks but I wanted to share one last tool I’ve implemented in R. I’m a strong believer that supervised neural networks can be used for much more than prediction, as is the common assumption by most researchers. I hope that my collection of posts, including this one, has shown the versatility of these models to develop inference into causation. To date, I’ve authored posts on visualizing neural networks, animating neural networks, and determining importance of model inputs. This post will describe a function for a sensitivity analysis of a neural network. Specifically, I will describe an approach to evaluate the form of the relationship of a response variable with the explanatory variables used in the model.

The general goal of a sensitivity analysis is similar to evaluating relative importance of explanatory variables, with a few important distinctions. For both analyses, we are interested in the relationships between explanatory and response variables as described by the model in the hope that the neural network has explained some real-world phenomenon. Using Garson’s algorithm,^{1} we can get an idea of the magnitude and sign of the relationship between variables relative to each other. Conversely, the sensitivity analysis allows us to obtain information about the form of the relationship between variables rather than a categorical description, such as variable x is positively and strongly related to y. For example, how does a response variable change in relation to increasing or decreasing values of a given explanatory variable? Is it a linear response, non-linear, uni-modal, no response, etc.? Furthermore, how does the form of the response change given values of the other explanatory variables in the model? We might expect that the relationship between a response and explanatory variable might differ given the context of the other explanatory variables (i.e., an interaction may be present). The sensitivity analysis can provide this information.

As with most of my posts, I’ve created the sensitivity analysis function using ideas from other people that are much more clever than me. I’ve simply converted these ideas into a useful form in R. Ultimate credit for the sensitivity analysis goes to Sovan Lek (and colleagues), who developed the approach in the mid-1990s. The ‘Lek-profile method’ is described briefly in Lek et al. 1996^{2} and in more detail in Gevrey et al. 2003.^{3} I’ll provide a brief summary here since the method is pretty simple. In fact, the profile method can be extended to any statistical model and is not specific to neural networks, although it is one of few methods used to evaluate the latter. For any statistical model where multiple response variables are related to multiple explanatory variables, we choose one response and one explanatory variable. We obtain predictions of the response variable across the range of values for the given explanatory variable. All other explanatory variables are held constant at a given set of respective values (e.g., minimum, 20^{th} percentile, maximum). The final product is a set of response curves for one response variable across the range of values for one explanatory variable, while holding all other explanatory variables constant. This is implemented in R by creating a matrix of values for explanatory variables where the number of rows is the number of observations and the number of columns is the number of explanatory variables. All explanatory variables are held at their mean (or other constant value) while the variable of interest is sequenced from its minimum to maximum value across the range of observations. This matrix (actually a data frame) is then used to predict values of the response variable from a fitted model object. This is repeated for different variables.

I’ll illustrate the function using simulated data, as I’ve done in previous posts. The exception here is that I’ll be using two response variables instead of one. The two response variables are linear combinations of eight explanatory variables, with random error components taken from a normal distribution. The relationships between the variables are determined by the arbitrary set of parameters (`parms1`

and `parms2`

). The explanatory variables are partially correlated and taken from a multivariate normal distribution.

require(clusterGeneration) require(nnet) #define number of variables and observations set.seed(2) num.vars<-8 num.obs<-10000 #define correlation matrix for explanatory variables #define actual parameter values cov.mat<-genPositiveDefMat(num.vars,covMethod=c("unifcorrmat"))$Sigma rand.vars<-mvrnorm(num.obs,rep(0,num.vars),Sigma=cov.mat) parms1<-runif(num.vars,-10,10) y1<-rand.vars %*% matrix(parms1) + rnorm(num.obs,sd=20) parms2<-runif(num.vars,-10,10) y2<-rand.vars %*% matrix(parms2) + rnorm(num.obs,sd=20) #prep data and create neural network rand.vars<-data.frame(rand.vars) resp<-apply(cbind(y1,y2),2, function(y) (y-min(y))/(max(y)-min(y))) resp<-data.frame(resp) names(resp)<-c('Y1','Y2') mod1<-nnet(rand.vars,resp,size=8,linout=T)

Here’s what the model looks like:

We’ve created a neural network that hopefully describes the relationship of two response variables with eight explanatory variables. The sensitivity analysis lets us visualize these relationships. The Lek profile function can be used once we have a neural network model in our workspace. The function is imported and used as follows:

source('https://gist.githubusercontent.com/fawda123/6860630/raw/b8bf4a6c88d6b392b1bfa6ef24759ae98f31877c/lek_fun.r') lek.fun(mod1)

Each facet of the plot shows the bivariate relationship between one response variable and one explanatory variable. The multiple lines per plot indicate the change in the relationship when the other explanatory variables are held constant, in this case at their minimum, 20^{th}, 40^{th}, 60^{th}, 80^{th}, and maximum quantile values (the splits variable in the legend). Since our data were random we don’t necessarily care about the relationships, but you can see the wealth of information that could be provided by this plot if we don’t know the actual relationships between the variables.

The function takes the following arguments:

`mod.in` |
model object for input created from `nnet` function (nnet package) |

`var.sens` |
vector of character strings indicating the explanatory variables to evaluate, default NULL will evaluate all |

`resp.name` |
vector of character strings indicating the reponse variables to evaluate, default NULL will evaluate all |

`exp.in` |
matrix or data frame of input variables used to create the model in `mod.in` , default `NULL` but required for `mod.in` class RSNNS |

`steps` |
numeric value indicating number of observations to evaluate for each explanatory variable from minimum to maximum value, default 100 |

`split.vals` |
numeric vector indicating quantile values at which to hold other explanatory variables constant |

`val.out` |
logical value indicating if actual sensitivity values are returned rather than a plot, default F |

By default, the function runs a sensitivity analysis for all variables. This creates a busy plot so we may want to look at specific variables of interest. Maybe we want to evaluate different quantile values as well. These options can be changed using the arguments.

lek.fun(mod1,var.sens=c('X2','X5'),split.vals=seq(0,1,by=0.05))

The function also returns a ggplot2 object that can be further modified. You may prefer a different theme, color, or line type, for example.

p1<-lek.fun(mod1) class(p1) # [1] "gg" "ggplot" p1 + theme_bw() + scale_colour_brewer(palette="PuBu") + scale_linetype_manual(values=rep('dashed',6)) + scale_size_manual(values=rep(1,6))

Finally, the actual values from the sensitivity analysis can be returned if you’d prefer that instead. The output is a data frame in long form that was created using `melt.list`

from the reshape package for compatibility with ggplot2. The six columns indicate values for explanatory variables on the x-axes, names of the response variables, predicted values of the response variables, quantiles at which other explanatory variables were held constant, and names of the explanatory variables on the x-axes.

head(lek.fun(mod1,val.out=T)) # Explanatory resp.name Response Splits exp.name # 1 -9.825388 Y1 0.4285857 0 X1 # 2 -9.634531 Y1 0.4289905 0 X1 # 3 -9.443674 Y1 0.4293973 0 X1 # 4 -9.252816 Y1 0.4298058 0 X1 # 5 -9.061959 Y1 0.4302162 0 X1 # 6 -8.871102 Y1 0.4306284 0 X1

I mentioned earlier that the function is not unique to neural networks and can work with other models created in R. I haven’t done an extensive test of the function, but I’m fairly certain that it will work if the model object has a predict method (e.g., `predict.lm`

). Here’s an example using the function to evaluate a multiple linear regression for one of the response variables.

mod2<-lm(Y1~.,data=cbind(resp[,'Y1',drop=F],rand.vars)) lek.fun(mod2)

This function has little relevance for conventional models like linear regression since a wealth of diagnostic tools are already available (e.g., effects plots, add/drop procedures, outlier tests, etc.). The application of the function to neural networks provides insight into the relationships described by the models, insights that to my knowledge, cannot be obtained using current tools in R. This post concludes my contribution of diagnostic tools for neural networks in R and I hope that they have been useful to some of you. I have spent the last year or so working with neural networks and my opinion of their utility is mixed. I see advantages in the use of highly flexible computer-based algorithms, although in most cases similar conclusions can be made using more conventional analyses. I suggest that neural networks only be used if there is an extremely high sample size and other methods have proven inconclusive. Feel free to voice your opinions or suggestions in the comments.

Cheers,

Marcus

^{1}Garson GD. 1991. Interpreting neural network connection weights. Artificial Intelligence Expert. 6:46–51.

^{2}Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S. 1996. Application of neural networks to modelling nonlinear relationships in Ecology. Ecological Modelling. 90:39-52.

^{3}Gevrey M, Dimopoulos I, Lek S. 2003. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling. 160:249-264.

Update:

The sensitivity analysis function should now work for neural networks created using the RSNNS package. The only change was a new argument that was necessary for using the sensitivity analysis with RSNNS. The `exp.in`

argument is a required matrix or data frame of explanatory variables that were used to create the input model (`mod.in`

argument). I tried to extract this information from the RSNNS model object but I don’t think it’s possible, as is not the case with neural network objects from the nnet package. Also, I tried to modify the function for use with the neuralnet package, but was unsuccessful since there is no `predict`

method for neuralnet objects. As described above, a `predict`

method is necessary for sensitivity analysis to evaluate the model using novel input data.

Here’s some code showing use of the updated function with an RSNNS object, same as above:

require(clusterGeneration) require(RSNNS) require(devtools) #define number of variables and observations set.seed(2) num.vars<-8 num.obs<-10000 #define correlation matrix for explanatory variables #define actual parameter values cov.mat<-genPositiveDefMat(num.vars,covMethod=c("unifcorrmat"))$Sigma rand.vars<-mvrnorm(num.obs,rep(0,num.vars),Sigma=cov.mat) parms1<-runif(num.vars,-10,10) y1<-rand.vars %*% matrix(parms1) + rnorm(num.obs,sd=20) parms2<-runif(num.vars,-10,10) y2<-rand.vars %*% matrix(parms2) + rnorm(num.obs,sd=20) #prep data and create neural network rand.vars<-data.frame(rand.vars) resp<-apply(cbind(y1,y2),2, function(y) (y-min(y))/(max(y)-min(y))) resp<-data.frame(resp) names(resp)<-c('Y1','Y2') #create neural network model mod2<-mlp(rand.vars, resp, size=8,linOut=T) #import sensitivity analysis function source_url('https://gist.githubusercontent.com/fawda123/6860630/raw/b8bf4a6c88d6b392b1bfa6ef24759ae98f31877c/lek_fun.r') #sensitivity analsyis, note 'exp.in' argument lek.fun(mod2,exp.in=rand.vars)

Hey thanks for this series of posts, have really enjoyed them

Thanks Pete, glad you liked them!

Just discovered these, skimmed a bit and greatly looking forward to reading the whole series…

Thanks, hope you find them useful!

Thanks a lot for this instructive post. It really helps!

Since I know little about sensitivity analysis, I would like to ask why you choose this specific “Lek profile” method. Does it generate justified or reasonable results, or simply because it is easy to implement?

Hi, glad you liked the post! I chose the Lek method because it’s very intuitive and is only one of a handful of methods available for neural networks. I prefer it over Garson’s algorithm in my previous post because it allows visualization of the response form that the model has characterized, as compared to a simple idea of relative importance using the former approach. Have a look at the following pubs for a nice overview and comparison of approaches available for neural networks:

Olden JD, Jackson DA. 2002. Illuminating the ‘black box’: A randomization approach for understanding variable contributions in artificial neural networks. Ecological Modelling. 154:135-150.

Olden JD, Joy MK, Death RG. 2004. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling. 2004. 389-397.

One method that I haven’t tried is a partial derivatives approach (Dimopoulous et al. 1995, 1999, cited in Olden et al. 2004). I don’t recall the specifics but it seemed like an interesting alternative to the Lek method, although less intuitive.

Hello,

I really like your post’s about the sensitivity analysis and the variable importance. Have you already used the sensitivity analysis (any method of it) on RNN or LSTM Networks? What do you think about this? Generally this should work or am I wrong?

Thanks!

Hi Chris,

The sensitivity analysis (Lek Profile Method) should work with any R model object that has a predict method. That function relies only on the fitted model object, so the structure of the weights is not important. The variable importance functions are a bit different as they extract the weights from a fitted model object to estimate importance. I am not as familiar with RNN or LSTM, but my guess is the methods are not as easily transferable. The variable importance functions were designed for simple feed-forward networks and the weights for the other model types likely do not have the same form. Check the source code here:https://github.com/fawda123/NeuralNetTools/tree/master/R You may find some useful snippets that you can adapt for these other models.

-Marcus

Hey, thanks for the reply.

I think I will first try this Lek method on my keras model and then the mentioned below.

If you are interested:

Most approaches to interpret the feature Importance I found are based on the visualization of the network, activation functions etc.

However in my research, I discovered three methods for RNN’s/LSTM’s.

Gradient-based Sensitivity Analysis and LRP:

https://arxiv.org/pdf/1706.07206.pdf

Garson’s method:

https://www.biorxiv.org/content/biorxiv/early/2016/09/28/078246.full.pdf

Greetings, Chris

Hi Chris, checkout the garson and olden functions in the NeuralNetTools package. These are the most current functions for variable importance with feed-forward neural nets. I suggest using olden over garson because it preserves the sign.

-Marcus

Visualizing neural networks in R – update – R is my friend

Please, it possible apply lek.fun in RSNNS.

I tried and …Error in mapply(function(x, name)……

Tks

Hm, that’s odd that the function doesn’t work with RSNNS since it has a predict method. I can have a look at it this weekend. I’ve been meaning to update all of these functions to be compatible with other packages besides nnet.

-Marcus

Tarcisio, see the update above…

Thanks a lot Marcus,

I changed, the 34 line #for rsnns

to make graphic with real inputs names

names(mat.in)<-names(mat.in), and for select variables with var.sens don't changed any. Works wonderfully.

However this update don't can work in lm(methods) anymore.

Thanks,

Tarcísio

Whoops, sorry about the lm issue… should be fixed now, see line 11.

Hi, Marcus.

I am trying to use de lek.fun function in my network with the atributtes below:

initial value 146.956657

final value 36.593223

stopped after 5 iterations

a 81-1-1 network with 84 weights

options were – linear output units

But i keep getting the error:

Erro em names(lek.vals) <- c("Explanatory", "resp.name", "Response", :

atributo 'names' [5] deve ter o mesmo comprimento que o vetor [3]

Notice that my RStudio is configurated to portuguese language.

Thanks a lot!!

NeuralNetTools 1.0.0 now on CRAN – R is my friend

Hi, i hv a problem. i hv a set of 5 input parametrs and one output parameter. if i train the network using all 5 input parameters and one output parameter i will get a MSE value and an R value.then i train the network using 4 input parameters,then 3 etc varying the combination of input parameters i get MSE and R value for each set.cant i compare these obtained MSE and R values and find out which combination is best in terms of impact on the output? please help with ur valuable comments

Regards

Hi Suja,

Finding the optimal neural network structure is the hardest part of using these models. I haven’t had a a lot of experience with variable selection for neural networks, as you describe, though it has been explored by others. Check out the Olden and Jackson citation in the help documentation for the Garson function (also linked here). They describe a variable selection technique for neural networks, though I don’t think this was ever implemented in R. As far as comparing variable importance between networks with different explanatory variables… I’m not sure this is a good idea since the importance values are relative only within the context of the particular network. Directly comparing values between networks would not make much sense. Hope that helps.

-Marcus

Hi Marcus,

Thanks for your great work on sensitivity analysis and I find it really helpful.

You mentioned that lekprofile won’t work for “neuralnet”. Unfortunately I have a neuralnet model(I know the weights for each layer) and wanted to apply lekprofile on it.

Is there a simple way for me to trick lekprofile to do it for neuralnet models? For example, possibly convert neuralnet model into either nnet or RSNNS model by replacing their network weights (not a big network) ?

Thanks,

John

Hi John,

You could try this, similar to what you mentioned… get the weights from a neuralnet model, use those to create the nnet model w/ no training iterations, then use lekprofile. The bottom part of the code compares the model predictions between the two to verify the results are the same. Hope that helps – Marcus

Hi Marcus,

Many thanks for your reply ! This is a cool trick and good news for people using neuralnet.

I tried your example, and I actually got identical results between the two nets. On line 9, it seems that it should be “mod1” as opposed to “mod”, which you probably were experimenting with different models at the time.

>>line 9: modwts <- neuralweights(mod1)

With that change, I found identical results from 2 nets, so your code below would effectively makes lekprofile applicable to neuralnets.

Thanks

John

Doh, thanks for noticing the error. I’ve edited my original response. Your initial problem motivated me to add this feature to the package, so expect this forthcoming in a later release!

Check version 1.3.2 on the development repo – I added this method to lekprofile.

https://github.com/fawda123/NeuralNetTools

I do not understand why I get an error when I call the lek function, whereas the neural net is perfectly built. It must be something stupid, but I do not see what.

Here is my code:

net <- nnet(Qty ~ Price + Discount, size = 7, data = trainingdata, linout=TRUE)

lek.fun(net)

and I get the following error:

Error in apply(mat.cons, 2, fun.in) : dim(X) must have a positive length

Does someone see why?

Hi there and sorry you’re having problems. I would suggest downloading the development version of the package and trying with the lekprofile function. Just follow the instructions on the readme file at http://www.github.com/fawda123/NeuralNetTools

Hope that helps.

-Marcus

Thank you so much for your answer. It works perfectly well now. Thank you for doing such a useful and great job in R, it helps a lot.

Hi Marcus

I would like to use sensitivity analysis in Matlab for deriving feature importance for a neural net. Do I have to train a new neural net for each varied feature value or can I train only one neural net and only vary the feature values of the test data set? I have > 1000 features and if I have to train a new neural net for each varied feature, this will be computational impossible.

Hi again,

You should only have to train one model. The function calls the predict method for an existing neural network object that is used as input. The tricky part is creating the data for the sensitivity analysis that is passed to the predict method, i.e., holding all values constant for all but the variable of interest, repeating different constant values, etc. I would try to link Matlab with R to call functions in the NeuralNetTools package. Otherwise you can try to recreate them in Matlab, particularly the functions here and the pred_sens function here starting on line 263.

-Marcus

Hi Marcus

Thanks for the answer. In short; I would do it in the following way:

1.Train a NN on my data.

2. Choose for every feature e.g. 10 values and make prediction for every value which gives 10*(number of features) predictions

3. Choose e.g. features which show smallest or highest variation as most important

Regarding the range of values I could choose the minimum and maximum feature value and scale the value in between (to have e.g. 10 values).

Is this wrong? Unfortunately I’m never used R, so it is a bit tricky to dig through the code you linked and port it to Matlab.

Or could you give a small example of how to call NeuralNetTools for sensitivity analysis from Matlab?

Hi Helvi, that sounds about right. The key is to hold the other varibles constant while you get the predictions for the response as a function of only one of the explanatory variables that changes across it’s range. Actually, try running the

`lekprofile`

function using the argument`val_out = TRUE`

. The function will return the values that are plotted for the response variable and the corresponding values for the explanatory variables that were used to get the predictions, i.e., the x variable that is varied across its range (labelled as ‘Explanatory’ for the values and ‘exp_name’ for the corresponding variable) and the constant quantile that the other explanatory variables are held at to get the response. Hopefully that provides some insight.-Marcus

Hi Marcus,

From what I understand, the sensitivity plots show the relationship between a response variable and an explanatory variable given the other explanatory variables are held constant at selected quantile values.

Does this only apply to explanatory variables which have a positive lock step correlations? Some of the variables in my data set are negatively correlated such that a maximum value for one variable may correspond to a minimum value for another so I’m not sure whether it would be realistic to hold both of them constant at their 75% quantile values for example. Please could you share your thoughts on whether the lekprofile plots would be suitable for this kind of data?

Thanks

Amara

Hi Amara,

You bring up a good point about this method… holding the other variables constant for levels that don’t make sense given their correlations might not be the best approach. I actually created a method that holds the variables constant based on natural groupings between variables but this is not implemented in the package. It’s described in this paper: http://dx.doi.org/10.1016/j.ecolind.2014.04.002. The relevant text on p. 199:

‘Lek’s profile method plots the predicted values for a response variable across the range of values for a specific explanatory variable. All other explanatory variables are held constant (e.g., minimum, 20th quantile, maximum). We modified the method to use constant values for unevaluated explanatory variables based on an agglomerative clusteringtechnique to identify groups of lakes with similar characteristics(Milligan, 1989). This approach provided a more robust method of evaluating response curves because values for other explanatory variables that were more likely to co-occur were identified.The original approach for Lek’s profile method creates unrealistic response curves because not all lakes are, for example, large, deep,etc. Unevaluated explanatory variables were held constant at the mean value for each identified cluster to determine response to specific explanatory variables.’

I considered including this as an option in the NeuralNetTools package but thought I’d keep it simple and stick with the original Lek method. I might include it in future versions if there’s interest. If you’re interested, I could dig up my original code and send it your way.

-Marcus

Just added this feature to the development version of the package (details here). Try this example to evaluate a neural network based on kmeans clustering of the explanatory variables. The unevaluated variables are held constant at the mean for each cluster, which can be viewed with split_show = T. More info is in the help file for lekprofile. This isn’t a very interesting example though because the explanatory variables are uncorrelated, i.e., all groups are similar.

Hi Marcus,

Thanks for your great work with sensitivity analysis and I found it was quite useful. As you mentioned lek function also can be used to other linear regression models. So I am trying to use this function to linear mixed effect model.There are 13 explanatory variables, 7 variables are factor variables, 6 are numeric variables. When I run the function I got the error:

Error in matrix(nrow = step.val, ncol = ncol(mat.in), dimnames = list(c(1:step.val), :

non-numeric matrix extent

I was wondering whether this function can used to factor variable. If so, how can I figure out this problem.

Hi Ayumi,

Unfortunately, the method only applies to continuous variables. The response variable is evaluated across the range of values for each explanatory variable while holding the unevaluated variables constant. For categorical data, you would have to hold them constant at each of the factor levels. This would create a much more complicated plot depending on how many levels you have. I think it’s similar to plotting predictions for a linear model where one of the variables is a factor with different levels. The predictions include the effects of the continuous variables with an additive constant for each level of the categorical variable. Something similar would have to be done for lekprofile… not impossible just beyond the scope.

-Marcus

Hi Marcus,

How do I perform sensitivity analysis for the multilayered neural network trained using h2o package?

Thanks in advance

Suneetha

Hi Suneetha, unfortunately I have not used the h2o package for neural networks. The NeuralNetTools package has methods for neural networks from the nnet, RSNNS, neuralnet, and caret packages, so maybe you can try using those first? It might also be helpful to see if you can extract a model object from the h2o output, sometimes they are buried.

-Marcus

Hi Marcus,

Thank you very much for your response.

I tried using lekprofile method using RSNNS. It worked. However, how do I just extract all values of 6 columns (Explanatory, resp.name, Response, Splits ,exp.name) into a table/dataframe.

x <- mydata[, c(1:77)]

y <- mydata[, 'Class']

mod <- mlp(x, y, size = c(5,10,5),linOut=T)

head(lekprofile(mod, xvar=x, yvar=y, val.out=T,xsel = NULL, ysel = NULL))

The output which I obtained is a long list ……

…..

1994 1.939393939 Y1 0.1776309013 2 X4

1995 1.949494949 Y1 0.1775554568 2 X4

1996 1.959595960 Y1 0.1774802953 2 X4

1997 1.969696970 Y1 0.1774054915 2 X4

1998 1.979797980 Y1 0.1773309857 2 X4

1999 1.989898990 Y1 0.1772567630 2 X4

2000 2.000000000 Y1 0.1771828681 2 X4

[ reached getOption("max.print") — omitted 44200 rows ]

$layers

$layers[[1]]

mapping: colour = Groups

geom_line:

stat_identity:

position_identity: (width = NULL,

$scales

Reference class object of class "Scales"

Field "scales":

list()

……..

…..

.

……

Thanks in advance.

Hi, could you provide some suggestions how to do survival analysis(parametric method) with neuron network?

Please excuse the first reply to the incorrect post.

*******

Hello,

I really like your post’s about the sensitivity analysis and the variable importance. Have you already used the sensitivity analysis (any method of it) on RNN or LSTM Networks? What do you think about this? Generally this should work or am I wrong?

Thanks!