Keras entered the Python world in 2015, and really propelled and sustained the use of Python for neural networks and more general machine learning. R did not take long to catch up, however, with the R keras package released in 2017. This package essentially translates the familiar style of R to Python, allowing you to easily use the full power of Keras while enjoying the elegance of, for instance, the “tidyverse” style of programming.

While it is worthwhile to work once through the pain of coding a neural network by hand to make sure you fully understand what you’re doing, in everyday life using Keras will save you time, errors, and frustration.

## How simple is it?

Say we want a fully connected network of a few layers, for a regression problem (N.B. with many regression problems you want to try a much simpler model first!).

We have a data set train_x, which we will feed into the network. First, install R and the keras package. Then type:

```r
model <- keras_model_sequential() %>%
  layer_dense(units = 64,
              activation = "relu",
              input_shape = dim(train_x)[2]) %>%
  layer_dense(units = 64,
              activation = "relu") %>%
  layer_dense(units = 1)

model %>% compile(
  loss = "mse",
  optimizer = optimizer_adam(),
  metrics = list("mean_absolute_error")
)

model %>% fit(train_x,
              train_y,
              epochs = 20)
```

and that’s all you need to train your first neural network. When you consider that this is a model with 4609 parameters, it doesn’t look so difficult to code up!

Let’s unpack this a bit, because even if you’re not familiar with R, you can sort of see what the different bits of this code are doing. Part 1 is building the architecture, part 2 is telling the model how to learn, and part 3 is training the model.

- We have three layers, of 64, 64 and 1 neurons. Each layer except the last has an activation function (“relu”, the rectified linear unit) that determines what its neurons output. Finally, the dim(train_x)[2] part just gives the dimension (number of columns/variables) of the data you are feeding into the network. One great part about Keras is that it can figure out the dimensions of the intermediate steps once you give it a little information about the input.
- The compile step tells the model things like how fast it should learn and how you’re judging the performance of your model. Here we’re using the Adam optimizer, which adjusts the learning rate as you train.
- The final step just feeds in the data, saying how long (how many epochs) you want to train it for.

If we want to evaluate our model, it is just as simple:

```r
model %>% evaluate(test_x,
                   test_y)
```

There are a lot of inbuilt layers to choose from – basically all the standard tools are there: dense, convolutional, recurrent and so forth. If you have some non-standard operation to perform in a layer, you can wrap a function with layer_lambda():

```r
square_function <- function(params){
  params^2
}

model <- keras_model_sequential() %>%
  layer_dense(units = 64,
              activation = "relu",
              input_shape = dim(train_x)[2]) %>%
  layer_lambda(square_function) %>%
  layer_dense(units = 1)
```

You can also write more complicated custom layers, but pay some attention here: custom layers cannot always be transferred as simply between the various back-ends that Keras offers.

The architecture here is standardized, so that, for instance, a one-layer gated recurrent unit looks almost identical to our dense set-up:

```r
model <- keras_model_sequential() %>%
  layer_gru(units = 32,
            activation = "relu",
            input_shape = list(NULL, dim(train_x)[[-1]])) %>%
  layer_dense(units = 1)
```

This makes it especially easy to try out different methods, for instance switching between a GRU and an LSTM, by altering one line of code.
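To illustrate that one-line switch, here is the same model with an LSTM in place of the GRU (a sketch; it assumes the same train_x as above):

```r
# Identical architecture to the GRU model above -- only the recurrent
# layer call changes.
model <- keras_model_sequential() %>%
  layer_lstm(units = 32,
             activation = "relu",
             input_shape = list(NULL, dim(train_x)[[-1]])) %>%
  layer_dense(units = 1)
```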

## Overfitting

One of the major issues to fight with when it comes to neural networks is overfitting. This is when, instead of learning general rules that work for the data, you learn very specific rules that work for your training set but do not generalize to the test set or other new data you have coming in. There are several techniques to handle overfitting – two common ones being regularization and dropout.

Regularization comes in two forms: L1, where the cost added is proportional to the absolute value of the weight coefficients, and L2, where the cost is proportional to the square of the weight coefficients (anyone familiar with ridge regression will recognise these terms – it works in exactly the same way). L1 regularization sets some of the weights to zero, while L2 regularization shrinks weights. One can of course always use a combination of the two.
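As a sketch of the options (the penalty values here are arbitrary, and the model assumes the same train_x as before):

```r
# L1 on one layer, combined L1+L2 on another -- regularizer_l1(),
# regularizer_l2() and regularizer_l1_l2() are all available.
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu",
              input_shape = dim(train_x)[2],
              kernel_regularizer = regularizer_l1(0.001)) %>%
  layer_dense(units = 64, activation = "relu",
              kernel_regularizer = regularizer_l1_l2(l1 = 0.001,
                                                     l2 = 0.001)) %>%
  layer_dense(units = 1)
```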

The other, perhaps more popular option is dropout. Dropout randomly turns off neurons in a layer during training. This means that while the network is learning, it cannot rely too much on any one neuron, so the network has to spread out the learning and can’t memorize individual data points.

It’s easy to implement both of these in R keras:

```r
model <- keras_model_sequential() %>%
  layer_dense(units = 64,
              activation = "relu",
              input_shape = dim(train_x)[2]) %>%
  layer_dense(units = 32,
              kernel_regularizer = regularizer_l2(0.001),
              activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1)
```

As you train your model, it will likely eventually begin to overfit even with these measures in place. Typically, one wants the model from just before it starts overfitting. One way to do this is to periodically save the model as you train, and then grab the version saved before the overfitting happened. You can implement this automatically with callback_model_checkpoint().
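A minimal sketch of that pattern (the file name and validation split are arbitrary choices):

```r
# Save the model whenever validation loss improves; after training,
# reload the best checkpoint rather than the final (possibly overfit) one.
checkpoint <- callback_model_checkpoint(
  filepath = "best_model.h5",
  monitor = "val_loss",
  save_best_only = TRUE
)

model %>% fit(train_x, train_y,
              epochs = 20,
              validation_split = 0.2,
              callbacks = list(checkpoint))

best_model <- load_model_hdf5("best_model.h5")
```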

## Saving, loading, getting and assigning weights

Now you’ve trained up your model, how do you store it? You can easily save your model (weights and architecture) with:

```r
save_model_hdf5(model, "model.h5")

model <- load_model_hdf5("model.h5")
```

Often, however, you may just want to store the weights as an .rda file:

```r
weights <- get_weights(model)

save(weights, file = "weights.rda")
```

Then when you want to restore your model, load your weights and use:

```r
set_weights(model, weights)
```

One can also use this in testing, to set exact weights in your model. Note that if you mainly work with data frames, you may have to brush up on subsetting lists and matrices.
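To see the subsetting involved: get_weights() returns a plain R list, alternating kernel matrices and bias vectors, one pair per layer. The sketch below mimics that structure with made-up sizes (no keras needed for this part):

```r
# A stand-in for what get_weights() returns for one dense layer with
# 5 inputs and 64 units: a kernel matrix followed by a bias vector.
weights <- list(matrix(rnorm(5 * 64), nrow = 5, ncol = 64),  # kernel
                rep(0, 64))                                  # bias

weights[[1]][1, 1] <- 0.5  # set one kernel weight exactly
weights[[2]][3]    <- -1   # set one bias term exactly

# set_weights(model, weights) would then push these back into the model.
```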

Under some circumstances you may also wish to initialize the weights differently – this is likewise easy to do:

```r
model <- keras_model_sequential() %>%
  layer_dense(units = 1,
              activation = "relu",
              input_shape = dim(train_x)[2],
              kernel_initializer = 'orthogonal',
              bias_initializer = initializer_constant(2.0)
  )
```

This uses an orthogonal kernel initializer (starting with a random orthogonal matrix), and sets all bias terms to a constant (here, 2).

## Other bits and pieces

- When you first get started, stick to the CPU implementation of R Keras – only try to get the GPU up and running when and if you need it, as it can take some time to figure out.

- Obviously, as always, one will eventually do something wrong, and get error messages. A Keras error message usually takes the form of one or two descriptive lines designed to be understood by an R user, followed by a screed of Python errors. Unless you’re totally fluent in Python, it’s probably best to ignore the long list of Python errors and start by trying to understand that first line.

- Sometimes the most difficult bit of the process of making a neural network is chopping up and entering the data correctly into your network. This can be non-trivial for ordered data such as sentences and time series, or for large data that cannot fit into RAM, such as large image sets. A smart way to do this in R is with something called a generator – a way of calling bits of data that you can feed into your model. There are several inbuilt generators in keras, such as flow_images_from_directory() and timeseries_generator(). If you find you need to build your own, you will need to grapple with the super assignment operator, denoted by <<-
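A minimal sketch of why <<- comes up: a hand-rolled generator is a closure whose counter lives in the enclosing environment, so updating it needs super assignment (plain <- would just create a new local variable each call). This toy version only yields batches of rows; a real keras generator would return list(inputs, targets) and loop back to the start.

```r
# A closure-based generator: each call returns the next batch_size rows.
batch_generator <- function(data, batch_size) {
  i <- 0
  function() {
    rows <- (i + 1):min(i + batch_size, nrow(data))
    i <<- i + batch_size   # update the counter in the enclosing environment
    data[rows, , drop = FALSE]
  }
}

gen <- batch_generator(matrix(1:20, nrow = 10), batch_size = 4)
b1 <- gen()   # rows 1-4
b2 <- gen()   # rows 5-8
```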

- At some point your model will probably start producing NaNs or Infs. This is essentially due to the fundamental instability in a lot of neural networks. If this happens to you, firstly check for NaNs, Infs and NAs in your data (and then double check). There are several things you can then try (as always, Stack Overflow will have a bunch of additional hints). Try a lower learning rate or an optimizer such as Adam. Add a regularizer or gradient clipping, or give leaky relu a go.
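Sketches of those stability tweaks in keras syntax (argument names vary slightly between versions – older releases spell learning_rate as lr):

```r
# Lower learning rate plus gradient clipping on the Adam optimizer.
model %>% compile(
  loss = "mse",
  optimizer = optimizer_adam(learning_rate = 1e-4,  # smaller steps
                             clipnorm = 1)          # clip gradient norms
)

# Leaky relu is added as its own layer after a linear dense layer.
model <- keras_model_sequential() %>%
  layer_dense(units = 64, input_shape = dim(train_x)[2]) %>%
  layer_activation_leaky_relu(alpha = 0.1) %>%
  layer_dense(units = 1)
```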

- Things that often go wrong relate to incorrect dimensions, incorrect data formats (e.g. using data frames instead of matrices or lists, or unexpected NAs) or incorrect subsetting. These are always good things to double check if you’re getting errors.

- It always helps to have a benchmark of a simple rule-of-thumb or easy model to compare with, so you know if your complicated model is worth it.

This is obviously not an extensive overview of Keras, just a small sample of some of the things I’ve found useful. Give it a go for yourself!

## References

There are many more places to read in detail about all sorts of networks:

- The tutorials at https://keras.rstudio.com/
- The R Keras “bible”: *Deep Learning with R* by François Chollet (Keras author) and J. J. Allaire (author of the R interface), and the accompanying code: https://github.com/jjallaire/deep-learning-with-r-notebooks
- Subsetting, closures and super assignment operators: *Advanced R* by Hadley Wickham, https://adv-r.hadley.nz/