Skorch: A PyTorch wrapper providing an sklearn interface
Introduction
sklearn and pytorch are without a doubt among the most popular Machine and Deep Learning libraries for Python.
For a recent project, I wanted to quickly replace a tree-based learner with a neural network, while reusing as much code as possible.
sklearn provides an implementation of neural networks, but it lacks many of the architectural features that pytorch offers (e.g., dropout layers). The biggest caveat, however, is the lack of GPU acceleration!
This is where skorch comes into play: an sklearn-compatible neural network library that wraps PyTorch! The main features advertised by skorch are:
- a wrapper around pytorch with an sklearn interface,
- a reduction of boilerplate code in the training loop,
- custom callbacks,
- no intention to reinvent the wheel!
Getting started with skorch
As skorch is well documented, getting started could not be any easier, as the following code demonstrates. Note, however, that it is unreasonable to expect any significant speedup from cuda on such a tiny fully connected network!
We begin by loading the required packages:
import numpy as np
from sklearn.datasets import make_classification, make_regression
from skorch import NeuralNetRegressor
from skorch.callbacks import Checkpoint, ProgressBar
from torch_nn import RegressorModule  # our network definition (a local torch_nn.py), shown below
import torch
from torch import nn
Next, we build our RegressorModule that defines our neural network. Any instantiation of this class expects the arguments num_inputs, num_outputs, and hidden_units, where the latter specifies the hidden layer sizes as an array (e.g., [128, 64] represents two hidden layers of size 128 and 64, respectively). By default, we use rectified linear units as activation functions. The network is then assembled as a dense, fully connected network.
class RegressorModule(nn.Module):
    def __init__(
        self,
        num_inputs,
        hidden_units,
        num_outputs,
        nonlin=nn.ReLU(),
        squeeze_output=False,
    ):
        super(RegressorModule, self).__init__()
        self.input_units = num_inputs
        self.hidden_units = hidden_units
        self.output_units = num_outputs
        self.nonlin = nonlin
        self.squeeze_output = squeeze_output

        self.reset_params()

    def reset_params(self):
        """(Re)set all parameters."""
        units = [self.input_units]
        units += self.hidden_units
        units += [self.output_units]

        sequence = []
        for u0, u1 in zip(units, units[1:]):
            sequence.append(nn.Linear(u0, u1))
            sequence.append(self.nonlin)

        sequence = sequence[:-1]  # no ReLU on the output layer!

        self.sequential = nn.Sequential(*sequence)

    def forward(self, X):
        X = self.sequential(X)
        if self.squeeze_output:
            X = X.squeeze(-1)
        return X
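To make the hidden_units argument concrete, here is a small, purely illustrative snippet (the sizes 20, [128, 64], and 2 are assumptions for demonstration) that instantiates the module directly and prints the resulting layer stack:

# Illustration only: instantiate the module outside of skorch to inspect the layers.
module = RegressorModule(num_inputs=20, hidden_units=[128, 64], num_outputs=2)
print(module.sequential)
# Sequential(
#   (0): Linear(in_features=20, out_features=128, bias=True)
#   (1): ReLU()
#   (2): Linear(in_features=128, out_features=64, bias=True)
#   (3): ReLU()
#   (4): Linear(in_features=64, out_features=2, bias=True)
# )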
A simple training function might look as follows.
We begin by generating dummy inputs X and labels y and specify the architecture of our fully connected feed-forward network. This code also demonstrates one of the advantages of using skorch: we can use predefined or custom callbacks. In our case, we add three Checkpoint callbacks that save the model with the corresponding fn_prefix to files in a folder foo whenever:
- An epoch has passed.
- An epoch has passed in which we saw a new best training loss.
- An epoch has passed in which we saw a new best validation loss.
def train():
    # Generate dummy data
    X, y = make_regression(
        1000, 20, n_informative=10, n_targets=2, random_state=0
    )
    X, y = X.astype(np.float32), y.astype(np.float32) / 100
    architecture = [32, 16]

    # Callbacks
    path = "foo"
    cp_epoch = Checkpoint(
        dirname=path,
        fn_prefix="epoch_",
        monitor=None,  # save after every epoch
        event_name="e",
    )
    cp_train = Checkpoint(
        dirname=path,
        fn_prefix="train_",
        monitor="train_loss_best",
        event_name="t",
    )
    cp_valid = Checkpoint(
        dirname=path,
        fn_prefix="valid_",
        monitor="valid_loss_best",
        event_name="v",
    )
    progress = ProgressBar()
    cb = [cp_epoch, cp_train, cp_valid, progress]

    model = NeuralNetRegressor(
        module=RegressorModule,
        module__num_inputs=X.shape[1],
        module__hidden_units=architecture,
        module__num_outputs=y.shape[1],
        criterion=torch.nn.MSELoss,
        # device="cuda",  # uncomment this to train with CUDA
        optimizer=torch.optim.Adam,
        max_epochs=10,
        batch_size=256,  # default: 128
        lr=1e-2,  # default: 1e-3
        # train_split=None,  # default: 80/20 train/valid split
        iterator_train__shuffle=True,
        # iterator_train__num_workers=4,
        # iterator_valid__num_workers=4,
        # iterator_valid__shuffle=False,
        callbacks=cb,
    )

    torch.manual_seed(0)
    torch.cuda.manual_seed(0)

    model.fit(X, y)
Apart from a Checkpoint, skorch.callbacks provides several other callbacks and allows us to define our own. These include, e.g., EarlyStopping, which stops training when a metric of interest does not improve for a given number of epochs, and LRScheduler, which dynamically adjusts the lr in case the model stalls. The latter provides an interface to torch.optim.lr_scheduler and can, e.g., apply ReduceLROnPlateau.
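As an aside, here is a minimal sketch of how these two callbacks could be added to the example above; the patience values and the scheduler arguments are illustrative, not recommendations:

from skorch.callbacks import EarlyStopping, LRScheduler
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Stop training if the validation loss has not improved for 5 epochs (illustrative value).
early_stopping = EarlyStopping(monitor="valid_loss", patience=5)

# Reduce the learning rate when the validation loss plateaus (illustrative values).
lr_scheduler = LRScheduler(
    policy=ReduceLROnPlateau, monitor="valid_loss", factor=0.5, patience=2
)

cb = [cp_epoch, cp_train, cp_valid, progress, early_stopping, lr_scheduler]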
if __name__ == "__main__":
    train()
Running the code yields the following output that documents the changes in the training and validation loss along with the duration of each epoch.
The events e, t, and v correspond to our checkpoints and mark the epochs in which the corresponding monitor improved.
epoch e t train_loss v valid_loss dur
------- --- --- ------------ --- ------------ ------
1 + + 3.1877 + 2.9380 0.0295
2 + + 2.9004 + 2.4576 0.0105
3 + + 2.3326 + 1.5936 0.0112
4 + + 1.3732 + 0.5787 0.0130
5 + + 0.4784 + 0.3349 0.0105
6 + + 0.4192 0.3570 0.0120
7 + + 0.2749 + 0.1664 0.0135
8 + + 0.1456 0.1880 0.0097
9 + 0.1630 + 0.1354 0.0105
10 + + 0.0919 + 0.0892 0.0100
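Since the fitted model exposes the familiar sklearn estimator API, obtaining predictions afterwards is a one-liner (a quick illustrative sketch):

# Predict with the fitted estimator, just like with any sklearn regressor.
y_pred = model.predict(X)
print(y_pred.shape)  # (1000, 2)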
Loading or warmstarting a model
A Checkpoint is also perfectly suited to load and initialize a model. Recall that our previous models were saved to the folder foo. We can resume training from the last Checkpoint by passing a LoadInitState callback. Note that this restores the module parameters, the optimizer state, and the training history recorded at that checkpoint.
    # Previous call at the end of train()
    # model.fit(X, y)

    # Warm-start a model
    model.warm_start = True
    model.fit(X, y)

    # Load a model from a Checkpoint
    from skorch.callbacks import LoadInitState
    cb.append(LoadInitState(cp_train))

    model = NeuralNetRegressor(
        RegressorModule,
        module__num_inputs=X.shape[1],
        module__hidden_units=architecture,
        module__num_outputs=y.shape[1],
        optimizer=torch.optim.Adam,
        lr=0.1,
        callbacks=cb,
    )

    # Continue training from the last epoch in which the
    # training loss improved, now with an lr of 0.1
    model.fit(X, y)
Querying model weights and biases
In some cases, it may be desirable to query the weight matrices W and bias vectors b of each layer. With skorch, this can easily be achieved as follows.
    # Previous call to fit() saved a model
    # model.fit(X, y)

    model = NeuralNetRegressor(
        RegressorModule,
        module__num_inputs=X.shape[1],
        module__hidden_units=architecture,
        module__num_outputs=y.shape[1],
        optimizer=torch.optim.Adam,
        lr=0.1,
        callbacks=cb,
    )

    # Initialize the model (LoadInitState restores the checkpoint)
    # and strip attributes that are only needed for training
    model.initialize()
    model.trim_for_prediction()

    # Query weight matrices and bias vectors
    W = []
    b = []
    for i, tensor in enumerate(model.module_.parameters()):
        if i % 2 == 0:  # even indices: weight matrices of the Linear layers
            assert len(tensor.shape) == 2
            W.append(tensor.mT.detach().numpy())
        else:  # odd indices: bias vectors
            assert len(tensor.shape) == 1
            b.append(tensor.detach().numpy())
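As a quick sanity check (an illustrative sketch with a hypothetical helper manual_predict, not part of the original code), the extracted matrices and vectors can be used to reproduce the network's forward pass in plain numpy and compare it against model.predict:

# Hypothetical helper: manual forward pass with the extracted parameters;
# ReLU is applied between all layers but not after the output layer.
def manual_predict(X, W, b):
    out = X
    for i, (Wi, bi) in enumerate(zip(W, b)):
        out = out @ Wi + bi
        if i < len(W) - 1:
            out = np.maximum(out, 0.0)  # ReLU
    return out

y_manual = manual_predict(X, W, b)
y_net = model.predict(X)
assert np.allclose(y_manual, y_net, atol=1e-4)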
Compatibility with the sklearn API
Because skorch provides compatibility with sklearn, any model that was instantiated through skorch can be used in the standard sklearn manner. Applications include a Pipeline or a GridSearchCV.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', model),
])

pipe.fit(X, y)
from sklearn.model_selection import GridSearchCV

# deactivate the skorch-internal train/valid split and verbose logging
model.set_params(train_split=False, verbose=0)
params = {
    'lr': [0.01, 0.02],
    'max_epochs': [10, 20],
    'module__hidden_units': [[64], [64, 32], [128, 64]],
}
gs = GridSearchCV(model, params, refit=False, cv=3, scoring='r2')

gs.fit(X, y)
print(gs.best_score_, gs.best_params_)
Conclusion
This concludes a first look at the regressor (classifier) and callbacks modules of skorch! 😎