Train Your Networks¶
We’re finally ready to train your networks! After all of our preparatory work, the training is pretty straightforward. For each output parameter, one runs a command like:
$ neurosynchro train transformed . j_I
Here, transformed is the path to the directory with the transformed training set, . is the result directory containing the nn_config.toml file (the current directory, in the standard tutorial layout), and j_I is the name of the component to train for. After training is complete, a file named j_I.h5 will be saved in the result directory. The program will print out the mean squared error (MSE) characterizing the neural network's performance against the training set.
With the training set used in the tutorial, each parameter takes about 20 minutes to train on an 8-core laptop CPU. Because there are 9 parameters to train, the full run might take something like 3 hours in total. (No substantial effort has gone into optimizing the training process!) If you're ready to commit to that, you can train all of the parameters in sequence with:
$ all="j_I alpha_I rho_Q rho_V j_frac_pol alpha_frac_pol j_V_share alpha_V_share rho_Q_sign"
$ for p in $all ; do neurosynchro train transformed . $p ; done
If you pass the argument -p to the train subcommand, diagnostic plots will be shown after training is complete. The plots are made with the obscure omegaplot package, so make sure to install it before trying this option.
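For example, to train j_I and view the diagnostic plots afterwards, an invocation along these lines should work (the flag placement shown here assumes typical option parsing for the subcommand):

$ neurosynchro train -p transformed . j_I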
Trainer Types¶
Neurosynchro supports the following neural network training schemes. For each output parameter, you can specify which scheme to use by editing its trainer keyword in the nn_config.toml file.
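As a sketch only: the entry for an output parameter gains or changes a single trainer key, roughly as shown below. The other keys and the surrounding layout here are placeholders rather than a description of the generated file, so edit the entry that already exists in your nn_config.toml rather than copying this verbatim.

# Hypothetical excerpt from one output parameter's entry;
# only the "trainer" line matters. Keep your file's existing structure.
name = "rho_Q"
trainer = "twolayer"   # one of: "generic", "twolayer", "binary"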
- generic
This neural network has the following characteristics:
- Dense, single-layer architecture
- 300 neurons
- ReLU activation
- Keras's normal kernel initializer
- Trained with the Adam optimizer
- Optimized against the mean-squared-error (MSE) loss function
The network is trained in two passes. First, 30 epochs of training are run. Then the training set is sigma-clipped with a ±7σ tolerance; the intention is to remove any cases where the detailed calculation has mistakenly delivered totally bogus results. Finally, 30 more epochs of training are run.
This network has been observed to perform well in a variety of real-world situations.
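To make the architecture concrete, here is a rough Keras sketch of what the generic network amounts to. This is an illustration, not neurosynchro's actual code: the n_inputs value and the single linear output neuron are assumptions, and depending on your Keras version the import paths (e.g. tensorflow.keras) and string aliases may differ.

# Illustrative sketch of the "generic" trainer's network, not the
# actual neurosynchro implementation.
from keras.models import Sequential
from keras.layers import Dense

n_inputs = 4  # placeholder: number of transformed input parameters

model = Sequential()
# One dense hidden layer: 300 neurons, ReLU activation, 'normal' initializer
model.add(Dense(300, input_dim=n_inputs, activation='relu',
                kernel_initializer='normal'))
# A single linear output neuron for the regression target (assumed here)
model.add(Dense(1, kernel_initializer='normal'))
# Adam optimizer against the mean-squared-error loss
model.compile(optimizer='adam', loss='mse')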
- twolayer
This neural network has the following characteristics:
- Dense, two-layer architecture
- 120 neurons in first layer, 60 in second
- ReLU activation in both layers
- Keras's normal kernel initializer
- Trained with the Adam optimizer
- Optimized against the mean-squared-error (MSE) loss function
The training is run in the same way as in the generic setup. This network has been observed to perform a little bit better than the generic network when predicting the rho_Q output parameter. This doesn't always hold, though; if you wish to investigate, try both and see which gives a better MSE.
- binary
This neural network has the following characteristics:
- Dense, two-layer architecture
- 120 neurons in first layer, 60 in second
- ReLU activation in both layers
- Sigmoid activation in the output layer
- Keras's normal kernel initializer
- Trained with the Adam optimizer
- Optimized against the binary cross-entropy loss function
The training is run in almost the same way as in the generic setup, but no sigma-clipping is performed. This setup is intended for the rho_Q_sign output parameter, which predicts the sign of the rho_Q coefficient. However, sometimes the generic scheme actually performs better in practice. Once again, investigate by trying both and seeing which gives a better MSE.
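For comparison with the sketch of the generic network above, the binary scheme corresponds roughly to a small classifier like the following; the twolayer scheme uses the same hidden-layer stack but keeps a linear output and the MSE loss. Again, this is an illustration under the same assumptions (placeholder n_inputs, assumed single output neuron, version-dependent imports), not neurosynchro's actual code.

# Illustrative sketch of the "binary" trainer's network, not the
# actual neurosynchro implementation.
from keras.models import Sequential
from keras.layers import Dense

n_inputs = 4  # placeholder: number of transformed input parameters

model = Sequential()
# Two dense hidden layers: 120 then 60 neurons, both with ReLU activation
model.add(Dense(120, input_dim=n_inputs, activation='relu',
                kernel_initializer='normal'))
model.add(Dense(60, activation='relu', kernel_initializer='normal'))
# Sigmoid output neuron, so the prediction behaves like a probability of sign
model.add(Dense(1, activation='sigmoid', kernel_initializer='normal'))
# Adam optimizer against the binary cross-entropy loss
model.compile(optimizer='adam', loss='binary_crossentropy')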
This menu of options is, obviously, quite limited. For novel applications, you may have to edit the code to add new training schemes. Pull requests contributing new ones are more than welcome!
Next: run some test problems!