Download a Sample Training Set

In order to train a neural network, you need something to train it on!

In neurosynchro, the training data are a large set of samples of a detailed synchrotron calculation. That is: given a choice of input parameters (e.g., a harmonic number), the detailed calculation produces eight output parameters (the Stokes radiative transfer coefficients). The exact number of input parameters varies depending on which particle distribution is being modeled. Neurosynchro needs many samples of this function to develop a good approximation to it.

The page Make Your Training Set describes how such a training set should be parametrized and formatted. Here, we’ll just download a training set and start using it without worrying about the details.

Note

The training set is about a 300 MB download. It expands to a 700 MB directory, which we will soon reprocess into another directory that’s also about 700 MB.
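The sizes quoted above add up to roughly 1.7 GB if you keep the downloaded archive around. The following is a minimal sketch of a free-space check, assuming a POSIX shell with df and awk available. The function have_free_mb is a hypothetical helper written for this illustration, not part of neurosynchro:

```shell
# Hypothetical helper: succeed if the filesystem holding the current
# directory has at least the given number of megabytes free.
have_free_mb() {
    needed_mb=$1
    # df -Pk prints POSIX-format output in 1024-byte blocks; the fourth
    # column of the second line is the available space.
    avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((needed_mb * 1024)) ]
}

# 300 MB archive + 700 MB unpacked + 700 MB reprocessed.
if have_free_mb 1700; then
    echo "enough free space"
else
    echo "less than 1700 MB free here"
fi
```

Run it from the directory where you plan to do the tutorial, since the check looks only at the filesystem containing the current directory.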

First, download the example training data set. It is archived on Zenodo under DOI 10.5281/zenodo.1341154. The main data file is a bzipped tar archive named rimphony_powerlaw_s5-5e7_p1.5-7.tar.bz2. The following shell command should download it:

$ curl -fsSL https://zenodo.org/record/1341154/files/rimphony_powerlaw_s5-5e7_p1.5-7.tar.bz2 >trainingset.tar.bz2
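If you would like to confirm that the download arrived intact before going further, something like the following sketch may help. It assumes the bzip2 and tar programs are installed and that the file was saved as trainingset.tar.bz2; check_archive is a hypothetical helper, not a neurosynchro command:

```shell
# Hypothetical helper: test a bzip2-compressed tar archive without
# writing any of its contents to disk.
check_archive() {
    archive=$1
    # bzip2 -t decompresses in memory and reports corruption through
    # its exit status.
    bzip2 -t "$archive" || return 1
    # tar tjf lists the archive members; count them as a second check.
    count=$(tar tjf "$archive" | wc -l | tr -d ' ')
    echo "$archive: OK, $count entries"
}

# Only run the check if the downloaded file is present.
if [ -f trainingset.tar.bz2 ]; then
    check_archive trainingset.tar.bz2
fi
```

Both checks read the whole archive, so expect them to take a little while on a file of this size.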

Note

This portion of the tutorial is driven through the command line, so you should open up a terminal even if you download and unpack the data file through your web browser or some other means.

Once the file is downloaded, the following command should unpack it, creating a directory named rimphony_powerlaw_s5-5e7_p1.5-7:

$ tar xjf trainingset.tar.bz2

If you poke around, you will see that the new directory contains about 500 text files. These are the raw training data, working out to about 22 million synchrotron coefficient calculations. The calculations were performed for an isotropic power-law model with power-law indices ranging between 1.5 and 7.
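One way to do that poking around is sketched below, assuming the standard find, wc, and du utilities and that the archive was unpacked in the current directory. The function count_files is a hypothetical helper introduced only for this example:

```shell
# Hypothetical helper: print how many regular files a directory holds,
# including files in any subdirectories.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

dir=rimphony_powerlaw_s5-5e7_p1.5-7
if [ -d "$dir" ]; then
    echo "files: $(count_files "$dir")"
    du -sh "$dir"    # total size on disk
fi
```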

Training is easiest if we rearrange the directory structure a little bit. The following commands will set things up in a way that we have found to be convenient:

$ mv rimphony_powerlaw_s5-5e7_p1.5-7 rawsamples
$ neurosynchro init-nndir rimphony_powerlaw_s5-5e7_p1.5-7
$ mv rawsamples rimphony_powerlaw_s5-5e7_p1.5-7/
$ cd rimphony_powerlaw_s5-5e7_p1.5-7/

The second command above creates a new directory called rimphony_powerlaw_s5-5e7_p1.5-7 (the same name that the unpacked directory had before we renamed it) and populates it with a default configuration file, nn_config.toml, that we will edit later. The third command moves the raw samples into this new directory, and the final command changes your shell to work from it. Subsequent steps in the tutorial will assume that you are still operating from this directory, which now contains nn_config.toml and a subdirectory named rawsamples. Make sure to return to it if you close and re-open your shell.
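If you come back to the tutorial in a fresh shell, a quick check like the following sketch can confirm that you are in the right place. It relies only on the two names mentioned above, nn_config.toml and rawsamples; check_nndir is a hypothetical helper, not a neurosynchro subcommand:

```shell
# Hypothetical helper: confirm that a directory (default: the current
# one) contains the configuration file and the raw samples directory.
check_nndir() {
    dir=${1:-.}
    [ -f "$dir/nn_config.toml" ] || { echo "missing nn_config.toml"; return 1; }
    [ -d "$dir/rawsamples" ] || { echo "missing rawsamples/"; return 1; }
    echo "layout OK"
}

check_nndir . || echo "run this from inside the directory created by init-nndir"
```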

Next: transform your training set.