Add documentation
parent 265587922d
commit 728e0597f0

README.md | 77 lines changed
@@ -5,6 +5,9 @@ Neural network emulators to transform field/map data

* [Installation](#installation)
* [Usage](#usage)
* [Data](#data)
* [Data cropping](#data-cropping)
* [Data padding](#data-padding)
* [Data loading, sampling, and page caching](#data-loading-sampling-and-page-caching)
* [Data normalization](#data-normalization)
* [Model](#model)
* [Training](#training)

@@ -53,6 +56,49 @@ The total sample size is the number of input and target pairs multiplied
by the number of cropped samples per pair.


#### Data padding

There are two types of padding involved here.
We differentiate the padding done during convolution from our explicit
data padding, and refer to the former as conv-padding.

Convolution preserves translational invariance, but conv-padding breaks
it, except for periodic conv-padding, which is not feasible at runtime
for large 3D fields.
Therefore we recommend convolution without conv-padding.
By doing this, the output size will be smaller than the input size, and
thus smaller than the target size if the target size equals the input
size, making loss computation inefficient.

To solve this, we can pad the input before feeding it into the model.
The pad size should be adjusted so that the output size equals or
approximates the target size.
One should be able to calculate the proper pad size given the model.
Padding works for cropped samples, or for samples with periodic boundary
conditions.
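To make this concrete, here is a minimal sketch (ours, not part of the commit) of computing the pad size for a plain chain of conv-padding-free convolutions, assuming stride 1, dilation 1, cubic kernels, and no up- or downsampling layers:

```python
from torch import nn

def total_shrinkage(model):
    """Voxels lost per spatial dimension by conv-padding-free convolutions.

    Sketch only: with stride 1 and dilation 1, each conv layer removes
    (kernel_size - 1) voxels from every spatial dimension.
    """
    lost = 0
    for m in model.modules():
        if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
            lost += m.kernel_size[0] - 1
    return lost

# Pad each side of the input by half the total shrinkage so that the
# output size matches the target size again.
# pad = total_shrinkage(net) // 2
```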


#### Data loading, sampling, and page caching

The difference in speed between disks and GPUs makes training an
IO-bound job.
Stochastic optimization exacerbates the situation, especially for large
3D data *with multiple crops per field*.
In this case, we can use the `--div-data` option to divide field files
among GPUs, so that each node only needs to load part of all data if
there are multiple nodes.
Data division is shuffled every epoch.
Crops within each field can be further randomized within a distance
relative to the field, controlled by `--div-shuffle-dist`.
Setting it to 0 turns off this randomization, and setting it to N
limits the shuffling to within a distance of N files.
With both `--div-data` and `--div-shuffle-dist`, each GPU only needs to
work on about N files at a time, with those files kept in the Linux
page cache.
This is especially useful when the amount of data exceeds the CPU
memory size.
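As a rough illustration of the bounded shuffling described above (our sketch, not the package's actual sampler), each file index can be displaced by a random offset of at most N and the files reordered accordingly:

```python
import numpy as np

def bounded_shuffle(num_files, dist, rng=None):
    """Reorder file indices so each file moves by at most about `dist` places.

    Sketch only: dist = 0 keeps the original order; larger dist trades
    page-cache locality for stochasticity.
    """
    rng = np.random.default_rng() if rng is None else rng
    keys = np.arange(num_files) + dist * rng.random(num_files)
    return np.argsort(keys, kind='stable')

print(bounded_shuffle(10, 2))  # e.g. [0 2 1 3 4 6 5 7 8 9]
```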


#### Data normalization

Input and target (output) data can be normalized by functions defined in

@@ -66,6 +112,31 @@ Find the models in `map2map/models/`.
Modify the existing models, or write new models somewhere and then
follow [Customization](#customization).


```python
from torch import nn

class Net(nn.Module):
    def __init__(self, in_chan, out_chan, mid_chan=32, kernel_size=3,
                 negative_slope=0.2, **kwargs):
        super().__init__()

        # two conv-padding-free conv layers with a leaky ReLU in between
        self.conv1 = nn.Conv2d(in_chan, mid_chan, kernel_size)
        self.act = nn.LeakyReLU(negative_slope)
        self.conv2 = nn.Conv2d(mid_chan, out_chan, kernel_size)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act(x)
        x = self.conv2(x)
        return x
```

The model `__init__` requires two positional arguments, the numbers of
input and output channels.
Other hyperparameters can be specified as keyword arguments, including
the `scale_factor` useful for super-resolution tasks.
Note that the `**kwargs` is necessary when `scale_factor` is not
specified, because `scale_factor` is always passed when instantiating
a model.
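To illustrate the last point (our example, with made-up channel counts and input size), `scale_factor` would be forwarded to the constructor as a keyword argument and silently absorbed by `**kwargs` when the model does not use it:

```python
import torch

# Hypothetical values; the real channel counts are set by the data.
net = Net(in_chan=3, out_chan=3, scale_factor=1)  # scale_factor lands in **kwargs

x = torch.randn(1, 3, 32, 32)  # (batch, channel, height, width)
y = net(x)
print(y.shape)  # two conv-padding-free 3x3 convs: (1, 3, 28, 28)
```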


### Training

@@ -102,6 +173,12 @@ passed by `--callback-at`.
The default locations are searched first before the callback directory.
So be aware of name collisions.

The default locations are

* models: `map2map/models/`
* criteria: `torch.nn`
* optimizers: `torch.optim`
* normalizations: `map2map/data/norms/`

This approach is good for experimentation.
For example, one can play with a model `Bar` in `path/to/foo.py`, by
calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
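For instance (our sketch, not part of the commit), `path/to/foo.py` could define `Bar` following the same constructor convention as the `Net` example above:

```python
# path/to/foo.py -- hypothetical module loaded via --callback-at
from torch import nn

class Bar(nn.Module):
    def __init__(self, in_chan, out_chan, **kwargs):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_chan, 16, 3),
            nn.LeakyReLU(),
            nn.Conv2d(16, out_chan, 3),
        )

    def forward(self, x):
        return self.net(x)
```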
@@ -128,11 +128,15 @@ def add_train_args(parser):

     parser.add_argument('--div-data', action='store_true',
             help='enable data division among GPUs for better page caching. '
+            'Data division is shuffled every epoch. '
             'Only relevant if there are multiple crops in each field')
     parser.add_argument('--div-shuffle-dist', default=1, type=float,
-            help='distance to further shuffle within each data division. '
-            'Only relevant if there are multiple crops in each field. '
+            help='distance to further shuffle cropped samples relative to '
+            'their fields, to be used with --div-data. '
+            'Only relevant if there are multiple crops in each file. '
             'The order of each sample is randomly displaced by this value. '
+            'Setting it to 0 turns off this randomization, and setting it to N '
+            'limits the shuffling to within a distance of N files. '
             'Change this to balance cache locality and stochasticity')
     parser.add_argument('--dist-backend', default='nccl', type=str,
             choices=['gloo', 'nccl'], help='distributed backend')