Add documentation
parent 265587922d
commit 728e0597f0

README.md | 77 lines changed
@@ -5,6 +5,9 @@ Neural network emulators to transform field/map data

* [Installation](#installation)
* [Usage](#usage)
* [Data](#data)
* [Data cropping](#data-cropping)
* [Data padding](#data-padding)
* [Data loading, sampling, and page caching](#data-loading-sampling-and-page-caching)
* [Data normalization](#data-normalization)
* [Model](#model)
* [Training](#training)

@@ -53,6 +56,49 @@ The total sample size is the number of input and target pairs multiplied
by the number of cropped samples per pair.


#### Data padding

There are two types of padding involved here.
We differentiate the padding done during convolution from our explicit
data padding, and refer to the former as conv-padding.

Convolution preserves translational invariance, but conv-padding breaks
it, except for periodic conv-padding, which is not feasible at runtime
for large 3D fields.
Therefore we recommend convolution without conv-padding.
By doing this, the output size will be smaller than the input size, and
thus smaller than the target size if the target size equals the input
size, making loss computation inefficient.

To solve this, we can pad the input before feeding it into the model.
The pad size should be adjusted so that the output size equals or
approximates the target size.
One should be able to calculate the proper pad size given the model.
Padding works for cropped samples, or for samples with periodic boundary
conditions.
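To make this concrete, here is a minimal sketch (ours, not part of the commit) of computing the pad size for a plain chain of conv-padding-free convolutions, assuming stride 1, dilation 1, cubic kernels, and no up- or downsampling layers:

```python
from torch import nn

def total_shrinkage(model):
    """Voxels lost per spatial dimension by conv-padding-free convolutions.

    Sketch only: with stride 1 and dilation 1, each conv layer removes
    (kernel_size - 1) voxels from every spatial dimension.
    """
    lost = 0
    for m in model.modules():
        if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
            lost += m.kernel_size[0] - 1
    return lost

# Pad each side of the input by half the total shrinkage so that the
# output size matches the target size again.
# pad = total_shrinkage(net) // 2
```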


#### Data loading, sampling, and page caching

The difference in speed between disks and GPUs makes training an
IO-bound job.
Stochastic optimization exacerbates the situation, especially for large
3D data *with multiple crops per field*.
In this case, we can use the `--div-data` option to divide field files
among GPUs, so that each node only needs to load part of all data if
there are multiple nodes.
Data division is shuffled every epoch.
Crops within each field can be further randomized within a distance
relative to the field, controlled by `--div-shuffle-dist`.
Setting it to 0 turns off this randomization, and setting it to N
limits the shuffling to within a distance of N files.
With both `--div-data` and `--div-shuffle-dist`, each GPU only needs to
work on about N files at a time, with those files kept in the Linux
page cache.
This is especially useful when the amount of data exceeds the CPU
memory size.
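As a rough illustration of the bounded shuffling described above (our sketch, not the package's actual sampler), each file index can be displaced by a random offset of at most N and the files reordered accordingly:

```python
import numpy as np

def bounded_shuffle(num_files, dist, rng=None):
    """Reorder file indices so each file moves by at most about `dist` places.

    Sketch only: dist = 0 keeps the original order; larger dist trades
    page-cache locality for stochasticity.
    """
    rng = np.random.default_rng() if rng is None else rng
    keys = np.arange(num_files) + dist * rng.random(num_files)
    return np.argsort(keys, kind='stable')

print(bounded_shuffle(10, 2))  # e.g. [0 2 1 3 4 6 5 7 8 9]
```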


#### Data normalization

Input and target (output) data can be normalized by functions defined in

@@ -66,6 +112,31 @@ Find the models in `map2map/models/`.
Modify the existing models, or write new models somewhere and then
follow [Customization](#customization).


```python
from torch import nn

class Net(nn.Module):
    def __init__(self, in_chan, out_chan, mid_chan=32, kernel_size=3,
                 negative_slope=0.2, **kwargs):
        super().__init__()

        # two conv-padding-free conv layers with a leaky ReLU in between
        self.conv1 = nn.Conv2d(in_chan, mid_chan, kernel_size)
        self.act = nn.LeakyReLU(negative_slope)
        self.conv2 = nn.Conv2d(mid_chan, out_chan, kernel_size)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act(x)
        x = self.conv2(x)
        return x
```

The model `__init__` requires two positional arguments, the numbers of
input and output channels.
Other hyperparameters can be specified as keyword arguments, including
the `scale_factor` useful for super-resolution tasks.
Note that the `**kwargs` is necessary when `scale_factor` is not
specified, because `scale_factor` is always passed when instantiating
a model.
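To illustrate the last point (our example, with made-up channel counts and input size), `scale_factor` would be forwarded to the constructor as a keyword argument and silently absorbed by `**kwargs` when the model does not use it:

```python
import torch

# Hypothetical values; the real channel counts are set by the data.
net = Net(in_chan=3, out_chan=3, scale_factor=1)  # scale_factor lands in **kwargs

x = torch.randn(1, 3, 32, 32)  # (batch, channel, height, width)
y = net(x)
print(y.shape)  # two conv-padding-free 3x3 convs: (1, 3, 28, 28)
```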


### Training

@@ -102,6 +173,12 @@ passed by `--callback-at`.
The default locations are searched first before the callback directory.
So be aware of name collisions.

The default locations are

* models: `map2map/models/`
* criteria: `torch.nn`
* optimizers: `torch.optim`
* normalizations: `map2map/data/norms/`

This approach is good for experimentation.
For example, one can play with a model `Bar` in `path/to/foo.py`, by
calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
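For instance (our sketch, not part of the commit), `path/to/foo.py` could define `Bar` following the same constructor convention as the `Net` example above:

```python
# path/to/foo.py -- hypothetical module loaded via --callback-at
from torch import nn

class Bar(nn.Module):
    def __init__(self, in_chan, out_chan, **kwargs):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_chan, 16, 3),
            nn.LeakyReLU(),
            nn.Conv2d(16, out_chan, 3),
        )

    def forward(self, x):
        return self.net(x)
```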
@@ -128,11 +128,15 @@ def add_train_args(parser):

     parser.add_argument('--div-data', action='store_true',
             help='enable data division among GPUs for better page caching. '
+            'Data division is shuffled every epoch. '
             'Only relevant if there are multiple crops in each field')
     parser.add_argument('--div-shuffle-dist', default=1, type=float,
-            help='distance to further shuffle within each data division. '
-            'Only relevant if there are multiple crops in each field. '
+            help='distance to further shuffle cropped samples relative to '
+            'their fields, to be used with --div-data. '
+            'Only relevant if there are multiple crops in each file. '
             'The order of each sample is randomly displaced by this value. '
+            'Setting it to 0 turns off this randomization, and setting it to N '
+            'limits the shuffling to within a distance of N files. '
             'Change this to balance cache locality and stochasticity')
     parser.add_argument('--dist-backend', default='nccl', type=str,
             choices=['gloo', 'nccl'], help='distributed backend')