Add documentation

2020-08-04 18:59:49 -04:00 · 2020-08-04 18:59:49 -04:00 · 728e0597f0
commit 728e0597f0
parent 265587922d
2 changed files with 83 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -5,6 +5,9 @@ Neural network emulators to transform field/map data
 * [Installation](#installation)
 * [Usage](#usage)
    * [Data](#data)
+        * [Data cropping](#data-cropping)
+        * [Data padding](#data-padding)
+        * [Data loading, sampling, and page caching](#data-loading-sampling-and-page-caching)
        * [Data normalization](#data-normalization)
    * [Model](#model)
    * [Training](#training)
@@ -53,6 +56,49 @@ The total sample size is the number of input and target pairs multiplied
 by the number of cropped samples per pair.


+#### Data padding
+
+There are two types of padding to distinguish here.
+We differentiate the padding applied during convolution from our
+explicit data padding, and refer to the former as conv-padding.
+
+Convolution preserves translational invariance, but conv-padding breaks
+it, except for periodic conv-padding, which is not feasible at
+runtime for large 3D fields.
+Therefore we recommend convolution without conv-padding.
+As a result, the output will be smaller than the input, and thus
+smaller than the target if the target has the same size as the input,
+making loss computation inefficient.
+
+To solve this, we can pad the input before feeding it into the model.
+The pad size should be chosen so that the output size equals or
+approximates the target size.
+The proper pad size can be calculated from the model architecture.
+Padding works for cropped samples, or for samples with periodic
+boundary conditions.
+
+
+#### Data loading, sampling, and page caching
+
+The difference in speed between disks and GPUs makes training an
+IO-bound job.
+Stochastic optimization exacerbates the situation, especially for large
+3D data *with multiple crops per field*.
+In this case, we can use the `--div-data` option to divide field files
+among GPUs, so that each node only needs to load part of the data if
+there are multiple nodes.
+Data division is shuffled every epoch.
+Crops within each field can be further randomized within a distance
+relative to the field, controlled by `--div-shuffle-dist`.
+Setting it to 0 turns off this randomization, and setting it to N limits
+the shuffling to within a distance of N files.
+With both `--div-data` and `--div-shuffle-dist`, each GPU only needs to
+work on about N files at a time, with those files kept in the Linux page
+cache.
+This is especially useful when the amount of data exceeds the CPU memory
+size.
+
+
 #### Data normalization

 Input and target (output) data can be normalized by functions defined in
@@ -66,6 +112,31 @@ Find the models in `map2map/models/`.
 Modify the existing models, or write new models somewhere and then
 follow [Customization](#customization).

+```python
+import torch.nn as nn
+
+
+class Net(nn.Module):
+    def __init__(self, in_chan, out_chan, mid_chan=32, kernel_size=3,
+                 negative_slope=0.2, **kwargs):
+        super().__init__()
+
+        self.conv1 = nn.Conv2d(in_chan, mid_chan, kernel_size)
+        self.act = nn.LeakyReLU(negative_slope)
+        self.conv2 = nn.Conv2d(mid_chan, out_chan, kernel_size)
+
+    def forward(self, x):
+        x = self.conv1(x)
+        x = self.act(x)
+        x = self.conv2(x)
+        return x
+```
+
+The model `__init__` requires two positional arguments, the number of
+input and output channels.
+Other hyperparameters can be specified as keyword arguments, including
+the `scale_factor` useful for super-resolution tasks.
+Note that `**kwargs` is necessary when `__init__` does not declare
+`scale_factor`, because `scale_factor` is always passed when
+instantiating a model.
+

 ### Training

@@ -102,6 +173,12 @@ passed by `--callback-at`.
 The default locations are searched first before the callback directory.
 So be aware of name collisions.

+The default locations are:
+* models: `map2map/models/`
+* criteria: `torch.nn`
+* optimizers: `torch.optim`
+* normalizations: `map2map/data/norms/`
+
 This approach is good for experimentation.
 For example, one can play with a model `Bar` in `path/to/foo.py`, by
 calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
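
As a rough illustration of the pad-size calculation mentioned under "Data padding", the sketch below assumes stride-1 convolutions without conv-padding, such as the two 3-wide layers of the example `Net`. The helper names (`valid_conv_shrink`, `pad_per_side`, `periodic_pad_1d`) are hypothetical and not part of map2map, and the 1D list merely stands in for one axis of a periodic field.

```python
def valid_conv_shrink(kernel_sizes, dilations=None):
    """Total size lost per axis by stride-1 convolutions without conv-padding."""
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    # Each layer trims dilation * (kernel_size - 1) elements from the axis.
    return sum(d * (k - 1) for k, d in zip(kernel_sizes, dilations))


def pad_per_side(kernel_sizes, dilations=None):
    """Split the total shrinkage into (low, high) pad sizes for one axis."""
    shrink = valid_conv_shrink(kernel_sizes, dilations)
    lo = shrink // 2
    hi = shrink - lo  # takes the extra element when shrink is odd
    return lo, hi


def periodic_pad_1d(row, lo, hi):
    """Pad one axis by wrapping around, i.e. a periodic boundary condition."""
    n = len(row)
    return [row[(i - lo) % n] for i in range(n + lo + hi)]


# Two 3-wide convolutions, as in the example Net (assumed stride 1).
lo, hi = pad_per_side([3, 3])  # -> (2, 2)
padded = periodic_pad_1d([0, 1, 2, 3, 4, 5, 6, 7], lo, hi)
# The padded axis has 12 elements; after losing 4 it matches the 8-element target.
```

Each such convolution trims `dilation * (kernel_size - 1)` elements per axis, so padding the input by the summed loss lets the output size match a target of the original input size.
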
--- a/map2map/args.py
+++ b/map2map/args.py
@@ -128,11 +128,15 @@ def add_train_args(parser):

    parser.add_argument('--div-data', action='store_true',
            help='enable data division among GPUs for better page caching. '
+            'Data division is shuffled every epoch. '
            'Only relevant if there are multiple crops in each field')
    parser.add_argument('--div-shuffle-dist', default=1, type=float,
-            help='distance to further shuffle within each data division. '
-            'Only relevant if there are multiple crops in each field. '
+            help='distance to further shuffle cropped samples relative to '
+            'their fields, to be used with --div-data. '
+            'Only relevant if there are multiple crops in each file. '
            'The order of each sample is randomly displaced by this value. '
+            'Setting it to 0 turns off this randomization, and setting it to N '
+            'limits the shuffling to within a distance of N files. '
            'Change this to balance cache locality and stochasticity')
    parser.add_argument('--dist-backend', default='nccl', type=str,
            choices=['gloo', 'nccl'], help='distributed backend')
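
One way to picture the effect of `--div-shuffle-dist` is sketched below. This is only an illustration of the help text under stated assumptions, not the sampler shipped in map2map: crops are assumed to be indexed in file order, each index is displaced by a random amount of up to `shuffle_dist` files' worth of crops, and the crops are then visited in the order of the displaced positions. The function name `locally_shuffled_order` and its parameters are hypothetical.

```python
import random


def locally_shuffled_order(num_files, crops_per_file, shuffle_dist, seed=0):
    """Return a visiting order for crops that stays close to file order.

    Hypothetical helper: crop i is assumed to live in file i // crops_per_file.
    """
    rng = random.Random(seed)
    num_samples = num_files * crops_per_file
    # Largest displacement, measured in crops rather than files.
    max_shift = shuffle_dist * crops_per_file
    # Displace each crop's position by a random amount, then sort by it.
    keys = [i + rng.uniform(0, max_shift) for i in range(num_samples)]
    return sorted(range(num_samples), key=keys.__getitem__)


# 4 files with 3 crops each, shuffled within a distance of 1 file.
order = locally_shuffled_order(num_files=4, crops_per_file=3, shuffle_dist=1)
# With a distance of 0 the crops are visited strictly in file order.
in_order = locally_shuffled_order(num_files=4, crops_per_file=3, shuffle_dist=0)
```

In this sketch no crop moves more than `shuffle_dist * crops_per_file` positions from where it started, so consecutive reads touch only a few neighbouring files. That is the trade-off between cache locality and stochasticity the help text refers to.
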