# map2map
Neural network emulators to transform field/map data
* [Installation](#installation)
* [Usage](#usage)
* [Data](#data)
* [Data cropping](#data-cropping)
* [Data padding](#data-padding)
* [Data loading, sampling, and page caching](#data-loading-sampling-and-page-caching)
* [Data normalization](#data-normalization)
* [Model](#model)
* [Training](#training)
* [Files generated](#files-generated)
* [Tracking](#tracking)
* [Customization](#customization)
## Installation
Install in editable mode
```bash
pip install -e .
```
## Usage
The command is `m2m.py` in your `$PATH` after installation.
Take a look at the examples in `scripts/*.slurm`.
For all command line options look at `map2map/args.py` or do `m2m.py -h`.
### Data
Put each field in one npy file.
Structure your data to start with the channel axis and then the spatial
dimensions, e.g. `(2, 64, 64)` for a 2D vector field of size `64^2` and
`(1, 32, 32, 32)` for a 3D scalar field of size `32^3`.
Specify the data path with
[glob patterns](https://docs.python.org/3/library/glob.html).
During training, pairs of input and target fields are loaded.
Both input and target data can consist of multiple fields, which are
then concatenated along the channel axis.
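
For example, a 3D scalar field can be prepared with NumPy like this (a sketch; the filename and field values are illustrative):

```python
import numpy as np

# A made-up 3D scalar field of size 32^3, stored channel-first as
# map2map expects: shape (1, 32, 32, 32).
field = np.random.rand(1, 32, 32, 32).astype(np.float32)
np.save("field_0000.npy", field)

# Loading it back preserves the (channel, *spatial) layout.
loaded = np.load("field_0000.npy")
print(loaded.shape)  # (1, 32, 32, 32)
```

A glob pattern such as `"field_*.npy"` would then match many such files.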
#### Data cropping
If a pair of input and target fields is too large to fit in GPU
memory, we can crop parts of them to form multiple pairs of samples.
Each field can be cropped multiple times, along each dimension.
See `--crop`, `--crop-start`, `--crop-stop`, and `--crop-step`.
The total sample size is the number of input and target pairs multiplied
by the number of cropped samples per pair.
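
As an arithmetic sketch with made-up numbers (the flag names mirror the options above, but the exact crop-position semantics here are an assumption):

```python
# Illustrative only: a 128^3 field cropped into 32^3 samples with a
# stride of 32 along each dimension, positions running from 0 to the
# field size (roughly what --crop, --crop-start, --crop-stop, and
# --crop-step control).
field_size, crop, step = 128, 32, 32
crops_per_dim = (field_size - crop) // step + 1  # 4
crops_per_field = crops_per_dim ** 3             # 64 crops per 3D field

# With, say, 100 input/target pairs, the total sample size is
num_pairs = 100
total_samples = num_pairs * crops_per_field      # 6400
```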
#### Data padding
There are two types of padding to distinguish here.
We differentiate the padding applied inside convolution layers from
our explicit data padding, and refer to the former as conv-padding.
Convolution preserves translational invariance, but conv-padding breaks
it, except for periodic conv-padding, which is not feasible at runtime
for large 3D fields.
Therefore we recommend convolution without conv-padding.
Without conv-padding, the output is smaller than the input, and thus
smaller than the target if the target size equals the input size,
making loss computation inefficient.
To solve this, we can pad the input before feeding it into the model,
adjusting the pad size so that the output size equals or approximates
the target size.
The proper pad size can be calculated from the model architecture.
Padding works for cropped samples, or for samples with periodic
boundary conditions.
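
The bookkeeping can be sketched as follows, assuming a model built only from unpadded ("valid") convolutions, like the two-convolution example in the Model section:

```python
# Each unpadded k x k convolution shrinks every spatial dimension by
# k - 1, so the total shrinkage across the model tells us how much to
# pad the input for the output to match the target size.
kernel_size = 3
num_convs = 2
total_shrink = num_convs * (kernel_size - 1)  # 4 pixels per dimension

# Pad symmetrically: half of the shrinkage on each side.
pad_per_side = total_shrink // 2              # 2
```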
#### Data loading, sampling, and page caching
The difference in speed between disks and GPUs makes training an
IO-bound job.
Stochastic optimization exacerbates the situation, especially for large
3D data *with multiple crops per field*.
In this case, we can use the `--div-data` option to divide the field
files among GPUs, so that with multiple nodes each node only needs to
load part of the data.
Data division is shuffled every epoch.
Crops within each field can be further randomized within a distance
relative to the field, controlled by `--div-shuffle-dist`.
Setting it to 0 turns off this randomization, and setting it to N
limits the shuffling to within a distance of N files.
With both `--div-data` and `--div-shuffle-dist`, each GPU only needs
to work on about N files at a time, with those files kept in the Linux
page cache.
This is especially useful when the amount of data exceeds the CPU
memory size.
#### Data normalization
Input and target (output) data can be normalized by functions defined in
`map2map/data/norms/`.
Also see [Customization](#customization).
### Model
Find the models in `map2map/models/`.
Modify the existing models, or write new models somewhere and then
follow [Customization](#customization).
```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, in_chan, out_chan, mid_chan=32, kernel_size=3,
                 negative_slope=0.2, **kwargs):
        super().__init__()
        self.conv1 = nn.Conv2d(in_chan, mid_chan, kernel_size)
        self.act = nn.LeakyReLU(negative_slope)
        self.conv2 = nn.Conv2d(mid_chan, out_chan, kernel_size)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act(x)
        x = self.conv2(x)
        return x
```
The model `__init__` requires two positional arguments, the number of
input and output channels.
Other hyperparameters can be specified as keyword arguments, including
`scale_factor`, useful for super-resolution tasks.
Note that `**kwargs` is necessary when `scale_factor` is not used,
because `scale_factor` is always passed when instantiating a model.
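
This convention can be sketched with a plain-Python stand-in (`Net` below only mimics the example signature; the actual instantiation happens inside map2map):

```python
# Hypothetical stand-in with the same constructor signature as the
# example model: two positional channel counts, keyword
# hyperparameters, and **kwargs to absorb scale_factor when unused.
class Net:
    def __init__(self, in_chan, out_chan, mid_chan=32, kernel_size=3,
                 negative_slope=0.2, **kwargs):
        self.in_chan, self.out_chan = in_chan, out_chan
        self.mid_chan = mid_chan
        self.extra = kwargs  # scale_factor lands here if not declared

model = Net(3, 3, mid_chan=64, scale_factor=1)
print(model.extra)  # {'scale_factor': 1}
```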
### Training
#### Files generated
* `*.out`: job stdout and stderr
* `state_{i}.pt`: training state after the i-th epoch including the
model state
* `checkpoint.pt`: symlink to the latest state
* `runs/`: directories of tensorboard logs
#### Tracking
Install tensorboard and launch it with
```bash
tensorboard --logdir PATH --samples_per_plugin images=IMAGES --port PORT
```
* Use `.` as `PATH` in the training directory, or use the path to some parent
directory for tensorboard to search recursively for multiple jobs.
* Show `IMAGES` images, or all of them by setting it to 0.
* Pick a free `PORT`. For remote jobs, do ssh port forwarding.
### Customization
Models, criteria, optimizers, and data normalizations can be customized
without modifying map2map.
They can be implemented as callbacks in a user directory, which is then
passed via `--callback-at`.
The default locations are searched before the callback directory, so be
aware of name collisions.
The default locations are
* models: `map2map/models/`
* criteria: `torch.nn`
* optimizers: `torch.optim`
* normalizations: `map2map/data/norms/`
This approach is good for experimentation.
For example, one can experiment with a model `Bar` in `path/to/foo.py`
by calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
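
A minimal sketch of such a callback file (assuming the same constructor convention as the built-in models; the module and class names match the example invocation above, while the layer inside is just a trivial placeholder):

```python
# path/to/foo.py -- user callback module picked up via --callback-at
import torch.nn as nn


class Bar(nn.Module):
    def __init__(self, in_chan, out_chan, **kwargs):
        super().__init__()
        # 1x1x1 convolution: a trivial 3D model that maps input
        # channels to output channels without shrinking the field.
        self.conv = nn.Conv3d(in_chan, out_chan, 1)

    def forward(self, x):
        return self.conv(x)
```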