# map2map
Neural network emulators to transform field/map data
* [Installation](#installation)
* [Usage](#usage)
    * [Data](#data)
        * [Data cropping](#data-cropping)
        * [Data normalization](#data-normalization)
    * [Model](#model)
    * [Training](#training)
        * [Files generated](#files-generated)
        * [Tracking](#tracking)
    * [Customization](#customization)
## Installation
Install in editable mode from the repository root:
```bash
pip install -e .
```
## Usage
After installation, the `m2m.py` command is available in your `$PATH`.
See the examples in `scripts/*.slurm`.
For all command-line options, see `map2map/args.py` or run `m2m.py -h`.
### Data
Store each field in its own `.npy` file.
Structure your data to start with the channel axis and then the spatial
dimensions, e.g. `(2, 64, 64)` for a 2D vector field of size `64^2` and
`(1, 32, 32, 32)` for a 3D scalar field of size `32^3`.
Specify the data path with
[glob patterns](https://docs.python.org/3/library/glob.html).
During training, pairs of input and target fields are loaded.
Both input and target data can consist of multiple fields, which are
then concatenated along the channel axis.
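
The layout above can be illustrated with a minimal NumPy sketch. The file names (`vec_000.npy`, `sca_000.npy`) and the temporary directory are hypothetical, and the concatenation is done by hand here only to show the resulting shape; it is not map2map's own loading code.

```python
import glob
import os
import tempfile

import numpy as np

# Hypothetical directory and file names, used only for illustration.
data_dir = tempfile.mkdtemp()

# One field per npy file, with the channel axis first.
vector_2d = np.zeros((2, 64, 64), dtype=np.float32)  # 2D vector field, 64^2
scalar_2d = np.ones((1, 64, 64), dtype=np.float32)   # 2D scalar field, 64^2
np.save(os.path.join(data_dir, "vec_000.npy"), vector_2d)
np.save(os.path.join(data_dir, "sca_000.npy"), scalar_2d)

# A glob pattern selects the files of each field; sorting keeps pairs aligned.
vec_files = sorted(glob.glob(os.path.join(data_dir, "vec_*.npy")))
sca_files = sorted(glob.glob(os.path.join(data_dir, "sca_*.npy")))

# Multiple fields are concatenated along the channel axis (axis 0).
sample = np.concatenate([np.load(vec_files[0]), np.load(sca_files[0])], axis=0)
print(sample.shape)  # (3, 64, 64)
```
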
#### Data cropping
If a pair of input and target fields is too large to fit in GPU memory,
the fields can be cropped into smaller pairs of samples.
Each field can be cropped multiple times, along each dimension.
See `--crop`, `--crop-start`, `--crop-stop`, and `--crop-step`.
The total sample size is the number of input and target pairs multiplied
by the number of cropped samples per pair.
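
To make the sample count concrete, here is a small sketch that enumerates crop origins on a regular grid and multiplies by the number of pairs. It assumes the start, stop, and step values act like a Python `range` along each dimension; that reading is an assumption, so check `map2map/args.py` for the actual semantics of the flags. The helper `crop_anchors` and the numbers used are hypothetical.

```python
import itertools


def crop_anchors(start, stop, step):
    """Return every crop origin on a regular grid, one range per dimension."""
    ranges = [range(a, b, s) for a, b, s in zip(start, stop, step)]
    return list(itertools.product(*ranges))


# Assumed example: a 3D field of size 128^3 with an origin every 64 voxels.
anchors = crop_anchors(start=(0, 0, 0), stop=(128, 128, 128), step=(64, 64, 64))

crops_per_pair = len(anchors)  # 2 origins per dimension, so 2 * 2 * 2
num_pairs = 10                 # hypothetical number of input/target pairs
total_samples = num_pairs * crops_per_pair
print(crops_per_pair, total_samples)  # 8 80
```
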
#### Data normalization
Input and target (output) data can be normalized by functions defined in
`map2map/data/norms/`.
Also see [Customization](#customization).
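
As a rough illustration of what a normalization function might look like, the sketch below applies an invertible transform in place. The name `log1p_norm` and the `(x, undo=False)` signature are assumptions made for this example, and it uses NumPy arrays for simplicity; consult the existing functions in the norms directory for the interface map2map actually expects.

```python
import numpy as np


def log1p_norm(x, undo=False):
    """Hypothetical normalization: log(1 + x) in place, or its inverse."""
    if undo:
        np.expm1(x, out=x)  # inverse transform: exp(x) - 1
    else:
        np.log1p(x, out=x)  # forward transform: log(1 + x)
    return x


field = np.array([[0.0, 1.0, 9.0]])
original = field.copy()

log1p_norm(field)             # normalize in place
log1p_norm(field, undo=True)  # undo the normalization in place
print(np.allclose(field, original))  # True
```
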
### Model
Find the models in `map2map/models/`.
Modify the existing models, or write new models somewhere and then
follow [Customization](#customization).
### Training
#### Files generated
* `*.out`: job stdout and stderr
* `state_{i}.pt`: training state after the i-th epoch including the
model state
* `checkpoint.pt`: symlink to the latest state
* `runs/`: directories of tensorboard logs
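
If the `checkpoint.pt` symlink is unavailable (for example after copying files to a filesystem without symlink support), the latest state can be found from the file names. This is a minimal sketch relying only on the `state_{i}.pt` naming above; the helper `latest_state` is hypothetical and not part of map2map.

```python
import re
import tempfile
from pathlib import Path


def latest_state(run_dir):
    """Return the state_{i}.pt path with the largest epoch index, or None."""
    pattern = re.compile(r"state_(\d+)\.pt$")
    states = []
    for path in Path(run_dir).glob("state_*.pt"):
        match = pattern.match(path.name)
        if match:
            states.append((int(match.group(1)), path))
    if not states:
        return None
    # Compare epochs as integers so that state_10.pt ranks above state_2.pt.
    return max(states, key=lambda item: item[0])[1]


# Demonstration with empty placeholder files in a temporary directory.
run_dir = Path(tempfile.mkdtemp())
for epoch in (1, 2, 10):
    (run_dir / f"state_{epoch}.pt").touch()

print(latest_state(run_dir).name)  # state_10.pt
```
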
#### Tracking
Install tensorboard and launch it with
```bash
tensorboard --logdir PATH --samples_per_plugin images=IMAGES --port PORT
```
* Use `.` as `PATH` in the training directory, or use the path to some parent
directory for tensorboard to search recursively for multiple jobs.
* Show `IMAGES` images, or all of them by setting it to 0.
* Pick a free `PORT`. For remote jobs, use SSH port forwarding.
### Customization
Models, criteria, optimizers and data normalizations can be customized
without modifying map2map.
They can be implemented as callbacks in a user directory which is then
passed by `--callback-at`.
The default locations are searched before the callback directory, so be
aware of name collisions.
This approach is good for experimentation.
For example, one can play with a model `Bar` in `path/to/foo.py`, by
calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
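
To show how a dotted name such as `foo.Bar` can map to a class in `path/to/foo.py`, here is a sketch using the standard `importlib` machinery. It is an illustration of the idea only, not map2map's actual lookup code: the helper `load_callback` is hypothetical, the search of the default locations is not modeled, and the placeholder `Bar` is a plain class, whereas a real model would presumably be a PyTorch module.

```python
import importlib.util
import os
import tempfile


def load_callback(name, callback_at):
    """Load `module.attr` from `callback_at/module.py` (illustrative only)."""
    module_name, attr = name.rsplit(".", 1)
    path = os.path.join(callback_at, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)


# Stand-in for path/to/foo.py, written to a temporary directory.
callback_dir = tempfile.mkdtemp()
with open(os.path.join(callback_dir, "foo.py"), "w") as f:
    f.write("class Bar:\n    pass\n")

# Mirrors the command-line form `--model foo.Bar --callback-at path/to`.
Bar = load_callback("foo.Bar", callback_dir)
print(Bar.__name__)  # Bar
```
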