# map2map
Neural network emulators to transform field/map data
* [Installation](#installation)
* [Usage](#usage)
    * [Data](#data)
        * [Data normalization](#data-normalization)
    * [Model](#model)
    * [Training](#training)
        * [Files generated](#files-generated)
        * [Tracking](#tracking)
    * [Customization](#customization)
## Installation
Install in editable mode:
```bash
pip install -e .
```
## Usage
After installation, the `m2m.py` command is available in your `$PATH`.
Take a look at the examples in `scripts/*.slurm`.
For all command-line options, see `map2map/args.py` or run `m2m.py -h`.
### Data
Store each field in its own `.npy` file.
Structure your data to start with the channel axis and then the spatial
dimensions.
For example, a 2D vector field of size `64^2` should have shape
`(2, 64, 64)`.
Specify the data path with
[glob patterns](https://docs.python.org/3/library/glob.html).
During training, pairs of input and target fields are loaded.
Both input and target data can consist of multiple fields, which are
then concatenated along the channel axis.
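
As a minimal sketch of the layout described above, the snippet below writes two fields to `.npy` files with the channel axis first, finds them with glob patterns, and concatenates them along the channel axis. The file names, the temporary directory, and the two example fields are made up for illustration; the snippet only relies on `numpy.save`, `numpy.load`, `numpy.concatenate`, and `glob.glob`, and is not taken from map2map itself.

```python
import glob
import os
import tempfile

import numpy as np

# Hypothetical example data: a 2D vector field (2 channels) and a 2D scalar
# field (1 channel), both on a 64^2 grid, with the channel axis first.
vector_field = np.zeros((2, 64, 64), dtype=np.float32)
scalar_field = np.ones((1, 64, 64), dtype=np.float32)

# One field per file; the file names are illustrative only.
data_dir = tempfile.mkdtemp()
np.save(os.path.join(data_dir, "vec_000.npy"), vector_field)
np.save(os.path.join(data_dir, "sca_000.npy"), scalar_field)

# Glob patterns select the files belonging to each field.
vec_files = sorted(glob.glob(os.path.join(data_dir, "vec_*.npy")))
sca_files = sorted(glob.glob(os.path.join(data_dir, "sca_*.npy")))

# Multiple fields are concatenated along the channel axis (axis 0).
sample = np.concatenate(
    [np.load(vec_files[0]), np.load(sca_files[0])], axis=0
)
print(sample.shape)  # (3, 64, 64)
```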
If a pair of input and target fields is too large to fit in GPU memory,
it can be cropped into smaller pairs of samples (see `--crop`).
Each field can be cropped multiple times along each dimension,
controlled by the spacing between two adjacent crops (see `--step`).
The total sample size is the number of input and target pairs multiplied
by the number of cropped samples per pair.
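
The sample count above can be estimated with a few lines of arithmetic. This is a sketch under a stated assumption: crops are anchored every `step` voxels along each dimension, starting at 0 and spanning the full field. How map2map treats the field boundaries is not described here, so check the actual number against a real run. The function name `count_samples` is hypothetical.

```python
import math


def count_samples(num_pairs, field_size, step):
    """Estimate the total number of training samples.

    Assumes crops are anchored every `step[d]` voxels along dimension d,
    starting at 0 and covering the full extent `field_size[d]`.
    """
    crops_per_pair = math.prod(
        math.ceil(size / s) for size, s in zip(field_size, step)
    )
    return num_pairs * crops_per_pair


# 10 pairs of 512^3 fields, crops anchored every 128 voxels per dimension:
# 4 anchors per dimension, 4**3 = 64 crops per pair, 640 samples in total.
print(count_samples(10, (512, 512, 512), (128, 128, 128)))  # 640
```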
#### Data normalization
Input and target (output) data can be normalized by functions defined in
`map2map/data/norms/`.
Also see [Customization](#customization).
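
To give a feel for what such a function might look like, here is a hypothetical invertible normalization written with NumPy. The function name, the in-place convention, and the `undo` flag are assumptions made for illustration, not the interface map2map requires; check the existing functions in the norms directory mentioned above for the expected signature.

```python
import numpy as np


def log_norm(x, undo=False):
    """Hypothetical normalization: y = log(1 + x), inverted with undo=True.

    Modifies `x` in place and returns it.
    """
    if undo:
        np.expm1(x, out=x)  # inverse transform: x = exp(y) - 1
    else:
        np.log1p(x, out=x)  # forward transform: y = log(1 + x)
    return x


field = np.array([[0.0, 1.0, 3.0]])
normalized = log_norm(field.copy())
restored = log_norm(normalized.copy(), undo=True)
print(np.allclose(restored, field))  # True
```

Keeping the transform invertible makes it possible to map model outputs back to physical units.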
### Model
Find the models in `map2map/models/`.
Modify the existing models, or write new models somewhere and then
follow [Customization](#customization).
### Training
#### Files generated
* `*.out`: job stdout and stderr
* `state_{i}.pt`: training state after the i-th epoch including the
  model state
* `checkpoint.pt`: symlink to the latest state
* `runs/`: directories of tensorboard logs
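
Because `checkpoint.pt` is a symlink to the latest `state_{i}.pt`, the most recent epoch can be read off the link target without loading any state. The helper below is a sketch using only the standard library; the name `latest_epoch` and the throwaway demo directory are hypothetical.

```python
import os
import re
import tempfile


def latest_epoch(train_dir):
    """Return the epoch number of the state that `checkpoint.pt` points to."""
    target = os.path.realpath(os.path.join(train_dir, "checkpoint.pt"))
    match = re.fullmatch(r"state_(\d+)\.pt", os.path.basename(target))
    if match is None:
        raise ValueError(f"unexpected checkpoint target: {target}")
    return int(match.group(1))


# Demo on a throwaway directory that mimics the layout listed above.
train_dir = tempfile.mkdtemp()
for i in (1, 2, 3):
    open(os.path.join(train_dir, f"state_{i}.pt"), "wb").close()
os.symlink("state_3.pt", os.path.join(train_dir, "checkpoint.pt"))

print(latest_epoch(train_dir))  # 3
```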
#### Tracking
Install tensorboard and launch it with
```bash
tensorboard --logdir PATH --samples_per_plugin images=IMAGES --port PORT
```
* Use `.` as `PATH` in the training directory, or use the path to some parent
  directory for tensorboard to search recursively for multiple jobs.
* Show `IMAGES` images, or all of them by setting it to 0.
* Pick a free `PORT`. For remote jobs, use SSH port forwarding.
### Customization
Models, criteria, optimizers and data normalizations can be customized
without modifying map2map.
They can be implemented as callbacks in a user directory which is then
passed by `--callback-at`.
The default locations are searched before the callback directory, so be
aware of name collisions.
This approach is good for experimentation.
For example, one can play with a model `Bar` in `path/to/foo.py`, by
calling `m2m.py` with `--model foo.Bar --callback-at path/to`.
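
The lookup order, and why name collisions matter, can be illustrated with a small standard-library sketch. This is not map2map's actual implementation: the function `load_from_callbacks`, the `defaults` dict standing in for the built-in locations, and the name `unet.UNet` are all hypothetical.

```python
import importlib.util
import os
import tempfile


def load_from_callbacks(name, callback_at, defaults):
    """Resolve a dotted `name`, preferring `defaults` over `callback_at`.

    `defaults` is a plain dict standing in for the built-in locations.
    """
    if name in defaults:  # defaults are searched first and shadow callbacks
        return defaults[name]
    module_name, attr = name.rsplit(".", 1)
    path = os.path.join(callback_at, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)


# A throwaway callback directory with two user-defined modules.
callback_dir = tempfile.mkdtemp()
with open(os.path.join(callback_dir, "foo.py"), "w") as f:
    f.write("class Bar:\n    pass\n")
with open(os.path.join(callback_dir, "unet.py"), "w") as f:
    f.write("class UNet:\n    origin = 'user'\n")


class BuiltinUNet:
    origin = "default"


defaults = {"unet.UNet": BuiltinUNet}

Bar = load_from_callbacks("foo.Bar", callback_dir, defaults)
UNet = load_from_callbacks("unet.UNet", callback_dir, defaults)
print(Bar.__name__)  # Bar
print(UNet.origin)   # default
```

In this sketch the user's `unet.py` is silently ignored because a default with the same name exists, which is the kind of collision to watch for when naming callback modules.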