Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Its main contribution is reducing the implementation complexity of data-parallel neural network training by more than 90%, and providing components, with near-zero implementation complexity, for running model-parallel training on all, or only selected, fully-connected layers.
See Thesis insert link
Install Torch 7.0 and TorchMPI
Note: TorchMPI must be built using OpenMPI 2.0.1
Additional Dependencies
- cutorch
- cunn
- torchnet
To install:
git clone https://github.com/ngrabaskas/Torch-Automatic-Distributed-Neural-Network.git
cd Torch-Automatic-Distributed-Neural-Network
luarocks make
To remove:
luarocks remove torchad_nn
If you wish to start MPI yourself, use the following commands. This is necessary if you want to use the manual synchronization function.
mpi = require('torchmpi')
mpi.start(true) -- true = use the GPU
The mpi handle must be passed to both the parallelize() and synchronizeModel() functions.
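The mpinn handle that appears alongside mpi below is not created by the commands above; assuming it refers to TorchMPI's torchmpi.nn module (as in TorchMPI's own examples), it can be obtained with:
mpinn = require('torchmpi.nn') -- assumed: TorchMPI's nn helpers, passed as the mpinn handle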
Load TorchAD-NN library
automation = require 'torchad_nn.datamodule'
Then, after loading your data and before pre-processing, call the automated parallelization function. For example:
----------------------------------------------------------------------
-- Load Dataset Example
data = torch.Tensor(50000, 3072)    -- 5 batches x 10000 images, 3*32*32 values each
labels = torch.Tensor(50000)
for i = 0,4 do
   subset = torch.load('cifar-10-batches-t7/data_batch_' .. (i+1) .. '.t7', 'ascii')
   data[{ {i*10000+1, (i+1)*10000} }] = subset.data:t()
   labels[{ {i*10000+1, (i+1)*10000} }] = subset.labels
end
size = data:size(1)                 -- number of training examples
----------------------------------------------------------------------
-- Call Automated Parallelization
data, labels, size = automation.parallelize(data, labels, model, size, mpi, mpinn, batchSize)
----------------------------------------------------------------------
-- preprocess/normalize data
...
The parallelize() function splits the dataset evenly across all nodes in the MPI handle. Synchronization then occurs automatically, as long as StochasticGradient:train() or model:backward() is used during training. The mpi and mpinn handles can be replaced with nil, in which case TorchAD-NN starts them automatically.
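For reference, a minimal training sketch relying on the automatic synchronization described above; it assumes the size returned by parallelize() is the number of local examples, and the trainset wrapper, criterion, and hyperparameters are illustrative uses of the standard nn.StochasticGradient trainer rather than anything specific to TorchAD-NN:
require 'nn'
-- wrap the local data shard in the {input, target} table format nn.StochasticGradient expects
trainset = {}
function trainset:size() return size end
for i = 1, size do
   trainset[i] = {data[i], labels[i]}
end
criterion = nn.ClassNLLCriterion()          -- illustrative; assumes the model outputs log-probabilities
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.001                -- illustrative value
trainer.maxIteration = 5                    -- illustrative value
trainer:train(trainset)                     -- gradients are synchronized across nodes automatically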
To perform synchronization manually:
-- turn off automatic synchronization by passing -1 as the batch size
data, labels, size = automation.parallelize(data, labels, model, size, mpi, mpinn, -1)
-- then, after each backward propagation, place the synchronize call
model:backward()
automation.synchronizeModel(model, mpi)
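Put together, a manual-synchronization training loop might look like the following sketch; the criterion, learning rate, and per-example loop are illustrative, and the only TorchAD-NN-specific call is automation.synchronizeModel(model, mpi), placed after backward propagation as described above:
require 'nn'
criterion = nn.ClassNLLCriterion()          -- illustrative; assumes the model outputs log-probabilities
learningRate = 0.001                        -- illustrative value
for i = 1, size do
   local input, target = data[i], labels[i]
   local output = model:forward(input)
   criterion:forward(output, target)
   model:zeroGradParameters()
   model:backward(input, criterion:backward(output, target))
   automation.synchronizeModel(model, mpi)  -- explicit synchronization after backward propagation
   model:updateParameters(learningRate)
end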
To use the model-parallel components, load the TorchAD-NN library
require 'torchad_nn'
Start MPI
mpi = require('torchmpi')
mpi.start(true) -- true = use the GPU
Use the new parallelized neural layer component names in place of the standard nn components.
-- old components
model:add(nn.Reshape(1024))
model:add(nn.Linear(1024, 2048))
model:add(nn.Tanh())
model:add(nn.Linear(2048,10))
-- new parallelized components
model:add(nn.MPInitialReshape(1024))
model:add(nn.MPInitialLinear(1024, 2048))
model:add(nn.MPTanh())
model:add(nn.MPBaseLinear(2048,10))
Five available components:
- MPInitialReshape()*
- MPInitialLinear()*
- MPBaseReshape()*
- MPBaseLinear()*
- MPTanh()
*Use the Initial components for MP layers at the top of the network and the Base components for MP layers below the top layers (see the sketch below).
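Putting the setup steps and the components together, the example network above might be assembled as in the following sketch; the nn.Sequential container and the layer sizes are carried over from the example rather than requirements of the library:
require 'nn'
require 'torchad_nn'
mpi = require('torchmpi')
mpi.start(true)   -- true = use the GPU

model = nn.Sequential()
model:add(nn.MPInitialReshape(1024))        -- Initial: MP layer at the top of the network
model:add(nn.MPInitialLinear(1024, 2048))   -- Initial: fully-connected layer at the top
model:add(nn.MPTanh())
model:add(nn.MPBaseLinear(2048, 10))        -- Base: MP layer below the top layers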