Deep Learning model too large #15516
hasithjp asked this question in Technical Notes
Problem
H2O Deep Learning can trigger an internal limitation of H2O: the maximum size of a single object in the distributed key-value (K-V) store that forms the core of H2O is 256 MB. Once the Deep Learning model reaches that size, this error occurs. The cause is that the Deep Learning model is currently stored as one large object instead of being split into partial pieces. Cutting it into one piece per hidden layer would not solve the issue either; a single weight matrix would have to be cut into multiple pieces, which is cumbersome to implement. That said, a model of that size would also take a very long time to train.
Note: This limit has nothing to do with the number of rows of the training data (only the number of columns matters, since that determines the size of the first hidden layer's weight matrix), nor with the available RAM or the maximum allowed heap memory (which is checked separately). It also has nothing to do with the number of nodes, threads, etc. It is purely a function of the model complexity; see the next section.
What affects the model size?
It is mainly the total number of weights and biases, multiplied by an overhead factor of x1, x2, or x3: x1 when momentum_start == 0 && momentum_stable == 0 and no adaptive learning rate is used, x2 when momentum > 0, and x3 when the adaptive learning rate (ADADELTA) is used. On top of that, there is some small overhead for model metrics, statistics, counters, etc.
The total number of weights is determined directly by the fully connected layers:
The number of input columns (after automatic one-hot encoding of categoricals)
The size of the hidden layers
The number of output neurons (#classes)
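As a rough sketch, the parameter count and resulting in-memory size can be estimated from these three quantities (hypothetical layer sizes; assuming 4-byte floats and the x1/x2/x3 overhead factors described above):

```python
# Sketch: estimate the stored size of a fully connected H2O DL model.
# Layer sizes here are hypothetical; overhead factors follow the rule above.

def count_parameters(layer_sizes):
    """Total weights + biases for fully connected layers.

    layer_sizes: [input_cols, hidden_1, ..., hidden_n, output_neurons]
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input neuron
    return weights + biases

def estimate_model_bytes(layer_sizes, overhead_factor):
    # overhead_factor: 1 (no momentum), 2 (momentum > 0), 3 (adaptive rate)
    # Assumes weights are stored as 4-byte floats.
    return count_parameters(layer_sizes) * overhead_factor * 4

# Example: 5000 input columns (after one-hot encoding), one hidden
# layer of 5000 neurons, 10 output classes -> ~25M parameters.
params = count_parameters([5000, 5000, 10])
```

With ADADELTA (x3) such a model exceeds the 256 MB limit; with plain SGD and no momentum (x1) it fits.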
Failing example (~25M floats * 3 for ADADELTA > 256MB)
Working example (~25M floats * 1 without ADADELTA and no momentum < 256MB)
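The arithmetic behind these two cases can be checked directly (a sketch assuming 4-byte floats):

```python
# ~25M floats, compared against the 256 MB K-V store value limit.
LIMIT = 256 * 1024 * 1024

floats = 25_000_000
adadelta_bytes = floats * 3 * 4  # x3 overhead for ADADELTA -> ~300 MB
plain_bytes    = floats * 1 * 4  # x1 without ADADELTA/momentum -> ~100 MB

assert adadelta_bytes > LIMIT  # fails with "Model is too large"
assert plain_bytes < LIMIT     # trains without hitting the limit
```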
Output:
java.lang.IllegalArgumentException: Model is too large
For more information visit:
http://jira.h2o.ai/browse/TN-5
at hex.deeplearning.DeepLearningModel.&lt;init&gt;(DeepLearningModel.java:424)
at hex.deeplearning.DeepLearning$DeepLearningDriver.buildModel(DeepLearning.java:201)
at hex.deeplearning.DeepLearning$DeepLearningDriver.compute2(DeepLearning.java:171)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1005)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
barrier onExCompletion for hex.deeplearning.DeepLearning$DeepLearningDriver@5205f0fd
Solution
The current workaround is to reduce the number of hidden neurons, or to reduce the number of input features (especially categorical ones, since one-hot encoding multiplies the column count).
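To see why this works, note that the dominant weight matrix scales with the product of adjacent layer sizes, so shrinking the hidden layer shrinks the model quickly (illustrative, hypothetical sizes; 4-byte floats assumed):

```python
# Sketch of the effect of halving a hidden layer on the largest matrix.
LIMIT = 256 * 1024 * 1024

cols = 5000                # input columns after one-hot encoding
too_big = cols * 5000      # hidden=5000 -> 25M weights in one matrix
fits    = cols * 2500      # hidden=2500 -> 12.5M weights

# Even with the x3 ADADELTA overhead, the smaller model stays under the limit:
assert too_big * 3 * 4 > LIMIT
assert fits * 3 * 4 < LIMIT
```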
JIRA Issue Migration Info
Jira Issue: TN-5
Assignee: Arno Candel
Reporter: Arno Candel
State: Closed
Relates to: #13925