[FIX] Naive Bayes: Ignore existing classes in Laplacian smoothing #3575
Issue
As suggested by @lanzagar, issue #2943 can be fixed by ignoring empty classes in Laplacian smoothing.
For example, say that `y` has values `male`, `female`, `yes` and `no` because `y` appeared in two different data sets loaded in the same session. Of course, only two values appear in each set. With this PR, only 2 (the number of classes that actually occur) is added to the denominators when computing probabilities, instead of 4, and the probabilities of empty classes are set to 0.

Variable reuse itself remains a large open issue.
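A minimal sketch of the smoothing rule described above, applied to class priors. It is not the Orange implementation: the function name `smoothed_class_priors` and the counts are hypothetical. Only non-empty classes receive the +1 pseudo-count, the denominator grows by the number of non-empty classes, and empty classes get probability 0.

```python
import numpy as np


def smoothed_class_priors(class_counts):
    """Laplace-smoothed class priors that ignore empty classes (hypothetical helper)."""
    counts = np.asarray(class_counts, dtype=float)
    nonempty = counts > 0
    k = np.count_nonzero(nonempty)  # number of classes that actually occur
    priors = np.zeros_like(counts)  # empty classes keep probability 0
    priors[nonempty] = (counts[nonempty] + 1) / (counts.sum() + k)
    return priors


def naive_smoothed_priors(class_counts):
    """Plain Laplace smoothing over all declared values, for comparison."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts + 1) / (counts.sum() + len(counts))


# Hypothetical counts for y's values: male, female, yes, no.
# Only the first two occur in this data set.
counts = [5, 3, 0, 0]
print(smoothed_class_priors(counts).tolist())  # [0.6, 0.4, 0.0, 0.0]
print(naive_smoothed_priors(counts).tolist())  # 6/12, 4/12, 1/12, 1/12
```

With plain smoothing, `yes` and `no` would each receive a prior of 1/12 even though they never occur in this data set; ignoring them adds only 2 to the denominator.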
Description of changes
This also required a minor change in how probabilities are computed: instead of computing `exp(log(class_probs) + sum(conditional_probs))`, the code now computes `class_probs * exp(sum(conditional_probs))`, because `class_probs` can now be 0 and `log(0)` is undefined (NumPy returns `-inf` with a divide-by-zero warning). This should not affect numerical stability.
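A sketch contrasting the two formulations, assuming (as the expressions above suggest) that `conditional_probs` holds log conditional probabilities. The function names, the array shapes `(n_attributes, n_classes)` and all numbers are hypothetical illustrations, not Orange's actual code. The rewritten form multiplies by the prior directly, so a zero prior yields a zero score without ever taking its logarithm.

```python
import numpy as np


def class_scores(class_probs, log_cond_probs):
    """Rewritten form: prior * exp(sum of log conditionals).

    class_probs: shape (n_classes,), may contain zeros for empty classes.
    log_cond_probs: shape (n_attributes, n_classes), log P(x_j | c)
        for the attribute values of a single instance.
    """
    return class_probs * np.exp(log_cond_probs.sum(axis=0))


def class_scores_via_log(class_probs, log_cond_probs):
    """Previous form: exp(log(prior) + sum of log conditionals).

    A zero prior makes np.log return -inf and emit a RuntimeWarning.
    """
    return np.exp(np.log(class_probs) + log_cond_probs.sum(axis=0))


# Hypothetical priors in which the last two classes are empty.
priors = np.array([0.6, 0.4, 0.0, 0.0])
log_cond = np.log(np.array([[0.5, 0.2, 0.5, 0.5],
                            [0.4, 0.5, 0.5, 0.5]]))

scores = class_scores(priors, log_cond)
posterior = scores / scores.sum()
print(posterior.round(3).tolist())  # [0.75, 0.25, 0.0, 0.0]
```

Both functions produce the same scores here, but only the first avoids evaluating `log(0)`.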