mesh sampling

tool to sample uniform random particles in polygon from the 1st try

tool to sample uniform random particles in polygon from the 1st try
- Overview
- tests
  - Sampling fit
  - Uniformness (Pearson’s chi-squared test)
    - Pearson’s chi-squared test
    - Test performance

Overview

The input for algorithm is the boost polygone and number of samples $N$ and output is the vector of near-uniform sampled coordinates $[x_i, y_i]_N$

$[x_i, y_i]_N = mesh\_ sampling(poly, N)$

Algorithm key concept

Triangulate polygone (earcut) $[t_i]_m = earcut(poly)$
Sample points distributed according to triangle sizes to each triangle
- Estimate sizes of triangles $[w_i]_m = [area(t_i)]_m$
- Construct weighted distribution from triangle weights
- Sample $N$ times from distribution of triangles size >> $[n_0, ..., n_i, ...]_m = discrete\_dist([w_i]_m, N)$
- Sample $n_i$ points in triangle $i$ for each triangle from polygone decomposition $[x_i, y_i]_{ni} = [triangle\_sample(x_i| t_i^a, t_i^b, t_i^c)]_{ni}$

Uniform sampling for triangle is a known algorithm:

Generate 2 variables from uniform distribution: $u0,u1$, $u_i \in [0.. 1]$

$(x_i| a, b, c) = a + u_0 \overline{ab} + + u_1 \overline{ac}$

Where $u_0 + u_1 \le 1$. Sampling procedure select random point with DOF 2 and map point from this probability space to triangle.

demo example: uniform samples in polygone

Proof of concept:

It is not obvious, how we can get uniform distribution by sampling from single triangle or from the set of them. Mathematically, it is proven that sampling from single triangle is uniform. However, if two random variables $u0,u1$ are dependent, we may obtain degraded solution. And for the complex case of many triangles in polygone decomposition, we have no prior belief in sampling uniformness, because of many computational involved.

To confirm our algorithm gives correct results, we have tests for uniformness and samples geometrical fit to desired area.

usage

There are two optional ways to use this sampling tool: as the dedicated tool with console input and output, or as a part of navigation library.

Build procedure

test_mesh_sampling_lib -> out/... is our executable, after running build.py from navigation-core/tools/mesh_sampling/, we have this simlink.

To build tests which are located in navigation-core/tools/mesh_sampling/lib/tests/mesh_sample_test we use build.py --tests from .../lib directory.

Dependencies:

Boost::geometry - most geometric operations
earcut.hpp - triangulation library

Visualization

For testing purpose, we added a visualization python script:

cd navigation-core/tools/mesh_sampling/helpers
mesh_sampling_ex 1000 | python3 plot.py

Command above shedules the example of tool usage in helpers directory mesh_sampling_ex >> pass to the visualizer

demo visualization of sampling to selected polygone

On figure above we see the debug visualization for sampling tool. We sample 1000 samples and see they all fit in region and are distributed near uniformly. Nodes are numbered.

tests

We test the sampling tool for:

sampling fit (number of outliers == 0)
uniformness (xi2 statistical test for uniformness)

cd navigation-core/tools/mesh_sampling/lib
build.py --test
test_mesh_sampling_lib

Sampling fit

create base polygone as in visualization demo
sample 1000 points in polygone
check if all points are in allowed area by boost::geometry::covered_by(Point(pt.x, pt.y), poly)

Uniformness (Pearson’s chi-squared test)

We test that samples distribution in polygone is uniform.

We use Pearson test and Xi2 criteria to prove that samples are distributed uniformly.

Pearson test works with groups, we group samples to discrete categories and analyze similarity between obtained and expected distributions.

Pearson’s chi-squared test

We use the Pearson’s chi-squared test in a form of test of goodness of fit. We calculate the similarity between samples distribution and expected distribution (more explanation below).

The similarity criteria and test statistic $\chi^2$ is the sum of normalized squared differences between the observed values and the expected value.

$$\chi^2 = \sum_i{\frac{O_i - E_i}{E_i}}$$

What is the expected and observed values in our context?

We have 1000 samples in 2d area of polygon. We group them by discrete coordinates along x axis. Number of groups ($'K'$) if the input parameter for the algorithm, we select (8, 18, 31) groups respectively.

We test for 1 dimensional case(only along x axis), because the sampling procedure is identical for both axes x and y. The groups can be of any shape, just rectangular shape is easier for test implementation, so we chose it.

We classify samples geometrically to groups:

Each group is represented with rectangle and covers some part of base polygone.
To estimate the area it covers, we calculate the intersection area. The area of intersection divided by total area is the weight of given group. The number of points sampled inside the area should be proportional to this number.
We calculate the number of samples inside the intersection area.

rolling-window sliding window

We estimate the number of samples using sliding window procedure. Each segment is a rectangle that covers some part of base polygone, we count the samples inside and move the polygone. This is how we sure to count all samples and cover all polygone area.

Our result is the discrete distribution $\{n_i, p_i\}_K$.

$K$ segments; $p_i$ can be considered as probability for sample to fall inn this group; $n_i$ point inside segment.

As we have $N$ samples, the expectation for the group size is $E_i = N \cdot p_i$.

Xi2 test uses degrees of freedom (DOF, DoF, nDof) as the parameter. Because one of our parameter $N$ is the constraint to the group sizes ($n_0 + n_1 + ... = N$), we have the reduction in the degrees of freedom $nDoF = K - 1$.

The test steps are the following:

calculate test statistic $\chi^2_{critical}$
determine degres of freedom
select desired significance level $\alpha$
Estimate xi2 critical value, in our case it is only an upper critical value (ucv) $ucv = f(\alpha, K)$
Compare $\chi ^{2}$ to the critical value from the chi-squared distribution
- if $\chi ^{2} < ucv$, the test is satisfied and the distribution is uniform
- else test conditions are not satisfied, distribution grouped by $K$ categories is not uniform to $\alpha$ level of significance

Why we use the significance level: The level of confidence $\gamma = (1 - \alpha)$ or the probability of not rejecting the null hypothesis given that it is true is not clearly the point of this test.

We are only searching for situation, where the probavility of null hypothesis given that it is false will be accepted is low. For this reason we use a single side test, where all 0.05 (5% of the sampling distribution) are allocated to the right side of xi2 test distribution.

We take $\chi^2$ data from tables with distribution constants

Optimal number of groups for the Peason test

The Pearson test work with statistics, and approximations requires some minimal number of data for good approximation. For a reasonable approximation, the rule is to have at least 5-10 values in each group. If we have $n_i<5$ for some of groups, the Xi2 test is not valid anymore and can may have unpredictable behaviour. The requirement is to have at least 4-5 samples in each group. The problem here is not with the test, but because sample mean and deviation are not accurate enough on a small groups.

We can either increase number of samples or decrease number of groups to fit this reqirement.

Our common application is to sample 1000 points at a single time, we want to check the algorithm performance for this scenario.

To estimate the maximum allowed number of segments we can use equality nSegs = std::round(1 + 3.2 * std::log(nSamples)); This is just the approximation that can be used:

>>> [int(1 + 3.2 * numpy.log(i)) for i in [100, 1000, 2000, 5000, 10000, 1000000]]
[15, 23, 25, 28, 30, 45]

Lets say we have 1000 samples, distributed around polygone in 30 groups. The average group size will be around 33, but if the shape of polygone is not uniform we can have 5-6-10 samples in minimal intersection. We just consider the possible problems here and choose number of samples to be 1000+ for the test. The reasonable number of groups will be about 15.

Test performance

The Pearson test is statistical, so for random input data, it may result in several false results. To avoid frequent test failures because of its random behaviour, we collect the cumulative statistics:

for 90%+ from 40 test runs, the distribution is uniform
- Pearson test is true for the samples discrete distribution with $K$ categories and significance level alpha
  - discrete distribution of samples along x axis in $K$ categories from $N$ samples drawn from polygone

The numbers above are given for reference.

Test statistics for defined input polygons, $nSamples = 1000$, $\alpha = 0.05$:

nDoF	$\chi^2_{critical}$	$p_1$ success rate	$p_2$ success rate	$p_3$ success rate
7	14.0671	98.2	97.9	92.3
17	27.5871	97.4	96.3	94.3
30	43.773	96.9	95.8	93.8

Table above shows the average success rate for Pearson test. With the significance level 0.05, 982 runs from 1000 were successful. We limit our test to have success rate about this average performance.

$17/20 < 90\% < 177/200$. If we limit our success rate to 90%, we can obtain bad results when the number of runs is small.

We tested performance for 1000 iterations, and we can say that repeatability of 90%+ with $\alpha = 0.05$ confidence level is accurate enough for our uniform random sampling.