flopscope.numpy.random.Generator.multivariate_hypergeometric
fnp.random.Generator.multivariate_hypergeometric(self, colors, nsample, size=None, method='marginals')
Generate variates from a multivariate hypergeometric distribution.
Adapted from NumPy docs np.random.Generator.multivariate_hypergeometric
Multivariate hypergeometric; cost = numel(output).
The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution.
Choose nsample items at random without replacement from a
collection with N distinct types. N is the length of
colors, and the values in colors are the number of occurrences
of that type in the collection. The total number of items in the
collection is sum(colors). Each random variate generated by this
function is a vector of length N holding the counts of the
different types that occurred in the nsample items.
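As a worked illustration of what one variate means, here is the standard combinatorial probability of a count vector k with sum(k) == nsample. It is the product of comb(colors[i], k[i]) over all types, divided by comb(sum(colors), nsample). This is textbook combinatorics, not code taken from flopscope. The helper names `mvhg_pmf` and `all_variates` are hypothetical, and the brute-force enumeration is only meant for small inputs.

```python
from fractions import Fraction
from itertools import product
from math import comb


def mvhg_pmf(k, colors, nsample):
    """Exact probability of the count vector k (hypothetical helper).

    The numerator counts the ways to pick k[i] of the colors[i] items of
    each type; the denominator counts all ways to pick nsample items.
    """
    if len(k) != len(colors) or sum(k) != nsample:
        return Fraction(0)
    if any(ki < 0 or ki > ci for ki, ci in zip(k, colors)):
        return Fraction(0)
    numerator = 1
    for ki, ci in zip(k, colors):
        numerator *= comb(ci, ki)
    return Fraction(numerator, comb(sum(colors), nsample))


def all_variates(colors, nsample):
    """Enumerate every possible count vector (brute force, small inputs only)."""
    ranges = [range(min(c, nsample) + 1) for c in colors]
    return [k for k in product(*ranges) if sum(k) == nsample]


colors = [16, 8, 4]
total = sum(mvhg_pmf(k, colors, 6) for k in all_variates(colors, 6))
print(total)  # 1: the probabilities over all possible variates sum to one
```

The exact `Fraction` arithmetic avoids floating-point round-off, so the sum over all variates comes out exactly 1 (Vandermonde's identity).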
The name colors comes from a common description of the
distribution: it is the probability distribution of the number of
marbles of each color selected without replacement from an urn
containing marbles of different colors; colors[i] is the number
of marbles in the urn with color i.
Parameters
- colors:sequence of integers
The number of each type of item in the collection from which a sample is drawn. The values in colors must be nonnegative. To avoid loss of precision in the algorithm, sum(colors) must be less than 10**9 when method is "marginals".
- nsample:int
The number of items selected. nsample must not be greater than sum(colors).
- size:int or tuple of ints, optional
The number of variates to generate, either an integer or a tuple holding the shape of the array of variates. If the given size is, e.g., (k, m), then k * m variates are drawn, where one variate is a vector of length len(colors), and the return value has shape (k, m, len(colors)). If size is an integer, the output has shape (size, len(colors)). Default is None, in which case a single variate is returned as an array with shape (len(colors),).
- method:string, optional
Specify the algorithm that is used to generate the variates. Must be 'count' or 'marginals' (the default). See the Notes for a description of the methods.
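The parameter rules above can be collected into one small checker that also derives the output shape. Under the cost line at the top of this page (cost = numel(output)), the product of that shape is the charged cost. This is a minimal sketch: `variates_shape` is a hypothetical helper that does not call flopscope and is not the library's own validation code. The nonnegative-nsample check is an added assumption, not something the Parameters section states.

```python
from math import prod


def variates_shape(colors, nsample, size=None, method="marginals"):
    """Check the documented constraints and return the result's shape.

    Hypothetical helper mirroring the Parameters section; it does not
    call flopscope and is not the library's own validation code.
    """
    colors = list(colors)
    if any(c < 0 for c in colors):
        raise ValueError("colors must be nonnegative")
    if method not in ("count", "marginals"):
        raise ValueError("method must be 'count' or 'marginals'")
    if method == "marginals" and sum(colors) >= 10**9:
        raise ValueError("sum(colors) must be less than 10**9 for 'marginals'")
    # Assumption (not stated in the Parameters section): nsample >= 0.
    if nsample < 0:
        raise ValueError("nsample must be nonnegative")
    if nsample > sum(colors):
        raise ValueError("nsample must not be greater than sum(colors)")
    # size=None -> one variate; int -> (size,); tuple -> that batch shape.
    if size is None:
        batch = ()
    elif isinstance(size, int):
        batch = (size,)
    else:
        batch = tuple(size)
    return batch + (len(colors),)


shape = variates_shape([16, 8, 4], 6, size=(2, 2))
print(shape)        # (2, 2, 3)
print(prod(shape))  # 12, i.e. numel(output)
```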
Returns
- variates:ndarray
Array of variates drawn from the multivariate hypergeometric distribution.
See also
- hypergeometric: Draw samples from the (univariate) hypergeometric distribution.
Notes
The two methods do not return the same sequence of variates.
The "count" algorithm is roughly equivalent to the following NumPy-style code:

    choices = flops.repeat(flops.arange(len(colors)), colors)
    selection = flops.random.choice(choices, nsample, replace=False)
    variate = flops.bincount(selection, minlength=len(colors))

The "count" algorithm uses a temporary array of integers with length sum(colors).
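Here is a pure-Python sketch of the same "count" idea. It uses the standard library's `random.Random.sample` in place of `flops.random.choice` so that it runs without flopscope. That substitution is an assumption, so the sketch will not reproduce the library's random stream. The function name `count_variate` is hypothetical.

```python
import random


def count_variate(colors, nsample, rng):
    """Draw one multivariate hypergeometric variate with the "count" idea.

    Builds the temporary list of length sum(colors), samples nsample
    entries without replacement, then tallies how many of each type occur.
    """
    # Expand e.g. colors [3, 1, 5] into [0, 0, 0, 1, 2, 2, 2, 2, 2].
    choices = [i for i, c in enumerate(colors) for _ in range(c)]
    selection = rng.sample(choices, nsample)  # sampling without replacement
    variate = [0] * len(colors)
    for i in selection:
        variate[i] += 1
    return variate


rng = random.Random(12345)
v = count_variate([16, 8, 4], 6, rng)
print(v, sum(v))  # exact counts depend on the seed; the sum is always 6
```

The temporary `choices` list is what makes memory grow with sum(colors), which is why this approach suits collections with small counts.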
The "marginals" algorithm generates a variate by using repeated calls to the univariate hypergeometric sampler. It is roughly equivalent to:
    variate = flops.zeros(len(colors), dtype=flops.int64)
    # `remaining` is the cumulative sum of `colors` from the last
    # element to the first; e.g. if `colors` is [3, 1, 5], then
    # `remaining` is [9, 6, 5].
    remaining = flops.cumsum(colors[::-1])[::-1]
    for i in range(len(colors)-1):
        if nsample < 1:
            break
        # `hypergeometric(ngood, nbad, nsample)` stands for the
        # univariate hypergeometric sampler.
        variate[i] = hypergeometric(colors[i], remaining[i+1],
                                    nsample)
        nsample -= variate[i]
    variate[-1] = nsample

The default method is "marginals". In some cases (e.g. when colors contains relatively small integers), the "count" method can be significantly faster than the "marginals" method. If performance is important, test both methods with typical inputs to decide which works best.
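The "marginals" loop can also be written in pure Python so you can check that it works. The univariate sampler below is a swapped-in technique: it simulates the draws one at a time in O(nsample) steps, and it is not the algorithm flopscope or NumPy uses internally. Both function names (`hypergeometric`, `marginals_variate`) are hypothetical stand-ins.

```python
import random


def hypergeometric(ngood, nbad, nsample, rng):
    """Univariate hypergeometric draw by simulating draws one at a time.

    Simple O(nsample) stand-in for a real sampler: each step picks a
    "good" item with probability ngood / (ngood + nbad) among those left.
    Assumes nsample <= ngood + nbad.
    """
    good_drawn = 0
    for _ in range(nsample):
        if rng.randrange(ngood + nbad) < ngood:
            ngood -= 1
            good_drawn += 1
        else:
            nbad -= 1
    return good_drawn


def marginals_variate(colors, nsample, rng):
    """Draw one variate following the "marginals" loop in the Notes."""
    colors = list(colors)
    variate = [0] * len(colors)
    # remaining[i] = sum(colors[i:]); e.g. [3, 1, 5] -> [9, 6, 5].
    remaining = [sum(colors[i:]) for i in range(len(colors))]
    for i in range(len(colors) - 1):
        if nsample < 1:
            break
        # Type i is "good"; all later types together are "bad".
        variate[i] = hypergeometric(colors[i], remaining[i + 1], nsample, rng)
        nsample -= variate[i]
    variate[-1] = nsample  # whatever is left must come from the last type
    return variate


rng = random.Random(2024)
print(marginals_variate([16, 8, 4], 6, rng))
```

Each step conditions on the counts already drawn, so the product of these univariate probabilities equals the joint multivariate hypergeometric probability.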
Examples
>>> colors = [16, 8, 4]
>>> seed = 4861946401452
>>> gen = flops.random.Generator(flops.random.PCG64(seed))
>>> gen.multivariate_hypergeometric(colors, 6)
array([5, 0, 1])
>>> gen.multivariate_hypergeometric(colors, 6, size=3)
array([[5, 0, 1],
       [2, 2, 2],
       [3, 3, 0]])
>>> gen.multivariate_hypergeometric(colors, 6, size=(2, 2))
array([[[3, 2, 1],
        [3, 2, 1]],
       [[4, 1, 1],
        [3, 2, 1]]])
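You can sanity-check the example outputs above without flopscope. The shape should match the documented (size..., len(colors)) rule, and every innermost vector should sum to nsample without exceeding colors. The helpers `check_variates` and `shape_of` are hypothetical and work on the printed arrays copied into nested Python lists. They check invariants only, not randomness.

```python
def check_variates(arr, colors, nsample):
    """Recursively verify that every innermost vector is a valid variate."""
    if arr and isinstance(arr[0], list):
        return all(check_variates(sub, colors, nsample) for sub in arr)
    return (
        len(arr) == len(colors)
        and sum(arr) == nsample
        and all(0 <= k <= c for k, c in zip(arr, colors))
    )


def shape_of(arr):
    """Shape of a regular nested list, read off the first element per level."""
    shape = []
    while isinstance(arr, list):
        shape.append(len(arr))
        arr = arr[0]
    return tuple(shape)


colors = [16, 8, 4]
single = [5, 0, 1]                                  # size=None
batch = [[5, 0, 1], [2, 2, 2], [3, 3, 0]]           # size=3
grid = [[[3, 2, 1], [3, 2, 1]], [[4, 1, 1], [3, 2, 1]]]  # size=(2, 2)
for out in (single, batch, grid):
    print(shape_of(out), check_variates(out, colors, 6))
# (3,) True
# (3, 3) True
# (2, 2, 3) True
```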