skhubness.neighbors.RandomProjectionTree

class skhubness.neighbors.RandomProjectionTree(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)

Wrapper for using annoy.AnnoyIndex

Annoy is an approximate nearest neighbor library that builds a forest of random projection trees.
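The idea of a forest of random projection trees can be sketched in plain Python. This is an illustrative toy, not Annoy's implementation: the helper names (`build_tree`, `query_tree`, `query_forest`), the `leaf_size` parameter, and the splitting rule (a hyperplane equidistant from two randomly chosen points) are assumptions made for the example. Each tree partitions the data recursively; a query collects the leaf it falls into in every tree and ranks those candidates by exact Euclidean distance.

```python
import random


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def build_tree(points, ids, rng, leaf_size=4):
    """Recursively split the index list `ids` with random hyperplanes."""
    if len(ids) <= leaf_size:
        return ids  # leaf node: a plain list of point indices
    # Pick two indexed points; the splitting hyperplane is the set of
    # locations equidistant from both of them.
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = sum(n * (x + y) / 2 for n, x, y in zip(normal, a, b))
    left = [i for i in ids if _dot(normal, points[i]) <= offset]
    left_set = set(left)
    right = [i for i in ids if i not in left_set]
    if not left or not right:
        return ids  # degenerate split (e.g. duplicate points): keep as leaf
    return (normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))


def query_tree(node, q):
    """Descend from the root to the leaf on whose side of each split q lies."""
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are lists
        normal, offset, left, right = node
        node = left if _dot(normal, q) <= offset else right
    return node


def query_forest(forest, points, q, k):
    """Union the candidate leaves of all trees, then rank by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(query_tree(tree, q))

    def dist(i):
        return sum((x - y) ** 2 for x, y in zip(points[i], q)) ** 0.5

    return sorted(candidates, key=dist)[:k]


rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(10)]
neighbors = query_forest(forest, points, points[0], k=5)
```

Adding trees enlarges the candidate set, which is why a larger forest tends to improve recall at the cost of build time and memory.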

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

metric: str, default = ‘euclidean’

Distance metric. Allowed values are “angular”, “euclidean”, “manhattan”, “hamming”, and “dot”.

n_trees: int, default = 10

Build a forest of n_trees trees. More trees give higher precision when querying, but increase build time and index size.

search_k: int, default = -1

A query will inspect search_k nodes. A larger value gives more accurate results, but takes longer.

mmap_dir: str, default = ‘auto’

Memory-map the index to the given directory. This is required to make the class pickleable. If None, keep everything in main memory (the index is then not pickleable). If mmap_dir is a string, it is interpreted as the directory in which to store the index. If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux.

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show a tqdm progress bar during indexing and querying.
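The three mmap_dir modes described above can be illustrated with a small standard-library sketch. The function name `resolve_index_dir` and the `rptree_` prefix are hypothetical, and the exact checks the class performs are not documented here, so this is only one plausible reading of the parameter description.

```python
import os
import tempfile


def resolve_index_dir(mmap_dir):
    """Hypothetical helper: map an mmap_dir setting to a directory or None."""
    if mmap_dir is None:
        return None  # keep the index in main memory only (not pickleable)
    if mmap_dir == 'auto':
        shm = '/dev/shm'
        # Assumption: prefer the RAM-backed /dev/shm when it exists and is
        # writable, otherwise fall back to the system default temp location.
        base = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None
        return tempfile.mkdtemp(prefix='rptree_', dir=base)
    return mmap_dir  # any other string is taken as the target directory


auto_dir = resolve_index_dir('auto')
```

Storing the index on disk (or in shared memory) means a pickled object only needs to carry the file location rather than the index itself, which is consistent with the note that a purely in-memory index is not pickleable.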

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([n_candidates, metric, n_trees, …])

Initialize self.

fit(X[, y])

Build the annoy.AnnoyIndex and insert data from X.

get_params([deep])

Get parameters for this estimator.

kneighbors([X, n_candidates, return_distance])

Retrieve k nearest neighbors.

set_params(**params)

Set the parameters of this estimator.

Attributes

valid_metrics

fit(X, y=None) → skhubness.neighbors.random_projection_trees.RandomProjectionTree

Build the annoy.AnnoyIndex and insert data from X.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: RandomProjectionTree

An instance of RandomProjectionTree with a built index

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
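The effect of return_distance on the return value can be shown with a brute-force stand-in written in plain Python. The function `kneighbors_sketch` is hypothetical and performs an exact search rather than an Annoy query. It returns distances first and indices second, following the wording above, and it does not exclude a query point from its own neighbor list when X is None; the library may handle that case differently.

```python
import math


def kneighbors_sketch(indexed, X=None, n_candidates=2, return_distance=True):
    """Exact nearest-neighbor search that mimics the documented return shapes."""
    queries = indexed if X is None else X  # None: query the indexed objects
    all_dist, all_ind = [], []
    for q in queries:
        ranked = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p))), i)
            for i, p in enumerate(indexed)
        )[:n_candidates]
        all_dist.append([d for d, _ in ranked])
        all_ind.append([i for _, i in ranked])
    # Two results when distances are requested, otherwise the indices alone.
    return (all_dist, all_ind) if return_distance else all_ind


indexed = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
dist, ind = kneighbors_sketch(indexed, X=[[0.9, 0.0]], n_candidates=2)
ind_only = kneighbors_sketch(indexed, X=[[0.9, 0.0]], n_candidates=2,
                             return_distance=False)
```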

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params: dict

Estimator parameters.

Returns
self: object

Estimator instance.
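The <component>__<parameter> convention can be illustrated with two toy classes. `Component` and `Wrapper` are hypothetical names, and the code is a simplified sketch of the routing idea, not scikit-learn's implementation: a key containing a double underscore is split once, and the remainder is forwarded to the named sub-object.

```python
class Component:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    def __init__(self, step=None, verbose=0):
        self.step = step
        self.verbose = verbose

    def set_params(self, **params):
        for key, value in params.items():
            # 'step__alpha' -> name='step', sub_key='alpha'
            name, delim, sub_key = key.partition('__')
            if delim:
                # Nested parameter: delegate to the contained object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


wrapper = Wrapper(step=Component()).set_params(verbose=1, step__alpha=0.5)
```

Because set_params returns the instance itself, calls can be chained directly after construction, as in the last line.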