skhubness.neighbors.RandomProjectionTree
class skhubness.neighbors.RandomProjectionTree(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)
Wrapper for using annoy.AnnoyIndex.
Annoy is an approximate nearest neighbor library that builds a forest of random projection trees.
- Parameters
- n_candidates: int, default = 5
Number of neighbors to retrieve
- metric: str, default = ‘euclidean’
Distance metric. Allowed values are “angular”, “euclidean”, “manhattan”, “hamming”, and “dot”.
- n_trees: int, default = 10
Build a forest of n_trees trees. More trees give higher precision when querying, but increase build time and index size.
- search_k: int, default = -1
Number of nodes to inspect per query. A larger value gives more accurate results but takes longer. With the default of -1, Annoy picks a value itself.
- mmap_dir: str, default = ‘auto’
Memory-map the index to the given directory. This is required to make the class pickleable. If None, keep everything in main memory (the index is then not pickleable). If a string, it is interpreted as a directory in which to store the index. If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux.
- n_jobs: int, default = 1
Number of parallel jobs
- verbose: int, default = 0
Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
- Attributes
- valid_metrics:
List of valid distance metrics/measures
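The trade-off behind n_trees can be illustrated with a toy forest in pure Python. This is a simplified sketch and not Annoy's implementation: the helper names (`build_tree`, `query_tree`, `forest_query`) are hypothetical, the split rule is reduced to "pick two indexed points and cut along the hyperplane equidistant from both", and search_k is not modelled at all. Each tree proposes the points that share a leaf with the query; the forest pools those candidates and ranks them by exact Euclidean distance, so more trees mean a larger candidate pool at a higher build and query cost.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_tree(points, indices, leaf_size, rng):
    """Recursively split `indices` with random hyperplanes (illustrative only)."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    # Pick two indexed points; split along the hyperplane equidistant from both.
    a, b = (points[i] for i in rng.sample(indices, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = dot(normal, [(x + y) / 2 for x, y in zip(a, b)])
    left = [i for i in indices if dot(normal, points[i]) >= offset]
    right = [i for i in indices if dot(normal, points[i]) < offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": indices}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, leaf_size, rng),
        "right": build_tree(points, right, leaf_size, rng),
    }


def query_tree(node, q):
    """Descend to the single leaf on the query's side of every split."""
    while "leaf" not in node:
        side = "left" if dot(node["normal"], q) >= node["offset"] else "right"
        node = node[side]
    return node["leaf"]


def forest_query(points, forest, q, k):
    """Pool leaf candidates from all trees, then rank them by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(query_tree(tree, q))
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]


rng = random.Random(0)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
all_indices = list(range(len(points)))
# Eight trees here play the role of n_trees; each call draws different splits.
forest = [build_tree(points, all_indices, 10, rng) for _ in range(8)]
neighbors = forest_query(points, forest, points[0], k=5)
print(neighbors)
```

Because the result is approximate, a true neighbor is missed whenever no tree places it in the same leaf as the query; adding trees lowers that risk.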
__init__(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__([n_candidates, metric, n_trees, …]): Initialize self.
- fit(X[, y]): Build the annoy.AnnoyIndex and insert data from X.
- get_params([deep]): Get parameters for this estimator.
- kneighbors([X, n_candidates, return_distance]): Retrieve k nearest neighbors.
- set_params(**params): Set the parameters of this estimator.
Attributes
- valid_metrics
fit(X, y=None) → skhubness.neighbors.random_projection_trees.RandomProjectionTree
Build the annoy.AnnoyIndex and insert data from X.
- Parameters
- X: np.array
Data to be indexed
- y: any
Ignored
- Returns
- self: RandomProjectionTree
An instance of RandomProjectionTree with a built index
get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default = True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: mapping of string to any
Parameter names mapped to their values.
kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve k nearest neighbors.
- Parameters
- X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
- n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
- return_distance: bool, default = True
If True, return both the distances to and the indices of the neighbors. Otherwise, return only the indices.
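The return convention of kneighbors can be sketched with an exact brute-force search in pure Python. This is not how the class works internally (it queries an Annoy index); `brute_force_kneighbors` is a hypothetical stand-in that only mirrors the documented parameters. Two assumptions are not stated in the text above: the tuple is ordered (distances, indices), and when X is None a point is not reported as its own neighbor, following the scikit-learn convention.

```python
import math


def brute_force_kneighbors(indexed, X=None, n_candidates=5, return_distance=True):
    """Exact Euclidean search mirroring the documented kneighbors parameters."""
    search_self = X is None
    queries = indexed if search_self else X
    all_dist, all_ind = [], []
    for qi, q in enumerate(queries):
        ranked = sorted(
            (math.dist(q, p), i)
            for i, p in enumerate(indexed)
            # Assumption: skip the query itself when searching the indexed data.
            if not (search_self and i == qi)
        )[:n_candidates]
        all_dist.append([d for d, _ in ranked])
        all_ind.append([i for _, i in ranked])
    # Assumption: distances come first in the returned tuple.
    return (all_dist, all_ind) if return_distance else all_ind


indexed = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]

# X=None: neighbors among the indexed objects themselves.
dist, ind = brute_force_kneighbors(indexed, n_candidates=2)

# Explicit queries, indices only.
only_ind = brute_force_kneighbors(
    indexed, X=[(0.9, 0.0)], n_candidates=1, return_distance=False
)
```

With the library installed, the corresponding call on the real class would presumably look like `RandomProjectionTree(n_candidates=2).fit(X).kneighbors()`, since fit returns the fitted instance.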
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: object
Estimator instance.
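The <component>__<parameter> naming can be illustrated with a small routing helper. This is a simplified sketch of the idea rather than the estimator's actual code; `group_nested_params` and the component name "index" are hypothetical. Names without a double underscore belong to the estimator itself, while names with one are grouped by the component before the first double underscore.

```python
def group_nested_params(params):
    """Split 'component__parameter' names into per-component dictionaries."""
    own, nested = {}, {}
    for name, value in params.items():
        # partition splits at the first "__"; delim is empty if there is none.
        component, delim, sub_name = name.partition("__")
        if delim:
            nested.setdefault(component, {})[sub_name] = value
        else:
            own[name] = value
    return own, nested


own, nested = group_nested_params(
    {"n_jobs": 2, "index__n_trees": 50, "index__search_k": 1000}
)
```

Here `own` holds the parameters for the outer object and `nested["index"]` holds those that would be forwarded to the hypothetical "index" component's own set_params.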