skhubness.neighbors.HNSW

class skhubness.neighbors.HNSW(n_candidates: int = 5, metric: str = 'euclidean', method: str = 'hnsw', post_processing: int = 2, n_jobs: int = 1, verbose: int = 0)
Wrapper for using nmslib
Hierarchical navigable small world (HNSW) graphs are data structures that allow for approximate nearest neighbor search. Here, an implementation from nmslib is used.
 Parameters
 n_candidates: int, default = 5
Number of neighbors to retrieve
 metric: str, default = 'euclidean'
Distance metric. Allowed values are 'angular', 'euclidean', 'manhattan', 'hamming', 'dot'.
 method: str, default = 'hnsw'
ANN method to use. Currently, only 'hnsw' is supported.
 post_processing: int, default = 2
More post-processing means longer index creation and higher retrieval accuracy.
 n_jobs: int, default = 1
Number of parallel jobs
 verbose: int, default = 0
Verbosity level. If verbose >= 2, show progress bar on indexing.
 Attributes
 valid_metrics:
List of valid distance metrics/measures
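The metric names listed under Parameters correspond to familiar vector comparisons. The following is a minimal NumPy sketch of common textbook definitions, not the formulas nmslib uses internally: the function names are hypothetical, reading 'angular' as cosine distance is an assumption, and some libraries normalize the Hamming count by the vector length.

```python
import numpy as np


def euclidean(a, b):
    # Square root of the summed squared differences.
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return float(np.sum(np.abs(a - b)))


def hamming(a, b):
    # Number of positions at which the two vectors differ (unnormalized).
    return int(np.sum(a != b))


def cosine_distance(a, b):
    # One common reading of "angular": 1 minus the cosine of the angle.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def inner_product(a, b):
    # "dot" is a similarity: larger values mean more similar vectors.
    return float(np.dot(a, b))


a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 0.0, 4.0])

for fn in (euclidean, manhattan, hamming, cosine_distance, inner_product):
    print(fn.__name__, fn(a, b))
```

Which of these names a given installation accepts is best checked against the valid_metrics attribute.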

__init__(n_candidates: int = 5, metric: str = 'euclidean', method: str = 'hnsw', post_processing: int = 2, n_jobs: int = 1, verbose: int = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([n_candidates, metric, method, …]): Initialize self.
fit(X[, y]): Set up the HNSW index from training data.
kneighbors([X, n_candidates, return_distance]): Retrieve k nearest neighbors.
Attributes
valid_metrics

fit(X, y=None) → skhubness.neighbors.hnsw.HNSW
Set up the HNSW index from training data.
 Parameters
 X: np.array
Data to be indexed
 y: any
Ignored
 Returns
 self: HNSW
An instance of HNSW with a built graph

kneighbors(X: Optional[numpy.ndarray] = None, n_candidates: Optional[int] = None, return_distance: bool = True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve k nearest neighbors.
 Parameters
 X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
 n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
 return_distance: bool, default = True
If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
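Because HNSW search is approximate, it can help to compare its results against an exact baseline. The sketch below is a hypothetical brute-force stand-in (`ExactNeighbors` and `recall` are not part of skhubness) that mirrors the fit/kneighbors calling convention described above, using Euclidean distance only. It assumes the return order (distances, indices), and when X is None it queries the indexed data, so each point finds itself at distance 0. Whether the wrapper behaves the same way in that case is not stated here.

```python
import numpy as np


class ExactNeighbors:
    """Hypothetical brute-force reference with a fit/kneighbors interface."""

    def __init__(self, n_candidates=5):
        self.n_candidates = n_candidates

    def fit(self, X, y=None):
        # y is ignored, mirroring the documented fit signature.
        self.X_ = np.asarray(X, dtype=float)
        return self

    def kneighbors(self, X=None, n_candidates=None, return_distance=True):
        # Fall back to the indexed data and the constructor value on None.
        queries = self.X_ if X is None else np.asarray(X, dtype=float)
        k = self.n_candidates if n_candidates is None else n_candidates
        # Pairwise Euclidean distances, shape (n_queries, n_indexed).
        diff = queries[:, None, :] - self.X_[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Indices of the k nearest indexed points, ordered by distance.
        ind = np.argsort(dist, axis=1, kind="stable")[:, :k]
        if return_distance:
            return np.take_along_axis(dist, ind, axis=1), ind
        return ind


def recall(approx_ind, exact_ind):
    """Fraction of exact neighbors also found in the approximate result."""
    hits = [len(set(a) & set(e)) for a, e in zip(approx_ind, exact_ind)]
    return sum(hits) / exact_ind.size


rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
X_query = rng.normal(size=(10, 8))

exact = ExactNeighbors(n_candidates=5).fit(X_train)  # fit returns self
dist, ind = exact.kneighbors(X_query)
ind_only = exact.kneighbors(X_query, return_distance=False)
print(dist.shape, ind.shape, ind_only.shape)
```

With skhubness and nmslib installed, the index array returned by HNSW's kneighbors with return_distance=False could be passed to `recall` as `approx_ind` to estimate how many true neighbors the approximate search recovers.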