skhubness.neighbors.PuffinnLSH

class skhubness.neighbors.PuffinnLSH(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)
Wrap Puffinn LSH for scikit-learn compatibility.
Parameters
n_candidates: int, default = 5
Number of neighbors to retrieve.
metric: str, default = 'euclidean'
Distance metric. Fully supported metrics are 'angular' and 'jaccard'. Other metrics, such as 'euclidean' and 'sqeuclidean', are only partially supported: LSH with 'angular' distances finds a candidate set of neighbors among all indexed objects, and (squared) Euclidean distances are then computed only for those candidates. Note that the default, 'euclidean', is one of these partially supported metrics.
memory: int, default = 1073741824 (1 GiB)
Maximum memory usage of the index, in bytes.
recall: float, default = 0.9
Probability of finding the true nearest neighbors among the candidates.
n_jobs: int, default = 1
Number of parallel jobs.
verbose: int, default = 0
Verbosity level. If verbose > 0, show a tqdm progress bar during indexing and querying.
Attributes
valid_metrics:
List of valid distance metrics/measures.
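The two-stage strategy described for partially supported metrics can be sketched in plain Python. This is a minimal illustration under stated assumptions: an exact cosine-distance ranking stands in for Puffinn's LSH candidate search, `n_probe` (the candidate-set size) is a hypothetical parameter, and `recall_at_k` uses the common definition of recall as the fraction of the true k nearest neighbors that appear in the result. None of these helpers are part of skhubness.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; stands in for the 'angular' LSH metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat zero vectors as uncorrelated
    return 1.0 - dot / (norm_a * norm_b)


def two_stage_knn(query: Vector, data: List[Vector], k: int, n_probe: int) -> List[int]:
    # Stage 1: pick n_probe candidates by angular (cosine) distance.
    # Ranking is exact here; an LSH index would only approximate this step.
    order = sorted(range(len(data)), key=lambda i: cosine_distance(query, data[i]))
    candidates = order[:n_probe]
    # Stage 2: compute Euclidean distances only for the candidates and re-rank.
    candidates.sort(key=lambda i: math.dist(query, data[i]))
    return candidates[:k]


def exact_knn(query: Vector, data: List[Vector], k: int) -> List[int]:
    # Brute-force Euclidean ground truth over all indexed objects.
    return sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))[:k]


def recall_at_k(approx: List[int], exact: List[int]) -> float:
    # Fraction of the true nearest neighbors that were retrieved.
    return len(set(approx) & set(exact)) / len(exact)


data = [[10.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
query = [1.0, 0.0]
truth = exact_knn(query, data, k=1)                 # [1]
small = two_stage_knn(query, data, k=1, n_probe=1)  # [0]: same direction, but far away
large = two_stage_knn(query, data, k=1, n_probe=2)  # [1]
print(recall_at_k(small, truth), recall_at_k(large, truth))  # 0.0 1.0
```

The example shows why Euclidean search is only partially supported: a point can be closest in angle yet far away in Euclidean terms, so a candidate set that is too small can miss the true neighbor.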

__init__(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([n_candidates, metric, memory, …])     Initialize self.
fit(X[, y])                                     Build the Puffinn LSH index and insert data from X.
get_params([deep])                              Get parameters for this estimator.
kneighbors([X, n_candidates, return_distance])  Retrieve k nearest neighbors.
set_params(**params)                            Set the parameters of this estimator.
Attributes
metric_map
valid_metrics

fit(X, y=None) → skhubness.neighbors.lsh.PuffinnLSH
Build the Puffinn LSH index and insert data from X.
Parameters
X: np.array
Data to be indexed.
y: any
Ignored.
Returns
self: PuffinnLSH
This PuffinnLSH instance, with the index built.

get_params(deep=True)
Get parameters for this estimator.
Parameters
deep: bool, default = True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
params: mapping of string to any
Parameter names mapped to their values.
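How `deep=True` exposes the parameters of contained sub-estimators can be illustrated with a simplified sketch in which nested dictionaries stand in for sub-estimators. The helper `flatten_params` and the component name `lsh` are hypothetical; the sketch follows the `<component>__<parameter>` key convention documented for `set_params` and is not scikit-learn's actual implementation.

```python
from typing import Any, Dict


def flatten_params(params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested parameter dicts into '<component>__<parameter>' keys."""
    out: Dict[str, Any] = {}
    for name, value in params.items():
        key = prefix + name
        out[key] = value  # the component itself is reported as well
        if isinstance(value, dict):
            # Recurse into the sub-component, prefixing its keys with 'name__'.
            out.update(flatten_params(value, prefix=key + "__"))
    return out


nested = {"lsh": {"n_candidates": 5, "metric": "euclidean"}, "verbose": 0}
flat = flatten_params(nested)
print(sorted(flat))  # ['lsh', 'lsh__metric', 'lsh__n_candidates', 'verbose']
```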

kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve k nearest neighbors.
Parameters
X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
return_distance: bool, default = True
If True, return both the distances and the indices of the neighbors; otherwise, return only the indices.
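The calling conventions of `fit` and `kneighbors` (`fit` returns the estimator, `X=None` queries the indexed objects, `n_candidates=None` falls back to the constructor value, `return_distance` toggles the return type) can be illustrated with a hypothetical exact brute-force stand-in. `BruteForceKNN` is not part of skhubness and returns nested lists instead of NumPy arrays. It also keeps each indexed point as its own nearest neighbor when `X` is None; this page does not say whether PuffinnLSH does the same.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

Distances = List[List[float]]
Indices = List[List[int]]


class BruteForceKNN:
    """Hypothetical exact stand-in mirroring the documented fit/kneighbors contract."""

    def __init__(self, n_candidates: int = 5):
        self.n_candidates = n_candidates

    def fit(self, X: Sequence[Sequence[float]], y=None) -> "BruteForceKNN":
        # y is ignored, as in fit(X, y=None); keep a copy of the data as the "index".
        self.X_ = [[float(v) for v in row] for row in X]
        return self  # returning self allows chaining: BruteForceKNN().fit(X).kneighbors()

    def kneighbors(
        self,
        X: Optional[Sequence[Sequence[float]]] = None,
        n_candidates: Optional[int] = None,
        return_distance: bool = True,
    ) -> Union[Tuple[Distances, Indices], Indices]:
        # Fall back to the constructor value when n_candidates is None.
        k = self.n_candidates if n_candidates is None else n_candidates
        # X=None means: query with the indexed objects themselves.
        queries = self.X_ if X is None else X
        distances: Distances = []
        indices: Indices = []
        for q in queries:
            dists = [math.dist(q, x) for x in self.X_]
            order = sorted(range(len(dists)), key=dists.__getitem__)[:k]
            indices.append(order)
            distances.append([dists[i] for i in order])
        if return_distance:
            return distances, indices
        return indices


data = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
index = BruteForceKNN(n_candidates=2).fit(data)
dist, ind = index.kneighbors([[0.9, 0.1]])
print(ind)                                      # [[1, 0]]
print(index.kneighbors(return_distance=False))  # [[0, 1], [1, 0], [2, 1]]
```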

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
**params: dict
Estimator parameters.
Returns
self: object
Estimator instance.
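The `<component>__<parameter>` convention can be illustrated by splitting each key on its first double underscore, which determines the nested component a parameter is routed to. `route_params` and the component names `lsh` and `pipe` are hypothetical, and this is a simplified sketch rather than scikit-learn's implementation.

```python
from typing import Any, Dict, Tuple


def route_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Split '<component>__<parameter>' keys into own and per-component parameters."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split on the FIRST '__' only; deeper keys stay intact for the sub-component.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = route_params({"verbose": 1, "lsh__n_candidates": 10, "pipe__lsh__recall": 0.95})
print(own)     # {'verbose': 1}
print(nested)  # {'lsh': {'n_candidates': 10}, 'pipe': {'lsh__recall': 0.95}}
```

Because only the first `__` is consumed, a key such as `pipe__lsh__recall` is passed down as `lsh__recall`, and the sub-component can apply the same rule recursively.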