skhubness.neighbors.PuffinnLSH
class skhubness.neighbors.PuffinnLSH(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)
Wrap Puffinn LSH for scikit-learn compatibility.
- Parameters
- n_candidates: int, default = 5
Number of neighbors to retrieve
- metric: str, default = 'euclidean'
Distance metric. Fully supported metrics are 'angular' and 'jaccard'. 'euclidean' and 'sqeuclidean' are partially supported: LSH first finds the candidate set of neighbors among all indexed objects using 'angular' distances, and (squared) Euclidean distances are then computed only for those candidates.
- memory: int, default = 1073741824 (1 GiB)
Maximum memory usage, in bytes
- recall: float, default = 0.90
Probability of finding the true nearest neighbors among the candidates
- n_jobs: int, default = 1
Number of parallel jobs
- verbose: int, default = 0
Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
- Attributes
- valid_metrics:
List of valid distance metrics/measures
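The two-stage behaviour described for 'euclidean' and 'sqeuclidean' can be sketched in plain Python. This is not the Puffinn implementation: an exhaustive angular ranking stands in for the LSH lookup, angular distance is assumed here to be 1 minus cosine similarity, and the helper names `angular_distance` and `two_stage_neighbors` are hypothetical.

```python
import math


def angular_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def two_stage_neighbors(index, query, n_candidates=3):
    """Select candidates by angular distance, then rank them by Euclidean distance."""
    # Stage 1: pick the candidate set by angular distance.
    # An exhaustive sort stands in for the approximate LSH lookup.
    by_angle = sorted(range(len(index)),
                      key=lambda i: angular_distance(index[i], query))
    candidates = by_angle[:n_candidates]
    # Stage 2: compute Euclidean distances for the candidates only.
    return sorted((math.dist(index[i], query), i) for i in candidates)


index = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [10.0, 1.0], [-1.0, 0.0]]
query = [1.5, 0.0]
result = two_stage_neighbors(index, query)
neighbor_ids = [i for _, i in result]
```

In this toy data the candidate set contains point 3, which is far away but points in nearly the same direction as the query, while the exact third Euclidean neighbor (point 2) is never considered. The returned distances are true Euclidean distances, but the neighbor set itself is determined by the angular stage.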
__init__(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([n_candidates, metric, memory, …]): Initialize self.
fit(X[, y]): Build the puffinn LSH index and insert data from X.
get_params([deep]): Get parameters for this estimator.
kneighbors([X, n_candidates, return_distance]): Retrieve k nearest neighbors.
set_params(**params): Set the parameters of this estimator.
Attributes
metric_map
valid_metrics
fit(X, y=None) → skhubness.neighbors.lsh.PuffinnLSH
Build the puffinn LSH index and insert data from X.
- Parameters
- X: np.array
Data to be indexed
- y: any
Ignored
- Returns
- self: PuffinnLSH
An instance of PuffinnLSH with a built index
get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default = True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: mapping of string to any
Parameter names mapped to their values.
kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve k nearest neighbors.
- Parameters
- X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
- n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
- return_distance: bool, default = True
If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
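A minimal usage sketch of fit and kneighbors follows. It uses only the constructor arguments and method parameters documented on this page. The helper names `make_data` and `run_example`, the toy data, and the chosen sizes are assumptions for illustration. The library imports sit inside `run_example` because scikit-hubness, puffinn and NumPy are optional here and must be installed for the call to succeed.

```python
import random


def make_data(n_samples=200, n_features=16, seed=0):
    """Generate Gaussian toy data as a list of rows (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_samples)]


def run_example():
    """Fit a PuffinnLSH index and query it in both return modes."""
    # Imported lazily: these packages must be installed to run the example.
    import numpy as np
    from skhubness.neighbors import PuffinnLSH

    X = np.asarray(make_data())
    lsh = PuffinnLSH(n_candidates=5, metric="euclidean", recall=0.9)
    lsh.fit(X)

    # Default mode: distances and indices of the neighbors.
    dist, ind = lsh.kneighbors(X[:10])

    # Override n_candidates for one call and request indices only.
    ind_only = lsh.kneighbors(X[:10], n_candidates=3, return_distance=False)
    return dist, ind, ind_only


data = make_data()
```

Calling `run_example()` builds the index once and reuses it for both queries. Because the search is approximate, the neighbors returned may differ from those of an exact search.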
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: object
Estimator instance.
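The <component>__<parameter> naming convention can be illustrated with a small stand-alone sketch. This is not scikit-learn's code. The function `split_nested_params` and the component name `lsh` are hypothetical, and the sketch only shows how a double underscore separates a component from the parameter being set on it.

```python
def split_nested_params(params):
    """Split keyword arguments into own parameters and per-component parameters."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore, if any.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # "lsh__recall" targets parameter "recall" of component "lsh".
            nested.setdefault(component, {})[sub_key] = value
        else:
            # A plain name targets the estimator itself.
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": 1, "lsh__recall": 0.95, "lsh__n_candidates": 10}
)
```

Here `own` holds the parameters meant for the outer object, while `nested` groups the remaining values by component so that each group could be forwarded to that component's own set_params.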