skhubness.neighbors.PuffinnLSH

class skhubness.neighbors.PuffinnLSH(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)

Wrap Puffinn LSH for scikit-learn compatibility.

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

metric: str, default = ‘euclidean’

Distance metric. Fully supported values are ‘angular’ and ‘jaccard’. Other metrics, such as ‘euclidean’ and ‘sqeuclidean’, are partially supported: ‘angular’ distances are used to find the candidate set of neighbors with LSH among all indexed objects, and (squared) Euclidean distances are then computed only for those candidates.

memory: int, default = 1073741824 (1 GiB)

Maximum memory usage, in bytes

recall: float, default = 0.90

Probability of finding the true nearest neighbors among the candidates

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
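
The two-stage behaviour described for the partially supported metrics can be sketched in plain Python. This is only an illustration and not the Puffinn implementation: the LSH candidate search is replaced by an exhaustive angular ranking, and the names angular_distance and two_stage_neighbors are hypothetical.

```python
import math


def angular_distance(u, v):
    """Angle between u and v in radians (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def two_stage_neighbors(data, query, n_candidates):
    # Stage 1 (brute-force stand-in for LSH): select candidates by angular distance.
    candidates = sorted(
        range(len(data)), key=lambda i: angular_distance(data[i], query)
    )[:n_candidates]
    # Stage 2: compute Euclidean distances for the candidates only, then re-rank.
    scored = sorted((math.dist(data[i], query), i) for i in candidates)
    return [d for d, _ in scored], [i for _, i in scored]


data = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
dist, ind = two_stage_neighbors(data, [1.0, 0.05], n_candidates=2)
```

In this toy data set, point 1 is the closest in angle while point 0 is the closest in Euclidean distance, so the second stage reorders the two candidates. Points that are not selected in the first stage are never compared by Euclidean distance.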

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(n_candidates: int = 5, metric: str = 'euclidean', memory: int = 1073741824, recall: float = 0.9, n_jobs: int = 1, verbose: int = 0)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([n_candidates, metric, memory, …])

Initialize self.

fit(X[, y])

Build the puffinn LSH index and insert data from X.

get_params([deep])

Get parameters for this estimator.

kneighbors([X, n_candidates, return_distance])

Retrieve k nearest neighbors.

set_params(**params)

Set the parameters of this estimator.

Attributes

metric_map

valid_metrics

fit(X, y=None) → skhubness.neighbors.lsh.PuffinnLSH

Build the puffinn LSH index and insert data from X.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: PuffinnLSH

An instance of PuffinnLSH with a built index

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep: bool, default = True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
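
The calling convention of fit and kneighbors can be illustrated with a small stand-in class. BruteForceNeighbors below is hypothetical: it mimics the documented parameters with an exhaustive Euclidean search and does not use Puffinn. It also assumes that, when X is None, each indexed object is excluded from its own neighbor list; that choice follows the scikit-learn convention and is not confirmed by the text above.

```python
import math


class BruteForceNeighbors:
    """Hypothetical stand-in mimicking the documented fit/kneighbors interface."""

    def __init__(self, n_candidates=5):
        self.n_candidates = n_candidates

    def fit(self, X, y=None):
        # y is accepted for API compatibility and ignored.
        self.data_ = [list(row) for row in X]
        return self

    def kneighbors(self, X=None, n_candidates=None, return_distance=True):
        # Fall back to the constructor value when n_candidates is None.
        k = self.n_candidates if n_candidates is None else n_candidates
        search_indexed = X is None
        queries = self.data_ if search_indexed else X
        all_dist, all_ind = [], []
        for q_idx, q in enumerate(queries):
            scored = sorted(
                (math.dist(q, row), i)
                for i, row in enumerate(self.data_)
                # Assumption: skip the query itself when searching the index.
                if not (search_indexed and i == q_idx)
            )[:k]
            all_dist.append([d for d, _ in scored])
            all_ind.append([i for _, i in scored])
        return (all_dist, all_ind) if return_distance else all_ind


index = BruteForceNeighbors(n_candidates=2).fit([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
# New query objects: distances and indices are returned together.
dist, ind = index.kneighbors([[0.9, 0.0]])
# No query objects: search among the indexed objects, indices only.
only_ind = index.kneighbors(n_candidates=1, return_distance=False)
```

Because fit returns the instance itself, construction and indexing can be chained as shown. The same call pattern should carry over to PuffinnLSH when skhubness and puffinn are installed, although the returned containers are then arrays, as the signature indicates, rather than nested lists.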

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params: dict

Estimator parameters.

Returns
self: object

Estimator instance.
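
The <component>__<parameter> naming convention mentioned above can be illustrated with a short sketch. The helper split_nested_params and the component name lsh are hypothetical and chosen for illustration only; this is not scikit-learn's own code.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' entries."""
    own, nested = {}, {}
    for name, value in params.items():
        # Split at the first double underscore, if there is one.
        component, sep, sub_name = name.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_name] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"recall": 0.95, "lsh__n_candidates": 10, "lsh__verbose": 1}
)
```

Here own holds the parameters that apply to the object itself, and nested groups the remaining values by the component they should be forwarded to.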