skhubness.neighbors.FalconnLSH

class skhubness.neighbors.FalconnLSH(n_candidates: int = 5, radius: float = 1.0, metric: str = 'euclidean', num_probes: int = 50, n_jobs: int = 1, verbose: int = 0)[source]

Wrapper for using falconn LSH

Falconn is an approximate nearest neighbor library that uses multiprobe locality-sensitive hashing.

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

radius: float or None, optional, default = 1.0

Retrieve neighbors within this radius. Can be negative: See Notes.

metric: str, default = ‘euclidean’

Distance metric. Allowed values are “angular”, “euclidean”, “manhattan”, “hamming”, and “dot” (see also the valid_metrics attribute).

num_probes: int, default = 50

The number of buckets the query algorithm probes. The higher the number of probes, the better the accuracy, but the slower the queries.

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
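
To make the num_probes trade-off concrete, here is a minimal pure-Python sketch of multiprobe hashing with random hyperplanes. It is not falconn's implementation: all function names (make_hyperplanes, probe_sequence, candidates, and so on) are hypothetical, and the probing order (flip one hash bit at a time, starting with the hyperplane the query lies closest to) is a simplifying assumption.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    # One random Gaussian normal vector per hash bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def margins(x, planes):
    # Signed projection of x onto each hyperplane normal.
    return [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in planes]

def bucket(x, planes):
    # The hash key is the sign pattern of the projections.
    return tuple(m >= 0.0 for m in margins(x, planes))

def build_table(data, planes):
    table = {}
    for idx, x in enumerate(data):
        table.setdefault(bucket(x, planes), []).append(idx)
    return table

def probe_sequence(q, planes, num_probes):
    # Home bucket first, then buckets that differ in a single bit,
    # ordered by how close the query is to the corresponding hyperplane.
    m = margins(q, planes)
    home = tuple(v >= 0.0 for v in m)
    probes = [home]
    for i in sorted(range(len(m)), key=lambda i: abs(m[i])):
        if len(probes) >= num_probes:
            break
        flipped = list(home)
        flipped[i] = not flipped[i]
        probes.append(tuple(flipped))
    return probes[:num_probes]

def candidates(q, table, planes, num_probes):
    # Collect the indexed points found in every probed bucket.
    found = []
    for key in probe_sequence(q, planes, num_probes):
        found.extend(table.get(key, []))
    return found

rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
planes = make_hyperplanes(n_bits=6, dim=8)
table = build_table(data, planes)

query = data[0]
few = candidates(query, table, planes, num_probes=1)
many = candidates(query, table, planes, num_probes=5)
print(len(few), len(many))
```

In this sketch, probing more buckets can only enlarge the candidate set, which is why accuracy tends to improve while each query does more work.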

Notes

From the falconn docs: radius can be negative, and for the distance function ‘negative_inner_product’ it actually makes sense.
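
To see why a negative radius can be meaningful, suppose the "distance" is the negative inner product, d(q, x) = -⟨q, x⟩. A radius of -0.5 then selects points whose inner product with the query is at least 0.5. The brute-force sketch below illustrates this in plain Python; it is not the falconn or skhubness implementation, the function names are hypothetical, and treating the boundary as inclusive is an assumption.

```python
def negative_inner_product(a, b):
    # Larger inner product means smaller (more negative) "distance".
    return -sum(x * y for x, y in zip(a, b))

def radius_search(query, points, radius):
    # Indices of points whose negative inner product is at most `radius`.
    # Inclusive boundary is an assumption of this sketch.
    return [i for i, p in enumerate(points)
            if negative_inner_product(query, p) <= radius]

points = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

# Inner products with the query are 1.0, 0.5, 0.0 and -1.0.
hits = radius_search(query, points, radius=-0.5)
print(hits)
```

With a radius of -0.5, only the first two points qualify; raising the radius to 0.0 would also admit the orthogonal point.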

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(n_candidates: int = 5, radius: float = 1.0, metric: str = 'euclidean', num_probes: int = 50, n_jobs: int = 1, verbose: int = 0)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([n_candidates, radius, metric, …])

Initialize self.

fit(X[, y])

Set up the LSH index from training data.

kneighbors([X, n_candidates, return_distance])

Retrieve k nearest neighbors.

radius_neighbors([X, radius, return_distance])

Retrieve neighbors within a certain radius.

Attributes

valid_metrics

fit(X: numpy.ndarray, y: Optional[numpy.ndarray] = None) → skhubness.neighbors.lsh.FalconnLSH[source]

Set up the LSH index from training data.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: FalconnLSH

An instance of LSH with a built index

kneighbors(X: Optional[numpy.ndarray] = None, n_candidates: Optional[int] = None, return_distance: bool = True) → Union[Tuple[numpy.array, numpy.array], numpy.array][source]

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
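
The two return forms can be illustrated with an exact brute-force stand-in written in plain Python. Note that brute_kneighbors is a hypothetical helper, not part of skhubness: it uses nested lists instead of NumPy arrays and Euclidean distance only, and it mirrors just the documented return convention (distances first, then indices), not the approximate LSH search.

```python
import math

def brute_kneighbors(index_points, queries, n_candidates, return_distance=True):
    all_dist, all_ind = [], []
    for q in queries:
        # Sort all indexed points by Euclidean distance to the query.
        scored = sorted((math.dist(q, p), i) for i, p in enumerate(index_points))
        top = scored[:n_candidates]
        all_dist.append([d for d, _ in top])
        all_ind.append([i for _, i in top])
    if return_distance:
        return all_dist, all_ind   # tuple: (distances, indices)
    return all_ind                 # indices only

index_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
queries = [[0.0, 0.0], [3.0, 3.0]]

dist, ind = brute_kneighbors(index_points, queries, n_candidates=2)
ind_only = brute_kneighbors(index_points, queries, n_candidates=2,
                            return_distance=False)
print(ind, ind_only)
```

Callers that unpack the result as a tuple therefore need return_distance=True; with return_distance=False there is a single return value.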

radius_neighbors(X: Optional[numpy.ndarray] = None, radius: Optional[float] = None, return_distance: bool = True) → Union[Tuple[numpy.array, numpy.array], numpy.array][source]

Retrieve neighbors within a certain radius.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

radius: float or None, optional, default = None

Retrieve neighbors within this radius. Can be negative: See Notes.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.

Notes

From the falconn docs: radius can be negative, and for the distance function ‘negative_inner_product’ it actually makes sense.
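
Unlike a k-nearest-neighbor query, a radius query can yield a different number of neighbors for each query object, including none at all. The following plain-Python sketch shows this with an exact brute-force search under Euclidean distance. brute_radius_neighbors is a hypothetical helper, not the skhubness implementation, and the inclusive boundary and nested-list return type are assumptions of the sketch.

```python
import math

def brute_radius_neighbors(index_points, queries, radius, return_distance=True):
    all_dist, all_ind = [], []
    for q in queries:
        # Keep every indexed point within `radius` of the query.
        hits = [(math.dist(q, p), i) for i, p in enumerate(index_points)]
        hits = [(d, i) for d, i in hits if d <= radius]
        all_dist.append([d for d, _ in hits])
        all_ind.append([i for _, i in hits])
    return (all_dist, all_ind) if return_distance else all_ind

index_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
queries = [[0.0, 0.0], [10.0, 10.0]]

dist, ind = brute_radius_neighbors(index_points, queries, radius=1.5)
print(ind)
```

Here the first query has two neighbors within the radius and the second has none, so the per-query results differ in length.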