skhubness.neighbors.FalconnLSH
- class skhubness.neighbors.FalconnLSH(n_candidates: int = 5, radius: float = 1.0, metric: str = 'euclidean', num_probes: int = 50, n_jobs: int = 1, verbose: int = 0)
Wrapper for using falconn LSH.
Falconn is an approximate nearest neighbor library that uses multiprobe locality-sensitive hashing.
- Parameters
- n_candidates: int, default = 5
Number of neighbors to retrieve
- radius: float, default = 1.0
Retrieve neighbors within this radius. Can be negative: See Notes.
- metric: str, default = ‘euclidean’
Distance metric. Allowed values are listed as “angular”, “euclidean”, “manhattan”, “hamming”, “dot”; the valid_metrics attribute holds the list of valid distance metrics.
- num_probes: int, default = 50
The number of buckets the query algorithm probes. The higher the number of probes, the better the accuracy, but the slower the queries.
- n_jobs: int, default = 1
Number of parallel jobs
- verbose: int, default = 0
Verbosity level. If verbose > 0, show a tqdm progress bar during indexing and querying.
Notes
From the falconn docs: radius can be negative, and for the distance function ‘negative_inner_product’ it actually makes sense.
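The trade-off described for num_probes can be illustrated with a toy multiprobe index. The following is a minimal sketch in pure Python, not falconn's implementation: it assumes random-hyperplane hashing and probes buckets in order of increasing Hamming distance from the query's bucket, which is a simplification of how a real multiprobe scheme ranks buckets. All function names (hyperplane_hash, probe_sequence, build_index, candidates) are hypothetical.

```python
import random
from itertools import combinations


def hyperplane_hash(vec, planes):
    """Return one sign bit per hyperplane as a tuple (the bucket key)."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def probe_sequence(key, num_probes):
    """Yield up to num_probes bucket keys: the query's own bucket first,
    then buckets differing in 1 bit, then 2 bits, and so on."""
    emitted = 0
    for n_flips in range(len(key) + 1):
        for positions in combinations(range(len(key)), n_flips):
            if emitted == num_probes:
                return
            probe = list(key)
            for pos in positions:
                probe[pos] ^= 1  # flip the selected bit
            yield tuple(probe)
            emitted += 1


def build_index(data, planes):
    """Map each bucket key to the list of indices of points hashed into it."""
    buckets = {}
    for i, vec in enumerate(data):
        buckets.setdefault(hyperplane_hash(vec, planes), []).append(i)
    return buckets


def candidates(query, buckets, planes, num_probes):
    """Collect the indices found in the first num_probes probed buckets."""
    found = set()
    for probe in probe_sequence(hyperplane_hash(query, planes), num_probes):
        found.update(buckets.get(probe, []))
    return found


rng = random.Random(0)
dim, n_bits = 8, 6
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
buckets = build_index(data, planes)
query = [rng.gauss(0, 1) for _ in range(dim)]

few = candidates(query, buckets, planes, num_probes=1)
many = candidates(query, buckets, planes, num_probes=16)
everything = candidates(query, buckets, planes, num_probes=2 ** n_bits)
```

Probing more buckets can only enlarge the candidate set, so a true neighbor is more likely to be found, at the cost of inspecting more points per query. Probing every bucket degenerates into an exhaustive scan.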
- Attributes
- valid_metrics:
List of valid distance metrics/measures
- __init__(n_candidates: int = 5, radius: float = 1.0, metric: str = 'euclidean', num_probes: int = 50, n_jobs: int = 1, verbose: int = 0)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__([n_candidates, radius, metric, …]): Initialize self.
- fit(X[, y]): Set up the LSH index from training data.
- kneighbors([X, n_candidates, return_distance]): Retrieve k nearest neighbors.
- radius_neighbors([X, radius, return_distance]): Retrieve neighbors within a certain radius.
Attributes
valid_metrics
- fit(X: numpy.ndarray, y: Optional[numpy.ndarray] = None) → skhubness.neighbors.lsh.FalconnLSH
Set up the LSH index from training data.
- Parameters
- X: np.array
Data to be indexed
- y: any
Ignored
- Returns
- self: FalconnLSH
An instance of LSH with a built index
- kneighbors(X: Optional[numpy.ndarray] = None, n_candidates: Optional[int] = None, return_distance: bool = True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve k nearest neighbors.
- Parameters
- X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
- n_candidates: int or None, optional, default = None
Number of neighbors to retrieve. If None, use the value passed during construction.
- return_distance: bool, default = True
If True, return the distances and the indices of the neighbors. Otherwise, return only the indices.
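The effect of n_candidates and return_distance can be shown with an exact, brute-force stand-in. This is a minimal sketch in pure Python rather than the FalconnLSH implementation. It assumes Euclidean distance and that the returned tuple follows the order given in the parameter description (distances first, then indices). The function brute_force_kneighbors is hypothetical, and an approximate index may return different neighbors than this exact search.

```python
import math


def brute_force_kneighbors(index_data, queries, n_candidates=5, return_distance=True):
    """Exact k-nearest-neighbor search; returns per-query lists."""
    distances, indices = [], []
    for query in queries:
        # Rank every indexed point by its Euclidean distance to the query.
        ranked = sorted(
            (math.dist(query, point), i) for i, point in enumerate(index_data)
        )
        nearest = ranked[:n_candidates]
        distances.append([d for d, _ in nearest])
        indices.append([i for _, i in nearest])
    if return_distance:
        return distances, indices
    return indices


index_data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.1, 0.0]]

dist, ind = brute_force_kneighbors(index_data, queries, n_candidates=2)
only_ind = brute_force_kneighbors(
    index_data, queries, n_candidates=2, return_distance=False
)
```

With return_distance=True the caller unpacks two parallel results; with return_distance=False only the index lists come back, so the call must not be unpacked into two names.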
- radius_neighbors(X: Optional[numpy.ndarray] = None, radius: Optional[float] = None, return_distance: bool = True) → Union[Tuple[numpy.array, numpy.array], numpy.array]
Retrieve neighbors within a certain radius.
- Parameters
- X: np.array or None, optional, default = None
Query objects. If None, search among the indexed objects.
- radius: float or None, optional, default = None
Retrieve neighbors within this radius. Can be negative: See Notes.
- return_distance: bool, default = True
If True, return the distances and the indices of the neighbors. Otherwise, return only the indices.
Notes
From the falconn docs: radius can be negative, and for the distance function ‘negative_inner_product’ it actually makes sense.
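To see why a negative radius can be meaningful, consider a distance defined as the negated inner product: a larger inner product then yields a smaller (more negative) distance. The following is a minimal sketch in pure Python, not a call into falconn. The helper names are hypothetical, and treating the radius as an inclusive bound is an assumption made for the illustration.

```python
def negative_inner_product(a, b):
    """Distance that decreases as the inner product of a and b grows."""
    return -sum(x * y for x, y in zip(a, b))


def brute_force_radius_neighbors(index_data, query, radius):
    """Return indices of points whose distance to the query is at most radius.
    The inclusive comparison is an assumption of this sketch."""
    return [
        i
        for i, point in enumerate(index_data)
        if negative_inner_product(query, point) <= radius
    ]


index_data = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

# Distances to the query are -1.0, -0.5, 0.0 (as -0.0), and 1.0, in index order.
strict = brute_force_radius_neighbors(index_data, query, radius=-0.25)
loose = brute_force_radius_neighbors(index_data, query, radius=0.5)
```

Here radius=-0.25 keeps only the points whose inner product with the query is at least 0.25, so a negative radius acts as a minimum-similarity threshold.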