skhubness.neighbors.NNG

class skhubness.neighbors.NNG(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', optimize: bool = False, edge_size_for_creation: int = 80, edge_size_for_search: int = 40, num_incoming: int = -1, num_outgoing: int = -1, epsilon: float = 0.1, n_jobs: int = 1, verbose: int = 0)

Wrapper for ngtpy and NNG variants.

By default, the graph is an ANNG. If the optimize parameter is set, the graph is additionally optimized to obtain an ONNG.

Parameters

n_candidates: int, default = 5: Number of neighbors to retrieve
metric: str, default = ‘euclidean’: Distance metric, allowed are ‘manhattan’, ‘L1’, ‘euclidean’, ‘L2’, ‘minkowski’, ‘Angle’, ‘Normalized Angle’, ‘Hamming’, ‘Jaccard’, ‘Cosine’ or ‘Normalized Cosine’.
index_dir: str, default = ‘auto’: Directory in which to store the index. If None, keep the index in main memory (the index is then NOT pickleable). If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux. Any other string is interpreted as the directory to store the index in. Note: The directory/the index will NOT be deleted automatically.
optimize: bool, default = False: Use the ONNG method by optimizing the ANNG graph. Index creation may take a long time.
edge_size_for_creation: int, default = 80: ANNG edge size used when creating the index. Increasing it improves retrieval accuracy at the cost of more time
edge_size_for_search: int, default = 40: ANNG edge size used when searching the index. Increasing it improves retrieval accuracy at the cost of more time
epsilon: float, default = 0.1: Trade-off in ANNG between higher accuracy (larger epsilon) and shorter query time (smaller epsilon)
num_incoming: int, default = -1: Number of incoming edges in ONNG graph
num_outgoing: int, default = -1: Number of outgoing edges in ONNG graph
n_jobs: int, default = 1: Number of parallel jobs
verbose: int, default = 0: Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.

Notes

NNG stores the index in the directory specified by index_dir. The index is persistent and will NOT be deleted automatically. It is the user’s responsibility to delete it when it is no longer needed.
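Because the index is persistent, one way to handle cleanup is to create the directory yourself and remove it once the index is no longer needed. The following is a minimal sketch using only the Python standard library. The helper names make_index_dir and remove_index_dir are hypothetical, and the preference for /dev/shm only mirrors the behavior described for index_dir=‘auto’; it is not the library's actual implementation.

```python
import os
import shutil
import tempfile


def make_index_dir(prefix: str = "nng_index_") -> str:
    """Create a fresh directory for an index and return its path.

    Hypothetical helper: prefers /dev/shm when it exists and is writable,
    otherwise falls back to the platform's default temp location.
    """
    shm = "/dev/shm"
    base = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None
    return tempfile.mkdtemp(prefix=prefix, dir=base)


def remove_index_dir(path: str) -> None:
    """Delete the index directory and everything stored in it."""
    shutil.rmtree(path, ignore_errors=True)


index_dir = make_index_dir()
try:
    # The path would be handed to the estimator at this point, e.g.:
    #     index = NNG(index_dir=index_dir).fit(X)
    print("index directory:", index_dir)
finally:
    # Nothing deletes the directory automatically, so do it explicitly.
    remove_index_dir(index_dir)
```

Wrapping the work in try/finally ensures the directory is removed even if indexing or querying raises an exception.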

Attributes

valid_metrics: List of valid distance metrics/measures

__init__(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', optimize: bool = False, edge_size_for_creation: int = 80, edge_size_for_search: int = 40, num_incoming: int = -1, num_outgoing: int = -1, epsilon: float = 0.1, n_jobs: int = 1, verbose: int = 0): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`([n_candidates, metric, index_dir, …])	Initialize self.
`fit`(X[, y])	Build the ngtpy.Index and insert data from X.
`get_params`([deep])	Get parameters for this estimator.
`kneighbors`([X, n_candidates, return_distance])	Retrieve k nearest neighbors.
`set_params`(**params)	Set the parameters of this estimator.

Attributes

`internal_distance_type`
`valid_metrics`

fit(X, y=None) → skhubness.neighbors.nng.NNG

Build the ngtpy.Index and insert data from X.

Parameters

X: np.array: Data to be indexed
y: any: Ignored

Returns

self: NNG: An instance of NNG with a built index

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep: bool, default = True: If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns

params: mapping of string to any: Parameter names mapped to their values.

kneighbors(X=None, n_candidates=None, return_distance=True) → Union[Tuple[numpy.array, numpy.array], numpy.array]

Retrieve k nearest neighbors.

Parameters

X: np.array or None, optional, default = None: Query objects. If None, search among the indexed objects.
n_candidates: int or None, optional, default = None: Number of neighbors to retrieve. If None, use the value passed during construction.
return_distance: bool, default = True: If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
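The return convention can be illustrated with an exact brute-force search. This sketch is a stand-in, not NNG: the function brute_force_kneighbors is hypothetical, it uses plain Python lists rather than numpy arrays, and it computes exact Euclidean neighbors, whereas NNG performs an approximate graph search through ngtpy. It only shows how n_candidates and return_distance shape the result.

```python
import math
from typing import List, Sequence, Tuple, Union

Vector = Sequence[float]


def brute_force_kneighbors(
    indexed: Sequence[Vector],
    queries: Sequence[Vector],
    n_candidates: int = 5,
    return_distance: bool = True,
) -> Union[Tuple[List[List[float]], List[List[int]]], List[List[int]]]:
    """Exact Euclidean k-NN, used only to illustrate the return convention."""
    k = min(n_candidates, len(indexed))
    all_dist: List[List[float]] = []
    all_ind: List[List[int]] = []
    for q in queries:
        # Pair each distance with the index of the indexed object, then sort.
        scored = sorted((math.dist(q, x), i) for i, x in enumerate(indexed))
        nearest = scored[:k]
        all_dist.append([d for d, _ in nearest])
        all_ind.append([i for _, i in nearest])
    if return_distance:
        # Distances first, indices second.
        return all_dist, all_ind
    return all_ind


indexed = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
queries = [(0.9, 0.1)]

dist, ind = brute_force_kneighbors(indexed, queries, n_candidates=2)
only_ind = brute_force_kneighbors(
    indexed, queries, n_candidates=2, return_distance=False
)
# ind and only_ind are both [[1, 0]]: one row per query, nearest first.
```

Each result holds one row per query object and n_candidates entries per row, so callers can unpack a tuple when return_distance is true and use the indices directly otherwise.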

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params: dict: Estimator parameters.

Returns

self: object: Estimator instance.
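The <component>__<parameter> naming used for nested objects can be sketched in a few lines of plain Python. This is a simplified illustration of how such keys split into a component name and a parameter name, not the estimator's actual code; the function split_nested_params and the component name "index" are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate plain parameters from '<component>__<parameter>' entries."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_candidates": 10, "index__epsilon": 0.2, "index__optimize": True}
)
# own    -> {'n_candidates': 10}
# nested -> {'index': {'epsilon': 0.2, 'optimize': True}}
```

Splitting on the first double underscore only means deeper nesting such as a__b__c is passed down one level at a time.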