skhubness.neighbors.kneighbors_graph

skhubness.neighbors.kneighbors_graph(X, n_neighbors, mode='connectivity', algorithm: str = 'auto', algorithm_params: dict = None, hubness: str = None, hubness_params: dict = None, metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=None)
Computes the (weighted) graph of k-Neighbors for points in X.
Read more in the scikit-learn User Guide.
 Parameters
 X: array-like or BallTree, shape = [n_samples, n_features]
Sample data, in the form of a numpy array or a precomputed BallTree.
 n_neighbors: int
Number of neighbors for each sample.
 mode: {‘connectivity’, ‘distance’}, optional
Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, and ‘distance’ will return the distances between neighbors according to the given metric.
 algorithm: {‘auto’, ‘hnsw’, ‘lsh’, ‘falconn_lsh’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors:
‘hnsw’ will use HNSW
‘lsh’ will use PuffinnLSH
‘falconn_lsh’ will use FalconnLSH
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
 algorithm_params: dict, optional
Override default parameters of the NN algorithm. For example, with algorithm=‘lsh’ and algorithm_params={‘n_candidates’: 100}, one hundred approximate neighbors are retrieved with LSH. If the hubness parameter is set, the candidate neighbors are then reordered with hubness reduction. Finally, the first n_neighbors objects are taken from the (optionally reordered) candidates.
 hubness: {‘mutual_proximity’, ‘local_scaling’, ‘dis_sim_local’, None}, optional
Hubness reduction algorithm:
‘mutual_proximity’ or ‘mp’ will use MutualProximity
‘local_scaling’ or ‘ls’ will use LocalScaling
‘dis_sim_local’ or ‘dsl’ will use DisSimLocal
If None, no hubness reduction will be performed (=vanilla kNN).
 hubness_params: dict, optional
Override default parameters of the selected hubness reduction algorithm. For example, with hubness=‘mp’ and hubness_params={‘method’: ‘normal’}, a mutual proximity variant is used, which models distance distributions with independent Gaussians.
 metric: string, default ‘minkowski’
The distance metric used to calculate the k-Neighbors for each sample point. The DistanceMetric class gives a list of available metrics. The default distance is ‘euclidean’ (‘minkowski’ metric with the p param equal to 2).
 p: int, default 2
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
 metric_params: dict, optional
Additional keyword arguments for the metric function.
 include_self: bool, default=False
Whether or not to mark each sample as the first nearest neighbor to itself. If None, then True is used for mode=‘connectivity’ and False for mode=‘distance’, as this will preserve backwards compatibility.
 n_jobs: int or None, optional (default=None)
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
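The interplay of n_neighbors, algorithm_params and hubness described above can be sketched in plain Python. This is only an illustration of the three stages (candidate retrieval, optional reordering, truncation), not the library's implementation; select_neighbors, rescale and the toy penalty values are hypothetical names and data introduced for this example.

```python
def select_neighbors(primary_dist, n_candidates, n_neighbors, rescale=None):
    """Pick neighbor indices for one query from its distances to all indexed points.

    primary_dist: distances from the query to each indexed point.
    rescale: hypothetical callable (index, distance) -> secondary distance,
             standing in for a hubness reduction method; None means vanilla kNN.
    """
    # Stage 1: candidate retrieval (an exact sort here; an approximate index in practice).
    order = sorted(range(len(primary_dist)), key=lambda j: primary_dist[j])
    candidates = order[:n_candidates]
    # Stage 2: optionally reorder the candidates by their secondary distances.
    if rescale is not None:
        candidates = sorted(candidates, key=lambda j: rescale(j, primary_dist[j]))
    # Stage 3: keep the first n_neighbors of the (possibly reordered) candidates.
    return candidates[:n_neighbors]


dists = [0.9, 0.2, 0.5, 0.4, 0.7]
# Toy penalty that pushes a "hub" at index 1 further away (assumed values).
penalty = {0: 0.0, 1: 0.5, 2: 0.0, 3: 0.0, 4: 0.0}

vanilla = select_neighbors(dists, n_candidates=3, n_neighbors=2)
reduced = select_neighbors(dists, 3, 2, rescale=lambda j, d: d + penalty[j])
print(vanilla, reduced)
```

Note that reordering only ever sees the retrieved candidates, so n_candidates bounds which points can end up among the final n_neighbors.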
 Returns
 A: sparse matrix in CSR format, shape = [n_samples, n_samples]
A[i, j] is assigned the weight of the edge that connects i to j.
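To make the edge weights concrete, the following pure-Python sketch builds a dense stand-in for A with a brute-force Euclidean search. It assumes include_self=False and no hubness reduction; brute_force_knn_graph is a hypothetical helper written for this illustration, and the real function returns a sparse CSR matrix instead of nested lists.

```python
def brute_force_knn_graph(X, n_neighbors, mode="connectivity"):
    """Dense stand-in for the sparse result: row i holds the edge weights from i."""
    n = len(X)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Euclidean distance from sample i to every other sample (self excluded).
        dists = [
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5, j)
            for j in range(n)
            if j != i
        ]
        # Keep the n_neighbors closest samples as outgoing edges of i.
        for dist, j in sorted(dists)[:n_neighbors]:
            graph[i][j] = 1.0 if mode == "connectivity" else dist
    return graph


X = [[0], [3], [1]]
connectivity = brute_force_knn_graph(X, 1, mode="connectivity")
distance = brute_force_knn_graph(X, 1, mode="distance")
print(connectivity)
print(distance)
```

With one neighbor per sample, both results have the same non-zero pattern; only the stored values differ (ones versus metric distances). The graph is directed, so an edge from i to j does not imply an edge from j to i.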
Examples
>>> X = [[0], [3], [1]]
>>> from skhubness.neighbors import kneighbors_graph
>>> A = kneighbors_graph(X, 2, mode='connectivity', include_self=True)
>>> A.toarray()
array([[1., 0., 1.],
       [0., 1., 1.],
       [1., 0., 1.]])
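The effect of the p parameter on the default ‘minkowski’ metric can be checked with a small standalone sketch. The minkowski helper below is a hypothetical function written for this illustration; it is not part of skhubness.

```python
def minkowski(u, v, p=2):
    """Minkowski (l_p) distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(u, v, p=1)  # l1: sum of absolute differences
euclidean = minkowski(u, v, p=2)  # l2: straight-line distance
print(manhattan, euclidean)
```

For these two points, p=1 gives 7 and p=2 gives 5, so the choice of p can change which samples count as nearest.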