skhubness.reduction.DisSimLocal

class skhubness.reduction.DisSimLocal(k: int = 5, squared: bool = True, *args, **kwargs)[source]

Hubness reduction with DisSimLocal [1].

Parameters
k: int, default = 5

Number of nearest neighbors used to compute each local centroid.

squared: bool, default = True

DisSimLocal operates on squared Euclidean distances. If True, return (quasi) squared Euclidean distances; if False, return (quasi) Euclidean distances.
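To make the role of k and the local centroids concrete, here is a minimal NumPy sketch of the dissimilarity as it is usually stated for the method in [1]: the squared Euclidean distance between two points, minus each point's squared distance to the centroid of its own k nearest neighbors. The function name dis_sim_local_sketch is hypothetical, the all-pairs layout and the exclusion of each point from its own neighborhood are assumptions of the sketch, and it is not the skhubness implementation.

```python
import numpy as np


def dis_sim_local_sketch(X, k=5):
    """Illustrative all-pairs DisSimLocal on a single data set.

    Hypothetical helper for explanation only, not the skhubness code.
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of points.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # k nearest neighbors of each point (assumption: the point itself is excluded).
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    knn = np.argsort(d2_no_self, axis=1)[:, :k]
    # Local centroid of each point: the mean of its k nearest neighbors.
    centroids = X[knn].mean(axis=1)
    # Squared distance of each point to its own local centroid.
    to_centroid = ((X - centroids) ** 2).sum(axis=1)
    # Subtract both points' centroid terms from the squared distance.
    return d2 - to_centroid[:, None] - to_centroid[None, :]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = dis_sim_local_sketch(X, k=5)
print(D.shape)
```

The result is symmetric but can contain negative values, which is presumably why the parameter description above speaks of "(quasi)" distances.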

References

1

Hara K, Suzuki I, Kobayashi K, Fukumizu K, Radovanović M (2016) Flattening the density gradient for eliminating spatial centrality to reduce hubness. In: Proceedings of the 30th AAAI conference on artificial intelligence, pp 1659–1665. https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12055

__init__(k: int = 5, squared: bool = True, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([k, squared])

Initialize self.

fit(neigh_dist, neigh_ind, X[, assume_sorted])

Fit the model using X, neigh_dist, and neigh_ind as training data.

fit_transform(neigh_dist, neigh_ind, X[, …])

Equivalent to calling .fit().transform().

transform(neigh_dist, neigh_ind, X[, …])

Transform distances between test and training data with DisSimLocal.

fit(neigh_dist: numpy.ndarray, neigh_ind: numpy.ndarray, X: numpy.ndarray, assume_sorted: bool = True, *args, **kwargs) → skhubness.reduction.dis_sim.DisSimLocal[source]

Fit the model using X, neigh_dist, and neigh_ind as training data.

Parameters
neigh_dist: np.ndarray, shape (n_samples, n_neighbors)

Distance matrix of training objects (rows) against their individual k nearest neighbors (columns).

neigh_ind: np.ndarray, shape (n_samples, n_neighbors)

Neighbor indices corresponding to the values in neigh_dist.

X: np.ndarray, shape (n_samples, n_features)

Training data, where n_samples is the number of vectors, and n_features their dimensionality (number of features).

assume_sorted: bool, default = True

Assume input matrices are sorted according to neigh_dist. If False, these are sorted here.
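The inputs to fit can be hard to picture from the shapes alone, so the following sketch builds neigh_dist and neigh_ind for a small training set with a brute-force search in plain NumPy. The helper brute_force_neighbors is hypothetical. The sketch assumes that plain (unsquared) Euclidean distances are expected and that a training point is not listed as its own neighbor; neither is stated on this page.

```python
import numpy as np


def brute_force_neighbors(X, n_neighbors):
    """Return (neigh_dist, neigh_ind), each row sorted by ascending distance.

    Hypothetical helper; any exact or approximate neighbor search could
    produce arrays of the same shape.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: a training point is not counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Column indices of the n_neighbors closest points, nearest first.
    neigh_ind = np.argsort(dist, axis=1)[:, :n_neighbors]
    # Distances gathered in the same order, so both arrays stay aligned.
    neigh_dist = np.take_along_axis(dist, neigh_ind, axis=1)
    return neigh_dist, neigh_ind


rng = np.random.default_rng(42)
X_train = rng.normal(size=(50, 4))  # (n_samples, n_features)
neigh_dist, neigh_ind = brute_force_neighbors(X_train, n_neighbors=10)
print(neigh_dist.shape, neigh_ind.shape)
```

Because each row is already sorted by distance, these arrays match the default assume_sorted=True. Following the signature above, they would be passed as fit(neigh_dist, neigh_ind, X_train).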

fit_transform(neigh_dist, neigh_ind, X, assume_sorted=True, return_distance=True, *args, **kwargs)[source]

Equivalent to calling .fit().transform().

transform(neigh_dist: np.ndarray, neigh_ind: np.ndarray, X: np.ndarray, assume_sorted: bool = True, *args, **kwargs)[source]

Transform distances between test and training data with DisSimLocal.

Parameters
neigh_dist: np.ndarray, shape (n_query, n_neighbors)

Distance matrix of test objects (rows) against their individual k nearest neighbors among the training data (columns).

neigh_ind: np.ndarray, shape (n_query, n_neighbors)

Neighbor indices corresponding to the values in neigh_dist.

X: np.ndarray, shape (n_query, n_features)

Test data, where n_query is the number of vectors, and n_features their dimensionality (number of features).

assume_sorted: ignored

Returns
hub_reduced_dist, neigh_ind

DisSimLocal distances, and corresponding neighbor indices

Notes

The returned distances are NOT sorted! If you use this class directly, you will need to sort the returned matrices according to hub_reduced_dist. Classes from skhubness.neighbors do this automatically.
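The sorting step described in this note can be sketched with NumPy alone. The helper sort_by_distance is hypothetical, and the two small arrays are hand-made stand-ins for the values returned by transform, not real output.

```python
import numpy as np


def sort_by_distance(hub_reduced_dist, neigh_ind):
    """Sort each row of both matrices by ascending hub-reduced distance."""
    # Per-row permutation that orders the distances from smallest to largest.
    order = np.argsort(hub_reduced_dist, axis=1)
    # Apply the same permutation to both arrays so they stay aligned.
    sorted_dist = np.take_along_axis(hub_reduced_dist, order, axis=1)
    sorted_ind = np.take_along_axis(neigh_ind, order, axis=1)
    return sorted_dist, sorted_ind


# Hand-made stand-ins for the two arrays returned by transform().
hub_reduced_dist = np.array([[0.9, 0.1, 0.5],
                             [0.2, 0.8, 0.3]])
neigh_ind = np.array([[4, 7, 2],
                      [1, 0, 5]])

sorted_dist, sorted_ind = sort_by_distance(hub_reduced_dist, neigh_ind)
print(sorted_ind)
```

Applying one permutation to both arrays keeps every neighbor index paired with its own distance, which sorting the two arrays separately would not.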