skhubness.neighbors.DistanceMetric

class skhubness.neighbors.DistanceMetric
DistanceMetric class
This class provides a uniform interface to fast distance metric functions. The various metrics can be accessed via the get_metric() class method and the metric string identifier (see below).
Examples
>>> from sklearn.neighbors import DistanceMetric
>>> dist = DistanceMetric.get_metric('euclidean')
>>> X = [[0, 1, 2], [3, 4, 5]]
>>> dist.pairwise(X)
array([[ 0.        ,  5.19615242],
       [ 5.19615242,  0.        ]])
Available Metrics
The following lists the string metric identifiers and the associated distance metric classes:
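Metrics intended for real-valued vector spaces:

==============  ===================  =======  ===========================
identifier      class name           args     distance function
==============  ===================  =======  ===========================
"euclidean"     EuclideanDistance             sqrt(sum((x - y)^2))
"manhattan"     ManhattanDistance             sum(|x - y|)
"chebyshev"     ChebyshevDistance             max(|x - y|)
"minkowski"     MinkowskiDistance    p        sum(|x - y|^p)^(1/p)
"wminkowski"    WMinkowskiDistance   p, w     sum(|w * (x - y)|^p)^(1/p)
"seuclidean"    SEuclideanDistance   V        sqrt(sum((x - y)^2 / V))
"mahalanobis"   MahalanobisDistance  V or VI  sqrt((x - y)' V^-1 (x - y))
==============  ===================  =======  ===========================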
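The first four formulas in this table can be written out directly. The following is a minimal pure-Python sketch of those formulas for illustration only; the function names are hypothetical and this is not how the library implements its metric classes.

```python
import math


def minkowski(x, y, p):
    """sum(|x - y|^p)^(1/p) for two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """sqrt(sum((x - y)^2)), the p = 2 case of minkowski."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """sum(|x - y|), the p = 1 case of minkowski."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """max(|x - y|), the limit of minkowski as p grows."""
    return max(abs(a - b) for a, b in zip(x, y))


x, y = [0, 1, 2], [3, 4, 5]
print(euclidean(x, y))  # about 5.196, i.e. sqrt(27)
print(manhattan(x, y))  # 9
print(chebyshev(x, y))  # 3
```

The Euclidean value agrees with the off-diagonal entry in the pairwise example at the top of this page.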
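Metrics intended for two-dimensional vector spaces:

Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.

===========  =================  ===========================================================
identifier   class name         distance function
===========  =================  ===========================================================
"haversine"  HaversineDistance  2 arcsin(sqrt(sin^2(0.5*dx) + cos(x1)cos(x2)sin^2(0.5*dy)))
===========  =================  ===========================================================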
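As a rough illustration of the haversine formula and its radian convention, here is a small standard-library sketch. It assumes that dx is the latitude difference, dy the longitude difference, and x1, x2 the two latitudes; the function name and the Earth radius constant are illustrative and not part of the library.

```python
import math


def haversine(p1, p2):
    """Angular great-circle distance between two [latitude, longitude] points.

    Inputs and output are in radians, as the metric requires.
    """
    lat1, lon1 = p1
    lat2, lon2 = p2
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(0.5 * dlat) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(0.5 * dlon) ** 2)
    return 2 * math.asin(math.sqrt(a))


# Two points on the equator, a quarter turn apart in longitude.
angle = haversine([0.0, 0.0], [0.0, math.pi / 2])
print(angle)  # about 1.5708, i.e. pi / 2

# Assumed mean Earth radius, used only to convert the angle to kilometres.
EARTH_RADIUS_KM = 6371.0
print(angle * EARTH_RADIUS_KM)
```

Data stored in degrees would need converting first, for example with math.radians.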
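Metrics intended for integer-valued vector spaces:

Though intended for integer-valued vectors, these are also valid metrics in the case of real-valued vectors.

============  ==================  ====================================
identifier    class name          distance function
============  ==================  ====================================
"hamming"     HammingDistance     N_unequal(x, y) / N_tot
"canberra"    CanberraDistance    sum(|x - y| / (|x| + |y|))
"braycurtis"  BrayCurtisDistance  sum(|x - y|) / (sum(|x|) + sum(|y|))
============  ==================  ====================================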
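The three formulas above can be sketched in plain Python as follows. This is an illustration of the formulas, not the library's code. Skipping Canberra terms whose denominator is zero is an assumption made here to avoid division by zero.

```python
def hamming(x, y):
    """Fraction of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def canberra(x, y):
    """sum(|x - y| / (|x| + |y|)), skipping terms where both entries are 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # assumption: a 0/0 term contributes nothing
            total += abs(a - b) / denom
    return total


def braycurtis(x, y):
    """sum(|x - y|) / (sum(|x|) + sum(|y|))."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    return num / (sum(abs(a) for a in x) + sum(abs(b) for b in y))


print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))  # 0.5
print(canberra([1, 2], [3, 2]))             # 0.5
print(braycurtis([1, 2], [3, 2]))           # 0.25
```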
Metrics intended for boolean-valued vector spaces:

Any non-zero entry is evaluated to "True". In the listings below, the following abbreviations are used:
N : number of dimensions
NTT : number of dims in which both values are True
NTF : number of dims in which the first value is True, second is False
NFT : number of dims in which the first value is False, second is True
NFF : number of dims in which both values are False
NNEQ : number of non-equal dimensions, NNEQ = NTF + NFT
NNZ : number of non-zero dimensions, NNZ = NTF + NFT + NTT

================  ======================  =============================
identifier        class name              distance function
================  ======================  =============================
"jaccard"         JaccardDistance         NNEQ / NNZ
"matching"        MatchingDistance        NNEQ / N
"dice"            DiceDistance            NNEQ / (NTT + NNZ)
"kulsinski"       KulsinskiDistance       (NNEQ + N - NTT) / (NNEQ + N)
"rogerstanimoto"  RogersTanimotoDistance  2 * NNEQ / (N + NNEQ)
"russellrao"      RussellRaoDistance      NNZ / N
"sokalmichener"   SokalMichenerDistance   2 * NNEQ / (N + NNEQ)
"sokalsneath"     SokalSneathDistance     NNEQ / (NNEQ + 0.5 * NTT)
================  ======================  =============================
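To make the abbreviations concrete, the sketch below counts them for a pair of vectors and evaluates a few of the listed formulas. The helper name boolean_counts is hypothetical, and the example assumes at least one non-zero entry so that no denominator is zero.

```python
def boolean_counts(x, y):
    """Return the counts N, NTT, NTF, NFT, NFF, NNEQ and NNZ as a dict."""
    xb = [bool(v) for v in x]  # any non-zero entry counts as True
    yb = [bool(v) for v in y]
    ntt = sum(a and b for a, b in zip(xb, yb))
    ntf = sum(a and not b for a, b in zip(xb, yb))
    nft = sum(b and not a for a, b in zip(xb, yb))
    nff = sum(not a and not b for a, b in zip(xb, yb))
    return {
        "N": len(xb), "NTT": ntt, "NTF": ntf, "NFT": nft, "NFF": nff,
        "NNEQ": ntf + nft, "NNZ": ntf + nft + ntt,
    }


c = boolean_counts([2, 1, 0, 0], [1, 0, 1, 0])
print(c["NTT"], c["NTF"], c["NFT"], c["NFF"])  # 1 1 1 1

jaccard = c["NNEQ"] / c["NNZ"]                        # 2 / 3
matching = c["NNEQ"] / c["N"]                         # 0.5
dice = c["NNEQ"] / (c["NTT"] + c["NNZ"])              # 0.5
sokalsneath = c["NNEQ"] / (c["NNEQ"] + 0.5 * c["NTT"])  # 0.8
print(jaccard, matching, dice, sokalsneath)
```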
User-defined distance:

==========  ==============  ====
identifier  class name      args
==========  ==============  ====
"pyfunc"    PyFuncDistance  func
==========  ==============  ====

Here func is a function which takes two one-dimensional numpy arrays and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric, i.e. it must satisfy the following properties:
Non-negativity: d(x, y) >= 0
Identity: d(x, y) = 0 if and only if x == y
Symmetry: d(x, y) = d(y, x)
Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
Because of the Python object overhead involved in calling the Python function, this will be fairly slow, but it will have the same scaling as other distances.
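Before passing a custom function as func, it can help to spot-check the four properties on sample data. The sketch below is one hypothetical way to do that with the standard library; is_true_metric is not a library function, and passing the check on a finite sample does not prove that a function is a metric, although failing it shows that it is not.

```python
import itertools
import math


def is_true_metric(func, points, tol=1e-12):
    """Spot-check the four metric properties on a finite set of points."""
    for x, y in itertools.product(points, repeat=2):
        d = func(x, y)
        if d < 0:                       # non-negativity
            return False
        if (d == 0) != (x == y):        # identity
            return False
        if not math.isclose(d, func(y, x), abs_tol=tol):  # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if func(x, y) + func(y, z) < func(x, z) - tol:    # triangle inequality
            return False
    return True


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


points = [(0.0,), (1.0,), (2.0,)]
print(is_true_metric(manhattan, points))          # True
print(is_true_metric(squared_euclidean, points))  # False
```

The squared Euclidean distance fails because d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2, so it violates the triangle inequality.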

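__init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.

Methods

=========================  =========================================================
__init__(*args, **kwargs)  Initialize self.
dist_to_rdist()            Convert the true distance to the reduced distance.
get_metric()               Get the given distance metric from the string identifier.
pairwise()                 Compute the pairwise distances between X and Y.
rdist_to_dist()            Convert the reduced distance to the true distance.
=========================  =========================================================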

dist_to_rdist()
Convert the true distance to the reduced distance.
The reduced distance, defined for some metrics, is a computationally more efficient measure which preserves the rank of the true distance. For example, in the Euclidean distance metric, the reduced distance is the squared Euclidean distance.

get_metric()
Get the given distance metric from the string identifier.
See the docstring of DistanceMetric for a list of available metrics.
 Parameters
 metric : string or class name
The distance metric to use.
 **kwargs
Additional arguments will be passed to the requested metric.

pairwise()
Compute the pairwise distances between X and Y.
This is a convenience routine for the sake of testing. For many metrics, the utilities in scipy.spatial.distance.cdist and scipy.spatial.distance.pdist will be faster.
 Parameters
 X : array_like
Array of shape (Nx, D), representing Nx points in D dimensions.
 Y : array_like (optional)
Array of shape (Ny, D), representing Ny points in D dimensions. If not specified, then Y=X.
 Returns
 dist : ndarray
The shape (Nx, Ny) array of pairwise distances between points in X and Y.
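The shape contract described here can be illustrated with a small pure-Python stand-in. This sketch returns nested lists rather than an ndarray and uses a hypothetical euclidean helper; it only mirrors the documented behaviour, including the Y=X default.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise(X, Y=None, metric=euclidean):
    """Return an Nx-by-Ny nested list of distances between rows of X and Y."""
    if Y is None:
        Y = X  # same default as documented: compare X with itself
    return [[metric(x, y) for y in Y] for x in X]


X = [[0, 1, 2], [3, 4, 5]]
Y = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

D_xx = pairwise(X)     # 2 x 2, zeros on the diagonal
D_xy = pairwise(X, Y)  # 2 x 3
print(len(D_xy), len(D_xy[0]))  # 2 3
```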

rdist_to_dist()
Convert the reduced distance to the true distance.
The reduced distance, defined for some metrics, is a computationally more efficient measure which preserves the rank of the true distance. For example, in the Euclidean distance metric, the reduced distance is the squared Euclidean distance.
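For the Euclidean case, the relationship between the two distances can be sketched as below. The standalone functions are hypothetical helpers that borrow the method names for readability. They cover only the Euclidean metric, where the reduced distance is the squared distance, and are not the library's implementation.

```python
import math


def euclidean_rdist(x, y):
    """Reduced Euclidean distance: the squared distance, with no sqrt."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rdist_to_dist(rdist):
    """Euclidean case only: true distance from reduced distance."""
    return math.sqrt(rdist)


def dist_to_rdist(dist):
    """Euclidean case only: reduced distance from true distance."""
    return dist ** 2


query = [0.0, 0.0]
points = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]

# Ranking neighbours by reduced distance gives the same order as ranking
# by true distance, because sqrt is monotonically increasing.
by_rdist = sorted(range(len(points)),
                  key=lambda i: euclidean_rdist(query, points[i]))
by_dist = sorted(range(len(points)),
                 key=lambda i: rdist_to_dist(euclidean_rdist(query, points[i])))
print(by_rdist, by_dist)  # [1, 2, 0] [1, 2, 0]
```

A neighbour search can therefore compare reduced distances throughout and convert only the final results.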