Title: | Uniform Manifold Approximation and Projection |
---|---|
Description: | Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details). |
Authors: | Tomasz Konopka [aut, cre] |
Maintainer: | Tomasz Konopka <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.11.0 |
Built: | 2024-12-25 04:11:47 UTC |
Source: | https://github.com/tkonopka/umap |
project data points onto an existing umap embedding
## S3 method for class 'umap'
predict(object, data, ...)
object | trained object of class umap |
data | matrix with data |
... | additional arguments (not used) |
new matrix with embedding coordinates for the points in data
# embed iris dataset using default settings
iris.umap = umap(iris[,1:4])
# create a dataset with structure like iris, but with perturbation
iris.perturbed = iris[,1:4] + matrix(rnorm(nrow(iris)*4, 0, 0.1), ncol=4)
# project perturbed dataset
perturbed.embedding = predict(iris.umap, iris.perturbed)
# output is a matrix with embedding coordinates
head(perturbed.embedding)
Computes a manifold approximation and projection
umap(d, config = umap.defaults, method = c("naive", "umap-learn"), preserve.seed = TRUE, ...)
d | matrix, input data |
config | object of class umap.config |
method | character, implementation. Available methods are 'naive' (an implementation written in pure R) and 'umap-learn' (requires python package 'umap-learn') |
preserve.seed | logical, leave TRUE to insulate external code from randomness within the umap algorithms; set FALSE to allow randomness used in umap algorithms to alter the external random-number generator |
... | list of settings; values overwrite defaults from config; see documentation of umap.defaults for details about available settings |
object of class umap, containing at least a component with an embedding and a component with configuration settings
# embed iris dataset using default settings
iris.umap = umap(iris[,1:4])
# display object summary
iris.umap
# display embedding coordinates
head(iris.umap$layout)
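The argument descriptions above mention two ways to customize an embedding: a umap.config object passed as config, or individual settings passed through `...`. The following is a minimal sketch of both routes. It assumes the package is installed, and it assumes the configuration component of the returned object is named config; the seed value 123 is arbitrary and chosen only for illustration.

```r
# run only when the umap package is available
if (requireNamespace("umap", quietly = TRUE)) {
  library(umap)

  # route 1: pass individual settings through `...`
  iris.direct = umap(iris[, 1:4], n_neighbors = 5, random_state = 123)

  # route 2: modify a copy of the defaults and pass it as config
  custom.settings = umap.defaults
  custom.settings$n_neighbors = 5
  custom.settings$random_state = 123
  iris.config = umap(iris[, 1:4], config = custom.settings)

  # both objects should record the customized setting
  # (assumes the settings are stored in a component named `config`)
  print(iris.direct$config$n_neighbors)
  print(iris.config$config$n_neighbors)
}
```

Passing settings through `...` is convenient for one-off changes, while a modified configuration object is easier to reuse across several calls.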
A list with parameters customizing a UMAP embedding. Each component of the list is an effective argument for umap().
umap.defaults
An object of class umap.config of length 22.
n_neighbors: integer; number of nearest neighbors
n_components: integer; dimension of target (output) space
metric: character or function; determines how distances between data points are computed. When using a string, available metrics are: euclidean, manhattan. Other available generalized metrics are: cosine, pearson, pearson2. Note that some generalized metrics may not satisfy the triangle inequality, so the knn search may not be optimal. When metric is a function, its signature must be function(matrix, origin, target), and it should compute distances between the origin column and the target columns.
n_epochs: integer; number of iterations performed during layout optimization
input: character, use either "data" or "dist"; determines whether the primary input argument to umap() is treated as a data matrix or as a distance matrix
init: character or matrix. The default string "spectral" computes an initial embedding using eigenvectors of the connectivity graph matrix. An alternative is the string "random", which creates an initial layout based on random coordinates. This setting can also be set to a matrix, in which case layout optimization begins from the provided coordinates.
min_dist: numeric; determines how close points appear in the final layout
set_op_mix_ratio: numeric in range [0,1]; determines how the knn-graph is used to create a fuzzy simplicial graph
local_connectivity: numeric; used during construction of fuzzy simplicial set
bandwidth: numeric; used during construction of fuzzy simplicial set
alpha: numeric; initial value of "learning rate" of layout optimization
gamma: numeric; determines, together with alpha, the learning rate of layout optimization
negative_sample_rate: integer; determines how many non-neighbor points are used per point and per iteration during layout optimization
a: numeric; contributes to gradient calculations during layout optimization. When left at NA, a suitable value will be estimated automatically.
b: numeric; contributes to gradient calculations during layout optimization. When left at NA, a suitable value will be estimated automatically.
spread: numeric; used during automatic estimation of a/b parameters.
random_state: integer; seed for random number generation used during umap()
transform_state: integer; seed for random number generation used during predict()
knn: object of class umap.knn; precomputed nearest neighbors
knn_repeats: integer; number of times to restart knn search
verbose: logical or integer; determines whether to show progress messages
umap_learn_args: vector of arguments to python package umap-learn
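The function form of the metric setting can be sketched as follows. This is an illustrative example only: chebyshev.metric is a hypothetical name, and the sketch assumes that the matrix argument stores one data point per column, that origin is a single column index, and that target is a vector of column indexes.

```r
# hypothetical custom metric with signature function(matrix, origin, target)
# assumption: data points are stored in the columns of m
chebyshev.metric = function(m, origin, target) {
  # absolute coordinate differences between the origin column and each target column
  diffs = abs(m[, target, drop = FALSE] - m[, origin])
  # Chebyshev distance: largest coordinate difference, one value per target
  apply(diffs, 2, max)
}

# three points in two dimensions, one point per column
m = matrix(c(0, 0,
             3, 1,
             1, 5), nrow = 2)
chebyshev.metric(m, 1, c(2, 3))
```

Under these assumptions, such a function could be supplied as the metric setting, for example through a modified copy of umap.defaults.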
# display all default settings
umap.defaults
# create a new settings object with n_neighbors set to 5
custom.settings = umap.defaults
custom.settings$n_neighbors = 5
custom.settings
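The settings a, b, min_dist, and spread are connected: when a and b are left at NA, they are estimated automatically. The sketch below shows one way such an estimate can be obtained, following the approach of the reference umap-learn implementation, which fits the curve 1/(1 + a x^(2b)) to a target that equals 1 below min_dist and decays exponentially with scale spread beyond it. Whether this package uses the same procedure internally is an assumption; estimate.ab is a hypothetical helper written in base R and is not part of the package.

```r
# hypothetical helper: estimate a and b from min_dist and spread
# by least-squares fitting of 1/(1 + a*x^(2b)) to a piecewise target curve
estimate.ab = function(min_dist = 0.1, spread = 1) {
  # grid of distances in the embedding space (zero is dropped)
  x = seq(0, 3 * spread, length.out = 300)[-1]
  # target: full membership below min_dist, exponential decay beyond it
  target = ifelse(x < min_dist, 1, exp(-(x - min_dist) / spread))
  # sum of squared differences between target and the parametric curve
  loss = function(p) sum((target - 1 / (1 + p[1] * x^(2 * p[2])))^2)
  # Nelder-Mead search starting from a = 1, b = 1
  fit = optim(c(1, 1), loss)
  c(a = fit$par[1], b = fit$par[2])
}

ab = estimate.ab(min_dist = 0.1, spread = 1)
ab
```

A smaller min_dist makes the target curve drop sooner, which changes the fitted a and b and allows points to sit closer together in the final layout.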
construct a umap.knn object describing nearest neighbors
umap.knn(indexes, distances)
indexes | matrix, integers linking data points to nearest neighbors |
distances | matrix, distance values between pairs of points specified in the matrix of indexes |
object of class umap.knn, which is a list with matrices with indexes of nearest neighbors and distances to those neighbors
# this example describes a set of three data points (indexes 1,2,3)
# which are equidistant from each other. Hence the distance between
# pairs (i, j) is 0 for i=j and 1 otherwise.
# each row describes one point: the point itself, then its neighbors
# (byrow=TRUE keeps the rows as they are written below)
three.indexes = matrix(c(1,2,3,
                         2,1,3,
                         3,1,2), nrow=3, ncol=3, byrow=TRUE)
three.distances = matrix(c(0, 1, 1,
                           0, 1, 1,
                           0, 1, 1), nrow=3, ncol=3, byrow=TRUE)
umap.knn(three.indexes, three.distances)
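For larger datasets, the two input matrices are usually derived from data rather than typed by hand. The following base R sketch builds them from a full distance matrix by brute force. The helper brute.force.knn is hypothetical and not part of the package, and it assumes that each row should list the point itself first (distance 0), followed by its nearest neighbors.

```r
# hypothetical helper: exact nearest neighbors from a full distance matrix
brute.force.knn = function(dmat, k) {
  n = nrow(dmat)
  indexes = matrix(0L, nrow = n, ncol = k)
  distances = matrix(0, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    # sort all points by distance from point i and keep the k closest;
    # point i itself comes first when no other point is at distance 0
    nearest = order(dmat[i, ])[seq_len(k)]
    indexes[i, ] = nearest
    distances[i, ] = dmat[i, nearest]
  }
  rownames(indexes) = rownames(distances) = rownames(dmat)
  list(indexes = indexes, distances = distances)
}

# four points on a line at positions 0, 1, 3, 7
dmat = as.matrix(dist(c(0, 1, 3, 7)))
nn = brute.force.knn(dmat, k = 3)
nn$indexes
nn$distances
```

The resulting matrices have the shape described above, so they could then be combined with umap.knn(nn$indexes, nn$distances).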