Uniform Manifold Approximation and Projection (UMAP) {Manifold}
Description
Well-known methods for dimensionality reduction are Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). UMAP handles nonlinear relationships and structures a bit better than PCA.
Example
from umap import UMAP
# We reduce the input embeddings from 384 dimensions to 5 dimensions
umap_model = UMAP(n_components=5, min_dist=0.0, metric="cosine", random_state=42)
reduced_embeddings = umap_model.fit_transform(embeddings)
- We can use the
n_components
parameter to decide the shape of the lower-dimensional space, namely 5 dimensions. Generally, values between 5 and 10 work well to capture high-dimensional global structures. - The
min_dist
parameter is the minimum distance between embedded points. We are setting this to 0 as that generally results in tighter clusters. We setmetric
tocosine
as Euclidean-based methods have issues dealing with high-dimensional data.