Skip to content

alpardayalman/DimRedPaperBenchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

19 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Dimension Reduction Toolkit

A comprehensive Python toolkit for dimension reduction techniques including t-SNE, PCA, and UMAP, with extensive benchmarking capabilities for research papers.

Citation

This repository contains the code relative to [paper]. Please cite [paper] when referring to this repository.

@misc{DimRedPaperBenchmark:2025,
  author       = {Enis Teper, Dilge Karakaş, Serena Tomakyan and Alp Arda Yalman},
  title        = {{DimRedPaperBenchmark}},
  howpublished = {\url{https://github.com/alpardayalman/DimRedPaperBenchmark}},
  year         = {2025},
  url          = {https://github.com/alpardayalman/DimRedPaperBenchmark}
}

Features

  • Multiple Dimension Reduction Algorithms:

    • Linear Methods:
      • Principal Component Analysis (PCA)
      • Kernel PCA (RBF, polynomial, sigmoid kernels)
    • Manifold Learning:
      • t-Distributed Stochastic Neighbor Embedding (t-SNE)
      • Uniform Manifold Approximation and Projection (UMAP)
      • ISOMAP (Isometric Mapping)
      • Locally Linear Embedding (LLE)
      • Laplacian Eigenmaps
    • Deep Learning:
      • Autoencoder-based dimension reduction
      • Variational Autoencoder (VAE)
  • Comprehensive Benchmarking:

    • Preservation of local and global structure
    • Clustering quality assessment
    • Classification performance
    • Visualization quality metrics
    • Computational efficiency analysis
  • Research-Ready:

    • Reproducible experiments
    • Statistical significance testing
    • Detailed performance metrics
    • Publication-ready visualizations

Installation

# Install with uv (recommended)
uv pip install -r requirements.txt

# Or with pip
pip install -r requirements.txt


## Quick Start

```python
from src.dimension_reduction import DimensionReducer
from src.benchmarking import BenchmarkSuite
import numpy as np

# Generate sample data
X = np.random.randn(1000, 100)

# Initialize dimension reducer
reducer = DimensionReducer()

# Apply different techniques
pca_result = reducer.fit_transform(X, method='pca', n_components=2)
tsne_result = reducer.fit_transform(X, method='tsne', n_components=2)
umap_result = reducer.fit_transform(X, method='umap', n_components=2)

# Benchmark the results
benchmark = BenchmarkSuite()
results = benchmark.evaluate_all(X, {
    'PCA': pca_result,
    't-SNE': tsne_result,
    'UMAP': umap_result
})

print(results.summary())

Benchmarking Strategies

1. Structure Preservation Metrics

  • Local Structure: Neighborhood preservation, trustworthiness
  • Global Structure: Continuity, stress, Shepard diagrams
  • Topological Structure: Persistent homology, Mapper algorithm

2. Downstream Task Performance

  • Clustering: Silhouette score, Calinski-Harabasz index, Davies-Bouldin index
  • Classification: Accuracy, F1-score, ROC-AUC with cross-validation
  • Regression: RΒ² score, mean squared error

3. Visualization Quality

  • Separation: Inter-class distance, intra-class compactness
  • Overlap: Jaccard similarity, overlap coefficient
  • Aesthetics: Stress minimization, edge crossing reduction

4. Computational Efficiency

  • Time Complexity: Training and inference time
  • Memory Usage: Peak memory consumption
  • Scalability: Performance vs. dataset size

Project Structure

src/
β”œβ”€β”€ dimension_reduction/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ base.py              # Base classes and interfaces
β”‚   β”œβ”€β”€ pca.py               # PCA implementation
β”‚   β”œβ”€β”€ kernel_pca.py        # Kernel PCA implementation
β”‚   β”œβ”€β”€ tsne.py              # t-SNE implementation
β”‚   β”œβ”€β”€ umap.py              # UMAP implementation
β”‚   β”œβ”€β”€ isomap.py            # ISOMAP implementation
β”‚   β”œβ”€β”€ lle.py               # Locally Linear Embedding
β”‚   β”œβ”€β”€ laplacian_eigenmap.py # Laplacian Eigenmaps
β”‚   β”œβ”€β”€ autoencoder.py       # Autoencoder implementation
β”‚   β”œβ”€β”€ vae.py               # Variational Autoencoder
β”‚   └── cli.py               # Command-line interface
β”œβ”€β”€ benchmarking/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ metrics.py           # Evaluation metrics
β”‚   β”œβ”€β”€ structure.py         # Structure preservation metrics
β”‚   β”œβ”€β”€ clustering.py        # Clustering evaluation
β”‚   β”œβ”€β”€ classification.py    # Classification evaluation
β”‚   └── visualization.py     # Visualization quality metrics
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ data_loader.py       # Data loading utilities
β”‚   β”œβ”€β”€ visualization.py     # Plotting utilities
β”‚   └── reporting.py         # Results reporting
└── experiments/
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ benchmark_experiment.py  # Main benchmarking experiment
    └── hyperparameter_tuning.py # Hyperparameter optimization

tests/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ test_dimension_reduction/
β”œβ”€β”€ test_benchmarking/
└── test_utils/

docs/
β”œβ”€β”€ api.md
β”œβ”€β”€ benchmarking_guide.md
└── examples/

Usage Examples

Basic Dimension Reduction

from src.dimension_reduction import DimensionReducer

# Initialize with default parameters
reducer = DimensionReducer()

# Fit and transform data
X_reduced = reducer.fit_transform(X, method='umap', n_components=2)

Comprehensive Benchmarking

from src.benchmarking import BenchmarkSuite
from src.dimension_reduction import DimensionReducer

# Prepare data and methods
methods = ['pca', 'tsne', 'umap']
reducer = DimensionReducer()

# Generate embeddings
embeddings = {}
for method in methods:
    embeddings[method] = reducer.fit_transform(X, method=method, n_components=2)

# Run comprehensive benchmark
benchmark = BenchmarkSuite()
results = benchmark.evaluate_all(X, embeddings, y_labels=y)

# Generate report
report = benchmark.generate_report(results)
report.save('benchmark_results.html')

Custom Benchmarking

from src.benchmarking import StructurePreservation, ClusteringQuality

# Custom structure preservation analysis
structure_metrics = StructurePreservation()
local_quality = structure_metrics.local_structure_preservation(X, embeddings)
global_quality = structure_metrics.global_structure_preservation(X, embeddings)

# Custom clustering analysis
clustering_metrics = ClusteringQuality()
silhouette_scores = clustering_metrics.silhouette_analysis(embeddings, y)

Advanced Dimension Reduction Methods

The toolkit now includes several advanced dimension reduction algorithms:

Kernel PCA

# Non-linear PCA using different kernel functions
kpca_rbf = reducer.fit_transform(X, method='kernel_pca', kernel='rbf', gamma=0.1)
kpca_poly = reducer.fit_transform(X, method='kernel_pca', kernel='poly', degree=3)

Manifold Learning

# ISOMAP - preserves geodesic distances
isomap_result = reducer.fit_transform(X, method='isomap', n_neighbors=10)

# Locally Linear Embedding - preserves local neighborhood relationships
lle_result = reducer.fit_transform(X, method='lle', n_neighbors=10, reg=1e-3)

# Laplacian Eigenmaps - spectral embedding based on graph Laplacian
le_result = reducer.fit_transform(X, method='laplacian_eigenmap', n_neighbors=10)

Deep Learning Methods

# Variational Autoencoder with custom architecture
vae_result = reducer.fit_transform(
    X, 
    method='vae', 
    hidden_dims=[128, 64], 
    epochs=100, 
    batch_size=32
)

# Autoencoder with custom parameters
ae_result = reducer.fit_transform(
    X, 
    method='autoencoder', 
    hidden_dims=[128, 64], 
    epochs=100
)

Algorithm Comparison

# Compare all available methods
methods = ['pca', 'kernel_pca', 'isomap', 'lle', 'laplacian_eigenmap', 'tsne', 'umap', 'vae']
results = reducer.compare_methods(X, methods=methods, n_components=2)

# Get detailed comparison with custom parameters
method_params = {
    'kernel_pca': {'kernel': 'rbf', 'gamma': 0.1},
    'isomap': {'n_neighbors': 15},
    'lle': {'n_neighbors': 15, 'reg': 1e-2},
    'laplacian_eigenmap': {'n_neighbors': 15, 'affinity': 'rbf'},
    'tsne': {'perplexity': 30, 'n_iter': 1000},
    'umap': {'n_neighbors': 15, 'min_dist': 0.1},
    'vae': {'epochs': 50, 'batch_size': 64}
}

detailed_results = reducer.compare_methods(
    X, 
    methods=methods, 
    n_components=2, 
    method_params=method_params
)

If you use this toolkit in your research, please cite:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •