

StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning

Python 3.8+ · PyTorch 1.12+ · License: MIT · WWW '26

πŸ“’ News

  • [April 2026] StreamFP has been accepted to The Web Conference 2026 (WWW '26)!

πŸ“– Overview

StreamFP is a novel stream learning framework designed to handle non-stationary data streams with high efficiency and robustness against catastrophic forgetting. It introduces learnable fingerprintsβ€”compact parameter vectors that summarize the model stateβ€”to guide data selection processes.

Key challenges in Stream Learning (SL) addressed by StreamFP:

  1. Data Redundancy: Incoming streams often contain redundant data that wastes computation.
  2. Catastrophic Forgetting: Incremental updates can overwrite earlier knowledge.
  3. Efficiency: Traditional model-based selection is often too computationally expensive for real-time streams.

StreamFP achieves superior accuracy and efficiency compared to state-of-the-art methods (e.g., Camel, ER, GradMatch) across varying data arrival rates.

πŸš€ Methodology

StreamFP consists of three key components driven by a shared set of learnable fingerprints:

*Figure: overview of the StreamFP framework.*

  1. Fingerprint-based Coreset Selection (FCS): Selects informative samples from incoming batches based on fingerprint similarity, prioritizing data that balances novelty and familiarity.
  2. Fingerprint-based Buffer Update (FBU): Dynamically maintains the replay buffer by preserving representative historical samples and discarding redundant ones.
  3. Fingerprint Attunement (FA): A lightweight plugin that uses pre-trained ViT attention to calibrate fingerprints online with negligible overhead.
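To make the FCS idea concrete, here is a minimal PyTorch sketch of fingerprint-guided selection. It is a hypothetical illustration, not the repository's implementation: the function name `select_coreset`, the median-distance criterion, and the tensor shapes are all assumptions chosen to show how cosine similarity between sample features and a small set of fingerprint vectors can rank an incoming batch.

```python
import torch

def select_coreset(features, fingerprints, k):
    """Hypothetical sketch of fingerprint-based coreset selection.

    Scores each incoming sample by its maximum cosine similarity to the
    fingerprints, then keeps the k samples whose scores sit closest to the
    batch median, i.e. neither redundant (too familiar) nor outliers
    (too novel). The actual FCS criterion is defined in the paper.
    """
    # Normalize so dot products become cosine similarities.
    f = torch.nn.functional.normalize(features, dim=1)
    p = torch.nn.functional.normalize(fingerprints, dim=1)
    sims = f @ p.T                      # (batch_size, num_fingerprints)
    scores = sims.max(dim=1).values     # familiarity score per sample
    # Keep the samples nearest the median familiarity score.
    dist = (scores - scores.median()).abs()
    idx = torch.topk(dist, k, largest=False).indices
    return idx

batch = torch.randn(32, 128)   # incoming feature batch (illustrative sizes)
fps = torch.randn(10, 128)     # fingerprints (an nn.Parameter in practice)
chosen = select_coreset(batch, fps, k=8)
print(chosen.shape)            # torch.Size([8])
```

Because only a similarity matrix against a handful of fingerprint vectors is computed, selection stays cheap relative to model-based scoring of every sample.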

πŸ› οΈ Installation

Prerequisites

  • Linux or macOS
  • Python 3.8+
  • PyTorch 1.12+ and CUDA 11.3+

Setup

```bash
# Clone the repository
git clone https://github.com/CGCL-codes/StreamFP.git
cd StreamFP

# Create and activate conda environment
conda env create -f environment.yml
conda activate sl

# (Optional) Install FastMoE (main path: build without NCCL)
# NOTE: FastMoE builds a CUDA extension. If you see errors like "nccl.h: No such file or directory",
# you can build without NCCL by setting USE_NCCL=0 (recommended unless you need NCCL-based distributed comm).
conda install -y cmake ninja

git clone --recursive https://github.com/laekov/fastmoe.git
cd fastmoe

# Option 1: build without NCCL (distributed features disabled)
USE_NCCL=0 python setup.py install

# Option 2: build with NCCL (distributed features enabled)
python setup.py install

# Quick check
python -c "import fmoe, fmoe_cuda; print('FastMoE installed:', fmoe_cuda.__file__)"
cd ..
```

πŸ“‚ Datasets

Create a data/ directory in the project root.

Download the datasets and extract them into the corresponding dataset folders under data/.

```bash
sh core50.sh
```

⚑ Quick Start

Basic Usage

To run a standard experiment, use the scripts provided in experiments/:

```bash
# Run Clear10 experiment
sh experiments/clear10.sh

# Run Clear100 experiment
sh experiments/clear100.sh

# Run Core50 experiment
sh experiments/core50.sh

# Run Stream-51 experiment
sh experiments/stream51.sh
```

Custom Configuration

You can customize the training by modifying the arguments in run.py. Key arguments include:

  • --selection_method: Strategy for coreset selection (e.g., StreamFP, Camel, Random).
  • --update_method: Strategy for buffer update (e.g., StreamFP, ER, GSS).
  • --skip_batch: Enable batch skipping for high-speed streams (default: 1).
    • 0: no skipping (process every batch)
    • k > 0: after processing one batch, skip the next k batches (reduces processing frequency)
  • --traintime_limit: Per-batch training time budget to simulate real-time constraints.
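The `--skip_batch` semantics above can be sketched as a simple stream loop. This is an illustrative sketch only: the function `run_stream`, its signature, and the `deadline` convention are assumptions, not `run.py`'s actual control flow.

```python
import time

def run_stream(batches, skip_batch=1, traintime_limit=10.0, train_step=None):
    """Illustrative sketch of batch skipping under a time budget.

    After training on one batch, the next `skip_batch` batches are dropped;
    each training step is also handed a wall-clock deadline so it can stop
    within `traintime_limit` seconds (simulating real-time constraints).
    """
    processed, to_skip = [], 0
    for i, batch in enumerate(batches):
        if to_skip > 0:
            to_skip -= 1          # high-speed stream: drop this batch
            continue
        start = time.time()
        if train_step is not None:
            train_step(batch, deadline=start + traintime_limit)
        processed.append(i)
        to_skip = skip_batch      # skip the next k batches
    return processed

# With skip_batch=1 (the default), every other batch is trained on.
print(run_stream(range(6), skip_batch=1))  # [0, 2, 4]
```

Setting `skip_batch=0` processes every batch, while larger values trade accuracy for throughput when the arrival rate exceeds the training rate.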

Example command:

```bash
python -u run.py --config configs/clear10.yaml \
  --repeat 1 --overwrite 1 \
  --selection_method StreamFP --update_method StreamFP \
  --mem_size 102 --traintime_limit 10
```

πŸ“Š Results

StreamFP consistently outperforms baselines in both Accuracy and Forgetting metrics. Below is a comparison on Stream-51 and Clear10 datasets:

| Dataset   | Method   | Accuracy (%) | Forgetting (%) | Runtime (s) |
|-----------|----------|--------------|----------------|-------------|
| Stream-51 | ER       | 59.99        | 3.70           | 1883.75     |
| Stream-51 | StreamFP | 64.44        | 2.25           | 2049.52     |
| Clear10   | ER       | 51.90        | 1.09           | 412.50      |
| Clear10   | StreamFP | 54.94        | 0.82           | 448.80      |

Detailed results can be found in the results_log/ directory after training.
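For readers unfamiliar with the forgetting column, here is a sketch of the standard average-forgetting metric from the continual-learning literature: the drop from each task's best past accuracy to its final accuracy, averaged over earlier tasks. This assumes the conventional definition; the paper's exact evaluation protocol may differ.

```python
def average_forgetting(acc):
    """acc[i][j]: accuracy on task j after training on task i (i >= j).

    Average forgetting is the mean, over all but the last task, of
    (best accuracy achieved on that task so far) minus (its final accuracy).
    Standard definition; illustrative only.
    """
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best_past = max(acc[i][j] for i in range(j, T - 1))
        drops.append(best_past - acc[T - 1][j])
    return sum(drops) / len(drops)

# Toy accuracy matrix for three sequential tasks:
acc = [[0.80],
       [0.78, 0.70],
       [0.75, 0.72, 0.68]]
print(average_forgetting(acc))
```

Lower values mean later updates erased less of the knowledge acquired on earlier tasks, which is why StreamFP's smaller forgetting numbers matter alongside its accuracy gains.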

πŸ“œ Citation

If you find this work useful for your research, please cite our WWW '26 paper:

```bibtex
@inproceedings{li2026streamfp,
  title={StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning},
  author={Li, Changwu and Shi, Tongjun and Zhang, Shuhao and Chen, Binbin and He, Bingsheng and Liao, Xiaofei and Jin, Hai},
  booktitle={Proceedings of the ACM Web Conference 2026 (WWW '26)},
  year={2026},
  publisher={ACM},
  address={Dubai, United Arab Emirates},
  doi={10.1145/3774904.3792584}
}
```

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

