sanketny8/README.md
Typing SVG

🚀 Designing scalable distributed systems & production-grade AI platforms

Twitter LinkedIn Portfolio



Profile Views GitHub Followers GitHub Stars


🚀 About Me

I'm a Senior AI Engineer & Backend Architect specializing in building high-scale distributed systems and production-grade GenAI platforms. My expertise lies in bridging the gap between cutting-edge AI research and robust, scalable backend engineering.

🔭 What I'm Currently Focused On

  • 🏗️ Designing cloud-native, event-driven backends for large-scale LLM and generative AI workloads.
  • ⚡ Building low-latency, fault-tolerant distributed systems with smart batching, async I/O, and backpressure-aware routing.
  • 📦 Orchestrating GPU/CPU workloads on Kubernetes with autoscaling, bin-packing, and workload-aware scheduling.
  • 🤖 Developing agentic AI backends for multi-agent orchestration, tool use, and long-running workflows with reliable state.
  • 🧠 Implementing LLM serving infrastructure (streaming APIs, KV cache reuse, vLLM/TensorRT-LLM, quantization) for high throughput.
  • 🎯 Applying system design patterns for AI to production inference stacks.
  • 🔍 Building end-to-end observability for latency, error budgets, drift, and GPU utilization (metrics, tracing, structured logs, SLOs).
  • 💰 Engineering cost-efficient GPU infrastructure with autoscaling, right-sizing, spot capacity, and usage-based metering.
  • 🔐 Hardening AI systems for security and abuse (authn/z, rate limits, prompt injection defenses, secure data paths).
  • 🚀 Automating CI/CD and infrastructure-as-code for AI services using containers, GitOps, and Terraform-style workflows.

💡 My Philosophy

I believe in building systems that scale, sharing knowledge openly, and treating infrastructure as code. Every system I design is built for resilience, observability, and performance.

📫 Want to collaborate?

Open to: System Design discussions • Backend architecture • Open source collaboration • Speaking opportunities • Code reviews


💡 Core Competencies

🏗️ Distributed Systems & Backend Architecture

System Design Scalability High Perf Resilience

🤖 LLMs & GenAI Expertise

RAG Fine-tuning Agents VectorDB


🛠️ Technical Expertise

🧠 AI & GenAI Stack


Frameworks: Transformers • LlamaIndex • LangGraph
Inference: vLLM • TGI • ONNX • Triton • TensorRT
Training: PEFT • DeepSpeed • FSDP • bitsandbytes
Models: GPT-4 • Claude 3 • Llama 3 • Mistral

⚙️ Backend & Distributed Systems


Core: Microservices • Event-Driven • CQRS • DDD
Messaging: NATS • Redis Streams • RabbitMQ
Protocol: gRPC • GraphQL • REST • WebSockets
Observability: OpenTelemetry • Prometheus • Jaeger

☁️ Cloud & MLOps


Serving: KServe • Ray • vLLM Operator • TorchServe
GPU Ops: NVIDIA Operator • DCGM • MIG • MPS
GitOps: ArgoCD • Flux • Helm • GitHub Actions
Clouds: AWS • GCP • Azure • Lambda Labs

🗄️ Data Engineering


Vector DBs: Pinecone • Milvus • Chroma • pgvector
Databases: MongoDB • Elasticsearch • DynamoDB
Processing: Spark • Airflow • dbt • Kafka Streams
Storage: S3 • MinIO • Delta Lake • Iceberg

📊 GitHub Statistics & Activity

Activity Graph

🎯 Featured Projects

Stars Forks

Production-grade RAG platform with advanced chunking, hybrid search, and multi-LLM support.

🎯 Key Features:

  • ⚡ Hybrid retrieval with cross-encoder reranking
  • 🔀 Multi-LLM router (OpenAI, Anthropic, local)
  • 📊 Comprehensive evaluation (RAGAS)
  • 💰 Cost optimization & semantic caching

🛠️ Tech: Python • FastAPI • LangChain • Weaviate • vLLM
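
The hybrid retrieval feature above can be illustrated with a small sketch. The README does not say how this project merges keyword and vector results, so the example swaps in reciprocal rank fusion (RRF), a common way to combine ranked lists. The function name, the document IDs, and the constant `k=60` are all assumptions for illustration, and the cross-encoder reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is a conventional default, assumed here rather than taken
    from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever. In a fuller pipeline, a reranker would then rescore the top of this fused list.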

Stars Forks

Multi-agent system with agentic AI patterns, tool use, planning & orchestration.

🎯 Key Features:

  • 🧠 ReAct pattern with self-reflection
  • 🔧 Dynamic tool registry
  • 💾 Multi-tier memory system
  • 🛡️ Sandboxed execution

🛠️ Tech: Python • LangGraph • GPT-4 • Claude • Weaviate

Stars

End-to-end platform for fine-tuning LLMs with experiment tracking & deployment.

🎯 Key Features:

  • 🔧 LoRA/QLoRA fine-tuning
  • 📊 Comprehensive evaluation
  • 📈 MLflow + W&B tracking
  • 🚀 Auto-deployment to vLLM

🛠️ Tech: PyTorch • Transformers • PEFT • MLflow • vLLM

Stars

High-performance Go backend for LLM routing, caching & observability.

🎯 Key Features:

  • 🔀 Multi-provider routing
  • ⚡ Semantic caching
  • 🎫 Token-based rate limiting
  • 📊 Sub-10ms p99 latency

🛠️ Tech: Go • gRPC • Redis • PostgreSQL • OpenTelemetry
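
One way to read "token-based rate limiting" is a budget measured in LLM tokens rather than in requests. The sketch below is a minimal in-memory version of that idea, written in Python for brevity even though the project itself is in Go. The class name, parameters, and numbers are hypothetical, and nothing here is taken from the project's code. A production limiter would likely keep this state in a shared store such as Redis.

```python
import time


class TokenBudgetLimiter:
    """Bucket of LLM tokens that refills at a fixed rate (illustrative only)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.clock = clock  # injectable so the example is deterministic
        self.available = float(capacity)
        self.last_refill = clock()

    def try_consume(self, tokens):
        """Return True and deduct the cost if the budget covers `tokens`."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.available = min(
            self.capacity, self.available + elapsed * self.refill_per_second
        )
        self.last_refill = now
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False


# A fake clock stands in for real time in this walkthrough.
fake_now = [0.0]
limiter = TokenBudgetLimiter(
    capacity=1000, refill_per_second=100, clock=lambda: fake_now[0]
)

first = limiter.try_consume(800)   # fits within the initial 1000-token budget
second = limiter.try_consume(300)  # rejected: only 200 tokens remain
fake_now[0] += 1.0                 # one second passes, 100 tokens refill
third = limiter.try_consume(300)   # accepted: 300 tokens are now available
print(first, second, third)
```

Charging by token count instead of request count keeps one large prompt from costing the same as one small prompt, which matters when providers bill per token.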

๐Ÿ—๏ธ GPU Kubernetes Platform

Stars

Production K8s infrastructure optimized for GPU workloads & model serving.

๐ŸŽฏ Key Features:

  • ๐ŸŽฎ GPU node pools (T4, A10G, A100)
  • ๐Ÿš€ KServe + vLLM operator
  • ๐Ÿ“Š DCGM monitoring
  • ๐Ÿ”„ GitOps with ArgoCD

๐Ÿ› ๏ธ Tech: Terraform โ€ข Kubernetes โ€ข Helm โ€ข ArgoCD โ€ข KServe

📚 More Projects

🔗 Explore all my repositories:

GitHub Repos

⭐ Additional Projects:

  • Vector DB Benchmarks
  • LLM Evaluation Framework
  • Prompt Engineering Library
  • AI Cost Optimizer

๐Ÿ“ Latest Blog Posts


โšก Recent GitHub Activity


📈 Learning Journey & Current Focus

🎯 Current Focus Areas

Building Optimizing Exploring Researching


🔬 Research Implementation

Graph RAG HyDE Reflexion Constitutional AI

📚 Continuous Learning

  • 🔬 Experimenting with cutting-edge techniques (Graph RAG, Corrective RAG, Constitutional AI)
  • 📝 Writing about lessons learned building GenAI systems at scale
  • 🛠️ Contributing to open-source AI projects (vLLM, LangChain, Transformers)
  • 🎓 Implementing recent AI research papers (RAPTOR, HyDE, Reflexion)
  • 💬 Sharing insights on prompt engineering, RAG optimization, and LLMOps

📫 Let's Connect!

💬 I'm always open to interesting conversations and collaborations!

Collaborations Discussions Community


๐ŸŒ Find Me On

Twitter LinkedIn Portfolio



๐Ÿ’ Support My Work

โญ Star repositories you find useful โ€ข ๐Ÿ“ข Share projects with your network โ€ข ๐Ÿค Contribute to open-source โ€ข ๐Ÿ’ฌ Connect for collaborations


"Building the future of AI, one production system at a time"

Made with Love Built for Production

โญ๏ธ From sanketny8 with ๐Ÿ’œ

Pinned

  1. llm-engineering-fundamentals (Public)

    Production-ready transformer implementation from scratch: 10 complete projects covering tokenization, positional embeddings, attention, transformers, and more. 148 passing tests, modern techniques …

    Python