I'm a Senior AI Engineer & Backend Architect specializing in building high-scale distributed systems and production-grade GenAI platforms. My expertise lies in bridging the gap between cutting-edge AI research and robust, scalable backend engineering.
- 🏗️ Designing cloud-native, event-driven backends for large-scale LLM and generative AI workloads.
- ⚡ Building low-latency, fault-tolerant distributed systems with smart batching, async I/O, and backpressure-aware routing.
- 📦 Orchestrating GPU/CPU workloads on Kubernetes with autoscaling, bin-packing, and workload-aware scheduling.
- 🤖 Developing agentic AI backends for multi-agent orchestration, tool use, and long-running workflows with reliable state.
- 🧠 Implementing LLM serving infrastructure (streaming APIs, KV cache reuse, vLLM/TensorRT-LLM, quantization) for high throughput.
- 🎯 Applying AI-specific system design patterns to production inference stacks.
- 📊 Building end-to-end observability for latency, error budgets, drift, and GPU utilization (metrics, tracing, structured logs, SLOs).
- 💰 Engineering cost-efficient GPU infrastructure with autoscaling, right-sizing, spot capacity, and usage-based metering.
- 🔐 Hardening AI systems against security threats and abuse (authn/z, rate limits, prompt-injection defenses, secure data paths).
- 🚀 Automating CI/CD and infrastructure-as-code for AI services using containers, GitOps, and Terraform-style workflows.
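As a concrete illustration of the batching-and-backpressure pattern mentioned above, here is a minimal, self-contained sketch (the class name and parameters are illustrative, not taken from any specific production system): requests queue up, a worker flushes them in bounded micro-batches, and a full queue rejects new work instead of buffering unboundedly.

```python
import asyncio

class MicroBatcher:
    """Collects single requests into bounded batches; a full queue
    signals backpressure by rejecting instead of buffering forever."""

    def __init__(self, batch_fn, max_batch=8, max_wait=0.01, max_queue=64):
        self.batch_fn = batch_fn      # processes a list of inputs at once
        self.max_batch = max_batch    # flush when this many are pending...
        self.max_wait = max_wait      # ...or after this many seconds
        self.queue = asyncio.Queue(maxsize=max_queue)

    async def submit(self, item):
        fut = asyncio.get_running_loop().create_future()
        try:
            self.queue.put_nowait((item, fut))  # backpressure: fail fast
        except asyncio.QueueFull:
            raise RuntimeError("overloaded, retry later")
        return await fut

    async def run(self):
        while True:
            # Block for the first item, then opportunistically fill the batch.
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            results = self.batch_fn([item for item, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)

async def demo():
    # Stand-in for a batched model call: double every input.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs])
    worker = asyncio.create_task(batcher.run())
    out = await asyncio.gather(*(batcher.submit(i) for i in range(5)))
    worker.cancel()
    return out

print(asyncio.run(demo()))  # [0, 2, 4, 6, 8]
```

In a real serving stack the `batch_fn` would be a GPU-bound model call, and the rejection path would surface as an HTTP 429 so clients can retry with backoff.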
I believe in building systems that scale, sharing knowledge openly, and treating infrastructure as code. Every system I design is built for resilience, observability, and performance.
📫 Want to collaborate?
Open to: System design discussions • Backend architecture • Open-source collaboration • Speaking opportunities • Code reviews
- **Frameworks:** Transformers • LlamaIndex • LangGraph
- **Inference:** vLLM • TGI • ONNX • Triton • TensorRT
- **Training:** PEFT • DeepSpeed • FSDP • bitsandbytes
- **Models:** GPT-4 • Claude 3 • Llama 3 • Mistral
- **Core:** Microservices • Event-Driven • CQRS • DDD
- **Messaging:** NATS • Redis Streams • RabbitMQ
- **Protocols:** gRPC • GraphQL • REST • WebSockets
- **Observability:** OpenTelemetry • Prometheus • Jaeger
- **Serving:** KServe • Ray • vLLM Operator • TorchServe
- **GPU Ops:** NVIDIA Operator • DCGM • MIG • MPS
- **GitOps:** ArgoCD • Flux • Helm • GitHub Actions
- **Clouds:** AWS • GCP • Azure • Lambda Labs
- **Vector DBs:** Pinecone • Milvus • Chroma • pgvector
- **Databases:** MongoDB • Elasticsearch • DynamoDB
- **Processing:** Spark • Airflow • dbt • Kafka Streams
- **Storage:** S3 • MinIO • Delta Lake • Iceberg
- Production-grade RAG platform with advanced chunking, hybrid search, and multi-LLM support.
  🛠️ Tech: Python • FastAPI • LangChain • Weaviate • vLLM
- Multi-agent system with agentic AI patterns, tool use, planning, and orchestration.
  🛠️ Tech: Python • LangGraph • GPT-4 • Claude • Weaviate
- End-to-end platform for fine-tuning LLMs with experiment tracking and deployment.
  🛠️ Tech: PyTorch • Transformers • PEFT • MLflow • vLLM
- High-performance Go backend for LLM routing, caching, and observability.
  🛠️ Tech: Go • gRPC • Redis • PostgreSQL • OpenTelemetry
- 🏗️ GPU Kubernetes Platform: production K8s infrastructure optimized for GPU workloads and model serving.
  🛠️ Tech: Terraform • Kubernetes • Helm • ArgoCD • KServe
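The gateway project above is written in Go, but the routing-plus-caching idea behind it can be sketched in a few lines of Python (backend names, the cache policy, and the length-based routing heuristic here are illustrative assumptions, not the project's actual design): hash the prompt, serve exact-match cache hits, and route misses to a backend.

```python
import hashlib

class LLMRouter:
    """Tiny illustration: cache exact-prompt hits, route misses by size."""

    def __init__(self, backends):
        self.backends = backends  # name -> callable(prompt) -> completion
        self.cache = {}
        self.hits = 0

    def _key(self, model, prompt):
        # Hash the (model, prompt) pair so cache entries never collide
        # across backends.
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def route(self, prompt):
        # Toy heuristic: long prompts go to the large-context backend.
        return "large" if len(prompt) > 100 else "small"

    def complete(self, prompt):
        model = self.route(prompt)
        key = self._key(model, prompt)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        out = self.backends[model](prompt)
        self.cache[key] = out
        return out

# Stub backends stand in for real model servers.
router = LLMRouter({
    "small": lambda p: f"small:{p[:10]}",
    "large": lambda p: f"large:{p[:10]}",
})
print(router.complete("hello"))  # small:hello
print(router.complete("hello"))  # served from cache, same answer
print(router.hits)               # 1
```

A production version would add TTLs and semantic (embedding-based) cache keys, plus per-route metrics — exact-match caching alone only helps with repeated identical prompts.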
🔍 Explore all my repositories for additional projects.
- 🔬 Experimenting with cutting-edge techniques (Graph RAG, Corrective RAG, Constitutional AI)
- 📝 Writing about lessons learned building GenAI systems at scale
- 🛠️ Contributing to open-source AI projects (vLLM, LangChain, Transformers)
- 📚 Implementing recent AI research papers (RAPTOR, HyDE, Reflexion)
- 💬 Sharing insights on prompt engineering, RAG optimization, and LLMOps

⭐ Star repositories you find useful • 📢 Share projects with your network • 🤝 Contribute to open source • 💬 Connect for collaborations