Explore by tags
Provides NumPy-compatible array operations with composable program transformations: automatic differentiation, JIT compilation to XLA, and vmap/pmap for vectorization and parallel execution. Optimized for GPUs/TPUs and widely used for research and large-scale model training.
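To make "composable program transformations" concrete, here is a toy sketch in plain Python. It is not the library's API or implementation: `Dual`, `grad`, `vmap`, and `cubic` are hypothetical names chosen for illustration, differentiation is done with forward-mode dual numbers, and `vmap` here is a plain loop rather than true vectorization. The point is only that each transformation takes a function and returns a function, so they can be stacked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def grad(f):
    """Transform f into a function that returns df/dx at a point."""
    def df(x):
        return f(Dual(x, 1.0)).der
    return df


def vmap(f):
    """Transform f into a function that maps over a batch (toy: a loop)."""
    def batched(xs):
        return [f(x) for x in xs]
    return batched


def cubic(x):
    return x * x * x + 2 * x


# Transformations compose: a batched derivative of cubic.
batched_slope = vmap(grad(cubic))
print(batched_slope([0.0, 1.0, 2.0]))  # [2.0, 5.0, 14.0]
```

Because `grad` and `vmap` both return ordinary functions, their order and nesting are up to the caller, which is the property the description refers to.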
High-throughput inference and serving engine for LLMs that reduces KV-cache memory use with PagedAttention. Provides continuous batching, CUDA/HIP acceleration, Hugging Face integration, quantization, and an OpenAI-compatible API for production LLM serving.
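The memory saving from paging a KV cache can be sketched with a toy block table in plain Python. This is an illustration of the general idea only, not the engine's code: `PagedKVCache`, `append_token`, and `free` are hypothetical names, and the sketch tracks block ownership without storing any tensors. Each sequence gets fixed-size blocks on demand instead of one contiguous region sized for the maximum length.

```python
class PagedKVCache:
    """Toy allocator: maps each sequence to a list of fixed-size blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one token; return (physical_block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:
            # No block yet, or the last block is full: take a new one.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")   # 3 tokens -> 2 blocks
cache.append_token("b")       # 1 token  -> 1 block
blocks_in_use = 4 - len(cache.free_blocks)
cache.free("a")               # blocks return to the pool immediately
```

At most one partially filled block per sequence is wasted, and freed blocks can be reused by any other sequence, which is what lets a continuous-batching scheduler admit new requests as old ones finish.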
Scans LLMs for security and safety failures, probing for hallucination, data leakage, prompt injection, jailbreaks, toxicity, and misinformation. A CLI red-teaming kit that runs static, dynamic, and adaptive probes across many providers and outputs structured JSONL reports.
