Privacy-first, self-hosted personal knowledge manager with block-level references, WYSIWYG Markdown editing and fast handling of large documents. Offers local-first storage, OpenAI-based AI writing and Q&A, OCR, mobile apps and Docker deployment.
Pocket-sized multimodal LLM for efficient image and video understanding on mobile and edge devices, featuring mixed 4x/16x visual-token compression (MiniCPM-V 4.6), compact 1.3B-parameter variants, and ready-made deployment guides for iOS, Android and HarmonyOS.
Continuously captures your screen and spoken conversations, transcribes them in real time, generates summaries and action items, and provides a memory-backed chat that can retrieve what you've seen and heard. Works across desktop, mobile and wearable devices, with local SDKs and cloud sync.
Lets LLMs and agents control iOS and Android emulators, simulators and real devices via MCP. Runs as a local MCP server that exposes structured accessibility snapshots plus screenshot-based tools for tapping, typing, swiping and data extraction.
Cross-platform AI client for web, desktop, and mobile that lets teams choose their model providers, run inference locally or on-premises, and keep data self-hosted. Aimed at enterprises that want to self-deploy and avoid vendor lock-in.
Provides 22 scripts that let Claude Code build, test, and interact with iOS apps by wrapping xcodebuild and controlling the simulator via simctl/idb. Uses accessibility-driven UI navigation, progressive build summaries, and compressed screenshots to reduce token usage and brittleness for AI agents and developers.
Delivers multilingual, on-device text-to-speech via ONNX Runtime, with prebuilt ONNX assets and cross-platform SDKs (Python, Node, mobile). Targets low-latency, privacy-preserving TTS, with ready-to-run demos and support for 31 languages in v3.
Uses AI to generate production-ready App Store and Google Play screenshots from app metadata and style preferences. Scaffolds a Next.js project, composes ad-style slides with localization and right-to-left (RTL) support, and exports PNGs at all required Apple and Google resolutions.
Orchestrates parallel CLI-based AI agents in isolated git worktrees so you can run multiple coding agents side by side, review AI-generated diffs, and link PRs and CI runs to each worktree. Desktop client with a mobile companion app and bring-your-own (BYO) model subscriptions.
Delivers an ultra-efficient multimodal model that turns images and video into text, optimized for on-device and edge deployment. Uses mixed 4x/16x visual-token compression, a low-FLOPs visual encoder, and multiple quantized variants for mobile and embedded inference.
Open egocentric multimodal dataset for embodied AI and robot learning, captured on commodity iPhone Pro devices: ~200 hours and ~10M RGB frames with LiDAR depth, ARKit 6-DoF poses, IMU data, two-hand MANO motion capture, room meshes, and hierarchical action captions.