Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-booting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

physics

pi

plugin

polars

postgres

privacy

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

retrieval

RL

robotics

rust

science

security

segmentation

shodan

skillkit

sora

speech

sqlite

ssh

stt

swe

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

web-search

windsurf

xAI

xai

SiYuan

siyuan-note (B3log)

Privacy-first, self-hosted personal knowledge manager with block-level references, Markdown WYSIWYG editing and large-document performance; offers local-first storage, OpenAI-based AI writing/Q&A integration, OCR, mobile apps and Docker deployment.

github typescript docker ocr ai-api +4

MiniCPM-V

OpenBMB, ModelBest +1

Pocket-sized multimodal LLM for efficient image and video understanding on mobile and edge devices, featuring mixed 4x/16x visual-token compression (MiniCPM‑V 4.6), compact 1.3B variants, and ready-made guides for iOS/Android/HarmonyOS deployment.

multimodal vision video LLM huggingface +5

Karakeep

Mohamed Bassem, Localhost Labs Ltd, karakeep-app (GitHub)

Self-hostable “bookmark everything” app for saving links, notes, images and PDFs with automatic fetching of previews, full-text search, OCR, and LLM-based automatic tagging and summarization (supports local models via ollama). Targets users who want AI-assisted organization in a self-hosted stack.

typescript LLM ollama openai ai-tools +9

omi

Continuously captures your screen and spoken conversations, transcribes them in real time, generates summaries and action items, and exposes a memory-backed chat that can retrieve what you've seen and heard. Works across desktop, mobile and wearable devices and supports local SDKs and cloud sync.

ai-client chatbot audio python rust +5

mobile-mcp

Gives an LLM agent direct control of iOS and Android apps over one MCP interface, across simulators, emulators, and real devices. Reads the native accessibility tree to pick elements deterministically, using screenshot coordinates only as fallback.

mcp mcp-server typescript nodejs github +4

Magenta RealTime 2

Magenta (Google)

A toolkit and open-weights system for real-time streaming music generation — offers two model sizes (230M / 2.4B), a Python inference library (JAX/MLX), and a C++ engine optimized for Apple Silicon for embedding into DAWs and apps; real-time streaming requires M‑series chips.

audio huggingface google python ai-inference +3

Thunderbolt

Thunderbird (Mozilla)

Cross‑platform AI client for web, desktop, and mobile that lets teams pick model providers, run local or on‑prem inference, and keep data self‑hosted — aimed at enterprise self‑deployment to avoid vendor lock‑in.

ai-client llm ollama docker android +5

OpenMed

Maziyar Panahi, OpenMed team

Turns clinical text into structured, de-identified clinical signals through entity extraction and PII de-identification, with all processing running on local hardware. Provides 1,000+ specialized medical NER models, multilingual support, Apple MLX acceleration, and Apache‑2.0 licensing.

python nlp huggingface privacy ios +3

iOS Simulator Skill for Claude Code

Provides 22 scripts that let Claude Code build, test, and interact with iOS apps by wrapping xcodebuild and controlling the simulator via simctl/idb. Uses accessibility-driven UI navigation, progressive build summaries, and compressed screenshots to cut token cost and fragility for AI agents and developers.

agent-skills claude-code claude ios gitHub +5

Supertonic

Delivers multilingual, on-device text-to-speech via ONNX Runtime with prebuilt ONNX assets and cross-platform SDKs (Python, Node, mobile); targets low-latency, privacy-preserving TTS with ready demos and 31-language support in v3.

audio speech multilingual huggingface python +8

Mindwtr

Cross-platform GTD task manager for desktop, mobile, and web that covers the full Capture→Clarify→Organize→Reflect→Engage workflow with a local-first data model and flexible sync backends. Optional BYOK AI copilot and automation (CLI, REST API, MCP) help clarify and break down tasks.

LLM mcp mcp-server mcp-client ai-tools +10

App Store & Google Play Screenshots Generator

Generates production-ready App Store and Google Play screenshots from app metadata and style preferences using AI. Scaffolds a Next.js project, composes ad-style slides with localized/RTL support, and exports PNGs at all required Apple and Google resolutions.

agent-skills ai-image AIGC github typescript +7