GGT-100K provides 100,000 generated low-quality↔high-quality (LQ–HQ) image pairs, created with modern multi-frame/multi-modal models (MFMs) to improve the generalization of image restoration methods. It includes train/test JSONL lists, baseline training code, and pretrained checkpoints, and is released under CC BY‑NC‑ND 4.0.
Provides ~1M synthetic Salvadoran‑Spanish personas (148k records, ~300M tokens) grounded in 2024 census distributions for demographics, occupations and locations; intended for training/evaluating localized LLMs and synthetic-data workflows. CC BY 4.0, adults only.
Why this matters
Restoration models often struggle on real-world images because the degradations in their training pairs rarely match the variety of degradations that occur naturally. GGT-100K addresses that gap by using modern multi-frame/multi-modal models (MFMs) to synthesize realistic LQ–HQ pairs at scale, with the goal that restoration models trained on these pairs generalize better to unseen real degradations.
Great fit if you train or evaluate image restoration models and need broader coverage of real-world degradations to improve generalization. It is useful for researchers benchmarking state-of-the-art restorers and for engineers who want ready-made training splits and checkpoints. Look elsewhere if you need a license that permits commercial use (CC BY‑NC‑ND forbids commercial use and the distribution of derivative works) or if you need pixel-perfect, human-photographed ground truth rather than model-generated ground truth.
Use GGT-100K as an augmentation or additional training corpus alongside existing real or synthetic datasets when assessing robustness to diverse degradations. It complements traditional datasets by providing generative ground truth derived from MFMs. Its contribution is easiest to judge when a model trained with GGT-100K is compared side by side with the same model trained without it.
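As a rough illustration of the augmentation workflow described above, the sketch below reads a JSONL pair list and mixes a share of it into an existing training list. The field names `lq` and `hq`, the file name in the usage note, and the helpers `load_pairs` and `mix_pairs` are all hypothetical; the actual schema of the GGT-100K JSONL lists is not described here, so check the released files and adjust the keys before using anything like this.

```python
import json
import random
from pathlib import Path


def load_pairs(jsonl_path, lq_key="lq", hq_key="hq"):
    """Read one (LQ path, HQ path) tuple per non-empty JSONL line.

    lq_key and hq_key are assumed field names, not the documented schema.
    """
    pairs = []
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            pairs.append((record[lq_key], record[hq_key]))
    return pairs


def mix_pairs(base_pairs, extra_pairs, extra_fraction=0.5, seed=0):
    """Return base_pairs plus a random sample of extra_pairs, shuffled.

    extra_fraction is the share of extra_pairs to add (0.0 to 1.0).
    A fixed seed keeps the mix reproducible across runs.
    """
    if not 0.0 <= extra_fraction <= 1.0:
        raise ValueError("extra_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_extra = round(len(extra_pairs) * extra_fraction)
    mixed = list(base_pairs) + rng.sample(list(extra_pairs), n_extra)
    rng.shuffle(mixed)
    return mixed
```

For a side-by-side comparison, one option is to train the same model twice with the same seed: once on the existing list alone, and once on `mix_pairs(existing, load_pairs("train.jsonl"))`, where `train.jsonl` stands in for whichever training list the release actually ships. Both runs are then evaluated on the same held-out real-world test set.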