Proposes TASTE, an automatic pipeline that synthesizes challenging agent benchmark tasks by sampling and evolving valid tool-sequence patterns; uses an adaptive contrastive n-gram model and LLM validity judgments to produce τ^c-Bench with broader tool-use coverage and higher difficulty.
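
The summary does not spell out the model, so the following is only a minimal sketch of how a contrastive n-gram sampler with validity feedback might work. It assumes bigrams over tool names, additive smoothing, and a sampling weight equal to the ratio of valid to invalid counts. None of these choices is confirmed as TASTE's actual formulation. `ContrastiveBigram`, `toy_judge`, `adapt`, and the tool names are hypothetical, and `toy_judge` is a rule-based stand-in for the LLM validity judgment.

```python
import random
from collections import defaultdict


class ContrastiveBigram:
    """Toy contrastive bigram model over tool names (illustrative assumption)."""

    def __init__(self, tools, smoothing=1.0):
        self.tools = list(tools)
        self.smoothing = smoothing
        self.pos = defaultdict(float)  # bigram counts from sequences judged valid
        self.neg = defaultdict(float)  # bigram counts from sequences judged invalid

    def update(self, sequence, is_valid):
        """Add the sequence's bigrams to the valid or invalid count table."""
        table = self.pos if is_valid else self.neg
        for bigram in zip(["<s>"] + list(sequence), sequence):
            table[bigram] += 1.0

    def weight(self, prev, tool):
        """Smoothed ratio of valid to invalid counts for the bigram (prev, tool)."""
        a = self.smoothing
        return (self.pos[(prev, tool)] + a) / (self.neg[(prev, tool)] + a)

    def sample(self, length, rng):
        """Sample a tool sequence, favoring transitions seen in valid sequences."""
        seq, prev = [], "<s>"
        for _ in range(length):
            weights = [self.weight(prev, t) for t in self.tools]
            prev = rng.choices(self.tools, weights=weights, k=1)[0]
            seq.append(prev)
        return seq


def toy_judge(sequence):
    """Hypothetical stand-in for an LLM validity judgment:
    'cancel_order' is only valid after 'get_user' has been called."""
    seen_user = False
    for tool in sequence:
        if tool == "get_user":
            seen_user = True
        elif tool == "cancel_order" and not seen_user:
            return False
    return True


def adapt(model, judge, rounds, length, rng):
    """Sample, judge, and feed each judgment back into the model's counts."""
    valid = []
    for _ in range(rounds):
        seq = model.sample(length, rng)
        ok = judge(seq)
        model.update(seq, ok)
        if ok:
            valid.append(seq)
    return valid


if __name__ == "__main__":
    rng = random.Random(0)
    model = ContrastiveBigram(["get_user", "list_orders", "cancel_order"])
    patterns = adapt(model, toy_judge, rounds=200, length=3, rng=rng)
    print(len(patterns), "valid patterns collected")
```

In this sketch the model becomes "adaptive" only in the sense that every judgment updates the counts used for the next sample, so transitions that repeatedly appear in rejected sequences are drawn less often over time.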