How does DeepSeek R1's performance on math-heavy benchmarks compare to GPT-4o?