James Smith, P-Hacking (2021): The Method That Can “Prove” Anything

In 2011, a group of researchers conducted a scientific study to find an impossible result: that listening to certain songs can make you younger.
Their study involved real people, truthfully reported data, and commonplace statistical analyses.
So how did they do it?
The answer lies in a statistical method scientists often use to try to figure out whether their results mean something or are just random noise.
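conducted: v. organized; arranged; carried out; directed; led (past tense and past participle of conduct). scientific: adj. relating to science; systematic. involved: adj. concerned; connected; v. entailed (past tense and past participle of involve). truthfully: adv. honestly. commonplace: n. a platitude; an everyday thing; adj. ordinary; trite. statistical: adj. relating to statistics. analyses: n. detailed examinations; breakdowns (plural of analysis). random: adj. (math) governed by chance; arbitrary; haphazard; adv. haphazardly.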
In fact, the whole point of the music study was to point out ways this method can be misused.
A famous thought experiment explains the method: there are eight cups of tea, four with the milk added first, and four with the tea added first.
A participant must determine which are which according to taste.
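misused: v. used wrongly; abused; mistreated; n. wrong use; abuse. participant: adj. taking part; involved; n. a person who takes part. determine: v. decide; establish; measure; find out; shape; influence; rule on; arrange. according to: on the basis of; as stated by.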
There are 70 different ways the cups can be sorted into two groups of four, and only one is correct.
So, can she taste the difference?
That’s our research question.
To analyze her choices, we define what’s called a null hypothesis: that she can’t distinguish the teas.
If she can’t distinguish the teas, she’ll still get the right answer 1 in 70 times by chance.
1 in 70 is roughly .014.
That single number is called a p-value.
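The arithmetic behind the tea example can be checked directly. The short sketch below is an illustration added to the transcript, not part of the talk, and its variable names are hypothetical: it counts the ways to choose which four of the eight cups had the milk added first, then treats a fully correct guess as one outcome among them.

```python
from math import comb

# Ways to pick which 4 of the 8 cups had the milk added first.
total_cups = 8
milk_first_cups = 4
arrangements = comb(total_cups, milk_first_cups)

# Under the null hypothesis (she is only guessing), every arrangement is
# equally likely, and exactly one of them is the fully correct answer.
p_value = 1 / arrangements

print(arrangements)       # 70
print(round(p_value, 3))  # 0.014
```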
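analyze: v. examine in detail; break down (same as analyse). define: v. state the meaning of; make clear; specify. null hypothesis: n. the hypothesis of no effect (for example, that two groups tested separately give the same result). distinguish: vt. tell apart; recognize; make outstanding; vi. differentiate. by chance: accidentally; unexpectedly. roughly: adv. coarsely; approximately.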
In many fields, a p-value of .05 or below is considered statistically significant, meaning there’s enough evidence to reject the null hypothesis.
Based on a p-value of .014, researchers in those fields would rule out the null hypothesis that she can’t distinguish the teas.
Though p-values are commonly used by both researchers and journals to evaluate scientific results, they’re really confusing, even for many scientists.
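statistically: adv. in terms of statistics. significant: adj. important; meaningful; noteworthy; n. a symbol; something meaningful. evidence: n. proof; sign; v. prove. reject: v. refuse to accept; dismiss; n. a discarded or substandard item or person. journals: n. academic periodicals (plural of journal); diaries; daybooks. evaluate: v. assess; appraise; estimate. confusing: adj. puzzling; v. making unclear (present participle of confuse).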
That’s partly because all a p-value actually tells us is the probability of getting a certain result, assuming the null hypothesis is true.
So if she correctly sorts the teas, the p-value is the probability of her doing so assuming she can’t tell the difference.
But the reverse isn’t true: the p-value doesn’t tell us the probability that she can taste the difference, which is what we’re trying to find out.
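To see why the p-value and the probability we actually care about can differ, here is a minimal sketch using Bayes' rule. Everything beyond the 1-in-70 figure is an assumption made for illustration and does not come from the talk: the prior chance that she can taste the difference, and the idea that a real taster would sort the cups perfectly every time. The function name is hypothetical.

```python
from fractions import Fraction

def prob_guessing_given_correct(prior_can_taste,
                                p_correct_if_can_taste=Fraction(1)):
    """Probability that she was only guessing, given a perfect sort.

    prior_can_taste and p_correct_if_can_taste are assumed inputs; the
    tea experiment itself supplies only the 1-in-70 chance under guessing.
    """
    p_correct_if_guessing = Fraction(1, 70)  # the p-value from the tea test
    prior_guessing = 1 - prior_can_taste
    guessing_and_correct = prior_guessing * p_correct_if_guessing
    tasting_and_correct = prior_can_taste * p_correct_if_can_taste
    return guessing_and_correct / (guessing_and_correct + tasting_and_correct)

# Assumed prior: only 1 person in 100 can really taste the difference.
skeptical = prob_guessing_given_correct(Fraction(1, 100))
print(skeptical, round(float(skeptical), 3))      # 99/169 0.586

# Assumed prior: an even chance that she can taste the difference.
open_minded = prob_guessing_given_correct(Fraction(1, 2))
print(open_minded, round(float(open_minded), 3))  # 1/71 0.014
```

The p-value is .014 in both cases, yet under these assumptions the probability that she was merely guessing ranges from about .014 to about .59 depending on the prior. The p-value alone cannot settle the research question.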
probability: n. likelihood; chance. assuming: conj. supposing that something is true; adj. arrogant; v. supposing (present participle of assume). reverse: n. the opposite; v. turn around; adj. opposite.
So if a p-value doesn’t answer the research question, why does the scientific community use it?
Well, because even though a p-value doesn’t directly state the probability that the results are due to random chance, it usually gives a pretty reliable indication.
At least, it does when used correctly.
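community: n. neighborhood; (ecology) a group of organisms living together; a group sharing common interests. directly: adv. in a direct way; immediately; exactly; frankly; conj. as soon as. reliable: adj. trustworthy; dependable. indication: n. sign; signal; marker.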
And that’s where many researchers, and even whole fields, have run into trouble.
Most real studies are more complex than the tea experiment.
Scientists can test their research question in multiple ways, and some of these tests might produce a statistically significant result while others do not.
It might seem like a good idea to test every possibility.
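complex: adj. complicated; composite; n. a compound whole; a group of facilities. multiple: adj. many; various; n. (math) a number that another number divides exactly.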
But it’s not, because with each additional test, the chance of a false positive increases.
Searching for a low p-value, and then presenting only that analysis, is often called p-hacking.
It’s like throwing darts until you hit a bullseye and then saying you only threw the dart that hit the bullseye.
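How quickly the chance of a false positive grows can be sketched with a short calculation. It assumes the tests are independent and that every null hypothesis is true, which is a simplification: analyses run on the same data are usually correlated, so the exact numbers differ in practice. The function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests comes
    out significant at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 5, 10, 20):
    print(num_tests, round(chance_of_false_positive(num_tests), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

Under these assumptions, twenty tries give roughly a two-in-three chance of at least one "significant" result even though nothing real is there.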
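additional: adj. extra; added. positive: adj. constructive; (math) greater than zero; (medicine, chemistry) showing the presence of something; certain; n. a positive number; (photography) a positive print. analysis: n. detailed examination; breakdown; assay. darts: n. small pointed missiles (plural of dart); the game of darts; v. throws; shoots (third-person singular of dart). bullseye: n. the center of a target; the center of a circle; a bull's-eye lantern.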
This is exactly what the music researchers did.
They played three groups of participants each a different song and collected lots of information about them.
The analysis they published included only two out of the three groups.
Of all the information they collected, their analysis used only the participants’ fathers’ age, to “control for variation in baseline age across participants.”
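participants: n. people who take part (plural of participant). variation: n. difference; variant; (music) a varied restatement of a theme; variety. baseline: n. reference level; starting line.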
They also paused their experiment after every ten participants, and continued if the p-value was above .05, but stopped when it dipped below .05.
They found that participants who heard one song were 1.5 years younger than those who heard the other song, with a p-value of .04.
Usually it’s much tougher to spot p-hacking, because we don’t know the results are impossible: the whole point of doing experiments is to learn something new.
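The effect of stopping whenever the p-value dips below .05 can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the researchers' analysis: both groups are drawn from the same normal distribution (so any "significant" result is a false positive), the test is a simple z-test with a known standard deviation of 1, and each look adds five participants per group up to a cap of 100 per group. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist

def p_value_two_groups(a, b):
    """Two-sided z-test p-value for equal means, assuming both groups have
    a known standard deviation of 1 (a simplifying assumption)."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def run_study(rng, peek, batch=5, max_per_group=100, alpha=0.05):
    """Simulate one study in which the null hypothesis is true by
    construction. Returns True if it ends with a 'significant' result."""
    a, b = [], []
    while len(a) < max_per_group:
        a += [rng.gauss(0, 1) for _ in range(batch)]
        b += [rng.gauss(0, 1) for _ in range(batch)]
        if peek and p_value_two_groups(a, b) < alpha:
            return True  # stop as soon as the p-value dips below alpha
    return p_value_two_groups(a, b) < alpha

def false_positive_rate(peek, studies=2000, seed=0):
    """Fraction of simulated null studies that report significance."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(studies)) / studies

fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"fixed sample size:       {fixed:.3f}")
print(f"peeking every 10 people: {peeking:.3f}")
```

With a fixed sample size the false positive rate should land near the nominal .05, while checking after every batch and stopping at the first p-value below .05 pushes it well above that.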
dipped: v. dunked; immersed; dropped lower; switched (car headlights) from high beam to low beam (past tense and past participle of dip).
Fortunately, there’s a simple way to make p-values more reliable: pre-registering a detailed plan for the experiment and analysis beforehand that others can check,
so researchers can’t keep trying different analyses until they find a significant result.
And, in the true spirit of scientific inquiry, there’s even a new field that’s basically science doing science on itself: studying scientific practices in order to improve them.
Fortunately: adv. luckily. beforehand: adv. in advance; adj. early; prepared in advance. inquiry: n. question; query; investigation. basically: adv. mainly; fundamentally. improve: v. make better.