|
|
James Smith, P-Hacking (2021E): The Method That Can “Prove” Anything (可以“证实”一切的方法)
|
In 2011, a group of researchers conducted a scientific study to find an impossible result: that listening to certain songs can make you younger. |
2011 年,一群研究者 进行了一项科学研究, 其发现让人难以置信: 聆听某些歌曲能让你变年轻。 |
Their study involved real people, truthfully reported data, and commonplace statistical analyses. |
他们的研究用到真人参与、 诚实回报的资料, 以及常用的统计分析。 |
So how did they do it? |
他们怎么做到的? |
The answer lies in a statistical method scientists often use to try to figure out whether their results mean something or if they’re random noise. |
答案是一种统计方法,科学家通常会用它来判别 研究结果是有意义的, 或者只是随机杂音。 |
conducted:v.组织;安排;实施;执行;指挥;带领;引导;(conduct的过去分词和过去式) scientific:adj.科学的,系统的; involved:adj.有关的; v.涉及; (involve的过去式和过去分词) truthfully:adv.诚实地;深信不疑地; commonplace:n.老生常谈;司空见惯的事;普通的东西;adj.平凡的;陈腐的; statistical:adj.统计的;统计学的; analyses:n.分析;解析;分解;梗概(analysis的复数形式); random:adj.[数]随机的;任意的;胡乱的;n.随意;adv.胡乱地;
|
In fact, the whole point of the music study was to point out ways this method can be misused. |
事实上,这项音乐研究的重点 就是要点出这个方法 可能如何被误用。 |
A famous thought experiment explains the method: there are eight cups of tea, four with the milk added first, and four with the tea added first. |
有一个著名的思想实验 就解释了这个方法: 有八杯茶, 其中四杯先加牛奶,另外四杯先加茶。 |
A participant must determine which are which according to taste. |
受试者要根据味道 来判断哪一杯是哪一种。 |
misused:vt.滥用;误用;虐待;n.滥用;误用;虐待; participant:adj.参与的;有关系的;n.参与者;关系者; determine:v.决定;确定;测定;查明;形成;影响;裁决;安排; according to:根据,据说;
|
There are 70 different ways the cups can be sorted into two groups of four, and only one is correct. |
将任意四杯分成一组, 一共会有七十种组合, 其中只有一种是正确的。 |
So, can she taste the difference? |
她能尝出差异吗? |
That’s our research question. |
这就是我们的研究问题。 |
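The figure of 70 is the number of ways to choose which four of the eight cups get called “milk first”; once those four are picked, the other four form the second group automatically. A minimal Python sketch that enumerates every sorting, assuming (hypothetically) that the cups are numbered 0 to 7 and that cups 0 to 3 are the milk-first ones:

```python
from itertools import combinations

# Hypothetical labelling: cups are numbered 0-7 and cups 0-3 had the milk added first.
CUPS = range(8)
MILK_FIRST = frozenset({0, 1, 2, 3})

# Each way of picking four cups to call "milk first" is one possible sorting;
# the remaining four cups automatically form the "tea first" group.
groupings = [frozenset(chosen) for chosen in combinations(CUPS, 4)]

# Only the grouping that matches the true milk-first set is correct.
correct = [g for g in groupings if g == MILK_FIRST]

print(len(groupings))  # 70
print(len(correct))    # 1
```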
|
To analyze her choices, we define what’s called a null hypothesis: that she can’t distinguish the teas. |
为了分析她的选择,我们要先设定所谓的虚无假说: 她无法分辨。 |
If she can’t distinguish the teas, she’ll still get the right answer 1 in 70 times by chance. |
如果她无法分辨, 她仍然有答对的可能,猜对的机率有七十分之一。 |
1 in 70 is roughly .014. |
七十分之一约为 0.014。 |
That single number is called a p-value. |
这个数字叫做 p 值。 |
analyze:v.对…进行分析,分解(等于analyse); define:v.定义;使明确;规定; null hypothesis:n.虚无假设(即用两组人分别实验而结果相同); distinguish:vt.区分;辨别;使杰出,使表现突出;vi.区别,区分;辨别; by chance:偶然;意外地; roughly:adv.粗糙地;概略地;
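Under the null hypothesis she is only guessing, so all 70 sortings are equally likely and the p-value for a perfect sort is 1/70. The Python sketch below computes that number exactly and then checks it with a simulated guesser; the cup labels and the random seed are illustrative assumptions, not part of the original experiment.

```python
import random
from fractions import Fraction
from math import comb

# Under the null hypothesis she is guessing: each of the comb(8, 4) = 70
# groupings is equally likely, and exactly one of them is correct.
p_value = Fraction(1, comb(8, 4))
print(p_value, round(float(p_value), 3))  # 1/70 0.014


def simulate_guesser(trials: int, seed: int = 0) -> float:
    """Estimate how often someone who cannot taste the difference sorts all cups correctly."""
    rng = random.Random(seed)
    milk_first = {0, 1, 2, 3}  # hypothetical labelling of the cups
    hits = 0
    for _ in range(trials):
        guess = set(rng.sample(range(8), 4))  # four cups chosen purely at random
        hits += guess == milk_first
    return hits / trials


print(simulate_guesser(100_000))  # should land near 1/70
```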
|
In many fields, a p-value of .05 or below is considered statistically significant, meaning there’s enough evidence to reject the null hypothesis. |
在许多领域中,等于或小于 0.05 的 p 值被认为具有统计显著性, 意即已有证据足以摒弃这个虚无假设。 |
Based on a p-value of .014, they’d rule out the null hypothesis that she can’t distinguish the teas. |
因为这个研究的 p 值为 0.014, 他们就会将「她无法分辨」的 虚无假说排除。 |
Though p-values are commonly used by both researchers and journals to evaluate scientific results, they’re really confusing, even for many scientists. |
虽然研究者和期刊都经常使用 p 值 来评估科学研究结果, 但就连许多科学家 也会对 p 值感到困惑。 |
statistically:adv.统计地;统计学上; significant:adj.重大的;有效的;有意义的;值得注意的;意味深长的;n.象征;有意义的事物; evidence:n.证据,证明;迹象;明显;v.证明; reject:v.排斥;拒收;拒绝接受;不予考虑;n.废品;次品;不合格者;被剔除者; journals:n.学术期刊(journal的复数);日记;日记账; evaluate:v.评价;评估;估计; confusing:adj.令人困惑; v.使糊涂; (confuse的现在分词)
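The significance convention described above is a simple threshold rule. A short Python sketch of it, assuming the common cutoff of .05 (some fields use other values, so `alpha` is left as a parameter):

```python
ALPHA = 0.05  # a common convention in many fields, not a universal rule


def is_statistically_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """True when p <= alpha, i.e. enough evidence (by this convention) to reject the null hypothesis."""
    return p_value <= alpha


print(is_statistically_significant(1 / 70))  # True: about .014 is below .05
print(is_statistically_significant(0.20))    # False
```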
|
That’s partly because all a p-value actually tells us is the probability of getting a certain result, assuming the null hypothesis is true. |
部分原因是 p 值其实只是告诉我们, 如果虚无假设是真的, 得到某个结果的机率有多高。 |
So if she correctly sorts the teas, the p-value is the probability of her doing so assuming she can’t tell the difference. |
所以,如果她把茶正确地分类, p 值就是在假设 她无法分辨的前提下 正确分辨的机率。 |
But the reverse isn’t true: the p-value doesn’t tell us the probability that she can taste the difference, which is what we’re trying to find out. |
但反过来就不成立了: p 值不会告诉我们 她能尝出差异的机率, 这机率才是我们想找出的答案。 |
probability:n.可能性;机率;[数]或然率; assuming:conj.假设…为真; adj.傲慢的; v.假定; (assume的现在分词) reverse:n.反面; v.颠倒; adj.相反的;
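One way to see why the reverse doesn’t hold: the p-value is P(perfect sort | she cannot taste the difference), while the research question asks for P(she can taste the difference | perfect sort). Bayes’ rule links the two, but only with extra inputs the p-value does not contain. A hedged Python sketch, assuming (hypothetically) that someone who can taste the difference always sorts perfectly, and that we are willing to state a prior probability for her ability:

```python
from fractions import Fraction

P_CORRECT_IF_GUESSING = Fraction(1, 70)  # the p-value: P(perfect sort | she cannot taste it)


def posterior_can_taste(prior: Fraction, p_correct_if_can_taste: Fraction = Fraction(1)) -> Fraction:
    """P(she can taste the difference | she sorted every cup correctly), via Bayes' rule.

    `prior` (how plausible her ability was before the test) and
    `p_correct_if_can_taste` are assumptions supplied by the analyst;
    the p-value alone does not provide them.
    """
    evidence_if_can_taste = prior * p_correct_if_can_taste
    evidence_if_guessing = (1 - prior) * P_CORRECT_IF_GUESSING
    return evidence_if_can_taste / (evidence_if_can_taste + evidence_if_guessing)


print(posterior_can_taste(Fraction(1, 2)))    # 70/71, about 0.99
print(posterior_can_taste(Fraction(1, 100)))  # 70/169, about 0.41
```

With the same p-value of about .014, the answer swings from roughly 0.99 to roughly 0.41 depending only on the assumed prior, which is why a p-value cannot be read as the probability that she can taste the difference.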
|
So if a p-value doesn’t answer the research question, why does the scientific community use it? |
所以,如果 p 值不能解答研究问题, 为什么仍被科学界采用? |
Well, because even though a p-value doesn’t directly state the probability that the results are due to random chance, it usually gives a pretty reliable indication. |
因为虽然 p 值不能直接代表 结果出于随机的机率, 但它通常仍然能提供蛮可靠的暗示。 |
At least, it does when used correctly. |
至少是在正确使用的情况下。 |
community:n.社区;[生态]群落;共同体;团体; directly:adv.直接地;立即;马上;正好地;坦率地;conj.一…就; reliable:adj.可信赖的;可依靠的;真实可信的;可靠的; indication:n.显示;表明;标示;象征;
|
And that’s where many researchers, and even whole fields, have run into trouble. |
这就是许多研究者,甚至整个研究领域遇到问题的地方了。 |
Most real studies are more complex than the tea experiment. |
大部分真正的研究 都比这个茶的实验复杂许多。 |
Scientists can test their research question in multiple ways, and some of these tests might produce a statistically significant result, while others don’t. |
科学家可以用多种方式 来测试他们的研究问题, 有些测试可能会产生 具有统计显著性的结果, 有些则不会。 |
It might seem like a good idea to test every possibility. |
测试每一种可能性似乎是个好点子。 |
complex:adj.复杂的;合成的;n.复合体;综合设施; multiple:adj.数量多的;多种多样的;n.倍数;
|
But it’s not, because with each additional test, the chance of a false positive increases. |
但事实并非如此, 因为每多做一次测试, 出现伪阳性的机率就会增加。 |
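How fast the false-positive chance grows can be worked out directly if we assume, for illustration, that the tests are independent and the null hypothesis is true for every one of them: the chance that at least one test reaches p ≤ .05 is 1 − 0.95^k for k tests. That gives about 0.05, 0.23, 0.40 and 0.64 for 1, 5, 10 and 20 tests. Real analyses of the same data are usually correlated, so treat this Python sketch as a rough guide rather than an exact figure:

```python
def chance_of_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` tests gives p <= alpha when
    the null hypothesis is true for all of them, assuming the tests are independent."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 10, 20):
    print(k, round(chance_of_any_false_positive(k), 2))
```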
Searching for a low p-value, and then presenting only that analysis, is often called p-hacking. |
找一个很低的 p 值,并只呈现对应该 p 值的分析, 通常被称为 p 值骇客。 |
It’s like throwing darts until you hit a bull’s-eye and then saying you only threw the dart that hit the bull’s-eye. |
这就像是不断射飞镖, 直到命中红心, 然后宣称你只射了 命中红心的那个飞镖。 |
additional:adj.附加的,额外的; positive:adj.积极的;[数]正的,[医][化学]阳性的;确定的;n.正数;[摄]正片; only that:只是;要不是; analysis:n.分析;分解;验定; darts:n.镖(dart的复数形式);射镖游戏;v.投掷;投射(dart的单三形式); bullseye:n.靶眼;圆心;牛眼灯;
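The dart-throwing problem can also be seen with simulated data. The Python sketch below runs many studies in which there is no real effect, measures several outcomes, tests each one, and keeps only the smallest p-value. Everything here is an illustrative assumption: normally distributed outcomes with a known standard deviation of 1 (so a simple two-sided z-test applies), 20 participants per group, and outcomes that are independent of one another.

```python
import math
import random


def z_test_p_value(a: list[float], b: list[float], sd: float = 1.0) -> float:
    """Two-sided p-value for a difference in group means, assuming normal data
    with a known standard deviation `sd` (a simplifying assumption)."""
    z = (sum(a) / len(a) - sum(b) / len(b)) / (sd * math.sqrt(1 / len(a) + 1 / len(b)))
    return math.erfc(abs(z) / math.sqrt(2))


def p_hacked_study(rng: random.Random, num_measures: int, n_per_group: int = 20) -> float:
    """One study with NO real effect: both groups come from the same distribution
    on every measure. Returns only the smallest p-value, as a p-hacker would report."""
    p_values = []
    for _ in range(num_measures):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(z_test_p_value(group_a, group_b))
    return min(p_values)  # keep the dart that hit the bull's-eye


def false_positive_rate(num_measures: int, studies: int = 3_000, seed: int = 42) -> float:
    """Share of null studies that end up reporting p <= .05."""
    rng = random.Random(seed)
    hits = sum(p_hacked_study(rng, num_measures) <= 0.05 for _ in range(studies))
    return hits / studies


print(false_positive_rate(1))   # close to 0.05
print(false_positive_rate(10))  # roughly 0.4
```

With a single outcome the rate stays near 0.05; with ten outcomes it climbs to roughly 0.4, even though nothing real is being measured.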
|
This is exactly what the music researchers did. |
那些声称音乐可以驻颜的研究者 用的就是这一招。 |
They played three groups of participants each a different song and collected lots of information about them. |
针对三组受试者, 他们各播放一首不同的歌曲, 接着收集许多实验的资讯。 |
The analysis they published included only two out of the three groups. |
他们发表的分析 只包含三组当中的两组。 |
Of all the information they collected, their analysis used only the participants’ fathers’ ages, to “control for variation in baseline age across participants.” |
在他们所收集到的所有资讯中, 分析只用了受试者父亲的年龄, 以「控制各受试者 基线年龄的差异」。 |
participants:n.参与者(participant的复数形式); variation:n.变异;变体;变奏;变种; baseline:n.基线;底线;
|
They also paused their experiment after every ten participants, and continued if the p-value was above .05, but stopped when it dipped below .05. |
而且每做完十个受试者, 他们就会把实验暂停, 如果 p 值高于 0.05 就会继续, 若低于 0.05,就停下来。 |
They found that participants who heard one song were 1.5 years younger than those who heard the other song, with a p-value of .04. |
他们发现,听某一首歌曲的受试者 比听另一首歌曲的受试者 还要年轻一岁半, p 值为 0.04。 |
Usually it’s much tougher to spot p-hacking, because we don’t know the results are impossible: the whole point of doing experiments is to learn something new. |
一般来说,p 值骇客很难被发现, 因为我们不会知道结果是不可能的: 做实验的目的就是想取得新知。 |
dipped:v.蘸;浸;(使)下降;把(汽车前灯的)远光调为近光;(dip的过去分词和过去式)
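The stop-when-significant rule can be simulated too. In this Python sketch the null hypothesis is true, participants arrive in batches of ten (assumed to be split five and five between two songs), a hypothetical cap of 100 participants applies, and a known-variance z-test stands in for whatever analysis the researchers actually ran. Analyzing once at the planned sample size keeps the false-positive rate near 0.05; checking after every batch and stopping at the first p ≤ .05 pushes it to somewhere around 0.2 in this setup.

```python
import math
import random


def z_test_p_value(a: list[float], b: list[float], sd: float = 1.0) -> float:
    """Two-sided p-value for a difference in group means, assuming normal data
    with a known standard deviation `sd` (a simplifying assumption)."""
    z = (sum(a) / len(a) - sum(b) / len(b)) / (sd * math.sqrt(1 / len(a) + 1 / len(b)))
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(rng: random.Random, peek: bool, batch: int = 10,
                         max_n: int = 100, alpha: float = 0.05) -> bool:
    """Run one study with NO real effect and report whether it ends 'significant'.

    Participants arrive in batches of `batch`, split evenly between two groups,
    up to `max_n` in total. With peek=True the p-value is checked after every
    batch and data collection stops as soon as p <= alpha.
    """
    group_a: list[float] = []
    group_b: list[float] = []
    while len(group_a) + len(group_b) < max_n:
        for _ in range(batch // 2):
            group_a.append(rng.gauss(0, 1))  # both groups share one distribution,
            group_b.append(rng.gauss(0, 1))  # so the null hypothesis is true
        if peek and z_test_p_value(group_a, group_b) <= alpha:
            return True  # stop early and report the "significant" result
    return z_test_p_value(group_a, group_b) <= alpha  # single analysis at the planned size


def false_positive_rate(peek: bool, studies: int = 5_000, seed: int = 7) -> float:
    """Share of null studies that end with p <= alpha under the chosen stopping rule."""
    rng = random.Random(seed)
    return sum(study_is_significant(rng, peek) for _ in range(studies)) / studies


print("analyze once at n = 100:", false_positive_rate(peek=False))
print("peek every 10 participants:", false_positive_rate(peek=True))
```

Fixing the sample size and the single analysis in advance is exactly what removes the peeking branch from this simulation.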
|
Fortunately, there’s a simple way to make p-values more reliable: pre-registering a detailed plan for the experiment and analysis beforehand that others can check, |
幸运的是,有一个简单的方法可以让 p 值变得更可靠: 事先登录实验及分析计划, 让他人能够检查, |
so researchers can’t keep trying different analyses until they find a significant result. |
这样研究者就无法 不断尝试不同的分析, 直到找到显著的结果为止。 |
And, in the true spirit of scientific inquiry, there’s even a new field that’s basically science doing science on itself: studying scientific practices in order to improve them. |
而且,根据真正的科学调查精神, 甚至有一个新领域,基本上是科学在对自己做科学: 研究的是科学的研究方法,以改善它们。 |
Fortunately:adv.幸运地; beforehand:adv.事先;预先;adj.提前的;预先准备好的; inquiry:n.询问;查询;打听;调查; basically:adv.主要地,基本上; improve:v.改进;改善;
|