【案例】数据告诉你,为什么总是谣言比真相跑得快(中英文)
文章来源:科学杂志/再建巴别塔 翻译:刘航、陈天蓝 校对:罗人杰 虚假消息以及其对政治、经济、社会生态可能产生的影响引发了全世界的担忧。为了探究虚假消息究竟是如何传播的,Vosoughi et al.把2006至2017年发布在推特上的传言级联(rumor cascades)搜集在一起进行研究。研究数据显示,大约有126,000条传言被近三百万人传播。虚假消息往往比真实消息传得更广:位列传言级联前1%的内容被散布到了一千至十万人中;然而真实消息的受众却很少能超过一千人。同时,虚假消息的传播速度也比真实消息快得多。消息本身的新奇度和受众的情绪体验可能是造成上述现象的原因。 我们调查了2006至2017年推特上所有核实过的真假消息不同的散布程度。(研究的数据由三百万人转发了超过四百五十万次的126,000条消息组成。)我们根据来自六个独立的事实核查机构的信息来判定消息的真假,它们的判定结果有着95-98%的一致性。虚假消息在任何种类的消息中都明显比真实消息散布得更广、更快、更深入,这一现象在政治消息方面尤为突出,甚至超过了恐怖主义、自然灾害、科学资讯、都市传说、金融消息等方面的传言。我们发现虚假消息比真实消息更加新奇,这点说明了人们更加倾向于分享新奇的消息。虚假的故事激起人们的恐惧、憎恶与惊讶,而真实的故事则激发人们的期望、悲伤、喜悦与信任。与普遍看法不同,机器人加速真假消息传播的程度是相同的,这暗示着虚假消息的传播速度超过真实消息并不是因为机器人,而是人类自身的原因。 关于决策、合作、交流与市场的基础理论都认为对于事实和准确的定义对于几乎每个人的行为决策都至关重要,然而真实与虚假的消息却同样通过网络媒介快速传播。定义真假已经变成一项政府理所当然的工作,而不是人们基于各种事实多层次的讨论。我们的经济体系也未能免于虚假消息的干扰。错误传言影响了股价,并动摇了人们对于大规模投资的积极性。例如,一条声称奥巴马在一场爆炸中受伤的推特使1300亿美金的股值人间蒸发。受网上流传的假消息的影响,我们对于一切消息的原有反应均受到了破坏。 新的技术在促进即时消息交换与大规模消息级联的同时,也助长了虚假消息的传播。然而,尽管我们越来越依靠这些新技术获得消息,我们却很少知道它们到底在多大程度上助长虚假消息的传播。关于虚假消息传播的坊间分析受到了媒体足够的重视,可是很少有大规模的实证调查来探究虚假消息的渗透过程及其社会根源。关于虚假消息传播的研究目前仅局限于小范围的、特殊的样本,忽视了两大重要的科学问题:真实消息与虚假消息的散布过程究竟有何不同?个人判断中的哪些因素造成了这些不同呢? 
目前的研究都着重于单一传言的传播,比如希格斯玻色子的发现(the discovery of the Higgs boson)、2010年的海地地震;或者是研究发源于同一个灾难性事件的多种传言,比如2013年波士顿马拉松爆炸案;又或者建立传言散布的理论模型、发明传言甄别技术与可信度衡量办法、探求限制传言传播的方法。但是,几乎没有研究能够彻底地探究为什么虚假消息与真实消息的传播过程不同。 比如说,尽管Del Vicario et al.和 Bessi et al.研究了科学消息与阴谋论的传播,他们并没有衡量这些消息的真实性。科学消息与阴谋论并不一定都是真的,而且它们在文风上截然不同,这种文风的不同对于它们的传播有着重要的影响,但与它们的真实性毫无关系。为了理解虚假消息是如何传播的,我们有必要将真实与虚假的科学消息、阴谋论区分开来,分别研究它们的散布过程,并且将消息按照主题、文风的不同进行分类比较。迄今为止唯一通过真实性来辨别传言的是Friggeri et al.的研究。他分析了散布在Facebook上的4000条传言,但是他的侧重点在于事实调查是如何影响传言传播的,而非虚假消息与真实消息的散布过程有何不同。 在如今的政治生态与学术文献中,围绕着“伪造消息”、社交网络上针对美国内政的境外干涉以及我们对于何谓消息、伪造消息、虚假消息、传言、传言级联的理解产生了许多不固定的术语。在以往,我们用真实性判定伪造消息,但是如今“伪造消息”一词在我们的政治与媒体生态中被高度极化。政客们会利用一种精明的政治策略,将不利于自己身份的消息判定为不可靠的或编造的,并将有利于自身的消息列为可靠消息。由于这个原因,“伪造消息”这个术语已经失去了它本来的意思,从而失去了其学术性。因此,在这篇论文中,我们有意地避免使用“伪造消息”这个术语,而是使用更加客观明确的 “真实消息”与“虚假消息”。尽管“伪造消息”与“误报”都暗示着对事实的有意歪曲,我们在论文中并不会探究传言制造者的企图。相反,我们会将焦点放在真实性与被验证为真或假的消息上。 同时,我们有意地接纳了对于“消息”这一词的宽泛的定义。传统看法认为人们对于某一具体事件的阐释与评价是构成“消息”的基础,但现在人们把推特上任何一个公开的说法都叫消息。我们将消息定义为任何含有个人主张的言论,而将传言定义为事件或言论在推特上散布的社会现象。就是说,传言本质上是社会性的,它涉及人与人之间的观点交流。而消息,不管是否被分享,只是一种言论而已。 当一名用户通过发推特、传照片、贴文章链接等方式就某一主题发表个人言论时,传言级联便开始产生了。他人会通过转发的方式扩大传言的影响。一条传言的扩散过程可以看成是一个或多个级联的集合(级联是指由同一个消息来源不停转发从而形成的传言扩散模式)。比如说,一个人可以通过发表对某一具体事件的言论来触发一个传言级联,而第二个人则基于相同的事件建构起独立于第一层级的第二层传言级联。如果两个层级之间互相独立,那么它们就是同一传言的两个级联。级联的规模由转发数量决定,而级联的层数则由用户基于同一事件单独发帖的次数决定。比如说,如果10个人分别发了有关传言A的帖子,但是没有人转发,那么传言A就有10个层级,每个层级的规模为1。同样,如果2个人分别发了关于传言B的帖子,每个帖子都分别有100个人转发,那么传言B就有2个层级,每个层级的规模为100。 我们利用从推特创始之初(2006年)至2017年所有经核实的传言级联中提取出的综合数据探究了真实消息、虚假消息与半真半假消息不同的散布过程。数据包含了被三百万人转发了超过四百五十万次的126,000条消息。我们将那六所独立的事实核查机构(snopes.com, politifact.com, factcheck.org, truthorfiction.com, hoax-slayer.com, and urbanlegends.about.com)调查过的所有传言级联都作为调查样本(这六个机构的判定结果有着95-98%的高度一致性),解构传言的标题、正文以及结论,并自动收集推特上这些传言相应的级联。我们收集了传言所有的英文回复并且利用文字识别技术从图片中提取文字。对于每一条转发,我们都提取出原帖以及所有对原帖的转发。接着,我们量化了级联的深度(原帖被不同用户转发的次数),规模(级联中涉及的用户数),最大广度(在任何深度中级联中所能容纳的最大用户数),和构造式病毒(structural virality)(这是一种插入内容之中的测度,这些内容通过单一的庞大消息源或者多层级模式传播——在这种模式中每个个体的直接参与都是整个传播的一部分)。 当一个传言被转发,级联的深度、规模、最大广度和构造式病毒都会上升(图 
1A)。在级联数1~1000的区间内,虚假传言占更大的比例;而在级联数大于1000的区间内,真实传言占更大的比例(图 1B)。政治方面的传言也呈现这一特征(图 1D)。虚假传言的总量在2013、2015年末达到高峰,2016年末再次登顶,与最近的总统选举存在关联(图 1C)。数据还显示,政治方面的虚假传言在2012与2016年总统选举时显著增加,而在2014年俄罗斯合并克里米亚半岛时,半真半假的传言陡增(图 1E)。政治传言是我们数据中最大的传言类别,它含有45000个级联,之后依次是都市传说、商业、恐怖主义、科学、环境、自然灾害方面的传言。 ▲传言级联 (A)传言级联的一个例子,以及它的深度、规模、最大广度和构造式扩散过程。“Nodes”指的是推特用户。 (B)真、假、混合型(半真半假)级联的互补累积分布函数(The complementary cumulative distribution functions (CCDFs)),该函数描述了拥有特定级联数的传言在其类别中所占的比例。 (C) 2006~2017年推特上所有散布的真、假、混合传言的季度计数(Quarterly counts),在每个类别中都标注出了具体的样本。 (D)所有真、假、混合型政治方面的传言的互补累积分布函数(CCDFs)。 (E) 2006~2017年推特上所有散布的真、假、混合型政治方面传言的季度计数(Quarterly counts),在每个类别中都标注出了具体的样本。 (F)七种最常见类别传言级联的总数直方图。 当我们分析真假消息的扩散过程时,我们发现虚假消息在任何消息类别中都明显比真实消息散布得更广、更快、更深入。相比于真实级联,明显更多的虚假级联超过了深度10,而虚假级联的前0.01%比真实级联在推特中多散布了8个单位,比原帖多散布了19个单位(图 2A)。虚假消息也比真实消息传到了更多人耳中。位列传言级联前1%的内容被散布到了一千至十万人中;与之形成鲜明反差的是,真实消息的受众却很少能超过一千人(图 2B)。虚假消息在级联的每一个深度上都比真实消息传到更多人耳中,这意味着许多人所转发的虚假消息比真实消息更多(图 2C)。病毒式传播助长了虚假消息的传播,也就是说,虚假消息不仅仅通过传统方式传播,相反,它们更多是采用以病毒式分支流程为特征的点对点传播模式(图 2D)。 ▲真假传闻的互补累积分布函数(CCDFs) (A)深度 (B)规模 (C)最大广度 (D)结构式病毒 (E and F)真假传言级联散布到某一(E)深度与某一(F)用户数量所需要的分钟数 (G)每个深度上不同的用户数 (H)真假级联每个深度的平均广度。在(H)中,图表呈对数正态分布。标准误差集中在传言层面。(也就是说,同一个传言的不同级联集中在一起) 真实消息若要传到1500人耳中,需要花比虚假消息多5倍的时间(图 2F);若要形成一个深度为10的级联,则要花虚假消息20倍的时间(图 2E)。在每个深度的级联上,虚假消息都比真实消息散布得更广(图 2H)、被更多用户转发(图 2G).。 虚假的政治消息(图 1D)传播得更深入(图 3A)、更广泛(图 3C)、受众更多(图 3B)并且比任何其他类别的虚假消息都具有病毒性(图 3D)。虚假的政治消息的传播也更快达到一定深度(图 3E),而且,它传到20000人耳中所需要的时间,几乎是其他类别的虚假消息传到10000人耳中所需要的时间的三分之一(图 3F)。虽然其他种类的虚假消息在1-10的深度抵达同样数量的独特用户,但虚假政治消息通常会在深度上超过10的情况下达到最独特的用户(图3G)。虽然所有其他种类的虚假消息以较浅的深度传播得稍微更广,虚假政治消息则以更大的深度传递地更广,表明更流行的虚假政治消息表现出更广和更快的扩散动态(图3H)。对所有消息种类的分析表明那些关于政治、都市传说和科学的消息传播到的人数最多,然而关于政治和都市传说的消息传播得最快,并且在结构式病毒方面,它们是最具病毒性的。 ▲虚假政治和其他类型的传言级联的补充累积分布函数(CCDF) (A) 深度 (B) 规模 (C) 最大广度 (D) 结构式病毒 (E和F)虚假政治消息和其他虚假消息级联散布到某一(E)深度和某一(F)用户数量所需要的分钟数 (G) 每个深度上不同的用户数 (H) 传言级联每个深度的平均广度。在(H)中,图表呈对数正态分布。标准误差集中在传言层面。 
人们可能怀疑,网络中的结构因素或者级联中用户的个体特征解释了为什么虚假消息比真实消息传播得更快:可能那些传递虚假消息的人“追随”更多的人,并且有更多的追随者,推文发得更频繁,他们更多是那些通过“验证”的用户,或者使用推特的时间更久。但是当我们比较涉及真假传言级联的用户时,发现在每种情况下,事实与此恰恰相反:传递虚假消息的用户明显有更少的追随者(K-S test = 0.104, P ~ 0.0)、自己也追随更少的人(K-S test = 0.136, P ~ 0.0);在推特上明显不那么活跃(K-S test = 0.054, P ~ 0.0);被验证的显然更少(K-S test = 0.004, P < 0.001);使用推特的时间更短(K-S test = 0.125, P ~ 0.0)(图 4A)。尽管存在这些差异,虚假消息依旧比真实消息传播得更广、更快;这些差异并不是原因。

▲估计消息传播的相关因素、真假消息的新奇性以及消息回复中情绪内容的模型 (A) 参与真假传言级联的用户的描述性统计,以及这些度量在真假传言级联中分布差异的K-S检验。 (B) 以左侧所列变量为自变量、估计用户转发传言可能性的逻辑回归模型结果。coeff:logit系数;z:z分数。 (C) 与用户转发传言推文前60天中所接触的推文语库相比较,真(绿色)假(红色)传言推文在消息唯一性(IU)、缩放的Bhattacharyya距离(BD)和K-L散度(KL)上的差异。 (D) 对真(绿色)假(红色)传言的回复中含有的情绪内容,由NRC分为七个维度。 (E) 与用户看到传言推文前60天中看到的推文语库相比较,真假传言推文IU、KL和BD的平均值和方差,以及它们在真假传言中差异的K-S检验。 (F) 对真假传言的回复中含有的情绪内容(由NRC分为七个维度)的平均值和方差,以及它们在真假传言中差异的K-S检验。所有标准误差均在传言层面聚类,所有模型均采用传言层面的聚类稳健标准误差进行估计。

当我们估计转发可能性的模型时,我们发现虚假消息被转发的可能性比真实消息高70%(Wald卡方检验,P ~ 0.0),即使在控制了原始推文作者的账户年龄、活跃程度、追随者数量、所追随的人数以及是否为已验证用户之后也是如此(图 4B)。由于用户的特征和网络结构不能解释真假消息传播的差异,我们寻找了它们传播差异的其他解释。

一种解释来自信息论和贝叶斯决策理论。新奇吸引人们的注意力,促进了富有成效的决策制定,并且鼓励信息分享,因为新奇更新了我们对这个世界的理解。当一条消息是新奇的,它不仅令人惊讶,而且更有价值,无论是从信息论的角度(它对决策提供了最大的帮助)还是从社会的角度(它向他人表明分享者是“知情者”,或者有渠道获取独特的“内部”消息,从而带来社会地位)来看。因此我们检验了虚假消息是否比真实消息更新奇,以及推特用户是否更倾向于转发新奇的消息。

为了评定新奇性,我们随机选择了约5000个传播真假传言的用户,并从他们决定转发传言前60天内所接触的推文中随机抽取了约25000条作为样本。之后我们设定了一个LDA主题模型(latent Dirichlet Allocation Topic model)(包含200个主题,并用1000万条英语推文训练),用来计算传言推文和用户转发传言推文前接触的所有推文之间的信息距离。该模型为我们数据中的每条推文生成了一个在200个主题上的概率分布。然后,通过将传言推文的主题分布和用户转发前60天中所接触到的推文的主题分布加以比较,我们测量了真假传言中信息的新奇程度。我们发现,在所有新奇性度量中,虚假传言比真实消息要新奇得多,显示出明显更高的消息唯一性(K-S检验=0.457,P~0.0)、Kullback-Leibler(K-L)散度(K-S检验=0.433,P~0.0)和Bhattacharyya距离(K-S检验=0.415,P~0.0)(类似于Hellinger距离)。最后的两个指标用于测量输入推文主题内容的概率分布和用户先前接触的推文语库之间的差异。

尽管测量中虚假传言比真实消息更新奇,但用户却未必察觉到了这点。因此,通过比较用户对真假传言回复中的情绪内容,我们评估了用户对真假传言所含信息的看法。我们使用加拿大国家研究委员会(NRC)编制的主流词典对回复中的情绪进行分类,该词典提供了一个详尽的列表,包含约140000个英语词汇以及它们与8种情绪(基于Plutchik关于基本情绪的工作:愤怒、恐惧、期望、信任、惊讶、悲伤、快乐、厌恶)之间的联系,以及约32000个推特标签与这些情绪的加权关联列表。我们从回复推文中移除停用词和网址后,计算了推文中与8种情绪各自相关联的单词所占的比重,为每条回复生成一个在各情绪上总和为1的情绪权重向量。我们发现虚假传言所激发的回复中表达了更多的惊讶(KS测试= 
0.205,P~0.0)(证实了新奇性的假设)和厌恶(KS测试= 0.102,P~0.0),然而真实传言所激发的回复中则表达了更多的悲伤(KS测试= 0.037,P~0.0)、期望(KS测试= 0.038,P~0.0)、愉快(KS测试=0.061,P~0.0)和信任(KS测试= 0.060,P~0.0)(图4,D和F)。虚假消息回复中表达的情绪似乎显明了,除新奇之外,还有激发人们分享虚假消息的其他因素。我们不能认定新奇导致转发或者新奇是使虚假消息转发更多的唯一原因,即使我们的确发现虚假消息更新奇并且新奇的消息更可能被转发。 大量诊断统计和操作检查验证了我们的结果并证明它们的鲁棒性(译注:指算法的稳定性)。第一,由于每个真假传言都存在多层级联,因此与相同传言的级联相关联的方差和误差项将是相关的。因此,我们选择了集中稳定的标准误差,并计算了它们在传言水平上集中的所有方差。通过比较有无集中误差的分析来检测我们结果的鲁棒性,我们发现即使这种集中降低了我们估算的准确性,我们结果的方向、大小和重要性也没有改变,而且chi-square (P ~ 0.0) 和拟合优度检验(d = 3.4649×10-6,P~1.0)表明这些模型是很精确的。 第二,为了让六个组织核查推文事实,我们选择样本的限制中可能会出现选择偏好。事实核查可能会挑选某些类型的传言或许更偏向于它们。为了验证我们的分析在这一选择上的鲁棒性以及我们的结果对所有真假传言级联的普适性,我们独立检验另一个未经任何事实核查组织验证的传言级联样本。这些传言是由三个MIT和Wellesley大学的本科生查证的。自2016年起,我们训练这些学生使用我们自动传言探测算法在300万份英文推特中去检测传言。这些本科生助手们使用网上简单的搜索引擎调查了这些检测过的传言的真实性。在他们研究基础上,我们要求他们标记这些传言为真、假或者混合,并且移除掉所有以前被事实核查组织查证过的传言。我们的这些助手们独立工作且没有受到其他干涉,他们调查的13240个传言级联有90%的吻合度,达到了0.88的Fleiss’ kappa。当我们比较助手们达成一致的真假传言的传播动态时,发现与我们主数据预计的结果十分吻合。那些稳定数据中的虚假传言的深度、规模、最大广度、结构式病毒和速度,以及每个深度上的最大用户数量数值上都更大。当我们扩展到对那些仅获得了大多数人同意而不是有着一致意见的消息时,我们得到了同样的结果。 第三,尽管真假消息的传播方式的差异的确值得一探究竟,不管其中是否有机器人活动,但人们依旧可能担心我们关于人类判断的结论可能会因为我们分析中机器人的存在而脱轨。因此,在进行分析之前,我们用了一个复杂的机器检测算法来识别、移除所有的机器人。当我们把机器人的流量增加进分析之中,我们发现我们的主要结论都没有改变——在所有类别的消息中,虚假消息依旧比真实消息传播得更远、更快、更深、更广。当我们移除所有的由机器人开始的推文级联时(包括人类对机器人原始推文的转发),或者当我们用第二个独立的机器人检测算法,并且(为了证实我们分析的鲁棒性)改变算法探测的灵敏度阈值时,分析结果依然保持不变。机器人参与同时加速了真假传言的传播,它大致上同等地影响了它们的传播。这就表明虚假消息比真实消息传播得更远、更快、更深、更广的原因出于人类,而非机器人。 最后,更多对真假消息传播差异的行为解释的研究显然是必要的。尤其是需要与用户更直接的互动,通过采访、调查、临床实验甚至神经影像,对驱使人们传播真假消息的动机有更清晰的认识。在以后的工作中,我们支持人们运用这些方法或其他途径去调查驱使人们传播真假消息的因素。 虚假消息可能会导致恐怖袭击或者自然灾害期间资源的错误分配、商业投资失误和选举误导。不幸的是,即使网络虚假消息的数量明显增加,对于虚假消息传播方式和原因的科学理解目前还建立在临时的而非大规模、系统的研究上。我们对推特上传播的已验明的真假传言的分析证实:虚假消息的传递更具有渗透性,它同样推翻了关于虚假消息传播的传统观念,人们可能认为网络框架与个人偏好促进了虚假消息的传播,但是结果却恰恰相反。 尽管网络和个人因素更偏好于事实,但人们却更有可能转发虚假消息,推动虚假消息传播。此外,即使最近国会委员会就美国虚假消息问题举行的例会仍聚焦于自动机器人在传播虚假消息中扮演的角色,我们的结论也是:人类的行为比机器人更多促成真假消息传递的差异。这表明虚假消息遏制政策也应该要强调行为干涉,例如标榜、鼓励阻止假消息的传播,而不是完全集中在削减机器人。理解虚假消息如何传播只是朝着控制它迈出的第一步。我们希望我们的工作在虚假消息传播的原因、结果和可能的解决方法方面能激起的更大范围的研究。 Science 09 Mar 2018: Vol. 
359, Issue 6380, pp. 1146-1151 DOI: 10.1126/science.aap9559

Lies spread faster than the truth

There is worldwide concern over false news and the possibility that it can influence political, economic, and social well-being. To understand how false news spreads, Vosoughi et al. used a data set of rumor cascades on Twitter from 2006 to 2017. About 126,000 rumors were spread by ~3 million people. False news reached more people than the truth; the top 1% of false news cascades diffused to between 1000 and 100,000 people, whereas the truth rarely diffused to more than 1000 people. Falsehood also diffused faster than the truth. The degree of novelty and the emotional reactions of recipients may be responsible for the differences observed. Science, this issue p. 1146

Abstract

We investigated the differential diffusion of all of the verified true and false news stories distributed on Twitter from 2006 to 2017. The data comprise ~126,000 stories tweeted by ~3 million people more than 4.5 million times. We classified news as true or false using information from six independent fact-checking organizations that exhibited 95 to 98% agreement on the classifications. Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends, or financial information. We found that false news was more novel than true news, which suggests that people were more likely to share novel information. Whereas false stories inspired fear, disgust, and surprise in replies, true stories inspired anticipation, sadness, joy, and trust. Contrary to conventional wisdom, robots accelerated the spread of true and false news at the same rate, implying that false news spreads more than the truth because humans, not robots, are more likely to spread it.

Foundational theories of decision-making (1–3), cooperation (4), communication (5), and markets (6) all view some conceptualization of truth or accuracy as central to the functioning of nearly every human endeavor. Yet, both true and false information spreads rapidly through online media. Defining what is true and false has become a common political strategy, replacing debates based on a mutually agreed on set of facts. Our economies are not immune to the spread of falsity either. False rumors have affected stock prices and the motivation for large-scale investments, for example, wiping out $130 billion in stock value after a false tweet claimed that Barack Obama was injured in an explosion (7). Indeed, our responses to everything from natural disasters (8, 9) to terrorist attacks (10) have been disrupted by the spread of false news online. New social technologies, which facilitate rapid information sharing and large-scale information cascades, can enable the spread of misinformation (i.e., information that is inaccurate or misleading). But although more and more of our access to information and news is guided by these new technologies (11), we know little about their contribution to the spread of falsity online. Though considerable attention has been paid to anecdotal analyses of the spread of false news by the media (12), there are few large-scale empirical investigations of the diffusion of misinformation or its social origins. Studies of the spread of misinformation are currently limited to analyses of small, ad hoc samples that ignore two of the most important scientific questions: How do truth and falsity diffuse differently, and what factors of human judgment explain these differences? 
Current work analyzes the spread of single rumors, like the discovery of the Higgs boson (13) or the Haitian earthquake of 2010 (14), and multiple rumors from a single disaster event, like the Boston Marathon bombing of 2013 (10), or it develops theoretical models of rumor diffusion (15), methods for rumor detection (16), credibility evaluation (17, 18), or interventions to curtail the spread of rumors (19). But almost no studies comprehensively evaluate differences in the spread of truth and falsity across topics or examine why false news may spread differently than the truth. For example, although Del Vicario et al. (20) and Bessi et al. (21) studied the spread of scientific and conspiracy-theory stories, they did not evaluate their veracity. Scientific and conspiracy-theory stories can both be either true or false, and they differ on stylistic dimensions that are important to their spread but orthogonal to their veracity. To understand the spread of false news, it is necessary to examine diffusion after differentiating true and false scientific stories and true and false conspiracy-theory stories and controlling for the topical and stylistic differences between the categories themselves. The only study to date that segments rumors by veracity is that of Friggeri et al. (19), who analyzed ~4000 rumors spreading on Facebook and focused more on how fact checking affects rumor propagation than on how falsity diffuses differently than the truth (22). In our current political climate and in the academic literature, a fluid terminology has arisen around “fake news,” foreign interventions in U.S. politics through social media, and our understanding of what constitutes news, fake news, false news, rumors, rumor cascades, and other related terms. Although, at one time, it may have been appropriate to think of fake news as referring to the veracity of a news story, we now believe that this phrase has been irredeemably polarized in our current political and media climate. 
As politicians have implemented a political strategy of labeling news sources that do not support their positions as unreliable or fake news, whereas sources that support their positions are labeled reliable or not fake, the term has lost all connection to the actual veracity of the information presented, rendering it meaningless for use in academic classification. We have therefore explicitly avoided the term fake news throughout this paper and instead use the more objectively verifiable terms “true” or “false” news. Although the terms fake news and misinformation also imply a willful distortion of the truth, we do not make any claims about the intent of the purveyors of the information in our analyses. We instead focus our attention on veracity and stories that have been verified as true or false. We also purposefully adopt a broad definition of the term news. Rather than defining what constitutes news on the basis of the institutional source of the assertions in a story, we refer to any asserted claim made on Twitter as news (we defend this decision in the supplementary materials section on “reliable sources,” section S1.2). We define news as any story or claim with an assertion in it and a rumor as the social phenomena of a news story or claim spreading or diffusing through the Twitter network. That is, rumors are inherently social and involve the sharing of claims between people. News, on the other hand, is an assertion with claims, whether it is shared or not. A rumor cascade begins on Twitter when a user makes an assertion about a topic in a tweet, which could include written text, photos, or links to articles online. Others then propagate the rumor by retweeting it. A rumor’s diffusion process can be characterized as having one or more cascades, which we define as instances of a rumor-spreading pattern that exhibit an unbroken retweet chain with a common, singular origin. 
For example, an individual could start a rumor cascade by tweeting a story or claim with an assertion in it, and another individual could independently start a second cascade of the same rumor (pertaining to the same story or claim) that is completely independent of the first cascade, except that it pertains to the same story or claim. If they remain independent, they represent two cascades of the same rumor. Cascades can be as small as size one (meaning no one retweeted the original tweet). The number of cascades that make up a rumor is equal to the number of times the story or claim was independently tweeted by a user (not retweeted). So, if a rumor “A” is tweeted by 10 people separately, but not retweeted, it would have 10 cascades, each of size one. Conversely, if a second rumor “B” is independently tweeted by two people and each of those two tweets is retweeted 100 times, the rumor would consist of two cascades, each of size 100. Here we investigate the differential diffusion of true, false, and mixed (partially true, partially false) news stories using a comprehensive data set of all of the fact-checked rumor cascades that spread on Twitter from its inception in 2006 to 2017. The data include ~126,000 rumor cascades spread by ~3 million people more than 4.5 million times. We sampled all rumor cascades investigated by six independent fact-checking organizations (snopes.com, politifact.com, factcheck.org, truthorfiction.com, hoax-slayer.com, and urbanlegends.about.com) by parsing the title, body, and verdict (true, false, or mixed) of each rumor investigation reported on their websites and automatically collecting the cascades corresponding to those rumors on Twitter. The result was a sample of rumor cascades whose veracity had been agreed on by these organizations between 95 and 98% of the time. 
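
The rumor "A" and rumor "B" examples above amount to simple bookkeeping, which can be sketched in Python. This is an illustrative sketch rather than the authors' pipeline: the `(tweet_id, parent_id, rumor_id)` record layout and the function name `cascades_per_rumor` are assumptions introduced here. To sidestep the question of whether the origin tweet counts toward cascade size, the sketch reports the number of retweets in each cascade.

```python
from collections import defaultdict

def cascades_per_rumor(tweets):
    """Group tweets into cascades and report retweet counts per rumor.

    tweets: iterable of (tweet_id, parent_id, rumor_id) records (a hypothetical
    layout); parent_id is None for an independently written tweet, otherwise
    the id of the tweet that was retweeted.
    Returns {rumor_id: sorted list with one retweet count per cascade}.
    """
    tweets = list(tweets)
    parent = {tid: pid for tid, pid, _ in tweets}
    rumor_of = {tid: rid for tid, _, rid in tweets}

    def origin(tid):
        # Follow the unbroken retweet chain back to its single origin tweet.
        while parent[tid] is not None:
            tid = parent[tid]
        return tid

    retweets = defaultdict(int)
    for tid, pid, _ in tweets:
        # Origin tweets register their cascade; retweets add one to it.
        retweets[origin(tid)] += 0 if pid is None else 1

    per_rumor = defaultdict(list)
    for root, count in retweets.items():
        per_rumor[rumor_of[root]].append(count)
    return {rid: sorted(counts) for rid, counts in per_rumor.items()}

# Rumor A: 10 independent tweets, never retweeted.
rumor_a = [(f"a{i}", None, "A") for i in range(10)]
# Rumor B: 2 independent tweets, each retweeted 100 times.
rumor_b = []
for k in range(2):
    rumor_b.append((f"b{k}", None, "B"))
    rumor_b += [(f"b{k}_rt{j}", f"b{k}", "B") for j in range(100)]

result = cascades_per_rumor(rumor_a + rumor_b)
print(len(result["A"]), len(result["B"]))
```

The number of cascades of a rumor is the length of its list, so rumor A yields 10 cascades and rumor B yields 2.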
We cataloged the diffusion of the rumor cascades by collecting all English-language replies to tweets that contained a link to any of the aforementioned websites from 2006 to 2017 and used optical character recognition to extract text from images where needed. For each reply tweet, we extracted the original tweet being replied to and all the retweets of the original tweet. Each retweet cascade represents a rumor propagating on Twitter that has been verified as true or false by the fact-checking organizations (see the supplementary materials for more details on cascade construction). We then quantified the cascades’ depth (the number of retweet hops from the origin tweet over time, where a hop is a retweet by a new unique user), size (the number of users involved in the cascade over time), maximum breadth (the maximum number of users involved in the cascade at any depth), and structural virality (23) (a measure that interpolates between content spread through a single, large broadcast and that which spreads through multiple generations, with any one individual directly responsible for only a fraction of the total spread) (see the supplementary materials for more detail on the measurement of rumor diffusion). As a rumor is retweeted, the depth, size, maximum breadth, and structural virality of the cascade increase (Fig. 1A). A greater fraction of false rumors experienced between 1 and 1000 cascades, whereas a greater fraction of true rumors experienced more than 1000 cascades (Fig. 1B); this was also true for rumors based on political news (Fig. 1D). The total number of false rumors peaked at the end of both 2013 and 2015 and again at the end of 2016, corresponding to the last U.S. presidential election (Fig. 1C). The data also show clear increases in the total number of false political rumors during the 2012 and 2016 U.S. presidential elections (Fig. 
1E) and a spike in rumors that contained partially true and partially false information during the Russian annexation of Crimea in 2014 (Fig. 1E). Politics was the largest rumor category in our data, with ~45,000 cascades, followed by urban legends, business, terrorism, science, entertainment, and natural disasters (Fig. 1F).

Fig. 1 Rumor cascades. (A) An example rumor cascade collected by our method as well as its depth, size, maximum breadth, and structural virality over time. “Nodes” are users. (B) The complementary cumulative distribution functions (CCDFs) of true, false, and mixed (partially true and partially false) cascades, measuring the fraction of rumors that exhibit a given number of cascades. (C) Quarterly counts of all true, false, and mixed rumor cascades that diffused on Twitter between 2006 and 2017, annotated with example rumors in each category. (D) The CCDFs of true, false, and mixed political cascades. (E) Quarterly counts of all true, false, and mixed political rumor cascades that diffused on Twitter between 2006 and 2017, annotated with example rumors in each category. (F) A histogram of the total number of rumor cascades in our data across the seven most frequent topical categories.

When we analyzed the diffusion dynamics of true and false rumors, we found that falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information [Kolmogorov-Smirnov (K-S) tests are reported in tables S3 to S10]. A significantly greater fraction of false cascades than true cascades exceeded a depth of 10, and the top 0.01% of false cascades diffused eight hops deeper into the Twittersphere than the truth, diffusing to depths greater than 19 hops from the origin tweet (Fig. 2A). Falsehood also reached far more people than the truth. Whereas the truth rarely diffused to more than 1000 people, the top 1% of false-news cascades routinely diffused to between 1000 and 100,000 people (Fig. 2B). 
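
The four cascade measures compared in Fig. 2 (depth, size, maximum breadth, and structural virality, as defined earlier) can be made concrete with a small Python sketch on a single, static retweet tree. Several things here are assumptions rather than the authors' implementation: the `parent` dictionary layout, the function name `cascade_metrics`, counting the origin user toward size, and computing structural virality as the mean shortest-path distance over all pairs of nodes in the tree, which is the usual definition of that measure.

```python
from collections import Counter, deque

def cascade_metrics(parent):
    """Compute (depth, size, max_breadth, structural_virality) for one cascade.

    parent: dict mapping each user to the user they retweeted; the origin
    user maps to None (a hypothetical layout for illustration).
    """
    def hops(user):
        # Number of retweet hops between this user and the origin.
        d = 0
        while parent[user] is not None:
            user = parent[user]
            d += 1
        return d

    depths = {u: hops(u) for u in parent}
    depth = max(depths.values())
    size = len(parent)                                  # users involved, origin included
    max_breadth = max(Counter(depths.values()).values())  # most users at any one depth

    # Undirected adjacency list of the retweet tree.
    adj = {u: [] for u in parent}
    for u, p in parent.items():
        if p is not None:
            adj[u].append(p)
            adj[p].append(u)

    # Sum shortest-path distances over all ordered pairs with one BFS per node.
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    virality = total / (size * (size - 1)) if size > 1 else 0.0
    return depth, size, max_breadth, virality

# A pure broadcast: four users retweet the origin directly.
broadcast = {"o": None, "a": "o", "b": "o", "c": "o", "d": "o"}
# A pure chain: each user retweets the previous one.
chain = {"o": None, "a": "o", "b": "a", "c": "b", "d": "c"}

print(cascade_metrics(broadcast))
print(cascade_metrics(chain))
```

Both toy cascades have the same size, but the chain is deeper and has higher structural virality than the broadcast, which is the distinction between broadcast dynamics and peer-to-peer diffusion that the measure is meant to capture.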
Falsehood reached more people at every depth of a cascade than the truth, meaning that many more people retweeted falsehood than they did the truth (Fig. 2C). The spread of falsehood was aided by its virality, meaning that falsehood did not simply spread through broadcast dynamics but rather through peer-to-peer diffusion characterized by a viral branching process (Fig. 2D).

Fig. 2 Complementary cumulative distribution functions (CCDFs) of true and false rumor cascades. (A) Depth. (B) Size. (C) Maximum breadth. (D) Structural virality. (E and F) The number of minutes it takes for true and false rumor cascades to reach any (E) depth and (F) number of unique Twitter users. (G) The number of unique Twitter users reached at every depth and (H) the mean breadth of true and false rumor cascades at every depth. In (H), plot is lognormal. Standard errors were clustered at the rumor level (i.e., cascades belonging to the same rumor were clustered together; see supplementary materials for additional details).

It took the truth about six times as long as falsehood to reach 1500 people (Fig. 2F) and 20 times as long as falsehood to reach a cascade depth of 10 (Fig. 2E). As the truth never diffused beyond a depth of 10, we saw that falsehood reached a depth of 19 nearly 10 times faster than the truth reached a depth of 10 (Fig. 2E). Falsehood also diffused significantly more broadly (Fig. 2H) and was retweeted by more unique users than the truth at every cascade depth (Fig. 2G).

False political news (Fig. 1D) traveled deeper (Fig. 3A) and more broadly (Fig. 3C), reached more people (Fig. 3B), and was more viral than any other category of false information (Fig. 3D). False political news also diffused deeper more quickly (Fig. 3E) and reached more than 20,000 people nearly three times faster than all other types of false news reached 10,000 people (Fig. 3F). 
Although the other categories of false news reached about the same number of unique users at depths between 1 and 10, false political news routinely reached the most unique users at depths greater than 10 (Fig. 3G). Although all other categories of false news traveled slightly more broadly at shallower depths, false political news traveled more broadly at greater depths, indicating that more-popular false political news items exhibited broader and more-accelerated diffusion dynamics (Fig. 3H). Analysis of all news categories showed that news about politics, urban legends, and science spread to the most people, whereas news about politics and urban legends spread the fastest and were the most viral in terms of their structural virality (see fig. S11 for detailed comparisons across all topics).

Fig. 3 Complementary cumulative distribution functions (CCDFs) of false political and other types of rumor cascades. (A) Depth. (B) Size. (C) Maximum breadth. (D) Structural virality. (E and F) The number of minutes it takes for false political and other false news cascades to reach any (E) depth and (F) number of unique Twitter users. (G) The number of unique Twitter users reached at every depth and (H) the mean breadth of these false rumor cascades at every depth. In (H), plot is lognormal. Standard errors were clustered at the rumor level.

One might suspect that structural elements of the network or individual characteristics of the users involved in the cascades explain why falsity travels with greater velocity than the truth. Perhaps those who spread falsity “followed” more people, had more followers, tweeted more often, were more often “verified” users, or had been on Twitter longer. But when we compared users involved in true and false rumor cascades, we found that the opposite was true in every case. 
Users who spread false news had significantly fewer followers (K-S test = 0.104, P ~ 0.0), followed significantly fewer people (K-S test = 0.136, P ~ 0.0), were significantly less active on Twitter (K-S test = 0.054, P ~ 0.0), were verified significantly less often (K-S test = 0.004, P < 0.001), and had been on Twitter for significantly less time (K-S test = 0.125, P ~ 0.0) (Fig. 4A). Falsehood diffused farther and faster than the truth despite these differences, not because of them.

Fig. 4 Models estimating correlates of news diffusion, the novelty of true and false news, and the emotional content of replies to news. (A) Descriptive statistics on users who participated in true and false rumor cascades as well as K-S tests of the differences in the distributions of these measures across true and false rumor cascades. (B) Results of a logistic regression model estimating users’ likelihood of retweeting a rumor as a function of variables shown at the left. coeff, logit coefficient; z, z score. (C) Differences in the information uniqueness (IU), scaled Bhattacharyya distance (BD), and K-L divergence (KL) of true (green) and false (red) rumor tweets compared to the corpus of prior tweets the user was exposed to in the 60 days before retweeting the rumor tweet. (D) The emotional content of replies to true (green) and false (red) rumor tweets across seven dimensions categorized by the NRC. (E) Mean and variance of the IU, KL, and BD of true and false rumor tweets compared to the corpus of prior tweets the user has seen in the 60 days before seeing the rumor tweet as well as K-S tests of their differences across true and false rumors. (F) Mean and variance of the emotional content of replies to true and false rumor tweets across seven dimensions categorized by the NRC as well as K-S tests of their differences across true and false rumors. All standard errors are clustered at the rumor level, and all models are estimated with cluster-robust standard errors at the rumor level.

When we estimated a model of the likelihood of retweeting, we found that falsehoods were 70% more likely to be retweeted than the truth (Wald chi-square test, P ~ 0.0), even when controlling for the account age, activity level, and number of followers and followees of the original tweeter, as well as whether the original tweeter was a verified user (Fig. 4B). Because user characteristics and network structure could not explain the differential diffusion of truth and falsity, we sought alternative explanations for the differences in their diffusion dynamics.

One alternative explanation emerges from information theory and Bayesian decision theory. Novelty attracts human attention (24), contributes to productive decision-making (25), and encourages information sharing (26) because novelty updates our understanding of the world. When information is novel, it is not only surprising, but also more valuable, both from an information theoretic perspective [in that it provides the greatest aid to decision-making (25)] and from a social perspective [in that it conveys social status on one that is “in the know” or has access to unique “inside” information (26)]. We therefore tested whether falsity was more novel than the truth and whether Twitter users were more likely to retweet information that was more novel.

To assess novelty, we randomly selected ~5000 users who propagated true and false rumors and extracted a random sample of ~25,000 tweets that they were exposed to in the 60 days prior to their decision to retweet a rumor. We then specified a latent Dirichlet Allocation Topic model (27), with 200 topics and trained on 10 million English-language tweets, to calculate the information distance between the rumor tweets and all the prior tweets that users were exposed to before retweeting the rumor tweets. 
This generated a probability distribution over the 200 topics for each tweet in our data set. We then measured how novel the information in the true and false rumors was by comparing the topic distributions of the rumor tweets with the topic distributions of the tweets to which users were exposed in the 60 days before their retweet. We found that false rumors were significantly more novel than the truth across all novelty metrics, displaying significantly higher information uniqueness (K-S test = 0.457, P ~ 0.0) (28), Kullback-Leibler (K-L) divergence (K-S test = 0.433, P ~ 0.0) (29), and Bhattacharyya distance (K-S test = 0.415, P ~ 0.0) (which is similar to the Hellinger distance) (30). The last two metrics measure differences between probability distributions representing the topical content of the incoming tweet and the corpus of previous tweets to which users were exposed. Although false rumors were measurably more novel than true rumors, users may not have perceived them as such. We therefore assessed users’ perceptions of the information contained in true and false rumors by comparing the emotional content of replies to true and false rumors. We categorized the emotion in the replies by using the leading lexicon curated by the National Research Council Canada (NRC), which provides a comprehensive list of ~140,000 English words and their associations with eight emotions based on Plutchik’s (31) work on basic emotion—anger, fear, anticipation, trust, surprise, sadness, joy, and disgust (32)—and a list of ~32,000 Twitter hashtags and their weighted associations with the same emotions (33). We removed stop words and URLs from the reply tweets and calculated the fraction of words in the tweets that related to each of the eight emotions, creating a vector of emotion weights for each reply that summed to one across the emotions. 
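
Two of the quantities described above lend themselves to a short worked sketch in Python: the distribution-based novelty metrics (K-L divergence and Bhattacharyya distance between topic distributions) and the per-reply emotion weight vector. This is a hedged illustration, not the authors' code. The function names, the direction of the K-L divergence (rumor tweet relative to prior exposure), the `eps` floor for zero probabilities, the toy three-topic distributions, and the two-word lexicon are all assumptions made here; the paper reports a scaled Bhattacharyya distance, whereas the sketch computes the unscaled form.

```python
import math

EMOTIONS = ("anger", "fear", "anticipation", "trust",
            "surprise", "sadness", "joy", "disgust")

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence D(p || q) in nats; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def bhattacharyya_distance(p, q):
    """Unscaled Bhattacharyya distance: -ln of the Bhattacharyya coefficient."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient) if coefficient > 0 else math.inf

def emotion_weights(tokens, lexicon):
    """Fraction of emotion-bearing word hits per emotion, summing to one.

    lexicon: dict mapping a word to the emotions it is associated with
    (a stand-in for the NRC lexicon, which is not bundled here).
    """
    counts = {emotion: 0 for emotion in EMOTIONS}
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = sum(counts.values())
    return {e: (c / total if total else 0.0) for e, c in counts.items()}

# Toy topic distributions over 3 topics (the study used 200).
prior_exposure = [0.70, 0.20, 0.10]   # what the user saw in the prior 60 days
familiar_tweet = [0.65, 0.25, 0.10]   # close to prior exposure
novel_tweet = [0.05, 0.15, 0.80]      # far from prior exposure

for name, tweet in (("familiar", familiar_tweet), ("novel", novel_tweet)):
    print(name,
          round(kl_divergence(tweet, prior_exposure), 3),
          round(bhattacharyya_distance(tweet, prior_exposure), 3))

toy_lexicon = {"wow": ["surprise"], "gross": ["disgust"]}
print(emotion_weights(["wow", "wow", "gross", "the"], toy_lexicon))
```

Under both metrics the tweet whose topic mix departs from the user's prior exposure scores higher, which is the sense in which a rumor tweet is called more novel.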
We found that false rumors inspired replies expressing greater surprise (K-S test = 0.205, P ~ 0.0), corroborating the novelty hypothesis, and greater disgust (K-S test = 0.102, P ~ 0.0), whereas the truth inspired replies that expressed greater sadness (K-S test = 0.037, P ~ 0.0), anticipation (K-S test = 0.038, P ~ 0.0), joy (K-S test = 0.061, P ~ 0.0), and trust (K-S test = 0.060, P ~ 0.0) (Fig. 4, D and F). The emotions expressed in reply to falsehoods may illuminate additional factors, beyond novelty, that inspire people to share false news. Although we cannot claim that novelty causes retweets or that novelty is the only reason why false news is retweeted more often, we do find that false news is more novel and that novel information is more likely to be retweeted.

Numerous diagnostic statistics and manipulation checks validated our results and confirmed their robustness. First, as there were multiple cascades for every true and false rumor, the variance of and error terms associated with cascades corresponding to the same rumor will be correlated. We therefore specified cluster-robust standard errors and calculated all variance statistics clustered at the rumor level. We tested the robustness of our findings to this specification by comparing analyses with and without clustered errors and found that, although clustering reduced the precision of our estimates as expected, the directions, magnitudes, and significance of our results did not change, and chi-square (P ~ 0.0) and deviance (d) goodness-of-fit tests (d = 3.4649 × 10^-6, P ~ 1.0) indicate that the models are well specified (see supplementary materials for more detail).

Second, a selection bias may arise from the restriction of our sample to tweets fact checked by the six organizations we relied on. Fact checking may select certain types of rumors or draw additional attention to them. 
To validate the robustness of our analysis to this selection and the generalizability of our results to all true and false rumor cascades, we independently verified a second sample of rumor cascades that were not verified by any fact-checking organization. These rumors were fact checked by three undergraduate students at Massachusetts Institute of Technology (MIT) and Wellesley College. We trained the students to detect and investigate rumors with our automated rumor-detection algorithm running on 3 million English-language tweets from 2016 (34). The undergraduate annotators investigated the veracity of the detected rumors using simple search queries on the web. We asked them to label the rumors as true, false, or mixed on the basis of their research and to discard all rumors previously investigated by one of the fact-checking organizations. The annotators, who worked independently and were not aware of one another, agreed on the veracity of 90% of the 13,240 rumor cascades that they investigated and achieved a Fleiss’ kappa of 0.88. When we compared the diffusion dynamics of the true and false rumors that the annotators agreed on, we found results nearly identical to those estimated with our main data set (see fig. S17). False rumors in the robustness data set had greater depth (K-S test = 0.139, P ~ 0.0), size (K-S test = 0.131, P ~ 0.0), maximum breadth (K-S test = 0.139, P ~ 0.0), structural virality (K-S test = 0.066, P ~ 0.0), and speed (fig. S17) and a greater number of unique users at each depth (fig. S17). When we broadened the analysis to include majority-rule labeling, rather than unanimity, we again found the same results (see supplementary materials for results using majority-rule labeling). Third, although the differential diffusion of truth and falsity is interesting with or without robot, or bot, activity, one may worry that our conclusions about human judgment may be biased by the presence of bots in our analysis. 
We therefore used a sophisticated bot-detection algorithm (35) to identify and remove all bots before running the analysis. When we added bot traffic back into the analysis, we found that none of our main conclusions changed: false news still spread farther, faster, deeper, and more broadly than the truth in all categories of information. The results remained the same when we removed all tweet cascades started by bots, including human retweets of original bot tweets (see supplementary materials, section S8.3) and when we used a second, independent bot-detection algorithm (see supplementary materials, section S8.3.5) and varied the algorithm’s sensitivity threshold to verify the robustness of our analysis (see supplementary materials, section S8.3.4). Although the inclusion of bots, as measured by the two state-of-the-art bot-detection algorithms we used in our analysis, accelerated the spread of both true and false news, it affected their spread roughly equally. This suggests that false news spreads farther, faster, deeper, and more broadly than the truth because humans, not robots, are more likely to spread it.
Finally, more research on the behavioral explanations of differences in the diffusion of true and false news is clearly warranted. In particular, more robust identification of the factors of human judgment that drive the spread of true and false news online requires more direct interaction with users through interviews, surveys, lab experiments, and even neuroimaging. We encourage these and other approaches to the investigation of the factors of human judgment that drive the spread of true and false news in future work.
False news can drive the misallocation of resources during terror attacks and natural disasters, the misalignment of business investments, and misinformed elections. Unfortunately, although the amount of false news online is clearly increasing (Fig. 1, C and E), the scientific understanding of how and why false news spreads is currently based on ad hoc rather than large-scale systematic analyses. Our analysis of all the verified true and false rumors that spread on Twitter confirms that false news spreads more pervasively than the truth online. It also overturns conventional wisdom about how false news spreads. Though one might expect network structure and individual characteristics of spreaders to favor and promote false news, the opposite is true. The greater likelihood of people to retweet falsity more than the truth is what drives the spread of false news, despite network and individual factors that favor the truth. Furthermore, although recent testimony before congressional committees on misinformation in the United States has focused on the role of bots in spreading false news (36), we conclude that human behavior contributes more to the differential spread of falsity and truth than automated robots do. This implies that misinformation-containment policies should also emphasize behavioral interventions, like labeling and incentives to dissuade the spread of misinformation, rather than focusing exclusively on curtailing bots. Understanding how false news spreads is the first step toward containing it. We hope our work inspires more large-scale research into the causes and consequences of the spread of false news as well as its potential cures.
Supplementary Materials
www.sciencemag.org/content/359/6380/1146/suppl/DC1
Materials and Methods
Figs. S1 to S20
Tables S1 to S39
References (37–75)
http://www.sciencemag.org/about/science-licenses-journal-article-reuse
This is an article distributed under the terms of the Science Journals Default License.
Acknowledgments: We are indebted to Twitter for providing funding and access to the data. We are also grateful to members of the MIT research community for invaluable discussions. The research was approved by the MIT institutional review board.
The analysis code is freely available at https://goo.gl/forms/AKIlZujpexhN7fY33. The entire data set is also available, from the same link, upon signing an access agreement stating that (i) you shall only use the data set for the purpose of validating the results of the MIT study and for no other purpose; (ii) you shall not attempt to identify, reidentify, or otherwise deanonymize the data set; and (iii) you shall not further share, distribute, publish, or otherwise disseminate the data set. Those who wish to use the data for any other purposes can contact and make a separate agreement with Twitter.
Editor: Wu You (吴悠)