[TED] How to read the genome and build a human being

 

For the next 16 minutes, I'm going to take you on a journey that is probably the biggest dream of humanity: to understand the code of life.

So for me, everything started many, many years ago when I met the first 3D printer. The concept was fascinating. A 3D printer needs three elements: a bit of information, some raw material, some energy, and it can produce any object that was not there before.

I was doing physics, I was coming back home and I realized that I actually always knew a 3D printer. And everyone does. It was my mom. (Laughter) My mom takes three elements: a bit of information, which is between my father and my mom in this case, raw elements and energy in the same media, that is food, and after several months, produces me. And I was not existent before.

So apart from the shock of my mom discovering that she was a 3D printer, I immediately got mesmerized by that piece, the first one, the information. What amount of information does it take to build and assemble a human? Is it much? Is it little? How many thumb drives can you fill?

Well, I was studying physics at the beginning and I took this approximation of a human as a gigantic Lego piece. So, imagine that the building blocks are little atoms and there is a hydrogen here, a carbon here, a nitrogen here. So in the first approximation, if I can list the number of atoms that compose a human being, I can build it. Now, you can run some numbers and that happens to be quite an astonishing number.
So the number of atoms, the file that I will save in my thumb drive to assemble a little baby, will actually fill an entire Titanic of thumb drives -- multiplied 2,000 times. This is the miracle of life. Every time you see from now on a pregnant lady, she's assembling the biggest amount of information that you will ever encounter. Forget big data, forget anything you heard of. This is the biggest amount of information that exists. (Applause)

But nature, fortunately, is much smarter than a young physicist, and in four billion years, managed to pack this information in a small crystal we call DNA. We met it for the first time in 1950 when Rosalind Franklin, an amazing scientist, a woman, took a picture of it. But it took us more than 40 years to finally poke inside a human cell, take out this crystal, unroll it, and read it for the first time.

The code comes out to be a fairly simple alphabet, four letters: A, T, C and G. And to build a human, you need three billion of them. Three billion. How many are three billion? It doesn't really make any sense as a number, right? So I was thinking how I could explain myself better about how big and enormous this code is. But there is -- I mean, I'm going to have some help, and the best person to help me introduce the code is actually the first man to sequence it, Dr. Craig Venter. So welcome onstage, Dr. Craig Venter.
(Applause) Not the man in the flesh, but for the first time in history, this is the genome of a specific human, printed page-by-page, letter-by-letter: 262,000 pages of information, 450 kilograms, shipped from the United States to Canada thanks to Bruno Bowden, Lulu.com, a start-up, did everything. It was an amazing feat. But this is the visual perception of what is the code of life.

And now, for the first time, I can do something fun. I can actually poke inside it and read. So let me take an interesting book ... like this one. I have an annotation; it's a fairly big book. So just to let you see what is the code of life. Thousands and thousands and thousands and millions of letters. And they apparently make sense. Let's get to a specific part. Let me read it to you: "AAG, AAT, ATA." To you it sounds like mute letters, but this sequence gives the color of the eyes to Craig.

I'll show you another part of the book. This is actually a little more complicated. Chromosome 14, book 132: (Laughter) As you might expect. (Laughter) "ATT, CTT, GATT." This human is lucky, because if you miss just two letters in this position -- two letters of our three billion -- he will be condemned to a terrible disease: cystic fibrosis. We have no cure for it, we don't know how to solve it, and it's just two letters of difference from what we are.
A wonderful book, a mighty book, a mighty book that helped me understand and show you something quite remarkable. Every one of you -- what makes me, me and you, you -- is just about five million of these, half a book. For the rest, we are all absolutely identical. Five hundred pages is the miracle of life that you are. The rest, we all share it. So think about that again when we think that we are different. This is the amount that we share.

So now that I have your attention, the next question is: How do I read it? How do I make sense out of it? Well, for however good you can be at assembling Swedish furniture, this instruction manual is nothing you can crack in your life. And so, in 2014, two famous TEDsters, Peter Diamandis and Craig Venter himself, decided to assemble a new company. Human Longevity was born, with one mission: trying everything we can try and learning everything we can learn from these books, with one target -- making real the dream of personalized medicine, understanding what things should be done to have better health and what are the secrets in these books. An amazing team, 40 data scientists and many, many more people, a pleasure to work with.

The concept is actually very simple. We're going to use a technology called machine learning. On one side, we have genomes -- thousands of them. On the other side, we collected the biggest database of human beings: phenotypes, 3D scan, NMR -- everything you can think of.
Inside there, on these two opposite sides, there is the secret of translation. And in the middle, we build a machine. We build a machine and we train a machine -- well, not exactly one machine, many, many machines -- to try to understand and translate the genome in a phenotype. What are those letters, and what do they do? It's an approach that can be used for everything, but using it in genomics is particularly complicated.

Little by little we grew and we wanted to build different challenges. We started from the beginning, from common traits. Common traits are comfortable because they are common, everyone has them. So we started to ask our questions: Can we predict height? Can we read the books and predict your height? Well, we actually can, with five centimeters of precision. BMI is fairly connected to your lifestyle, but we still can, we get in the ballpark, eight kilograms of precision. Can we predict eye color? Yeah, we can. Eighty percent accuracy. Can we predict skin color? Yeah we can, 80 percent accuracy. Can we predict age? We can, because apparently, the code changes during your life. It gets shorter, you lose pieces, it gets insertions. We read the signals, and we make a model.

Now, an interesting challenge: Can we predict a human face? It's a little complicated, because a human face is scattered among millions of these letters. And a human face is not a very well-defined object.
So, we had to build an entire tier of it to learn and teach a machine what a face is, and embed and compress it. And if you're comfortable with machine learning, you understand what the challenge is here.

Now, after 15 years -- 15 years after we read the first sequence -- this October, we started to see some signals. And it was a very emotional moment. What you see here is a subject coming in our lab. This is a face for us. So we take the real face of a subject, we reduce the complexity, because not everything is in your face -- lots of features and defects and asymmetries come from your life. We symmetrize the face, and we run our algorithm. The results that I show you right now, this is the prediction we have from the blood. (Applause)

Wait a second. In these seconds, your eyes are watching, left and right, left and right, and your brain wants those pictures to be identical. So I ask you to do another exercise, to be honest. Please search for the differences, which are many. The biggest amount of signal comes from gender, then there is age, BMI, the ethnicity component of a human. And scaling up over that signal is much more complicated. But what you see here, even in the differences, lets you understand that we are in the right ballpark, that we are getting closer. And it's already giving you some emotions.

This is another subject that comes in place, and this is a prediction. A little smaller face, we didn't get the complete cranial structure, but still, it's in the ballpark.
This is a subject that comes in our lab, and this is the prediction. So these people have never been seen in the training of the machine. These are the so-called "held-out" set. But these are people that you will probably never believe. We're publishing everything in a scientific publication, you can read it.

But since we are onstage, Chris challenged me. I probably exposed myself and tried to predict someone that you might recognize. So, in this vial of blood -- and believe me, you have no idea what we had to do to have this blood now, here -- in this vial of blood is the amount of biological information that we need to do a full genome sequence. We just need this amount. We ran this sequence, and I'm going to do it with you. And we start to layer up all the understanding we have.

In the vial of blood, we predicted he's a male. And the subject is a male. We predict that he's a meter and 76 cm. The subject is a meter and 77 cm. So, we predicted that he's 76 [kg]; the subject is 82. We predict his age, 38. The subject is 35. We predict his eye color. Too dark. We predict his skin color. We are almost there. That's his face.

Now, the reveal moment: the subject is this person. (Laughter) And I did it intentionally. I am a very particular and peculiar ethnicity. Southern European, Italians -- they never fit in models. And it's particular -- that ethnicity is a complex corner case for our model. But there is another point.
So, one of the things that we use a lot to recognize people will never be written in the genome. It's our free will, it's how I look. Not my haircut in this case, but my beard cut. So I'm going to show you, I'm going to, in this case, transfer it -- and this is nothing more than Photoshop, no modeling -- the beard on the subject. And immediately, we get much, much better in the feeling.

So, why do we do this? We certainly don't do it for predicting height or taking a beautiful picture out of your blood. We do it because the same technology and the same approach, the machine learning of this code, is helping us to understand how we work, how your body works, how your body ages, how disease generates in your body, how your cancer grows and develops, how drugs work and if they work on your body.

This is a huge challenge. This is a challenge that we share with thousands of other researchers around the world. It's called personalized medicine. It's the ability to move from a statistical approach where you're a dot in the ocean, to a personalized approach, where we read all these books and we get an understanding of exactly how you are. But it is a particularly complicated challenge, because of all these books, as of today, we just know probably two percent: four books of more than 175.

And this is not the topic of my talk, because we will learn more. There are the best minds in the world on this topic. The prediction will get better, the model will get more precise.
And the more we learn, the more we will be confronted with decisions that we never had to face before about life, about death, about parenting.

So, we are touching the very inner detail on how life works. And it's a revolution that cannot be confined in the domain of science or technology. This must be a global conversation. We must start to think of the future we're building as a humanity. We need to interact with creatives, with artists, with philosophers, with politicians. Everyone is involved, because it's the future of our species. Without fear, but with the understanding that the decisions that we make in the next year will change the course of history forever.

Thank you. (Applause)
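
Editor's note: the figures quoted in the talk (three billion letters, 262,000 printed pages, more than 175 books, about five million letters that differ between people, four books understood) can be checked against each other with some rough arithmetic. The sketch below is only a back-of-the-envelope check, not anything presented in the talk. The 2-bits-per-letter storage estimate is an assumption added here (four possible letters, no compression), and the book count of 175 is used as a lower bound because the talk says "more than 175".

```python
# Figures quoted in the talk.
GENOME_LETTERS = 3_000_000_000   # "three billion" letters
PRINTED_PAGES = 262_000          # pages in the printed genome
PRINTED_BOOKS = 175              # "more than 175" books; treated as a lower bound
DIFFERING_LETTERS = 5_000_000    # letters that differ from person to person
UNDERSTOOD_BOOKS = 4             # books the speaker says we understand today

# Assumption (not from the talk): a 4-letter alphabet needs 2 bits per letter.
BITS_PER_LETTER = 2


def genome_megabytes():
    """Raw storage for the genome in megabytes under the 2-bit assumption."""
    return GENOME_LETTERS * BITS_PER_LETTER / 8 / 1_000_000


def letters_per_page():
    """Average number of letters on one printed page."""
    return GENOME_LETTERS / PRINTED_PAGES


def pages_per_book():
    """Average pages per book if there were exactly 175 books."""
    return PRINTED_PAGES / PRINTED_BOOKS


def differing_pages():
    """Printed pages needed to hold the letters that differ between people."""
    return DIFFERING_LETTERS / letters_per_page()


def understood_fraction():
    """Share of the books that the talk says are understood."""
    return UNDERSTOOD_BOOKS / PRINTED_BOOKS


if __name__ == "__main__":
    print(f"raw genome size:   {genome_megabytes():.0f} MB")
    print(f"letters per page:  {letters_per_page():.0f}")
    print(f"pages per book:    {pages_per_book():.0f}")
    print(f"differing pages:   {differing_pages():.0f}")
    print(f"understood share:  {understood_fraction():.1%}")
```

Under these assumptions the differing letters fill a few hundred printed pages, the same order of magnitude as the "five hundred pages" in the talk, and four books out of 175 comes to a little over two percent, in line with the "two percent" the speaker quotes.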
