bioinformatics

124 results

Data Mining: Concepts and Techniques
by Jiawei Han, Micheline Kamber and Jian Pei
Published 21 Jun 2011

For example, we may find a group of genes that express themselves similarly, which is highly interesting in bioinformatics, such as in finding pathways. When analyzing in the sample/condition dimension, we treat each sample/condition as an object and treat the genes as attributes. In this way, we may find patterns of samples/conditions, or cluster samples/conditions into groups. For example, we may find the differences in gene expression by comparing a group of tumor samples and nontumor samples. Gene expression matrices are popular in bioinformatics research and development. For example, an important task is to classify a new gene using the expression data of the gene and that of other genes in known classes.
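The two views of a gene expression matrix described in the excerpt above can be sketched as follows. This is a minimal illustration only: the gene names, sample names, and expression values are made up, and Pearson correlation is just one plausible similarity measure, not necessarily the one the book uses.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy data: rows are genes, columns are samples/conditions.
GENES = ["g1", "g2", "g3"]
SAMPLES = ["tumor_a", "tumor_b", "normal_a", "normal_b"]
EXPR = [
    [8.0, 9.0, 2.0, 1.0],  # g1
    [7.5, 9.5, 2.5, 0.5],  # g2, tracks g1 closely
    [3.0, 3.0, 3.0, 4.0],  # g3, roughly flat
]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def transpose(matrix):
    """Swap rows and columns so that samples become the objects."""
    return [list(col) for col in zip(*matrix)]


# Gene dimension: genes are objects, samples are attributes.
gene_similarity = pearson(EXPR[0], EXPR[1])

# Sample dimension: samples are objects, genes are attributes.
by_sample = transpose(EXPR)
sample_similarity = pearson(by_sample[0], by_sample[1])

# Tumor vs. nontumor: difference of mean expression per gene.
tumor = [i for i, s in enumerate(SAMPLES) if s.startswith("tumor")]
normal = [i for i, s in enumerate(SAMPLES) if s.startswith("normal")]
mean_diff = {
    g: mean(row[i] for i in tumor) - mean(row[i] for i in normal)
    for g, row in zip(GENES, EXPR)
}

print(round(gene_similarity, 3), round(sample_similarity, 3), mean_diff)
```

The same matrix serves both analyses; only the choice of which axis holds the objects changes.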

Every enterprise benefits from collecting and analyzing its data: Hospitals can spot trends and anomalies in their patient records, search engines can do better ranking and ad placement, and environmental and public health agencies can spot patterns and abnormalities in their data. The list continues, with cybersecurity and computer network intrusion detection; monitoring of the energy consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical data; financial and business intelligence data; spotting trends in blogs, Twitter, and many more. Storage is inexpensive and getting even less so, as are data sensors. Thus, collecting and storing data is easier than ever before. The problem then becomes how to analyze the data.

Web mining can help us learn about the distribution of information on the WWW in general, characterize and classify web pages, and uncover web dynamics and the association and other relationships among different web pages, users, communities, and web-based activities. It is important to keep in mind that, in many applications, multiple types of data are present. For example, in web mining, there often exist text data and multimedia data (e.g., pictures and videos) on web pages, graph data like web graphs, and map data on some web sites. In bioinformatics, genomic sequences, biological networks, and 3-D spatial structures of genomes may coexist for certain biological objects. Mining multiple data sources of complex data often leads to fruitful findings due to the mutual enhancement and consolidation of such multiple sources. On the other hand, it is also challenging because of the difficulties in data cleaning and data integration, as well as the complex interactions among the multiple sources of such data.

pages: 362 words: 104,308

Forty Signs of Rain
by Kim Stanley Robinson
Published 29 May 2004

It was not a matter of her being warm and fuzzy, as you might expect from the usual characterizations of feminine thought—on the contrary, Anna’s scientific work (she still often coauthored papers in statistics, despite her bureaucratic load) often displayed a finicky perfectionism that made her a very meticulous scientist, a first-rate statistician—smart, quick, competent in a range of fields and really excellent in more than one. As good a scientist as one could find for the rather odd job of running the Bioinformatics Division at NSF, good almost to the point of exaggeration—too precise, too interrogatory—it kept her from pursuing a course of action with drive. Then again, at NSF maybe that was an advantage. In any case she was so intense about it. A kind of Puritan of science, rational to an extreme. And yet of course at the same time that was all such a front, as with the early Puritans; the hyperrational coexisted in her with all the emotional openness, intensity, and variability that was the American female interactional paradigm and social role.

Anna had been watching him, and now she said, “I suppose it is a bit of a rat race.” “Well, no more than anywhere else. In fact if I were home it’d probably be worse.” They laughed. “And you have your journal work too.” “That’s right.” Frank waved at the piles of typescripts: three stacks for Review of Bioinformatics, two for The Journal of Sociobiology. “Always behind. Luckily the other editors are better at keeping up.” Anna nodded. Editing a journal was a privilege and an honor, even though usually unpaid—indeed, one often had to continue to subscribe to a journal just to get copies of what one had edited.

Frank scrolled down the pages of the application with practiced speed. Yann Pierzinski, Ph.D. in biomath, Caltech. Still doing postdoc work with his thesis advisor there, a man Frank had come to consider a bit of a credit hog, if not worse. It was interesting, then, that Pierzinski had gone down to Torrey Pines to work on a temporary contract, for a bioinformatics researcher whom Frank didn’t know. Perhaps that had been a bid to escape the advisor. But now he was back. Frank dug into the substantive part of the proposal. The algorithm set was one Pierzinski had been working on even back in his dissertation. Chemical mechanics of protein creation as a sort of natural algorithm, in effect.

pages: 239 words: 45,926

As the Future Catches You: How Genomics & Other Forces Are Changing Your Work, Health & Wealth
by Juan Enriquez
Published 15 Feb 2001

The machines and technology coming out of the digital and genetic revolutions may allow people to leverage their mental capacity a thousand … A million … Or a trillionfold. Biology is now driven by applied math … statistics … computer science … robotics … The world’s best programmers are increasingly gravitating toward biology … You will be hearing a lot about two new fields in the coming months … Bioinformatics and Biocomputing. You rarely see bioinformaticians … They are too valuable to companies and universities. Things are moving too fast … And they are too passionate about what they do … To spend a lot of time giving speeches and interviews. But if you go into the bowels of Harvard Medical School … And are able to find the genetics department inside the Warren Alpert Building … (A significant test of intelligence in and of itself … Start by finding the staircase inspired by the double helix … and go past the bathrooms marked XX and XY …) There you can find a small den where George Church hangs out, surrounded by computers.

This is ground zero for a wonderful commune of engineers, physicists, molecular biologists, and physicians … [3] And some of the world’s smartest graduate students … Who are trying to make sense of the 100 terabytes of data that come out of gene labs yearly … A task equivalent to trying to sort and use a million new encyclopedias … every year.[4] You can’t build enough “wet” labs (labs full of beakers, cells, chemicals, refrigerators) to process and investigate all the opportunities this scale of data generates. The only way for Church & Co. to succeed … Is to force biology to divide … Into theoretical and applied disciplines. Which is why he is one of the founders of bioinformatics … A new discipline that attempts to predict what biologists will find … When they carry out wet-lab experiments in a few months, years, or decades. In a sense, this mirrors Craig Venter’s efforts at The Institute for Genomic Research and Celera. Celera and Church’s labs are information centers … not traditional labs … And a few smart people are going to be able to do … A lot of biology … Very quickly.

Countries, regions, governments, and companies that assume they are … And will remain … Dominant … Soon lose their competitive edge. (Particularly those whose leadership ignores or disparages emerging technologies … Remember those old saws: The sun never sets on the British Empire … Vive La France! … All roads lead to Rome … China, the Middle Kingdom.) Which is one of the reasons bioinformatics is so important … And why you should pay attention. What we are seeing is just the beginning of the digital-genomics convergence. When you think of a DNA molecule and its ability to … Carry our complete life code within each of our cells … Accurately copy the code … Billions of times per day … Read and execute life’s functions … Transmit this information across generations … It becomes clear that … The world’s most powerful and compact coding and information-processing system … is a genome.

pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money
by Frank J. Ohlhorst
Published 28 Nov 2012

Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. It is the platform of Big Data that is making such lofty goals attainable. The validation of Big Data analytics can be illustrated by advances in science. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days, and it has also reduced the cost, so it will be feasible to sequence an individual’s genome for $1,000, paving the way for improved diagnostics and personalized medicine. The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business.

Big Data has transformed astronomy from a field in which taking pictures of the sky was a large part of the job to one in which the pictures are all in a database already and the astronomer’s task is to find interesting objects and phenomena in the database. Transformation is taking place in the biological arena as well. There is now a well-established tradition of depositing scientific data into a public repository and of creating public databases for use by other scientists. In fact, there is an entire discipline of bioinformatics that is largely devoted to the maintenance and analysis of such data. As technology advances, particularly with the advent of next-generation sequencing, the size and number of available experimental data sets are increasing exponentially. Big Data has the potential to revolutionize more than just research; the analytics process has started to transform education as well.

The data preparation challenge even extends to analysis that uses only a single data set. Here there is still the issue of suitable database design, further complicated by the many alternative ways in which to store the information. Particular database designs may have certain advantages over others for analytical purposes. A case in point is the variety in the structure of bioinformatics databases, in which information on substantially similar entities, such as genes, is inherently different but is represented with the same data elements. Examples like these clearly indicate that database design is an artistic endeavor that has to be carefully executed in the enterprise context by professionals.

pages: 565 words: 151,129

The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism
by Jeremy Rifkin
Published 31 Mar 2014

Reducing the cost of electricity in the management of data centers goes hand in hand with cutting the cost of storing data, an ever larger part of the data-management process. And the sheer volume of data is mushrooming faster than the capacity of hard drives to save it. Researchers are just beginning to experiment with a new way of storing data that could eventually drop the marginal cost to near zero. In January 2013 scientists at the European Bioinformatics Institute in Cambridge, England, announced a revolutionary new method of storing massive electronic data by embedding it in synthetic DNA. Two researchers, Nick Goldman and Ewan Birney, converted text from five computer files—which included an MP3 recording of Martin Luther King Jr.’s “I Have a Dream” speech, a paper by James Watson and Francis Crick describing the structure of DNA, and all of Shakespeare’s sonnets and plays—and converted the ones and zeros of digital information into the letters that make up the alphabet of the DNA code.
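To make the idea of converting ones and zeros into DNA letters concrete, here is a minimal sketch that maps every two bits to one of the four bases. It is a deliberately naive scheme chosen for illustration; it is not the encoding the EBI researchers used, and the function names are hypothetical.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(dna: str) -> bytes:
    """Inverse of encode; raises ValueError on bad length or unknown base."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"I have a dream"
strand = encode(message)
print(strand)
print(decode(strand) == message)
```

A practical scheme would also need error correction and would have to avoid sequences that are hard to synthesize or read, which this sketch ignores.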

Researchers add that DNA information can be preserved for centuries, as long as it is kept in a dark, cool environment.[65] At this early stage of development, the cost of reading the code is high and the time it takes to decode information is substantial. Researchers, however, are reasonably confident that an exponential rate of change in bioinformatics will drive the marginal cost to near zero over the next several decades. A near zero marginal cost communication/energy infrastructure for the Collaborative Age is now within sight. The technology needed to make it happen is already being deployed. At present, it’s all about scaling up and building out.

Its network of thousands of scientists and plant breeders is continually searching for heirloom and wild seeds, growing them out to increase seed stock, and ferrying samples to the vault for long-term storage.[32] In 2010, the trust launched a global program to locate, catalog, and preserve the wild relatives of the 22 major food crops humanity relies on for survival. The intensification of genetic-Commons advocacy comes at a time when new IT and computing technology is speeding up genetic research. The new field of bioinformatics has fundamentally altered the nature of biological research just as IT, computing, and Internet technology did in the fields of renewable-energy generation and 3D printing. According to research compiled by the National Human Genome Research Institute, gene-sequencing costs are plummeting at a rate that exceeds the exponential curves of Moore’s Law in computing power.[33] Dr.

pages: 287 words: 86,919

Protocol: how control exists after decentralization
by Alexander R. Galloway
Published 1 Apr 2004

Isomorphic Biopolitics As a final comment, it is worthwhile to note that the concept of “protocol” is related to a biopolitical production, a production of the possibility for experience in control societies. It is in this sense that Protocol is doubly materialist—in the sense of networked bodies inscribed by informatics, and in the sense of this bio-informatic network producing the conditions of experience. The biopolitical dimension of protocol is one of the parts of this book that opens onto future challenges. As the biological and life sciences become more and more integrated with computer and networking technology, the familiar line between the body and technology, between biologies and machines, begins to undergo a set of transformations.

Individual subjects are not only civil subjects, but also medical subjects for a medicine increasingly influenced by genetic science. The ongoing research and clinical trials in gene therapy, regenerative medicine, and genetic diagnostics reiterate the notion of the biomedical subject as being in some way amenable to a database. In addition to this bio-informatic encapsulation of individual and collective bodies, the transactions and economies between bodies are also being affected. Research into stem cells has ushered in a new era of molecular bodies that not only are self-generating like a reservoir (a new type of tissue banking), but that also create a tissue economy of potential biologies (lab-grown tissues and organs).

If layering is dependent upon portability, then portability is in turn enabled by the existence of ontology standards. These are some of the sites that Protocol opens up concerning the possible relations between information and biological networks. While the concept of biopolitics is often used at its most general level, Protocol asks us to respecify biopolitics in the age of biotechnology and bioinformatics. Thus one site of future engagement is in the zones where info-tech and bio-tech intersect. The “wet” biological body has not simply been superseded by “dry” computer code, just as the wet body no longer accounts for the virtual body. Biotechnologies of all sorts demonstrate this to us—in vivo tissue engineering, ethnic genome projects, gene-finding software, unregulated genetically modified foods, portable DNA diagnostics kits, and distributed proteomic computing.

pages: 361 words: 86,921

The End of Medicine: How Silicon Valley (And Naked Mice) Will Reboot Your Doctor
by Andy Kessler
Published 12 Oct 2009

These small jets can’t handle a volcanic eruption, can they? I decided to ask a question about something I might understand the answer to. “Don, you keep talking about this bioinformatics thing. Is it just some Oracle database? Is there something special about it?” Don leaned forward. “Nothing is easy in this world. We need a database of all known proteins plus a standardized way to store information from new proteins as they are characterized. The National Cancer Institute has something called CaBIG, the Cancer Bioinformatics Information Grid. It tries to define things from clinical trial data to genomics—even biospecimens.” “Okay.” “We’ll be part of it.

So we try to standardize simple things, like calibrating machines, like creating a real database of proteins and on and on. “To get biomarkers to be real, you have to have both specificity and sensitivity. Picking up on just one protein and not missing it. But there are millions of proteins—we probably know about 5,000 of them. I’m funding a bioinformatics platform to hold all this protein info. It’s amazing no one has done this yet. I’ve got a bunch of ex-Microsofters writing code.” “Why care about all those proteins? Don’t only a few work?” I asked. “Sure, but which ones? Turns out that if we use two markers—CA-125 and something else—maybe we can get the effectiveness to 0.95.
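The two biomarker properties named in the excerpt above have standard definitions that a short calculation makes explicit. This is a minimal sketch; the counts are invented for illustration and do not come from the book.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual cases the test detects (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-cases the test correctly clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening result: 100 patients with the disease, 100 without.
print(sensitivity(true_pos=95, false_neg=5))   # share of sick patients flagged
print(specificity(true_neg=90, false_pos=10))  # share of healthy patients cleared
```

A useful marker needs both numbers to be high: high sensitivity alone can be achieved by flagging everyone, and high specificity alone by flagging no one.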

pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms
by Mehmed Kantardzić
Published 2 Jan 2003

In contrast to the (global) model structure, a temporal pattern is a local model that makes a specific statement about a few data samples in time. Spikes, for example, are patterns in a real-valued time series that may be of interest. Similarly, in symbolic sequences, regular expressions represent well-defined patterns. In bioinformatics, genes are known to appear as local patterns interspersed between chunks of noncoding DNA. Matching and discovery of such patterns are very useful in many applications, not only in bioinformatics. Due to their readily interpretable structure, patterns play a particularly dominant role in data mining. There have been many techniques used to model global or local temporal events. We will introduce only some of the most popular modeling techniques.
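As a rough illustration of a regular expression acting as a local pattern in a symbolic sequence, the sketch below searches a DNA string for a start codon followed by whole codons up to the first in-frame stop codon. The function name and the test sequence are hypothetical, and real gene finding is far more involved than this pattern suggests.

```python
import re

# ATG, then as few whole codons as possible, then a stop codon (TAA, TAG, TGA).
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(sequence: str):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ORF_PATTERN.finditer(sequence)]


# Hypothetical sequence: one candidate region flanked by other bases.
print(find_orfs("GGATGAAACCCTAGTT"))
```

The lazy quantifier `*?` makes the match stop at the first stop codon that is in frame with the ATG, rather than running on to a later one.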

It paints a picture of the state-of-the-art techniques that can boost the capabilities of many existing data-mining tools and gives the novel developments of feature selection that have emerged in recent years, including causal feature selection and Relief. The book contains real-world case studies from a variety of areas, including text classification, web mining, and bioinformatics. Saul, L. K., et al., Spectral Methods for Dimensionality Reduction, in Semisupervised Learning, B. Schölkopf, O. Chapelle and A. Zien eds., MIT Press, Cambridge, MA, 2005. Spectral methods have recently emerged as a powerful tool for nonlinear dimensionality reduction and manifold learning.

The question is when to use the linear kernel as a first choice. If the number of features is large, one may not need to map data to a higher dimensional space. Experiments showed that the nonlinear mapping does not improve the SVM performance. Using the linear kernel is good enough, and C is the only tuning parameter. Many microarray data in bioinformatics and collection of electronic documents for classification are examples of this data set type. As the number of features is smaller, and the number of samples increases, SVM successfully maps data to higher dimensional spaces using nonlinear kernels. One of the methods for finding optimal parameter values for an SVM is a grid search.
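The grid search mentioned at the end of the excerpt above can be sketched generically. In this minimal example the scoring function is a made-up stand-in for cross-validated accuracy; in practice each grid point would train and validate an SVM. The parameter name `gamma` (a kernel width for a nonlinear kernel) and the exponentially spaced grids are assumptions for illustration, not details taken from the book.

```python
from itertools import product
from math import log2


def grid_search(score, grid):
    """Evaluate score(**params) on every grid combination; return the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical accuracy surface peaking at C = 2**3, gamma = 2**-5."""
    return 1.0 - 0.01 * ((log2(C) - 3) ** 2 + (log2(gamma) + 5) ** 2)


# Exponentially spaced candidate values, a common choice for SVM parameters.
grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

For the linear kernel described in the excerpt, the grid would shrink to the single parameter `C`, which is part of why it is an attractive first choice when the number of features is large.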

pages: 256 words: 67,563

Explaining Humans: What Science Can Teach Us About Life, Love and Relationships
by Camilla Pang
Published 12 Mar 2020

About the Author: Dr Camilla Pang holds a PhD in Biochemistry from University College London and is a Postdoctoral Scientist specialising in Translational Bioinformatics. At the age of eight, Camilla was diagnosed with Autistic Spectrum Disorder (ASD), and with ADHD at 26 years old. Her career and studies have been heavily influenced by her diagnosis and she is driven by her passion for understanding humans, our behaviours and how we work. To my mother Sonia, father Peter and sister Lydia. Introduction: It was five years into my life on Earth that I started to think I’d landed in the wrong place.

If the rules are (mostly) unwritten, and no one can agree who sets them, then what can we do to avoid the nightmare scenario of a major etiquette breach? Being someone who is rather fond of a rulebook, I decided that the only way was to write my own. If no one would tell me what the laws of etiquette were, I would have to work them out for myself. In doing so, relying on techniques from computer modelling, game theory and my own field of bioinformatics, I have learned that a rulebook is perhaps the wrong way to think about etiquette. Because the rules are one thing, and they do exist, but they are not the only variable. It’s also about how they are tweaked, interpreted and applied into discrete situations. Individual behaviour is as important as collective habits, and the two influence each other in an unfolding symbiosis that you can never fully predict.

What makes it OK for my sister to mock my Frida Kahlo-esque unibrow, but not (I can promise you) for me to point out that her painted-on brows are reminiscent of Super Mario? We need a method for matching behaviour to context and filling in the gaps between our knowledge and ignorance of new situations. That is where homology, which we use to model the similarities between proteins, comes into its own. Homology is a core technique of bioinformatics, my field of study, where it is used to fill in the gaps in data sets we are still exploring, inferring from related cases. There will always be some missing data, but we can overcome this by using what we know about equivalent situations to inform what we don’t about this one. For instance, if you are trying to develop a new drug treatment for a particular form of cancer, and you have found a suitable protein to target, what you need to establish is its structure – the thing you will bind your treatment on to.

pages: 381 words: 78,467

100 Plus: How the Coming Age of Longevity Will Change Everything, From Careers and Relationships to Family And
by Sonia Arrison
Published 22 Aug 2011

SU’s mission is practical: “to assemble, educate and inspire leaders who strive to understand and facilitate the development of exponentially advancing technologies in order to address humanity’s grand challenges.”[20] The academic tracks are geared toward understanding how fast-moving technologies can work together, and more than half of them have a direct impact on the field of longevity research. These tracks include AI and robotics; nanotechnology, networks, and computing systems; biotechnology and bioinformatics; medicine and neuroscience; and futures studies and forecasting.[21] SU is a place where mavens speak to those who are superfocused on changing the world for the better. It is no surprise, then, that it also functions as an institutional “connector”—the third component needed to successfully spread a game-changing meme.

If the source code of humans can be identified, then it is not that much of a leap to think about re-engineering it. Suddenly, biology became a field that computer geeks could attempt to tackle, which not only resulted in smart biohackers forming do-it-yourself biology clubs, but also increased the pace of advances in biology. Bioinformatics are moving at the speed of Moore’s Law and sometimes faster. To the extent that wealthy technology moguls influence public opinion and hackers seem cool, the context for the longevity meme is sizzling hot. In a Wired magazine interview in April 2010, Bill Gates, America’s richest man, told reporter Steven Levy that if he were a teenager today, “he’d be hacking biology.”[57] Gates elaborated, saying, “Creating artificial life with DNA synthesis, that’s sort of the equivalent of machine-language programming.”

Policy makers, activists, journalists, educators, investors, philanthropists, analysts, entrepreneurs, and a whole host of others need to come together to fight for their lives. We now know that aging is plastic and that humanity’s time horizons are not set in stone. Larry Ellison, Bill Gates, Peter Thiel, Jeff Bezos, Larry Page, Sergey Brin, and Paul Allen have all recognized the wealth of opportunity in the bioinformatics revolution, but this is not enough. Other heroes must come forward—perhaps there is even one reading this sentence right now. The goal is more healthy time, which, as we have seen throughout this book, will lead to greater wealth and prospects for happiness. A longer health span means more time to enjoy the wonders of life, including relationships with family and friends, career building, knowledge seeking, adventure, and exploration.

pages: 350 words: 96,803

Our Posthuman Future: Consequences of the Biotechnology Revolution
by Francis Fukuyama
Published 1 Jan 2002

The Human Genome Project would not have been possible without parallel advances in the information technology required to record, catalog, search, and analyze the billions of bases making up human DNA. The merger of biology and information technology has led to the emergence of a new field, known as bioinformatics.[3] What will be possible in the future will depend heavily on the ability of computers to interpret the mind-boggling amounts of data generated by genomics and proteomics and to build reliable models of phenomena such as protein folding. The simple identification of genes in the genome does not mean that anyone knows what it is they do.

Norton, 1994); Kathryn Brown, “The Human Genome Business Today,” Scientific American 283 (July 2000): 50–55; and Kevin Davies, Cracking the Genome: Inside the Race to Unlock Human DNA (New York: Free Press, 2001). 2 Carol Ezzell, “Beyond the Human Genome,” Scientific American 283, no. 1 (July 2000): 64–69. 3 Ken Howard, “The Bioinformatics Gold Rush,” Scientific American 283, no. 1 (July 2000): 58–63. 4 Interview with Stuart A. Kauffman, “Forget In Vitro—Now It’s ‘In Silico,’” Scientific American 283, no. 1 (July 2000): 62–63. 5 Gina Kolata, “Genetic Defects Detected in Embryos Just Days Old,” The New York Times, September 24, 1992, p.

Coppin. The Politics of Purity: Harvey Washington Wiley and the Origins of Federal Food Policy. Ann Arbor, Mich.: University of Michigan Press, 1999. Hirschi, Travis, and Michael Gottfredson. A General Theory of Crime. Stanford, Calif.: Stanford University Press, 1990. Howard, Ken. “The Bioinformatics Gold Rush.” Scientific American 283, no. 1 (July 2000): 58–63. Hrdy, Sarah B., and Glenn Hausfater. Infanticide: Comparative and Evolutionary Perspectives. New York: Aldine Publishing, 1984. Hubbard, Ruth. The Politics of Women’s Biology. New Brunswick, N.J.: Rutgers University Press, 1990.

Beautiful Data: The Stories Behind Elegant Data Solutions
by Toby Segaran and Jeff Hammerbacher
Published 1 Jul 2009

In the financial services domain, large data stores of past market activity are built to serve as the proving ground for complex new models developed by the Data Scientists of their domain, known as Quants. Outside of industry, I’ve found that grad students in many scientific domains are playing the role of the Data Scientist. One of our hires for the Facebook Data team came from a bioinformatics lab where he was building data pipelines and performing offline data analysis of a similar kind. The well-known Large Hadron Collider at CERN generates reams of data that are collected and pored over by graduate students looking for breakthroughs. Recent books such as Davenport and Harris’s Competing on Analytics (Harvard Business School Press, 2007), Baker’s The Numerati (Houghton Mifflin Harcourt, 2008), and Ayres’s Super Crunchers (Bantam, 2008) have emphasized the critical role of the Data Scientist across industries in enabling an organization to improve over time based on the information it collects.

DNA As a Data Source: To a programming language, DNA is simply a string: char(3*10^9) human_genome; The full genomic information for man consists of 3 billion characters and is easily handled in memory by even the most inefficient home-brewed language. However, the process of determining the exact order of these 3 billion bases requires a significant effort spanning chemistry, bioinformatics, laboratory procedures, and a lot of spinning disks. The Human Genome Project aimed, for the first time, to sequence every one of these characters. A number of large, high-throughput institutes from around the world put academic competition aside and set about a task that would last 13 years and consume billions of dollars.
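A quick back-of-the-envelope check of the "easily handled in memory" remark, using the excerpt's round figure of 3 billion bases. The helper name is hypothetical, and the 2-bit packing is an assumption added for comparison, not something the excerpt describes.

```python
GENOME_BASES = 3_000_000_000  # round figure quoted in the excerpt


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to hold n_bases at the given bits per base, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


as_chars = storage_bytes(GENOME_BASES, 8)  # one 8-bit character per base
packed = storage_bytes(GENOME_BASES, 2)    # four bases packed into each byte

print(f"one char per base: {as_chars / 1e9:.2f} GB")
print(f"2 bits per base:   {packed / 1e9:.2f} GB")
```

Either figure fits in the memory of an ordinary workstation, which is consistent with the excerpt's point that the hard part is producing the sequence, not holding it.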

Although unsuitable for analysis, this data is useful should any run require a manual review to identify imaging problems or artifacts (oil, poor DNA clustering, and even fingerprints aren’t uncommon). Once the sequencing data is available, it is stored in two formats in a high-performance Oracle database. While production systems make good use of databases, bioinformatics tools tend to continue to work against flat files on a physical filesystem. To be sure that we cater to all tastes, the vast swaths of sequence information available in this sequence archive are also presented to Sanger’s internal compute farms via a Fuse user-space filesystem. This approach scales surprisingly well.

pages: 285 words: 78,180

Life at the Speed of Light: From the Double Helix to the Dawn of Digital Life
by J. Craig Venter
Published 16 Oct 2013

Within a computer it would be possible to explore the functions of proteins, protein–protein interactions, protein–DNA interactions, regulation of gene expression, and other features of cellular metabolism. In other words, a virtual cell could provide a new perspective on both the software and hardware of life. In the spring of 1996 Tomita and his students at the Laboratory for Bioinformatics at Keio started investigating the molecular biology of Mycoplasma genitalium (which we had sequenced in 1995) and by the end of that year had established the E-Cell Project. The Japanese team had constructed a model of a hypothetical cell with only 127 genes, which were sufficient for transcription, translation, and energy production.

Currently Novartis and other vaccine companies rely on the World Health Organization to identify and distribute the seed viruses. To speed up the process we are using a method called “reverse vaccinology,” which was first applied to the development of a meningococcal vaccine by Rino Rappuoli, now at Novartis. The basic idea is that the entire pathogenic genome of an influenza virus can be screened using bioinformatic approaches to identify and analyze its genes. Next, particular genes are selected for attributes that would make good vaccine targets, such as outer-membrane proteins. Those proteins then undergo normal testing for immune responses. My team has sequenced genes representing the diversity of influenza viruses that have been encountered since 2005.

pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence
by George Zarkadakis
Published 7 Mar 2016

This dualistic software–hardware paradigm is applied across many fields, including life itself. Cells are the ‘computers’ that run a ‘program’ called the genetic code, or genome. The ‘code’ is written on the DNA. Cutting-edge research in biology does not take place in vitro in a wet lab, but in silico in a computer. Bioinformatics – the accumulation, tagging, storing, manipulation and mining of digital biological data – is the present, and future, of biology research. The computer metaphor for life is reinforced by its apparently successful application to real problems. Many disruptive new technologies in molecular biology – for instance ‘DNA printing’ – function on the basis of digital information.

Norbert Wiener’s cybernetic dream is slowly becoming a reality: the more information we have about systems, the more control we can exercise over them with the help of our computers. Big data are our newfound economic bounty. The big data economy In 2010, I took a contract as External Relations Officer at the European Bioinformatics Institute (EBI) at Hinxton, Cambridge. The Institute is part of the intergovernmental European Molecular Biology Laboratory, and its core mission is to provide an infrastructure for the storage and manipulation of biological data. This is the data that researchers in the life sciences produce every day, including information about the genes of humans and of other species, chemical molecules that might provide the basis for new therapies, proteins, and also about research findings in general.

As someone who facilitated communications between the Institute and potential government funders across Europe, I had first-hand experience of the importance that governments placed on biological data. Almost everyone understood the potential for driving innovation through this data, and was ready to support the expansion of Europe’s bioinformatics infrastructure, even as Europe was going through the Great Recession. The message was simple and clear: whoever owned the data owned the future. Governments and scientists are not the only ones to have jumped on the bandwagon of big data. The advent of social media and Google Search has transformed the marketing operations of almost every business in the world, big and small.

Pearls of Functional Algorithm Design
by Richard Bird
Published 15 Sep 2010

Final remarks The origins of the maximum segment sum problem go back to about 1975, and its history is described in one of Bentley’s (1987) programming pearls. For a derivation using invariant assertions, see Gries (1990); for an algebraic approach, see Bird (1989). The problem refuses to go away, and variations are still an active topic for algorithm designers because of potential applications in data-mining and bioinformatics; see Mu (2008) for recent results. The interest in the non-segment problem is what it tells us about any maximum marking problem in which the marking criterion can be formulated as a regular expression. For instance, it is immediate that there is an O(nk) algorithm for computing the maximum at-least-length-k segment problem because F* T^n F* (n ≥ k) can be recognised by a k-state automaton.
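As a rough illustration of the two problems mentioned in this excerpt, here is a minimal Python sketch. It is not the book's Haskell derivation: the function names `max_segment_sum` and `max_segment_sum_at_least` are hypothetical, and the second function is only one plausible way to realise the O(nk) bound by tracking, for each position, how long the current run of marked elements is (capped at k).

```python
from math import inf
from typing import Optional, Sequence


def max_segment_sum(xs: Sequence[float]) -> float:
    """Largest sum of a contiguous segment; the empty segment counts as 0."""
    best = current = 0
    for x in xs:
        # Best sum of a segment ending at x, or 0 if starting afresh is better.
        current = max(0, current + x)
        best = max(best, current)
    return best


def max_segment_sum_at_least(xs: Sequence[float], k: int) -> Optional[float]:
    """Largest sum of a contiguous segment with at least k elements (k >= 1).

    run[j] holds the best sum of a segment ending at the current element
    with exactly j elements (for j < k) or at least k elements (for j == k).
    Returns None when xs has fewer than k elements.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    run = [0] + [-inf] * k  # run[0] is the empty run, always worth 0
    best = -inf
    for x in xs:
        new = [0] + [-inf] * k
        for j in range(1, k):
            new[j] = run[j - 1] + x  # extend a run of exactly j - 1 elements
        new[k] = max(run[k - 1], run[k]) + x  # reach or stay at length >= k
        run = new
        best = max(best, run[k])
    return None if best == -inf else best
```

For the illustrative input `[-4, -3, -7, 2, 1, -2, -1, -4]`, `max_segment_sum` returns 3 (the segment `2, 1`). The inner loop over `j` is what gives the second function its O(nk) running time.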

The function sorttails is needed as a preliminary step in the Burrows–Wheeler algorithm for data compression, a problem we will take up in the following pearl. The problem of sorting the suffixes of a string has been treated extensively in the literature because it has other applications in string matching and bioinformatics; a good source is Gusfield (1997). This pearl was rewritten a number of times. Initially we started out with the idea of computing perm, a permutation that sorts a list. But perm is too specific in the way it treats duplicates: there is more than one permutation that sorts a list containing duplicate elements.
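To make the suffix-sorting problem concrete, the following is a deliberately naive Python sketch. It is an assumption-laden illustration, not the book's `sorttails`: the names `sort_suffixes` and `rank_suffixes` are hypothetical, and comparing whole suffixes as strings is much slower on long inputs than the specialised algorithms the literature describes.

```python
from typing import List


def sort_suffixes(text: str) -> List[int]:
    """Return the start positions of the suffixes of text in sorted order."""
    # Naive approach: compare the suffixes directly as Python strings.
    return sorted(range(len(text)), key=lambda i: text[i:])


def rank_suffixes(text: str) -> List[int]:
    """ranks[i] is the position of the suffix text[i:] in the sorted order."""
    ranks = [0] * len(text)
    for position, start in enumerate(sort_suffixes(text)):
        ranks[start] = position
    return ranks
```

For example, `sort_suffixes("banana")` gives `[5, 3, 1, 0, 4, 2]`, corresponding to the suffixes `a`, `ana`, `anana`, `banana`, `na`, `nana`.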

Dinosaurs Rediscovered
by Michael J. Benton
Published 14 Sep 2019

These fights might seem inconsequential, but we are considering the fundamentals of how to document the wonders of biodiversity, and we are also addressing origins. Documenting biodiversity and origins is big science now – indeed, it forms part of the modern techniques termed, rather forbiddingly, phylogenomics and bioinformatics. Phylogenomics is the new discipline of establishing evolutionary trees from molecular data. Bioinformatics is the field of managing large data sets in the life sciences and number-crunching those data to produce information on the genetic basis of disease, adaptations, and cell function, and has applications fundamental to medicine and agriculture.

pages: 253 words: 83,473

The Demon in the Machine: How Hidden Webs of Information Are Finally Solving the Mystery of Life
by Paul Davies
Published 31 Jan 2019

MEASURING INFORMATION Before I get to the big picture I need to drill down a bit into the concept of information. We use the word a lot in daily life, in all sorts of contexts, ranging from bus timetables to military intelligence. Millions of people work for information technology companies. The growing field of bioinformatics attracts billions of dollars of funding. The economy of the United States is based in large part on information-based industries, and the field is now so much a part of everyday affairs that we usually simply refer to it as ‘IT’. But this casual familiarity glosses over some deep conceptual issues.

Above all, the speed of computers has increased vastly, while the cost has plummeted. And software improvements have contributed at least as much as hardware embellishments to the success of the product. It took a century following the publication of Darwin’s theory for the informational story of life to enter the evolutionary narrative. The field of bioinformatics is now a vast and sprawling industry, accumulating staggering amounts of data and riding high on hyperbole. The publication in 2003 of the first complete human genome sequence, following a mammoth international effort, was hailed as a game-changer for biology in general and medical science in particular.

Even today, in the absence of foreknowledge, nobody can predict from a genomic sequence what the actual organism might look like, let alone how random changes in the genome sequence would translate into changes in phenotype. Genes make a difference only when they are expressed (that is, switched on), and it is here, in the field of gene control and management, that the real bioinformatics story begins. This emerging subject is known as epigenetics, and it is far richer and more subtle than genetics in isolation. More and more epigenetic factors which drive the organization of biological information patterns and flows are being discovered. The refinement and extension of Darwinism that is now emerging – what I am calling Darwinism 2.0 – is yielding an entirely new perspective on the power of information in biology, ushering in a major revision of the theory of evolution.

ucd-csi-2011-02
by Unknown
Published 1 Mar 2011

Thus the work is more directed at the problem of Wikipedia vandalism than the issue of authoritativeness that is the subject of this paper. 3 Extracting and Comparing Network Motif Profiles The idea of characterizing networks in terms of network motif profiles is well established and has had a considerable impact in bioinformatics [10]. Our objective is to characterize Wikipedia pages in terms of network motif profiles and then examine whether or not different pages have characteristic network motif profiles. The datasets we considered were entries in the English-language Wikipedia on famous sociologists and footballers in the English Premiership (see Table 1).

pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands
by Eric Topol
Published 6 Jan 2015

Indeed, the state of California, which has the largest prenatal screening program in the world, with more than four hundred thousand expectant mothers assessed annually, already provides these tests to all pregnant women who have increased risk. Of course, we could also sequence the fetus’s entire genome instead of just doing the simpler screens. While that is not a commercially available test, and there are substantial bioinformatic challenges that lie ahead before it could be scalable, the anticipatory bioethical issues that this engenders are considerable. We are a long way off for determining what would constitute acceptable genomic criteria for early termination of pregnancy, since this not only relies on accurately determining a key genomic variant linked to a serious illness, but also understanding whether this condition would actually manifest.

Now it is possible to use sequencing to unravel the molecular diagnosis of an unknown condition, and the chances for success are enhanced when there is DNA from the mother and father, or other relatives, to use for anchoring and comparative sequencing analysis. At several centers around the country, the success rate for making the diagnosis ranges between 25 percent and 50 percent. It requires considerable genome bioinformatic expertise, for a trio of individuals will generate around 750 billion data points (six billion letters per sequence, three people, each done forty times to assure accuracy). Of course, just making the diagnosis is not the same as coming up with an effective treatment or a cure. But there have been some striking anecdotal examples of children whose lives were saved or had dramatic improvement.
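The figure quoted in this excerpt can be checked with a one-line calculation. Taking the three factors exactly as stated (six billion letters, three people, forty passes each), the product comes to roughly 720 billion, which is consistent with the rounded "around 750 billion":

```latex
\[
  \underbrace{6 \times 10^{9}}_{\text{letters per sequence}}
  \times \underbrace{3}_{\text{people}}
  \times \underbrace{40}_{\text{passes}}
  = 7.2 \times 10^{11} \approx 720 \text{ billion data points}
\]
```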

The most far-reaching component of the molecular stethoscope appears to be cell-free RNA, which can potentially be used to monitor any organ of the body. Previously that was unthinkable in a healthy person. How could one possibly conceive of doing a brain or liver biopsy in someone as part of a normal checkup? Using high-throughput sequencing of cell-free RNA in the blood, and sophisticated bioinformatic methods to analyze this data, Stephen Quake and his colleagues at Stanford were able to show it is possible to follow the gene expression from each of the body’s organs from a simple blood sample. And that is changing all the time in each of us. This is an ideal case for deep learning to determine what these dynamic genomic signatures mean, to determine what can be done to change the natural history of a disease in the making, and to develop the path for prevention.

pages: 834 words: 180,700

The Architecture of Open Source Applications
by Amy Brown and Greg Wilson
Published 24 May 2011

Amy Brown (editorial): Amy has a bachelor's degree in Mathematics from the University of Waterloo, and worked in the software industry for ten years. She now writes and edits books, sometimes about software. She lives in Toronto and has two children and a very old cat. C. Titus Brown (Continuous Integration): Titus has worked in evolutionary modeling, physical meteorology, developmental biology, genomics, and bioinformatics. He is now an Assistant Professor at Michigan State University, where he has expanded his interests into several new areas, including reproducibility and maintainability of scientific software. He is also a member of the Python Software Foundation, and blogs at http://ivory.idyll.org. Roy Bryant (Snowflock): In 20 years as a software architect and CTO, Roy designed systems including Electronics Workbench (now National Instruments' Multisim) and the Linkwalker Data Pipeline, which won Microsoft's worldwide Winning Customer Award for High-Performance Computing in 2006.

Rosangela Canino-Koning (Continuous Integration): After 13 years of slogging in the software industry trenches, Rosangela returned to university to pursue a Ph.D. in Computer Science and Evolutionary Biology at Michigan State University. In her copious spare time, she likes to read, hike, travel, and hack on open source bioinformatics software. She blogs at http://www.voidptr.net. Francesco Cesarini (Riak): Francesco Cesarini has used Erlang on a daily basis since 1995, having worked in various turnkey projects at Ericsson, including the OTP R1 release. He is the founder of Erlang Solutions and co-author of O'Reilly's Erlang Programming.

Returning to distributed systems and HDFS, Rob found many familiar problems, but all of the numbers had two or three more zeros. James Crook (Audacity): James is a contract software developer based in Dublin, Ireland. Currently he is working on tools for electronics design, though in a previous life he developed bioinformatics software. He has many audacious plans for Audacity, and he hopes some, at least, will see the light of day. Chris Davis (Graphite): Chris is a software consultant and Google engineer who has been designing and building scalable monitoring and automation tools for over 12 years. Chris originally wrote Graphite in 2006 and has led the open source project ever since.

pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science
by Michael Nielsen
Published 2 Oct 2011

An overview of work on the Allen Brain Atlas may be found in Jonah Lehrer’s excellent article [120]. Most of the facts I relate are from that article. The paper announcing the atlas of gene expression in the mouse brain is [121]. Overviews of some of the progress and challenges in mapping the human connectome may be found in [119] and [125]. p 108: Bioinformatics and cheminformatics are now well-established fields, with a significant literature, and I won’t attempt to single out any particular reference for special mention. Astroinformatics has emerged more recently. See especially [24] for a manifesto on the need for astroinformatics. p 113: A report on the 2005 Playchess.com freestyle chess tournament may be found at [37], with follow-up commentary on the winners at [39].

See architecture of attention; restructuring expert attention augmented reality, 41, 87 autism-vaccine controversy, 156 Avatar (film), 34 Axelrod, Robert, 219 Baker, David, 146 basic research: economic scale of, 203 secrecy in, 87, 184–86 Bayh-Dole Act, 184–85 Benkler, Yochai, 218, 224 Bennett, John Caister, 149 Berges, Aida, 155 Bermuda Agreement, 7, 108, 190, 192, 222 Berners-Lee, Tim, 218 bioinformatics, 108 biology: data-driven intelligence in, 116–19 data web for, 121–22 open source, 48. See also genetics birdwatchers, 150 black holes, orbiting pair of, 96, 100–101, 103, 112, 114 Blair, Tony, 7, 156 Block, Peter, 218 blogs: architecture of attention and, 42, 56 as basis of Polymath Project, 1–2, 42 invention of, 20 in quantum computing, 187 rumors on, 201–2 scientific, 6, 165–69, 203–4 Borgman, Christine, 218 Boroson, Todd, 100–101, 103, 114 Borucki, William, 201 botany, 107 Brahe, Tycho, 104 brain atlases, 106, 108 British Chiropractic Association, 165–66 Brown, Zacary, 23–24, 27, 35, 41, 223 Burkina Faso, open architecture project in, 46–48 Bush, Vannevar, 217, 218 business: data-driven intelligence for, 112 data sharing methods in, 120.

See also amplifying collective intelligence Colwell, Robert, 218 combinatorial line, 211 comet hunters, 148–49 comment sites: successful examples of, 234 user-contributed, 179–81 commercialization of science, 87, 184–86 Company of Strangers, The (Seabright), 37 comparative advantage: architecture of attention and, 32, 33, 43, 56 examples from the sciences, 82, 83, 84, 85 for InnoCentive Challenges, 24, 43 modularity and, 56 technical meaning of, 223 competition: data sharing and, 103–4 as obstacle to collaboration, 86 in protein structure prediction, 147–48 for scientific jobs, 8, 9, 178, 186 Complexity Zoo, 233 computer code: in bioinformatics, 108 centralized development of new tools, 236 citation of, 196, 204–5 for complex experiments, 203 information commons in, 57–59 sharing, 87, 183, 193, 204–5. See also Firefox; Linux; MathWorks competition; open source software computer games: addictive quality of, 146, 147 for folding proteins (see Foldit) connectome, human, 106, 121 conversation, offline small-group, 39–43 conversational critical mass, 30, 31, 33, 42 Cornell University Laboratory of Ornithology, 150 Cox, Alan, 57 Creative Commons, 219, 220 creative problem solving, 24, 30, 34, 35, 36, 38.

The New Harvest: Agricultural Innovation in Africa
by Calestous Juma
Published 27 May 2017

These include rice, corn, mosquito, chicken, cattle, and dozens of plant, animal, and human pathogens. The challenge facing Africa is building capacity in bioinformatics to understand the location and functions of genes. It is through the annotation of genomes that scientists can understand the role of genes and their potential contributions to agriculture, medicine, environmental management, and other fields. Bioinformatics could do for Africa what computer software did for India. The field would also give African science a new purpose and help to integrate the region into the global knowledge ecology.

See African Union Australia, 63–64, 67, 131 Awuah, Patrick, 241 Babban Gona agricultural franchise (Nigeria), 214–16 “Back Home” projects (Uganda Rural Development and Training Program), 153–54 bananas: diseases affecting, 70–71; EARTH University production of, 171; “Golden Banana” variety and, 72–73; transgenic varieties of, 66, 70–73 Bangladesh, 71–72, 75, 202 Bangladesh Agricultural Research Institute, 71–72 banks: agricultural sector financing and, 5–6, 93–94, 100–101, 107, 143, 185; clusters and, 107; educational partnerships and, 176; infrastructure and, 143; state-owned, 107; technology and, 49, 52 Banque Régionale de Solidarité (BRS), 100–101 beans: entrepreneurship and, 164; infrastructure and, 120, 122; innovation and, 92–93 Benin: educational videos on agriculture in, 202–3; gender inequality in, 149; rice cluster in, 99–102; solar-powered irrigation in, 129 Bhoomi Project, 52 biodiversity, 73, 77–78, 255–56, 259 bioinformatics, 82 biopolymers, 39, 56–58 biosafety, 79–80, 82 biotechnology: African Panel on Modern Biotechnology and, 251; benefits of, 68–76; biodiversity, 73, 77; debates regarding safety of, 76–80, 82; food security and, 64; frontiers of, 61–63; genomes and, xxi, 23, 62, 81–82; GM crops and, xxi, 62, 249; incomes and, 68, 79; innovation and, xviii, 23, 41, 63–70, 190, 239, 242–43, 251; land-saving aspects of, 74; “leapfrogging” and, 64–65, 68; regulation and, xxi, 61, 63, 72, 76–81; research and, 87, 111, 190; transgenic crops and, 62–81; trends in, 63–67 Black Sigatoka fungus, 71 Blue Skies Agro-processing Company, Ltd., 197 Boston (Massachusetts), 243 Brazil: Agricultural Research Corporation in, 30, 113–14; drought-resistant crops in, 74; entrepreneurship and education in, 165–66; flash drying in, 90; fruit exports from, 197; infrastructure in, 114; innovation and, 113–14; National System for Agriculture Research and Innovation (SNPA) in, 114; technology and, 242–44 Brazilian Agricultural Research Corporation (EMBRAPA), 30, 113–14, 243 Brazilian Development Cooperation Agency, 245 breadfruit, 211–13 Breadfruit Institute, 213–14 brinjal crops, 71–72 BRS (Banque Régionale de Solidarité), 100–101 BSS-Société Industrielle pour la Production du Riz (BSS-SIPRi), 100–101 Burkina Faso: aquaculture in, 24; CAADP and, 27–28; cereal cultivation in, 36; service sector in, 22; transgenic crops in, 65, 71 Burundi, 174, 205 businesses.

pages: 571 words: 105,054

Advances in Financial Machine Learning
by Marcos Lopez de Prado
Published 2 Feb 2018

Geurts (2013): “Understanding variable importances in forests of randomized trees.” Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 431–439. Strobl, C., A. Boulesteix, A. Zeileis, and T. Hothorn (2007): “Bias in random forest variable importance measures: Illustrations, sources and a solution.” BMC Bioinformatics, Vol. 8, No. 25, pp. 1–11. White, A. and W. Liu (1994): “Technical note: Bias in information-based measures in decision tree induction.” Machine Learning, Vol. 15, No. 3, pp. 321–329. Note 1 http://blog.datadive.net/selecting-good-features-part-iii-random-forests/. CHAPTER 9 Hyper-Parameter Tuning with Cross-Validation 9.1 Motivation Hyper-parameter tuning is an essential step in fitting an ML algorithm.

Beyond the basic library for organizing user data into files, the HDF Group also provides a suite of tools and specializations of HDF5 for different applications. For example, HDF5 includes a performance profiling tool. NASA has a specialization of HDF5, named HDF5-EOS, for data from their Earth-Observing System (EOS); and the next-generation DNA sequence community has produced a specialization named BioHDF for their bioinformatics data. HDF5 provides an efficient way to access the storage systems on HPC platforms. In tests, we have demonstrated that using HDF5 to store stock market data significantly speeds up the analysis operations. This is largely due to its efficient compression/decompression algorithms that minimize network traffic and I/O operations, which brings us to our next point. 22.5.3 In Situ Processing Over the last few decades, CPU performance has roughly doubled every 18 months (Moore's law), while disk performance has been increasing less than 5% a year.

In economics, the same data-driven research activities have led to the wildly popular behavioral economics (Camerer and Loewenstein [2011]). Many of the recent advances in data-driven research are based on machine learning applications (Qiu et al. [2016], Rudin and Wagstaff [2014]). Their successes in a wide variety of fields, such as planetary science and bioinformatics, have generated considerable interest among researchers from diverse domains. In the rest of this section, we describe a few examples applying advanced data analysis techniques to various fields, where many of these use cases originated in the CIFT project. 22.6.1 Supernova Hunting In astronomy, the determination of many important facts, such as the expansion speed of the universe, is performed by measuring the light from exploding type Ia supernovae (Bloom et al. [2012]).

pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100
by Michio Kaku
Published 15 Mar 2011

I imagine in the near future, many people will have the same strange feeling I did, holding the blueprint of their bodies in their hands and reading the intimate secrets, including dangerous diseases, lurking in the genome and the ancient migration patterns of their ancestors. But for scientists, this is opening an entirely new branch of science, called bioinformatics, or using computers to rapidly scan and analyze the genome of thousands of organisms. For example, by inserting the genomes of several hundred individuals suffering from a certain disease into a computer, one might be able to calculate the precise location of the damaged DNA. In fact, some of the world’s most powerful computers are involved in bioinformatics, analyzing millions of genes found in plants and animals for certain key genes. This could even revolutionize TV detective shows like CSI.

See Robotics/AI Artificial vision Artsutanov, Yuri ASIMO robot, 2.1, 2.2, 2.3 Asimov, Isaac, 2.1, 6.1, 8.1 ASPM gene Asteroid landing Atala, Anthony Atomic force microscope Augmented reality Augustine Commission report, 6.1, 6.2 Avatar (movie), 1.1, 2.1, 6.1, 7.1 Avatars Backscatter X-rays Back to the Future movies, 5.1, 5.2 Badylak, Stephen Baldwin, David E. Baltimore, David, 1.1, 3.1, 3.2, 3.3 Benford, Gregory Big bang research Binnig, Gerd Bioinformatics Biotechnology. See Medicine/biotechnology Birbaumer, Niels Birth control Bismarck, Otto von Blade Runner (movie) Blue Gene computer Blümich, Bernhard, 1.1, 1.2 Boeing Corporation Booster-rocket technologies Bova, Ben, 5.1, 5.2 Boys from Brazil, The (movie) Brain artificial body parts, adaptation to basic structure of emotions and growing a human brain Internet contact lenses and locating every neuron in as neural network parallel processing in reverse engineering of simulations of “Brain drain” to the United States BrainGate device Brain injuries, treatment for Branson, Richard Brave New World (Huxley) Breast cancer Breazeal, Cynthia Brenner, Sydney Brooks, Rodney, 2.1, 2.2, 4.1 Brown, Dan Brown, Lester Buckley, William F.

See also Intellectual capitalism Carbon nanotubes, 4.1, 6.1 Carbon sequestration Cars driverless electric maglev, 5.1, 9.1 Cascio, Jamais Catoms Cave Man Principle biotechnology and computer animations and predicting the future and replicators and, 4.1, 4.2 robotics/AI and, 2.1, 2.2 sports and Cerf, Vint, 4.1, 6.1 Chalmers, David Charles, Prince of Wales Chemotherapy Chernobyl nuclear accident Chevy Volt Chinese Empire, 7.1, 7.2 Church, George Churchill, Winston, itr.1, 8.1 Cipriani, Christian Civilizations alien civilizations characteristics of various Types entropy and information processing and resistance to Type I civilization rise and fall of great empires rise of civilization on Earth science and wisdom, importance of transition from Type 0 to Type I, itr.1, 8.1, 8.2 Type II civilizations, 8.1, 8.2, 8.3 Type III civilizations, 8.1, 8.2 waste heat and Clarke, Arthur C. Clausewitz, Carl von Cloning, 3.1, 3.2 Cloud computing, 1.1, 7.1 Cochlear implants Code breaking Collins, Francis Comets Common sense, 2.1, 2.2, 2.3, 7.1, 7.2 Computers animations created by augmented reality bioinformatics brain simulations carbon nanotubes and cloud computing, 1.1, 7.1 digital divide DNA computers driverless cars exponential growth of computer power (Moore’s law), 1.1, 1.2, 1.3, 4.1 fairy tale life and far future (2070) four stages of technology and Internet glasses and contact lenses, 1.1, 1.2 medicine and midcentury (2030) mind control of molecular and atomic transistors nanotechnology and near future (present to 2030) optical computers parallel processing physics of computer revolution quantum computers quantum dot computers quantum theory and, 1.1, 4.1, 4.2, 4.3 scrap computers self-assembly and silicon chips, limitations of, 1.1, 1.2, 4.1 telekinesis with 3-D technology universal translators virtual reality wall screens See also Mind reading; Robotics/AI Condorcet, Marquis de Conscious robots, 2.1, 2.2 Constellation Program COROT satellite, 6.1, 8.1 Crick, Francis Criminology Crutzen, Paul Culture in Type I civilization Customization of products Cybertourism, itr.1, itr.2 CYC project Damasio, Antonio Dating in 2100, 9.1, 9.2, 9.3, 9.4 Davies, Stephen Da Vinci robotic system Dawkins, Richard, 3.1, 3.2, 3.3 Dawn computer Dean, Thomas Decoherence problem Deep Blue computer, 2.1, 2.2, 2.3 Delayed gratification DEMO fusion reactor Depression treatments Designer children, 3.1, 3.2, 3.3 Developing nations, 7.1, 7.2 Diamandis, Peter Dictatorships Digital divide Dinosaur resurrection Disease, elimination of, 3.1, 8.1 DNA chips DNA computers Dog breeds Donoghue, John, 1.1, 1.2 Dreams, photographing of Drexler, Eric Driverless cars Duell, Charles H.

pages: 560 words: 158,238

Fifty Degrees Below
by Kim Stanley Robinson
Published 25 Oct 2005

And then Francesca Taolini, who had arranged for Yann’s hire by a company she consulted for, in the same way Frank had hoped to. Did she suspect that Frank had been after Yann? Did she know how powerful Yann’s algorithm might be? He googled her. Turned out, among many interesting things, that she was helping to chair a conference at MIT coming soon, on bioinformatics and the environment. Just the kind of event Frank might attend. NSF even had a group going already, he saw, to talk about the new federal institutes. Meet with her first, then go to Atlanta to meet with Yann—would that make his stock in the virtual market rise, triggering more intense surveillance?

So at work Anna spent her time trying to concentrate, over a persistent underlying turmoil of worry about her younger son. Work was absorbing, as always, and there was more to do than there was time to do it in, as always. And so it provided its partial refuge. But it was harder to dive in, harder to stay under the surface in the deep sea of bioinformatics. Even the content of the work reminded her, on some subliminal level, that health was a state of dynamic balance almost inconceivably complex, a matter of juggling a thousand balls while unicycling on a tightrope over the abyss—in a gale—at night—such that any life was an astonishing miracle, brief and tenuous.

Take a problem, break it down into parts (analyze), quantify whatever parts you could, see if what you learned suggested anything about causes and effects; then see if this suggested anything about long-term plans, and tangible things to do. She did not believe in revolution of any kind, and only trusted the mass application of the scientific method to get any real-world results. “One step at a time,” she would say to her team in bioinformatics, or Nick’s math group at school, or the National Science Board; and she hoped that as long as chaos did not erupt worldwide, one step at a time would eventually get them to some tolerable state. Of course there were all the hysterical operatics of “history” to distract people from this method and its incremental successes.

pages: 623 words: 448,848

Food Allergy: Adverse Reactions to Foods and Food Additives
by Dean D. Metcalfe
Published 15 Dec 2008

In silico methods for evaluating human allergenicity to novel proteins. Bioinformatics Workshop Meeting Report, February 23–24, 2005. Toxicol Sci 2005;88:307–10. 74 Ladics GS, Bannon GA, Silvanovich A, Cressman, RF. Comparison of conventional FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of potential identities to known allergens. Mol Nutr Food Res 2007;51:985–998. 75 Bannon G, Ogawa T. Evaluation of available IgE-binding epitope data and its utility in bioinformatics. Mol Nutr Food Res 2006;50:638–44. 76 Hileman RE, Silvanovich A, Goodman RE, et al. Bioinformatic methods for allergenicity assessment using a comprehensive allergen database.

Food allergen protein families Based on their shared amino acid sequences and conserved three-dimensional structures, proteins can be classified into families using various bioinformatics tools which form the basis of several protein family databases, one of which is Pfam [8]. Over the past 10 years or so there has been an explosion in the numbers of well characterized allergens, which have been sequenced and are being collected into a number of databases to facilitate bioinformatic analysis [9]. We have undertaken this analysis for both plant [1] and animal food allergens [10] along with pollen allergens [2]. They show similar distributions with the majority of allergens in each group falling into just 3–12 families with a tail of between 14 and 23 families comprising between 1 and 3 allergens each.

However, Aalberse [72] has noted that proteins sharing less than 50% identity across the full length of the protein sequence are unlikely to be cross-reactive, and immunological cross-reactivity may not occur unless the proteins share at least 70% identity. Recent published work has led to the harmonization of the methods used for bioinformatic searches and a better understanding of the data generated [73,74] from such studies. An additional bioinformatics approach can be taken by searching for 100% identity matches along short sequences contained in the query sequence as they are compared to sequences in a database. These regions of short amino acid sequence homologies are intended to represent the smallest sequence that could function as an IgE-binding epitope [75].
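The short-sequence search described in this excerpt can be illustrated with a minimal Python sketch. It looks for contiguous stretches that are 100% identical between a query protein and a database sequence. The function name `shared_kmers`, the default window length of 8 residues, and the two toy sequences are all assumptions made for illustration; the excerpt does not state a window length, and this is not the procedure used by any particular allergen database.

```python
def shared_kmers(query, subject, k=8):
    """Find contiguous k-residue stretches that match exactly (100% identity).

    Returns a list of (kmer, query_position, subject_position) tuples,
    using 0-based positions. The default k=8 is an illustrative assumption.
    """
    # Index every k-mer of the subject sequence by its start positions.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    # Slide a window of length k along the query and look each window up.
    hits = []
    for i in range(len(query) - k + 1):
        window = query[i:i + k]
        for j in index.get(window, []):
            hits.append((window, i, j))
    return hits


# Toy sequences, invented for this example (not real allergens).
query_seq = "MKTLLVAGGSAPRQW"
known_seq = "AAGGSAPRQWTT"

for kmer, q_pos, s_pos in shared_kmers(query_seq, known_seq):
    print(kmer, q_pos, s_pos)
```

Indexing the database sequence first keeps each window lookup to a single dictionary access, which matters once the query is compared against many sequences. A hit only flags a region for closer inspection; it does not by itself establish cross-reactivity.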

pages: 218 words: 62,621

A Short History of Humanity: How Migration Made Us Who We Are
by Johannes Krause and Thomas Trappe
Published 8 Apr 2021

(The Pentagon wanted to be better prepared for biological warfare.) More than a hundred contenders entered the competition, but only three made it to the finals. The winning three-person team was announced in autumn 2013, and it included one of my colleagues at the University of Tübingen, Daniel Huson, a specialist in bioinformatics. Huson later worked with our institute to develop a related algorithm that could match a billion DNA sequences to their organism of origin within twenty-four hours. The program indicates how much of a skeleton’s DNA is human and how much derives from microbes, bacteria, or viruses—and, crucially, which ones—and it is 200 times faster than older algorithms; instead of waiting nearly a year for results, you wait one day.

Heartfelt thanks go to Mark Achtmann, Kurt Alt, Natasha Arora, Hervé Bocherens, Jane Buikstra, Alexandra Buzhilova, David Caramelli, Stewart Cole, Nicholas Conard, Isabelle Crevecoeur, Dominique Delsate, Dorothée Drucker, Mateja Hajdinjak, Fredrik Halgren, Svend Hansen, Michaela Harbeck, Katerina Harvati, Jean-Jacques Hublin, Daniel Huson, Corina Knipper, Kristian Kristiansen, Carles Lalueza-Fox, Iosif Lazaridis, Mark Lipson, Sandra Lösch, Frank Maixner, Iain Mathieson, Michael McCormick, Kay Nieselt, Inigo Olalde, Ludovic Orlando, Ernst Pernicka, Sabine Reinhold, Roberto Risch, Hélèn Rougier, Patrick Semal, Pontus Skoglund, Viviane Slon, Anne Stone, Jiri Svoboda, Frédérique Valentin, Joachim Wahl, Albert Zink, and many other colleagues from the fields of archaeology, anthropology, bioinformatics, genetics, and medicine. Without them, we would never have been able to reconstruct all these stories from Europe’s past. Johannes Krause would also like to thank his colleagues and staff at the University of Tübingen, the Max Planck Institute for the Science of Human History in Jena, and the Max Planck Institute for Evolutionary Anthropology in Leipzig.

pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom
by Yochai Benkler
Published 14 May 2006

As more of the process of drug discovery of potential leads can be done by modeling and computational analysis, more can be organized for peer production. The relevant model here is open bioinformatics. Bioinformatics generally is the practice of pursuing solutions to biological questions using mathematics and information technology. Open bioinformatics is a movement within bioinformatics aimed at developing the tools in an open-source model, and in providing access to the tools and the outputs on a free and open basis. Projects like these include the Ensembl Genome Browser, operated by the European Bioinformatics Institute and the Sanger Centre, or the National Center for Biotechnology Information (NCBI), both of which use computer databases to provide access to data and to run various searches on combinations, patterns, and so forth, in the data.

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again
by Eric Topol
Published 1 Jan 2019

In many leading medical schools throughout the country, there’s an “arms race” for Adam 1s and academic achievement, as Jonathan Stock at Yale University School of Medicine aptly points out.61 We need to be nurturing the Adam 2s, which is something that is all too often an area of neglect in medical education. There are many other critical elements that need to be part of the medical school curriculum. Future doctors need a far better understanding of data science, including bioinformatics, biocomputing, probabilistic thinking, and the guts of deep learning neural networks. Much of their efforts in patient care will be supported by algorithms, and they need to understand all the liabilities, to recognize bias, errors, false output, and dissociation from common sense. Likewise, the importance of putting the patient’s values and preferences first in any human-machine collaboration cannot be emphasized enough.

JAMA, 2017. 318(22): pp. 2199–2210. 52. Golden, J. A., “Deep Learning Algorithms for Detection of Lymph Node Metastases from Breast Cancer: Helping Artificial Intelligence Be Seen.” JAMA, 2017. 318(22): pp. 2184–2186. 53. Yang, S. J., et al., “Assessing Microscope Image Focus Quality with Deep Learning.” BMC Bioinformatics, 2018. 19(1): p. 77. 54. Wang et al., Deep Learning for Identifying Metastatic Breast Cancer. 55. Wong, D., and S. Yip, “Machine Learning Classifies Cancer.” Nature, 2018. 555(7697): pp. 446–447; Capper, D., et al., “DNA Methylation-Based Classification of Central Nervous System Tumours.” Nature, 2018. 555(7697): pp. 469–474. 56.

Nitta, N., et al., “Intelligent Image-Activated Cell Sorting.” Cell, 2018. 175(1): pp. 266–276 e13. 72. Weigert, M., et al., Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy, bioRxiv. 2017; Yang, S. J., et al., “Assessing Microscope Image Focus Quality with Deep Learning.” BMC Bioinformatics, 2018. 19(1): p. 77. 73. Ouyang, W., et al., “Deep Learning Massively Accelerates Super-Resolution Localization Microscopy.” Nat Biotechnol, 2018. 36(5): pp. 460–468. 74. Stumpe, M., “An Augmented Reality Microscope for Realtime Automated Detection of Cancer,” Google AI Blog. 2018. 75. Wise, J., “These Robots Are Learning to Conduct Their Own Science Experiments,” Bloomberg. 2018. 76.

The Data Journalism Handbook
by Jonathan Gray , Lucy Chambers and Liliana Bounegru
Published 9 May 2012

Election financing (Helsingin Sanomat) 2. Brainstorm for ideas The participants of HS Open 2 came up with twenty different prototypes about what to do with the data. You can find all the prototypes on our website (text in Finnish). A bioinformatics researcher called Janne Peltola noted that campaign funding data looked like the gene data they research, in terms of containing many interdependencies. In bioinformatics there is an open source tool called Cytoscape that is used to map these interdependencies. So we ran the data through Cytoscape, and got a very interesting prototype. 3. Implement the idea on paper and on the Web The law on campaign funding states that elected members of parliament must declare their funding two months after the elections.

Exploring Everyday Things with R and Ruby
by Sau Sheong Chang
Published 27 Jun 2012

CRAN is hosted by the R Foundation (the same organization that is developing R) and contains 3,646 packages as of this writing. CRAN is also mirrored in many sites worldwide. Another public repository is Bioconductor (http://www.bioconductor.org), an open source project that provides tools for bioinformatics and is primarily R-based. While the packages in Bioconductor are focused on bioinformatics, it doesn’t mean that they can’t be used for other domains. As of this writing, there are 516 packages in Bioconductor. Finally, there is R-Forge (http://r-forge.r-project.org), a collaborative software development application for R. It is based on FusionForge, a fork from GForge (on which RubyForge was based), which in turn was forked from the original software that was used to build SourceForge.

pages: 295 words: 66,912

Walled Culture: How Big Content Uses Technology and the Law to Lock Down Culture and Keep Creators Poor
by Glyn Moody
Published 26 Sep 2022

In 1997, Wired magazine published his in-depth feature about Linux and Linus Torvalds, the first mainstream article to describe the then-new world of free software. Moody’s full-length book on the topic, Rebel Code: Linux and the Open Source Revolution, appeared in 2001. His book Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business, about the new field of bioinformatics, was published in 2004. In addition, Moody has written nearly 2,000 posts for Techdirt, and over 400 articles for Ars Technica. More recently, his writing has focussed on digital rights and privacy. Numerous posts about copyright, another area of particular interest, have appeared on the Copybuzz and Walled Culture blogs.

Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data
by Leslie Sikos
Published 10 Jul 2015

TopQuadrant (2015) TopBraid Composer Maestro Edition. www.topquadrant.com/tools/ide-topbraid-composer-maestro-edition/. Accessed 31 March 2015. 13. The Apache Software Foundation (2015) Apache Stanbol. http://stanbol.apache.org. Accessed 31 March 2015. 14. Fluent Editor. www.cognitum.eu/semantics/FluentEditor/. Accessed 15 April 2015. 15. The European Bioinformatics Institute (2015) ZOOMA. www.ebi.ac.uk/fgpt/zooma/. Accessed 31 March 2015. 16. Harispe, S. (2014) Semantic Measures Library & ToolKit. www.semantic-measures-library.org. Accessed 29 March 2015. 17. Motik, B., Shearer, R., Glimm, B., Stoilos, G., Horrocks, I. (2013) HermiT OWL Reasoner. http://hermit-reasoner.com.

The Toolkit features an AML text editor and a visual editor, an AML validator, and provides mapping and testing view for AML. Semantic Automated Discovery and Integration (SADI) Semantic Automated Discovery and Integration (SADI) is a lightweight set of Semantic Web Service design patterns (https://code.google.com/p/sadi/). It was primarily designed for scientific service publication and is especially useful in bioinformatics. Powered by web standards, SADI implements Semantic Web technologies to consume and produce RDF instances of OWL-DL classes, where input and output class URIs resolve to an OWL document through HTTP GET. SADI supports RDF/XML and Notation3 serializations. The SADI design patterns provide automatic discovery of appropriate services, based on user needs, and can automatically chain these services into complex analytical workflows.

pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It)
by Salim Ismail and Yuri van Geest
Published 17 Oct 2014

Third, once that doubling pattern starts, it doesn’t stop. We use current computers to design faster computers, which then build faster computers, and so on. Finally, several key technologies today are now information-enabled and following the same trajectory. Those technologies include artificial intelligence (AI), robotics, biotech and bioinformatics, medicine, neuroscience, data science, 3D printing, nanotechnology and even aspects of energy. Never in human history have we seen so many technologies moving at such a pace. And now that we are information-enabling everything around us, the effects of Kurzweil’s Law of Accelerating Returns are sure to be profound.

What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.” So, if experts are suspect, where should we turn instead? As we’ve already noted, everything is measurable. And the newest profession making those measurements is the data scientist.

pages: 608 words: 150,324

Life's Greatest Secret: The Race to Crack the Genetic Code
by Matthew Cobb
Published 6 Jul 2015

Often the only basis for identifying the function of a gene is because its DNA sequence is similar to a gene in a different organism where a function has been demonstrated. This has led to a new discipline called genomics, which involves obtaining genomes and understanding their nature and evolution. It includes a new set of techniques, collectively called bioinformatics, which combine computing and population genetics to make inferences about the patterns of evolution and enable us to determine which genes have a common origin or function. Training biologists in the techniques of computer science will be an important part of twenty-first-century scientific education.

A. et al., ‘Exonic transcription factor binding directs codon choice and affects protein evolution’, Science, vol. 342, 2013, pp. 1367–72. Stern, K. G., ‘Nucleoproteins and gene structure’, Yale Journal of Biology and Medicine, vol. 19, 1947, pp. 937–49. Stevens, H., Life Out of Sequence: A Data-Driven History of Bioinformatics, London, University of Chicago Press, 2013. Strasser, B. J., ‘A world in one dimension: Linus Pauling, Francis Crick and the Central Dogma of molecular biology’, History and Philosophy of the Life Sciences, vol. 28, 2006, pp. 491–512. Stretton, A. O. W., ‘The first sequence: Fred Sanger and insulin’, Genetics, vol. 162, 2002, pp. 527–32.

awards 38, 50 Francis Crick on 132, 136, 216 health 38 on nucleic acids as the transforming principle 43–53 reactions to his ideas 55–9, 62–4, 68–70 transformation in pneumococci 34–41 Avery, Roy (brother of Oswald) 44–5, 59, 63 B Bacillus thuringiensis 270 bacteria based on ‘synthetic’ DNA 267 capsule formation and virulence 36–7 DNA sequences online 235 enzymatic adaptation 152 generality of transformation in 59 negative feedback in biosynthesis by 153–5 sexual reproduction 51 transformation in E. coli 51–2, 56, 61, 63 transformation in pneumococci 36–9, 63 bacteriophages see phages Bakewell, Robert 1–2 Baltimore, David 251–2 Bar-Hillel, Yehoshuua 144 Barnett, Leslie 193 base pairing complementary base pairing 106, 109 frequency in different genomes 295 κ and Π base pairs 278 spontaneous 102 unnatural base pairs 277–8, 285 Z and P base pairs 278 base sequence as the genetic code 111 relation to amino acid sequence 117, 124–6, 133 variability 54, 62, 70 bases, DNA hydrogen bonding between 58, 92, 101, 106 ratio of pyrimidines to purines 42, 91, 102, 106, 109 sequence variation and specificity 57–8 bases, nucleic acid defined 316 investigations of DNA and RNA 198 orientation 42 proportions within and between species 62, 90 tetranucleotide hypothesis 7, 42, 51, 54, 62, 90 see also purines; pyrimidines Bateson, Gregory 22 Baulcombe, David 259 Beadle, George at Chemical Basis of Heredity symposium 132 comments on Benzer’s work 162 Nobel Prize 215 one-gene-one-enzyme hypothesis 9–11, 204, 243–4 at the Washington Physics conference 33 behaviour, genetic effects 304–5 Beighton, Elwyn 102 Beljanski, Mirko 189–90 Bell, Florence 91, 93, 104 Benner, Steven 277–8 Benzer, Seymour 161–3, 165, 187n, 203, 215, 302 Berg, Paul 279, 281, 285 Bergmann, Max 46 β-galactosidase 152–3, 156, 158, 160, 165 ‘Big Science’ 311–12 Bigelow, Julian 22–4, 27 Biochemical and Biophysical Research Communications 180 bioinformatics 238 The Biological Replication of Macromolecules symposium 130 
‘Biological units endowed with genetic continuity’ meeting 53, 59–60 biosecurity 280–1, 285 biotechnology DNA fingerprinting as 231 fermentation as 268 genetically modified organisms 269–71, 284 regulation of 284–5 synthetic biology 277 Birney, Ewan 242, 247, 271 bits (binary digits) 27, 78 Blair, Tony 233 ‘blender experiments’ 68 Boivin, André on DNA leading to RNA 71, 140, 214 Mirsky and 56–7, 59 transformation in E. coli 51–2, 56 on varying DNA quantities 60–1 Botstein, David 231 Boveri, Theodor 3 Brachet, Jean 58, 71–2, 116 Bragg, Sir Laurence 94–5, 100, 105, 108 BRCA1 gene 234 Brenner, Sydney adaptor hypothesis 121, 135, 209 on cell-free systems 182 on the coding problem 172 coinage of ‘codon’ 203 collaboration with Crick 121, 125, 165–6, 189, 192–3 developmental biology interest 216 disproves overlapping code idea 123–4, 200 messenger RNA idea 165–7, 172, 178, 182, 190 Nobel Prize 215 nonsense codons 213 on using polynucleotides 189 work with viruses 174, 192, 200, 213 Bridges, Calvin 4 Brillouin, Léon 76, 202 Britten, Roy 243 Brookhaven Laboratory 174 BSE (bovine spongiform encephalopathy) 253–4 Burnet, Macfarlane Enzyme, Antigen and Virus: A Study of Macromolecular Pattern in Action 134–5, 139, 141, 146–7 on information flows 139–41, 146–7 meeting with Avery 34–5 on non-coding DNA 141, 222 Bush, Vannevar 20–1, 26 C ‘C-value paradox’ 246 caddis-fly 175 Caenorhabditis elegans 231–2, 258, 277 Cairns, John 218 Caldwell, P.

pages: 584 words: 149,387

Essential Scrum: A Practical Guide to the Most Popular Agile Process
by Kenneth S. Rubin
Published 19 Jul 2012

In 1988 he was fortunate to join ParcPlace Systems, a start-up company formed as a Xerox PARC spin-off, whose charter was to bring object-oriented technology out of the research labs and release it to the world. As a Smalltalk development consultant with many different organizations in the late 1980s and throughout the 1990s, Kenny was an early adopter of agile practices. His first use of Scrum was in 2000 for developing bioinformatics software. In the course of his career, Kenny has held many roles, including successful stints as a Scrum product owner, ScrumMaster, and member of development teams. In addition, he has held numerous executive management roles: CEO, COO, VP of Engineering, VP of Product Management, and VP of Professional Services.

His multifaceted background gives Kenny the ability to understand (and explain) Scrum and its implications equally well from multiple perspectives: from the development team to the executive board. Chapter 1. Introduction On June 21, 2000, I was employed as Executive Vice President at Genomica, a bioinformatics company in Boulder, Colorado. I remember the date because my son Asher was born at one o’clock that morning. His birth was a good start to the day. Asher was actually born on his predicted due date (in the United States this happens about 5% of the time). So we (really my wife, Jenine) had finished our nine-month “project” on schedule.

This need for rapid exploration and feedback did not mesh well with the detailed, up-front planning we had been doing. We also wanted to avoid big up-front architecture design. A previous attempt to create a next generation of Genomica’s core product had seen the organization spend almost one year doing architecture-only work to create a grand, unified bioinformatics platform. When the first real scientist-facing application was put on top of that architecture, and we finally validated design decisions made many months earlier, it took 42 seconds to tab from one field on the screen to the next field. If you think a typical user is impatient, imagine a molecular biologist with a Ph.D. having to wait 42 seconds!

pages: 323 words: 92,135

Running Money
by Andy Kessler
Published 4 Jun 2007

In an era of relatively stable currencies, the modern-day investor has to dig, early and often and everywhere. I’d still rather dig than get whacked by a runaway yen-carry trade. Another cycle is coming. The drivers of it are still unclear. Likely suspects are things like wireless data, on-command computing, nanotechnology, bioinformatics, genomic sorting—who the hell knows what it will be. But this is what I do. Looking for the next barrier, the next piece of technology, the next waterfall and the next great, long-term investment. Sounds quaint. I’ve come a long way from tripping across Homa Simpson dolls trying to raise money in Hong Kong.

See AOL Andreessen, Marc, 197, 199 animation, 134–35 AOL (America Online), 69–73, 207, 208, 223, 290 Cisco routers and, 199 Inktomic cache software and, 143 Netscape Navigator purchase, 201, 225 Telesave deal, 72–73 TimeWarner deal, 223, 229 as top market cap company, 111 Apache Web server, 247 Apple Computer, 45, 127, 128 Apple II, 183 Applied Materials, 245 Archimedes (propeller ship), 94 Arkwright, Richard, 65 ARPANET, 186, 187, 189, 191 Arthur Andersen, 290 Artists and Repertoire (A&R), 212, 216 Asian debt crisis, 3, 150, 151, 229, 260 yen and, 162–65, 168, 292 @ (at sign), 187 AT&T, 61, 185–86, 189 August Capital, 2, 4 auto industry, 267–68 Aziz, Tariq, 26 Babbage, Charles, 93 Baker, James, 26 Balkanski, Alex, 44, 249 bandwidth, 60, 111, 121, 140, 180, 188–89 Baran, Paul, 184, 185 Barbados, 251, 254 Barksdale, Jim, 198, 199–201 Barksdale Group, 201 BASE, 249 BASIC computer language, 126, 127 BBN. See Bolt, Baranek and Newman Bechtolsheim, Andy, 191 Bedard, Kipp, 19–20 Bell, Dave, 127 Bell Labs, 103, 110 Berry, Hank, 205–6 Beyond.com, 208 Bezos, Jeff, 228 Biggs, Barton, 163 big-time trends. See waterfalls bioinformatics, 296 biotech industry, 237 Black, Joseph, 54 Blutcher (steam locomotive), 92 Boggs, David, 189, 190 Bolt, Baranek and Newman, 184, 187 bonds, 11, 30–31, 164 Bonsal, Frank, 144–49 Borislow, Daniel, 72–73 Bosack, Len, 191 Boulton, Matthew, 55–58, 65, 66, 89 Boulton & Watt Company, 56–58, 64, 65, 89, 246, 247, 272 Bowman, Larry, 291–92 Bowman Capital, 291 Brady bonds, 164 Britain, 42, 50–59, 258 industrial economy, 42, 64–68, 91–95, 272 patent law, 55 textile manufacture, 64–68 wealth creation, 257, 271–72 broadband, 164, 225 browsers, 196–201 Brunel, I.

Algorithms Unlocked
by Thomas H. Cormen
Published 15 Jan 2013

The size of a clique is the number of vertices it contains. As you might imagine, cliques play a role in social network theory. Modeling each individual as a vertex and relationships between individuals as undirected edges, a clique represents a group of individuals all of whom have relationships with each other. Cliques also have applications in bioinformatics, engineering, and chemistry. The clique problem takes two inputs, a graph G and a positive integer k, and asks whether G has a clique of size k. For example, the graph on the next page has a clique of size 4, shown with heavily shaded vertices, and no other clique of size 4 or greater.
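The clique problem as stated in this excerpt can be sketched as a brute-force check. This is only an illustration, not code from the book: the function name `has_clique` and the example graph are assumptions made here, and the search tries every group of k vertices, so it is practical only for small graphs.

```python
from itertools import combinations


def has_clique(edges, k):
    """Return True if the undirected graph described by `edges` has a clique of size k."""
    # Build an adjacency map: vertex -> set of neighbouring vertices.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Try every group of k vertices; a group is a clique when every pair is joined.
    for group in combinations(adj, k):
        if all(b in adj[a] for a, b in combinations(group, 2)):
            return True
    return False


# Hypothetical example graph: vertices 1-4 are all connected to each other,
# and vertex 5 is attached only to vertex 4.
EXAMPLE_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
```

With these inputs, `has_clique(EXAMPLE_EDGES, 4)` succeeds because vertices 1 through 4 are mutually adjacent, while asking for a clique of size 5 fails.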

The size of a vertex cover is the number of vertices it contains. As in the clique problem, the vertex-cover problem takes as input an undirected graph G and a positive integer m. It asks whether G has a vertex cover of size m. Like the clique problem, the vertex-cover problem has applications in bioinformatics. In another application, you have a building with hallways and cameras that can scan up to 360 degrees located at the intersections of hallways, and you want to know whether m cameras will allow you to see all the hallways. Here, edges model hallways and vertices model intersections. In yet another application, finding vertex covers helps in designing strategies to foil worm attacks on computer networks.
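The hallway-and-camera reading of the vertex-cover problem can be sketched in the same brute-force style. The function name `has_vertex_cover` and the floor plan below are hypothetical, chosen only for illustration; the check relies on the fact that adding vertices to a cover keeps it a cover, so it is enough to test sets of exactly m vertices (or all vertices, if the graph has fewer than m).

```python
from itertools import combinations


def has_vertex_cover(edges, m):
    """Return True if m vertices can be chosen so that every edge touches one of them."""
    vertices = {v for edge in edges for v in edge}
    # A smaller cover can always be padded out, so test sets of one size only.
    size = min(m, len(vertices))
    for chosen in combinations(vertices, size):
        picked = set(chosen)
        if all(u in picked or v in picked for u, v in edges):
            return True
    return False


# Hypothetical floor plan: intersections A-D are vertices, hallways are edges.
HALLWAYS = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
```

Here two cameras are enough (for example at B and C), but a single camera is not, because no one intersection touches all four hallways.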

pages: 315 words: 89,861

The Simulation Hypothesis
by Rizwan Virk
Published 31 Mar 2019

Within computer science, video games and entertainment have played a unique role in driving the development of both hardware and software. Examples include the development of GPUs (graphics processing units) for optimized rendering, CGI (computer-generated imagery), and CAD (computer-aided design), as well as artificial intelligence and bioinformatics. The most recent incarnation of fully immersive entertainment technology is virtual reality (VR). Despite wondering about the simulation hypothesis for many years, it wasn’t until VR and AI reached their current level of sophistication that I could see a clear path to how we might develop all-encompassing simulations like the one depicted in The Matrix, which led me to write this book.

Within computer science and AI, biological processes have shown that they can be utilized to get much smarter and more unique results—most of today’s machine learning is based on the conditioning of neural networks, which are based on biological algorithms. While there is still some way to go, the burgeoning field of bioinformatics and modeling of biological processes has made information and computation an integral part of the organic world! Most importantly, the physical world, which was thought of in classical physics as a set of physical objects moving in continuous paths around the heavens, has been updated. As quantum physics reveals that there is no such thing as a physical object, that most objects consist of empty space and electrons, we start to get into metaphysical questions about what is real in the world.

pages: 326 words: 88,968

The Science and Technology of Growing Young: An Insider's Guide to the Breakthroughs That Will Dramatically Extend Our Lifespan . . . And What You Can Do Right Now
by Sergey Young
Published 23 Aug 2021

Among the field of impatient researchers working on this problem was a German geneticist and biostatistician, UCLA professor Dr. Steve Horvath. From the time he was a teenager, Horvath dreamed of extending the human healthspan. For decades, however, his academic and professional interests took him down different paths—through mathematics and bioinformatics. By the time he came back to aging, Horvath had developed a different perspective than other biologists and researchers—he had become accustomed to looking at algorithms more than organisms. So Horvath aimed to combine the two perspectives and set about finding what data patterns could be associated with aging.

And yet, four months after she received her first dose of Opdivo, McKeown’s cancer was in full remission. Moores is part of a federally funded clinical study in the United States called I-PREDICT. The cancer patients in this trial all previously underwent conventional treatments in vain. The I-PREDICT team of radiologists, oncologists, geneticists, pharmacologists, and bioinformatics experts pool knowledge from each of their fields to arrive at drug combinations that are precisely tuned to the individual patient’s genetic presentation. Of seventy-three patients who were treated using this pharmacogenetic approach, those who received more precise treatments matched to their genomic alterations fared twice as well as those who did not.2 This is precision medicine (PM), sometimes called personalized medicine or predictive medicine, and it is about to completely transform every single aspect of health care.

pages: 599 words: 98,564

The Mutant Project: Inside the Global Race to Genetically Modify Humans
by Eben Kirksey
Published 10 Nov 2020

As the DNA sequencing data from the twins trickled back into the laboratory, it fell primarily on the shoulders of one person to make sense of the code. A star undergraduate student in Dr. He’s bioinformatics course, whom I will call Goran, was hired into the laboratory straight after graduation. The young computer whiz spent long hours hunched over his keyboard in a tiny office on the SUSTech campus he shared with a junior bioinformatics technician and the contact person for patients in the study. His lab mates thought of Goran as wise beyond his years. When Ryan Ferrell joined the team, the young technician slid his computer over so that the pair could share a desk.

pages: 648 words: 108,814

Solr 1.4 Enterprise Search Server
by David Smiley and Eric Pugh
Published 15 Nov 2009

WebMynd is one of the largest installations of Solr, indexing up to two million HTML documents per day, and making heavy use of Solr's multicore features to enable a partially active index. Jerome Eteve holds a BSC in physics, maths and computing and an MSC in IT and bioinformatics from the University of Lille (France). After starting his career in the field of bioinformatics, where he worked as a biological data management and analysis consultant, he's now a senior web developer with interests ranging from database level issues to user experience online. He's passionate about open source technologies, search engines, and web application architecture.

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

The training set was 3D structures determined by x-ray crystallography. To our surprise, the secondary structure predictions for new proteins were far better than the best methods based on biophysics.10 This landmark study was the first application of machine learning to molecular sequences, a field that is now called bioinformatics. Another network that learned how to form the past tense of English verbs became a cause célèbre in the world of cognitive psychology as the rule-based old guard battled it out with the avant-garde PDP Group.11 The regular way to form the past tense of an English verb is to add the suffix “ed,” as in forming “trained” from “train.”

Margarita, 299n27 Belief networks, 52 Bell, Anthony J., 82 infomax ICA algorithm, 81, 83, 83f, 84, 86 neural nets and, 82, 90, 296n15 photograph, 83f on water structure, 296n15 writings, 79, 85f, 295n2, 295n4, 295n6, 296n9, 306n24 Bellman, Richard, 145, 304n4 Bellman equation. See Dynamic programming, algorithm for Benasich, April A., 184, 308n22 Bengio, Yoshua, 135, 139f, 141, 141f, 302nn4–5, 303n20, 304n25, 304n28 Berg, Howard C., 319n12 Berger, Hans, 86 Berra, Yogi, x Berry, Halle, 235, 236f, 237 Bi, Guoqiang Q., 216f Big data, 10, 164, 229 Bioinformatics, 116 Biophysics, 116. See also under Johns Hopkins University “Biophysics of Computation, The” (course), 104 Birds consulting with each other, 29f Birdsong, 155–156, 157f Bishop, Christopher M., 279 Black boxes the case against, 253–255 neural network as a black box, 123 Blakeslee, Sandra, 316n14 Blandford, Roger, 312n1 Blind source separation problem, 81, 82f, 83f Blocks World, 27 Boahen, Kwabena A., 313n14 Boltzmann, Ludwig, 99 Boltzmann learning, unsupervised, 106 Boltzmann machine backpropagation of errors contrasted with, 112 Charles Rosenberg on, 112 criticisms of, 106 diagram, 98b at equilibrium, 99 Geoffrey Hinton and, 49, 79, 104, 105f, 106, 110, 112, 127 hidden units, 98b, 101, 102, 104, 106, 109 learning mirror symmetries, 102, 104 limitations, 107 multilayer, 49, 104, 105f, 106, 109 for handwritten digit recognition and generation, 104, 105f, 106 overview, 97, 98b, 99, 101, 135 perceptron contrasted with, 99, 101, 102, 106, 109 restricted, 106 separating figure from ground with, 97, 100f supervised and unsupervised versions, 106 Boltzmann machine learning algorithm, 99, 101, 109, 133, 158 goal of, 99 history in neuroscience, 101 “wake” and “sleep” phases, 98b, 101–102 Boole, George, 54, 55f Boolean logic, 54 Border-ownerships cells, 99 Botvinick, Matthew, 317n15 Brain.

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics)
by Trevor Hastie , Robert Tibshirani and Jerome Friedman
Published 25 Aug 2009

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics.

With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data. The challenges in learning from data have led to a revolution in the statistical sciences.

Estimation of sparse Markov networks using modified logistic regression and the lasso, submitted. Hoerl, A. E. and Kennard, R. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12: 55–67. Hothorn, T. and Bühlmann, P. (2006). Model-based boosting in high dimensions, Bioinformatics 22(22): 2828–2829. Huber, P. (1964). Robust estimation of a location parameter, Annals of Mathematical Statistics 53: 73–101. Huber, P. (1985). Projection pursuit, Annals of Statistics 13: 435–475. Hunter, D. and Lange, K. (2004). A tutorial on MM algorithms, The American Statistician 58(1): 30–37.

pages: 137 words: 36,231

Information: A Very Short Introduction
by Luciano Floridi
Published 25 Feb 2010

Consider the following examples: medical information is information about medical facts (attributive use), not information that has curative properties; digital information is not information about something digital, but information that is in itself of digital nature (predicative use); and military information can be both information about something military (attributive) and of military nature in itself (predicative). When talking about biological or genetic information, the attributive sense is common and uncontroversial. In bioinformatics, for example, a database may contain medical records and genealogical or genetic data about a whole population. Nobody disagrees about the existence of this kind of biological or genetic information. It is the predicative sense that is more contentious. Are biological or genetic processes or elements intrinsically informational in themselves?

pages: 444 words: 117,770

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma
by Mustafa Suleyman
Published 4 Sep 2023

CAR T-cell therapies engineer bespoke immune response white blood cells to attack cancers; genetic editing looks set to cure hereditary heart conditions. Thanks to lifesaving treatments like vaccines, we are already accustomed to the idea of intervening in our biology to help us fight disease. The field of systems biology aims to understand the “larger picture” of a cell, tissue, or organism by using bioinformatics and computational biology to see how the organism works holistically; such efforts could be the foundation for a new era of personalized medicine. Before long the idea of being treated in a generic way will seem positively medieval; everything, from the kind of care we receive to the medicines we are offered, will be precisely tailored to our DNA and specific biomarkers.

More than a million researchers accessed the tool within eighteen months of launch, including virtually all the world’s leading biology labs, addressing questions from antibiotic resistance to the treatment of rare diseases to the origins of life itself. Previous experiments had delivered the structure of about 190,000 proteins to the European Bioinformatics Institute’s database, about 0.1 percent of known proteins in existence. DeepMind uploaded some 200 million structures in one go, representing almost all known proteins. Whereas once it might have taken researchers weeks or months to determine a protein’s shape and function, that process can now begin in a matter of seconds.

Scikit-Learn Cookbook
by Trent Hauck
Published 3 Nov 2014

We'll walk through the various univariate feature selection methods:
>>> from sklearn import datasets
>>> X, y = datasets.make_regression(1000, 10000)
Now that we have the data, we will compare the features that are included with the various methods. This is actually a very common situation when you're dealing in text analysis or some areas of bioinformatics. How to do it... First, we need to import the feature_selection module:
>>> from sklearn import feature_selection
>>> f, p = feature_selection.f_regression(X, y)
Here, f is the F score associated with each linear model fit with just one of the features. We can then compare these scores and, based on this comparison, cull features. p is the p value associated with each F score.
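To make the per-feature F score more concrete, here is a minimal pure-Python sketch of the quantity a one-feature linear fit (with intercept) produces: F = r² / (1 − r²) · (n − 2), where r is the Pearson correlation between the feature and the target. The helper names `pearson_r` and `f_scores` and the toy data are assumptions made for this illustration; it is not scikit-learn's implementation, and it omits the p value, which would need the F distribution.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def f_scores(X, y):
    """F statistic of a one-feature linear fit, computed for each column of X."""
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        r2 = pearson_r(column, y) ** 2
        # A perfectly correlated feature would divide by zero; report infinity.
        scores.append(float("inf") if r2 >= 1.0 else r2 / (1.0 - r2) * (n - 2))
    return scores


# Toy data (hypothetical): feature 0 tracks y closely, feature 1 is mostly noise.
X_TOY = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 4], [6, 5]]
Y_TOY = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
TOY_SCORES = f_scores(X_TOY, Y_TOY)
```

Ranking the columns by these scores and keeping only the highest ones is the culling step the recipe describes; on the toy data, feature 0 scores far higher than feature 1.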

pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See
by Gary Price , Chris Sherman and Danny Sullivan
Published 2 Jan 2003

FishBase is a relational database with fish information to cater to different professionals such as research scientists, fisheries managers, zoologists, and many more. FishBase on the Web contains practically all fish species known to science.” Search Form URL: http://www.fishbase.org/search.cfm GeneCards http://bioinformatics.weizmann.ac.il “GeneCards is a database of human genes, their products, and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others [gene listing].” Search Form URL: http://bioinformatics.weizmann.ac.il/cards/ Integrated Taxonomic Information System (Biological Names) http://www.itis.usda.gov/plantproj/itis/index.html “The Integrated Taxonomic Information System (ITIS) is a partnership of U.S., Canadian, and Mexican agencies, other organizations, and taxonomic specialists cooperating on the development of an online, scientifically credible, list of biological names focusing on the biota of North America.”

pages: 199 words: 47,154

Gnuplot Cookbook
by Lee Phillips
Published 15 Feb 2012

I am grateful to the users of my gnuplot web pages for their interest, questions, and suggestions over the years, and to my family for their patience and support. About the Reviewers Andreas Bernauer is a Software Engineer at Active Group in Germany. He graduated from Eberhard Karls Universität Tübingen, Germany, with a Degree in Bioinformatics and received a Master of Science degree in Genetics from the University of Connecticut, USA. In 2011, he earned a doctorate in Computer Engineering from Eberhard Karls Universität Tübingen. Andreas has more than 10 years of professional experience in software engineering. He implemented the server-side scripting engine in the Scheme-based SUnet web server and hosted the Learning-Classifier-System workshops in Tübingen.

pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006
by Ben Goertzel and Pei Wang
Published 1 Jan 2007

On the contrary, we believe that most of the current AI research works make little direct contribution to AGI, though these works have value for many other reasons. Previously we have mentioned “machine learning” as an example. One of us (Goertzel) has published extensively about applications of machine learning algorithms to bioinformatics. This is a valid, and highly important sort of research – but it doesn’t have much to do with achieving general intelligence. There is no reason to believe that “intelligence” is simply a toolbox, containing mostly unconnected tools. Since the current AI “tools” have been built according to very different theoretical considerations, to implement them as modules in a big system will not necessarily make them work together, correctly and efficiently.

Unlike most contemporary AI projects, it is specifically oriented towards artificial general intelligence (AGI), rather than being restricted by design to one narrow domain or range of cognitive functions. The NAIE integrates aspects of prior AI projects and approaches, including symbolic, neural-network, evolutionary programming and reinforcement learning. The existing codebase is being applied in bioinformatics, NLP and other domains. To save space, some of the discussion in this paper will assume a basic familiarity with NAIE structures such as Atoms, Nodes, Links, ImplicationLinks and so forth, all of which are described in previous references and in other papers in this volume. 1.2. Cognitive Development in Simulated Androids Jean Piaget, in his classic studies of developmental psychology [8] conceived of child development as falling into four stages, each roughly identified with an age group: infantile, preoperational, concrete operational, and formal.

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology
by Ray Kurzweil
Published 14 Jul 2005

Kurzweil Technologies is working with UT to develop pattern recognition-based analysis from either "Holter" monitoring (twenty-four-hour recordings) or "Event" monitoring (thirty days or more). 190. Kristen Philipkoski, "A Map That Maps Gene Functions," Wired News, May 28, 2002, http://www.wired.com/news/medtech/0,1286,52723,00.html. 191. Jennifer Ouellette, "Bioinformatics Moves into the Mainstream," The Industrial Physicist (October–November 2003), http://www.sciencemasters.com/bioinformatics.pdf. 192. Port, Arndt, and Carey, "Smart Tools." 193. "Protein Patterns in Blood May Predict Prostate Cancer Diagnosis," National Cancer Institute, October 15, 2002, http://www.nci.nih.gov/newscenter/ProstateProteomics, reporting on Emanuel F.

DARPA's Information Processing Technology Office's project in this vein is called LifeLog, http://www.darpa.mil/ipto/Programs/lifelog; see also Noah Shachtman, "A Spy Machine of DARPA's Dreams," Wired News, May 20, 2003, http://www.wired.com/news/business/0,1367,58909,00.html; Gordon Bell's project (for Microsoft) is MyLifeBits, http://research.microsoft.com/research/barc/MediaPresence/MyLifeBits.aspx; for the Long Now Foundation, see http://longnow.org. 44. Bergeron is assistant professor of anesthesiology at Harvard Medical School and the author of such books as Bioinformatics Computing, Biotech Industry: A Global, Economic, and Financing Overview, and The Wireless Web and Healthcare. 45. The Long Now Foundation is developing one possible solution: the Rosetta Disk, which will contain extensive archives of text in languages that may be lost in the far future. They plan to use a unique storage technology based on a two-inch nickel disk that can store up to 350,000 pages per disk, with an estimated life expectancy of 2,000 to 10,000 years.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

Exact inference algorithms for pedigree analysis, resembling variable elimination, were developed in the 1970s (Cannings et al., 1978). Bayesian networks have been used for identifying human genes by reference to mouse genes (Zhang et al., 2003), inferring cellular networks (Friedman, 2004), genetic linkage analysis to locate disease-related genes (Silberstein et al., 2013), and many other tasks in bioinformatics. We could go on, but instead we’ll refer you to Pourret et al. (2008), a 400-page guide to applications of Bayesian networks. Published applications over the last decade run into the tens of thousands, ranging from dentistry to global climate models. Judea Pearl (1985), in the first paper to use the term “Bayesian networks,” briefly described an inference algorithm for general networks based on the cutset conditioning idea introduced in Chapter 5.

In Kaggle data science competitions they were the most popular approach of winning teams from 2011 through 2014, and remain a common approach to this day (although deep learning and gradient boosting have become even more common among recent winners). The randomForest package in R has been a particular favorite. In finance, random forests have been used for credit card default prediction, household income prediction, and option pricing. Mechanical applications include machine fault diagnosis and remote sensing. Bioinformatic and medical applications include diabetic retinopathy, microarray gene expression, mass spectrum protein expression analysis, biomarker discovery, and protein-protein interaction prediction. 19.8.3 Stacking Whereas bagging combines multiple base models of the same model class trained on different data, the technique of stacked generalization (or stacking for short) combines multiple base models from different model classes trained on the same data.

The first algorithms for learning Bayes net structures used conditional independence tests (Pearl, 1988; Pearl and Verma, 1991). Spirtes et al. (1993) implemented a comprehensive approach in the TETRAD package for Bayes net learning. Algorithmic improvements since then led to a clear victory in the 2001 KDD Cup data mining competition for a Bayes net learning method (Cheng et al., 2002). (The specific task here was a bioinformatics problem with 139,351 features!) A structure-learning approach based on maximizing likelihood was developed by Cooper and Herskovits (1992) and improved by Heckerman et al. (1994). More recent algorithms have achieved quite respectable performance in the complete-data case (Moore and Wong, 2003; Teyssier and Koller, 2005).

pages: 271 words: 52,814

Blockchain: Blueprint for a New Economy
by Melanie Swan
Published 22 Jan 2014

“Primecoin: The Cryptocurrency Whose Mining Is Actually Useful.” Bitcoin Magazine, July 8, 2013. http://bitcoinmagazine.com/5635/primecoin-the-cryptocurrency-whose-mining-is-actually-useful/. 127 Myers, D.S., A.L. Bazinet, and M.P. Cummings. “Expanding the Reach of Grid Computing: Combining Globus-and BOINC-Based Systems.” Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, February 6, 2007 (Draft). http://lattice.umiacs.umd.edu/latticefiles/publications/lattice/myers_bazinet_cummings.pdf. 128 Clenfield, J. and P. Alpeyev. “The Other Bitcoin Power Struggle.” Bloomberg Businessweek, April 24, 2014. http://www.businessweek.com/articles/2014-04-24/bitcoin-miners-seek-cheap-electricity-to-eke-out-a-profit. 129 Gimein, M.

pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds
by Daniel C. Dennett
Published 7 Feb 2017

Empirical work in both areas has made enough progress in recent decades to encourage further inquiry, taking on board the default (and tentative) assumption that the “trees” of existing lineages we can trace back eventually have single trunks. Phylogenetic diagrams, or cladograms, such as the Great Tree of Life (which appears as figure 9.1) showing all the species, or more limited trees of descent in particular lineages, are getting clearer and clearer as bio-informatics research on the accumulation of differences in DNA sequences plugs the gaps and corrects the mistakes of earlier anatomical and physiological sleuthing.45 Glossogenetic trees, lineages of languages (figure 9.2), are also popular thinking tools, laying out the relations of descent among language families (and individual words) over many centuries.

Texts of Homer’s Iliad and Odyssey, for instance, were known to descend by copying from texts descended from texts descended from texts going back to their oral ancestors in Homeric times. Philologists and paleographers had been reconstructing lineages of languages and manuscripts (e.g., the various extant copies of Plato’s Dialogues) since the Renaissance, and some of the latest bio-informatic techniques used today to determine relationships between genomes are themselves refined descendants of techniques developed to trace patterns of errors (mutations) in ancient texts. As Darwin noted, “The formation of different languages and of distinct species, and the proofs that both have been developed through a gradual process, are curiously the same” (1871, p. 59).

pages: 201 words: 63,192

Graph Databases
by Ian Robinson , Jim Webber and Emil Eifrem
Published 13 Jun 2013

The social network example helps illustrate how different technologies deal with connected data, but is it a valid use case? Do we really need to find such remote “friends?” But substitute social networks for any other domain, and you’ll see we experience similar performance, modeling and maintenance benefits. Whether music or data center management, bio-informatics or football statistics, network sensors or time-series of trades, graphs provide powerful insight into our data. Let’s look, then, at another contemporary application of graphs: recommending products based on a user’s purchase history and the histories of their friends, neighbours, and other people like them.

The Rise of Yeast: How the Sugar Fungus Shaped Civilisation
by Nicholas P. Money
Published 22 Feb 2018

Carpology is the branch of botany concerned with the study of fruits and seeds. According to modern terminology, which specifies that fruits and seeds are produced by plants, we say that the Tulasne brothers described the fruit bodies and spores of fungi. 27. N. P. Money, Fungal Biology 117, 463–5 (2013). 28. H. Nilsson et al., Evolutionary Bioinformatics Online 4, 193–201 (2008); R. Blaalid et al., Molecular Ecology Resources 13, 218–24 (2013). 29. Kurtzman, Fell, and Boekhout (n. 13). 30. W. T. Starmer and M.-A. Lachance, Yeast Ecology, in C. P. Kurtzman, J. W. Fell, and T. Boekhout, The Yeasts: A Taxonomic Study, 5th edition (Amsterdam: Springer, 2011), 88–107. 31.

pages: 552 words: 168,518

MacroWikinomics: Rebooting Business and the World
by Don Tapscott and Anthony D. Williams
Published 28 Sep 2010

“We’re changing by orders of magnitude the sampling ability we have for the oceans,” says Benoît Pirenne, associate director of Neptune Canada.8 To cope with the flood of data, researchers using Neptune’s Oceans 2.0 platform can tag everything from images to data feeds to video streams from undersea cameras, identifying sightings of little-known organisms or examples of rare phenomena. Wikis provide a shared space for group learning, discussion, and collaboration, while a Facebook-like social networking application helps connect researchers working on similar problems. Meanwhile, over at the European Bioinformatics Institute, scientists are using Web services to revolutionize the way they extract and interpret data from different sources, and to create entirely new data services. Imagine, for example, you wanted to find out everything there is to know about a species, from its taxonomy and genetic sequence to its geographical distribution.

Now imagine you had the power to weave together all the latest data on that species from all of the world’s biological databases with just one click. It’s not far-fetched. That power is here, today. Projects like these have inspired researchers in many fields to emulate the changes that are already sweeping disciplines such as bioinformatics and high-energy physics. Having said that, there will be some difficult adjustments and issues such as privacy and national security to confront along the way. “We’re going from a data poor to a data rich world,” says Smarr. “And there’s a lag whenever an exponential change like this transforms the impossible into the routine.”

pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy
by Federico Pistono
Published 14 Oct 2012

The more companies automate, because of the need to increase their productivity, the more jobs will be lost, forever. The future of work and innovation is not in the past that we know, but in unfamiliar territory of the future that is yet to come. New and exciting fields are emerging every day. Synthetic biology, neurocomputation, 3D printing, contour crafting, molecular engineering, bioinformatics, life extension, robotics, quantum computing, artificial intelligence, machine learning, these new frontiers that are rapidly evolving and are just the beginning of a new, amazing era of our species that will bring about the greatest transformation of all time. A transformation that will make the industrial revolution look like an event of minor importance.

pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else
by Steve Lohr
Published 10 Mar 2015

He had agreed to give a talk in Seattle at a conference hosted by Sage Bionetworks, a nonprofit organization dedicated to accelerate the sharing of data for biological research. Hammerbacher knew the two medical researchers who had founded the nonprofit, Stephen Friend and Eric Schadt. He had talked to them about how they might use big-data software to cope with the data explosion in bioinformatics and genomics. But the preparation for the speech forced him to really think about biology and technology, reading up and talking to people. The more Hammerbacher looked into it, the more intriguing the subject looked. Biological research, he says, could go the way of finance with its closed, proprietary systems and data being hoarded rather than shared.

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
by Thomas H. Davenport
Published 4 Feb 2014

The testing firm Kaplan uses its big data to begin advising customers on effective learning and test-preparation strategies. Novartis focuses on big data—the health-care industry calls it informatics—to develop new drugs. Its CEO, Joe Jimenez, commented in an interview, “If you think about the amounts of data that are now available, bioinformatics capability is becoming very important, as is the ability to mine that data and really understand, for example, the specific mutations that are leading to certain types of cancers.”7 These companies’ big data efforts are directly focused on products, services, and customers. This has important implications, of course, for the organizational locus of big data and the processes and pace of new product development.

pages: 245 words: 71,886

Spike: The Virus vs The People - The Inside Story
by Jeremy Farrar and Anjana Ahuja
Published 15 Jan 2021

The wheels of COG-UK started turning on 4 March 2020, when Sharon sent a one-line email to five close contacts: Cambridge colleague Julian Parkhill, a former head of Pathogen Genomics at the Wellcome Trust Sanger Institute (dedicated to genome sequencing); Judith Breuer, a virologist at University College London; Nick Loman, an expert in microbial genomics and bioinformatics at Birmingham University; David Aanensen, a genomic surveillance specialist at the Big Data Institute at Oxford University; and Richard Myers, at Public Health England. I wonder if you could call me on my mobile this afternoon, 2pm onwards. Sharon By then, Sharon had joined SAGE and was being bombarded, as we all were, with hundreds of emails a day.

HBase: The Definitive Guide
by Lars George
Published 29 Aug 2011

If we were to take 140 bytes per message, as used by Twitter, it would total more than 17 TB every month. Even before the transition to HBase, the existing system had to handle more than 25 TB a month.[12] In addition, less web-oriented companies from across all major industries are collecting an ever-increasing amount of data. For example:

Financial: Such as data generated by stock tickers
Bioinformatics: Such as the Global Biodiversity Information Facility (http://www.gbif.org/)
Smart grid: Such as the OpenPDC (http://openpdc.codeplex.com/) project
Sales: Such as the data generated by point-of-sale (POS) or stock/inventory systems
Genomics: Such as the Crossbow (http://bowtie-bio.sourceforge.net/crossbow/index.shtml) project
Cellular services, military, environmental: Which all collect a tremendous amount of data as well

Storing petabytes of data efficiently so that updates and retrieval are still performed well is no easy feat.

A abort() method, HBaseAdmin class, Basic Operations Abortable interface, Basic Operations Accept header, switching REST formats, Supported formats, JSON (application/json), Protocol Buffer (application/x-protobuf) access control, Introduction to Coprocessors, HBase Versus Bigtable Bigtable column families for, HBase Versus Bigtable coprocessors for, Introduction to Coprocessors ACID properties, The Problem with Relational Database Systems add() method, Bytes class, The Bytes Class add() method, Put class, Single Puts addColumn() method, Get class, Single Gets addColumn() method, HBaseAdmin class, Schema Operations addColumn() method, Increment class, Multiple Counters addColumn() method, Scan class, Introduction addFamily() method, Get class, Single Gets addFamily() method, HTableDescriptor class, Table Properties addFamily() method, Scan class, Introduction, Client API: Best Practices add_peer command, HBase Shell, Replication alter command, HBase Shell, Data definition Amazon, The Dawn of Big Data, S3, S3 data requirements of, The Dawn of Big Data S3 (Simple Storage Service), S3, S3 Apache Avro, Introduction to REST, Thrift, and Avro (see Avro) Apache binary release for HBase, Apache Binary Release, Apache Binary Release Apache HBase, Quick-Start Guide (see HBase) Apache Hive, Hive (see Hive) Apache Lucene, Search Integration, Search Integration Apache Maven, Building the Examples (see Maven) Apache Pig, Pig (see Pig) Apache Solr, Search Integration Apache Whirr, deployment using, Apache Whirr, Apache Whirr Apache ZooKeeper, Implementation (see ZooKeeper) API, Native Java (see client API) append feature, for durability, Durability append() method, HLog class, HLog Class architecture, storage, Storage (see storage architecture) assign command, HBase Shell, Tools assign() method, HBaseAdmin class, Cluster Operations AssignmentManager class, The Region Life Cycle AsyncHBase client, Other Clients atomic read-modify-write, Dimensions, Tables, Rows, Columns, and 
Cells, Storage API, General Notes, Atomic compare-and-set, Atomic compare-and-set, Atomic compare-and-delete, Atomic compare-and-delete, Row Locks, WALEdit Class compare-and-delete operations, Atomic compare-and-delete, Atomic compare-and-delete compare-and-set, for put operations, Atomic compare-and-set, Atomic compare-and-set per-row basis for, Tables, Rows, Columns, and Cells, Storage API, General Notes row locks for, Row Locks for WAL edits, WALEdit Class auto-sharding, Auto-Sharding, Auto-Sharding Avro, Introduction to REST, Thrift, and Avro, Introduction to REST, Thrift, and Avro, Avro, Avro, Operation, Installation, Operation, Operation, Operation, Operation, Advanced Schemas documentation for, Operation installing, Installation port used by, Operation schema compilers for, Avro schema used by, Advanced Schemas starting server for, Operation stopping, Operation B B+ trees, B+ Trees, B+ Trees backup masters, adding, Adding a local backup master, Adding a backup master, Adding a backup master balancer, Load Balancing, Load Balancing, Node Decommissioning balancer command, HBase Shell, Tools, Load Balancing balancer() method, HBaseAdmin class, Cluster Operations, Load Balancing balanceSwitch() method, HBaseAdmin class, Cluster Operations, Load Balancing balance_switch command, HBase Shell, Tools, Load Balancing, Node Decommissioning base64 command, XML (text/xml) Base64 encoding, with REST, XML (text/xml), JSON (application/json) BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class BaseMasterObserver class, The BaseMasterObserver class, The BaseMasterObserver class BaseRegionObserver class, The BaseRegionObserver class, The BaseRegionObserver class Batch class, The CoprocessorProtocol interface, The BaseEndpointCoprocessor class batch clients, Batch Clients batch operations, Batch Operations, Batch Operations, Caching Versus Batching, Caching Versus Batching, Custom Filters for scans, Caching Versus Batching, 
Caching Versus Batching, Custom Filters on tables, Batch Operations, Batch Operations batch() method, HTable class, Batch Operations, Batch Operations, Introduction to Counters Bigtable storage architecture, Backdrop, Summary, Nomenclature, HBase Versus Bigtable, HBase Versus Bigtable “Bigtable: A Distributed Storage System for Structured Data” (paper, by Google), Preface, Backdrop bin directory, Apache Binary Release BinaryComparator class, Comparators BinaryPrefixComparator class, Comparators binarySearch() method, Bytes class, The Bytes Class bioinformatics, data requirements of, The Dawn of Big Data BitComparator class, Comparators block cache, Single Gets, Introduction, Column Families, Column Families, Bloom Filters, Region Server Metrics, Client API: Best Practices, Configuration Bloom filters affecting, Bloom Filters controlling use of, Single Gets, Introduction, Client API: Best Practices enabling and disabling, Column Families metrics for, Region Server Metrics settings for, Configuration block replication, MapReduce Locality, MapReduce Locality blocks, Column Families, HFile Format, HFile Format, HFile Format, HFile Format compressing, HFile Format size of, Column Families, HFile Format Bloom filters, Column Families, Bloom Filters, Bloom Filters bypass() method, ObserverContext class, The ObserverContext class Bytes class, Single Puts, Single Gets, The Bytes Class, The Bytes Class C caching, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, The HTable Utility Methods, Client API: Best Practices, HBase Configuration Properties (see also block cache; Memcached) regions, The HTable Utility Methods for scan operations, Caching Versus Batching, Caching Versus Batching, Client API: Best Practices, HBase Configuration Properties Cacti server, JMXToolkit on, JMX Remote API call() method, Batch class, The CoprocessorProtocol interface CAP (consistency, availability, and partition tolerance) theorem, Nonrelational Database Systems, 
Not-Only SQL or NoSQL?

pages: 284 words: 79,265

The Half-Life of Facts: Why Everything We Know Has an Expiration Date
by Samuel Arbesman
Published 31 Aug 2012

Nature Reviews Drug Discovery 5, no. 8 (August 2006): 689–702. 112 software designed to find undiscovered patterns: See TRIZ, a method of invention and discovery. For example, here: www.aitriz.org. 112 computerized systems devoted to drug repurposing: Sanseau, Philippe, and Jacob Koehler. “Editorial: Computational Methods for Drug Repurposing.” Briefings in Bioinformatics 12, no. 4 (July 1, 2011): 301–2. 112 can generate new and interesting: Darden, Lindley. “Recent Work in Computational Scientific Discovery.” In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (1997) 161–66. 113 names a novel, computationally created: See TheoryMine: http://theorymine.co.uk. 116 A Cornell professor of earth and atmospheric sciences: Cisne, John L.

pages: 232 words: 72,483

Immortality, Inc.
by Chip Walter
Published 7 Jan 2020

he would ask. “No,” she would reply. “Why not?” “Because it was wicked hard to study and nobody is going to tackle it. They wouldn’t know where to begin.” Well, that was just too delicious a problem. So de Grey forsook his former job and took up gerontology while handling software development and bioinformatics at the Cambridge genetics lab where Adelaide and her students worked. Over the next several years he would harangue Adelaide for information, pore over textbooks and journals, pester biologists with every kind of question, and show up at conferences to interrogate anyone he could find. Despite becoming a gerontologist, de Grey didn’t care much for others in the field.

The Pattern Seekers: How Autism Drives Human Invention
by Simon Baron-Cohen
Published 14 Aug 2020

This paper is included in a thematic collection of articles, Philosophical Transactions of the Royal Society of London: Series B, ed. U. Frith and C. Hayes, 367(1599, 2012), 1471–2970. Hayes challenges the idea of abrupt cognitive change between humans and our ancestors, in favor of incremental changes. 22. See S. López et al. (2015), “Human dispersal out of Africa: A lasting debate,” Evolutionary Bioinformatics 11, 57–68; and N. Conard (2008), “A critical view of the evidence for a Southern African origin of behavioural modernity,” South African Archaeological Society Goodwin Series 10, 175–178. 23. Reports at the time dated the bone flute to be at least 35,000 years old, and Nicholas Conard wrote in an email to New York Times reporter John Noble Wilford that it was more like 40,000 years old.

pages: 741 words: 199,502

Human Diversity: The Biology of Gender, Race, and Class
by Charles Murray
Published 28 Jan 2020

For the visually similar figure in chapter 7, the unit of analysis was the individual and the cell entries were measures of genetic distance—Wright’s fixation index, FST. 9: The Landscape of Ancestral Population Differences 1. Responsibility for the GWAS Catalog was subsequently shared with the European Bioinformatics Institute (EBI). The GWAS Catalog is downloadable free of charge at its website, ebi.ac.uk/gwas. The level of statistical significance required for entry in the GWAS Catalog is p < 1.0×10⁻⁵, which is more inclusive than the standard for statistical significance in the published literature (p < 1.0×10⁻⁸).

LoParo, Devon, and Irwin Waldman. 2014. “Twins’ Rearing Environment Similarity and Childhood Externalizing Disorders: A Test of the Equal Environments Assumption.” Behavior Genetics 44 (6): 606–13. Lopez, Saioa, Lucy van Dorp, and Garrett Hellenthal. 2016. “Human Dispersal out of Africa: A Lasting Debate.” Evolutionary Bioinformatics 11 (S2): 57–68. Low, Bobbi S. 2015. Why Sex Matters: A Darwinian Look at Human Behavior. Princeton, NJ: Princeton University Press. Lubinski, David, and Camilla P. Benbow. 2006. “Study of Mathematically Precocious Youth After 35 Years: Uncovering Antecedents for the Development of Math-Science Expertise.”

pages: 314 words: 94,600

Business Metadata: Capturing Enterprise Knowledge
by William H. Inmon , Bonnie K. O'Neil and Lowell Fryman
Published 15 Feb 2008

The terms were stored in a 11179 registry, and the registry metadata was mapped to UML structures from the Class Diagram. The solution includes three main layers: ✦ Layer 1: Enterprise Vocabulary Services: DL (description logics) and ontology, thesaurus ✦ Layer 2: CADSR: Metadata Registry, consisting of Common Data Elements ✦ Layer 3: Cancer Bioinformatics Objects, using UML Domain Models The NCI Thesaurus contains over 48,000 concepts. Although its emphasis is on machine understandability, NCI has managed to translate description logic somewhat into English. Linking concepts together is accomplished through roles, which are also concepts themselves.

pages: 313 words: 84,312

We-Think: Mass Innovation, Not Mass Production
by Charles Leadbeater
Published 9 Dec 2010

They will bring with them the web’s culture of lateral, semi-structured free association. This new organisational landscape is taking shape all around us. Scientific research is becoming ever more a question of organising a vast number of pebbles. Young scientists especially in emerging fields like bioinformatics draw on hundreds of data banks; use electronic lab notebooks to record and then share their results daily, often through blogs and wikis; work in multi-disciplinary teams threaded around the world organised by social networks; they publish their results, including open source versions of the software used in their experiments and their raw data, in open access online journals.

pages: 426 words: 83,128

The Journey of Humanity: The Origins of Wealth and Inequality
by Oded Galor
Published 22 Mar 2022

Lipset, Seymour Martin, ‘Some social requisites of democracy: Economic development and political legitimacy’, American Political Science Review 53, no. 1 (1959): 69–105. Litina, Anastasia, ‘Natural land productivity, cooperation and comparative development’, Journal of Economic Growth 21, no. 4 (2016): 351–408. López, Saioa, Lucy Van Dorp and Garrett Hellenthal, ‘Human dispersal out of Africa: A lasting debate’, Evolutionary Bioinformatics 11 (2015): EBO-S33489. Lucas, Adrienne M., ‘The impact of malaria eradication on fertility’, Economic Development and Cultural Change 61, no. 3 (2013): 607–31. Lucas, Adrienne M., ‘Malaria eradication and educational attainment: evidence from Paraguay and Sri Lanka’, American Economic Journal: Applied Economics 2, no. 2 (2010): 46–71.

pages: 286 words: 90,530

Richard Dawkins: How a Scientist Changed the Way We Think
by Alan Grafen; Mark Ridley
Published 1 Jan 2006

The invention of an algorithmic biology Seth Bullock BIOLOGY and computing might not seem the most comfortable of bedfellows. It is easy to imagine nature and technology clashing as the green-welly brigade rub up awkwardly against the back-room boffins. But collaboration between the two fields has exploded in recent years, driven primarily by massive investment in the emerging field of bioinformatics charged with mapping the human genome. New algorithms and computational infrastructures have enabled research groups to collaborate effectively on a worldwide scale in building huge, exponentially growing genomic databases, to ‘mine’ these mountains of data for useful information, and to construct and manipulate innovative computational models of the genes and proteins that have been identified.

pages: 354 words: 91,875

The Willpower Instinct: How Self-Control Works, Why It Matters, and What You Can Do to Get More of It
by Kelly McGonigal
Published 1 Dec 2011

“Depression, Craving, and Substance Use Following a Randomized Trial of Mindfulness-Based Relapse Prevention.” Journal of Consulting and Clinical Psychology 78 (2010): 362–74. Chapter 10: Final Thoughts Page 237—“Only reasonable conclusion to a book about scientific ideas is: Draw your own conclusions”: Credit for this suggestion goes to Brian Kidd, Senior Bioinformatics Research Specialist, Institute for Infection Immunity and Transplantation, Stanford University. INDEX acceptance inner power of Adams, Claire addiction addict loses his cravings candy addict conquers sweet tooth chocoholic takes inspiration from Hershey’s Kisses dopamine’s role in drinking drug e-mail Facebook shopping smoker under social influence smoking Advisor-Teller Money Manager Intervention (ATM) Ainslie, George Air Force Academy, U.S.

pages: 313 words: 95,077

Here Comes Everybody: The Power of Organizing Without Organizations
by Clay Shirky
Published 28 Feb 2008

Despite these resources and incentives, however, the solution didn’t come from China. On April 12, Genome Sciences Centre (GSC), a small Canadian lab specializing in the genetics of pathogens, published the genetic sequence of SARS. On the way, they had participated in not just one open network, but several. Almost the entire computational installation of GSC is open source; bioinformatics tools with names like BLAST, Phrap, Phred, and Consed, all running on Linux. GSC checked their work against Genbank, a public database of genetic sequences. They published their findings on their own site (run, naturally, using open source tools) and published the finished sequence to Genbank, for everyone to see.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

When geneticists began exome sequencing in earnest, they encountered an unexpected complication. It turns out that each human individual carries a surprisingly high number of potentially deleterious mutations, typically more than one hundred. These are mutations that alter or disturb protein sequences in a way that is predicted to have a damaging effect on protein function, based on bioinformatic (computer-based) analyses. Each mutation might be extremely rare in the population, or even unique to the person or family in which it is found. How do we sift out the true causal mutations, the ones that are functionally implicated in the disorder or trait we are studying, against a broader background of irrelevant genomic change?

pages: 396 words: 96,049

Upgrade
by Blake Crouch
Published 6 Jul 2022

“How big—” “Two, maybe. Likely five people.” “Any idea of—” God, I knew every question he would ask before he asked it. So much wasted time. So much inefficiency. “—who they might be?” I said, “She would need people who, as a group, could encompass biochemistry, molecular biology, genetics, and bioinformatics. Every one of them working at the height of their powers. I can’t imagine her pulling this off without a quantum-annealing or exascale processor.” I was speaking too fast. The average person speaks 100 to 130 words per minute. I was pushing 180. When had that started? I needed to slow down, stop drawing attention to my exploding intellect.

pages: 285 words: 86,858

How to Spend a Trillion Dollars
by Rowan Hooper
Published 15 Jan 2020

With our windfall we can not only fund Lewin’s BioGenome Project, but expand it to cover all life forms. With the rate of extinction so high, it’s hard to think of a more vital endeavour. The Human Genome Project promises to revolutionise medicine in a wave of change that is still only just gathering pace. Alongside that, the fields of forensics, archaeology and bioinformatics have changed beyond recognition. And every $1 of public money invested in the Human Genome Project generated more than $140 in economic activity. The Earth BioGenome Project could bring unimaginable advances in science – improved medicines and materials, biofuels and crops. Having a vast genetic databank of millions more species will improve our understanding of the evolution of life on Earth.

pages: 471 words: 94,519

Managing Projects With GNU Make
by Robert Mecklenburg and Andrew Oram
Published 19 Nov 2004

(question mark), Wildcards calling functions and, Wildcards character classes, Wildcards expanding, Wildcards misuse, Wildcards pattern rules and, Rules ~ (tilde), Wildcards Windows filesystem, Cygwin and, Filesystem wordlist function, String Functions words function, String Functions X XML, Ant, XML Preprocessing build files, Ant preprocessing book makefile, XML Preprocessing About the Author Robert Mecklenburg began using Unix as a student in 1977 and has been programming professionally for 23 years. His make experience started in 1982 at NASA with Unix version 7. Robert received his Ph.D. in Computer Science from the University of Utah in 1991. Since then, he has worked in many fields ranging from mechanical CAD to bioinformatics, and he brings his extensive experience in C++, Java, and Lisp to bear on the problems of project management with make. Colophon Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects.

pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance
by Emanuel Derman
Published 1 Jan 2004

With their deep pockets, he said "they had guys spending all their time running diff RMSs files and the O'Connor code" (Diff is one of the great suite of UNIX tools that make a programmer's life easier. It compares two different files of text and finds any common strings of words in them, a simpler version of current bio-informatics programs that search for common strings of DNA in the mouse and human genome.) I have no idea whether there were in fact commonalities, but even independent people coding the same well-known algorithm might end up writing vaguely similar chunks of code. O'Connor eventually disappeared, too, absorbed into Swiss Bank, which itself subsequently merged with UBS.
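
The idea of finding "common strings of words" in two texts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Unix tool is implemented: it uses the standard-library `difflib.SequenceMatcher` on word lists, whereas the real `diff` compares files line by line. The function name `common_word_runs`, the `min_words` threshold, and the two sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def common_word_runs(text_a, text_b, min_words=3):
    """Return runs of at least `min_words` consecutive words found in both texts."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False disables the heuristic that ignores very frequent items.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size check skips it.
        if block.size >= min_words:
            runs.append(" ".join(words_a[block.a:block.a + block.size]))
    return runs


# Hypothetical sample texts standing in for two source files.
file_a = "price the option using the binomial tree model and return the value"
file_b = "first we price the option using the binomial tree model then hedge"
print(common_word_runs(file_a, file_b))
```

The same matching-blocks approach works on any sequence of hashable items, so passing two strings of DNA letters instead of word lists would give a very rough analogue of the sequence comparison the passage alludes to.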

pages: 313 words: 34,042

Tools for Computational Finance
by Rüdiger Seydel
Published 2 Jan 2002

.: Statistics of Financial Markets: An Introduction Hurwitz, A.; Kritikos, N.: Lectures on Number Theory Frauenthal, J. C.: Mathematical Modeling in Epidemiology Huybrechts, D.: Complex Geometry: An Introduction Freitag, E.; Busam, R.: Complex Analysis Isaev, A.: Introduction to Mathematical Methods in Bioinformatics Friedman, R.: Algebraic Surfaces and Holomorphic Vector Bundles Fuks, D. B.; Rokhlin, V. A.: Beginner’s Course in Topology Fuhrmann, P. A.: A Polynomial Approach to Linear Algebra Gallot, S.; Hulin, D.; Lafontaine, J.: Riemannian Geometry Istas, J.: Mathematical Modeling for the Life Sciences Iversen, B.: Cohomology of Sheaves Jacod, J.; Protter, P.: Probability Essentials Jennings, G.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity
by Amy Webb
Published 5 Mar 2019

Over-the-counter medications are mostly gone, too, but compounding pharmacies have seen a resurgence. That’s because AGI helped accelerate critical developments in genetic editing and precision medicine. You now consult a computational pharmacist: specially trained pharmacists who have backgrounds in bioinformatics, medicine, and pharmacology. Computational pharmacy is a medical specialty, one that works closely with a new breed of AI-GPs: general practitioners who are trained in both medicine and technology. While AGI has obviated certain medical specialists—radiologists, immunologists, allergists, cardiologists, dermatologists, endocrinologists, anesthesiologists, neurologists, and others—doctors working in those fields had plenty of time to repurpose their skills for adjacent fields.

RDF Database Systems: Triples Storage and SPARQL Query Processing
by Olivier Cure and Guillaume Blin
Published 10 Dec 2014

This fact is mainly due to the expansion of the Web and the load of information that can be harvested from our interactions with it, such as via personal computers, laptops, smartphones, and tablet devices. This data can be represented using various models and in the context of use cases thriving on the Web—that is, for social, geographical, recommendations, bioinformatics, network management, and fraud detection, to name a few, the graph data model is a particularly relevant choice. RDF, with its W3C recommendation status and its set of companions like SPARQL, SKOS, RDFS, and OWL, plays a primordial role in the graph data model ecosystem. The quantity and quality of tools, such as parsers, editors, and APIs, implemented to ease the use of RDF data attests for the strong enthusiasm surrounding this standard, as well as the importance to manage this data appropriately. The number of academic, open-source and commercial RDF stores presented in this book emphasize the importance of this tool category, the diversity of possible approaches, as well as the complexity to design efficient systems.

pages: 420 words: 100,811

We Are Data: Algorithms and the Making of Our Digital Selves
by John Cheney-Lippold
Published 1 May 2017

Louise Amoore, “On the Emergence of a Security Risk Calculus for Our Times,” Theory, Culture & Society 28, no. 6 (2011): 27. 20. Alexander Galloway, Gaming: Essays on Algorithmic Culture (Minneapolis: University of Minnesota Press, 2006), 103. 21. Nicholas Negroponte, Being Digital (New York: Vintage, 1995), 4. 22. Eugene Thacker, “Bioinformatics and Bio-logics,” Postmodern Culture 13, no. 2 (2003): 58. 23. Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston: Eamon Dolan / Houghton Mifflin Harcourt, 2013). 24. Tyler Reigeluth, “Why Data Is Not Enough: Digital Traces as Control of Self and Self-Control,” Surveillance & Society 12, no. 2 (2014): 249. 25.

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Published 14 Oct 2012

“Coupled Semi-Supervised Learning for Information Extraction.” In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM). Chomsky, Noam. 1957. Syntactic Structures. Paris: Mouton. Chuzhanova, N.A., A.J. Jones, and S. Margetts. 1998. “Feature selection for genetic sequence classification.” Bioinformatics 14(2):139–143. Culotta, Aron, Michael Wick, Robert Hall, and Andrew McCallum. 2007. “First-Order Probabilistic Models for Coreference Resolution.” In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL).

pages: 364 words: 99,897

The Industries of the Future
by Alec Ross
Published 2 Feb 2016

While Seltzer makes the case that virtually every bit of our personal information is now available to those who want it, I do think there are parts of our lives that remain private and that we must fight to keep private. And I think the best way to do that is by focusing on defining rules for data retention and proper use. Most of our health information remains private, and the need for privacy will grow with the rise of genomics. John Quackenbush, a professor of computational biology and bioinformatics at Harvard, explained that “as soon as you touch genomic data, that information is fundamentally identifiable. I can erase your address and Social Security number and every other identifier, but I can’t anonymize your genome without wiping out the information that I need to analyze.” The danger of genomic information being widely available is difficult to overstate.

pages: 696 words: 111,976

SQL Hacks
by Andrew Cumming and Gordon Russell
Published 28 Nov 2006

Mimer is also taking active part in the standardization of SQL as a member of the ISO SQL-standardization committee ISO/IEC JTC1/SC32, WorkGroup 3, Database Languages. You can download free development versions of Mimer SQL from http://www.mimer.com. Troels Arvin lives with his wife and son in Copenhagen, Denmark. He went half-way through medical school before realizing that computer science was the thing to do. He has since worked in the web, bioinformatics, and telecommunications businesses. Troels is keen on database technology and maintains a slowly growing web page on how databases implement the SQL standard: http://troels.arvin.dk/db/rdbms. Acknowledgments We would like to thank our editor, Brian Jepson, for his hard work and exceptional skill; his ability to separate the wheat from the chaff was invaluable.

pages: 359 words: 110,488

Bad Blood: Secrets and Lies in a Silicon Valley Startup
by John Carreyrou
Published 20 May 2018

In the process of writing this book, I reached out to all of the key figures in the Theranos saga and offered them the opportunity to comment on any allegations concerning them. Elizabeth Holmes, as is her right, declined my interview requests and chose not to cooperate with this account. Prologue November 17, 2006 Tim Kemp had good news for his team. The former IBM executive was in charge of bioinformatics at Theranos, a startup with a cutting-edge blood-testing system. The company had just completed its first big live demonstration for a pharmaceutical company. Elizabeth Holmes, Theranos’s twenty-two-year-old founder, had flown to Switzerland and shown off the system’s capabilities to executives at Novartis, the European drug giant.

pages: 424 words: 108,768

Origins: How Earth's History Shaped Human History
by Lewis Dartnell
Published 13 May 2019

‘Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa’, Proceedings of the National Academy of Sciences of the United States of America 103(25): 9578–83. López, S., L. van Dorp and G. Hellenthal (2015). ‘Human Dispersal Out of Africa: A Lasting Debate’, Evolutionary Bioinformatics Online 11(Suppl 2): 57–68. Lutgens, F. K. and E. J. Tarbuck (2000). The Atmosphere: An Introduction to Meteorology, 8th edition, Prentice Hall. Lyons, T. W., C. T. Reinhard and N. J. Planavsky (2014). ‘The rise of oxygen in Earth’s early ocean and atmosphere’, Nature 506: 307–15. Macalister, T. (2015).

A Brief History of Everyone Who Ever Lived
by Adam Rutherford
Published 7 Sep 2016

Nowadays it has become a tiresome cliché to say that a person’s passion or quintessential characteristic is ‘in their DNA’. The satirical magazine Private Eye has a whole column dedicated to this phrase flopping out of journalists’ and celebrities’ mouths. Well, Ewan Birney is a man with DNA in his DNA. These days he heads the European Bioinformatics Institute in Hinxton, just outside Cambridge, one of the great global genome powerhouses. While our contemporaries went off to Koh Samui or Goa to find themselves on their year off before going up to university, Ewan had won a place in the lab of James Watson, at Cold Spring Harbor, just at the birth of genomics, the biological science that would come to dominate all others.

pages: 292 words: 106,826

Boom: Bubbles and the End of Stagnation
by Byrne Hobart and Tobias Huber
Published 29 Oct 2024

“P-hacking,” whereby data is manipulated to make patterns appear statistically significant. The reproducibility crisis identified in Ioannidis’s paper has since been confirmed by myriad empirical studies. Almost every scientific field has been affected, from clinical trials in medicine to research in bioinformatics, neuroimaging, cognitive science, epidemiology, economics, political science, psychiatry, education, sociology, computer science, machine learning, and AI. But it’s not just the social sciences that are affected by the reproducibility crisis—even the so-called hard sciences are infected by it.

pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server
by Unknown
Published 13 Jan 2012

Without you, I wouldn't have this wonderful open source project to be so incredibly proud to be a part of! I look forward to meeting more of you at the next LuceneRevolution or Euro Lucene conference. About the Reviewers Jerome Eteve holds an MSc in IT and Sciences from the University of Lille (France). After starting his career in the field of bioinformatics, where he worked as a Biological Data Management and Analysis Consultant, he's now a Senior Application Developer with interests ranging from architecture to delivering a great user experience online. He's passionate about open source technologies, search engines, and web application architecture.

pages: 437 words: 113,173

Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance
by Ian Goldin and Chris Kutarna
Published 23 May 2016

Dwyer, Terence, PhD. (2015, October 1). “The Present State of Medical Science.” Interviewed by C. Kutarna, University of Oxford. 9. National Human Genome Research Institute (1998). “Twenty Questions about DNA Sequencing (and the Answers).” NHGRI. Retrieved from community.dur.ac.uk/biosci.bizhub/Bioinformatics/twenty_questions_about_DNA.htm. 10. Rincon, Paul (2014, January 15). “Science Enters $1,000 Genome Era.” BBC News. Retrieved from www.bbc.co.uk. 11. Regalado, Antonio (2014, September 24). “Emtech: Illumina Says 228,000 Human Genomes Will Be Sequenced This Year.” MIT Technology Review.

pages: 524 words: 120,182

Complexity: A Guided Tour
by Melanie Mitchell
Published 31 Mar 2009

The best-known applications are in the field of coding theory, which deals with both data compression and the way codes need to be structured to be reliably transmitted. Coding theory affects nearly all of our electronic communications; cell phones, computer networks, and the worldwide global positioning system are a few examples. Information theory is also central in cryptography and in the relatively new field of bioinformatics, in which entropy and other information theory measures are used to analyze patterns in gene sequences. It has also been applied to analysis of language and music and in psychology, statistical inference, and artificial intelligence, among many other fields. Although information theory was inspired by notions of entropy in thermodynamics and statistical mechanics, it is controversial whether or not information theory has had much of a reverse impact on those and other fields of physics.
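As a rough illustration of the entropy measure mentioned in this excerpt, the sketch below computes the Shannon entropy (in bits per symbol) of a short nucleotide string. The function name `shannon_entropy` and the sample sequences are assumptions made for this example and do not come from the book; real sequence analysis would typically work on windows or k-mers rather than on single letters of a whole string.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Return the Shannon entropy of a sequence in bits per symbol."""
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    # H = sum over symbols of p * log2(1 / p), with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical sequences: a single repeated base, an even mix of
    # four bases, and an even mix of two bases.
    for seq in ("AAAAAAAA", "ACGTACGT", "AACCAACC"):
        print(seq, shannon_entropy(seq))
```

A sequence made of one repeated base carries no uncertainty (0 bits), while an even mix of all four bases reaches the maximum of 2 bits per symbol for a four-letter alphabet, so low-entropy stretches can hint at repetitive or low-complexity regions.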

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
Published 21 Sep 2015

Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online at arxiv.org/pdf/cs/0504020v2.pdf). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs. “Engineers look to Kalman filtering for guidance,” by Barry Cipra (SIAM News, 1993), is a brief introduction to Kalman filters, their history, and their applications.

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline
by Cathy O'Neil and Rachel Schutt
Published 8 Oct 2013

What people might not know is that the “datafication” of our offline behavior has started as well, mirroring the online data collection revolution (more on this later). Put the two together, and there’s a lot to learn about our behavior and, by extension, who we are as a species. It’s not just Internet data, though—it’s finance, the medical industry, pharmaceuticals, bioinformatics, social welfare, government, education, retail, and the list goes on. There is a growing influence of data in most sectors and most industries. In some cases, the amount of data collected might be enough to be considered “big” (more on this in the next chapter); in other cases, it’s not. But it’s not only the massiveness that makes all this new data interesting (or poses challenges).

pages: 399 words: 118,576

Ageless: The New Science of Getting Older Without Getting Old
by Andrew Steele
Published 24 Dec 2020

Any errors or omissions are my own. I would like to thank the Francis Crick Institute for allowing me to continue as a visiting researcher, allowing me to retain access to the scientific literature which underpins this book, in particular to Nick Luscombe for giving a physicist a chance to work in biology, and to the whole Bioinformatics and Computational Biology Lab for helping give me the grounding without which I would not have been able to write it. I am also hugely indebted to my editors, Alexis Kirschbaum, Kristine Puopolo and Jasmine Horsey, for their faith in my writing, for finding the book you’ve just read hidden in my first draft and for making the editing process thoroughly enjoyable.

Succeeding With AI: How to Make AI Work for Your Business
by Veljko Krunic
Published 29 Mar 2020

Similar to many other areas that have captured the popular imagination, it’s not universally agreed what all the fields are that are a part of data science. Some of the fields that are often considered part of data science include statistics, programming, mathematics, machine learning, operational research, and others [66]. Closely related fields that are sometimes considered part of data science include bioinformatics and quantitative analysis. While AI and data science closely overlap, they aren’t identical, because AI includes fields such as robotics, which are traditionally not considered part of data science. Harris, Murphy, and Vaisman’s book [66] provides a good summary of the state of data science before the advancement of deep learning.

pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future
by Orly Lobel
Published 17 Oct 2022

Illiberal countries have used facial recognition and other technologies to surveil minorities, to control speech, and to rapidly extract immense amounts of behavioral, biometric, and genetic information. Indeed, in many ways the race is skewed in favor of less democratic and more authoritarian countries, which can mandate disclosures of bioinformatics, for example, and do not have the same privacy safeguards in place that slow down data collection and experimentation. To be sure, the same technology can serve to support and to surveil, to learn and to manipulate, to heal and to harm, to detect and to conceal, to equalize and to exclude. The silicon curtain is the new term to describe the barriers to the transfer of technology between China and the West.

pages: 502 words: 124,794

Nexus
by Ramez Naam
Published 16 Dec 2012

He was occupied at the moment, but would come by in a few hours. Niran hung up the phone, smiled to himself. It would be wonderful to see Thanom again. 35 ROOTS "I wasn't born Samantha Cataranes. I was born Sarita Catalan. I grew up in southern California, in a little town near San Diego. My parents were Roberto and Anita. They both worked in bioinformatics, had met on the job. I had a sister, Ana." Sorrow welled up from her. Tears began to flow again, silently running down the side of her face. Kade felt troubled, concerned, empathic. He stroked her hair, sent kindness. "My parents were hippies. The kind of hippies who worked in tech but went camping with the family, had singalongs with friends.

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
by Sharon Bertsch McGrayne
Published 16 May 2011

Ron Howard, who had become interested in Bayes while at Harvard, was working on Bayesian networks in Stanford’s economic engineering department. A medical student, David E. Heckerman, became interested too and for his Ph.D. dissertation wrote a program to help pathologists diagnose lymph node diseases. Computerized diagnostics had been tried but abandoned decades earlier. Heckerman’s Ph.D. in bioinformatics concerned medicine, but his software won a prestigious national award in 1990 from the Association for Computing Machinery, the professional organization for computing. Two years later, Heckerman went to Microsoft to work on Bayesian networks. The Food and Drug Administration (FDA) allows the manufacturers of medical devices to use Bayes in their final applications for FDA approval.

pages: 476 words: 120,892

Life on the Edge: The Coming of Age of Quantum Biology
by Johnjoe McFadden and Jim Al-Khalili
Published 14 Oct 2014

Olsson, “Increased transcription levels induce higher mutation rates in a hypermutating cell line,” Journal of Immunology, vol. 166: 8 (2001), pp. 5051–7. 8 P. Cui, F. Ding, Q. Lin, L. Zhang, A. Li, Z. Zhang, S. Hu and J. Yu, “Distinct contributions of replication and transcription to mutation rate variation of human genomes,” Genomics, Proteomics and Bioinformatics, vol. 10: 1 (2012), pp. 4–10. 9 J. Cairns, J. Overbaugh and S. Millar, “The origin of mutants,” Nature, vol. 335 (1988), pp. 142–5. 10 John Cairns on Jim Watson, Cold Spring Harbor Oral History Collection. Interview available at: http://library.cshl.edu/oralhistory/interview/james-d-watson/meeting-jim-watson/watson/. 11 J.

pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots
by John Markoff
Published 24 Aug 2015

Immediately after he read the message, two large men burst into his office and instructed him that it was essential he immediately accompany them to an undisclosed location in Woodside, the elite community populated by Silicon Valley’s technology executives and venture capitalists. This was Page’s surprise fortieth birthday party, orchestrated by his wife, Lucy Southworth, a Stanford bioinformatics Ph.D. A crowd of 150 people in appropriate alien-themed costumes had gathered, including Google cofounder Sergey Brin, who wore a dress. In the basement of the sprawling mansion where the party was held, a robot arm grabbed small boxes one at a time and gaily tossed the souvenirs to an appreciative crowd.

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
by Dipanjan Sarkar
Published 1 Dec 2016

Topic models are also often known as probabilistic statistical models, which use specific statistical techniques including singular value decomposition and latent Dirichlet allocation to discover connected latent semantic structures in text data that yield topics and concepts. They are used extensively in text analytics and even bioinformatics. Automated document summarization is the process of using a computer program or algorithm based on statistical and ML techniques to summarize a document or corpus of documents such that we obtain a short summary that captures all the essential concepts and themes of the original document or corpus.

pages: 458 words: 135,206

CTOs at Work
by Scott Donaldson , Stanley Siegel and Gary Donaldson
Published 13 Jan 2012

With a teammate we developed a brand-new type of biological sensor that we called “TIGER” (Threat ID through Genetic Evaluation of Risk). That technology won The Wall Street Journal “gold” Technology Innovation Award in 2009 for the best invention of the year. It relies on a combination of advanced biotech hardware with groundbreaking bio-informatics techniques that were based on our radar signal processing expertise. Information from a sensor like that can feed into our epidemiology and disease tracking work. That's an example of a sensor at the front end through information flow at the back end. In the cyber security domain, our subsidiary, CloudShield, has a very special piece of hardware that enables real-time, deep packet inspection of network traffic at network line speeds, and that allows you to find cyber threats embedded in the traffic.

pages: 532 words: 139,706

Googled: The End of the World as We Know It
by Ken Auletta
Published 1 Jan 2009

Measured by growth, it was Google’s best year, with revenues soaring 60 percent to $16.6 billion, with international revenues contributing nearly half the total, and with profits climbing to $4.2 billion. Google ended the year with 16,805 full-time employees, offices in twenty countries, and the search engine available in 117 languages. And the year had been a personally happy one for Page and Brin. Page married Lucy Southworth, a former model who earned her Ph.D. in bioinformatics in January 2009 from Stanford; they married seven months after Brin wed Anne Wojcicki. But Sheryl Sandberg was worried. She had held a ranking job in the Clinton administration before joining Google in 2001, where she supervised all online sales for AdWords and AdSense, and was regularly hailed by Fortune magazine as one of the fifty most powerful female executives in America.

pages: 486 words: 132,784

Inventors at Work: The Minds and Motivation Behind Modern Inventions
by Brett Stern
Published 14 Oct 2012

Dougherty: Oftentimes, inventors who are prosecuting their application pro se are unaware that they may ask the examiner for assistance in drafting allowable claims if there is allowable subject matter in the written disclosure. The examiner’s function is to allow valid patents. So, they will help the inventor come to an allowable subject matter if it exists in the application. Stern: Which technologies or fields exhibit high-growth trends in terms of patents? Calvert: One area that is going to be big is bioinformatics, which is biology and computer software working together. Dougherty: Medical device art is a high-growth area, too. People are living longer and they’re seeking to reduce costs for an enhanced life. Devices are getting smaller. Nanotechnology is already enabling medical devices, for example, that can travel through your bloodstream, collecting and reporting medical data in real time.

pages: 398 words: 31,161

Gnuplot in Action: Understanding Data With Graphs
by Philipp Janert
Published 2 Jan 2010

Then, the project had to be

■ Free and open source
■ Available for the Linux platform
■ Active and mature
■ Available as a standalone product and allowing interactive use (this requirement eliminates libraries and graphics command languages)
■ Reasonably general purpose (this eliminates specialized tools for molecular modeling, bio-informatics, high-energy physics, and so on)
■ Comparable to or going beyond gnuplot in at least some respects

C.3.1 Math and statistics programming environments

R

The R language and environment (www.r-project.org) are in many ways the de facto standard for statistical computing and graphics using open source tools.

pages: 445 words: 129,068

The Speed of Dark
by Elizabeth Moon
Published 1 Jan 2002

I believe God is important and does not make mistakes. My mother used to joke about God making mistakes, but I do not think if He is God He makes mistakes. So it is not a silly question. Do I want to be healed? And of what? The only self I know is this self, the person I am now, the autistic bioinformatics specialist fencer lover of Marjory. And I believe in his only begotten son, Jesus Christ, who actually in the flesh asked that question of the man by the pool. The man who perhaps (the story does not say) had gone there because people were tired of him being sick and disabled, who perhaps had been content to lie down all day, but he got in the way.

pages: 339 words: 57,031

From Counterculture to Cyberculture: Stewart Brand, the Whole Earth Network, and the Rise of Digital Utopianism
by Fred Turner
Published 31 Aug 2006

Like the scientists and technicians of the Rad Lab and Los Alamos in World War II, the contributors to the first Artificial Life Conference quickly established an intellectual trading zone. Specialists in robotics presented papers on questions of cultural evolution; computer scientists used new algorithms to model seemingly biological patterns of growth; bioinformatics specialists applied what they believed to be principles of natural ecologies to the development of social structures. For these scientists, as formerly for members of the Rad Lab and the cold war research institutes that followed it, systems theory served as a contact language and computers served as key supports for a systems orientation toward interdisciplinary work.

pages: 476 words: 148,895

Cooked: A Natural History of Transformation
by Michael Pollan
Published 22 Apr 2013

European Molecular Biology Organization, Vol. 7, No. 10, 2006. Bravo, Javier A., et al. “Ingestion of Lactobacillus Strain Regulates Emotional Behavior and Central GABA Receptor Expression in a Mouse Via the Vagus Nerve.” www.pnas.org/cgi/doi/10.1073/pnas.1102999108. Desiere, Frank, et al. “Bioinformatics and Data Knowledge: The New Frontiers for Nutrition and Food.” Trends in Food Science & Technology 12 (2002): 215–29. Douwes, J., et al. “Farm Exposure in Utero May Protect Against Asthma.” European Respiratory Journal 32 (2008): 603–11. Ege, M.J., et al. Parsifal study team. “Prenatal Farm Exposure Is Related to the Expression of Receptors of the Innate Immunity and to Atopic Sensitization in School-Age Children.”

pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking
by Foster Provost and Tom Fawcett
Published 30 Jun 2013

Neural networks for credit scoring. In Goonatilake, S., & Treleaven, P. (Eds.), Intelligent Systems for Finance and Business, pp. 61–69. John Wiley and Sons Ltd., West Sussex, England. Letunic, & Bork (2006). Interactive tree of life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics, 23 (1). Lin, J.-H., & Vitter, J. S. (1994). A theory for memory-based learning. Machine Learning, 17, 143–167. Lloyd, S. P. (1982). Least square quantization in PCM. IEEE Transactions on Information Theory, 28 (2), 129–137. MacKay, D. (2003). Information Theory, Inference and Learning Algorithms, Chapter 20.

pages: 772 words: 150,109

As Gods: A Moral History of the Genetic Age
by Matthew Cobb
Published 15 Nov 2022

The data point to a natural spillover event such as we have seen in the past and will almost certainly see again.112 The long, painstaking research that was required before the bat origin of SARS was identified explains why there was no immediate agreement on which animal species was the original host of SARS-CoV-2 – such things take a long time even in the absence of a global pandemic.113 One possible solution to concerns about identifying manipulated pathogens, and indeed a potential resolution to some of the more outlandish speculation about the origin of SARS-CoV-2, may lie in the use of genetic engineering forensics – complex bioinformatic analyses – to determine whether an organism involved in a disease outbreak has been genetically modified and, if so, to infer its likely origin. This work is in its infancy, but a network of laboratories, RefBio, has recently been set up under the auspices of the United Nations to gather sequence data from future events.114 ✴ Throughout the half-century history of genetic engineering there have been persistent concerns that the apparent simplicity of the methods involved might enable terrorists or biohackers to replicate experiments with potentially disastrous results.

pages: 504 words: 89,238

Natural language processing with Python
by Steven Bird , Ewan Klein and Edward Loper
Published 15 Dec 2009

[Heim and Kratzer, 1998] Irene Heim and Angelika Kratzer. Semantics in Generative Grammar. Blackwell, 1998. [Hirschman et al., 2005] Lynette Hirschman, Alexander Yeh, Christian Blaschke, and Alfonso Valencia. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 6, May 2005. Supplement 1. [Hodges, 1977] Wilfred Hodges. Logic. Penguin Books, Harmondsworth, 1977. [Huddleston and Pullum, 2002] Rodney D. Huddleston and Geoffrey K. Pullum. The Cambridge Grammar of the English Language. Cambridge University Press, 2002. [Hunt and Thomas, 2000] Andrew Hunt and David Thomas.

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies
by Nick Bostrom
Published 3 Jun 2014

Advances in Monte Carlo approximation techniques, for example, are directly applied in computer vision, robotics, and computational genetics. Another advantage is that it lets researchers from different disciplines more easily pool their findings. Graphical models and Bayesian statistics have become a shared focus of research in many fields, including machine learning, statistical physics, bioinformatics, combinatorial optimization, and communication theory.35 A fair amount of the recent progress in machine learning has resulted from incorporating formal results originally derived in other academic fields. (Machine learning applications have also benefitted enormously from faster computers and greater availability of large data sets

pages: 855 words: 178,507

The Information: A History, a Theory, a Flood
by James Gleick
Published 1 Mar 2011

The “jumping the shark” entry in Wikipedia advised in 2009, “See also: jumping the couch; nuking the fridge.” Is this science? In his 1983 column, Hofstadter proposed the obvious memetic label for such a discipline: memetics. The study of memes has attracted researchers from fields as far apart as computer science and microbiology. In bioinformatics, chain letters are an object of study. They are memes; they have evolutionary histories. The very purpose of a chain letter is replication; whatever else a chain letter may say, it embodies one message: Copy me. One student of chain-letter evolution, Daniel W. VanArsdale, listed many variants, in chain letters and even earlier texts: “Make seven copies of it exactly as it is written” [1902]; “Copy this in full and send to nine friends” [1923]; “And if any man shall take away from the words of the book of this prophecy, God shall take away his part out of the book of life” [Revelation 22:19].♦ Chain letters flourished with the help of a new nineteenth-century technology: “carbonic paper,” sandwiched between sheets of writing paper in stacks.

pages: 741 words: 164,057

Editing Humanity: The CRISPR Revolution and the New Era of Genome Editing
by Kevin Davies
Published 5 Oct 2020

In 2002, Eugene Koonin, a Russian expat computational biologist at the National Center for Biotechnology Information at the NIH, and his colleague Kira Makarova, described a series of bacterial genes they suspected to be part of a DNA repair system.10 What they didn’t realize was that these genes were sitting adjacent to the CRISPR array and, as we shall soon see, play an essential role in the function of CRISPR and gene editing. * * * After a few years working in Oxford, Mojica returned to Alicante in 1997 to set up his own group. With little funding, Mojica tried to do some very cheap experiments, “even though I had no idea about bioinformatics.” The nagging question was the origin of the spacer DNA, the sequences interspersed between the repeats. “The easiest thing is to look at the databases and expect that something comes out, but we didn’t get anything, until 2003.” By now, the DNA databases were bursting with bacterial and archaea genomes, many of which carried versions of these repeats.

pages: 612 words: 187,431

The Art of UNIX Programming
by Eric S. Raymond
Published 22 Sep 2003

XHTML, the latest version of HTML, is also an XML application described by a DTD, which explains the family resemblance between XHTML and DocBook tags. The XHTML toolchain consists of Web browsers that can format HTML as flat ASCII, together with any of a number of ad-hoc HTML-to-print utilities. Many other XML DTDs are maintained to help people exchange structured information in fields as diverse as bioinformatics and banking. You can look at a list of repositories to get some idea of the variety available. The DocBook Toolchain Normally, what you'll do to make XHTML from your DocBook sources is use the xmlto(1) front end. Your commands will look like this:

    bash$ xmlto xhtml foo.xml
    bash$ ls *.html
    ar01s02.html ar01s03.html ar01s04.html index.html

In this example, you converted an XML-DocBook document named foo.xml with three top-level sections into an index page and two parts.

pages: 821 words: 178,631

The Rust Programming Language
by Steve Klabnik and Carol Nichols
Published 14 Jun 2018

Through efforts such as this book, the Rust teams want to make systems concepts more accessible to more people, especially those new to programming. Companies Hundreds of companies, large and small, use Rust in production for a variety of tasks. Those tasks include command line tools, web services, DevOps tooling, embedded devices, audio and video analysis and transcoding, cryptocurrencies, bioinformatics, search engines, Internet of Things applications, machine learning, and even major parts of the Firefox web browser. Open Source Developers Rust is for people who want to build the Rust programming language, community, developer tools, and libraries. We’d love to have you contribute to the Rust language.

pages: 1,331 words: 183,137

Programming Rust: Fast, Safe Systems Development
by Jim Blandy and Jason Orendorff
Published 21 Nov 2017

We’ll also cover a wide range of topics that come up naturally as your project grows, including how to document and test Rust code, how to silence unwanted compiler warnings, how to use Cargo to manage project dependencies and versioning, how to publish open source libraries on crates.io, and more. Crates Rust programs are made of crates. Each crate is a Rust project: all the source code for a single library or executable, plus any associated tests, examples, tools, configuration, and other junk. For your fern simulator, you might use third-party libraries for 3D graphics, bioinformatics, parallel computation, and so on. These libraries are distributed as crates (see Figure 8-1). Figure 8-1. A crate and its dependencies The easiest way to see what crates are and how they work together is to use cargo build with the --verbose flag to build an existing project that has some dependencies.

pages: 648 words: 183,275

The Rust Programming Language, 2nd Edition
by Steve Klabnik and Carol Nichols
Published 27 Feb 2023

Through efforts such as this book, the Rust teams want to make systems concepts more accessible to more people, especially those new to programming. Companies Hundreds of companies, large and small, use Rust in production for a variety of tasks, including command line tools, web services, DevOps tooling, embedded devices, audio and video analysis and transcoding, cryptocurrencies, bioinformatics, search engines, Internet of Things applications, machine learning, and even major parts of the Firefox web browser. Open Source Developers Rust is for people who want to build the Rust programming language, community, developer tools, and libraries. We’d love to have you contribute to the Rust language.

pages: 933 words: 205,691

Hadoop: The Definitive Guide
by Tom White
Published 29 May 2009

This is a good example where both SQL and MapReduce are required for solving the end user problem and something that is possible to achieve easily with Hive. Data analysis Hive and Hadoop can be easily used for training and scoring for data analysis applications. These data analysis applications can span multiple domains such as popular websites, bioinformatics companies, and oil exploration companies. A typical example of such an application in the online ad network industry would be the prediction of what features of an ad makes it more likely to be noticed by the user. The training phase typically would involve identifying the response metric and the predictive features.

pages: 1,201 words: 233,519

Coders at Work
by Peter Seibel
Published 22 Jun 2009

But we have to be willing to try and take advantage of that, but also take advantage of the integration of systems and the fact that data's coming from everywhere. It's no longer encapsulated with the program, the code. We're seeing now, I think, vast amounts of data, which is accessible. And it's numeric data as well as the informational kinds of data, and will be stored all over the globe, especially if you're working in some of the bioinformatics kind of stuff. And we have to be able to create a platform, probably composed of a lot of parts, which is going to enable those things to come together—computational capability that is probably quite different than we have now. And we also need to, sooner or later, address usability and integrity of these systems.

pages: 798 words: 240,182

The Transhumanist Reader
by Max More and Natasha Vita-More
Published 4 Mar 2013

Consequently, there is an unbridgeable gap which would-be enhancers cannot ethically cross. This view incorporates a rather static view of what it will be possible for future genetic enhancers to know and test beforehand. Any genetic enhancement techniques will first be extensively tested and perfected in animal models. Second, a vastly expanded bioinformatics enterprise will become crucial to understanding the ramifications of proposed genetic interventions (National Resource Center for Cell Analysis). As scientific understanding improves, the risk versus benefit calculations of various prospective genetic enhancements of embryos will shift. The arc of scientific discovery and technological progress strongly suggests that it will happen in the next few decades.

pages: 903 words: 235,753

The Stack: On Software and Sovereignty
by Benjamin H. Bratton
Published 19 Feb 2016

This also relates to what Heidegger once called our “confrontation with planetary technology” (an encounter that he never managed to actually make and which most Heideggerians manage to endlessly defer, or “differ”).15 That encounter should be motivated by an invested interest in several “planetary technologies” working at various scales of matter, and based on, in many respects, what cheap supercomputing, broadband networking, and isomorphic data management methodologies make possible for research and application. These include, but are by no means limited to, geology (e.g., geochemistry, geophysics, oceanography, glaciology), earth sciences (e.g., focusing on the atmosphere, lithosphere, biosphere, hydrosphere), as well as the various programs of biotechnology (e.g., bioinformatics, synthetic biology, cell therapy), of nanotechnology (e.g., materials, machines, medicines), of economics (e.g., modeling price, output cycles, disincentivized externalities), of neuroscience (e.g., behavioral, cognitive, clinical), and of astronomy (e.g., astrobiology, extragalactic imaging, cosmology).

pages: 1,373 words: 300,577

The Quest: Energy, Security, and the Remaking of the Modern World
by Daniel Yergin
Published 14 May 2011

We did not know that DNA was the genetic material until 1946. The Green Revolution in the late 1960s was an example of beginning to apply modern biology to plant improvement.”19 Many of the people working in this field are applying the know-how that emerged from the sequencing of the human genome. Calling on the new fields of bioinformatics and computational biology, and using what is called high-throughput experimentation, they seek to identify specific genes and their functions. The aim is to speed up the process of evolution, selecting for characteristics that will make such tall grasses as miscanthus and switchgrass effective energy crops that can grow in marginal lands that would not be cultivated for food.

pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition
by Robert N. Proctor
Published 28 Feb 2012

MCV faculty also helped undermine public health advocacy: in 1990 James Kilpatrick from biostatistics, working also as a consultant for the Tobacco Institute, wrote to the editor of the New York Times criticizing Stanton Glantz and William Parmley’s demonstration of thirty-five thousand U.S. cardiovascular deaths per annum from exposure to secondhand smoke.49 Glantz by this time was commonly ridiculed by the industry, which even organized skits (to practice courtroom scenarios) in which health advocates were given thinly disguised names: Glantz was “Ata Glance” or “Stanton Glass, professional anti-smoker”; Alan Blum was “Alan Glum” representing “Doctors Ought to Kvetch” or “Doctors Opposed to People Exhaling Smoke” (DOPES); Richard Daynard was “Richard Blowhard” from the “Product Liability Education Alliance,” and so forth.50 VCU continues even today to have close research relationships with Philip Morris, covering topics as diverse as pharmacogenomics, bioinformatics, and behavioral genetics.51 SYMBIOSIS It would be a mistake to characterize this interpenetration of tobacco and academia as merely a “conflict of interest”; the relationship has been far more symbiotic. We are really talking about a confluence of interests, and sometimes even a virtual identity of interests.