Champions of Change Blog
Open Science for a Changing Planet
Posted on June 20, 2013 at 6:41 PM EDT

Rebecca Moore is being honored as a Champion of Change for the vision she has demonstrated and for her commitment to open science.
In 2008, I was asked by indigenous Amazon Indians in Brazil to come and teach them how to use Google Earth to defend their land from threats like illegal logging. While I was there a leading geoscientist approached me. He said that while Google Earth and Maps were great tools for visualizing satellite imagery and map data, they were not optimal for conducting scientific analysis of that data such as mapping trends and monitoring change in places like the Brazilian Amazon. He wondered, would Google consider building such a technology to address the need for monitoring change?
I soon learned that more than a million acres of the Brazilian Amazon were disappearing every year, often due to illegal logging in remote parts of the rainforest where law enforcement on the ground was spread thin. From a climate perspective, deforestation is estimated to account for between 14 and 20 percent of global greenhouse gas emissions. That is more than all of the world's transportation combined (cars, planes, trucks, ships, and so on).
To tackle this problem, freely available daily satellite imagery could be used as the foundation of an automated alerting system powered by scientific algorithms. However, the scale of the data and of the processing required was daunting. Monitoring change in the Amazon requires many terabytes of satellite imagery, and the analysis can take weeks or months to run on a single computer. By the time deforestation is discovered, it is often too late to act.
I brought this challenge back to Google headquarters, and we built a new technology platform that we call Google Earth Engine. This analytical engine works with all the world’s satellite and environmental data, both historical and daily-updating, and it provides an easy-to-use software framework for scientists to run their algorithms on thousands of computers in parallel in Google’s data centers. With a team of some of the best software engineers at Google, we developed collaborations with government agencies such as NASA, USGS, and NOAA to bring their treasure trove of decades of planetary data out of tape vaults. We put it online, into Google Earth Engine, and made it ready for analysis.
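To give a flavor of what such an analysis looks like, here is a minimal sketch using the publicly available Earth Engine Python client. The collection ID, band names, dates, and region below are illustrative assumptions rather than our production alerting pipeline; the sketch simply compares median NDVI (a standard vegetation index) between two years over a small patch of the Amazon.

```python
# A sketch of change detection with the Earth Engine Python client.
# The collection ID, bands, dates, and region are illustrative assumptions,
# not the production deforestation-alerting system.
import ee

ee.Initialize()

# A small, arbitrary rectangle in the Brazilian Amazon (longitude/latitude degrees).
region = ee.Geometry.Rectangle([-63.0, -9.5, -62.5, -9.0])

def yearly_ndvi(year):
    """Median NDVI over the region for one year of Landsat 5 scenes."""
    collection = (ee.ImageCollection('LANDSAT/LT05/C01/T1_TOA')
                  .filterBounds(region)
                  .filterDate(f'{year}-01-01', f'{year}-12-31'))
    # Bands B4 (near-infrared) and B3 (red) give the vegetation index for Landsat 5.
    return collection.map(lambda img: img.normalizedDifference(['B4', 'B3'])).median()

# Positive values indicate vegetation decline between the two years.
loss = yearly_ndvi(2000).subtract(yearly_ndvi(2005))

mean_loss = loss.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=region, scale=30, maxPixels=1e9)
print(mean_loss.getInfo())
```

Because the computation runs in Google's data centers, next to the imagery, scaling from one small rectangle to the entire Amazon is mostly a matter of changing the region rather than rebuilding the pipeline.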
We launched Earth Engine in 2010, and I’m happy to say that now more than a thousand scientists all over the world (e.g., in the United States, Brazil and Australia) are using it for everything from monitoring deforestation to forecasting drought. They are estimating agricultural crop yield and predicting where chimpanzees are likely to build their nests.
In collaboration with scientist Matthew Hansen and CONAFOR, Mexico’s National Forestry Commission, we produced the highest-resolution forest and water map of Mexico ever created. The analysis drew on more than 53,000 Landsat images and required 15,000 hours of computation, yet it was completed in less than a day on Google Earth Engine by running on 1,000 computers in parallel. On a single computer it would have taken almost three years.
We also recently used Google Earth Engine to create a stunning historical perspective on the changes to the Earth’s surface over time. We compiled more than a quarter-century of Landsat images of Earth (millions of images and trillions of pixels) into an interactive time-lapse experience – perhaps the most comprehensive picture of our changing planet ever made available to the public. Now, anyone can view phenomena such as the sprouting of Dubai’s artificial Palm Islands, the retreat of Alaska’s Columbia Glacier, the deforestation of the Brazilian Amazon, and urban growth in Las Vegas from 1984 to 2012.
As we continue to develop the platform, we hope more scientists will use the new Earth Engine API to integrate their applications online, for disease mitigation, disaster response, and other beneficial uses. If you’re interested in partnering with us, we want to hear from you: visit our website! We look forward to seeing what’s possible when scientists, governments, NGOs, universities, and others gain access to open data and computing resources to collaborate online. Together we can advance science, inform policy, democratize access to satellite data, and help protect the Earth’s environment.
Rebecca Moore is an Engineering Manager at Google.
Learn more about Technology

GenoSpace: Making Big Data in Health Care Useful and Usable
Posted on June 20, 2013 at 6:34 PM EDT

John Quackenbush is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.
In a June 2000 press conference, President Bill Clinton announced the initial survey of the entire human genome. In describing the extraordinary scientific and technical achievement, Clinton remarked, “Without a doubt, this is the most important, most wondrous map ever produced by humankind.”
Announced at the dawn of a new millennium, the completion of that first genome sequence sparked excitement worldwide and ignited the imaginations of both the scientific community and the general public. By unlocking the genetic blueprint found in each of our cells, we gained a unique window, not only into the commonalities shared among all humans, but also into what makes each of us unique. Most importantly, decoding the genome created the opportunity for rapid advances in medical research that we hoped would lead to a promising era of personalized medicine. Yet for more than a decade, the prohibitive cost of genome sequencing slowed our progress.
But no longer. Thanks to new, advanced technologies, the cost of sequencing a genome has plummeted and is continuing to decrease rapidly. What once took years and cost billions of dollars can now be carried out in little more than a day, and for less than $3,000. However, one challenging question remains: How can we efficiently and effectively collect, manage, analyze, interpret, and share vast amounts of sequencing data — all six billion chemical “letters” gleaned from our genomes and the information they encode — in ways that make them usable, useful, and above all, private and secure?
The answer to this ultimate “Big Data” question has always been clear to my colleague Mick Correll and me. After years of handling genomic data and developing analytical methods for discovering genomic factors that influence disease, we came to understand the limitations of existing technologies for meeting the unique challenges of this new genomic era. So in 2011 we launched GenoSpace, a company focused on developing innovative computational tools for making genomic data accessible. We have created a cloud-computing environment that enables complex genomic and clinical data to be securely stored, accessed, and analyzed, as well as tools that make that information available to a broad range of users, including scientists, drug companies, health care organizations, physicians, and patients themselves, all of whom have different needs.
Our partnership with the Multiple Myeloma Research Foundation (MMRF) represents an important example of how our technology can help drive innovation. The MMRF’s CoMMpass study is an open-science project that unites competitors from the pharmaceutical and biotech industries, academic institutions, and community cancer centers through a publicly available, IP-free data ecosystem focused on finding a cure for this devastating disease. And we at GenoSpace are proud to have created the MMRF Researcher Gateway, featuring some of the most advanced and innovative data-analysis tools available, which is open to the world. We are also excited to have developed the MMRF Community Gateway, a new resource that will bring patients together to share information and become active partners in searching for a cure.
My colleagues at GenoSpace and I are honored by this recognition as a White House Open Science Champion of Change. Nearly every major scientific revolution has been driven by access to new sources of data and information. GenoSpace is proud to be a leader in driving the revolution in biomedical research and health care by creating tools that foster collaboration, spur medical research, and accelerate the pace of scientific progress — all the while recognizing the important role that each individual patient can play in the process of medical discovery.
John Quackenbush is a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and the Harvard School of Public Health. He is also CEO of GenoSpace, which he and Mick Correll co-founded in 2011.
Learn more about Technology

Reaching for the Stars: Bringing the Cosmos to a Computer Near You
Posted on June 20, 2013 at 6:30 PM EDT

Jeremiah Ostriker is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.
I have been very lucky.
Growing up in a big city where the lights obscured the night sky, I became more and more curious about what was “out there,” above the bustle of urban life, above the US, far above the little speck of dust we call planet Earth. Books on the solar system of planets and on the world of stars that made up our Milky Way galaxy fascinated me. Big telescopes, on the ground and in the good climate of the Western part of our country, were beginning to peer into the wider universe of galaxies and cosmology. Hubble had discovered the expanding universe, whose fate was now a serious scientific subject, no longer in the realm of philosophy or theology but a domain where measurement and calculation based on physical laws were possible.
At that time, astronomers applied for time on telescopes, took observations, and then carried the results to their home institutions where they sat in desk drawers until years of slow analysis allowed the results to be published in technical journals. In my first job as a student at Yerkes Observatory in Wisconsin, I sat for months with a mechanical calculator analyzing data (and making mistakes) for a senior astronomer.
Needless to say, all data was proprietary, and competition was fierce. If someone else published results before you did, it was a disaster, so cooperation between groups was unheard of. Science moved – but relatively slowly.
But, starting in the mid-1960s, roughly a half-century ago, revolutionary changes, driven largely by technology, changed the game, and progress accelerated at an extraordinary rate. New electronic detectors were far more efficient, and the information was recorded electronically, making the use of digital computers possible. Astronomy took off.
With a boost from “Moore’s Law” (the increase of computing power by a factor of two every 18 months), we went from bits to megabytes to terabytes in a few decades. Instead of looking at one spot or another on the sky and storing what we saw, we could now survey large areas; we could do it in several colors, from the far infrared to the ultraviolet. And we could do it again and again to see what was changing in the distant universe. The revolution was incredible.
Then telescopes were placed on satellites. The obscuring and blurring effects of our atmosphere were bypassed, and astonishing results could be obtained, regardless of the cloud cover, 24/7. Now we astronomers had leapt off the planet ourselves. Policies on the availability and distribution of the results had to change, and did change in a fashion as revolutionary as the results themselves.
The Sloan Digital Sky Survey exemplifies this transition in science. Astronomers, who had experience in cooperating with one another across boundaries since the Renaissance, led the way. It was initiated in conversations in Princeton in the late 1980s with Jim Gunn leading on the technical side and me on the organizational aspects. In the next several years a consortium was established with funds raised from private sources, universities, the Sloan Foundation, and several US government agencies to put a relatively small (2.5m) telescope at Apache Point Observatory in New Mexico and repeatedly scan the available sky in a dedicated mode. Over 40 institutions, both US and foreign, have taken active roles in the enterprise. The results have far surpassed the wildest early expectations.
Very early on in the endeavor, Principles of Operation were established that made open to all participants all of the data acquired by any of them – a startling change from most existing astronomical practice. It was further required that all data be archived and made available to the scientific world and to the public within two years after it was initially acquired.
This practice had an extraordinary, and intended, byproduct. It became advantageous to every participant that the instruments and data-reduction techniques of the other scientists be maximally effective! Thus, cooperation among the various teams was enthusiastic and effective from the very beginning.
In operation since 2000, SDSS has archived 70 terabytes of data covering a third of the celestial sphere, including over half a billion galaxies, stars, quasars, and asteroids. It has resulted in over 5,000 refereed publications that have been cited over 200,000 times. This record tops that of any previous ground-based astronomical project, an astonishing outcome for a relatively small telescope, and it is due in no small part to the “Open Science” character of its charter and operations.
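To make the openness concrete: anyone can query the public SDSS archive today. The sketch below uses the community-maintained astroquery package (not an SDSS deliverable), and the particular SQL query is only an example of the kind of question the open catalog can answer.

```python
# Querying the public SDSS archive with astroquery (a community package).
# The SQL below is an illustrative example, not part of the SDSS project itself.
from astroquery.sdss import SDSS

query = """
SELECT TOP 10 specObjID, ra, dec, z
FROM SpecObj
WHERE class = 'QSO' AND zWarning = 0
ORDER BY z DESC
"""

# Returns an astropy Table of the ten highest-redshift clean quasar spectra.
quasars = SDSS.query_sql(query)
print(quasars)
```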
The successor program, the Large Synoptic Survey Telescope (LSST), is in the President’s 2014 budget. A much larger telescope at a superb Chilean site is planned to extend and enhance this policy of open distribution of data, with results that should greatly extend our knowledge and understanding of the universe around us. The story of this half-century revolution of cosmological discovery and comprehension is told in my recent book – “Heart of Darkness: Unraveling the Mysteries of the Invisible Universe.”
I have been lucky indeed to have lived through and participated in this exciting adventure.
Jeremiah Ostriker is a Professor of Astronomy at Columbia University.
Learn more about Technology

You’ve Got to See it Before You Can Read it! Making Ancient Texts and Images Available on the Web
Posted on June 20, 2013 at 6:25 PM EDT

William Noel is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.
I was one of those people who found their passion early. When I was about six, my Dad gave me a book called The Nursery History of England. It started out with “Little Men and Big Beasts,” and it ended with Queen Victoria. Not that I spent much time at the end of the book. I got bored when the book reached 1381, the date of the Peasants’ Revolt. I just read the beginning, with pictures of Alfred the Great and King Canute and King Harold and the Battle of Hastings, over and over again. I was hooked on medieval history.
My expensive English education only confirmed my life path. My history teacher was sublime; my science teachers were terrible: physics was taught by a man who fenced for England, biology by a man who jogged across the United States, and chemistry by the UK hockey coach. But they couldn’t teach the sciences, and if I had any latent interest in science, it totally died.
Fast forward 25 years, and I was the curator of a wonderful collection of illuminated medieval manuscripts at The Walters Art Museum in Baltimore. I knew nothing about the digital revolution that was about to explode, and neither did the museum. And then a private collector left on my desk an old book called The Archimedes Palimpsest. Its important texts - which included unique works by the ancient Greek mathematician - had been erased in the thirteenth century, and the private collector charged me with making them legible. This developed into a worldwide project that involved multispectral imaging in Baltimore and X-ray fluorescence imaging at the Stanford Linear Accelerator Center. It also involved the work of scholars of Greek texts throughout the world. It was a cool project, we discovered neat things, and I finally came to realize that science – ancient and modern – was actually incredibly cool.
The importance of the project in this context, however, is that the owner of the book insisted that we publish the raw data as a set of flat files on the Internet, for anyone to use however they liked, and for free. I thought this was a nutty idea. How were people actually going to read the book: Would they have to open up each of the files in turn? Why were we not building an interface for people to conveniently view the book? Surely that was what was needed...
As so often in this project, I was wrong. The point is that anyone could build an interface; anyone could do with this data exactly what they wanted. They could ingest it into their own institutional repositories, they could further process the images, and they could create interfaces to read the text. And that is exactly what has happened. The dataset is now replicated in libraries around the world, and the images are being enjoyed by all sorts of people in all sorts of different contexts. The project is over, but the data and its manipulation live on in an open environment.
This experience fundamentally transformed me as a curator of rare materials. With a wonderful crew of people, and with funding from the National Endowment for the Humanities (NEH), I started to digitize the illuminated manuscripts under my care and present them on the web in the same form as the Archimedes data. The result is that these are now the easiest medieval-manuscript images to find on the web: just try finding them with a Google image search! The traditional audience for these materials is grateful, and entirely new audiences have been reached.
The great problem in my field is that so few repositories of ancient books make digital images of their material available in truly useful ways: the data needs to be free, it needs to be published at the resolution at which it is captured, and it needs to be presented outside any fancy interfaces so that others can ingest it and use it as they like with the least “friction” possible. The web of medieval manuscripts in the future isn’t going to be built by institutions; it’s going to be built by users who are going to present the data as they want to present it, to answer the questions that they want to ask. The institutions need only provide the data – but they do have to provide the data! I now direct The Schoenberg Institute for Manuscript Studies at The University of Pennsylvania, which is in part dedicated to making this happen.
I want to use this opportunity to talk about another fascinating dataset created by the same team that created the Archimedes Palimpsest data. Like the Archimedes manuscript, this one too is a palimpsest – with the important text scraped off. The erased text was written in the ninth century, and it is by far the fullest witness to a Syriac translation of Galen’s On Simple Drugs by Sergius of Res ‘Ayna. There could well be other texts in the manuscript that have yet to be identified. A group of Syriac scholars is working on this, but the text was much more thoroughly erased than the Archimedes text, and what is needed is a campaign by people who can process the raw data to create legible images for these scholars. So, if you are an image processor and feel up to the Indiana Jones Challenge, have a go at it, and send me the results. I’ll put you in touch with the right people! Here is the dataset.
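For anyone tempted by that challenge, here is one generic starting point, not the project team's own processing chain: a pseudocolor composite that combines a registered visible-light capture with an ultraviolet capture of the same folio, so that erased text, which typically responds more strongly under UV, stands out as a color shift. The filenames are hypothetical placeholders.

```python
# A generic pseudocolor starting point for palimpsest imagery.
# Filenames are hypothetical placeholders; real folios and band names will
# differ, and this is not the project's own processing pipeline.
import numpy as np
from PIL import Image

def load_gray(path):
    """Load a registered capture as a floating-point grayscale array."""
    return np.asarray(Image.open(path).convert('L'), dtype=np.float64)

def stretch(band, low=2, high=98):
    """Percentile contrast stretch to the 0..1 range."""
    lo, hi = np.percentile(band, (low, high))
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

visible = load_gray('folio_021v_tungsten.tif')         # visible-light capture
ultraviolet = load_gray('folio_021v_ultraviolet.tif')  # UV-fluorescence capture

# The overtext appears in both bands and stays roughly neutral; the erased
# undertext, stronger under UV, shows up as a color cast in the composite.
rgb = np.dstack([stretch(visible), stretch(ultraviolet), stretch(ultraviolet)])
Image.fromarray((rgb * 255).astype(np.uint8)).save('folio_021v_pseudocolor.png')
```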
William Noel is the Director of the Schoenberg Institute for Manuscript Studies, and the Director of the Special Collections Center at the University of Pennsylvania.
Learn more about Technology

GenBank and PubMed Central: Creating the Tools for Scientific Discovery
Posted on June 20, 2013 at 6:23 PM EDT

David Lipman is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.
The National Institutes of Health (NIH) has a long history of supporting open access to the research it funds. That approach, I believe, recognizes the fact that science is cumulative, and that the greatest benefit to public health will be achieved if scientists can rapidly and easily access the research that has come before them. As NIH explains in a 2003 policy statement, “data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health.” I wholeheartedly agree.
At the National Center for Biotechnology Information (NCBI), the division of NIH’s National Library of Medicine (NLM) that I direct, we produce and make available more than 40 online databases. All of our databases are freely available to the public. However, two that people often think of when considering open access are GenBank, our database of all publicly available DNA sequences (including the sequences from the Human Genome Project), and PubMed Central, our online archive of peer-reviewed biomedical sciences literature.
PubMed Central, or PMC as we commonly call it, is also the repository for articles submitted in compliance with the NIH Public Access Policy. The policy, which was implemented in 2008 as a result of legislation, requires that papers arising from NIH-funded research be made publicly available in PMC within 12 months of publication. As a result of the legislation and policy, hundreds of thousands of research papers have been made available to researchers, medical professionals, educators/students, and the general public.
PMC, however, is more than just a repository for scientific articles: it is an integral part of a larger information infrastructure that aims to accelerate biomedical discovery. One of the key concepts we focus on here at NCBI is trying to surface information that is relevant to a user’s query, but that they may not have thought to look for. That is, we try to help them look under that rock that they might have otherwise passed by. We are able to do this because of the underlying integration of the data and information in our databases. That integration also allows users to easily move between different types of related data and information – for example, going from a genetic sequence to a published article that cites that sequence, and then to the structure of a protein related to that same sequence.
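That kind of hop between databases is exposed programmatically through the public NCBI E-utilities. The sketch below, written with Biopython's Entrez module, goes from a nucleotide search to the PubMed articles NCBI links to one of the matching records; the search term and e-mail address are placeholders.

```python
# Moving between linked NCBI databases with Biopython's Entrez module, which
# wraps the public NCBI E-utilities. The search term and e-mail address are
# placeholders for illustration.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI asks clients to identify themselves

# 1. Find one GenBank nucleotide record matching an illustrative query.
search = Entrez.read(Entrez.esearch(
    db="nucleotide", term="BRCA1 AND Homo sapiens[Organism]", retmax=1))
uid = search["IdList"][0]

# 2. Follow NCBI's precomputed links from that sequence to PubMed articles.
linksets = Entrez.read(Entrez.elink(dbfrom="nucleotide", db="pubmed", id=uid))
linked = linksets[0].get("LinkSetDb", [])
pmids = [link["Id"] for link in linked[0]["Link"]] if linked else []

# 3. Fetch the first linked article as a plain-text citation and abstract.
if pmids:
    handle = Entrez.efetch(db="pubmed", id=pmids[0], rettype="abstract", retmode="text")
    print(handle.read())
```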
Our hope, and my belief, is that these efforts further enhance the ability to make discoveries and bring added value to open-access information. I appreciate receiving this Champions of Change award, but I'd like to emphasize that it is the talented and hard-working folks at NCBI who have made our databases and services so well-received.
David Lipman is the Founding Director of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health’s National Library of Medicine (NLM).
Learn more about Technology

Indiana Jones and the Dungeon of Lost Data
Posted on June 20, 2013 at 6:03 PM EDT

Eric Kansa is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.
Many people will remember the closing scene of Raiders of the Lost Ark, where the artifact at the center of the movie's plot ends up in a vast warehouse, presumably never to be seen again. Fewer people realize that this scene, more than any other aspect of the movie, reflects something of the reality of archaeology in practice. Here's how.
Archaeology explores the history and development of people and societies through the documentation and analysis of the physical remains of human activities. While most people think archaeology focuses on artifacts, artifacts are only part of the picture. In fact, artifacts mean very little without detailed information about context: the positions of and relationships between artifacts, other finds (food remains, architectural debris, etc.), stratigraphy (the layering of deposits), and other lines of evidence. Archaeologists today build complex databases to fully document and describe artifacts, other finds, and critical contextual information. Without sharing and preserving this data, artifacts can easily be as “lost” as the Ark hidden away in the warehouse.
For a variety of reasons, archaeologists find it difficult to share and preserve this irreplaceable information that is key to understanding ancient societies. That's what motivated Eric Kansa and Sarah Whitcher Kansa, ten years ago, to step out of the traditional academic career path, launch a nonprofit “start-up” (the Alexandria Archive Institute), and develop Open Context, a system that provides new ways for archaeologists to publish the full richness of the data they create. A key innovation of Open Context is to advance data sharing as a form of publication. This means Open Context not only works to archive data for preservation, but also works to expand the quality and expressive power of data and to make open data a rich and meaningful part of scientific communications. This approach can free archaeology from the confines of the ivory tower and make data accessible to everyone, to advance research, to teach, or simply to appreciate and enjoy.
Open Context stands as one of many efforts working to make our knowledge of the human experience, our evolution and cultural heritage, freely available for exploration and debate. Open Context publishes editorially vetted, peer-reviewed data and archives this content with the California Digital Library, a vast repository preserving information from many disciplines. Leveraging the power of the Web, Open Context links its published content with other data published by museums, online maps, Wikipedia, and other open databases and digital archives shared by researchers worldwide.
This rich linking helps identify unexpected connections across the Web. For example, the Open Context team is working with the Encyclopedia of Life to link archaeological data documenting how ancient people in the Middle East domesticated animals with databases that encode relationships between genes and the physiological processes involved in growth and development in people and animals. Open access and open data help us to realize the rich interconnections between all areas of knowledge, spanning the humanities, the natural sciences, and medicine.
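The mechanics behind such linking are simple in spirit: published records carry stable web identifiers and machine-readable links that anyone can follow. The sketch below illustrates the general pattern with Python's requests library; the record URI is a made-up placeholder rather than a documented Open Context endpoint, and real records will have their own structure.

```python
# The general linked-open-data pattern: fetch a machine-readable record, then
# harvest every outbound web link it declares. The URI below is a made-up
# placeholder, not a documented Open Context endpoint.
import requests

record_uri = "https://opencontext.org/subjects/EXAMPLE-RECORD.json"  # hypothetical
record = requests.get(record_uri, timeout=30).json()

def collect_links(node, found=None):
    """Recursively gather anything in the JSON document that looks like a URL."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_links(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_links(item, found)
    elif isinstance(node, str) and node.startswith("http"):
        found.add(node)
    return found

# Pointers to museum records, gazetteers, Encyclopedia of Life pages, and so on.
for url in sorted(collect_links(record)):
    print(url)
```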
Archaeology is inherently multidisciplinary, requiring collaboration between natural scientists, social scientists, and humanists. In working to meet archaeology's data needs, the Open Context team gained invaluable experience and expertise, now applied in other areas of open science. Eric Kansa has used this experience to help open data and open government efforts in public health and with NASA, and he even participated in a panel discussion on information architectures for open government with Vivek Kundra, the first CIO of the United States.
Open Data has a vibrant and growing community, which freely shares knowledge and innovations, and is working together to solve problems. It's a network that breaks down legal, technical and bureaucratic barriers to foster collaboration. This network also works to reclaim and reinvigorate notions of the “public good,” an ideal long undermined by narrow interests that have co-opted so much of our public investment in research and education.
Since its inception, Open Context has aimed to make research outcomes freely accessible to the widest possible community. Yet for openness to work, it cannot only be about the data outputs of research - we also have to care about the inputs. Eric Kansa's collaboration with IPinCH, an international effort exploring intellectual property and privacy issues in different cultural settings, shows that research impacts many communities. Those communities need to have a say in how research is conducted and how research outcomes should be communicated.
In that vein, the Open Context team is also beginning a new program, aimed at the next generation of researchers, to develop expertise in the technical, theoretical and ethical challenges inherent in data. This will help students not only to better adapt to a radically changed professional environment, but also to become future leaders in creating and using data with greater thought and care – a critical need given the expanding role of data in virtually every aspect of our lives.
The Open Context team is grateful to the American public, which has supported their work through grants from the National Endowment for the Humanities and the National Science Foundation. Additional funding has come from the William and Flora Hewlett Foundation, the Alfred P. Sloan Foundation, the American Council of Learned Societies, and others.
Eric Kansa is an Archaeologist.
Learn more about Service, Technology