Latest News

Indiana Jones and the Dungeon of Lost Data

Posted by Eric Kansa on June 20, 2013 at 5:03 PM EST

Eric Kansa is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

Many people will remember the closing scene of Raiders of the Lost Ark, where the artifact at the center of the movie's plot ends up in a vast warehouse, presumably never to be seen again. Many people don't realize that scene, more than any other aspect of the movie, reflects something of the reality of archaeology in practice. Here's how -

Archaeology explores the history and development of people and societies through the documentation and analysis of the physical remains of human activities. While most people think archaeology focuses on artifacts, artifacts are only part of the picture. In fact, artifacts mean very little without detailed information about context - the position and relationships between artifacts, other finds (food remains, architectural debris, etc.), stratigraphy (layering of deposits), and other lines of evidence. Archaeologists today build complex databases to fully document and describe artifacts, other finds, and critical contextual information. Unless this data is shared and preserved, artifacts can easily be as “lost” as the Ark hidden away in the warehouse.

For a variety of reasons, archaeologists find it difficult to share and preserve this irreplaceable information that is key to understanding ancient societies. That's what motivated Eric Kansa and Sarah Whitcher Kansa, ten years ago, to step out of the traditional academic career path, launch a nonprofit “start-up” (the Alexandria Archive Institute), and develop Open Context, a system that provides new ways for archaeologists to publish the full richness of the data they create. Open Context's key innovation is to advance data sharing as a form of publication. This means Open Context not only works to archive data for preservation, but also works to expand the quality and expressive power of data, and to make open data a rich and meaningful part of scientific communications. This approach can free archaeology from the confines of the ivory tower and make data accessible to everyone, whether to advance research, to teach, or simply to appreciate and enjoy.

Open Context stands as one of many efforts working to make our knowledge of the human experience, our evolution and cultural heritage, freely available for exploration and debate. Open Context publishes editorially vetted, peer-reviewed data and archives this content with the California Digital Library, a vast repository preserving information from many disciplines. Leveraging the power of the Web, Open Context links its published content with other data published by museums, online maps, Wikipedia, and other open databases and digital archives shared by researchers worldwide.

This rich linking helps identify unexpected connections across the Web. For example, the Open Context team is working with the Encyclopedia of Life to link archaeological data that documents how ancient people domesticated animals in the Middle East together with databases that encode relationships between genes and the physiological processes involved in growth and development in people and animals. Open access and open data help us to realize the rich interconnections between all areas of knowledge, spanning the humanities, the natural sciences, and medicine.

Archaeology is inherently multidisciplinary, requiring collaboration between natural scientists, social scientists, and humanists. In working to meet archaeology's data needs, the Open Context team gained invaluable experience and expertise, now applied in other areas of open science. Eric Kansa has used this experience to help open data and open government efforts in public health and with NASA, and he even participated in a panel discussion on information architectures for open government with Vivek Kundra, the first CIO of the United States.

Open Data has a vibrant and growing community, which freely shares knowledge and innovations, and is working together to solve problems. It's a network that breaks down legal, technical and bureaucratic barriers to foster collaboration. This network also works to reclaim and reinvigorate notions of the “public good,” an ideal long undermined by narrow interests that have co-opted so much of our public investment in research and education.

Since its inception, Open Context has aimed to make research outcomes freely accessible to the widest possible community. Yet for openness to work, it cannot only be about the data outputs of research - we also have to care about the inputs. Eric Kansa's collaboration with IPinCH, an international effort exploring intellectual property and privacy issues in different cultural settings, shows that research impacts many communities. Those communities need to have a say in how research is conducted and how research outcomes should be communicated.

In that vein, the Open Context team is also beginning a new program, aimed at the next generation of researchers, to develop expertise in the technical, theoretical and ethical challenges inherent in data. This will help students not only to better adapt to a radically changed professional environment, but also to become future leaders in creating and using data with greater thought and care – a critical need given the expanding role of data in virtually every aspect of our lives.

The Open Context team is grateful to the American public, which has supported their work through grants from the National Endowment for the Humanities and the National Science Foundation. Additional funding has come from the William and Flora Hewlett Foundation, the Alfred P. Sloan Foundation, the American Council of Learned Societies, and others.

Eric Kansa is an Archaeologist.

Democratizing the Science, Accelerating the Cure

Posted by Kathy Giusti on June 20, 2013 at 4:57 PM EST

Kathy Giusti is being honored as a Champion of Change for the vision she has demonstrated and for her commitment to open science.

Multiple myeloma is a fatal blood cancer with a five-year survival rate of only 41 percent – one of the lowest of all cancers. As an organization, we have worked tirelessly to remove barriers slowing research into this disease, and the progress we have helped make – namely six new treatments in the span of 10 years – has had a meaningful impact on patients. Despite the progress, however, all multiple myeloma patients will inevitably relapse, necessitating new lines of therapy that effectively treat the disease.

Significant challenges exist. Chief among these is multiple myeloma’s staggering genetic diversity. Advances in basic science have taught us that multiple myeloma, like most cancers, is not a single disease; rather, it is comprised of several distinct sub-types, each one defined by abnormal genes and disruptive proteins that allow cancer cells to thrive. Developing and matching patients to treatments that precisely neutralize these abnormalities is key to achieving longer-lasting remission and cures.

In 2011, we took a giant step toward this goal with the launch of the CoMMpass℠ study, a landmark, $40 million study that brings together competitors from pharmaceutical and biotech companies, academic institutions, and community cancer centers. The group is working as one to define multiple myeloma’s sub-types and to assess which drugs work best for different sub-types. One thousand newly diagnosed multiple myeloma patients will be followed longitudinally over five years, and their tissue samples, genetic information and various disease and clinical outcomes will be extensively analyzed. We believe that CoMMpass data will allow us to precisely match patients, based on their sub-type, with treatments that offer the best chance of long-lasting remissions with fewer side effects, and of cures.

Importantly, all data from CoMMpass will be placed into a publicly available, open source, IP-free data ecosystem, together with data from other MMRF-driven initiatives and other datasets. This will create one mega dataset that is unprecedented in its depth and breadth.

We encourage anyone interested in using or contributing to this robust dataset to be part of our search for a cure. The MMRF Researcher Gateway, scheduled to launch in September 2013, draws upon the wisdom and creativity of the crowd to advance discovery. Basic analytic tools make data accessible to a diverse group of users—scientists, pharmaceutical companies, health care organizations, physicians, and even patients—without requiring strong IT skills.

As a complement to the Researcher Gateway, the MMRF Community Gateway, also to be launched in September, will engage patients in the research process. The Community Gateway will create a dynamic, online community of myeloma patients and will allow patients to better understand their disease and treatment options, to share their data and experiences, and to take action by (optionally) sharing their data with research studies or joining clinical trials.

On behalf of the entire MMRF team and our partners in CoMMpass, we thank the White House for recognizing the promise and significance of our approach. We are honored to have been named an Open Science Champion of Change.

Kathy Giusti is the Founder and Chief Executive Officer of the Multiple Myeloma Research Foundation (MMRF).

Large-Scale Open Access for Research and Outreach

Posted by Paul Ginsparg on June 20, 2013 at 4:48 PM EST

Paul Ginsparg is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

In 1991, before the WorldWideWeb, before the general population was even aware of the internet, physicists had already begun to share pre-publication versions of their articles via email. As a research staff member at Los Alamos National Laboratory, I was concerned that this private sharing unintentionally gave privileged access to more established researchers. To help rectify the situation, I set up a centralized automated repository and alerting system, making cutting-edge full-text articles accessible to anyone with internet access. For the initial group of high energy theoretical physicists, the system had the immediate positive effect of democratizing the exchange of information within an entire global research community.

By 1993, the advent of the WorldWideWeb suggested ever broader possibilities, with research communication for all fields ported to the new on-line medium. The arXiv system itself quickly grew from a few hundred submissions per year to many tens of thousands, and moved to the Cornell University Library in 2001. Its growth continued to accelerate, and it now receives close to 100,000 new open access submissions per year. With nearly a million open access articles in the repository, and hundreds of millions of full-text downloads per year, it serves as the primary daily information feed for global communities of researchers in physics, mathematics, computer science, and related fields. Its proof-of-concept served as the prototype for many other modern open access systems that disseminate scientific research results.

When the general public arrived on the internet by the mid-1990s, there emerged intriguing possibilities for engaging beyond the professional research community. Scientists write articles in order to have them read, hence the more readers the better. Research output, like any public good, becomes more valuable the more it is used.

Twenty years ago, I began discussing with officials at NSF and DOE the possibility of creating open repositories for the articles that result from federally funded research. In 1997, biologists (and by then mathematicians and computer scientists as well) interested in how physicists were sharing on-line information set up a series of discussions which ultimately led to NIH's PubMedCentral repository (on whose initial advisory board I served) and, a decade later, to the NIH mandate to deposit. The OSTP policy memorandum expanding this mandate to other large federal funding agencies, making "the results of federally funded research freely available to the public within one year of publication," is thus welcome progress, and perhaps the time delay can eventually be reduced. arXiv.org already plays an extremely valuable role in giving the general public access to federally funded research articles; conventional news sites and blogs link directly to multiple articles, frequently bringing in hundreds of thousands of readers to popularly accessible articles.

My own work has remained focused on making these systems so useful to researchers that they participate spontaneously, and so (like YouTube) no mandate has ever been required. Most recently, in collaboration with computer scientists (at Cornell, Rutgers, and Princeton), I've been implementing a variety of new "big data" datamining tools for the purpose of analyzing usage and information genealogy, using arXiv's unique combination of open access texts and twenty years of longitudinal usage data. Some of the results of this work will go on-line in a new experimental interface later this year, with improved ability to track both long- and short-term trends through the literature, a new recommender system to help users cope with information overload, and new interdisciplinary means of information navigation and discovery. This work should further clarify the benefits to both the research community and the general public of having fully open access research text aggregated and treatable as computable objects.

Continued growth in distributed network databases, new interoperability protocols, machine-readable document standards, and relevant ontologies will build on these components to catalyze more rapid scientific progress, and provide integration of educational resources for the general public.

Paul Ginsparg is a Professor of Physics and Information Science at Cornell University.

Big Data and Personalized Medicine

Posted by Steven Friend on June 20, 2013 at 4:44 PM EST

Steven Friend is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

The massive public investment in the Human Genome Project has already paid serious dividends - calculated by some at $141 in returns for every $1 invested. But the advances in genomics have yet to impact the lives of most American citizens through what is often called “personalized medicine.”

One promise of personalized medicine is to predict which patients will respond to medications and which patients will not. From molecular data like our DNA sequence information, we should be able to tease out the subtle variations that make each of us unique to predict, for example, whether a drug is likely to work in our bodies. But so far we haven’t been able to do this with real effectiveness, or at the kind of scale we need. That means many people receive drugs that are unlikely to work, for reasons that we should be able to understand - but we don’t.

Rheumatoid arthritis (RA) is a good example. For people with RA, strong immunosuppressive medications are administered in order to treat pain and inflammation. Despite substantial efforts by researchers in academia and industry, there are no reliable genetic clues to predict which 30% of patients will enter clinical remission - that is, have the drug actually work - following treatment with therapy in RA. That means we have to give the drug to everyone, and pay for it for everyone, while 70% of Americans who take the drug only get the side effects without any therapeutic relief.

Alzheimer’s disease is another area where we haven’t been able to use genetics as well as we should. We have amazing imaging equipment that can peer deep into brains, and we have amazing sequencing capacity. But we haven’t yet put them together to figure out why some people get Alzheimer’s and some don’t, or why some who get the disease progress faster than others. Some barriers stem from the small number of people who know how to use the complex information, and others from the fact that so much data sits compartmentalized inside corporate and academic silos that have limited appeal and accessibility for scientific collaboration.

The time is right to try a different approach to make genetics work for personalized medicine. At Sage Bionetworks, we are partnering with innovative groups such as Gustavo Stolovitzky at IBM-DREAM, the Robert Wood Johnson Foundation, and Ashoka to apply the tools of open science to solve big problems in health.

For example, we’re launching Big Data “Challenges” in both Rheumatoid Arthritis and Alzheimer’s disease that have the potential to bring the power of crowds to figure out how individual genetic variation impacts these diseases. We’re making the Challenge data about genes and proteins available to anyone who wants to take a crack at the problem. To drive these Challenges, we’re leveraging our open-source Synapse data-sharing platform and DREAM’s well-established framework for running Challenges (originally developed by IBM). DREAM’s know-how helps us design smart, impactful Challenges, and Synapse’s leaderboards, code-sharing and provenance tools will get teams of teams revved up and participating in a real-time dialog that fosters rapid learning and better predictive models. These Challenges will generate winning models that then guide new clinical trials (in RA) and that spell out the patient data we most need to help guide better treatment for patients (in AD).

By partnering with the Arthritis Foundation, the Global CEO Initiative on Alzheimer’s Disease, and others, we know we can make an immediate impact on these two diseases. And most importantly, we can do it in a way that others can build on - which is the essence and value of open science.

Steven Friend is the President of Sage Bionetworks.

Most of the Smartest Bioengineers Work for Someone Else

Posted by Drew Endy on June 20, 2013 at 4:39 PM EST

Drew Endy is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

In ways we often barely understand, natural organisms, from piping plovers on Long Island to banana slugs among California’s redwoods, reproduce, struggle, or thrive across amazingly diverse environments. Yet despite this sparse understanding, we already partner with biology to make many necessities. Foods, medicines, fuels, and materials are increasingly manufactured by domesticated or re-engineered organisms: insulin for treating diabetes is made with re-engineered microbes, and organic mushrooms are grown by natural wood fungi that eat sawdust. Forty years after its inception, genetic engineering now underlies about two percent of our domestic economy.

Yet, most of what we might make with biology has not been imagined or created. For example, researchers just figured out how to store archival digital copies of books, including Shakespeare, in chemically synthesized DNA, a molecule so tiny that many millions of books could be stored in mere thimbles. Meanwhile, much of nature’s biodiversity is being lost or increasingly threatened by growing populations and expanding consumption. These natural ecosystems are essential for the wellbeing of our environment and are also the source of almost all biotechnology innovations. We don’t so much engineer biology from scratch as repurpose and refine nature’s existing materials.

Thus, a fundamental challenge confronting everyone, directly or indirectly, is to learn to work in better partnership with nature and each other, to make the things we need without destroying ourselves or the environment. “Open science” in every respect, including sharing of ideas, unfettered access to all research literature and data, and freedom-to-use basic biological materials, is central to making this progress.

In this spirit, we started the BioBricks Foundation (BBF) in 2003 as a public-benefit charity with a long-term mission of helping to advance biotechnology to benefit all people and the planet. Tom Knight, then at MIT, had already led the way by teaching how free-to-use technical standards could enable radical, global collaboration. Students in Australia could suddenly take a fragment of natural DNA encoding a microscopic protein-based “balloon” that causes cells to float or sink, and refine it into a standard “BioBrick” part. Researchers everywhere could then readily take that BioBrick balloon and stack it with many other similarly standardized genetic parts to more easily solve innumerable problems.

Since 2003, the BBF has focused on improving open and free-to-use standards that support biological engineering. We have leveraged seed funding from the National Science Foundation to create cooperative partnerships between universities and companies to develop high-quality, standard biological parts. We created a legal tool, the BioBrick Public Agreement (BPA), which allows researchers in academia and industry to easily share our free-to-use standard biological parts. We are now using the BPA to give away many of the best standardized creations from our labs, including genetic switches and amplifying logic gates, so that others can more quickly solve pressing problems.

As Bill Joy, Co-Founder of Sun Microsystems, reportedly observed, “No matter who you are, most of the smartest people work for someone else.” “Joy’s Law” applies to bioengineering in almost every way possible. No matter where you are, most of the smartest biologists work for somebody else. Thus, most of the biological discoveries you need for a given problem will be housed somewhere else. Needs and opportunities will exist somewhere else, and the world’s biomanufacturing capacity, distributed across the world’s bread, yogurt, cheese, and brewing facilities, is controlled by someone else. Most of the people who need to use and trust your work won’t ever meet you. In all, tools for sharing will form the foundation of our future bioeconomy.

Drew Endy is a Bioengineer at Stanford University.

Transforming Open Access Biomedical Data into New Drugs and Diagnostics

Posted by Atul Butte on June 20, 2013 at 4:33 PM EST

Atul Butte is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

The past 20 years have seen amazing changes in biomedical research. Gone are the days of sequencing a small piece of DNA, or measuring the expression level of one gene, or studying one protein at a time. Scientists can sequence an individual's entire DNA, measure the levels of every gene, and study nearly every protein, all simultaneously. Moreover, scientists perform these measurements using commercial tools and services, which are all positive outcomes from the Human Genome Project. These tools, services, and discoveries enable scientists to learn what the differences might be between individuals with disease and those without, and how we might treat those diseases.

But what is truly stunning is that, in many cases, today scientists share their raw measurements on the Internet.

The impetus to share data comes from many directions. Scientists share their data to enable others to reproduce their discoveries, while journal editors believe shared data helps readers trust their publications. Biotechnology companies release data to help scientists understand their measurement platforms. And funding agencies are asking researchers to make their data publicly accessible to promote its reuse. Earlier this year, the White House Office of Science and Technology Policy directed Federal agencies with significant R&D grants and awards to ensure their recipients make their work publicly available within one year of publication. This policy also applies to the digital data created by scientists.

Open scientific data is an amazing public resource. Making scientific data publicly accessible costs only a small, marginal amount over its scientific creation. Public data enables transparency and reproducibility in science. Data is infinitely shareable without diminishment in its value.

In fact, data takes on greater value when it is intersected with other data sets. Clinical trial data can be reanalyzed to find the subsets of patients who most greatly benefit from specific drugs. DNA sequencing data from thousands of individuals can be used to learn what is “normal” and help us interpret DNA from an afflicted patient. And open data on health care costs, utilization, quality, and errors, all can be integrated into apps of the future, enabling patients and consumers to make better data-driven decisions.

In my lab, we have found that combining publicly available molecular measurements made by a dozen independent researchers on the same medical condition, such as preterm birth, can yield a reliable set of diagnostic markers that would not be obvious to each researcher working separately. We also have seen that open measurements of diseases can be integrated with measurements of drug effects, resulting in new ways to use those drugs to treat conditions. Finding new ways to use existing drugs could help get therapies to patients with rare diseases. And these data-driven drugs and diagnostics can even form the basis of new businesses and ventures in medicine.

In this way, open scientific data is a kind of power platform to be leveraged. But perhaps open scientific data is best described as a means to thaw “frozen discoveries,” meaning that focusing light and energy on existing knowledge can release those discoveries. This yields the new drugs, diagnostics and knowledge we still sorely need in medicine.

I am honored to be recognized as an Open Science Champion of Change, and I thank my daughter Kimi for inspiring my work; my lab and collaborators for their efforts in making these important discoveries; my wife, Gini Deshpande, for being a life-partner and collaborator in developing my science and launching our ventures; and scientists everywhere for sharing their data with the public and for creating the tools to enable others to do so.

Atul Butte is an Associate Professor in Pediatrics and Genetics at Stanford University, and the principal investigator of ImmPort, the long-term, sustainable data warehouse for re-use of immunological data funded by the National Institute of Allergy and Infectious Diseases. He is also a founder of Personalis, which provides clinical interpretation of whole genome sequences; of Carmenta, which uses public data to discover diagnostics for life-threatening conditions in pregnancy; and of NuMedii, which uses public big data to find new uses for drugs.

Open Access: The Pathway to Innovation

Posted by Jack Andraka on June 20, 2013 at 4:27 PM EST

Jack Andraka is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

When I was 14, a close family friend, who was like an uncle to me, succumbed to pancreatic cancer. When the disease hit so close to home, I felt like I needed to know more. So I went online and started reading through the available information, and what I discovered was eye-opening. A big reason for the dismal pancreatic cancer survival rate is that there is no inexpensive, simple way to detect it early.

I was sure there had to be a better way. But as I started to research online (using the teenager’s favorite sources, Google and Wikipedia) I started running into problems. I would find an article I needed, I would click to start reading, and then a new window would pop up informing me that if I wanted the paper, it would cost $35. The first thought that came to mind was, “Who in their right mind would pay $35 for 11 pieces of paper???”

Now I can answer my question as to why anyone would pay $35 for 11 pieces of paper: It’s scientific knowledge.

Because of the high demand for the newest and best scientific research, the major publications have successfully, yet subtly, commoditized this knowledge. Publishers are basically discriminating based on whether or not you or your school can afford access. This tier-based approach to the dissemination of knowledge is, I believe, incredibly detrimental to the entire field of science.

Scientific research benefits from the open sharing of knowledge. When I was working on a diagnostic test for pancreatic cancer, there was one key paper… but imagine if I didn’t have access to that paper. I might never have had the idea that led to my success. That, to me, is the fundamental problem with scientific journals: they prevent the democratization of innovation.

Google and Wikipedia were the sites I relied heavily on to gather most of the information I needed. But it was by no means easy. I still had to send out hundreds of emails and pay hundreds of dollars to get access to those articles. Imagine if we removed the cost barrier. If the flow of scientific knowledge was unrestricted no matter what your age, race, or how much money you had, how would that impact your ability to do quality scientific research?

Then the only restriction would be what was in your head. That’s the ONLY way to include the billions of future innovators: to make access to articles free. If we did, can you imagine the possibilities? The best ideas in the world usually come from the most unexpected places! When you deny open access you deny people like me the ability to innovate. You are leaving behind billions of potential innovators and innumerable world-changing innovations.

People’s minds must be free, and that means the minds of all, not the minds of a select few. A scientific discovery is like a grain of wheat. A grain of wheat is just a singular grain, unless of course you cultivate it and let it grow, and then you have many grains of wheat that will continue to multiply. A scientific discovery is in itself just a singular discovery unless you allow others to see it and add their input and creativity. Then, you have a scientific revolution.

We have our grains; so now the question is what will we do with them?

I for one am supporting the Fair Access to Science and Technology Research Act (FASTR) in its efforts to promote open access. You’ll find me giving talks in person around the world and through social media about the importance of open access to innovation and STEM.

Jack Andraka is a Maryland High School Student.

Creating a Global Alliance for Sharing of Genomic and Clinical Data

Posted by David Altshuler on June 20, 2013 at 4:11 PM EST

David Altshuler is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

Working together with over 70 leading healthcare, research, and disease advocacy organizations (involving collaborators in over 40 countries), my colleagues and I have begun to form a global alliance to enable responsible sharing of genomic and clinical data.

We are motivated by the view that a new era is opening in the science of genomics and its application to medicine. The cost of genome sequencing has recently fallen one million fold. Just a few years ago, only a handful of human genomes had been sequenced; today there are many tens of thousands of sequenced genomes, and it is widely expected that in the coming years millions of people will choose to have their genome sequenced for research, clinical, or personal use. The public interest will be best served if we work together to develop and promulgate open standards (both technical and regulatory) that make it possible to effectively and responsibly share and interpret this wealth of information.

The ability to collect and analyze large amounts of genomic and clinical data presents a tremendous opportunity to learn about underlying causes of cancer, inherited and infectious diseases, and individual responses to drugs. Moreover, for patients with cancer and rare inherited diseases, genome sequencing is already becoming a powerful tool for diagnosis and decisions about therapy.

We realize that discussions about sharing large amounts of personal data naturally raise important questions about ethics and privacy. Accordingly, we have committed to work together to study and share perspectives on ethics, regulation and privacy. We are committed to the principle that each individual has the right to decide whether and how broadly to share their personal health information. Our technical and regulatory solutions must support and enable these personal decisions.
