Reflections, September 2012
by Roberto Pieraccini, Director
W. Brian Arthur, in his book The Nature of Technology: What It Is and How It Evolves, describes the evolution of technology as a combinatorial process. Each new technology consists of a combination of existing technologies that “beget further technologies.” Moreover, each technology springs from the harnessing of one or more physical, behavioral, mathematical, or logical principles that constitute its foundation. Innovation proceeds either through the establishment of new principles (typically, though not exclusively, the domain of science) or through new combinations of existing technologies.
Not all technologies are alike, however; sometimes a single new technology, an enabler, generates a myriad of new possibilities that lead to the creation of new industries and new economies, and, on some very rare occasions, contributes to the definition of a new era of our civilization. Such were the steam engine, the digital computer, and the Internet.
We may find ourselves wondering what the next enabler will be. Of course no one knows for sure, and any attempt to make a prediction will most likely be wrong. But researchers have a special role in our technological future. They cannot predict what the future will be but, to paraphrase Alan Kay’s famous quote, they can attempt to create it. Well, looking at the trends of current research in information technology, we can definitely see that the attempt to create a new future based on automatically deriving higher levels of understanding from data is today one of the most challenging endeavors researchers have embarked on. Let me be more specific.
Our era is characterized by an unprecedented amount of information. It is no surprise that a significant amount of technological research today is devoted to the creation, management, and understanding, by computers, of the wealth of data around us: text, images, and sounds. But it is understanding that is the most challenging of those problems and the farthest from a satisfactory solution. A large portion of the research community aims to devise algorithms and systems that automatically extract the meaning, the semantics, from raw data and signals. In fact, a lot of the research carried out at ICSI, as at many other research centers, can be described as “looking for meaning in raw data.” Research on natural language and on visual and multimedia signals is diving into the problem of deeper understanding. Beyond the mere (so to speak) recognition of words and entities in language and of objects in images, we are now trying to get at the deeper content, such as metaphors and visual concepts. Beyond that, research in network security is trying to assign meaning to the patterns found in streams of communication among computers in order to detect possible intrusions and attacks, while theoretical computer scientists are trying to find meaning in DNA strings and in networks of brain activity.
However, we are not quite there yet. Think, for instance, about the promises made by the vision of the semantification of the whole Web. The vast majority of the Web comprises unstructured raw data: text, images, audio, and video. Tim Berners-Lee was the first to envision a semantic Web, and many have been working toward that dream, with limited degrees of success. Even though many agree on ways to encode a semantic Web, and Google’s Knowledge Graph is one of the most advanced large-scale attempts to structure factual knowledge and make it readily available to everyone, a fully semantic Web is not there yet. The Knowledge Graph starts from existing structured knowledge, for instance the facts of Albert Einstein’s life, and connects that structured knowledge to Web searches. It includes millions of entries, which is an infinitesimally small number compared to the vast universe of the Web. Are you and your papers in the Knowledge Graph? Are all recent world facts, blog entries, and opinions on the European financial crisis in it? Maybe they will be, but the questions of coverage and of keeping the information fresh and updated are yet to be solved. And there is an even more serious issue: the Web is not just text. The amount of video on the Web, for instance, is growing at a mind-boggling rate. Some recently published statistics about YouTube estimate that 72 hours of video are uploaded every minute, over 4 billion hours are watched every month, and in 2011 alone YouTube had more than 1 trillion views, the equivalent of over 140 views for every person on Earth.
It is true that we have ways to encode semantics into Web pages, as seen in the work of the W3C; semantic representation languages like OWL are widely used today. But with the Web growing at a dizzying speed, any attempt to manually annotate every unstructured element, whether text or video, with its meaning or something closely related to it is bound to fail. If we want to fulfill the dream of a fully semantic Web, we need methods for automatically understanding text, images, and videos, including speech, music, and general audio.
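To make the idea concrete, here is a minimal sketch, using the Python rdflib library and a made-up example.org vocabulary chosen purely for illustration, of how a handful of facts might be encoded as machine-readable triples, the kind of representation that OWL and the other W3C languages build on:

```python
# Minimal sketch: a few facts encoded as RDF triples with rdflib.
# The example.org namespace and property names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # illustrative vocabulary only
g = Graph()

einstein = EX["Albert_Einstein"]
g.add((einstein, RDF.type, EX["Physicist"]))
g.add((einstein, RDFS.label, Literal("Albert Einstein")))
g.add((einstein, EX["bornIn"], EX["Ulm"]))
g.add((einstein, EX["developed"], EX["General_Relativity"]))

# Serialize the triples in Turtle, one common syntax for publishing annotations.
print(g.serialize(format="turtle"))
```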
The enabling potential of a fully semantic Web is huge. A fully semantic Web will change the concept of searching the Web into that of asking questions of it and getting answers, not just, as is sometimes possible today, in restricted domains, but everywhere, about any topic, no matter its size, popularity, or language. It will help transform mere facts into structured data and actionable information, not just about Albert Einstein and other famous people, but also about you, the grocery store at the corner, and what your friends write on their blogs. It will complete the process of moving from raw data, the gazillions of pages of online text and video, toward higher abstractions of knowledge, understanding, and wisdom. And that’s not all. Semantification of the whole Web will enable the flourishing of new areas of the information industry that are not possible today, or that are possible only with a great deal of handcrafting and ad hoc solutions in limited domains, hardly scalable to other domains, languages, and media. It will allow us to interact with the Web as we do with humans, asking it questions in the same way we ask human experts. It will allow us to automatically compare sources of information for accuracy and truthfulness, even if they are in different languages.
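Continuing the illustrative sketch above, asking a question of data stored this way looks less like a keyword search and more like a structured query; the SPARQL snippet below, still against the made-up example.org vocabulary, asks which physicists developed which theories and gets back answers rather than pages:

```python
# A question expressed as a SPARQL query against the small example graph
# built above (the ex: vocabulary remains hypothetical).
QUESTION = """
PREFIX ex: <http://example.org/>
SELECT ?person ?theory
WHERE {
    ?person a ex:Physicist .
    ?person ex:developed ?theory .
}
"""

for person, theory in g.query(QUESTION):
    print(f"{person} developed {theory}")
```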
However, the true full semantification of the Web is not just an ambitious dream; it is a necessity. We may reach a point, in the not too distant future, when the Web will be so large that current search methods will no longer be the most effective way to find the information we need in the form we need it. How do we sort through an increasingly vast number of documents, and not just text, on a particular topic, with many independent and conflicting opinions, comprising true and false statements and different points of view? We already have to use specialized search engines to find the cheapest flight among independent travel sites and aggregators, or to find the most economical reseller of a particular item. That is possible today because of the underlying structured form of the commercial sites; in a way, they are already semantified. But think, for a minute, of doing the same thing with unstructured information: raw text, audio, images, and video. Think about searching for documents that “have the same meaning” as another document, regardless of their language, form, and wording.
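As a deliberately crude sketch of what searching by meaning rather than by wording might look like in its simplest form, the snippet below ranks documents against a query by the cosine similarity of their TF-IDF vectors using scikit-learn; the texts are invented for illustration, and the measure captures only overlap in word usage, not the deeper, language-independent semantics discussed here:

```python
# A crude stand-in for "search by meaning": rank documents by the cosine
# similarity of their TF-IDF vectors to the query. The texts are invented,
# and this measures word-usage overlap, not true semantics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The central bank lowered interest rates to ease the financial crisis.",
    "Rates were cut by the monetary authority in response to the economic turmoil.",
    "A recipe for slow-cooked tomato sauce with fresh basil.",
]
query = "How did banks respond to the financial crisis?"

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors).ravel()
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.2f}  {doc}")
```

With this crude measure the paraphrased second document scores no better than the recipe, even though it means nearly the same thing as the first; closing exactly that gap, across forms, wordings, and languages, is what automatic understanding of meaning would have to do.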
I don’t know if we will see a full semantification of the Web in our lifetimes. I don’t know if that is even possible, or whether it is the great enabler we dream of. But one thing is certain: research is clearly moving toward a deeper understanding of the information around us, and if it succeeds, these technologies will exert an even greater social, economic, and political influence on our lives. The full semantification of the Web, whenever it happens, will be a game changer of enormous proportions, an enabler of industries and services that will impact every aspect of our lives. We are working on it.