In the data wasteland

    HOW TO

  • Find the gems within mountains of incoherent and unorganized data 
  • Create a data collection system to bolster your research and analysis

Everything is data: from accurate figures to impressionistic narratives, through to rumors and plain lies, which carry information on the people articulating them. The issue with data is that, precisely because it is ubiquitous, it forms an incoherent slush until organized into something recognizable.

Its analytical value will depend on two variables in particular: volume and structure. Isolated pieces of information may indicate where else to look for more, but will fail to offer broader insight into social, economic or political trends. Large quantities of data will prove impossible to mine unless organized in such a way as to make them comprehensible. The goal, therefore, must be a critical mass of structured data.

The availability of different sorts of data varies widely from one part of the world to another, depending on how bureaucratic or informal their systems of governance are, how mature their civil society tends to be, and so on. Societies can also render themselves transparent or opaque to varying degrees. The Arab world, for example, is a case study in data paucity: dysfunctional authoritarian regimes produce little reliable information and share even less, while citizens conceal much about themselves, notably on social media where multiple accounts, shifting aliases, and cryptic forms of expression, such as sarcasm, are the norm.

In such a data wasteland, even conventional wisdom becomes suspect and hard to fact-check. It will nonetheless reverberate, impose itself and often gain validation from supposedly legitimate and trusted sources. In Jordan and Syria, in the late 2000s, the UN embraced wild government estimates on the number of Iraqi refugees long before any institutional measures had been taken to register them. In Iraq, serious media outlets consistently describe Mosul as the “second biggest city” in a country that hasn’t had a proper census in decades, and even though Basra shows many signs of being larger. In Lebanon, a host of international bodies adopt economic figures that are rendered unverifiable by the government’s years-old refusal to divulge essential financial data or even a draft budget.

Groundbreaking research on social issues will often unearth, at first glance, a wealth of rich but not necessarily reliable narratives and a dearth of hard data. Academics and pollsters use various techniques to overcome this problem: notably they build questionnaires and code the answers given by respondents. The results can bring more clarity, but that clarity can be deceptive too, because respondents’ answers are shaped by predetermined questions.

The truth is that data collection, despite its façade of objectivity, is a handicraft more than a science. Success depends far more on common sense, creativity, trial and error, and flexibility than it does on any formalistic methodology. A farmer doesn’t need complex technology or methodology to develop a sophisticated database of his crop yields, factoring in diverse soils, climate patterns, past experiments with seeds or fertilizers, annual variations, comparisons with neighbors, etc.

Researchers can cultivate their field of investigation in much the same fashion. In fieldwork-based research, data collection consists of parsing information from a variety of sources and weaving it anew into thematic threads. All dates will go into a chronology (or several timelines covering different aspects of a topic). Information on individuals and their relationships can build up into a biographical data-set, which may be conducive to a genealogical tree, an organizational chart or a visualization of networks. Geographic information will naturally feed into maps. Descriptive “building blocks” will also emerge from scattered information, gradually adding up into a history of a particular institution, a memo on a specific legal issue, an infographic or the like.
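
For readers who keep their notes in digital form, the weaving described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the record layout (`date`, `people`, `place`, `text`), the function names and every entry in `notes` are invented for the example, and a spreadsheet or a stack of index cards would serve the same purpose.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations

# Hypothetical field notes: every name, date and place here is invented.
notes = [
    {"date": date(2016, 5, 2), "people": ["A. Farmer", "B. Trader"],
     "place": "Village X", "text": "Dispute over irrigation rights."},
    {"date": date(2015, 11, 20), "people": ["B. Trader"],
     "place": "Town Y", "text": "New grain depot opens."},
    {"date": date(2016, 7, 14), "people": ["A. Farmer", "C. Official"],
     "place": "Village X", "text": "Mediation by the local council."},
]


def build_threads(notes):
    """Route each note into three thematic threads."""
    # Chronology: all notes ordered by date.
    chronology = sorted(notes, key=lambda n: n["date"])
    # Biographical thread: notes grouped by person mentioned.
    people = defaultdict(list)
    # Geographic thread: notes grouped by place, ready to feed a map.
    places = defaultdict(list)
    for note in chronology:
        for person in note["people"]:
            people[person].append(note)
        places[note["place"]].append(note)
    return chronology, dict(people), dict(places)


def co_mentions(notes):
    """Count how often two people appear in the same note,
    a crude starting point for a network chart."""
    pairs = Counter()
    for note in notes:
        for a, b in combinations(sorted(note["people"]), 2):
            pairs[(a, b)] += 1
    return pairs


chronology, people, places = build_threads(notes)
pairs = co_mentions(notes)

for note in chronology:
    print(note["date"].isoformat(), note["place"], note["text"])
```

The point of the sketch is that one note is entered once and then surfaces in every thread it belongs to, so that the timeline, the biographical file and the map grow together as the research proceeds.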

The data itself will likely emanate from a mix of sources. Most topics will reveal themselves through existing “literature” or expertise; documents containing raw material; media mentions over a period of time; and interviews conducted with the concerned. As a rule, much more information is available than we are initially tempted to believe; assuming otherwise is simply a convenient way to save ourselves the trouble of digging deep into archives and narratives, which is indeed time-consuming. Assuming the opposite, i.e. that a data treasure trove is out there just waiting to be discovered, will in fact save you time: more often than not, you’ll come across people who have already done much of what you could do. “Mapping the mappers” is, therefore, always a good place to start.

Shuffling data is tedious, for sure. But it is also an essential component of the analytic process. Our eye “sees” because it organizes things into categories—colors, textures, movements, distances—that may be irrelevant to other living creatures whose senses are wired differently. Their reality—that is, their understanding of the world—will inevitably be distinct from ours, since the information they collect and synthesize is itself different. Making sense of anything boils down, consciously, conscientiously or intuitively, to categorizing and reorganizing data.

This sorting mechanism adds layers of meaning to something initially nondescript and perhaps chaotic. You could be looking at a pile of blocks of different shapes and colors. If you leave it as such, that’s about all you can say about it. However, if you manipulate the blocks and separate them into groupings, many more things can then be said: how many they are in all; what exactly their shapes and colors are; what may be missing or lost; whether they are heavy or not; what material they are made of; what underlying logic they may conceal, etc.
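
The pile of blocks can be made concrete with a toy example in Python. It is a sketch only: the pile itself, the two attributes chosen (shape and color) and the `summarize` helper are assumptions made for illustration, and "missing" is defined here, somewhat arbitrarily, as any shape-and-color combination that never occurs in the pile.

```python
from collections import Counter
from itertools import product

# A hypothetical pile: each block is a (shape, color) pair.
blocks = [
    ("cube", "red"),
    ("cube", "blue"),
    ("cylinder", "red"),
    ("cube", "red"),
    ("pyramid", "blue"),
]


def summarize(blocks):
    """Sort the pile into groupings and report what they reveal."""
    shapes = Counter(shape for shape, _ in blocks)
    colors = Counter(color for _, color in blocks)
    combos = Counter(blocks)
    # Combinations one might expect but that never appear in the pile.
    missing = sorted(set(product(shapes, colors)) - set(combos))
    return {
        "total": len(blocks),
        "shapes": dict(shapes),
        "colors": dict(colors),
        "missing": missing,
    }


summary = summarize(blocks)
print(summary["total"])    # 5
print(summary["shapes"])   # {'cube': 3, 'cylinder': 1, 'pyramid': 1}
print(summary["missing"])  # [('cylinder', 'blue'), ('pyramid', 'red')]
```

None of these statements could be made about the unsorted list; they only appear once the blocks have been grouped, and the gaps in particular point to where to look next.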

The act of compiling data will likewise reveal trends, inconsistencies, ambiguities, voids, distortions and so on, all of which are precious analytical material that was not visible when data remained in bulk. Even when incomplete, a chronology, a biographical data-set, a map and a collection of figures will all help make sense of various complex and competing narratives.

Besides, information collected systematically will bolster your credibility and boost your value-added—especially in an environment where it is a rarity. Information is power, they say. And as you chose research, it may well be the only means you’ve got to get a little taste of it. It would be a pity not to indulge!

27 February 2017

Illustration credit: Silhouettes in the rocky wasteland by Unsplash via Wikipedia / licensed by Unsplash.
