Open Data Day, 2021. The Open Data journey.

Open Data Day? What about it?

Yet another day to celebrate, is it? Well, for some of us, it surely is.

Open Data Day is an attempt by the Open Knowledge Foundation to connect more people and communities around the world on a common theme of opening up more data in a collaborative manner. Hundreds of events are organised by open data enthusiasts around the globe to work on the innumerable challenges that we face when opening up data for public consumption.

But why are people interested in spending their weekends on this? What drives them? What are the incentives? I don’t know; maybe articulating my journey will help.

The Journey - When I started ..

My professional journey began with datasets owned by private corporations. This was back in 2013, and I don’t remember ever discussing anything close to open data with my colleagues. I became more interested in the concept when I joined DataKind BLR as a volunteer. It was a random and somewhat self-centred decision. I might have known a few colleagues there; maybe I joined to pick up more skills for working with data, or to see how such communities work. It had nothing to do with open data. But DataKind helped me learn more about what we can do with data and the kinds of questions we can answer with public datasets. This was almost three years after I started my professional career.

Open Data - Flashback

Open data, especially Open Government Data, began to take a well-defined shape in the latter part of the first decade of the twenty-first century, in India and in other parts of the world. A few key events:

2009 - President Obama began his first term and signed the Memorandum on Transparency and Open Government on his first day in office.
Tim Berners-Lee gave a TED talk titled “The Next Web”, in which he talked about the importance of linked data and why we need “raw data now”.
The open data portal of the United States - data.gov was launched in 2009
The open data portal of the United Kingdom - data.gov.uk was launched in 2010
Another TED talk by Tim Berners-Lee, titled “The year open data went worldwide”, in which he shared inspiring examples of the open data revolution between 2009 and 2010.
The World Bank Group opened its data to all - Press Release - and released a new Access to Information Policy. This snippet by the TED team lists the important events that might have led the Bank to open up its datasets.
2010 - President Obama attended the Expo on Democracy and Open Government (https://obamawhitehouse.archives.gov/the-press-office/2010/11/06/expo-democracy-and-open-government), which also marked the launch of the United States-India Open Government Dialogue and would eventually lead to the development of India’s Open Data Portal.
The Open Government Partnership initiative was launched in 2011; India is not yet a part of it.
The Union Cabinet approved the National Data Sharing and Accessibility Policy (NDSAP) on 9 February 2012 - Press Release
The open data portal of India - data.gov.in was launched in 2012.

A lot was already happening behind the scenes before 2009 in the developed parts of the world. This article chronicles the journey well. One of the major events was the meeting in Sebastopol. Quoting from the article:

In December 2007, thirty thinkers and activists of the Internet held a meeting in Sebastopol, north of San Francisco. Their aim was to define the concept of open public data and have it adopted by the US presidential candidates.

A few well-known people who attended the meeting:

Carl Malamud - Public Resource
Tim O’Reilly
Lawrence Lessig - Co-founder of Creative Commons
Aaron Swartz - Helped develop the RSS web feed format; free knowledge advocate. If you do just one thing this Open Data Day, I recommend starting with this documentary about Aaron.

There were also a few other activists and entrepreneurs. According to the article, “Together, they created the principles that allow us today to define and evaluate open public data.” There must have been many similar discussions before someone convinced President Obama to sign the memorandum. This paper lists some of the other key open data and open government initiatives launched in the US in the 1990s.

A few open data advocates were quite active in India as well, raising concerns about the state of open data in the country and doing whatever they could to improve access to information, mostly for the purposes of transparency and accountability. This report from the Centre for Internet and Society looks at the landscape of Open Government Data at the time. Quoting from the report:

OGD (Open Government Data) in India must be looked at differently from what it has so far been understood as in countries like the UK and the US.

That was a not-so-quick recap. It is safe to assume that it wasn’t smooth sailing for open data advocates. Concerns have been raised about whether open data will really change the world, whether it is a tool for the already privileged, and where the boundary lies on how open public officials should be about their activities. Lawrence Lessig, who attended the Sebastopol meeting, wrote a piece titled “Against Transparency”. This article from Global Integrity is a good breakdown of his critiques, with valid reasons why opening up data is not as bad as it is made to sound.

Open Data - Present

So, India already had an open data portal by the time I started my professional career. We’re just a year away from the tenth anniversary of the launch of NDSAP and our Open Government Data portal. Where are we now? What has been the impact of these policies? Do we have more data, better data, more inclusive policies? Or are we still where we were 10 years ago in terms of our data and information systems?

From my personal experience of working with and around open data, I can say that, as a society, we still face a lot of challenges in understanding its benefits. There are, though, examples we can learn from. Some good, long-term collaborations among open data advocates have led to the creation of a few important resources that are widely used today and have definitely helped shape our open data strategy. Highlighting a couple of them:

1 - Folks at DataMeet have been working to create and maintain datasets of Indian geographical boundaries. Yes, that sounds odd, but they had to step in because no official datasets were available and demand was high: the potential of these datasets for researchers and policy makers is huge.
2 - A more recent example is the SHRUG dataset from Development Data Lab. Their team is working on a single database of all important development indicators at the village level for India. Sounds easy? Well, if you have any experience working with Indian public datasets, you will be aware of how hard it is just to count villages: every public database gives a different number. The SHRUG dataset creates a standard that resolves these inconsistencies.

Both these databases have some commonalities:

They are not from the government.
They are designed to produce good-quality open datasets over the long term. They are not one-time efforts; there are rules and guidelines in place for community contributions, maintenance, etc.
Both of them have good documentation.
Both have made efforts to educate users about how to use these datasets and the use cases they can power.

This work takes time and effort, and sustaining such projects is always challenging, especially when we rely on a community of volunteers and depend on grants. We need people who are in it for the long haul and who realise the potential not just of the data sources but of educating people to work with them.

There are important lessons to learn for all:

Our governments need to learn how to curate and maintain important datasets. We should question the impact of India’s open data portal: has it led to fewer RTI applications, more data-driven policy interventions, or better impact assessment of our laws and policies?
Our civil society needs to be more open and transparent about its research. Decentralisation and diversity of ideas are important. We won’t be able to create robust policies if we’re afraid of critique. Opening up our datasets and research processes invites more collaboration and helps organisations with fewer resources build the internal capacity to do the same in their respective areas.
Our researchers who aim to publish in the world’s top journals should also aim to make their research reproducible and open up the datasets they create along the way, rather than waiting until their articles are accepted and published. Try to avoid this lag.
Our public and private universities need to open up their courses, databases, libraries and journals. We need to inculcate the values of openness and transparency through our curricula, and then try to live up to them.
Our private sector, especially organisations working in education, health care, insurance and communications, hoards a lot of important information that could be used for policy research. Yes, there are challenges, but there are solutions as well. At the very least, let’s be open about our issues and discuss the potential threats and challenges.

But change only happens when people see an incentive, whether they are students, bureaucrats, politicians, employees or people running their own businesses. We need to advocate for transparency and better standards, and not accept whatever is thrown at us in the name of openness. We need to work with our representatives, educate them and hold them accountable, with facts rather than opinions. We need to strengthen our civil society by building capacity for data-driven advocacy. And we need to demand default disclosure of all information that is currently unavailable.

The Journey - Continued ..

Now let me come back to 2016, when I was still new to open data and was figuring out what it means, who creates it and who uses it. Later that year, I joined SocialCops. More than open data, it was the lure of tackling a few public policy challenges through data and tech. I got a chance to collaborate with the government on a few pilot projects and to work with a few public databases. That stint taught me about the challenges we face in creating, curating and maintaining public datasets. Meanwhile, a few friends from DataKind got together and collaborated with the Centre for Budget and Governance Accountability to create OpenBudgetsIndia, an open source platform that aims to increase access to central and state budget documents by converting them from PDFs into machine-readable formats. It was a massive effort from the team, and we’ve managed to sustain it and keep it updated: adding more datasets, improving the documentation, making budget data easier to understand, working with state governments and advocating for them to release their budget documents in machine-readable formats. Now we’re trying to add a layer of public procurement data through our recent collaboration with the team at Open Contracting Partnership.

Our DataKind experience taught us a lot about working with CSOs (civil society organisations) that want to harness the potential of open datasets. We also learned that models such as DataKind work well for small pilot projects, when an organisation wants to test an idea, build something small and experiment. But long-term projects such as OpenBudgetsIndia (OBI) need continuous effort from a dedicated team. Drawing on that experience, our network and the support of our partners, a couple of the volunteers who started and led OBI registered an organisation called CivicDataLab in 2018. I joined them the same year to collaborate on similar initiatives aimed at better access to information.

Open Data collaborations with CivicDataLab

Our work on OBI opened a lot of doors for similar work in other domains. One such area was the Law and Justice ecosystem, which rarely seems to come up when we talk about open data.
Over the last couple of years, I’ve had a chance to learn more about this ecosystem and understand the feasibility of opening up datasets in the sector. A few projects we’ve been involved with since then:

We’ve collaborated with HAQ: Centre for Child Rights to analyse the implementation of the POCSO Act in Assam, Delhi and Haryana using data from the e-Courts database. After a round of data audits, we’ll be able to open up metadata for around 35K POCSO cases, with details on the trials, hearings, courts, judges and other fields available on e-Courts.
We collaborated with the Internet Freedom Foundation on building the Zombie Tracker to collect, report and analyse cases registered in 11 states under Section 66A of the Information Technology Act, even though the Supreme Court struck down the provision in 2015.
We’re building a platform, which we call the Justice Hub, to crowdsource all the important datasets needed to advance data-driven judicial reforms. With this project, our objective is to create datasets, standards and resources that increase access to legal data. We’re also working with the community to build capacity for reporting on and analysing these open datasets. After OBI, this will be our second open data platform built on top of CKAN, one of the first tools developed by the Open Knowledge Foundation.

CKAN has continued to evolve and today is the leading open data platform software in the world used by governments including the US and UK to publish millions of public datasets.

Beyond Law and Justice and Public Finance, we’re doing similar work in other areas such as Urban Development and Education. My colleague Arpit (aka Cube) has summarised our efforts to open up more datasets in this article.

Recently, we released the State of the FOSS report, for which we worked with the community to chronicle the journey of the free and open source software movement in India and to look at the challenges faced by the open source community. The report suggests a roadmap for the different stakeholders to advance the FOSS movement in India in key areas such as Education, Business and Government.

The Journey - Road Ahead ..

This is where I am today, and here are a few things I look forward to:

A lot more to learn and unlearn.
To collaborate with people who are not convinced by the ideas of openness and the possibilities of opening up data.
To do my bit towards creating Indian versions of open data and open knowledge frameworks.
To get inspired by communities who are trying their best to open up more datasets.
To work with our partners on building capacity for reproducible research.
To share some more inspiring examples by the next Open Data Day 🙂

Do I have the answers now?

Not sure. Open data is hard, opinionated and political. There are issues of equity, access, availability and use in ways that were never intended. What I know for sure is this: the ideas of open data are not limited to techies or the government, and they never should be. It all starts with asking questions, finding and validating facts, reporting any inconsistencies we find and sharing whatever we’ve learned along the way. It’s as simple as that. Maybe what drives all of this is curiosity, and the realisation that even the smallest change matters in moving the collective forward. I’ll keep searching for answers.

If you’ve stayed with me this far, I wish you the best on this journey. We’re all allies. Let’s keep supporting and holding one another. The road is bumpy and the destination far, but we’ll enjoy the ride, one station at a time.