Our documentation is dynamic and powered by Jupyter Book to improve the discoverability and dissemination of knowledge. It is designed to help you get started on your project quickly and effectively.
Inspired by literate programming, we have gathered and prepared documentation alongside code snippets and examples from data partners’ products and services to help you get your project started.
Throughout this document, you will find information about:
- Quick links to data partners’ updates, documentation and tutorials
- How to access datasets and services
- Information about Terms and Conditions
This documentation is NOT open source!
All data, code snippets, examples and documentation you will find throughout this document are governed by the Data Partnership’s Master Data License Agreement and must not be shared with persons outside the consortium, including external contributors at universities and contractors.
When your team is ready to publish and disseminate results, note that some data partners require their review and approval beforehand. For more information, please refer to each Data Partner’s Terms and Conditions.
A glossary of common terms used throughout the Development Data Partnership.
- AWS (Amazon Web Services)
AWS is a cloud services platform. Its services are widely used to store, organize, process and analyze many types of data.
- AWS SageMaker
SageMaker is one of the AWS services you can use to analyze your datasets. It was designed specifically to build, train and deploy machine learning models.
- Application / Application Program / Application Software
An application is a program that runs on your computer, such as Notepad, PowerPoint, Excel and many others. Applications also include web browsers (Internet Explorer, Google Chrome, Mozilla Firefox), email clients, and web applications (Facebook, Twitter, Google Docs).
- APIs (Application Programming Interfaces)
An API is essentially a channel, governed by a set of rules, through which two applications can talk to each other. One application uses this channel to send queries and requests to another application, and the API returns the information or data that was requested.
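To make the idea concrete, here is a minimal sketch using only Python's standard library. A tiny local web server plays the "provider" application and exposes a hypothetical `/datasets/<id>` endpoint; a client function follows that rule to request one record and decodes the JSON reply. The endpoint, the catalog entry and the function names are invented for illustration and do not correspond to any data partner's real API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical catalog served by the "provider" application (illustrative values only).
CATALOG = {
    "example-001": {"provider": "Example Partner", "rows": 1200},
}


class CatalogAPI(BaseHTTPRequestHandler):
    """Provider side: the API's rule is GET /datasets/<id>, answered with JSON."""

    def do_GET(self):
        prefix = "/datasets/"
        dataset_id = self.path[len(prefix):] if self.path.startswith(prefix) else None
        if dataset_id not in CATALOG:
            self.send_error(404, "Unknown dataset")
            return
        body = json.dumps({"id": dataset_id, **CATALOG[dataset_id]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_dataset_info(base_url, dataset_id):
    """Client side: send a request that follows the API's rules and decode the reply."""
    with urlopen(f"{base_url}/datasets/{dataset_id}") as response:
        return json.loads(response.read().decode("utf-8"))


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), CatalogAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

info = fetch_dataset_info(base_url, "example-001")
print(info)

server.shutdown()
server.server_close()
```

A real data partner's API would run on the partner's own servers and typically require credentials, but the client side would look much like `fetch_dataset_info`, pointed at the URL given in that partner's documentation.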
- Data Partner
A Data Partner, Data Provider or Dataset Provider is an organization that provides data and/or Metadata under the Master Data License Agreement.
- Development Data Partnership
The Development Data Partnership (or “Data Partnership”) is a partnership between international organizations, created to promote the use of third-party data in research and international development.
- Development Data
Development Data is data about countries that can be used for reference or analysis in the development process, typically in sectors such as the economy and finance, poverty, education, health, public administration, private sector development, agriculture, land use, gender, climate change, the environment, infrastructure and trade, among others. Development Data does not contain Personally Identifiable Information (PII).
- Derivative Works
Derivative Works are works based on or derived from one or more existing works. For the purposes of this document, derivative works include derived data and analytical products, including but not limited to research papers, analytical studies, data visualizations, derived indicators, aggregated and/or derived databases, and other outputs (e.g. publications, CDs, mobile device applications, blogs, online data products) created using the Dataset(s) and Metadata in question.
- Geographic Information System (GIS)
Geographic Information System (GIS) software is designed to capture, manage, analyze, and display all forms of geographically referenced information. GIS can show many kinds of data on one map. This enables researchers and other data users to more easily see, analyze, and understand patterns and relationships.
- Geospatial Data
Geospatial data (also called GIS data or geodata) includes explicit geographic positioning information, such as a road network from a GIS or a geo-referenced satellite image. Geospatial data may also include attribute data that describes the features found in the Data.
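One common way to store geospatial data is GeoJSON (RFC 7946), where each feature pairs a geometry (the positioning information) with properties (the attribute data). The sketch below builds a single hypothetical road segment with Python's standard library; the road name, coordinates and attribute values are made up for illustration.

```python
import json

# A hypothetical road segment as a GeoJSON Feature (values are illustrative only).
road = {
    "type": "Feature",
    # Geographic positioning: GeoJSON orders each coordinate as [longitude, latitude].
    "geometry": {
        "type": "LineString",
        "coordinates": [[36.80, -1.28], [36.82, -1.29], [36.85, -1.30]],
    },
    # Attribute data describing the feature.
    "properties": {"name": "Example Road", "surface": "paved", "lanes": 2},
}

feature_collection = {"type": "FeatureCollection", "features": [road]}
geojson_text = json.dumps(feature_collection, indent=2)

# Reading it back: separate *where* each feature is from *what* it is.
parsed = json.loads(geojson_text)
for feature in parsed["features"]:
    lon, lat = feature["geometry"]["coordinates"][0]
    print(f"{feature['properties']['name']} starts at lon={lon}, lat={lat}")
```

GIS software and libraries can read files in this format directly and draw the geometry on a map while keeping the properties available for analysis.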
- Git
Git is a distributed version control system that tracks changes to any set of files. It is mainly used by programmers and developers who work together on a project and need to review the changes their teammates made to a specific codebase.
- GitHub
GitHub is a code hosting platform for version control with Git. It also provides sharing and publishing services and a social networking environment for data scientists and programmers.
- Metadata
Metadata are defined as ‘data about data’. They help users understand the meaning of data or provide useful information about its provenance or licensing status.
- Microdata
Microdata are unit-level data obtained from sample surveys, censuses, and administrative systems. They provide information about characteristics of individual people or entities such as households, business enterprises, facilities, farms or even geographical areas such as villages or towns.
- Personally Identifiable Information (PII)
Any information that permits the identity of an individual to be directly or indirectly inferred, or any information which is linked or linkable, or may be attributed, to that individual.
- Relational Database (RD)
A relational database is a database whose data is organized into tables according to a predefined structure (a schema). Like a spreadsheet, each table manages data through columns and rows. However, it is not the same as the spreadsheets we are used to: in a relational database you can only build organized, structured tables defined by a schema, whereas a spreadsheet lacks a definite, standard structure for organizing your table.
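A minimal sketch with Python's built-in `sqlite3` module (SQLite is a lightweight relational database) illustrates the difference: the table's columns and rules are declared up front, and a row that breaks them is rejected instead of being stored, as it could be in a spreadsheet. The `districts` table, its columns and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary, in-memory database

# The schema fixes the columns and the rules every row must follow.
conn.execute(
    """
    CREATE TABLE districts (
        code        TEXT PRIMARY KEY,                     -- unique identifier per row
        name        TEXT NOT NULL,                        -- must always be present
        households  INTEGER NOT NULL CHECK (households >= 0)
    )
    """
)

conn.execute("INSERT INTO districts VALUES (?, ?, ?)", ("D01", "North", 5400))
conn.execute("INSERT INTO districts VALUES (?, ?, ?)", ("D02", "South", 3100))

rejected = []
bad_rows = [
    ("D01", "Duplicate", 10),  # breaks PRIMARY KEY: code already used
    ("D03", None, 200),        # breaks NOT NULL on name
    ("D04", "East", -5),       # breaks CHECK: negative household count
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO districts VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        rejected.append((row[0], str(err)))

stored = conn.execute("SELECT COUNT(*) FROM districts").fetchone()[0]
print(f"{stored} rows stored, {len(rejected)} rows rejected")
for code, reason in rejected:
    print(code, "->", reason)
conn.close()
```

Only the two valid rows end up in the table; each rejected row comes back with the name of the rule it broke.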
- Relational Database Management System (RDBMS)
An RDBMS is an application (a program) in which you can create, manage or alter a relational database.
- Schema
A schema describes how the data is organized and structured. It sets out, in a logical order, how the computer can read, access and return the information in a dataset. How your schema is defined can make a large difference to how efficiently the computer responds to and processes the information you request.
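Indexes are one part of a schema that directly affects this. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `trips` table filled with generated rows, and asks SQLite's query planner (via `EXPLAIN QUERY PLAN`) how it would run the same query before and after an index is added. The exact wording of the plan differs between SQLite versions, so treat the printed text as indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, region TEXT, km REAL)")
# Generated, illustrative rows: alternating regions, distances from 0 to 49 km.
conn.executemany(
    "INSERT INTO trips (region, km) VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i % 50)) for i in range(10_000)],
)

query = "SELECT AVG(km) FROM trips WHERE region = ?"


def plan(sql, params):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the readable detail


plan_before = plan(query, ("north",))
conn.execute("CREATE INDEX idx_trips_region ON trips (region)")  # change the schema
plan_after = plan(query, ("north",))

print("without index:", plan_before)
print("with index:   ", plan_after)
conn.close()
```

Without the index, SQLite has to scan every row of the table; with it, the planner can look up the matching rows directly. On a table this small the time difference is negligible, but it grows with the size of the table.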
- SQL (Structured Query Language)
Just like Python, R, Stata or C#, SQL is a programming language, but one specifically designed for managing and communicating with relational databases through a relational database management system (RDBMS).
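As a minimal sketch, the following Python snippet sends SQL statements to SQLite, a small RDBMS that ships with Python's standard `sqlite3` module. It defines two related tables, inserts a few made-up rows and asks for totals per region. The table names, columns and values are hypothetical; a query like this one would look much the same on other RDBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite acts as the RDBMS here
conn.executescript(
    """
    CREATE TABLE regions (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE schools (school_id INTEGER PRIMARY KEY,
                          region_id INTEGER REFERENCES regions(region_id),
                          students  INTEGER NOT NULL);
    INSERT INTO regions VALUES (1, 'Coastal'), (2, 'Highland');
    INSERT INTO schools VALUES (10, 1, 300), (11, 1, 450), (12, 2, 500);
    """
)

# SQL describes *what* we want (schools and students per region);
# the RDBMS decides *how* to retrieve it.
query = """
    SELECT r.name, COUNT(s.school_id) AS schools, SUM(s.students) AS students
    FROM regions AS r
    JOIN schools AS s ON s.region_id = r.region_id
    GROUP BY r.name
    ORDER BY r.name
"""
results = conn.execute(query).fetchall()
for name, n_schools, n_students in results:
    print(f"{name}: {n_schools} schools, {n_students} students")
conn.close()
```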
A compilation of common questions about the Development Data Partnership.
- Why Data Partnership?
Public policy and provision of public infrastructure and services are heavily dependent on data – higher quality, timely data translates into more effective sector and program prioritization, design, implementation, monitoring, and evaluation.
Increasingly, the private sector is generating data that could be used to complement traditional public-sector data collection methods. Through public-private collaboration, in addition to generating more timely and relevant data for decision making, entirely new public good use cases could be discovered and implemented.