August 18, 2023



JSON Introduction For Data Science


Do you work with data from web applications? Interact with REST APIs? Have data that is hierarchical and structured? Then you've probably heard about or used JSON.

So why exactly is JSON so popular? JSON (JavaScript Object Notation) has several advantages as seen below.

JSON Advantages

JSON originated from JavaScript object literals as defined by the ECMAScript Programming Language Standard. The ECMAScript standard facilitated interoperability of web pages across different web browsers. Consequently, JSON quickly became the de facto data interchange format of the web, helping web applications talk to each other.

As we will see below, a large part of JSON's success can be attributed to its simplicity and flexibility. JSON is a subset of JavaScript but excludes assignment and invocation. JSON has a small set of formatting rules for the portable representation of structured data.

JSON Simple Example


JSON supports four primitive (i.e. basic) types:

  • strings - e.g. "Predinfer"
  • numbers - e.g. 3.14
  • booleans - only the lowercase literal words true and false are supported (written without quotes)
  • null - only the lowercase literal word null is supported (written without quotes)

JSON supports two structured (i.e. complex, user-defined) types:

  • object - unordered collection of zero or more name-value pairs
  • array - an ordered sequence of zero or more values

What makes the object type very useful is that its name-value pairs adhere to simple rules:

  • name - must be a string
  • value - may be any of the supported types, i.e. string, number, boolean, null, object or array

This means you can compose your own objects from any of the supported types. The following is a single JSON object whose values cover all the supported types in the order string, number, boolean, null, object and array, i.e. lines 2-7 below.

{
  "Company":  "Predinfer",
  "Year Established": 2022,
  "Has Website":  true,
  "No. of shares": null,
  "Address":   {"City": "Boston","State": "MA", "Zip": 21140, "Country": "US"},
  "Metadata": ["EIN","Annual Report", 314884]
}

You can quickly confirm that the above JSON is valid by pasting it into an online validator.
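
The same check can be done locally. As a minimal sketch, assuming Python and its standard-library json module (neither appears in the original example), parsing the text both validates it and shows which Python type each JSON type is mapped to. The helper name is_valid_json is hypothetical.

```python
import json

# The JSON object from the example above, held in a Python string.
text = """
{
  "Company": "Predinfer",
  "Year Established": 2022,
  "Has Website": true,
  "No. of shares": null,
  "Address": {"City": "Boston", "State": "MA", "Zip": 21140, "Country": "US"},
  "Metadata": ["EIN", "Annual Report", 314884]
}
"""


def is_valid_json(candidate: str) -> bool:
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


record = json.loads(text)

# Record the Python type that each value was parsed into.
python_types = {name: type(value).__name__ for name, value in record.items()}
for name, type_name in python_types.items():
    print(f"{name!r}: {type_name}")
```

With this parser, objects become dict, arrays become list, and null becomes None. Text that uses single quotes instead of double quotes, for example, fails the check.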

JSON Syntax


Since JSON supports only two complex types, i.e. object and array, it needs just six structural characters {}[]:, to specify an object or an array:

  • {} - anything within curly brackets is an object
{
  "Company": "Predinfer",
  "Year Established": 2022
}
  • : - a name and its value are separated by a colon
  • , - name-value pairs are separated from each other by a comma
  • Note - names within an object should be unique; when a name is repeated, parsers may disagree on which value to keep
{
  "Company": "Predinfer",
  "Year Established": 2022
}
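
Parsers differ in what they do when a name is repeated. The sketch below assumes Python's standard-library json module: by default it raises no error and keeps the last value, while the object_pairs_hook parameter lets you reject duplicates yourself. The function name reject_duplicates is hypothetical.

```python
import json

duplicated = '{"Company": "Predinfer", "Company": "Other"}'

# Python's default parser raises no error; the last value silently wins.
last_wins = json.loads(duplicated)
print(last_wins)


def reject_duplicates(pairs):
    """Build a dict from (name, value) pairs, failing on a repeated name."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name in object: {name!r}")
        result[name] = value
    return result


try:
    json.loads(duplicated, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError as error:
    duplicate_detected = True
    print(error)
```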
  • [] - anything within square brackets is an array
  • , - as with the object above, values are separated from each other by a comma
  • Note - unlike some other formats, there is no requirement that the values in an array be of the same type (e.g. Metadata below has both strings and numbers in the array)
{
  "Metadata": ["EIN", "Annual Report", 314884]
}
  • Insignificant whitespace (which improves readability without impacting meaning) is allowed before or after any of the six structural characters i.e. {}[]:,
  • A common feature associated with JSON is Pretty Print which adds insignificant whitespace to improve readability
  • So our example from above with Pretty Print enabled would be transformed as follows:
{
	"Company": "Predinfer",
	"Year Established": 2022,
	"Has Website": true,
	"No. of shares": null,
	"Address": {
		"City": "Boston",
		"State": "MA",
		"Zip": 21140,
		"Country": "US"
	},
	"Metadata": ["EIN", "Annual Report", 314884]
}
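
Pretty printing is usually an option of the serializer rather than a separate tool. As a sketch, assuming Python's standard-library json module and a shortened version of the example record, the indent parameter adds the insignificant whitespace and the separators parameter removes it:

```python
import json

# Shortened version of the example record (some fields omitted).
record = {
    "Company": "Predinfer",
    "Year Established": 2022,
    "Address": {"City": "Boston", "State": "MA"},
}

# indent adds insignificant whitespace for readability.
pretty = json.dumps(record, indent=2)

# Separators without spaces give the most compact form.
compact = json.dumps(record, separators=(",", ":"))

print(pretty)
print(compact)

# Both texts parse back to the same data: the whitespace carries no meaning.
same_data = json.loads(pretty) == json.loads(compact) == record
```

The compact form suits transmission and storage; the pretty form suits reading and diffing.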
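  • Numbers in JSON are written in decimal (base 10): an integer part, optionally followed by a fractional part and/or an exponent part. JSON itself does not distinguish integers from floating-point numbers, and many parsers store every number as a double-precision value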
  • Exceptions worth highlighting are that leading zeros, Infinity & NaN are NOT permitted
  • The following array has valid numbers:
[9999, -100, 3.14, 3e3, 3E-4]
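
Not every parser enforces these number rules in the same way. The sketch below assumes Python's standard-library json module, which rejects leading zeros but accepts NaN and Infinity by default as an extension. The helpers parses and forbid_constant are hypothetical names.

```python
import json

# The numbers from the array above parse without error.
valid = json.loads("[9999, -100, 3.14, 3e3, 3E-4]")
print(valid)


def parses(text: str) -> bool:
    """Return True if the text is accepted by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A number with a leading zero is rejected.
leading_zero_ok = parses("[0123]")


def forbid_constant(name: str):
    """Called by the parser for NaN, Infinity and -Infinity."""
    raise ValueError(f"{name} is not valid JSON")


# Reading: parse_constant lets us refuse the non-standard constants.
try:
    json.loads("[NaN]", parse_constant=forbid_constant)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Writing: allow_nan=False makes dumps raise instead of emitting NaN.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_written = True
except ValueError:
    nan_written = False
```

Setting both options is a reasonable default when the output must be read by stricter parsers in other languages.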
  • A string in JSON must be enclosed in double quotes and may contain any Unicode characters
  • Three kinds of characters cannot appear unescaped inside a string: the double quote, the backslash and the control characters (U+0000 through U+001F). They are written with a backslash escape, e.g. \b (backspace), \t (tab) or \n (newline). If you instead want such a sequence to appear "literally", i.e. as a backslash followed by a letter rather than as a control character, escape the backslash itself as follows:
["\\b\\t\\n"]
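
The difference between an escaped control character and an escaped backslash is easiest to see by parsing both. This is a sketch assuming Python's standard-library json module; raw string literals (the r prefix) are used so that Python itself does not interpret the backslashes.

```python
import json

# Writing: a real tab and newline in a Python string are escaped automatically.
encoded = json.dumps("col1\tcol2\n")
print(encoded)  # "col1\tcol2\n"  (backslash escapes, no raw control characters)

# Reading the array above: each \\ decodes to one backslash, so the result is
# the six-character text \b\t\n, not three control characters.
literal = json.loads(r'["\\b\\t\\n"]')[0]

# Reading single-backslash escapes yields the control characters themselves.
controls = json.loads(r'["\b\t\n"]')[0]

# A raw, unescaped tab inside a JSON string is rejected.
try:
    json.loads('["a\tb"]')
    raw_tab_accepted = True
except json.JSONDecodeError:
    raw_tab_accepted = False

print(len(literal), len(controls), raw_tab_accepted)
```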
  • To achieve maximum interoperability, JSON text must be UTF-8 encoded
  • The MIME type for JSON is application/json
  • With respect to JSON parsers, in practice you may encounter limitations imposed on:
    • Size of text
    • Maximum depth of nesting
    • Range and precision of numbers
    • Length and contents of strings
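
The limit on number precision can be illustrated with an integer just above 2^53, the point where double-precision values stop representing every integer exactly. This sketch assumes Python's standard-library json module, which keeps integers exact, and uses its parse_int parameter to emulate a parser that stores every number as a double.

```python
import json

big = "9007199254740993"  # 2**53 + 1

# Python parses JSON integers into arbitrary-precision ints, so nothing is lost.
exact = json.loads(big)

# Emulate a double-only parser by routing integer literals through float.
as_double = json.loads(big, parse_int=float)

print(exact, as_double)
precision_lost = int(as_double) != exact
```

A common workaround for large identifiers is to transmit them as strings rather than numbers.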

JSON Complex Example


Now that we've covered all the basics, you can see that JSON is minimal, simple and flexible. These qualities make it powerful and explain why it has become such a popular format.

In the real world, you will likely encounter complex, nested JSON text like the example below. However, keep in mind that any piece of JSON text adheres to the simple rules we've seen above. The complexity comes from the hierarchical representation of real-world data, which aims to preserve the relationships and associations between data points.

Below is an example of a document from a "Document Database", which is a type of nonrelational database that stores and queries JSON documents. The bookstore database below has four primary attributes:

  1. name
  2. access permission
  3. location
  4. inventory
  • The attributes "name" and "location" contain simple strings as values.
  • The attributes "access permission" and "inventory" are nested further, demonstrating the hierarchical nature of JSON.
  • The "inventory" attribute consists of an array of three objects. Each object represents a unique book available in the bookstore and has its own schema.
{
	"Database": {
		"name": "Bookstore",
		"access permission": [{
			"Administrator": ["Read", "Insert", "Delete", "Modify"],
			"Manager": "Insert",
			"Sales Representative": ["Read", "Modify"]
		}],
		"location": "New York, USA",
		"inventory": [{
				"title": "The Great Gatsby",
				"author": "F. Scott Fitzgerald",
				"genre": "Fiction",
				"price": 12.99,
				"availability": true,
				"# in stock": 8,
				"Find in Store": {
					"Isle": 34,
					"Side": "Right",
					"Stack #": 11
				}
			},
			{
				"title": "To Kill a Mockingbird",
				"author": "Harper Lee",
				"genre": "Fiction",
				"price": 10.50,
				"availability": false,
				"expected availability date": "End of 2023",
				"Find in Store": {
					"Isle": "",
					"Side": "",
					"Stack #": ""
				}
			},
			{
				"title": "1984",
				"author": "George Orwell",
				"genre": "Science Fiction",
				"price": 9.75,
				"availability": true,
				"discount code": "My15Off",
				"Find in Store": {
					"Isle": 17,
					"Side": "Left",
					"Stack #": 10
				}
			}
		]
	}
}
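
For analysis, nested documents like this are often flattened into one row per record. The sketch below assumes Python's standard-library json module and an abbreviated copy of the bookstore document. The flatten helper and the dot-separated column names are illustrative choices, not part of the original.

```python
import json

# Abbreviated copy of the bookstore document above (some fields omitted).
text = """
{
  "Database": {
    "name": "Bookstore",
    "location": "New York, USA",
    "inventory": [
      {"title": "The Great Gatsby", "price": 12.99, "availability": true,
       "Find in Store": {"Side": "Right", "Stack #": 11}},
      {"title": "To Kill a Mockingbird", "price": 10.50, "availability": false,
       "expected availability date": "End of 2023",
       "Find in Store": {"Side": "", "Stack #": ""}},
      {"title": "1984", "price": 9.75, "availability": true,
       "discount code": "My15Off",
       "Find in Store": {"Side": "Left", "Stack #": 10}}
    ]
  }
}
"""


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into one level, joining names with a dot."""
    flat = {}
    for name, value in obj.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{key}."))
        else:
            flat[key] = value
    return flat


inventory = json.loads(text)["Database"]["inventory"]

# One flat row per book. The books do not share a schema, so optional
# fields are read with .get(), which returns None when a field is absent.
rows = [flatten(book) for book in inventory]
available_titles = [row["title"] for row in rows if row["availability"]]
discounts = [row.get("discount code") for row in rows]

print(available_titles)
print(discounts)
```

Libraries such as pandas offer a similar operation (json_normalize) that produces a table directly. The hand-written version here only makes the idea visible.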