Latest Stories

Background

Streamlit is a popular open-source framework that pitches itself as a pure-Python way to build and share data web apps in minutes, with no front-end experience needed. Snowflake, a cloud computing company, acquired Streamlit in March 2022 for $800 million. A closer look at this acquisition gives us some insights about the merits of the framework, the future direction…


1. Biomimicry

Biomimicry is the practice of imitating life. It involves looking to nature for inspiration and direction to solve complex human problems. So why does this work? Well, if you think about it, nature has been constantly evolving ever since life first appeared on earth some 3.8 billion years ago. Can there be a better or more proven source of inspiration than nature? Now, one of the most…


When presented with any problem, it is natural to go head-on into problem-solving mode. However, this is not always the best strategy. Here's why: You may be solving a problem that has already been solved efficiently. You may not be aware of the second-order effects and side effects of your solution. You may be solving the wrong problem. Your thinking may be subject to cognitive…


So why exactly is JSON so popular? JSON (JavaScript Object Notation) has several advantages, as seen below. JSON originated from JavaScript object literals as defined by the ECMAScript Programming Language Standard. The ECMAScript standard facilitated interoperability of web pages across different web browsers. Consequently, JSON quickly became the de facto data interchange format of the web…


The International Data Corporation (IDC) forecasts 175 zettabytes of data will be created by 2025. One zettabyte equals one sextillion bytes—that is, 10^21 (1,000,000,000,000,000,000,000) bytes—which is a trillion gigabytes. Can you pause for a second and at least attempt to quantify or visualize this in your mind? As a comparison point, the Library of Congress (the largest library in the world…


Ever since its November 2022 release, ChatGPT has taken the world by storm. Given the rapid rate of change in this space, it can be difficult to keep up with iterative developments—especially for those new to the field or out of touch with recent progress. While there have been many important contributions that have made ChatGPT a reality, there are essentially three papers published in the last…


The key thing to observe in the image above is that the i.e. we iterate through the entire PPDAC cycle as well as within each step of the cycle. The framework was developed by R. J. MacKay and R. W. Oldford. Most recently it was popularized by David Spiegelhalter in his book "The Art of Statistics: Learning from Data".

Problem

As always, the first step is to understand and define the problem you…


Gestalt principles can be ranked from strongest to weakest influence as follows: Enclosure > Connection > Proximity > Similarity.

Similarity

The similarity principle uses a common feature such as . In the following generic example, take a moment to closely observe how "Shape" and "Color" are used to create two groups. As you can see from the last example, the use of shape…


This post aims to answer some common questions related to the open-source Parquet file format, such as: What are Parquet files? What are the benefits of using the Parquet file format? How does the Parquet file format compare with other popular formats such as CSV and JSON? Should you use Parquet files for Data Science?

What are Parquet files?

The Parquet file format offers very efficient compression…


Well, now that we have your attention, we hope to introduce you to the fascinating field of social science, which might offer some insights about the central question in this post - . While the field of social science is very broad and has been shaped by contributions from a wide array of academic disciplines, in this post we would like to draw attention to the work of two psychologists. In…


Summarized below is a list of ten simple rules published by a group of senior statisticians. The nice part about these rules is their emphasis on non-technical issues, which are easy to understand even for those with no formal background in statistics. The hard part is how to apply these "simple rules" to your use case. Unfortunately, there are no easy answers here. One…
