Classification of Data for Data Science, Statistics and Machine Learning

Data science, statistics, and machine learning all rely on data. Before we can start working in any of these fields, we have to get acquainted with the types of data we usually encounter. Different types of data require different types of tests. As such, classification of data is key to working with it.

We can classify data in two main ways based on its –

  1. Type
  2. Measurement Level.

The First Classification of Data – Types of Data:

Let’s start with the first classification of data. The types of data we can have are –

  1. Categorical Data and
  2. Numerical Data

1. Categorical Data

Categorical data describes categories or groups. One example is car brands like Mercedes, BMW, and Audi. They show different categories. Another instance is answers to yes-or-no questions. If I ask questions like “Are you currently enrolled in a university?” or “Do you own a car?”, yes and no would be the two groups of answers that can be obtained. This is categorical data.
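As a minimal sketch of what working with categorical data can look like, the Python snippet below counts how many answers fall into each group. The list of answers is made up for illustration and is not taken from any real survey.

```python
from collections import Counter

# Hypothetical answers to "Do you own a car?" (made-up data for illustration).
answers = ["yes", "no", "yes", "yes", "no"]

# Categorical values are labels, so we count group sizes instead of averaging.
counts = Counter(answers)
print(counts["yes"], counts["no"])
```

Counting is one of the few summaries that make sense here, because there is no meaningful way to add or average “yes” and “no”.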

2. Numerical Data

Numerical data, on the other hand, represents numbers, as its name suggests. It is further divided into two subsets –

  1. Discrete
  2. Continuous

2.1 Discrete Numerical Data

Discrete data can usually be counted in a finite manner. A good example is the number of children that you would want to have. Even if you don’t know exactly how many, you are absolutely sure that the value will be an integer such as 0, 1, 2, or even 10. Another instance is marks in an exam. You may get 60, 99, 87.5, or 100. What is important for a variable to be numerical discrete is that you can imagine each member of the data set. If you know that exam scores range from 0 to 100 in steps of 0.5, you can list every possible score that can be obtained.

Another way to think of discrete data is that it changes only in steps. The number of children can only change in steps of 1. Exam scores may only change in steps of 0.5 or 0.25. All the values in the data set can only be multiples of these steps. There can’t be values in between two steps.
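The “multiples of a step” idea can be sketched in a few lines of Python. The helper name `is_on_step_grid` is hypothetical, and the scores are the ones mentioned above; `Fraction` is used so that the check is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def is_on_step_grid(values, step):
    """Return True if every value is a whole multiple of `step`."""
    step = Fraction(str(step))
    # A value sits on the grid when value / step is a whole number.
    return all((Fraction(str(v)) / step).denominator == 1 for v in values)

exam_scores = [60, 99, 87.5, 100]
print(is_on_step_grid(exam_scores, 0.5))  # every score is a multiple of 0.5
print(is_on_step_grid([60.3], 0.5))       # 60.3 falls between two steps
```

If a value fails this check for every sensible step size, that is a hint the variable may be better treated as continuous.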

2.2 Continuous Numerical Data

Continuous data, on the other hand, is the opposite of discrete data. It can take infinitely many values and is impossible to count. For instance, your weight can take on every value in some range. Let’s dig a bit deeper into this.

You get on the scale and the screen shows 60 kilograms or 60.59 kilograms, but this is just an approximation. If you gain 0.01 kg, the figure on the scale is unlikely to change, but your new weight will be 60.6 kg. Now think about sweating: every drop of sweat reduces your weight by the weight of that drop.

But a scale is unlikely to capture that change, and the process of losing and gaining weight occurs all the time. Your exact weight is a continuous variable. It can take on an infinite number of values, no matter how many digits there are after the decimal point.

Or, you can think about temperature. It can take on any value in a range: 30°C, 30.1°C, or 30.001°C. It only depends on how precisely you want to measure it.
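To illustrate how a continuous quantity is only ever reported to some precision, here is a small Python sketch. The value `exact_weight_kg` is a made-up stand-in for an “exact” weight, which in reality could never be written down in full.

```python
# Hypothetical "exact" weight in kilograms (made-up value for illustration).
exact_weight_kg = 60.593871

# The same quantity reported at 0 to 4 decimal places.
readings = [round(exact_weight_kg, digits) for digits in range(5)]
for digits, reading in enumerate(readings):
    print(f"{digits} decimals: {reading}")
```

Each extra decimal place gives a different reading of the same underlying quantity, which is what separates a continuous variable from one that moves in fixed steps.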

To sum it up, your weight can vary by incomprehensibly small amounts and is continuous, while the number of children you want to have is directly understandable and is discrete. Alright, these were the types of data. Next, we will explore the levels of measurement.

The Second Classification of Data – Levels of Measurement:

Let us now move on to the other classification of data – levels of measurement.

These can be split into two groups:

  1. Qualitative data
  2. Quantitative data

1. Qualitative Data

Qualitative data can be nominal or ordinal.

1.1 Nominal

Nominal values are like the categories we talked about earlier, such as Mercedes, BMW, or Audi, or the four seasons: winter, spring, summer, and autumn. They are not numbers and cannot be put in any order.

1.2 Ordinal

Ordinal data, on the other hand, consists of groups and categories that follow a strict order. Imagine you have been asked to rate your lunch and the options are Disgusting, Unappetizing, Neutral, Tasty, and Delicious. Although we have words and not numbers, it is obvious that these preferences are ordered from negative to positive.

Another example could be ratings for a question: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree. Here again, we have words and not numbers, but these preferences are ordered from positive to negative.

Thus the data is qualitative.
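One way to make the order of ordinal labels explicit in code is to map each label to its position on the scale. The sketch below uses the lunch ratings from the example; the list of collected `ratings` is made up, and the names `LUNCH_SCALE` and `rank` are chosen here for illustration.

```python
# The ordinal scale, listed from most negative to most positive.
LUNCH_SCALE = ["Disgusting", "Unappetizing", "Neutral", "Tasty", "Delicious"]

# Map each label to its position on the scale.
rank = {label: position for position, label in enumerate(LUNCH_SCALE)}

# Hypothetical ratings collected from five people (made-up data).
ratings = ["Tasty", "Neutral", "Delicious", "Disgusting", "Tasty"]

# Sorting by rank respects the order of the categories, not the alphabet.
sorted_ratings = sorted(ratings, key=rank.__getitem__)
median_rating = sorted_ratings[len(sorted_ratings) // 2]
print(sorted_ratings)
print(median_rating)
```

The ranks only encode order. The distance between Neutral and Tasty is not known to equal the distance between Tasty and Delicious, so a median is a safer summary than a mean here.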

2. Quantitative Data

Okay. So what about quantitative variables? They are split into two groups:

  1. Interval
  2. Ratio

Intervals and ratios are both represented by numbers but have one major difference – ratios have a true zero and intervals don’t. For example, length is a ratio variable. You all know that 0 inches or 0 feet means no length.

With temperature, however, we have a different story. It is usually an interval variable. Let me explain. Temperature is usually expressed in Celsius or Fahrenheit, and both are interval scales. 0 degrees Celsius or 0 degrees Fahrenheit does not mean “no temperature”, as absolute zero is actually −273.15°C, or −459.67°F. However, we can easily say that 80 degrees Fahrenheit is 20 degrees less than 100 degrees Fahrenheit.

In the case of interval variables, the difference is meaningful, but the zero is not.

Continuing this temperature example, there is another scale: Kelvin. On it, the absolute minimum temperature is 0 kelvin. Therefore, if the temperature is stated in kelvins, the variable will be a ratio.
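The difference between interval and ratio scales can be sketched with a short Python example. The two temperatures are made-up readings, and `celsius_to_kelvin` is a small helper written for this illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15

# Two hypothetical temperature readings in Celsius (made-up values).
cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales, so they are meaningful on an interval scale.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios disagree: only the Kelvin ratio is measured from a true zero.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k
print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 4))
```

The Celsius ratio of 2 suggests that 20°C is “twice as hot” as 10°C, but the Kelvin ratio shows the increase is only a few percent. That is why ratios should be reserved for variables with a true zero.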

So, numbers like 2, 3, 10, 10.5, pi, etc., can be either interval or ratio values, and you have to be careful with the context you are using them in.

Now, combining all of this information, we have the following levels of measurement:

  1. Qualitative: Nominal and Ordinal
  2. Quantitative: Interval and Ratio

All right, we have quickly gone through the types of data and the measurement levels. Stick around to see the types of graphs that are used on a daily basis when performing statistical analysis.
