Logistic Regression Explained: Why It's Better Than Linear Regression for Classification
Machine learning (Part 22)-
📚Chapter: 5 -Logistic Regression
If you want to read more articles about Machine Learning, don’t forget to stay tuned :) click here.
Introduction
Classification problems are a common type of machine learning problem, where the goal is to predict discrete values for a variable. In this and the next few tutorials, we will dive into classification problems, where the variable y that you want to predict is discrete valued, and develop an algorithm called logistic regression, which is one of the most popular and most widely used learning algorithms today.
Sections
What is a Classification Problem?
Regression and Classification
The Need for Logistic Regression
Conclusion
Section 1- What is a Classification Problem?
Let’s start by understanding what a classification problem entails. In a classification problem, the variable we want to predict, denoted as Y, takes on discrete values. To illustrate this, let’s consider a few examples:
Math Details
Given a set of n known example pairs {(xᵢ, yᵢ) : i = 1, 2, …, n}, where xᵢ ∈ ℝᵈ and yᵢ ∈ {±1}, we want to learn a (binary) “classification rule” h : ℝᵈ → {±1}, so that h(x) = y on most unseen (future) examples (x, y). Throughout we will call xᵢ the feature vector of the i-th example, and yᵢ the (binary) label of the i-th example. Together, the known example pairs {(xᵢ, yᵢ) : i = 1, 2, …, n} are called the training set, with n being its size and d being its dimension. An unseen future example (x, y) will be called a test example. If we have a set of test examples, together they will be called a test set.
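The setup above can be made concrete with a small sketch in Python. The tumor-size data and the threshold value below are made up purely for illustration: a classification rule h maps a feature vector in ℝᵈ to a label in {±1}, and at prediction time we only see x, never y.

```python
import numpy as np

# Toy training set: n = 4 examples in d = 1 dimension (tumor size),
# with labels in {-1, +1} (benign vs. malignant). Values are made up.
X_train = np.array([[1.0], [2.0], [6.0], [7.0]])
y_train = np.array([-1, -1, +1, +1])

def h(x, threshold=4.0):
    """A simple classification rule h : R^d -> {-1, +1}.

    Predicts +1 when the (single) feature exceeds a threshold.
    The threshold here is hand-picked, not learned."""
    return 1 if x[0] > threshold else -1

# A "test example" (x, y): at prediction time we only have access to x.
x_test = np.array([6.5])
print(h(x_test))  # -> 1
```

A learning algorithm's job is to choose the rule h from the training set; here the rule is hard-coded only to show the interface.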
The challenge of binary classification lies in two aspects:
• On a test example (x, y), we actually only have access to x but not the label y. It is our job to predict y, hopefully correctly most of the time.
• The test example x can be (very) different from any of the training examples {xᵢ : i = 1, …, n}, so we cannot expect naive memorization to work.
Here (Figure 1) are some examples of classification problems. Earlier, we talked about email spam classification. Another example is classifying online transactions: if you have a website that sells things and you want to know whether a particular transaction is fraudulent, for instance whether someone is using a stolen credit card or a stolen password, that is another classification problem. Earlier we also talked about classifying tumors as malignant (cancerous) or benign.

In all of these problems, the variable we are trying to predict is a variable Y that takes on two values, either zero or one: spam or not spam, fraudulent or not fraudulent, malignant or benign. The class we denote with 0 is also called the negative class, and the class we denote with 1 is called the positive class. So 0 may denote the benign tumor and 1, the positive class, may denote a malignant tumor. The assignment of the two classes to 0 and 1 is somewhat arbitrary and doesn’t really matter. But often there is an intuition that the negative class conveys the absence of something, like the absence of a malignant tumor, whereas the positive class conveys the presence of something we may be looking for.

For now, we’re going to start with classification problems with just two classes, zero and one. Later on, we’ll talk about multi-class problems as well, where the variable Y may take on, say, four values: zero, one, two, and three. That is called a multi-class classification problem, but for the next few tutorials, let’s start with the two-class, or binary, classification problem; we’ll worry about the multi-class setting later.
Section 2-Regression and Classification
Before we delve into logistic regression, let’s briefly discuss linear regression and its limitations when applied to classification problems. In linear regression, we aim to fit a straight line to the data, typically using a training set. While linear regression is primarily used for predicting continuous values, it can be tempting to apply it to classification tasks as well.
Here (Figure 2) is an example of a training set for a classification task: classifying a tumor as malignant or benign. Notice that malignancy takes on only two values: zero (no) or one (yes). One thing we could do, given this training set, is apply an algorithm we already know, linear regression, to this data set and just try to fit a straight line to the data.
If you take this training set and fit a straight line to it, you might get a hypothesis that looks like h(x) = θᵀx. To make predictions, one thing you could try is thresholding the classifier output at 0.5, that is, at the value 0.5 on the vertical axis: if the hypothesis outputs a value greater than or equal to 0.5, you predict y = 1; if it is less than 0.5, you predict y = 0.

Let’s see what happens when we do that. Take 0.5 as the threshold. Using linear regression this way, everything to the right of that point we end up predicting as the positive class, because the output values are greater than 0.5 on the vertical axis, and everything to the left we end up predicting as the negative class. In this particular example, linear regression appears to do something reasonable, even though this is a classification task.

But now let’s change the problem a bit. Let me extend the horizontal axis and say we get one more training example way out to the right. Notice that this additional training example doesn’t actually change anything conceptually: looking at the training set, it is still pretty clear what a good hypothesis is. Everything to the right of some point should be predicted as positive, and everything to the left as negative, because from this training set it looks like all tumors larger than a certain size are malignant, and all tumors smaller than that are not malignant, at least for this training set.
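The fit-then-threshold idea can be sketched numerically. This is a minimal illustration with made-up tumor sizes; `np.polyfit` with degree 1 performs the least-squares straight-line fit.

```python
import numpy as np

# Made-up training set: tumor size vs. malignancy (0 = benign, 1 = malignant).
sizes = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
labels = np.array([0, 0, 0, 1, 1, 1])

# Fit h(x) = theta0 + theta1 * x by least squares (linear regression).
# np.polyfit returns coefficients highest-degree first.
theta1, theta0 = np.polyfit(sizes, labels, deg=1)

def predict(x):
    """Threshold the linear hypothesis at 0.5."""
    return 1 if theta0 + theta1 * x >= 0.5 else 0

print([predict(x) for x in sizes])  # -> [0, 0, 0, 1, 1, 1]
```

On this particular data set the thresholded line classifies every training example correctly, matching the "linear regression got lucky" case described above.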
Section 3- The Need for Logistic Regression
But once we’ve added that extra example, if you now run linear regression you get a different straight-line fit to the data. If you then threshold this hypothesis at 0.5, the threshold ends up further to the right, so that everything to the right of that point is predicted as positive and everything to the left as negative. This seems like a pretty bad thing for linear regression to have done. Given our positive and negative examples, it’s pretty clear we should be separating the two classes somewhere around the original boundary. But by adding one example way out to the right, an example that really isn’t giving us any new information (it should be no surprise to the learning algorithm that a tumor that large turns out to be malignant), linear regression changes its straight-line fit from the magenta line to the blue line, and gives us a worse hypothesis.

So applying linear regression to a classification problem often isn’t a great idea. In the first instance, before I added the extra training example, linear regression was just getting lucky: it happened to give a hypothesis that worked well for that particular data set. If you apply linear regression to a classification data set, you might get lucky, but often you won’t, so I wouldn’t use linear regression for classification problems.
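The shift of the decision threshold can be reproduced numerically. Again the data are made up, and the single outlier at size 50 is illustrative only; the point is that one uninformative far-right example drags the 0.5 crossing point past genuinely malignant training examples.

```python
import numpy as np

def boundary(sizes, labels):
    """Fit h(x) = theta0 + theta1 * x by least squares and return the
    tumor size at which the fitted line crosses the 0.5 threshold."""
    theta1, theta0 = np.polyfit(sizes, labels, deg=1)
    return (0.5 - theta0) / theta1

sizes = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
print(round(boundary(np.array(sizes), np.array(labels)), 2))  # -> 4.5

# One extra, obviously malignant tumor far to the right adds no new
# information, yet it drags the threshold rightward, past the malignant
# training example at size 6.
sizes.append(50.0)
labels.append(1)
print(round(boundary(np.array(sizes), np.array(labels)), 2))  # -> 6.2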
Here is one other funny thing about what would happen if we were to use linear regression for a classification problem. For classification, we know that Y is either zero or one, but a linear-regression hypothesis can output values much larger than one or less than zero, even if all of the training examples have labels Y = 0 or Y = 1. It seems kind of strange that, even though we know the labels should be zero or one, the algorithm can output values far outside that range.

So what we’ll do in the next few tutorials is develop an algorithm called logistic regression, which has the property that its predictions are always between zero and one; they never become bigger than one or less than zero. By the way, logistic regression is a classification algorithm, and that is how we will use it. It can be confusing that the term “regression” appears in its name, but that is just the name it was given for historical reasons, so don’t be confused by it. Logistic regression is a classification algorithm that we apply to settings where the label Y is discrete valued, e.g., 0 or 1. Hopefully you now know why using linear regression for a classification problem isn’t a good idea. In the next tutorial we’ll start working out the details of the logistic regression algorithm.
So why is linear regression a poor choice for classification problems? One key issue is that linear regression allows predictions outside the desired range of 0 and 1, even though we know the true labels should be 0 or 1. This creates a discrepancy between the algorithm’s output and the expected values.
To address this problem, we introduce logistic regression. Logistic regression is a classification algorithm that ensures the predictions always fall between 0 and 1, avoiding values larger than 1 or smaller than 0. Despite the term “regression” in its name, logistic regression is indeed a classification algorithm, not a regression algorithm.
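The mechanism that keeps logistic regression's output inside (0, 1) is the sigmoid (logistic) function, which later tutorials develop in detail. A minimal sketch:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Unlike a linear hypothesis, the output never leaves (0, 1),
# so it can be read as an estimated probability that y = 1.
print(sigmoid(0.0))    # -> 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

However extreme the input, the output stays strictly between 0 and 1, which is exactly the property linear regression lacked.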
Conclusion
In this blog, we have explored the concept of classification problems and the limitations of using linear regression for such tasks. We have learned that logistic regression is a more suitable algorithm for classification problems, as it ensures predictions fall within the desired range of 0 and 1. Logistic regression, despite its name, is a classification algorithm that allows us to tackle binary classification problems effectively. In future blogs, we will delve deeper into the logistic regression algorithm and discuss its applications in multi-class classification problems.
Please follow and 👏 subscribe to Coursesteach to see the latest updates on this story.
If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research.
Stay tuned for our upcoming articles, in which we research topics end to end and explore specific areas of Machine Learning in more detail!
Remember, learning is a continuous process. So keep learning and keep creating and Sharing with others!💻✌️
Note: if you are a Machine Learning expert and have some good suggestions to improve this blog, please share them in the comments and contribute.
If you want more updates about Machine Learning and want to contribute, then follow and enroll in the following course:
👉Course: Machine Learning (ML)
Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com
Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have any suggestions for improvement of any Coursesteach content, feel free to contact and follow us.
Together, let’s make this the best AI learning Community! 🚀
Source
1- Machine Learning — Andrew Ng