
March 4, 2016

How well does your model explain your data? R-squared is a useful statistic for answering this question. In this episode we explore how it applies to the problem of valuing a house. Aspects like the number of bedrooms go a long way in explaining why different houses have different prices. Some of the variance in price can be explained by a model, and some cannot, often because it comes from factors that are hard to measure directly. R-squared is the ratio of the explained variance to the total variance. It's not a measure of accuracy; it's a measure of the explanatory power of one's model.
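To make the ratio concrete, here is a minimal sketch that computes R-squared by hand as one minus the unexplained variation over the total variation. The helper name `r_squared` and the bedroom and price numbers are made up for illustration; they are not from the episode or from real housing data.

```python
import numpy as np

def r_squared(y, y_hat):
    """Return 1 - SS_res / SS_tot, the share of variation in y explained by y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data (assumption): bedrooms vs. price in thousands.
bedrooms = np.array([1, 2, 3, 4, 5], dtype=float)
price = np.array([110.0, 150.0, 210.0, 240.0, 300.0])

# Least-squares line with an intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(bedrooms, price, 1)
predicted = slope * bedrooms + intercept

print("R-squared:", r_squared(price, predicted))
```

A model that always predicts the mean price scores 0 under this definition, and a model that reproduces every price exactly scores 1.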

In [1]:

```
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
```

In [6]:

```
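# Noise-free relationship: y = 0.5 * x on 100 evenly spaced points in [0, 1)
x = 1.0 * np.arange(100) / 100
y = 0.5 * np.arange(100) / 100
# Add a very small amount of uniform noise, in [-0.005, 0.005)
yy = y + (np.random.rand(100)-.5) * .01
# No constant column is added to x, so the line is fit through the origin
# and statsmodels reports the uncentered R-squared
f = sm.OLS(yy, x).fit()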
plt.scatter(x, yy)
plt.plot(x, f.predict(x))
plt.title('R-squared = ' + str(f.rsquared))
plt.xlim(0, 1)
plt.ylim(-1, 1)
plt.show()
```

In [7]:

```
# Same x and y as the earlier cell, with ten times more noise, in [-0.05, 0.05)
yy = y + (np.random.rand(100)-.5) * .1
f = sm.OLS(yy, x).fit()
plt.plot(x, f.predict(x))
plt.scatter(x, yy)
plt.title('R-squared = ' + str(f.rsquared))
plt.xlim(0, 1)
plt.ylim(-1, 1)
plt.show()
```

In [10]:

```
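# Same x and y as the earlier cells; the noise, in [-0.5, 0.5), now spans as wide a range as the signal
yy = y + (np.random.rand(100)-.5)
f = sm.OLS(yy, x).fit()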
plt.plot(x, f.predict(x))
plt.scatter(x, yy)
plt.title('R-squared = ' + str(f.rsquared))
plt.xlim(0, 1)
plt.ylim(-1, 1)
plt.show()
```

In [11]:

```
# Pure uniform noise in [-1, 1): yy no longer depends on x at all
yy = np.random.rand(100)*2 - 1
f = sm.OLS(yy, x).fit()
plt.plot(x, f.predict(x))
plt.scatter(x, yy)
plt.title('R-squared = ' + str(f.rsquared))
plt.xlim(0, 1)
plt.ylim(-1, 1)
plt.show()
```
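The four cells above differ only in how much noise is added to the same underlying line. A rough sketch that collects them into one loop follows. It rests on a few assumptions that are not in the original notebook: a fixed random seed for repeatability, a hypothetical helper `fit_r_squared`, and a line fit with an intercept using `np.polyfit`. Because the notebook fits through the origin with `sm.OLS(yy, x)`, the numbers printed here will not match the plot titles exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption), so runs are repeatable
x = np.arange(100) / 100
y = 0.5 * x                     # the noise-free relationship used in the cells

def fit_r_squared(x, yy):
    """Fit a line with an intercept and return the centered R-squared."""
    slope, intercept = np.polyfit(x, yy, 1)
    residuals = yy - (slope * x + intercept)
    return 1.0 - np.sum(residuals ** 2) / np.sum((yy - yy.mean()) ** 2)

results = {}
for scale in (0.01, 0.1, 1.0):
    # Uniform noise centered on zero, scaled as in the first three cells
    yy = y + (rng.random(100) - 0.5) * scale
    results[scale] = fit_r_squared(x, yy)
    print(f"noise scale {scale}: R-squared = {results[scale]:.3f}")

# Last cell: no signal at all, only uniform noise in [-1, 1)
pure_noise = rng.random(100) * 2 - 1
results["pure noise"] = fit_r_squared(x, pure_noise)
print(f"pure noise: R-squared = {results['pure noise']:.3f}")
```

The slope of the underlying line never changes; only the share of variance that the line can account for does, which is why R-squared falls as the noise grows.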
