

What does a correlation coefficient tell you?

Correlation coefficients summarize data and help you compare results between studies.

Summarizing data

A correlation coefficient is a descriptive statistic. That means that it summarizes sample data without letting you infer anything about the population. A correlation coefficient is a bivariate statistic when it summarizes the relationship between two variables, and it's a multivariate statistic when you have more than two variables.

If your correlation coefficient is based on sample data, you'll need an inferential statistic if you want to generalize your results to the population. You can use an F test or a t test to calculate a test statistic that tells you the statistical significance of your finding.

Comparing studies

A correlation coefficient is also an effect size measure, which tells you the practical significance of a result.

Correlation coefficients are unit-free, which makes it possible to directly compare coefficients between studies.

Using a correlation coefficient

In correlational research, you investigate whether changes in one variable are associated with changes in other variables.

Correlational research example: You investigate whether standardized scores from high school are related to academic grades in college. You predict that there's a positive correlation: higher SAT scores are associated with higher college GPAs while lower SAT scores are associated with lower college GPAs.

After data collection, you can visualize your data with a scatterplot by plotting one variable on the x-axis and the other on the y-axis. It doesn't matter which variable you place on either axis.

Visually inspect your plot for a pattern and decide whether there is a linear or non-linear pattern between variables. A linear pattern means you can fit a straight line of best fit between the data points, while a non-linear or curvilinear pattern can take all sorts of different shapes, such as a U-shape or a line with a curve.

Visual inspection example: You gather a sample of 5,000 college graduates and survey them on their high school SAT scores and college GPAs. You visualize the data in a scatterplot to check for a linear pattern.
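To make the descriptive and inferential steps above concrete, here is a minimal sketch that computes a Pearson correlation coefficient and the t statistic commonly used to test it, t = r * sqrt((n - 2) / (1 - r^2)). The helper names pearson_r and t_statistic are hypothetical, and the SAT and GPA values are synthetic numbers generated only for illustration; they are not data from any study.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / math.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def t_statistic(r, n):
    """t statistic for H0: population correlation is 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Synthetic, made-up sample (assumption): 200 graduates with a mild positive link.
rng = np.random.default_rng(0)
sat = rng.normal(1100, 150, size=200)
gpa = 1.0 + 0.002 * sat + rng.normal(0, 0.4, size=200)

r = pearson_r(sat, gpa)
t = t_statistic(r, len(sat))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(sat) - 2}")

# Unit-free: rescaling a variable (e.g. SAT in different units) leaves r unchanged.
r_rescaled = pearson_r(sat / 100.0, gpa)
```

The t value is then compared against a t distribution with n - 2 degrees of freedom to judge statistical significance. The last line illustrates why coefficients can be compared between studies: changing the units of a variable does not change r.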
Trendline for a scatter plot is the simple regression line. The seaborn library has a function (regplot) that does it in one function call. You can even draw the confidence interval with the ci argument (I turned it off here).

    import seaborn as sns

    # ci=None skips the confidence band around the fitted line
    sns.regplot(x=x_data, y=y_data, ci=None)

Simple regression coefficients have a closed-form solution, so you can also solve explicitly for them and plot the regression line along with the scatter plot.

If x_data and y_data are lists:

    x_mean = sum(x_data) / len(x_data)
    y_mean = sum(y_data) / len(y_data)
    covar = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x_data, y_data))
    x_var = sum((xi - x_mean)**2 for xi in x_data)
    beta = covar / x_var               # slope
    alpha = y_mean - beta * x_mean     # intercept
    y_hat = [alpha + beta * xi for xi in x_data]

If x_data and y_data are numpy arrays:

    import numpy as np

    x_mean, y_mean = np.mean(x_data), np.mean(y_data)
    beta = np.sum((x_data - x_mean) * (y_data - y_mean)) / np.sum((x_data - x_mean)**2)
    alpha = y_mean - beta * x_mean
    y_hat = alpha + beta * x_data

Then just draw the two plots:

    import matplotlib.pyplot as plt

    plt.scatter(x_data, y_data)       # the data points
    plt.plot(x_data, y_hat, 'r-')     # the fitted trendline
    plt.show()
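As a sanity check on the closed-form solution above, the sketch below compares it with numpy's own degree-1 least-squares fit, np.polyfit. The function name fit_line and the five data points are hypothetical and made up for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form simple regression: returns (intercept alpha, slope beta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    beta = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    alpha = y_mean - beta * x_mean
    return float(alpha), float(beta)


# Made-up points that lie close to a straight line (assumption, not real data).
x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_data = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

alpha, beta = fit_line(x_data, y_data)

# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x_data, y_data, 1)

print(f"closed form: alpha={alpha:.4f}, beta={beta:.4f}")
print(f"np.polyfit:  alpha={intercept:.4f}, beta={slope:.4f}")
```

The two approaches should agree up to floating-point error. The explicit version is handy when you want to avoid a dependency or see exactly where the slope and intercept come from.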
