Data Analytics and Machine Learning

Problem Set 2

Question 1: Fixed effects and within transformations

You will find a modified version of the imports-85.csv file (imports85_modified.csv) attached

to this assignment. Again, make sure that all continuous variables of interest are numeric.

1. Regress fuel efficiency (city.mpg) on horsepower without fixed effects. What would

you conclude based on that regression?

2. Repeat the same regression but this time, add a fixed effect for number of cylinders

being “two” or “four”. What would you conclude based on this new regression? What

do you think drives the results in part 1?

3. (Within transformation) Now obtain the mean city.mpg and horsepower for each

group. Use these group means to demean horsepower and city.mpg. Run the same

regression you ran in part 1. Are the results different? Are the results obtained here

different from the results in part 2? What does this tell you about the relation

between fixed effect regressions and within transformations?
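
The comparison asked for in parts 1 to 3 can be tried out on synthetic data before turning to the real file. The following is a minimal sketch, not a solution: the data-generating process, the variable names (x, y, group) and the helper functions are all assumptions made for illustration, and nothing is read from imports85_modified.csv.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, NOT the imports-85 file):
# two groups whose levels of x and y both differ.
n = 200
group = rng.integers(0, 2, size=n)                  # 0/1 group label
x = rng.normal(100 + 40 * group, 10, size=n)        # regressor
y = 30 - 0.05 * x - 8 * group + rng.normal(0, 1, size=n)


def ols(X, y):
    """Return OLS coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def demean_by_group(v, g):
    """Subtract the group mean of v from each observation."""
    _, inv = np.unique(g, return_inverse=True)
    means = np.bincount(inv, weights=v) / np.bincount(inv)
    return v - means[inv]


# (1) Pooled regression: intercept and x only.
beta_pooled = ols(np.column_stack([np.ones(n), x]), y)[1]

# (2) Fixed-effect regression: add a dummy for the group.
beta_fe = ols(np.column_stack([np.ones(n), x, group]), y)[1]

# (3) Within transformation: demean both variables by group,
#     then rerun the regression from (1) on the demeaned data.
x_dm = demean_by_group(x, group)
y_dm = demean_by_group(y, group)
beta_within = ols(np.column_stack([np.ones(n), x_dm]), y_dm)[1]

print(f"pooled slope: {beta_pooled:.4f}")
print(f"dummy slope:  {beta_fe:.4f}")
print(f"within slope: {beta_within:.4f}")
```

In this sketch the group variable is binary; with more groups, the dummy regression needs one indicator per group (minus one if an intercept is kept), while demean_by_group works unchanged.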

Question 2: On marginal significance and trading strategy improvements

You come up with a signal of stock outperformance: log total asset growth. You realize that

your professor has conveniently already coded up this variable for you in the dataset

StockRetAcct_insample.dta. The variable is called “lnInv”.

1. Using the Fama-MacBeth regression approach, what are the average return, standard

deviation and Sharpe ratio of the trading strategy implied by using only an intercept

and lnInv on the right-hand side of the regressions?

2. What is the analytical expression for the portfolio weights in this case? (I’m looking for

a formula)

3. You worry that there is industry-related noise associated with the characteristic lnInv

and want to clean up your trading strategy with the goal of reducing exposure to

unpriced industry risks. What regressions do you run? Report mean, standard

deviation, and Sharpe ratio of the ‘cleaned-up’ trading strategy.

4. As in the class notes, plot the cumulative returns to the simple and the ‘cleaned-up’

trading strategies based on your new signal, lnInv. Make sure both trading strategies

result in portfolios with a 15% return standard deviation.
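
The mechanics behind parts 1 and 4 can be sketched on a synthetic panel. This is an illustration under stated assumptions, not the class-notes implementation: the panel is simulated rather than loaded from StockRetAcct_insample.dta, each period is taken to be one year (so nothing is annualized), ret[t] is assumed to be the return realized after signal[t] is observed, and cumulative returns are formed as a simple running sum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic panel (hypothetical): T periods, N firms, one characteristic.
T, N = 40, 300
signal = rng.normal(0.0, 1.0, size=(T, N))                     # stand-in for lnInv
ret = 0.01 - 0.02 * signal + rng.normal(0.0, 0.3, size=(T, N))  # next-period returns

slopes = np.empty(T)
for t in range(T):
    # Cross-sectional regression of returns on an intercept and the signal.
    X = np.column_stack([np.ones(N), signal[t]])
    # Rows of (X'X)^{-1} X' are portfolio weights; row 1 belongs to the signal.
    W = np.linalg.solve(X.T @ X, X.T)
    w = W[1]
    # The slope coefficient equals the return on the portfolio with weights w.
    slopes[t] = w @ ret[t]

# Time-series statistics of the slope portfolio (zero investment, so its
# return is treated as an excess return when forming the Sharpe ratio).
mean_ret = slopes.mean()
std_ret = slopes.std(ddof=1)
sharpe = mean_ret / std_ret

# Rescale so the strategy has a 15% return standard deviation.
target_vol = 0.15
scaled = slopes * (target_vol / std_ret)
cum_ret = np.cumsum(scaled)   # assumption: summed, not compounded

print(f"mean={mean_ret:.4f} std={std_ret:.4f} sharpe={sharpe:.3f}")
```

Rescaling changes the mean and standard deviation by the same factor, so the Sharpe ratio is unaffected; only the plotted cumulative return path changes scale. Extra right-hand-side variables, such as industry dummies, enter by adding columns to X while still reading off the row of W that belongs to the signal.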

Question 3: Predicting medium to long-run firm-level return variance

There are many return volatility models, such as GARCH. These work best at shorter

horizons. As an alternative, we will explore a panel regression approach to predicting firm-level return variance. The data set StockRetAcct_insample.dta has annual realized variance

(rv), calculated as the sum of squared daily returns to each firm, each year.

Run panel forecasting regressions to forecast firm-level one-year ahead rv along the lines

of what we did with lnROE in class.

1. Try with and without industry and year fixed effects, with and without clustering of

standard errors. Discuss which specification makes most sense to you. In particular,

discuss the effect of a year fixed effect. What is the intuition for the impact of this fixed

effect?

2. Also try forecasting at the 5-year horizon (rv in 5 years). How do the results change?

Can we predict return variance 5 years ahead? Is the 5-year lagged rv significant, or

are other variables more important?

3. What are the benefits of the panel approach, versus simply running one regression for

each firm? What are the potential costs?
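
To make the specification choices in part 1 concrete, here is a minimal sketch of a one-regressor panel forecasting regression with and without a year fixed effect, and with and without firm-clustered standard errors. Everything in it is an assumption for illustration: the panel is simulated (rv_like is a stand-in series, not rv from StockRetAcct_insample.dta), the fixed effect is absorbed by demeaning within year, and the standard errors use the plain sandwich formula with no small-sample correction.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic balanced panel (hypothetical): N firms, T+1 years.
N, T = 200, 12
year_shock = rng.normal(0.0, 0.5, size=T + 1)    # common level in each year
firm_level = rng.normal(0.0, 0.5, size=(N, 1))   # persistent firm component
rv_like = year_shock + firm_level + rng.normal(0.0, 0.3, size=(N, T + 1))

# One-year-ahead forecasting pairs, flattened firm by firm.
y = rv_like[:, 1:].ravel()     # value in year t+1
x = rv_like[:, :-1].ravel()    # value in year t
firm = np.repeat(np.arange(N), T)
year = np.tile(np.arange(T), N)


def demean(v, g):
    """Subtract the mean of v within each group g (integer labels from 0)."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]


def slope_and_se(x, y, cluster):
    """OLS slope on already-demeaned data with a cluster-robust sandwich SE."""
    b = (x @ y) / (x @ x)
    resid = y - b * x
    scores = np.bincount(cluster, weights=x * resid)   # sum of x*e per cluster
    se = np.sqrt((scores ** 2).sum()) / (x @ x)
    return b, se


one_group = np.zeros(x.size, dtype=int)   # single group = plain intercept
own_cluster = np.arange(x.size)           # each obs its own cluster = White SE

# No fixed effects: remove only the overall mean.
x0, y0 = demean(x, one_group), demean(y, one_group)
b_pool, se_pool_white = slope_and_se(x0, y0, own_cluster)
_, se_pool_firm = slope_and_se(x0, y0, firm)

# Year fixed effect: remove the cross-sectional mean in each year.
x1, y1 = demean(x, year), demean(y, year)
b_fe, se_fe_white = slope_and_se(x1, y1, own_cluster)
_, se_fe_firm = slope_and_se(x1, y1, firm)

print(f"pooled:  b={b_pool:.3f} white={se_pool_white:.4f} firm={se_pool_firm:.4f}")
print(f"year FE: b={b_fe:.3f} white={se_fe_white:.4f} firm={se_fe_firm:.4f}")
```

Passing year instead of firm as the cluster argument gives year-clustered standard errors, and replacing the slicing with rv_like[:, 5:] and rv_like[:, :-5] (with T - 5 observations per firm) gives the 5-year-ahead analogue.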
