Data Analytics and Machine Learning Problem Set 2


Question 1: Fixed effects and within transformations
You will find a modified version of the imports-85.csv file (imports85_modified.csv) attached
to this assignment. Again, make sure that all continuous variables of interest are numeric.
1. Regress fuel efficiency (city.mpg) on horsepower without fixed effects. What would
you conclude based on that regression?
2. Repeat the same regression but this time, add a fixed effect for number of cylinders
being “two” or “four”. What would you conclude based on this new regression? What
do you think drives the results in part 1?
3. (Within transformation) Now obtain the mean city.mpg and horsepower for each
group. Use these group means to demean horsepower and city.mpg. Run the same
regression you ran in part 1. Are the results different? Are the results obtained here
different from the results in part 2? What does this tell you about the relation
between fixed effect regressions and within transformations?
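The comparison in parts 1 to 3 can be sketched on synthetic data. This is a minimal illustration, not a solution using imports85_modified.csv: the two groups, the variable names, and all numbers below are made-up assumptions, and the group label only stands in for whatever grouping the fixed effect uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the imports85_modified.csv values):
# two groups that differ in both their intercept and their average x.
n = 200
group = rng.integers(0, 2, size=n)             # hypothetical 0/1 group label
x = rng.normal(100 + 40 * group, 15, size=n)   # regressor (think: horsepower)
y = 40 - 10 * group - 0.1 * x + rng.normal(0, 1, size=n)  # outcome (think: city.mpg)


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def demean_by_group(v, group):
    """Subtract each group's mean from its own observations."""
    out = v.astype(float).copy()
    for g in np.unique(group):
        mask = group == g
        out[mask] -= v[mask].mean()
    return out


# (a) Pooled regression: intercept and slope, no fixed effects.
slope_pooled = ols(np.column_stack([np.ones(n), x]), y)[1]

# (b) Fixed effects regression: one dummy per group plus the slope.
dummies = np.column_stack([(group == g).astype(float) for g in (0, 1)])
slope_fe = ols(np.column_stack([dummies, x]), y)[2]

# (c) Within transformation: demean x and y by group, then regress.
x_w = demean_by_group(x, group)
y_w = demean_by_group(y, group)
slope_within = ols(x_w[:, None], y_w)[0]

print(f"pooled: {slope_pooled:.4f}  FE: {slope_fe:.4f}  within: {slope_within:.4f}")
```

Adding an intercept to regression (c) would leave the slope unchanged, because the demeaned variables already have mean zero. Standard errors from (c) need a degrees-of-freedom adjustment for the estimated group means, which this sketch does not attempt.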
Question 2: On marginal significance and trading strategy
You come up with a signal of stock outperformance: log total asset growth. You realize that
your professor has conveniently already coded up this variable for you in the dataset
StockRetAcct_insample.dta. The variable is called “lnInv”.
1. Using the Fama-MacBeth regression approach, what are the average return, standard
deviation and Sharpe ratio of the trading strategy implied by using only an intercept
and lnInv on the right-hand side in the regressions?
2. What is the analytical expression for the portfolio weights in this case? (I’m looking for
a formula)
3. You worry that there is industry-related noise associated with the characteristic lnInv
and want to clean up your trading strategy with the goal of reducing exposure to
unpriced industry risks. What regressions do you run? Report mean, standard
deviation, and Sharpe ratio of the ‘cleaned-up’ trading strategy.
4. As in the class notes, plot the cumulative returns to the simple and the ‘cleaned-up’
trading strategies based on your new signal, lnInv. Make sure both trading strategies
result in portfolios with a 15% return standard deviation.
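The mechanics of a Fama-MacBeth strategy can be sketched as follows. This is an illustration on a synthetic panel, not on StockRetAcct_insample.dta: the array names, the balanced panel, the annual frequency, and every number are assumptions. Each period runs one cross-sectional regression of next-period returns on an intercept and the signal, and the time series of slopes is treated as the return on a zero-investment portfolio.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in panel (NOT StockRetAcct_insample.dta): T periods, N firms.
T, N = 30, 500
signal = rng.normal(0.0, 0.2, size=(T, N))  # hypothetical lagged characteristic
# Next-period returns; the 0.05 loading on the signal is an arbitrary choice.
ret = 0.08 + 0.05 * signal + rng.normal(0.0, 0.3, size=(T, N))

slopes = np.empty(T)     # Fama-MacBeth slope in each period
port_ret = np.empty(T)   # the same number, rebuilt as a portfolio return
for t in range(T):
    X = np.column_stack([np.ones(N), signal[t]])
    coef, *_ = np.linalg.lstsq(X, ret[t], rcond=None)
    slopes[t] = coef[1]

    # With only an intercept and one characteristic, the OLS slope is linear
    # in returns: weights are the demeaned signal over its sum of squares.
    xd = signal[t] - signal[t].mean()
    w = xd / (xd @ xd)
    port_ret[t] = w @ ret[t]

# Strategy statistics from the time series of slopes.
mean_ret = slopes.mean()
std_ret = slopes.std(ddof=1)
sharpe = mean_ret / std_ret   # zero-investment portfolio, so no risk-free rate is subtracted

# Rescale to a 15% return standard deviation, then accumulate.
target_vol = 0.15
scaled = slopes * (target_vol / std_ret)
cum_ret = np.cumsum(scaled)   # simple sum; np.cumprod(1 + scaled) would compound instead

print(f"mean={mean_ret:.4f} std={std_ret:.4f} Sharpe={sharpe:.3f}")
```

Rescaling multiplies mean and standard deviation by the same constant, so the Sharpe ratio is unchanged; it only makes two strategies comparable on one plot. Adding industry dummies to the matrix X would be one way to set up the 'cleaned-up' regression, though that variant is not shown here.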
Question 3: Predicting medium to long-run firm-level return variance
There are many return volatility models, such as GARCH. These work best at shorter
horizons. As an alternative, we will explore a panel regression approach to predicting firm-level
return variance. The data set StockRetAcct_insample.dta has annual realized variance
(rv), calculated as the sum of squared daily returns to each firm, each year.
Run panel forecasting regressions to forecast firm-level one-year ahead rv along the lines
of what we did with lnROE in class.
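The construction of rv and of a one-year-ahead forecasting regression can be sketched as below. Everything here is synthetic and hypothetical (the simulated daily returns, the firm and year volatility factors, the balanced panel); in the actual data set rv is already provided. The year fixed effect is implemented by demeaning within each year, and clustered standard errors are not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily returns (NOT the course data): N firms, T years, D days per year.
N, T, D = 50, 12, 252
firm_vol = rng.uniform(0.01, 0.03, size=(N, 1, 1))  # persistent firm-level daily volatility
year_vol = rng.uniform(0.8, 1.5, size=(1, T, 1))    # market-wide volatility level per year
daily_ret = rng.normal(0.0, 1.0, size=(N, T, D)) * firm_vol * year_vol

# Annual realized variance: sum of squared daily returns for each firm-year.
rv = (daily_ret ** 2).sum(axis=2)   # shape (N, T)

# One-year-ahead forecasting pairs: predictor rv in year t, target rv in year t+1.
x = rv[:, :-1]
y = rv[:, 1:]


def slope(x, y):
    """Slope from a univariate OLS regression with an intercept."""
    xd = x - x.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / (xd ** 2).sum()


# Pooled panel regression without fixed effects.
b_pooled = slope(x.ravel(), y.ravel())

# Year fixed effects via the within transformation: demean within each year,
# so the slope is identified from cross-sectional differences only.
x_fe = x - x.mean(axis=0, keepdims=True)
y_fe = y - y.mean(axis=0, keepdims=True)
b_year_fe = (x_fe * y_fe).sum() / (x_fe ** 2).sum()

print(f"pooled slope: {b_pooled:.3f}   with year FE: {b_year_fe:.3f}")
```

For the 5-year horizon, the same layout would pair rv in year t with rv in year t+5, that is, x = rv[:, :-5] and y = rv[:, 5:].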
1. Try with and without industry and year fixed effects, with and without clustering of
standard errors. Discuss which specification makes most sense to you. In particular,
discuss the effect of a year fixed effect. What is the intuition for the impact of this fixed
effect?
2. Also try forecasting at the 5-year horizon (rv in 5 years). How do the results change?
Can we predict return variance 5 years ahead? Is the 5-year lagged rv significant, or
are other variables more important?
3. What are the benefits of the panel approach, versus simply running one regression for
each firm? What are the potential costs?
