1. How to specify a model?

Asad Zaman commented>

[1] What is the goal in creating a regression model? Ans: We want to capture the underlying
causal mechanisms which relate the variables in our model. This is why a regression of C on Y
is correct, but Y on C is not, because causality runs from income to consumption and not the
other way around.

[2] Misspecification of causal relationships is VERY common (like running Y on C), but there is
no easy way to detect and fix this problem -- all of the complex and confusing discussion about
exogeneity and endogeneity is an attempt to tackle this problem -- but all of these attempts are
failures, because they lack understanding of the root of the problem. This is why the best
definitions of exogeneity in textbooks and econometric articles are just -- plain and simple --
WRONG.

[3] This problem is too complex to handle here, so let us assume it away. ASSUME you have
managed to get the right dependent variable Y on the LHS and all variables on the RHS are
exogenous. Now the main issue is: IS your model correctly specified? Have you included all
relevant variables? If you miss an important variable, then all your results will be wrong. For
example, if you run Pakistani Consumption on Guatemala GNP, you will get a very good regression
with high R-squared, significant t-statistics, right signs and everything. The Guatemala GNP
will be significant because you have omitted an important variable, Pakistan GNP, from the
equation.

[4] If you have a static model, there is a chance of dynamic misspecification -- that is, maybe
the past is relevant, but you don't know about this, because you have not included any lagged
variables. If you are omitting an important lagged variable X(t-1) or Y(t-1) from your
regression, then your equation is misspecified and the results cannot be trusted -- like
Guatemala GNP, irrelevant variables may appear to be important.

[5] The problem to test for is DYNAMIC MISSPECIFICATION: Is any lagged regressor significant?
The way to do this is to put all lagged regressors into the model and do a joint F test for the
significance of all of them. If this F test fails to reject the null hypothesis that all of
their coefficients are jointly zero, this means that there is no strong evidence for dynamic
misspecification -- your model has NOT omitted a significant lagged effect.

[6] Serial correlation is just ONE special type of dynamic misspecification, which is included
as a VERY special case of the general dynamic misspecification which we tested by the F test.
If the model is NOT dynamically misspecified, then there can be no serial correlation. There is
no need to test separately for serial correlation.

[7] If the F test in [5] rejects the null, the model IS dynamically misspecified and one needs
lagged regressors in the model -- there are SEVERAL different possible patterns which could
occur in the lagged variables -- several SPECIAL types of dynamic misspecification. Koyck lags
is one of them. Serial correlation is another one: this one says the lagged regressors affect
the current period in ONLY one way, through the error which occurred in the last period. This
is one possible case -- there is no reason to make this special case the ONLY type of dynamic
misspecification that you should consider, test for, and correct.
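
The joint F test described in [5] can be sketched in a few lines of Python. This is only a rough illustration, not the commenter's own code: it assumes NumPy is available, uses simulated data, and all names (`ols_rss`, `lag_f_test`, `simulate`, the coefficients 1.0, 0.8 and 0.6) are hypothetical choices made for the example. It compares a static regression of y(t) on x(t) with one that also includes x(t-1) and y(t-1), on three simulated data sets: a truly static one, one with a lagged dependent variable, and one with serially correlated errors.

```python
import numpy as np


def ols_rss(X, y):
    """Return the residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def lag_f_test(y, x):
    """Joint F test that the coefficients on x(t-1) and y(t-1) are both zero."""
    y_t, x_t = y[1:], x[1:]          # current-period values
    y_lag, x_lag = y[:-1], x[:-1]    # one-period lags
    const = np.ones_like(y_t)

    X_restricted = np.column_stack([const, x_t])                  # static model
    X_unrestricted = np.column_stack([const, x_t, x_lag, y_lag])  # lags added

    rss_r = ols_rss(X_restricted, y_t)
    rss_u = ols_rss(X_unrestricted, y_t)

    q = 2                                          # number of restrictions tested
    df_resid = len(y_t) - X_unrestricted.shape[1]  # residual degrees of freedom
    f_stat = ((rss_r - rss_u) / q) / (rss_u / df_resid)
    return f_stat, q, df_resid


def simulate(n, rho_y, rho_u, rng):
    """Simulate y(t) = 1 + 0.8 x(t) + rho_y y(t-1) + u(t),
    with errors u(t) = rho_u u(t-1) + e(t). Both rho values zero gives a static model."""
    x = rng.normal(size=n)
    e = rng.normal(size=n)
    y = np.zeros(n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho_u * u[t - 1] + e[t]
        y[t] = 1.0 + 0.8 * x[t] + rho_y * y[t - 1] + u[t]
    return y, x


rng = np.random.default_rng(0)
cases = {
    "static": simulate(300, 0.0, 0.0, rng),
    "lagged y": simulate(300, 0.6, 0.0, rng),
    "serial correlation": simulate(300, 0.0, 0.6, rng),
}

results = {}
for name, (y, x) in cases.items():
    f_stat, q, df_resid = lag_f_test(y, x)
    results[name] = f_stat
    print(f"{name:>18}: F({q}, {df_resid}) = {f_stat:.2f}")
```

Under the null the statistic is compared with an F(q, df_resid) distribution; a p-value could be obtained with, for example, `scipy.stats.f.sf(f_stat, q, df_resid)`, and with a lagged dependent variable among the regressors the F distribution holds only approximately. In this sketch the serially correlated case should also produce a large F statistic, which is the point made in [6]: serial correlation shows up as one special pattern in the lagged regressors.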

2. How to construct a model?

Sayed Hossain commented>
Hossain Academy Note
Univariate Models
Multivariate Models