a. different starting point may end up with different local minimum points
Feature Scaling:
make gredent decent converge faster (with mean normalization?)
Feature choosing:
ex. know the depth and width of a house, better have a feature of area (=depth*width)