After retrieving web data and storing it in MongoDB (via pymongo), we clean and format the data into a consistent form, filter it with MongoDB aggregation pipelines, and plot the results with the "chart" module. All of the code was written and run in Jupyter Notebook.
- Create a new collection, transfer the retrieved data (.json format) into it, and make a copy of that collection, using either the mongo shell or the command prompt (cmd):
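A minimal sketch of the same step done from Python instead of the shell. It assumes the .json file holds plain JSON (either a single array or one object per line) and that `db` is a pymongo `Database`; the collection names passed in are hypothetical. If the file came from `mongoexport` and contains extended-JSON values such as `{"$oid": ...}`, `bson.json_util.loads` should be used in place of `json.loads`. The copy is made with an aggregation `$out` stage, which replaces the target collection if it already exists.

```python
import json
from typing import Any, Dict, List


def parse_json_documents(text: str) -> List[Dict[str, Any]]:
    """Parse either a JSON array or JSON Lines (one object per line)."""
    text = text.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def import_and_copy(db, path: str, name: str, copy_name: str) -> int:
    """Load the .json file into db[name], then copy db[name] to db[copy_name].

    `db` is assumed to be a pymongo Database. Returns the size of the copy.
    """
    with open(path, encoding="utf-8") as f:
        docs = parse_json_documents(f.read())
    if docs:  # insert_many rejects an empty list
        db[name].insert_many(docs)
    # $out writes the aggregation result (here: every document) to copy_name,
    # replacing that collection if it already exists.
    db[name].aggregate([{"$out": copy_name}])
    return db[copy_name].count_documents({})
```

For example, `import_and_copy(MongoClient()["week3"], "items.json", "item_info", "item_info_copy")`, where the database, file, and copy names are placeholders.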
- The code showing the top 3 most-posted categories in a selected zone is available at:
https://anaconda.org/tangli666/week3_hw_v2/notebook
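As a hedged sketch (not the code in the linked notebook), this query maps onto an aggregation pipeline: match the zone, group by category, count, sort descending, and keep 3. The field names `zone` and `category` are hypothetical placeholders for whatever the scraped documents actually use, and the optional `$unwind` is only needed if a category is stored as an array.

```python
from typing import Any, Dict, List, Tuple


def top_categories_pipeline(zone: str, zone_field: str = "zone",
                            category_field: str = "category",
                            unwind: bool = False, limit: int = 3) -> List[Dict[str, Any]]:
    """Aggregation stages counting posts per category within one zone."""
    stages: List[Dict[str, Any]] = [
        # Equality also matches when zone_field is an array containing `zone`.
        {"$match": {zone_field: zone}},
    ]
    if unwind:  # one document per array element of the category field
        stages.append({"$unwind": f"${category_field}"})
    stages += [
        {"$group": {"_id": f"${category_field}", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},  # most-posted first; ties have no fixed order
        {"$limit": limit},
    ]
    return stages


def top_categories(collection, zone: str, **kwargs: Any) -> List[Tuple[Any, int]]:
    """Run the pipeline on a pymongo Collection; returns (category, count) pairs."""
    cursor = collection.aggregate(top_categories_pipeline(zone, **kwargs))
    return [(doc["_id"], doc["count"]) for doc in cursor]
```

Building the pipeline in a separate function keeps it inspectable in the notebook before it is sent to the server.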
- The code showing the relationship between item condition and average price is available at:
https://anaconda.org/tangli666/week3_hw_v10/notebook
Note: to filter and format the 'price' field, each price was converted to an integer and the new collection was updated in place:
"""
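# item_info: the pymongo collection holding the scraped items
for item in item_info.find():
    try:
        # price strings look like '<number> <unit>'; keep the leading number
        price = int(str(item.get('price', '')).split(' ')[0])
    except ValueError:
        price = 0  # unparseable prices are stored as 0
    # update_one replaces Collection.update, which was removed in PyMongo 4
    item_info.update_one({'_id': item['_id']}, {'$set': {'price': price}})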
"""
- Finally, the command line for exporting the data collection to a CSV file:
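MongoDB's standard CLI for this is `mongoexport` with `--type=csv` and a `--fields` list. To do the same from the notebook, a minimal sketch using Python's standard `csv` module follows. The field names in the usage line are hypothetical, and joining list-valued fields with `; ` is a design choice so that each value stays in one cell.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def docs_to_csv(docs: Iterable[Dict[str, Any]], fields: List[str]) -> str:
    """Render documents as CSV text with a header row of `fields`.

    Missing fields become empty cells; list values are joined with "; ".
    Fields not listed (e.g. _id) are left out.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for doc in docs:
        row = {}
        for field in fields:
            value = doc.get(field, "")
            if isinstance(value, list):
                value = "; ".join(map(str, value))
            row[field] = value
        writer.writerow(row)
    return buf.getvalue()
```

For example, `docs_to_csv(item_info.find(), ["title", "price"])` returns the text to write to a file opened with `newline=""`.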