Scraping the Home directory of the DMOZ site with Scrapy

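I. Background

This experiment asks us to crawl every subdirectory of the Home directory on DMOZ (http://www.dmoztools.net/Home/). The Home subdirectories are shown in the figure below.

Home subdirectories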

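II. Objectives

We need to crawl the information for every site listed under the Home directory. The crawl collects the following:
① the current path of the page on which the sites are listed (category_path)
② each category's name (cat_name) and its link, i.e. the internal link (cat_url)
③ each site's title (site_title), URL (site_url), and description (site_desc)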

III. Site analysis

The page structure of DMOZ is essentially a tree in which every node has a fairly large number of child nodes.
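Category pages also link across branches, so crawling this tree amounts to a graph walk that has to stay inside /Home/ and avoid revisiting pages. The sketch below illustrates the idea on a small in-memory link map. The `LINKS` dictionary and the `crawl_home` function are hypothetical stand-ins for illustration, not part of Scrapy, and the paths are only examples.

```python
from collections import deque

# Hypothetical link map: page path -> category links found on that page.
LINKS = {
    "/Home/": ["/Home/Apartment_Living/", "/Home/Urban_Living/"],
    "/Home/Apartment_Living/": [
        "/Home/Apartment_Living/Roommates/",
        "/Business/Real_Estate/Residential/Rentals/",  # leaves the Home branch
    ],
    "/Home/Urban_Living/": ["/Home/Apartment_Living/", "/Society/Subcultures/"],
    "/Home/Apartment_Living/Roommates/": [],
}


def crawl_home(start="/Home/"):
    """Breadth-first walk that stays under /Home/ and visits each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in LINKS.get(page, []):
            # Skip links outside the Home branch and pages already queued.
            if link.startswith("/Home/") and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl_home())
```

A real crawler does the same bookkeeping at a larger scale: a filter on which links to follow, plus a record of the pages already requested.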
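First, let's work out the XPath expressions we need:
1. <-----categories----->

subcategories: the div block that holds the categories

There are two category div blocks here. Their id attribute values differ, but their class attribute values are identical, so the XPath for the categories section can be written as: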

'//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'
The categories section


Narrowing this down to the category link (cat_url) and name (cat_name):

cat_url XPath = '//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]/a/@href'
cat_name XPath = '//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]/a/div/text()'
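To see how matching on the shared class attribute picks up both blocks, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree. ElementTree supports just a subset of XPath, so the href attribute and the text are read from the matched elements instead of through /@href and text(). The HTML fragment and its id values are hypothetical and simplified; the real page is larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: two blocks with different ids, same class.
SAMPLE = """
<html><body>
  <div id="first-block" class="cat-list results leaf-nodes">
    <div class="cat-item"><a href="/Home/Cooking/"><div>Cooking</div></a></div>
    <div class="cat-item"><a href="/Home/Gardening/"><div>Gardening</div></a></div>
  </div>
  <div id="second-block" class="cat-list results leaf-nodes">
    <div class="cat-item"><a href="/Home/Family/"><div>Family</div></a></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Same class-based path as above, written relative to the root element.
items = root.findall(
    './/div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'
)

# Read the link and the name from each matched cat-item element.
categories = [(d.find("a").get("href"), d.find("a/div").text) for d in items]
print(categories)
```

Entries from both blocks come back in document order, which is why a single class-based expression is enough here.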

2. <-----sites----->

The sites section

The XPath expressions for a site's title (site_title), URL (site_url), and description (site_desc) work out as follows:

site_title XPath = '//div[@class="site-item "]/div[@class="title-and-desc"]/a/div[@class="site-title"]/text()'
site_url XPath = '//div[@class="site-item "]/div[@class="title-and-desc"]/a/@href'
site_desc XPath = '//div[@class="site-item "]/div[@class="title-and-desc"]/div[@class="site-descr "]/text()'
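Note the trailing space in "site-item " and "site-descr ": an @class="..." test compares the whole attribute string, so the space has to be reproduced exactly. The following sketch shows this with the standard library's xml.etree.ElementTree (a subset of XPath) on a hypothetical, simplified fragment. The site values are borrowed from the sample output, and the names `with_space`, `without_space`, and `record` are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of one site entry.
SAMPLE = """
<div>
  <div class="site-item ">
    <div class="title-and-desc">
      <a href="http://www.apartmenttherapy.com/"><div class="site-title">Apartment Therapy</div></a>
      <div class="site-descr ">Offers articles and advice on apartment living.</div>
    </div>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# The class value is compared as a whole string, trailing space included.
with_space = root.findall('.//div[@class="site-item "]/div[@class="title-and-desc"]')
without_space = root.findall('.//div[@class="site-item"]/div[@class="title-and-desc"]')
print(len(with_space), len(without_space))

site = with_space[0]
record = {
    "site_title": site.find('a/div[@class="site-title"]').text.strip(),
    "site_url": site.find("a").get("href"),
    "site_desc": site.find('div[@class="site-descr "]').text.strip(),
}
print(record)
```

Only the expression that keeps the trailing space finds the entry; the variant without it matches nothing.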

IV. Creating and editing the project: home

1. Create the project
scrapy startproject home
2. Import the project into PyCharm for editing
Importing the project
3. Edit items.py
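import scrapy


class HomeItem(scrapy.Item):
    # Current path of the crawled page
    category_path = scrapy.Field()
    # List of child categories; each entry carries cat_url and cat_name
    categories = scrapy.Field()
    cat_url = scrapy.Field()
    cat_name = scrapy.Field()
    # List of sites; each entry carries site_url, site_desc and site_title
    sites = scrapy.Field()
    site_url = scrapy.Field()
    site_desc = scrapy.Field()
    site_title = scrapy.Field()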
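In the crawl output, categories and sites each hold a list of per-entry dictionaries, while cat_url, cat_name, site_title, site_url and site_desc appear as keys inside those entries. A minimal sketch of one record's shape, using plain dictionaries and values taken from the sample output (`record` and `line` are illustrative names, and the record is trimmed to one category and one site):

```python
import json

# One record, trimmed to a single category and a single site.
record = {
    "category_path": "Home/Apartment_Living/",
    "categories": [
        {
            "cat_url": ["/Home/Apartment_Living/Roommates/"],
            "cat_name": ["", "Roommates", ""],
        },
    ],
    "sites": [
        {
            "site_title": "Apartment Therapy",
            "site_url": "http://www.apartmenttherapy.com/",
            "site_desc": "Offers articles and advice on apartment living.",
        },
    ],
}

# Serialized on a single line, as in a JSON Lines file.
line = json.dumps(record)
print(line)
```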
4. Create homeSpider.py in the spiders folder and edit it
from home.items import HomeItem
from scrapy.spiders import CrawlSpider, Rule    # Unlike Spider, CrawlSpider adds a rules attribute; each Rule defines how links are extracted and followed.
from scrapy.linkextractors import LinkExtractor

class HomeSpider(CrawlSpider):
    name = "home"
    start_urls = ['http://www.dmoztools.net/Home/']
    rules = (
        Rule(LinkExtractor(allow=(r'http://www.dmoztools.net/Home/.*'),
                           restrict_xpaths=('//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]')),
             callback="parse_item", follow=True),
    )
    # Rule defines a crawl rule. LinkExtractor selects the links to extract; callback names the
    # method that handles the response of each extracted link; follow controls whether links
    # found in those responses are followed in turn (it defaults to True when callback is None).
    # allow keeps only URLs matching 'http://www.dmoztools.net/Home/.*'; restrict_xpaths limits
    # extraction to the given region of the page, so only matching links inside it are crawled.

    def parse_item(self, response):
        item = HomeItem()
        # Current path of the crawled page (category_path). str.lstrip() strips a set of
        # characters rather than a prefix, so the site root is removed explicitly instead.
        item['category_path'] = response.url.replace('http://www.dmoztools.net/', '', 1)

        item['categories'] = []
        for cat in response.xpath('//div[@class="cat-list results leaf-nodes"]/div[@class="cat-item"]'):
            child_cat = {}
            child_cat["cat_url"] = cat.xpath('a/@href').extract()
            child_cat['cat_name'] = cat.xpath('a/div/text()').extract()
            # Remove line breaks and spaces from every text node of the name.
            child_cat['cat_name'] = [name.replace("\r\n", "") for name in child_cat['cat_name']]
            child_cat['cat_name'] = [name.replace(" ", "") for name in child_cat['cat_name']]
            item['categories'].append(child_cat)  # list.append() adds the object to the end of the list

        item['sites'] = []
        for site in response.xpath('//div[@class="site-item "]/div[@class="title-and-desc"]'):
            result_site = {}
            # default='' avoids an AttributeError on .strip() when a node is missing.
            result_site['site_title'] = site.xpath('a/div[@class="site-title"]/text()').extract_first(default='').strip()
            result_site['site_url'] = site.xpath('a/@href').extract_first(default='').strip()
            result_site['site_desc'] = site.xpath('div[@class="site-descr "]/text()').extract_first(default='').strip()
            item['sites'].append(result_site)
        yield item
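A note on computing category_path: str.lstrip() removes any leading characters that belong to the given set, not a literal prefix. With the URLs of this site it happens to stop at the capital "H" of "Home", but that is a coincidence. The sketch below contrasts it with an explicit prefix removal. `strip_root` is a hypothetical helper, and the lowercase URL is made up to show the failure case.

```python
ROOT = "http://www.dmoztools.net"


def strip_root(url, root=ROOT):
    """Remove `root` only when it is a true prefix of `url`."""
    return url[len(root):] if url.startswith(root) else url


url = "http://www.dmoztools.net/Home/Weblogs/"
print(url.lstrip(ROOT))    # 'Home/Weblogs/'  (stops at 'H', which is not in the set)
print(strip_root(url))     # '/Home/Weblogs/'

# Made-up lowercase path: lstrip keeps eating characters that occur in ROOT.
lower = "http://www.dmoztools.net/home/weblogs/"
print(lower.lstrip(ROOT))  # 'blogs/'
print(strip_root(lower))   # '/home/weblogs/'
```

The explicit version keeps the leading slash; drop it afterwards if the path should look like "Home/Weblogs/".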
5. Edit begin.py

In the same directory as the home package and scrapy.cfg, create a new file named begin.py and edit it:

from scrapy import cmdline
cmdline.execute('scrapy crawl home -o home.jl'.split())

If you start the spider on a cloud server or from the command line, you can skip this step and the next one; just run the following in the project directory:

scrapy crawl home -o home.jl
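The .jl extension makes Scrapy write JSON Lines, that is, one JSON object per line. A minimal sketch of reading such output back with the standard library follows. `read_items` is a hypothetical helper, and the two sample lines are trimmed-down stand-ins for real records.

```python
import io
import json


def read_items(fileobj):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)


# Trimmed-down stand-in for the contents of home.jl.
sample = io.StringIO(
    '{"category_path": "Home/Weblogs/", "categories": [], "sites": []}\n'
    '{"category_path": "Home/Software/", "categories": [], "sites": []}\n'
)

paths = [item["category_path"] for item in read_items(sample)]
print(paths)
```

For the real file, pass an open file object instead, for example `open("home.jl", encoding="utf-8")`.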
6. Configure PyCharm before running

② Set the path

③ Click Run

V. Crawl results

Only part of the output is shown here, since there is quite a lot of data.

{"category_path": "Home/Apartment_Living/", "categories": [{"cat_url": ["/Business/Real_Estate/Residential/Rentals/"], "cat_name": ["", "ApartmentLocators", ""]}, {"cat_url": ["/Home/Apartment_Living/Roommates/"], "cat_name": ["", "Roommates", ""]}, {"cat_url": ["/Society/Issues/Housing/Tenant_Rights/"], "cat_name": ["", "TenantRights", ""]}], "sites": [{"site_title": "Apartment Living", "site_url": "http://ths.gardenweb.com/forums/apt/", "site_desc": "A discussion forum for those living in apartments, condominiums and co-ops. Topics range from roommate problems, maintenance issues, leases to decorating."}, {"site_title": "Apartment Therapy", "site_url": "http://www.apartmenttherapy.com/", "site_desc": "Offers articles and advice on apartment living."}, {"site_title": "Good for Apartment Life", "site_url": "http://www.dogbreedinfo.com/apartment.htm", "site_desc": "Provides a list of dog breeds suited to apartment living. Each breed is linked to a detailed description."}, {"site_title": "Rental Decorating Digest", "site_url": "http://www.rentaldecorating.com/", "site_desc": "Tips and tricks for decorating for the renter."}, {"site_title": "TheDollarStretcher.com: How to Furnish a Studio Apartment", "site_url": "http://www.stretcher.com/stories/970929b.cfm", "site_desc": "Studio apartment solutions for \"hiding and otherwise disguising belongings that needed to be stored in a small space.\""}]}
{"category_path": "Home/Weblogs/", "categories": [{"cat_url": ["/Home/Family/Adoption/Weblogs/"], "cat_name": ["", "Adoption", ""]}, {"cat_url": ["/Home/Gardening/Bonsai_and_Suiseki/Bonsai/Weblogs/"], "cat_name": ["", "Bonsai", ""]}, {"cat_url": ["/Home/Cooking/Weblogs/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Home/Consumer_Information/Electronics/Weblogs/"], "cat_name": ["", "Electronics", ""]}, {"cat_url": ["/Home/Consumer_Information/Food_and_Drink/Weblogs/"], "cat_name": ["", "FoodandDrink", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Weblogs/"], "cat_name": ["", "MoneyManagement", ""]}, {"cat_url": ["/Home/Family/Parenting/Mothers/Weblogs/"], "cat_name": ["", "Mothers", ""]}, {"cat_url": ["/Home/Family/Parenting/Fathers/Stay_at_Home_Fathers/Weblogs/"], "cat_name": ["", "StayatHomeFathers", ""]}], "sites": []}
{"category_path": "Home/Software/", "categories": [{"cat_url": ["/Home/Family/Software/"], "cat_name": ["", "Family", ""]}, {"cat_url": ["/Home/Gardening/Software/"], "cat_name": ["", "Gardening", ""]}, {"cat_url": ["/Society/Genealogy/Software/"], "cat_name": ["", "Genealogy", ""]}, {"cat_url": ["/Computers/Home_Automation/Software/"], "cat_name": ["", "HomeAutomation", ""]}, {"cat_url": ["/Home/Personal_Finance/Software/"], "cat_name": ["", "PersonalFinance", ""]}, {"cat_url": ["/Home/Cooking/Recipe_Management/"], "cat_name": ["", "RecipeManagement", ""]}], "sites": [{"site_title": "Agilaire", "site_url": "http://www.agilairecorp.com/", "site_desc": "Environmental data management solutions. Products and support."}, {"site_title": "Chief Architect Inc: Home Designer Software", "site_url": "http://www.homedesignersoftware.com/", "site_desc": "Software package for home remodelling, interior design, decks and landscaping creation. Company info, products list, shop and support."}, {"site_title": "Chore Wars", "site_url": "http://www.chorewars.com/", "site_desc": "Browser-based system loosely based on D&D that allows household members to claim experience points for doing tasks.  Monsters and treasure may be optionally defined with each chore."}, {"site_title": "Kopy Kake", "site_url": "http://www.kopykake.com/", "site_desc": "Cake decorating software for professional bakers and hobbyists."}, {"site_title": "Let's Clean Up", "site_url": "http://www.lets-clean-up.com/", "site_desc": "Software to help the family and small businesses organize cleaning chores and maintenance activities."}, {"site_title": "Punch Software", "site_url": "http://www.punchsoftware.com/", "site_desc": "Offers 3D home design suite for professional home planning with real model technology. Plan a dream house with this architecture design and 3D landscape software."}]}
{"category_path": "Home/News_and_Media/", "categories": [{"cat_url": ["/Home/News_and_Media/Radio_Programs/"], "cat_name": ["", "RadioPrograms", ""]}, {"cat_url": ["/Home/News_and_Media/Television/"], "cat_name": ["", "Television", ""]}, {"cat_url": ["/Home/Homemaking/Frugality/Publications/"], "cat_name": ["", "BudgetLiving", ""]}, {"cat_url": ["/Home/Consumer_Information/News_and_Media/"], "cat_name": ["", "ConsumerInformation", ""]}, {"cat_url": ["/Home/Cooking/Magazines_and_E-zines/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Home/Family/Publications/"], "cat_name": ["", "Families", ""]}, {"cat_url": ["/Home/Gardening/News_and_Media/"], "cat_name": ["", "Gardens", ""]}, {"cat_url": ["/Home/Home_Improvement/News_and_Media/"], "cat_name": ["", "HomeImprovement", ""]}, {"cat_url": ["/Home/Family/Parenting/Magazines_and_E-zines/"], "cat_name": ["", "Parenting", ""]}, {"cat_url": ["/Recreation/Pets/News_and_Media/"], "cat_name": ["", "Pets", ""]}, {"cat_url": ["/Business/Real_Estate/News_and_Media/"], "cat_name": ["", "RealEstate", ""]}], "sites": [{"site_title": "Better Homes & Gardens", "site_url": "http://www.bhg.com/", "site_desc": "Ideas and improvement projects for your home and garden plus recipes and entertaining ideas."}, {"site_title": "Better Homes and Gardens Australia", "site_url": "http://www.bhg.com.au/", "site_desc": "Offers items from both the weekly television show and monthly magazine."}, {"site_title": "Carolina Home and Garden", "site_url": "http://www.carolinahg.com/", "site_desc": "Features articles on the arts, gardens, local people, interior decor. 
Also includes calendar and resources."}, {"site_title": "Coastal Living Magazine", "site_url": "http://www.coastalliving.com/", "site_desc": "Features articles on homes, decorating, travel, food and living in coastal communities."}, {"site_title": "Country Living Magazine", "site_url": "http://www.countryliving.com/", "site_desc": "Features include home decorating, recipes and antiques and collectibles."}, {"site_title": "Homes and Gardens Magazine", "site_url": "http://www.housetohome.co.uk/homesandgardens", "site_desc": "Features beautiful houses and gardens, decorating ideas, designers and decorators, unusual shops. Includes subscription information, and a range of articles online."}, {"site_title": "House to Home UK", "site_url": "http://www.housetohome.co.uk/", "site_desc": "Provides a look inside British residences, ideas for rooms, videos and guest bloggers."}, {"site_title": "Howdini", "site_url": "http://www.howdini.com/", "site_desc": "An e-zine with video tips on how to do a variety of household tasks such as cooking or caring for a new baby. Includes 'life hacks'."}, {"site_title": "Martha Stewart Living", "site_url": "http://www.marthastewart.com/", "site_desc": "Official site, with links to personal information, Martha's Scrapbook, television highlights, radio guide, virtual studio, recipes, live chat."}, {"site_title": "Mother Earth Living", "site_url": "http://www.motherearthliving.com/", "site_desc": "Offers today's health-conscious, environmentally concerned homeowners information needed to practice earth-inspired living."}, {"site_title": "Mother Earth News", "site_url": "http://www.motherearthnews.com/", "site_desc": "Features and articles covering sustainable, self-reliant living. 
Topics include building, gardening, homesteading, do-it-yourself, kitchen, energy and health."}, {"site_title": "Pioneer Thinking", "site_url": "http://www.pioneerthinking.com/", "site_desc": "Features articles on gardening, cooking, crafts, personal finance, and beauty."}, {"site_title": "Real Simple", "site_url": "http://www.realsimple.com/", "site_desc": "Magazine about simplifying your life. Includes home solutions, meals, special features."}, {"site_title": "SFGate.com: Home and Garden", "site_url": "http://www.sfgate.com/homeandgarden/", "site_desc": "Daily features and articles from the San Francisco Chronicle covering decorating, entertaining, gardening, and family life."}, {"site_title": "Southern Living", "site_url": "http://www.southernliving.com/", "site_desc": "Features about fine interiors, gardens, design, antiques, travel, events, and the arts."}, {"site_title": "Style at Home", "site_url": "http://www.styleathome.com/", "site_desc": "Covers interior decoration and design tips, decorating on a budget, recipes and buying guides."}, {"site_title": "Sunset Magazine and Books", "site_url": "http://www.sunset.com/", "site_desc": "News and feature articles on Western living."}, {"site_title": "The Washington Post: Home & Garden", "site_url": "http://www.washingtonpost.com/wp-dyn/home/", "site_desc": "Daily features and articles covering decorating, home improvement, gardening, pets and family life."}]}
{"category_path": "Home/Urban_Living/", "categories": [{"cat_url": ["/Home/Apartment_Living/"], "cat_name": ["", "ApartmentLiving", ""]}, {"cat_url": ["/Science/Earth_Sciences/Atmospheric_Sciences/Climatology/Urban/"], "cat_name": ["", "Climate", ""]}, {"cat_url": ["/Arts/Photography/Photographers/Urban/Fine_Art/"], "cat_name": ["", "PhotographicExhibits", ""]}, {"cat_url": ["/Society/Subcultures/"], "cat_name": ["", "Subcultures", ""]}], "sites": [{"site_title": "City Noise", "site_url": "http://www.citynoise.org/", "site_desc": "A public photoblog where people with a love for the urban form, modern world, or a general appreciation of their environment gather to post stories, narratives and often upload photos of their favourite cities, hometowns, travels, or current locations."}, {"site_title": "CityCulture", "site_url": "http://cityculture.org/", "site_desc": "World city reviews and a free test which suggests cities and regions around the world that fit your personality."}, {"site_title": "Flickr: Urban Negative", "site_url": "http://www.flickr.com/groups/urban_negative/", "site_desc": "Photos of the negative object, material and people in urban lifestyles."}, {"site_title": "Pedestrian", "site_url": "http://www.turbulence.org/Works/pedestrian/pedestrian2.html", "site_desc": "An artistic work about urban life."}, {"site_title": "Self Sufficientish", "site_url": "http://www.selfsufficientish.com/", "site_desc": "Information on growing plants, wild food recipes, and alternatives to lead a low impact urban life."}, {"site_title": "Urbansome", "site_url": "http://urbansome.com/", "site_desc": "City travel information, tips, deals, news, photography and lots of entertainment brought to you by urban enthusiasts."}]}
{"category_path": "Home/Rural_Living/", "categories": [{"cat_url": ["/Science/Social_Sciences/Economics/Agricultural_and_Rural_Economics/"], "cat_name": ["", "AgriculturalandResourceEconomics", ""]}, {"cat_url": ["/Home/Cooking/"], "cat_name": ["", "Cooking", ""]}, {"cat_url": ["/Society/People/Cowboys/"], "cat_name": ["", "Cowboys", ""]}, {"cat_url": ["/Reference/Education/K_through_12/Rural_Issues/"], "cat_name": ["", "Education", ""]}, {"cat_url": ["/Business/Agriculture_and_Forestry/Farm_Real_Estate/"], "cat_name": ["", "FarmRealEstate", ""]}, {"cat_url": ["/Health/Public_Health_and_Safety/Rural_Health/"], "cat_name": ["", "Health", ""]}, {"cat_url": ["/Home/Rural_Living/Hobby_Farms/"], "cat_name": ["", "HobbyFarms", ""]}, {"cat_url": ["/Reference/Education/K_through_12/Home_Schooling/"], "cat_name": ["", "HomeSchooling", ""]}, {"cat_url": ["/Home/Rural_Living/Homesteading/"], "cat_name": ["", "Homesteadi-ng", ""]}, {"cat_url": ["/Science/Environment/Water_Resources/Wastewater/Household_Wastewater_Management/"], "cat_name": ["", "HouseholdWastewaterManagement", ""]}, {"cat_url": ["/Society/Lifestyle_Choices/Intentional_Communities/"], "cat_name": ["", "IntentionalCommunities", ""]}, {"cat_url": ["/Home/Rural_Living/Personal_Pages/"], "cat_name": ["", "PersonalPages", ""]}, {"cat_url": ["/Science/Technology/Energy/Renewable/"], "cat_name": ["", "RenewableEnergy", ""]}, {"cat_url": ["/Science/Social_Sciences/Sociology/Rural_Sociology/"], "cat_name": ["", "RuralSociology", ""]}, {"cat_url": ["/Science/Agriculture/Sustainable_Agriculture/"], "cat_name": ["", "SustainableAgriculture", ""]}, {"cat_url": ["/Business/Construction_and_Maintenance/Building_Types/Sustainable_Architecture/"], "cat_name": ["", "SustainableArchitecture", ""]}, {"cat_url": ["/Society/Lifestyle_Choices/Voluntary_Simplicity/"], "cat_name": ["", "VoluntarySimplicity", ""]}], "sites": [{"site_title": "Countryside Magazine", "site_url": "http://countrysidenetwork.com/daily/", "site_desc": 
"Selected articles from the printed magazine for readers seeking voluntary simplicity and greater self-reliance with emphasis on home food production. Gardening, cooking, food preservation, and livestock. Has an active forum."}, {"site_title": "DTN: The Progressive Farmer", "site_url": "https://www.dtnpf.com/agriculture/web/ag/home", "site_desc": "Forums and news for farmers and people involved in rural issues."}, {"site_title": "Internet Hay Exchange", "site_url": "http://www.hayexchange.com/", "site_desc": "Free International hay listing service for the US and Canada."}, {"site_title": "Kountry Life", "site_url": "http://www.kountrylife.com/index.htm", "site_desc": "An interactive country and rural living site with discussion forums, photo gallery, articles, how-to information, humor, sounds, and recipes."}, {"site_title": "Renewing the Countryside", "site_url": "http://www.renewingthecountryside.org/", "site_desc": "Aims to strengthen rural areas by highlighting the initiatives and projects of rural communities, farmers, artists, entrepreneurs, educators, and activists."}, {"site_title": "Rural Living Canada", "site_url": "http://rurallivingcanada.4t.com/", "site_desc": "A concise directory of Canadian non-urban lifestyle information, news and websites."}, {"site_title": "Soil and Health Library", "site_url": "http://www.soilandhealth.org/", "site_desc": "A free how-to and encouragement resource for self-supporters with little cash, for non-domineering environmentalists, and folks frustrated with urbanity."}, {"site_title": "The Urban Rancher", "site_url": "http://theurbanrancher.tamu.edu/", "site_desc": "The Texas A&M University site dedicated to improving rural living with information on natural resources, rural life, and the urban-rural interface."}]}
{"category_path": "Home/Personal_Organization/", "categories": [{"cat_url": ["/Home/Personal_Organization/Consultants/"], "cat_name": ["", "Consultant-s", ""]}, {"cat_url": ["/Business/Business_Services/Office_Services/Secretarial_Services_and_Virtual_Assistants/"], "cat_name": ["", "VirtualAssistants", ""]}, {"cat_url": ["/Health/Services/Health_Records_Services/"], "cat_name": ["", "HealthRecordsServices", ""]}, {"cat_url": ["/Society/Crime/Theft/Identity_Theft/"], "cat_name": ["", "IdentityTheft", ""]}, {"cat_url": ["/Shopping/Home_and_Garden/Furniture/Storage/"], "cat_name": ["", "StorageFurnitureShopping", ""]}, {"cat_url": ["/Business/Management/Education_and_Training/Time_Management/"], "cat_name": ["", "TimeManagement", ""]}, {"cat_url": ["/Reference/Knowledge_Management/Information_Overload/"], "cat_name": ["", "InformationOverload", ""]}, {"cat_url": ["/Computers/Software/Operating_Systems/Microsoft_Windows/Software/Shareware/Home_and_Hobby/Financial%2C_Insurance_and_Home_Inventory/"], "cat_name": ["", "InventorySoftware", ""]}, {"cat_url": ["/Computers/Mobile_Computing/"], "cat_name": ["", "MobileComputing", ""]}, {"cat_url": ["/Computers/Software/Freeware/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersFreeware", ""]}, {"cat_url": ["/Computers/Software/Operating_Systems/Microsoft_Windows/Software/Shareware/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersShareware", ""]}, {"cat_url": ["/Computers/Internet/On_the_Web/Web_Applications/Personal_Information_Managers/"], "cat_name": ["", "PersonalInformationManagersWebApplications", ""]}], "sites": [{"site_title": "43 Folders", "site_url": "http://www.43folders.com/", "site_desc": "Aids to living including (but not limited to) productivity and time management tips, Mac OS X programs and technologies, ideas about modest ways to improve a person's life and reduce stress, and cool or helpful shortcuts that makes life a bit easier."}, {"site_title": 
"52 Projects", "site_url": "http://www.52projects.com/", "site_desc": "Motivating people to work on whatever projects they have long wanted or needed to do."}, {"site_title": "ABCs of Life Skills", "site_url": "http://lifeskills.endlex.com/", "site_desc": "Articles and references about organizing a successful life. Divided into general knowledge, money, work, family, health, and communication."}, {"site_title": "Checklists.com", "site_url": "http://www.checklists.com/atoz.html", "site_desc": "Free checklists on a large number of activities."}, {"site_title": "Clutterbug Network", "site_url": "http://www.clutterbug.net/", "site_desc": "Organizer directory, free newsletter."}, {"site_title": "Creative Homemaking - Organize", "site_url": "http://www.creativehomemaking.com/organize_1.shtml", "site_desc": "Offers tips for home, family, clutter control, holidays, cooking and time management. Includes a newsletter and check lists."}, {"site_title": "Flylady.net", "site_url": "http://www.flylady.net/", "site_desc": "Offers a system for organizing and managing a home, based on the concept of daily routines and a focus on small, time- and space-limited tasks. Provides resources, tips and newsletter."}, {"site_title": "Get Organized Now", "site_url": "http://www.getorganizednow.com/", "site_desc": "Offers tools, ideas and articles. Features monthly checklists, a discussion forum, e-courses and a newsletter."}, {"site_title": "I Need More Time", "site_url": "http://ineedmoretime.com/", "site_desc": "Free organizing tips, ideas and articles.  Sells additional tips as an ebook."}, {"site_title": "Lifehack.org", "site_url": "http://www.lifehack.org/", "site_desc": "Pointers on productivity, getting things done and lifehacks."}, {"site_title": "List Organizer", "site_url": "http://listorganizer.com/", "site_desc": "Offers planning lists with time management instructions for home, personal, travel, budgets, children, pets. 
Includes a newsletter."}, {"site_title": "Messies Anonymous", "site_url": "http://www.messies.com/", "site_desc": "Dedicated to bringing harmony in the home through understanding and aiding the messie mindset. Provides resources, FAQ and newsletter."}, {"site_title": "National Association of Professional Organizers (NAPO)", "site_url": "http://www.napo.net/", "site_desc": "Non-profit educational association whose members include organizing consultants, speakers, trainers, authors, and manufacturers of organizing products. Includes membership information, events, chapters, FAQ and a newsletter."}, {"site_title": "OrganizeTips", "site_url": "http://www.organizetips.com/", "site_desc": "Tips on organizing daily life.  Free planners, organizers, free software for home, office, wedding, moving, pregnancy, holiday and budget."}, {"site_title": "Printable Checklists", "site_url": "http://www.printablechecklists.com/", "site_desc": "Printable charts and checklists on topics such as parenting, children and special occasions. Includes a newsletter."}, {"site_title": "Professional Organizer Academy", "site_url": "http://professionalorganizeracademy.com/", "site_desc": "Online training academy for professional organizers."}, {"site_title": "Professional Organizers Web Ring", "site_url": "http://www.organizerswebring.com/", "site_desc": "Works to promote the field and to provide information on products and services. Includes events and FAQ."}]}
{"category_path": "Home/Personal_Finance/", "categories": [{"cat_url": ["/Reference/Education/Colleges_and_Universities/Financial_Aid/"], "cat_name": ["", "CollegeFinancialAid", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Debt_and_Bankruptcy/"], "cat_name": ["", "DebtandBankruptcy", ""]}, {"cat_url": ["/Society/Issues/Violence_and_Abuse/Elder/Financial_Abuse/"], "cat_name": ["", "ElderFraudIssues", ""]}, {"cat_url": ["/Society/Law/Legal_Information/Estate_Planning_and_Administration/"], "cat_name": ["", "EstatePlanning", ""]}, {"cat_url": ["/Society/Crime/Theft/Identity_Theft/"], "cat_name": ["", "IdentifyTheftIssues", ""]}, {"cat_url": ["/Home/Personal_Finance/Insurance/"], "cat_name": ["", "Insurance", ""]}, {"cat_url": ["/Home/Personal_Finance/Investing/"], "cat_name": ["", "Investing", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Loans/"], "cat_name": ["", "Loans", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/"], "cat_name": ["", "MoneyManagement", ""]}, {"cat_url": ["/Home/Personal_Finance/Money_Management/Loans/Home/"], "cat_name": ["", "Mortgages", ""]}, {"cat_url": ["/Home/Personal_Finance/Philanthropy/"], "cat_name": ["", "Philanthro-py", ""]}, {"cat_url": ["/Home/Personal_Finance/Retirement/"], "cat_name": ["", "Retirement", ""]}, {"cat_url": ["/Home/Personal_Finance/Software/"], "cat_name": ["", "Software", ""]}, {"cat_url": ["/Home/Personal_Finance/Tax_Preparation/"], "cat_name": ["", "TaxPreparatio-n", ""]}, {"cat_url": ["/Home/Personal_Finance/Unclaimed_Money/"], "cat_name": ["", "UnclaimedMoney", ""]}], "sites": [{"site_title": "20 Something Finance", "site_url": "http://20somethingfinance.com/", "site_desc": "Articles focused on helping young people manage their money."}, {"site_title": "AARP - Money and Work", "site_url": "http://www.aarp.org/money/", "site_desc": "Discussion of money matters in considerable depth, especially those related to people who have retired or are planning to in the near 
future."}, {"site_title": "About.com: Financial Planning", "site_url": "http://financialplan.about.com/", "site_desc": "Information on personal financial planning, including budgeting, savings, investing, retirement, insurance, and taxes."}, {"site_title": "American Savings Education Council", "site_url": "http://www.asec.org/", "site_desc": "A coalition of government and industry institutions to educate people on all aspects of personal finance and wealth development, including credit management, college savings, home purchase, and retirement planning."}, {"site_title": "Bankrate.com", "site_url": "http://www.bankrate.com/", "site_desc": "An online publication that provides consumers with  financial data, research and editorial information on       non-investment financial products."}, {"site_title": "CCH Financial Planning Toolkit", "site_url": "http://www.finance.cch.com/", "site_desc": "Information to manage one's personal finances, including investments, insurance, risk and asset management strategies, and tax, retirement and estate planning."}, {"site_title": "CNBC", "site_url": "http://www.cnbc.com/", "site_desc": "Headline news, articles, reports, stocks and quotes, message boards, and a stock ticker."}, {"site_title": "CNN/Money", "site_url": "http://money.cnn.com/", "site_desc": "Combines practical personal finance advice, calculators and investing tips with business news, stock quotes, and financial market coverage from the editors of CNN and Money Magazine."}, {"site_title": "ConsumerReports.org: Money", "site_url": "http://www.consumerreports.org/cro/money/index.htm", "site_desc": "Information about adjustable rate mortgage, investment tools, and personal finance tips."}, {"site_title": "Federal Reserve Board: Consumer Information", "site_url": "http://www.federalreserve.gov/consumers.htm", "site_desc": "Centralized home for articles giving advice and warnings about financial topics, products, and scams."}, {"site_title": "Forbes.com - Personal 
Finance", "site_url": "http://www.forbes.com/finance/", "site_desc": "Financial information previously published in the print version of Forbes."}, {"site_title": "I Retire Early", "site_url": "http://www.iretireearly.com/", "site_desc": "How to save money, advance your career and manage your finances."}, {"site_title": "Inflation Calculator", "site_url": "http://www.westegg.com/inflation/", "site_desc": "Adjusts a given amount of money for inflation, according to the Consumer Price Index."}, {"site_title": "Institute of Consumer Financial Education", "site_url": "http://www.financial-education-icfe.org/", "site_desc": "Offers financial education for all age groups with a special section devoted to teaching children about money."}, {"site_title": "International Foundation for Retirement Education (InFRE)", "site_url": "http://www.infre.org/", "site_desc": "A not-for-profit educational foundation dedicated to empowering working Americans with the motivation and capability to save and plan for a successful retirement."}, {"site_title": "Joe Taxpayer", "site_url": "http://www.joetaxpayer.com/", "site_desc": "Blog covering nearly all of the personal finance topics, with an emphasis on taxes."}, {"site_title": "The JumpStart Coalition for Personal Financial Literacy", "site_url": "http://www.jumpstart.org/", "site_desc": "Purpose is to evaluate the financial literacy of young adults; develop, disseminate, and encourage the use of guidelines for grades K-12; and promote the teaching of personal finance."}, {"site_title": "Kiplinger Online", "site_url": "http://www.kiplinger.com/", "site_desc": "Investing, personal finance, calculators and financial advice."}, {"site_title": "Marketplace", "site_url": "http://www.marketplace.org/", "site_desc": "Public radio business and economic news and commentary."}, {"site_title": "MarketWatch.com: Personal Finance", "site_url": "http://www.marketwatch.com/personal-finance", "site_desc": "Tips and stories for managing your personal 
finances."}, {"site_title": "Money Instructor", "site_url": "http://www.moneyinstructor.com/", "site_desc": "Tools and information to help   teach money and money management, business, the economy, and investing."}, {"site_title": "Money Talks News", "site_url": "http://www.moneytalksnews.com/", "site_desc": "Tips and advice to help you spend less and save more."}, {"site_title": "The Money Ways", "site_url": "http://www.themoneyways.com/", "site_desc": "Advice on money management, saving money, and budgeting."}, {"site_title": "The Motley Fool", "site_url": "http://www.fool.com/", "site_desc": "Investing information and an enjoyably useful site. Updated hourly."}, {"site_title": "MsMoney.com", "site_url": "http://www.msmoney.com/", "site_desc": "Resource for women to learn about financial planning, personal finance and investing."}, {"site_title": "Mymoney.gov - Financial Literacy Education Commission", "site_url": "http://www.mymoney.gov/", "site_desc": "Starting point for information intended by the US government to help improve the financial literacy and education of persons in the United States."}, {"site_title": "The New York Times - Your Money", "site_url": "http://www.nytimes.com/pages/business/yourmoney/", "site_desc": "Articles and features on investing, pensions and retirement plans, mortgage rates, mutual funds, the stock market, bonds and notes.  Also has company research, earnings reports and market insight."}, {"site_title": "Wealth Informatics", "site_url": "http://www.wealthinformatics.com/", "site_desc": "Blog discussing everyday finance topics."}, {"site_title": "Yahoo! Finance", "site_url": "http://finance.yahoo.com/", "site_desc": "Personal finance, investing tips and news."}, {"site_title": "Your Money Page", "site_url": "http://www.yourmoneypage.com/", "site_desc": "Online calculators for financial planning and personal finance."}]}
{"category_path": "Home/Moving_and_Relocating/", "categories": [{"cat_url": ["/Home/Consumer_Information/Home_and_Family/Moving_and_Relocating/"], "cat_name": ["", "ConsumerInformation", ""]}, {"cat_url": ["/Business/Business_Services/Corporate_Relocation/"], "cat_name": ["", "CorporateRelocation", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Moving/"], "cat_name": ["", "Moving", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Publications/"], "cat_name": ["", "Publicatio-ns", ""]}, {"cat_url": ["/Business/Real_Estate/Agents_and_Agencies/"], "cat_name": ["", "RealEstateAgencies", ""]}, {"cat_url": ["/Business/Real_Estate/By_Region/"], "cat_name": ["", "RealEstatebyCountry", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Rentals/"], "cat_name": ["", "ApartmentsandRentals", ""]}, {"cat_url": ["/Society/Gay%2C_Lesbian%2C_and_Bisexual/Home_and_Living/Moving_and_Relocating/"], "cat_name": ["", "Gay,Lesbian,andBisexual", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/International_Relocation/"], "cat_name": ["", "Internatio-nalRelocation", ""]}, {"cat_url": ["/Home/Moving_and_Relocating/Military_Relocation/"], "cat_name": ["", "MilitaryRelocation", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Rentals/Students/"], "cat_name": ["", "StudentHousing", ""]}], "sites": []}
{"category_path": "Home/Homeowners/", "categories": [{"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Home_Improvement/Design_and_Construction/"], "cat_name": ["", "DesignandConstruction", ""]}, {"cat_url": ["/Home/Do-It-Yourself/"], "cat_name": ["", "Do-It-Yourself", ""]}, {"cat_url": ["/Home/Home_Improvement/Energy_Efficiency/"], "cat_name": ["", "EnergyEfficiency", ""]}, {"cat_url": ["/Society/Religion_and_Spirituality/Taoism/Feng_Shui/"], "cat_name": ["", "FengShui", ""]}, {"cat_url": ["/Home/Home_Improvement/Automation/"], "cat_name": ["", "HomeAutomation", ""]}, {"cat_url": ["/Home/Homeowners/Home_Buyers/"], "cat_name": ["", "HomeBuyers", ""]}, {"cat_url": ["/Home/Home_Improvement/"], "cat_name": ["", "HomeImprovement", ""]}, {"cat_url": ["/Home/Homeowners/Homeowner_Associations/"], "cat_name": ["", "HomeownerAssociations", ""]}, {"cat_url": ["/Business/Real_Estate/Residential/Cooperatives/"], "cat_name": ["", "HousingCooperatives", ""]}, {"cat_url": ["/Science/Environment/Air_Quality/Indoor_Air_Quality/"], "cat_name": ["", "IndoorAirQuality", ""]}, {"cat_url": ["/Home/Homeowners/Pest_Control/"], "cat_name": ["", "PestControl", ""]}, {"cat_url": ["/Shopping/Home_and_Garden/Outdoor_Structures/Playsets/"], "cat_name": ["", "Playhouses", ""]}, {"cat_url": ["/Home/Home_Improvement/Restoration/"], "cat_name": ["", "Restoration", ""]}, {"cat_url": ["/Home/Homeowners/Treehouses/"], "cat_name": ["", "Treehouses", ""]}], "sites": [{"site_title": "The Condominium Bluebook (Condominium Laws)", "site_url": "http://www.condobook.com/", "site_desc": "The complete guide to the operations of condominiums, planned developments and other common interest developments in California."}, {"site_title": "EPA.gov - Refrigerant-22 Phaseout", "site_url": "http://www.epa.gov/ozone/title6/phaseout/22phaseout.html", "site_desc": "EPA site with homeowner information for consideration when purchasing or repairing a residential 
system or heat pump."}, {"site_title": "Home Owners Information Center", "site_url": "http://www.ourfamilyplace.com/homeowner/", "site_desc": "Guide to remodeling, refinancing, household budgets and getting the most enjoyment from your home."}, {"site_title": "Homebuilding Pitfalls", "site_url": "http://www.homebuildingpitfalls.com/", "site_desc": "Aims to help consumers save time, money and stress by providing advice on how to properly building their own home"}, {"site_title": "Popular Mechanics: Home Improvement", "site_url": "http://www.popularmechanics.com/home_journal/home_improvement/", "site_desc": "Project information for home and garden. Furniture making,  gardening, home improvement, tools, homeowner's clinic, how it works section. Illustrated."}, {"site_title": "WSJ.com's Real Estate Journal", "site_url": "http://www.realestatejournal.com/", "site_desc": "A guide to buying, selling and maintaining a home."}]}
{"category_path": "Home/Homemaking/", "categories": [{"cat_url": ["/Home/Homemaking/Cleaning_and_Stains/"], "cat_name": ["", "CleaningandStains", ""]}, {"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Homemaking/Frugality/"], "cat_name": ["", "Frugality", ""]}, {"cat_url": ["/Health/Alternative/Non-Toxic_Living/"], "cat_name": ["", "Non-ToxicLiving", ""]}, {"cat_url": ["/Home/Family/Parenting/"], "cat_name": ["", "Parenting", ""]}, {"cat_url": ["/Home/Personal_Organization/"], "cat_name": ["", "PersonalOrganization", ""]}, {"cat_url": ["/Home/Homemaking/Christian/"], "cat_name": ["", "Christian", ""]}, {"cat_url": ["/Home/Homemaking/Celebrity_Homemakers/"], "cat_name": ["", "CelebrityHomemakers", ""]}, {"cat_url": ["/Home/Homemaking/News_and_Media/"], "cat_name": ["", "NewsandMedia", ""]}], "sites": [{"site_title": "About.com - Housekeeping", "site_url": "http://housekeeping.about.com/", "site_desc": "Offers cleaning articles, how-to's, and product reviews. Includes a newsletter and message board."}, {"site_title": "All About Home by Service Master", "site_url": "http://www.allabouthome.com/", "site_desc": "Offers advice on topics including seasonal issues and disaster preparedness. Features a virtual tour and measurement calculators."}, {"site_title": "AOL Living", "site_url": "http://living.aol.com/", "site_desc": "Features homemaking help, organizational tips, recipes, beauty advice, and decorating ideas."}, {"site_title": "Barefoot Lass's Hints & Tips", "site_url": "http://members.tripod.com/~Barefoot_Lass/", "site_desc": "Offers information on topics such as removing crayon marks from walls, finding the best hangover cure and alternative uses for cola. 
Includes awards and information about trigminal neuralgia."}, {"site_title": "Berkeley Parents Network - Advice About Household Management", "site_url": "http://parents.berkeley.edu/advice/household/index.html", "site_desc": "Offers advice on topics such as health and safety, household organization, cleaning and laundry. Includes subscription information."}, {"site_title": "Bob Allison's Ask Your Neighbor - Helpful Household Hints", "site_url": "http://www.askyourneighbor.com/hhints.htm", "site_desc": "Features tips shared on Bob Allison's Ask your Neighbor radio program. Includes instructions for making cleaning products."}, {"site_title": "Creative Homemaking", "site_url": "http://www.creativehomemaking.com/", "site_desc": "Offers organization, decorating, crafts, frugal living and parenting hints. Includes holiday ideas, recipes and a newsletter."}, {"site_title": "DontForgetTheMilk.com", "site_url": "http://www.dontforgetthemilk.com/", "site_desc": "Creates shopping lists sorted by store and price. Features a message board. Requires free registration."}, {"site_title": "eHow: Organize Your Closet", "site_url": "http://www.ehow.com/closet-organizing/", "site_desc": "Full-length article covers the process of organizing the items in a closet."}, {"site_title": "The F.U.N. Place - Families United on the Net", "site_url": "http://www.thefunplace.com/", "site_desc": "Offers home tips, recipes, crafts and parenting articles. Includes forums, chat and a newsletter."}, {"site_title": "Forums for the Chaotic Home", "site_url": "http://ths.gardenweb.com/forums/", "site_desc": "Offers discussion forums on topics such as the house, cooking, crafts and hobbies and the family. Includes information on meetings."}, {"site_title": "Hints and Things", "site_url": "http://www.hintsandthings.com/", "site_desc": "Offers advice that used to be passed down from generation to generation. 
Features competitions and a newsletter."}, {"site_title": "Hints from Heloise", "site_url": "http://www.heloise.com/", "site_desc": "Offers tips for the home, garden and travel."}, {"site_title": "Home Made Simple", "site_url": "http://www.homemadesimple.com/", "site_desc": "Includes features on home decorating, gardening, and organizing, with ideas to simplify, organize, beautify and inspire life."}, {"site_title": "Homemaking School for Children", "site_url": "http://theparentsite.com/parenting/homemakingschool.asp", "site_desc": "Discusses how to teach children lessons in house cleaning and responsibility. By Monica Resinger."}, {"site_title": "Household Hints by Myra L. Fitch", "site_url": "http://lonestar.texas.net/~fitch/hints/hints.html#stainlesssteel", "site_desc": "Offers advice on topics such as getting whites white and removing hard water marks on polished marble. Includes reactions from readers."}, {"site_title": "Joey Green's WackyUses.com", "site_url": "http://www.wackyuses.com/", "site_desc": "Offers little-known uses for well-known products. Includes histories and facts behind the products."}, {"site_title": "John's House", "site_url": "http://johnshouse.itgo.com/", "site_desc": "Offers advice on topics such as food storage, removing pet hair from clothing and preventing dust build-up on television screens. Includes author's profile."}, {"site_title": "The New Homemaker", "site_url": "http://www.thenewhomemaker.com/", "site_desc": "Offers advice and resources on topics including parenting, thriftiness, kitchen, family health, crafts, decorating, and organization. Features news and chat."}, {"site_title": "Old Fashioned Living", "site_url": "http://oldfashionedliving.com/", "site_desc": "Presents old-fashioned traditions for the modern family. 
Features a newsletter and discussion forum."}, {"site_title": "Organized Home", "site_url": "http://organizedhome.com/", "site_desc": "Offers articles on uncluttering the house, cutting mealtime chaos, streamlining storage and finding more time. Includes a newsletter."}, {"site_title": "Robbie's Kitchen - Household Tips & Tricks", "site_url": "http://www.robbiehaf.com/RobbiesKitchen/RobbiesHints.html", "site_desc": "Offers tips for cleaning, cooking, laundry, and home remedies. Features a message board."}, {"site_title": "Seeking Sources", "site_url": "http://www.seekingsources.com/", "site_desc": "Offers articles about cooking, home finance and the holidays. Topics include gifts from the kitchen, how to take a financial inventory and how to choose the right cookware."}, {"site_title": "Uses for Vinegar", "site_url": "http://www.angelfire.com/cantina/homemaking/vinegar.html", "site_desc": "Offers suggestions on topics including cleaning tools, getting rid of an upset stomach and laundry care."}]}
{"category_path": "Home/Home_Improvement/", "categories": [{"cat_url": ["/Home/Home_Improvement/Appliances/"], "cat_name": ["", "Appliances", ""]}, {"cat_url": ["/Home/Home_Improvement/Bathrooms/"], "cat_name": ["", "Bathrooms", ""]}, {"cat_url": ["/Home/Home_Improvement/Exterior/"], "cat_name": ["", "Exterior", ""]}, {"cat_url": ["/Home/Home_Improvement/Floors/"], "cat_name": ["", "Floors", ""]}, {"cat_url": ["/Home/Home_Improvement/Furniture/"], "cat_name": ["", "Furniture", ""]}, {"cat_url": ["/Home/Home_Improvement/Kitchens/"], "cat_name": ["", "Kitchens", ""]}, {"cat_url": ["/Home/Home_Improvement/Storage/"], "cat_name": ["", "Storage", ""]}, {"cat_url": ["/Home/Home_Improvement/Walls/"], "cat_name": ["", "Walls", ""]}, {"cat_url": ["/Home/Home_Improvement/Windows_and_Doors/"], "cat_name": ["", "WindowsandDoors", ""]}, {"cat_url": ["/Home/Home_Improvement/Automation/"], "cat_name": ["", "Automation", ""]}, {"cat_url": ["/Home/Home_Improvement/Climate_Control/"], "cat_name": ["", "ClimateControl", ""]}, {"cat_url": ["/Home/Home_Improvement/Decorating/"], "cat_name": ["", "Decorating", ""]}, {"cat_url": ["/Home/Home_Improvement/Electrical/"], "cat_name": ["", "Electrical", ""]}, {"cat_url": ["/Home/Home_Improvement/Energy_Efficiency/"], "cat_name": ["", "EnergyEfficiency", ""]}, {"cat_url": ["/Home/Home_Improvement/Lighting/"], "cat_name": ["", "Lighting", ""]}, {"cat_url": ["/Home/Home_Improvement/Painting/"], "cat_name": ["", "Painting", ""]}, {"cat_url": ["/Home/Home_Improvement/Plumbing/"], "cat_name": ["", "Plumbing", ""]}, {"cat_url": ["/Home/Home_Improvement/Restoration/"], "cat_name": ["", "Restoration", ""]}, {"cat_url": ["/Home/Home_Improvement/Safety_and_Security/"], "cat_name": ["", "SafetyandSecurity", ""]}, {"cat_url": ["/Home/Home_Improvement/Welding_and_Soldering/"], "cat_name": ["", "WeldingandSoldering", ""]}, {"cat_url": ["/Home/Home_Improvement/Chats_and_Forums/"], "cat_name": ["", "ChatsandForums", ""]}, {"cat_url": 
["/Business/Construction_and_Maintenance/Commercial_Contractors/"], "cat_name": ["", "CommercialContractors", ""]}, {"cat_url": ["/Home/Home_Improvement/Design_and_Construction/"], "cat_name": ["", "DesignandConstruction", ""]}, {"cat_url": ["/Home/Home_Improvement/Glossaries/"], "cat_name": ["", "Glossaries", ""]}, {"cat_url": ["/Home/Home_Improvement/News_and_Media/"], "cat_name": ["", "NewsandMedia", ""]}, {"cat_url": ["/Home/Home_Improvement/Tools_and_Equipment/"], "cat_name": ["", "ToolsandEquipment", ""]}], "sites": [{"site_title": "411 Home Repair", "site_url": "http://www.411homerepair.com/", "site_desc": "Collection of short articles offering tips for home repair, gardens, tools, and appliances."}, {"site_title": "Acme How To", "site_url": "http://www.acmehowto.com/", "site_desc": "Articles on tasks and repairs around the home and garden."}, {"site_title": "Around the House", "site_url": "http://www.thefunplace.com/house/home/", "site_desc": "Articles on pool maintenance, water softeners, lead exposure, doors and windows, and heating safely."}, {"site_title": "Ask The Builder", "site_url": "http://www.askthebuilder.com/", "site_desc": "Information about home building and remodeling."}, {"site_title": "AskToolTalk.com", "site_url": "http://www.asktooltalk.com/", "site_desc": "Home improvement experts feature articles, product reviews, links to manufacturers, and online shopping."}, {"site_title": "Beaver House Addition", "site_url": "http://adam_sb.tripod.com/beaveraddition/", "site_desc": "Homeowner's journal of large, residential building addition project spanning several years. 
Includes photographs, descriptions, and stories."}, {"site_title": "BobVila.com", "site_url": "http://www.bobvila.com/", "site_desc": "Home improvement projects, featured products, tip library, bulletin board, designer tools, and information about television programs hosted by Bob Vila."}, {"site_title": "Construction Resource", "site_url": "http://www.construction-resource.com/", "site_desc": "Employment listings, forums, how to articles, and calculators."}, {"site_title": "Consumer Information Center: Housing", "site_url": "http://publications.usa.gov/USAPubs.php?CatID=8", "site_desc": "Collection of home maintenance articles ranging from two to thirty-six pages available for download or purchase. Most include detailed information and illustrations. From the United States Federal Consumer Information Center."}, {"site_title": "Dave's Shop Talk: Building Confidence", "site_url": "http://daveosborne.com/dave/index.php", "site_desc": "Features articles and plans for the do it yourself person on home renovations and home repair, from a professional carpenter, renovator and contractor."}, {"site_title": "DIY Doctor", "site_url": "http://www.diydoctor.org.uk/projects.htm", "site_desc": "Collection of articles and photographs covering a range of home projects."}, {"site_title": "Diy Fix It", "site_url": "http://www.diyfixit.co.uk/", "site_desc": "Collection of tips and advice covering building and repairs, plumbing, tiling, electrical, wallpapering, painting, and decorating."}, {"site_title": "DIY Not", "site_url": "http://www.diynot.com/", "site_desc": "Information including an encyclopedia, regular articles and a discussion forum."}, {"site_title": "DIY Repair and Home Improvement Forums", "site_url": "http://www.diychatroom.com/", "site_desc": "A community of homeowners and contractors sharing knowledge on painting, construction, electrical, plumbing, carpentry, flooring, and landscaping."}, {"site_title": "DIYData.com", "site_url": "http://www.diydata.com/", 
"site_desc": "Collection of short articles on topics including plumbing, painting, tool usage, and other repair projects around the home."}, {"site_title": "DIYonline.com", "site_url": "http://www.diyonline.com/", "site_desc": "Articles and step-by-step tutorials with photographs for a wide variety of home improvement projects, glossary, cost calculators (United States-based), and design tools."}, {"site_title": "HammerZone.com", "site_url": "http://www.hammerzone.com/", "site_desc": "Photographs and step-by-step instructions for electrical, plumbing, kitchen, bath, windows and doors, exterior, flooring, and carpentry projects."}, {"site_title": "Handymanwire", "site_url": "http://www.handymanwire.com/", "site_desc": "Collection of articles, tips, FAQs, and forums for a variety of home improvement projects."}, {"site_title": "Helpwithdiy.com", "site_url": "http://www.helpwithdiy.com/", "site_desc": "Illustrated tutorials on topics including plumbing, painting, tiling, and decorating."}, {"site_title": "Home Improvement", "site_url": "http://www.home-improvement-home-improvement.com/", "site_desc": "Do-it-yourself articles cover interior and exterior, decks and patios, and decorating."}, {"site_title": "Home Repair", "site_url": "http://homerepair.about.com/index.htm", "site_desc": "Information on do-it-yourself projects or major renovations. 
Offers time and money saving techniques, diagrams, and links to other home repair sites."}, {"site_title": "Home Repair Stuff", "site_url": "http://www.factsfacts.com/MyHomeRepair/", "site_desc": "Frequently asked questions from the alt.home.repair newsgroup, with plumbing, carpentry, and tool tips."}, {"site_title": "Home Repairs", "site_url": "http://www.repair-home.com/", "site_desc": "Home repair tips, discussion forum and contractor search utility."}, {"site_title": "Home Tips", "site_url": "http://www.hometips.com/", "site_desc": "Home repair and improvement advice and tips including \"How Your House Works\", a manual by Don Vandervort, buying guides, ideas for home and shop."}, {"site_title": "HomeDoctor.net", "site_url": "http://homedoctor.net/", "site_desc": "Short articles covering appliance repair, pest control, electrical repairs, energy savings, and heating and cooling in the home. Also includes discussion forums."}, {"site_title": "HomeImprove.com", "site_url": "http://homeimprove.com/", "site_desc": "Collection of short articles on dozens of home improvement and repair topics."}, {"site_title": "Hometime", "site_url": "http://www.hometime.com/Howto/howto.htm", "site_desc": "An assortment of how-to guides, with manufacturer and safety information."}, {"site_title": "How Not to Build an Addition", "site_url": "http://www.homehumor.com/", "site_desc": "All aspects of home improvement from getting a loan to finish carpentry.  
Humor and practical advice."}, {"site_title": "Jackie Craven: The Fix", "site_url": "http://jackiecraven.com/fixit/thefix.htm", "site_desc": "Article archive with answers on a variety of subjects including kitchen and bath, paint and wallpaper, home design, and pests."}, {"site_title": "Jerry Built", "site_url": "http://jerrybuilt.com/", "site_desc": "Woodworking and home-improvement site offering the opportunity to participate in the construction of an actual online project, by joining the \"team\" and submitting ideas, plans, and criticisms."}, {"site_title": "Kenovations", "site_url": "http://www.kenovations.net/", "site_desc": "A homeowner's step-by-step guide to renovating a 1968 Cape Cod home."}, {"site_title": "Mobile Home Doctor", "site_url": "http://www.mobilehomedoctor.com/", "site_desc": "Short articles offering tips for repair and improvement specific to mobile homes. Includes a tutorial on construction of mobile homes."}, {"site_title": "Mobile Home Repair", "site_url": "http://www.mobilehomerepair.com/", "site_desc": "Advice; hardboard siding lawsuit update."}, {"site_title": "Move, Inc. Home Improvement", "site_url": "http://www.realtor.com/advice/home-improvement/", "site_desc": "Offers a variety of how-to guides and do-it-yourself related calculators."}, {"site_title": "National Association of the Remodeling Industry", "site_url": "http://www.nari.org/", "site_desc": "Tips on planning, what to look for when hiring a pro."}, {"site_title": "National Kitchen and Bath Association", "site_url": "http://www.nkba.org/", "site_desc": "Information for consumers and trade professionals. 
Consumers can locate design professionals, research design strategies, and request a free remodeling planning kit."}, {"site_title": "The Natural Handyman", "site_url": "http://www.naturalhandyman.com/", "site_desc": "Home repair help, humor, and encouragement through a collection of articles and a newsletter."}, {"site_title": "Practical DIY", "site_url": "http://www.practicaldiy.com/", "site_desc": "UK-specific information. Small series of Do-It-Yourself home repair articles."}, {"site_title": "Ten Square Metres", "site_url": "http://www.tensquaremetres.com/", "site_desc": "A photographic diary of a DIY project to extend a house by ten square metres."}, {"site_title": "This to That", "site_url": "http://www.thistothat.com/", "site_desc": "Advice about how to glue things to other              things. They are given with humor and good details."}, {"site_title": "Thumb and Hammer", "site_url": "http://www.thumbandhammer.com/", "site_desc": "A do-it-yourselfer's photo reviews of his past projects."}, {"site_title": "You Repair", "site_url": "http://www.yourepair.com/", "site_desc": "Helping people fix things around the house or understanding what a contractor will do to solve your troubles. Managed by Robert Marencin."}]}

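I have to say this site is truly huge. While I was writing this article, my spider still had not finished crawling the Home directory. A teammate, however, had already written and run his own project to crawl Home (he said it took about an hour and a half, while classmates in other groups needed as long as 4 hours to finish their directories), and his run produced a little over 13,280 records: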

Teammate's run result

Here is my own run result, added later:
Run result of this experiment

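Why did my teammate get more than 10,000 records while I got only 2,378? It looks odd, but the reason is simple: I put all the sites under one directory into that directory's sites field as a single object, so each directory produces one record and the overall record count is smaller.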
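To compare the two counts on equal terms, one could count the sites nested inside each directory record. The following is a minimal sketch, assuming the output is stored as one JSON object per line (as the sample output above suggests); the function name count_records_and_sites and the sample records are hypothetical and only illustrate the idea.

```python
import json

def count_records_and_sites(lines):
    """Return (directory records, total nested sites) for JSON-lines input."""
    records = 0
    sites = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        records += 1
        # Each record keeps all sites of one directory in its "sites" list.
        sites += len(record.get("sites", []))
    return records, sites

# Hypothetical sample in the same shape as the spider output.
sample = [
    '{"category_path": "Home/Moving_and_Relocating/", "categories": [], "sites": []}',
    '{"category_path": "Home/Homeowners/", "categories": [], '
    '"sites": [{"site_title": "A"}, {"site_title": "B"}]}',
]
print(count_records_and_sites(sample))  # (2, 2)
```

Passing an open file object instead of the sample list would work the same way, since the function only iterates over lines. The first number corresponds to my record count, and the second is the figure that is comparable to a per-site count.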

V. Main problems encountered during the work

1. Problems in the code

child_cat['cat_name'] = cat.xpath('a/div/text()').extract().replace("\r\n", "").replace(" ", "")

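The line above is my first attempt at extracting the directory name. Because the extracted text contains \n, \r, and spaces, I planned to chain replace directly onto the result, but running it raised an error: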

AttributeError: 'list' object has no attribute 'replace'

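I later found the cause: extract() returns a list rather than a string, and replace is a string method, so it cannot be called on the list directly. The replacement has to be applied to each item in turn. Reference: https://stackoverflow.com/questions/36642782/attributeerror-list-object-has-no-attribute-replace-when-trying-to-remove-c
So the code was changed to: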

child_cat['cat_name'] = cat.xpath('a/div/text()').extract()
child_cat['cat_name'] = [item.replace("\r\n", "") for item in child_cat['cat_name']]
child_cat['cat_name'] = [item.replace(" ", "") for item in child_cat['cat_name']]

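One small problem remains, though: the ' and , characters are not replaced.
2. Being blocked while crawling
Having scraped Douban before, I knew the spider needs to pretend to be a browser, so I added this to settings.py: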
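The sample output still shows cat_name as a list such as ["", "ConsumerInformation", ""], with empty strings around the name and the spaces inside the name removed. One possible alternative, sketched below under the assumption that only surrounding whitespace is unwanted, is to strip each text node and drop the empty ones. The helper name clean_names is hypothetical and is not part of the original spider.

```python
def clean_names(raw_parts):
    """Strip surrounding whitespace from each text node and drop empty results."""
    cleaned = [part.strip() for part in raw_parts]
    return [part for part in cleaned if part]

# Example input shaped like what extract() might return for one directory link.
raw = ["\r\n ", "Consumer Information", "\r\n "]
print(clean_names(raw))  # ['Consumer Information']
```

Because strip() only touches the ends of each string, spaces between words are kept, unlike replace(" ", ""). In the spider this could be used as child_cat['cat_name'] = clean_names(cat.xpath('a/div/text()').extract()).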

USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

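Mistakes are hard to avoid while writing code, though. An infinite loop crept in during editing that I had not noticed. When I ran the spider, it kept crawling the sites under the same directory over and over and never moved on. Because the request rate was too high, the site noticed and banned my IP, so every request returned a 403 error. Once the ban was lifted, I immediately dealt with this by finding the following line in settings.py: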

#DOWNLOAD_DELAY = 3

Remove the # so that requests are spaced about 3 seconds apart, which keeps the crawl rate from getting too high.
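Put together, the relevant part of settings.py might look like the sketch below. Only USER_AGENT and DOWNLOAD_DELAY come from this article; the remaining options are standard Scrapy settings that I am assuming could help further and were not part of the original project.

```python
# settings.py (excerpt): a sketch, not the exact file used in this experiment

# Browser-like User-Agent string, as used in the article.
USER_AGENT = (
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
    "AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
)

# Wait about 3 seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 3

# Assumptions below: not used in the article, shown only as further options.
RANDOMIZE_DOWNLOAD_DELAY = True     # Scrapy default: wait 0.5x to 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one request at a time to the site
AUTOTHROTTLE_ENABLED = True         # adapt the delay to the server's response times
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 30
```

A fixed delay makes a full crawl of a large directory slower, which is the trade-off for not being blocked again.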

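VI. Summary

Before crawling a site, I think you need to analyze its whole structure thoroughly: know which page each link leads to and which div blocks are the same from page to page. Only then is it easy to analyze the pages and plan the crawl. All in all, this is the hardest crawler project I have done in recent weeks. It also showed me how much I still do not understand about Python and Scrapy; there are many methods and packages I cannot use yet. During a project it is also important to discuss things with teammates and classmates and learn from each other. As for this site, I hardly know what to say. It is highly structured, but as you go deeper layer by layer, getting at the data still takes careful analysis and a well-planned crawl route. It is a good site for practicing crawling, although the amount of data is rather overwhelming.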

References:
https://blog.csdn.net/u012150179/article/details/34913315
http://www.reibang.com/p/83c73071d3cb
A site for learning Python basics: http://www.runoob.com/python/python-tutorial.html
A very important link that helped a great deal with this experiment: http://www.reibang.com/p/d6b6feb0a504
Another approach by our group to collecting the data under dmoz/Home: http://www.reibang.com/p/51419fec3915
