Programming Interface

Using the API in Visual Studio


Visual Web Ripper contains two main API files. You need to include one or both of these files in your Visual Studio project:

  • WebRipper.DLL
  • WebRipperBrowser.DLL

If you want to use the API to run a project, you need to include both WebRipperBrowser.DLL and WebRipper.DLL in your Visual Studio project. If you only want to process extracted data, you need to include only WebRipper.DLL.

Include the API files for Visual Web Ripper by adding the files as references. Browse the Visual Web Ripper installation folder in order to find the API files.


image.png

You must also copy the following four files to your application's Bin folder.

  • SQLite.Interop.dll
  • msvcp100.dll
  • msvcr100.dll
  • AjaxHook.dll

After including the API files in your project, you will have access to the namespaces VisualWebRipper and VisualWebRipper.Processor. If you have included only WebRipper.DLL, you will have access only to the namespace VisualWebRipper. This is how to include a namespace in C#:

using VisualWebRipper;
using VisualWebRipper.Processor;

Platform Target

Visual Web Ripper is a 32-bit application. A 32-bit application can run on a 64-bit operating system, but it must run in 32-bit mode, so you must set the target platform to x86 as shown below.

image.png

Loading and Running a Project


The most common task when using the API is loading and running a data extraction project from within your own application.

The following classes are used when running a project:

  • WrProject defines an instance of a data extraction project.
  • WrAgent can be used to run a project with the WebCrawler agent or the WebBrowser agent.

The following two static methods can be used to load a data extraction project.

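// Use one of the two alternatives.
// Load from a file path (verbatim string, so the backslashes are not treated as escape sequences).
WrProject project = WrProject.Load(@"C:\projects\sequentum.rip");
// Load by project name.
WrProject project = WrProject.LoadByName("sequentum");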

The following three static methods can be used to run a data extraction project in synchronous mode.

// Use one of the three alternatives; the second argument (true) selects synchronous mode.
IAgent agent = WrAgent.RunProject(new WrProcessPars(project), true);
IAgent agent = WrAgent.RunProject(project, true);
IAgent agent = WrAgent.RunProject(@"C:\projects\sequentum.rip", true);

You can control certain aspects of the process by specifying additional parameters on WrProcessPars.

WrProcessPars(WrProject project, bool isResume, bool isRetryErrors, 
    bool isViewBrowser, WrProcessorTypeEnum defaultAgentType, int debugLevel)

For example:

WrAgent.RunProject(new WrProcessPars(project, false, false, true, 
    project.DefaultCollector, project.LogLevel), true);

Status information can be retrieved from the IAgent interface as follows.

string status = agent.Status;
int processedPages = agent.ProcessedPages;
int pageLoadErrors = agent.TimeoutPages;
int missedRequiredElements = agent.MissedRequiredElements;
bool isError = agent.IsError;

The following three static methods can be used to run a data extraction project in asynchronous mode.

// Use one of the three alternatives; the second argument (false) selects asynchronous mode.
IAgent agent = WrAgent.RunProject(new WrProcessPars(project), false);
IAgent agent = WrAgent.RunProject(project, false);
IAgent agent = WrAgent.RunProject(@"C:\projects\sequentum.rip", false);

If you are running a project asynchronously you can use the IsDone property of the IAgent interface to see whether a project has finished running.

if (agent.IsDone)   
{   
     //The project has finished running   
}
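
In practice an asynchronous run is usually polled in a loop rather than checked once. The helper below is a minimal sketch of such a loop. AgentPolling.WaitUntil is a hypothetical name, not part of the Visual Web Ripper API, and it takes the completion check as a delegate so that it compiles without the API assemblies.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the Visual Web Ripper API.
public static class AgentPolling
{
    // Polls isDone at the given interval until it returns true or the timeout elapses.
    // Returns true if isDone reported completion, false if the timeout was reached first.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan pollInterval, TimeSpan timeout)
    {
        if (isDone == null) throw new ArgumentNullException("isDone");

        Stopwatch elapsed = Stopwatch.StartNew();
        while (!isDone())
        {
            if (elapsed.Elapsed >= timeout)
                return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

With an agent started in asynchronous mode, the call could look like AgentPolling.WaitUntil(() => agent.IsDone, TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(30)). The interval and timeout values are placeholders to adjust for your project.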

Manipulating a Project


You can use the API to manipulate a data extraction project before you run it. You must first load the project to get an instance of the WrProject class.

The following two static methods can be used to load a data extraction project.

// Use one of the two alternatives.
WrProject project = WrProject.Load(@"C:\projects\sequentum.rip");
WrProject project = WrProject.LoadByName("sequentum");

After you have loaded the project, you can set any of its properties and then run the project.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.StartUrls.Clear();   
project.StartUrls.Add( "http://www.sequentum.com" );   
IAgent agent = WrAgent.RunProject(project);

Setting Input Parameters

The best way to manipulate a project is to use input parameters. This allows you to keep all functionality in the project file and use the API to set the parameters.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.InputParameters.SetParameter( "server" ,  "web" );   
project.InputParameters.SetParameter( "database" ,  "test" );   
project.InputParameters.SetParameter( "username" ,  "myUser" );   
project.InputParameters.SetParameter( "password" ,  "myPassword" );   
IAgent agent = WrAgent.RunProject(project);

You do not need to use the API in order to supply input parameters to a project. You can also use the command-line tool to run projects and specify input parameters.

  • Command-Line Utility

Setting the Output Folder

When exporting data to a file format, such as CSV or XML, the output folder can be set this way.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.DataConfiguration.DataSource.OutputFolder = @"c:\output";
project.DataConfiguration.DataSource.IsDefaultOutputFolder = false;

You can export data programmatically to the export target configured in a project.

WrProject project = WrProject.LoadByName("sequentum");
WrExportData data = project.OpenExportData();
WrExport.Export(project, data);

Working With Export Data


After you have run a data extraction project, you may want to do some custom post-processing on the extracted data. You can configure a custom export script for the project, but if you are using the API to run a project, it may be easier and more appropriate to post-process the extracted data directly in your application using the API to access the extracted data.

The class WrExportData provides access to the exported data. You can get an instance of the WrExportData class by calling the method OpenExportData of the WrProject class.

WrProject project = WrProject.LoadByName( "sequentum" );              
WrExportData data = project.OpenExportData();

Example

This example runs a project and then writes the extracted data to a text file.

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using VisualWebRipper;

namespace Export
{
    class Program
    {
        static void Main(string[] args)
        {
            WrProject project = WrProject.LoadByName("sequentum");
            project.ViewBrowserCollector = false;
            IAgent agent = WrAgent.RunProject(project, true);
            WrExportData data = project.OpenExportedData();

            StringBuilder content = new StringBuilder();
            WrExportTableReader reader = data.GetTableReader("table_name");
            while (reader.Read())
            {
                content.Append(reader.GetStringValue("productId"));
                content.Append(",");
                content.Append(reader.GetStringValue("productName"));
                content.Append(",");
                content.Append(reader.GetStringValue("price"));
                content.Append(Environment.NewLine);
            }
            File.WriteAllText("C:\\div\\output.txt", content.ToString());
            reader.Close();
            data.Close();
        }
    }
}
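
The example above joins the values with commas as they are, so a product name that itself contains a comma or a double quote would break the row. The following is a minimal sketch of a quoting helper that follows the common CSV convention (quote the field, double any embedded quotes). CsvField.Escape is a hypothetical helper, not part of the Visual Web Ripper API.

```csharp
using System;

// Hypothetical helper, not part of the Visual Web Ripper API.
public static class CsvField
{
    // Characters that force a field to be quoted.
    private static readonly char[] Special = { ',', '"', '\r', '\n' };

    // Returns the value unchanged when it needs no quoting; otherwise wraps it
    // in double quotes and doubles any embedded double quotes.
    public static string Escape(string value)
    {
        if (value == null) return string.Empty;
        if (value.IndexOfAny(Special) < 0) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

In the loop above it would be applied per field, for example content.Append(CsvField.Escape(reader.GetStringValue("productName"))).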

Using the API from ASP.NET


You should never use the API directly from a web application, but instead build a command-line program that uses the API, and then call that program from your ASP.NET web application. When you start the command-line program, you can specify the user context in which the program should run.

A web application is likely to have insufficient privileges to run a website in IE. The required privileges may depend on the target website, so it's nearly impossible to configure a web server with the correct privileges. Notice that a web application is likely to have different privileges when run from within Visual Studio compared to when it is deployed to a web server.
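
A minimal sketch of launching such a command-line program from server-side code with the standard .NET Process class follows. ExtractionLauncher, the executable path and the arguments are all hypothetical; they stand for your own program and are not defined by Visual Web Ripper.

```csharp
using System;
using System.Diagnostics;

// Hypothetical launcher, not part of the Visual Web Ripper API.
public static class ExtractionLauncher
{
    // Builds the start information for your own command-line program.
    public static ProcessStartInfo BuildStartInfo(string exePath, string arguments)
    {
        ProcessStartInfo info = new ProcessStartInfo(exePath, arguments);
        info.UseShellExecute = false;      // needed for output redirection and for running as another user
        info.CreateNoWindow = true;        // no console window on the server
        info.RedirectStandardOutput = true;
        return info;
    }

    // Starts the program, waits for it to exit and returns its exit code.
    public static int Run(ProcessStartInfo info)
    {
        using (Process process = Process.Start(info))
        {
            // Drain the output before waiting so a full output buffer cannot block the child process.
            process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

To run the program under a specific Windows account, set the UserName, Domain and Password properties of the ProcessStartInfo before calling Run; these properties only take effect when UseShellExecute is false.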

Project Owner Settings


Visual Web Ripper uses your Windows user settings to retrieve information about the default location of your Visual Web Ripper files.

When a project runs from your application, it may run in the context of a user that does not have any Visual Web Ripper settings. A data extraction project contains information about the user who owns the project, and Visual Web Ripper will use that information to locate the default Visual Web Ripper folders.

You can set the project owner in the Project menu in Visual Web Ripper.


image.png

If you copy a project from one computer to another, your application may be unable to run the project on the new computer until you set the project owner to a Windows user on the new computer.

If you are using the command-line utility to run a data extraction project, the project owner information will automatically be used to locate the appropriate licensing information and default folders, but if you are using the API in a custom application, you must call the method VisualWebRipperPath.SetServiceDocumentPath as in this example.

WrProject project = WrProject.LoadByName( "Sequentum" );   
  
VisualWebRipperPath.SetServiceDocumentPath(project.Schedule.DocumentPath);

The project owner settings do not specify the Windows security context when running a data extraction project. The project will run in the security context of the user who started your program. You must make sure the user that runs your program has access to all the required resources on your computer. For example, if your data extraction project is using the WebBrowser agent, the user must be able to start an instance of Internet Explorer.

Internet Explorer Emulation Mode


Visual Web Ripper uses an embedded instance of Internet Explorer when running a project using the WebBrowser agent.

The embedded IE instance runs in IE7 emulation mode by default, but Visual Web Ripper is configured to use IE9 emulation mode. If you have developed a project in Visual Web Ripper it may not work correctly in your own application because the website is displayed differently in IE9 and IE7.

The IE emulation mode is set in the registry for each executable, so if you want your application to run in IE9 emulation mode, you need to change your registry setting.

Please read this blog post for more information about changing your registry to specify IE9 emulation mode.

http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
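
As a rough sketch of the registry change, the setting lives under the FEATURE_BROWSER_EMULATION key, where each value is named after an executable and holds the emulation mode as a DWORD. BrowserEmulation is a hypothetical helper, not part of the Visual Web Ripper API; it writes to the current user's hive and works on Windows only.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper, not part of the Visual Web Ripper API. Windows only.
public static class BrowserEmulation
{
    // Per-user feature control key read by the embedded browser control.
    public const string FeatureKey =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // 9000 selects IE9 mode for pages with a standards-based DOCTYPE.
    public const int IE9 = 9000;

    // Registers the given executable (file name only, e.g. "MyApp.exe") for IE9 emulation.
    public static void EnableIE9(string exeFileName)
    {
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(FeatureKey))
        {
            key.SetValue(exeFileName, IE9, RegistryValueKind.DWord);
        }
    }
}
```

The value name must match the file name of the running executable, and the setting should typically be in place before the application creates its first browser instance.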

Export Plugins


Export plugins can be used to provide customized export functionality. Export plugins are similar to export scripts, but plugins can provide a user interface allowing a user to configure the export settings at design time.

The following screenshot shows a plugin user interface that allows a user to specify a database connection string at design time. The database connection string can then be used by the plugin at runtime when exporting data.

image.png

The plugin export routine is run after the standard data export, so a plugin can be used to operate on exported data files. For example, a plugin export routine could use FTP to transfer an exported CSV file to a remote server, and the plugin user interface could be used to configure the FTP address and login details.

You can set the standard export target to None if you want a plugin export routine to completely replace the standard data export.

Export plugins should be placed in the Visual Web Ripper installation folder in the sub-folder Plugins\Export. Each plugin should be placed in a separate folder and the name of the folder becomes the plugin name displayed to the Visual Web Ripper user. A plugin named FTPExport should be placed in the following sub-folder.

Plugins\Export\FTPExport

Building a Plugin

Visual Web Ripper uses the .NET MEF plugin framework. All plugins must export an implementation of the interface IExportPlugin, which is declared in the WebRipper.dll assembly.

public interface IExportPlugin
{
    UserControl LoadUserInterface(WrProject project);
    bool SaveUserInterface(WrProject project);
    void Export(WrProject project, WrExportData data);
}

The method LoadUserInterface should return a standard .NET UserControl that displays the plugin user interface.

The method SaveUserInterface is called when the Visual Web Ripper user presses the Save button and the plugin should save any data the user has entered. The plugin should validate the entered data and return false if the data is invalid, or true if the data is valid.

The Export method is the plugin's export routine, and is called after the standard data export has completed.

The class below is an example of an exported class that implements the IExportPlugin interface.

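using System;
using System.ComponentModel.Composition; // MEF: provides the [Export] attribute
using System.Windows.Forms;              // UserControl (assumed here to be the Windows Forms type)
using VisualWebRipper;

[Export(typeof(IExportPlugin))]
public class ExportPlugin : IExportPlugin, IDisposable
{
    // DatabaseConnection is the plugin's own UserControl-derived class.
    DatabaseConnection databaseConnectionControl;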

    public UserControl LoadUserInterface(WrProject project)
    {
        databaseConnectionControl = new DatabaseConnection(project);
        return databaseConnectionControl;
    }
    
    public bool SaveUserInterface(WrProject project)
    {
        return databaseConnectionControl.Save(project);
    }
    
    public void Export(WrProject project, WrExportData data)
    {
        DataExport.Export(project, data);          
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
    
    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
            if (databaseConnectionControl != null)
            {
                databaseConnectionControl.Dispose();
                databaseConnectionControl = null;
            }
    }
    ~ExportPlugin()
    {
        Dispose(false);
    }
}

Saving User Data

A plugin can save user data in the project file using the project property PluginParameters. The following example saves a database connection string in the project.

project.PluginParameters["SimpleDatabaseExport_ConnectionString"] = connectionString.Text;

The following example opens a database connection using the stored connection string.

IConnection connection = new WrSqlServerConnection(project,
     project.PluginParameters["SimpleDatabaseExport_ConnectionString"]);

Examples

The following three plugin examples have been built using Visual Studio.

Example 1

This example shows how to build a plugin that can FTP an exported CSV file to a remote server. The plugin user interface is used to configure FTP address and login details. To use this plugin, copy the compiled assembly FTPExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\FTPExport

Download Visual Studio sample project

Example 2

This example shows how to build a plugin that exports data to SQL Server.
The plugin user interface is used to configure the database connection string.
To use this plugin, copy the compiled assembly SimpleDatabaseExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\SimpleDatabaseExport

This plugin is designed to replace the standard data export, so the standard export target should be set to None.

Download Visual Studio sample project

Example 3

This example shows how to build a plugin that can email an exported CSV file. The plugin user interface is used to configure email server and recipient details. To use this plugin, copy the compiled assembly EmailExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\EmailExport

Download Visual Studio sample project
