Using the API in Visual Studio
Visual Web Ripper contains two main API files. You need to include one or both of these files in your Visual Studio project:
- WebRipper.DLL
- WebRipperBrowser.DLL
If you want to use the API to run a project, you need to include both WebRipperBrowser.DLL and WebRipper.DLL in your Visual Studio project. If you only want to process extracted data, you need to include only WebRipper.DLL.
Add the API files to your Visual Studio project as references. You can find them in the Visual Web Ripper installation folder.
You must also copy the following four files to your application's Bin folder:
- SQLite.Interop.dll
- msvcp100.dll
- msvcr100.dll
- AjaxHook.dll
After including the API files in your project, you will have access to the namespaces VisualWebRipper and VisualWebRipper.Processor. If you have included only WebRipper.DLL, you will have access only to the namespace VisualWebRipper. This is how to include a namespace in C#:
using VisualWebRipper;
using VisualWebRipper.Processor;
Platform Target
Visual Web Ripper is a 32-bit application. A 32-bit application can run on a 64-bit operating system, but it must run in 32-bit mode, so you must set the platform target of your Visual Studio project to x86 in the project's build settings.
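A startup guard can catch a wrong platform target before any project is loaded. The following is a minimal sketch and not part of the Visual Web Ripper API; the class name PlatformCheck and its methods are hypothetical names chosen for illustration.

```csharp
using System;

public static class PlatformCheck
{
    // Returns true when the current process is running as 32-bit.
    // IntPtr.Size is 4 bytes in a 32-bit process and 8 in a 64-bit one.
    public static bool Is32BitProcess()
    {
        return IntPtr.Size == 4;
    }

    // Throws with a readable message if the process is not 32-bit,
    // which usually means the platform target was left on x64 or Any CPU.
    public static void EnsureX86()
    {
        if (!Is32BitProcess())
        {
            throw new InvalidOperationException(
                "This application must run as a 32-bit process. " +
                "Set the platform target to x86.");
        }
    }
}
```

Calling PlatformCheck.EnsureX86() at the top of Main makes a misconfigured build fail immediately with a clear message.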
Loading and Running a Project
The most common task when using the API is loading and running a data extraction project from within your own application.
The following classes are used when running a project:
- WrProject defines an instance of a data extraction project.
- WrAgent can be used to run a project with the WebCrawler agent or the WebBrowser agent.
The following two static methods can be used to load a data extraction project.
WrProject project = WrProject.Load( @"C:\projects\sequentum.rip" );
WrProject project = WrProject.LoadByName( "sequentum" );
The following three static methods can be used to run a data extraction project in synchronous mode.
IAgent agent = WrAgent.RunProject( new WrProcessPars(project), true);
IAgent agent = WrAgent.RunProject(project, true);
IAgent agent = WrAgent.RunProject( @"C:\projects\sequentum.rip", true);
You can control certain aspects of the process by specifying additional parameters on WrProcessPars.
WrProcessPars(WrProject project, bool isResume, bool isRetryErrors,
bool isViewBrowser, WrProcessorTypeEnum defaultAgentType, int debugLevel)
For example:
WrAgent.RunProject(new WrProcessPars(project, false, false, true,
project.DefaultCollector, project.LogLevel), true);
Status information can be retrieved from the IAgent interface as follows.
string status = agent.Status;
int processedPages = agent.ProcessedPages;
int pageLoadErrors = agent.TimeoutPages;
int missedRequiredElements = agent.MissedRequiredElements;
bool isError = agent.IsError;
The following three static methods can be used to run a data extraction project in asynchronous mode.
IAgent agent = WrAgent.RunProject( new WrProcessPars(project), false);
IAgent agent = WrAgent.RunProject(project, false);
IAgent agent = WrAgent.RunProject( @"C:\projects\sequentum.rip", false);
If you are running a project asynchronously, you can use the IsDone property of the IAgent interface to see whether the project has finished running.
if (agent.IsDone)
{
    // The project has finished running
}
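When a project runs asynchronously, the caller usually has to check IsDone repeatedly. The following is a minimal sketch of a generic polling helper with a timeout; Polling and WaitUntil are hypothetical names and not part of the Visual Web Ripper API. The only API member it assumes, in the usage note, is the IsDone property described above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Polls isDone until it returns true or the timeout elapses.
    // Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan pollInterval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (!isDone())
        {
            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }
            // Sleep between checks so the loop does not spin the CPU.
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

With an agent returned by WrAgent.RunProject(project, false), the call could look like Polling.WaitUntil(() => agent.IsDone, TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(30)). The interval and timeout values here are arbitrary examples.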
Manipulating a Project
You can use the API to manipulate a data extraction project before you run it. You must first load the project to get an instance of the WrProject class.
The following two static methods can be used to load a data extraction project.
WrProject project = WrProject.Load( @"C:\projects\sequentum.rip" );
WrProject project = WrProject.LoadByName( "sequentum" );
After you have loaded the project, you can set any of its properties and then run the project.
WrProject project = WrProject.LoadByName( "sequentum" );
project.StartUrls.Clear();
project.StartUrls.Add( "http://www.sequentum.com" );
IAgent agent = WrAgent.RunProject(project);
Setting Input Parameters
The best way to manipulate a project is to use input parameters. This allows you to keep all functionality in the project file and use the API to set the parameters.
WrProject project = WrProject.LoadByName( "sequentum" );
project.InputParameters.SetParameter( "server" , "web" );
project.InputParameters.SetParameter( "database" , "test" );
project.InputParameters.SetParameter( "username" , "myUser" );
project.InputParameters.SetParameter( "password" , "myPassword" );
IAgent agent = WrAgent.RunProject(project);
You do not need to use the API in order to supply input parameters to a project. You can also use the command-line tool to run projects and specify input parameters.
- Command-Line Utility
Setting the Output Folder
When exporting data to a file format, such as CSV or XML, the output folder can be set this way.
WrProject project = WrProject.LoadByName( "sequentum" );
project.DataConfiguration.DataSource.OutputFolder = @"c:\output";
project.DataConfiguration.DataSource.IsDefaultOutputFolder = false;
You can export data programmatically to the export target configured in a project.
WrProject project = WrProject.LoadByName( "sequentum" );
WrExportData data = project.OpenExportData();
WrExport.Export(project, data);
Working With Export Data
After you have run a data extraction project, you may want to do some custom post-processing on the extracted data. You can configure a custom export script for the project, but if you are using the API to run a project, it may be easier and more appropriate to post-process the extracted data directly in your application using the API to access the extracted data.
The class WrExportData provides access to the exported data. You can get an instance of the WrExportData class by calling the method OpenExportData of the WrProject class.
WrProject project = WrProject.LoadByName( "sequentum" );
WrExportData data = project.OpenExportData();
Example
This example runs a project and then writes the extracted data to a text file.
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using VisualWebRipper;

namespace Export
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the project and run it synchronously.
            WrProject project = WrProject.LoadByName("sequentum");
            project.ViewBrowserCollector = false;
            IAgent agent = WrAgent.RunProject(project, true);

            // Open the extracted data and read one table row by row.
            WrExportData data = project.OpenExportData();
            StringBuilder content = new StringBuilder();
            WrExportTableReader reader = data.GetTableReader("table_name");
            while (reader.Read())
            {
                content.Append(reader.GetStringValue("productId"));
                content.Append(",");
                content.Append(reader.GetStringValue("productName"));
                content.Append(",");
                content.Append(reader.GetStringValue("price"));
                content.Append(Environment.NewLine);
            }

            // Write the collected rows to a text file, then release the reader and the data.
            File.WriteAllText("C:\\div\\output.txt", content.ToString());
            reader.Close();
            data.Close();
        }
    }
}
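The example above joins values with plain commas, which produces ambiguous lines if an extracted value itself contains a comma, a quote, or a line break. The following is a minimal sketch of a quoting helper that follows the common CSV convention of wrapping such fields in double quotes and doubling embedded quotes. Csv, Escape, and Line are hypothetical names and not part of the Visual Web Ripper API.

```csharp
using System;
using System.Linq;

public static class Csv
{
    // Quotes a field if it contains a comma, quote, or line break,
    // doubling any embedded quotes. Null is written as an empty field.
    public static string Escape(string field)
    {
        if (field == null)
        {
            return "";
        }
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Escapes each field and joins them into one comma-separated line.
    public static string Line(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}
```

Inside the read loop, the three Append calls for the values could then be replaced by a single content.Append(Csv.Line(reader.GetStringValue("productId"), reader.GetStringValue("productName"), reader.GetStringValue("price"))).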
Using the API from ASP.NET
You should never use the API directly from a web application. Instead, build a command-line program that uses the API, and call that program from your ASP.NET web application. When you start the command-line program, you can specify the user context in which the program should run.
A web application is likely to have insufficient privileges to load a website in Internet Explorer. The required privileges may depend on the target website, so it is nearly impossible to configure a web server with the correct privileges. Note that a web application is likely to have different privileges when run from within Visual Studio than when it is deployed to a web server.
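Starting the command-line program from the web application can be done with the standard .NET Process class. The following is a minimal sketch; ExtractionLauncher, the executable path, and the convention of passing the project name as the only argument are assumptions made for illustration, not something defined by Visual Web Ripper.

```csharp
using System;
using System.Diagnostics;

public static class ExtractionLauncher
{
    // Builds the start info for a separate console program that uses the API.
    // exePath and projectName are supplied by the caller (hypothetical values).
    public static ProcessStartInfo BuildStartInfo(string exePath, string projectName)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = "\"" + projectName + "\"",
            // UseShellExecute must be false to set UserName, Domain, and
            // Password on the start info when a specific account is needed.
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Starts the program, waits for it to exit, and returns its exit code.
    public static int Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A long-running extraction would block the web request while Run waits, so in practice the web application may prefer to start the process and return without waiting.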
Project Owner Settings
Visual Web Ripper uses your Windows user settings to retrieve information about the default location of your Visual Web Ripper files.
When a project runs from your application, it may run in the context of a user that does not have any Visual Web Ripper settings. A data extraction project contains information about the user who owns the project, and Visual Web Ripper will use that information to locate the default Visual Web Ripper folders.
You can set the project owner in the Project menu in Visual Web Ripper.
If you copy a project from one computer to another, your application may be unable to run the project on the new computer until you set the project owner to a Windows user on the new computer.
If you are using the command-line utility to run a data extraction project, the project owner information is used automatically to locate the appropriate licensing information and default folders. If you are using the API in a custom application, however, you must call the method VisualWebRipperPath.SetServiceDocumentPath, as in this example.
WrProject project = WrProject.LoadByName( "Sequentum" );
VisualWebRipperPath.SetServiceDocumentPath(project.Schedule.DocumentPath);
The project owner settings do not specify the Windows security context when running a data extraction project. The project will run in the security context of the user who started your program. You must make sure the user that runs your program has access to all the required resources on your computer. For example, if your data extraction project is using the WebBrowser agent, the user must be able to start an instance of Internet Explorer.
Internet Explorer Emulation Mode
Visual Web Ripper uses an embedded instance of Internet Explorer when running a project using the WebBrowser agent.
An embedded IE instance runs in IE7 emulation mode by default, but the Visual Web Ripper application itself is configured to use IE9 emulation mode. A project that you developed in Visual Web Ripper may therefore not work correctly in your own application, because a website can be displayed differently in IE9 and IE7.
The IE emulation mode is set in the registry for each executable, so if you want your application to run in IE9 emulation mode, you need to change your registry setting.
Please read this blog post for more information about changing your registry to specify IE9 emulation mode.
http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
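The registry change can also be made from code. The following is a minimal sketch that writes the per-user FEATURE_BROWSER_EMULATION value for an executable, using the value 9000, which requests IE9 mode for pages with a standards doctype. BrowserEmulation and its members are hypothetical names; the sketch assumes a Windows machine and that the value is written before the embedded browser is first created.

```csharp
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

public static class BrowserEmulation
{
    // Per-user feature control key read by the embedded browser control.
    const string FeatureKey =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // 9000 requests IE9 mode for pages with a standards doctype.
    public const int IE9 = 9000;

    // Registers the given executable name (for example "MyApp.exe")
    // with the requested emulation mode for the current user.
    public static void Set(string exeName, int mode)
    {
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(FeatureKey))
        {
            key.SetValue(exeName, mode, RegistryValueKind.DWord);
        }
    }

    // Returns the file name of the running executable, which is the
    // value name the registry setting is keyed on.
    public static string CurrentExeName()
    {
        return Path.GetFileName(Process.GetCurrentProcess().MainModule.FileName);
    }
}
```

A typical call would be BrowserEmulation.Set(BrowserEmulation.CurrentExeName(), BrowserEmulation.IE9) at application startup. When debugging, the executable name may differ from the deployed one, so the setting may need to be written for both names.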
Export Plugins
Export plugins can be used to provide customized export functionality. Export plugins are similar to export scripts, but plugins can provide a user interface allowing a user to configure the export settings at design time.
For example, a plugin user interface can allow a user to specify a database connection string at design time. The plugin can then use the database connection string at runtime when exporting data.
The plugin export routine is run after the standard data export, so a plugin can be used to operate on exported data files. For example, a plugin export routine could use FTP to transfer an exported CSV file to a remote server, and the plugin user interface could be used to configure the FTP address and login details.
You can set the standard export target to None if you want a plugin export routine to completely replace the standard data export.
Export plugins should be placed in the Visual Web Ripper installation folder in the sub-folder Plugins\Export. Each plugin should be placed in a separate folder and the name of the folder becomes the plugin name displayed to the Visual Web Ripper user. A plugin named FTPExport should be placed in the following sub-folder.
Plugins\Export\FTPExport
Building a Plugin
Visual Web Ripper uses the .NET MEF plugin framework. All plugins must export an implementation of the interface IExportPlugin, which is declared in the WebRipper.dll assembly.
public interface IExportPlugin
{
    UserControl LoadUserInterface(WrProject project);
    bool SaveUserInterface(WrProject project);
    void Export(WrProject project, WrExportData data);
}
The method LoadUserInterface should return a standard .NET UserControl that displays the plugin user interface.
The method SaveUserInterface is called when the Visual Web Ripper user presses the Save button and the plugin should save any data the user has entered. The plugin should validate the entered data and return false if the data is invalid, or true if the data is valid.
The Export method is the plugin's export routine, and is called after the standard data export has completed.
The class below is an example of an exported class that implements the IExportPlugin interface.
[Export(typeof(IExportPlugin))]
public class ExportPlugin : IExportPlugin, IDisposable
{
    DatabaseConnection databaseConnectionControl;

    public UserControl LoadUserInterface(WrProject project)
    {
        databaseConnectionControl = new DatabaseConnection(project);
        return databaseConnectionControl;
    }

    public bool SaveUserInterface(WrProject project)
    {
        return databaseConnectionControl.Save(project);
    }

    public void Export(WrProject project, WrExportData data)
    {
        DataExport.Export(project, data);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (databaseConnectionControl != null)
            {
                databaseConnectionControl.Dispose();
                databaseConnectionControl = null;
            }
        }
    }

    ~ExportPlugin()
    {
        Dispose(false);
    }
}
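The Export attribute is what makes a class discoverable through MEF. The following is a generic sketch of how a MEF host finds exported implementations of a contract. It does not show how Visual Web Ripper itself loads plugins, which is not documented here. IGreeter, EnglishGreeter, and MefDemo are hypothetical stand-ins, with IGreeter playing the role of IExportPlugin.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in contract; a real plugin implements IExportPlugin.
public interface IGreeter
{
    string Greet(string name);
}

// The Export attribute registers this class under the IGreeter contract.
[Export(typeof(IGreeter))]
public class EnglishGreeter : IGreeter
{
    public string Greet(string name)
    {
        return "Hello, " + name;
    }
}

public static class MefDemo
{
    // Scans the current assembly and returns every exported IGreeter.
    // A host that loads plugins from disk would typically use a
    // DirectoryCatalog pointed at a plugin folder instead.
    public static List<IGreeter> Discover()
    {
        using (var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly()))
        using (var container = new CompositionContainer(catalog))
        {
            return container.GetExportedValues<IGreeter>().ToList();
        }
    }
}
```

A class without the Export attribute, or one exported under a different contract type, would not be returned by Discover, which is why the attribute on a plugin class must name IExportPlugin.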
Saving User Data
A plugin can save user data in the project file using the project property PluginParameters. The following example saves a database connection string in the project.
project.PluginParameters["SimpleDatabaseExport_ConnectionString"] = connectionString.Text;
The following example opens a database connection using the stored connection string.
IConnection connection = new WrSqlServerConnection(project,
project.PluginParameters["SimpleDatabaseExport_ConnectionString"]);
Examples
The following plugin examples have been built using Visual Studio.
Example 1
This example shows how to build a plugin that can FTP an exported CSV file to a remote server. The plugin user interface is used to configure FTP address and login details. To use this plugin, copy the compiled assembly FTPExport.dll to the following sub-folder in the Visual Web Ripper installation folder.
Plugins\Export\FTPExport
Download Visual Studio sample project
Example 2
This example shows how to build a plugin that exports data to SQL Server.
The plugin user interface is used to configure the database connection string.
To use this plugin, copy the compiled assembly SimpleDatabaseExport.dll to the following sub-folder in the Visual Web Ripper installation folder.
Plugins\Export\SimpleDatabaseExport
This plugin is designed to replace the standard data export, so the standard export target should be set to None.
Download Visual Studio sample project
Example 3
This example shows how to build a plugin that can email an exported CSV file. The plugin user interface is used to configure email server and recipient details. To use this plugin, copy the compiled assembly EmailExport.dll to the following sub-folder in the Visual Web Ripper installation folder.
Plugins\Export\EmailExport