Steering Scatterplot Axes via Observation-Level Interaction
Abstract—Scatterplots are effective visualization techniques for multidimensional data that use two (or three) axes to visualize data
items as points at their corresponding x and y Cartesian coordinates. Typically, each axis is bound to a single data attribute. Interactive exploration occurs by changing the data attributes bound to each of these axes. In the case of using scatterplots to visualize
the outputs of dimension reduction techniques, the x and y axes are combinations of the true, high-dimensional data. For these
spatializations, the axes present usability challenges in terms of interpretability and interactivity. That is, understanding the axes
and interacting with them to make adjustments can be challenging. In this paper, we present InterAxis, a visual analytics technique
to properly interpret, define, and change an axis in a user-driven manner. Users are given the ability to define and modify axes by
dragging data items to either side of the x or y axes, from which the system computes a linear combination of data attributes and binds
it to the axis. Further, users can directly tune the positive and negative contribution to these complex axes by using the visualization
of data attributes that correspond to each axis. We describe the details of our technique and demonstrate the intended usage through
two scenarios.
Index Terms—Scatterplots, user interaction, model steering
Scatterplots are commonly used to visualize relationships between two individual data attributes. The use of two orthogonal
axes mapped to data attributes produces a Cartesian space where data
objects can be charted. A basic strategy to form these axes in multidimensional data visualization is to assign each axis an individual
feature or dimension originally given in a dataset. For example, plotting temperature over time on the y and x axes, respectively, generates a chart that can be used for understanding the relationship between
these two data attributes. However, this has a severe scalability issue because two-dimensional (2D) scatterplots can represent only two
features out of many at any given point of time.
Instead, an alternative strategy that better handles this scalability issue is dimension reduction, which involves multiple original features
to represent each axis. Dimension reduction [21] is a popular technique used to transform high-dimensional data into lower-dimensional
views (typically, 2D scatterplots). While a variety of approaches exist,
their fundamental functionality is similar: to solve for distances between data points in a lower-dimensional space that closely represents
the true distances between the points in a high-dimensional space. The
approaches differ mainly in how they derive distance metrics from the
data.
In the visual and perceptual understanding of a scatterplot, the interpretation of its axes plays a crucial role. That is, understanding what
it means to have large/small values along the x or y axis significantly
helps the users’ reasoning process about why the relationships among
data items are close/remote in a scatterplot. In the case of traditional
scatterplots where each axis is directly mapped to a particular data
attribute (without any dimension reduction), this process is straightforward. However, this is not often the case when it comes to the axis
of a 2D scatterplot generated by dimension reduction. One of the primary reasons is that only a limited set of dimension reduction methods
provide the interpretability of the axes of a scatterplot. Such methods include traditional methods such as principal component analysis
(PCA) [27] and linear discriminant analysis [23], which form an axis
(or a reduced dimension) explicitly as a linear combination of the original data attributes. Through this linear combination representation of
the original attributes, one can interpret the contribution of each original attribute to the axis. On the other hand, many other dimension
reduction methods form each axis implicitly in terms of the original
attributes, and thus they do not provide users with its clear meaning.
Most advanced non-linear dimension reduction methods such as manifold learning [33] correspond to this case. Even worse, some other
popular methods, such as multidimensional scaling (MDS) [31] and
force-directed graph layout [22], are rotation invariant, which
means that the axes are not defined at all. Thus, communicating with
users about the meaning of the axes resulting from dimension reduction techniques is an open challenge.
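To make the notion of an explicitly defined axis concrete, the following minimal sketch computes the first principal component of a toy two-attribute dataset and prints it as one weight per original attribute. The attribute names, the data values, and the helper `first_principal_axis` are illustrative assumptions, not taken from the paper; the sketch uses plain power iteration to stay dependency-free, whereas a real system would more likely call a linear algebra library.

```python
import math

def first_principal_axis(rows, iterations=100):
    """Return the unit-length weight vector of the first principal component.

    rows: list of equal-length lists of floats, one list per data item.
    The direction is found by power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    w = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

attributes = ["horsepower", "price"]  # hypothetical attribute names
items = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
weights = first_principal_axis(items)
for name, weight in zip(attributes, weights):
    print(f"{name}: {weight:+.3f}")
```

In this toy data the second attribute is exactly twice the first, so the resulting axis gives the second attribute twice the weight of the first. Reading the weights this way is what makes a linear axis interpretable.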
Another issue with the scatterplot generated by dimension reduction lies in the lack of interactivity. Forming the axes via dimension
reduction does not typically allow human intervention. In other words,
most of the dimension reduction methods are performed in a fully automated manner on the basis of their own pre-defined mathematical
criteria, and thus, diverse user needs and task goals are not considered
in this process. For instance, the PCA criterion, which maximally preserves the total variance of data, may not align well with the goal of
a user’s task. While MDS attempts to preserve all pairwise distances
with equal weights, one may want to focus on a subset of data points,
e.g., a local region in a scatterplot, at a time.
Motivated by these challenges, we propose a novel interactive
knowledge specification method for multidimensional data visualization, which is an alternative to the purely automatic process of generating a scatterplot via dimension reduction. The proposed method interactively forms an axis, thereby generating a corresponding scatterplot
in a user-driven manner. The key novelty of the proposed method lies
in the direct and seamless incorporation of user-selected data items for
characterizing the axis during the data exploration process. Our technique enables users to create and modify the axes by dragging data
objects to the high and low locations on both the x and y axes. The
proposed method defines the meaning of an axis accordingly in the
form of a linear combination of original data features, similar to the
output of linear dimension reduction methods. Such a user-driven linear combination of data attributes is visualized on each axis, showing
the positive or negative contribution of each attribute to the axis. Finally, users can continually refine the axes by dragging additional data
points to the axes, or by directly adjusting the contribution of the data
attributes as part of the linear combination.
The primary contributions of this work include the following:
- a visual analytics technique for directly creating, modifying, and
visualizing complicated axes formed by a linear combination of
data attributes
- a user interaction technique enabling seamless interactivity via
both data objects and data attributes to steer the meaning of the
axes
- a visual analytics technique to help users discover and weigh data
attributes
The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 describes our proof-of-concept visual analytics
system along with how the proposed interaction techniques are performed from the perspectives of both the front end and the back end,
followed by a discussion about our design rationale. Section 4 presents
several usage scenarios showcasing the advantages of the proposed interaction techniques. Section 5 presents in-depth discussions about the
limitations of our interaction techniques as well as potential directions
for improving them. Finally, Section 6 concludes the paper with some
future work.
2.1 Multiattribute Data Visualization
Fig. 2. A scatterplot generated by Tableau [41]. Users can interactively explore data by selecting and changing the bindings between
data attributes and axes.
The design space for visualization techniques for representing multiattribute data is large [28]. For example, the existing techniques include iconic displays [6], transforming displays based on geometric
characteristics [13], and stacked visual representations [32]. Among
these many techniques, one commonly used technique is the scatterplot [12, 20, 45], owing to the visual simplicity and cultural familiarity
of such charts [43]. Scatterplots (such as the one shown in Fig. 2) represent data on a Cartesian plane defined by the two graphical axes (the
x and the y axes). Three-dimensional scatterplots are also an available
option, but their use in information visualization is limited given the
perceptual and visual challenges [38, 47]. Systems that enable users to
generate scatterplots include Tableau [41], GGobi [40], Matlab [34],
Spotfire [1], and Microsoft Excel [19]. One basic user interaction supported by scatterplots is to select and change the mapping of the axes
to data attributes (Fig. 2).
Fig. 3. A scatterplot matrix (adapted from [15]) showing all individual
pairwise feature scatterplots of an 8-dimensional dataset
Fig. 4. A Galaxy View generated by IN-SPIRE [48] showing a scatterplot of documents (dots)
As dataset complexities increase, often, the number of data attributes to select from increases as well. This causes situations where
directly selecting one out of hundreds or thousands of data attributes
can be less than optimal. As such, different types of techniques exist
to show more combinations of data attributes simultaneously. For example, multiple scatterplots can be arranged into a single view called a scatterplot matrix [12]. A scatterplot matrix (such as the example
shown in Fig. 3, adapted from [15]) binds data attributes to rows and
columns so that each cell in the matrix can represent a single scatterplot. As such, users do not have to individually bind data attributes to
the axes and interactively choose among the potentially large number
of choices.
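As a small illustration of how many cells such a matrix has to show, the sketch below enumerates the distinct attribute pairs for an 8-dimensional dataset like the one in Fig. 3. The attribute names and the helper `scatterplot_matrix_pairs` are placeholders introduced here for illustration.

```python
from itertools import combinations

def scatterplot_matrix_pairs(attributes):
    """Return the unordered attribute pairs that each need one scatterplot."""
    return list(combinations(attributes, 2))

attributes = [f"attr{i}" for i in range(1, 9)]  # hypothetical 8-dimensional dataset
pairs = scatterplot_matrix_pairs(attributes)
print(len(pairs))  # 28 distinct pairwise scatterplots
```

The number of pairs is d(d-1)/2 for d attributes, so it grows quadratically as attributes are added.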
2.2 Applications of Dimension Reduction in Information Visualization
When using dimension reduction for visualization purposes, the goal
is to provide a low-dimensional view, typically a 2D scatterplot, in
a manner that the original high-dimensional distances between data
points are maximally preserved in the resulting 2D views. These
views often show spatial clusters or groups of data representing coherent contents. Widely used dimension reduction methods
for visualization include PCA [27], MDS [31], self-organizing map
(SOM) [29], and generative topographic mapping (GTM) [3]. Recently, t-distributed stochastic neighbor embedding [46] has been proposed as a dimension reduction method, which is particularly suitable for generating 2D scatterplots that can reveal meaningful insights
about data such as clusters and outliers.
To date, these methods have been actively adopted in visual analytics systems. For example, IN-SPIRE [48], a well-known visual analytics system for document analysis, provides a Galaxy View (as shown
in Fig. 4) that visualizes text corpora spatially by showing the pairwise similarity between documents as their distance in a 2D space.
As a result, groups and clusters emerge, which can be perceived as
the sets of similar documents, based on the geographic "near=similar"
metaphor [39]. More recently, a visual analytics system applicable to
more general high-dimensional data types including documents and
images has been proposed, allowing a user to explore the diverse aspects of data by applying various dimension reduction methods to generate different scatterplot visualizations [9].
Other kinds of high-dimensional data have also been visualized in
the form of a scatterplot based on dimension reduction, including education performance data, census data [18], wine characteristics [5],
facial images [8], and text documents [7].
2.3 Interactivity for Dimension Reduction in Information Visualization
In general, the axes created via dimension reduction techniques are defined by linear or non-linear combinations of original data dimensions.
This complexity can lead to trust and interpretation challenges for domain experts exploring their data visually [10]. For example, users
may question whether their interpretation of a pattern is trustworthy or
if it is just an artifact of a dimension reduction technique. More fundamentally, using only two dimensions to represent considerably higher-dimensional data inevitably involves significant information loss and
distortion. To overcome these issues, various user interactions have
been employed in numerous visual analytics systems.
One approach to user interaction is via direct manipulation of dimension reduction model parameters. For example, Jeong et al.
presented iPCA, a visual analytics application that visualizes high-dimensional data in a 2D scatterplot using PCA [26]. They utilize
graphical controls (e.g., sliders) to enable users to directly manipulate
the weight on the principal components used in PCA. As a result, the
adjustments by the user generate a new projection (i.e., a new scatterplot). Similar interaction guidelines have been used by other applications, such as a text visualization system called STREAMIT [2].
A different set of techniques for incorporating user interactions into
such visual analytics systems also exists. Semantic interaction techniques function by inferring model updates based on direct interactions performed in the visualization [16, 17]. For example, Endert et
al. have shown how directly manipulating the position of points in a
2D scatterplot can be used for inferring the parameters of PCA, MDS,
and GTM [18]. These inferences can also be used for exporting the
specification of distance functions computed in the dimension reduction step so that they can be reused, shared, or simply saved [5].
Other than manipulating data items to interact with scatterplots, researchers have studied the interaction techniques that manipulate features or dimensions. Yi et al. have presented a technique called Dust
& Magnet that allows users to additionally place features or dimensions on top of a scatterplot themselves to see which data items have
large values of these features or dimensions [49]. For text analysis, the
VIBE system allows users to perform similar interactions with keywords [35]. In addition, Turkay et al. proposed a technique using
dual scatterplots, one of which shows data items while the other shows
features [44]. By providing brushing and linking as well as filtering
operations on both data items and features in these dual scatterplots,
users can check major patterns as well as outliers among data items
and among features.
The technique proposed in this paper follows a similar idea of interacting with both data items and features, but the main novelty of
the proposed technique against the existing work lies in the capability
of directly defining and interpreting the axes of the 2D scatterplot by
assigning the data items of our interest to the axes. In this respect, our
work is related to PivotSlice, a technique recently proposed by Zhao
et al. that allows faceted browsing of high-dimensional data [50], as
it allows users to specify data attributes on axes of the scatterplot by
directly dragging the attribute to the axis. However, our technique enables users to drag data objects (instead of data attributes) to the axis.
Further, the proposed technique does not divide the scatterplot into a
multifaceted view.
Furthermore, a technique called flexible linked axes [11] has a relationship with our work from a different aspect. That is, this technique
is a different type of interaction that allows users to draw axes on a canvas, where scatterplots can be generated between any two neighboring
axes. However, the main goal of this technique is fundamentally different from ours in that it attempts to flexibly coordinate and place
multiple scatterplots on a large canvas, while our focus is on improving a single scatterplot for better supporting the interactive exploration
of data based on a more sophisticated, user-driven axis specification.
Further, Kondo and Collins have shown how directly interacting with
visualizations can be used for revealing temporal trends and relationships between data items [30]. Their work allowed users to manipulate
the position of data points in a scatterplot to reveal the temporal trends
in data, again enabling interactions directly on the data items in a scatterplot to parameterize a data model.
3 PROPOSED TECHNIQUE
To realize the proposed interaction technique, we built a proof-of-concept visual analytics system. In this section, we describe (1) the
overall design of the proposed visual analytics system, (2) the proposed interaction to steer the axis in a user-driven manner, (3) the underlying mathematical details to support the proposed user interaction,
(4) the design rationale, and (5) the implementation details of the proposed system.
3.1 System Design
As shown in Fig. 1 by using the well-known Car dataset, which consists of 387 data items with 18 attributes, the proposed system mainly
contains three panels: (1) the scatterplot view (Fig. 1(A)), (2) the
axis interaction panel to support the proposed interaction capabilities
(Fig. 1(B-D)), and (3) the data detail view (Fig. 1(E)).
The user interaction technique presented in this paper fosters a visual data exploration process grounded in the principles of semantic
interaction techniques [16, 17]. That is, the system interprets the analytical reasoning of exploratory user interactions to steer the underlying data model. The generic workflow supported by our user interaction technique is as follows:
- The user observes two data points that define the difference between the two semantic groupings (e.g., “nice cars” and “bad cars”).
- The user drags one data item to each side of the axis.
- InterAxis computes the weighting of data attributes that supports these higher-level groupings (Eq. 1). The weights are displayed in the bar chart below the axis.
- The scatterplot updates to reflect the newly defined axis, where data items are placed according to the similarity on either side of the axis (Eq. 2).
- The user can refine the semantic grouping by adding/removing data points or directly modifying the weighting in the visualization below the axes.
- The user can save the axis for future use and continue to explore the visualization iteratively by using the same interaction concept based on different semantic groupings.
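The workflow above can be sketched in a few lines. The equations referenced as Eq. 1 and Eq. 2 are not reproduced in this section, so the sketch rests on a stated assumption: the axis weights are the difference between the attribute means of the items dropped at the high end and at the low end, scaled to unit length, and an item's position is its weighted sum of attributes. The function names, attribute choices, and data values are hypothetical.

```python
import math

def axis_weights(high_items, low_items):
    """Assumed form of the weighting step: difference of attribute means
    between the two drop zones, scaled to unit length."""
    d = len(high_items[0])
    mean_high = [sum(it[j] for it in high_items) / len(high_items) for j in range(d)]
    mean_low = [sum(it[j] for it in low_items) / len(low_items) for j in range(d)]
    diff = [h - l for h, l in zip(mean_high, mean_low)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("high-end and low-end items have identical means")
    return [x / norm for x in diff]

def axis_position(weights, item):
    """Assumed form of the placement step: linear combination of attributes."""
    return sum(w * x for w, x in zip(weights, item))

# Hypothetical items with two normalized attributes: [horsepower, mpg].
nice_cars = [[0.9, 0.2], [0.7, 0.4]]   # dragged to the high-end drop zone
bad_cars = [[0.2, 0.8], [0.4, 0.6]]    # dragged to the low-end drop zone

weights = axis_weights(nice_cars, bad_cars)
print([round(w, 3) for w in weights])
print(axis_position(weights, nice_cars[0]) > axis_position(weights, bad_cars[0]))
```

Under this assumption, attributes that are larger among the high-end items receive positive weights, attributes that are larger among the low-end items receive negative weights, and every item in the dataset can then be placed along the new axis.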
The scatterplot view provides a 2D overview of the data. By default,
the first and the second features of data, e.g., Retail Price and HP
(Horsepower), are assigned to the x and the y axes, respectively, but
this initial view can be set up by using a dimension reduction method
such as PCA [27] to provide another starting point. Data points are represented as semi-transparent circles so that regions with overlapped
data points can be highlighted. The scatterplot view supports zoom
and pan via mouse wheel operations on a white space (to zoom on
both axes simultaneously) or over a particular axis (to zoom only on
this axis). Hovering over or clicking on a data point, one can check the
full details (or the original high-dimensional information) of the data
item in the data detail view (Fig. 1(E)).
The axis interaction panel consists of two drop zones (the high-end
and the low-end of each axis), which the user drags data points into in
order to steer the axis (Fig. 1(B)), an interactive bar chart (Fig. 1(C)),
and a sub-panel (Fig. 1(D)) containing buttons to save the current axis
for further use or to clear the data points currently assigned to the axis
and a combo box to change the axis back to one among the original
features or the previously defined axes. The bars in the interactive
bar chart represent the contributions/weights of attributes to the corresponding axis. The longer the length of a bar is, the stronger its corresponding attribute contributes to the axis. The bars are color-coded
by the signs of their weights: positive contributions in blue and negative contributions in red. Data points that are high on the positively
weighted (blue-colored) attributes will be placed on the high-end side
of the axis. Data points that are high on the negatively weighted attributes will be placed on the low-end side of the axis. For example,
in Fig. 1(C), sedans tend to be on the left side of the scatterplot, while
sports cars and cars with rear-wheel drive (RWD) tend to be on the
right side. The magnitude of a weight represents how strongly its attribute
contributes, and its sign determines at which end of the axis the data points
with high values of that attribute will be placed.
Fig. 1. An overview of the proposed visual analytics system, InterAxis, showing a car dataset, which includes 387 data items with
18 attributes. The proposed system contains three panels: (A) the scatterplot view to provide a two-dimensional overview of data,
(B-D) the axis interaction panel to support the proposed interaction capabilities, and (E) the data detail view to show the original
high-dimensional information of the data items of interest. The axis interaction panel (B-D) consists of (B) two drop zones (the
high-end and the low-end of each axis), which a user drags data points into in order to steer the axis, (C) an interactive bar chart,
and a sub-panel containing buttons to save the current axis for future use (D, middle) or to clear the data points currently assigned
to the axis (D, right) and a combo box to change the axis back to one among the original features or the previously created axes
via our interaction (D, left).
3.2 Interactive Axis Steering
The proposed method provides two types of interactions: (1) data-level
axis steering and (2) attribute-level axis manipulation. Data-level axis
steering is prompted by dragging a data point from the scatterplot into
the two drop zones at the high and the low end of the axis. Attribute-level axis manipulation is prompted by directly adjusting the bars in
the interactive bar chart.
The main idea of the proposed interaction is to let a user steer the axis
by incorporating data items directly, without interrupting the exploration
of the data in the scatterplot. For example,
when a user finds data points that he likes (or dislikes) in the scatterplot, he can drag them to the high-end (or the low-end) drop zone of
an axis (Fig. 1(B)). A new axis is then formed to reflect
these choices of data items, and the scatterplot is updated on
the basis of the newly formed axis. The technical details of how
we form a new axis are described in the next section.
How the axis is formed from this process is summarized and visualized as a bar chart (Fig. 1(C)) so that a user can get an idea about
how much a particular original feature or dimension is emphasized or
de-emphasized. Given such a bar chart, a user can further refine the
meaning of an axis by directly manipulating the length of each bar
via drag-and-drop operations on the tip of the bar (attribute-level axis
manipulation).
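The attribute-level manipulation above can be sketched in a few lines. This is a minimal illustration, not the InterAxis implementation: the names `Bar`, `bars_from_weights`, `drag_bar`, and `x_coordinate`, the attribute names, and the gray color for a zero weight are all assumptions introduced here. The blue/red sign encoding follows the description of Fig. 1(C).

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    attribute: str
    length: float  # bar length = magnitude of the weight
    color: str     # blue for positive, red for negative


def bars_from_weights(names: List[str], weights: List[float]) -> List[Bar]:
    """Describe each attribute weight as a bar (gray for zero is an assumption)."""
    bars = []
    for name, w in zip(names, weights):
        color = "blue" if w > 0 else "red" if w < 0 else "gray"
        bars.append(Bar(name, abs(w), color))
    return bars


def drag_bar(weights: List[float], index: int, new_weight: float) -> List[float]:
    """Return a copy of the weights with one bar dragged to a new value."""
    updated = list(weights)
    updated[index] = new_weight
    return updated


def x_coordinate(item: List[float], weights: List[float]) -> float:
    """Position of a data item on the axis: a weighted sum of its attributes."""
    return sum(w * v for w, v in zip(weights, item))


# Hypothetical attributes and weights, for illustration only.
names = ["horsepower", "sedan", "rwd"]
weights = [0.6, -0.8, 0.0]
bars = bars_from_weights(names, weights)

car = [0.5, 1.0, 1.0]                      # a preprocessed data item
x_before = x_coordinate(car, weights)      # about -0.5
weights_after = drag_bar(weights, 2, 0.5)  # user lengthens the "rwd" bar
x_after = x_coordinate(car, weights_after) # about 0.0, i.e. the item moves right
```

Whether the weights are rescaled after a manual edit is not stated in this section, so the sketch leaves them as the user set them.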
The entire interaction process can be dynamic and iterative. That is,
a user can additionally assign new data items to an axis or remove data
items that were already assigned to an axis. Furthermore, the above-described direct manipulation on the bar chart can be performed at
any moment during such an interactive exploration of the bar chart.
Finally, a user can save the current definition of an axis, and then it is
registered as a new entry in the combo box (Fig. 1(D, left)) so that a
user can later recover the axis to a previously saved one.
3.3 Underlying Techniques
In this section, we describe the underlying technique for the proposed
user interaction of forming the axis via data items. For the sake of
brevity, we consider only the x axis (the horizontal axis) in a scatterplot, but the following description can be generalized to the y axis in
the same manner.
Data preprocessing. As will be discussed later, the underlying
model to define the axis is based on a linear combination of the original dimensions. To this end, we adopt data preprocessing steps used in linear regression models [14]. For a categorical variable with c different categories, we use dummy encoding, which converts it to a c-dimensional indicator vector where the value of each dimension is 1
if a data item is in the category of the corresponding dimension and
0 otherwise. Next, we scale and translate each dimension (including
both indicator and numerical variables) so that its values span exactly
the range from 0 to 1.
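The two preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the tiny `cars` table, and the choice to map a constant column to 0 are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def dummy_encode(column: List[str]) -> Dict[str, List[float]]:
    """Turn one categorical column into c indicator columns, one per category."""
    categories = sorted(set(column))
    return {cat: [1.0 if v == cat else 0.0 for v in column] for cat in categories}


def min_max_scale(column: List[float]) -> List[float]:
    """Scale and translate a column so that its values span [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(table: Dict[str, list]) -> Dict[str, List[float]]:
    """Dummy-encode categorical columns, then min-max scale every dimension."""
    out: Dict[str, List[float]] = {}
    for name, column in table.items():
        if all(isinstance(v, str) for v in column):
            for cat, indicator in dummy_encode(column).items():
                out[f"{name}={cat}"] = min_max_scale(indicator)
        else:
            out[name] = min_max_scale([float(v) for v in column])
    return out


# Hypothetical three-car table, for illustration only.
cars = {
    "horsepower": [100, 200, 300],
    "drive": ["FWD", "RWD", "FWD"],
}
features = preprocess(cars)
```

After this step every dimension, numerical or indicator, lives on the same 0 to 1 scale, so the weights of the linear model described next are comparable across attributes.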
Linear transformation. Assuming that such data preprocessing is
done, we denote the set of high-dimensional vectors of the data items that
the user assigned (via drag-and-drop) to the high end of the x axis
as $A_{x,h} = \{a_{x,h}^{1}, \ldots, a_{x,h}^{n_{x,h}}\}$ and the set of those that he dragged into
the low end of the x axis as $A_{x,l} = \{a_{x,l}^{1}, \ldots, a_{x,l}^{n_{x,l}}\}$, where
$n_{x,h}$ and $n_{x,l}$ represent the total number of points assigned to the
high end and the low end of the x axis, respectively. Now, we define
the linear transformation vector for the x axis as follows:

$$T_x = \frac{1}{n_{x,h}} \sum_{i=1}^{n_{x,h}} a_{x,h}^{i} - \frac{1}{n_{x,l}} \sum_{i=1}^{n_{x,l}} a_{x,l}^{i}$$

This is then further scaled to have a unit Euclidean norm.
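A minimal sketch of this step, taking the axis vector to be the mean of the high-end vectors minus the mean of the low-end vectors, which matches the interpretation given in this section. The function names and the requirement that both drop zones be non-empty are assumptions made for the sketch, and the example vectors are made up.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def axis_vector(high: List[Vector], low: List[Vector]) -> Vector:
    """Assumed form: mean(high) - mean(low), scaled to unit Euclidean norm."""
    if not high or not low:
        # Simplifying assumption: both drop zones must hold at least one item.
        raise ValueError("both drop zones need at least one data item")
    diff = [h - l for h, l in zip(mean_vector(high), mean_vector(low))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("high-end and low-end means coincide; axis undefined")
    return [d / norm for d in diff]


# Hypothetical preprocessed items with three dimensions each.
high_end = [[1.0, 0.0, 0.5], [1.0, 0.2, 0.5]]
low_end = [[0.0, 1.0, 0.5], [0.0, 0.8, 0.5]]
t_x = axis_vector(high_end, low_end)
# Dimension 0 gets a positive weight, dimension 1 a negative weight,
# and dimension 2 (identical in both groups) a weight of zero.
```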
One can define the linear transformation vector $T_y$ for the y axis
in the same manner. Every data item is mapped to the x axis (and
the y axis) via the transformation $T_x$ (and $T_y$). That is, the i-th data
item, whose high-dimensional vector is represented as $a_i$, is mapped to
a point in our 2D scatterplot whose 2D coordinates are given
as follows:

$$(x_i, y_i) = (T_x^{\top} a_i,\; T_y^{\top} a_i)$$
Owing to the easy interpretability of this linear model, one can understand the meaning of this transformation in a straightforward manner. That is, the resulting x axis emphasizes the features
or dimensions that have large values on the high-dimensional vectors
contained in $A_{x,h}$ but low values on those in $A_{x,l}$. Conversely,
it de-emphasizes the features that have low values on the vectors
contained in $A_{x,h}$ but high values on those in $A_{x,l}$. In this manner,
as a data item has larger (or smaller) values on the emphasized dimensions and smaller (or larger) values on the de-emphasized dimensions,
its x coordinate will have a higher (or lower) value, so that it appears farther to
the right (or left) along the x axis. The notations used in this section
are summarized in Table 1.
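The mapping and its interpretation can be illustrated with a short sketch that assumes each coordinate is the inner product of a data item with the corresponding axis vector. The function names and the example vectors are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(item: Vector, t_x: Vector, t_y: Vector) -> Tuple[float, float]:
    """Map a high-dimensional item to 2D scatterplot coordinates."""
    return (dot(t_x, item), dot(t_y, item))


# Hypothetical unit-norm axis vectors over three preprocessed dimensions:
# the x axis emphasizes dimension 0 and de-emphasizes dimension 1.
t_x = [0.6, -0.8, 0.0]
t_y = [0.0, 0.0, 1.0]

item_a = [1.0, 0.0, 0.3]  # high on the emphasized dimension
item_b = [0.0, 1.0, 0.7]  # high on the de-emphasized dimension

point_a = project(item_a, t_x, t_y)
point_b = project(item_b, t_x, t_y)
# item_a lands to the right of item_b on the x axis.
```

An item that is high on the emphasized dimension receives a larger x coordinate than one that is high on the de-emphasized dimension, which is the left/right behavior described above.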