Concept Summary: Interactive Visual Statistics
Let’s summarize what we just learned in each of the concept videos. Then, we’ll continue with the hands-on lesson where you can apply each concept.
Statistics Worksheet
When working with a dataset in Dataiku DSS, it helps to have a dedicated space that holds the tools for statistical analysis. This is just what a statistics worksheet provides!
A Worksheet provides a visual summary of exploratory data analysis (EDA) tasks. To create or access worksheets, go to the Statistics tab of your dataset.
The worksheet header consists of a worksheet menu. You can use the worksheet menu to create a new worksheet or rename, duplicate, and delete worksheets. You can also switch from one worksheet to another.
There are also buttons and menu items for creating a new card, running the worksheet in a container, changing the global confidence level for statistical tests, and specifying how to sample the dataset used in the worksheet. Note that by default, DSS computes statistics on a sample of the first records in your dataset.
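To see why the sampling and confidence-level settings matter, here is a minimal sketch in plain Python. It is not how DSS computes anything internally; the function names (`head_sample`, `mean_confidence_interval`) and the data are hypothetical, and the interval uses a simple normal approximation. It shows that a sample of the first records can misrepresent a dataset whose rows are ordered, and that raising the confidence level widens the interval.

```python
import math
import statistics


def head_sample(records, n):
    """Return the first n records (hypothetical stand-in for a 'first records' sample)."""
    return records[:n]


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of values."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value for the requested confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Made-up data: values increase with row position, so the first rows
# are not representative of the whole dataset.
records = [float(i) for i in range(1, 1001)]
sample = head_sample(records, 100)

low95, high95 = mean_confidence_interval(sample, confidence=0.95)
low99, high99 = mean_confidence_interval(sample, confidence=0.99)
full_mean = statistics.fmean(records)

print(f"95% interval from first 100 rows: ({low95:.2f}, {high95:.2f})")
print(f"99% interval from first 100 rows: ({low99:.2f}, {high99:.2f})")
print(f"mean of all rows: {full_mean:.2f}")
```

In this made-up case the mean of the full dataset falls well outside both intervals, which is the kind of situation where changing the worksheet's sampling settings would be worth considering.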
For more information about worksheets, see The Worksheet Interface in the reference documentation.
Statistics Card
Cards in a worksheet provide a straightforward way to perform various statistical tasks while keeping your workspace well organized.
In DSS, a Card is used to perform a specific EDA task. For example, you can describe your dataset, draw inferences about an underlying population, analyze the effect of dimensionality reduction, and so on.
A worksheet can have many cards, with the cards appearing below the worksheet header. When creating a card, specify the card type and its corresponding parameter values.
All cards have a configuration menu for editing card settings, duplicating or deleting the card, viewing the JSON payloads and responses (for the purpose of leveraging the public API), and so on. Some cards also contain multiple sections, with each section having its own configuration menu.
Finally, the Split by menu in a card is useful for grouping your dataset by a specified variable. This allows the card to perform computations on each data subgroup.
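Conceptually, splitting by a variable amounts to a group-then-compute step. The sketch below illustrates that idea in plain Python; it does not use or reproduce any DSS API, and the helper names (`split_by`, `describe`), the column names, and the rows are all hypothetical.

```python
from collections import defaultdict
import statistics


def split_by(rows, key):
    """Group rows (dicts) by the value found in column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def describe(rows, column):
    """Compute a few summary statistics for one numeric column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up rows for illustration only.
rows = [
    {"region": "north", "amount": 10.0},
    {"region": "south", "amount": 30.0},
    {"region": "north", "amount": 20.0},
    {"region": "south", "amount": 50.0},
]

# The same computation is repeated on each subgroup.
summary = {
    group: describe(members, "amount")
    for group, members in split_by(rows, "region").items()
}

for group, stats in summary.items():
    print(group, stats)
```

Each subgroup gets its own set of results, which is the behavior the Split by menu provides inside a card.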
For more information about cards, see Elements of a card in the reference documentation.