A pivot table is a table that summarizes data in another table, and is made by applying an operation such as sorting, averaging, or summing to data in the first table, typically including grouping of the data. A trivial example is a column of numbers as the first table, and the column average as a pivot table with only one row and column. A slightly less trivial example is a table with columns sex and height, each row giving the sex and height of a person, and a pivot table made from it with the same two columns but only two rows, with "male" and "female" in the sex column and the average height of each sex in the height column. A third more complex and more typical example adds another dimension to the pivot table: a data table has columns month, salesman, product, and sales, each row giving the total sales of some product by some salesman in a certain month, and a pivot table made from it has a first column salesman of salesmen names and the remaining column names are product names and contain the total sales by that salesman of that product over all months.
Pivot tables are used in data processing and are found in data visualization programs such as spreadsheets or business intelligence software. Such programs can automatically sort, count, total or average the data stored in one table or spreadsheet, displaying the results in a second table -- the pivot table -- showing the summarized data. The user often sets up and changes the summary's structure by dragging and dropping fields graphically. This "rotation" or pivoting of the summary table gives the concept its name.
Count-based pivot tables are equally weighted cross tabulations.
The term pivot table is a generic phrase used by multiple vendors. In the United States, Microsoft Corporation has trademarked the specific compound word form, PivotTable.
Pivot tables can be seen as a simplification of the more complete and complex OLAP (online analytical processing) concepts.
Video Pivot table
History
In their book Pivot Table Data Crunching, Bill Jelen and Mike Alexander refer to Pito Salas as the "father of pivot tables". While working on a concept for a new program that would eventually become Lotus Improv, Salas noted that spreadsheets have patterns of data. A tool that could help the user recognize these patterns would help to build advanced data models quickly. With Improv, users could define and store sets of categories, then change views by dragging category names with the mouse. This core functionality would provide the model for pivot tables.
Lotus Development released Improv in 1991 on the NeXT platform. A few months after the release of Improv, Brio Technology published a standalone Macintosh implementation, called DataPivot (with technology eventually patented in 1999). Borland purchased the DataPivot technology in 1992 and implemented it in their own spreadsheet application, Quattro Pro.
In 1993 the Microsoft Windows version of Improv appeared. Early in 1994 Microsoft Excel 5 brought a new functionality called a "PivotTable" to market. Microsoft further improved this feature in later versions of Excel:
- Excel 97 included a new and improved PivotTable Wizard, the ability to create calculated fields, and new pivot cache objects that allow developers to write Visual Basic for Applications macros to create and modify pivot tables
- Excel 2000 introduced "Pivot Charts" to represent pivot-table data graphically
In 2007 Oracle Corporation made PIVOT
and UNPIVOT
operators available in Oracle Database 11g.
Maps Pivot table
Mechanics
For typical data entry and storage, data usually appear in flat tables, meaning that they consist of only columns and rows, as in the following portion of a sample spreadsheet showing data on shirt types:
While tables such as these can contain many data items, it can be difficult to get summarized information from them. A pivot table can help quickly summarize the data and highlight the desired information. The usage of a pivot table is extremely broad and depends on the situation. The first question to ask is, "What am I seeking?" In the example here, let us ask, "How many Units did we sell in each Region for every Ship Date?":
A pivot table usually consists of row, column and data (or fact) fields. In this case, the column is Ship Date, the row is Region and the data we would like to see is (sum of) Units. These fields allow several kinds of aggregations, including: sum, average, standard deviation, count, etc. In this case, the total number of units shipped is displayed here using a sum aggregation.
Implementation
Using the example above, software will find all distinct values for Region. In this case, they are: North, South, East, West. Furthermore, it will find all distinct values for Ship Date. Based on the aggregation type, sum, it will summarize the fact, the quantities of Unit, and display them in a multidimensional chart. In the example above, the first datum is 66. This number was obtained by finding all records where both Region was East and Ship Date was 1/31/2005, and adding the Units of that collection of records (i.e., cells E2 to E7) together to get a final result.
Pivot tables are not created automatically. For example, in Microsoft Excel one must first select the entire data in the original table and then go to the Insert tab and select "Pivot Table" (or "Pivot Chart"). The user then has the option of either inserting the pivot table into an existing sheet or creating a new sheet to house the pivot table. A pivot table field list is provided to the user which lists all the column headers present in the data. For instance, if a table represents sales data of a company, it might include Date of sale, Sales person, Item sold, Color of item, Units sold, Per unit price, and Total price. This makes the data more readily accessible.
The fields that would be created will be visible on the right hand side of the worksheet. By default, the pivot table layout design will appear below this list.
Each of the fields from the list can be dragged on to this layout, which has four options :
- Report filter
- Column labels
- Row labels
- Summation values
Report filter
Report filter is used to apply a filter to an entire table. For example, if the "Color of Item" field is dragged to this area, then the table constructed will have a report filter inserted above the table. This report filter will have drop-down options (Black, Red, and White in the example above). When an option is chosen from this drop-down list ("Black" in this example), then the table that would be visible will contain only the data from those rows that have the "Color of Item = Black".
Column labels
Column labels are used to apply a filter to one or more columns that have to be shown in the pivot table. For instance if the "Sales person" field is dragged to this area, then the table constructed will have values from the column "Sales Person", i.e., one will have number of columns equal to the number of "Sales person". There will also be one added column of Total. In the example above, this instruction will create five columns in the table -- one for each sales person, and Grand Total. There will be a filter above the data -- column labels -- from which one can select or deselect a particular sales person for the pivot table.
This table will not have any numerical values as no numerical field is selected but when it is selected, the values will automatically get updated in the column of "Grand total".
Row labels
Row labels are used to apply a filter to one or more rows that have to be shown in the pivot table. For instance if the "Sales person" field is dragged on this area then the other output table constructed will have values from the column "Sales person", i.e., one will have number of rows equal to the number of "Sales Person". There will also be one added row of "Grand Total". In the example above, this instruction will create five rows in the table -- one for each sales person, and Grand Total. There will be a filter above the data -- row labels -- from which one can select or deselect a particular sales person for the Pivot table.
This table will not have any numerical values, as no numerical field is selected, but when it is selected, the values will automatically get updated in the Row of "Grand Total".
Summation values
This usually takes a field that has numerical values that can be used for different types of calculations. However, using text values would also not be wrong; instead of Sum it will give a count. So, in the example above, if the "Units sold" field is dragged to this area along with row label of "Sales person", then the instruction will add a new column, "Sum of units sold", which will have values against each sales person.
Application support
Pivot tables or pivot functionality are an integral part of many spreadsheet applications and some database software, as well as being found in other data visualization tools and business intelligence packages.
- Spreadsheets
- In addition to Microsoft Excel, competing software programs such as Apache OpenOffice Calc provide similar functionality; the implementation in Apache OpenOffice and LibreOffice up to release 3.3 is called DataPilot. In version 3.4 of both LibreOffice and OpenOffice, DataPilot is renamed "Pivot Table".
- Google Docs initially allowed the creation of basic pivot tables via the pivot table gadget from Panorama called Panorama Analytics, but as of 2011 this gadget provided limited functionality and was extremely slow with large data sets. In May 2011, Google announced the roll-out of a natively hosted pivot table feature in the Google spreadsheets editor.
- Database support
- PostgreSQL, an object-relational database management system, allows the creation of pivot tables using the tablefunc module.
- MariaDB, a MySQL fork, allows pivot tables using the CONNECT storage engine.
- Microsoft Access supports pivot queries under the name "crosstab" query.
- Oracle database supports the PIVOT operation.
- Some popular databases that do not directly support pivot functionality, such as Microsoft SQL server and SQLite can usually simulate pivot functionality using embedded functions, dynamic SQL, and/or subqueries. The issue with pivoting in such cases is usually that the number of output columns must be known at the time the query starts to execute; for pivoting this is not possible as the number of columns is based on the data itself. Therefore the names must be hard coded or the query to be executed must itself be created dynamically (meaning, prior to each use) based upon the data.
- Web applications
- ZK, an Ajax framework, also allows the embedding of pivot tables in Web applications.
- Programming languages and libraries
- Programming languages and libraries suited to work with tabular data contain functions that allow the creation and manipulation of pivot tables. Python data analysis toolkit pandas has the function pivot_table, and the xs method useful to obtain sections
of pivot tables.
Online analytical processing
Excel pivot tables include the feature to directly query an OLAP server for retrieving data instead of getting the data from an Excel spreadsheet. On this configuration a pivot table is a simple client of an OLAP server. Excel's PivotTable not only allows for connecting to Microsoft's Analysis Service, but to any XML for Analysis (XMLA) OLAP standard-compliant server.
Other OLAP clients are JPivot, Dundas, IcCube (Client Library).
See also
- Aggregate function
- Business reporting
- Comparison of office suites
- Comparison of OLAP Servers
- Contingency table
- Data drilling
- Data mining
- Data warehouse
- Drill down
- Extract, transform, load
- Fold (higher-order function)
- Group by (SQL)
- OLAP cube
- Relational algebra
References
Further reading
- A Complete Guide to PivotTables: A Visual Approach (ISBN 1-59059-432-0) (in-depth review at slashdot.org)
- Excel 2007 PivotTables and PivotCharts: Visual blueprint (ISBN 978-0-470-13231-9)
- Pivot Table Data Crunching (Business Solutions) (ISBN 0-7897-3435-4)
- Beginning Pivot Tables in Excel 2007 (ISBN 1-59059-890-3)
Source of the article : Wikipedia