In Pandas, a pivot table reshapes a DataFrame while aggregating its (typically numerical) values. The values of one column are grouped by the columns passed as `index` and spread across the labels of the columns passed as `columns`. For example, `pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])` displays the data in column `D`, with the rows indexed by columns `A` and `B` and one output column per distinct value of column `C`. The `aggfunc` parameter selects the aggregation applied to each group: it defaults to the mean, and `aggfunc="sum"` (or, in older code, `aggfunc=np.sum`) computes totals instead.
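The call above can be tried on a small made-up DataFrame. This is a minimal sketch: the column names `A` to `D` follow the example in the text, while the data values are purely illustrative assumptions.

```python
import pandas as pd

# Illustrative data (assumed, not from the original text).
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "one", "one", "two", "two"],
    "C": ["x", "x", "y", "x", "x", "y", "y", "y"],
    "D": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Rows are indexed by (A, B); each distinct value of C becomes a column.
# aggfunc="sum" adds up the D values that fall into the same cell.
table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum")
print(table)
```

Combinations that never occur in the data, such as `("foo", "two")` with `C == "y"`, show up as `NaN` in the result.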

Data doesn't always arrive in a format that is convenient for analysis, and reorganizing it can make it easier to view. Reshaping changes the relative position of the information to emphasize how different pieces of data relate. Consider the following example:
```python
                     A         B
first second                    
bar   one     0.721555 -0.706771
      two    -1.039575  0.271860
baz   one    -0.424972  0.567020
      two     0.276232 -1.087401
```
Using the `.stack()` method, the column labels `A` and `B` move into a new innermost index level, and the data looks like this:
```python
first  second   
bar    one     A    0.721555
               B   -0.706771
       two     A   -1.039575
               B    0.271860
baz    one     A   -0.424972
               B    0.567020
       two     A    0.276232
               B   -1.087401
dtype: float64
```
You can also pass a `level` argument to `.stack()` to specify which column level (or levels) to stack. The same logic applies to the inverse method, `.unstack()`, whose `level` argument selects which index level to move back into the columns.

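Both `.stack()` and `.unstack()` return a new object rather than modifying the DataFrame in place, and they implicitly sort the index levels involved. (In recent pandas versions, the newer `.stack()` implementation enabled with `future_stack=True` no longer sorts.)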
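As a rough sketch of the round trip described above, the example table can be rebuilt by hand and passed through `.stack()` and `.unstack()`. The construction code is an assumption added for illustration; only the values come from the table shown earlier.

```python
import pandas as pd

# Rebuild the example table with a two-level row index (first, second).
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [[0.721555, -0.706771],
     [-1.039575, 0.271860],
     [-0.424972, 0.567020],
     [0.276232, -1.087401]],
    index=index,
    columns=["A", "B"],
)

# stack() moves the column labels A/B into a new innermost index level,
# producing a Series with a three-level index.
stacked = df.stack()

# unstack() with no argument moves the innermost index level back to columns.
restored = stacked.unstack()

# Passing a level name moves that index level to the columns instead.
by_second = stacked.unstack("second")

print(stacked)
print(by_second)
```

Depending on the installed pandas version, `df.stack()` may emit a `FutureWarning` about the newer stacking implementation; the result for this data is the same either way.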

Reshaping Data

Pandas can reshape several levels at once, which gives finer control over the resulting layout. For example, if a DataFrame's columns form a MultiIndex with several levels (including, e.g., `animal` and `hair_length`), you can specify which of those levels to stack using `df.stack(level=["animal", "hair_length"])`.
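A small sketch of stacking two column levels at once follows. The level names `animal` and `hair_length` come from the text; the extra level `exp`, its labels, and the numeric values are assumptions chosen only to make the example concrete.

```python
import numpy as np
import pandas as pd

# Three-level column index; "exp" and all labels are illustrative assumptions.
columns = pd.MultiIndex.from_tuples(
    [
        ("A", "cat", "long"),
        ("B", "cat", "long"),
        ("A", "dog", "short"),
        ("B", "dog", "short"),
    ],
    names=["exp", "animal", "hair_length"],
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=columns)

# Move the "animal" and "hair_length" column levels into the row index.
# Only the "exp" level (A, B) remains in the columns.
stacked = df.stack(level=["animal", "hair_length"])
print(stacked)
```

Levels can also be given by position, so `df.stack(level=[1, 2])` selects the same two levels here.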

However, if the subgroups do not share the same set of labels, reshaping exposes missing data. Calling `df.unstack()` on such data produces `NaN` for every missing combination. To avoid this, pass the `fill_value` parameter to `.unstack()` to specify a replacement for the missing values. Handling missing values promptly is important to avoid unexpected behavior during later operations on the DataFrame.
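The effect of `fill_value` can be seen on a tiny DataFrame in which one label combination is absent. The data below is an assumption made up for illustration.

```python
import pandas as pd

# ("baz", "two") is deliberately missing from the index.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=index)

# Without fill_value, the missing combination becomes NaN.
wide = df.unstack()

# With fill_value, the missing combination gets the given replacement.
filled = df.unstack(fill_value=0)

print(wide)
print(filled)
```

Choosing the replacement is a modeling decision: `0` is reasonable for counts or totals, but it can be misleading for measurements where "no observation" and "zero" mean different things.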

Advanced Stack Manipulation in Pandas

Pandas Pivot Tables

The Pandas `melt()` method unpivots a DataFrame from a wide format to a long format. It allows you to select specific columns to act as identifier variables (`id_vars`), while converting the remaining columns into a two-column format representing measured variables and their corresponding values.

For example, given a DataFrame:
```python
  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150
```

Transforming the data using `df.melt(id_vars=["first", "last"])` yields:
```python
  first last variable  value
0  John  Doe   height    5.5
1  Mary   Bo   height    6.0
2  John  Doe   weight  130.0
3  Mary   Bo   weight  150.0
```

By default, the resulting DataFrame is given a new zero-based integer index. Setting `ignore_index=False` retains the original index values from before the melt operation.
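To compare the two index behaviors, the example DataFrame above can be melted both ways. This is a minimal sketch; the variable names `long_default` and `long_kept` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Default: the result gets a fresh integer index 0..3.
long_default = df.melt(id_vars=["first", "last"])

# ignore_index=False: each output row keeps the index of its source row,
# so the original labels 0 and 1 each appear once per melted column.
long_kept = df.melt(id_vars=["first", "last"], ignore_index=False)

print(long_default)
print(long_kept)
```

Keeping the original index is useful when the long-format rows need to be traced back to, or joined with, the rows they came from.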
