
Advanced Stack Manipulation

You can also stack or unstack several index levels at once when reshaping your data, which gives you finer control over the resulting layout. Consider the following data set, whose columns have three levels (exp, animal and hair_length):

    exp                 A         B         A         B
    animal            cat       cat       dog       dog
    hair_length      long      long     short     short
    0            1.075770 -0.109050  1.643563 -1.469388
    1            0.357021 -0.674600 -1.776904 -0.968914
    2           -1.294524  0.413738  0.276662 -0.472035
    3           -0.013960 -0.362543 -0.006154 -0.923061

We can move the column levels animal and hair_length into the row index with df.stack(level=["animal", "hair_length"]). The resulting DataFrame is:

    exp                          A         B
      animal hair_length
    0 cat    long         1.075770 -0.109050
      dog    short        1.643563 -1.469388
    1 cat    long         0.357021 -0.674600
      dog    short       -1.776904 -0.968914
    2 cat    long        -1.294524  0.413738
      dog    short        0.276662 -0.472035
    3 cat    long        -0.013960 -0.362543
      dog    short       -0.006154 -0.923061
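A minimal sketch of how this example could be reproduced. The variable names (columns, df, stacked) are illustrative, and the values are copied from the table above rather than generated randomly:

```python
import pandas as pd

# Three-level column index matching the example data set.
columns = pd.MultiIndex.from_tuples(
    [
        ("A", "cat", "long"),
        ("B", "cat", "long"),
        ("A", "dog", "short"),
        ("B", "dog", "short"),
    ],
    names=["exp", "animal", "hair_length"],
)

# Values copied from the table above.
df = pd.DataFrame(
    [
        [1.075770, -0.109050, 1.643563, -1.469388],
        [0.357021, -0.674600, -1.776904, -0.968914],
        [-1.294524, 0.413738, 0.276662, -0.472035],
        [-0.013960, -0.362543, -0.006154, -0.923061],
    ],
    columns=columns,
)

# Move the "animal" and "hair_length" column levels into the row index;
# only the "exp" level remains in the columns.
stacked = df.stack(level=["animal", "hair_length"])
print(stacked)
```

Levels can be given by name, as here, or by integer position; names tend to be easier to read when there are several levels.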

However, if the subgroups do not all share the same set of labels (for example long/short or cat/dog), stacking or unstacking can introduce missing values. For example, in the following data set foo has no "two" row and qux has no "one" row:

    exp                  B
    animal             dog       cat
    first second
    bar   one     0.805244 -1.206412
          two     1.340309 -1.170299
    foo   one     1.607920  1.024180
    qux   two     0.769804 -1.281247

Unstacking this with df.unstack() gives:

    exp            B
    animal       dog                 cat
    second       one       two       one       two
    first
    bar     0.805244  1.340309 -1.206412 -1.170299
    foo     1.607920       NaN  1.024180       NaN
    qux          NaN  0.769804       NaN -1.281247
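The behavior above can be sketched as follows. The frame is rebuilt from the values shown in the tables, and the names index, columns, unstacked and n_missing are illustrative only:

```python
import pandas as pd

# Row index where "foo" lacks a "two" entry and "qux" lacks a "one" entry.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("qux", "two")],
    names=["first", "second"],
)
columns = pd.MultiIndex.from_tuples(
    [("B", "dog"), ("B", "cat")], names=["exp", "animal"]
)
df = pd.DataFrame(
    [
        [0.805244, -1.206412],
        [1.340309, -1.170299],
        [1.607920, 1.024180],
        [0.769804, -1.281247],
    ],
    index=index,
    columns=columns,
)

# unstack() pivots the innermost row level ("second") into the columns.
# Combinations absent from the input are filled with NaN.
unstacked = df.unstack()
print(unstacked)

# Count how many cells were created without a source value.
n_missing = int(unstacked.isna().sum().sum())
print(n_missing)
```

Counting the NaN cells right after reshaping is one simple way to notice that the labels were not balanced across subgroups.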

To handle this, pass the fill_value parameter to unstack to choose the value that replaces the missing entries. Handle missing values as soon as they appear, so that they do not propagate into later operations on the DataFrame and cause unexpected behavior.
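A short sketch of fill_value, using the same data as the unstack example above. The choice of 0.0 as the fill value is an assumption made for illustration; a suitable value depends on what a missing entry means in your data:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("foo", "one"), ("qux", "two")],
    names=["first", "second"],
)
columns = pd.MultiIndex.from_tuples(
    [("B", "dog"), ("B", "cat")], names=["exp", "animal"]
)
df = pd.DataFrame(
    [
        [0.805244, -1.206412],
        [1.340309, -1.170299],
        [1.607920, 1.024180],
        [0.769804, -1.281247],
    ],
    index=index,
    columns=columns,
)

# Replace the cells that would otherwise be NaN with 0.0 (illustrative value).
filled = df.unstack(fill_value=0.0)
print(filled)
```

Existing values are left unchanged; only the cells that unstack would otherwise create as NaN receive the fill value.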


Updated 2021-06-02

Tags

Python Programming Language

Data Science
