
Sunday, March 23, 2014

R (16) : Order of legend entries in ggplot2



I'm struggling to get the right ordering of variables in a graph I made with ggplot2 in R.
Suppose I have a dataframe such as:
set.seed(1234)
my_df<- data.frame(matrix(0,8,4))
names(my_df) <- c("year", "variable", "value", "vartype")
my_df$year <- rep(2006:2007)
my_df$variable <- c(rep("VX",2),rep("VB",2),rep("VZ",2),rep("VD",2))
my_df$value <- runif(8, 5,10) 
my_df$vartype<- c(rep("TA",4), rep("TB",4))
which yields the following table:
  year variable    value vartype
1 2006       VX 5.568517      TA
2 2007       VX 8.111497      TA
3 2006       VB 8.046374      TA
4 2007       VB 8.116897      TA
5 2006       VZ 9.304577      TB
6 2007       VZ 8.201553      TB
7 2006       VD 5.047479      TB
8 2007       VD 6.162753      TB
There are four variables (VX, VB, VZ and VD), belonging to two groups of variable types (TA and TB).
I would like to plot the values as horizontal bars on the y axis, ordered vertically first by variable groups and then by variable names, faceted by year, with values on the x axis and fill colour corresponding to variable group. (i.e. in this simplified example, the order should be, top to bottom, VB, VX, VD, VZ)
1) My first attempt has been to try the following:
ggplot(my_df,        
    aes(x=variable, y=value, fill=vartype, order=vartype)) +
       # adding or removing the aesthetic "order=vartype" doesn't change anything
     geom_bar(stat = "identity") +   # stat = "identity" plots the values as given
     facet_grid(. ~ year) + 
     coord_flip()
However, the variables are listed in reverse alphabetical order, but not by vartype: the order=vartype aesthetic is ignored.
2) Following an answer to a similar question I posted yesterday, I tried the following, based on the post "Order Bars in ggplot2 bar graph":
my_df$variable <- factor(
  my_df$variable, 
  levels=rev(sort(unique(my_df$variable))), 
  ordered=TRUE
)
This approach does get the variables in vertical alphabetical order in the plot, but ignores the fact that the variables should be ordered first by variable groups (with TA-variables on top and TB-variables below).
3) The following gives the same as 2 (above):
my_df$vartype <- factor(
  my_df$vartype, 
  levels=sort(unique(my_df$vartype)), 
  ordered=TRUE
)
... which has the same issues as the first approach (variables listed in reverse alphabetical order, groups ignored)
4) Another approach, based on the original answer to "Order Bars in ggplot2 bar graph", also gives the same plot as 2, above:
my_df <- within(my_df, 
                vartype <- factor(vartype, 
                levels=names(sort(table(vartype),
                decreasing=TRUE)))
                ) 
I'm puzzled by the fact that, despite several approaches, the aesthetic order=vartype is ignored. Still, it seems to work in an unrelated problem: http://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/
I hope that the problem is clear and welcome any suggestions.
Matteo
I posted a similar question yesterday, but unfortunately I made several mistakes when describing the problem and providing a reproducible example. I've listened to several suggestions since, thoroughly searched Stack Overflow for similar questions, and applied, to the best of my knowledge, every suggested combination of solutions, to no avail. I'm posting the question again hoping to be able to solve my issue and, hopefully, be helpful to others.

    
    
It's not a duplicate of stackoverflow.com/q/5208679/602276 . Please read the question carefully. –  MatteoS Sep 4 '11 at 13:48
    
It is indeed the same question. You need to specify the levels of your factor in the order that you want them in your plot. The linked answer tells you how to do that. –  Andrie Sep 4 '11 at 13:54
1  
+1 for learning to provide reproducible code. –  Roman Luštrik Sep 4 '11 at 13:58
2  
More generally, I believe there is an issue related to coord_flip() when ordering variables. In my original data frame (not the one shown above), the order of groups in the legend is correct and corresponds to that of the dataframe, but the vertical order of variables is upside-down. (although the plot is conceptually different, the issue is similar to this learnr.files.wordpress.com/2010/03/… ). As far as I can see, this is something beyond an order issue of the dataframe, but an issue concerning the order reversal in ggplot2, possibly related to coord_flip. –  MatteoS Sep 4 '11 at 14:41

1 Answer


This has little to do with ggplot, but is instead a question about generating an ordering of variables to use to reorder the levels of a factor. Here is your data, implemented using the various functions to better effect:
set.seed(1234)
df2 <- data.frame(year = rep(2006:2007), 
                  variable = rep(c("VX","VB","VZ","VD"), each = 2),
                  value = runif(8, 5,10),
                  vartype = rep(c("TA","TB"), each = 4))
Note that this way variable and vartype are factors, provided data.frame() converts strings to factors: that was the default before R 4.0.0, and from R 4.0.0 onwards it requires stringsAsFactors = TRUE. If they aren't factors, ggplot() will coerce them and then you get left with alphabetical ordering. I have said this before and will no doubt say it again: get your data into the correct format first before you start plotting / doing data analysis.
You want the following ordering:
> with(df2, order(vartype, variable))
[1] 3 4 1 2 7 8 5 6
where you should note that we get the ordering by vartype first and only then by variable within the levels of vartype. If we use this to reorder the levels of variable we get:
> with(df2, reorder(variable, order(vartype, variable)))
[1] VX VX VB VB VZ VZ VD VD
attr(,"scores")
 VB  VD  VX  VZ 
1.5 5.5 3.5 7.5 
Levels: VB VX VD VZ
(ignore the attr(,"scores") bit and focus on the Levels). This has the right ordering, but ggplot() will draw them bottom to top and you wanted top to bottom. I'm not sufficiently familiar with ggplot() to know if this can be controlled, so we will also need to reverse the ordering using decreasing = TRUE in the call to order().
Putting this all together we have:
## reorder `variable` on `variable` within `vartype`
df3 <- transform(df2, variable = reorder(variable, order(vartype, variable,
                                                         decreasing = TRUE)))
Which when used with your plotting code:
ggplot(df3, aes(x=variable, y=value, fill=vartype)) +
       geom_bar(stat = "identity") +   # needed in current ggplot2 when y is mapped
       facet_grid(. ~ year) + 
       coord_flip()
produces this:
[Figure: reordered barplot]
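The reorder()/order() combination above relies on averaging the row indices that order() returns. As an alternative, here is a minimal sketch that sets the factor levels directly from the sorted rows. It assumes the same example data; the helper name sorted_names is made up for illustration and is not part of the original answer.

```r
# Same example data as in the answer, kept as character columns
set.seed(1234)
df2 <- data.frame(
  year     = rep(2006:2007, times = 4),
  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
  value    = runif(8, 5, 10),
  vartype  = rep(c("TA", "TB"), each = 4),
  stringsAsFactors = FALSE
)

# Variable names sorted by group first, then by name within each group
sorted_names <- unique(df2$variable[order(df2$vartype, df2$variable)])

# Reverse the levels so that, after coord_flip(), the first name ends up on top
df2$variable <- factor(df2$variable, levels = rev(sorted_names))
levels(df2$variable)
```

The resulting data frame can be passed to the same plotting code in place of df3.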

1  
I thank you for your solution! It works. However, I've also found, with a thorough search, that my original issue is a particular case of a common nuisance when using coord_flip(). –  MatteoS Sep 4 '11 at 15:38
1  
@MatteoS Do you understand now why people felt this was another duplicate? The answer is the same - reorder the levels of the factor in the order you want them. The issue here was how to derive that ordering. All the ggplot code was superfluous and distracting. It does help to boil problems down to their base level and also tell us exactly what you want. Andrie's Answer was almost spot on until you happened to mention in comments you didn't want to enter the ordering by hand. –  Gavin Simpson Sep 4 '11 at 15:43
2  
Now I see, but ggplot2 is the issue here. With coord_flip(), the axes are flipped: the variables that are originally ordered L -> R are then ordered B -> T, while the legend does not match them. –  MatteoS Sep 4 '11 at 15:44
1  
@MatteoS Ask away, but I don't see the need for this given the general solution of getting the factor levels in the order you want. –  Gavin Simpson Sep 4 '11 at 17:56
3  
@MatteoS scale_fill_discrete(guide = guide_legend(reverse=TRUE)) would be the equivalent for top.down=TRUE to reverse the order in legend. –  mlt Dec 6 '12 at 6:16
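To illustrate the suggestion in the last comment, here is a minimal sketch that reverses only the legend entries and leaves the bar order untouched. The data frame df is a hypothetical copy of the example data from the question.

```r
library(ggplot2)

# Hypothetical data frame mirroring the question's example
set.seed(1234)
df <- data.frame(
  year     = rep(2006:2007, times = 4),
  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
  value    = runif(8, 5, 10),
  vartype  = rep(c("TA", "TB"), each = 4)
)

p <- ggplot(df, aes(x = variable, y = value, fill = vartype)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ year) +
  coord_flip() +
  # Reverse the order of the legend keys only
  scale_fill_discrete(guide = guide_legend(reverse = TRUE))

p
```

Reversing the legend this way is independent of the factor levels, so it can be combined with any of the level reorderings discussed above.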

R (15) : Stack data


I'm having trouble stacking columns in a data.frame into one column. Now my data looks something like this:
id   time    black   white   red 
a     1       b1      w1     r1
a     2       b2      w2     r2
a     3       b3      w3     r3
b     1       b4      w4     r4
b     2       b5      w5     r5
b     3       b6      w6     r6
I'm trying to transform the data.frame so that it looks like this:
id   time  colour 
a     1     b1
a     2     b2
a     3     b3
b     1     b4
b     2     b5
b     3     b6
a     1     w1
a     2     w2
a     3     w3
b     1     w4
b     2     w5
b     3     w6
a     1     r1
a     2     r2
.     .     .
.     .     .
.     .     .
I'm guessing that this problem requires using the reshape package, but I'm not exactly sure how to use it to stack multiple columns under one column. Can anyone provide help on this?




1 Answer


Here's melt from reshape:
library(reshape)
melt(x, id.vars=c('id', 'time'),var='color')

And using reshape2 (an up-to-date, faster version of reshape) the syntax is almost identical.
The help files have useful examples (see ?melt and the link to melt.data.frame).
In your case, something like the following will work (assuming your data.frame is called DF)
library(reshape2)
melt(DF, id.vars = c('id','time'), variable.name = 'colour')
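
For comparison, the same stacking can be sketched in base R without any package. This is only an illustration, not the answer's method: DF is rebuilt here as a hypothetical data frame matching the layout in the question, and the names measure_cols, long and source are made up for the example.

```r
# Hypothetical data matching the layout shown in the question
DF <- data.frame(
  id    = rep(c("a", "b"), each = 3),
  time  = rep(1:3, times = 2),
  black = paste0("b", 1:6),
  white = paste0("w", 1:6),
  red   = paste0("r", 1:6),
  stringsAsFactors = FALSE
)

measure_cols <- c("black", "white", "red")

# Repeat the id columns once per stacked column and append the values
long <- data.frame(
  id     = rep(DF$id, times = length(measure_cols)),
  time   = rep(DF$time, times = length(measure_cols)),
  source = rep(measure_cols, each = nrow(DF)),   # which column each value came from
  colour = unlist(DF[measure_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

head(long, 8)
```

Unlike melt(), this keeps the stacked values in a column named colour and records the originating column in source.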

Friday, March 21, 2014

R (14) : theme in ggplot2




Set theme elements

Usage

theme(..., complete = FALSE)

Arguments

...
a list of element name, element pairings that modify the existing theme.
complete
set this to TRUE if this is a complete theme, such as the one returned by theme_grey(). Complete themes behave differently when added to a ggplot object.

Description

Use this function to modify theme settings.

Details

Theme elements can inherit properties from other theme elements. For example, axis.title.x inherits from axis.title, which in turn inherits from text. All text elements inherit directly or indirectly from text; all lines inherit from line, and all rectangular objects inherit from rect.
For more examples of modifying properties using inheritance, see +.gg and %+replace%.
To see a graphical representation of the inheritance tree, see the last example below.
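
As a small sketch of the inheritance described above (the object names p_all_text and p_x_title are made up for illustration), setting a property on the root text element is picked up by text elements that do not override it, while setting it on a leaf element such as axis.title.x affects only that element:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Set the font family on the root `text` element; elements that inherit
# from `text` and do not set their own family use it as well
p_all_text <- p + theme(text = element_text(family = "serif"))

# Set a property on a leaf element; only the x axis label changes
p_x_title <- p + theme(axis.title.x = element_text(face = "bold"))
```

Printing p_all_text and p_x_title side by side shows the difference between changing a parent element and changing one of its children.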

Theme elements

The individual theme elements are:
line all line elements (element_line)
rect all rectangular elements (element_rect)
text all text elements (element_text)
title all title elements: plot, axes, legends (element_text; inherits from text)
axis.title label of axes (element_text; inherits from text)
axis.title.x x axis label (element_text; inherits from axis.title)
axis.title.y y axis label (element_text; inherits from axis.title)
axis.text tick labels along axes (element_text; inherits from text)
axis.text.x x axis tick labels (element_text; inherits from axis.text)
axis.text.y y axis tick labels (element_text; inherits from axis.text)
axis.ticks tick marks along axes (element_line; inherits from line)
axis.ticks.x x axis tick marks (element_line; inherits from axis.ticks)
axis.ticks.y y axis tick marks (element_line; inherits from axis.ticks)
axis.ticks.length length of tick marks (unit)
axis.ticks.margin space between tick mark and tick label (unit)
axis.line lines along axes (element_line; inherits from line)
axis.line.x line along x axis (element_line; inherits from axis.line)
axis.line.y line along y axis (element_line; inherits from axis.line)
legend.background background of legend (element_rect; inherits from rect)
legend.margin extra space added around legend (unit)
legend.key background underneath legend keys (element_rect; inherits from rect)
legend.key.size size of legend keys (unit)
legend.key.height key background height (unit; inherits from legend.key.size)
legend.key.width key background width (unit; inherits from legend.key.size)
legend.text legend item labels (element_text; inherits from text)
legend.text.align alignment of legend labels (number from 0 (left) to 1 (right))
legend.title title of legend (element_text; inherits from title)
legend.title.align alignment of legend title (number from 0 (left) to 1 (right))
legend.position the position of legends. ("left", "right", "bottom", "top", or two-element numeric vector)
legend.direction layout of items in legends ("horizontal" or "vertical")
legend.justification anchor point for positioning legend inside plot ("center" or two-element numeric vector)
legend.box arrangement of multiple legends ("horizontal" or "vertical")
panel.background background of plotting area (element_rect; inherits from rect)
panel.border border around plotting area (element_rect; inherits from rect)
panel.margin margin around facet panels (unit)
panel.grid grid lines (element_line; inherits from line)
panel.grid.major major grid lines (element_line; inherits from panel.grid)
panel.grid.minor minor grid lines (element_line; inherits from panel.grid)
panel.grid.major.x vertical major grid lines (element_line; inherits from panel.grid.major)
panel.grid.major.y horizontal major grid lines (element_line; inherits from panel.grid.major)
panel.grid.minor.x vertical minor grid lines (element_line; inherits from panel.grid.minor)
panel.grid.minor.y horizontal minor grid lines (element_line; inherits from panel.grid.minor)
plot.background background of the entire plot (element_rect; inherits from rect)
plot.title plot title (text appearance) (element_text; inherits from title)
plot.margin margin around entire plot (unit)
strip.background background of facet labels (element_rect; inherits from rect)
strip.text facet labels (element_text; inherits from text)
strip.text.x facet labels along horizontal direction (element_text; inherits from strip.text)
strip.text.y facet labels along vertical direction (element_text; inherits from strip.text)

Examples

p <- qplot(mpg, wt, data = mtcars)
p
p + theme(panel.background = element_rect(colour = "pink"))
p + theme_bw()

# Scatter plot of gas mileage by vehicle weight
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Calculate slope and intercept of line of best fit
coef(lm(mpg ~ wt, data = mtcars))
## (Intercept)          wt
##   37.285126   -5.344472
p + geom_abline(intercept = 37, slope = -5)

# Calculate correlation coefficient
with(mtcars, cor(wt, mpg, use = "everything", method = "pearson"))
## [1] -0.8676594

# Annotate the plot
p + geom_abline(intercept = 37, slope = -5) +
  geom_text(data = data.frame(), aes(4.5, 30, label = "Pearson-R = -.87"))

# Change the axis labels
# Original plot
p
p + xlab("Vehicle Weight") + ylab("Miles per Gallon")
# Or
p + labs(x = "Vehicle Weight", y = "Miles per Gallon")

# Change title appearance
p <- p + labs(title = "Vehicle Weight-Gas Mileage Relationship")
# Set title to twice the base font size
p + theme(plot.title = element_text(size = rel(2)))
p + theme(plot.title = element_text(size = rel(2), colour = "blue"))

# Changing plot look with themes
DF <- data.frame(x = rnorm(400))
m <- ggplot(DF, aes(x = x)) + geom_histogram()
# Default is theme_grey()
m
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## (the same message is printed for each histogram below)
# Compare with
m + theme_bw()

# Manipulate Axis Attributes
library(grid) # for unit
m + theme(axis.line = element_line(size = 3, colour = "red", linetype = "dotted"))
m + theme(axis.text = element_text(colour = "blue"))
m + theme(axis.text.y = element_blank())
m + theme(axis.ticks = element_line(size = 2))
m + theme(axis.title.y = element_text(size = rel(1.5), angle = 90))
m + theme(axis.title.x = element_blank())
m + theme(axis.ticks.length = unit(.85, "cm"))

# Legend Attributes
z <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
z
z + theme(legend.position = "none")
z + theme(legend.position = "bottom")
# Or use relative coordinates between 0 and 1
z + theme(legend.position = c(.5, .5))
z + theme(legend.background = element_rect(colour = "black"))
# Legend margin controls extra space around outside of legend:
z + theme(legend.background = element_rect(), legend.margin = unit(1, "cm"))
z + theme(legend.background = element_rect(), legend.margin = unit(0, "cm"))
# Or to just the keys
z + theme(legend.key = element_rect(colour = "black"))
z + theme(legend.key = element_rect(fill = "yellow"))
z + theme(legend.key.size = unit(2.5, "cm"))
z + theme(legend.text = element_text(size = 20, colour = "red", angle = 45))
z + theme(legend.title = element_text(face = "italic"))
# To change the title of the legend use the name argument
# in one of the scale options
z + scale_colour_brewer(name = "My Legend")
z + scale_colour_grey(name = "Number of \nCylinders")

# Panel and Plot Attributes
z + theme(panel.background = element_rect(fill = "black"))
z + theme(panel.border = element_rect(linetype = "dashed", colour = "black"))
z + theme(panel.grid.major = element_line(colour = "blue"))
z + theme(panel.grid.minor = element_line(colour = "red", linetype = "dotted"))
z + theme(panel.grid.major = element_line(size = 2))
z + theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())
z + theme(plot.background = element_rect())
z + theme(plot.background = element_rect(fill = "green"))

# Faceting Attributes
set.seed(4940)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
k <- ggplot(dsmall, aes(carat, ..density..)) +
  geom_histogram(binwidth = 0.2) +
  facet_grid(. ~ cut)
k + theme(strip.background = element_rect(colour = "purple", fill = "pink",
                                          size = 3, linetype = "dashed"))
k + theme(strip.text.x = element_text(colour = "red", angle = 45, size = 10,
                                      hjust = 0.5, vjust = 0.5))
k + theme(panel.margin = unit(5, "lines"))
k + theme(panel.margin = unit(0, "lines"))

# Modify a theme and save it
mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
p + mytheme

## Run this to generate a graph of the element inheritance tree
build_element_graph <- function(tree) {
  require(igraph)
  require(plyr)

  inheritdf <- function(name, item) {
    if (length(item$inherit) == 0)
      data.frame()
    else
      data.frame(child = name, parent = item$inherit)
  }

  edges <- rbind.fill(mapply(inheritdf, names(tree), tree))

  # Explicitly add vertices (since not all are in edge list)
  vertices <- data.frame(name = names(tree))
  graph.data.frame(edges, vertices = vertices)
}

g <- build_element_graph(ggplot2:::.element_tree)
## Loading required package: igraph
V(g)$label <- V(g)$name

set.seed(324)
par(mar = c(0, 0, 0, 0)) # Remove unnecessary margins
plot(g, layout = layout.fruchterman.reingold, vertex.size = 4, vertex.label.dist = .25)

See also

+.gg, %+replace%, rel