# Nyaplot Tutorial 2: Interaction with DataFrame

I already explained Nyaplot::DataFrame in Tutorial 1, but that was not enough to show how useful it is. This notebook walks through two use cases of DataFrame.

In [1]:
gem 'nyaplot', '0.1.5'
require 'nyaplot'

true

## Case 1: Scatter with original tooltips

First, prepare sample data and put it into DataFrame. Then build a scatter plot based on it.

In [2]:
samples = Array.new(10) { |i| "cat#{i}" }  # "cat0" .. "cat9"
x = Array.new(10) { 5 * rand }             # random values in [0, 5)
y = Array.new(10) { 5 * rand }
df = Nyaplot::DataFrame.new({x: x, y: y, name: samples})
df

| x | y | name |
|---|---|---|
| 0.3776941903756781 | 3.150631669472884 | cat0 |
| 1.16523539526362 | 3.636084983829032 | cat1 |
| 3.402401569022356 | 1.5265781962652962 | cat2 |
| 0.6478786205183784 | 4.697932060531303 | cat3 |
| 1.502303692675978 | 0.2355120510472763 | cat4 |
| 0.6186835562297704 | 1.5712039108775393 | cat5 |
| 2.360417118852578 | 3.037560696785177 | cat6 |
| 0.9958851328846746 | 3.3965329595925136 | cat7 |
| 1.0207006596250006 | 0.0200497833849949 | cat8 |
| 1.8874015058395728 | 2.973976083790996 | cat9 |
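Conceptually, the DataFrame above is column-oriented: each key maps to an array that holds one column. The plain-Ruby sketch below involves no Nyaplot. It shows that layout and one way to derive row views from it. The `rows` helper is hypothetical and not part of Nyaplot's API, and the values are rounded copies of the first three rows.

```ruby
# Column-oriented table: one array per column, all of the same length.
# This mirrors the hash passed to Nyaplot::DataFrame.new, but is plain Ruby.
columns = {
  x:    [0.38, 1.17, 3.40],
  y:    [3.15, 3.64, 1.53],
  name: %w[cat0 cat1 cat2]
}

# Hypothetical helper: turn the columns into an array of row hashes.
def rows(columns)
  length = columns.values.first.length
  Array.new(length) { |i| columns.transform_values { |col| col[i] } }
end

p rows(columns).first
```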


In [3]:
plot = Nyaplot::Plot.new
plot.x_label("weight [kg]")
plot.y_label("height [m]")
sc = plot.add_with_df(df, :scatter, :x, :y)

#<Nyaplot::Diagram:0x000000011704e8 @properties={:type=>:scatter, :options=>{:x=>:x, :y=>:y}, :data=>"4415e395-2c04-483a-a794-91e437f82ff9"}, @xrange=[0.3776941903756781, 3.4024015690223557], @yrange=[0.020049783384994968, 4.697932060531303]>

In [4]:
plot.show

The plot above does not show the `name` column, so let's add it to the tooltip. `tooltip_contents` specifies which columns appear in the tooltip.

In [5]:
sc.tooltip_contents([:name])
plot.show
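To make the idea concrete, the sketch below assumes that each selected column becomes one "column: value" line for the point under the cursor. It is plain Ruby with a hypothetical `tooltip_lines` helper. Nyaplot renders tooltips in the browser, so this is not its implementation.

```ruby
# Hypothetical helper: one "column: value" line per requested column.
def tooltip_lines(row, contents)
  contents.map { |col| "#{col}: #{row.fetch(col)}" }
end

row = { x: 0.3776941903756781, y: 3.150631669472884, name: 'cat0' }
puts tooltip_lines(row, [:name])
# prints: name: cat0
```

Passing more column names simply yields more lines.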

A tooltip can include multiple lines, but the DataFrame has only three columns (`x`, `y` and `name`), so there is little else to show. Let's add a `home` column to it.

In [6]:
address = ['London', 'Kyoto', 'Los Angeles', 'Pretoria']
home = Array.new(10) { address.sample }  # a random city for each row
df.home = home                           # add it as a new column
df

| x | y | name | home |
|---|---|---|---|
| 0.3776941903756781 | 3.150631669472884 | cat0 | Kyoto |
| 1.16523539526362 | 3.636084983829032 | cat1 | London |
| 3.402401569022356 | 1.5265781962652962 | cat2 | Los Angeles |
| 0.6478786205183784 | 4.697932060531303 | cat3 | Kyoto |
| 1.502303692675978 | 0.2355120510472763 | cat4 | Pretoria |
| 0.6186835562297704 | 1.5712039108775393 | cat5 | Los Angeles |
| 2.360417118852578 | 3.037560696785177 | cat6 | London |
| 0.9958851328846746 | 3.3965329595925136 | cat7 | Pretoria |
| 1.0207006596250006 | 0.0200497833849949 | cat8 | Kyoto |
| 1.8874015058395728 | 2.973976083790996 | cat9 | Los Angeles |
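With column-oriented data, adding a column means attaching one more array that has one entry per row. The plain-Ruby sketch below uses a hypothetical `add_column` helper that rejects arrays of the wrong length. The seeded `Random` is an added assumption that makes the sampled cities reproducible; the cell above uses the unseeded global generator.

```ruby
# Hypothetical helper: attach a new column, refusing mismatched lengths.
def add_column(columns, key, values)
  expected = columns.values.first.length
  unless values.length == expected
    raise ArgumentError, "#{key}: expected #{expected} values, got #{values.length}"
  end
  columns.merge(key => values)
end

address = ['London', 'Kyoto', 'Los Angeles', 'Pretoria']
rng = Random.new(42) # fixed seed so the sampled cities are reproducible
columns = { name: Array.new(10) { |i| "cat#{i}" } }
columns = add_column(columns, :home, Array.new(10) { address.sample(random: rng) })
p columns[:home]
```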


In [7]:
sc.tooltip_contents([:name, :home])
plot.show

Next, let's color the points according to the `home` column. To do so, pass the column name to the `fill_by` method.

In [8]:
colors = Nyaplot::Colors.qual

"rgb(251,180,174)","rgb(179,205,227)","rgb(204,235,197)","rgb(222,203,228)","rgb(254,217,166)","rgb(255,255,204)","rgb(229,216,189)","rgb(253,218,236)","rgb(242,242,242)"


In [9]:
sc.color(colors)
sc.fill_by(:home)
plot.show

Use the `shape_by` method to vary the marker shape according to the values in a column.

In [10]:
sc.color(colors)
sc.shape_by(:home)
plot.show
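Both `fill_by` and `shape_by` map each distinct value of a column to a visual attribute. The plain-Ruby sketch below shows such a categorical mapping. It assigns palette entries in order of first appearance and cycles through the palette when there are more categories than entries. The color strings are the first four values returned by `Nyaplot::Colors.qual` above. The shape names are assumed for illustration, and `category_scale` is hypothetical; Nyaplot performs this mapping in the browser, not in Ruby.

```ruby
# Hypothetical helper: map each distinct value (in order of first
# appearance) to a palette entry, wrapping around if the palette is short.
def category_scale(values, palette)
  values.uniq.each_with_index.map { |v, i| [v, palette[i % palette.length]] }.to_h
end

palette = ['rgb(251,180,174)', 'rgb(179,205,227)', 'rgb(204,235,197)', 'rgb(222,203,228)']
shapes  = %w[circle square triangle-up diamond] # assumed shape names

home  = ['Kyoto', 'London', 'Los Angeles', 'Kyoto', 'Pretoria']
fill  = category_scale(home, palette)
shape = category_scale(home, shapes)

home.each { |h| puts "#{h}: #{fill[h]} #{shape[h]}" }
# first line printed: Kyoto: rgb(251,180,174) circle
```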

## Case 2: Multiple panes

DataFrame is also useful when visualizing data in multiple panes. Let's create plots from mutation data.  
First, load the data from a CSV file. (All data used in this tutorial is included in Nyaplot's repository: /examples/notebook/data/*)

In [11]:
path = File.expand_path("../data/first.tab", __FILE__)
df = Nyaplot::DataFrame.from_csv(path, "\t")  # tab-separated file

| mutation | blood | set1 | set2 | set3 | set12 | set21 | set31 |
|---|---|---|---|---|---|---|---|
| G>A | 0.0 | 0.019230769230769232 | 0.0 | 0.48214285714285715 | 0.0 | 0.0 | 0.4782608695652174 |
| C>T | 0.0 | 0.42592592592592593 | 0.0 | 0.0 | 0.375 | 0.0 | 0.0 |
| C>G | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.525 | 0.0 |
| C>A | 0.0 | 0.0 | 0.1935483870967742 | 0.0 | 0.0 | 0.4666666666666667 | 0.0 |
| C>A | 0.0 | 0.0 | 0.08333333333333333 | 0.0 | 0.0 | 0.5161290322580645 | 0.0 |
| G>T | 0.0 | 0.0 | 0.0 | 0.0 | 0.4444444444444444 | 0.0 | 0.0 |
| C>G | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
| C>A | 0.0 | 0.0 | 0.03333333333333333 | 0.0 | 0.0 | 0.42857142857142855 | 0.0 |
| A>C | 0.0 | 0.6153846153846154 | 0.0 | 0.0 | 0.5925925925925926 | 0.0 | 0.0 |
| C>A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.32 | 0.0 |
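You can read a tab-separated file like `first.tab` with Ruby's standard `csv` library by setting `col_sep: "\t"`. The sketch below parses an inline excerpt of the table above rather than the real file. It does not assume that this is exactly what `from_csv` does internally. `header_converters: :symbol` turns the header `set1` into the key `:set1`, and `converters: :numeric` turns numeric cells into Floats.

```ruby
require 'csv'

# Two rows (and four columns) from the table above, joined with tabs.
tsv = [
  %w[mutation blood set1 set2],
  %w[G>A 0.0 0.019230769230769232 0.0],
  %w[C>G 0.0 0.0 0.0]
].map { |fields| fields.join("\t") }.join("\n") + "\n"

table = CSV.parse(tsv, col_sep: "\t", headers: true,
                       converters: :numeric, header_converters: :symbol)

table.each { |row| puts "#{row[:mutation]} set1=#{row[:set1]}" }
```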


Now let's plot the **set1** column. It contains many zero cells, so filter them out first.

In [12]:
df.filter! {|row| row[:set1] != 0.0}
df

| mutation | blood | set1 | set2 | set3 | set12 | set21 | set31 |
|---|---|---|---|---|---|---|---|
| G>A | 0.0 | 0.019230769230769232 | 0.0 | 0.48214285714285715 | 0.0 | 0.0 | 0.4782608695652174 |
| C>T | 0.0 | 0.42592592592592593 | 0.0 | 0.0 | 0.375 | 0.0 | 0.0 |
| A>C | 0.0 | 0.6153846153846154 | 0.0 | 0.0 | 0.5925925925925926 | 0.0 | 0.0 |
| T>A | 0.0 | 0.525 | 0.0 | 0.0 | 0.5277777777777778 | 0.0 | 0.0 |
| C>T | 0.0 | 0.42857142857142855 | 0.0 | 0.0 | 0.6666666666666666 | 0.0 | 0.0 |
| G>A | 0.0 | 0.45652173913043476 | 0.0 | 0.0 | 0.43478260869565216 | 0.0 | 0.0 |
| C>T | 0.0 | 0.09803921568627451 | 0.0 | 0.0 | 0.37142857142857144 | 0.0 | 0.0 |
| T>A | 0.0 | 0.5769230769230769 | 0.0 | 0.0 | 0.4782608695652174 | 0.0 | 0.0 |
| A>G | 0.0 | 0.43859649122807015 | 0.0 | 0.0 | 0.5142857142857142 | 0.0 | 0.0 |
| T>C | 0.0 | 0.5806451612903226 | 0.0 | 0.0 | 0.6341463414634146 | 0.0 | 0.0 |
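`filter!` keeps only the rows for which the block returns a truthy value. The sketch below applies the same predicate to a small array of row hashes in plain Ruby, using `Array#select` rather than Nyaplot. The exact comparison with `0.0` works because the zero cells are parsed as exactly `0.0`.

```ruby
rows = [
  { mutation: 'G>A', set1: 0.019230769230769232 },
  { mutation: 'C>G', set1: 0.0 },
  { mutation: 'C>T', set1: 0.42592592592592593 },
  { mutation: 'C>A', set1: 0.0 }
]

# Keep rows whose set1 value is non-zero (same predicate as the filter! call above).
nonzero = rows.select { |row| row[:set1] != 0.0 }
p nonzero.map { |row| row[:mutation] }
# prints ["G>A", "C>T"]
```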


Next, prepare the Nyaplot::Plot instances as usual. `filter`, called inside `configure`, adds a 'filter box' to the plot.

In [13]:
plot4 = Nyaplot::Plot.new
plot4.add_with_df(df, :histogram, :set1)  # histogram of the set1 column
plot4.configure do
  height(400)
  x_label('PNR')
  y_label('Frequency')
  filter({target: 'x'})                    # filter box along the x axis
  yrange([0, 130])
end

plot5 = Nyaplot::Plot.new
plot5.add_with_df(df, :bar, :mutation)    # bar chart of mutation types
plot5.configure do
  height(400)
  x_label('Mutation types')
  y_label('Frequency')
  yrange([0, 100])
end
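Roughly speaking, the histogram counts how many `set1` values fall into each bin. A bar chart given only the `mutation` column counts how often each mutation type occurs. The plain-Ruby sketch below computes both counts on six values from the filtered table. The bin width of 0.1 is an arbitrary assumption, since this tutorial does not say how Nyaplot chooses bins. `tally` requires Ruby 2.7 or later.

```ruby
set1 = [0.019230769230769232, 0.42592592592592593, 0.6153846153846154,
        0.525, 0.42857142857142855, 0.45652173913043476]
mutation = %w[G>A C>T A>C T>A C>T G>A]

bin_width = 0.1 # assumed; not taken from Nyaplot
# Histogram: bin index i covers [i * bin_width, (i + 1) * bin_width).
histogram = set1.group_by { |v| (v / bin_width).floor }
                .transform_values(&:count)
                .sort.to_h

# Bar chart of a single categorical column: occurrences per category.
bar_counts = mutation.tally

p histogram
p bar_counts
```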

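Finally, create an instance of Nyaplot::Frame. A frame can hold multiple plots and lets them interact with each other. Because `plot4` and `plot5` are both built from the same DataFrame, the filter box on the histogram is intended to narrow down the data shown in the bar chart as well.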

In [14]:
frame = Nyaplot::Frame.new
frame.add(plot4)
frame.add(plot5)
frame.show
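One way to picture the interaction between the two panes: the filter box selects a range of `set1` values, and the bar chart is recomputed from only the rows inside that range. The plain-Ruby sketch below shows that data flow. The range `0.4..0.6` is an arbitrary example and `linked_counts` is hypothetical; in Nyaplot the update happens in the browser, not in Ruby.

```ruby
rows = [
  { mutation: 'G>A', set1: 0.019230769230769232 },
  { mutation: 'C>T', set1: 0.42592592592592593 },
  { mutation: 'A>C', set1: 0.6153846153846154 },
  { mutation: 'T>A', set1: 0.525 },
  { mutation: 'C>T', set1: 0.42857142857142855 },
  { mutation: 'G>A', set1: 0.45652173913043476 }
]

# Hypothetical helper: keep rows whose set1 lies in the selected range,
# then count mutation types among them (what the bar chart would show).
def linked_counts(rows, range)
  rows.select { |row| range.cover?(row[:set1]) }
      .map { |row| row[:mutation] }
      .tally
end

p linked_counts(rows, 0.4..0.6)
```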