7. Examples from the JSPF tutorial article

These are the simplest examples - no command line args except -i when you want the data preserved (see example1a, 2 etc)

../../examples/JSPF_tutorial/example1.py
""" Example 1, JSPF tutorial: simple density profile scan 

In this file, data was downsampled to save space in the download package.
# The following line was used to downsample the data:
run pyfusion/examples/save_to_local.py shot_list=range(86507,86517+1)  overwrite_local=1 dev_name='H1Local' diag_name='ElectronDensity15' downsample=100 local_dir='pyfusion/examples/JSPF_tutorial/local_data'
"""
import pyfusion as pf    # (we will assume these three import lines in all future examples)
import numpy as np       
import matplotlib.pyplot as plt
import os

plt.figure('Example 1')

dev = pf.getDevice('H1Local')  # open the device (choose the experiment - e.g H-1, LHD, Heliotron-J)
ne_profile, t_mid, shot = [ ], [ ], [ ]  # prepare empty lists for ne_profile, shot and time of measurement

# next line redirects pyfusion to find downsampled local data in ./local_data
pf.config.set('global','localdatapath','local_data') 

for shot_number in range(86507, 86517+1):  # the +1 ensures 86517 is the last shot
        d = dev.acq.getdata(shot_number, 'ElectronDensity15')	# a multichannel diagnostic
        sections = d.segment(n_samples=.001)   # break into time segments
	# work through each time segment, extracting the average density during that time
        for seg in sections:
            ne_profile.append(np.average(seg.signal,axis=1))   #  axis=1 -> average over time, not channel
            t_mid.append(np.average(seg.timebase))
            shot.append(shot_number)

# store the data in a DA (Dictionary of Arrays) object, which is like a DataFrame in R or python panda
myDA = pf.data.DA_datamining.DA(dict(shot=shot, ne_profile=ne_profile, t_mid=t_mid)) 

# the next step - write to arff
myDA.write_arff('ne_profile.arff',['ne_profile'])

print('for graphics, \nrun -i example1a.py')
../../examples/JSPF_tutorial/example1a.py
#   After running example1.py, you can just do  
# run -i example1a.py
#   (the -i preserves the variables )
# If you omit the -i, you will see this error "name 'myDA' is not defined"
# (or you can paste in into ipython (after running example1.py))

# first, define a convenient shortcut function to plot in a new window
# [you could just use plt.plot() instead]
import matplotlib.pyplot as plt
def pl(array, comment=None,**kwargs):
    plt.figure(num=comment)  # coment written to window title
    plt.plot(array, **kwargs)

# get the variables into local scope - so they can be accessed directly
myDA.extract(locals())        

# the ne profiles are in an Nx15 array, where N is the numer of channels
pl(ne_profile[40,:],'one profile')

# plot a sequence of profiles, showing every fifth 
for prof in ne_profile[10:20:5]:
    pl(prof)

# plot all profiles by using the transpose operator to get profiles
pl(ne_profile.T, 'all profiles',color='b',linewidth=.01)

# without the transpose, you will get the time variation for the data
pl(ne_profile, 'time variation, all channels',color='b',linewidth=.3)

# see all profiles as a false colour image
# time and shot number run vertically, each band is a shot
plt.figure(num = 'image of all data')
plt.imshow(ne_profile,origin='lower',aspect='auto')
# the negative data artifacts near (12,200) are due to fringe skips

plt.show(0)    # needed to make figures appear if "run" instead of pasted.
../../examples/JSPF_tutorial/example2.py
# Example code to accompany figure 6, the python k means version of Example 2.
#  The smaller data set in this package produces a slighly different result than the 
#  full data set in the article - see example2_small_data_set.png
#  The code for the actual figure is in example2a.py
#  After running example1.py, do
# run -i example2.py
#      Note: if the -i is omitted you will get the message 'np' is not defined
#
import sys
try:  # catch error if user forgets to set up the data first, make it available to this with -i
    x = np.arange(len(ne_profile[0]))  
    myDA.extract(locals())  # makes sure they are all numpy floating point arrays
except NameError as reason:
    print('please run example1 first, THEN run -i example2   (so that myDA is carried over)')
    sys.exit()


###########  Now we will process the data a little before clustering. ###############

# flag all profiles that can't be fitted well by a poly of degree 5
# This will eliminate rubbish data - we record an error, which we can later set a threshold on

deg = 5
err = 0 * t_mid                     # prepare an empty array of the right size to hold errors
for (i, nep) in enumerate(ne_profile):
    p = np.polyfit(x, nep, deg, w=nep)
    err[i] = 1/np.sqrt(len(nep)) * np.linalg.norm(nep*(nep - np.polyval(p, x)))
    small = 0.2  # 0.5 for LHD, 0.2 for H-1
    # disqualify if there are fewer than deg (poly degree) non-small samples
    if len(np.where(nep > small)[0]) < deg:
        err[i] = 9                  #  (flag as bad by setting large error)

    # disqualify if the average density is too small
    if np.average(nep) < small/3:
        err[i] = 10

# normalise to the average value, so we can compare shapes
avg = 0 * t_mid   # prepare another variable of the right length
for (i, nep) in enumerate(ne_profile):
    avg[i] = np.average(nep)
    ne_profile[i] = nep/avg[i]

# Plot only the normalised profiles that are reasonably smooth and not discarded above
plt.figure(num = "normalised profiles, excluding erroneous data")
for (e,pr) in zip(err,ne_profile):
    if e < small/2:
        plt.plot(pr,color='g',linewidth=.04)
    plt.ylim(0,2)    
plt.show(0)
# The darker areas show recurring profiles.
# We would need more information (e.g. power, transform, B_0) to investigate the
#   reason for the different profiles.

########## simple k-means clustering ###########


from scipy.cluster.vq import vq, kmeans, whiten
# consider only profiles with low polyfit error (i.e. not really jagged
features = ne_profile[np.where(err < small/2)[0]]
lw=50./len(features)     # linewidth - for overplotting many lines, so darkness indicates density

whitened = whiten(features)
# set up the random number generator in a reproducible way - good for examples
# but in real life, we usually want random starts
from numpy import random
random.seed((1000,2000))
# compute the cluster centres, for 4 clusters (in normalised space)
ccs, distortion = kmeans(whitened, k_or_guess=4)
# get the cluster ids, use the normalised features (whitened)
cids, dist  = vq(whitened, ccs)

cols = ('c,g,r,m,b,y,k,orange,purple,lightgreen,gray'.split(','))  # colours to be rotated thru

# plot all data and overlay the cluster centres, in real units and normalised
# comment one of the following two lines
fig,[axunnorm, axnorm] = plt.subplots(1,2)     # this shows both norm and unnorm
#axunnorm = None; fig,[[axnorm]] = plt.subplots(1,1,squeeze=False) # this ignore unnormalised

# this order will reproduce the example image, if all the data is used, and seed is set
# but now it doesn't reproduce it exactly.
corder = [1, 3, 2, 0]
plt.rc('font',**{'size':18, 'family':'serif'})

# plot, colouring according to cluster membership
for c in corder:
    ws = np.where(cids == c)[0]
    if axunnorm is not None:   # restore to real space by mult. by avg
        axunnorm.plot(avg[ws]*features[ws].T,cols[c],linewidth=lw)

    axnorm.plot(features[ws].T,cols[c],linewidth=lw)
# and the cluster centres
for c in corder:
    axnorm.plot(ccs[c]*np.std(features,axis=0),'k',linewidth=9,)     #  outline effect
    axnorm.plot(ccs[c]*np.std(features,axis=0),cols[c],linewidth=6,) #  color


plt.ylabel(r'$n_e/10^{18}$',fontdict={'fontsize':'large'}) ; plt.xlabel('channel')
plt.xlim(7,15); plt.ylim(0,2)

plt.show()
../../examples/JSPF_tutorial/example2a.py
""" Full H1 example to produce Figure 6, discussed after Example 2 in 
the JSPF tutorial "A more realistic density profile scan clustering"
This is a more advanced version of example2.py, which uses the small data set
of the download package.
This example can be run on the download package data by editing the DA (npz) filename
ne_profile2.npz

Takes about a minute to run in full on a 2015 model machine
Needs full data files, ** so won't run with download package alone.**
Note: PDF versions of the figure need rasterization to properly render 
      the fine lines. Beware memory problems if you use rasterization 
      on anything but the figure as a whole.  See example2b.py for more info.
"""
import pyfusion as pf   # (we will assume these three import lines in all future examples)
import numpy as np
import matplotlib.pyplot as plt
import os

plt.figure('Example 2 extra , Figure 6')
mono = False  # monochrome figures required by JSPF
if not os.path.isfile('ne_profile2.npz'):  #  if not there, make it
    print('generating npz file')
    dev = pf.getDevice('H1Local')  # open the device (choose the experiment: e.g H-1, LHD, Heliotron-J)
    # prepare empty lists for ne_profile, shot and time of measurement
    ne_profile, t_mid, shot = [], [], []
    # for shot_number in range(86507, 86517+1):  # the +1 ensures 86517 is the last shot
    lst = list(range(83130, 83212+1, 1))
    # 133,162,163 weird, 166 172 missing, 171 and 205 channel8 starts late.
    for sh in [83133, 83162, 83163, 83166, 83171, 83172, 83205]: lst.remove(sh)
    try:
        d = dev.acq.getdata(lst[0], 'ElectronDensity15')
    except LookupError as reason:
        raise LookupError('{r}\n\n** Only works with access to full H1 data set **'.format(r=reason))

    for shot_number in lst:  # the +1 ensures 86517 is the last shot
        d = dev.acq.getdata(shot_number, 'ElectronDensity15')     # a multichannel diagnostic
        sections = d.segment(n_samples=1024)   # break into time segments
        # work through each time segment, extracting the average density during that time
        for seg in sections:
            ne_profile.append(np.average(seg.signal, axis=1))   # axis=1 -> avg over time, not channel
            t_mid.append(np.average(seg.timebase))
            shot.append(shot_number)
    # store the data in a DA (Dictionary of Arrays) object, which is like a DataFrame in R or panda
    myDA = pf.data.DA_datamining.DA(dict(shot=shot, ne_profile=ne_profile, t_mid=t_mid))
    myDA.save('ne_profile2')

###########  Now we will process the data a little before clustering. ###############
myDA = pf.data.DA_datamining.DA('ne_profile2')

# set up the random number generator in a reproducible way - good for examples
# but in real life, we usually want random starts
from numpy import random
random.seed((1000,2000))

# set limit=2000 to speed up, None to include all data, but takes many minutes
myDA.extract(locals(), limit=6000)
x = np.arange(len(ne_profile[0]))

# flag all profiles that can't be fitted well by a poly of degree 5
# This will eliminate rubbish data - we record an error, which we can later set a threshold on
deg = 5
err = 0 * t_mid                     # prepare an empty array of the right size to hold errors
for (i, nep) in enumerate(ne_profile):
    p = np.polyfit(x, nep, deg, w=nep)
    err[i] = 1/np.sqrt(len(nep)) * np.linalg.norm(nep*(nep - np.polyval(p, x)))
    small = 0.5  # 0.5 for LHD, 0.2 for H-1
    # disqualify if there are fewer than deg (poly degree) non-small samples
    if len(np.where(nep > small)[0]) < deg:
        err[i] = 9   #  (flag as bad by setting large error)

    # disqualify if the average density is too small
    if np.average(nep) < small/3:
        err[i] = 10

# normalise to the average value, so we can compare shapes
avg = 0 * t_mid   # prepare another variable of the right length
for (i, nep) in enumerate(ne_profile):
    avg[i] = np.average(nep)
    ne_profile[i] = nep/avg[i]

##########  Now we can cluster ############

from scipy.cluster.vq import vq, kmeans, whiten
lw = 0.02  # linewidth - for overplotting many lines, so darkness indicates density
# consider only profiles with low polyfit error (i.e. not really jagged
features = ne_profile[np.where(err<0.2)[0]]
whitened = whiten(features)

# random.seed(None)  # we previously set the seed to produce the same result
                     # every time - set it to None to get a random seed

# compute the cluster centres, for 4 clusters (in normalised space)
ccs, distortion = kmeans(whitened, k_or_guess=4)
# get the cluster ids, use the normalised features (whitened)
cids, dist = vq(whitened, ccs)

# colours to be rotated thru
cols = 'c,g,r,m,b,y,k,orange,purple,lightgreen,gray'.split(',')
if mono: cols = ['#444444', '#888888', '#bbbbbb', '#eeeeee']

# plot all data and overlay the cluster centres, un real units and notmalised
# fig, [axunnorm, axnorm] = plt.subplots(1, 2)
axunnorm = None; fig,[[axnorm]] = plt.subplots(1, 1, squeeze=False) # to ignore unnormalised

# this order will reproduce the example image, if all the data is used, and seed is set
# but now it doesn't reproduce it exactly.
corder = [0, 2, 1, 3]


plt.rc('font', **{'size': 18, 'family': 'serif'})

for c in corder:
    ws = np.where(cids == c)[0]
    if axunnorm is not None:   # restore to real space by mult. by avg
        axunnorm.plot(avg[ws]*features[ws].T, cols[c], linewidth=lw)

    col = 'darkgray' if mono else cols[c]
    axnorm.plot(features[ws].T, col, linewidth=lw)#  beware rasterized=True here!
# and the cluster centres
for c in corder:
    axnorm.plot(ccs[c]*np.std(features, axis=0), 'k', linewidth=9,)
    axnorm.plot(ccs[c]*np.std(features, axis=0), cols[c], linewidth=6,)


plt.ylabel(r'$n_e/10^{18}$', fontdict={'fontsize': 'large'}); plt.xlabel('channel')
plt.xlim(7, 15); plt.ylim(0, 2)
plt.text(13, 0.61, 'C1')

plt.show()
fig.set_rasterized(True)
fig.savefig('example2.pdf', dpi=300)
../../examples/JSPF_tutorial/example3.py
""" Example 3 - SQL example, similar to the python Dictionary of arryas version in example4.py
Plot all flucstrucs found, and the select the large rotating modes.

SQLAchemy has many ways to access data from python.  This is not the neatest of them, but I chose 
it because the queries match textbook 'text mode' SQL query-based approach.

== Aside on alternatives ==
One alternative SQLAlchemy access model means tables (everthing in fact) look like objects 
Using this mode, you could write more concisely 
    plt.plot(H1.t_mid, H1.freq)
Which is really nice and simple, unlike the messy iteration in this example.
    [row['t_mid'] for row in alldata]

Queries would contain phrases like and_(H1.freq.between(4,20), H1.comment.line('%argon%')
which are quite different to text mode, but easier to compose 'on the fly'
   
"""
from pyfusion.data.DA_datamining import DA
from sqlalchemy import create_engine 
import matplotlib.pyplot as plt
import os

if not os.path.isfile('H1_766.sqlite'):  #  if not there, make it
    myDA = DA('H1_766.npz')
    myDA.to_sqlalchemy('sqlite:///H1_766.sqlite', newfmts=dict(phases='dphi_{i:02d}'),
                       mytable='H1_766', chunk=5000, n_recs=1e9)

engine = create_engine('sqlite:///H1_766.sqlite')
conn = engine.connect()
result = conn.execute('select shot, t_mid, freq, a12, amp from H1_766')
alldata = result.fetchall() # all the instances, and overplot in feint colour
plt.plot([row['t_mid'] for row in alldata], [row['freq'] for row in alldata],'o',alpha=0.02)
plt.xlabel('time (s)') ; plt.ylabel('freq (kHz)')

# Ex 3a. this time, select the large rotating modes, (note - we combine two lines here, 
#    even though it makes it less readable) 
plt.figure()
sel = conn.execute('select shot, t_mid, freq, a12, amp from H1_766 where amp> 0.05 and a12>0.7').fetchall()
#  for the selected data,  plot freq against t_mid in red circles('r'), whose size reflects the amplitude 
plt.scatter([row['t_mid'] for row in sel], [row['freq'] for row in sel],[300*row['amp'] for row in sel],'r')
plt.show()
../../examples/JSPF_tutorial/example4.py
from pyfusion.data.DA_datamining import DA           
import numpy as np
import matplotlib.pyplot as plt

plt.figure('Example 4')
# Load pre-stored data from 30 shots into a dictionary of arrays (DA)
DA766 = DA('H1_766.npz', load=1)
# Extract all the data into local variables, such as freq, t_mid, phases[0]
DA766.extract(locals())

# Look at one set of instances - probe phase differences - plotting frequency
#   vs time, size indicates amplitude
plt.scatter(t_mid, freq, amp, edgecolor='lightblue')

# Find large amplitude, 'rotating' modes (see text). [0] is required by 
wb = np.where((amp>0.05) & (a12>0.7))[0]     # the numpy where() function

# Overplot with larger symbols proportional to amplitude, coloured by a12
plt.scatter(t_mid[wb], freq[wb], 300*amp[wb], a12[wb])
plt.xlabel(r'$k_H$'); plt.ylabel(r'${\rm frequency/V_{Alfv\'en}}$',fontdict={'fontsize':'large'})
plt.xlim(0,0.042) ; plt.ylim(0, 150)
plt.show()
../../examples/JSPF_tutorial/example5.py
""" Example 5 - Figure 7
Adding the Alfven speed for each item into the data set. colour
"""
from pyfusion.data.DA_datamining import DA           
import numpy as np
import matplotlib.pyplot as plt

plt.figure('Example 5 - Figure 7')
DA766 = DA('H1_766.npz',load=1)    #  Load data from 30 shots into a dictionary of arrays (DA)
DA766.extract(locals())            #  Extract all the data into local variables, such as freq, t_mid, phases[0]
wb = np.where((amp>0.05) & (a12>0.7))[0]  #  find large amplitude, 'rotating' modes (see text).  

mp = 1.67e-27
mu_0 = 4 * np.pi * 1e-7
V_A = b_0/np.sqrt(mu_0 * n_e * 1e18* mp)  # Calculate VA as a local variable , an array of the same size as the   
                                          #   extracted data
DA766.update(dict(V_A=V_A))        #  Add it to the dictionary - no problem if you use the same name
                                   #   for the dictionary key
DA766.save('H1_766_new')           #  Save the data set (under a new name if you prefer).
                                   #  When you load the dataset in the future, the variable V_A will be there.
plt.plot(k_h, freq*1e3/V_A,'.c')   # symbol '.', colour cyan - we see the values are in the range sqrt beta
plt.plot(k_h[wb], freq[wb]*1e3/V_A[wb],'ob',ms=12) # selecting large amplitude as above, we see beginnings of fig 5
plt.xlabel(r'$k_H$'); plt.ylabel(r'${\rm frequency/V_{Alfven}}$',fontdict={'fontsize':'large'})
plt.show()
# Note - there will be a warning "invalid value encountered in sqrt" due to noise in n_e
# This could be eliminated by selecting using np.where(n_e >= 0)[0]
../../examples/JSPF_tutorial/example6.py
""" Simple Gaussion Misxture clustering of phase profile data using the supplied data set
See example 6, figure 8. 16 Clusters are allowed for, I chose the two most interesting to plot.
This is not optimised for the number or choice of clusters, but does a reasonable 
job of separating the two main clusters.
It is more effective (e.g. Figure 5) to use von Mises mixtures.  This (vMMM) code
is under development - contact the authors for the latest copy.
 
Takes about 1 minute
Python3 produces a different clustering to Python2 - not sure why?
"""


from sklearn import mixture
from pyfusion.data.DA_datamining import DA, report_mem
import numpy as np
import matplotlib.pyplot as plt
# approx size used in pub  figure(figsize=((11.9,7.6)))
plt.figure('Example 6 - Figure 8')

DA76 = DA('H1_766.npz',load=1)
DA76.extract(locals())

np.random.seed(0)       # ensure the same result (useful for examples)
gmm = mixture.GMM(n_components=16, covariance_type='spherical')
m16 = gmm.fit(phases)   # fit 16 Gaussians
cids = m16.predict(phases)      # assign each point to a cluster id

for c in [7, 9]:    # show the two most interesting clusters in freq vs k_h
                    # select cluster members of sufficient amplitude, a12, and after 5ms
    w = np.where((c == cids) & (amp > 0.08) & (a12 > 0.5) & (t_mid > 0.005))[0]
    # add artificial noise to the k_h value to show points 'hidden' under others
    dither = .008 * np.random.random(len(w))
    # colored by cluster
    plt.scatter(k_h[w] + dither, np.sqrt(ne_1[w])*freq[w], 700*amp[w], 'bgrcmkycmrgrcmykc'[c]) 

plt.ylim(0, 60); plt.xlim(0.2, 1)
plt.xlabel(r'$k_H$', fontdict={'fontsize': 'large'}) ; plt.ylabel('frequency')
plt.show()
../../examples/JSPF_tutorial/cross_phase.py
"""
Plot coherence and cross phase for a multi signal diagnostic
Can pass a pyfusion data item or get data from a source
H-1 88673 is a nice example

This version reproduces figure 2 in the tutorial article. (see tut_notes.md)
see pyfusion/examples for a simpler version.  When run from the download 
package, a downsmapled data file is used, so we need NFFT=256 to obtain thetasame time window

mlab.cohere_pairs is > 10x faster on 15 signals than separate cohere() calls
See also csd in  /usr/lib/pymodules/python2.7/matplotlib/pyplot.py

"""

import pyfusion as pf
import numpy as np       
import matplotlib.pyplot as plt
import os
from numpy import mod, pi

_var_defaults = """
dev_name = "H1Local"           # "LHD"
diag_name = "H1ToroidalAxial"  # "MP1"
shot_number = 88673            # 27233
NFFT=256                       # 2048 for full data, 256 for downsampled
noverlap=1.0
unit_freq = 1000
dat=None
time_range=[0.04,0.05]
omit=[]
lw=2
lwtot=2*lw
chans=None
nr=2
shade=[22.5,24.5]   # None supresses the shaded area
mono=False          # JSPF wants mono pdfs
"""
exec(_var_defaults)

from pyfusion.utils import process_cmd_line_args
exec(process_cmd_line_args())

pf.config.set('global','localdatapath',os.path.join(pf.PYFUSION_ROOT_DIR,'examples','JSPF_tutorial','local_data')) # this line points to local data


if nr > 0:
    try:  # use my nicer version if it is available
        from subplots import subplots
        # close up space between graphs
        spkwargs=dict(apportion=[2,1],spadj_kwargs=dict(hspace=0))
    except ImportError:
        from matplotlib.pyplot import subplots
        spkwargs={}

    fig, axs = subplots(nrows=nr,ncols=1, sharex='all',**spkwargs)

if nr==1:
    strtpair = 0
    ax1, ax2 = axs, axs
    plt.sca(ax2)
else:
    strtpair = 1
    ax1, ax2 = axs
    plt.sca(axs[0])

xlab = {1:'Hz', 1000:'kHz', 1000000:'MHz'}[unit_freq] 

if type(noverlap) != int:
    noverlap = int(noverlap*NFFT)

if dat is None:
    dev = pf.getDevice(dev_name)
    orig_data = dev.acq.getdata(shot_number,diag_name)
    data = orig_data
else:
    data = dat

if time_range[1] != time_range[0]:
    data = data.reduce_time(time_range)

dt = np.average(np.diff(data.timebase))

#%run pyfusion/examples/plot_signals shot_number=88672 diag_name='H1ToroidalAxial' dev_name='H1Local'
X = np.array([s for s in data.signal]).T

if chans is None: 
    chans = [i for i in range(np.shape(X)[1]-1)]
for o in omit:
    chans.remove(o)

pairs = [(chans[i],chans[i+1]) for i in range(len(chans)-1)]

pairs.insert(0, (pairs[0][0],pairs[-1][1]))  #  the first elt is from end to end
coh,cp,freq = plt.mlab.cohere_pairs(X,pairs,NFFT=NFFT,Fs=1/dt/float(unit_freq))

colorset=('b,g,r,c,m,y,k,orange,purple,lightgreen,gray,yellow,brown,teal,tan'.split(','))
if mono: colorset = 4 * ['black', 'darkgray','gray','lightgray']

totph_col = 'k' if mono else 'b'
dph_0_13_col = 'gray' if mono else 'r'
fillcol = "#ebebeb" if mono else "lightcyan"

# Here is a cute way to define a whole range of dash/dot line styles
# 
s, l, L = 2, 5, 8  # short, long, ExtraLong
# define them in a simple, compact manner
lines = 's,sl,ssl,ssll,sll,sssl,sssl,slll,slsll,ssssl,sllll,sL,L,ssL,sLL,sssL'
# turn them into list for 'dashes=' by adding short spaces, and convert to nums 
lineset = []
for st in lines.split(","):
    lineset.append(eval('('+',s,'.join(st)+',s)'))

# go through the pairs of phase diffs, starting at strtpair
for (i,pair) in enumerate(pairs[strtpair:]): 
    col = colorset[i]
    lines = plt.plot(freq,coh[pair],label='dph'+str(pair),color=col,linewidth=lw)
    plt.setp(lines, dashes = lineset[i])
    lines = plt.plot(freq,cp[pair],color=col,linestyle=['--','-'][i==0 and strtpair==0],
            linewidth=lw)
    plt.setp(lines, dashes = lineset[i])  # this sets the dash style
    
plt.setp(ax1, ylim=(-4,4), ylabel = 'phase difference (rad) || coherence')
totph=np.sum([cp[pair] for pair in pairs[0:-1]],0)
hist = data.history
sz = 'small' if len(hist.split('\n'))>3 else 'medium'
plt.title(hist + str(': NFFT={NFFT}, noverlap={noverlap}'
                            .format(NFFT=NFFT, noverlap=noverlap)),size=sz)
if nr>1:
    plt.ylim(-2.5,1.2)
    plt.legend(prop={'size':'x-small'},loc='lower right')

if shade is not None:  # shade the area of interest for the tutorial
    from matplotlib.patches import Rectangle
    someX, someY = 23.5, -0.5
    ax1.add_patch(Rectangle((shade[0], someY - 2), shade[1]-shade[0], 4, facecolor=fillcol,
                            edgecolor='lightgray'))

ax2.plot (freq,totph,':',label='total phase',color=totph_col,linewidth=lwtot)
if nr>1: plt.setp(ax2, ylim=(-14.5,-12.5), ylabel='phase (rad)')
plt.setp(ax2.get_yticklabels()[-1],visible=False)
# plot a few to make sure at least one is in range and visible
for offs in [0, 2]:
    ax2.plot(freq, mod(totph,2*pi)-offs*pi,':',color=totph_col,linewidth=lwtot)
endendpair = pairs[0]
for offs in [0,1,2]:
    ax2.plot(freq,cp[endendpair]-offs*2*pi,color=dph_0_13_col,
             linewidth=lwtot,label=['dph'+str(endendpair),'',''][offs])
ax2.legend(prop={'size':'x-small'},loc='lower right')
xl = [15,30] if nr>1 else [1,100] # lower limit of 1 is better for log x scale
plt.setp(ax2, xlabel=str('frequency ({xlab})'.format(xlab=xlab)), xlim = xl)   


plt.show()

# title('')

8. More advanced examples

These examples use arguments on the command line to adjust the behaviour - e.g. NFFT=1024

In most cases, arguments are parsed by a routine (bdb_utils) in this package, with the following properties

  • there should be no spaces around the equals sign, NFFT=2048 not NFFT = 2048

  • quotes may be needed around parentheses -

  • typical arguments and defaults can be seen by using the keyword help

  • the keyword exception may be used to skip over exceptions (e.g. bad shot data)

    exception=Exception

    or to stop ready to enter the debugger (%debug) on an exception

    exception=()

    Note the case distinction (Exception is a researved word, exception is not)

9. Examples from W7-X

The following examples use command line keyword arguments, see above for special consideration such as quotes, avoiding spaces within arguments:

# Clone the repository, then from the shell
source pyfusion/run_tutorial

# You will land in ipython. Try example1.py and others as indicated above

# run_tutorial sets a few env vars and finds the right directory
# run_pyfusion is an example of a more specialised starting script,

# string (--exe) which has some bdb_utils args inside it (in the quotes)
# but will need adpatation to your enviroment

# Then move out of JSPF_examples to a convenient working directory
cd ../../..

# make the shot database (shotDA.pickle) , which allows us to obtain
# cached utc info from shot and date
run pyfusion/acquisition/W7X/get_shot_list update=1

# run a simple plot script - these data are compact, no need for caching
run pyfusion/examples/plot_signals.py  dev_name='W7X' diag_name='W7X_TotECH'  shot_number=[20160308,40]

# A pickle file for OP1.1 (acquisition/W7-X/shotDA.pickle) is included
# if the 'update' above doesn't work (there are still some unicode/py3 issues).

# If this does work (or not!) you could try reading and plotting a
# (trivially short) file of processed data - assumed to be in the top directory
# (containing pyfusion and .git folders)

from pyfusion.data.DA_datamining import DA, Masked_DA
da = DA('LP20160309_52_L57_2k2short.npz')
da.plot('ne18')  # all channels (with dubious data masked out with Nan's
da.plot('ne18',select=[1,6]) # selected - LP02 and LP07
da.keys()  # see data available
plot(da.masked['Te']) # many dubiuos values masked out
plot(da['Te'])  # all, including dubious values

# processing Langmuir data:
# this should be done with cached data, otherwise the URL access is
# too slow to be interesting.  Some smaller cached files are included
# These only have the first few ms of data - see ... for larger files.
from pyfusion.data.process_swept_Langmuir import Langmuir_data, fixup
LP30952 = Langmuir_data([20160309,52], 'W7X_L53_LP0107','W7X_L5UALL')
LP30952.process_swept_Langmuir(threshchan=0,t_comp=[0.85,0.86],filename='*2k2short')
# the following is a temporary fix, which doesn't work with the
# small examples - need a full sized data set - almost all 20
# probes, and the full time interval
run pyfusion/examples/plot_LP2D LP20160309_52_L53_2k2.npz
fixup(da,locals(),suppress_ne=suppress_ne) # set t=0 to ECH start, mask out ne for damaged probes

# Save to local cache
run pyfusion/examples/save_to_local.py  dev_name='W7X' diag_name="['W7X_L53_LP0107']" shot_list="[[20160309,52]]"
# make a much smaller version - limited time, every 4th sample
run pyfusion/examples/save_to_local.py  dev_name='W7X' diag_name="['W7X_L53_LP0107']" shot_list="[[20160309,52]]" local_dir=/tmp time_range=[0.85,1.45] downsample=4
# A range of shots - note quotes
run pyfusion/examples/save_to_local.py "shot_list=shot_range([20160309,6],[20160309,9])" diag_name='["W7X_TotECH","W7X_L57_LPALL"]'
# add exception=() to the arg list to stop on error (for debugging)

10. Example from Heliotron-J in four steps

There are a lot of parameters here, but it is only necessary to adjust the shot_range, diag_name and the outfile name (and possibly the value of seg_dt, the time interval in ms). Once you have examined the data produced in some detail, then you may want to adjust the other parameters.

max_bands - increase if there are more than a few simultaneous frequencies

time_range - automatically finds MHD activity on HeliotronJ at least - but you can restrict the time range by setting this

From version 0.60 time units in general are seconds, this can be changed for plots only in the configuration file.

MP - number of processors (only for scripts that have MP in the name) 1 avoids multiprocessing

overlap - fraction of time segment added to the data (half before, and half after) info - 0 1, 2, 3 retains progressively more descriptive and diagnostic text

10.1. Feature extraction on Heliotron-J

1a/ for one shot (or a few):

# the MP array has just four members - so this is a quick test. -  see below for use of exception
run  pyfusion/examples/gen_fs_bands.py n_samples=None df=2e3 exception=() max_bands=1 dev_name="HeliotronJ" 'time_range="default"' seg_dt=1e-3 overlap=2.5  diag_name='HeliotronJ_MP_array' shot_range=[60573] info=0 outfile='PF2_151119_60573'

1b/ for many shots - multi processing:

# include all the MP and PMPs here - takes a few minutes.
# exception=() will stop on any error for debugging.  To skip over
# errors, and continue processing other good data, use exception=Exception
# This example uses argparse arguments (e.g. ==MP=3) and one long
# string (--exe) which has some bdb_utils args inside it (in the quotes)
run  pyfusion/examples/prepfs_range_mp.py . --MP=3  --exe='gen_fs_bands.py n_samples=None df=2e3 exception=() max_bands=1 dev_name="HeliotronJ" ' --shot_range=range(60619,60550,-1) --time_range='"default"' --seg_dt=1e-3 --overlap=2.5  --diag_name='HeliotronJ_ALL'

Result is a text file(s), which will be merged with others in step 2, to form a DA (Dictionary of Arrays) object

2/ Merge feature files:

run pyfusion/examples/merge_text_pyfusion.py  "file_list=glob('PF2_151119*_60*')"

The file ‘wildcard’ expression above needs to be adjusted to include the files you generated in step 1a or 1b. The example given works for 1a/. For 1b, the file names will depend on the date - e.g. ‘PF2_1602*’ gets all output generated in feb 2016 regardless of shot number.

Result is a DA data set

3/ Add other data (B_0, NBI, etc):

run -i pyfusion/examples/merge_basic_HJ_diagnostics.py diags=diag_extra+diag_basic dd exception=None
from pyfusion.data.DA_datamining import DA
DAHJ60573 = DA(dd)                   #  for 1b, it would make sense to call it DAHJ60k = DA(dd)
DAHJ60573.save('DAHJ60573.npz')

Result: DA data set including other data for each time segment.

4/ Compare with spectrogram/sonogram:

run pyfusion/examples/plot_specgram.py dev_name='HeliotronJ' shot_number=60573 diag_name=HeliotronJ_MP_array hold=1
from pyfusion.visual import window_manager, sp, tog
sp(DAHJ60573.da, 't_mid', 'freq', 'amp', 'a12', hold=1)

5/ Clustering:

# DAHJ60k.npz is already prepared in the hj-mhd DA file area (defined in pyfusion.cfg)
run pyfusion/examples/cluster_DA.py DAfilename='$DAPATH/DAHJ60k.npz'
co.plot_clusters_phase_lines()  # show clusters
# Clusters 0,2,5 look interesting, but the phase difference at 2 in
#   all these looks out by pi - is this a wiring change?

# alternatively, check the one you just prepared in the previous step
run pyfusion/examples/cluster_DA.py DAfilename='DAHJ60k.npz'

11. Setting up hj-mhd

operating system:

install f77 and tig  # (git utility)
# assuming git etc already there

the rest is there already or in anaconda.

anaconda: Best way is to download both python2 and 3 download files, and keep them for later. Easier to setup when the /homea drives are present (booted the other way).

Install anaconda as on the web:

bash...

Notes - better to install in the folder they recommend - ~/anaconda2 etc. This wastes a little space, but is simpler. It is not that easy to move (your username etc gets written into at least 1000 different places)- much faster to just reinstall.