data Package¶

`data` Package¶

`DA_datamining` Module¶

class pyfusion.data.DA_datamining.DA(fileordict, debug=0, verbose=0, load=0, limit=None, mainkey=None)[source]¶

Class to handle and save data in a special dictionary of arrays referred to hereafter as a “DA”.

Can deal with databases larger than memory, by using load=0.

Access is faster with load=1, but if you subselect by using extract you get the same speed on large data sets (once the extract is done).

Extract can be used over and over to get different data set selections.

Parameters:	fileordict – An .npz file containing a DA object, or a dictionary of arrays sharing a common first dimension, including the result of a loadtxt(dtype=...) command. The filename is processed for env vars, ~/ etc., but sometimes this seems to substitute the path of the DA module? (bug)
	load – 1 will immediately load into memory; 0 will defer the load, allowing some operations (but slowly) without consuming memory.
	mainkey – The main key, not necessarily a unique identifier, e.g. it can be shot.
	limit – Decimates the data when loaded into memory (via load=1). It is the most effective space saver, but you need to reload if more data (or a different subselection of data) is needed. The alternative is to downselect by using ‘extract=’ (but this applies only to the variables extracted into a namespace, e.g. locals()).
Returns:	A DA object as described above
Raises:	KeyError, ValueError, LookupError –

Experimental new feature allows use of the DA object itself as a dictionary (e.g. DA59[‘shot’]).

For more info type help(DA)

Note: This is my prototype of google style python sphinx docstrings - based on: http://www.sphinx-doc.org/en/stable/ext/example_google.html Had to include ‘sphinx.ext.napoleon’ in documentation.conf.py to get the parameters on separate lines.

append(dd)[source]¶

Append the data arrays in dd to the data arrays in self, i.e. extend the existing arrays. Typical use is in serial processing of a range of shots.

See also append_to_DA_file to add an extra variable
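The append behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: plain Python lists stand in for NumPy arrays, and `append_arrays` is a hypothetical helper, not part of pyfusion.

```python
def append_arrays(da, dd):
    """Extend each array in da with the matching array in dd (hypothetical helper)."""
    if set(da) != set(dd):
        raise KeyError("both dictionaries must have the same keys")
    for key in da:
        da[key] = list(da[key]) + list(dd[key])
    return da


# Serial processing of a range of shots: one small dictionary per shot.
total = {"shot": [], "beta": []}
for shot in (101, 102):
    per_shot = {"shot": [shot, shot], "beta": [0.5, 0.7]}
    append_arrays(total, per_shot)

print(total["shot"])
```

All arrays grow together, so they keep sharing a common first dimension.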

copyda(force=False)[source]¶: make a deepcopy of self.da. Typically use dd=DAxx.copy() instead of dd=DAxx.da, which would make dd and DAxx.da the same thing (not usually desirable).

extract(dictionary=False, varnames=None, inds=None, limit=None, strict=0, masked=1, debug=0)[source]¶

Extract the listed variables into the dictionary (local by default), selecting those at indices <inds> (all by default). Variables must be strings, either in an array or separated by commas.

If the dictionary is False, return them in a tuple instead. Note: returning a list requires you to make the order consistent.

if varnames is None - extract all.

e.g. if da is a dictionary of arrays:

da = DA('mydata.npz')
da.extract('shot,beta')
plot(shot,beta)

(shot,beta,n_e) = da.extract(['shot','beta','n_e'], inds=np.where(da['beta']>3)[0])
# makes a tuple of 3 arrays of data for high beta. Note the syntax of where()! It is evaluated in your variable space.

To extract one var, you need a trailing "," (tuple notation), e.g.

(allbeta,) = D54.extract('beta',locals())

which can be abbreviated to

allbeta, = D54.extract('beta',locals())
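A minimal sketch of the tuple-returning form of extract, assuming plain lists instead of NumPy arrays. `extract_sketch` is a hypothetical stand-in, not the DA method; it ignores the dictionary, limit, strict and masked arguments.

```python
def extract_sketch(da, varnames=None, inds=None):
    """Return a tuple of the selected variables, optionally subselected by inds."""
    if varnames is None:
        names = list(da)                                  # extract all
    elif isinstance(varnames, str):
        names = [n.strip() for n in varnames.split(",")]  # 'shot,beta' form
    else:
        names = list(varnames)                            # ['shot', 'beta'] form
    out = []
    for name in names:
        values = da[name]
        if inds is not None:
            values = [values[i] for i in inds]
        out.append(values)
    return tuple(out)


da = {"shot": [1, 2, 3, 4], "beta": [2.0, 3.5, 1.0, 4.2]}
high = [i for i, b in enumerate(da["beta"]) if b > 3]   # plays the role of where()[0]
shot, beta = extract_sketch(da, "shot,beta", inds=high)
(allbeta,) = extract_sketch(da, "beta")                 # trailing comma unpacks the 1-tuple
```

Because the result is always a tuple, a single variable still needs the trailing comma on the left-hand side.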

hist(key, bins=50, nanval=-0.01, percentile=99, label=None)[source]¶

Plot a histogram of Te or resid etc., replacing NaNs or infs with nanval, and considering only up to the <percentile>th percentile.

DA('LP20160310_9_L57__amoeba21_1.2_2k.npz').hist('resid')

Examples

da.hist('resid',percentile=97,label='{k}: {fn}')
da.hist('resid',percentile=97,label='{k}: {actual_fit_params}')
da.hist('resid',percentile=97,label='{k}: {i_diag} {actual_fit_params}')

info(verbose=None)[source]¶

load(sel=None)[source]¶

make_attributes()[source]¶: make each element of the dictionary an attribute of the DA object. This is very convenient for operations on more than one dataset, e.g. plot(da.im2 - daold.im2). Is this python 3 compatible? It seems to work fine for continuum 3.5.1
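The idea behind make_attributes can be sketched with `setattr`, which behaves the same way under Python 2 and 3. The `AttrDict` class below is a toy stand-in written for this illustration; it is not the DA class.

```python
class AttrDict:
    """Toy container: holds a dictionary of arrays in self.da."""

    def __init__(self, da):
        self.da = dict(da)

    def make_attributes(self):
        # Expose every key as an attribute, e.g. obj.im2 for obj.da['im2'].
        for key, value in self.da.items():
            setattr(self, key, value)


da = AttrDict({"im2": [3.0, 4.0]})
daold = AttrDict({"im2": [1.0, 1.5]})
da.make_attributes()
daold.make_attributes()

# With lists the difference is taken element by element; NumPy arrays
# would allow da.im2 - daold.im2 directly.
diff = [a - b for a, b in zip(da.im2, daold.im2)]
```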

plot(key, xkey='t', sharey=1, select=None, sharex='col', masked=1, ebar=1, marker='', elinewidth=0.3, **kwargs)[source]¶

Plot the member ‘key’ of the Dictionary of Arrays:

masked [1] – 1 shows only ‘unmasked’ points
ebar [1] – the number of points per errorbar; None will suppress

save(filename, verbose=None, sel=None, use_dictionary=False, tempdir=None, zipopt=-1)[source]¶

Save as an npz file, using an incremental method, which only uses as much /tmp space as required by each var at a time.

Select which to save with sel: if sel is None, save all, except as described for use_dictionary below.

If use_dictionary is a valid dictionary, save the values of ANY AND ONLY the LOCAL variables whose names are in the keys for this set. So if you have extracted a subset, and you specify use_dictionary=locals(), only that subset is saved (both in array length, and variables chosen). Beware locals that are not your variables, e.g. mtrand.beta

To avoid running out of space on tmp, or to speed up zip: now included as an argument. (Note that the normal os.putenv() doesn’t seem to write to THIS environment; use the fudge below - careful - no guarantees.)

os.environ.__setitem__('TMPDIR',os.getenv('HOME'))

Actually, this seems OK:

os.environ['IGETFILE']='/data/datamining/myView/bin/linux/igetfile'

reload tempfile
tempfile.gettempdir()

Also ('ZIPOPT','"-1"') (now incorporated into args, not tested) ** superseded by zlib.Z_DEFAULT_COMPRESSION 0–9 (or -1 for default)
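As an alternative to reloading the tempfile module, the standard library documents that `tempfile.tempdir` caches the chosen directory and is recomputed when it is `None`. The sketch below relies on that behaviour; the scratch directory is a throwaway stand-in for a roomier location, and nothing here is taken from the save() implementation.

```python
import os
import tempfile

scratch = tempfile.mkdtemp()          # stand-in for a directory with more space
old_tmpdir = os.environ.get("TMPDIR")

os.environ["TMPDIR"] = scratch        # item assignment updates this process's environment
tempfile.tempdir = None               # drop the cached value so it is recomputed
chosen = tempfile.gettempdir()

# Restore the previous state and clean up.
if old_tmpdir is None:
    del os.environ["TMPDIR"]
else:
    os.environ["TMPDIR"] = old_tmpdir
tempfile.tempdir = None
os.rmdir(scratch)
```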

to_sqlalchemy(db='sqlite:///:memory:', mytable='fs_table', n_recs=1000, newfmts={}, chunk=1000)[source]¶

update(new_dict, check=True)[source]¶: Add a new variable to the dictionary. Better than simply updating dd, as it allows length check and updates the list of keys.

write_arff(filename, keys=[])[source]¶: keys is a list of keys to include, and empty list includes all

class pyfusion.data.DA_datamining.Masked_DA(valid_keys=[], baseDA=None, mask=None)[source]¶

A virtual sub-dictionary of a DA, contained in the ‘masked’ attribute, returning the applicable (valid_keys) elements, masked by DA.da[‘mask’] to have NaNs in the positions where mask = False or 0.

An important side effect is to add the mask array to the main dictionary. Probably should NOT be a subclass - we don’t want to do unnecessary copying.

Parameters:	valid_keys – keys to which mask should be applied.
	mask – An array (usually 2D) of the same shape as the data; it is usually set at a later stage, when the quality or error criteria are evaluated.
	baseDA – not sure why this is required, because this object is usually attached to an existing DA - needs some thought.

Example

>>> from pyfusion.data.DA_datamining import Masked_DA, DA
>>> myDA=DA('20160310_9_L57',load=1)  # needs to be loaded for this operation
>>> myDA.masked=Masked_DA(valid_keys=['Te','I0','Vp'], baseDA=myDA)
>>> myDA.da['mask']=-myDA['resid']/abs(myDA['I0'])<.35
>>> clf();plot(myDA.masked['Te']);ylim(0,100)
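The masking rule (NaN wherever the mask is False or 0) can be illustrated without pyfusion. In this sketch, `apply_mask` is a hypothetical helper and the numbers are made up; plain lists stand in for NumPy arrays.

```python
import math


def apply_mask(values, mask):
    """Return a copy of values with NaN wherever mask is False or 0."""
    return [v if m else float("nan") for v, m in zip(values, mask)]


da = {"Te": [20.0, 35.0, 90.0], "resid": [-0.1, -0.9, -0.2], "I0": [1.0, 1.0, 1.0]}

# Same quality criterion as in the example above, evaluated element by element.
da["mask"] = [-r / abs(i0) < 0.35 for r, i0 in zip(da["resid"], da["I0"])]
masked_te = apply_mask(da["Te"], da["mask"])
```

The unmasked data in `da["Te"]` is left untouched, so the mask can be changed later without reloading.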

keys()[source]¶: return the keys for masked elements only

pyfusion.data.DA_datamining.append_to_DA_file(filename, new_dict, force=False)[source]¶

Adds a new variable to the file - more like ‘update’ than append

Opens filename with mode=a, after checking if the indx variables align. force=1 ignores checks for consistent length, c.f. the var shot.

Works with a DA file, in contrast to DA.append() which extends a DA

Parameters:	filename – file to append to
	new_dict – dictionary with new data to append
	force – try to continue if there is a mismatch error
Returns:	no return - side effect is to add a new variable to a DA file
Raises:	`ValueError` – if new arrays don’t match old.

Example

>>> append_to_DA_file('DAX.npz',dict(N=dd['N']))     # simple
>>> append_to_DA_file('foo.npz',dict((k, mydict[k]) for k in ['N','M']))

Not a member of the class DA, because the class has memory copies of the file, so it would be confusing.

pyfusion.data.DA_datamining.da(filename='300_small.npz', dd=True)[source]¶: return a da dictionary (used to be called dd - not the DA object) mainly for automated tests of example files.

pyfusion.data.DA_datamining.info_to_bytes(inf)[source]¶: Eventually this should deal with bytes/unicode compatibility

pyfusion.data.DA_datamining.mylen(ob)[source]¶: return the length of an array or dictionary for diagnostic info

pyfusion.data.DA_datamining.process_file_name(filename)[source]¶: Allow shell shortcuts such as ~/ and env var expansion - Note: not tested in windows

pyfusion.data.DA_datamining.report_mem(prev_values=None, msg=None)[source]¶

`base` Module¶

Base classes for data.

class pyfusion.data.base.BaseCoordTransform[source]¶

Bases: object

Base class does nothing useful at the moment

input_coords = 'base_input'¶

output_coords = 'base_output'¶

transform(coords)[source]¶

class pyfusion.data.base.BaseData(*args, **kwargs)[source]¶

Bases: object

Base class for handling processed data.

In general, specialised subclasses of BaseData will be used to handle processed data rather than BaseData itself.

Usage: ..........

save()[source]¶

class pyfusion.data.base.BaseDataSet(label='')[source]¶

Bases: object

add(item)[source]¶

copy()[source]¶

pop()[source]¶

remove(item)[source]¶

save()[source]¶

update(item)[source]¶

class pyfusion.data.base.BaseOrderedDataSet(label='')[source]¶

Bases: object

append(item)[source]¶

save()[source]¶

class pyfusion.data.base.Channel(name, coords, source='', parent_device='not specified')[source]¶

Bases: object

save()[source]¶: applicable only to ORM db

class pyfusion.data.base.ChannelList(*args)[source]¶

Bases: list

get_channel_index(channel_name, bounds=None)[source]¶

repopulate()[source]¶

save()[source]¶

class pyfusion.data.base.Coords(default_coords_name, default_coords_tuple, **kwargs)[source]¶

Bases: object

Stores coordinates with an interface for coordinate transforms.

add_coords(**kwargs)[source]¶

load_from_config(**kwargs)[source]¶

load_transform(transform_class)[source]¶

save()[source]¶

class pyfusion.data.base.DataSet(label='')[source]¶

Bases: pyfusion.data.base.BaseDataSet

copy(input_data, *args, **kwargs)¶

normalise(input_data, *args, **kwargs)¶

reduce_time(input_data, *args, **kwargs)¶

remove_noncontiguous(input_data, *args, **kwargs)¶

segment(input_data, *args, **kwargs)¶

subtract_mean(input_data, *args, **kwargs)¶

class pyfusion.data.base.DynamicDataSet(label='')[source]¶: Bases: pyfusion.data.base.BaseDataSet

class pyfusion.data.base.FloatDelta(channel_1, channel_2, delta, **kwargs)[source]¶: Bases: pyfusion.data.base.BaseData

class pyfusion.data.base.MetaMethods[source]¶

Bases: type

Metaclass which provides filter and plot methods for data classes.

class pyfusion.data.base.OrderedDataSet(label='')[source]¶: Bases: pyfusion.data.base.BaseOrderedDataSet

class pyfusion.data.base.OrderedDataSetItem(item, index)[source]¶: Bases: object

class pyfusion.data.base.PfMetaData[source]¶: Bases: dict

pyfusion.data.base.get_coords_for_channel(channel_name=None, **kwargs)[source]¶

pyfusion.data.base.get_history_args_string(*args, **kwargs)[source]¶

pyfusion.data.base.history_reg_method(method)[source]¶: Wrapper for filter and plot methods which updates the data history.

pyfusion.data.base.orm_load_basedata(man)[source]¶

pyfusion.data.base.orm_load_basedataset(man)[source]¶

pyfusion.data.base.orm_load_baseordereddataset(man)[source]¶

pyfusion.data.base.orm_load_channel(man)[source]¶

pyfusion.data.base.orm_load_channel_map(man)[source]¶

pyfusion.data.base.orm_load_channellist(man)[source]¶

pyfusion.data.base.orm_load_dataset(man)[source]¶

pyfusion.data.base.orm_load_dynamic_dataset(man)[source]¶

pyfusion.data.base.orm_load_floatdelta(man)[source]¶

pyfusion.data.base.setup_coords(man)[source]¶

`convenience` Module¶

pyfusion.data.convenience.between(var, lower, upper=None, closed=True)[source]¶: return whether var is between lower and upper; includes end points if closed=True. Alternative call is between(var, range), e.g. between(x, [1, 2])
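Both calling conventions of between can be sketched in a few lines. `between_sketch` is a hypothetical pure-Python version that returns a list of booleans; the real function presumably works on NumPy arrays.

```python
def between_sketch(var, lower, upper=None, closed=True):
    """True where lower <= var <= upper (strict inequalities if not closed)."""
    if upper is None:                 # alternative call: between(var, [lo, hi])
        lower, upper = lower
    if closed:
        return [lower <= v <= upper for v in var]
    return [lower < v < upper for v in var]


x = [0.5, 1, 1.5, 2, 3]
inside_closed = between_sketch(x, [1, 2])
inside_open = between_sketch(x, 1, 2, closed=False)
```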

pyfusion.data.convenience.broaden(inds, data=None, dw=1)[source]¶: broaden a set of indices or data in width by dw each side

pyfusion.data.convenience.btw(var, lower, upper=None, closed=True)¶: return whether var is between lower and upper; includes end points if closed=True. Alternative call is between(var, range), e.g. between(x, [1, 2])

pyfusion.data.convenience.bw(var, lower, upper=None, closed=True)¶: return whether var is between lower and upper; includes end points if closed=True. Alternative call is between(var, range), e.g. between(x, [1, 2])

pyfusion.data.convenience.decimate(data, limit=None, fraction=None)[source]¶: reduce the number of items to a limit or by a fraction. Returns the same data every call: decimation is regular, not random.
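Regular (non-random) decimation can be done with a fixed stride, which is why repeated calls return the same items. The sketch below is an assumption about how such a function might behave; `decimate_sketch` and its rounding choices are not taken from the pyfusion source.

```python
import math


def decimate_sketch(data, limit=None, fraction=None):
    """Keep at most `limit` items (or about `fraction` of them) at a regular stride."""
    n = len(data)
    if fraction is not None:
        limit = max(1, int(n * fraction))
    if limit is None or n <= limit:
        return list(data)
    step = math.ceil(n / limit)       # fixed stride, so the result is repeatable
    return list(data[::step])


data = list(range(10))
by_limit = decimate_sketch(data, limit=5)
by_fraction = decimate_sketch(data, fraction=0.5)
```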

pyfusion.data.convenience.his(xa, tabs=False, sort=-1, dfmt=None, total=True, maxsort=None)[source]¶: print the counts and fraction of xa binned as integers. sort=1,-1 sorts by most frequent (first, last); maxsort sets the maximum number kept after the sort.

pyfusion.data.convenience.inds_from_list(var, lst)[source]¶: return a list of indices of var which match the items in lst. This is to replace shot in [90000,90001] etc., which can’t be used in where() (gets msg ”...... Use a.any() or a.all()”)

>>> inds_from_list([1,4,5,1], [1,2,3])
array([0, 3])

pyfusion.data.convenience.inlist(var, lst)[source]¶: return a list of True/False of var which match the items in lst. This is to replace shot in [90000,90001] etc., which gets msg ”...... Use a.any() or a.all()”

>>> inlist([1,4,5,1], [1,2,3])
array([ True, False, False, True], dtype=bool)
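The membership test that `shot in [90000,90001]` cannot express on an array is applied element by element. A pure-Python sketch of both helpers follows; the `_sketch` names are hypothetical, and lists are returned rather than NumPy arrays.

```python
def inlist_sketch(var, lst):
    """One True/False per element of var: is it among the items of lst?"""
    members = set(lst)
    return [v in members for v in var]


def inds_from_list_sketch(var, lst):
    """Indices of the elements of var that match an item in lst."""
    return [i for i, hit in enumerate(inlist_sketch(var, lst)) if hit]


flags = inlist_sketch([1, 4, 5, 1], [1, 2, 3])
inds = inds_from_list_sketch([1, 4, 5, 1], [1, 2, 3])
```

The values agree with the two doctest results quoted above.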

pyfusion.data.convenience.whr(*args)[source]¶: short for where, doesn’t need np. or [0] at end, and echoes number

`evalexpr_script` Module¶

`filters` Module¶

Some un-pythonic code here (checking instance type inside function). Need to figure out a better way to do this.

python3 issues:

/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/filters.py:65: DeprecationWarning: classic int division: nice = [2**p * n/16 for p in range(minp2,maxp2) for n in [16, 18, 20, 24, 27]]
/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/utils.py:120: DeprecationWarning: classic int division: ipks = find_peaks(np.abs(FT)[0:ns/2], minratio = minratio, debug=1)
/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/filters.py:443: DeprecationWarning: classic int division: twid = 2*(1+max(n_pb_low - n_sb_low,n_sb_hi - n_pb_hi)/2)

/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/base.py:57: UserWarning: defaulting taper to 1 as band edges are sharp
/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/filters.py:508: DeprecationWarning: classic int division: if np.mod(NA,2)==0: mask[:NA/2:-1] = mask[1:(NA/2)] # even

/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/filters.py:490: DeprecationWarning: classic int division: low_mid = n_pb_low - twid/2
/home/bdb112/pyfusion/mon121210/pyfusion/pyfusion/data/filters.py:491: DeprecationWarning: classic int division: high_mid = n_pb_hi + twid/2
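The warnings above all concern `/` between integers. Under Python 3 it returns a float, which cannot be used as a slice index, whereas `//` floors under both Python 2 and 3. A small self-contained demonstration (the variable names are illustrative only):

```python
ns = 512
half_true = ns / 2      # 256.0 under Python 3: a float
half_floor = ns // 2    # 256 under both Python 2 and Python 3

data = list(range(ns))
first_half = data[0:half_floor]

# Slicing with the float result fails under Python 3.
try:
    data[0:half_true]
    float_slice_ok = True
except TypeError:
    float_slice_ok = False
```

Expressions such as `[0:ns/2]` and `twid/2` in the warnings would therefore need `//` to keep their Python 2 meaning.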

pyfusion.data.filters.copy(input_data)[source]¶: safe (deep) copy of complete data

pyfusion.data.filters.correlate(input_data, index_1, index_2, **kwargs)[source]¶

pyfusion.data.filters.cps(a, b)[source]¶

pyfusion.data.filters.downsample(input_data, skip=10, chan=None, copy=False)[source]¶: Good example of filter that changes the size of the data.

class pyfusion.data.filters.dummysig(tb, sig)[source]¶

pyfusion.data.filters.filter_fourier_bandpass(input_data, passband, stopband, taper=None, debug=None)[source]¶

Note: it is MUCH more efficient (2.2x faster) to use real ffts (implemented April).

Use a Fourier space taper/tophat or pseudo gaussian filter to perform narrowband filtering (much narrower than butterworth). The problem is that bursts may generate ringing. This should be better with taper=2, but it is not clear.

See the __main__ code below for nice test facilities. twid is the width of the transition from stop to pass (not impl.?)

>>> tb = timebase(np.linspace(0,20,512))
>>> w = 2*np.pi* 1 # 1 Hertz
>>> dat = dummysig(tb,np.sin(w*tb)*(tb<np.max(tb)/3))
>>> fop = filter_fourier_bandpass(dat,[0.9,1.1],[0.8,1.2],debug=1).signal[0]

Testing can be done on the dummy data set generated after running filters.py, e.g. (with pyfusion.DEBUG=2)

make_mask(512, [0.8,.93], [0.9,.98],dat,2)
# medium sharp shoulder
fopmed = filter_fourier_bandpass(dat,[9.5,10.5],[9,11],debug=1).signal[0]
# very sharp shoulders
fopsharp = filter_fourier_bandpass(dat,[9.1,10.9],[9,11],debug=1)

pyfusion.data.filters.flucstruc(input_data, min_dphase=-3.141592653589793, group=<function fs_group_geometric>, method='rms', separate=True, label=None, segment=0, segment_overlap=1.0)[source]¶: If segment is 0, then we don’t segment the data (assume already done)

pyfusion.data.filters.fs_group_geometric(input_data, max_energy=1.0)[source]¶: no filtering implemented yet. We don’t register this as a filter, because it doesn’t return a Data or DataSet subclass. # TODO: write docs for how to use max_energy - not obvious if using flucstruc() filter...

pyfusion.data.filters.fs_group_threshold(input_data, threshold=0.7)[source]¶: no filtering implemented yet. We don’t register this as a filter, because it doesn’t return a Data or DataSet subclass.

pyfusion.data.filters.get_optimum_time_range(input_data, new_time_range, try_more=0.0002)[source]¶

This grabs a few more (or a few less, if enough not available in input_data) points than requested so that the FFT is more efficient. try_more is the number of samples to increase/decrease as a fraction of the input data.

Note: For FFTW, it is more efficient to zero pad to a nice number above, even if it is a long way away. This is always true for Fourier filtering, in which case you never see the zeros. For general applications, the zeros might be confusing if you forget they have been put there.

pyfusion.data.filters.integrate(input_data, baseline=[], delta_t=0.01, chan=None, copy=False)[source]¶

Return the time integral of a signal, with baseline optionally removed before integration

baseline = None: no removal
baseline = []: sloping baseline for delta_t at either end of the data
baseline = [t0, t1, t2, t3]: as above, but t values are explicit
baseline = [t0, t1]: as above, but simple constant removal.

Perhaps the first sample will be a NaN - no, the samples are all displaced by one. If we used the trapezoidal method, the first and last samples are ‘incorrect’ and NaNs may be appropriate.
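To make the baseline options concrete, here is a minimal sketch of a running (rectangle-rule) integral with the simple constant-removal case, baseline = [t0, t1]. It assumes a uniform timebase and plain lists; `integrate_sketch` is hypothetical and does not implement the sloping-baseline variants.

```python
def integrate_sketch(t, y, baseline=None):
    """Cumulative rectangle-rule integral of y(t), with optional constant baseline."""
    if baseline is not None:
        t0, t1 = baseline
        # Mean of the samples inside [t0, t1] is taken as the constant offset.
        window = [yi for ti, yi in zip(t, y) if t0 <= ti <= t1]
        offset = sum(window) / len(window)
        y = [yi - offset for yi in y]
    dt = t[1] - t[0]                  # assumes a uniform timebase
    total, out = 0.0, []
    for yi in y:
        total += yi * dt
        out.append(total)
    return out


t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 1.0, 3.0, 3.0]
plain = integrate_sketch(t, y)
corrected = integrate_sketch(t, y, baseline=[0.0, 1.0])
```

Without baseline removal the constant offset of 1.0 accumulates as a ramp in the integral; removing it leaves only the contribution of the step.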

pyfusion.data.filters.make_mask(NA, norm_passband, norm_stopband, input_data, taper)[source]¶: works well now, except that the stopband is adjusted to be symmetric about the passband (take the average of the differences). The problem with crashes (zero mask) was solved by shifting the mask before and after integrating, also a test for aliasing (on the mask before integration).

pyfusion.data.filters.next_nice_number(N)[source]¶

Return the next highest power of 2, including nice fractions (e.g. 2**n * 5/4). Takes about 10us - should rewrite more carefully to calculate starting from the smallest power of 2 less than N, but hard to do better.

>>> print(next_nice_number(256), next_nice_number(257))
(256, 288)

Have to be careful this doesn’t take more time than it saves!
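A sketch of the search, built from the candidate expression quoted in the filters.py:65 warning above (powers of 2 times 16/16, 18/16, 20/16, 24/16 or 27/16). The function name, the starting power of 4 and the `max_power` limit are assumptions of this illustration, and `//` is used so that the candidates stay integers under Python 3.

```python
def next_nice_number_sketch(N, max_power=40):
    """Smallest 'nice' FFT length >= N from the candidate table."""
    # Starting at p=4 keeps 2**p * n / 16 an exact integer for every n.
    nice = sorted(2**p * n // 16
                  for p in range(4, max_power)
                  for n in (16, 18, 20, 24, 27))
    for candidate in nice:
        if candidate >= N:
            return candidate
    raise ValueError("N is larger than the largest candidate")


pair = (next_nice_number_sketch(256), next_nice_number_sketch(257))
```

This brute-force version rebuilds the table on every call, so it illustrates the result rather than the speed concern raised above.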

pyfusion.data.filters.normalise(input_data, method=None, separate=False)[source]¶: method=None -> default, method=0 -> DON’T normalise

pyfusion.data.filters.reduce_time(input_data, new_time_range, fftopt=0)[source]¶: reduce the time range of the input data in place (copy=False) or in the returned Dataset (copy=True - default at present). If fftopt, then extend the time range if possible, or if not, reduce it, so that ffts run reasonably fast. Should consider moving this to actual filters? But this way users can obtain optimum fft even without filters. The fftopt is only visited when it is a dataset, and this isn’t happening

pyfusion.data.filters.register(*class_names)[source]¶

pyfusion.data.filters.remove_baseline(input_data, baseline=None, delta_t=0.01, chan=None, copy=False)[source]¶: Remove a tilted baseline from a signal. If the baseline is 4 elements, correct at two points (the mid points of those intervals). baseline is in the same units as the timebase.

pyfusion.data.filters.remove_noncontiguous(input_dataset)[source]¶

pyfusion.data.filters.segment(input_data, n_samples, overlap=1.0)[source]¶

Break into segments length n_samples.

Overlap of 2.0 starts a new segment halfway into previous, overlap=1 is no overlap. overlap should divide into n_samples. Probably should consider a nicer definition such as in pyfusion 0

n_samples < 1 implies a time interval, which is adjusted to suit fft; otherwise n_samples is taken literally and not adjusted. Fractional n_samples>1 allows the step size to be fine-tuned.
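The overlap convention can be read as a step of n_samples/overlap between segment starts: overlap=1 gives back-to-back segments and overlap=2.0 starts each new segment halfway into the previous one. The sketch below encodes that reading for integer n_samples only; `segment_sketch` is hypothetical and ignores the time-interval and fractional cases.

```python
def segment_sketch(data, n_samples, overlap=1.0):
    """Split data into segments of n_samples, stepping by n_samples / overlap."""
    step = int(n_samples / overlap)
    if step < 1:
        raise ValueError("overlap is too large for this n_samples")
    return [data[start:start + n_samples]
            for start in range(0, len(data) - n_samples + 1, step)]


data = list(range(8))
no_overlap = segment_sketch(data, 4)
half_overlap = segment_sketch(data, 4, overlap=2.0)
```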

pyfusion.data.filters.sp_filter_butterworth_bandpass(input_data, passband, stopband, max_passband_loss, min_stopband_attenuation, btype='bandpass')[source]¶: ** Warning - fails for a single signal in the enumerate step.

This actually does ALL butterworth filters - just select btype and use scalars instead of [x,y] for the passband.

e.g. df=data.sp_filter_butterworth_bandpass(2e3,4e3,2,20,btype='lowpass')

pyfusion.data.filters.subtract_mean(input_data)[source]¶

pyfusion.data.filters.svd(input_data)[source]¶

`histogramHD` Module¶

class pyfusion.data.histogramHD.CoordHD(dims, debug=0)[source]¶

get(*indices)[source]¶: get(1,2,3) args not a tuple

set(indices, val)[source]¶: set((1,2,3),49.)

class pyfusion.data.histogramHD.CoordHDs(dims, debug=0)[source]¶

get(indices)[source]¶: string version doesn’t require the tuple notation (*)

set(indices, val)[source]¶: set(inds,49.)

pyfusion.data.histogramHD.find_eps(x, value=None)[source]¶: find the smallest number that will always exceed the representational accuracy of x in the range around value.

pyfusion.data.histogramHD.histogramHD(d, bins=None, method='safe')[source]¶

make a histogram of data too high in dimension to use histogramdd, by successive calls to histogramdd(). The simplest implementation is to assume all bins are equal and in the range -pi..pi subdivided into nbins bins.

First version will use the coo_utils for simplicity - not sure how efficient lookup is though - and there is no obvious way to make an array larger than memory

Second - use a dictionary to store the results - 120ns lookup in a 100k dict. Assume we can do a rank R array with histogramdd. Should really store the bin boundaries for later reference.
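The dictionary-based storage idea can be sketched as a sparse histogram keyed by a tuple of bin indices, so only occupied bins use memory. This illustration assumes equal bins over -pi..pi as described above, but bins each point directly rather than through successive histogramdd() calls; `sparse_histogram` is a hypothetical name.

```python
import math
from collections import Counter


def sparse_histogram(points, nbins=4):
    """Count points per N-dimensional bin; only occupied bins are stored."""
    width = 2 * math.pi / nbins
    counts = Counter()
    for point in points:
        # Map each coordinate in [-pi, pi] to a bin index 0..nbins-1.
        index = tuple(min(int((x + math.pi) / width), nbins - 1) for x in point)
        counts[index] += 1
    return counts


points = [(-3.0, 0.1, 3.0), (-3.1, 0.2, 2.9), (1.0, 1.0, 1.0)]
hist = sparse_histogram(points)
```

A dense array for D dimensions needs nbins**D cells, while the dictionary grows only with the number of distinct occupied bins.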

`plots` Module¶

Note, plots (this file) doesn’t have unittests

Problems with checkbuttons working since 2011-2012? Temporary fix is to block in svdplot when hold==1. This enables the function, then you kill the window to move on (see plot_svd.py). Also attempted to use subplots here to tidy up putting additional graphs on top. See plots_1.py and svd_plots1.py - need to solve subplot pars problem. bdb made format more conforming. 2013

class pyfusion.data.plots.Energy(energy_list, initial_list)[source]¶

add(elmt)[source]¶

sub(elmt)[source]¶

pyfusion.data.plots.findZero(i, x, y1, y2)[source]¶

pyfusion.data.plots.fsplot_phase(input_data, closed=True, ax=None, hold=0, offset=0, block=False)[source]¶

plot the phase of a flucstruc, optionally inserting the first point at the end (if closed=True). Applies to closed arrays (e.g complete 2pi). Until Feb 2013, this version did not yet attempt to take into account angles, or check that adjacent channels are adjacent (i.e. ch2-ch1, ch2-c2 etc). Channel names are taken from the fs and plotted abbreviated

1/1/2011: TODO This appears to work only for database = None config 1/17/2011: bdb: May be fixed - I had used channel instead of channel.name

pyfusion.data.plots.join_ends(inarray, add_2pi=False, add_360deg=False, add_lenarray=False, add_one=False)[source]¶: used in old code, needs clean up....

pyfusion.data.plots.mydiff(t, y)[source]¶: numpy diff in a wrapper to give correct time and units

pyfusion.data.plots.myiden(y)[source]¶

pyfusion.data.plots.myiden2(t, y)[source]¶

pyfusion.data.plots.overlap(interval1, interval2)[source]¶: return true if there is any overlap between the two intervals probably belongs in convenience or utils
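A common way to test two intervals for overlap is to check that each one starts before the other ends. The sketch below treats touching end points as overlapping, which is an assumption; `overlap_sketch` is not the pyfusion function.

```python
def overlap_sketch(interval1, interval2):
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] share any point."""
    lo1, hi1 = interval1
    lo2, hi2 = interval2
    return lo1 <= hi2 and lo2 <= hi1


partly = overlap_sketch([0, 2], [1, 3])
disjoint = overlap_sketch([0, 1], [2, 3])
nested = overlap_sketch([0, 10], [4, 5])
```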

pyfusion.data.plots.plot_fs_groups(input_data, fs_grouper)[source]¶

Show what would be generated by the supplied fs_grouper function.

An example of a grouper function is pyfusion.data.filters.fs_group_threshold

The grouper function takes an SVDData instance as its first input and returns an iterable where each element is a set of singular value indices which define a flucstruc.

pyfusion.data.plots.plot_signals(input_data, filename=None, downsamplefactor=1, n_columns=1, hspace=None, sharey=False, sharex=True, ylim=None, xlim=None, marker='None', decimate=0, markersize=2, linestyle=True, labelfmt='{short_name} {units}', filldown=True, suptitle='shot {shot}', raw_names=False, labeleg='False', color='b', fun=<function myiden>, fun2=None, **kwargs)[source]¶

Plot a figure full of signals using n_columns[1].

sharey [=1]: “gangs” y axes - similarly for sharex - sharex=None stops this. sharey=2 gangs all but the first (top) axis. x axes are ganged by default: see the Note on sharex.

fun, fun2: optionally plot a function of the signal. fun refers to a function of one variable. fun2 is a function of (t, sig), such as one that returns a different timebase (diff should do this). If fun2 is given, then fun is ignored.

labelfmt [“{short_name} {units}”]: controls the channel labels. The default ignores the shot and uses an abbreviated form of the channel name. If the short form is very short, it becomes a y label. A full version is “{name} {units}” and if > 8 chars, will become the x label. Even longer is “Shot={shot}, k_h={kh}, {name}”

labeleg: if ‘True’, put the label in the legend - else use this str as a legend label.

linestyle: default of True means use ‘-‘ if no marker, and nothing if a marker is given, e.g. marker of ‘.’ and markersize<1 will produce “shaded” waveforms, good to see harmonic structure even without zooming in (need to adjust markersize or plot size for best results).

raw_names: uses “digitiser” names, otherwise use names from the config file.

Note on sharex: to allow implicit overlay by using the same subplot specs, the sharex must be the same between main and overlay - hence the use of explicit sharex=None.

suptitle by default refers to the shot number
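The labelfmt and suptitle strings are ordinary `str.format` templates. The sketch below shows how the three label formats quoted above expand; the values in `info` are made up for illustration and do not come from any real channel.

```python
# Hypothetical per-channel values, for illustration only.
info = {"shot": 33372, "kh": 0.5, "name": "H1_MP4_probe",
        "short_name": "MP4", "units": "V"}

short_label = "{short_name} {units}".format(**info)        # the default labelfmt
full_label = "{name} {units}".format(**info)
long_label = "Shot={shot}, k_h={kh}, {name}".format(**info)
suptitle = "shot {shot}".format(**info)                     # the default suptitle
```

Extra keys in the dictionary are ignored by `format`, so one dictionary can serve every template.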

pyfusion.data.plots.plot_spectrogram(input_data, windowfn=None, units='kHz', channel_number=0, filename=None, coloraxis=None, noverlap=0, NFFT=None, title=None, **kwargs)[source]¶: title will be auto-generated; if supplied, include ‘+’ to include the auto-generated part

pyfusion.data.plots.posNegFill(x, y1, y2)[source]¶

pyfusion.data.plots.register(*class_names)[source]¶

pyfusion.data.plots.set_axis_if_OK(ax, xlims, ylims)[source]¶: set axes if they contain at least a bit of the plot. Originally meant to be used when forcing specific axes, to allow for attempts to force axes suitable for LHD on HJ data, for example.

pyfusion.data.plots.svdplot(input_data, fmax=None, hold=0)[source]¶

`plots_1` Module¶

Note, plots (this file) doesn’t have unittests

class pyfusion.data.plots_1.Energy(energy_list, initial_list)[source]¶

add(elmt)[source]¶

sub(elmt)[source]¶

pyfusion.data.plots_1.findZero(i, x, y1, y2)[source]¶

pyfusion.data.plots_1.fsplot_phase(input_data, closed=True, ax=None, hold=0)[source]¶

plot the phase of a flucstruc, optionally calculating the extra difference e.g. MP6-MP1 NOT just replicating the last point at the beginning (if closed=True). This version does not yet attempt to take into account angles, or check that adjacent channels are adjacent (i.e. ch2-ch1, ch2-c2 etc). Channel names are taken from the fs and plotted abbreviated

1/1/2011: TODO This appears to work only for database=None config 1/17/2011: bdb: May be fixed - I had used channel instead of channel.name

pyfusion.data.plots_1.join_ends(inarray, add_2pi=False, add_360deg=False, add_lenarray=False, add_one=False)[source]¶: used in old code, needs clean up....

pyfusion.data.plots_1.plot_fs_groups(input_data, fs_grouper)[source]¶

Show what would be generated by the supplied fs_grouper function.

An example of a grouper function is pyfusion.data.filters.fs_group_threshold

The grouper function takes an SVDData instance as its first input and returns an iterable where each element is a set of singular value indices which define a flucstruc.

pyfusion.data.plots_1.plot_signals(input_data, filename=None, downsamplefactor=1, n_columns=1, hspace=None, sharey=False, sharex=True, ylim=None, xlim=None, marker='None', markersize=0.3, linestyle=True, labelfmt='%(short_name)s', filldown=True)[source]¶

Plot a figure full of signals using n_columns[1].

sharey [=1]: “gangs” y axes; x axes are always ganged.

labelfmt [“%(short_name)s”]: controls the channel labels. The default ignores the shot and uses an abbreviated form of the channel name. If the short form is very short, it becomes a y label. A full version is “%(name)s” and if > 8 chars, will become the x label. Even longer is “Shot=%(shot), k_h=%(kh)s, %(name)s”

linestyle: default of True means use ‘-‘ if no marker, and nothing if a marker is given, e.g. marker of ‘.’ and markersize<1 will produce “shaded” waveforms, good to see harmonic structure even without zooming in (need to adjust markersize or plot size for best results).

pyfusion.data.plots_1.plot_spectrogram(input_data, windowfn=None, units='kHz', channel_number=0, filename=None, coloraxis=None, noverlap=0, NFFT=None, **kwargs)[source]¶

pyfusion.data.plots_1.posNegFill(x, y1, y2)[source]¶

pyfusion.data.plots_1.register(*class_names)[source]¶

pyfusion.data.plots_1.svdplot(input_data, fmax=None, axs=None, hold=0)[source]¶: axs is a set of four axes

`pyfusion_sigproc` Module¶

signal processing peculiar to plasma devices - more general stuff in signal_processing.py Simple criterion is if it imports pyfusion it should be here, if general, in signal_processing. Important to separate from routines that initialise database, as they can’t be recompiled/reloaded easily during debugging.

pyfusion.data.pyfusion_sigproc.find_shot_times(dev, shot, activity_indicator=None, debug=0)[source]¶: Note: This is inside a try/except - errors will just skip over!! fixme

From the channel specified in the expression “activity_indicator”, determine the beginning and end of pulse. A suitable expression is hard to find. For example, density usually persists too long after the shot, and sxrays appear a little late in the shot. The magnetics may be useful if magnet power supply noise could be removed. (Had trouble with lhd 50628 until adj threshold ?start and end were at 0.5-0.6 secs)

>>> import pyfusion
>>> sh=pyfusion.core.get_shot(15043,activity_indicator="MP4")
>>> print('start=%.3g, end=%.3g' % (sh.pulse_start, sh.pulse_end) )
start=177, end=218
>>> sh=pyfusion.core.get_shot(33372,activity_indicator="MP4")
>>> print('start=%.3g, end=%.3g' % (sh.pulse_start, sh.pulse_end) )
start=168, end=290

`save_compress` Module¶

ported from the old pyfusion:

A “smart compression” replacement for savez, assuming data is quantised. The quantum is found, and the data replaced by a product of an integer sequence and the quantum, with an offset. Delta encoding is optional and often saves space. The efficiency is better than bz2 of ascii data for individual channels, and a little worse if many channels are lumped together with a common timebase in the bz2 ascii format, because save_compress stores individual timebases.

$ wc -c /f/python/local_data/027/27999_P*
2300176 total

At the moment (2010), save_compress is not explicitly implemented - it is effected by calling discretise_signal() with a filename argument.

July 2009 - long-standing error in delta_encode_signal fixed (had not been usable before)

May 2016 - version 104 works around an error caused by W7X corrupted timebases (all 0s and nans)
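The quantise-then-delta-encode idea from the module description can be sketched in pure Python. This is not the discretise_array algorithm: the absolute `resolution` argument, the gcd-based search for the quantum and both function names are assumptions made for the illustration.

```python
from functools import reduce
from math import gcd


def discretise_sketch(values, resolution=1e-6, delta_encode=False):
    """Represent quantised data as offset + quantum * integers."""
    offset = min(values)
    # Work in integer multiples of a fine resolution so gcd() can find the quantum.
    steps = [round((v - offset) / resolution) for v in values]
    quantum_steps = reduce(gcd, steps) or 1
    ints = [s // quantum_steps for s in steps]
    if delta_encode:
        # Store the first value, then successive differences.
        ints = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    return {"ints": ints, "quantum": quantum_steps * resolution,
            "offset": offset, "delta_encoded": delta_encode}


def restore_sketch(d):
    """Invert discretise_sketch."""
    ints = d["ints"]
    if d["delta_encoded"]:
        total, decoded = 0, []
        for step in ints:
            total += step
            decoded.append(total)
        ints = decoded
    return [d["offset"] + d["quantum"] * i for i in ints]


timebase = [0.0, 0.002, 0.004, 0.006, 0.008]
packed = discretise_sketch(timebase, delta_encode=True)
restored = restore_sketch(packed)
```

For a regular timebase the delta-encoded integers are almost all identical, which is the kind of sequence a general-purpose compressor handles very well.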

pyfusion.data.save_compress.discretise_array(arrin, eps=0, bits=0, maxcount=0, verbose=None, delta_encode=False)[source]¶: Return an integer array and scales etc in a dictionary - the dictionary form allows for added functionality. If bits=0, find the natural accuracy. eps defaults to 3e-6, and is the error relative to the largest element, as is maxerror.

pyfusion.data.save_compress.discretise_signal(timebase=None, signal=None, parent_element=array(0), eps=0, verbose=0, params={}, delta_encode_time=True, delta_encode_signal=False, filename=None)[source]¶

A function to return a dictionary from signal and timebase, with relative accuracy eps, optionally saving if filename is defined. Achieves a factor of >10x on MP1 signal 33373 using delta_encode_time=True. Delta encode on signal is not effective for MP and MICROFAST (.005% worse). Probably should eventually separate the file write from making the dictionary. Intended to be aliased with loadz, args slightly different. There is no dependence on pyfusion.

Version 101 adds time_unit_in_seconds, version 102 adds utc, raw, 103 after correction of probes 11-20.

Note: changed to parent_element=array(0) by default - not sure what this is!
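A rough way to see why delta encoding the timebase pays off is sketched below. It is not taken from pyfusion: the tick array is hypothetical, and zlib merely stands in for whichever compressor is applied when the file is written.

```python
import zlib
import numpy as np

# Hypothetical timebase already expressed as integer sample ticks.
ticks = np.arange(100000, dtype=np.int32)

# Delta encoding: keep the first value, then successive differences.
# For a regular timebase every difference is the same small number.
deltas = np.concatenate((ticks[:1], np.diff(ticks))).astype(np.int32)

raw_bytes = len(zlib.compress(ticks.tobytes()))
delta_bytes = len(zlib.compress(deltas.tobytes()))
print(raw_bytes, delta_bytes)

# The encoding is lossless: a cumulative sum restores the ticks.
restored = np.cumsum(deltas, dtype=np.int32)
```

A noisy signal does not have this regularity, which is consistent with the remark above that delta encoding the signal itself brings no gain for MP and MICROFAST data.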

pyfusion.data.save_compress.newload(filename, verbose=0)[source]¶

Intended to replace load() in numpy. This is being used with nan data. The version in data/base.py is closer to python 3 compatible, but can’t deal with the nans yet.

pyfusion.data.save_compress.save_compress(timebase=None, signal=None, filename=None, *args, **kwargs)[source]¶

Save a signal and timebase into a compressed .npz file. See arglist of discretise_signal.

Example:

>>> sig=[1,2,1,2] ; tb=[1,2,3,4] # need this only for later comparison
>>> save_compress(timebase=tb, signal=sig, filename='junk')
>>> readback=newload('junk.npz',verbose=0)
>>> if (readback['signal'] != sig).any(): print('error in save/restore')

pyfusion.data.save_compress.test_compress(file=None, verbose=0, eps=0, debug=False, maxcount=0)[source]¶

Used in developing the save compress routines. Not tested since then.

>>> test_compress()

Looks like it only saves the time series, not the rest.

pyfusion.data.save_compress.try_discretise_array(arr, eps=0, bits=0, deltar=None, verbose=0, delta_encode=False)[source]¶

Return an integer array and scales etc in a dictionary - the dictionary form allows for added functionality. If bits=0, find the natural accuracy. eps defaults to 1e-6.

`signal_processing` Module¶

Boyd’s python for stand alone, general signal processing, try to be “efficient”. Stuff that imports pyfusion should live in pyfusion_sigproc.py. Put stuff here so that recompilation doesn’t require restarting pyfusion.

pyfusion.data.signal_processing.analytic_phase(x, t=None, subint=None)[source]¶

Gets the phase from an amazing variety of signals; see http://en.wikipedia.org/wiki/Analytic_signal. The subinterval idea is not debugged and is probably unnecessary; it may shorten the data?
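For orientation, the analytic-signal construction referred to above can be sketched with numpy alone. This is a generic textbook version under the assumption of a real, uniformly sampled input; it is not the pyfusion implementation, and the function name is made up for the example.

```python
import numpy as np

def analytic_phase_sketch(x):
    """Unwrapped phase of the analytic signal of a real sequence x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double the positive frequencies,
    # and zero the negative ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return np.unwrap(np.angle(analytic))

t = np.arange(256) / 256.0
phase = analytic_phase_sketch(np.cos(2 * np.pi * 8 * t))
print(phase[:4])
```

For this test cosine (exactly 8 periods in the record) the recovered phase grows linearly as 2*pi*8*t.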

pyfusion.data.signal_processing.cross_correl(x1, x2, nsmooth=21, n_times=3)[source]¶

<x1.x2>/sqrt(<x1.x1> * <x2.x2>), averaged over nsmooth points, n_times. Ideally extract raw data from pyfusion signals, but that makes it pyfusion specific - bad.
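The normalised expression above can be written out as follows. This is only a sketch: the averages are taken with a plain boxcar via np.convolve, the repeated smoothing implied by n_times is not modelled, and both function names are hypothetical.

```python
import numpy as np

def boxcar_mean(values, width):
    """Running mean over `width` points (only fully covered positions)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")

def cross_correl_sketch(x1, x2, nsmooth=21):
    """<x1.x2> / sqrt(<x1.x1> * <x2.x2>) with <.> a running mean."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    numerator = boxcar_mean(x1 * x2, nsmooth)
    denominator = np.sqrt(boxcar_mean(x1 * x1, nsmooth) *
                          boxcar_mean(x2 * x2, nsmooth))
    return numerator / denominator

x = np.sin(np.linspace(0.0, 20.0, 200))
same = cross_correl_sketch(x, x)       # perfectly correlated
opposite = cross_correl_sketch(x, -x)  # perfectly anti-correlated
print(same[:3], opposite[:3])
```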

pyfusion.data.signal_processing.powerof2(n, near=False)[source]¶

pyfusion.data.signal_processing.smooth(data, n_smooth=3, timebase=None, causal=False, indices=False, keep=False)[source]¶

An efficient top hat smoother based on the IDL routine of that name. The use of cumsum-shift(cumsum) means that execution time is 2xN flops compared to 2 x n_smooth x N for a convolution. If supplied with a timebase, the shortened timebase is returned as the first of a tuple.

causal – If true, the smoothed signal never precedes the input; otherwise, the smoothed signal is “centred” on the input (for n_smooth odd) and close (1/2 timestep off) for even.

indices – if true, return the timebase indices instead of the times.

data = (timebase, data) is a shorthand way to pass timebase.

n_smooth – apply recursively if an array, e.g. n_smooth=[33,20,14,11] removes 3rd, 5th, 7th, 9th harmonics. Fractional values are interpreted as time intervals.

>>> smooth([1,2,3,4],3)
array([ 2.,  3.])
>>> smooth([1.,2.,3.,4.,5.],3)
array([ 2.,  3.,  4.])
>>> smooth([1,2,3,4,5],timebase=array([1,2,3,4,5]),n_smooth=3, causal=False)
(array([2, 3, 4]), array([ 2.,  3.,  4.]))
>>> smooth([0,0,0,3,0,0,0],timebase=[1,2,3,4,5,6,7],n_smooth=3, causal=True)
([3, 4, 5, 6, 7], array([ 0.,  1.,  1.,  1.,  0.]))
>>> smooth([0,0,0,3,0,0,0],timebase=[1,2,3,4,5,6,7],n_smooth=3, causal=True, indices=True)
([2, 3, 4, 5, 6], array([ 0.,  1.,  1.,  1.,  0.]))
>>> smooth([0,   0,   0,   0,   5,   0,    0,  0,   0,   0,   0], 5, keep=1)
array([ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0., -1., -1.])

Last example: keep=1: Better to throw the partially cooked ends away, but if you want to keep them use keep=True. This is useful for quick filtering applications so that original and filtered signals are easily compared without worrying about timebase.
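The cumsum-shift(cumsum) trick mentioned for smooth() can be sketched in a few lines. This is a minimal illustration, not the pyfusion routine: it handles only the plain case (no timebase, causal, indices or keep options), and the function name is hypothetical.

```python
import numpy as np

def tophat_smooth(data, n_smooth=3):
    """Running mean over n_smooth points using one cumulative sum."""
    data = np.asarray(data, dtype=float)
    # Prepend a zero so that csum[k] is the sum of the first k samples.
    csum = np.cumsum(np.concatenate(([0.0], data)))
    # Each window sum is a difference of two cumulative sums, so the cost
    # does not grow with n_smooth.
    return (csum[n_smooth:] - csum[:-n_smooth]) / n_smooth

result = tophat_smooth([1., 2., 3., 4., 5.], 3)
reference = np.convolve([1., 2., 3., 4., 5.], np.ones(3) / 3, mode="valid")
print(result, reference)
```

The output reproduces the second smooth() example above and agrees with the equivalent convolution.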

pyfusion.data.signal_processing.smooth_n(data, n_smooth=3, timebase=None, causal=False, iter=3, keep=False, indices=False)[source]¶

Apply smooth “iter” times. The remaining arguments, and the examples, are as documented for smooth() above.

pyfusion.data.signal_processing.splot(signal, *args, **kwargs)[source]¶: simple wrapper to plot to accept tuples of form (timebase, data)

pyfusion.data.signal_processing.test_analytic_phase(verbose=3)[source]¶

>>> test_analytic_phase()

`specgram_fftw3` Module¶

A faster replacement for matplotlib specgram using float32 fftw3 Real-Complex routines. So far about 4x faster for no overlap, 3x faster with overlap. Timings don’t include the time from return to prompt until the display is updated, and were taken without SIMD enabled (fft is not the big cost).

Strategy:

1/ Use fftw3 -> factor of 2-3. (see multiprocessing)
2/ Use Real-Complex 32 bit precision ffts: another factor of 2 (looks like psd already does this)
3/ Take log, abs and do window on a composite array for speed, and avoid sqrt

This is effective as chunks are small (128-1024) so numpy overhead per call of a few us can be important.
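The “composite array” point of the strategy can be illustrated with numpy only. The sketch below is an assumption-laden stand-in, not the module’s code: it uses np.fft.rfft instead of fftw3, handles only the no-overlap case, and the function name is invented for the example.

```python
import numpy as np

def chunked_power_db(x, nfft=256):
    """Power spectrum (dB) of consecutive nfft-point chunks, no overlap."""
    x = np.asarray(x, dtype=np.float32)
    n_chunks = x.size // nfft
    # One 2-D array holds every chunk, so the window, the FFT and the log
    # are each a single numpy call rather than one call per chunk.
    chunks = x[:n_chunks * nfft].reshape(n_chunks, nfft)
    windowed = chunks * np.hanning(nfft).astype(np.float32)
    spectrum = np.fft.rfft(windowed, axis=1)
    # |z|**2 directly from real and imaginary parts: no sqrt needed.
    power = spectrum.real ** 2 + spectrum.imag ** 2
    return 10.0 * np.log10(power + 1e-20).T   # shape (nfft//2 + 1, n_chunks)

samples = np.arange(1024)
image = chunked_power_db(np.sin(2 * np.pi * 16 * samples / 256.0), nfft=256)
print(image.shape, image.argmax(axis=0))
```

The small constant added before the log only guards against log10(0) and is an arbitrary choice.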

Expect about 2.3us (E4300, simd, 1thread, and 1.5 lenny) for a float32 real 512 pt fft -> so for 2**20 array, 2000*2.3 = 5ms!

However the pyfftw overhead of about 10us has an impact for small bins like 512. The more direct interface of the other fftw3 could help here, but you would need to do copies on input and output - these take 1.5us each for float32, 512 bytes, so would be a nett win of 5us*512 = 2.5ms for 512x512 case (15%) and 40ms for 2**22 (8192) = 13%
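The back-of-envelope figures above can be checked directly. The numbers are those quoted in the text (2.3us per FFT, a 5us net saving per call); nothing here is measured.

```python
fft_time_us = 2.3          # quoted time for one float32 real 512-point FFT
nfft = 512
n_points = 2 ** 20

n_chunks = n_points // nfft                    # 2048, rounded to 2000 above
total_fft_ms = n_chunks * fft_time_us / 1000.0

saving_per_call_us = 5.0                       # net saving quoted above
saving_512_chunks_ms = 512 * saving_per_call_us / 1000.0
saving_8192_chunks_ms = 8192 * saving_per_call_us / 1000.0

print(n_chunks, total_fft_ms, saving_512_chunks_ms, saving_8192_chunks_ms)
```

This gives roughly 4.7ms of pure FFT time for 2**20 points, and savings of about 2.6ms and 41ms for the 512-chunk and 8192-chunk cases, in line with the rounded values in the text.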

'FFTW_ESTIMATE', 'FFTW_MEASURE' (default), 'FFTW_PATIENT' and 'FFTW_EXHAUSTIVE'.

Improvements: Use the Advanced interface plan_many to get all short samples done all at once. Hopefully threads are allocated one chunk at a time.

Timings (ms) for no overlap, E4300 (power: no diff). clf() means preceded by clf, hold=1, using original (no simd) ubuntu libraries:

          specgram               This
          512x512   512x8192     512x512   512x8192   512x32768
clf()     96        804          69        360
hold=0    125       765          62        351
hold=1    43        695          17.7      307        1486

Sun 21: 360ms for hold=0, 512*8192? instead of 351. CPU is showing 10% when idle, and 86 ms in imshow - no gain by reboot (don’t forget to change hold= in the imshow line)

With FFTW fft instead of numpy replacement, now 150 for ffts, 198 total (hold=0) (noverlap=512, NFFT=1024 - 426 (incl imshow) 336-360 before)

This compares with the raw fft time (noverlap=0) of 8k*2.3 = 18ms.

Summary of times:

window   mult   inp=   F.execute   allout=   log    imshow   total
7ms      16ms   15ms   11ms        8ms       77ms   50ms     184 (cf 197 actual)

See /usr/lib/pymodules/python2.7/matplotlib/mlab.py

Can test animate speed by

im_obj.set_data(im)             # 50ms (std) 80ms 1200x800
im_obj.set_data(im[::5,::20])   # 30ms 60ms 1200x800
time for ii in range(10):draw()

pyfusion.data.specgram_fftw3.all_tests_specgram(slow=False)[source]¶

Compare with pylab specgram for random noise data, size=2**12 (and 2**22 for slow=True).

pyfusion.data.specgram_fftw3.specgram(x, NFFT=256, Fs=2, Fc=0, detrend=None, window=<function hanning>, noverlap=128, cmap=None, hold=None, dtype=<type 'numpy.float32'>, threads=1, im_obj=None, interpolation='nearest', pylab_scaling=True, fast=True)[source]¶

Return and optionally plot the spectrogram of the data in x, using fftw3 to optimise the speed. Note that using fftw3, most of the time is spent elsewhere, so perhaps a cython implementation would be more efficient. So far only coded for float32 and float64. Slight increase in speed if the previous image_object is given in the arg list.

Tests below on E4300, power on, 1 thread:

6 secs til image for specgram of 16M points, NFFT=512, noverlap=128, std win
1.8 secs til first image, fast=1, im_obj=im_obj (1 sec more for float64)

e.g. (note this is a wide dynamic range (17 decades of power) so there will be slight errors visible in the float32 version):

time figure(3);s=specgram(arange(2**24)**n,Fs=1e6,NFFT=512,im_obj=s[3],dtype=float64);n=(n+1)%3

matplotlib specgram has good dyn range for arange(2**24), but error of ~ -8 for random

pyfusion.data.specgram_fftw3.test_specgram(x=array([ 0.52510351, 0.2582512, 0.47227109, ..., 0.07367249, 0.73439432, 0.03488037]), NFFT=256, noverlap=128, dtype='float32', rtol=1e-06, atol=3e-07, verbose=1)[source]¶

Compare with pylab specgram for random noise. atol=2e-7 shows errors < 1 in 10,000 times - verbose not yet implemented. Don’t run too many of these (i.e. > 1000) - eats into RAM.

`tests` Module¶

Tests for the Data and related classes.

class pyfusion.data.tests.CheckChannelList[source]¶

Parameters:	low (float, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0. high (float) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0. size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. Default is None, in which case a single value is returned.
Returns:	out – Drawn samples, with shape size.
Return type:	ndarray

