2. acquisition – data acquisition

The acquisition package is responsible for fetching data from an experimental database and returning pyfusion data objects. Base classes as well as datasystem-specific sub-packages are provided.

Two classes are involved in obtaining data. An acquisition class (subclass of BaseAcquisition) provides the basic interface to the data source, setting up any required connections. A fetcher class (subclass of BaseDataFetcher) gets data for a specified channel and shot number. In general usage, a fetcher class is not handled directly but through the getdata() method of the acquisition object. For example:

>>> import pyfusion
>>> h1 = pyfusion.getDevice('H1')
>>> mirnov_data = h1.acq.getdata(58123, 'H1_mirnov_array_1_coil_1')

Here, h1 is an instance of H1 (the subclass of Device specified in the [Device:H1] section of the configuration file). When instantiated, the device class checks the configuration file for an acquisition class specification and attaches an instance of that acquisition class, here h1.acq (a synonym of h1.acquisition). The getdata() method then looks for a configuration section (here, [Diagnostic:H1_mirnov_array_1_coil_1]) with information about the diagnostic, including which data fetcher class to use. The data fetcher is then called to fetch and return the data.

2.1. Base classes

class pyfusion.acquisition.base.BaseAcquisition(config_name=None, **kwargs)

Base class for datasystem specific acquisition classes.

Parameters: config_name – name of the acquisition as specified in the configuration file.

On instantiation, the pyfusion configuration is searched for an [Acquisition:config_name] section. The contents of that section are loaded into the object namespace. For example, a configuration section:

[Acquisition:my_custom_acq]
acq_class = pyfusion.acquisition.base.BaseAcquisition
server = my.dataserver.com

will result in the following behaviour:

>>> from pyfusion.acquisition.base import BaseAcquisition
>>> my_acq = BaseAcquisition('my_custom_acq')
>>> print(my_acq.server)
my.dataserver.com

The configuration entries can be overridden with keyword arguments:

>>> my_other_acq = BaseAcquisition('my_custom_acq', server='your.data.net')
>>> print(my_other_acq.server)
your.data.net
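The loading behaviour described above can be sketched without pyfusion using the standard-library configparser. This is a minimal sketch, not pyfusion's implementation. The class name ConfigLoadedAcquisition is hypothetical, and the configuration is read from a string rather than from pyfusion's own config files.

```python
import configparser


class ConfigLoadedAcquisition:
    """Hypothetical stand-in for BaseAcquisition's configuration loading."""

    def __init__(self, config, config_name=None, **kwargs):
        if config_name is not None:
            # Every "key = value" entry in [Acquisition:<config_name>]
            # becomes an attribute of the instance.
            for key, value in config.items("Acquisition:%s" % config_name):
                setattr(self, key, value)
        # Keyword arguments are applied last, so they override the file.
        for key, value in kwargs.items():
            setattr(self, key, value)


config = configparser.ConfigParser(interpolation=None)
config.read_string("""
[Acquisition:my_custom_acq]
acq_class = pyfusion.acquisition.base.BaseAcquisition
server = my.dataserver.com
""")

my_acq = ConfigLoadedAcquisition(config, "my_custom_acq")
my_other_acq = ConfigLoadedAcquisition(config, "my_custom_acq", server="your.data.net")
print(my_acq.server)        # my.dataserver.com
print(my_other_acq.server)  # your.data.net
```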

getdata(shot, config_name=None, **kwargs)

Get the data and return the prescribed subclass of BaseData.

Parameters:
  • shot – shot number
  • config_name – name of a [Diagnostic:config_name] configuration section specifying the data fetcher class
Returns: an instance of a subclass of BaseData or BaseDataSet

This method needs to know which data fetcher class to use. If a config_name argument is supplied, then the [Diagnostic:config_name] section must exist in the configuration file and contain a data_fetcher class specification, for example:

[Diagnostic:H1_mirnov_array_1_coil_1]
data_fetcher = pyfusion.acquisition.H1.fetch.H1DataFetcher
mds_path = \h1data::top.operations.mirnov:a14_14:input_1
coords_cylindrical = 1.114, 0.7732, 0.355
coord_transform = H1_mirnov

If a data_fetcher keyword argument is supplied, it overrides the configuration file specification.

The fetcher class is instantiated with any supplied keyword arguments, and the result of its fetch() method is returned.
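Here is a minimal sketch of the lookup that getdata() performs. It assumes the fetcher class is resolved from its dotted path with importlib. The names getdata_sketch and import_class, and the toy EchoFetcher, are hypothetical and used only for illustration. pyfusion's own method may differ in detail.

```python
import configparser
import importlib
import sys
import types


def import_class(dotted_name):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, class_name = dotted_name.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)


def getdata_sketch(acq, config, shot, config_name=None, **kwargs):
    """Hypothetical outline of the getdata() lookup described above."""
    if "data_fetcher" in kwargs:
        # A data_fetcher keyword argument overrides the configuration file.
        fetcher_path = kwargs.pop("data_fetcher")
    else:
        fetcher_path = config.get("Diagnostic:%s" % config_name, "data_fetcher")
    fetcher_class = import_class(fetcher_path)
    # Instantiate with the remaining keywords and return the fetch() result.
    fetcher = fetcher_class(acq, shot, config_name=config_name, **kwargs)
    return fetcher.fetch()


class EchoFetcher:
    """Toy fetcher that just reports what it was constructed with."""

    def __init__(self, acq, shot, config_name=None, **kwargs):
        self.shot, self.config_name, self.extra = shot, config_name, kwargs

    def fetch(self):
        return (self.shot, self.config_name, self.extra)


# Register a throwaway module so the dotted-path lookup can find EchoFetcher.
demo_module = types.ModuleType("demo_fetchers")
demo_module.EchoFetcher = EchoFetcher
sys.modules["demo_fetchers"] = demo_module

config = configparser.ConfigParser(interpolation=None)
config.read_string("""
[Diagnostic:my_probe]
data_fetcher = demo_fetchers.EchoFetcher
""")

result = getdata_sketch(None, config, 58123, "my_probe", gain=2)
print(result)  # (58123, 'my_probe', {'gain': 2})
```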

class pyfusion.acquisition.base.BaseDataFetcher(acq, shot, config_name=None, **kwargs)

Base class providing interface for fetching data from an experimental database.

Parameters:
  • acq – an instance of a subclass of BaseAcquisition
  • shot – shot number
  • config_name – name of a Diagnostic configuration section.

It is expected that subclasses of BaseDataFetcher will be called via the getdata() method, which calls the data fetcher’s fetch() method.

do_fetch()

Actually fetches the data, using the environment set up by setup().

Returns: an instance of a subclass of BaseData or BaseDataSet

Although BaseDataFetcher.do_fetch() does not return any data object itself, it is expected that a do_fetch() method on a subclass of BaseDataFetcher will.

error_info(step=None)

Return specific information about an error to aid interpretation, e.g. for MDSplus, the node path. The dummy return value should be replaced in the datasystem-specific fetchers.

fetch()

Always use this method to fetch the data, so that setup() and pulldown() are called to set up and pull down the environment used by do_fetch().

Returns: the instance of a subclass of BaseData or BaseDataSet returned by do_fetch()

find_valid_for_shot()

Determine whether this diagnostic definition (or modified diagnostic definition) is valid for this shot.

pulldown()

Called by fetch() after retrieving the data.

setup()

Called by fetch() before retrieving the data. Ideally, setup() performs enough of the preliminaries for fetch() that there is sufficient information in self for a useful error report if the fetch fails, while avoiding exceptions wherever possible during setup() itself.
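The setup()/do_fetch()/pulldown() division is a template-method pattern. The sketch below is hypothetical: SketchFetcher and ConstantFetcher are not pyfusion classes. Two of its choices are design assumptions, not documented behaviour. It calls pulldown() from a finally clause so that it runs even when do_fetch() fails, and it wraps failures with error_info().

```python
class SketchFetcher:
    """Hypothetical base class mirroring the setup/do_fetch/pulldown split."""

    def __init__(self, shot):
        self.shot = shot
        self.log = []  # records which steps ran, for illustration

    def setup(self):
        # Store enough on self for a useful error report, avoiding exceptions.
        self.log.append("setup")

    def do_fetch(self):
        raise NotImplementedError("subclasses must return a data object")

    def pulldown(self):
        self.log.append("pulldown")

    def error_info(self, step=None):
        return "fetch failed for shot %s during %s" % (self.shot, step)

    def fetch(self):
        """Always call this rather than do_fetch() directly."""
        self.setup()
        try:
            return self.do_fetch()
        except Exception as exc:
            # Attach the information gathered in setup() to the error.
            raise RuntimeError(self.error_info(step="do_fetch")) from exc
        finally:
            # Assumption: release the environment whether or not do_fetch() worked.
            self.pulldown()


class ConstantFetcher(SketchFetcher):
    """Toy subclass whose do_fetch() returns a fixed list."""

    def do_fetch(self):
        self.log.append("do_fetch")
        return [1.0, 2.0, 3.0]


fetcher = ConstantFetcher(12345)
data = fetcher.fetch()
print(data)         # [1.0, 2.0, 3.0]
print(fetcher.log)  # ['setup', 'do_fetch', 'pulldown']
```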

class pyfusion.acquisition.base.MultiChannelFetcher(acq, shot, config_name=None, **kwargs)

Fetch data from a diagnostic with multiple timeseries channels.

This fetcher requires a multichannel configuration section such as:

[Diagnostic:H1_mirnov_array_1]
data_fetcher = pyfusion.acquisition.base.MultiChannelFetcher
channel_1 = H1_mirnov_array_1_coil_1
channel_2 = H1_mirnov_array_1_coil_2
channel_3 = H1_mirnov_array_1_coil_3
channel_4 = H1_mirnov_array_1_coil_4

The channel names must be channel_ followed by an integer, and the channel values must correspond to other configuration sections (for example [Diagnostic:H1_mirnov_array_1_coil_1], [Diagnostic:H1_mirnov_array_1_coil_2], etc.), each of which returns a single-channel instance of TimeseriesData.

fetch()

Fetch each channel and combine into a multichannel instance of TimeseriesData.

Return type: TimeseriesData

ordered_channel_names()

Get an ordered list of the channel names in the diagnostic.

Return type: list
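One way to produce the ordered channel list is to sort on the integer suffix, so that channel_10 follows channel_9 rather than channel_1. This sketch assumes a plain dict stands in for the configuration section. ordered_channel_names here is a standalone illustrative function, not the pyfusion method.

```python
import re

CHANNEL_KEY = re.compile(r"^channel_(\d+)$")


def ordered_channel_names(section):
    """Return channel values sorted by the integer after 'channel_'.

    Sorting on the integer (not the string) keeps channel_10 after channel_9.
    Keys that are not channel_<int> (e.g. data_fetcher) are ignored.
    """
    numbered = []
    for key, value in section.items():
        match = CHANNEL_KEY.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered)]


section = {
    "data_fetcher": "pyfusion.acquisition.base.MultiChannelFetcher",
    "channel_10": "coil_10",
    "channel_2": "coil_2",
    "channel_1": "coil_1",
}
names = ordered_channel_names(section)
print(names)  # ['coil_1', 'coil_2', 'coil_10']
```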

2.2. Sub-packages for specific data sources

Custom subclasses of the BaseAcquisition and BaseDataFetcher classes are contained in dedicated sub-packages. Each sub-package has the structure:

subpkg/
      __init__.py
      acq.py
      fetch.py

with acq.py containing a subclass of BaseAcquisition and fetch.py containing a subclass of BaseDataFetcher.

2.2.1. MDSPlus

Interface for MDSplus data acquisition and storage.

This package depends on the MDSplus python package, available from http://www.mdsplus.org/binaries/python/

Pyfusion supports four modes for accessing MDSplus data:

  1. local
  2. thick client
  3. thin client
  4. HTTP via an H1DS MDSplus web service

The data access mode is determined by the tree path (<tree name>_path) and server variables in the configuration file (or supplied to the acquisition class via keyword arguments):

[Acquisition:my_data]
acq_class = pyfusion.acquisition.MDSPlus.acq.MDSPlusAcquisition
mydata_path = ...
server = my.mdsdataserver.net

The full MDSplus node path is stored in a diagnostic configuration section:

[Diagnostic:my_probe]
data_fetcher = pyfusion.acquisition.MDSPlus.fetch.MDSPlusDataFetcher
mds_node_path = \mydata::top.probe_signal
  # Note that changing data sources (fetchers) is easier with substitutions

2.2.1.1. Local data access

The ‘local’ mode is used when a tree path definition refers to the local file system rather than an MDSplus server on the network. The mydata_path entry in the above example would look something like:

mydata_path = /path/to/my/data

2.2.1.2. Thick client access

The ‘thick client’ mode uses an MDSplus data server to retrieve the raw data files, but the client is responsible for evaluating expressions and decompressing the data. The server tree definitions are used, and the server for a given MDSplus tree is specified by the tree path in the format:

mydata_path = my.mdsdataserver.net::

or, if a port other than the default (8000) is used:

mydata_path = my.mdsdataserver.net:port_number::

2.2.1.3. Thin client access

The ‘thin client’ mode maintains a connection to an MDSplus data server. Expressions are evaluated and data decompressed on the server, requiring greater amounts of data to be transferred over the network. Because the thin client mode uses the tree paths defined on the server, no path variable is required. Instead, the server entry is used:

server = my.mdsdataserver.net

or, if a port other than the default (8000) is used:

server = my.mdsdataserver.net:port_number
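The two host formats above can be illustrated with a small parser. Thick-client tree paths use host:port::, and the thin-client server entry uses host:port. parse_mds_host is a hypothetical helper written only to make the format explicit, and port 8001 is an arbitrary example. In practice these strings are presumably handed to MDSplus unchanged.

```python
DEFAULT_PORT = 8000  # default port quoted above


def parse_mds_host(entry):
    """Split 'host', 'host:port', 'host::' or 'host:port::' into (host, port)."""
    if entry.endswith("::"):
        # Thick-client tree paths end with '::'; drop it before parsing.
        entry = entry[:-2]
    host, sep, port = entry.partition(":")
    return host, (int(port) if sep else DEFAULT_PORT)


for entry in ["my.mdsdataserver.net", "my.mdsdataserver.net:8001",
              "my.mdsdataserver.net::", "my.mdsdataserver.net:8001::"]:
    print(entry, "->", parse_mds_host(entry))
```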

2.2.1.4. HTTP web service access

The HTTP web service mode uses standard HTTP queries via the H1DS RESTful API to access the MDSplus data. The server is responsible for evaluating the data and transmits quantisation-compressed data to the client over port 80. This is especially useful if the MDSplus data is behind a firewall. The server attribute will be used for web service access if it begins with http://, for example:

server = http://h1svr.anu.edu.au/mdsplus/

The server attribute must be the URL component up to the MDSplus tree name. In this example, the MDSplus path \h1data::top.operations.mirnov:a14_14:input_1 for shot 58063 corresponds to the URL http://h1svr.anu.edu.au/mdsplus/h1data/58063/top/operations/mirnov/a14_14/input_1/
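The mapping in this example can be written as a short function. The rule is inferred from the single example above: tree name, then shot, then the node path with . and : separators turned into /. h1ds_url is a hypothetical name, and the rule has not been checked against the H1DS API itself.

```python
def h1ds_url(server, mds_path, shot):
    """Map an MDSplus node path and shot to an H1DS URL (inferred rule)."""
    # Split '\tree::node.path:member' into the tree name and the node path.
    tree, node = mds_path.lstrip("\\").split("::", 1)
    # Both '.' and ':' separate path components in the URL.
    components = node.replace(":", ".").split(".")
    return "%s%s/%s/%s/" % (server, tree, shot, "/".join(components))


url = h1ds_url("http://h1svr.anu.edu.au/mdsplus/",
               r"\h1data::top.operations.mirnov:a14_14:input_1", 58063)
print(url)
```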

2.2.1.5. How Pyfusion chooses the access mode

If an acquisition configuration section contains a server entry that does not start with http://, then MDSPlusAcquisition sets up a connection to the mdsip server when it is instantiated. Any tree path definitions (local and thick client) are also loaded into the runtime environment at this time. When the data fetcher is called (via getdata()), it uses the full node path, including the tree name, from the configuration file. If a matching <tree name>_path variable is defined for the acquisition module, the corresponding local or thick client mode is used. If no tree path is defined but the server variable is, pyfusion attempts to use either the web service mode (if server begins with http://) or the thin client mode (if it does not).
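The decision rules in the preceding paragraph can be summarised as a function. This is a hypothetical sketch: choose_access_mode and its return strings are illustrative. It treats a _path value containing :: as thick client and anything else as local, following the path formats given in the earlier subsections.

```python
def choose_access_mode(mds_node_path, acq_params):
    r"""Return 'local', 'thick client', 'thin client' or 'http'.

    mds_node_path: full node path including the tree name,
        e.g. \mydata::top.probe_signal
    acq_params: the acquisition section as a dict of strings.
    """
    # configparser stores option names in lower case by default.
    tree = mds_node_path.lstrip("\\").split("::", 1)[0].lower()
    tree_path = acq_params.get(tree + "_path")
    if tree_path is not None:
        # A tree path takes precedence: 'host::' means thick client,
        # anything else is treated as a local directory.
        return "thick client" if "::" in tree_path else "local"
    server = acq_params.get("server")
    if server is None:
        raise ValueError("no %s_path or server entry" % tree)
    return "http" if server.startswith("http://") else "thin client"


print(choose_access_mode(r"\mydata::top.probe_signal",
                         {"mydata_path": "/path/to/my/data"}))  # local
```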

2.2.1.6. Classes

class pyfusion.acquisition.MDSPlus.acq.MDSPlusAcquisition(*args, **kwargs)

Acquisition class for MDSplus data systems.

If a ‘server’ configuration parameter (not starting with ‘http’) is provided, a connection for thin client access will be set up. Also, any configuration parameters which end with ‘_path’ will be loaded into the environment.

2.2.2. H1

The H1 data acquisition package.

This subpackage contains a subclass of the MDSplus data fetcher which gets additional H1 specific metadata.

2.2.2.1. Classes

2.2.3. LHD

Data acquisition for LHD.

2.2.3.1. Classes

class pyfusion.acquisition.LHD.fetch.LHDTimeseriesDataFetcher(acq, shot, config_name=None, **kwargs)

Needs:

export Retrieve=~/retrieve/bin/  # (maybe not)
export INDEXSERVERNAME=DasIndex.LHD.nifs.ac.jp/LHD

Debugging

Off-site in pyfusion:

import pyfusion
# set the config to use the LHD fetcher
pyfusion.config.set('DEFAULT', 'LHDfetcher', 'pyfusion.acquisition.LHD.fetch.LHDTimeseriesDataFetcher')
# choose a shot that doesn't exist locally (IPython "run" syntax)
run pyfusion/examples/plot_signals.py shot_number=999 diag_name='VSL_6' dev_name='LHD'

On-site test lines for the executables:

retrieve SX8O 74181 1 33
retrieve Magnetics3lab1 74181 1 33
# 2015: retrieve_t seems to work only on FMD
retrieve_t FMD 117242 1 33
# different error messages on Magnetics3lab1

Using retrieve_t:

It is not known when retrieve_t is needed; perhaps always try it first? If it gives an error, calculate according to the .prm file.

timeit fmd=retriever.retrieve('Magnetics3lab1',105396,1,[33],False)
# 142 ms without retrieve_t, 224 ms with it, including the failure (set True in the call above)

2.2.4. DSV

Acquisition module for data in a delimiter-separated value (DSV) format.

parameter    description
-----------  ------------------------------------------------------------
filename     Name of data file, with (shot) substitution string, e.g. /data/(shot).dat -> /data/12345.dat for shot 12345. (required)
delimiter    Delimiter character for values, e.g. , for comma-separated value (CSV) format. (optional; default is whitespace)

This module provides support for reading data from a plain text file via numpy’s genfromtxt function. The only required configuration parameter is filename, which can include a shot number substitution string (shot). As an example, consider the following datafile containing a 2-channel timeseries signal for shot number 12345:

# timebase   channel 1     channel 2
3.000000e+00 -1.201389e-01  3.177084e-01
3.000002e+00  6.437500e-01 -4.461806e-01
3.000004e+00  5.347222e-02 -1.684028e-01
3.000006e+00  1.923611e-01 -2.951390e-02
3.000008e+00  4.006945e-01 -5.156250e-01
3.000010e+00 -8.840278e-01  1.012153e+00
3.000012e+00  2.618056e-01 -2.031250e-01
3.000014e+00 -1.597222e-02 -1.336806e-01
3.000016e+00 -1.597222e-02  1.788194e-01
3.000018e+00  5.743055e-01 -7.586806e-01

If the datafile is saved at /data/mirnov_data_12345.txt, we could use the following configuration file:

[Acquisition:my_text_data]
acq_class = pyfusion.acquisition.DSV.acq.DSVAcquisition

[Diagnostic:mirnov_data]
data_fetcher = pyfusion.acquisition.DSV.fetch.DSVMultiChannelTimeseriesFetcher
filename = /data/mirnov_data_(shot).txt

And access the data with pyfusion:

>>> import pyfusion as pf
>>> acq = pf.getAcquisition("my_text_data")
>>> data = acq.getdata(12345, "mirnov_data")
>>> data.timebase
Timebase([ 3.      ,  3.000002,  3.000004,  3.000006,  3.000008,  3.00001 ,
        3.000012,  3.000014,  3.000016,  3.000018])
>>> data.signal[0]
Signal([-0.1201389 ,  0.64375   ,  0.05347222,  0.1923611 ,  0.4006945 ,
        -0.8840278 ,  0.2618056 , -0.01597222, -0.01597222,  0.5743055 ])
>>> data.signal[1]
Signal([ 0.3177084, -0.4461806, -0.1684028, -0.0295139, -0.515625 ,
        1.012153 , -0.203125 , -0.1336806,  0.1788194, -0.7586806])

By default, pyfusion expects values to be delimited by whitespace characters. The delimiting character can also be set in the configuration file. For example, the following datafile and configuration give the same result as the example above:

# timebase,   channel 1,     channel 2
3.000000e+00, -1.201389e-01,  3.177084e-01
3.000002e+00,  6.437500e-01, -4.461806e-01
3.000004e+00,  5.347222e-02, -1.684028e-01
3.000006e+00,  1.923611e-01, -2.951390e-02
3.000008e+00,  4.006945e-01, -5.156250e-01
3.000010e+00, -8.840278e-01,  1.012153e+00
3.000012e+00,  2.618056e-01, -2.031250e-01
3.000014e+00, -1.597222e-02, -1.336806e-01
3.000016e+00, -1.597222e-02,  1.788194e-01
3.000018e+00,  5.743055e-01, -7.586806e-01

where the configuration is:

[Acquisition:my_text_data]
acq_class = pyfusion.acquisition.DSV.acq.DSVAcquisition

[Diagnostic:mirnov_data]
data_fetcher = pyfusion.acquisition.DSV.fetch.DSVMultiChannelTimeseriesFetcher
filename = /data/mirnov_data_(shot).txt
delimiter = ,

Note that whitespace is stripped from configuration file values, so a whitespace delimiter cannot be set explicitly. To read whitespace-delimited data, as in the first example, simply omit the delimiter setting from your configuration.
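The reading logic can be sketched with the standard library. The DSV module itself uses numpy's genfromtxt. This stdlib version uses the hypothetical names dsv_filename and parse_dsv. It only illustrates four steps: the (shot) substitution, comment skipping, the choice between whitespace and an explicit delimiter, and transposing rows into a timebase plus channels.

```python
def dsv_filename(template, shot):
    """Replace the '(shot)' placeholder with the shot number."""
    return template.replace("(shot)", str(shot))


def parse_dsv(text, delimiter=None):
    """Return (timebase, channels) from delimiter-separated text.

    Text after '#' is treated as a comment; delimiter=None splits on
    runs of whitespace, like the default described above.
    """
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line:
            rows.append([float(field) for field in line.split(delimiter)])
    columns = list(zip(*rows))  # transpose rows into columns
    return list(columns[0]), [list(column) for column in columns[1:]]


ws_text = """# timebase   channel 1     channel 2
3.000000e+00 -1.201389e-01  3.177084e-01
3.000002e+00  6.437500e-01 -4.461806e-01
"""
csv_text = """# timebase,   channel 1,     channel 2
3.000000e+00, -1.201389e-01,  3.177084e-01
3.000002e+00,  6.437500e-01, -4.461806e-01
"""

print(dsv_filename("/data/mirnov_data_(shot).txt", 12345))
timebase, channels = parse_dsv(csv_text, delimiter=",")
print(timebase, channels)
```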

2.2.4.1. Classes

class pyfusion.acquisition.DSV.acq.DSVAcquisition(config_name=None, **kwargs)

class pyfusion.acquisition.DSV.fetch.DSVMultiChannelTimeseriesFetcher(acq, shot, config_name=None, **kwargs)

Fetch DSV data from specified filename.

This data fetcher uses two configuration parameters, filename (required) and delimiter (optional).

The filename parameter can include a substitution string (shot) which will be replaced with the shot number.

If the delimiter parameter is not provided, whitespace is used as the delimiter.

2.2.5. FakeData

Acquisition module for generating fake timeseries data for testing purposes.

At present, only a single channel sine wave generator is provided. Available configuration parameters are:

parameter    description
-----------  ------------------------------------------
t0           Starting time of signal timebase.
n_samples    Number of samples.
sample_freq  Sample frequency (Hz).
frequency    Frequency of test sine-wave signal (Hz).
amplitude    Amplitude of test sine-wave signal.

All parameters are required.

For example, with the following configuration:

[Acquisition:fake_acq]
acq_class = pyfusion.acquisition.FakeData.acq.FakeDataAcquisition

[Diagnostic:fake_data]
data_fetcher = pyfusion.acquisition.FakeData.fetch.SingleChannelSineFetcher
t0 = 0.0
n_samples = 1024
sample_freq = 1.e6
frequency = 2.e4
amplitude = 2.5

we can generate a 20 kHz sine wave:

>>> import pyfusion as pf
>>> shot = 12345
>>> acq = pf.getAcquisition("fake_acq")
>>> data = acq.getdata(shot, "fake_data")
>>> data.timebase
Timebase([  0.00000000e+00,   1.00000000e-06,   2.00000000e-06, ...,
         1.02100000e-03,   1.02200000e-03,   1.02300000e-03])
>>> data.signal
Signal([ 0.        ,  0.31333308,  0.62172472, ...,  1.20438419,
        0.92031138,  0.62172472])
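The printed values match amplitude * sin(2π · frequency · t) on the timebase t0 + n/sample_freq. The following is a minimal stdlib sketch under that assumption. single_channel_sine is a hypothetical name, not the fetcher's API, and the real fetcher returns TimeseriesData rather than plain lists.

```python
import math


def single_channel_sine(t0, n_samples, sample_freq, frequency, amplitude):
    """Return (timebase, signal) for amplitude * sin(2*pi*frequency*t)."""
    timebase = [t0 + i / sample_freq for i in range(n_samples)]
    signal = [amplitude * math.sin(2 * math.pi * frequency * t) for t in timebase]
    return timebase, signal


timebase, signal = single_channel_sine(
    t0=0.0, n_samples=1024, sample_freq=1.e6, frequency=2.e4, amplitude=2.5)
print(round(signal[1], 8))  # 0.31333308
```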

2.2.5.1. Classes

class pyfusion.acquisition.FakeData.acq.FakeDataAcquisition(config_name=None, **kwargs)

Acquisition class for generating fake data for testing purposes.

class pyfusion.acquisition.FakeData.fetch.SingleChannelSineFetcher(acq, shot, config_name=None, **kwargs)

Data fetcher for single channel sine wave.