fosanalysis
A framework to evaluate distributed fiber optic sensor data
fosanalysis.preprocessing.masking.OSCP Class Reference

Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8]. More...


Public Member Functions

 __init__ (self, int max_radius, float threshold=None, float delta_s=None, int n_quantile=50, float min_quantile=0.5, str timespace="1d_space", *args, **kwargs)
 Construct an instance of the class.
 
- Public Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker
np.array run (self, np.array x, np.array y, np.array z, bool make_copy=True, str timespace=None, bool identify_only=False, *args, **kwargs)
 Mask strain reading anomalies with NaNs.
 
- Public Member Functions inherited from fosanalysis.utils.base.Task
 __init__ (self, *args, **kwargs)
 
- Public Member Functions inherited from fosanalysis.utils.base.Base
 __init__ (self, *args, **kwargs)
 Construct the object and warn about unused/unknown arguments.
 

Public Attributes

 delta_s = delta_s
 Setting for the threshold estimation.
 
 max_radius = max_radius
 The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.
 
 min_quantile = min_quantile
 The quantile, from which the cumulated density function of relative heights is kept for threshold estimation.
 
 n_quantile = n_quantile
 Granularity for the threshold estimation resampling.
 
 threshold = threshold
 Relative height threshold above which a pixel is flagged as SRA.
 

Protected Member Functions

np.array _get_median_heights (self, z, radius)
 Get the height difference to the local vicinity of all the pixels.
 
tuple _get_quantiles (self, np.array values)
 Get quantiles of the given data (including finite values only).
 
float _get_threshold (self, np.array values)
 Estimate the anomaly threshold from the data.
 
list _merge_groups (self, initial_groups)
 Merge all groups in the input that have at least one pairwise common entry.
 
np.array _outlier_candidates (self, z, SRA_array)
 Detect outlier candidates in the given strain data.
 
tuple _run_1d (self, np.array x, np.array z, np.array SRA_array, *args, **kwargs)
 Estimate which entries are strain reading anomalies in 1D.
 
tuple _run_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, *args, **kwargs)
 Estimate which entries are strain reading anomalies in 2D.
 
np.array _verify_candidates_1d (self, z, SRA_array)
 This is the second phase of the algorithm according to [7], adapted for 1D operation.
 
np.array _verify_candidates_2d (self, z, SRA_array)
 This is the second phase of the algorithm according to [7], adapted for 2D operation.
 
- Protected Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker
tuple _map_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, str timespace=None, *args, **kwargs)
 Estimate which entries are strain reading anomalies in 2D.
 

Detailed Description

Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8].

The outlier detection is a two-stage algorithm. The first stage, the detection of outlier candidates, is based on the height difference between a pixel and the median height of its surroundings. If this height difference exceeds a threshold, the pixel is marked as an outlier candidate. The threshold can be estimated from the data, based on the change rate of the cumulated density function of all differences in the data. In the second stage, groups are formed, delimited by large differences between two neighboring pixels (like a simple edge detection). The threshold for the difference is estimated as in the first stage. The members of the groups are then assigned outlier or normal status: groups consisting of outlier candidates only are considered outliers, all other groups are considered normal data. Finally, all outliers are converted to NaN.

Definition at line 289 of file masking.py.

Constructor & Destructor Documentation

◆ __init__()

fosanalysis.preprocessing.masking.OSCP.__init__ ( self,
int max_radius,
float threshold = None,
float delta_s = None,
int n_quantile = 50,
float min_quantile = 0.5,
str timespace = "1d_space",
* args,
** kwargs )

Construct an instance of the class.

Parameters
max_radius: The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.
delta_s: Setting for the threshold estimation. This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile:

\[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \]

where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\).
n_quantile: Granularity for the threshold estimation resampling. The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to 50, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to None.
threshold: Relative height threshold above which a pixel is flagged as SRA. If set to None (default), it is estimated from the data using delta_s.
min_quantile: The quantile, from which the cumulated density function of relative heights is kept for threshold estimation. Defaults to 0.5, which is the upper half.
timespace
*args: Additional positional arguments, passed on to the superconstructor.
**kwargs: Additional keyword arguments, passed on to the superconstructor.

Definition at line 309 of file masking.py.


Member Function Documentation

◆ _get_median_heights()

np.array fosanalysis.preprocessing.masking.OSCP._get_median_heights ( self,
z,
radius )
protected

Get the height difference to the local vicinity of all the pixels.

The median height is retrieved by filtering.SlidingFilter. The local vicinity is determined by the inradius \(r\) of the square sliding window (see filtering.SlidingFilter.radius). Then, the absolute difference between the array of medians and the pixels' values is returned.

Parameters
z: Array containing strain data.
radius: Inradius of the sliding window.
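
As a minimal sketch of this step for the 1D case (not the library's implementation, which relies on filtering.SlidingFilter), the relative heights could be computed as shown here. The function name median_heights_1d, the truncation of the window at the array ends, and the use of plain Python lists instead of NumPy arrays are assumptions made for illustration; NaN entries are not handled.

```python
from statistics import median


def median_heights_1d(z, radius):
    """Absolute difference between each entry and the median of its window.

    Hypothetical helper: the window spans `radius` entries to each side
    and is simply truncated at the array ends (an assumption).
    """
    n = len(z)
    heights = []
    for i, value in enumerate(z):
        window = z[max(0, i - radius):min(n, i + radius + 1)]
        heights.append(abs(value - median(window)))
    return heights


strain = [10.0, 11.0, 10.5, 250.0, 11.5, 12.0, 11.0]
print(median_heights_1d(strain, radius=1))
# The spike at index 3 yields 238.5, all other entries yield 0.5.
```

The isolated spike stands out clearly, because the median of its window is barely affected by the spike itself.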

Definition at line 497 of file masking.py.


◆ _get_quantiles()

tuple fosanalysis.preprocessing.masking.OSCP._get_quantiles ( self,
np.array values )
protected

Get quantiles of the given data (including finite values only).

Only quantiles above min_quantile are returned. If n_quantile is None, the upper part (> min_quantile) of the sorted values is returned. Else, the upper part is resampled into n_quantile + 1 points.

Parameters
values: Array for which to calculate the quantiles.
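
The two modes described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the name upper_quantiles, the returned pair (quantile levels, values), the linear interpolation between sorted samples, and the requirement of at least two finite values are all hypothetical choices.

```python
import math


def upper_quantiles(values, min_quantile=0.5, n_quantile=50):
    """Return (levels, heights) for the part of the data above min_quantile."""
    finite = sorted(v for v in values if math.isfinite(v))
    n = len(finite)
    if n_quantile is None:
        # No resampling: keep the sorted samples whose empirical
        # quantile level lies above min_quantile.
        levels = [i / (n - 1) for i in range(n)]
        kept = [(q, v) for q, v in zip(levels, finite) if q > min_quantile]
        return [q for q, _ in kept], [v for _, v in kept]
    # Resampling: n_quantile + 1 evenly spaced levels in [min_quantile, 1].
    step = (1.0 - min_quantile) / n_quantile
    levels = [min_quantile + k * step for k in range(n_quantile + 1)]
    heights = []
    for q in levels:
        pos = min(q, 1.0) * (n - 1)  # fractional index into the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        heights.append(finite[lo] + (pos - lo) * (finite[hi] - finite[lo]))
    return levels, heights


data = [0.0, 1.0, 2.0, 3.0, 4.0, float("nan"), float("inf")]
print(upper_quantiles(data, min_quantile=0.5, n_quantile=2))
print(upper_quantiles(data, min_quantile=0.5, n_quantile=None))
```

Resampling bounds the number of points the subsequent threshold estimation has to inspect, independent of the size of the input.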

Definition at line 518 of file masking.py.


◆ _get_threshold()

float fosanalysis.preprocessing.masking.OSCP._get_threshold ( self,
np.array values )
protected

Estimate the anomaly threshold from the data.

The threshold \(t\) is set to the point where the cumulated density function is leveled out, that is, where the required increase in value per increase in quantile exceeds delta_s. If threshold is set to None, it is determined from the data and delta_s; otherwise, it is simply returned.

Parameters
values: Array from which to estimate the threshold.
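
A possible reading of this rule is sketched below. It assumes that the quantile levels and the corresponding relative heights are already available as two ascending lists, and it returns the last height before the slope \(\Delta H / \Delta \mathrm{cdf}(H)\) first exceeds delta_s. Which of the two neighboring heights the library actually returns is not stated here, so that choice, like the function name estimate_threshold and the fallback to the largest height, is an assumption.

```python
def estimate_threshold(levels, heights, delta_s, threshold=None):
    """Return the height at which the CDF is considered leveled out."""
    if threshold is not None:
        # A fixed threshold takes precedence over the estimation.
        return threshold
    for k in range(1, len(levels)):
        d_level = levels[k] - levels[k - 1]
        d_height = heights[k] - heights[k - 1]
        # Required increase in value per increase in quantile.
        if d_level > 0 and d_height / d_level > delta_s:
            return heights[k - 1]
    # Assumption: if the slope never exceeds delta_s, nothing is flagged,
    # so the largest height is returned.
    return heights[-1]


levels = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
heights = [1.0, 1.2, 1.5, 2.0, 40.0, 300.0]
print(estimate_threshold(levels, heights, delta_s=100.0))  # prints 2.0
```

In the example, the slope stays below 10 up to the 0.8 quantile and jumps to roughly 380 afterwards, so every relative height above 2.0 would be treated as anomalous.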

Definition at line 537 of file masking.py.


◆ _merge_groups()

list fosanalysis.preprocessing.masking.OSCP._merge_groups ( self,
initial_groups )
protected

Merge all groups in the input that have at least one pairwise common entry.

Each group is a set of tuples standing for the strain array indices. The result is a list of pairwise disjoint groups, equivalent to the input.

Parameters
initial_groups: List of input groups (sets of tuples).
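
One straightforward way to perform such a merge is sketched below; it is an illustration, not necessarily the strategy used in masking.py. Each incoming group absorbs every already merged group it shares an entry with, which keeps the list of merged groups pairwise disjoint at all times.

```python
def merge_groups(initial_groups):
    """Merge all groups that share at least one entry."""
    merged = []
    for group in initial_groups:
        group = set(group)
        remaining = []
        for other in merged:
            if group & other:
                group |= other  # common entry found: absorb the other group
            else:
                remaining.append(other)
        remaining.append(group)
        merged = remaining
    return merged


groups = [{(0, 0), (0, 1)}, {(1, 1)}, {(0, 1), (1, 1)}, {(2, 2)}]
print(merge_groups(groups))
# Two groups remain: {(0, 0), (0, 1), (1, 1)} and {(2, 2)}.
```

The third input group bridges the first two, so all three collapse into a single group, while {(2, 2)} stays separate.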

Definition at line 557 of file masking.py.


◆ _outlier_candidates()

np.array fosanalysis.preprocessing.masking.OSCP._outlier_candidates ( self,
z,
SRA_array )
protected

Detect outlier candidates in the given strain data.

This is the first phase according to [7]. For each radius \(r \in [1, r_{\mathrm{max}}]\), the relative height of all pixels is compared to the threshold.

Parameters
z: Array containing strain data.
SRA_array: Array indicating outlier candidates.
Returns
Returns an updated SRA_array, with outlier candidates.
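
The loop over the radii can be illustrated with the following 1D sketch. It assumes a fixed, user-supplied threshold (whether the library re-estimates the threshold for each radius is not covered here), truncates the window at the array ends, and uses the hypothetical name outlier_candidates_1d.

```python
from statistics import median


def outlier_candidates_1d(z, max_radius, threshold):
    """Flag entries whose relative height exceeds threshold for any radius."""
    n = len(z)
    candidates = [False] * n
    for radius in range(1, max_radius + 1):
        for i in range(n):
            window = z[max(0, i - radius):i + radius + 1]
            if abs(z[i] - median(window)) > threshold:
                candidates[i] = True
    return candidates


strain = [10, 10, 10, 90, 95, 10, 10, 10, 10]
print(outlier_candidates_1d(strain, max_radius=1, threshold=20))
print(outlier_candidates_1d(strain, max_radius=2, threshold=20))
```

With max_radius=1 the cluster of two anomalous readings dominates its own window and goes undetected; with max_radius=2 both readings are flagged. This is the trade-off governed by max_radius: larger windows detect larger clusters, but also flag larger genuine features.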

Definition at line 384 of file masking.py.


◆ _run_1d()

tuple fosanalysis.preprocessing.masking.OSCP._run_1d ( self,
np.array x,
np.array z,
np.array SRA_array,
* args,
** kwargs )
protected

Estimate which entries are strain reading anomalies in 1D.

This operation might be applied to a 2D array by _map_2d(). This function is called if:

  • z is 1D or
  • timespace is set to "1d_space" or "1d_time".

Parameters
x: Array of coordinate positions. Depending on timespace, it may hold:
  • x: sensor coordinates (timespace = "1d_space"),
  • y: time data (timespace = "1d_time"),
  • indices, if neither of the previous options matches the shape of z.
z: Array of strain data in accordance with x and y.
SRA_array: Array of boolean values indicating SRAs by True and valid entries by False. This function returns the SRA_array instead of the z array.
*args: Additional positional arguments to customize the behaviour.
**kwargs: Additional keyword arguments to customize the behaviour.
Returns
Returns a tuple like (x, z). They correspond to the input variables of the same name. Each of those might be changed.

Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.

Definition at line 359 of file masking.py.


◆ _run_2d()

tuple fosanalysis.preprocessing.masking.OSCP._run_2d ( self,
np.array x,
np.array y,
np.array z,
np.array SRA_array,
* args,
** kwargs )
protected

Estimate which entries are strain reading anomalies in 2D.

Needs to be reimplemented by sub-classes. This function is only called if z is 2D and timespace is "2D".

Parameters
x: Array of measuring point positions.
y: Array of time stamps.
z: Array of strain data in accordance with x and y.
SRA_array: Array of boolean values indicating SRAs by True and valid entries by False. This function returns the SRA_array instead of the z array.
*args: Additional positional arguments to customize the behaviour.
**kwargs: Additional keyword arguments to customize the behaviour.
Returns
Returns a tuple like (x, y, z). They correspond to the input variables of the same name. Each of those might be changed.

Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.

Definition at line 371 of file masking.py.


◆ _verify_candidates_1d()

np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_1d ( self,
z,
SRA_array )
protected

This is the second phase of the algorithm according to [7], adapted for 1D operation.

Outlier candidates are verified as SRAs, by building groups, which are bordered by large enough increments between neighboring entries. The increment threshold is estimated by _get_threshold().

Three different types of groups are possible:

  1. normal pixels only,
  2. mixed normal pixels and outlier candidates,
  3. outlier candidates only.

Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.

Parameters
z: Array containing strain data.
SRA_array: Array indicating outlier candidates.
Returns
Returns an updated SRA_array with the identified SRAs.
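
The grouping and verification in 1D can be sketched as follows. The sketch assumes that the increment threshold is passed in directly (the class estimates it with _get_threshold()) and that a group boundary lies between two neighboring entries whose absolute difference exceeds this threshold. The function name verify_candidates_1d is hypothetical.

```python
def verify_candidates_1d(z, candidates, increment_threshold):
    """Keep only candidates that form groups of candidates exclusively."""
    n = len(z)
    verified = [False] * n
    start = 0
    for i in range(1, n + 1):
        # A group ends at the end of the array or in front of a large jump.
        if i == n or abs(z[i] - z[i - 1]) > increment_threshold:
            if all(candidates[start:i]):
                for k in range(start, i):
                    verified[k] = True
            start = i
    return verified


strain = [10, 10, 10, 90, 95, 10, 10, 12, 10]
candidates = [False, False, False, True, True, False, False, True, False]
print(verify_candidates_1d(strain, candidates, increment_threshold=20))
# Only indices 3 and 4 remain flagged.
```

The candidate at index 7 sits in a mixed group together with normal pixels and is therefore reaccepted as normal data, while the group at indices 3 and 4 consists of candidates only and is confirmed.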

Definition at line 400 of file masking.py.


◆ _verify_candidates_2d()

np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_2d ( self,
z,
SRA_array )
protected

This is the second phase of the algorithm according to [7], adapted for 2D operation.

Outlier candidates are verified as SRAs, by building groups, which are bordered by large enough increments between neighboring entries. The increment threshold is estimated by _get_threshold().

Three different types of groups are possible:

  1. normal pixels only,
  2. mixed normal pixels and outlier candidates,
  3. outlier candidates only.

Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.

Parameters
z: Array containing strain data.
SRA_array: Array indicating outlier candidates.
Returns
Returns an updated SRA_array with the identified SRAs.

Adaptation to 2D takes some more steps, because building the groups is not as straightforward as in 1D. This is not described in [7] and [8], so a detailed description of the approach taken is provided here. The detection of group boundaries is carried out separately for each direction: once along the space axis and once along the time axis, increments are calculated and the increment threshold is estimated by _get_threshold(). The next step (still separate for each direction) is generating groups of indices by iterating over the array's indices. A new group is started if

  • the current index is contained in the set of group boundaries (indices of the group's start) or
  • a new row (or column) is started, that is, the end of the array in this direction is reached and the iteration resumes with the first entry of the next line.

After all such groups are stored in a single list, the groups of indices are merged using _merge_groups(), until only pairwise distinct groups are left. If a pixel is contained in two groups, those groups are connected and merged into one. This results in non-rectangular shaped groups being built.

Finally, only groups consisting exclusively of candidates are verified as SRAs.
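
The procedure described above can be condensed into the following sketch. It is an illustration under assumptions: z and the candidate mask are nested lists, rows are taken to run along the space axis and columns along the time axis, and the two increment thresholds are passed in rather than estimated. All function names are hypothetical.

```python
def split_line(indices, z, threshold):
    """Split one row or column into groups at large increments."""
    groups, current = [], [indices[0]]
    for prev, cur in zip(indices, indices[1:]):
        if abs(z[cur[0]][cur[1]] - z[prev[0]][prev[1]]) > threshold:
            groups.append(set(current))  # boundary: close the current group
            current = []
        current.append(cur)
    groups.append(set(current))
    return groups


def merge_groups(initial_groups):
    """Merge all groups that share at least one index."""
    merged = []
    for group in initial_groups:
        group = set(group)
        remaining = []
        for other in merged:
            if group & other:
                group |= other
            else:
                remaining.append(other)
        merged = remaining + [group]
    return merged


def verify_candidates_2d(z, candidates, thr_space, thr_time):
    """Confirm candidates that form merged groups of candidates only."""
    rows, cols = len(z), len(z[0])
    groups = []
    for r in range(rows):  # along the space axis (assumed: within a row)
        groups += split_line([(r, c) for c in range(cols)], z, thr_space)
    for c in range(cols):  # along the time axis (assumed: within a column)
        groups += split_line([(r, c) for r in range(rows)], z, thr_time)
    verified = [[False] * cols for _ in range(rows)]
    for group in merge_groups(groups):
        if all(candidates[r][c] for r, c in group):
            for r, c in group:
                verified[r][c] = True
    return verified


z = [[10, 10, 10], [10, 90, 10], [10, 10, 10]]
cand = [[False, False, False], [False, True, False], [False, False, False]]
print(verify_candidates_2d(z, cand, thr_space=20, thr_time=20))
```

In the example, the eight border pixels are connected through the row and column groups and merge into one ring-shaped (non-rectangular) group of normal data, while the central pixel is isolated by large increments in both directions and is confirmed as an SRA.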

Definition at line 432 of file masking.py.


Member Data Documentation

◆ delta_s

fosanalysis.preprocessing.masking.OSCP.delta_s = delta_s

Setting for the threshold estimation.

This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile:

\[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \]

where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\).

Definition at line 343 of file masking.py.

◆ max_radius

fosanalysis.preprocessing.masking.OSCP.max_radius = max_radius

The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.

Definition at line 334 of file masking.py.

◆ min_quantile

fosanalysis.preprocessing.masking.OSCP.min_quantile = min_quantile

The quantile, from which the cumulated density function of relative heights is kept for threshold estimation.

Defaults to 0.5, which is the upper half.

Definition at line 358 of file masking.py.

◆ n_quantile

fosanalysis.preprocessing.masking.OSCP.n_quantile = n_quantile

Granularity for the threshold estimation resampling.

The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to 50, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to None.

Definition at line 351 of file masking.py.

◆ threshold

fosanalysis.preprocessing.masking.OSCP.threshold = threshold

Relative height threshold above which a pixel is flagged as SRA.

If set to None (default), it is estimated from the data using delta_s.

Definition at line 354 of file masking.py.


The documentation for this class was generated from the following file: