Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8].

Public Member Functions
	__init__ (self, int max_radius, float threshold=None, float delta_s=None, int n_quantile=50, float min_quantile=0.5, str timespace="1d_space", *args, **kwargs)
	Construct an instance of the class.

Public Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker
np.array	run (self, np.array x, np.array y, np.array z, bool make_copy=True, str timespace=None, bool identify_only=False, *args, **kwargs)
	Mask strain reading anomalies with `NaN`s.

Public Member Functions inherited from fosanalysis.utils.base.Task
	__init__ (self, *args, **kwargs)

Public Member Functions inherited from fosanalysis.utils.base.Base
	__init__ (self, *args, **kwargs)
	Construct the object and warn about unused/unknown arguments.

Public Attributes
	delta_s = delta_s
	Setting for the threshold estimation.

	max_radius = max_radius
	The radius \(r_{\mathrm{max}} > 1\) of the largest sliding window used in the outlier candidate detection stage. It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.

	min_quantile = min_quantile
	The quantile, from which the cumulated density function of relative heights is kept for threshold estimation.

	n_quantile = n_quantile
	Granularity for the threshold estimation resampling.

	threshold = threshold
	Relative height threshold above which a pixel is flagged as SRA.

Protected Member Functions
np.array	_get_median_heights (self, z, radius)
	Get the height difference to the local vicinity of all the pixels.

tuple	_get_quantiles (self, np.array values)
	Get quantiles of the given data (including finite values only).

float	_get_threshold (self, np.array values)
	Estimate the anomaly threshold from the data.

list	_merge_groups (self, initial_groups)
	Merge all groups in the input that have at least one pairwise common entry.

np.array	_outlier_candidates (self, z, SRA_array)
	Detect outlier candidates in the given strain data.

tuple	_run_1d (self, np.array x, np.array z, np.array SRA_array, *args, **kwargs)
	Estimate which entries are strain reading anomalies in 1D.

tuple	_run_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, *args, **kwargs)
	Estimate which entries are strain reading anomalies in 2D.

np.array	_verify_candidates_1d (self, z, SRA_array)
	This is the second phase of the algorithm according to [7], adapted for 1D operation.

np.array	_verify_candidates_2d (self, z, SRA_array)
	This is the second phase of the algorithm according to [7], adapted for 2D operation.

Protected Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker
tuple	_map_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, str timespace=None, *args, **kwargs)
	Estimate which entries are strain reading anomalies in 2D.

Detailed Description

Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8].

The outlier detection is a two-stage algorithm. The first stage, the detection of outlier candidates, is based on the height difference between a pixel and the median height of its surroundings. If this height difference exceeds a threshold, the pixel is marked as an outlier candidate. The threshold can be estimated from the data, based on the change rate of the cumulated density function of all differences in the data. In the second stage, groups are formed, which are delimited by large differences between two neighboring pixels (like a simple edge detection). The threshold for this difference is estimated as in the first stage. The members of each group are then assigned outlier or normal status: groups consisting of outlier candidates only are considered outliers, all other groups are considered normal data. Finally, all outliers are converted to NaN.

Definition at line 289 of file masking.py.

Constructor & Destructor Documentation

◆ __init__()

fosanalysis.preprocessing.masking.OSCP.__init__	(		self,
		int	max_radius,
		float	threshold = None,
		float	delta_s = None,
		int	n_quantile = 50,
		float	min_quantile = 0.5,
		str	timespace = "1d_space",
		*	args,
		**	kwargs )

Construct an instance of the class.

Parameters

max_radius	The radius \(r_{\mathrm{max}} > 1\) of the largest sliding window used in the outlier candidate detection stage. It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.
delta_s	Setting for the threshold estimation. This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile: \[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \] where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\).
n_quantile	Granularity for the threshold estimation resampling. The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to `50`, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to `None`.
threshold	Relative height threshold above which a pixel is flagged as SRA. If set to `None` (default), it is estimated from the data using delta_s.
min_quantile	The quantile, from which the cumulated density function of relative heights is kept for threshold estimation. Defaults to `0.5`, which is the upper half.
timespace
*args	Additional positional arguments, will be passed to the superconstructor.
**kwargs	Additional keyword arguments, will be passed to the superconstructor.

Definition at line 309 of file masking.py.

Member Function Documentation

◆ _get_median_heights()

np.array fosanalysis.preprocessing.masking.OSCP._get_median_heights	(	self,
		z,
		radius )

protected

Get the height difference to the local vicinity of all the pixels.

The median height is retrieved by filtering.SlidingFilter. The local vicinity is determined by the inradius \(r\) of the quadratic sliding window (see filtering.SlidingFilter.radius). Then, the absolute difference between the array of medians and the pixels' values is returned.

Parameters

z	Array containing strain data.
radius	Inradius of the sliding window.
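
The relative height described above can be illustrated with a minimal 1D sketch in plain Python. It does not use filtering.SlidingFilter; the function name `median_heights`, the truncation of the window at the array borders, and the inclusion of the pixel itself in its window are assumptions made for this illustration only.

```python
import math
from statistics import median

def median_heights(z, radius):
    """Absolute difference between each entry and the median of its window.

    The window spans `radius` entries to each side of the entry and is
    truncated at the array borders (an assumption of this sketch).
    """
    heights = []
    for i, value in enumerate(z):
        window = z[max(0, i - radius): i + radius + 1]
        # Ignore NaN entries when taking the median.
        finite = [v for v in window if not math.isnan(v)]
        heights.append(abs(value - median(finite)) if finite else math.nan)
    return heights

strain = [10.0, 11.0, 10.5, 60.0, 10.8, 11.2, 10.9]
heights = median_heights(strain, radius=1)
print(heights)
```

The spike at index 3 yields by far the largest relative height, while its neighbors stay small because the median is insensitive to a single extreme value in the window.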

Definition at line 497 of file masking.py.

◆ _get_quantiles()

tuple fosanalysis.preprocessing.masking.OSCP._get_quantiles	(		self,
		np.array	values )

protected

Get quantiles of the given data (including finite values only).

Only quantiles above min_quantile are returned. If n_quantile is None, the upper part (> min_quantile) of the sorted values is returned. Else, the upper part is resampled into n_quantile + 1 points.

Parameters

values Array, for which to calculate the quantiles.
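
A rough sketch of the resampling described above, in plain Python. The function name `upper_quantiles`, the returned pair of quantile levels and values, and the linear interpolation between sorted values are assumptions of this sketch; the library may compute its quantiles differently.

```python
import math

def upper_quantiles(values, min_quantile=0.5, n_quantile=50):
    """Return (levels, heights) for the part of the CDF above min_quantile."""
    data = sorted(v for v in values if math.isfinite(v))
    n = len(data)
    if n == 0:
        return [], []
    if n_quantile is None:
        # No resampling: keep the upper part of the sorted values as is.
        start = int(min_quantile * n)
        levels = [i / max(n - 1, 1) for i in range(start, n)]
        return levels, data[start:]
    # Resample the upper part into n_quantile + 1 equally spaced levels.
    levels = [min_quantile + (1 - min_quantile) * k / n_quantile
              for k in range(n_quantile + 1)]
    heights = []
    for q in levels:
        pos = q * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        # Linear interpolation between the two neighboring sorted values.
        heights.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return levels, heights

sample = [float(v) for v in range(101)] + [float("nan"), float("inf")]
levels, heights = upper_quantiles(sample)
print(len(levels), heights[0], heights[-1])
```

With the defaults, the upper half of the CDF is covered by 51 points spaced 0.01 apart, which is the "percentage accuracy" mentioned for n_quantile.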

Definition at line 518 of file masking.py.

◆ _get_threshold()

float fosanalysis.preprocessing.masking.OSCP._get_threshold	(		self,
		np.array	values )

protected

Estimate the anomaly threshold from the data.

The threshold \(t\) is set to the point where the cumulated density function has leveled out, that is, where the required increase in value per increase in quantile exceeds delta_s. If threshold is set to None, it is determined from the data and delta_s; otherwise, it is simply returned.

Parameters

values Array, from which to estimate the threshold.
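
The slope criterion can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `estimate_threshold`, its argument layout, and the choice to return the value just before the first steep step are all hypothetical.

```python
def estimate_threshold(levels, heights, delta_s, threshold=None):
    """Return a fixed threshold, or estimate it from the slope of the CDF."""
    if threshold is not None:
        # A user-supplied threshold is simply returned.
        return threshold
    for k in range(1, len(heights)):
        d_cdf = levels[k] - levels[k - 1]
        d_h = heights[k] - heights[k - 1]
        # Required increase in value per increase in quantile.
        if d_cdf > 0 and d_h / d_cdf > delta_s:
            return heights[k - 1]
    # The slope never exceeds delta_s: fall back to the largest value.
    return heights[-1]

levels = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
heights = [1.0, 1.2, 1.5, 1.9, 2.4, 40.0]
estimated = estimate_threshold(levels, heights, delta_s=100.0)
print(estimated)  # 2.4
```

In this toy data, the slopes between neighboring points stay in the single digits until the last step, where the value jumps from 2.4 to 40.0; only that step exceeds delta_s.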

Definition at line 537 of file masking.py.

◆ _merge_groups()

list fosanalysis.preprocessing.masking.OSCP._merge_groups	(		self,
			initial_groups )

protected

Merge all groups in the input that have at least one pairwise common entry.

Each group is a set of tuples standing for the strain array indices. The result is a list of pairwise distinct groups, equivalent to the input.

Parameters

initial_groups List of input groups (sets of tuples).
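
One possible way to merge overlapping groups is sketched below in plain Python. The standalone function `merge_groups` is written for illustration and is not claimed to match the implementation in masking.py.

```python
def merge_groups(initial_groups):
    """Merge all sets sharing at least one entry into pairwise disjoint sets."""
    merged = []
    for group in initial_groups:
        group = set(group)
        disjoint = []
        for other in merged:
            if group & other:
                # A common entry connects both groups: absorb the other one.
                group |= other
            else:
                disjoint.append(other)
        disjoint.append(group)
        merged = disjoint
    return merged

groups = [{(0, 0), (0, 1)}, {(1, 1)}, {(0, 1), (1, 1)}, {(2, 2)}]
result = merge_groups(groups)
print(result)
```

Here the third group shares an index with each of the first two, so all three collapse into one group, while `{(2, 2)}` stays separate.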

Definition at line 557 of file masking.py.

◆ _outlier_candidates()

np.array fosanalysis.preprocessing.masking.OSCP._outlier_candidates	(	self,
		z,
		SRA_array )

protected

Detect outlier candidates in the given strain data.

This is the first phase according to [7]. For each radius \(r \in [1, r_{\mathrm{max}}]\), the relative height of all pixels is compared to the threshold.

Parameters

z	Array containing strain data.
SRA_array	Array indicating outlier candidates.

Returns: Returns an updated SRA_array, with outlier candidates.
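
The loop over the radii can be sketched in 1D as follows. The function name `outlier_candidates`, the fixed threshold passed as an argument, and the accumulation of flags over all radii are assumptions of this sketch.

```python
from statistics import median

def outlier_candidates(z, max_radius, threshold):
    """Flag entries whose relative height exceeds the threshold for any radius."""
    candidates = [False] * len(z)
    for radius in range(1, max_radius + 1):
        for i, value in enumerate(z):
            # Window truncated at the borders (assumption of this sketch).
            window = z[max(0, i - radius): i + radius + 1]
            if abs(value - median(window)) > threshold:
                candidates[i] = True
    return candidates

strain = [10.0, 10.0, 10.0, 50.0, 52.0, 10.0, 10.0, 10.0, 10.0]
small = outlier_candidates(strain, max_radius=1, threshold=20.0)
large = outlier_candidates(strain, max_radius=2, threshold=20.0)
print(small)  # no entry is flagged
print(large)  # entries 3 and 4 are flagged
```

The cluster of two neighboring outliers dominates the median of a window of radius 1 and therefore goes undetected, whereas a radius of 2 reveals it. This illustrates why max_radius limits the size of the largest detectable outlier cluster.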

Definition at line 384 of file masking.py.

◆ _run_1d()

tuple fosanalysis.preprocessing.masking.OSCP._run_1d	(		self,
		np.array	x,
		np.array	z,
		np.array	SRA_array,
		*	args,
		**	kwargs )

protected

Estimate which entries are strain reading anomalies in 1D.

This operation might be applied to a 2D array by _map_2d(). This function is called if:

z is 1D or

timespace is set to "1d_space" or "1d_time".

Parameters

x	Array of coordinate positions. Depending on timespace, it may hold: `x`: sensor coordinates (`timespace = "1d_space"`); `y`: time data (`timespace = "1d_time"`); or indices, if neither of the previous options matches the shape of `z`.
z	Array of strain data in accordance to `x` and `y`.
*args	Additional positional arguments to customize the behaviour.
**kwargs	Additional keyword arguments to customize the behaviour.

Returns: Returns a tuple like (x, z). They correspond to the input variables of the same name. Each of those might be changed.

Parameters

SRA_array Array of boolean values indicating SRAs by True and valid entries by False. This function returns the SRA_array instead of the z array.

Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.

Definition at line 359 of file masking.py.

◆ _run_2d()

tuple fosanalysis.preprocessing.masking.OSCP._run_2d	(		self,
		np.array	x,
		np.array	y,
		np.array	z,
		np.array	SRA_array,
		*	args,
		**	kwargs )

protected

Estimate which entries are strain reading anomalies in 2D.

Needs to be reimplemented by sub-classes. This function is only called, if z is 2D and timespace is "2D".

Parameters

x	Array of measuring point positions.
y	Array of time stamps.
z	Array of strain data in accordance to `x` and `y`.
*args	Additional positional arguments to customize the behaviour.
**kwargs	Additional keyword arguments to customize the behaviour.

Returns: Returns a tuple like (x, y, z). They correspond to the input variables of the same name. Each of those might be changed.

Parameters

SRA_array Array of boolean values indicating SRAs by True and valid entries by False. This function returns the SRA_array instead of the z array.

Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.

Definition at line 371 of file masking.py.

◆ _verify_candidates_1d()

np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_1d	(	self,
		z,
		SRA_array )

protected

This is the second phase of the algorithm according to [7], adapted for 1D operation.

Outlier candidates are verified as SRAs, by building groups, which are bordered by large enough increments between neighboring entries. The increment threshold is estimated by _get_threshold().

Three different types of groups are possible:

normal pixels only,
mixed normal pixels and outlier candidates,
outlier candidates only.

Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.

Parameters

z	Array containing strain data.
SRA_array	Array indicating outlier candidates.

Returns: Returns an updated SRA_array with the identified SRAs.
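
The group-based verification in 1D can be sketched as follows. The function name `verify_candidates_1d` is reused here for a standalone illustration only, and the increment threshold is passed in directly instead of being estimated by _get_threshold().

```python
def verify_candidates_1d(z, candidates, increment_threshold):
    """Keep only those candidates whose whole group consists of candidates."""
    if not z:
        return []
    groups, current = [], [0]
    for i in range(1, len(z)):
        # A large increment between neighbors closes the current group.
        if abs(z[i] - z[i - 1]) > increment_threshold:
            groups.append(current)
            current = []
        current.append(i)
    groups.append(current)
    verified = [False] * len(z)
    for group in groups:
        if all(candidates[i] for i in group):
            for i in group:
                verified[i] = True
    return verified

strain = [10.0, 10.5, 50.0, 51.0, 10.2, 10.4, 10.1]
candidates = [False, True, True, True, False, False, False]
verified = verify_candidates_1d(strain, candidates, increment_threshold=20.0)
print(verified)  # only entries 2 and 3 remain flagged
```

The candidate at index 1 shares its group with the normal entry at index 0, so it is reaccepted as normal data. The group formed by indices 2 and 3 contains candidates only and is therefore verified.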

Definition at line 400 of file masking.py.

◆ _verify_candidates_2d()

np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_2d	(	self,
		z,
		SRA_array )

protected

This is the second phase of the algorithm according to [7], adapted for 2D operation.

Outlier candidates are verified as SRAs, by building groups, which are bordered by large enough increments between neighboring entries. The increment threshold is estimated by _get_threshold().

Three different types of groups are possible:

normal pixels only,
mixed normal pixels and outlier candidates,
outlier candidates only.

Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.

Parameters

z	Array containing strain data.
SRA_array	Array indicating outlier candidates.

Returns: Returns an updated SRA_array with the identified SRAs.

Adaptation to 2D takes some more steps, because building the groups is not as straightforward as in 1D. This is not described in [7] and [8], so a detailed description of the approach taken is provided here. The detection of group boundaries is carried out separately for each direction: once along the space axis and once along the time axis, increments are calculated and the increment threshold is estimated by _get_threshold(). The next step (still separate for each direction) is generating groups of indices by iterating over the array's indices. A new group is started if

the current index is contained in the set of group boundaries (indices of the group's start) or
a new row (or column) is started (that is, the end of the array in this direction is reached and the iteration resumes with the first entry of the next line).

After all such groups are stored in a single list, the groups of indices are merged using _merge_groups(), until only pairwise distinct groups are left. If a pixel is contained in two groups, those groups are connected and merged into one. This results in non-rectangular shaped groups being built.

Finally, only groups consisting exclusively of candidates are verified as SRAs.
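
The 2D procedure described above can be sketched with nested lists in plain Python. Several points are assumptions of this sketch: rows are taken as time stamps and columns as measuring positions, the two increment thresholds are passed in instead of being estimated by _get_threshold(), and the merge step is a simple stand-in for _merge_groups().

```python
def verify_candidates_2d(z, candidates, threshold_space, threshold_time):
    """Verify candidates in 2D via groups built along rows and columns."""
    n_rows, n_cols = len(z), len(z[0])
    groups = []
    # Groups along the space direction: one pass per row.
    for r in range(n_rows):
        current = {(r, 0)}
        for c in range(1, n_cols):
            if abs(z[r][c] - z[r][c - 1]) > threshold_space:
                groups.append(current)
                current = set()
            current.add((r, c))
        groups.append(current)
    # Groups along the time direction: one pass per column.
    for c in range(n_cols):
        current = {(0, c)}
        for r in range(1, n_rows):
            if abs(z[r][c] - z[r - 1][c]) > threshold_time:
                groups.append(current)
                current = set()
            current.add((r, c))
        groups.append(current)
    # Merge groups sharing a pixel until they are pairwise disjoint.
    merged = []
    for group in groups:
        rest = []
        for other in merged:
            if group & other:
                group = group | other
            else:
                rest.append(other)
        rest.append(group)
        merged = rest
    # Only groups consisting of candidates only are verified.
    verified = [[False] * n_cols for _ in range(n_rows)]
    for group in merged:
        if all(candidates[r][c] for r, c in group):
            for r, c in group:
                verified[r][c] = True
    return verified

strain = [[10.0, 10.0, 10.0, 10.0],
          [10.0, 50.0, 51.0, 10.0],
          [10.0, 10.0, 10.0, 10.0]]
candidates = [[False, False, False, False],
              [False, True, True, False],
              [False, False, True, False]]
verified = verify_candidates_2d(strain, candidates, 20.0, 20.0)
print(verified)
```

In this toy array, the two elevated pixels in the middle row are separated from their surroundings by large increments in both directions and form a group of their own. The spurious candidate in the last row belongs to the large group of normal pixels and is reaccepted.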

Definition at line 432 of file masking.py.

Member Data Documentation

◆ delta_s

fosanalysis.preprocessing.masking.OSCP.delta_s = delta_s

Setting for the threshold estimation.

This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile:

\[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \]

where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\).
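
As a worked example with purely illustrative numbers: with the default settings n_quantile = 50 and min_quantile = 0.5, neighboring resampled points are spaced \(\Delta \mathrm{cdf}(H) = (1 - 0.5)/50 = 0.01\) apart. If the relative height rises by a hypothetical \(\Delta H = 3\) between two such points, the slope is

\[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} = \frac{3}{0.01} = 300 \]

so any delta_s below 300 would regard the CDF as leveled out at this point, whereas a delta_s above 300 would not.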

Definition at line 343 of file masking.py.

◆ max_radius

fosanalysis.preprocessing.masking.OSCP.max_radius = max_radius

The radius \(r_{\mathrm{max}} > 1\) of the largest sliding window used in the outlier candidate detection stage. It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.

Definition at line 334 of file masking.py.

◆ min_quantile

fosanalysis.preprocessing.masking.OSCP.min_quantile = min_quantile

The quantile, from which the cumulated density function of relative heights is kept for threshold estimation.

Defaults to 0.5, which is the upper half.

Definition at line 358 of file masking.py.

◆ n_quantile

fosanalysis.preprocessing.masking.OSCP.n_quantile = n_quantile

Granularity for the threshold estimation resampling.

The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to 50, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to None.

Definition at line 351 of file masking.py.

◆ threshold

fosanalysis.preprocessing.masking.OSCP.threshold = threshold

Relative height threshold above which a pixel is flagged as SRA.

If set to None (default), it is estimated from the data using delta_s.

Definition at line 354 of file masking.py.

The documentation for this class was generated from the following file:

src/fosanalysis/preprocessing/masking.py
