fosanalysis
A framework to evaluate distributed fiber optic sensor data
Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8].
Public Member Functions | |
| __init__ (self, int max_radius, float threshold=None, float delta_s=None, int n_quantile=50, float min_quantile=0.5, str timespace="1d_space", *args, **kwargs) | |
| Construct an instance of the class. | |
Public Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker | |
| np.array | run (self, np.array x, np.array y, np.array z, bool make_copy=True, str timespace=None, bool identify_only=False, *args, **kwargs) |
Mask strain reading anomalies with NaNs. | |
Public Member Functions inherited from fosanalysis.utils.base.Task | |
| __init__ (self, *args, **kwargs) | |
Public Member Functions inherited from fosanalysis.utils.base.Base | |
| __init__ (self, *args, **kwargs) | |
| Construct the object and warn about unused/unknown arguments. | |
Public Attributes | |
| delta_s = delta_s | |
| Setting for the threshold estimation. | |
| max_radius = max_radius | |
| The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature. | |
| min_quantile = min_quantile | |
| The quantile, from which the cumulated density function of relative heights is kept for threshold estimation. | |
| n_quantile = n_quantile | |
| Granularity for the threshold estimation resampling. | |
| threshold = threshold | |
| Relative height threshold above which a pixel is flagged as SRA. | |
Protected Member Functions | |
| np.array | _get_median_heights (self, z, radius) |
| Get the height difference to the local vicinity of all the pixels. | |
| tuple | _get_quantiles (self, np.array values) |
| Get quantiles of the given data (including finite values only). | |
| float | _get_threshold (self, np.array values) |
| Estimate the anomaly threshold from the data. | |
| list | _merge_groups (self, initial_groups) |
| Merge all groups in the input that have at least one pairwise common entry. | |
| np.array | _outlier_candidates (self, z, SRA_array) |
| Detect outlier candidates in the given strain data. | |
| tuple | _run_1d (self, np.array x, np.array z, np.array SRA_array, *args, **kwargs) |
| Estimate which entries are strain reading anomalies in 1D. | |
| tuple | _run_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, *args, **kwargs) |
| Estimate which entries are strain reading anomalies in 2D. | |
| np.array | _verify_candidates_1d (self, z, SRA_array) |
| This is the second phase of the algorithm according to [7], adapted for 1D operation. | |
| np.array | _verify_candidates_2d (self, z, SRA_array) |
| This is the second phase of the algorithm according to [7], adapted for 2D operation. | |
Protected Member Functions inherited from fosanalysis.preprocessing.masking.AnomalyMasker | |
| tuple | _map_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, str timespace=None, *args, **kwargs) |
| Estimate which entries are strain reading anomalies in 2D. | |
Class for outlier detection and cancellation based on the outlier specific correction procedure (OSCP) as originally presented in [7] and [8].
The outlier detection is a two-stage algorithm. The first stage, the detection of outlier candidates, is based on the height difference between a pixel and the median height of its surroundings. If this height difference exceeds a threshold, the pixel is marked as an outlier candidate. The threshold can be estimated from the data, based on the change rate of the cumulated density function of all differences in the data.
In the second stage, groups are formed, which are bounded by large differences between two neighboring pixels (like a simple edge detection). The threshold for the difference is estimated as in the first stage. The members of the groups are then assigned outlier or normal status: groups consisting of outlier candidates only are considered outliers, all other groups are considered normal data. Finally, all outliers are converted to NaN.
Definition at line 289 of file masking.py.
| fosanalysis.preprocessing.masking.OSCP.__init__ | ( | self, | |
| int | max_radius, | ||
| float | threshold = None, | ||
| float | delta_s = None, | ||
| int | n_quantile = 50, | ||
| float | min_quantile = 0.5, | ||
| str | timespace = "1d_space", | ||
| * | args, | ||
| ** | kwargs ) |
Construct an instance of the class.
| max_radius | The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature. |
| delta_s | Setting for the threshold estimation. This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile: \[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \] where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\). |
| n_quantile | Granularity for the threshold estimation resampling. The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to 50, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to None. |
| threshold | Relative height threshold above which a pixel is flagged as SRA. If set to None (default), it is estimated from the data using delta_s. |
| min_quantile | The quantile, from which the cumulated density function of relative heights is kept for threshold estimation. Defaults to 0.5, which is the upper half. |
| timespace | |
| *args | Additional positional arguments, will be passed to the superconstructor. |
| **kwargs | Additional keyword arguments, will be passed to the superconstructor. |
Definition at line 309 of file masking.py.
| np.array fosanalysis.preprocessing.masking.OSCP._get_median_heights (self, z, radius) | protected |
Get the height difference to the local vicinity of all the pixels.
The median height is retrieved by filtering.SlidingFilter. The local vicinity is determined by the inradius \(r\) of the square sliding window (see filtering.SlidingFilter.radius). Then, the absolute difference between the array of medians and the pixels' values is returned.
| z | Array containing strain data. |
| radius | Inradius of the sliding window. |
Definition at line 497 of file masking.py.
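As a rough illustration of the idea (not the library's implementation, which relies on filtering.SlidingFilter), the following sketch computes the height difference with plain NumPy. The function name `median_heights_2d` and the NaN padding at the array edges are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_heights_2d(z: np.ndarray, radius: int) -> np.ndarray:
    """Absolute difference between each pixel and its local median.

    Hypothetical helper: the square window has a side length of
    2 * radius + 1. Edges are padded with NaN (an assumption), which
    the median then ignores.
    """
    z = np.asarray(z, dtype=float)
    side = 2 * radius + 1
    padded = np.pad(z, radius, mode="constant", constant_values=np.nan)
    # Shape (rows, cols, side, side): one window per pixel of z.
    windows = sliding_window_view(padded, (side, side))
    local_median = np.nanmedian(windows, axis=(-2, -1))
    return np.abs(z - local_median)


# A single spike in otherwise flat data stands out by its full height.
z = np.ones((3, 3))
z[1, 1] = 10.0
heights = median_heights_2d(z, radius=1)
```

In this example only the central pixel receives a non-zero height, because the median of every window is dominated by the flat surroundings.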
| tuple fosanalysis.preprocessing.masking.OSCP._get_quantiles (self, np.array values) | protected |
Get quantiles of the given data (including finite values only).
Only quantiles above min_quantile are returned. If n_quantile is None, the upper part (> min_quantile) of the sorted values is returned. Else, the upper part is resampled into n_quantile + 1 points.
| values | Array, for which to calculate the quantiles. |
Definition at line 518 of file masking.py.
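The resampling described above might look roughly like the following sketch. The function name `upper_quantiles`, the returned pair of quantile levels and values, and the way levels are assigned when `n_quantile` is None are assumptions for illustration and are not taken from masking.py.

```python
import numpy as np


def upper_quantiles(values, min_quantile=0.5, n_quantile=50):
    """Return (levels, heights) for the part of the data above min_quantile.

    Hypothetical helper: only finite values are considered.
    """
    values = np.asarray(values, dtype=float).ravel()
    finite = np.sort(values[np.isfinite(values)])
    if n_quantile is None:
        # No resampling: keep the upper part of the sorted values.
        levels = np.linspace(0.0, 1.0, finite.size)
        keep = levels > min_quantile
        return levels[keep], finite[keep]
    # Resample the upper part into n_quantile + 1 points.
    levels = np.linspace(min_quantile, 1.0, n_quantile + 1)
    return levels, np.quantile(finite, levels)


# 101 evenly spaced values plus one NaN, which is ignored.
values = np.append(np.arange(101.0), np.nan)
levels, heights = upper_quantiles(values)
```

With the defaults, the upper half is sampled in steps of one percent, which matches the "percentage accuracy" mentioned for n_quantile.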
| float fosanalysis.preprocessing.masking.OSCP._get_threshold (self, np.array values) | protected |
Estimate the anomaly threshold from the data.
The threshold \(t\) is set to the point where the cumulated density function is leveled out, that is, where the required increase in value per increase in quantile exceeds delta_s. If threshold is set to None, it is determined from the data and delta_s; otherwise it is simply returned.
| values | Array, from which to estimate the threshold. |
Definition at line 537 of file masking.py.
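A minimal sketch of this slope criterion, assuming the quantile levels and the corresponding heights are already available as arrays. The function name `estimate_threshold`, the choice of the lower end of the first steep segment as the threshold, and the fallback to the largest height are assumptions made for this example.

```python
import numpy as np


def estimate_threshold(levels, heights, delta_s):
    """Return the height at which the slope dH / dcdf first exceeds delta_s.

    Hypothetical helper: if the slope never exceeds delta_s, the largest
    height is returned, so that nothing would be flagged.
    """
    levels = np.asarray(levels, dtype=float)
    heights = np.asarray(heights, dtype=float)
    slopes = np.diff(heights) / np.diff(levels)
    steep = np.nonzero(slopes > delta_s)[0]
    if steep.size == 0:
        return float(heights[-1])
    return float(heights[steep[0]])


levels = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
heights = np.array([1.0, 1.1, 1.2, 1.3, 5.0, 9.0])
# The slope is about 1 up to the 0.8 quantile and about 37 afterwards.
threshold = estimate_threshold(levels, heights, delta_s=10.0)
```

Here the heights rise slowly up to the 0.8 quantile and steeply afterwards, so the estimated threshold is the height at the 0.8 quantile.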
| list fosanalysis.preprocessing.masking.OSCP._merge_groups (self, initial_groups) | protected |
Merge all groups in the input that have at least one pairwise common entry.
Each group is a set of tuples representing strain array indices. The result is a list of pairwise disjoint groups, equivalent to the input.
| initial_groups | List of input groups (sets of tuples). |
Definition at line 557 of file masking.py.
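One possible way to perform such a merge is sketched below. It is an illustration under the stated description, not the code from masking.py; the function name `merge_groups` is chosen for this example.

```python
def merge_groups(initial_groups):
    """Merge all groups that share at least one entry.

    The list `merged` is kept pairwise disjoint at all times, so every
    new group only has to be compared against the groups merged so far.
    """
    merged = []
    for group in initial_groups:
        group = set(group)
        disjoint = []
        for existing in merged:
            if existing & group:
                # Common entry found: absorb the existing group.
                group |= existing
            else:
                disjoint.append(existing)
        merged = disjoint + [group]
    return merged


groups = [
    {(0, 0), (0, 1)},
    {(2, 2)},
    {(0, 1), (1, 1)},
    {(1, 1), (1, 2)},
]
result = merge_groups(groups)
```

The first, third and fourth input groups are chained together by their common entries, while `{(2, 2)}` stays on its own.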
| np.array fosanalysis.preprocessing.masking.OSCP._outlier_candidates (self, z, SRA_array) | protected |
Detect outlier candidates in the given strain data.
This is the first phase according to [7]. For each radius \(r \in [1, r_{\mathrm{max}}]\), the relative height of all pixels is compared to the threshold.
| z | Array containing strain data. |
| SRA_array | Array indicating outlier candidates. |
Returns SRA_array, with outlier candidates.
Definition at line 384 of file masking.py.
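The effect of iterating over several radii can be illustrated with a 1D sketch. The function name `outlier_candidates_1d`, the fixed threshold, and the NaN padding at the edges are assumptions for this example; the library may estimate the threshold from the data instead.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def outlier_candidates_1d(z, max_radius, threshold):
    """Flag entries whose relative height exceeds threshold for any radius.

    Hypothetical helper working on 1D data with a fixed threshold.
    """
    z = np.asarray(z, dtype=float)
    candidates = np.zeros(z.shape, dtype=bool)
    for radius in range(1, max_radius + 1):
        # Pad with NaN (an assumption) so that edge entries get a window.
        padded = np.pad(z, radius, mode="constant", constant_values=np.nan)
        windows = sliding_window_view(padded, 2 * radius + 1)
        heights = np.abs(z - np.nanmedian(windows, axis=1))
        candidates |= heights > threshold
    return candidates


# A cluster of two outliers in otherwise flat data.
z = np.array([1.0, 1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 1.0])
small = outlier_candidates_1d(z, max_radius=1, threshold=4.0)
large = outlier_candidates_1d(z, max_radius=2, threshold=4.0)
```

With `max_radius=1` the cluster of two dominates the median of its own window and is missed, whereas `max_radius=2` flags both entries. This illustrates why the largest radius limits the size of the largest detectable outlier cluster.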
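| tuple fosanalysis.preprocessing.masking.OSCP._run_1d (self, np.array x, np.array z, np.array SRA_array, *args, **kwargs) | protected |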
Estimate which entries are strain reading anomalies in 1D.
This operation might be applied to a 2D array by _map_2d(). This function is called if z is 1D or timespace is "1d_space" or "1d_time".
| x | Array of coordinate positions. What it holds depends on timespace. |
| z | Array of strain data in accordance to x and y. |
| *args | Additional positional arguments to customize the behaviour. |
| **kwargs | Additional keyword arguments to customize the behaviour. |
Returns a tuple like (x, z). They correspond to the input variables of the same name. Each of those might be changed.
| SRA_array | Array of boolean values indicating SRAs by True and valid entries by False. |
This function returns the SRA_array instead of the z array.
Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.
Definition at line 359 of file masking.py.
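| tuple fosanalysis.preprocessing.masking.OSCP._run_2d (self, np.array x, np.array y, np.array z, np.array SRA_array, *args, **kwargs) | protected |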
Estimate which entries are strain reading anomalies in 2D.
Needs to be reimplemented by sub-classes. This function is only called, if z is 2D and timespace is "2D".
| x | Array of measuring point positions. |
| y | Array of time stamps. |
| z | Array of strain data in accordance to x and y. |
| *args | Additional positional arguments to customize the behaviour. |
| **kwargs | Additional keyword arguments to customize the behaviour. |
Returns a tuple like (x, y, z). They correspond to the input variables of the same name. Each of those might be changed.
| SRA_array | Array of boolean values indicating SRAs by True and valid entries by False. |
This function returns the SRA_array instead of the z array.
Reimplemented from fosanalysis.preprocessing.masking.AnomalyMasker.
Definition at line 371 of file masking.py.
| np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_1d (self, z, SRA_array) | protected |
This is the second phase of the algorithm according to [7], adapted for 1D operation.
Outlier candidates are verified as SRAs by building groups that are bordered by sufficiently large increments between neighboring entries. The increment threshold is estimated by _get_threshold().
Three different types of groups are possible: groups of normal entries only, mixed groups of normal entries and outlier candidates, and groups of outlier candidates only.
Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.
| z | Array containing strain data. |
| SRA_array | Array indicating outlier candidates. |
Returns SRA_array with the identified SRAs.
Definition at line 400 of file masking.py.
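The verification step in 1D can be sketched as follows. The function name `verify_candidates_1d` and the fixed `increment_threshold` argument are assumptions for this example; in the library the increment threshold is estimated by _get_threshold().

```python
import numpy as np


def verify_candidates_1d(z, candidates, increment_threshold):
    """Keep only candidates whose whole group consists of candidates.

    Hypothetical helper: groups are runs of neighboring entries that
    are separated by increments larger than increment_threshold.
    """
    z = np.asarray(z, dtype=float)
    candidates = np.asarray(candidates, dtype=bool)
    # A boundary lies between entries i and i + 1 if the jump is large.
    jumps = np.abs(np.diff(z)) > increment_threshold
    boundaries = np.nonzero(jumps)[0] + 1
    verified = np.zeros(candidates.shape, dtype=bool)
    for group in np.split(np.arange(z.size), boundaries):
        if group.size and candidates[group].all():
            verified[group] = True
    return verified


z = np.array([1.0, 1.1, 9.0, 9.2, 1.0, 1.2, 1.1])
candidates = np.array([False, True, True, True, False, False, False])
verified = verify_candidates_1d(z, candidates, increment_threshold=4.0)
```

The candidate at index 1 belongs to a mixed group and is reaccepted as normal data, while the group at indices 2 and 3 consists of candidates only and is verified.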
| np.array fosanalysis.preprocessing.masking.OSCP._verify_candidates_2d (self, z, SRA_array) | protected |
This is the second phase of the algorithm according to [7], adapted for 2D operation.
Outlier candidates are verified as SRAs by building groups that are bordered by sufficiently large increments between neighboring entries. The increment threshold is estimated by _get_threshold().
Three different types of groups are possible: groups of normal entries only, mixed groups of normal entries and outlier candidates, and groups of outlier candidates only.
Groups of the third type are considered outliers. Outlier candidates in mixed groups are reaccepted as normal data.
| z | Array containing strain data. |
| SRA_array | Array indicating outlier candidates. |
Returns SRA_array with the identified SRAs.
Adaptation to 2D takes some more steps, because building the groups is not as straightforward as in 1D. This is not described in [7] and [8], so a detailed description of the approach taken is provided here.
The detection of group boundaries is carried out separately for each direction: once along the space axis and once along the time axis, increments are calculated and the increment threshold is estimated by _get_threshold(). The next step (still separate for each direction) is generating groups of indices by iterating over the array's indices. A new group is started if
After all such groups are stored in a single list, the groups of indices are merged using _merge_groups(), until only pairwise distinct groups are left. If a pixel is contained in two groups, those groups are connected and merged into one. This results in non-rectangular shaped groups being built.
Finally, only groups consisting exclusively of candidates are verified as SRAs.
Definition at line 432 of file masking.py.
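The 2D procedure could be sketched as shown below. This is an illustration only: the rule used here for starting a new group (at the beginning of every row or column, and wherever the increment exceeds the threshold) is an assumption, as are the function names and the single fixed `threshold` shared by both directions.

```python
import numpy as np


def line_groups(z, axis, threshold):
    """Split every line along `axis` into index groups at large jumps."""
    groups = []
    lines = z if axis == 1 else z.T
    for i, line in enumerate(lines):
        cuts = np.nonzero(np.abs(np.diff(line)) > threshold)[0] + 1
        for part in np.split(np.arange(line.size), cuts):
            if axis == 1:
                groups.append({(i, int(j)) for j in part})
            else:
                groups.append({(int(j), i) for j in part})
    return groups


def merge_groups(initial_groups):
    """Merge all groups that share at least one index."""
    merged = []
    for group in initial_groups:
        group = set(group)
        disjoint = []
        for existing in merged:
            if existing & group:
                group |= existing
            else:
                disjoint.append(existing)
        merged = disjoint + [group]
    return merged


def verify_candidates_2d(z, candidates, threshold):
    """Verify only groups that consist exclusively of candidates."""
    z = np.asarray(z, dtype=float)
    candidates = np.asarray(candidates, dtype=bool)
    groups = merge_groups(
        line_groups(z, 1, threshold) + line_groups(z, 0, threshold)
    )
    verified = np.zeros(z.shape, dtype=bool)
    for group in groups:
        if not group:
            continue
        rows, cols = map(list, zip(*group))
        if candidates[rows, cols].all():
            verified[rows, cols] = True
    return verified


z = np.ones((3, 3))
z[1, 1] = 9.0
candidates = np.zeros((3, 3), dtype=bool)
candidates[1, 1] = True
candidates[0, 0] = True  # false alarm inside the flat region
verified = verify_candidates_2d(z, candidates, threshold=4.0)
```

In this example the flat region merges into one large, non-rectangular group across rows and columns. The false alarm at `(0, 0)` lies in that mixed group and is reaccepted, while the isolated spike at `(1, 1)` forms its own group and is verified.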
| fosanalysis.preprocessing.masking.OSCP.delta_s = delta_s |
Setting for the threshold estimation.
This is the minimal slope before the cumulated density function (CDF) of relative heights is considered to be leveled out enough to only leave SRAs with higher relative heights. The meaning is the required increase in value per quantile:
\[ \Delta S = \frac{\Delta H}{\Delta \mathrm{cdf}(H)} \]
where \(\mathrm{cdf}(H)\) is given unitless, as the cdf is normalized to \(\mathrm{cdf}(\infty) = 1\).
Definition at line 343 of file masking.py.
| fosanalysis.preprocessing.masking.OSCP.max_radius = max_radius |
The radius of the largest sliding window used in the outlier candidate detection stage, \(r_{\mathrm{max}} > 1\). It determines the size of the largest detectable outlier cluster, but also the smallest preservable feature.
Definition at line 334 of file masking.py.
| fosanalysis.preprocessing.masking.OSCP.min_quantile = min_quantile |
The quantile, from which the cumulated density function of relative heights is kept for threshold estimation.
Defaults to 0.5, which is the upper half.
Definition at line 358 of file masking.py.
| fosanalysis.preprocessing.masking.OSCP.n_quantile = n_quantile |
Granularity for the threshold estimation resampling.
The upper part (see min_quantile) of the cumulated density function of relative heights is resampled using this many points. Defaults to 50, which is equivalent to percentage accuracy. Resampling can increase both performance and reliability. Deactivate it by setting it to None.
Definition at line 351 of file masking.py.
| fosanalysis.preprocessing.masking.OSCP.threshold = threshold |
Relative height threshold above which a pixel is flagged as SRA.
If set to None (default), it is estimated from the data using delta_s.
Definition at line 354 of file masking.py.