Error detection¶
HoloClean learns to clean data by first splitting the cells into two categories: clean and dont_know (dk for short). Two error detectors are provided: SqlDCErrorDetection, which uses denial constraints to make this split, and SqlnullErrorDetection, which labels every cell that has a null value as a don’t-know cell. HoloClean also supports custom, user-defined error detection: create a new class that inherits from ErrorDetection and override the required methods.
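To illustrate the clean / dont_know split, here is a minimal sketch of null-based detection in plain Python. It does not use HoloClean or Spark; the function name `split_by_nulls`, the `(row index, attribute)` cell representation, and the sample rows are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Tuple

# A cell is identified by (row index, attribute name); this layout is an
# assumption for the sketch, not HoloClean's internal representation.
Cell = Tuple[int, str]


def split_by_nulls(
    rows: List[Dict[str, Optional[str]]]
) -> Tuple[List[Cell], List[Cell]]:
    """Return (clean, dont_know); a cell is dont_know when its value is None."""
    clean: List[Cell] = []
    dont_know: List[Cell] = []
    for index, row in enumerate(rows):
        for attribute, value in row.items():
            if value is None:
                dont_know.append((index, attribute))
            else:
                clean.append((index, attribute))
    return clean, dont_know


# Hypothetical sample data: the second row is missing its city.
rows = [
    {"city": "Chicago", "zip": "60608"},
    {"city": None, "zip": "60609"},
]
clean, dont_know = split_by_nulls(rows)
```

Every cell lands in exactly one of the two lists, which is the property the later stages rely on.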
ErrorDetector¶
class holoclean.errordetection.errordetector.ErrorDetection(holo_obj, dataset)
This is an abstract class for general error detection. Every subclass must implement the get_clean_cells and get_noisy_cells methods.
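The two-method contract can be sketched as follows. This is a self-contained stand-in, not the HoloClean class: `ErrorDetectionSketch` replaces the real base class so the example runs without HoloClean, the methods return plain lists rather than Spark dataframes, and `AllowedValuesDetector` with its allowed-values rule is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, str]  # (row index, attribute name), assumed for this sketch


class ErrorDetectionSketch(ABC):
    """Stand-in for the abstract detector; NOT holoclean's ErrorDetection."""

    @abstractmethod
    def get_clean_cells(self) -> List[Cell]:
        """Return the cells considered clean."""

    @abstractmethod
    def get_noisy_cells(self) -> List[Cell]:
        """Return the cells considered noisy (don't know)."""


class AllowedValuesDetector(ErrorDetectionSketch):
    """Hypothetical rule: a cell is noisy if its value is outside an allowed set."""

    def __init__(self, rows: List[Dict[str, str]], allowed: Dict[str, Set[str]]):
        self.rows = rows
        self.allowed = allowed

    def _is_noisy(self, attribute: str, value: str) -> bool:
        # Attributes without an allowed set are never flagged.
        return attribute in self.allowed and value not in self.allowed[attribute]

    def _cells(self, want_noisy: bool) -> List[Cell]:
        return [
            (index, attribute)
            for index, row in enumerate(self.rows)
            for attribute, value in row.items()
            if self._is_noisy(attribute, value) == want_noisy
        ]

    def get_clean_cells(self) -> List[Cell]:
        return self._cells(want_noisy=False)

    def get_noisy_cells(self) -> List[Cell]:
        return self._cells(want_noisy=True)


detector = AllowedValuesDetector(
    rows=[{"state": "IL", "city": "Chicago"}, {"state": "XX", "city": "Boston"}],
    allowed={"state": {"IL", "MA"}},
)
noisy_cells = detector.get_noisy_cells()
clean_cells = detector.get_clean_cells()
```

Both methods share one predicate, so a cell can never be reported as clean and noisy at the same time.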
SqlDCErrorDetection¶
class holoclean.errordetection.sql_dcerrordetector.SqlDCErrorDetection(session)
This class is a subclass of ErrorDetection. It returns the don’t-know cells and the clean cells based on the denial constraints.
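As a rough illustration of constraint-based detection, the sketch below checks one denial constraint in plain Python: no two rows may agree on one attribute while differing on another. It is not the SQL-based implementation of SqlDCErrorDetection; the function `violating_cells`, the zip/city constraint, the sample rows, and the choice to flag every cell that takes part in a violation are all assumptions for this example.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, str]  # (row index, attribute name), assumed for this sketch


def violating_cells(
    rows: List[Dict[str, str]], same_attr: str, diff_attr: str
) -> Set[Cell]:
    """Cells of row pairs that agree on same_attr but differ on diff_attr."""
    noisy: Set[Cell] = set()
    for (i, first), (j, second) in combinations(enumerate(rows), 2):
        if first[same_attr] == second[same_attr] and first[diff_attr] != second[diff_attr]:
            # Flag the participating cells of both rows in the violating pair.
            for index in (i, j):
                noisy.add((index, same_attr))
                noisy.add((index, diff_attr))
    return noisy


# Hypothetical data: rows 0 and 1 share a zip code but disagree on the city.
rows = [
    {"zip": "60608", "city": "Chicago"},
    {"zip": "60608", "city": "Cicago"},
    {"zip": "02139", "city": "Cambridge"},
]
noisy = violating_cells(rows, same_attr="zip", diff_attr="city")
clean = {(i, attribute) for i, row in enumerate(rows) for attribute in row} - noisy
```

The pairwise loop is quadratic in the number of rows, which is acceptable for a sketch but is one reason a real detector would push this comparison into a query engine.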
SqlnullErrorDetection¶
class holoclean.errordetection.sql_dcerrordetector.SqlDCErrorDetection(session)
This class is a subclass of ErrorDetection. It returns the don’t-know cells and the clean cells based on the denial constraints.
get_clean_cells()
Returns a dataframe that contains the index and attribute of each clean cell.
Returns: spark_dataframe
get_noisy_cells()
Returns a dataframe that contains the index and attribute of each noisy cell.
Returns: spark_dataframe