Featurization
HoloClean uses several signals to build the features for the model used in the learning step. Three signals are provided: SignalInit, SignalDC, and SignalCooccur. HoloClean also supports custom (user-defined) signals: create a new class that inherits from Featurizer and overrides the required methods.
Featurizer
class holoclean.featurization.featurizer.Featurizer(session)
Abstract base class for all featurizers. Every subclass must implement the get_query method.
get_query()
Creates the query string, or the list of query strings, used to build the signal.
Returns: a string or a list of strings containing the query or queries used to build the signal
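To illustrate the custom-signal mechanism described above, here is a minimal sketch of a user-defined featurizer. It is self-contained, so `Featurizer` below is a stand-in that mirrors only what this page documents (a constructor taking `session` and an abstract `get_query`); it is not HoloClean's actual class. The class `SignalLength`, the table name `cell_values`, and the columns `cell_id`, `value`, and `is_clean` are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class Featurizer(ABC):
    """Stand-in for the documented base class (an assumption, not HoloClean's code)."""

    def __init__(self, session):
        self.session = session

    @abstractmethod
    def get_query(self):
        """Return a query string or a list of query strings."""


class SignalLength(Featurizer):
    """Hypothetical custom signal: uses the length of a cell value as a feature."""

    def __init__(self, session, table_name="cell_values"):
        super().__init__(session)
        # Hypothetical table holding one row per cell.
        self.table_name = table_name

    def get_query(self, clean=1):
        # Build one query and return it in a list, mirroring the
        # "list of length 1" convention documented for SignalInit.
        query = (
            "SELECT cell_id, LENGTH(value) AS feature "
            "FROM {} WHERE is_clean = {}".format(self.table_name, int(clean))
        )
        return [query]


# No real session is needed just to build the query string.
signal = SignalLength(session=None)
queries = signal.get_query(clean=1)
```

Because `get_query` is abstract in the stand-in, instantiating `Featurizer` directly raises a `TypeError`, which is one way to enforce that every subclass implements it.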
SignalInit
class holoclean.featurization.initfeaturizer.SignalInit(session)
Subclass of Featurizer that returns the query representing the initial signal for the clean cells and the don’t-know (dk) cells.
get_query(clean=1)
Creates the query string used to build the initial signal.
Parameters: clean – indicates whether to create the feature table for the clean cells or the don’t-know cells
Returns: a list of length 1 containing the query string for this feature
SignalDC
class holoclean.featurization.dcfeaturizer.SignalDC(denial_constraints, session)
Subclass of Featurizer that returns a list of queries representing the denial constraint (DC) signal for the clean cells and the don’t-know (dk) cells.
get_query(clean=1, dcquery_prod=None)
Creates the list of query strings used to build the DC signals.
Parameters:
- clean – indicates whether to create the feature table for the clean cells or the dk cells
- dcquery_prod – a thread that produces the final queries
Returns: a list of query strings for this feature
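As background for the DC signal, a denial constraint states that some combination of predicates must never hold for a pair of tuples. The sketch below counts, per row, how many times a row takes part in a violation of one such constraint. It is a plain-Python illustration of the idea only: it does not generate SQL, it does not use HoloClean, and the sample rows, the constraint, and the names `violates` and `violation_counts` are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical sample data; row 2 contains a misspelled city.
rows = [
    {"id": 1, "zip": "60601", "city": "Chicago"},
    {"id": 2, "zip": "60601", "city": "Chicgo"},
    {"id": 3, "zip": "46402", "city": "Gary"},
]


def violates(t1, t2):
    """Hypothetical constraint: two rows must not share a zip yet differ in city."""
    return t1["zip"] == t2["zip"] and t1["city"] != t2["city"]


# Count, for every row, the number of violating pairs it participates in.
violation_counts = {row["id"]: 0 for row in rows}
for t1, t2 in combinations(rows, 2):
    if violates(t1, t2):
        violation_counts[t1["id"]] += 1
        violation_counts[t2["id"]] += 1
```

Rows 1 and 2 share a zip code but disagree on the city, so each takes part in one violation, while row 3 takes part in none. A count like this is one plausible per-cell feature value a DC signal could expose to the model.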
SignalCooccur
class holoclean.featurization.cooccurrencefeaturizer.SignalCooccur(session)
Subclass of Featurizer for the co-occurrence signal. It also fills the tensor.
get_query(clean=1)
Adds the co-occurrence feature.
Parameters: clean – indicates whether to create the feature table for the clean cells or the dk cells
Returns: a list
insert_to_tensor(tensor, clean)
Inserts the co-occurrence data into the tensor.
Parameters:
- tensor – tensor object
- clean – Nat value that identifies whether the feature values are being calculated for training data (clean cells) or testing data
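This page does not spell out what the co-occurrence signal computes. One common reading is a conditional frequency: how often a candidate value of one attribute appears together with the observed value of another attribute. The sketch below computes that statistic in plain Python. It is an assumption about the idea behind the signal, not HoloClean's implementation, and the sample rows and the name `cooccur_probability` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample data.
rows = [
    {"city": "Chicago", "state": "IL"},
    {"city": "Chicago", "state": "IL"},
    {"city": "Chicago", "state": "IN"},
    {"city": "Gary", "state": "IN"},
]

pair_counts = Counter()   # (attr, value, given_attr, given_value) -> count
value_counts = Counter()  # (attr, value) -> count
for row in rows:
    for attr, value in row.items():
        value_counts[(attr, value)] += 1
    # Every ordered pair of distinct attributes within the same row.
    for (a1, v1), (a2, v2) in permutations(row.items(), 2):
        pair_counts[(a1, v1, a2, v2)] += 1


def cooccur_probability(attr, value, given_attr, given_value):
    """Estimate P(attr = value | given_attr = given_value) from the counts."""
    denominator = value_counts[(given_attr, given_value)]
    if denominator == 0:
        return 0.0
    return pair_counts[(attr, value, given_attr, given_value)] / denominator


p_il = cooccur_probability("state", "IL", "city", "Chicago")
p_in = cooccur_probability("state", "IN", "city", "Chicago")
```

In this sample, "Chicago" appears three times, twice with "IL" and once with "IN", so `p_il` is 2/3 and `p_in` is 1/3. Values like these could then be written into a feature tensor at the position of the corresponding cell and candidate value.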