Featurization

HoloClean builds the features for its learning model from a set of signals. Three signals are provided: SignalInit, SignalDC, and SignalCooccur. HoloClean also supports custom (user-defined) signals: create a new class that inherits from Featurizer and overrides the required methods.
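As a rough illustration of a user-defined signal, the sketch below subclasses a stand-in Featurizer and overrides get_query. Everything beyond the names Featurizer, session, get_query, and clean is an assumption made for the example: the stand-in base class replaces the real holoclean.featurization.featurizer.Featurizer so the snippet runs on its own, and SignalNull, the table names cells_clean and cells_dk, the column names, and the convention that clean=1 selects the clean cells are all hypothetical.

```python
from abc import ABC, abstractmethod


class Featurizer(ABC):
    """Stand-in for holoclean.featurization.featurizer.Featurizer.

    Assumption: the real base class stores the session and leaves
    get_query to its subclasses.
    """

    def __init__(self, session):
        self.session = session

    @abstractmethod
    def get_query(self):
        """Return a query string or a list of query strings."""


class SignalNull(Featurizer):
    """Hypothetical custom signal that flags cells holding a NULL value."""

    def get_query(self, clean=1):
        # Hypothetical table names: one table for the clean cells and one
        # for the don't-know (dk) cells. Assumes clean=1 means clean cells.
        table = "cells_clean" if clean else "cells_dk"
        # Hypothetical schema: one row per cell with a cell_id and a value.
        query = (
            "SELECT cell_id, 'is_null' AS feature, 1 AS count "
            "FROM {} WHERE value IS NULL".format(table)
        )
        # Returned as a list of length 1, like the built-in signals.
        return [query]


if __name__ == "__main__":
    signal = SignalNull(session=None)  # a real session object would go here
    for q in signal.get_query(clean=1):
        print(q)
```

Returning a list even for a single query keeps the custom signal interchangeable with signals that produce several queries.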

Featurizer

class holoclean.featurization.featurizer.Featurizer(session)

This is the abstract base class for all featurizers. Every subclass must implement the get_query method.

get_query()

Builds the query or queries that are used to create the signal.

Returns: a string, or a list of strings, containing the query or queries that are used to create the signal
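Because get_query may return either a single string or a list of strings, code that consumes several signals may want to normalize the result first. The helper below is a minimal sketch of that idea; as_query_list is a hypothetical name and is not part of HoloClean.

```python
def as_query_list(result):
    """Normalize a get_query result to a list of query strings.

    Hypothetical helper: accepts either a single query string or an
    iterable of query strings and always returns a list.
    """
    if isinstance(result, str):
        # A single query: wrap it so callers can always iterate.
        return [result]
    # Already a collection of queries: copy it into a new list.
    return list(result)


if __name__ == "__main__":
    print(as_query_list("SELECT 1"))
    print(as_query_list(["SELECT 1", "SELECT 2"]))
```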

SignalInit

class holoclean.featurization.initfeaturizer.SignalInit(session)

This class is a subclass of Featurizer. It returns the query that represents the initial signal for the clean and don’t-know (dk) cells.

get_query(clean=1)

Creates the query string that is used to build the initial signal.

Parameters: clean – indicates whether the feature table is created for the clean cells or the don’t-know cells

Returns: a list of length 1 containing the query string for this feature

SignalDC

class holoclean.featurization.dcfeaturizer.SignalDC(denial_constraints, session)

This class is a subclass of Featurizer. It returns a list of queries that represent the DC (denial constraint) signal for the clean and don’t-know (dk) cells.

get_query(clean=1, dcquery_prod=None)

Creates the list of query strings that are used to build the DC signals.

Parameters:
  • clean – indicates whether the feature table is created for the clean cells or the dk cells
  • dcquery_prod – a thread that produces the final queries

Returns: a list of query strings for this feature

SignalCooccur

class holoclean.featurization.cooccurrencefeaturizer.SignalCooccur(session)

This class is a subclass of Featurizer for the co-occur signal. It also fills the tensor with the co-occur data.

get_query(clean=1)

Adds the co-occur feature.

Parameters: clean – indicates whether the feature table is created for the clean cells or the dk cells

Returns: a list

insert_to_tensor(tensor, clean)

Inserts the co-occur data into the tensor.

Parameters:
  • tensor – tensor object
  • clean – Nat value that identifies whether the feature values are being calculated for training data (clean cells) or testing data

Returns: None