Repairing

The clean cells serve as training examples for learning the parameters (weights) of a softmax regression model. Once the weights are learned, the model runs inference on the “don’t-know” cells and fills in the most likely value for each one.
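
The inference step described above can be sketched in a few lines of plain Python. This is only an illustration, not HoloClean's implementation: the helper names (`softmax`, `most_likely_value`), the feature layout, and the weights are all hypothetical, and the weights are simply assumed to have been learned from the clean cells already.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def most_likely_value(features, weights, candidates):
    """Score each candidate value with a linear model, then pick the argmax.

    features:   one feature vector per candidate value (hypothetical layout)
    weights:    one weight per feature, assumed already learned
    candidates: candidate values for the cell
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs

# Hypothetical don't-know cell with three candidate values.
candidates = ["Chicago", "Cicago", "Boston"]
features = [[1.0, 0.9], [1.0, 0.1], [0.0, 0.2]]
weights = [2.0, 3.0]  # stand-in for weights learned from the clean cells
value, probs = most_likely_value(features, weights, candidates)
```

Here `value` is the candidate with the highest softmax probability, which is the value that would be inserted into the cell.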

Softmax

class holoclean.learning.softmax.SoftMax(session, X_training)

build_model(featurizers, input_dim_non_dc, input_dim_dc, output_dim, tie_init=True, tie_DC=True)

Initializes the logistic regression (logreg) part of the model

Parameters:
  • featurizers – list of featurizers
  • input_dim_non_dc – number of init and co-occurrence features
  • input_dim_dc – number of denial constraint (dc) features
  • output_dim – number of classes
  • tie_init – boolean that decides whether weights are tied for init features
  • tie_DC – boolean that decides whether weights are tied for dc features
Returns:

newly created LogReg model

log_weights()

Writes the model weights to the logger

Returns: None

logreg(featurizers)

Trains our model on the clean cells and predicts values for the “don’t-know” cells

Returns: predictions

predict(model, x_val, mask=None)

Runs our model on the test set

Parameters:
  • model – trained logreg model
  • x_val – test x tensor
  • mask – masking tensor to restrict domain
Returns:

predicted classes with probabilities
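
To illustrate how a mask can restrict the domain during prediction, here is a minimal sketch in plain Python. It assumes an additive mask (0.0 for allowed classes, a large negative number for impossible ones) and a bare weight matrix in place of the trained model; both are assumptions made for the example, and this stand-alone `predict` function is not the method documented above.

```python
import math

def predict(weights, x_val, mask=None):
    """Return (predicted class, probabilities) for each row of x_val.

    weights: L rows of D weights, one row per class (stand-in for the model)
    x_val:   N rows of D features
    mask:    optional N x L additive mask (assumed convention:
             0.0 = allowed class, large negative = impossible class)
    """
    results = []
    for n, x in enumerate(x_val):
        scores = [sum(w * f for w, f in zip(row, x)) for row in weights]
        if mask is not None:
            scores = [s + m for s, m in zip(scores, mask[n])]
        shift = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - shift) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        results.append((probs.index(max(probs)), probs))
    return results

weights = [[3.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # 3 classes, 2 features
x_val = [[1.0, 0.5]]                             # one test example
mask = [[-1e6, 0.0, 0.0]]                        # class 0 is outside the domain
unmasked = predict(weights, x_val)
masked = predict(weights, x_val, mask)
```

Without the mask, class 0 has the highest score and wins. With the mask, its probability collapses to essentially zero and the best remaining class is returned.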

save_prediction(Y)

Stores our predicted values in the database

Parameters: Y – tensor with probability for each class
Returns: None

setupMask(clean=1, N=1, L=1)

Initializes a masking tensor for ignoring impossible classes

Parameters:
  • clean – 1 for clean cells, 0 for don’t-know cells
  • N – number of examples
  • L – number of classes
Returns:

masking tensor
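
One common way to build such a mask is an N x L grid that holds 0.0 where a class is possible and a large negative number where it is not, so that adding it to the scores rules those classes out. The sketch below follows that convention in plain Python. The `setup_mask` helper, the `domains` argument, and the `NEG_INF` constant are hypothetical and may differ from what `setupMask` actually builds.

```python
NEG_INF = -1e6  # large negative score; an assumed constant, not taken from the library

def setup_mask(domains, num_classes):
    """Build an N x L mask: 0.0 for allowed classes, NEG_INF for impossible ones.

    domains:     for each example, the set of class indices that are possible
    num_classes: L, the total number of classes
    """
    return [
        [0.0 if c in allowed else NEG_INF for c in range(num_classes)]
        for allowed in domains
    ]

# Two examples (N=2) and four classes (L=4).
mask = setup_mask([{0, 2}, {1}], num_classes=4)
```

Each row of `mask` lines up with one example, and each column with one class.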

setuptrainingX(sparse=0)

Initializes an X tensor of features for training

Parameters: sparse – 0 for a dense tensor, 1 for a sparse tensor
Returns: x tensor of features

train(model, loss, optimizer, x_val, y_val, mask=None)

Trains our model on the clean cells

Parameters:
  • model – logistic regression model
  • loss – loss function used for evaluating performance
  • optimizer – optimizer for our neural net
  • x_val – x tensor of features
  • y_val – y tensor of target outputs for comparison
  • mask – masking tensor
Returns:

cost of training
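
As a rough illustration of what one training step computes, the sketch below runs plain gradient descent on the mean cross-entropy of a softmax regression model and returns the cost. It is written in plain Python with no model, loss, or optimizer objects and no mask, so it is not the `train` method itself. The `train_step` helper and the learning rate `lr` are hypothetical.

```python
import math

def train_step(weights, x_val, y_val, lr=0.1):
    """One gradient-descent step; returns the mean cost before the update.

    weights: L rows of D weights, updated in place
    x_val:   N rows of D features
    y_val:   N target class indices
    """
    n_examples = len(x_val)
    grads = [[0.0] * len(row) for row in weights]
    cost = 0.0
    for x, y in zip(x_val, y_val):
        scores = [sum(w * f for w, f in zip(row, x)) for row in weights]
        shift = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - shift) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        cost += -math.log(probs[y])  # cross-entropy of the true class
        for c, grad_row in enumerate(grads):
            err = probs[c] - (1.0 if c == y else 0.0)  # softmax gradient
            for d, f in enumerate(x):
                grad_row[d] += err * f / n_examples
    for row, grad_row in zip(weights, grads):
        for d in range(len(row)):
            row[d] -= lr * grad_row[d]
    return cost / n_examples

# Toy data: three clean cells, two features, two classes.
x_val = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_val = [0, 1, 1]
weights = [[0.0, 0.0], [0.0, 0.0]]
costs = [train_step(weights, x_val, y_val) for _ in range(50)]
```

The returned cost should fall over repeated steps as the weights fit the clean cells.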