pulsar_playground package

Submodules

pulsar_playground.models module

Module for defining models based on the parameters.py file.

pulsar_playground.models.keras_model(n, m, input_dim, drop_visible, drop_hidden)

Function to build a sequential neural network.

Parameters:
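  • n (int) – Number of hidden layers (network depth).
  • m (int) – Number of units per layer (network width).
  • input_dim (int) – Length of feature vector.
  • drop_visible (float) – Dropout rate applied to the input (visible) layer.
  • drop_hidden (float) – Dropout rate applied to the hidden layers.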
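
As a rough illustration of how n, m and input_dim determine the network's shape, the sketch below lists the layers such a model might contain and counts their trainable weights. It assumes fully connected layers, dropout after the input and after each hidden layer, and a single output unit for binary classification; the source confirms none of these details, and `layer_plan` is a hypothetical helper, not part of the package.

```python
def layer_plan(n, m, input_dim, drop_visible=0.0, drop_hidden=0.0):
    """Describe a dense network with n hidden layers of m units each.

    Hypothetical helper: returns (plan, n_weights), where plan is a list of
    layer descriptions and n_weights counts weights plus biases.
    """
    plan = []
    n_weights = 0
    if drop_visible > 0:
        plan.append(("dropout", drop_visible))   # dropout on the input features
    fan_in = input_dim
    for _ in range(n):
        plan.append(("dense", fan_in, m))
        n_weights += fan_in * m + m              # weight matrix + bias vector
        if drop_hidden > 0:
            plan.append(("dropout", drop_hidden))
        fan_in = m
    plan.append(("dense", fan_in, 1))            # assumed single output unit
    n_weights += fan_in + 1
    return plan, n_weights


plan, n_weights = layer_plan(n=2, m=12, input_dim=8, drop_hidden=0.1)
for layer in plan:
    print(layer)
print("trainable weights:", n_weights)
```

Under these assumptions, increasing n adds layers while increasing m widens each one, so the weight count grows linearly in n and roughly quadratically in m.
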
pulsar_playground.models.model_dict = {
    'ann': (
        <keras.wrappers.scikit_learn.KerasClassifier object>,
        {'n': [1, 2], 'm': [12, 14], 'input_dim': [8], 'epochs': [10], 'batch_size': [100], 'drop_visible': [0.0], 'drop_hidden': [0.0, 0.1, 0.2], 'verbose': [0], 'callbacks': [[<keras.callbacks.EarlyStopping object>]]}),
    'knn': (
        KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform'),
        {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}),
    'lgr': (
        LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False),
        {'penalty': ['l1', 'l2'], 'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'solver': ['liblinear'], 'max_iter': [200]}),
    'xgb': (
        XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1),
        {'n_estimators': [400], 'max_depth': [3], 'min_child_weight': [3], 'gamma': [5], 'colsample_bytree': [0.8], 'learning_rate': [0.01], 'subsample': [1]}),
    'xgb_gpu': (
        XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1),
        {'tree_method': ['gpu_hist'], 'predictor': ['cpu_predictor'], 'n_estimators': [400], 'max_depth': [7], 'min_child_weight': [1], 'gamma': [9], 'learning_rate': [0.05], 'colsample_bytree': [1.0], 'subsample': [1.0]})
}

Stores the available models.

Type: dictionary

pulsar_playground.parameters module

Parameters for preprocessing and fine-tuning models.

pulsar_playground.parameters.ann_params = {'batch_size': [100], 'callbacks': [[<keras.callbacks.EarlyStopping object>]], 'drop_hidden': [0.0, 0.1, 0.2], 'drop_visible': [0.0], 'epochs': [10], 'input_dim': [8], 'm': [12, 14], 'n': [1, 2], 'verbose': [0]}

Parameter grid for KerasClassifier. If “rotate” is True, “input_dim” should match “n_components”; otherwise it must equal the number of features. Please refer to the Keras documentation for more information.

Type: dictionary
pulsar_playground.parameters.disable_warnings = True

Disable warnings.

Type: bool
pulsar_playground.parameters.knn_params = {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}

Parameter grid for KNeighborsClassifier. Please refer to the scikit-learn documentation for more information.

Type: dictionary
pulsar_playground.parameters.lgr_params = {'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'max_iter': [200], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}

Parameter grid for LogisticRegression. Please refer to the scikit-learn documentation for more information.

Type: dictionary
pulsar_playground.parameters.n_iter = 100

Maximum number of iterations (n_iter) for RandomizedSearchCV.

Type: integer
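
The sketch below illustrates what capping a randomized search at n_iter means: candidate settings are drawn from the grid, and the draw can never exceed the number of distinct combinations. It uses only the standard library and the knn_params values documented above; `sample_settings` is a hypothetical helper, and scikit-learn's RandomizedSearchCV performs the real sampling.

```python
import itertools
import random

knn_params = {"n_neighbors": range(3, 12), "weights": ["uniform", "distance"]}
n_iter = 100


def sample_settings(grid, n_iter, seed=0):
    """Draw up to n_iter distinct parameter settings from a grid of lists."""
    keys = sorted(grid)
    combos = [dict(zip(keys, values))
              for values in itertools.product(*(grid[k] for k in keys))]
    k = min(n_iter, len(combos))        # cannot sample more settings than exist
    return random.Random(seed).sample(combos, k)


settings = sample_settings(knn_params, n_iter)
print(len(settings))   # 18: the grid holds fewer than n_iter combinations
```

With a grid this small, n_iter = 100 acts only as an upper bound; a smaller value such as 5 would evaluate a random subset of the 18 settings instead.
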
pulsar_playground.parameters.oversample = True

Use SMOTE to fix class imbalance.

Type: bool
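
For intuition, SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its nearest minority neighbours. The following standard-library sketch shows only that interpolation step on toy 2-D points; `smote_like` and its arguments are hypothetical, and which SMOTE implementation the package calls is not stated here.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()                # position along the segment x -> nb
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)
print(len(minority) + len(new_points))   # minority class grows from 4 to 10
```

Because every synthetic point lies on a segment between two real minority points, the new examples stay inside the region the minority class already occupies.
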
pulsar_playground.parameters.scale = True

Standardize features with StandardScaler.

Type: bool
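
Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. The sketch below reproduces that arithmetic for a single column with the standard library, using the population standard deviation as StandardScaler does; `standardize` is a hypothetical name, and the column is assumed not to be constant.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)     # population std (ddof=0)
    return [(x - mean) / std for x in column]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z[0], z[-1])   # -1.5 2.0 (mean is 5, std is 2)
```
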
pulsar_playground.parameters.searchargs = {'cv': 3, 'n_jobs': -1, 'scoring': 'accuracy', 'verbose': 2}

Extra arguments for GridSearchCV/RandomizedSearchCV.

Type: dictionary
pulsar_playground.parameters.xgb_gpu_params = {'colsample_bytree': [1.0], 'gamma': [9], 'learning_rate': [0.05], 'max_depth': [7], 'min_child_weight': [1], 'n_estimators': [400], 'predictor': ['cpu_predictor'], 'subsample': [1.0], 'tree_method': ['gpu_hist']}

Parameter grid for XGBClassifier (GPU). Please refer to the XGBoost API documentation for more information.

Type: dictionary
pulsar_playground.parameters.xgb_params = {'colsample_bytree': [0.8], 'gamma': [5], 'learning_rate': [0.01], 'max_depth': [3], 'min_child_weight': [3], 'n_estimators': [400], 'subsample': [1]}

Parameter grid for XGBClassifier. Please refer to the XGBoost API documentation for more information.

Type: dictionary

pulsar_playground.plots module

Plotting module for data visualization and ML metrics.

pulsar_playground.plots.dump_idx(y_pred_proba, threshold, filename='candidates.csv')

Save indexes of examples predicted as positive.

Parameters:
  • y_pred_proba (array) – Predicted probability.
  • threshold (float) – Decision threshold.
  • filename (str) – Output file.
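
A minimal sketch of the idea, assuming an example counts as positive when its predicted probability is greater than or equal to the threshold and that the file holds one index per row (neither detail is confirmed by the source). `positive_indexes` is a hypothetical helper, not the package function.

```python
import csv
import os
import tempfile


def positive_indexes(y_pred_proba, threshold):
    """Return positions whose predicted probability reaches the threshold."""
    return [i for i, p in enumerate(y_pred_proba) if p >= threshold]


y_pred_proba = [0.05, 0.92, 0.40, 0.77, 0.50]
idx = positive_indexes(y_pred_proba, threshold=0.5)

# Write one index per row; a temporary directory keeps the example tidy.
path = os.path.join(tempfile.mkdtemp(), "candidates.csv")
with open(path, "w", newline="") as fh:
    csv.writer(fh).writerows([i] for i in idx)

print(idx)   # [1, 3, 4]
```
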
pulsar_playground.plots.plot_classprop(data, ax=None)

Proportion of examples per class (pieplot).

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • ax (Axes) – Matplotlib subfigure axes.
pulsar_playground.plots.plot_cm(y_test, y_pred_proba, threshold, ax=None)

Confusion matrix.

Parameters:
  • y_test (array) – Classes from the test split.
  • y_pred_proba (array) – Predicted probability.
  • threshold (float) – Decision threshold.
  • ax (Axes) – Matplotlib subfigure axes.
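
The counts behind such a plot can be sketched without any plotting code: probabilities are turned into class predictions at the threshold and tallied against the true labels. The example assumes labels encoded as 0/1 and a greater-than-or-equal comparison; `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_test, y_pred_proba, threshold):
    """Return [[tn, fp], [fn, tp]] for binary labels encoded as 0/1."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(y_test, y_pred_proba):
        predicted = 1 if p >= threshold else 0
        matrix[y][predicted] += 1       # row = true class, column = prediction
    return matrix


y_test = [0, 0, 1, 1, 1, 0]
y_pred_proba = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
print(confusion_counts(y_test, y_pred_proba, threshold=0.5))   # [[2, 1], [1, 2]]
```
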
pulsar_playground.plots.plot_ecdf(data, x_axis, ax=None)

Plots the empirical cumulative distribution for each class.

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • x_axis (str) – Column name from dataframe.
  • ax (Axes) – Matplotlib subfigure axes.
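
For reference, an empirical cumulative distribution assigns to each sorted value the fraction of observations less than or equal to it. The sketch below computes one curve per class from toy data using only the standard library; `ecdf` and the sample values are illustrative, not taken from the package.

```python
from collections import defaultdict


def ecdf(values):
    """Return sorted values and the fraction of observations <= each one."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Toy (feature value, class label) pairs; purely illustrative data.
samples = [(3.0, 0), (1.0, 0), (2.0, 0), (5.0, 1), (4.0, 1)]

by_class = defaultdict(list)
for value, label in samples:
    by_class[label].append(value)

curves = {label: ecdf(values) for label, values in by_class.items()}
print(curves[1])   # ([4.0, 5.0], [0.5, 1.0])
```
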
pulsar_playground.plots.plot_fcorr(data, x_axis, y_axis, transform_x='none', transform_y='none', ax=None)

Feature vs. feature plot (scatterplot).

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • x_axis (str) – Column name from dataframe.
  • y_axis (str) – Column name from dataframe.
  • transform_x (str) – Dictionary key from ‘tfs’ dict.
  • transform_y (str) – Dictionary key from ‘tfs’ dict.
  • ax (Axes) – Matplotlib subfigure axes.
pulsar_playground.plots.plot_hist(data, x_axis, bins=10, ax=None)

Plots histograms for each class.

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • x_axis (str) – Column name from dataframe.
  • bins (int) – Number of bins.
  • ax (Axes) – Matplotlib subfigure axes.
pulsar_playground.plots.plot_info(data, ax=None)

Summary of given dataframe.

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • ax (Axes) – Matplotlib subfigure axes.
pulsar_playground.plots.plot_nulls(data, ax=None)

Percentage of null entries per feature (barplot).

Parameters:
  • data (DataFrame) – Pandas dataframe.
  • ax (Axes) – Matplotlib subfigure axes.
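
The quantity being plotted can be sketched as follows, with rows represented as plain dictionaries and missing entries as None instead of a pandas DataFrame. The column names and `null_percentages` are hypothetical and serve only to show the arithmetic.

```python
def null_percentages(rows):
    """Return the percentage of missing (None) entries for each column."""
    columns = rows[0].keys()
    return {c: 100.0 * sum(r[c] is None for r in rows) / len(rows)
            for c in columns}


rows = [
    {"mean": 1.2, "std": None, "skew": 0.3},
    {"mean": None, "std": None, "skew": 0.1},
    {"mean": 0.7, "std": 2.0, "skew": 0.4},
    {"mean": 0.9, "std": 1.5, "skew": 0.2},
]
print(null_percentages(rows))   # {'mean': 25.0, 'std': 50.0, 'skew': 0.0}
```
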
pulsar_playground.plots.plot_prc(y_test, y_pred_proba, threshold, ax=None)

Precision and recall vs. threshold curves.

Parameters:
  • y_test (array) – Classes from the test split.
  • y_pred_proba (array) – Predicted probability.
  • threshold (float) – Decision threshold.
  • ax (Axes) – Matplotlib subfigure axes.
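
To show what each point on such curves represents, the sketch below evaluates precision and recall at a few thresholds on toy data. It assumes 0/1 labels and a greater-than-or-equal comparison, and it reports precision as 1.0 when nothing is predicted positive, which is one common convention; `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_test, y_pred_proba, threshold):
    """Return (precision, recall) at the given decision threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_test, y_pred_proba):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0   # convention for no positives
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_test = [0, 0, 1, 1, 1, 0]
y_pred_proba = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
for threshold in (0.3, 0.5, 0.7):
    print(threshold, precision_recall(y_test, y_pred_proba, threshold))
```

Raising the threshold in this toy data trades recall for precision, which is the trade-off the two curves make visible.
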

pulsar_playground.utils module

Module for common tasks.

pulsar_playground.utils.get_n_params(model)

Returns the total number of elements of a param grid.

Parameters:
  • model (str) – Dictionary key from ‘model_dict’ in models.py.
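
Assuming "total number of elements" means the number of distinct parameter combinations in a grid (the source does not say so explicitly), the count is the product of the lengths of the value lists. The sketch below applies this to the knn and lgr grids documented in the parameters module; `n_combinations` is a hypothetical stand-in for the package function.

```python
import math

knn_params = {"n_neighbors": range(3, 12), "weights": ["uniform", "distance"]}
lgr_params = {
    "penalty": ["l1", "l2"],
    "C": [round(0.35 + 0.01 * i, 2) for i in range(11)],   # 0.35 ... 0.45
    "class_weight": [None, "balanced"],
    "solver": ["liblinear"],
    "max_iter": [200],
}


def n_combinations(grid):
    """Number of distinct settings in a grid of value lists."""
    return math.prod(len(values) for values in grid.values())


print(n_combinations(knn_params))   # 18 = 9 * 2
print(n_combinations(lgr_params))   # 44 = 2 * 11 * 2 * 1 * 1
```

Comparing this count with n_iter indicates whether a randomized search would cover the whole grid or only a sample of it.
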
pulsar_playground.utils.make_sets(filename, test_size=0.3, random_state=42, stratify=True)

Splits the dataset into two files: ‘train.csv’ and ‘test.csv’. Also binarizes the labels.

Parameters:
  • filename (str) – Input filename.
  • test_size (float) – Test set ratio.
  • random_state (int) – Random seed.
  • stratify (bool) – Stratification by label.
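
Stratification means each class is split separately so that both files keep roughly the original class ratio. The standard-library sketch below shows this on index lists only, leaving out file handling and label binarization; `stratified_split` is a hypothetical helper, and the package may well delegate the split to an existing library routine.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, random_state=42):
    """Return (train_idx, test_idx) keeping class ratios similar in both sets."""
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                        # shuffle within each class
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 90 + [1] * 10                        # 10% positives, as a toy case
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                # 70 30
print(sum(labels[i] for i in test_idx))             # 3 positives in the test set
```
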

Module contents