pulsar_playground package¶

Submodules¶

pulsar_playground.models module¶

Module for defining models based on the parameters.py file.

pulsar_playground.models.keras_model(n, m, input_dim, drop_visible, drop_hidden)[source]¶

Function to build a sequential neural network.

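Parameters:	n (int) – Number of hidden layers (network depth). m (int) – Number of units per hidden layer (network width). input_dim (int) – Length of the feature vector. drop_visible (float) – Dropout rate applied to the input (visible) layer. drop_hidden (float) – Dropout rate applied to the hidden layers.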
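To make the roles of n and m concrete, the sketch below lists the layer sizes such a network would have and counts its trainable weights. It is plain Python rather than Keras; the helper names and the single-unit output layer are assumptions made for illustration, not part of the package.

```python
def layer_sizes(n, m, input_dim, n_outputs=1):
    """Sizes of the input layer, n hidden layers of m units, and the output layer.

    n_outputs=1 is an assumption (one sigmoid unit for binary classification).
    """
    return [input_dim] + [m] * n + [n_outputs]


def count_weights(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


# One point of the 'ann' grid: n=2 hidden layers, m=12 units, 8 input features.
sizes = layer_sizes(n=2, m=12, input_dim=8)
total = count_weights(sizes)
```

Dropout layers do not add weights, so drop_visible and drop_hidden leave this count unchanged.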

pulsar_playground.models.model_dict = {'ann': (<keras.wrappers.scikit_learn.KerasClassifier object>, {'n': [1, 2], 'm': [12, 14], 'input_dim': [8], 'epochs': [10], 'batch_size': [100], 'drop_visible': [0.0], 'drop_hidden': [0.0, 0.1, 0.2], 'verbose': [0], 'callbacks': [[<keras.callbacks.EarlyStopping object>]]}), 'knn': (KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform'), {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}), 'lgr': (LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False), {'penalty': ['l1', 'l2'], 'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'solver': ['liblinear'], 'max_iter': [200]}), 'xgb': (XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1), {'n_estimators': [400], 'max_depth': [3], 'min_child_weight': [3], 'gamma': [5], 'colsample_bytree': [0.8], 'learning_rate': [0.01], 'subsample': [1]}), 'xgb_gpu': (XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1), {'tree_method': ['gpu_hist'], 'predictor': ['cpu_predictor'], 'n_estimators': [400], 'max_depth': [7], 'min_child_weight': [1], 'gamma': [9], 'learning_rate': [0.05], 'colsample_bytree': [1.0], 'subsample': [1.0]})}¶

Stores the available models.

Type:	dictionary

pulsar_playground.parameters module¶

Parameters for preprocessing and fine-tuning models.

pulsar_playground.parameters.ann_params = {'batch_size': [100], 'callbacks': [[<keras.callbacks.EarlyStopping object>]], 'drop_hidden': [0.0, 0.1, 0.2], 'drop_visible': [0.0], 'epochs': [10], 'input_dim': [8], 'm': [12, 14], 'n': [1, 2], 'verbose': [0]}¶

Parameter grid for KerasClassifier. If “rotate” is True, “input_dim” should match “n_components”; otherwise it must equal the number of features. Please refer to the Keras documentation for more information.

Type:	dictionary

pulsar_playground.parameters.disable_warnings = True¶

Disable warnings.

Type:	bool

pulsar_playground.parameters.knn_params = {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}¶

Parameter grid for KNeighborsClassifier. Please refer to the scikit-learn documentation for more information.

Type:	dictionary

pulsar_playground.parameters.lgr_params = {'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'max_iter': [200], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}¶

Parameter grid for LogisticRegression. Please refer to the scikit-learn documentation for more information.

Type:	dictionary

pulsar_playground.parameters.n_iter = 100¶

Maximum number of parameter settings sampled by RandomizedSearchCV.

Type:	integer
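The effect of n_iter on a list-valued grid can be sketched as follows. The example assumes settings are drawn without replacement and that the sample is capped at the size of the grid; sample_settings is a hypothetical helper written for this illustration, not a function of the package or of scikit-learn.

```python
import random
from itertools import product


def sample_settings(grid, n_iter, seed=0):
    """Draw at most n_iter distinct settings from a list-valued parameter grid."""
    names = sorted(grid)
    all_settings = [dict(zip(names, combo))
                    for combo in product(*(grid[name] for name in names))]
    rng = random.Random(seed)
    # Never ask for more settings than the grid contains.
    return rng.sample(all_settings, min(n_iter, len(all_settings)))


knn_params = {"n_neighbors": range(3, 12), "weights": ["uniform", "distance"]}
settings = sample_settings(knn_params, n_iter=100)
few_settings = sample_settings(knn_params, n_iter=5)
```

With the knn grid shown in this module, n_iter = 100 exceeds the number of available combinations, so the whole grid ends up being evaluated.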

pulsar_playground.parameters.oversample = True¶

Use SMOTE to fix class imbalance.

Type:	bool
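For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class examples by interpolating between a minority example and one of its nearest minority neighbours. The function below is a simplified standard-library sketch of that idea with hypothetical names; it is not the implementation the package relies on.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the point itself excluded).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6)
```

Every synthetic point lies on a segment joining two existing minority examples, so the minority class grows without copying rows verbatim.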

pulsar_playground.parameters.scale = True¶

Standardize features with StandardScaler.

Type:	bool
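Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. A minimal standard-library sketch for a single feature column follows; the standardize helper is hypothetical, and it uses the population standard deviation, which is the estimator StandardScaler applies.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return the column rescaled to zero mean and unit (population) variance."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (divides by N)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

In practice the mean and standard deviation should be estimated on the training split only and then reused to transform the test split.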

pulsar_playground.parameters.searchargs = {'cv': 3, 'n_jobs': -1, 'scoring': 'accuracy', 'verbose': 2}¶

Extra arguments for GridSearchCV/RandomizedSearchCV.

Type:	dictionary

pulsar_playground.parameters.xgb_gpu_params = {'colsample_bytree': [1.0], 'gamma': [9], 'learning_rate': [0.05], 'max_depth': [7], 'min_child_weight': [1], 'n_estimators': [400], 'predictor': ['cpu_predictor'], 'subsample': [1.0], 'tree_method': ['gpu_hist']}¶

Parameter grid for XGBClassifier (GPU). Please refer to the XGBoost API documentation for more information.

Type:	dictionary

pulsar_playground.parameters.xgb_params = {'colsample_bytree': [0.8], 'gamma': [5], 'learning_rate': [0.01], 'max_depth': [3], 'min_child_weight': [3], 'n_estimators': [400], 'subsample': [1]}¶

Parameter grid for XGBClassifier. Please refer to the XGBoost API documentation for more information.

Type:	dictionary

pulsar_playground.plots module¶

Plotting module for data visualization and ML metrics.

pulsar_playground.plots.dump_idx(y_pred_proba, threshold, filename='candidates.csv')[source]¶

Save indexes of examples predicted as positive.

Parameters:	y_pred_proba (array) – Predicted probability. threshold (float) – Decision threshold. filename (str) – Output file.
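A rough sketch of what such a function could do is given below. Whether the comparison is inclusive (>=) and how the file is laid out (one index per row, no header) are assumptions, and dump_idx_sketch is a stand-in name rather than the package's code.

```python
import csv
import os
import tempfile


def dump_idx_sketch(y_pred_proba, threshold, filename="candidates.csv"):
    """Write the positions whose predicted probability reaches the threshold."""
    # Assumption: a probability equal to the threshold counts as positive.
    idx = [i for i, p in enumerate(y_pred_proba) if p >= threshold]
    with open(filename, "w", newline="") as handle:
        csv.writer(handle).writerows([i] for i in idx)
    return idx


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "candidates.csv")
    positives = dump_idx_sketch([0.1, 0.8, 0.5, 0.95], threshold=0.5, filename=path)
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
```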

pulsar_playground.plots.plot_classprop(data, ax=None)[source]¶

Proportion of examples per class (pieplot).

Parameters:	data (DataFrame) – Pandas dataframe. ax (Axes) – Matplotlib subfigure axes.

pulsar_playground.plots.plot_cm(y_test, y_pred_proba, threshold, ax=None)[source]¶

Confusion matrix.

Parameters:	y_test (array) – Classes from the test split. y_pred_proba (array) – Predicted probability. threshold (float) – Decision threshold. ax (Axes) – Matplotlib subfigure axes.
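The numbers behind such a plot can be sketched without any plotting library: probabilities are turned into class predictions with the decision threshold and then counted against the true labels. The helper below is illustrative only and assumes labels are 0/1 and that a probability equal to the threshold is predicted positive.

```python
def confusion_counts(y_test, y_pred_proba, threshold):
    """Return [[TN, FP], [FN, TP]]: rows are true classes, columns are predictions."""
    tn = fp = fn = tp = 0
    for truth, proba in zip(y_test, y_pred_proba):
        pred = 1 if proba >= threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


matrix = confusion_counts([0, 0, 1, 1, 1], [0.2, 0.7, 0.4, 0.6, 0.9], threshold=0.5)
```

Raising the threshold moves examples from the predicted-positive column to the predicted-negative column, trading false positives for false negatives.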

pulsar_playground.plots.plot_ecdf(data, x_axis, ax=None)[source]¶

Plots the empirical cumulative distribution for each class.

Parameters:	data (DataFrame) – Pandas dataframe. x_axis (str) – Column name from dataframe. ax (Axes) – Matplotlib subfigure axes.
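The empirical cumulative distribution of a sample assigns to each sorted value the fraction of observations less than or equal to it. The following sketch computes one curve per class from (value, label) pairs; the data and helper names are made up for illustration.

```python
def ecdf(values):
    """Return sorted values and the cumulative fraction reached at each of them."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


# Hypothetical (feature value, class label) pairs.
rows = [(1.2, 0), (0.7, 0), (3.1, 1), (2.5, 1), (0.9, 0)]

curves = {}
for label in sorted({lab for _, lab in rows}):
    curves[label] = ecdf([value for value, lab in rows if lab == label])
```

Plotting each (xs, ys) pair as a step or scatter curve shows how well the feature separates the classes.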

pulsar_playground.plots.plot_fcorr(data, x_axis, y_axis, transform_x='none', transform_y='none', ax=None)[source]¶

Feature vs. feature plot (scatterplot).

Parameters:	data (DataFrame) – Pandas dataframe. x_axis (str) – Column name from dataframe. y_axis (str) – Column name from dataframe. transform_x (str) – Dictionary key from ‘tfs’ dict. transform_y (str) – Dictionary key from ‘tfs’ dict. ax (Axes) – Matplotlib subfigure axes.

pulsar_playground.plots.plot_hist(data, x_axis, bins=10, ax=None)[source]¶

Plots histograms for each class.

Parameters:	data (DataFrame) – Pandas dataframe. x_axis (str) – Column name from dataframe. bins (int) – Number of bins. ax (Axes) – Matplotlib subfigure axes.

pulsar_playground.plots.plot_info(data, ax=None)[source]¶

Summary of given dataframe.

Parameters:	data (DataFrame) – Pandas dataframe. ax (Axes) – Matplotlib subfigure axes.

pulsar_playground.plots.plot_nulls(data, ax=None)[source]¶

Percentage of null entries per feature (barplot).

Parameters:	data (DataFrame) – Pandas dataframe. ax (Axes) – Matplotlib subfigure axes.

pulsar_playground.plots.plot_prc(y_test, y_pred_proba, threshold, ax=None)[source]¶

Precision and recall vs. threshold curves.

Parameters:	y_test (array) – Classes from the test split. y_pred_proba (array) – Predicted probability. threshold (float) – Decision threshold. ax (Axes) – Matplotlib subfigure axes.
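The curves in this plot come from recomputing precision and recall at a range of thresholds. A minimal sketch of that computation is shown below; the function name, the 0/1 label encoding, and the convention of reporting precision as 1.0 when nothing is predicted positive are assumptions.

```python
def precision_recall_at(y_test, y_pred_proba, threshold):
    """Precision and recall when probabilities >= threshold are called positive."""
    tp = fp = fn = 0
    for truth, proba in zip(y_test, y_pred_proba):
        predicted_positive = proba >= threshold
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif truth == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for "no positives"
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_test = [0, 0, 1, 1, 1]
y_pred_proba = [0.2, 0.7, 0.4, 0.6, 0.9]
curve = [(t, *precision_recall_at(y_test, y_pred_proba, t)) for t in (0.3, 0.5, 0.8)]
```

On this toy data recall falls as the threshold rises, while precision ends at 1.0 once only the most confident prediction remains.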

pulsar_playground.utils module¶

Module for common tasks.

pulsar_playground.utils.get_n_params(model)[source]¶

Returns the total number of elements of a param grid.

Parameters:	model (str) – Dictionary key from the ‘model_dict’ dict in models.py.
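One plausible reading of "total number of elements" is the number of parameter combinations in the grid, that is, the product of the lengths of its value lists. The sketch below implements that reading on a grid passed directly rather than looked up by key; the actual function may count differently.

```python
from math import prod


def n_param_combinations(grid):
    """Number of distinct settings in a list-valued parameter grid."""
    return prod(len(values) for values in grid.values())


knn_params = {"n_neighbors": range(3, 12), "weights": ["uniform", "distance"]}
n_knn = n_param_combinations(knn_params)
```

A count like this is useful for deciding whether an exhaustive grid search is affordable or whether a randomized search with a fixed budget is the better choice.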

pulsar_playground.utils.make_sets(filename, test_size=0.3, random_state=42, stratify=True)[source]¶

Splits the dataset into two files: ‘train.csv’ and ‘test.csv’. Also binarizes the labels.

Parameters:	filename (str) – Input filename. test_size (float) – Test set ratio. random_state (int) – Random seed. stratify (bool) – Stratification by label.
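The two steps described here, label binarization and a stratified split, can be sketched with the standard library alone. The label names, the helper, and the rounding of the per-class test size are assumptions; the package's own function also reads and writes CSV files, which this sketch leaves out.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, random_state=42):
    """Return (train_idx, test_idx) so that each class keeps the same test ratio."""
    rng = random.Random(random_state)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical raw labels; binarize them so the positive class becomes 1.
raw_labels = ["noise"] * 20 + ["pulsar"] * 10
binary = [1 if label == "pulsar" else 0 for label in raw_labels]

train_idx, test_idx = stratified_split(binary, test_size=0.3, random_state=42)
```

Stratifying matters on imbalanced data: without it, a random split can leave the test set with too few positive examples to evaluate on.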