pulsar_playground package
Submodules
pulsar_playground.models module
Module for defining models based on the parameters.py file.
-
pulsar_playground.models.keras_model(n, m, input_dim, drop_visible, drop_hidden)
Function to build a sequential neural network.
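Parameters: - n (int) – Number of hidden layers (network depth).
- m (int) – Number of units per hidden layer (network width).
- input_dim (int) – Length of the feature vector.
- drop_visible (float) – Dropout rate for the input (visible) layer.
- drop_hidden (float) – Dropout rate for the hidden layers.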
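To illustrate how n and m shape the network, the sketch below lists the layers such a builder might stack. It is plain Python rather than Keras, and the helper name layer_plan, the ReLU/sigmoid activations, and the single output unit are all assumptions that are not taken from the package source.

```python
def layer_plan(n, m, input_dim, drop_visible, drop_hidden):
    """Describe a sequential binary classifier as a list of layer specs.

    Hypothetical helper: the layout is an assumption, not the actual
    keras_model implementation.
    """
    layers = [("input", input_dim)]
    if drop_visible > 0:
        layers.append(("dropout", drop_visible))  # dropout on the inputs
    for _ in range(n):                            # n hidden layers (depth)
        layers.append(("dense_relu", m))          # m units each (width)
        if drop_hidden > 0:
            layers.append(("dropout", drop_hidden))
    layers.append(("dense_sigmoid", 1))           # assumed single output unit
    return layers


plan = layer_plan(n=2, m=12, input_dim=8, drop_visible=0.0, drop_hidden=0.1)
```

Here plan holds two dense layers of 12 units, each followed by a dropout entry, between the input and the output layer.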
-
pulsar_playground.models.model_dict
Stores the available models. Each key maps to a tuple of (estimator, parameter grid):
- 'ann'
  estimator: <keras.wrappers.scikit_learn.KerasClassifier object>
  grid: {'n': [1, 2], 'm': [12, 14], 'input_dim': [8], 'epochs': [10], 'batch_size': [100], 'drop_visible': [0.0], 'drop_hidden': [0.0, 0.1, 0.2], 'verbose': [0], 'callbacks': [[<keras.callbacks.EarlyStopping object>]]}
- 'knn'
  estimator: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform')
  grid: {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}
- 'lgr'
  estimator: LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False)
  grid: {'penalty': ['l1', 'l2'], 'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'solver': ['liblinear'], 'max_iter': [200]}
- 'xgb'
  estimator: XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1)
  grid: {'n_estimators': [400], 'max_depth': [3], 'min_child_weight': [3], 'gamma': [5], 'colsample_bytree': [0.8], 'learning_rate': [0.01], 'subsample': [1]}
- 'xgb_gpu'
  estimator: XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=True, subsample=1)
  grid: {'tree_method': ['gpu_hist'], 'predictor': ['cpu_predictor'], 'n_estimators': [400], 'max_depth': [7], 'min_child_weight': [1], 'gamma': [9], 'learning_rate': [0.05], 'colsample_bytree': [1.0], 'subsample': [1.0]}
Type: dictionary
pulsar_playground.parameters module
Parameters for preprocessing and fine-tuning models.
-
pulsar_playground.parameters.ann_params = {'batch_size': [100], 'callbacks': [[<keras.callbacks.EarlyStopping object>]], 'drop_hidden': [0.0, 0.1, 0.2], 'drop_visible': [0.0], 'epochs': [10], 'input_dim': [8], 'm': [12, 14], 'n': [1, 2], 'verbose': [0]}
Parameter grid for KerasClassifier. If “rotate” is True, “input_dim” should match “n_components”; otherwise it must equal the number of features. Please refer to the Keras documentation for more information.
Type: dictionary
-
pulsar_playground.parameters.disable_warnings = True
Disable warnings.
Type: bool
-
pulsar_playground.parameters.knn_params = {'n_neighbors': range(3, 12), 'weights': ['uniform', 'distance']}
Parameter grid for KNeighborsClassifier. Please refer to the scikit-learn documentation for more information.
Type: dictionary
-
pulsar_playground.parameters.lgr_params = {'C': array([0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44, 0.45]), 'class_weight': [None, 'balanced'], 'max_iter': [200], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
Parameter grid for LogisticRegression. Please refer to the scikit-learn documentation for more information.
Type: dictionary
-
pulsar_playground.parameters.n_iter = 100
Maximum number of iterations for RandomizedSearchCV.
Type: integer
-
pulsar_playground.parameters.oversample = True
Use SMOTE to fix class imbalance.
Type: bool
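For readers unfamiliar with SMOTE, the sketch below shows its core idea in plain Python: each synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours. The function name smote_sample and its arguments are hypothetical, and the package presumably relies on a library implementation rather than code like this.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_sample(minority)
```

Because every new point lies between two existing minority samples, the synthetic data stays inside the region the minority class already occupies.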
-
pulsar_playground.parameters.scale = True
Standardize features with StandardScaler.
Type: bool
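Standardization rescales each feature to zero mean and unit variance. A minimal sketch of the per-column arithmetic, using only the standard library (the helper name standardize is hypothetical, and a constant column would need a guard against division by zero):

```python
import statistics


def standardize(column):
    """Rescale values to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std (ddof=0), as StandardScaler uses
    return [(x - mean) / std for x in column]


scaled = standardize([2.0, 4.0, 6.0])
```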
-
pulsar_playground.parameters.searchargs = {'cv': 3, 'n_jobs': -1, 'scoring': 'accuracy', 'verbose': 2}
Extra arguments for GridSearchCV/RandomizedSearchCV.
Type: dictionary
-
pulsar_playground.parameters.xgb_gpu_params = {'colsample_bytree': [1.0], 'gamma': [9], 'learning_rate': [0.05], 'max_depth': [7], 'min_child_weight': [1], 'n_estimators': [400], 'predictor': ['cpu_predictor'], 'subsample': [1.0], 'tree_method': ['gpu_hist']}
Parameter grid for XGBClassifier (GPU). Please refer to the XGBoost API documentation for more information.
Type: dictionary
-
pulsar_playground.parameters.xgb_params = {'colsample_bytree': [0.8], 'gamma': [5], 'learning_rate': [0.01], 'max_depth': [3], 'min_child_weight': [3], 'n_estimators': [400], 'subsample': [1]}
Parameter grid for XGBClassifier. Please refer to the XGBoost API documentation for more information.
Type: dictionary
pulsar_playground.plots module
Plotting module for data visualization and ML metrics.
-
pulsar_playground.plots.dump_idx(y_pred_proba, threshold, filename='candidates.csv')
Save indexes of examples predicted as positive.
Parameters: - y_pred_proba (array) – Predicted probability.
- threshold (float) – Decision threshold.
- filename (str) – Output file.
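A minimal sketch of what such a dump could look like, using only the standard library. The helper name dump_positive_indexes, the `>=` comparison, and the header row are assumptions; the actual dump_idx may differ on each of these points.

```python
import csv


def dump_positive_indexes(y_pred_proba, threshold, filename="candidates.csv"):
    """Write the row indexes whose predicted probability reaches the threshold."""
    # Assumption: a prediction counts as positive when proba >= threshold.
    idx = [i for i, p in enumerate(y_pred_proba) if p >= threshold]
    with open(filename, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["index"])  # header name is an assumption
        writer.writerows([i] for i in idx)
    return idx
```

Calling dump_positive_indexes([0.1, 0.9, 0.5, 0.7], 0.5) would write the indexes 1, 2 and 3, one per line, to candidates.csv.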
-
pulsar_playground.plots.plot_classprop(data, ax=None)
Proportion of examples per class (pie chart).
Parameters: - data (DataFrame) – Pandas dataframe.
- ax (Axes) – Matplotlib subfigure axes.
-
pulsar_playground.plots.plot_cm(y_test, y_pred_proba, threshold, ax=None)
Confusion matrix.
Parameters: - y_test (array) – Classes from the test split.
- y_pred_proba (array) – Predicted probability.
- threshold (float) – Decision threshold.
- ax (Axes) – Matplotlib subfigure axes.
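The confusion matrix depends on the decision threshold because the predicted probabilities are first turned into class labels. The sketch below shows that step with plain Python counters; the helper name confusion_counts and the `>=` convention are assumptions, and the example data is made up.

```python
def confusion_counts(y_test, y_pred_proba, threshold):
    """Return (tn, fp, fn, tp) after thresholding the probabilities."""
    tn = fp = fn = tp = 0
    for truth, proba in zip(y_test, y_pred_proba):
        pred = 1 if proba >= threshold else 0  # assumed >= convention
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


y_test = [0, 0, 1, 1, 1]
y_pred_proba = [0.2, 0.6, 0.4, 0.7, 0.9]
counts = confusion_counts(y_test, y_pred_proba, threshold=0.5)
```

Raising the threshold moves examples from the predicted-positive column to the predicted-negative column, trading false positives for false negatives.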
-
pulsar_playground.plots.plot_ecdf(data, x_axis, ax=None)
Plots the empirical cumulative distribution for each class.
Parameters: - data (DataFrame) – Pandas dataframe.
- x_axis (str) – Column name from dataframe.
- ax (Axes) – Matplotlib subfigure axes.
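The empirical cumulative distribution of a column can be computed by sorting the values and assigning each sorted position its cumulative fraction. A minimal sketch (the helper name ecdf is hypothetical; to get one curve per class, it would be applied to the rows of each class separately):

```python
def ecdf(values):
    """Return the sorted values and the cumulative fraction at each position."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


xs, ys = ecdf([3.0, 1.0, 2.0, 4.0])
```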
-
pulsar_playground.plots.plot_fcorr(data, x_axis, y_axis, transform_x='none', transform_y='none', ax=None)
Feature vs. feature plot (scatter plot).
Parameters: - data (DataFrame) – Pandas dataframe.
- x_axis (str) – Column name from dataframe.
- y_axis (str) – Column name from dataframe.
- transform_x (str) – Dictionary key from ‘tfs’ dict.
- transform_y (str) – Dictionary key from ‘tfs’ dict.
- ax (Axes) – Matplotlib subfigure axes.
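The contents of the ‘tfs’ dict are not shown here, so the sketch below only illustrates the general pattern of looking up a transform by key. Only the 'none' key is confirmed by the default arguments; the 'log10' and 'sqrt' entries and the helper apply_transform are hypothetical.

```python
import math

# Only 'none' is confirmed by the defaults of plot_fcorr; the other
# entries are hypothetical examples of what such a table could hold.
tfs = {
    "none": lambda v: v,
    "log10": math.log10,
    "sqrt": math.sqrt,
}


def apply_transform(values, key="none"):
    """Apply the transform registered under `key` to every value."""
    return [tfs[key](v) for v in values]


xs = apply_transform([1.0, 100.0], "log10")
```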
-
pulsar_playground.plots.plot_hist(data, x_axis, bins=10, ax=None)
Plots histograms for each class.
Parameters: - data (DataFrame) – Pandas dataframe.
- x_axis (str) – Column name from dataframe.
- bins (int) – Number of bins.
- ax (Axes) – Matplotlib subfigure axes.
-
pulsar_playground.plots.plot_info(data, ax=None)
Summary of given dataframe.
Parameters: - data (DataFrame) – Pandas dataframe.
- ax (Axes) – Matplotlib subfigure axes.
-
pulsar_playground.plots.plot_nulls(data, ax=None)
Percentage of null entries per feature (bar plot).
Parameters: - data (DataFrame) – Pandas dataframe.
- ax (Axes) – Matplotlib subfigure axes.
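The plotted quantity is the share of missing values in each column. A standard-library sketch of that calculation follows; the helper name null_percentages and the column names are made up for illustration. With a pandas DataFrame the same figures are typically obtained with data.isnull().mean() * 100.

```python
def null_percentages(columns):
    """Map each column name to the percentage of None entries it holds."""
    return {
        name: 100.0 * sum(v is None for v in values) / len(values)
        for name, values in columns.items()
    }


# Hypothetical two-column table with one missing entry.
table = {"mean_ip": [1.0, None, 3.0, 4.0], "std_ip": [0.5, 0.6, 0.7, 0.8]}
pct = null_percentages(table)
```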
-
pulsar_playground.plots.plot_prc(y_test, y_pred_proba, threshold, ax=None)
Precision and recall vs. threshold curves.
Parameters: - y_test (array) – Classes from the test split.
- y_pred_proba (array) – Predicted probability.
- threshold (float) – Decision threshold.
- ax (Axes) – Matplotlib subfigure axes.
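Each point on these curves is the precision and recall obtained at one threshold. The sketch below computes a few such points in plain Python; the helper name precision_recall, the `>=` convention, the fallback values for empty denominators, and the example data are assumptions.

```python
def precision_recall(y_test, y_pred_proba, threshold):
    """Return (precision, recall) for the given decision threshold."""
    pairs = list(zip(y_test, y_pred_proba))
    tp = sum(1 for t, p in pairs if t == 1 and p >= threshold)
    fp = sum(1 for t, p in pairs if t == 0 and p >= threshold)
    fn = sum(1 for t, p in pairs if t == 1 and p < threshold)
    # Fallbacks for empty denominators are a convention chosen here.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_test = [0, 0, 1, 1, 1]
y_pred_proba = [0.2, 0.6, 0.4, 0.7, 0.9]
curve = [
    (th, *precision_recall(y_test, y_pred_proba, th)) for th in (0.3, 0.5, 0.8)
]
```

In this toy data, raising the threshold from 0.3 to 0.8 lifts precision from 0.75 to 1.0 while recall falls from 1.0 to one third, which is the trade-off the plot is meant to expose.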
pulsar_playground.utils module
Module for common tasks.
-
pulsar_playground.utils.get_n_params(model)
Returns the total number of elements of a param grid.
Parameters: model (str) – Key of ‘model_dict’ in models.py.
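If “total number of elements” is read as the number of distinct parameter combinations (an assumption), the count is the product of the lengths of the value lists in the grid. A minimal sketch, with the hypothetical helper name count_grid_combinations:

```python
import math


def count_grid_combinations(grid):
    """Number of distinct parameter combinations in a grid of value lists."""
    return math.prod(len(values) for values in grid.values())


knn_params = {"n_neighbors": range(3, 12), "weights": ["uniform", "distance"]}
n_candidates = count_grid_combinations(knn_params)  # 9 * 2 combinations

# One plausible use (an assumption): cap the randomized-search budget
# at the size of the grid.
n_iter = min(100, n_candidates)
```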
-
pulsar_playground.utils.make_sets(filename, test_size=0.3, random_state=42, stratify=True)
Splits the dataset into two files: ‘train.csv’ and ‘test.csv’. Also binarizes the labels.
Parameters: - filename (str) – Input filename.
- test_size (float) – Fraction of the dataset assigned to the test set.
- random_state (int) – Random seed.
- stratify (bool) – Whether to stratify the split by label.
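Stratifying by label means each class contributes the same fraction of its rows to the test set, which matters when one class is rare. The sketch below shows the idea on row indexes using only the standard library; the helper name stratified_split is hypothetical, and the package may well delegate this step to a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, random_state=42):
    """Return (train_idx, test_idx) keeping class proportions in both parts."""
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indexes in by_class.values():
        rng.shuffle(indexes)
        n_test = round(len(indexes) * test_size)  # per-class test quota
        test_idx.extend(indexes[:n_test])
        train_idx.extend(indexes[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels)
```

With ten rows per class and test_size=0.3, three rows of each class end up in the test split and seven in the training split.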