Quickstart
Install the Python package (Python >=3.6 supported):
pip install compactem
Note
Installing LightGBM, which our library depends on, can run into issues on Mac. See here and here.
Let’s get started with this short example:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
from compactem.oracles import get_calibrated_gbm
from compactem.main import compact_using_oracle
from compactem.utils.data_format import DataInfo
from compactem.model_builder import DecisionTree
import pandas as pd
pd.options.display.float_format = '{:,.2f}'.format
# use small N, T for quick results
N, T = 1000, 50
X, y = load_digits(return_X_y=True)
X, _, y, _ = train_test_split(X, y, train_size=N, stratify=y, random_state=0)
dataset_info = DataInfo("digits", (X, y), [3, 4, 5], evals=T)
# if you run this a second time on the same task_dir you might want to set "overwrite=True"
aggr_results_df = compact_using_oracle(datasets_info=dataset_info,
                                       model_builder_class=DecisionTree,
                                       oracle=get_calibrated_gbm,
                                       task_dir=r'output/quickstart')
print("Result summary:")
print(aggr_results_df[['dataset_name', 'complexity', 'avg_original_score',
                       'avg_new_score', 'pct_improvement']])
Here’s the output:
|   | dataset_name | complexity | avg_original_score | avg_new_score | pct_improvement |
|---|---|---|---|---|---|
| 0 | digits | 3 | 0.39 | 0.46 | 18.25 |
| 1 | digits | 4 | 0.55 | 0.58 | 6.87 |
| 2 | digits | 5 | 0.70 | 0.71 | 1.28 |
You will likely not see these exact numbers, but if a table is displayed on the console, congratulations, it’s alive!
Here’s what happened in the above example:
We wanted to compact decision trees of certain sizes …
… using Gradient Boosted Decision Trees as the oracle.
Since our algorithm is iterative, we also provided a budget of iterations (evals=T in the example).
The pct_improvement column shows how much the oracle-guided scores, avg_new_score, improve over the original scores, avg_original_score, for a given model complexity.
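To make the relationship between the three score columns concrete, here is a minimal sketch that assumes pct_improvement is the relative gain of avg_new_score over avg_original_score, expressed in percent. That formula is an assumption and not a statement about how the library computes the column; the helper name `pct_improvement` is hypothetical, and the inputs are the rounded scores from the summary table above.

```python
def pct_improvement(original_score: float, new_score: float) -> float:
    """Relative gain of new_score over original_score, in percent.

    Assumed formula (not taken from the library):
        100 * (new - original) / original
    """
    if original_score == 0:
        raise ValueError("original_score must be non-zero")
    return 100.0 * (new_score - original_score) / original_score


# Rounded values from the summary table: (complexity, original, new)
rows = [(3, 0.39, 0.46), (4, 0.55, 0.58), (5, 0.70, 0.71)]

# Map each complexity to its assumed relative improvement
improvements = {c: pct_improvement(orig, new) for c, orig, new in rows}

for complexity, value in improvements.items():
    print(f"complexity={complexity}: {value:.2f}%")
```

This will not reproduce the pct_improvement column exactly: the displayed scores are rounded to two decimals, and the library may aggregate improvements differently (for example, per run before averaging). The trend is the same, though: the gain is largest for the smallest model and shrinks as complexity grows.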
You can also obtain the instances the model selectively trained on - see Additional Stuff.