Auto Text Classification
In [ ]:
import os
import sys
os.chdir("../../")
In [ ]:
import warnings
warnings.filterwarnings("ignore")
In [ ]:
!pip install git+https://github.com/gradsflow/gradsflow@main
Installing collected packages: gradsflow
Successfully installed gradsflow-0.0.8
In [ ]:
!pip install "lightning-flash[text]" wandb
In [ ]:
from flash.core.data.utils import download_data
from flash.text import TextClassificationData
from gradsflow import AutoTextClassifier
import ray
In [ ]:
download_data("https://pl-flash-data.s3.amazonaws.com/imdb.zip", "./data/")
datamodule = TextClassificationData.from_csv(
    "review",
    "sentiment",
    train_file="data/imdb/train.csv",
    val_file="data/imdb/valid.csv",
    batch_size=64,
)
Using custom data configuration default-9e8b8625e23aa791
Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-9e8b8625e23aa791/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519...
Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-9e8b8625e23aa791/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519. Subsequent calls will reuse this data.
Using custom data configuration default-6a8d347e65f511f3
Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-6a8d347e65f511f3/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519...
Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-6a8d347e65f511f3/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519. Subsequent calls will reuse this data.
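The two positional arguments passed to `from_csv` above name the CSV columns that hold the input text (`review`) and the target label (`sentiment`). A minimal sketch, using only the standard library, of checking that a file has both columns before handing it to Flash. The helper `missing_columns` and the two-row sample are hypothetical and stand in for the real IMDB files:

```python
import csv
import io

# Column names taken from the from_csv call above.
REQUIRED_COLUMNS = ("review", "sentiment")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Hypothetical sample standing in for data/imdb/train.csv.
sample = io.StringIO("review,sentiment\nA fine film.,positive\nDull.,negative\n")
print(missing_columns(sample))
```

For a real file, the same helper could be called on `open("data/imdb/train.csv", newline="")`; an empty list means both columns are present.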
In [ ]:
suggested_conf = dict(
    optimizer=["adam", "adamw"],
    lr=(5e-4, 1e-3),
)
model = AutoTextClassifier(
    datamodule,
    suggested_backbones=["prajjwal1/bert-tiny"],
    suggested_conf=suggested_conf,
    max_epochs=4,
    optimization_metric="val_accuracy",
    n_trials=1,
)
print("AutoTextClassifier initialised!")
trainer_config = {"accelerator": "auto", "devices": 1}
model.hp_tune(gpu=1, trainer_config=trainer_config)
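The `suggested_conf` dictionary above defines the search space for each trial. The sketch below illustrates one plausible reading of it, under two assumptions that this notebook does not confirm: list values are treated as categorical choices, and the `lr` tuple is a `(low, high)` range sampled log-uniformly. It uses only the standard library and is not gradsflow's or Ray Tune's own code; `sample_config` is a hypothetical helper.

```python
import math
import random


def sample_config(suggested_conf, backbones, rng):
    """Draw one trial configuration from the search space (illustrative only)."""
    low, high = suggested_conf["lr"]
    # Log-uniform draw: sample uniformly in log space, then exponentiate.
    lr = math.exp(rng.uniform(math.log(low), math.log(high)))
    return {
        "backbone": rng.choice(backbones),
        "optimizer": rng.choice(suggested_conf["optimizer"]),
        "lr": lr,
    }


suggested_conf = dict(optimizer=["adam", "adamw"], lr=(5e-4, 1e-3))
config = sample_config(suggested_conf, ["prajjwal1/bert-tiny"], random.Random(0))
print(config)
```

With `n_trials=1`, only one such configuration is evaluated, so raising `n_trials` is what actually lets the tuner compare optimizers and learning rates.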
In [ ]:
model.analysis.dataframe()
Out[ ]:
 | val_accuracy | train_accuracy | time_this_iter_s | should_checkpoint | done | timesteps_total | episodes_total | training_iteration | trial_id | experiment_id | ... | hostname | node_ip | time_since_restore | timesteps_since_restore | iterations_since_restore | warmup_time | config/backbone | config/lr | config/optimizer | logdir |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.8004 | 0.984375 | 30.232423 | True | False | NaN | NaN | 4 | f3e66_00000 | 7ada6e549a4e4a7fbdfddadc99012877 | ... | b206c2fd5e9b | 172.28.0.2 | 139.360543 | 0 | 4 | 0.007067 | prajjwal1/bert-tiny | 0.000919 | adam | /root/ray_results/optimization_objective_2022-... |
1 rows × 24 columns
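Each row of this table is one trial, with the sampled hyperparameters in the `config/...` columns. A minimal sketch of picking the best trial by the optimization metric, written against plain dictionaries so it runs without pandas. The helper `best_trial` is hypothetical, and the single row copies the values shown above; with a real DataFrame, the rows could be obtained with `df.to_dict("records")`:

```python
def best_trial(rows, metric="val_accuracy"):
    """Return the row with the highest value of `metric`."""
    return max(rows, key=lambda row: row[metric])


# Values copied from the single trial in the table above.
rows = [
    {
        "trial_id": "f3e66_00000",
        "val_accuracy": 0.8004,
        "config/backbone": "prajjwal1/bert-tiny",
        "config/lr": 0.000919,
        "config/optimizer": "adam",
    },
]

best = best_trial(rows)
print(best["trial_id"], best["config/optimizer"], best["config/lr"])
```

With a single trial the choice is trivial; the helper becomes useful once `n_trials` is greater than one.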
In [ ]:
from flash import Trainer
trainer = Trainer(accelerator="auto", devices=1)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
In [ ]:
trainer.validate(model.model, datamodule=datamodule)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation: 0it [00:00, ?it/s]
Validate metric | DataLoader 0 |
---|---|
val_accuracy | 0.8787053227424622 |
val_cross_entropy | 0.410677433013916 |
Out[ ]:
[{'val_accuracy': 0.8787053227424622, 'val_cross_entropy': 0.410677433013916}]
Last update:
May 18, 2022