from fastbook import *
Data
fastai
Notes on the DataBlock API.
Data In fastai
One of the most important things to understand in fastai is how to prepare your data for a model. The main workhorse for this is the DataBlock API. Here is a hello world example of how it works:
Hello World DataBlock
The arguments get_x and get_y are functions that are applied to each item of an iterable. Let’s define an iterable as our data:

data = list(range(100))

def get_x(r): return r
def get_y(r): return r + 10
dblock = DataBlock(get_x=get_x, get_y = get_y)
dsets = dblock.datasets(data)

You can see a dataset like so:

dsets.train[0]
(89, 99)
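To make the role of get_x and get_y concrete, here is a minimal pure-Python sketch of what building the datasets amounts to: split the items, then call both functions on every item. The helper `make_datasets` is hypothetical and is not fastai's implementation. The 80/20 random split is an assumption, chosen to match the `2 datasets of sizes 8,2` line in the `db.summary` output later in these notes.

```python
import random

def get_x(r): return r
def get_y(r): return r + 10

def make_datasets(items, get_x, get_y, valid_pct=0.2, seed=0):
    """Hypothetical stand-in for DataBlock.datasets (not fastai code)."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)      # random split, assumed 80/20
    cut = int(len(items) * valid_pct)
    valid_idxs, train_idxs = idxs[:cut], idxs[cut:]

    # Each sample is the pair (get_x(item), get_y(item)).
    def pairs(ids):
        return [(get_x(items[i]), get_y(items[i])) for i in ids]

    return pairs(train_idxs), pairs(valid_idxs)

data = list(range(100))
train, valid = make_datasets(data, get_x, get_y)
print(len(train), len(valid))   # 80 20
print(train[0])                 # some (x, x + 10) pair
```

Which item lands at `train[0]` depends on the shuffle, which is why the real output above is `(89, 99)` rather than `(0, 10)`.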
You can also see a DataLoader like so:
dls = dblock.dataloaders(data, bs=5)
next(iter(dls.train))
(tensor([57, 66, 73, 30, 14]), tensor([67, 76, 83, 40, 24]))
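The batch above is a pair of tensors, one holding five x values and one holding the matching y values. A rough sketch of that reshaping, using plain lists in place of tensors, might look like the following. The `batches` helper is hypothetical, and the shuffling and dropping of a short final batch are assumptions about how a training loader behaves, not something these notes confirm.

```python
import random

def batches(samples, bs, seed=0):
    """Hypothetical helper: shuffle, chunk, and transpose (x, y) samples."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    # Stop early so that a short final chunk is dropped.
    for i in range(0, len(samples) - bs + 1, bs):
        chunk = samples[i:i + bs]
        xs, ys = zip(*chunk)          # [(x, y), ...] -> (xs, ys)
        yield list(xs), list(ys)

samples = [(x, x + 10) for x in range(80)]
xs, ys = next(batches(samples, bs=5))
print(xs)
print(ys)   # each entry is the matching x plus 10
```

The key step is the transpose: a list of `(x, y)` samples becomes one collection of xs and one collection of ys.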
With A DataFrame
Similarly, when the data is a DataFrame, get_x and get_y receive one row at a time:
import pandas as pd
df = pd.DataFrame({'x': range(100), 'y': range(100) })
df.head()

|   | x | y |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
def get_x(r): return r.x
def get_y(r): return r.y + 10
dblock = DataBlock(get_x=get_x, get_y=get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(78, 88)
dls = dblock.dataloaders(df, bs=3)
next(iter(dls.train))
(tensor([90, 55, 11]), tensor([100, 65, 21]))
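The row-at-a-time behavior can be illustrated without pandas or fastai. In this sketch a `namedtuple` called `Row` is a hypothetical stand-in for a DataFrame row, since both support attribute access such as `r.x`. It only mirrors the idea and says nothing about how fastai iterates over a DataFrame internally.

```python
from collections import namedtuple

# Hypothetical stand-in for a DataFrame row with columns x and y.
Row = namedtuple("Row", ["x", "y"])
rows = [Row(x=i, y=i) for i in range(100)]

def get_x(r): return r.x
def get_y(r): return r.y + 10

# Apply both getters to every row, one row at a time.
samples = [(get_x(r), get_y(r)) for r in rows]
print(samples[78])   # (78, 88)
```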
def tracer(nm):
    def f(x, nm):
        # print(f'{nm}:')
        # print(f'\tinput: {x}')
        # import ipdb; ipdb.set_trace()
        return str(x)
    return partial(f, nm=nm)

def mult_0(x): return x * 0
def add_1(x): return x + 1
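The tracer above has its print statements commented out, so it only converts its input to a string. As a sketch of how such a tracer could be made observable, the variant below records each call in a list instead of printing. The `log` parameter is an addition for illustration and is not part of the original function.

```python
from functools import partial

def tracer(nm, log):
    """Return a function that records its input under the label nm."""
    def f(x, nm):
        log.append((nm, x))   # record the call so it can be inspected later
        return str(x)
    return partial(f, nm=nm)

calls = []
trace_item = tracer('item_tfms', calls)
out = trace_item(3)
print(repr(out))   # '3'
print(calls)       # [('item_tfms', 3)]
```

Using `partial` fixes the label at creation time, so the returned function takes only the item, which is the shape a transform is expected to have.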
tb = TransformBlock(item_tfms=[tracer('item_tfms')])
# def get_y(l): return sum(l)
db = DataBlock(blocks=(TransformBlock, TransformBlock),
               get_x=mult_0,
               get_y=add_1,
               item_tfms=lambda x: str(x))

data = L(range(10))
result = db.datasets(data)

db.summary(data)
Setting-up type transforms pipelines
Collecting items from [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Found 10 items
2 datasets of sizes 8,2
Setting up Pipeline: mult_0
Setting up Pipeline: add_1
Building one sample
Pipeline: mult_0
starting from
1
applying mult_0 gives
0
Pipeline: add_1
starting from
1
applying add_1 gives
2
Final sample: (0, 2)
Collecting items from [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Found 10 items
2 datasets of sizes 8,2
Setting up Pipeline: mult_0
Setting up Pipeline: add_1
Setting up after_item: Pipeline: <lambda> -> ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline:
Building one batch
Applying item_tfms to the first sample:
Pipeline: <lambda> -> ToTensor
starting from
(0, 2)
applying <lambda> gives
(0, 2)
applying ToTensor gives
(0, 2)
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
No batch_tfms to apply
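The first half of the summary shows the same item passing through two separate pipelines, one for x and one for y, before the results are paired into a sample. A small sketch that reproduces that trace in plain Python follows. The `run_pipeline` helper is hypothetical and only imitates the printed steps, not fastai's `Pipeline` class.

```python
def mult_0(x): return x * 0
def add_1(x): return x + 1

def run_pipeline(fns, item, name):
    """Hypothetical helper: apply fns in order, printing each step."""
    print(f"Pipeline {name}: starting from {item}")
    for fn in fns:
        item = fn(item)
        print(f"  applying {fn.__name__} gives {item}")
    return item

item = 1
sample = (run_pipeline([mult_0], item, "x"),
          run_pipeline([add_1], item, "y"))
print("Final sample:", sample)   # Final sample: (0, 2)
```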
result.train[0]
(0, 5)

result = db.dataloaders(data, bs=3)
thing = iter(result.train)
next(thing)
(('0', '0', '0'), ('6', '7', '4'))

next(thing)
(('0', '0', '0'), ('9', '5', '3'))
??TransformBlock

db = DataBlock(blocks=(TransformBlock, tb),
               get_y=lambda x: str(x),
               batch_tfms=tracer('batch_tfms'))
result = db.datasets(data)

result = db.dataloaders(data, bs=3)
result
<fastai.data.core.DataLoaders>

thing = iter(result.train)
next(thing)
(('1', '5', '6'), ('1', '5', '6'))
f = aug_transforms()[0]
f
Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5}:
encodes: (TensorImage,object) -> encodes
(TensorMask,object) -> encodes
(TensorBBox,object) -> encodes
(TensorPoint,object) -> encodes
decodes: