from fastbook import *
Data
fastai
Notes on the DataBlock API.
Data In fastai
One of the most important things to understand in fastai is how to prepare your data for a model. The main workhorse for this is the DataBlock API. Here is a hello world example of how it works:
Hello World DataBlock
The arguments get_x and get_y are functions that are applied to each item of an iterable. Let's define an iterable as our data:
data = list(range(100))
def get_x(r): return r
def get_y(r): return r + 10
dblock = DataBlock(get_x=get_x, get_y=get_y)
dsets = dblock.datasets(data)
You can see a dataset like so:
dsets.train[0]
(89, 99)
You can also see a DataLoader like so:
dls = dblock.dataloaders(data, bs=5)
next(iter(dls.train))
(tensor([57, 66, 73, 30, 14]), tensor([67, 76, 83, 40, 24]))
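To make the mechanics concrete, here is a minimal pure-Python sketch of what the example above does conceptually: apply get_x and get_y to every item, split the samples into training and validation subsets, and group training samples into batches. The helper names (make_samples, random_split, batches), the fixed seed, and the 20% validation fraction are illustrative assumptions, not fastai's API or implementation.

```python
import random

def get_x(r): return r
def get_y(r): return r + 10

def make_samples(items, get_x, get_y):
    # One (x, y) tuple per item, similar to what indexing a dataset returns
    return [(get_x(i), get_y(i)) for i in items]

def random_split(samples, valid_pct=0.2, seed=0):
    # Hypothetical splitter: shuffle a copy, hold out the tail for validation
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_pct)
    cut = len(shuffled) - n_valid
    return shuffled[:cut], shuffled[cut:]

def batches(samples, bs):
    # Collate bs samples at a time into an (xs, ys) pair
    for i in range(0, len(samples), bs):
        xs, ys = zip(*samples[i:i + bs])
        yield xs, ys

data = list(range(100))
train, valid = random_split(make_samples(data, get_x, get_y))
xs, ys = next(batches(train, bs=5))
print(xs, ys)
```

Each y in the printed batch is its x plus 10, which is the same relationship visible in the tensors above.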
With A DataFrame
Similarly, when the data is a DataFrame, get_x and get_y receive one row at a time:
import pandas as pd
df = pd.DataFrame({'x': range(100), 'y': range(100)})
df.head()
| | x | y |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
def get_x(r): return r.x
def get_y(r): return r.y + 10
dblock = DataBlock(get_x=get_x, get_y=get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(78, 88)
dls = dblock.dataloaders(df, bs=3)
next(iter(dls.train))
(tensor([90, 55, 11]), tensor([100, 65, 21]))
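The row-at-a-time behavior can be illustrated without pandas or fastai. In this sketch a namedtuple called Row is a hypothetical stand-in for a DataFrame row, chosen only because it supports the same attribute access (r.x, r.y) that the getters rely on.

```python
from collections import namedtuple

# Hypothetical stand-in for a DataFrame row with columns x and y
Row = namedtuple("Row", ["x", "y"])

def get_x(r): return r.x
def get_y(r): return r.y + 10

rows = [Row(x=i, y=i) for i in range(100)]

# Each getter is called once per row, producing one (x, y) sample
samples = [(get_x(r), get_y(r)) for r in rows]
print(samples[78])  # (78, 88)
```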
def tracer(nm):
    def f(x, nm):
        # print(f'{nm}:')
        # print(f'\tinput: {x}')
        # import ipdb; ipdb.set_trace()
        return str(x)
    return partial(f, nm=nm)
def mult_0(x): return x * 0
def add_1(x): return x + 1
tb = TransformBlock(item_tfms=[tracer('item_tfms')])
# def get_y(l): return sum(l)
db = DataBlock(blocks=(TransformBlock, TransformBlock),
               get_x=mult_0,
               get_y=add_1,
               item_tfms=lambda x: str(x))
data = L(range(10))
result = db.datasets(data)
db.summary(data)
Setting-up type transforms pipelines
Collecting items from [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Found 10 items
2 datasets of sizes 8,2
Setting up Pipeline: mult_0
Setting up Pipeline: add_1
Building one sample
Pipeline: mult_0
starting from
1
applying mult_0 gives
0
Pipeline: add_1
starting from
1
applying add_1 gives
2
Final sample: (0, 2)
Collecting items from [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Found 10 items
2 datasets of sizes 8,2
Setting up Pipeline: mult_0
Setting up Pipeline: add_1
Setting up after_item: Pipeline: <lambda> -> ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline:
Building one batch
Applying item_tfms to the first sample:
Pipeline: <lambda> -> ToTensor
starting from
(0, 2)
applying <lambda> gives
(0, 2)
applying ToTensor gives
(0, 2)
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
No batch_tfms to apply
result.train[0]
(0, 5)
result = db.dataloaders(data, bs=3)
thing = iter(result.train)
next(thing)
(('0', '0', '0'), ('6', '7', '4'))
next(thing)
(('0', '0', '0'), ('9', '5', '3'))
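The batches above are consistent with a simple ordering: the getters build an (x, y) sample from each item, the item transform is then applied, and finally the samples are collated into a batch. The pure-Python sketch below mimics that ordering. It assumes the item transform is applied to each element of the sample tuple, which matches the string outputs shown above; the helpers build_sample, apply_item_tfm, and collate are hypothetical names, not fastai functions.

```python
def mult_0(x): return x * 0
def add_1(x): return x + 1

def build_sample(item, get_x, get_y):
    # Step 1: the getters turn one raw item into an (x, y) tuple
    return get_x(item), get_y(item)

def apply_item_tfm(sample, tfm):
    # Step 2: the item transform runs on each element of the tuple (assumption)
    return tuple(tfm(o) for o in sample)

def collate(samples):
    # Step 3: group all xs together and all ys together
    return tuple(zip(*samples))

items = [5, 6, 3]
samples = [apply_item_tfm(build_sample(i, mult_0, add_1), str) for i in items]
batch = collate(samples)
print(batch)  # (('0', '0', '0'), ('6', '7', '4'))
```

With items 5, 6, and 3 the sketch reproduces the first batch shown above: every x is '0' because of mult_0, and every y is the item plus one, converted to a string.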
??TransformBlock
db = DataBlock(blocks=(TransformBlock, tb),
               get_y=lambda x: str(x),
               batch_tfms=tracer('batch_tfms'))
result = db.datasets(data)
result = db.dataloaders(data, bs=3)
result
<fastai.data.core.DataLoaders>
thing = iter(result.train)
next(thing)
(('1', '5', '6'), ('1', '5', '6'))
f = aug_transforms()[0]
f
Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5}:
encodes: (TensorImage,object) -> encodes
(TensorMask,object) -> encodes
(TensorBBox,object) -> encodes
(TensorPoint,object) -> encodes
decodes: