A Practical Guide to Deep Learning: From Data to Deployment

Access and Collect Labeled Data for Deep Learning

Why Is So Much Training Data Necessary?

Deep learning networks learn to classify abstract patterns without any prior experience or existing knowledge to draw from. Traditional methods rely on human domain knowledge, for example in hand-engineered features; a deep learning network has to make up for that missing knowledge with more training data. Your network will be only as good as the labeled data you provide. Several methods exist for acquiring labeled data.

Collect Your Own Data

You can build a database from scratch by collecting data from sensors. This is a good option in some cases, such as with autonomous vehicles, because so many vehicles are already on the road and can serve as data sources. Collecting your own data seems straightforward at first, but you also need to collect data that covers the entire solution space, and then label all of it.

Access and Augment Existing Data

You may find all the required labeled data in an existing database. For example, for image classification you can use ImageNet. If an existing database doesn’t contain all the training data you need, you can augment the data set by adding modified copies of its samples, such as speech with adjusted frequency or images that have been scaled and rotated.
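As a rough illustration of augmentation, the sketch below produces modified copies of one labeled image using only NumPy. The function name `augment_image`, the choice of transforms (a 90-degree rotation, a horizontal flip, and a crude 2x zoom), and the assumption of a square image are illustrative and not taken from this guide; a real pipeline would likely use an image library with finer-grained rotation and scaling.

```python
import numpy as np

def augment_image(image, label):
    """Return the original (image, label) pair plus three modified copies.

    Assumes a square image so the rotated copy keeps the same shape.
    Every copy keeps the original label, which is the point of augmentation.
    """
    h, w = image.shape[:2]

    # Rotate by 90 degrees in the image plane.
    rotated = np.rot90(image, k=1, axes=(0, 1))

    # Mirror left to right.
    flipped = np.flip(image, axis=1)

    # Crude 2x zoom: repeat each pixel, then crop the center back to h x w.
    zoomed = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    top, left = h // 2, w // 2
    zoomed = zoomed[top:top + h, left:left + w]

    return [(image, label), (rotated, label), (flipped, label), (zoomed, label)]

# Hypothetical 4x4 grayscale image with a made-up label.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
samples = augment_image(image, "cat")
```

Each transform should only be used when it leaves the label valid; a rotated digit 6, for instance, may no longer be a 6.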

Synthesize Data

If you understand the physics of your problem well enough, you can build a simulation to synthesize training data. A benefit of this approach is that the data is already labeled. Synthesized data can also be used when it is too expensive or difficult to collect real data.

Example: Synthesizing Waveform Data

Because RF modulation schemes and the impairments that add noise to them are well understood, they are good candidates for synthesized training data. The real test is how well a network trained on synthesized data can label actual RF data.
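A minimal sketch of the idea, assuming symbol-level baseband BPSK and QPSK with additive white Gaussian noise as the only impairment. The scheme list, frame length, SNR value, and function names are assumptions made for illustration and are not specified in this guide; realistic synthesis would also model effects such as pulse shaping, frequency offset, and fading.

```python
import numpy as np

# Unit-power constellations for two common modulation schemes.
CONSTELLATIONS = {
    "BPSK": np.array([1 + 0j, -1 + 0j]),
    "QPSK": np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2),
}

def synthesize_frame(scheme, num_symbols, snr_db, rng):
    """Draw random symbols for one scheme and add complex Gaussian noise."""
    symbols = rng.choice(CONSTELLATIONS[scheme], size=num_symbols)
    # Signal power is 1, so the noise power follows directly from the SNR.
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(num_symbols) + 1j * rng.standard_normal(num_symbols)
    )
    return symbols + noise

def synthesize_dataset(frames_per_scheme, num_symbols, snr_db, seed=0):
    """Build labeled frames; the label is known because we chose the scheme."""
    rng = np.random.default_rng(seed)
    frames, labels = [], []
    for scheme in CONSTELLATIONS:
        for _ in range(frames_per_scheme):
            frames.append(synthesize_frame(scheme, num_symbols, snr_db, rng))
            labels.append(scheme)
    return np.stack(frames), np.array(labels)

frames, labels = synthesize_dataset(frames_per_scheme=100, num_symbols=128, snr_db=20)
```

Because the generator picks the modulation scheme before producing each frame, every frame arrives with its label attached, with no manual labeling step.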
