Useful Database Seeding - Build APIs You Won't Hate: Everyone and their dog wants an API, so you should probably learn how to build them (2014)

Build APIs You Won't Hate: Everyone and their dog wants an API, so you should probably learn how to build them (2014)

1. Useful Database Seeding

1.1 Introduction

The first step to creating any sort of application is creating the database. Whether you are using some sort of relational platform, MongoDB, Riak, or whatever, you will need a vague idea of how your data is going to be stored.

For relational databases it is very likely you will start off your planning with an entity-relationship diagram and for document based databases such as MongoDB, CouchDB or ElasticSearch you will just let your application magically build a schema, but either way you need to create a plan - even if it is on a napkin. This book will assume a traditional relational database is storing your data but the principles are easily adapted for NoSQL systems too.

This chapter assumes you have already got a database designed and built. This chapter skips the “planning a database” section because there are plenty of other books on that already.

1.2 Introduction to Database Seeding

With a database schema designed and implemented, the next step is to store some data. Instead of entering your real data, it is far easier to use “dummy data” to test if the schema is appropriate for your API application. This brings the added benefit of letting you ditch the database and try again without worrying about maintaining the data.

The process of populating a database is known as “seeding”.

This data could be:

· test users

· content entries with a bunch of comments

· fake locations available for check-in

· fake notifications to display in an iPhone app (one of each type)

· credit-card payments at various stages of processing - with some complete, some half done and some super-fraudulent looking ones

The process of creating seeding scripts means you don’t need to waste time creating this manually over and over again. Ultimately, the more processes you can automate during the development of your API, the more time you have to consider the intricacies of your applications which need much more consideration.

Dummy data is necessary for realistic acceptance testing, getting freelancers/new hires up to speed with useful content, keeping real customer data private to those outside your company, and avoiding the temptation to copy live data over to your development environments.

Why is using production data in development bad?

Have you ever been writing a script that sends out emails and used some dummy copy while you’re building it? Ever used some cheeky words in that content? Ever accidentally sent that email out to 10,000 real customers email addresses? Ever been fired for losing a company over £200,000?

I haven’t, but I know a guy that has. Don’t be that guy.

What data should you use?

Garbage! Use absolute nonsense for your development database, but nonsense of the correct data type, size, and format. That can be done with a fun little library called Faker by François Zaninotto which is a wonderful little library that can essentially bullshit for Queen and country.

1.3 Building Seeders

Kapture, the company I work for, uses the Laravel framework which has the joys of having Database Seeding baked in. This is essentially a tarted up <abbr title="Command Line Interface">CLI</abbr> task which almost any modern PHP framework will have (or bloody well should do) so the principles are applicable to all.

Break your Database Seeders down into logical groupings. This doesn’t need to be “one seeder-per-table” but it can be. The reason I don’t try to stick to that rule is that sometimes your data needs to be built at the same time as other types of data, so for us Users are created in the same “seeder” as their settings, OAuth tokens, and friendship data is made. Putting that stuff into multiple seeders purely to keep things tidy would be an exercise in futility, and slow everything down for no reason.

In this chapter I will use, as an example, a check-in application. The application handles “users” and tracks their “check-ins” into “merchants” (or “venues”). “Merchants” also provide “campaigns” (or “opportunities”).

So, this is a drastically simplified version of our user seeder all in one go, ignoring the Laravel specific structure. If you are using Laravel 4, just shove this in your run() method.

Creating a user with Faker and Eloquent ORM


1 $faker = Faker\Factory::create();

2

3 for ($i = 0; $i < Config::get('seeding.users'); $i++) {

4

5 $user = User::create([

6 'name' => $faker->name,

7 'email' => $faker->email,

8 'active' => $i === 0 ? true : rand(0, 1),

9 'gender' => rand(0, 1) ? 'male' : 'female',

10 'timezone' => mt_rand(-10, 10),

11 'birthday' => rand(0, 1) ? $faker->dateTimeBetween('-40 years', '-18 yea\

12 rs') : null,

13 'location' => rand(0, 1) ? "{$faker->city}, {$faker->state}" : null,

14 'had_feedback_email' => (bool) rand(0, 1),

15 'sync_name_bio' => (bool) rand(0, 1),

16 'bio' => $faker->sentence(100),

17 'picture_url' => $this->picture_url[rand(0, 19)],

18 ]);

19 }


So what do we have here? Let’s go through this section at a time:

1 $faker = Faker\Factory::create();

An instance of Faker, our bullshit artist for-hire.

1 for ($i = 0; $i < Config::get('seeding.users'); $i++) {

We are going to want a certain number of users, but I’d recommend you have a few less on development than you do on testing or staging, because time.

1 $user = User::create([

2 'name' => $faker->name,

3 'email' => $faker->email,

Make a random name and random email. We don’t have to define the pool of random data it uses, because IT’S MAGIC!

1 'active' => $i === 0 ? true : rand(0, 1),

Ok I lied, our garbage is not 100% random. We want user number 1 to be active for tests later on.

1 'gender' => $faker->randomElement(['male', 'female']),

Gender equality is important.

1 'timezone' => mt_rand(-10, 10),

Our original developer decided that saving timezones as an integer was a clever thing to do.

warning

Store Timezones, Not Offsets

Did you know that some timezones are not complete hours? Did you know that Nepal is UTC/GMT +05:45? Did you know that Chatham Island (New Zealand) goes from UTC/GMT +12:45 to UTC/GMT +13:45 in their local summer? Did you know that some places add 30 minutes when in daylight savings time? Don’t use integers as timestamps.
PHP implements the IANA timezone database, which is an industry standard. If you store America/New_York or Asia/Khandyga for users, then the offset and daylight savings time will be automatically calculated.

1 'birthday' => rand(0, 1) ? $faker->dateTimeBetween('-40 years', '-18 yea\

2 rs') : null,

Users of all of our target age demographic.

1 'location' => rand(0, 1) ? "{$faker->city}, {$faker->state}" : null,

Give us a city name and a state name. This works fine with foreign countries too which is cool.

1 'had_feedback_email' => $faker->boolean,

2 'sync_name_bio' => $faker->boolean,

Some user flags we don’t care much about. True or false, whatever.

1 'bio' => $faker->sentence(100),

Make a sentence with 100 characters in it.

1.4 That’s about it

You will end up making a lot of these files, and you will want to populate pretty much every table you have with data. You will also want to tell your Database Seeder to wipe all the tables that will be populated. Do this globally right at the start of the process. Do not wipe tables at the top of each seeder, or content in that table from other seeders will be deleted.

Example of an overall system in Laravel 4


1 classDatabaseSeederextends Seeder

2 {

3 publicfunction run()

4 {

5 if (App::environment() === 'production') {

6 exit('I just stopped you getting fired. Love Phil');

7 }

8

9 Eloquent::unguard();

10

11 $tables = [

12 'locations',

13 'merchants',

14 'opps',

15 'opps_locations',

16 'moments',

17 'rewards',

18 'users',

19 'oauth_sessions',

20 'notifications',

21 'favorites',

22 'settings',

23 'friendships',

24 'impressions',

25 ];

26

27 foreach ($tables as $table) {

28 DB::table($table)->truncate();

29 }

30

31 $this->call('MerchantTableSeeder');

32 $this->call('PlaceTableSeeder');

33 $this->call('UserTableSeeder');

34 $this->call('OppTableSeeder');

35 $this->call('MomentTableSeeder');

36 }

37 }


This wipes everything, then runs other seeder classes to do their thing.

tip

Foreign Keys

It can be difficult to wipe a database when foreign keys constraints are enforced, so in that scenario your seeder should run DB::statement('SET FOREIGN_KEY_CHECKS=0;'); before the truncation of the tables and DB::statement('SET FOREIGN_KEY_CHECKS=1;'); afterwards to re-enable the checks.

1.5 Secondary Data

As I said it is quite likely that you will need to insert data that relates to each other. To do this you work out which data will be primary (like users); in the case of a check-in system probably you will also consider “venues” or “merchants” depending on the nomenclature of your system.

For this example I will show how to create “merchants”, then attach “opportunities”, which are essentially “campaigns”.

Primary Seeder for the Merchant Table


1 <?php

2

3 classMerchantTableSeederextends Seeder

4 {

5 /**

6 * Run the database seeds.

7 *

8 * @return void

9 */

10 publicfunction run()

11 {

12 $faker = Faker\Factory::create();

13

14 // Create however many merchants

15 for ($i = 0; $i < Config::get('seeding.merchants'); $i++) {

16 Merchant::create([

17 'name' => $faker->company,

18 'website' => $faker->url,

19 'phone' => $faker->phoneNumber,

20 'description' => $faker->text(200),

21 ]);

22 }

23 }

24 }


Primary Seeder for the Opp Table


1 <?php

2

3 use Carbon\Carbon;

4 use Kapture\CategoryFinder;

5

6 classOppTableSeederextends Seeder

7 {

8 /**

9 * Build it up

10 *

11 * @param Place

12 */

13 publicfunction __construct(CategoryFinder $finder, Place $places)

14 {

15 $this->categoryFinder = $finder;

16 $this->places = $places;

17 }

18

19 /**

20 * Images.

21 *

22 * @var string

23 */

24 protected $imageArray = [

25 'http://example.com/images/example1.jpg',

26 'http://example.com/images/example2.jpg',

27 'http://example.com/images/example3.jpg',

28 'http://example.com/images/example4.jpg',

29 'http://example.com/images/example5.jpg',

30 ];

31

32 /**

33 * Run the database seeds.

34 *

35 * @return void

36 */

37 publicfunction run()

38 {

39 $faker = Faker\Factory::create();

40

41 foreach (Merchant::all() as $merchant) {

42

43 // Create however many opps for this merchant

44 foreach (range(1, rand(2, 4)) as $i) {

45

46 // There are three types of image to add

47 $image = Image::create([

48 'name' => "{$merchant->name} Image #{$i}",

49 'url' => $faker->randomElement($this->imageArray),

50 ]);

51

52 // Start it immediately and make it last for 2 months

53 $starts = Carbon::now();

54

55 // We need to definitely have at least one we are in control of

56 if ($i === 1) {

57 // Have ONE that ends really soon

58 $ends = Carbon::now()->addDays(2);

59 $teaser = 'Something about cheese';

60

61 } else {

62 $ends = Carbon::now()->addDays(60);

63 $teaser = $faker->sentence(rand(3, 5));

64 }

65

66 $category = $this->categoryFinder->setRandom()->getOne();

67

68 $opp = Opp::create([

69 'name' => $faker->sentence(rand(3, 5)),

70 'teaser' => $teaser,

71 'details' => $faker->paragraph(3),

72 'starts' => $starts->format('Y-m-d H:i:s'),

73 'ends' => $ends->format('Y-m-d H:i:s'),

74 'category_id' => $category->id,

75 'merchant_id' => $merchant->id,

76 'published' => true,

77 ]);

78

79 // Attach the location to the opp

80 $opp->images()->attach($image, [

81 'published' => true

82 ]);

83 }

84

85 echo "Created $i Opps for $merchant->name\n";

86 }

87 }

88 }


This might look a little crazy and it is certainly a mixture of lazy-static ORM usage in the controller and some dependency injection, but these seeders have not received a large amount of love. They definitely do their job, and could always be cleaner, but the basics here are:

1 foreach (Merchant::all() as $merchant) {

Loop through all merchants.

1 // Create however many opps for this merchant

2 foreach (range(1, rand(2, 4)) as $i) {

Create between 1 and 4 opportunities for a merchant.

1 // There are three types of image to add

2 $image = Image::create([

3 'name' => "{$merchant->name} Image #{$i}",

4 'url' => $faker->randomElement($this->imageArray),

5 ]);

Add an image from our array of example images on S3 or our website somewhere. The more the merrier.

1 $category = $this->categoryFinder->setRandom()->getOne();

I will talk more about finders in a later chapter, but for now just know this is a convenient way of getting a single random category back.

The rest should all be relatively obvious.

If you’re using Laravel 4 you can run the above commands on the command line with: $ php artisan db:seed.

1.6 When to run this?

This is often run manually, and automatically depending on the instances.

For example, if you just added a new endpoint with new data, you will want to let your team-mates know to pull the latest code, run the migrations and run the db seed.

This is also great of course when a freelancer comes in to do some work, or a new developer starts up, or your iPhone dev wants to get some data to use. In all these instances that command just needs to be run on the command line.

This is also occasionally run manually on the staging server, and automatically on the Jenkins testing server when we deploy new builds of the API.