Data Relationships - Build APIs You Won't Hate: Everyone and their dog wants an API, so you should probably learn how to build them (2014)

7. Data Relationships

7.1 Introduction

If you’ve ever worked with relational databases the chances are you understand relationships. Users have comments. Authors have one or many books. Books belong to a Publisher. Southerners have one or more teeth. Whatever the example, relationships are incredibly important to any application and therefore an API too.

RESTful Relationships don’t necessarily need to be directly mapped to database relationships. If your database relationships are built properly, RESTful relationships will often be similar, but your RESTful output might have extra dynamic relationships that aren’t defined by a JOIN, and might not necessarily include every possible database relationship.

Put more eloquently:

REST components communicate by transferring a representation of a resource in a format matching one of an evolving set of standard data types, selected dynamically based on the capabilities or desires of the recipient and the nature of the resource. Whether the representation is in the same format as the raw source, or is derived from the source, remains hidden behind the interface. – Roy Fielding

This explanation highlights an important factor: the output has to be based on the “desires of the recipient”. There are many popular approaches to designing RESTful relationships, but many of them don’t satisfy the “desires of the recipient”. Still, I will cover the popular approaches with their pros and cons regardless.

7.2 Sub-Resources

One very simplistic way to approach related data is to offer up new URLs for your API consumers to digest. This was covered lightly in Chapter 2: Planning and Creating Endpoints, and is a perfectly valid approach.

If an API has places as a resource and wants to allow access to a place’s checkins, an endpoint could be made to handle exactly that:

/places/X/checkins

The downside here is that it is an extra HTTP request. Imagine an iPhone application that wants to get all places in an area and put them on a map, then allow a user to browse through them. If the place search happens as one request, and /places/X/checkins is then requested each time the user clicks on a place, the user is forced to do a lot of unnecessary waiting. This is known as 1 + n, meaning one extra request is made for each place you look up.

That also assumes the only related data is checkins. At Kapture our API also has merchant, images, current_campaign and previous_campaigns to look up. Using “sub-resources” only would mean that four extra HTTP requests per place need to happen, which is 1 + 4n.

If 50 places were returned and the related data had to be loaded for each one, then assuming the app user looked through all 50 places there would be 1 initial request to get the 50 results, and 4 more for each of those results, meaning: 1 + (50 x 4) = 201. Making 201 HTTP requests (even assuming they are asynchronous) is just unnecessary, and going over HTTP on a mobile is one of the slowest things you can do. Even with caching, depending on the data set, it could still be 201 requests.
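To make that arithmetic concrete: with n places and k related resources per place, a consumer that only has sub-resources to work with makes 1 + (n x k) requests. Here is a tiny Python sketch of that count. The function name is mine and purely illustrative, not part of any API:

```python
def sub_resource_requests(places: int, related_per_place: int) -> int:
    """One search request, plus one request per related resource per place."""
    return 1 + places * related_per_place


print(sub_resource_requests(50, 1))  # checkins only (1 + n): 51
print(sub_resource_requests(50, 4))  # four related resources (1 + 4n): 201
```

The count grows with every place the user looks at, which is exactly why it hurts on mobile.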

Some API developers try to avoid going over HTTP too many times by shoving as much data as possible into one request, so when you call the /places endpoint you automatically get checkins, current_opps, merchants and images. Well, if you do not want that information you are waiting for huge file downloads full of irrelevant JSON! Even with GZIP compression enabled on the web-server, downloading something you don’t need is obviously not desirable, and can be avoided. This can mean major performance gains on mobile, and minor gains over a slow network or weak Wi-Fi for desktop or tablets.

The trade-off here between “downloading enough data to avoid making the user wait for subsequent loads” and “downloading too much data to make them wait for the initial load” is hard. An API needs to be flexible, and making sub-resources the only way to load related data is restrictive for the API consumer.

7.3 Foreign Key Arrays

Another approach to related data is to provide an array of foreign keys in the output. To use an example from EmberJS, if a post has multiple comments, the /posts endpoint could contain the following:

```json
{
    "post": {
        "id": 1,
        "title": "Progressive Enhancement is Dead",
        "comments": ["1", "2"],
        "_links": {
            "user": "/people/tomdale"
        }
    }
}
```

This is better. You still end up with 1 + n requests if you fetch each comment individually, but at least you can take those IDs and make a grouped request like /comments/1,2 or /comments?ids=1,2 to reduce how many HTTP requests are being made.

Back to the places example, if you have 50 places returned and need 4 extra pieces of data, you could iterate through the 50, map which items expect which pieces of data, request all unique pieces of data and only end up with 1 + 4 = 5 HTTP requests instead of 1 + (50 x 4) = 201.

The downside is that the API consumer has to stitch all of that data together, which could be a lot of work for a large dataset.
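To give a feel for that stitching work, here is a rough consumer-side sketch in Python. It assumes the grouped /comments?ids=1,2 style of URL mentioned above and that each comment comes back with an "id" field. The second post, the third comment ID and both helper names are made up for the example:

```python
def grouped_url(resource: str, ids) -> str:
    """Build one URL asking for every unique ID at once, e.g. /comments?ids=1,2."""
    unique = sorted(set(ids), key=int)
    return f"/{resource}?ids={','.join(unique)}"


def stitch(posts, comments):
    """Swap each post's list of comment IDs for the full comment objects."""
    by_id = {str(comment["id"]): comment for comment in comments}
    return [
        {**post, "comments": [by_id[cid] for cid in post["comments"] if cid in by_id]}
        for post in posts
    ]


posts = [
    {"id": 1, "title": "Progressive Enhancement is Dead", "comments": ["1", "2"]},
    {"id": 2, "title": "A hypothetical second post", "comments": ["2", "3"]},
]

# Collect every comment ID referenced by any post, then ask for them in one go.
wanted = [cid for post in posts for cid in post["comments"]]
print(grouped_url("comments", wanted))  # /comments?ids=1,2,3
```

Once the grouped response arrives, stitch(posts, comments) puts each comment back under the post that referenced it. It is not much code for one relationship, but it has to be repeated for every relationship and every client.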

7.4 Compound Documents (a.k.a Side-Loading)

Instead of just putting the foreign keys into the resource you can optionally side-load the data. I was having a rough time of things trying to word an introduction, so I will let somebody else do it:

Compound documents contain multiple collections to allow for side-loading of related objects. Side-loading is desirable when nested representation of related objects would result in potentially expensive repetition. For example, given a list of 50 comments by only 3 authors, a nested representation would include 50 author objects where a side-loaded representation would contain only 3 author objects.
Source: canvas.instructure.com

I found that by searching for “compound document”. I found that term by searching for “REST Side-Loading”. I found that after having a horrible time with EmberJS forcing me to use the “side-loading” approach for Ember Data, and they barely explain it themselves.

It looks a little like this:

```json
{
    "meta": {"primaryCollection": "comments"},
    "comments": [...],
    "authors": [...]
}
```

The pro suggested in the quote above is: if an embedded piece of data is commonly recurring, you do not have to download the same resource multiple times. The con is that context gets lost in larger data structures and it has the same issue as the “Foreign Key Array”: the mapping of data to create an accurate structure is left to the API consumer, and that can be hard work.
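Both sides of that trade-off can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the "author", "author_id" and "body" field names are invented, and neither function comes from Ember Data or the Canvas API:

```python
def side_load(comments):
    """Turn nested comments into a compound document with de-duplicated authors."""
    authors = {}
    flat = []
    for comment in comments:
        author = comment["author"]
        authors[author["id"]] = author  # repeated authors collapse to one entry
        flat.append({"id": comment["id"], "body": comment["body"], "author_id": author["id"]})
    return {
        "meta": {"primaryCollection": "comments"},
        "comments": flat,
        "authors": list(authors.values()),
    }


def stitch(document):
    """Consumer side: re-attach each author to the comments that reference it."""
    authors = {author["id"]: author for author in document["authors"]}
    return [{**c, "author": authors[c["author_id"]]} for c in document["comments"]]


# 50 comments written by only 3 authors, as in the quote above.
nested = [
    {"id": i, "body": f"Comment {i}", "author": {"id": i % 3, "name": f"Author {i % 3}"}}
    for i in range(50)
]
document = side_load(nested)
print(len(document["comments"]), len(document["authors"]))  # 50 3
```

The server sends 3 author objects instead of 50, but the consumer now needs something like stitch() before the data is usable again.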

7.5 Embedded Documents (a.k.a Nesting)

This is the approach I have been using in the latest two versions of the API at Kapture, and I will continue to use it for the foreseeable future. It offers the most flexibility for the API consumer, meaning it can reduce HTTP requests or reduce download size depending on what the consumer wants.

If an API consumer were to call the URL /places?embed=checkins,merchant then they would see checkin and merchant data in the response inside the place resource:

```json
{
    "data": [
        {
            "id": 2,
            "name": "Videology",
            "lat": 40.713857,
            "lon": -73.961936,
            "created_at": "2013-04-02",
            "checkins" : [
                // ...
            ],
            "merchant" : {
                // ...
            }
        },
        {
            "id": 1,
            "name": "Barcade",
            "lat": 40.712017,
            "lon": -73.950995,
            "created_at": "2012-09-23",
            "checkins" : [
                // ...
            ],
            "merchant" : {
                // ...
            }
        }
    ]
}
```
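The idea behind a response like that is simple: start with the base fields, then add only the related data the consumer asked for, and only from a list you have decided to allow. A minimal Python sketch, where the whitelist, the function name, the merchant ID and the "secret" field are all invented for illustration:

```python
AVAILABLE_EMBEDS = {"checkins", "merchant"}  # anything else in ?embed= is ignored


def transform_place(place: dict, embed: str = "") -> dict:
    """Return the base place fields, plus only the embeds the consumer asked for."""
    requested = {name.strip() for name in embed.split(",")} & AVAILABLE_EMBEDS
    output = {"id": place["id"], "name": place["name"]}
    for name in sorted(requested):
        output[name] = place[name]  # a real API would load and transform this data
    return output


place = {"id": 2, "name": "Videology", "checkins": [], "merchant": {"id": 7}, "secret": "x"}
print(transform_place(place))
# {'id': 2, 'name': 'Videology'}
print(transform_place(place, "merchant,secret"))
# {'id': 2, 'name': 'Videology', 'merchant': {'id': 7}}
```

With no embed parameter the consumer gets the small response, and asking for something that is not on the list quietly does nothing.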

Some systems (like Facebook, or any API using Fractal) will let you nest those embeds with dot notation:

E.g: /places?embed=checkins,merchant,current_opp.images
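Dot notation is easy enough to reason about if you picture the embed string as a tree. Here is one way it could be parsed in Python. This is my own sketch of the idea, not how Facebook or Fractal actually implement it:

```python
def parse_embeds(embed: str) -> dict:
    """Turn 'a,b.c' into {'a': {}, 'b': {'c': {}}}."""
    tree = {}
    for path in embed.split(","):
        node = tree
        # Walk down one level per dot, creating levels as needed; skip empty names.
        for name in filter(None, path.strip().split(".")):
            node = node.setdefault(name, {})
    return tree


print(parse_embeds("checkins,merchant,current_opp.images"))
# {'checkins': {}, 'merchant': {}, 'current_opp': {'images': {}}}
```

Each level of the tree can then be handed to whatever is transforming that resource, so the place only needs to know about current_opp, and current_opp only needs to know about images.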

Embedding with Fractal

Picking back up from Chapter 6, your transformer at this point is mainly just giving you a method to handle array conversion from your data source to a simple array. Fractal can however embed resources and collections too. Continuing the theme of users, places and checkins, the UserTransformer might have a checkins list, to see a user’s checkin history.

UserTransformer using Fractal


```php
<?php namespace App\Transformer;

use User;

use League\Fractal\TransformerAbstract;

class UserTransformer extends TransformerAbstract
{
    protected $availableEmbeds = [
        'checkins'
    ];

    /**
     * Turn this item object into a generic array
     *
     * @return array
     */
    public function transform(User $user)
    {
        return [
            'id' => (int) $user->id,
            'name' => $user->name,
            'bio' => $user->bio,
            'gender' => $user->gender,
            'location' => $user->location,
            'birthday' => $user->birthday,
            'joined' => (string) $user->created_at,
        ];
    }

    /**
     * Embed Checkins
     *
     * @return League\Fractal\Resource\Collection
     */
    public function embedCheckins(User $user)
    {
        $checkins = $user->checkins;

        return $this->collection($checkins, new CheckinTransformer);
    }
}
```


The CheckinTransformer can then have both a user and a place. There is no benefit to requesting the user in this context, because we know that already, but asking for the place would return information about the location that is being checked into.

CheckinTransformer using Fractal


```php
<?php namespace App\Transformer;

use Checkin;
use League\Fractal\TransformerAbstract;

class CheckinTransformer extends TransformerAbstract
{
    /**
     * List of resources possible to embed via this processor
     *
     * @var array
     */
    protected $availableEmbeds = [
        'place',
        'user',
    ];

    /**
     * Turn this item object into a generic array
     *
     * @return array
     */
    public function transform(Checkin $checkin)
    {
        return [
            'id' => (int) $checkin->id,
            'created_at' => (string) $checkin->created_at,
        ];
    }

    /**
     * Embed Place
     *
     * @return League\Fractal\Resource\Item
     */
    public function embedPlace(Checkin $checkin)
    {
        $place = $checkin->place;

        return $this->item($place, new PlaceTransformer);
    }

    /**
     * Embed User
     *
     * @return League\Fractal\Resource\Item
     */
    public function embedUser(Checkin $checkin)
    {
        $user = $checkin->user;

        return $this->item($user, new UserTransformer);
    }
}
```


These examples happen to be using the lazy-loading functionality of an ORM for $user->checkins and $checkin->place, but there is no reason that eager-loading could not also be used by inspecting the $_GET['embed'] list of requested scopes. Something like this can easily go in your controller constructor, somewhere in the base controller or… something:

Example of user input dictating which Eloquent ORM (Laravel) relationships to eager-load


```php
// Input::get('embed') gives the raw query string value, e.g. "checkins,place"
// or just "place", so split it into an array before comparing
$requestedEmbeds = explode(',', (string) Input::get('embed'));

// Left is relationship names. Right is embed names.
// Avoids exposing relationships and whatever not directly set
$possibleRelationships = [
    'checkins' => 'checkins',
    'venue' => 'place',
];

// Check for potential ORM relationships, and convert from generic "embed" names
$eagerLoad = array_keys(array_intersect($possibleRelationships, $requestedEmbeds));

$books = Book::with($eagerLoad)->get();

// do the usual fractal stuff
```


Having the following code somewhere in the ApiController, or in your bootstrap, will make this all work:

```php
use League\Fractal\Manager;

class ApiController
{
    // ...

    public function __construct(Manager $fractal)
    {
        $this->fractal = $fractal;

        // Are we going to try and include embedded data?
        $this->fractal->setRequestedScopes(explode(',', Input::get('embed')));
    }

    // ...
}
```

That’s how you’d do things in Laravel at least.

Embedding with Rails

The Rails lot are big fans of their ActiveRecord package, and most suggest using it to embed data. The specific part is in the Serialization::to_json documentation.

To include associations, use blog.to_json(:include => :posts).

```json
{
    "id": 1, "name": "Konata Izumi", "age": 16,
    "created_at": "2006/08/01", "awesome": true,
    "posts": [{
        "id": 1,
        "author_id": 1,
        "title": "Welcome to the weblog"
    }, {
        "id": 2,
        "author_id": 1,
        "title": "So I was thinking"
    }]
}
```

2nd level and higher order associations work as well.

```ruby
blog.to_json(:include => {
  :posts => {
    :include => {
      :comments => {
        :only => :body
      }
    },
    :only => :title
  }
})
```

A little more complicated, but you get more control over what is returned.

```json
{
    "id": 1,
    "name": "Konata Izumi",
    "age": 16,
    "created_at": "2006/08/01",
    "awesome": true,
    "posts": [{
        "comments": [{
            "body": "1st post!"
        }, {
            "body": "Second!"
        }],
        "title": "Welcome to the weblog"
    },
    {
        "comments": [{
            "body": "Don't think too hard"
        }],
        "title": "So I was thinking"
    }]
}
```

This will work well assuming everything is represented as ActiveRecord, which who knows, it might be.

Being a RESTful Rebel

I read a blog article by Ian Bentley suggesting that this approach is not entirely RESTful. It points to a Roy Fielding quote:

The central feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components (Figure 5-6). By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved. – Roy Fielding

All of these solutions are - according to somebody - “wrong”. There are technical pros and cons and what I refer to as “moral” issues, but those moral issues are just down to how technically RESTful you care about being. The technical benefits that optional embedded relationships provide are so beneficial I do not care about crossing Roy and his RESTful spec to do it.

Make your own choices. Facebook, Twitter and most popular “RESTful APIs” fundamentally ignore parts of (or dump all over the entirety of) the RESTful spec. So, respecting everything else and popping your toe over the line here a little would not be the biggest travesty for your API.