Creating Mock Datasets with Mockaroo

by Kieran Lyddon

In the world of data, it is common to find yourself needing to start a project before you have access to the data you need. In these cases, using sample datasets to build out early design iterations can get the ball rolling for a successful project. While there are many useful sample datasets available, sometimes you will want a dataset you can customise more closely. For this, Mockaroo is an exceptionally useful tool.

What is Mockaroo?

Mockaroo runs through a web interface and lets users create custom data schemas that can then be populated with fake data, alongside several other useful features. These schemas can be customised at a granular level of detail, allowing a user to create example datasets that mirror the structure of the final product.

The Mockaroo homepage is split into several sections, starting with the ribbon pane at the top, which lets a user jump between their saved schemas, datasets, projects, and more. Below that you will find a premade sample table with various generic fields to give an idea of what can be made. Each field has the following setup options:

  • Field Name
  • Field Type
  • Advanced Options

Below the fields we then find further options relating to the format in which the dataset will be exported. Importantly, at any step we can use the preview button at the bottom of the page to load an example of the data our current schema produces.

To begin customising a dataset, use the “Add Another Field” button to create new fields. This button defaults to duplicating the last field in your schema, which can then be renamed by typing a new Field Name in the relevant box. To change the type of data that populates the field, select the drop-down box under “Type”. This opens a window prompting you to choose a field type, which defines the data generated for that field. There are dozens of premade options, each shown with examples of the values it produces.

From my perspective, a particularly impressive and important feature for making your data feel real is that fields are not all assigned randomly and independently of each other. When identifying fields like names or geographic locations are used in a dataset, the values of related fields in the same row are restricted based on them. First and last names will match up with full name fields, emails, and usernames, while cities and states will be restricted to the country generated for that row. While a small feature, this goes a long way towards improving the believability of your dataset, which gives later designs built on it an extra air of legitimacy.
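To make the idea of row-level consistency concrete, here is a minimal Python sketch. It is not how Mockaroo is implemented; the name lists, the `make_row` helper, and the `example.com` domain are all assumptions invented for this illustration. It only shows the principle: dependent fields are derived from values already chosen for the row rather than drawn independently.

```python
import random

# Hypothetical sample values, invented for this sketch.
FIRST_NAMES = ["Ada", "Grace", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing"]
CITIES_BY_COUNTRY = {
    "United Kingdom": ["London", "Leeds"],
    "United States": ["Boston", "Denver"],
}


def make_row(rng):
    """Build one row whose dependent fields are derived from earlier ones."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    country = rng.choice(list(CITIES_BY_COUNTRY))
    return {
        "first_name": first,
        "last_name": last,
        # Full name and email reuse the names already picked for this row.
        "full_name": f"{first} {last}",
        "email": f"{first}.{last}@example.com".lower(),
        "country": country,
        # The city is only drawn from cities belonging to the row's country.
        "city": rng.choice(CITIES_BY_COUNTRY[country]),
    }


rng = random.Random(42)  # seeded so repeated runs give the same rows
rows = [make_row(rng) for _ in range(5)]
for row in rows:
    print(row)
```

If each field were instead picked with its own independent random choice, you would quickly see rows such as a London address paired with the United States, which is exactly the kind of mismatch this feature avoids.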

Advanced Use

Under the Options section of any field, the sigma (Σ) button opens a formula window. This lets you further modify the values generated for that field using the Ruby programming language. For more advanced degrees of customisation, the following field types may be helpful:

  • Custom List – creates a user-defined list of values from which each row's value is picked.
  • Formula – allows you to perform row-level calculations that pull information from other columns.
  • Dataset Column – allows you to pull records from user-owned datasets that have been uploaded to Mockaroo.
  • Distributions – allows you to generate more believably distributed numeric data using one of five distribution types.
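As a rough illustration of what three of these field types do conceptually, the Python sketch below mimics a Custom List, a Distribution, and a Formula. It is not Mockaroo's formula syntax (which the post notes is Ruby), and the plan names, prices, and `make_order` helper are assumptions made up for the example. The Dataset Column type is not covered here.

```python
import random

# Like a Custom List: each row takes one of these user-defined values.
PLANS = ["free", "pro", "enterprise"]
# Hypothetical price per seat for each plan, invented for this sketch.
PRICE = {"free": 0.0, "pro": 12.0, "enterprise": 99.0}


def make_order(rng):
    """Build one row combining a list pick, a distribution, and a formula."""
    plan = rng.choice(PLANS)
    # Like a Distribution field: seats drawn from a normal distribution
    # (mean 10, standard deviation 3), rounded and kept at 1 or above.
    seats = max(1, round(rng.gauss(10, 3)))
    # Like a Formula field: a row-level calculation using other columns.
    total = round(PRICE[plan] * seats, 2)
    return {"plan": plan, "seats": seats, "total": total}


rng = random.Random(7)  # seeded so repeated runs give the same rows
orders = [make_order(rng) for _ in range(1000)]
print(orders[:3])
```

Drawing `seats` from a normal distribution rather than a uniform range means most rows cluster around the mean with a few outliers, which tends to look more like real data in charts and summaries.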

Exporting Your Data

When you have finished designing your mock schema, you can download a dataset built from it by selecting the “Download Data” button at the bottom of the screen. The default dataset will contain 1000 rows, but this can be changed. The file format can also be changed to one of a small set of widely used data formats, allowing you to practise loading files in different formats.
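If you want to practise loading the same data from different formats, a small Python sketch along these lines may help. To keep it runnable without a real download, it first writes a couple of made-up rows to temporary CSV and JSON files; the file names `mock_data.csv` and `mock_data.json` and the row values are assumptions, and in practice you would point the loading code at the files you downloaded.

```python
import csv
import json
import tempfile
from pathlib import Path

# Stand-in for rows downloaded from Mockaroo (made-up values).
sample_rows = [
    {"id": "1", "first_name": "Ada", "country": "United Kingdom"},
    {"id": "2", "first_name": "Grace", "country": "United States"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "mock_data.csv"
    json_path = Path(tmp) / "mock_data.json"

    # Write the same rows in two formats, as if downloaded twice.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(sample_rows[0]))
        writer.writeheader()
        writer.writerows(sample_rows)
    json_path.write_text(json.dumps(sample_rows), encoding="utf-8")

    # Load each file back with the matching standard-library reader.
    with csv_path.open(newline="", encoding="utf-8") as f:
        from_csv = list(csv.DictReader(f))
    from_json = json.loads(json_path.read_text(encoding="utf-8"))

# Both loaders should yield the same list of row dictionaries.
print(from_csv == from_json)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns in a real CSV export would need converting before they compare equal to their JSON counterparts.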

Mockaroo includes many other features for a variety of use cases, such as the ability to mock up a custom API to call, but this blog post should give you enough of an introduction to the basics to let you play around and create your own custom schema. Have fun!