Making Our Own Types

Haskell Data Types

Welcome to the third and final part of our Haskell liftoff series! In case you missed them, here are the links to part 1 and part 2. In part 1, we covered the basics of installing the Haskell platform. Then we dug into writing some basic Haskell expressions in the interpreter. In part 2, we started writing our own functions in Haskell modules. We also learned a lot of cool syntax tricks to build bigger and better functions.

Now in part three, we're going to wrap up by going more in depth with the type system. We're going to learn how to build our own types. We'll also learn some interesting tricks to make it easier to describe our types. Once you're done with this article, you should download our Haskell Beginner's Checklist! It will point you to some other tools and resources to help you further hone your skills. It also goes over all the main language concepts we learned in this series!

If you want to take these skills and learn how to make a Haskell project with them, you should also check out our Stack Mini-Course!

Making a New Data Type

Now on to data types! Remember that we have a Github Repository where you can follow the code in this part! If you want to implement the code yourself, you can go to the DataTypes module. But if you just want to look at the complete code as a reference, you can check out DataTypesComplete.

For this article, let's suppose we're trying to model someone's TODO list. We'll create several different Task data types to represent each individual task on their list throughout this article. We create a data type by first using the data keyword and following it up with the type name. Then we'll add the = assignment operator:

module DataTypes where

data Task1 = ...

Notice that unlike the expressions and function names we used in the previous lessons, our type starts with a capital letter. This is what distinguishes types from normal expressions in Haskell. We're now going to make our first constructor. A constructor is a special type of expression that allows us to create an object of our Task type. They have some similarities to constructors in, say, Java. But they're also very different. Constructors have an uppercase name, and then they have a list of types. This list of types is the information contained by that constructor. In our case, we want our task to have a name, and an expected length of time (in minutes). We'll represent the name with a string, and the length of time with an Int.

data Task1 = BasicTask1 String Int

And just like that, we can now start making Task objects! For instance, let's define a couple basic tasks as expressions within our module:

assignment1 :: Task1
assignment1 = BasicTask1 "Do assignment 1" 60

laundry1 :: Task1
laundry1 = BasicTask1 "Do Laundry" 45

We could also load up our code in the interpreter to check that it still compiles and makes sense:

>> :l DataTypes.hs
>> :t assignment1
assignment1 :: Task1
>> :t laundry1 
laundry1 :: Task1

Notice that the type of our expression is Task1 even though we construct the objects using the BasicTask1 constructor. Now in Java, we can have many constructors for the same type. We can also do this in Haskell, but it looks a little different. Let's define another type for the different locations where we can perform a task. We could perform a Task at school, the office, or at home. We'll represent this by creating a constructor for each of these. We separate the constructors using the vertical bar |:

data Location =
  School |
  Office |
  Home

In this case, each of the constructors is a simple marker that has no parameters or data stored within it. This is an example of an "Enum" type. We can technically define a separate expression for each of these:

schoolLocation :: Location
schoolLocation = School

officeLocation :: Location
officeLocation = Office

homeLocation :: Location
homeLocation = Home

But these expressions aren't any more useful than using the constructors themselves.

Now that we have a couple different types, we can actually have one of our types contain the other! We'll add a new constructor to our task type. It will represent a more complicated task that also lists a location:

data Task1 =
  BasicTask1 String Int |
  ComplexTask1 String Int Location
...

complexTask :: Task1
complexTask = ComplexTask1 "Write Memo" 30 Office

So this is very different from constructors in other languages. We can actually have different fields for different representations of our type. We can wrap completely different data depending on the constructor we use. This is awesome and gives us a lot of flexibility that other languages struggle to give us.
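To see how a function consumes a type with several constructors, here is a minimal sketch that pattern matches on both cases of Task1. The helper names describeLocation and describeTask are made up for this illustration and are not part of the series' repository.

```haskell
-- Definitions repeated from the article so the block stands alone.
data Location =
  School |
  Office |
  Home

data Task1 =
  BasicTask1 String Int |
  ComplexTask1 String Int Location

-- Hypothetical helper: turn a location into readable text.
describeLocation :: Location -> String
describeLocation School = "school"
describeLocation Office = "the office"
describeLocation Home = "home"

-- Hypothetical helper: one equation per constructor of Task1.
describeTask :: Task1 -> String
describeTask (BasicTask1 name time) =
  name ++ " (" ++ show time ++ " min)"
describeTask (ComplexTask1 name time location) =
  name ++ " (" ++ show time ++ " min at " ++ describeLocation location ++ ")"
```

Each equation only sees the fields that its own constructor carries, so the location is available in the ComplexTask1 case and nowhere else.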

Parameterized Types

Another cool thing we can do with our type definitions is to use type parameters. This means that one or more of the fields actually depends on a type that the person writing the code gets to select. Let's suppose we have a type that has a few basic constructors for different amounts of time. This would restrict our description of the time for the sake of simplicity.

data TaskLength =
  QuarterHour |
  HalfHour |
  ThreeQuarterHour |
  Hour |
  HourAndHalf |
  TwoHours |
  ThreeHours

Now we might want to describe a task where the length of the task is an Int. But we might also want a task to be able to use this new task length type. Let's make a second version of our Task type that can use either type for the length. We can do this by parameterizing the type like so:

data Task2 a =
  BasicTask2 String a |
  ComplexTask2 String a Location

The type a is now a mystery type that we can fill in as we please. But now whenever we list the Task2 type in a type signature, we have to fill in the proper definition:

assignment2 :: Task2 Int
assignment2 = BasicTask2 "Do assignment 2" 60

assignment2' :: Task2 TaskLength
assignment2' = BasicTask2 "Do assignment 2" Hour

laundry2 :: Task2 Int
laundry2 = BasicTask2 "Do Laundry" 45

laundry2' :: Task2 TaskLength
laundry2' = BasicTask2 "Do Laundry" ThreeQuarterHour

complexTask2 :: Task2 TaskLength
complexTask2 = ComplexTask2 "Write Memo" HalfHour Office

We have to be careful though, since this can restrict our ability to do certain things. For instance, we cannot create a list that contains both assignment2 and complexTask2. This is because the two expressions now have different types!

-- THIS WILL CAUSE A COMPILER ERROR
badTaskList :: [Task2 a]
badTaskList = [assignment2, complexTask2]
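For contrast, here is a sketch of what does type check. A list is fine as long as every element fills in the parameter the same way, and a function that never touches the length field can stay generic in a. The names goodTaskList and taskName2 are assumptions made for this example.

```haskell
-- Definitions repeated from the article so the block stands alone.
data Location = School | Office | Home

data TaskLength =
  QuarterHour |
  HalfHour |
  ThreeQuarterHour |
  Hour |
  HourAndHalf |
  TwoHours |
  ThreeHours

data Task2 a =
  BasicTask2 String a |
  ComplexTask2 String a Location

-- Every element has type Task2 TaskLength, so this list compiles.
goodTaskList :: [Task2 TaskLength]
goodTaskList =
  [ BasicTask2 "Do assignment 2" Hour
  , ComplexTask2 "Write Memo" HalfHour Office
  ]

-- Hypothetical helper: works for any choice of a, because it never
-- inspects the length field.
taskName2 :: Task2 a -> String
taskName2 (BasicTask2 name _) = name
taskName2 (ComplexTask2 name _ _) = name
```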

List Example

Speaking of lists, we can actually unravel a bit of the mystery about how lists are implemented now.

There is a lot of syntactic sugar that changes how we actually write lists in practice. But conceptually, lists are defined by two constructors, which we'll call Nil and Cons.

data List a =
  Nil |
  Cons a (List a)

As we should expect, the List type has a single type parameter. This is what allows us to have either [Int] or [String]. The Nil constructor is an empty list. It contains no objects. So any time you're using the [] expression, you're actually using Nil. Then the second constructor prepends a single element to another list. The type of the element and the type of the list's elements must match up, obviously. When you use the : operator to prepend an element to a list, you are really using the Cons constructor.

emptyList :: [Int]
emptyList = [] -- Actually Nil

fullList :: [Int]
-- Equivalent to Cons 1 (Cons 2 (Cons 3 Nil))
-- More commonly written as [1,2,3]
fullList = 1 : 2 : 3 : []

Another cool thing here is that our data structure is recursive. We can see in the Cons constructor how a list contains another list as a parameter. This works fine as long as there's some base case! In this situation, we have Nil. Imagine if we only had a single constructor and it took a recursive parameter. We'd be in a real pickle about how we create any list in the first place!
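Recursive types pair naturally with recursive functions. The sketch below works on the List type from above, with Nil as the base case in each function. The helpers toBuiltin and listLength and the value exampleList are hypothetical names chosen for this example.

```haskell
-- Definition repeated from the article so the block stands alone.
data List a =
  Nil |
  Cons a (List a)

-- Hypothetical helper: convert our List into Haskell's built-in list.
toBuiltin :: List a -> [a]
toBuiltin Nil = []
toBuiltin (Cons x rest) = x : toBuiltin rest

-- Hypothetical helper: count the elements. Nil is the base case.
listLength :: List a -> Int
listLength Nil = 0
listLength (Cons _ rest) = 1 + listLength rest

-- The same three elements as fullList, written with our constructors.
exampleList :: List Int
exampleList = Cons 1 (Cons 2 (Cons 3 Nil))
```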

Record Syntax

So let's go back to our basic, unparameterized Task data type. Suppose we don't care about the entire Task item. Rather, we want one of its pieces, like the name or time. As our code is now, the only real way to do that is to use a pattern match that reveals these fields.

import Data.Char (toUpper)

...

twiceLength :: Task1 -> Int
twiceLength (BasicTask1 name time) = 2 * time

capitalizedName :: Task1 -> String
capitalizedName (BasicTask1 name time) = map toUpper name

tripleTaskLength :: Task1 -> Task1
tripleTaskLength (BasicTask1 name time) = BasicTask1 name (3 * time)

Now we can simplify this a teensy bit. You can use underscores instead of parameters that you won't use. But even so, this can get very cumbersome if you have a data type that has a lot of fields. We could write our own functions allowing us to access individual fields. Of course, these will have to use pattern matching under the hood:

taskName :: Task1 -> String
taskName (BasicTask1 name _) = name

taskLength :: Task1 -> Int
taskLength (BasicTask1 _ time) = time

twiceLength :: Task1 -> Int
twiceLength task = 2 * (taskLength task)

capitalizedName :: Task1 -> String
capitalizedName task = map toUpper (taskName task)

tripleTaskLength :: Task1 -> Task1
tripleTaskLength task = BasicTask1 (taskName task) (3 * (taskLength task))

But this approach doesn't scale, since we'll have to write these functions for every different field of every data type we create. Now imagine how easy it is to use a "setter" method in Java. Compare that to tripleTaskLength above. We have to re-iterate most of the existing fields, which is tedious. The exciting news is that we can get Haskell to write these functions for us using record syntax. To do this, all we have to do is assign each field a name in our data definition. Let's make a new version of Task:

data Task3 = BasicTask3
  { taskName :: String
  , taskLength :: Int }

Now we can write the same code WITHOUT the "getter" functions we wrote above.

-- These will now work WITHOUT our separate definitions for "taskName" and 
-- "taskLength"
twiceLength :: Task3 -> Int
twiceLength task = 2 * (taskLength task)

capitalizedName :: Task3 -> String
capitalizedName task = map toUpper (taskName task)

Now when we construct tasks, we can still use the BasicTask3 constructor by itself. But for code clarity, we can also initialize the object using record syntax, where we name the field:

-- BasicTask3 "Do assignment 3" 60 would also work
assignment3 :: Task3
assignment3 = BasicTask3 
  { taskName = "Do assignment 3" 
  , taskLength = 60 }

laundry3 :: Task3
laundry3 = BasicTask3 
  { taskName = "Do Laundry"
  , taskLength = 45 }

We can also write a "setter" more easily using record syntax. We use the previous task and then a list of "changes" to make within braces:

tripleTaskLength :: Task3 -> Task3
tripleTaskLength task = task { taskLength = 3 * (taskLength task) }

Generally, we only use record syntax when there is a single constructor for a data type. We can use different fields for different constructors, but our code becomes a bit less safe. Let's see another example Task definition:

data Task4 = 
  BasicTask4
    { taskName4 :: String,
      taskLength4 :: Int }
  |
  ComplexTask4 
    { taskName4 :: String,
      taskLength4 :: Int,
      taskLocation4 :: Location }

The trouble with this system is that the compiler will generate a taskLocation4 function that will compile for any task. But the function will only be valid when called on a ComplexTask4. So the following code will compile, even though it will cause a crash, and we want to avoid that:

causeError :: Location
causeError = taskLocation4 (BasicTask4 "Cause error" 10)
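One way to sidestep this crash is to write the accessor by hand so that it covers both constructors. This is only a sketch: safeTaskLocation is a hypothetical name, and it relies on the standard Maybe type to signal that a task may have no location.

```haskell
-- Definitions repeated from the article so the block stands alone.
data Location = School | Office | Home

data Task4 =
  BasicTask4
    { taskName4 :: String,
      taskLength4 :: Int }
  |
  ComplexTask4
    { taskName4 :: String,
      taskLength4 :: Int,
      taskLocation4 :: Location }

-- Hypothetical helper: an alternative to taskLocation4 that handles
-- every constructor instead of crashing on BasicTask4.
safeTaskLocation :: Task4 -> Maybe Location
safeTaskLocation (ComplexTask4 _ _ location) = Just location
safeTaskLocation (BasicTask4 _ _) = Nothing
```

Callers now have to handle the Nothing case explicitly, so the missing location shows up at compile time rather than as a runtime error.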

In addition, if our different constructors use different types for a field, we can't use the same field name in both. This can be frustrating when we want to represent the same concept with different types. This example won't compile because GHC cannot determine the type of the taskLength4 function. It could either have type Task4 -> Int or Task4 -> TaskLength.

data Task4 = 
  BasicTask4
    { taskName4 :: String,
      taskLength4 :: Int }
  |
  ComplexTask4 
    { taskName4 :: String,
      taskLength4 :: TaskLength, -- Note we use "TaskLength" and not an Int here!
      taskLocation4 :: Location }
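One possible workaround, sketched here under the assumption that every task in a given collection uses the same representation of length, is to reuse the type parameter idea from earlier. The type Task4P and its field names are invented for this example.

```haskell
-- Definitions repeated from the article so the block stands alone.
data Location = School | Office | Home

data TaskLength =
  QuarterHour |
  HalfHour |
  ThreeQuarterHour |
  Hour |
  HourAndHalf |
  TwoHours |
  ThreeHours

-- Hypothetical variant of Task4: both constructors use the same field
-- type a, so the shared field name taskLength4P is allowed.
data Task4P a =
  BasicTask4P
    { taskName4P :: String,
      taskLength4P :: a }
  |
  ComplexTask4P
    { taskName4P :: String,
      taskLength4P :: a,
      taskLocation4P :: Location }

laundry :: Task4P Int
laundry = BasicTask4P "Do Laundry" 45

memo :: Task4P TaskLength
memo = ComplexTask4P "Write Memo" HalfHour Office
```

The tradeoff is the one we saw with Task2: laundry and memo now have different types, so they cannot share a list.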

The Type Keyword

Now we know most of the ins and outs of making our own data types. But there are times when you don't need to do this. We can create new type names without making a completely new data structure. There are two ways to do this. The first is the type keyword. It allows you to create a synonym for a type, like the typedef keyword in C++. The most common of these, as we've seen, is that a String is actually a list of characters:

type String = [Char]

A common use case for this is when you've combined many different types together in a tuple. It can be quite tedious to write this tuple down several times in your code:

makeTupleBigger :: (Int, String, Task1) -> (Int, String, Task1)
makeTupleBigger (intValue, stringValue, BasicTask1 name time) = 
  (2 * intValue, map toUpper stringValue, BasicTask1 (map toUpper name) (2 * time))

A type synonym would make the signature here look a lot cleaner:

type TaskTuple = (Int, String, Task1)

makeTupleBigger :: TaskTuple -> TaskTuple
makeTupleBigger (intValue, stringValue, BasicTask1 name time) = 
  (2 * intValue, map toUpper stringValue, BasicTask1 (map toUpper name) (2 * time))

Of course, if this collection of items shows up a lot, it might be worth making a full data type for it. There are also some reasons why type synonyms aren't always the best choice. For one thing, they can lead to compile errors that can be difficult to work through. You've probably come across a few errors already where the compiler told you it expected a [Char]. It would have been far more clear if it had said String.

It can also lead to some unintuitive code. Suppose you use a basic tuple instead of a data type to represent a Task. Someone might expect your Task type to be its own data type. Then they'll be a little confused when you manipulate it like a tuple:

type Task5 = (String, Int)

twiceTaskLength :: Task5 -> Int
-- "snd task" is confusing here
twiceTaskLength task = 2 * (snd task)
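A related surprise is that a synonym gives no protection at all: any tuple of the right shape is accepted. The sketch below shows this with a value that was never meant to be a task. The names nameAndAge and doubledAge are hypothetical.

```haskell
type Task5 = (String, Int)

twiceTaskLength :: Task5 -> Int
twiceTaskLength task = 2 * snd task

-- Hypothetical value that was never meant to be a task.
nameAndAge :: (String, Int)
nameAndAge = ("Alice", 30)

-- This compiles, because Task5 and (String, Int) are the same type.
doubledAge :: Int
doubledAge = twiceTaskLength nameAndAge
```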

Newtypes

The last topic we'll cover is "newtypes". These are like type synonyms in some ways, and like the algebraic data types (ADTs) we've been defining in other ways. But they still have a unique place in Haskell and it is good to get accustomed to using them. Let's suppose we want to have a new approach to representing TaskLength. We want to use a regular number, but we want it to have its own separate type. We can do this using "newtype":

newtype TaskLength2 = TaskLength2 Int

The syntax for newtypes looks a lot like defining an ADT. However, a newtype definition can only have a single constructor, and that constructor must take exactly one field. The big difference between an ADT and a newtype comes after your code is compiled. In this example, there won't be a difference between the TaskLength2 and Int types at runtime, because the wrapper is erased during compilation. This is good because a lot of code for Int types is specialized to run fast. If we were to make this a true ADT, this would not be the case:

-- Not as fast!
data TaskLength2 = TaskLength2 Int

But otherwise, we can do a lot of the same tricks with our newtype that we can do with ADTs. We can, for instance, use record syntax in the constructor for our newtype. This allows us to use a name to unwrap the inside value without pattern matching on the type. A frequent pattern when using record syntax is to use something like "un-TypeName" as the field name. Also note that we can't use the newtype value with the same functions as the original type. When we had type synonyms, we could do this, but it won't work here:

data Task6 = BasicTask6 String TaskLength2

newtype TaskLength2 = TaskLength2
  { unTaskLength :: Int }

mkTask :: String -> Int -> Task6
mkTask name time = BasicTask6 name (TaskLength2 time)

twiceLength :: Task6 -> Int
twiceLength (BasicTask6 _ len) = 2 * (unTaskLength len)
-- The following would be WRONG!
-- 2 * len

Now, TaskLength2 is effectively a wrapper type around an Int. This makes it seem a lot like a type synonym, except that we can't simply use the Int value itself. As you can see in the examples above, we do have to go through the process of wrapping and unwrapping the value. This seems tedious. But it is quite useful because it solves the main problems we've seen from using type synonyms. Now if we make a mistake involving TaskLength2, the compiler will tell us it's about TaskLength2. We won't be wondering if there's a synonym we're missing!

Here's another example. Suppose we have a function with several integral arguments. If we always use Int types, we can easily confuse the order of the arguments. But when we use a newtype, the compiler will catch this type of error for us.
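Here is a minimal sketch of that situation. The Priority newtype and the functions scheduleRaw and schedule are hypothetical, introduced only to show the compiler catching swapped arguments.

```haskell
newtype TaskLength2 = TaskLength2
  { unTaskLength :: Int }

-- Hypothetical second wrapper, made up for this example.
newtype Priority = Priority
  { unPriority :: Int }

-- With plain Ints, scheduleRaw 60 3 and scheduleRaw 3 60 both compile,
-- so a swapped call goes unnoticed.
scheduleRaw :: Int -> Int -> String
scheduleRaw minutes priority =
  show minutes ++ " min at priority " ++ show priority

-- With newtypes, each argument has its own type.
schedule :: TaskLength2 -> Priority -> String
schedule len priority =
  show (unTaskLength len) ++ " min at priority " ++ show (unPriority priority)

scheduled :: String
scheduled = schedule (TaskLength2 60) (Priority 3)

-- This would NOT compile, because the arguments are swapped:
-- swapped = schedule (Priority 3) (TaskLength2 60)
```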

Conclusion

This wraps up our discussion on creating your own data types and concludes our Liftoff Series! If you need a refresher on the basics, don't forget to check out part 1 and part 2. For some more resources on learning Haskell, download our free Beginner's Checklist! You'll be able to review all the concepts you learned in this series. The checklist will also tell you about some tools that will streamline your Haskell workflow!

If you want to take the next step in your Haskell education, you should check out our Stack Mini-Course. This short video course walks you through how to use Stack and the Haskell platform to start making your own Haskell project!