Generalized algebraic data type

In functional programming, a generalized algebraic data type (GADT, also first-class phantom type,[1] guarded recursive datatype,[2] or equality-qualified type[3]) is a generalization of parametric algebraic data types.

Overview

In a GADT, each product constructor (called a data constructor in Haskell) can specify explicitly how the type parameters of the data type are instantiated in its return type. This makes it possible to define functions whose result type depends on the constructor being matched. By contrast, a data constructor in Haskell 2010 always returns the data type applied to its declared parameters, so the instantiation of the return type is implied by the types of the arguments at the constructor's application.

-- A parametric ADT that is not a GADT
data List a = Nil | Cons a (List a)

integers = Cons 12 (Cons 107 Nil)       -- the type of integers is List Integer (the literals default to Integer)
strings = Cons "boat" (Cons "dock" Nil) -- the type of strings is List String

-- A GADT (in GHC this requires the GADTs language extension)
data Expr a where
    EBool  :: Bool     -> Expr Bool
    EInt   :: Int      -> Expr Int
    EEqual :: Expr Int -> Expr Int  -> Expr Bool

eval :: Expr a -> a
eval e = case e of
    EBool a    -> a
    EInt a     -> a
    EEqual a b -> eval a == eval b

expr1 = EEqual (EInt 2) (EInt 3)        -- the type of expr1 is Expr Bool
ret = eval expr1                        -- ret is False
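
For contrast, the following is a minimal sketch of the same expression language written as an ordinary ADT. The names UExpr, Value and evalU are hypothetical and do not come from the example above. Because the result type is no longer tracked in the type of the expression, the evaluator has to tag its results and can fail at run time.

```haskell
-- Hypothetical untyped counterpart of Expr, for comparison only.
data UExpr
    = UBool Bool
    | UInt Int
    | UEqual UExpr UExpr

-- Results must be tagged, because the result type is not known statically.
data Value = VBool Bool | VInt Int
    deriving (Eq, Show)

-- Evaluation can fail: nothing stops UEqual from being applied to booleans.
evalU :: UExpr -> Maybe Value
evalU (UBool b) = Just (VBool b)
evalU (UInt n)  = Just (VInt n)
evalU (UEqual a b) =
    case (evalU a, evalU b) of
        (Just (VInt x), Just (VInt y)) -> Just (VBool (x == y))
        _                              -> Nothing

wellFormed, illFormed :: Maybe Value
wellFormed = evalU (UEqual (UInt 2) (UInt 3))      -- Just (VBool False)
illFormed  = evalU (UEqual (UBool True) (UInt 3))  -- Nothing
```

With the GADT version, the counterpart of illFormed is rejected by the type checker, so eval needs neither the Value wrapper nor the Maybe result.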

GADTs are implemented in the Glasgow Haskell Compiler (GHC) as a non-standard extension, which is used by, among others, Pugs and Darcs. OCaml has supported GADTs natively since version 4.00.[4]

The GHC implementation provides support for existentially quantified type parameters and for local constraints.
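
Both features can be seen in a small sketch. The type Showable and the function render are hypothetical names chosen for illustration. The type variable a does not appear in the constructor's result type, so it is existentially quantified, and the Show constraint becomes available only inside the pattern match.

```haskell
{-# LANGUAGE GADTs #-}

-- 'a' is absent from the result type, so it is existentially quantified;
-- the Show a constraint is packaged together with the value.
data Showable where
    MkShowable :: Show a => a -> Showable

-- Matching on MkShowable brings the Show a constraint into scope locally.
render :: Showable -> String
render (MkShowable x) = show x

rendered :: [String]
rendered = map render [MkShowable (1 :: Int), MkShowable True, MkShowable "dock"]
-- ["1", "True", "\"dock\""]
```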

History

An early version of generalized algebraic data types was described by Augustsson & Petersson (1994), based on pattern matching in ALF.

Generalized algebraic data types were introduced independently by Cheney & Hinze (2003) and, earlier, by Xi, Chen & Chen (2003) as extensions to the algebraic data types of ML and Haskell.[5] The two formulations are essentially equivalent. They are similar to the inductive families of data types (or inductive datatypes) found in Coq's Calculus of Inductive Constructions and other dependently typed languages, except that GADTs lack dependent types and that inductive families are subject to an additional positivity restriction which GADTs do not enforce.[6]

Sulzmann, Wazny & Stuckey (2006) introduced extended algebraic data types, which combine GADTs with the existential data types and type class constraints introduced by Perry (1991), Läufer & Odersky (1994) and Läufer (1996).

Type inference in the absence of any programmer-supplied type annotations is undecidable,[7] and functions defined over GADTs do not admit principal types in general.[8] Type reconstruction requires several design trade-offs and is an area of active research (Peyton Jones, Washburn & Weirich 2004; Peyton Jones et al. 2006; Pottier & Régis-Gianas 2006; Sulzmann, Schrijvers & Stuckey 2006; Simonet & Pottier 2007; Schrijvers et al. 2009; Lin & Sheard 2010a; Lin & Sheard 2010b; Vytiniotis, Peyton Jones & Schrijvers 2010; Vytiniotis et al. 2011).
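
The lack of principal types can be illustrated with a small sketch of a kind that is common in discussions of GADT type inference. The names T, testBool and testPoly are chosen here for illustration. The same two equations are accepted under two different signatures, and neither type is an instance of the other, so there is no single most general type for an inference algorithm to pick.

```haskell
{-# LANGUAGE GADTs #-}

data T a where
    T1 :: Int -> T Bool
    T2 :: T a

-- First candidate type: the second argument is always a Bool.
testBool :: T a -> Bool -> Bool
testBool (T1 n) _ = n > 0
testBool T2     r = r

-- Second candidate type: matching on T1 refines a to Bool in that branch,
-- so returning n > 0 is well typed there.
testPoly :: T a -> a -> a
testPoly (T1 n) _ = n > 0
testPoly T2     r = r
```

In GHC, a function like this is written with an explicit signature, which tells the type checker which of the two readings is intended.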

Applications

Applications of GADTs include generic programming, modelling programming languages (higher-order abstract syntax), maintaining invariants in data structures, expressing constraints in embedded domain-specific languages, and modelling objects.[9]
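
As an illustration of maintaining invariants in data structures, the following minimal sketch indexes a list type by its length. The names Z, S, Vec, vhead and vtoList are hypothetical and chosen for this example. Taking the head of an empty vector is then a type error rather than a run-time failure.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level naturals, used only as indices (they have no values).
data Z
data S n

-- A list whose length is recorded in its type.
data Vec n a where
    VNil  :: Vec Z a
    VCons :: a -> Vec n a -> Vec (S n) a

-- Total: the index S n rules out VNil, so no empty-vector case is needed.
vhead :: Vec (S n) a -> a
vhead (VCons x _) = x

vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs

first :: Int
first = vhead (VCons 12 (VCons 107 VNil))   -- 12
-- vhead VNil is rejected at compile time: Z does not match S n.
```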

Higher-order abstract syntax

An important application of GADTs is to embed higher-order abstract syntax in a type-safe fashion. Here is an embedding of the simply typed lambda calculus with an arbitrary collection of base types, tuples and a fixed-point combinator:

-- In GHC this declaration requires the GADTs and KindSignatures extensions.
data Lam :: * -> * where
  Lift :: a                     -> Lam a        -- ^ lifted value
  Pair :: Lam a -> Lam b        -> Lam (a, b)   -- ^ product
  Lam  :: (Lam a -> Lam b)      -> Lam (a -> b) -- ^ lambda abstraction
  App  :: Lam (a -> b) -> Lam a -> Lam b        -- ^ function application
  Fix  :: Lam (a -> a)          -> Lam a        -- ^ fixed point

And a type-safe evaluation function:

eval :: Lam t -> t
eval (Lift v)   = v
eval (Pair l r) = (eval l, eval r)
eval (Lam f)    = \x -> eval (f (Lift x))
eval (App f x)  = (eval f) (eval x)
eval (Fix f)    = (eval f) (eval (Fix f))

The factorial function can now be written as:

fact = Fix (Lam (\f -> Lam (\y -> Lift (if eval y == 0 then 1 else eval y * (eval f) (eval y - 1)))))
eval fact 10   -- evaluates to 3628800

Regular algebraic data types would have caused problems here. Dropping the type parameter would have made the lifted base types existentially quantified, making it impossible to write the evaluator. With a type parameter, we would still be restricted to a single base type. Furthermore, ill-formed expressions such as App (Lam (\x -> Lam (\y -> App x y))) (Lift True) could have been constructed, whereas the GADT rejects them as ill-typed. A well-formed analogue is App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True)). The difference arises because x is used as the first argument of App and so must have type Lam (a -> b). Through the type of the Lam data constructor, the argument of the outer App must then also have type Lam (a -> b), which Lift True, of type Lam Bool, does not.
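
The well-formed term can be checked with a short sketch. The Lam type and the evaluator are restated so that the block compiles on its own (the kind signature is omitted for brevity). The names wellFormed and result are introduced here for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- The Lam type and evaluator from this section, restated for self-containment.
data Lam a where
    Lift :: a                     -> Lam a
    Pair :: Lam a -> Lam b        -> Lam (a, b)
    Lam  :: (Lam a -> Lam b)      -> Lam (a -> b)
    App  :: Lam (a -> b) -> Lam a -> Lam b
    Fix  :: Lam (a -> a)          -> Lam a

eval :: Lam t -> t
eval (Lift v)   = v
eval (Pair l r) = (eval l, eval r)
eval (Lam f)    = \x -> eval (f (Lift x))
eval (App f x)  = eval f (eval x)
eval (Fix f)    = eval f (eval (Fix f))

-- The well-formed term: x is applied to y, so x must be a function.
wellFormed :: Lam (a -> Bool)
wellFormed = App (Lam (\x -> Lam (\y -> App x y))) (Lift (\_ -> True))

result :: Bool
result = eval wellFormed ()   -- True

-- Replacing the argument with Lift True is rejected by the type checker,
-- because Lam Bool does not match the required Lam (a -> b).
```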

Notes

[1] Cheney & Hinze 2003.
[2] Xi, Chen & Chen 2003.
[3] Sheard & Pasalic 2004.
[4] "OCaml 4.00.1". ocaml.org.
[5] Cheney & Hinze 2003, p. 25.
[6] Cheney & Hinze 2003, pp. 25–26.
[7] Peyton Jones, Washburn & Weirich 2004, p. 7.
[8] Schrijvers et al. 2009, p. 1.
[9] Peyton Jones, Washburn & Weirich 2004, p. 3.

External links

Generalised Algebraic Datatype Page on the Haskell wiki
Generalised Algebraic Data Types in the GHC Users' Guide
Generalized Algebraic Data Types and Object-Oriented Programming
GADTs – Haskell Prime – Trac
Papers about type inference for GADTs, bibliography by Simon Peyton Jones
Type inference with constraints, bibliography by Simon Peyton Jones
Emulating GADTs in Java via the Yoneda lemma

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

