The Graphical Law

Contact Kamal Radharamanan / @kamalasaurus

Hi!

I'm a prototype engineer at Ravel Law! I'd like to describe some of the challenges we've encountered trying to communicate the relationships between legal entities (like cases, judges, and lawyers), and discuss how we've used data visualization to clarify some of those relationships.

First I'm going to give a brief overview of our problem space, mention some rules and thought patterns we apply as we try to visualize our data, and then walk through an example involving a big data set.

A Brief Overview

What is the law? In our case there are three kinds: statutory law, regulatory law, and common law (case law). These documents cite each other, much like hyperlinks on the World Wide Web.

Primarily, we focus on case law.

Perhaps more interestingly, each of these legal documents comprises many entities, such as lawyers, judges, and companies. These entities, naturally, are associated with each other! Opinions can be authored by the same judge or in the same jurisdiction; they can contain mentions of the same law firms or affected parties.

Our example will focus on the citation network between judges.

Some Rules

(...don't step in the lava)

Explanatory vs. Exploratory

Explanatory: a curated and carefully designed story told with data.

Exploratory: an interface that transitions between states to provide more specific understanding.

When is a picture worth a thousand words?

When it provides context!

Providing Context

1) Limit the scope.

If you want to display one dimension clearly, other dimensions will be undepicted or diminished.

PROTIP: the diminished dimensions will provide the context.

2) For a data visualization, independent of words, the final image needs a language.

I think of a language as a consistent set of rules by which information is encoded. In this case, it is encoded in the image.

What is a Visual Language?

ideogram -> alphabet

skeuomorphic -> flat


Hieroglyphics courtesy of The Free Encyclopedia

I've come to think of data visualizations as elaborate hieroglyphics... or comic books with shapes for characters.

What are your nouns and verbs?

Malofiej Awards for excellence in infographics
Jonathan Corum, NY Times data journalist (check out the whale example)

Usually

Your language will consist of circles, squares, lines, and polygons (shapes), along with colors and positioning.

In General

Groupings are stronger than position.
Positions are stronger than shapes.
Shapes are stronger than colors.
Colors are stronger than opacity.

Picking your dimensions

The more dimensions you show, the more pristine your data must be! Random artifacts might imply relationships that don't necessarily exist.

There's a limit to how much a person can comprehend at once. You can exploit transitions to show an evolution of information, but the result may not be generalizable, especially if the visualization is 3D.

In general: try to show exactly 1 thing.

Other things will leak in.

The more data you're showing simultaneously, the more likely you are to express an unintentional relationship.

As with accidental groupings in a force-directed layout.

The 3rd Dimension

The visual system sees a 2D projection of 3D things. To let someone fully reason about data mapped in 3 dimensions, the visualization needs to be interactive. This immediately limits your explanatory capacity to scripted interactions (which are not generalizable).

You will probably discover that your data is not as pristine as you need it to be.

So you wanna build a snowman

Our Data

Power Law distribution of citations amongst case law. (I forgot to histogram between judges.)

It's a scale-free network!
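A quick way to eyeball a power law is to histogram citation counts in bins that double in width; a heavy tail shows up as counts that fall off steadily from bin to bin. Here's a minimal sketch with made-up counts (not our data); `log2_histogram` is a hypothetical helper name.

```python
from collections import Counter

def log2_histogram(citation_counts):
    """Bucket positive integer counts into bins [1,2), [2,4), [4,8), ...

    Returns a dict mapping each bin's lower edge to how many counts fell in it.
    """
    bins = Counter()
    for c in citation_counts:
        if c < 1:
            continue  # uncited items have no place on a log axis
        # bit_length gives the exact power-of-two bin without float rounding
        bins[c.bit_length() - 1] += 1
    return {2 ** k: n for k, n in sorted(bins.items())}

# Illustrative sample only: most items are cited rarely, a few very often.
sample = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]
hist = log2_histogram(sample)
print(hist)
```

Plotting the bin edges against the bin counts on log-log axes would be the natural next step.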

Total number of cases in US law: 8-10 million
Total number of links: 60-80 million

Data visualization is intrinsically an aggregation problem. There are graphs so large that there aren’t screens with enough pixels to display each element uniquely. What metrics help us aggregate for the legal dataset?

Characterize your Data

How many of what things are you going to be showing in general?

If you're showing a link graph, is the graph sparse or dense?

A sparse graph is one where the number of edges is ~ O(n). Tree-like visualizations work here.

A dense graph is one where the number of edges is ~ O(n^2). Here you have to get creative with aggregation.

judges: 3590
n: 3590
n^2: 12888100
links: 1315803

So, we're somewhere between sparse and dense on a log scale! The link count works out to about n^1.7, or roughly 10% of n^2...
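To see where the judge graph sits between O(n) and O(n^2), you can compute its density and the exponent k that satisfies links = n^k. This sketch uses only the judge and link counts listed above; the variable names are mine.

```python
import math

n = 3590          # judges (from the counts above)
links = 1315803   # judge-to-judge citation links (from the counts above)

# Fraction of all n^2 possible directed pairs that are actually linked.
density = links / n ** 2

# Solve links = n^k for k: k = 1 would be sparse, k = 2 fully dense.
exponent = math.log(links) / math.log(n)

print(f"density  = {density:.3f}")
print(f"exponent = {exponent:.2f}")
```

The exponent lands between the sparse and dense cases, which is why neither tree-like layouts nor plain adjacency pictures fit comfortably.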

Is that a significant metric? Most of these definitions are for the mathematical description of a graph, not necessarily the visual description.

What might be more relevant is the ratio of pixels to elements.
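Here's a rough sketch of that ratio. The 1920x1080 display is my assumption, not a number from our setup, and `pixels_per_element` is a hypothetical helper.

```python
def pixels_per_element(width, height, elements):
    """Average number of screen pixels available to each drawn element."""
    return (width * height) / elements

judges = 3590
links = 1315803
width, height = 1920, 1080  # assumed display size, purely illustrative

per_judge = pixels_per_element(width, height, judges)
per_link = pixels_per_element(width, height, links)

print(f"pixels per judge: {per_judge:.1f}")
print(f"pixels per link:  {per_link:.2f}")
```

On that assumed screen each judge gets a few hundred pixels, but each link gets fewer than two, so the links are what force aggregation.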

No man's land, as it were.

Here we gooooooo~

So what do we want to know?

Hypothesis: Judges cite more to their controlling and controlled jurisdictions, or jurisdictions they used to occupy, or judges they once clerked for.

A classic example of where visualization can help you make sense of associations.

Jurisdictions

and why they're problematic

Supreme Court; Cir. 1, Cir. 2, Cir. 3; D. CT, ED.NY, ND.NY, SD.NY; NY

A Map.

Squares!

Hmmm...

None of these make intuitive sense given our constraints. Let's try something a bit off the wall.

Radial!

Judges with explicit and implicit grouping; their last court
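One way a radial grouping like this could be computed: order the judges by court so that colleagues sit next to each other, then space everyone evenly around a circle. This is a simplified sketch, not our production layout; the judge names and `radial_layout` are placeholders.

```python
import math

def radial_layout(judges_by_court, radius=1.0):
    """Place judges evenly on a circle, keeping each court's judges adjacent.

    judges_by_court maps a court name to the list of judges whose last
    court it was. Returns a dict of judge -> (x, y).
    """
    total = sum(len(judges) for judges in judges_by_court.values())
    positions = {}
    index = 0
    for court, judges in judges_by_court.items():
        for judge in judges:
            theta = 2 * math.pi * index / total
            positions[judge] = (radius * math.cos(theta),
                                radius * math.sin(theta))
            index += 1
    return positions

# Placeholder judges, two per court.
judges_by_court = {
    "Cir. 2": ["judge_a", "judge_b"],
    "SD.NY": ["judge_c", "judge_d"],
}
positions = radial_layout(judges_by_court)
```

Because each court occupies one contiguous arc, the grouping is readable even before any color or label is added.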

The Time Dimension

Too many lines

Mixed metaphors

Breaking the density metric!

Helical! (their last date)

...back to Radial!

Side-note on the random distribution of judge locations in radial distribution.

...with the links!

Visual Compromises

Exactly white noise.

How do you aggregate that data? Every aggregation you make reduces the resolution of your data.
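As an illustration of that trade-off, here is a sketch that collapses judge-to-judge links into court-to-court counts. The judges, the court assignments, and `aggregate_by_court` are all made up for the example.

```python
from collections import Counter

def aggregate_by_court(links, court_of):
    """Collapse (citing judge, cited judge) links into court-pair counts."""
    return Counter((court_of[src], court_of[dst]) for src, dst in links)

# Hypothetical judges and their last courts.
court_of = {"judge_a": "Cir. 2", "judge_b": "Cir. 2", "judge_c": "SD.NY"}

# Hypothetical citation links: (citing judge, cited judge).
links = [
    ("judge_a", "judge_b"),
    ("judge_a", "judge_c"),
    ("judge_b", "judge_c"),
    ("judge_c", "judge_a"),
]

totals = aggregate_by_court(links, court_of)
print(totals)
```

Four links become three weighted court pairs. That is far fewer things to draw, but you can no longer tell which judge in Cir. 2 cited SD.NY.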

Opacity, overlap, vs. collision detection vs. display room, vs. force directed collection

Just Scalia

So... what did we learn

Our hypothesis is false.

Judges are very promiscuous with their citation networks.

Force direction does not always make sense: at O(n^3) it limits the scale of the visualization tremendously, and JavaScript is single-threaded. We would have to precompute the coordinates.
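To get a feel for the cost, here is a sketch of a single naive repulsion pass, where every node pushes every other node away. It is not the algorithm of any particular library; `repulsion_step` and the `strength` parameter are hypothetical. One pass visits every pair of nodes, and a layout needs many passes to settle.

```python
import math

def repulsion_step(positions, strength=0.01):
    """One naive pass: each pair of nodes repels with an inverse-square push.

    Returns the new positions and the number of pairs visited.
    """
    nodes = list(positions)
    moved = {node: list(xy) for node, xy in positions.items()}
    pairs = 0
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            a, b = nodes[i], nodes[j]
            dx = positions[a][0] - positions[b][0]
            dy = positions[a][1] - positions[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            push = strength / dist ** 2
            # Move a away from b, and b away from a, along the joining line.
            moved[a][0] += push * dx / dist
            moved[a][1] += push * dy / dist
            moved[b][0] -= push * dx / dist
            moved[b][1] -= push * dy / dist
            pairs += 1
    return {node: tuple(xy) for node, xy in moved.items()}, pairs

start = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}
after, pairs = repulsion_step(start)
```

With 3,590 judges, one such pass visits 3590 * 3589 / 2 = 6,442,255 pairs, before any link forces are added.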

Sum random draws to get approximate Gaussians! But a uniform distribution will give the most even spreading.
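The summing trick can be sketched in a few lines. Adding 12 uniform draws on [0, 1) gives a value with mean 6 and standard deviation 1, and its histogram looks close to a bell curve. The choice of 12 draws, the seed, and `sum_of_uniforms` are my own choices for the example.

```python
import random
import statistics

def sum_of_uniforms(k, rng):
    """Sum k independent uniform draws on [0, 1)."""
    return sum(rng.random() for _ in range(k))

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [sum_of_uniforms(12, rng) for _ in range(10000)]

mean = statistics.mean(samples)    # theory: 12 * 1/2 = 6
stdev = statistics.stdev(samples)  # theory: sqrt(12 * 1/12) = 1

print(f"mean  = {mean:.2f}")
print(f"stdev = {stdev:.2f}")
```

A single uniform draw has no such central bump, which is why it spreads points more evenly.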

Need to label associations between the judges. Did so and so clerk for this and that? In general, the pattern shows that judges cite outside of their jurisdiction indiscriminately.

Future Things

more + more != more

  • Display the associated judges only (the others are noise)
  • Have a dynamic legend with the linked judges grouped by their jurisdictions
  • Do time-series depictions of all the judges in a given jurisdiction. Outlier patterns would be visually evident.

multithreaded force-direction
and the guy that developed it

Addendum of things to come...