A C# Trie implementation

· Read in about 2 min · (407 words) ·

What’s a trie?

A trie is a datastructure that’s optimized for prefix retrieval. It’s basically a prefix tree with each node being the letters seen till that point. For example - A trie with apple

image

Now if we add apply to the tree above, we get:

image

What if we add something without a common prefix - say boat

image

I needed a trie recently and was checking Nuget for available libraries - surprisingly there aren’t too many datastructure libraries available for .net on nuget. I found VDS.Common which seems to be most popular at 36K downloads.

Anyway, after using VDS.Common and getting around 700ms search time for distinct search results over a dataset, I thought I’d give writing my own a shot to see if I could do better.

Here’s the base interface:

 public interface ITrie<in TK, TV>
    {
        IEnumerable<TV> Find(IEnumerable<TK> keyprefix);
        void Add(IEnumerable<TK> key, TV value);
    }

Next, we model a trie node - this keeps links to child nodes and at the leaf level, a reference to the data we want to store in the trie.

 class TrieNode<TK, TV>
    {
        private TK keyBit;
        private TV value;
        private IDictionary<TK, TrieNode<TK, TV>> children;

        public TrieNode(TC ch, TV value)
        {
            this.keyBit = ch;
            this.value = value;
            this.children = new Dictionary<TK, TrieNode<TK, TV>>();
        }

        public IDictionary<TK, TrieNode<TK, TV>> Children
        {
            get { return children; }
            set { children = value; }
        }

        public void Add(IEnumerable<TK> keySeq, TV value)
        {
            var node = this;
            foreach (var ch in keySeq)
            {
                if (node.children.ContainsKey(ch))
                {
                    node = node.children[ch];
                }
                else
                {
                    var newNode= new TrieNode<TK, TV>(ch, default(TV));
                    node.children[ch] = newNode;
                    node = newNode;
                }
            }
            node.value = value;

        }

        public IEnumerable<TV> GetValues()
        {
            var stack = new Stack<TrieNode<TK, TV>>();
            stack.Push(this);
            while (stack.Count > 0)
            {
                var node = stack.Pop();
                if (node.value != null)
                    yield return node.value;
                foreach (var kv in node.Children.Reverse())
                {
                    stack.Push(kv.Value);
                }

            }
        }
    }

Finally, let’s have a concrete implementation of ITrie

    public class Trie<TK, TV> : ITrie<TK, TV>
    {
        private TrieNode<TK, TV> root;

        public Trie()
        {
            this.root = new TrieNode<TK, TV>(ch: default(TK), value: default(TV));
        }

        public IEnumerable<TV> Find(IEnumerable<TK> keyprefix)
        {
            var node = this.root;
            foreach (var ch in keyprefix)
            {
                if (node.Children.ContainsKey(ch))
                {
                    node = node.Children[ch];
                }
                else
                {
                    return Enumerable.Empty<TV>();
                }

            }
            return node.GetValues();
        }

        public void Add(IEnumerable<TK> key, TV value)
        {
            this.root.Add(key, value);
        }
    }

On the same dataset, I get < 500ms over VDS.Common. Admittedly, this is less full featured, but I can live with that.