Question: C# - match ID to filename using functional principles

Question

C# - match ID to filename using functional principles

Answers 2
Added at 2016-12-26 15:12
Tags
Question

I'm rewriting an old C# project from scratch trying to figure out how it can be improved with the use of functional design. So far I've stuck with a couple of principles (except where the GUI is concerned):

  • Set every variable in every class as readonly and assign it a value only once.
  • Use immutable collections
  • Don't write code with side effects.

Now I'm trying to create a function which, given a folder, using yield return, enumerates a list of objects, one for each file in the given folder. Each object contains a unique ID, starting from firstAssignedID, and a filename.

Thing is, I'm not sure how to approach the problem at all. Is what I've just described even the right way of thinking about it? My code so far is a half-baked, incomplete mess. Is it possible to use a lambda here? Would that help, or is there a better way?

The FileObject class simply contains a string fileName and an int id, and the FileObject constructor simply and naively creates an instance given those two values.

    public IEnumerable<FileObject> EnumerateImagesInPath(string folderPath, int firstAssignedID)
    {
        foreach (string path in Directory.EnumerateFiles(folderPath)
        {
            yield return new FileObject(Path.GetFileName(imagePath) , );
        }
    }
Answers to

C# - match ID to filename using functional principles

nr: #1 dodano: 2016-12-26 16:12

Using yeild seems unnecessary:

    public IEnumerable<FileObject> EnumerateImagesInPath(string folderPath, int firstAssignedID)
    {
        foreach (FileObject File in Directory.EnumerateFiles(folderPath)
            .Select(FileName => new FileObject(FileName, firstAssignedID++)))
        {
            yield return File;
        }
    }
nr: #2 dodano: 2016-12-27 21:12

The most functional way of doing what you want is this:

IEnumerable<FileObject> EnumerateImagesInPath(string path, int firstAssignedID) =>
    Enumerable.Zip(
        Enumerable.Range(firstAssignedID, Int32.MaxValue),
        Directory.EnumerateFiles(path), 
        FileObject.New);

With a FileObject type defined like so:

public class FileObject
{
    public readonly int Id;
    public readonly string Filename;

    FileObject(int id, string fileName)
    {
        Id = id;
        Filename = fileName;
    }

    public static FileObject New(int id, string fileName) =>
        new FileObject(id, fileName);
}

It doesn't use yield, but that doesn't matter because Enumerable.Range and Enumerable.Zip do, so it's a lazy function just like your original example.

I use Enumerable.Range to create a lazy list of integers from firstAssignedId to Int32.MaxValue. This is zipped together with the enumerable of files in the directory. FileObject.New(id. path) is invoked as part of the zip computation.

There is no in-place state modification like the accepted answer (firstAssignedID++), and the whole function can be represented as an expression.

The other way of achieving your goal is to use the fold pattern. It is the most common way of aggregating state in functional programming. This is how to define it for IEnumerable

public static class EnumerableExt
{
    public static S Fold<S, T>(this IEnumerable<T> self, S state, Func<S, T, S> folder) =>
        self.Any()
            ? Fold(self.Skip(1), folder(state, self.First()), folder)
            : state;
}

You should be able to see that its a recursive function that runs a delegate (folder) on the head of the list if there is one, then uses that as new state when calling recursively calling Fold. If it reaches the end of the list, then the aggregate state is returned.

You may notice that the implementation of EnumerableExt.Fold can blow up the stack in C# (because of a lack of tail-call optimisation). So a better way of implementing the Fold function is to do so imperatively:

public static S Fold<S, T>(this IEnumerable<T> self, S state, Func<S, T, S> folder)
{
    foreach(var x in self)
    {
        state = folder(state, x);
    }
    return state;
}

There is a dual to Fold known as FoldBack (sometimes they're called 'fold left' and 'fold right'). FoldBack essentially aggregates from the tail of the list to the head, where Fold is from the head to the tail.

public static S FoldBack<S, T>(this IEnumerable<T> self, S state, Func<S, T, S> folder)
{
    foreach(var x in self.Reverse())  // Note the Reverse()
    {
        state = folder(state, x);
    }
    return state;
}

Fold is so flexible, for example you could implement Count for an enumerable in terms of fold like so:

int Count<T>(this IEnumerable<T> self) =>
    self.Fold(0, (state, item) => state + 1);

Or Sum like so:

int Sum<int>(this IEnumerable<int> self) =>
    self.Fold(0, (state, item) => state + item);

Or most of the IEnumerable API!

public static bool Any<T>(this IEnumerable<T> self) =>
    self.Fold(false, (state, item) => true);

public static bool Exists<T>(this IEnumerable<T> self, Func<T, bool> predicate) =>
    self.Fold(false, (state, item) => state || predicate(item));

public static bool ForAll<T>(this IEnumerable<T> self, Func<T, bool> predicate) =>
    self.Fold(true, (state, item) => state && predicate(item));

public static IEnumerable<R> Select<T, R>(this IEnumerable<T> self, Func<T, R> map) =>
    self.FoldBack(Enumerable.Empty<R>(), (state, item) => map(item).Cons(state));

public static IEnumerable<T> Where<T>(this IEnumerable<T> self, Func<T, bool> predicate) =>
    self.FoldBack(Enumerable.Empty<T>(), (state, item) => 
        predicate(item) 
            ? item.Cons(state)
            : state);

It's very powerful, and allows the aggregation of state for a collection (so this allows us to do firstAssignedId++ without an imperative in-place state modification).

Our FileObject example is a little more complex than Count or Sum, because we need to maintain two pieces of state: the aggregate ID and the resulting IEnumerable<FileObject>. So our state is a Tuple<int, IEnumerable<FileObject>>

IEnumerable<FileObject> FoldImagesInPath(string folderPath, int firstAssignedID) =>
    Directory.EnumerateFiles(folderPath)
             .Fold(
                  Tuple.Create(firstAssignedID, Enumerable.Empty<FileObject>()),
                  (state, path) => Tuple.Create(state.Item1 + 1, FileObject.New(state.Item1, path).Cons(state.Item2)))
             .Item2;

You can make this even more declarative by providing some extension and static methods for Tuple<int, IEnumerable<FileObject>>:

public static class FileObjectsState
{
    // Creates a tuple with state ID of zero (Item1) and an empty FileObject enumerable (Item2)
    public static readonly Tuple<int, IEnumerable<FileObject>> Zero = 
        Tuple.Create(0, Enumerable.Empty<FileObject>());

    // Returns a new tuple with the ID (Item1) set to the supplied argument
    public static Tuple<int, IEnumerable<FileObject>> SetId(this Tuple<int, IEnumerable<FileObject>> self, int id) =>
        Tuple.Create(id, self.Item2);

    // Returns the important part of the result, the enumerable of FileObjects
    public static IEnumerable<FileObject> Result(this Tuple<int, IEnumerable<FileObject>> self) =>
        self.Item2;

    // Adds a new path to the aggregate state and increases the ID by one.
    public static Tuple<int, IEnumerable<FileObject>> Add(this Tuple<int, IEnumerable<FileObject>> self, string path) =>
        Tuple.Create(self.Item1 + 1, FileObject.New(self.Item1, path).Cons(self.Item2));
}

The extension methods capture the operations on the aggregate state and make the resulting fold computation very clear:

IEnumerable<FileObject> FoldImagesInPath(string folderPath, int firstAssignedID) =>
    Directory.EnumerateFiles(folderPath)
             .Fold(
                FileObjectsState.Zero.SetId(firstAssignedID),
                FileObjectsState.Add)
             .Result();

Obviously using Fold for the use-case you provided is overkill, and that's why I used Zip instead. But the more general problem you were struggling with (functional aggregate state) is what Fold is for.

There is one more extension method I used in the example above: Cons:

public static IEnumerable<T> Cons<T>(this T x, IEnumerable<T> xs)
{
    yield return x;
    foreach(var a in xs)
    {
        yield return a;
    }
}

More info on cons can be found here

If you want to learn more about using functional technique in C#, please check my library: language-ext. It will give you a ton of stuff that the C# BCL is missing.

Source Show
◀ Wstecz