CHANGES.txt

Cascading Change Log

3.3.0

  Fixed issue where planning with a c.p.Merge of two or more c.p.HashJoins would fail. Currently unresolved for the
  Apache Tez planner.

  Fixed issue where c.t.h.PartitionTap could not initialize with more than a few thousand input partitions in a
  reasonable time frame. Fix supports PartitionTap initialization in under 10 seconds with 1 million paths.

  Added c.f.h.FailOnMissingSuccessFlowListener, a c.f.FlowListener implementation that will prevent a c.f.Flow from
  executing if any source c.t.Tap that is a directory does not have a _SUCCESS marker.
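
  The marker check described above can be sketched with plain java.nio. This is only an illustration of the idea and
  not the listener's actual implementation; SuccessMarkerCheck and its method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading: finds directory sources lacking a _SUCCESS marker.
final class SuccessMarkerCheck {
  static final String SUCCESS_MARKER = "_SUCCESS";

  // Returns every path that is a directory but has no _SUCCESS child; plain files are ignored.
  static List<Path> missingMarkers(List<Path> sources) {
    List<Path> missing = new ArrayList<>();
    for (Path source : sources) {
      if (Files.isDirectory(source) && !Files.exists(source.resolve(SUCCESS_MARKER))) {
        missing.add(source);
      }
    }
    return missing;
  }

  // Mirrors the "prevent execution" behaviour by throwing if any marker is absent.
  static void failOnMissing(List<Path> sources) {
    List<Path> missing = missingMarkers(sources);
    if (!missing.isEmpty()) {
      throw new IllegalStateException("missing " + SUCCESS_MARKER + " in: " + missing);
    }
  }
}
```

  Calling failOnMissing() before starting work gives the same fail-fast effect for any list of local source paths.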

  Updated c.t.h.PartitionTap to write a _SUCCESS marker in the root path on completion.

  Fixed issue with c.t.MultiSourceTap that prevented it from aggregating c.t.h.PartitionTap instances reliably.

  Updated c.t.l.PartitionTap to support recursive deletes when calling c.t.Tap#deleteResource().

  Fixed issue where c.t.h.Hfs did not clear the file status cache after locally modifying a resource.

  Updated Apache Hadoop 2 based sub-projects to 2.7.3.

  Updated c.f.Flow for all platforms to fail if any sink is c.t.SinkMode#KEEP and the resource exists. Previously
  it was the responsibility of any given platform (Hadoop MR or Tez) to determine if the resource existed and to fail
  if true. This firms up the c.t.*.PartitionTap contract, preventing overwrites of the parent Tap.

  Fixed issue where local mode did not honor c.t.SinkMode#KEEP, allowing a c.f.Flow to overwrite existing data.
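
  A minimal sketch of the KEEP contract, assuming a local file system path as the sink. SketchSinkMode and SinkGuard
  are hypothetical stand-ins written for this note; they are not the Cascading classes.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the sink modes; not the c.t.SinkMode enum itself.
enum SketchSinkMode { KEEP, REPLACE, UPDATE }

final class SinkGuard {
  // Throws if the sink must be kept and already exists; otherwise the write may proceed.
  static void verifySink(Path sink, SketchSinkMode mode) {
    if (mode == SketchSinkMode.KEEP && Files.exists(sink)) {
      throw new IllegalStateException("sink exists and mode is KEEP: " + sink);
    }
  }
}
```

  Running the check once, before any platform specific work starts, is what keeps the behaviour the same everywhere.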

3.2.1

  Fixed issue where using a c.p.ConfigDef on a sink c.t.Tap could cause a j.l.ClassCastException.

  Added Apache Tez to output comparison tests.

  Added new convenience methods to c.t.Fields.

3.2.0

  Updated c.s.Scheme with optional #sourceRePrepare() method allowing the Scheme instance to be notified of Input
  instance changes. c.t.TupleEntrySchemeIterator currently allows for an Iterator of Input instances; this method
  allows the Scheme to be notified as the Iterator is traversed.

  Fixed issue on Apache Hadoop MapReduce where a c.t.Tap used in both accumulated and streamed roles within a single
  step, but having distinct roles in different pipelines, could not be successfully planned and executed.

  Fixed issue where a triangle of split through a c.p.HashJoin into a c.p.Merge on Apache Tez would fail
  during planning.

  Fixed issue where a triangle of split and joins on Apache Hadoop MapReduce would fail during planning.

  Fixed issue where trivial pipe assemblies that reduced to assemblies with multiple edges between two elements would
  fail, e.g. merging a file with itself.

  Added new planner rule for Apache Hadoop MapReduce and Apache Tez that will decorate any Tap instances on the
  accumulated side of a c.p.HashJoin with a platform specific version of the c.t.h.DistCacheTap.

  Added c.u.NullSafeComparator to replace all uses of j.u.Collections#reverseOrder() in tuple sorting/grouping
  code paths.
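
  For context, the comparator returned by j.u.Collections#reverseOrder() delegates to compareTo() and throws a
  j.l.NullPointerException when it meets a null value. The sketch below shows one null-tolerant reverse ordering built
  from the JDK alone. Sorting nulls first is an assumption made for the example; it is not a statement of how
  c.u.NullSafeComparator orders them, and NullSafeReverse is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NullSafeReverse {
  // Reverse natural ordering that tolerates nulls by sorting them first (an assumed policy).
  static <T extends Comparable<? super T>> Comparator<T> comparator() {
    return Comparator.nullsFirst(Comparator.<T>reverseOrder());
  }

  // Returns a sorted copy so the caller's list is left untouched.
  static List<Integer> sortDescending(List<Integer> values) {
    List<Integer> copy = new ArrayList<>(values);
    copy.sort(NullSafeReverse.<Integer>comparator());
    return copy;
  }
}
```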

  Added c.f.h.MultiMapReduceFlow to provide support for a single c.f.Flow instance with many interrelated
  predefined c.a.h.m.JobConf instances that can be executed at once or added incrementally.

3.1.2

  Fixed issue on MapReduce that prevented counters from registering if incremented on a pipe assembly branch feeding the
  accumulated side of a c.p.HashJoin.

3.1.1

  Fixed issue with the MapReduce planner where a c.p.Checkpoint after a c.p.GroupBy merge would fail the planner.

3.1.0

  Fixed issue where Hadoop throws an NPE when polling for the current job state making the state polling loop unstable.

  Prevent c.t.h.Hfs and c.t.h.Lfs from setting the current configuration into stand-alone mode when running
  cluster-side and Hfs or Lfs are used to read/write data from within an operation or internally. This has historically
  had no other effect than emitting a confusing log message on the cluster slaves.

  Updated Apache Hadoop 2 based sub-projects to 2.7.2.

  Updated Apache Tez to 0.8.2.

  Updated jgrapht to 0.9.2.

  Fixed an issue on Apache MapReduce 1.x where an NPE will be thrown when fetching slice level counters on a map only
  job and the c.a.h.m.TaskCompletionEvent falsely identifies itself as a reducer task event.

  Fixed issue where c.m.a.URISanitizer would fail parsing Windows path names.

  Fixed issue where local mode could deadlock across multiple c.p.HashJoin or c.p.CoGroup instances in some situations.

  Fixed issue on Apache MapReduce platform that caused planner failures when merging the results of two c.p.GroupBy
  pipes via a single c.p.Merge pipe.

  Fixed issue on Apache Tez platform that prevented additional clean up of Hadoop created meta-data files on c.f.Flow
  completion.

  Updated c.m.a.URISanitizer to treat opaque URIs differently than hierarchical URIs by hiding the scheme specific
  parts in PUBLIC and PROTECTED visibility and storing the full URI for PRIVATE.

  Updated c.t.p.BasePartitionTap and subsequent PartitionTap sub-classes to allow for partition filters by providing
  one or more c.t.Fields argument selector and c.o.Filter instance pairs.

  Fixed issue on Apache Tez that did not create a proper vertex edge on a split in the pipe assembly under some
  circumstances.

  Updated c.m.a.PropertyAnnotation to have default #visibility() of PUBLIC, and added an #optional() property that
  defaults to true.

  Fix for NPE when attempting to set null values on underlying config instance.

  Created cascading-hadoop2-tez-stats sub-project to isolate Tez/YARN timeline server dependencies.

  Fixed issue on Apache Tez where multiple prior splits and subsequent splicing back into a c.p.HashJoin could create an
  invalid plan.

  Added new c.f.FlowStep#getFlowStepDescriptor and c.f.FlowStepDescriptors to store additional metadata on c.f.FlowStep
  instances.

  Updated c.f.t.p.r.t.BoundaryBalanceGroupSplitHashJoinTransformer to insert After and not AfterEachEdge.

  Fixed c.f.p.i.t.InsertionGraphTransformer to properly place insertions.

  Created cascading-expression sub-project to isolate all 'expression' operations based on Janino and to isolate
  the Janino dependency. A dependency on cascading-expression must be added to projects that depend on the isolated
  classes.

  Created cascading-hadoop2-io sub-project to isolate Hadoop 2.x HDFS and serialization dependencies and updated
  cascading-hadoop2-mr1 and cascading-hadoop2-tez to depend on cascading-hadoop2-io.

  Updated c.t.Fields to provide a constructor accepting type information, and a method #applyFields() to update
  field names.

  Added support for configuring split combining across supported platforms through the c.f.FlowRuntimeProps
  "cascading.flow.runtime.splits.combine" property. If enabled, will induce c.t.h.Hfs to enable combined files
  support on the MapReduce platforms.

  Updated Hadoop, Hadoop2, and Tez serialization and comparator frameworks to fully leverage declared field type
  information to reduce serialized data and perform bitwise equality comparisons. See c.t.h.TupleSerializationProps to
  disable bitwise comparisons.

3.0.4

  Fixed issue where c.m.a.URISanitizer would fail parsing glob expressions containing curly braces.

  Fixed issue where a c.f.FlowStep attempts to determine if it should be skipped but throws an Exception preventing
  the c.s.FlowStepStats from advancing to the 'started' state from 'pending'.

3.0.3

  Fixed issues with c.t.Fields#applyType() and c.t.Fields#resolve().

  Fixed issue with the Hadoop MapReduce planner that created a malformed plan for a single source split and join pipe
  assembly.

3.0.2

  Updated Apache Tez to 0.6.2 to prevent deadlocks in complex DAGs. Note this release is incompatible with Tez 0.6.1.

  Fixed issue where platform information was not consistently retrieved and reported where possible.

  Fixed issue in c.p.AppProps where getApplicationJarPath would return null when used with a j.u.Properties instance.

  Fixed issue that prevented o.a.h.m.OutputCollector sub-classes from properly flushing.

  Fixed issue on Apache Tez where diagnostic failure data would not propagate.

  Fixed issue where c.f.h.MapReduceFlow could throw an NPE.

  Updated c.f.h.MapReduceFlow to accept properties and c.f.FlowDescriptors values map.

  Fixed issue when resolving Fields through a Boundary when immediately following grouping operations.

  Fixed issues concerning detailed stats retrieval robustness for both MapReduce and Tez platforms.

  Fixed issue where child stats detail retrieval may not fetch final state of children.

  Fixed issue where c.s.FlowNodeStats node kind could be mislabeled.

  Fixed issue where c.u.ShutdownUtil could log an NPE if a hook is removed during JVM shutdown.

  Updated build to exclude jgrapht-ext, further isolation of jgrapht apis to support reliable shading.

  Fixed issue where c.f.p.ProcessFlow would not propagate the application name, version, tags, etc to the management
  services.

  Fixed issue where c.s.FlowStepStats#getProcessStatusURL() always returned null for Apache MapReduce and Tez platforms.

  Fixed issue with c.t.u.TupleHasher#ObjectHasher not being serializable.

  Fixed issue where an unreachable YARN timeline server could cause the application to fail.

  Fixed issue with NPE when retrieving Tez task status from timeline server.

3.0.1

  Fixed issue in c.f.t.p.Hadoop2TezFlowStepJob where the LocalResources were not passed to the AppMaster correctly
  causing ClassNotFoundException during split calculation for custom InputFormats.

3.0.0

  Updated Apache Hadoop 2 based sub-projects to 2.6.0.

  Updated c.f.h.ProcessFlow and related classes to be independent of the Hadoop platforms. The class has been moved to
  c.f.p.ProcessFlow.

  Added ability to specify counters to be logged cluster side when a slice has completed executing. See the
  c.f.FlowRuntimeProps class.

  Updated build to support Gradle 2.3 by removing use of deprecated 1.x features/apis.

  Updated jgrapht to 0.9.1 so that all internal graphs can be backed by a j.u.IdentityHashMap.

  Added support for node level c.p.ConfigDef properties on both c.p.Pipe and c.t.Tap instances. Only the Tez platform
  is supported as there are no Map/Reduce independent node configurations on the MR platforms.

  Updated c.c.Cascade and c.f.Flow implementations to fire c.f.FlowListener and c.f.FlowStepListener when a c.f.Flow
  or c.f.FlowStep are marked skipped.

  Updated HashFunction in c.t.u.TupleHasher to pass null values to implementations of c.t.Hasher. All custom
  implementations must be null safe as of now.

  Updated c.o.r.RegexMatcher and sub-classes to honor type coercions and to allow for a custom value delimiter.

  Fixed issue where the Hadoop o.a.h.m.OutputFormat would be ignored during job configuration as configured by a custom
  c.s.Scheme that was not a o.a.h.m.FileOutputFormat sub-class. In such cases a o.a.h.m.l.NullOutputFormat would be
  erroneously set and passed to c.t.Tap#openForWrite().

  Fixed issue where c.f.t.Hadoop2TezFlowStep was setting 'mapred.output.path' for non file based o.a.h.m.OutputFormat
  implementations.

  Fixed issue where local mode could deadlock during a c.p.HashJoin on the same source in some situations.

  Removed the deprecated c.t.PlatformRunner, c.t.HadoopPlatform, and c.t.LocalPlatform. See c.p.PlatformRunner,
  c.p.h.HadoopPlatform, and c.p.l.LocalPlatform as alternatives.

  Removed the deprecated c.f.h.HadoopFlowConnector from the cascading-hadoop2-mr1 sub-project.

  Removed all deprecated methods, constructors, enums, and constants.

  Removed the deprecated c.o.a.Max, c.o.a.Min, and c.o.a.ExtremaBase classes. See c.o.a.MaxValue and c.o.a.MinValue
  classes as alternatives.

  Removed the deprecated c.t.h.TemplateTap, c.t.l.TemplateTap, and c.t.BaseTemplateTap classes and associated tests.

  Fixed issue where a start/stop race condition in c.c.Cascade could allow a downstream c.f.Flow to start when a
  predecessor fails.

  Updated Janino to 2.7.6.

  Added support for Apache Tez. See README for details.

  Added c.f.FlowRuntimeProps to allow for setting cluster side specific properties per c.f.Flow instance in a platform
  independent manner.

  Changed planner to disallow duplicate c.p.Pipe head and tail names.

  Updated c.f.p.FlowPlanner to use generalized isomorphic sub-graph matching rules to apply platform specific plan
  assertions, transforms, and step partitioning.

2.7.1

  Fixed issue where c.p.GroupBy or c.p.CoGroup would fail if attempting to group or join incoming Fields.UNKNOWN
  tuple streams using relative positions in the grouping fields selectors.

  Fixed issue where c.u.ShutdownUtil could log an NPE if a hook is removed during JVM shutdown.

2.7.0

  Updated Riffle to 1.0.0.

  Deprecated c.f.h.ProcessFlow and related classes, which will be moved to a different package in Cascading 3.0.

  Fixed issue where trap c.t.Tap#commitResource() would not get called if c.f.Flow#complete() was not called.

  Added support for o.a.h.m.l.CombineFileInputFormat in the Hadoop specific c.t.h.PartitionTap implementation.

  Fixed issue where c.f.h.HadoopFlowStep was setting 'mapred.output.path' for non file based o.a.h.m.OutputFormat
  implementations (backport from wip-3.0 at 5e0493a).

  Added c.t.Tap#prepareResourceForRead() and c.t.Tap#prepareResourceForWrite() methods to allow for client side tap
  resource initialization.

  Fixed issue where a failure to open or write a trap would pass the throwable up to the prior trap. Failures on trap
  io will now result in a c.f.Flow failure.

  Fixed issue where c.t.TupleEntry#setTuple( Tuple tuple ) and c.t.TupleEntry#setCanonicalTuple( Tuple tuple ) would
  cause an NPE if given a null argument.

  Updated trap handling to capture diagnostic information within a trap when configured via a c.t.TrapProps instance.

  Added the c.t.TrapProps class to provide fine grained configuration over c.t.Tap traps per c.f.Flow or per
  c.t.Tap instances.

  Updated c.t.u.TupleHasher to use MurmurHash3 32bit for hashCode calculation. Users relying on the old hashCode
  implementation for partitioning can set "cascading.tuple.hadoop.util.hasherpartitioner.uselegacyhash" to true.
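
  As an illustration, the legacy switch can be added to a j.u.Properties instance as shown below. Only the property
  name comes from this entry; LegacyHashConfig is a hypothetical helper, and how the resulting properties reach the
  flow is left to the application.

```java
import java.util.Properties;

final class LegacyHashConfig {
  // Property name as given in the change log entry.
  static final String USE_LEGACY_HASH = "cascading.tuple.hadoop.util.hasherpartitioner.uselegacyhash";

  // Returns a copy of the given properties with the legacy hashCode behaviour requested.
  static Properties withLegacyHash(Properties base) {
    Properties props = new Properties();
    props.putAll(base);
    props.setProperty(USE_LEGACY_HASH, "true");
    return props;
  }
}
```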

  Updated c.f.h.HadoopPlanner and c.f.h2.Hadoop2MR1Planner to log a warning if a flow is being run on the wrong version
  of Hadoop.

  Fixed issue where c.m.a.URISanitizer would fail parsing glob expressions.

  Added ability to provide a custom cache to be used in c.p.a.AggregateBy and c.p.a.Unique.

  Added ability to use custom properties in the various invoke methods in c.CascadingTestCase to simplify testing of
  functions, filters, buffers and aggregators.

  Updated c.f.h.ProcessFlow to support optional counters provided by Riffle based flows.

  Updated c.p.AppProps and c.p.UnitOfWorkDef to log a warning if a tag contains whitespace characters.

  Fixed issue where c.c.CascadeDef was allowing multiple flows with the same sink to be part of a Cascade.

  Updated c.f.h.MapReduceFlow to support both the org.apache.hadoop.mapred.* and org.apache.hadoop.mapreduce.* APIs.

  Fixed issue where c.t.TupleEntrySchemeIterator was not behaving correctly if #hasNext() is called multiple times
  without calling #next().
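
  The expected contract is that #hasNext() is idempotent. A common way to get there is a look-ahead buffer, sketched
  below over a generic source that returns null when exhausted. LookAheadIterator is a hypothetical class written for
  this note and is not the c.t.TupleEntrySchemeIterator code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Look-ahead iterator over a source that returns null once it is exhausted.
final class LookAheadIterator<T> implements Iterator<T> {
  private final Supplier<T> source;
  private T buffered;
  private boolean hasBuffered;
  private boolean exhausted;

  LookAheadIterator(Supplier<T> source) {
    this.source = source;
  }

  @Override
  public boolean hasNext() {
    if (hasBuffered) {
      return true; // repeated calls reuse the buffered element instead of reading again
    }
    if (exhausted) {
      return false;
    }
    buffered = source.get();
    if (buffered == null) {
      exhausted = true;
      return false;
    }
    hasBuffered = true;
    return true;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = buffered;
    buffered = null;
    hasBuffered = false;
    return result;
  }
}
```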

  Fixed issue where c.f.h.ProcessFlow would not report Exceptions to registered FlowListeners.

  Fixed issue where a start/stop race condition in c.c.Cascade could allow a downstream c.f.Flow to start when a
  predecessor fails.

2.6.3

  Updated c.p.Splice to throw an IllegalArgumentException if performing a self c.p.Merge on a split with no intermediate
  c.o.Operations after the split.

  Fixed issue where c.p.a.FirstBy would perform a comparison on the aggregating values when no j.u.Comparator was
  provided to the argument c.t.Fields selector.

  Updated local mode counter implementation to be thread-safe.

  Updated c.t.h.i.MultiRecordReaderIterator to use an existing o.a.h.m.Reporter if present.

  Fixed issues in c.f.h.FlowPlatformTest which caused the test to go into an endless loop. Also increased timeout to
  make tests more reliable on slower hardware.

  Fixed issue where c.t.Tuple#set( Fields declarator, Fields selector, Tuple tuple ) did not honor given type
  information.

  Fixed issue where c.t.TupleEntry#set( TupleEntry tupleEntry ) could cause an NPE if complete type information is
  not provided.

2.6.2

  Fixed issue where c.s.h.SequenceFile default ctor would throw an NPE.

  Updated c.u.Version to warn if multiple 'cascading/version.properties' files are present on the classpath.

  Fixed issue where a c.p.a.Coerce constructor would throw a j.l.IllegalArgumentException on a valid types argument.

  Fixed issue where c.t.TapPlatformTest was not preserving properties coming from the TestPlatform when creating a Flow
  causing remote test failures.

  Fixed issue in c.p.h.Hadoop2MR1Platform causing tests to not properly run on a remote cluster when configured to do
  so.

2.6.1

  Updated c.p.h.Hadoop2MR1Platform to enforce settings to make local mode behave the same across distributions.

  Fixed issues where a c.f.Flow instance could be marked stopped while transitioning to a started state when used in
  a c.c.Cascade.

  Fixed issue where c.t.h.i.TapOutputCollector did not honor the current task o.a.h.m.Reporter instance on the
  cluster side. This should improve the accuracy of Hadoop counters wrapped by c.t.h.PartitionTap.

  Updated c.t.Tuple#isUnmodifiable to be transient to prevent the value from being serialized and restored resulting in
  an unmodifiable Tuple from a data source.

  Updated c.s.h.HadoopStepStats to reduce memory pressure when fetching TaskReports and TaskCompletionEvents from
  Hadoop 2.x.

  Updated c.p.h.HadoopPlatform to set 'mapreduce.jobtracker.staging.root.dir' to a fully qualified path for non-cluster
  tests.

  Fixed issue where c.u.Version was leaking file descriptors.

  Fixed issue where c.t.h.Hfs would not properly ignore 'hidden' files starting with '.' or '_' when listing children
  in a directory.
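
  The naming rule above amounts to a simple prefix test. A minimal sketch, using the hypothetical HiddenFileFilter
  class rather than the Hfs code:

```java
import java.util.ArrayList;
import java.util.List;

final class HiddenFileFilter {
  // A child is treated as hidden when its name starts with '.' or '_'.
  static boolean isHidden(String fileName) {
    return fileName.startsWith(".") || fileName.startsWith("_");
  }

  // Returns the names that are not hidden, preserving their order.
  static List<String> visibleChildren(List<String> names) {
    List<String> visible = new ArrayList<>();
    for (String name : names) {
      if (!isHidden(name)) {
        visible.add(name);
      }
    }
    return visible;
  }
}
```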

2.6.0

  Updated c.p.a.AggregateBy and c.p.a.Unique to count cache flushes, hits, and misses. Previously only AggregateBy
  tracked cache flushes.

  Updated slf4j to 1.7.5.

  Added ability to customize trace data captured for debugging purposes.

  Added CONTRIBUTING.md.

  Updated c.t.h.DistCacheTap to support simple file globbing as provided by c.t.h.Hfs.

  Fixed issue where c.p.a.UniqueBy was not honoring the c.t.Hasher interface.

  Added c.t.h.DistCacheTap, a decorator for a c.t.h.Hfs instance that uses o.a.h.f.DistributedCache to read files
  transparently from local disk. This is useful for c.p.HashJoins.

  Added c.t.DecoratorTap class to simplify wrapping a given c.t.Tap instance with additional meta-data.

  Updated c.f.p.FlowPlanner to allow both intermediate temporary c.t.Tap or any c.p.Checkpoint tap to be decorated
  by a configured c.t.DecoratorTap class via new c.f.FlowConnectorProps properties.

  Fixed issue where c.p.a.AggregateBy was not honoring the c.t.Hasher interface.

  Fixed issues around c.o.e.ExpressionFunction and c.o.e.ExpressionFilter either accepting Fields.NONE as incoming
  arguments, or inheriting incoming type information from the resolved arguments.

  Added c.m.a.URISanitizer, an implementation of the c.m.a.Sanitizer interface, for sanitizing URIs of different
  resources (file, HTTP, HDFS, JDBC etc.). c.t.Tap and all subclasses use it for the identifier.

  Fixed issue in c.f.h.ProcessFlow where the flowStats object would try to mark a flow as "STOPPED" even if it was
  already "FINISHED" causing an IllegalStateException.

  Added a new c.t.TupleEntrySchemeIterator property to set certain exceptions to be caught, ignored, and logged during
  read. Commonly java.io.EOFException is thrown and can be safely ignored. By default no exception will be ignored.

  Fixed issue in c.f.h.p.HadoopStepGraph where Traps would be ignored if the Flow had no operation ("copy flows").

  Updated Janino to 2.7.5.

  Added ability to add more meta information about a c.f.Flow, which can be read and used by a c.m.DocumentService.

  Fixed null handling problem in c.p.a.MaxBy and c.p.a.MinBy.

  Added Java Annotations to c.m.annotation for marking and granting access of custom properties to c.m.DocumentService
  implementations like the Driven plug-in. Instrumented core Operations, SubAssemblies, Taps, and Schemes.

  Updated Apache Hadoop to 2.4.1 in cascading-hadoop2-mr1.

2.5.6

  Updated for Cascading Fluid compatibility.

2.5.5

  Added new c.t.p.BasePartitionTap property to control the behaviour in case of an Exception while closing a
  c.t.TupleEntryCollector. Setting "cascading.tap.partition.failonclose" to "true" will cause the Exception to be
  rethrown as a c.t.TapException. When set to "false", the default, it will log the error and continue.
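
  The two behaviours can be sketched as follows. PartitionCloser is a hypothetical helper; it rethrows a
  j.l.IllegalStateException in place of c.t.TapException and collects messages in place of real logging, so it only
  approximates the described logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class PartitionCloser {
  // Property name as given in the change log entry.
  static final String FAIL_ON_CLOSE = "cascading.tap.partition.failonclose";

  // Reads the flag, defaulting to false when the property is absent.
  static boolean failOnClose(Properties properties) {
    return Boolean.parseBoolean(properties.getProperty(FAIL_ON_CLOSE, "false"));
  }

  // Closes every resource; returns the messages of failures that were recorded rather than rethrown.
  static List<String> closeAll(List<AutoCloseable> resources, boolean failOnClose) {
    List<String> recorded = new ArrayList<>();
    for (AutoCloseable resource : resources) {
      try {
        resource.close();
      } catch (Exception exception) {
        if (failOnClose) {
          throw new IllegalStateException("failed closing resource", exception);
        }
        recorded.add(String.valueOf(exception.getMessage()));
      }
    }
    return recorded;
  }
}
```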

  Added custom error reporting for Hadoop standalone mode. The o.a.h.mapred.LocalJobRunner does not return
  o.a.h.mapred.TaskReports which would cause the actual Exception to be lost. c.f.h.FlowMapper and c.f.h.FlowReducer
  will now report the Exception directly to c.f.h.p.HadoopFlowStepJob. This has no influence on Jobs running on a
  real cluster.

  Fixed issue where c.f.h.HadoopFlowStep would not set a o.a.h.mapred.Partitioner that supports custom c.t.Hasher
  implementations during partitioning. c.t.h.u.GroupingPartitioner has been renamed to
  c.t.h.u.GroupingSortingPartitioner and a new c.t.h.u.GroupingPartitioner has been introduced that uses the hashCode
  of the tuples while honoring custom hashers.

  Fixed issue where the ctor of c.t.Fields was not checking the given types for null values.

  Fixed issue where Hadoop credentials could be shared across job submissions and become corrupted
  causing j.i.EOFExceptions.

  Fixed issue where c.t.Fields#resolve() would lose type information with complex selectors.

  Added new c.f.h.p.HadoopPlanner property to disable adjacent tap removal optimization. Setting
  "cascading.multimapreduceplanner.collapseadjacentaps" to false will disable the optimization that is on by default.
  This optimization can in a few cases reduce the number of MR jobs, but without consistent type information, could
  result in type mismatch errors during joins.

2.5.4

  Fixed an issue where c.t.h.Hfs#getChildIdentifiers() could throw a j.l.StringIndexOutOfBoundsException.

  Updated c.p.a.AggregateBy$CompositeFunction to not use the capacity in #equals or #hashCode.

  Fixed issue where a c.p.Merge could hide the streamed/accumulated nature of a stream when leading to a c.p.Group
  pipe. This could result in duplicate data passed to the c.p.GroupBy or c.p.CoGroup within a MapReduce job.

  Fixed issue where c.p.a.FirstBy only accepted a single field name.

  Updated c.t.p.PartitionCollector in c.t.p.BasePartitionTap to be public.

2.5.3

  Updated c.f.h.ProcessFlow to include missing status changes.

  Deprecated both c.t.l.TemplateTap and c.t.h.TemplateTap for the respective PartitionTap.

  Updated c.p.Pipe and c.p.SubAssembly to cache any resolved name as its own name to improve #hashCode() performance.

  Fixed issue where c.t.Fields#merge() did not honor underlying Fields type information properly.

  Fixed issue where c.t.Fields#getType() attempted to resolve position when there is no associated type information.

2.5.2

  Updated c.t.TupleEntryCollector javadoc to clarify re-use of c.t.Tuple instances.

  Updated c.t.h.Hfs to log a warning and disable o.a.h.m.l.CombineFileInputFormat (if enabled) if
  c.t.h.HfsProps#isCombineInputSafeMode is true but the current o.a.h.mapred.InputFormat is not
  a o.a.h.mapred.FileInputFormat.

  Updated c.p.h2.Hadoop2MR1Platform to return a name consistent with other resources and artifacts for Hadoop2 MR1.

  Fixed issue in c.o.f.Logic filter sub-classes where argumentFields was not properly set causing some nested
  c.o.Filter instances to fail.

2.5.1

  Updated c.t.h.Hfs to throw an exception if the o.a.h.m.l.CombineFileInputFormat is enabled but the wrapped
  o.a.h.mapred.InputFormat is not a o.a.h.mapred.FileInputFormat.

  Fixed issue in c.c.Cascade where a race condition during start/stop/complete could result in state exception.

  Updated Hadoop 1 platform tests to enable default num task retries.

2.5.0

  Updated c.f.BaseFlow to fail when deleting resources fails.

  Updated c.t.h.PartitionTap to append sequence numbers to part files to prevent filename collisions within a task.

  Added the c.f.FlowStepListener listener interface and subsequent listener support to c.f.FlowStep. @Ahmed--Mohsen

  Updated Hadoop 1 dependency to use Hadoop 1.2.1.

  Updated c.f.h.p.HadoopFlowStepJob to call kill only on jobs that are not complete. In theory calling kill on a
  completed job should have no effect, but resulting logs could be confusing during postmortem.

  Added c.t.l.PartitionTap and c.t.h.PartitionTap to replace c.t.l.TemplateTap and c.t.h.TemplateTap respectively. The
  PartitionTap can be used as both a sink and source and provides pluggable partitioning via c.t.p.Partition.

  Added c.p.j.BufferJoin as a convenience to flag to the planner that the following c.o.Buffer implements a join
  strategy.

  Updated c.o.BufferCall to allow access to the current c.p.j.JoinerClosure to allow for more complex join operations
  to be built out within a c.o.Buffer implementation.

  Added support for Apache Hadoop 2 and YARN.

2.2.1

  Updated Hadoop platform to fail during planning if "mapred.job.tracker" is not set.

  Updated c.t.h.Hfs to improve duplicate identifier check performance. @gianm

  Fixed issue where resolved fields were not properly presented to c.t.MultiSinkTap child c.t.Tap and c.s.Scheme
  instances preventing header information from being written in the case of TextDelimited files.

  Fixed issue where c.s.u.DelimitedParser parsing more fields than were declared could cause
  a j.l.ArrayIndexOutOfBoundsException.

  Fixed issue where a race condition could cause an NPE between c.c.Cascade#start() and Cascade#stop().

2.2.0

  Fixed issue where c.p.CoGroup in local mode did not properly handle joins where the grouping j.u.Comparator
  did not treat null values as equal. SQL semantics expect null values to not be equivalent. c.p.HashJoin
  does not support non-equality between null and will issue a warning.

  Updated c.p.a.AggregateBy sub-classes to pass 0 as default capacity value to allow the system default value
  to be honored.

  Added c.o.a.MaxValue and c.o.a.MinValue c.o.Aggregator sub-classes to replace c.o.a.Max and c.o.a.Min classes
  respectively. MaxValue and MinValue rely on the values compared to be j.l.Comparable types resulting in a simpler
  implementation and support for max/min of non numeric types.

  Fixed issue where c.o.t.DateParser would drop incoming Tuples if the argument was null.

  Fixed issue where c.t.Hasher was not honored during grouping in local mode.

  Updated c.t.h.GlobHfs to use fewer resources when deriving member identifiers.

  Updated c.t.h.HadoopTapPlatformTest to skip the c.t.h.Dfs test if HDFS filesystem is unavailable on the current
  configuration.

  Fixed issue where c.t.h.Hfs#resourceExists() could fail if the identifier represented a file globbing pattern.

  Changed regex j.u.r.Pattern builder methods on c.s.u.DelimitedParser from static to instance methods.

  Updated c.t.TupleEntry to issue a warning if an "unmodifiable" c.t.Tuple is set via #setTuple() on a "modifiable"
  TupleEntry instance. This typically is an indicator the Tuple instance is about to be cached and/or modified at a
  later point. Unmodifiable, system created, Tuples should never be cached.

  Added c.t.TupleEntry#selectInto() to provide a more efficient way to copy values from one c.t.Tuple into another.

  Added c.t.TupleEntry#selectTupleCopy() and #selectEntryCopy() methods to always provide a modifiable and cacheable
  instance.

  Fixed issue where c.t.TupleEntry#selectTuple() and #selectEntry() could return an unmodifiable or un-cacheable
  c.t.Tuple or TupleEntry depending on the given c.t.Fields selector.

  Fixed issue where c.t.MultiSourceTap could keep too many open resources if #openForRead() is called directly.

  Fixed issue where c.o.Buffer#flush() was never called.

  Fixed issue where an exception at #close() on step state reader could mask more prominent errors.

  Fixed issue where the c.t.TupleEntryCollector was not set to "null" on the c.o.OperationCall before
  c.o.Operation#cleanup() was called to prevent the method from emitting values during cleanup. See Operation#flush().
  Use "cascading.compatibility.retain.collector" to disable.

  Fixed issue where c.f.h.ProcessFlow would not honor c.f.FlowListener instances. Currently does not support
  the #onThrowable event.

  Updated c.p.a.Unique to use c.o.b.FirstNBuffer to improve performance.

  Added c.o.b.FirstNBuffer to provide a faster implementation of returning the first N tuples encountered in a grouping.

  Updated junit to version 4.11.

  Updated default Apache Hadoop support to version 1.1.x. Ended support for 0.20.2.

  Updated c.f.FlowDef to accept classpath elements that allow for pipe assemblies to load additional resources
  from the current context j.l.ClassLoader.

  Updated error messages in c.t.Fields, delegate property initialization to c.f.Flow sub-classes. @fderose

  Removed Hadoop oro dependency from build and test runtime classpaths to stop transient build failures.

  Added ability to pass System level properties into platform level property sets to override defaults during testing.

  Fixed issue where c.t.l.FileTap#getFullIdentifier() was not returning the fully qualified path.

  Added c.t.h.HfsProps to localize optional Hadoop HDFS specific properties, specifically provides properties for
  enabling the combining of small files into larger splits.

  Updated c.t.h.Hfs to allow for smaller files to be combined into fewer splits, thus fewer map tasks. @sjlee

  Updated c.p.SubAssembly to support setting local and step properties via the c.p.ConfigDef.

  Updated c.o.Buffer to allow implementations to disable nulling of non-grouping fields after the arguments iterator
  has completed. This simplifies appending aggregated fields to the incoming tuple stream.

  Updated c.t.Fields to return appending value when calling Fields#append on Fields.NONE and optimized Fields#subtract
  when subtracting Fields.NONE.

  Added c.f.AssemblyPlanner interface to allow for platform independent generative c.f.Flow planning.

  Fixed issue in local mode where an OOME could cause a cascading set of additional OOMEs making the JVM unstable.

  Updated c.f.s.MemoryCoGroupGate and c.f.l.s.LocalGroupByGate to drain internal collections when pipelining
  tuples downstream in the pipeline.

  Added c.t.h.BigDecimalSerialization to allow Hadoop to serialize and deserialize j.m.BigDecimal instances.

  Updated slf4j to version 1.7.2.

  Added coercion support for j.m.BigDecimal.

  Added c.p.PlatformSuite annotation allowing a c.PlatformTestCase sub-class to be marked as being a JUnit suite
  of tests accessible, by default, via a static "suite" method.

  Updated provided c.s.Scheme subclasses to honor field type information.

  Updated c.o.expression, c.o.aggregator, and c.p.assembly operations to honor field type information.

  Updated c.o.Identity and c.p.a.Coerce to use field type information during coercion.

  Added c.t.t.CoercibleType interface to allow for customization of individual field data types and formats. Also
  added the c.t.t.DateType implementation for managing string formatted dates to and from a long timestamp.
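
  The string to timestamp round trip that such a type manages can be sketched with the JDK alone. DateStringCoercion
  is a hypothetical class and does not implement the c.t.t.CoercibleType interface; the UTC time zone and root locale
  are assumptions made to keep the example deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateStringCoercion {
  private final String pattern;

  DateStringCoercion(String pattern) {
    this.pattern = pattern;
  }

  // SimpleDateFormat is not thread-safe, so a fresh UTC instance is built per call.
  private SimpleDateFormat newFormat() {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format;
  }

  // Parses a formatted date string into milliseconds since the epoch.
  long toTimestamp(String value) {
    try {
      return newFormat().parse(value).getTime();
    } catch (ParseException exception) {
      throw new IllegalArgumentException("cannot parse date: " + value, exception);
    }
  }

  // Formats milliseconds since the epoch back into the configured pattern.
  String toFormatted(long timestamp) {
    return newFormat().format(new Date(timestamp));
  }
}
```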

  Updated c.p.Splice to fail during planning if grouping or merging fields do not share the same field types, unless
  the field in question has a j.u.Comparator to handle the incompatible comparisons.

  Fixed issue where a c.p.CoGroup join on Fields.NONE would fail during planning.

  Updated c.p.a.Unique to optionally filter out null values.

  Added c.o.e.ScriptFunction, c.o.e.ScriptTupleFunction, and c.o.e.ScriptFilter operations to allow for more
  expressive Java scripts.

  Added "test.platform.includes" system property so tests can be limited to specified platforms.

  Added c.p.a.MaxBy and c.p.a.MinBy c.p.a.AggregateBy sub-classes to perform max and min, respectively.

  Updated c.p.a.SumBy and c.p.a.AverageBy to honor result fields type declaration by coercing the result to the
  declared type.

  Updated c.p.a.CountBy to count all value occurrences, non-null values, or only null values, within a grouping. Using
  grouping Fields.NONE provides an efficient count for a set of columns. Counting distinct values is not supported.
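
  The three counting modes described above can be illustrated with a plain Java sketch. This is not the CountBy
  implementation and does not use the Cascading API; the class name, the enum, and its constants are hypothetical
  names chosen for this example. Repeated values are counted each time they occur, which matches the note that
  distinct counting is not supported.

```java
import java.util.Arrays;
import java.util.List;

final class CountModesSketch {
    // Hypothetical mode names for this illustration only.
    enum Include { ALL, NO_NULLS, ONLY_NULLS }

    // Counts the values in one grouping according to the selected mode.
    public static long count(List<?> values, Include include) {
        long count = 0;
        for (Object value : values) {
            switch (include) {
                case ALL:
                    count++;
                    break;
                case NO_NULLS:
                    if (value != null) count++;
                    break;
                case ONLY_NULLS:
                    if (value == null) count++;
                    break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", null, "b", null, "a");
        for (Include include : Include.values()) {
            System.out.println(include + " = " + count(values, include));
        }
    }
}
```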

  Updated c.t.Fields to accept type information and to propagate type values along with fields.

  Updated c.s.l.TextDelimited and c.s.h.TextDelimited to take c.s.u.DelimitedParser on the constructor to allow
  for overriding parsing behavior. DelimitedParser now takes a c.s.u.FieldTypeResolver to allow for field name
  permutations during source and sink, and type inference from field names.

2.1.6

  Updated c.p.SubAssembly to throw UnsupportedOperationException on #getConfigDef() and #getStepConfigDef() calls.

  Fixed issue where join field level c.t.Hasher instances were not honored during a c.p.HashJoin.

  Fixed issue where a j.l.StackOverflowError would be thrown if the Hadoop mapred.input.format.class property
  was not set.

  Updated c.t.Fields#size() to return Fields.NONE on size == 0, instead of failing.

  Fixed issue where Fields.REPLACE on an incoming Fields.UNKNOWN could result in a
  java.lang.ArrayIndexOutOfBoundsException during runtime.

  Updated c.s.h.HadoopStepStats counter caching strategy to make a final attempt even if max timeouts have been
  met. Added "cascading.step.counter.timeout" property to allow tuning of timeout period.

2.1.5

  Updated c.t.h.u.BytesComparator to implement c.t.Hasher as a convenience.

  Fixed issue where c.c.CascadeListener was receiving null as the c.c.Cascade parameter.

2.1.4

  Added ability to capture frameworks used in an application via c.p.AppProps.

  Restored platform test compatibility with Cascading 2.0.x via return of c.p.PlatformRunner.Platform annotation
  and deprecated c.t.LocalPlatform and c.t.HadoopPlatform platform implementations.

2.1.3

  Fix for extra trailing ']' in c.t.Tap#toString().

  Fix for c.f.FlowProcess#getNumProcessSlices() incorrectly returning zero in local mode, should be 1.

  Fix for c.p.a.AggregateBy not honoring the global system property capacity value if not overridden on the ctor.

  Fix for NPE if c.f.FlowProcess returns null config.

  Fixed issue where a c.f.FlowStep would attempt to detect if it should be skipped regardless of whether the "runID"
  had been set or not on the c.f.Flow enabling restartable flows.

2.1.2

  Fixed issue where c.f.FlowProcess#openForWrite on Hadoop would re-use the existing o.a.h.m.OutputCollector instance
  as that used in the current task.

  Fixed issue where fetching remote Hadoop counter values could block indefinitely. Fetching remote counters is now
  serialized across jobs to prevent deadlocks inside the Hadoop API and counter values are now cached with a final
  refresh on job completion.

  Fixed issue where NPE could be thrown by c.s.CascadingStats#getCounterValue if given counter had no value.

2.1.1

  Fixed issue where c.s.h.TextDelimited would not honor charsetName.

  Fixed issue where c.t.BaseTemplateTap would lose parent fields if they were declared as Fields.ALL.

  Fixed issue where c.t.Fields#append would not include current Fields instance when appending an array of Fields
  instances.

  Fixed issue where subsequent c.p.Merge pipes in a pipeline path would obscure prior Merges preventing a c.t.Tap
  insertion during planning resulting in a missing Tap configuration resource property.

  Fixed NPE with c.s.l.TextDelimited when line after header was null.

  Fix for c.s.u.DelimitedParser not fully honoring the default strict parsing policy. This resolution may cause
  some text delimited files to fail if they have arbitrary numbers of fields.

  Added quote and delimiter getters to c.s.l.TextDelimited and c.s.h.TextDelimited.

  Fixed issue where a c.f.FlowStep being skipped was not considered successful after 2.0.7 merge.

2.1.0

  Added c.t.t.FileType interface to mark specific platform c.t.Tap classes as representing a file like interface.

  Fixed issue where c.p.a.Coerce would coerce a null value to 0 if the coerce type was a j.l.Number
  instead of a numeric primitive, or false if the coerce type was j.l.Boolean instead of boolean.
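
  The distinction behind this fix can be sketched in plain Java. The helper below is hypothetical and is not the
  c.p.a.Coerce code; it only assumes the rule stated above: a null stays null when the target is an object type
  such as j.l.Integer or j.l.Boolean, and becomes a default (0 or false) only when the target is a primitive type.

```java
final class NullCoercionSketch {
    // Hypothetical helper: coerce a possibly-null value to the requested type.
    public static Object coerce(Object value, Class<?> type) {
        if (value == null) {
            // Object types (Integer, Boolean, String, ...) can represent "no value".
            if (!type.isPrimitive()) return null;
            // Primitives cannot hold null, so fall back to their default value.
            if (type == boolean.class) return Boolean.FALSE;
            if (type == int.class) return 0;
            if (type == long.class) return 0L;
            if (type == double.class) return 0.0d;
            throw new IllegalArgumentException("unsupported primitive: " + type);
        }
        // Non-null values are parsed from their string form in this sketch.
        String text = value.toString();
        if (type == int.class || type == Integer.class) return Integer.valueOf(text);
        if (type == long.class || type == Long.class) return Long.valueOf(text);
        if (type == double.class || type == Double.class) return Double.valueOf(text);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(text);
        if (type == String.class) return text;
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(coerce(null, Integer.class)); // object type: null is preserved
        System.out.println(coerce(null, int.class));     // primitive: default value
        System.out.println(coerce(null, Boolean.class)); // object type: null is preserved
        System.out.println(coerce(null, boolean.class)); // primitive: default value
    }
}
```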

  Fixed issue where c.s.u.DelimitedParser did not honor the number of fields found in a text delimited file header.

  Fixed issue where c.t.Tap#openForWrite did not honor the c.t.SinkMode#REPLACE setting.

  Added version update check to print out latest available release. Use system property cascading.update.skip=true
  to disable.

  Updated all tuple stream permutations to minimize new c.t.Tuple instantiations and maximize upstream Tuple reuse.

  Updated janino to version 2.6.1.

  Updated c.s.l.TextLine, c.s.l.TextDelimited, c.s.h.TextLine, and c.s.h.TextDelimited to encode/decode any supported
  j.n.c.Charset.

  Fixed issue where c.o.t.DateParser may throw an NPE if the value to be parsed was null.

  Added c.p.Props#buildProperties( Iterable<Map.Entry<String, String>> defaultProperties ) to allow for re-using
  an existing o.a.h.m.JobConf instance as default properties.

  Added c.p.a.FirstBy partial aggregator to allow for capturing first seen c.t.Tuple in a Tuple stream. Argument
  c.t.Fields j.u.Comparators are honored for secondary sorting.

  Updated c.p.a.AggregateBy to honor argumentField c.t.Fields j.u.Comparator instances for secondary sorting.

  Updated c.o.a.First to accumulate the first N seen c.t.Tuple instances.

  Added support for c.c.CascadeListener on c.c.Cascade instances.

  Updated c.p.j.InnerJoin.JoinIterator and sub-classes to re-use c.t.Tuple instances.

  Added support for restartable checkpoint c.f.Flow instances by providing a runID to identify run attempts.

  Updated build and tests to simplify development of alternative planners.

2.0.8

  Updated c.m.CascadingServices to more robustly load optional services. Service agent jar may now be optionally defined
  in a cascading-service.properties file from the CLASSPATH with the "cascading.management.service.jar" property.

2.0.7

  Fixed issue where c.t.Tap instances were not presented resolved c.t.Fields instances in local mode during planning.

  Fixed issue where Hadoop forgets the completion status of a past job during very long running c.f.Flows and
  throws an NPE when queried for the result.

2.0.6

  Added "cascading.step.display.id.truncate" property to allow simple truncation of flow and step ID values in
  the step display name.

  Fixed issue where attempting to iterate the left most side of a join more than once would silently fail on the
  Hadoop platform.

  Fixed issue where step state was not properly removed from the Hadoop distributed cache during cleanup.

  Fixed issue where c.f.Flow#writeStepsDot() would fail if a Flow c.f.FlowStep had multiple sinks.

  Fix for c.t.h.i.MultiInputFormat throwing j.l.ArrayIndexOutOfBoundsException when there aren't any
  actual o.a.h.m.FileInputFormat input paths.

  Fix for c.t.h.i.MultiInputFormat throwing j.l.IllegalStateException on an empty child o.a.h.m.InputSplit array.

  Fix for j.l.IndexOutOfBoundsException thrown on an empty c.c.Cascade.

  Fix for c.t.c.SpillableProps#SPILL_COMPRESS not being honored if set to false.

2.0.5

  Updated c.f.p.ElementGraphException messages to name disconnected elements.

  Properly scope c.t.Tap properties to c.f.l.LocalFlowStep and then pass them to source/sink stages in
  c.f.l.s.LocalStepStreamGraph. @mrwalker

  Fix for c.s.u.DelimitedParser to support delimiter as last char in quoted field.

  Fix for c.o.f.UnGroup constructor failing against correct constructor values.

  Added missing setter methods on c.p.AppProps for application jar path and class values.

  Fix for possible NPE when debug logging is enabled during planning.

  Improved error message when Hadoop serializer for a given type cannot be found in some cases.

2.0.4

  Removed remnant log4j dependency in c.t.h.i.MultiInputSplit.

  Fixed issue where c.t.Tap may fail resolving outgoing fields.

  Added missing #equals() method to c.t.TupleEntry that will honor field j.u.Comparator instances.

  Fixed issue where c.f.s.SparseTupleComparator would not properly sort with re-ordered sort fields.

  Fixed issue where c.t.TupleEntryChainIterator#hasNext() would fail if called more than once.

  Updated c.t.h.Hfs internal methods to call #getPath() instead of #getIdentifier() so sub-classes can override.

  Updated the #verify() methods on c.s.l.TextLine and c.s.h.TextLine to be protected.

2.0.3

  Fixed issue where the c.f.p.FlowPlanner would allow declared fields in a checkpoint c.t.Tap instance.

  Fixed issue where c.f.Flow#writeStepsDot() would fail if the Flow was planned by the local mode planner.

  Added c.f.h.u.ObjectSerializer to allow for custom state serializers. To override the default
  c.f.h.u.JavaObjectSerializer, specify the name of a class that implements ObjectSerializer (and optionally
  implements o.a.h.c.Configurable) via the "cascading.util.serializer" property. @sritchie

2.0.2

  Added cascading.version property to Hadoop job configuration.

  Removed tests for deprecated method c.t.Tuple#parse().

  Fixed error message in c.s.u.DelimitedParser where parsed value was not being reported.

  Updated c.s.h.TextLine and c.s.l.TextLine to ignore planner presented fields to allow instances to be re-used.

  Changed c.t.c.SpillableTupleList to use j.u.LinkedList to reduce memory footprint when backing a
  c.t.c.SpillableTupleMap.

  Fixed issue where c.p.Merge into the streamed side of a c.p.HashJoin would produce an incorrect plan.

  Fixed issue where c.p.CoGroup was not properly resolving fields from immediate prior c.p.Every pipes.

2.0.1

  Changed c.s.h.TextDelimited to use fully qualified path when reading headers so that the filesystem scheme
  will be inherited.

  Removed redundant property value kept by c.t.h.i.MultiInputSplit to reduce input split serialized size.

  Updated commit and rollback functionality in c.f.BaseFlow and c.f.p.BaseFlowStep to fail the c.f.Flow on a
  c.t.Tap#commitResource failure and to call Tap#rollbackResource on subsequent tap instances. Note this isn't
  intended to provide a 2PC type transactional functionality.

  Updated dependency to Hadoop 1.0.3

2.0.0

  Added c.p.Checkpoint pipe to force any supported planners to persist the tuple stream at that location. If bound to
  a checkpoint c.t.Tap via the c.f.FlowDef, this data will not be cleaned up after the c.f.Flow completes. This pipe
  is useful in conjunction with a c.p.HashJoin to minimize replicated data.

  Added c.t.l.TemplateTap for local mode. Refactored out c.t.BaseTemplateTap to simplify support for additional
  platforms.

  Added c.t.l.StdIn, StdOut, and StdErr local mode c.t.Tap types.

  Changed c.f.h.HadoopFlowStep to save step state to the Hadoop distributed cache if larger than Short.MAX_VALUE.

  Fixed issue where a null value was printed as "null" in c.o.r.RegexMatcher, c.o.r.RegexFilter, c.o.a.AssertGroupBase,
  and c.o.t.FieldJoiner.

  Updated dependency to Hadoop 1.0.2.

  Changed c.s.h.TextDelimited and c.s.l.TextDelimited to optionally read the field names from the header during
  planning if skipHeaders or hasHeaders is set to true and if Fields.ALL or Fields.UNKNOWN is declared on the
  constructor.

  Changed the planner and added new methods to c.s.Scheme so that field names can be retrieved after a proper
  configuration has been built, but before the planner resolves fields internally. This is useful for reading field
  names from a header of a text file, or meta-data in a binary file. These methods are optional.

  Fixed issue where any c.p.Splice following a c.p.Merge may be unable to resolve the tuple stream branch.

  Added support for c.p.ConfigDef on c.p.Pipe and c.t.Tap classes to allow for process and pipe/tap level
  property values. Process level properties allow a Pipe or Tap to set c.f.FlowStep specific properties.

  Added c.p.Props base and sub-classes to simplify managing Cascading and Hadoop related properties.

  Added c.m.UnitOfWorkSpawnStrategy interface to allow for pluggable thread management services. Also added
  c.m.UnitOfWorkExecutorStrategy class as the default implementation.

  Added typed set and add methods to c.t.Tuple and c.t.TupleEntry.

  Changed packages for many internal types to simplify documentation.

  Changed c.f.Flow and c.f.FlowStep to interfaces to hide internal only methods.

  Added support for trapping actual raw input data as read by a c.s.Scheme during processing by allowing
  c.t.TupleException to accept a payload c.t.Tuple instance with the data to be trapped. Updated c.s.h.TextDelimited
  and c.s.l.TextDelimited to provide a proper payload when sourcing and parsing text.

  Fixed issue where a c.p.GroupBy following a c.p.Every could not see result Aggregator fields from the Every instance.

  Changed c.s.h.TextDelimited and c.s.l.TextDelimited to optionally write headers if writeHeaders or hasHeaders
  is set to true. If Fields.ALL or Fields.UNKNOWN is declared, during sinking the field names will be resolved
  at runtime.

  Added the c.t.TupleCollectionFactory and c.t.TupleMapFactory interfaces and relevant implementations to allow
  custom c.t.Spillable types to be plugged into a given execution. Spillable types are used to back in memory
  collections to disk to improve scalability of c.p.CoGroup and c.p.HashJoin pipes on different platforms.

  Fixed issue where a c.s.Scheme was not seeing properly resolved fields if they were not declared in the Scheme
  instance. This allows a Scheme declared to sink c.t.Fields#ALL to see the actual field names during the
  Scheme#sinkPrepare() and Scheme#sink() methods.

  Changed c.t.TupleEntrySchemeSelector#prepare method to protected. It is now called lazily internally during
  the first call to the add method. This should simplify custom c.t.Tap development and allows for lazy setting of
  resolved sink fields.

  Fixed issue where the grouping Tuple resulting from a c.p.CoGroup did not properly reflect all the current
  grouping keys and field names. This fix allows a c.o.Aggregator or c.o.Buffer to see which fields are null, if at
  all, during an "outer" join type. The resultGroupFields parameter now must reflect all joined fields as well.

  Fixed issue where a c.p.GroupBy merge of branches with the same names threw an NPE.

  Fixed issue where c.p.a.AggregateBy.AveragePartials functor was using fixed declared fields.

  Added the "cascading.aggregateby.capacity" property so that a default capacity can be set for the
  c.p.a.AggregateBy sub-assemblies.

  Added the c.m.UnitOfWork interface to give c.f.Flow and c.c.Cascade a common contract.

  Changed c.t.h.TupleSerialization#setSerializations() to force TupleSerialization and o.a.h.i.s.WritableSerialization
  to be first in the "io.serializations" list.

  Added support for properties scoped at the pipe or process scope. Process scope properties will be inherited by
  the current job if any.

  Added c.t.SpillableTupleMap to allow durable groups during asymmetrical joins.

  Changed c.t.SpillableTupleList to implement j.u.Collection and c.t.Spillable interfaces.

  Renamed the c.p.Group class to c.p.Splice and created a c.p.Group interface. c.p.GroupBy, CoGroup, Merge, and HashJoin
  are all c.p.Splice types. Only GroupBy and CoGroup are c.p.Group types.

  Moved all "joiners" to c.p.joiner package from c.p.cogroup as they are now shared with the c.p.HashJoin pipe.

  Added c.p.HashJoin pipe to join two or more streams by a common key value without blocking/accumulating the largest
  data stream. This differs from c.p.CoGroup in that there is no grouping or sorting, and on the MapReduce platform,
  no Reduce task. This is commonly known as an asymmetrical or replicated join.
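
  The general idea of an asymmetrical (replicated) join can be sketched in plain Java. This is not the c.p.HashJoin
  implementation and uses no Cascading classes; the class and method names are hypothetical. The sketch assumes an
  inner join on the first column: the smaller side is accumulated into a hash map, while the larger side is only
  streamed, so it is never grouped, sorted, or held in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReplicatedJoinSketch {
    // Rows are {key, value} pairs. The result lines are "key,streamedValue,accumulatedValue".
    public static List<String> innerJoin(Iterable<String[]> streamed, Iterable<String[]> accumulated) {
        // Accumulate the (small) side once, indexed by join key.
        Map<String, List<String>> lookup = new HashMap<>();
        for (String[] row : accumulated) {
            lookup.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        // Stream the (large) side, probing the map for each row.
        List<String> joined = new ArrayList<>();
        for (String[] row : streamed) {
            List<String> matches = lookup.get(row[0]);
            if (matches == null) continue; // inner join: drop unmatched rows
            for (String match : matches) {
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"1", "click"}, new String[]{"2", "view"},
            new String[]{"1", "buy"}, new String[]{"3", "view"});
        List<String[]> users = Arrays.asList(
            new String[]{"1", "alice"}, new String[]{"2", "bob"});
        for (String line : innerJoin(events, users)) {
            System.out.println(line);
        }
    }
}
```

  Results follow the order of the streamed side, which is why no sort or reduce phase is needed in this approach.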

  Changed c.t.h.TupleSerialization#setSerializations() to always include o.a.h.i.s.WritableSerialization as some
  Hadoop versions do not include it if omitted. WritableSerialization is required by c.t.h.MultiInputSplit.

  Added c.p.Merge pipe to create a union of multiple tuple streams. This differs from c.p.GroupBy in that there
  is no grouping or sorting, and on the MapReduce platform, no Reduce task.

  Added c.t.Tap#commitResource() and #rollbackResource() to allow the underlying resource to be notified that write
  processing has successfully completed or has failed, respectively, so that any additional cleanup or processing may
  be completed.

  Added c.t.Hasher to allow any field level Comparators to have hashCode generation delegated to them for their
  respective c.t.Tuple element/field value.

  Added c.f.FlowStepStrategy interface to allow customization of c.f.p.FlowStep configuration information.

  Changed c.f.Flow to lazily test child source taps for modified time to reduce file meta-data queries.

  Changed c.t.CompositeTap#getChildTaps to return a j.u.Iterator to allow for lazy resolution of child tap instances.

  Added "cascading.default.comparator" property to allow for a default j.u.Comparator class to be set and used
  if no Comparator is returned by the c.t.Comparison interface or set on a c.t.Fields instance.
  See c.t.h.TupleSerialization for the static accessor.

  Changed planner to allow traps to be re-used across any branches. Prior planner would throw an error.

  Changed c.f.Flow to delete traps during the same conditions a sink will be deleted before execution.

  Fixed issue where the c.t.h.TemplateTap would not properly remove Hadoop temporary directories on completion.

  Changed the behavior of traps to capture operation argument values instead of all the incoming values so that it is
  simpler to identify the values causing the failure and reduce the data stored in the trap and log files, which
  record a truncated stringified version of the argument tuple.

  Updated c.f.h.MapReduceFlow to allow source/sink/trap create methods to be overridden by a sub-class in order
  to support path identifiers not compatible with the Hadoop FS.

  Changed c.f.FlowProcess increment methods to take a long instead of int type.

  Fixed issue where the c.t.h.TemplateTap would not properly handle pathFields value if set to c.t.Fields.ALL.

  Fixed issue where the c.p.a.AggregateBy.CompositeFunction was not getting flushed when planned into a reduce task.

  Renamed c.p.a.Shape to c.p.a.Retain, as it retains given fields, and created c.p.a.Discard to perform the opposite
  function of discarding given fields.

  Added c.o.NoOp operation to allow fields to be dropped from a stream when used with c.t.Fields.SWAP.

  Added c.t.Fields.NONE to denote no fields in a c.t.Tuple.

  Added shutdown hook for the c.c.Cascade class so during jvm shutdown #stop() will be called forcing proper state
  change.

  Changed #stop() to push from c.c.Cascade down through c.f.Flow and c.f.p.FlowStep instances.

  Added new JUnit runner for injecting platform dependencies into c.t.PlatformTestCase subclasses. Subclasses should
  use c.t.PlatformRunner.Platform Annotation to specify relevant c.t.TestPlatform instances.

  Changed test and assertion helper methods on c.t.CascadingTestCase to static to remove the subclassing requirement.

  Upgraded to support JUnit 4.8.x.

  Changed license from GPLv3 to APLv2.

  Changed c.f.Flow to prevent #complete() from returning while #stop() is executing. Should prevent certain kinds
  of race conditions when a shutdown hook is used, from a different thread, to stop running flows.

  Added support for gradle.

  Renamed c.f.FlowSkipIfSinkStale to FlowSkipIfSinkNotStale to match the semantics.

  Added support for c.f.Flow tags via the c.f.FlowDef class.

  Added c.f.FlowDef to allow for creating flow definitions via a fluent builder interface.

  Added STARTED and SUBMITTED status to c.s.CascadingStats to properly track when a job is submitted vs when it actually begins
  processing after being queued.

  Added management interfaces for capturing detailed statistics.

  Decoupled core from Apache Hadoop, removed stack based streaming model. Use c.f.h.HadoopFlowConnector to plan
  Hadoop specific flows.

  Implemented 'local' mode to support independent processing of complex processes in memory. Use
  c.f.l.LocalFlowConnector for local mode specific flows.

  Updated and simplified c.t.Tap and c.s.Scheme interfaces. Changes are not backwards compatible to 1.x releases.

  Implemented new pipelining infrastructure to support more complex streaming topologies.

1.2.6

  Fixed bug in TupleEntry#selectInteger() and marked it as deprecated.

1.2.5

  Removed accidental SLF4J dependencies.

  Fixed bug where ISE was thrown if c.f.Flow#stop() was called immediately after #start().

1.2.4

  Added info logging of current split input path with a task, if any.

  Fixed bug in c.o.f.And, c.o.f.Or, and c.o.f.Xor where the sub-select of arguments was not honored.

  Added info log message when writing "direct" to a filesystem, bypassing the temporary folder removing the need to
  rename the output file to its target location.

  Fixed bug where if all paths that match a glob pattern are empty, an exception is not thrown causing Hadoop to throw
  a java.lang.ArrayIndexOutOfBoundsException.

  Updated planner to issue an error message if a tail c.p.Pipe instance does not properly bind to a c.t.Tap instance.

1.2.3

  Added c.f.Flow#setMaxConcurrentSteps to set the maximum number of steps that can be submitted concurrently.

  Fixed bug where NPE was thrown when c.c.CascadeConnector tried to unwind nested c.t.MultiSourceTap instances.

  Fixed bug where c.t.Fields#append() would fail when appending unordered selectors.

  Updated c.f.FlowProcess to include #isCounterStatusInitialized() to test if the underlying reporting framework
  is initialized.

  Updated c.f.FlowProcess#keepAlive() method to fail silently if the underlying reporting framework is not initialized.

  Updated error message thrown by c.f.FlowStep when unable to find c.t.Tap or c.p.Pipe instances in the flow plan due
  to a Class serialized field not implementing #hashCode() or #equals() and relying on the object identity.

  Added error message explaining the Hadoop mapred.jobtracker.completeuserjobs.maximum property needs to be increased
  when dealing with large numbers of jobs. Also caching success value to lower chance of failure.

  Fixed bug in c.t.GlobHfs where #equals() and #hashCode() were not consistent between calls.

1.2.2

  Fixed bug where OOME caught from within the source c.t.Tap was not being re-thrown properly.

  Added #getMapProgress() and #getReduceProgress() to c.f.h.HadoopStepStats.

  Fixed NPE with some invocations of c.t.TupleEntry ctor.

  Fixed bug where if an operation declared it returned Fields.ARGS and the argument selector used positions, the
  outgoing values may merge incorrectly.

1.2.1

  Changed info message to not announce ambiguous source trap if none has been set.

  Fixed bug where if the c.o.Function result c.t.Tuple was passed immediately to a c.p.Group, it may become modified.

  Fixed bug where c.t.TupleEntryIterator#hasNext() failed if called again after returning false.

  Fixed issue where a reduce task may fail with an OOME during sorting.

1.2.0

  Added c.p.a.AverageBy sub-assembly for optimizing averaging processes.

  Added c.p.c.GroupClosure#getFlowProcess method to allow c.p.c.Joiner implementations access to current
  properties and counters.

  Added c.s.CascadingStats methods for accessing available counter groups and names.

  Added c.s.WritableSequenceFile as a convenience for reading/writing sequence files holding custom Hadoop
  Writable types in either the key, value, or key and value positions.

  Added retrieve/publish support to the Conjars repo via Ivy.

  Added the c.p.a.AggregateBy class to encapsulate parallel partial Function aggregations and their reduce
  side Aggregator. This is a superior alternative to so called MapReduce Combiners. See javadoc for details.
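
  The partial aggregation idea can be sketched in plain Java. This is an illustration under stated assumptions, not
  the c.p.a.AggregateBy code: the class name, the capacity handling, and the least-recently-used eviction policy are
  choices made for this example. The "map side" keeps a bounded cache of partial sums and emits a partial result
  when a key is evicted or on flush; the "reduce side" then sums the partials per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartialSumSketch {
    private final int capacity;
    // Access-ordered map: iteration starts at the least recently used key.
    private final LinkedHashMap<String, Long> partials = new LinkedHashMap<>(16, 0.75f, true);
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    public PartialSumSketch(int capacity) {
        this.capacity = capacity;
    }

    // "Map side": fold the value into the cached partial sum for its key.
    public void add(String key, long value) {
        Long current = partials.get(key);
        if (current == null && partials.size() >= capacity) {
            // Cache is full: emit and drop the least recently used partial.
            Iterator<Map.Entry<String, Long>> it = partials.entrySet().iterator();
            Map.Entry<String, Long> eldest = it.next();
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(eldest.getKey(), eldest.getValue()));
            it.remove();
        }
        partials.put(key, current == null ? value : current + value);
    }

    // Emit whatever is still cached, then return everything emitted so far.
    public List<Map.Entry<String, Long>> flush() {
        for (Map.Entry<String, Long> entry : partials.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
        }
        partials.clear();
        return emitted;
    }

    // "Reduce side": combine the partial sums for each key into a final sum.
    public static Map<String, Long> reduce(List<Map.Entry<String, Long>> partialSums) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> partial : partialSums) {
            totals.merge(partial.getKey(), partial.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        PartialSumSketch mapSide = new PartialSumSketch(2);
        mapSide.add("a", 1);
        mapSide.add("b", 2);
        mapSide.add("c", 4); // evicts the partial for "a"
        mapSide.add("a", 3); // evicts the partial for "b"
        System.out.println(reduce(mapSide.flush()));
    }
}
```

  A key that is evicted and seen again simply produces more than one partial, so the final totals stay correct
  whatever the cache capacity is.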

  Changed c.o.Debug to print the number of tuples encountered on #cleanup().

  Changed c.s.TextDelimited to always return the expected number of fields even if they are not parsed from
  the current line and strict is false, unless Fields.ALL or Fields.UNKNOWN is declared.

  Added c.p.a.SumBy sub-assembly for optimizing summing processes.

  Added c.p.a.CountBy sub-assembly for optimizing counting processes.

  Added c.s.CascadingStats.Status.Skipped state so skipped c.f.Flow instances can be identified.

  Added c.f.Flow#setSubmitPriority() to allow for custom order of Flows.

  Fixed bug where c.t.MultiSourceTap#pathExists() would return true if one of the child paths was missing.

  Changed c.c.CascadeConnector to fail if it detects cycles in the set of given c.f.Flow instances to manage.

  Disable Hadoop warning about not using "options parser".

  Added #isSource() and #isSink() methods to c.s.Scheme so that some Scheme instances can report they are either
  sink or source only.

  Added c.t.Fields#merge() method to allow simple merging of Fields instances while discarding duplicate names and
  positions.

  Added convenience methods on c.c.CascadeConnector#connect() and c.f.FlowConnector#connect() to accept
  j.u.Collection<Flow> and j.u.Collection<Pipe> arguments, respectively.

  Added Riffle support via the new c.f.ProcessFlow wrapper class. Riffle allows for non-Cascading jobs and/or
  sets of iterative Flows to participate in a c.c.Cascade.

  Changed c.c.Cascade instances to disable parallel execution if more than one Flow is a local only job.

  Added c.c.Cascade#setMaxConcurrentFlows() property that limits the number of concurrently running Flows.

  Added c.c.Cascade#writeDOT method for visualizing the dependencies between flow instances.

  Added c.p.a.Unique sub-assembly for optimizing de-duping processes.

  Changed c.s.TextDelimited to accept Fields.ALL or Fields.UNKNOWN for arbitrarily sized or unknown records.

  Changed c.t.MultiSourceTap to support #openForRead().

  Added c.t.Comparison and c.t.StreamComparator interfaces which allow for custom types to be
  lazily deserialized during sort comparisons.

  Added support for lazy deserialization during c.t.Tuple comparisons while shuffle sorting.

1.1.3

  Added publishing of artifacts to the conjars.org jar repo via Ivy.

  Added method c.s.CascadingStats#getCurrentDuration to return the current execution duration whether or not the
  process/work is finished.

  Fixed issues where c.t.Fields#getIndex may return invalid results if accessed from multiple threads simultaneously.

  Fixed NPE when attempting to increment a counter before the first map/reduce invocation. Now throws a more
  informative ISE message.

  Fixed possible NPE when accessing counters via c.f.h.HadoopStepStats.

  Fixed bug in c.s.TextDelimited where some unquoted empty values would not be properly parsed.

  Added c.f.FlowStep#setName() method to allow override of MR job names. Use in conjunction with
  FlowStep#containsPipeNamed() to find appropriate steps.

  Fixed bug where c.f.MultiMapReducePlanner did not detect a split after a c.p.GroupBy or c.p.CoGroup where
  one or more of the immediate pipes is a c.p.Every instance. An Each split is allowed.

  Fixed c.t.TupleEntry#set method so that it may take a c.t.Fields instance for a field name.

  Fixed NPE in c.t.TempHfs when parent c.f.Flow is used in a Cascade under certain conditions.

  Fixed bug where mixed absolute and relative paths did not result in a proper topological sort when used
  in a c.c.Cascade.

  Fixed bug where a c.c.Cascade of c.f.Flow and c.f.MapReduceFlow instances did not properly sort topologically.

  Added c.c.Cascade#writeDOT method to simplify debugging Cascade instances.

1.1.2

  Fixed bug preventing c.s.TextDelimited schemes from being used with a c.t.TemplateTap.

  Updated c.s.Scheme base class to force Fields.ALL source declaration to Fields.UNKNOWN, and to force Fields.UNKNOWN
  sink declaration to Fields.ALL.

  Fixed bug where if null was passed to c.s.TextLine sinkCompression, the behavior would be undefined.

  Added back c.t.Tuple#add( Comparable ) to remain backwards compatible with 1.0.

  Fixed bug preventing Fields.ALL selector in c.p.Every when incoming positions are used instead of field names
  and the given aggregator declares field names.

  Fixed bug that prevented the configured codecs from loading for co-group spills.

  Fixed bug where c.s.TextDelimited would fail on delimiters that are also regex special characters.

  Fixed random j.u.ConcurrentModificationException error when running in Hadoop local mode by synchronizing
  the c.f.s.StackElement#closeTraps method.

  Fixed missing property values when stored in a nested j.u.Properties object.

  Fixed NPE when counter group does not exist yet when querying c.s.FlowStats#getCounterValue.

1.1.1

  Fixed bug where some unsafe operations followed by named c.p.Pipe instances were not considered during planning.

  Removed imports for SLF4J and replaced with Apache Log4j in c.s.TextDelimited.

  Fixed bug where c.t.Fields.SWAP did not properly resolve when following a c.p.Every pipe.

1.1.0

  Fixed bug where a c.t.Fields instance can be marked as ordered when modified via #set call.

  Changed c.p.CoGroup to detect self-joins and optimize for them.

  Changed trap handling to include failures from source and sink c.t.Tap instances. The source Tap will inherit
  the assembly head trap and the sink will inherit the assembly tail trap.

  Deprecated c.t.Tuple#parse(). It does not properly handle null values or types other than primitives.

  Changed c.f.s.StackElement to log a warning for each trap captured. This includes a truncated print of the offending
  c.t.TupleEntry and the thrown exception and stack trace. Traps being for exceptional cases, logging exceptions is a
  reasonable response.

  Changed map and reduce operation stack so that collected c.t.Tuple instances do not remain 'unmodifiable' after
  being collected via the c.t.TupleEntryCollector.

  Added #getArgumentFields() to c.o.OperationCall for all operations.

  Added support for custom EMR properties used for managing task attempt temporary path management for some filesystems.

  Changed c.t.TemplateTap to support an openTapsThreshold value. The default open taps is 300. After the capacity
  is met, 10% of the least recently used open taps will be closed.
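
  The threshold behavior described above can be sketched in plain Java. This is not the c.t.TemplateTap code; the
  class, its methods, and the use of a StringBuilder as a stand-in for an open output are hypothetical. The sketch
  assumes that once the number of open outputs reaches the threshold, the least recently used 10% (at least one)
  are closed before a new one is opened.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OpenTapsSketch {
    private final int threshold;
    // Access-ordered map: iteration starts at the least recently used path.
    private final LinkedHashMap<String, StringBuilder> open = new LinkedHashMap<>(16, 0.75f, true);
    private final List<String> closed = new ArrayList<>();

    public OpenTapsSketch(int threshold) {
        this.threshold = threshold;
    }

    // Returns the open output for a path, opening it (and purging first) if needed.
    public StringBuilder collectorFor(String path) {
        StringBuilder collector = open.get(path); // a hit marks the path as recently used
        if (collector != null) return collector;
        if (open.size() >= threshold) purge();
        collector = new StringBuilder();
        open.put(path, collector);
        return collector;
    }

    // Closes 10% of the threshold, taken from the least recently used end.
    private void purge() {
        int toClose = Math.max(1, threshold / 10);
        Iterator<Map.Entry<String, StringBuilder>> it = open.entrySet().iterator();
        while (toClose-- > 0 && it.hasNext()) {
            closed.add(it.next().getKey()); // a real tap would flush and close here
            it.remove();
        }
    }

    public int openCount() {
        return open.size();
    }

    public List<String> closedPaths() {
        return closed;
    }

    public static void main(String[] args) {
        OpenTapsSketch taps = new OpenTapsSketch(10);
        for (int i = 0; i < 10; i++) taps.collectorFor("p" + i);
        taps.collectorFor("p0");  // touch p0 so that p1 becomes least recently used
        taps.collectorFor("p10"); // threshold met: one path is closed first
        System.out.println("closed=" + taps.closedPaths() + " open=" + taps.openCount());
    }
}
```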

  Changed c.t.Fields#setComparator to accept Fields instances as the fieldName argument.
  Only the first field name or position is considered.

  Changed c.t.TupleEntry 'get as type' accessors to now also accept c.t.Fields instances as the fieldName argument. Only
  the first field name or position is considered.

  Updated janino to 2.5.16.

  Updated jgrapht to 0.8.1.

  Changed c.f.s.FlowMapperStack to source key/value pairs once, instead of per branch.

  Changed c.f.FlowPlanner to fail if not all sources or sinks are bound to heads or tails, respectively.

  Changed c.t.TupleOutputStream to lookup tuple element writers by Class identity.

  Added j.b.ConstructorProperties annotation to relevant class constructors.

  Added new convenience method c.p.Pipe#names to return an array of all the pipe names in an assembly. This supports
  the dynamic creation of traps from opaque assemblies.

  Added new c.s.Scheme type c.s.TextDelimited to allow native support for delimited text files.

  Added optimization during CoGrouping where the most LHS pipe will not ever be accumulated, instead the values iterator
  will be used directly. This allows for the most dense values to be on the LHS, and the most sparse to be on the
  RHS of the join.

  Added new counters for tuple spills and reads. Also logs grouping after first spill.

  Added compression of object serialization and deserialization, on by default. This improves reliability
  of very large jobs with very large numbers of input files.

  Fixed bad cast of j.l.Error when caught in map/reduce pipeline stack.

  Added c.t.Fields#rename to simplify Fields instance manipulations.

  Added support for resultGroupFields in c.p.CoGroup. This allows the outgoing grouping fields to be set.

  Added c.t.h.BytesSerialization and c.t.h.BytesComparator to allow for c.t.Tuple instances
  to hold raw byte arrays (byte[]), and allow joining, grouping, and secondary sorting.

  Changed c.t.Tuple and underlying framework to support j.l.Object instead of j.l.Comparable. Note that
  Tuple#get() returns Comparable to maintain backwards compatibility.

  Added support for custom j.u.Comparator instances to control the grouping and sort orders in c.p.CoGroup and
  c.p.GroupBy via the c.t.Fields class.

  Added support for planner managed debugging levels via the c.o.DebugLevel enum. Now c.o.Debug operations
  can be planned out at runtime in the same manner as c.o.Assertion operations.

  Refactored xpath operations to re-use j.x.p.DocumentBuilder instances.

  Refactored fields resolver framework to emit consistent error messages across all field resolution types.

  Fixed bug where c.t.Tuples would fail when coercing non-standard java types or primitives.

  Fixed bug where c.t.Tap instances that returned true for #isWriteDirect() were not properly being initialized
  when used as a sink.

  Added guid like ID values to c.f.Flow and c.c.Cascade instances.

  Refactored reduce side grouping and co-grouping operations to remove redundant code calls.

  Added ability to capture Hadoop specific job details like task start and stop times, and all available counter values.

  Added accessor for increment counters on c.s.CascadingStats. This allows applications to pull aggregate counter
  values from c.c.Cascade, c.f.Flow, or c.f.FlowSteps.

  Added c.t.GlobHfs c.t.Tap type that accepts Hadoop style globbing syntax. This allows multiple files that match
  a given pattern to be used as the sources to a Flow.

  Added c.o.s.State and c.o.s.Counter helper operations that respectively set 'state' and increment counters.

  Added c.f.FlowProcess#setStatus method to allow for text status messages to be posted.

  Added c.o.a.AssertNotEquals assertion type.

  Removed planner restriction that traps must not cross map/reduce boundaries. This allows for a single c.t.Tap
  trap to be used across a whole branch, regardless of underlying topology.

  Added new c.t.Fields field set type named Fields.SWAP. Can only be used as a result selector. Specifies that
  operation results will replace the argument fields. The remaining input fields will remain intact.

  Deprecated c.t.SinkMode#APPEND and replaced with c.t.SinkMode#UPDATE.

  Added c.t.MultiSinkTap to allow for simultaneous writes to multiple unique locations.

  Added support for compression of c.t.SpillableTupleList by default in order to speed up c.p.CoGrouping operations
  where there are very large numbers of values per grouping key.

  Added c.o.f.SetValue function for setting values based on the result of a c.o.Filter instance.

  Added support for configuring polling interval of job status via c.f.h.MultiMapReducePlanner.

  Added c.f.h.MultiMapReducePlanner optimization to detect 'equivalent' adjacent c.t.Tap instances in a c.f.Flow.
  This can drastically reduce the number of jobs when there are intermediate sinks between pipe assemblies.
  If the taps are not compatible, a job will be inserted to convert the temp tap data to the sink format.

  Added support for 'safe' c.o.Operations. By default Operations are safe, that is, they have no side-effects, or
  if they do, they are idempotent. Non-safe operations are treated differently by the c.f.h.MultiMapReducePlanner.

  Added new c.t.Fields field set type named Fields.REPLACE. Can only be used as a result selector. Specifies the
  operation results will replace values in fields with the same names. That is, inline values can be replaced in a
  single c.p.Each or c.p.Every. It is especially useful when used with Fields.ARGS as the operation field declaration.

  Fix for case where one side of a branch multiplexed in a mapper could step on c.t.Tuple values before being
  handed to the next branch. The previous fix was only for CoGroup; this one supports GroupBy merges.

1.0.18

  Changed c.t.Tuple#print to not quote null elements to distinguish between 'null' Strings and null values.
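
  A minimal sketch of the distinction this change makes visible. The bracketed, single-quoted output format and the
  class TuplePrintSketch are assumptions for illustration, not the actual c.t.Tuple#print implementation.

```java
// Hypothetical sketch: quote non-null elements, leave null unquoted.
final class TuplePrintSketch {
  static String print(Object... elements) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < elements.length; i++) {
      if (i > 0) sb.append(", ");
      Object element = elements[i];
      if (element == null) {
        sb.append("null"); // unquoted: a true null value
      } else {
        sb.append('\'').append(element).append('\''); // quoted: a real value
      }
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // The String "null" and a null value are now distinguishable.
    System.out.println(print("null", null)); // ['null', null]
  }
}
```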

  Changed planner exception messages to quote head and tail names.

  Changed log messages to info when hdfs client finalizer hook cannot be found.

  Fix for NPE in c.t.h.MultiInputFormat during certain testing scenarios. Also changed proportioning to honor
  suggested numSplits value.

  Fix for temp files starting with underscores (_) causing them to be ignored.

  Fix for mixed types in properties object causing ClassCastExceptions.

  Fix for case where one side of a branch multiplexed in a mapper could step on c.t.Tuple values before being
  handed to the next branch.

  Fix for edge case where Cascading jars are stored in Hadoop classpath and deserialization of c.f.Flow fails.

  Fix for bad cast of j.l.Error when caught in map/reduce pipeline stack.

  Fix for bug when selecting positional Fields from positional Fields.

  Fix for case where c.o.Aggregator#start is called when there are no values to iterate across in current grouping.

1.0.17

  Changed behavior when cleaning temp files that allows shutdown to continue even if an exception is thrown
  during temp file delete.

  Fix bug where c.f.FlowProcess#openTapForRead() included current input file values in iterator.

  Fix for intermediate temp files not being cleaned up on c.f.Flow#stop().

  Fixed bug where NPE is thrown if all hadoop default properties are not available.

1.0.16

  Fixed bug where in some instances o.a.h.m.JobConf hangs when instantiated during co-grouping.

  Fixed bug in c.CascadingTestCase#invokeBuffer where the output collector was not properly being set. Added
  new methods on #invokeBuffer and #invokeAggregator to take a grouping c.t.TupleEntry.

1.0.15

  Fixed bug where c.t.Fields did not check for a null field name or position on the ctor.

  Fixed bug in c.u.Util#join() methods where if the first value was empty, the delimiter was not properly applied.
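
  One plausible way a join can lose its delimiter after an empty first value is sketched below. The entry does not
  state the actual cause in c.u.Util; JoinSketch and both methods are hypothetical and exist only to show the failure
  mode and a fix.

```java
// Hypothetical sketch of a delimiter bug triggered by an empty first value.
final class JoinSketch {
  // Buggy: treats "buffer is non-empty" as "not the first value",
  // so the delimiter after an empty first value is never written.
  static String joinBuggy(String delimiter, String... values) {
    StringBuilder sb = new StringBuilder();
    for (String value : values) {
      if (sb.length() != 0) sb.append(delimiter);
      sb.append(value);
    }
    return sb.toString();
  }

  // Fixed: decides by position, not by buffer contents.
  static String joinFixed(String delimiter, String... values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < values.length; i++) {
      if (i != 0) sb.append(delimiter);
      sb.append(values[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(joinBuggy(",", "", "a", "b")); // a,b
    System.out.println(joinFixed(",", "", "a", "b")); // ,a,b
  }
}
```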

  Fixed issue in c.t.h.FSDigestOutputStream where seek() now must be implemented with modern versions of Hadoop.

1.0.14

  Fixed bug in planner where JGraphT sometimes returns null instead of an empty List.

  Fixed bug in c.o.x.XPathParser that prevented use of multiple xpath expressions.

  Added configuration property allowing job polling interval to be configured per c.f.Flow via
  Flow#setJobPollingInterval().

  Updated ant build to not hard-code hadoop/lib sub-dir names.

1.0.13

  Fixed bug where non-String j.u.Property values were not being copied to the internal o.a.h.m.JobConf instance.

  Fixed bug where custom serializations were not recognized during co-grouping spills inside c.t.SpillableTupleList.

1.0.12

  Fixed bug where the c.f.FlowPlanner did not detect that tails were not bound to sinks, or that some tail references
  were missing.

  Fixed j.u.ConcurrentModificationException when using a c.c.CascadeConnector on c.f.Flows using a c.t.MultiSink
  c.t.Tap.

  Fixed bug where c.f.s.StackException was being wrapped preventing failures within sink c.t.Tap instances from
  causing the c.f.Flow to fail. This mainly affected Flows using traps.

1.0.11

  Added clearer error message when c.t.Tap is used as both source and sink in a given Flow.

  Demoted all DEBUG related c.t.Tuple#print() calls to TRACE.

  Fixed NPE when planner finds inconsistencies with c.t.Tap and c.p.Pipe names.

1.0.10

  Updated planner error messages when field name collisions detected.

  Fixed issue where temporary paths were not getting deleted consistently.

1.0.9

  Fixed issue where reverse ordering a c.p.GroupBy was not possible when sortFields were not given.

  Changed c.f.s.StackElement#close() behavior to close elements from the top of the stack.

1.0.8

  Fixed bug where Hadoop FS shutdown hooks prevented cleanup of c.f.Flow intermediate files.

  Fixed bug where c.t.MultiTap was not accounted for when planning a c.c.Cascade.

  Fixed bug where operations in the default package caused NPE when calculating the stacktrace.

  Added c.f.StepCounters enum and now increment the counters Tuples_Read, Tuples_Written, Tuples_Trapped.

  Fixes for instabilities when using traps in some instances.

  Workaround for bug in o.a.h.f.s.NativeS3FileSystem where a null is returned when getting a FileStatus array
  in some cases.

1.0.7

  Fixed bug where c.o.r.RegexSplitter did not consistently split incoming values if the value had blank
  fields between the split delimiter. This only occurs if the incoming tuple is declared Fields.UNKNOWN
  and won't affect any tuple with declared field names. Though this is an incompatible change, the bug
  breaks the contract of the splitter.

  Deprecated all S3 supporting classes, including c.t.S3fs. The s3n:// protocol is the preferred S3 interface.

  Fixed bug where c.t.Hfs caused an NPE from the NativeS3FileSystem when attempting to delete the root directory.
  Hfs now detects a delete is attempted on the root dir, and returns immediately.

1.0.6

  Fixed bug where a uri path to a s3n://bucket/ could cause an NPE when determining mod time on the path.

  Fixed bug where sink c.s.Scheme sink fields were not being consulted during planning. This fix may
  cause planner errors in existing applications where the sink fields are not actually available in the incoming
  tuple stream.

  Updated application jar discovery to provide more sane defaults supporting simple cases.

  Fixed bug where default properties in nested j.u.Properties object were not being copied.
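
  The underlying j.u.Properties behavior can be shown in isolation: defaults passed to the constructor are not stored
  in the table itself, so a plain putAll() does not see them. The class NestedPropertiesSketch and its methods are
  hypothetical; how Cascading performs the copy is not stated in this entry.

```java
import java.util.Properties;

// Hypothetical sketch: copying a Properties object with and without its defaults.
final class NestedPropertiesSketch {
  // Copies only entries stored directly in the table; nested defaults are skipped.
  static Properties shallowCopy(Properties source) {
    Properties copy = new Properties();
    copy.putAll(source);
    return copy;
  }

  // stringPropertyNames() also walks the chain of defaults.
  static Properties deepCopy(Properties source) {
    Properties copy = new Properties();
    for (String name : source.stringPropertyNames()) {
      copy.setProperty(name, source.getProperty(name));
    }
    return copy;
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    defaults.setProperty("a", "1");
    Properties props = new Properties(defaults);
    props.setProperty("b", "2");

    System.out.println(shallowCopy(props).getProperty("a")); // null
    System.out.println(deepCopy(props).getProperty("a"));    // 1
  }
}
```

  Note that stringPropertyNames() only returns entries whose key and value are both Strings, so non-String values
  would need separate handling.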

1.0.5

  Added check if num reducers is zero, if so, assume #reduce() has no intention of being called and return silently.

1.0.4

  Updated split optimizer to perform a multipass optimization.

  Fixed bug where c.f.MultiMapReducePlanner was not properly handling splits on named Pipe instances.

  Added c.t.TemplateTap constructor arg that allows for independent tuple selection for use by template path.

  Fixed bug where unsafe filename characters were leaking into temporary filenames, didn't take the first time.

1.0.3

  Fixed bug in c.f.MultiMapReducePlanner where split and joins with the same source were not handled properly.

  Fixed bug in c.f.Flow#writeDOT caused by changes in 1.0.2.

  Fixed bug in c.o.t.DateFormatter and c.o.t.DateParser where the TimeZone value was not being properly set. This
  fix could affect existing applications.

1.0.2

  Added rules to verify no duplicate head or tail names exist in an assembly when calling c.f.FlowConnector#connect().
  Currently a WARNING will be issued via the logger, next major release this will be an exception. This is a change
  that was supported in prior releases, but turns out to allow error prone code. Two workarounds are available: bind
  the same tap to both names in the tap map, or split from a single named c.p.Pipe instance.

  Added support for c.o.e.ExpressionFunction to evaluate expressions with no input parameters.

  Reverted MR job naming to include sink c.t.Tap name. More verbose, but easier for debugging.

  Update c.c.Cascade to not delete c.f.Flow sinks if they are appendable before the Flow is executed.

  Updated error messages to warn when internal element graphs remove all place holders resulting in an empty graph
  usually due to missing linkages between pipe assemblies.

  Allowing Fields.UNKNOWN to propagate through pipes that do not declare argument selectors. This is a relaxation
  of the strict planning and seems very natural when assembling pipes to process unknown field sets. Reserving
  the right to revert this feature if it causes unforeseen issues.

  Fixed bug in c.o.f.UnGroup where the num arg value was improperly calculated.

  Allow for white space in the serializations token property so it can be set in a config file simply.

  Added new log message if no serialization token is found for a class being serialized out.

  Fixed bug that allowed c.t.Fields instances to be nested in new Fields instances.

  Updated many error messages to print the number of fields along with a list of the field names.

  Fixed bug preventing custom c.s.Scheme types from using different key/value classes in some situations.

  Fixed bug preventing c.t.TemplateTap from being written to in Reducer.

1.0.1

  Improved error message for the case a Hadoop serializer/deserializer cannot be found.

  Changed c.s.Scheme sourceFields default to Fields.UNKNOWN. sinkFields default remains Fields.ALL.

  Fixed bug where unsafe filename characters were leaking into temporary filenames.

  Changed SinkMode.APPEND support checks to be done in c.t.Hfs, instead of c.t.Tap.

1.0.0

  Updated copyright messages.

  Fixed bug where c.t.TuplePair threw an NPE during debugging.

  Fixed bug where positional selectors failed against Fields.UNKNOWN.

  Changed all constructors on c.p.Group to be protected. Must now use subclasses to construct.

  Renamed c.t.Fields#minus to subtract.

0.10.0

  Changed c.p.CoGroup "repeat" parameter to numSelfJoins to represent the actual number of self joins to be performed.
  Thus a value of 1 will cause a single self join of a pipe. Users will need to decrement the current value by 1.

  Fixed bug with temporary filename generation where path created was too long.

  Fixed Janino c.o.expression operations to require parameter names and types. Janino
  was returning guessed parameter names in a nondeterministic order.

  Fixed boolean type c.t.Tuple serialization.

  Fixed c.p.GroupBy merging case where grouping field names were not properly resolved.

  Changed c.o.r.RegexParser to emit variable sized Tuples if a fieldDeclaration is not given. Also will emit group
  matches if there are any, otherwise the match is emitted.

  Removed deprecated classes; c.o.t.Texts, c.o.r.Regexes, c.p.EndPipe.

  Removed experimental c.p.EndPipe class.

  Changed c.t.Tap#isUseTapCollector to Tap#isWriteDirect.

  Changed c.t.Tap and c.f.Flow to return c.t.TupleEntryIterator instead of c.t.TupleIterator. This is more consistent
  and more useful.

  Added c.t.TemplateTap to support dynamically writing out c.t.Tuple values to unique directories.

  Changed Cascading to support null values returned from c.t.Tap#source() and subsequently c.s.Scheme#source().
  This allows for Schemes to skip records returned by an internal Hadoop InputFormat without having to implement
  a custom Hadoop InputFormat or instrument a pipe assembly with a c.o.Filter.

0.9.0

  Updated c.o.Debug to allow for printing field names and tuple values in intervals.

  Changed planner to fail if traps are not contained within single Map or Reduce tasks. This prevents the chance of
  multiple tasks writing to the same output location. Hadoop only partially supports appends, so it is not currently
  possible to append subsequent jobs to existing trap files. Naming sections of a pipe assembly allows traps to be
  bound to smaller sections of assemblies.

  Added c.o.f.Sample and c.o.f.Limit Filters. Sample allows a given percentage of Tuples to pass. Limit only allows the
  specified number of Tuples to pass.

  c.p.Pipe instances now capture line numbers and classnames where they are instantiated so this information
  can be printed out during planner failures.

  Added c.f.FlowSkipStrategy interface to allow for pluggable rules for when to skip executing a c.f.Flow participating
  in a c.c.Cascade. The default implementation is c.f.FlowSkipIfSinkStale, with an optional c.f.FlowSkipIfSinkExists.
  Setting a skip strategy on a Cascade overrides all Flow instance strategies.

  Fixed bug with c.t.Tuple#remove() method not correctly removing values from Tuple.

  Updated c.t.Tap api to support c.t.SinkMode enums. This opens up ability to support appends in the near future.

  Added support for Hadoop 0.19.x. This release skips Hadoop 0.18.x.

  Changed project structure so that XML functions live in their own sub-project. This includes renaming the base
  Cascading tree and jars to 'core'.

  Fixed bug that prevented Fields.UNKNOWN input sources from being fed into a c.p.CoGroup for joining.

  Changed all operations so that incoming c.t.Tuple and c.t.TupleEntry instances are unmodifiable. An
  UnsupportedOperationException will be thrown on any attempt to modify argument tuples within an operation.
  This enforces the rule argument tuples should not be modified to protect against concurrent modification in
  parallel threads.

  Updated c.o.r.RegexMatcher base class to use j.u.r.Matcher#find() instead of #matches(). This is more consistent
  with default behaviors of popular languages. Matcher is now also initialized in prepare() and reset() in
  the operation to reduce overhead.
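
  The difference between the two j.u.r.Matcher calls, and the reuse of a single Matcher via reset(), can be seen with
  plain JDK classes. The class FindVersusMatches and its helper methods are hypothetical and are not part of
  c.o.r.RegexMatcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch contrasting Matcher#matches() with Matcher#find().
final class FindVersusMatches {
  // True only when the entire input matches the pattern.
  static boolean matchesWhole(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // True when any subsequence of the input matches the pattern.
  static boolean findsAnywhere(String regex, String input) {
    return Pattern.compile(regex).matcher(input).find();
  }

  // Creates one Matcher up front and resets it per input instead of allocating a new one each time.
  static int countFound(String regex, String... inputs) {
    Matcher matcher = Pattern.compile(regex).matcher("");
    int count = 0;
    for (String input : inputs) {
      if (matcher.reset(input).find()) count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(matchesWhole("\\d+", "order 42 shipped"));  // false
    System.out.println(findsAnywhere("\\d+", "order 42 shipped")); // true
    System.out.println(countFound("\\d+", "a1", "b", "22"));       // 2
  }
}
```

  Patterns that relied on whole-input matching can anchor themselves with ^ and $ to keep the old behavior under
  find().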

  Added new lifecycle methods to c.o.Operation, prepare and cleanup. These methods are called so that an Operation
  instance can initialize and destroy any resources. They may be called more than once before the instance is
  garbage collected.

  Added a new operation called c.o.Buffer. Buffers are similar to Reduce in MapReduce. They are given an Iterator
  of input arguments and can emit any number of result c.t.Tuple instances. For many problems, this is more
  efficient than using an c.o.Aggregator operation. Only one c.p.Every pipe with a Buffer operation may
  follow a GroupBy or CoGroup.

  Fixed dot file writing so GraphViz can properly load.

  Upgraded jgrapht library, requires JDK 1.6.

  Fixed bug where selecting positions from a c.t.Fields.UNKNOWN declaration would return the first position, not
  the specified position.

  Renamed c.t.Fields.KEYS to c.t.Fields.GROUP to be consistent with the Cascading model.

  Fixed bug where c.t.Tap may inappropriately delete a sink from a task.

  Changed c.o.Aggregator to no longer use a Map for the context. Users can now specify custom types by returning
  either a new instance from start() or recycling an instance passed into start(). This change will break all existing
  implementations of Aggregator. Note, simply setting a new Map<Object,Object> on the call instance in start()
  should be sufficient.

  Changed all c.o.Function, c.o.Filter, c.o.Aggregator, c.o.ValueAssertion, and c.o.GroupAssertions to accept
  a c.f.FlowProcess object on all relevant methods. FlowProcess provides call-backs into the underlying system
  to get configuration properties, fire a "keep alive" ping, or increment a custom counter. This change will
  break all existing implementations of the above interfaces.

  Added ability to set serialization tokens via the cascading.serialization.tokens property. This complements the
  c.t.h.SerializationToken annotation.

  Optimized co-grouping operation by using c.t.IndexTuple instead of a nested c.t.Tuple.

  Changed c.t.Tap and c.s.Scheme sink methods to take a c.t.TupleEntry, instead of c.t.Fields and c.t.Tuple
  individually.

  Added the c.t.h.SerializationToken Java Annotation. This allows for an int value to be written during serialization
  instead of a Class name for custom objects nested in c.t.Tuple instances. This feature should dramatically reduce
  the size of Tuples saved in SequenceFiles, and improve the general performance during 'shuffling' between Map and
  Reduce stages.

  Added c.t.h.TupleSerialization, a Hadoop Serialization implementation. Tuple is no longer Hadoop Writable
  and now relies on TupleSerialization for serialization support. Subsequently nested objects in c.t.Tuple
  only need to be j.l.Comparable. So that they can be serialized properly, a Serialization implementation must be
  registered with Hadoop. Note all primitive types are handled directly by Tuple, but custom types must
  have a Serialization implementation, or must be Hadoop WritableComparable so that the default WritableSerialization
  implementation will write them out.

0.8.3

  Fix for c.p.CoGroup declared fields being generated out of order.

0.8.2

  Added new properties via c.f.FlowConnector.setJarClass and c.f.FlowConnector.setJarPath for
  setting the application jar file.

  Fixed bug where job jar was not being inherited by subsequent MapReduce jobs when the first job was executed
  in local mode.

  Fixed bug where unserializable Operations were being squashed internally. c.f.Flow instances will now
  fail immediately and be marked as 'failed'.

0.8.1

  Fixed bug where c.t.Lfs did not force local mode for current MapReduce step.

  Fixed bug where writing to a c.t.TupleCollector would fail if using a c.s.SequenceFile in some cases.

  Added a few minor improvements to reduce stray object creations, and speedup c.t.Tuple serialization.

0.8.0

  Updated c.o.x.TagSoupParser to accept 'features', use these features to recover past behaviors.

  Updated janino and tagsoup libraries to 2.5.15 and 1.2, respectively. Note that tagsoup, in theory, is not
  backwards compatible by default. See their release notes: http://home.ccil.org/~cowan/XML/tagsoup/#1.2

  Added some forward compatible changes for supporting Hadoop 0.18 at the API level. Currently there are other
  issues preventing some tests from passing on Hadoop 0.18.

  Changed c.f.FlowException to return the parent c.f.Flow name.

  Changed behavior of c.f.MultiMapReducePlanner to use c.t.h.MultiInputFormat to allow single Mappers
  to support many different Hadoop InputFormat types simultaneously. This deprecates the need to normalize
  sources to a map and reduces the number of jobs in a c.f.Flow in some cases.

  Changed behavior of Cascading to allow for multiple paths from the same c.t.Tap source to be co-grouped on
  via c.p.CoGroup. This allows for a kind of self-join where each stream is processed by a different operation
  path within the Mapper.

  Added c.o.f.And, c.o.f.Or, c.o.f.Xor, and c.o.f.Not logic operator c.o.Filter implementations. They should be used
  to compose more complex filters from existing implementations.

  Changed the behavior of c.o.BaseOperation to properly initialize itself if it is a c.o.Filter instance. This
  removes the requirement that Filter implementations must set declaredFields to Fields.ALL, as it makes no
  sense for a Filter to declare fields.

  Added c.f.PlannerException, a subclass of c.f.FlowException, and updated c.f.MultiMapReducePlanner to throw
  it on failures. Functionality of writing DOT files has been moved from FlowException to PlannerException.

  Added c.o.f.FilterNotNull and c.o.f.FilterNull filter classes.

  Changed c.f.MultiMapReducePlanner to fail if it encounters an c.p.Each to c.p.Every chain. In these cases, a
  c.p.Group type must be between them.

  Deleted c.o.Cut class as it was effectively a duplicate of c.o.Identity.

  Changed c.f.MultiMapReducePlanner to fail if a c.p.GroupAssertion is not accompanied by another c.o.Aggregator
  operation. This is required so that the GroupAssertion does not change the passing tuple stream if it is planned out.

  Changed c.f.MultiMapReducePlanner to no longer insert new c.p.Each( ..., new Identity(), ... ) as a place holder.

  Renamed c.p.PipeAssembly to c.p.SubAssembly to better reflect its purpose, which is to encapsulate reusable
  pipe assemblies in the same manner as a sub-process or sub-routine. A temporary c.p.PipeAssembly class has been
  provided for backwards compatibility.

  Fixed bug where c.t.TapCollector would throw an NPE if a custom Tap was not using paths.

  Changed behavior of c.f.Flow where if a c.f.FlowListener throws an exception, the Flow instance receiving the
  exception will stop (by calling Flow.stop()). Listeners will continue to fire as expected and Flow.complete()
  will re-throw the thrown exception (as was the original behavior).

  Added ability to set a Cascading specific temporary directory path for use by intermediate taps created
  within c.f.Flow instances. Use c.t.Hfs.setTemporaryDirectory() to configure.

  Fixed bug where the 'mapred.jar' property was being stepped on if previously set by the calling application.

  Changed c.t.Tap and c.f.Flow to return c.t.TupleIterator and c.t.TupleCollector instead of c.t.TapIterator and
  c.t.TapCollector, respectively.

  Added c.t.Tap.flowInit( c.f.Flow flow ) to allow a given tap to know what flows it is participating in. It is called
  immediately after the Flow instance is initialized.

  Fixed bug with nested c.p.PipeAssembly instances where some nested assemblies threw an internal error from
  the planner.

  Changed c.o.Debug to accept a prefix text string that will be prefixed to every message.

  Fixed bug where c.f.MultiMapReducePlanner would fail when normalizing inputs to a group where the inputs
  passed through one or more splits.

  Fixed bug where c.p.CoGroup silently stepped on input pipes with the same input name.

0.7.1

  Fixed bug in c.f.MultiMapReducePlanner where a source used on more than one c.p.Group would cause an internal
  error during planning.

  Changed c.f.MultiMapReducePlanner to normalize heterogeneous sinks.

  Changed c.f.MultiMapReducePlanner to keep a splitting c.p.Each on the previous step, instead of being duplicated
  on each branch. If the Each is preceded by a source c.t.Tap, it will be duplicated across branches to reduce
  the number of steps in the Flow.

  Fixed bug in c.f.MultiMapReducePlanner where too many temp tap instances were being inserted while normalizing
  the flow sources.

  Changed c.t.Fields to fail if given duplicate field names.

  Changed behavior if Hadoop FileInputSplit is not used and property "map.input.file" is not set. If there is one
  source, it will be returned as the source for the mapper stack, otherwise an exception is thrown. Subsequently joins
  and merges of non-file sources are not supported until a discriminator can be passed to the mapper.

  Fixed bug in c.t.Tuple where NPE was thrown under certain compareTo operations.

  Fixed bug that prevented CoGrouping or Merging on the same source even though it was one or more Groupings away.

0.7.0

  Changed project structure, removed 'examples' sub-project.

  Updated to support Hadoop 0.17.x. This version is not API compatible with any Hadoop version less than 0.17.0.

  Added ability to stop all c.f.Flows executing within a c.c.Cascade instance via the stop() method.

  Changed c.f.FlowConnector to only take a Map of properties. These properties are passed downstream to various
  subsystems. This removes the Hadoop JobConf constructor, but it still can be passed as a property value. Also
  properties will be pushed into a default JobConf, bypassing any direct JobConf coupling in applications.

  Changed c.f.Flow to automatically register a shutdown hook killing remote jobs on vm exit.

  Changed c.f.Flow.stop() to immediately stop all running jobs.

  Changed c.o.Operation to an interface and introduced c.o.BaseOperation. This makes creating custom Operation types
  more flexible and intuitive. c.o.Filter, c.o.Function, c.o.Aggregator, and c.o.Assertion now extend c.o.Operation.

  Added c.p.c.OuterJoin, c.p.c.MixedJoin, c.p.c.LeftJoin, and c.p.c.RightJoin c.p.c.CoGrouper classes. They
  complement the default c.p.c.InnerJoin CoGrouper class.

  Added support for passing an intermediateSchemeClass to the underlying planner to be used as the default c.s.Scheme
  for intermediate c.t.Tap instances internal to a given c.f.Flow.

  Fixed bug where c.p.Group is immediately followed by another c.p.Group (or their sub-classes) and fields could not
  be resolved between them.

  Added support for c.t.Tap instances implementing c.f.FlowListener. If implemented, they will automatically be
  added to the Flow event listeners collection and will receive Flow events.

  Fixed case where multiple source c.t.Tap instances return true for the containsFile method. Now verifies only one
  Tap contains the file, and fails otherwise.

  Changed c.s.TextLine to not set numSinkParts to 1 by default. Now uses the natural number of parts.

  Changed MapReduce planner to force an intermediate file between branches with Hadoop incompatible source Taps
  on joins/merges. If the taps are compatible (have same Scheme), all branches will be processed in same Mapper
  before the c.p.Group.

  Added merge capabilities in c.p.GroupBy. This allows multiple input branches to be grouped as if a single stream.

  Fixed bug in c.t.TapCollector where writing to a Sequence file threw a NPE.

  Added c.f.MapReduceFlow to support custom MapReduce jobs, allowing them to participate in a Cascade job.

0.6.1

  Changed thrown c.f.FlowException instances to include cause message.

  Fixed bug where empty sink or source map was not detected.

0.6.0

  Changed default argument selector for c.p.Every to be Fields.ALL, to be consistent with the default value of c.p.Each.

  Added support for assembly traps. If an exception is thrown from inside an c.o.Operation, the offending Tuple
  can be saved to a file for later processing, allowing the job to complete.

  Added support for stream assertions. STRICT and VALID assertions can be built into a pipe assembly, and optionally
  planned out during runtime. Assertions will throw exceptions if they fail.

  Changed c.o.a.First, Last, Min, and Max to optionally ignore specified values. Useful if you do not wish
  for a 'default' value to be considered first, or last in a set.

  Changed c.o.a.Sum to take a Class for coercion of the result value.

  Changed c.o.Max and Min to use infinity as initial values so zero is bigger than a really small number
  for Max, and zero is smaller than a really big number for Min.
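
  Why the seed value matters can be sketched in plain Java. The entry does not say what the earlier initial values
  were; the sketch assumes, for illustration only, a seed such as Double.MIN_VALUE, which is the smallest positive
  double rather than the most negative one. SeedValueSketch and its methods are hypothetical.

```java
// Hypothetical sketch: the effect of the initial value on a running max or min.
final class SeedValueSketch {
  static double max(double seed, double... values) {
    double result = seed;
    for (double value : values) result = Math.max(result, value);
    return result;
  }

  static double min(double seed, double... values) {
    double result = seed;
    for (double value : values) result = Math.min(result, value);
    return result;
  }

  public static void main(String[] args) {
    // A tiny positive seed wins over zero, which is the wrong answer for a max of {0.0}.
    System.out.println(max(Double.MIN_VALUE, 0.0));         // 4.9E-324
    // An infinite seed can never beat a real input value.
    System.out.println(max(Double.NEGATIVE_INFINITY, 0.0)); // 0.0
    System.out.println(min(Double.POSITIVE_INFINITY, 0.0)); // 0.0
  }
}
```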

  Changed order of JobConf initialization. c.f.FlowStep now is added to the JobConf last in order to catch
  all lazily configured values.

  Changed compile to include debug info by default.

  Fixed bug in c.t.MultiTap where super scheme was not returned if available.

0.5.0

  Added skipIfSinkExists property to c.f.Flow. Set to true if the c.c.Cascade should skip the Flow instance even
  if the sink is stale and not set to be deleted on initialization.

  Fixed bug in c.t.h.HttpFileSystem that URL escaped the ? prefixing the query string.

  Fixed bug where a join with duplicate taps was not recognized during job planning. Now an appropriate error
  message is displayed, instead of jobs completing with only one instance of the resource stream.

  Fixed c.t.h.HttpFileSystem to remember authority information in the url and prefix it when missing.

  Changed c.s.TextLine to accept either one or two source fields. If one, only the 'line' value
  is sourced from the value, discarding the 'offset' value.

  Added c.o.r.RegexSplitGenerator to support splitting single tuple values into multiple tuples based on a regex
  delimiter. Includes new tests.

  Added c.s.CascadeStats and c.s.FlowStats to provide access to current state and statistics of particular
  Cascade, Flow, or the child Flows of a Cascade.

  Added ability to sort grouping values with sort argument on c.p.GroupBy. Sorts can be reversed.

  Added c.o.e.ExpressionFilter, the c.o.Filter analog to c.o.e.ExpressionFunction.

0.4.1

  Fixed path normalization regex in c.u.Util where it munged any path starting with file:///.

0.4.0

  Changed c.p.GroupBy default grouping fields to c.t.Fields.ALL from Fields.FIRST. This change provides a simple
  way to sort a tuple stream based on the order of the tuple fields.

  Changed c.f.FlowConnector to create c.f.Flow instances that will bypass the reducer if no c.p.Group is participating
  in the assembly. Previously Group instances were inserted if missing. This allows a chain of c.p.Every instances
  to be used to process/filter a tuple stream without invoking the reducer needlessly (if a sort isn't required).
  This change also supports bypassing the default Hadoop OutputCollector in the mapper via the sink c.t.Tap instance.

  Changed c.f.FlowStep behavior to run in 'local' mode if either the sink or source tap is a c.t.Lfs instance. This
  allows for c.f.Flow instances to run mixed if configured to execute on a particular cluster by default. This behavior
  supports complex import/export processes against the HDFS or other supported remote filesystem.

  Changed behavior of c.t.Dfs to force use of HDFS. Previously Dfs would default to the local FileSystem
  if the job was run in 'local' mode. Now a Dfs instance will cause failures if it cannot connect to an HDFS cluster.
  Using c.t.Hfs will provide previous Dfs behavior. Hfs will use the 'default' filesystem if a scheme is not present
  in the 'stringPath' (e.g. hdfs://host:port/some/path).
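
  The scheme check described above can be sketched with java.net.URI. This is an assumption about the general
  approach, not the c.t.Hfs source; the class, the method, and the 'defaultScheme' parameter are hypothetical, and
  paths containing characters that are illegal in a URI (such as spaces) would need escaping first.

```java
import java.net.URI;

// Hypothetical stand-alone sketch of falling back to a default filesystem scheme.
final class SchemeSketch {
    static String resolveScheme(String stringPath, String defaultScheme) {
        // getScheme() returns null when the path carries no "scheme:" prefix.
        String scheme = URI.create(stringPath).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        System.out.println(resolveScheme("hdfs://host:8020/some/path", "file")); // prints hdfs
        System.out.println(resolveScheme("/some/path", "file"));                 // prints file
    }
}
```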

  Added c.stats package to allow for collecting statistics of Cascades, Flows, and FlowSteps.

  Updated c.f.Flow and c.c.Cascade log messages to be easier to follow when executing many flow instances
  simultaneously.

  Added compression flag to c.s.TextLine. Can now toggle compression (Hadoop style compression) per Tap instance.
  This prevents clusters with compression enabled by default from exporting text files with a .deflate extension.

  Added support for bypassing Hadoop OutputCollector via Tap.setUseTapCollector() method. Setting to true will force
  Cascading to use the c.t.TapCollector instead. This bypasses bugs in Hadoop with custom FileSystem types. This will
  always be true for http(s) and s3tp filesystems when using a c.t.Hfs Tap type (at least until HADOOP-3021 is resolved).

  Added c.t.TupleCollector, complementing c.t.TupleIterator, for directly writing Tuple instances out via a c.t.Tap
  instance.

  Added c.f.FlowListener so that c.f.Flow instances can fire events on starting, completed, and throwable.

  Changed c.t.h.S3HttpFileSystem so it can now create files remotely.

  Renamed cascading.spill.capacity to cascading.cogroup.spill.capacity, so there is less chance of collision.

  Made numerous optimizations to improve overall performance. Namely split and merge of key/value tuples to remove
  redundancy in the stream between the mapper and reducer.

  Changed c.p.Operators to push c.o.Operation results directly through to next operation without intermediate
  collection. This should improve pipelining of large result streams and lower runtime memory footprint.

  Changed c.c.Cascade so it now runs Flows in parallel if Hadoop is clustered, and there are no dependencies between the
  Flows.

  Moved c.Cascade and related classes to c.cascade package. Wanted to preempt any future ugliness.

  Added support in c.t.h.S3HttpFileSystem for these properties: fs.s3tp.awsAccessKeyId and fs.s3tp.awsSecretAccessKey.

0.3.0

  Added ability to push Log4j logger properties to mapper/reducer via JobConf.
  Use jobConf.set("log4j.logger","logger1=LEVEL,logger2=LEVEL")
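
  The value is a comma separated list of logger=LEVEL pairs. The sketch below only illustrates that format by
  parsing it into a map; how Cascading itself reads the property is not described here. The class name and the
  logger names in the example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of the "logger1=LEVEL,logger2=LEVEL" format.
final class LoggerPropertySketch {
    static Map<String, String> parse(String property) {
        Map<String, String> levels = new LinkedHashMap<>();
        for (String pair : property.split(",")) {
            // Split on the first '=' only; skip malformed or empty entries.
            String[] parts = pair.trim().split("=", 2);
            if (parts.length == 2 && !parts[0].trim().isEmpty()) {
                levels.put(parts[0].trim(), parts[1].trim());
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        // Logger names are made up for illustration.
        System.out.println(parse("cascading.flow=DEBUG,org.apache.hadoop=WARN"));
        // prints {cascading.flow=DEBUG, org.apache.hadoop=WARN}
    }
}
```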

  Added missing equals() and hashCode() in c.t.MultiTap.

  Added c.t.h.ZipInputFormat (and ZipSplit) to support zip files. c.s.TextLine supports transparent
  reading of zip files if the filename ends with .zip, but cannot write to them. This code is
  loosely based on HADOOP-1824. If the underlying filesystem is hdfs or file, splits will be created
  for each ZipEntry. Otherwise ZipEntries are iterated over to be more stream friendly. Progress status is
  supported.

  Added http, https, and s3tp read-only file systems to Hadoop. Use these URLs, respectively:
  http://, https://, and s3tp://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@bucket-name/key
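
  To show how the parts of an s3tp URL line up, the following sketch pulls them apart with java.net.URI. It is not
  the Cascading filesystem code; the class name and the credential values are placeholders. Real AWS secret keys
  may contain characters such as '/' or '+', which would presumably need percent-encoding before being embedded in
  a URL this way.

```java
import java.net.URI;

// Hypothetical stand-alone sketch; credentials shown are placeholders, not real keys.
final class S3tpUrlSketch {
    // Returns { accessKeyId, secretAccessKey, bucketName, key }.
    static String[] parse(String url) {
        URI uri = URI.create(url);
        // User info has the form ACCESS_KEY_ID:SECRET_ACCESS_KEY.
        String[] credentials = uri.getUserInfo().split(":", 2);
        String key = uri.getPath().substring(1); // drop the leading '/'
        return new String[] { credentials[0], credentials[1], uri.getHost(), key };
    }

    public static void main(String[] args) {
        String[] parts = parse("s3tp://EXAMPLEKEY:EXAMPLESECRET@bucket-name/key");
        System.out.println(parts[2]); // prints bucket-name
        System.out.println(parts[3]); // prints key
    }
}
```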

  Added c.o.t.DateFormatter supporting text formatting of time stamps created by c.o.t.DateParser.

  Fixed bug where in complex assemblies, some Scopes were not resolved.

  Fixed bug where tap instances were not being inserted before some CoGroup joins if there was a previous Group in the
  assembly.

  Upgraded JGraphT to 0.7.3.

  Changed c.t.SpillableTupleList to allow for iteration across entries.

  Changed c.f.FlowException to optionally allow for printing of underlying pipe graph for debugging.

  Added c.o.t.FieldFormatter function to format Tuples into complex strings using j.u.Formatter formatting.
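
  For readers unfamiliar with j.u.Formatter, this is the kind of formatting it performs on a list of values. The
  sketch uses only the JDK; the class name and sample values are hypothetical, and it does not show the
  c.o.t.FieldFormatter constructor or its arguments.

```java
import java.util.Formatter;
import java.util.Locale;

// Hypothetical stand-alone sketch of j.u.Formatter applied to a row of values.
final class FieldFormatSketch {
    static String format(String pattern, Object... values) {
        StringBuilder out = new StringBuilder();
        // Locale.ROOT keeps number formatting independent of the host locale.
        try (Formatter formatter = new Formatter(out, Locale.ROOT)) {
            formatter.format(pattern, values);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("%s scored %d", "alice", 42)); // prints alice scored 42
    }
}
```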

  Added c.o.a.Last aggregator to find the last value encountered in a group.

  Changed c.o.a.Max and c.o.a.Min to maintain original value type. Will return null if no values are encountered.

  Changed c.o.a.First to use Fields.ARG by default. Removed Fields constructor.

  Added c.t.Fields.join(Fields...) method to allow for joining multiple Fields instances into a new instance.

  Can retrieve Tuple values by field name through the TupleEntry class via the get(String) method.

  Added c.t.TupleCollector interface to simplify the operation interfaces.

  Added a Debug filter that will print to either stderr or stdout. Useful for debugging stream transformations.

  Added CascadingTestCase base test class.

  Added Insert Function that allows for literal values to be inserted into the Tuple stream.

0.2.0

  CoGroup will now spill to disk on extremely large co-groupings. Configurable via "cascading.spill.capacity".
  Defaults to 10k elements.
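
  The general spill technique can be sketched as a buffer that writes its contents to a temporary file once a
  configured element count is reached. This is an assumption about the approach, not the CoGroup implementation:
  the class, the threshold comparison, and the line-per-element file format are hypothetical. Only the property
  name and the 10k default come from the entry above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-alone sketch; elements must not contain line breaks.
final class SpillSketch {
    static final String CAPACITY_KEY = "cascading.spill.capacity";
    static final int DEFAULT_CAPACITY = 10_000;

    private final int capacity;
    private final List<String> inMemory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    SpillSketch(Properties properties) {
        String value = properties.getProperty(CAPACITY_KEY, String.valueOf(DEFAULT_CAPACITY));
        this.capacity = Integer.parseInt(value);
    }

    void add(String element) {
        inMemory.add(element);
        if (inMemory.size() >= capacity) {
            spill();
        }
    }

    // Writes the in-memory elements to a temp file, one per line, then clears them.
    private void spill() {
        try {
            Path file = Files.createTempFile("spill", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, inMemory);
            spillFiles.add(file);
            inMemory.clear();
        } catch (IOException exception) {
            throw new UncheckedIOException(exception);
        }
    }

    // Reads spilled elements back in insertion order, followed by those still in memory.
    List<String> readAll() {
        try {
            List<String> all = new ArrayList<>();
            for (Path file : spillFiles) {
                all.addAll(Files.readAllLines(file));
            }
            all.addAll(inMemory);
            return all;
        } catch (IOException exception) {
            throw new UncheckedIOException(exception);
        }
    }

    int spillCount() {
        return spillFiles.size();
    }
}
```

  For example, with cascading.spill.capacity set to 2 in the Properties instance, adding three elements would
  produce one spill file and leave one element in memory.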

  java.util.Properties instances can be used to set defaults for FlowConnectors.

  Fix for InnerJoin, the default join for CoGroup.

  Introduced MultiTap to support concatenation of files into a pipe assembly.

  RegexParser now fails on a failed match. Prevents it being used or behaving as a filter.

  Fixed bug with PipeAssembly instances not properly being assimilated into the pipeGraph.

  Fixed assertion error thrown by JGraphT.

  Renamed Tap method deleteOnInit to deleteOnSinkInit.


0.1.0

  First release.