Skip to content

Latest commit

 

History

History
52 lines (41 loc) · 2.02 KB

README.md

File metadata and controls

52 lines (41 loc) · 2.02 KB

regex-repo

A regular expression repository for natural history collection data.

The goal of this project is to create a repository of regular expressions that are focused on parsing strings commonly found in records of natural history collections. There are many regex repositories avaiable on the web but this repository will not only focus on the most relevant expressions, but will also include documentation and tests to verify each regular expression.

This project is in a very early stage and will rely on the contributions of regexperts (hello iDigBio Hackathoners).

Here is a sample of how the contents of the repository might be displayed. This document is generated in the build process (see below).

Requirements for building and testing

Python - http://www.python.org
PyYAML - http://pyyaml.org

RegEx Documents

Each regular expression has a YAML-formatted document in the ./regex directory. Below is a sample of the document contents.

authors:
- author1
- author2
created: date
description: DescriptionHere
keywords:
- key1
- key2
notes: NotesHere
regex: RegExStringHere
sampleMatches:
- sample1
- sample2
sampleNonMatches:
- sample1
- sample2
testResults:
- result1
- result2
- result3
testString: Sample string.
title: TitleHere
updated: date
uuid: uuid
version: VersionHere

Build

The build.py script reads all the regex documents in the ./regex directory and writes a summary of each into ./docs/repo-contents.md. Eventually other documents will be created to list regexes by keyword, etc.

Test

The test.py script reads all the regex documents in the ./regex directory and compares test results with actual results.

Make Template

The make-template.py script will create a YAML-formatted document in the ./regex directory with boilerplate keys and values as well as a datestamp. The script generates a UUID for naming the file.