This dataset is part of the larger Legalcomplex dataset, which provides detailed profiles of companies operating in and around the legal process industry. The data was originally posted on Legalpioneer.org and has been made available under the Open Database License (ODbL).
The dataset contains a variety of information about 13,000+ companies, including but not limited to:
- Company name
- Industry sector
- Scope of operations
- Size in terms of personnel
- Financial data where available
This dataset is intended for students, researchers, entrepreneurs, legal professionals, and industry analysts who are interested in the legal process industry's landscape.
For an understanding of each field in the CSV file, please refer to the data_dictionary.md file included in this repository. The data dictionary describes every column in the dataset, its data type, and what each entry represents. Note that 'Legal Tech' is in the eye of the beholder; check the FAQ on Legalpioneer.org for some background.
To use this dataset, you should:
- Ensure that you have the necessary software to open and manipulate CSV files (e.g., Excel, R, Python with pandas).
- Review the data dictionary to understand the dataset's structure.
- Import the dataset into your preferred data analysis tool or AI framework.
- Perform your analysis or extract insights as required.
- Share your findings with the world.
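As a minimal sketch of the import step, the following uses only Python's standard `csv` module (pandas' `read_csv` works just as well) to read the CSV into a list of dicts and count blank cells per column, which helps gauge how often fields such as financial data are actually available. The filename in the usage comment is hypothetical, and the column names in any example are placeholders; the real names are listed in data_dictionary.md.

```python
import csv
from typing import IO, Dict, List


def load_profiles(source: IO[str]) -> List[Dict[str, str]]:
    """Read company profiles from an open CSV file.

    Each row becomes a dict keyed by the CSV header, so keys match
    the column names documented in data_dictionary.md.
    """
    return [dict(row) for row in csv.DictReader(source)]


def count_missing(profiles: List[Dict[str, str]]) -> Dict[str, int]:
    """Count empty (or absent) cells per column across all profiles."""
    missing: Dict[str, int] = {}
    for row in profiles:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header are collected under None; skip them.
                continue
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return missing


# Usage (hypothetical filename; use the CSV shipped in this repository):
# with open("companies.csv", newline="", encoding="utf-8") as f:
#     profiles = load_profiles(f)
# print(len(profiles), count_missing(profiles))
```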
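When using the dataset to estimate market size and growth, keep the following in mind:
- Market Size Calculation: Allocate a single category to each profile so that no company is counted in more than one market segment.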
- Market Size Definition: Market size should be assessed by the total number of companies and their investment amounts.
- Growth Measurement: Combine employee counts with company age and financial data. For example, a young company with a growing employee count and $0 in funding represents genuine growth.
- Health Indicators: Market size and growth can indicate which market areas are thriving.
- Context is Everything: Compare recent announcements with past metrics to avoid recency bias, which can create artificial trends in data interpretation.
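The market-size and growth guidelines above can be sketched in Python as follows. This is an illustration under stated assumptions, not the maintainers' method: the column names `name`, `category`, `funding_usd`, `employees`, `employees_prev` (an earlier headcount, e.g. from a previous release) and `founded_year` are hypothetical, as are the rule that a ';'-separated category list is reduced to its first entry and the 5-year cut-off for a "young" company. Blank funding cells are treated as no disclosed funding.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def parse_number(value: Optional[str]) -> float:
    """Convert a CSV cell such as "1,000,000" to a float.

    Assumption: a blank cell means no disclosed figure and is treated as 0.
    """
    if value is None or value.strip() == "":
        return 0.0
    return float(value.replace(",", ""))


def primary_category(raw: str) -> str:
    """Allocate a single category to a profile.

    Hypothetical rule: if several categories are listed, separated by ';',
    the first one is taken as the primary category.
    """
    return raw.split(";")[0].strip()


def market_size(profiles: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, float]]:
    """Number of companies and total funding per primary category.

    Each profile lands in exactly one category, so no company is
    double-counted across market segments.
    """
    sizes: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"companies": 0, "funding_usd": 0.0}
    )
    for p in profiles:
        bucket = sizes[primary_category(p["category"])]
        bucket["companies"] += 1
        bucket["funding_usd"] += parse_number(p.get("funding_usd"))
    return dict(sizes)


def organic_growers(
    profiles: Iterable[Mapping[str, str]], current_year: int, max_age: int = 5
) -> List[str]:
    """Names of young companies with a growing headcount and $0 in funding."""
    names = []
    for p in profiles:
        age = current_year - int(p["founded_year"])
        growing = parse_number(p.get("employees")) > parse_number(p.get("employees_prev"))
        unfunded = parse_number(p.get("funding_usd")) == 0
        if age <= max_age and growing and unfunded:
            names.append(p["name"])
    return names
```

Counting companies and summing funding in the same pass keeps both market-size measures tied to the same single-category allocation, so the two figures stay comparable across segments.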
This dataset is provided under the Open Database License (ODbL). You are free to:
- Copy, modify, and merge the dataset.
- Redistribute the dataset.
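- Use the dataset for any purpose (it is provided "as is", without warranty).
However, you must provide attribution as specified by the ODbL, and if you publicly use an adapted version of the dataset, you must share that version under the ODbL as well.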
The original data was sourced from Legalpioneer.org. The dataset has been curated by Legalcomplex and made available under the ODbL.
If you wish to contribute to this dataset, please follow these guidelines:
- Fork the repository.
- Make your changes locally.
- Ensure that any new data can be released under the ODbL.
- Submit a pull request with a clear description of your changes.
If you have any questions about this dataset, its usage, or its licensing, please open an issue in the GitHub repository. Your feedback is valuable and helps improve the dataset.
- How often are updates planned for the dataset? Get in touch with Raymond Blyd at Legalcomplex for questions about updates, JSON or API access, or anything else.
- What jurisdiction does this dataset primarily cover, and are there plans to expand to other regions? We went to great lengths to cover all geographies; the dataset includes companies in over 1,000 cities worldwide.
- What types of companies were excluded? Companies that were already defunct or acquired at the time of discovery; SmartTech companies that use large language models to solve business and civic problems (e.g., OpenAI); and FinTech, RiskTech, and WealthTech companies (see spark_max_investors).
We dropped 35,000+ investors here on our GitHub.
The spark_max_investors.csv file was collected while tracking investments in legal. Every time we stored an investment, we also recorded who invested.
The file also includes per-investor metrics showing whether each investor has or has not invested in legal and... whether they will be interested in investing early 😇.
So you can now search for investors based on the fear of missing out (FOMO) on legal 😈.
If you want to avoid investors who back your competitors, let's chat.
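A minimal sketch of such an investor search in Python, assuming the file exposes one row per investor with the hypothetical columns `investor`, `legal_investments` (number of legal deals) and `total_investments`; check the actual headers of spark_max_investors.csv before adapting it. It keeps investors with no legal deals yet and lists the most active ones first.

```python
import csv
from typing import IO, List, Tuple


def fomo_candidates(source: IO[str], min_total: int = 1) -> List[str]:
    """Investors with no legal investments yet, most active first.

    Column names 'investor', 'legal_investments' and 'total_investments'
    are hypothetical placeholders for the real CSV headers.
    """
    candidates: List[Tuple[int, str]] = []
    for row in csv.DictReader(source):
        legal = int(row["legal_investments"] or 0)  # blank treated as 0
        total = int(row["total_investments"] or 0)
        if legal == 0 and total >= min_total:
            candidates.append((total, row["investor"]))
    # Sort by activity (descending), then by name for a stable order.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in candidates]


# Usage:
# with open("spark_max_investors.csv", newline="", encoding="utf-8") as f:
#     print(fomo_candidates(f)[:20])
```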
Legalpioneers: We Power World Peace