In this article we introduce an IoT startup that, in a short time, attracted several rounds of investment and a large clientele and established itself on the international market.
We will look at the company "CRATE Technology" (crate.io), and more precisely at its product CrateDB, which is actively used in the IoT sphere. From this article you can learn about this and much more, and find out how suitable this solution is for your project or company.
CrateDB is a distributed SQL (Structured Query Language) database management system that integrates a fully searchable document-oriented data store. It is open source, written in Java and based on a shared-nothing architecture designed for high scalability, and it includes components from Facebook Presto, Apache Lucene, Elasticsearch and Netty.
The CrateDB project was started by Jodok Batlogg, an open source author and creator who was involved with OSIV (Open Source Initiative Vorarlberg) and with Lovely Systems in Dornbirn. The software is an open source cluster database used for fast text search and analytics. The company, now called Crate.io, raised its first round of funding in April 2014, a $4 million round in March 2016 and $2.5 million in January 2017 from Dawn Capital, Draper Esprit, Speedinvest and Sunstone Capital.
In June 2014, Crate.io won a Jury's Choice nomination at the GigaOm Structure Launchpad competition, and in October 2014 they won TechCrunch Disrupt Europe in London.
CrateDB 1.0 was released in December 2016 and reportedly had over a million downloads. CrateDB 2.0 and the Enterprise Edition were released in May 2017.
CrateDB's query language is SQL, but it stores data as documents in a NoSQL-style, document-oriented way. The software uses Facebook Presto's SQL parser, its own query analysis and a distributed query engine. Elasticsearch and Lucene provide the transport protocol and cluster management, and Netty serves as the framework for asynchronous network communication.
CrateDB offers automatic data replication and self-healing clusters for high availability.
CrateDB includes a built-in administration interface, and its command line interface, the Crate Shell (CraSh), allows interactive queries. Among its client libraries, the Python client is the most advanced and includes SQLAlchemy integration.
In June 2016, Kyle Kingsbury tested the concurrency and consistency of CrateDB 0.54 and identified several fault-tolerance issues stemming from its Elasticsearch dependencies. He advised against using Crate as the primary data store where every record really matters; instead, he suggested keeping the records in a separate database and using Crate for fast queries. He presented his results again in April 2017 in a keynote at the Scala Days conference.
Protecting IT systems from cybersecurity threats is one of CrateDB's most popular applications. In this role CrateDB acts as a cybersecurity database that combines a real-time SQL engine with a NoSQL foundation. This gives it the scalability, performance and analytical flexibility of a NoSQL DBMS without sacrificing the ease of use and integration of SQL.
CrateDB allows SQL developers to process logs and network traffic in real time and at high volume, supporting a wide range of cybersecurity use cases. Here are a few of the cybersecurity systems powered by CrateDB:
• Skyhigh Networks - Cloud Services Access Security Broker (CASB)
• StackRox - adaptive threat protection for containers
• Kryptos Logic - threat intelligence and protection
Solutions like these rely on the following CrateDB features for cybersecurity:
• Handling large numbers of data points per second
Elastic scaling allows CrateDB to ingest data at high speed on clusters of low-cost commodity servers
• Real-time query speeds
Columnar indexes and field caches provide in-memory SQL performance on network or log data streams
• Text search, IP fields, AI, time series
Dynamic schemas and query optimizations handle a wide range of data and cybersecurity analytics
• Continuous availability
Built-in data replication, data distribution and cluster balancing provide nonstop threat detection and protection
• ANSI SQL
Easy for any developer to use, and integrates with standard data sources and visualization tools
CrateDB is a distributed SQL database built on a NoSQL foundation (for storage, indexing and networking), and it is the best fit when you need:
• Handling large amounts of data - millions of inserts per second
• Query versatility - real-time, time series, geospatial, text search, AI
• Data versatility - dynamic schemas for structured or unstructured data
• SQL - for ease of use and integration without vendor lock-in
• Easy scalability - easy to grow a database to handle more data or users
Expanding a database should be simple, and with CrateDB it is. Automatic data rebalancing and a shared-nothing architecture make it easy to scale: just add new machines to create and grow your CrateDB cluster. You do not need to know how to redistribute data across the cluster, because CrateDB does it for you.
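To make "CrateDB does it for you" more concrete, here is a toy Python sketch of what shard rebalancing means. It is not CrateDB's actual allocation algorithm, which takes more factors into account; the `rebalance` function and the node names are hypothetical and only illustrate the idea of moving shards from the busiest node to the least busy one until the load is even.

```python
from collections import Counter


def rebalance(assignment: dict[int, str], nodes: list[str]) -> dict[int, str]:
    """Return a shard -> node mapping in which per-node shard counts differ by at most one.

    Toy greedy strategy (not CrateDB's allocator): move one shard at a time from
    the most-loaded node to the least-loaded node. Assumes every node that
    appears in `assignment` is also listed in `nodes`.
    """
    result = dict(assignment)
    while True:
        # Count shards per node, including newly added nodes that hold none yet.
        load = Counter({node: 0 for node in nodes})
        load.update(result.values())
        busiest = max(nodes, key=lambda n: load[n])
        idlest = min(nodes, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return result
        # Move the lowest-numbered shard from the busiest node to the idlest one.
        shard = min(s for s, n in result.items() if n == busiest)
        result[shard] = idlest
```

For example, if six shards all live on `node-a` and two new nodes `node-b` and `node-c` join, `rebalance(shards, ["node-a", "node-b", "node-c"])` ends with two shards on each node.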
CrateDB's distributed SQL query engine includes columnar field caches and an advanced query scheduler. This lets CrateDB perform aggregations, JOINs, subselects and ad hoc queries at in-memory speed. CrateDB also integrates built-in full-text search functions that let you store and query structured and unstructured data together, so you no longer need separate SQL and search databases to manage tabular and non-tabular data.
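The benefit of a columnar layout can be shown in a few lines of Python. This is a simplified model, not CrateDB's internals: `to_columns` and `avg_by` are hypothetical helpers that keep each field in its own list, so a grouped average (roughly `SELECT sensor, avg(temp) FROM readings GROUP BY sensor`) only has to read two columns instead of whole rows.

```python
from collections import defaultdict


def to_columns(rows: list[dict], columns: list[str]) -> dict[str, list]:
    """Turn row-oriented records into one list per column (a toy columnar layout)."""
    return {col: [row[col] for row in rows] for col in columns}


def avg_by(cols: dict[str, list], key_col: str, value_col: str) -> dict:
    """GROUP BY key_col with AVG(value_col), reading only those two columns."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for key, value in zip(cols[key_col], cols[value_col]):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Other columns (such as a free-text `note` field) are never touched by the aggregation, which is what makes column-oriented caches fast for analytics.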
Even if things go wrong in the data center, CrateDB keeps working. Automatic data replication across the cluster and rolling software updates let it ride out hardware failures and scheduled maintenance without interrupting data access. In addition, CrateDB clusters are self-healing: when nodes join or rejoin the cluster, CrateDB automatically allocates data to them.
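Self-healing can be pictured as two steps after a node drops out: promote surviving replicas to primaries, then re-create the missing replicas on other live nodes. The sketch below models exactly that with one replica per shard; the `Shard` class and `handle_node_failure` function are hypothetical and ignore real-world details such as replica recovery or allocation settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    primary: str
    replica: Optional[str]


def copies_on(node: str, shards: dict[int, Shard]) -> int:
    """Number of shard copies (primary or replica) currently held by `node`."""
    return sum(node in (s.primary, s.replica) for s in shards.values())


def handle_node_failure(shards: dict[int, Shard], failed: str, live_nodes: list[str]) -> None:
    """Toy self-healing step after `failed` leaves the cluster; mutates `shards`."""
    # Step 1: promote replicas whose primary was on the failed node.
    for shard in shards.values():
        if shard.primary == failed:
            if shard.replica is None:
                raise RuntimeError("no surviving copy of this shard")
            shard.primary, shard.replica = shard.replica, None
        elif shard.replica == failed:
            shard.replica = None
    # Step 2: re-create missing replicas on the least-loaded other live node.
    for shard in shards.values():
        if shard.replica is None:
            candidates = [n for n in live_nodes if n != shard.primary]
            if candidates:
                shard.replica = min(candidates, key=lambda n: copies_on(n, shards))
```

Data stays readable throughout because every shard still has a primary after step 1; step 2 only restores the redundancy.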
Analytical databases often load data in batches and incur transactional locks and other overhead. CrateDB, in contrast, avoids locking overhead to deliver high bulk-write performance (e.g., 40,000+ inserts per second per node on commodity hardware). It can also return query results within milliseconds, even while data is being written.
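One way to get high write throughput is to send many rows in a single bulk request. CrateDB's HTTP endpoint (`/_sql`, on port 4200 by default) accepts a statement with `?` placeholders together with a `bulk_args` list containing one parameter list per row. The sketch below builds that JSON body; the table and column names are hypothetical, and `post_sql` assumes a reachable CrateDB node and is not called here.

```python
import json
from urllib.request import Request, urlopen


def bulk_insert_body(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Build the JSON body of one bulk INSERT request for CrateDB's /_sql endpoint."""
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    # bulk_args holds one parameter list per row; the statement itself is sent once.
    return json.dumps({"stmt": stmt, "bulk_args": [list(row) for row in rows]})


def post_sql(body: str, url: str = "http://localhost:4200/_sql") -> dict:
    """Send a request body to a CrateDB node (assumed to be reachable at `url`)."""
    req = Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

Batching a few thousand rows per request is a common starting point; the best batch size depends on row width and cluster size, so it is worth measuring.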
CrateDB supports both relational data and nested JSON documents, and every nested JSON attribute can be used in any SQL statement. CrateDB also provides BLOB storage for files such as images, videos or other large unstructured files, giving you a fully distributed BLOB storage cluster.
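CrateDB's BLOB storage addresses each file by the SHA-1 digest of its content: after creating a blob table (for example with `CREATE BLOB TABLE myblobs`), you upload a file with an HTTP `PUT` to `/_blobs/<table>/<digest>` and download it with a `GET` on the same path. This minimal sketch only computes that URL; the table name `myblobs` and the localhost base URL are assumptions.

```python
import hashlib


def blob_url(table: str, content: bytes, base: str = "http://localhost:4200") -> str:
    """Return the blob API URL under which CrateDB would store `content`.

    Blobs are keyed by the SHA-1 hex digest of their bytes; upload with HTTP PUT
    and download with HTTP GET on this URL (host and table name are assumptions).
    """
    digest = hashlib.sha1(content).hexdigest()
    return f"{base}/_blobs/{table}/{digest}"
```

Because the key is derived from the content, uploading identical bytes twice maps to the same URL.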
Time series data is important for identifying trends and anomalies. CrateDB makes time series analysis fast and easy with automatic table partitions, which act like virtual tables that can be queried, moved or deleted. Partitioning the data by time interval makes queries over time ranges very fast.
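A common pattern for time partitioning in CrateDB is a generated column such as `date_trunc('month', ts)` used in `PARTITIONED BY`. The Python below keeps a DDL string of that shape for reference (column names are hypothetical; check the types against your CrateDB version) and then simulates the effect: rows are grouped into monthly partitions, and a time-range query only needs to scan the partitions that overlap the range.

```python
from collections import defaultdict
from datetime import datetime

# Reference DDL of the kind described above (hypothetical column names).
DDL = """
CREATE TABLE readings (
    ts TIMESTAMP WITH TIME ZONE,
    value DOUBLE PRECISION,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (month)
"""


def month_key(ts: datetime) -> datetime:
    """Python equivalent of date_trunc('month', ts); timestamps are assumed UTC."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def partition(rows: list[tuple[datetime, float]]) -> dict[datetime, list]:
    """Group (timestamp, value) rows into one partition per month."""
    parts: dict[datetime, list] = defaultdict(list)
    for ts, value in rows:
        parts[month_key(ts)].append((ts, value))
    return parts


def partitions_to_scan(parts: dict, start: datetime, end: datetime) -> list[datetime]:
    """Partition pruning: only months overlapping [start, end) need to be read."""
    return sorted(m for m in parts if month_key(start) <= m < end)
```

A query for 10 February to 1 March therefore reads only the February partition, and dropping old data becomes a matter of deleting whole partitions.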
Location is important for many machine data analyses. For this reason, CrateDB can store and query geographic information using the geo_point and geo_shape types. You can control the precision and resolution of the geographic index to get faster query results, and you can run exact queries with scalar functions such as intersects, within and distance.
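CrateDB's `distance` function returns the distance between two `geo_point` values in meters. The sketch below approximates the same kind of radius query in plain Python with the haversine great-circle formula; it is an illustration, not necessarily the formula CrateDB uses internally, and the Earth radius constant is an assumption. Points are given as (longitude, latitude), the same order CrateDB uses for `geo_point` arrays.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumed for this sketch)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points: list[tuple[float, float]], center: tuple[float, float], radius_m: float):
    """Keep the (lon, lat) points that lie within `radius_m` meters of `center`."""
    return [p for p in points if haversine_m(*p, *center) <= radius_m]
```

One degree of latitude comes out at roughly 111 km, so a 150 km radius around (0, 0) keeps a point at latitude 1 but not one at latitude 2.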
Unlike many other SQL databases, CrateDB has fully flexible schemas. You can add columns at any time without degrading performance or causing downtime, which suits agile development and fast deployment.
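Adding a column in CrateDB is a single `ALTER TABLE ... ADD COLUMN` statement, and tables or objects with a dynamic column policy can also pick up new fields on insert. The hypothetical helper below makes the explicit route concrete: it compares an incoming document with the known columns and generates one `ALTER TABLE` statement per new key. The Python-to-SQL type mapping is an assumption for this sketch.

```python
# Assumed mapping from Python value types to CrateDB column types.
# bool must come before int because bool is a subclass of int in Python.
SQL_TYPES = [(bool, "BOOLEAN"), (int, "BIGINT"), (float, "DOUBLE PRECISION"), (str, "TEXT")]


def sql_type(value) -> str:
    """Pick a column type for a single JSON-like value."""
    for py_type, name in SQL_TYPES:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"unsupported value: {value!r}")


def new_column_statements(table: str, known: set[str], document: dict) -> list[str]:
    """Return ALTER TABLE statements for keys in `document` that are not columns yet."""
    return [
        f'ALTER TABLE {table} ADD COLUMN "{key}" {sql_type(document[key])}'
        for key in sorted(document.keys() - known)
    ]
```

Existing rows simply have `NULL` in the new columns, so the application can start writing the extra fields right away.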
CrateDB is eventually consistent but offers transactional semantics. It is consistent at the row level, so each row is either fully written or not written at all. Because it offers read-after-write consistency, a single record can be read synchronously and in real time immediately after it is written. Although CrateDB does not support ACID transactions with rollbacks, it offers optimistic concurrency control: internal versioning allows write conflicts to be detected and resolved.
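Optimistic concurrency control means reading a row together with its version and making the update conditional on that version being unchanged. In recent CrateDB versions this is done by adding the `_seq_no` and `_primary_term` system columns to the `WHERE` clause of an `UPDATE` (older releases used `_version`); if no row matches, another writer got there first. The in-memory `VersionedStore` below is a hypothetical model of that pattern, including a simple retry loop.

```python
from dataclasses import dataclass, field


class WriteConflict(Exception):
    """Raised when a row was changed by someone else since it was read."""


@dataclass
class VersionedStore:
    # key -> (version, value); a hypothetical stand-in for a CrateDB table
    rows: dict = field(default_factory=dict)

    def read(self, key):
        return self.rows[key]

    def write(self, key, value, expected_version=None):
        """Write `value`; if `expected_version` is given, fail unless it still matches."""
        current = self.rows.get(key, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise WriteConflict(f"{key}: expected version {expected_version}, found {current}")
        self.rows[key] = (current + 1, value)
        return current + 1


def update_with_retry(store, key, fn, attempts=3):
    """Read-modify-write loop that retries when a concurrent write is detected."""
    for _ in range(attempts):
        version, value = store.read(key)
        try:
            return store.write(key, fn(value), expected_version=version)
        except WriteConflict:
            continue  # another writer won; re-read the row and try again
    raise WriteConflict(f"{key}: gave up after {attempts} attempts")
```

No locks are held between the read and the write; conflicts are detected only at write time, which fits workloads where concurrent updates to the same row are rare.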
CrateDB can save incremental database snapshots for backup. A snapshot contains the state of the tables in the CrateDB cluster at the time it was created and can be restored to the cluster at any time.
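In CrateDB, snapshots are managed with the `CREATE REPOSITORY`, `CREATE SNAPSHOT` and `RESTORE SNAPSHOT` statements. They can be incremental because the Lucene segment files holding the table data do not change once written, so a new snapshot only needs to copy files that are not already in the repository. The toy `Repository` class below illustrates that idea with content hashes; it is a simplified model, not CrateDB's storage format.

```python
import hashlib


class Repository:
    """Toy snapshot repository that stores each distinct file only once."""

    def __init__(self):
        self.blobs = {}      # content hash -> file bytes
        self.snapshots = {}  # snapshot name -> {file name: content hash}

    def create_snapshot(self, name: str, files: dict[str, bytes]) -> int:
        """Record a snapshot of `files`; return how many files had to be copied."""
        manifest, copied = {}, 0
        for fname, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = data  # only new or changed files are copied
                copied += 1
            manifest[fname] = digest
        self.snapshots[name] = manifest
        return copied

    def restore(self, name: str) -> dict[str, bytes]:
        """Rebuild the full file set of a snapshot from the shared store."""
        return {fname: self.blobs[d] for fname, d in self.snapshots[name].items()}
```

The second snapshot in a series costs only the newly written segments, yet each snapshot can still be restored on its own.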
It is possible to run CrateDB anywhere, either in the data center or in the cloud.
Most CrateDB customers use it for operational analytics workloads: fast time-series queries, text search, and machine learning queries on streaming data and data at rest. Typical domains include Industrial IoT, corporate cybersecurity and systems monitoring across industries, smart cities and building infrastructure, fleet tracking and management, and marketing analytics.
CrateDB, the open-source SQL database from Crate.io, has a built-in search engine for storing and analyzing machine learning data in real time. The company was founded in 2013 to provide developers with an open-source SQL database for collecting, analyzing and managing machine learning and artificial intelligence data.
If, while reading this, you have any questions about using this technology in your project, we can help you with the implementation. Our developers have extensive experience with Azure and AWS DevOps, as well as with database development technologies.
If you are interested, book a consultation through our calendar or contact us via the email address given in the contact information.