As data grows many enterprises will need to think differently about how to architect their CRM for large data volumes. Salesforce Connect with an external database is a great way to integrate outside systems, or offload non-core or historical data. Salesforce Connect virtualizes data in external objects which solve the problems of storage or replicating large volumes of data via API.
With the big cloud vendors you can spin up all manner of databases easily and at relatively low cost, but what database should you choose for Salesforce Connect?
To answer this, let's look at the core capabilities of cloud database vendors and offer some recommendations that we use and support.
The marketing lingo has evolved a lot in the last five years. Cloud databases, cloud data warehouse, data lakes and the modern data stack all refer to the same trend but with nuances in architecture and capabilities. If you're a Salesforce expert, this blog will inform you on the leaders in the cloud data warehouse market.
Core capabilities of cloud databases
Overall, for Salesforce Connect you need to have speed, scalability and connectivity. While Salesforce Connect was originally used for OData integrations, connecting live to your legacy application database is a bad idea.
Connecting live to your legacy application database is a bad idea
Not least because of how this workload will disrupt its performance. The significant gains will come from centralizing enterprise data and creating better data processing steps.
Reduce storage costs
Salesforce storage costs are very high compared to today's public cloud options, however, the premium reflects the proximity of storage to the core CRM. Storing related or external data in the CRM is very convenient and therefore costly, whereas using blob storage is super cheap but requires extra services to process it.
The new era of cloud database vendors have wrapped up these extra services and called them modern cloud data warehouses. As a result of this architecture, cloud vendors like Snowflake sell storage at the same price as the public cloud vendors. Snowflake charge $23/month for 1TB of storage, the same as an AWS S3 bucket.
Moving large volumes of data between enterprise systems is challenging. A cloud data warehouse allows you to take advantage of ultra-low cost storage, and when paired with Salesforce Connect, makes access as convenient as on-platform data. It’s the best of both worlds.
ETL to ELT
The leading cloud data warehouses are architected with a complete separation of compute and storage so huge volumes of data can sit in storage without affecting compute operations. This has led to a new paradigm of data integration, ELT.
In summary, instead of limiting the amount of data extracted from source systems, new tools simply replicate and land raw data in the data warehouse. Tools like Fivetran, Stitch and Airbyte make incredibly simple and automated to replicate Salesforce data to a cloud data warehouse. Once this has happened, consolidation and modelling of data can be performed within the data warehouse itself using popular frameworks like dbt.
Fast query response
Salesforce Connect will call out to the external database and return the results to the external object. For something like a related list or report, this happens while the page loads. This means your database needs to be able to process the query in this time to ensure users are not waiting for a lagging component or report.
There are many databases available in the cloud, only a few vendors have designed new query engines from scratch for the cloud. For relational databases like Snowflake and BigQuery, the types of queries run are fairly easy to process and so perform well but large tables may benefit from some indexing work. On the other hand, Rockset is a new database that indexes all columns by default and is optimised for millisecond query response. Given the expected Salesforce page load times, you don’t need to optimise for absolute speed, but it must be fast enough for a good user experience.
The best overall performing external database also needs to be scalable for operational use.
Modern cloud data warehouses have largely solved the scalability problem. When using a data warehouse as external storage, you need a technology that can scale to workload created by your CRM users and processes that you run off it. With up to thousands of CRM users, scaling for concurrency is the most important factor.
Snowflake allows you to have your Salesforce Connect queries hit a dedicated compute resource. If you have many concurrent CRM users, Snowflake also has the option of auto-scaling horizontally. Say you had surges of queries, like a sales team doing their forecasts at the same time. Snowflake would automatically add compute resources up to the limits you set and then automatically switch them off as demand subsides. Snowflake charges you by the compute time and allows you to micro-manage that time.
On the other hand, BigQuery works in a more automated way. It will auto-scale vertically and horizontally as needed, but the data warehouse admins have less control over exactly how. BigQuery charge you per bytes processed in a query and allow you to set limits on what queries will run.
In either case, these platforms allow you to manage costs, share workloads and optimise performance for your Salesforce Connect use case.
True cloud service
Before the latest generation of cloud data warehouses, ‘cloud databases’ were simply on-premise databases hosted in a public cloud. This means they live inside a cloud VPC and don't have an HTTP API endpoint.
Platforms like Snowflake, BigQuery and Rockset are all ‘true cloud services’. All customers share the same service which mean they can be queried from a public endpoint. Apps like Omnata can query them directly from another cloud without middleware infrastructure. Middleware-free integration removes the complexity of a third party processing data and significantly reduces cost of tools, people and cyber-security overheads.
What have we left out?
Heroku Postgres and Salesforce have a native integration, called Heroku Connect, that uses OData with external objects. Heroku Postgres is great for developers and recommended when you have an app that bridges both Heroku and Salesforce. However, it is not as well-suited to being a central data repository. As a transactional database it requires a lot of close attention to indexing compared to a modern, columnar, cloud data platform. Heavy workloads, large datasets and many concurrent users could create challenges.
AWS Redshift became popular in the first iteration of cloud data warehouses. It is somewhat less known Redshift was licensed from Actian and is built on top of Postgres. For our purposes, Redshift isn't truly cloud-based as it runs inside your AWS VPC. While initially groundbreaking in speed and scalability, more recently Snowflake and BigQuery have upped the standard for large datasets and concurrent users.
Databricks is another heavy hitter in the modern data platform space that is built on Apache Spark. Originally optimised for data processing and machine learning, they have made moves towards more general SQL-like capabilities. However, Databricks is deployed inside a customer's cloud VPC and so its API endpoint can only be accessed from within the instance. For our purposes, Databricks don’t offer a ‘true cloud service’ that can be queried directly from another cloud, but we’re hopeful that this will be on the roadmap.
What do we recommend?
Omnata uses and supports Snowflake, BigQuery & Rockset.
Omnata's native Salesforce app lets you integrate enterprise data from the leading cloud data platforms. To do this, you need database technology that can scale to handle large datasets and large numbers of concurrent users. You also need to reduce costs of middleware and specialized integrators. This is achieved by removing the middleware layer by choosing a truly cloud platform that can be queried directly.
Overall, many organisations are already centralizing enterprise data on platforms like Snowflake and BigQuery and these are your best bet to use with Salesforce Connect. However, if you have a narrower scope of real-time use cases then Rockset is a compelling option.
Does your company already use one of these technologies?
These new technologies have made it much easier to bring on new teams, datasets and use cases. You should spark up a conversation with your data team today, or, talk to us about how you can get value from these platforms.