Snowflake Interview Questions ￼
- May 27, 2022
- Posted by: Laraonline2020
- Category: Interview Question and Answers
Snowflake Interview Questions
1. What is a Snowflake cloud data warehouse?
Snowflake Interview Questions : Snowflake is an analytic data warehouse implemented as a SaaS service. It is built on a new SQL database engine with a unique architecture built for the cloud. This cloud-based data warehouse solution was first available on AWS as software to load and analyze massive volumes of data. The most remarkable feature of Snowflake is its ability to spin up any number of virtual warehouses, which means the user can operate an unlimited number of independent workloads against the same data without any risk of contention.
2. Explain Snowflake architecture
Snowflake is built on an AWS cloud data warehouse and is truly a Saas offering. There is no software, hardware, ongoing maintenance, tuning, etc. needed to work with Snowflake.
Three main layers make the Snowflake architecture – database storage, query processing, and cloud services.
- Data storage – In Snowflake, the stored data is reorganized into its internal optimized, columnar, and optimized format.
- Query processing – Virtual warehouses process the queries in Snowflake.
- Cloud services – This layer coordinates and handles all activities across the Snowflake. It provides the best results for Authentication, Metadata management, Infrastructure management, Access control, and Query parsing.
Unique features of the Snowflake data warehouse are listed below:
- Database and Object Closing
- Support for XML
- External tables
- Hive meta store integration
- Supports geospatial data
- Security and data protection
- Data sharing
- Search optimization service
- Table streams on external tables and shared tables
- Result Caching
4. Describe Snowflake computing.
Snowflake cloud data warehouse platform provides instant, secure, and governed access to the entire data network and a core architecture to enable various types of data workloads, including a single platform for developing modern data applications. Snowflake brings together the power of data warehouses, the scalability of big data platforms, the elasticity of the cloud, and real-time data sharing at a fraction of the cost of traditional solutions.
5. What are the cloud platforms currently supported by Snowflake?
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Microsoft Azure (Azure)
6. What is the use of the Cloud Services layer in Snowflake?
The services layer acts as the brain of the Snowflake. In Snowflake, the Services layer authenticates user sessions, applies security functions, offers management, performs optimization, and organizes all the transactions.
7. Is Snowflake an ETL tool?
Snowflake supports both transformations during (ETL) or after loading (ELT). Snowflake integrates with a variety of data integration solutions, including Informatica, Talend, Tableau, Matillion, and others.
In data engineering, new tools and self-service pipelines are displacing traditional tasks such as manual ETL coding and data cleaning. With Snowflake’s simple ETL and ELT options, data engineers can spend more time focusing on essential data strategy and pipeline improvement initiatives. Furthermore, using the Snowflake Cloud Platform as your data lake and data warehouse, extract, convert, and load may be efficiently avoided, as no pre-transformations or pre-schemas are needed.
Following are the best ETL tools for Snowflake
- Hevo Data
Snowflake is built entirely on a SQL database. It’s a columnar-stored relational database that works well with Excel, Tableau, and many other tools. Snowflake contains its query tool, supports multi-statement transactions, role-based security, etc., which are expected in a SQL database.
10. What kind of SQL does Snowflake use?
Snowflake supports the most common standardized version of SQL, i.e., ANSI for powerful relational database querying.
11. How is data stored in Snowflake?
Snowflake stores the data in multiple micro partitions which are internally optimized and compressed. The data is stored in a columnar format in the cloud storage of Snowflake. The data objects stored by Snowflake cannot be accessed or visible to the users. By running SQL query operations on Snowflake, you can access them.
12. How many editions of Snowflake are available?
Snowflake offers four editions depending on your usage requirements.
- Standard edition – Its introductory level offering provides unlimited access to Snowflake’s standard features.
- Enterprise edition – Along with Standard edition features and services, offers additional features required for large-scale enterprises.
- Business-critical edition – Also, called Enterprise for Sensitive Data (ESD). It offers high-level data protection for sensitive data to organization needs.
- Virtual Private Snowflake (VPS) – Provides high-level security for organizations dealing with financial activities.
13. Explain Virtual warehouse
In Snowflake, a virtual warehouse, often known as a “warehouse,” is a collection of computational resources. A virtual warehouse provides the resources required for the users like CPU, memory, and temporary storage to perform multiple Snowflake operations:
- Execute the SQL SELECT statements that necessitate the use of computing resources (e.g. retrieving rows from tables and views).
- DML operations include:
- Updating table rows (DELETE , INSERT , UPDATE).
- Data Loading into tables (COPY INTO <table>).
- Data unloading from tables (COPY INTO <location>).
An OLTP (Online Transactional Processing) database contains detailed and up-to-date data, as well as a large volume of typically small data transactions. In turn, online analytical processing (OLAP) often necessitates complex and aggregated queries with a small number of transactions. Snowflake’s database schema is built around online analytical processing.
15. Explain Columnar database
The columnar database is opposite to the conventional databases. It saves the data in columns in place of rows, eases the method for analytical query processing and offers more incredible performance for databases. Columnar database eases analytics processes, and it is the future of business intelligence.
16. What is the use of a database storage layer?
Whenever we load the data into the Snowflake, it organizes the data into the compressed, columnar, and optimized format. Snowflake deals with storing the data that comprises data compression, organization, statistics, file size, and other properties associated with the data storage. All the data objects we store in the Snowflake are inaccessible and invisible. We can access the data objects by executing the SQL query operation through Snowflake.
17. What is the use of the Compute layer in Snowflake?
In Snowflake, Virtual warehouses perform all the data handling tasks. Which are multiple clusters of the compute resources. While performing a query, virtual warehouses extract the least data needed from the storage layer to satisfy the query requests.
18. What are the different ways to access the Snowflake Cloud data warehouse?
We can access the Snowflake data warehouse through:
- A web-based user interface from which all aspects of Snowflake management and usage can be accessed.
- Command-line clients (such as SnowSQL) that can access all parts of Snowflake management and use.
- Snowflake has ODBC and JDBC drivers, which allow other applications (like Tableau) to connect to it.
- Native connectors (e.g., Python, Spark) for developing programmes that connect to Snowflake.
- Third-party connectors can be used to link applications such as ETL tools (e.g., Informatica) and BI tools (e.g., ThoughtSpot) to Snowflake.
19. Why is Snowflake highly successful?
Snowflake is highly successful because of the following reasons:
- It assists a wide variety of technology areas like data integration, business intelligence, advanced analytics, security, and governance.
- It offers cloud infrastructure and supports advanced design architectures ideal for dynamic and quick usage developments.
- Snowflake supports predetermined features like data cloning, data sharing, division of computing and storage, and directly scalable computing.
- Snowflake eases data processing.
- Snowflake provides extendable computing power.
- Snowflake suits various applications like ODS with the staged data, data lakes with data warehouse, raw marts, and data marts with acceptable and modelled data.
20. How do we secure the data in the Snowflake?
Data security plays a prominent role in all enterprises. Snowflake adapts the best-in-class security standards for encrypting and securing the customer accounts and data that we store in the Snowflake. It provides the industry-leading key management features at no extra cost.
21. Tell me something about Snowflake AWS?
For managing today’s data analytics, companies rely on a data platform that offers rapid deployment, compelling performance, and on-demand scalability. Snowflake on the AWS platform serves as a SQL data warehouse, which makes modern data warehousing effective, manageable, and accessible to all data users. It enables the data-driven enterprise with secure data sharing, elasticity, and per-second pricing.
22. Can AWS glue connect to Snowflake?
Definitely. AWS glue presents a comprehensive managed environment that easily connects with Snowflake as a data warehouse service. These two solutions collectively enable you to handle data ingestion and transformation with more ease and flexibility.
23. What are Micro Partitions?
Snowflake comes along with a robust and unique kind of data partitioning known as micro partitioning. Data that exists in the Snowflake tables are systematically converted into micro partitions. Generally, we perform Micro partitioning on the Snowflake tables.
Both Redshift and Snowflake provide on-demand pricing but vary in package features. Snowflake splits compute storage from usage in its pricing pattern, whereas Redshift integrates both.
|Snowflake is a comprehensive SaaS solution that requires no maintenance.
|AWS Redshift clusters necessitate some manual maintenance.
|Snowflake separates computing and storage, allowing for customizable pricing and setup.
|Reserved/Spot instance price in Redshift provides for cost optimization.
|Snowflake uses real-time auto-scaling.
|Redshift, on the other hand, involves the addition and removal of nodes in order to scale.
|Snowflake provides less data customisation options.
|Where Redshift facilitates data flexibility with features such as partitioning and distribution.
|Snowflake provides always-on encryption with strict security checks.
|While Redshift offers a flexible, customised security strategy.
Snowpipe is Snowflake’s continuous data ingestion service. Snowpipe loads data in minutes once files are uploaded to a stage and submitted for ingestion. Snowflake maintains load capacity with Snowpipe’s serverless compute approach, assuring appropriate compute resources to meet demand. In a nutshell, Snowpipe provides a “pipeline” for loading new data in micro-batches as soon as it becomes available.
The data is loaded using the COPY command defined in a connected pipe. Snowpipe can use a pipe, which is a named, first-class Snowflake object containing a COPY statement. The COPY statement specifies the location of the data files (i.e., a stage) as well as the target table. All data types, including semi-structured data types like JSON and Avro, are supported.
There are several ways for detecting staged files:
- Using cloud messaging to automate Snowpipe
- REST endpoints in Snowpipe
The Snowpipe benefits are as follows:
- Real-time insights
In Snowflake, a schema is a logical grouping of database objects such as tables, views, etc. The snowflake schema is made up of fact tables that are centralised and linked to multiple dimensions. A Snowflake Schema is a dimension-added extension of a Star Schema. The dimension tables have been normalized, resulting in the data being split into additional tables.
The benefits of using Snowflake schemas are it provides structured data and uses small disk space.
An example of Snowflake Schema is shown below:
27. What is the difference between Star Schema and Snowflake Schema?
Both Snowflake and Star Schemas are identical, yet the difference exists in dimensions. In Snowflake, we normalize only a few dimensions, and in a star schema, we denormalise the logical dimensions into tables.
|The fact tables and dimension tables are both contained in the star schema.
|The fact tables, dimension tables, and sub dimension tables are all contained in the snowflake schema.
|The star schema is a top-down model.
|While it is a bottom-up model.
|The star schema takes up more space.
|While it takes up less space.
|Queries are executed in less time.
|Here query execution takes longer than with the star schema.
|Normalization is not employed in the star schema.
|Both normalisation and denormalization are employed in this.
|It has a very simple design.
|While its design is complex.
|Star schema has a low query complexity.
|Snowflake schema has a higher query complexity than star schema.
|It contains fewer foreign keys.
|It has a larger number of foreign keys.
|It has a high level of data redundancy.
|While it has a minimal level of data redundancy.
Snowflake Time Travel tool allows us to access the past data at any moment in the specified period. Through this, we can see the data that we can change or delete. Through this tool, we can carry out the following tasks:
- Restore the data-associated objects that may have lost unintentionally.
- For examining the data utilization and changes done to the data in a specific time period.
- Duplicating and backing up the data from the essential points in history.
29. Differentiate Fail-Safe and Time-Travel in Snowflake
|According to the Snowflake edition, account or object particular time travel setup, users can retrieve and set the data reverting to the history.
|Fail-Safe, the User does not have control over the recovery of data valuable merely after completing the period. In this context, only Snowflake assistance can help for 7 days. Therefore if you set time travel as six days, we retrieve the database objects after executing the transaction + 6 days duration.
30. What is zero-copy Cloning in Snowflake?
Zero copy cloning is a snowflake implementation in which a simple keyword CLONE allows you to generate a clone of your tables, schemas, and databases without replicating the actual data. As a result, you can have practically real-time data from production cloned into your dev and stage environments to conduct various activities.
- There are no additional storage costs associated with data replication.
- There is no waiting time for copying data from production to non-production contexts.
- There is no need for administrative efforts since cloning is as simple as a click of a button.
- No copy, only clone: Data exists only in one place.
- Promote corrected/fixed data to production instantly.
31. What is Data Retention Period in Snowflake?
The data retention period is an important aspect of Snowflake Time Travel.
When data in a table is modified, such as deletion or discarding an object holding data, Snowflake saves the data’s previous state. The data retention period determines the number of days that this historical data is kept and, as a result, Time Travel operations (SELECT, CREATE… CLONE, UNDROP) can be performed on it.
The standard retention period is one day (24 hours) and is enabled by default for all Snowflake accounts.
32. What is SnowSQL used for?
SnowSQL is the command-line client used to connect to Snowflake and conduct SQL queries as well as complete all DDL and DML actions such as loading and unloading data from database tables.
SnowSQL (snowsql executable) can be operated as an interactive shell or in batch mode via stdin or with the -f option.
33. What is the use of Snowflake Connectors?
The Snowflake connector is a piece of software that allows us to connect to the Snowflake data warehouse platform and conduct activities such as Read/Write, Metadata import, and Bulk data loading.
The Snowflake connector can be used to execute the following tasks:
- Read data from or publish data to tables in the Snowflake data warehouse.
- Load data in bulk into a Snowflake data warehouse table.
- You can insert or bulk load data into numerous tables at the same time by using the Numerous input connections functionality.
- To lookup records from a table in the Snowflake data warehouse.
Following are the types of Snowflake Connectors:
- Snowflake Connector for Kafka
- Snowflake Connector for Spark
- Snowflake Connector for Python
34. What are Snowflake views?
Views are useful for displaying certain rows and columns in one or more tables. A view makes it possible to obtain the result of a query as if it were a table. The CREATE VIEW statement defines the query. Snowflake supports two different types of views:
- Non-materialized views (often referred to as “views”) – The results of a non-materialized view are obtained by executing the query at the moment the view is referenced in a query. When compared to materialised views, performance is slower.
- Materialized views – Although named as a type of view, a materialised view behaves more like a table in many aspects. The results of a materialised view are saved in a similar way to that of a table. This allows for faster access, but it necessitates storage space and active maintenance, both of which incur extra expenses.
35. Describe Snowflake Clustering
In Snowflake, data partitioning is called clustering, which specifies cluster keys on the table. The method by which you manage clustered data in a table is called re-clustering.
A clustering key is a subset of columns in a table (or expressions on a database) that are deliberately intended to co-locate the table’s data in the same micro-partitions. This is beneficial for very large tables where the ordering was not perfect (at the time the data was inserted/loaded) or if extensive DML has weakened the table’s natural clustering.
Some general indicators that can help determine whether a clustering key should be defined for a table are as follows:
- Table queries are running slower than expected or have degraded noticeably over time.
- The table’s clustering depth is large.
36. Explain Data Shares
Snowflake Data sharing allows organizations to securely and immediately share their data. Secure data sharing enables sharing of the data between the accounts through Snowflake secure views, database tables.
37. Does Snowflake use Indexes?
No, Snowflake does not use indexes. This is one of the aspects that set the Snowflake scale so good for the queries.
38. Where do we store data in Snowflake?
Snowflake systematically creates metadata for the files in the external or internal stages. We store metadata in the virtual columns, and we can query through the standard “SELECT” statement.
39. What is “Stage” in the Snowflake?
In Snowflake, stages are data storage locations. If the data to be imported into Snowflake is stored in another cloud area, such as AWS S3, Azure, or GCP, these are referred to as External stages; if the data is stored within Snowflake, they are referred to as Internal stages.
Internal Stages are further divided as below
- Table Stage
- User Stage
- Internal Named Stage
40. Does Snowflake maintain stored procedures?
41. How do we execute the Snowflake procedure?
Stored procedures allow us to create modular code comprising complicated business logic by adding various SQL statements with procedural logic. For executing the Snowflake procedure, carry out the below steps:
- Run a SQL statement
- Extract the query results
- Extract the result set metadata
42. Explain Snowflake Compression
All the data we enter into the Snowflake gets compacted systematically. Snowflake utilizes modern data compression algorithms for compressing and storing the data. Customers have to pay for the packed data, not the exact data.
Following are the advantages of the Snowflake Compression:
- Storage expenses are lesser than original cloud storage because of compression.
- No storage expenditure for on-disk caches.
- Approximately zero storage expenses for data sharing or data cloning.
To create a Snowflake task, we have to use the “CREATE TASK” command. Procedure to create a snowflake task:
- CREATE TASK in the schema.
- USAGE in the warehouse on task definition.
- Run SQL statement or stored procedure in the task definition.
44. How do we create temporary tables?
To create temporary tables, we have to use the following syntax:
Create temporary table mytable (id number, creation_date date);