Google Cloud Interview Questions and Answers
What is Google Cloud Platform (GCP)?
- Google Cloud Platform (GCP) is a suite of cloud computing services offered by Google. It provides a wide range of services, including compute, storage, networking, databases, big data, machine learning, and operations, allowing users to build, deploy, and scale applications and services.
What are the main service models in cloud computing (IaaS, PaaS, SaaS)? How do they relate to GCP services?
- IaaS (Infrastructure as a Service): Provides fundamental compute, storage, and networking resources. You manage the operating system and applications. GCP examples: Compute Engine, Cloud Storage, VPC.
- PaaS (Platform as a Service): Provides a platform for developing, running, and managing applications without the complexity of building and maintaining the underlying infrastructure. GCP examples: App Engine, Cloud SQL, Cloud Dataflow.
- SaaS (Software as a Service): Provides fully managed applications delivered over the internet. Users access the software via a web browser or API. GCP examples: Google Workspace (Gmail, Drive), BigQuery (from a user perspective using the UI/API).
Explain the concept of Regions and Zones in GCP.
- Region: A specific geographical location where Google Cloud data centers are clustered. Regions are independent of each other and are typically composed of multiple zones. Choosing a region is important for latency, data sovereignty, and cost.
- Zone: An isolated location within a region. Zones are independent of each other within a region, meaning a failure in one zone is unlikely to affect other zones in the same region. Deploying resources across multiple zones within a region provides high availability.
What is the purpose of a GCP Project?
- A GCP Project is the fundamental organizational unit for Google Cloud resources. It acts as a container for your resources, billing information, IAM policies, and API management. Resources must belong to a project.
What is Google Cloud IAM? Why is it important?
- Google Cloud Identity and Access Management (IAM) controls who (identity) can do what (role) on which resources. It's crucial for security as it allows you to grant granular access to GCP resources based on the principle of least privilege, ensuring users only have the necessary permissions.
Explain the different types of IAM roles (Primitive, Predefined, Custom).
- Primitive Roles (now called Basic Roles): Broad roles (Owner, Editor, Viewer) that predate IAM's granular model. They grant wide permissions across all GCP services and should generally be avoided in production.
- Predefined Roles: More granular roles defined by Google for specific services (e.g., Compute Engine Admin, BigQuery Data Viewer). Recommended for most use cases.
- Custom Roles: Allow you to create roles with a specific set of permissions tailored to your needs, providing the most granular control.
What is gcloud? How do you use it?
- gcloud is the Google Cloud command-line tool. It's part of the Google Cloud SDK (now the Google Cloud CLI) and provides a unified interface for interacting with Google Cloud services from your terminal. You use commands like gcloud compute instances create, gcloud storage buckets list, etc.
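A few representative commands illustrate typical usage; the project ID and resource names below are placeholders:

```shell
# Authenticate and set a default project (my-project-id is a placeholder)
gcloud auth login
gcloud config set project my-project-id

# List Compute Engine VM instances in the project
gcloud compute instances list

# List Cloud Storage buckets
gcloud storage buckets list

# Show the active configuration (account, project, default region/zone)
gcloud config list
```

Most gcloud commands follow the pattern gcloud SERVICE RESOURCE VERB, which makes unfamiliar services easy to explore with --help.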
What is Compute Engine? What are its key features?
- Compute Engine is GCP's Infrastructure as a Service (IaaS) offering for running Virtual Machines (VMs). Key features include: customizable machine types, persistent disks, global network, autoscaling, load balancing, and integration with other GCP services.
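Creating and connecting to a VM can be sketched with the CLI; the instance name, zone, machine type, and image family below are illustrative choices, not fixed values:

```shell
# Create a small Debian VM (all names and the zone are placeholders)
gcloud compute instances create demo-vm \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=debian-12 \
  --image-project=debian-cloud

# SSH into the VM once it is running
gcloud compute ssh demo-vm --zone=us-central1-a
```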
What are the different types of Compute Engine machine types?
- Machine types define the virtual hardware of a VM instance (CPU, RAM). Common types include:
- General-purpose (N1, N2, N2D, E2): Balanced price/performance.
- Compute-optimized (C2, C2D): High CPU performance.
- Memory-optimized (M1, M2): Large amounts of memory.
- Accelerator-optimized (A2): Designed for GPUs.
What are Persistent Disks in Compute Engine? What are the different types?
- Persistent Disks are durable block storage devices that provide storage for Compute Engine VMs. They are independent of the VM's lifecycle. Types include:
- Standard Persistent Disk (pd-standard): HDD-based, cost-effective for large, sequential reads/writes.
- Balanced Persistent Disk (pd-balanced): SSD-based, balance of performance and cost.
- SSD Persistent Disk (pd-ssd): SSD-based, high performance for demanding applications.
- Extreme Persistent Disk (pd-extreme): Highest-performance SSD with provisioned IOPS.
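The disk types above map directly to the --type flag when provisioning; a minimal sketch, where the disk name, VM name, and zone are placeholders:

```shell
# Create a 100 GB balanced persistent disk
gcloud compute disks create demo-data-disk \
  --type=pd-balanced \
  --size=100GB \
  --zone=us-central1-a

# Attach it to an existing VM; it still needs formatting and mounting inside the guest OS
gcloud compute instances attach-disk demo-vm \
  --disk=demo-data-disk \
  --zone=us-central1-a
```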
What is Google Kubernetes Engine (GKE)? Why use it?
- Google Kubernetes Engine (GKE) is a managed service for deploying, managing, and scaling containerized applications using Kubernetes. You use it to automate the deployment, scaling, and operations of application containers, simplifying the management of complex microservice architectures.
Explain the difference between GKE Autopilot and GKE Standard.
- GKE Standard: You manage the underlying VM nodes (node pools). You have more control over node configuration but are responsible for node maintenance, upgrades, and scaling.
- GKE Autopilot: Google fully manages the underlying nodes, including scaling, patching, and repair. You only pay for the resources your pods consume. It simplifies cluster management significantly.
What is Cloud Run? What are its advantages?
- Cloud Run is a serverless platform for running stateless containers. You deploy a container image, and Cloud Run automatically scales it up or down based on incoming requests, even to zero instances when idle. Advantages include: serverless operations, pay-per-use billing, support for any language/framework that can be containerized, and rapid deployment.
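A typical Cloud Run deployment is a single command; the service name, image path (an assumed Artifact Registry location), and region below are placeholders:

```shell
# Deploy a container image to Cloud Run and allow unauthenticated invocations
gcloud run deploy demo-service \
  --image=us-docker.pkg.dev/my-project-id/my-repo/demo-app:latest \
  --region=us-central1 \
  --allow-unauthenticated
```

Cloud Run prints the service URL on success; traffic to that URL scales instances up from zero automatically.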
What are Cloud Functions? When would you use them?
- Cloud Functions are serverless, event-driven compute services. You write small, single-purpose functions in various languages that respond to events (HTTP requests, changes in Cloud Storage, Pub/Sub messages, etc.). You use them for tasks like image processing after upload, triggering database updates, or building simple APIs, without managing servers.
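Deploying an HTTP-triggered function can be sketched as follows; the function name, runtime, entry point, and region are illustrative:

```shell
# Deploy a 2nd-gen HTTP function from source in the current directory
gcloud functions deploy demo-fn \
  --gen2 \
  --runtime=python312 \
  --trigger-http \
  --entry-point=handler \
  --region=us-central1 \
  --allow-unauthenticated
```

Swapping --trigger-http for an event trigger (e.g. a Cloud Storage or Pub/Sub trigger flag) turns the same function into an event-driven one.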
What is App Engine? What are the differences between Standard and Flexible environments?
- App Engine is a fully managed Platform as a Service (PaaS) for deploying web applications.
- Standard Environment: Sandboxed environment with language-specific runtimes (Python, Node.js, Java, Go, etc.). Offers rapid scaling to zero, free tier, and strong security isolation. Limited to specific language versions and libraries.
- Flexible Environment: Runs applications in Docker containers on Compute Engine VMs. Offers more flexibility in language versions, libraries, and custom runtimes. Does not scale to zero instances.
What is Cloud Storage? What are its main use cases?
- Cloud Storage is GCP's object storage service. It's highly scalable, durable, and available. Use cases include: storing unstructured data (images, videos, documents), hosting static websites, backup and disaster recovery, and data lakes for big data analytics.
Explain the different Cloud Storage classes. When would you use each?
- Standard: For frequently accessed data ("hot" data). Lowest access latency.
- Nearline: For data accessed less than once a month. Lower storage cost than Standard, higher access cost and latency.
- Coldline: For data accessed less than once a quarter. Lower storage cost than Nearline, higher access cost and latency.
- Archive: For data accessed less than once a year. Lowest storage cost, highest access cost and latency. Suitable for long-term backups and archives.
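Storage classes are set per bucket (as a default) or per object; a minimal sketch, with the bucket name and location as placeholders:

```shell
# Create a bucket whose default storage class is Nearline
gcloud storage buckets create gs://demo-archive-bucket \
  --location=US \
  --default-storage-class=NEARLINE

# Upload an object; it inherits the bucket's default class
gcloud storage cp backup.tar.gz gs://demo-archive-bucket/
```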
What are Cloud SQL and Cloud Spanner? When would you choose one over the other?
- Cloud SQL: A fully managed relational database service for MySQL, PostgreSQL, and SQL Server. Suitable for traditional relational workloads that fit within a single regional database instance or read replicas.
- Cloud Spanner: A globally distributed, strongly consistent, and highly available relational database service. Offers unlimited horizontal scaling and 99.999% availability for multi-region configurations. Choose Spanner for mission-critical, globally distributed applications requiring high availability and massive scale.
What is Firestore? What is its data model?
- Firestore is a serverless, NoSQL document database. Its data model is based on collections of documents. Documents contain key-value pairs and can contain nested subcollections. It's designed for web, mobile, and server development, offering real-time synchronization and offline support.
What is BigQuery? What makes it unique?
- BigQuery is a fully managed, serverless data warehouse. It's unique because of its massive scalability, speed, ease of use, and separation of compute and storage. You pay for storage and query processing. It's ideal for analyzing large datasets using standard SQL.
How do you load data into BigQuery?
- You can load data into BigQuery from:
- Cloud Storage (batch load).
- Streaming insertion API (real-time).
- Google Drive, local files, or other cloud providers.
- Using data transfer services (e.g., BigQuery Data Transfer Service).
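The batch-load path from Cloud Storage can be sketched with the bq tool; the dataset, table, and bucket names are placeholders:

```shell
# Batch-load a CSV from Cloud Storage into a table, autodetecting the schema
bq load \
  --source_format=CSV \
  --autodetect \
  --skip_leading_rows=1 \
  demo_dataset.events gs://demo-bucket/events.csv
```

For real-time ingestion you would instead use the streaming insert API (or the Storage Write API) from client code rather than bq load.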
What is Cloud Pub/Sub? What is its purpose?
- Cloud Pub/Sub is a fully managed, scalable, and reliable asynchronous messaging service. It enables you to decouple senders (publishers) from receivers (subscribers) of messages. It's used for event-driven architectures, streaming analytics, and building reliable distributed systems.
Explain the difference between a Pub/Sub Topic and a Subscription.
- Topic: A named resource to which messages are sent by publishers.
- Subscription: A named resource representing the stream of messages from a single, specific topic to be delivered to a subscribing application. Subscribers receive messages from their subscription.
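The topic/subscription relationship is easy to see end-to-end from the CLI; the names below are placeholders:

```shell
# Create a topic and a pull subscription attached to it
gcloud pubsub topics create demo-topic
gcloud pubsub subscriptions create demo-sub --topic=demo-topic

# Publish a message, then pull and acknowledge it from the subscription
gcloud pubsub topics publish demo-topic --message="hello"
gcloud pubsub subscriptions pull demo-sub --auto-ack --limit=1
```

Attaching a second subscription to the same topic gives each subscriber its own independent stream of the topic's messages.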
What is a Virtual Private Cloud (VPC) network in GCP?
- A VPC network is a global, software-defined network that provides networking functionality for your GCP resources (like Compute Engine VMs, GKE clusters, etc.). It's a virtual version of a physical network, providing routing, firewalling, and connectivity.
What are Subnets in a GCP VPC?
- Subnets are IP address ranges within a VPC network. They are regional resources: a single subnet is available to all zones within its region. Resources in a subnet can communicate with each other using internal IP addresses.
How do you control traffic flow in a GCP VPC?
- Using Firewall Rules. Firewall rules are defined at the VPC network level and can allow or deny traffic based on protocols, ports, source/destination IP ranges, and source/destination tags or service accounts.
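A typical rule scoped by network tag can be sketched as follows; the network name, tag, and source range are placeholders:

```shell
# Allow SSH only to VMs carrying the ssh-enabled network tag
gcloud compute firewall-rules create allow-ssh-tagged \
  --network=demo-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:22 \
  --source-ranges=203.0.113.0/24 \
  --target-tags=ssh-enabled
```

Scoping by target tag (or target service account) keeps the rule from applying to every instance in the network.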
What is Cloud Load Balancing? What are the different types?
- Cloud Load Balancing is a fully distributed, software-defined load balancing service that scales with your traffic. Types include:
- Global HTTP(S) Load Balancing (for web traffic, global backends).
- Regional HTTP(S) Load Balancing.
- TCP/UDP Load Balancing (Global and Regional).
- Internal HTTP(S) and TCP/UDP Load Balancing.
- SSL Proxy Load Balancing.
- TCP Proxy Load Balancing.
What is Cloud CDN? How does it improve performance?
- Cloud CDN (Content Delivery Network) caches your static content (images, videos, CSS, JS) at Google's global edge locations. This reduces latency for end-users by serving content from a location closer to them, improving website/application performance and reducing the load on your origin servers.
What is Cloud Interconnect and Cloud VPN? When would you use them?
- Both are used to connect your on-premises network to your GCP VPC network.
- Cloud Interconnect: Provides high-bandwidth, low-latency connections directly from your data center to Google's network edge. Suitable for mission-critical applications and large data transfers.
- Cloud VPN: Provides secure tunnels over the public internet. More cost-effective and easier to set up than Interconnect, but performance depends on internet conditions. Suitable for non-critical workloads or connecting branch offices.
What is Identity-Aware Proxy (IAP)? (Security related)
- Identity-Aware Proxy (IAP) is a service that controls access to cloud applications and resources based on user identity and context. Instead of using a traditional VPN, IAP verifies a user's identity and context to determine if they should be allowed to access a resource. It's a Zero Trust security model implementation.
What is Cloud Monitoring? What is its purpose?
- Cloud Monitoring (part of Operations suite) collects metrics, events, and metadata from GCP, AWS, and on-premises resources. It's used to monitor the performance, availability, and health of your applications and infrastructure, create dashboards, and set up alerting policies.
What is Cloud Logging? What is its purpose?
- Cloud Logging (part of Operations suite) is a fully managed service for storing, searching, analyzing, and exporting log data from GCP and other sources. It's used for troubleshooting, auditing, and understanding application behavior.
What is Cloud Trace and Cloud Profiler? (Operations related)
- Cloud Trace: A distributed tracing system that collects latency data from applications and displays it in the Google Cloud Console. Helps understand how requests propagate through a distributed system and identify performance bottlenecks.
- Cloud Profiler: Continuously collects CPU and memory profiling data from applications. Helps identify which parts of your code consume the most resources, aiding in performance optimization.
What is Cloud Build? How is it used in CI/CD?
- Cloud Build is a serverless CI/CD platform that executes your builds on GCP. It can fetch source code from various repositories (Cloud Source Repositories, GitHub, Bitbucket), execute build steps (using containers), and produce artifacts. It's used to automate the process of building, testing, and deploying applications.
What is Cloud Source Repositories?
- Cloud Source Repositories is a fully featured, private Git repository service hosted on Google Cloud. It's integrated with other GCP services like Cloud Build.
What is Bigtable? What are its ideal use cases?
- Bigtable is a fully managed, scalable NoSQL wide-column database service. It's ideal for large operational and analytical workloads that require very high throughput and low latency, such as time series data, IoT data, and operational analytics.
What is Memorystore? When would you use it?
- Memorystore is a fully managed, in-memory data store service compatible with Redis and Memcached. You use it as a cache layer for your applications to reduce latency and offload your primary database, or as a primary data store for use cases requiring sub-millisecond latency.
What is the Shared VPC concept?
- Shared VPC allows you to share a VPC network from a host project with one or more service projects. Resources in the service projects can use the shared network's subnets and internal IP addresses. This simplifies network management and enables centralized control.
What is VPC Network Peering?
- VPC Network Peering allows you to connect two VPC networks privately using their internal IP addresses. Resources in the peered networks can communicate as if they were in the same network. Each peering connection joins exactly two networks, a network can peer with many others, and peering is non-transitive.
How do you manage secrets in GCP?
- Use Secret Manager. It's a dedicated service for storing, managing, and accessing secrets (API keys, passwords, certificates) securely. It offers versioning, access control (IAM), and audit logging.
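The full lifecycle of a secret can be sketched from the CLI; the secret name and value are placeholders:

```shell
# Create a secret, add a version with its value, and read the value back
gcloud secrets create demo-api-key --replication-policy=automatic
echo -n "s3cr3t-value" | gcloud secrets versions add demo-api-key --data-file=-
gcloud secrets versions access latest --secret=demo-api-key
```

Applications would normally access the secret via the Secret Manager API using a service account granted roles/secretmanager.secretAccessor, rather than the CLI.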
What is the principle of least privilege in IAM? How do you apply it?
- The principle of least privilege states that users and service accounts should only be granted the minimum permissions necessary to perform their required tasks. You apply it by assigning granular, predefined roles or custom roles instead of broad primitive roles (like Editor or Owner).
What is a Service Account in GCP? Why are they used?
- A service account is a special type of Google account that represents a non-human user (like an application or VM instance) that needs to authenticate and access GCP resources. They are used to provide identity and permissions to applications running on GCP, allowing them to interact with other services securely.
How do you provide credentials to applications running on Compute Engine VMs to access other GCP services?
- The recommended way is to assign a service account to the VM instance. The VM instance then automatically uses the service account's credentials (via the metadata server) to authenticate with GCP services. You grant the necessary IAM roles to the service account.
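The flow of creating a service account, granting it a role, and attaching it to a VM can be sketched as follows; the project ID, account name, and the choice of role are illustrative:

```shell
# Create a service account and grant it read access to Cloud Storage objects
gcloud iam service-accounts create demo-app-sa --display-name="Demo app"
gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:demo-app-sa@my-project-id.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Attach the service account to a new VM; code on the VM then
# obtains credentials automatically from the metadata server
gcloud compute instances create demo-vm \
  --zone=us-central1-a \
  --service-account=demo-app-sa@my-project-id.iam.gserviceaccount.com \
  --scopes=cloud-platform
```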
What is Cloud Dataflow? What is Apache Beam?
- Cloud Dataflow is a fully managed service for executing data processing pipelines. It can handle both batch and stream processing. Apache Beam is an open-source unified programming model that allows you to define batch and streaming data processing pipelines that can run on various execution engines, including Cloud Dataflow.
What is Cloud Dataproc? When would you use it?
- Cloud Dataproc is a fully managed, cost-effective service for running Apache Hadoop and Spark clusters. You use it for large-scale data processing, ETL (Extract, Transform, Load), and machine learning workloads that are built on the Hadoop/Spark ecosystem, without the hassle of managing the clusters yourself.
What is the difference between Cloud Dataflow and Cloud Dataproc?
- Cloud Dataflow: Serverless, uses the Apache Beam programming model, ideal for building flexible, complex batch and streaming pipelines.
- Cloud Dataproc: Managed service for traditional Hadoop/Spark clusters, suitable if you have existing Hadoop/Spark jobs or need direct control over the cluster environment.
What is BigQuery ML?
- BigQuery ML allows you to create and execute machine learning models directly within BigQuery using standard SQL queries. This enables data analysts to build ML models without needing to learn separate ML frameworks and move data out of the data warehouse.
What is Vertex AI? What services does it consolidate?
- Vertex AI is Google Cloud's unified platform for building, deploying, and scaling machine learning models. It consolidates various previously separate services (AI Platform, AutoML, etc.) into a single platform covering the entire MLOps lifecycle, from data preparation and model training to deployment and monitoring.
What are some examples of pre-trained AI APIs offered by GCP?
- Cloud Vision AI (image analysis)
- Cloud Natural Language AI (text analysis)
- Cloud Translation AI (language translation)
- Cloud Speech-to-Text / Text-to-Speech AI (audio processing)
- Dialogflow (conversational interfaces)
What is the shared responsibility model in cloud security?
- It defines the security responsibilities shared between the cloud provider (Google) and the customer. Google is responsible for the security *of* the cloud (infrastructure, physical security, network security). The customer is responsible for security *in* the cloud (data security, access control, application security, guest OS patching on IaaS).
How does GCP help with compliance requirements?
- GCP adheres to numerous global compliance standards (e.g., ISO 27001, SOC 1/2/3, HIPAA, GDPR). They provide tools and services (like Cloud Audit Logs, IAM, Security Command Center) to help customers meet their specific compliance obligations.
What are Organization Policies? How are they different from IAM?
- Organization Policies are rules that apply constraints on the configuration of GCP resources across your organization, folders, or projects. They are different from IAM, which controls *who* can do *what*. Organization Policies control *what* configurations are *allowed* for resources (e.g., restricting allowed external IP addresses, requiring specific services to be enabled).
What is Cloud Armor? (Security related)
- Cloud Armor is a DDoS protection and web application firewall (WAF) service. It helps protect your applications and services running on GCP (behind a load balancer) from various network and application layer attacks.
What is Security Command Center? (Security related) (Optional)
- Security Command Center is a centralized security and risk management platform for GCP. It helps you understand your security and data risk posture, identify misconfigurations and vulnerabilities, and detect and respond to threats against your resources.
How do you monitor costs in GCP?
- Use the Cloud Billing reports in the GCP Console to view cost trends, breakdown costs by project, service, SKU, etc. Set up budgets and budget alerts to be notified when costs exceed predefined thresholds. Use labels to categorize costs.
What are Sustained Usage Discounts?
- Sustained Usage Discounts are automatic discounts applied to Compute Engine VM instances (and some other services) when you run them for a significant portion of a billing cycle (more than 25% of the month). You don't need to do anything to enable them.
What are Committed Use Discounts?
- Committed Use Discounts (CUDs) are offered in exchange for committing to a specific amount of resource usage (e.g., vCPU, memory) or spending for a term of 1 or 3 years. They provide lower pricing than sustained usage discounts but require a commitment.
How can you optimize costs on GCP?
- Right-size your VM instances and other resources.
- Utilize autoscaling.
- Choose appropriate storage classes in Cloud Storage.
- Use serverless services where applicable (Cloud Run, Cloud Functions, BigQuery).
- Take advantage of Sustained Usage and Committed Use Discounts.
- Implement lifecycle policies for data in Cloud Storage.
- Delete unused resources.
What is Cloud CDN? How is it configured? (Revisited)
- Cloud CDN is enabled on a per-backend basis for external Application Load Balancers (the global external and classic HTTP(S) load balancers). You enable it on the backend service or backend bucket.
What is the difference between a public and private IP address in GCP?
- Public IP Address: An IP address routable on the public internet. Used for resources that need to be accessible from outside the GCP network.
- Private IP Address: An IP address within the private IP address space of your VPC network. Used for communication between resources within the same or peered VPC networks. Not routable on the public internet.
What is Cloud NAT? (Networking related)
- Cloud NAT (Network Address Translation) allows instances without external IP addresses to send outbound traffic to the internet. It provides a managed, distributed NAT service for your VPC network.
What is the purpose of VPC Service Controls? (Security related) (Advanced)
- VPC Service Controls allow you to create security perimeters around your sensitive data stored in GCP services (like BigQuery, Cloud Storage, Pub/Sub) to mitigate data exfiltration risks. You can define boundaries and restrict data movement outside the perimeter.
What is Cloud Key Management Service (KMS)? (Security related)
- Cloud KMS is a cloud-hosted key management service that lets you manage cryptographic keys in a cloud service or a hardware security module (HSM) cluster. You can use it to protect data in your applications and various GCP services.
What is the difference between Customer-Managed Encryption Keys (CMEK) and Customer-Supplied Encryption Keys (CSEK)? (Security related)
- CMEK: You manage the encryption keys using Cloud KMS. Google uses your keys to encrypt/decrypt your data in supported services. Google manages the encryption at rest using your key.
- CSEK: You generate and manage the encryption keys yourself outside of GCP. You provide the key to GCP when performing operations (like reading/writing to Cloud Storage or Persistent Disks). Google does not store your key. You are responsible for key management and availability.
What is Cloud Identity? How does it relate to GCP IAM?
- Cloud Identity is Google's Identity as a Service (IDaaS) platform. It provides identity and access management for users and groups who don't have Google Workspace. It integrates with GCP IAM, allowing you to use Cloud Identity users and groups as principals in your IAM policies.
What is Resource Manager? What is its role in the resource hierarchy? (Revisited)
- Resource Manager provides the ability to manage resources programmatically in the Google Cloud resource hierarchy: Organization, Folders, and Projects. It allows you to organize resources, manage access control (IAM) at different levels, and apply Organization Policies.
What is Cloud Data Loss Prevention (DLP)? (Security related) (Optional)
- Cloud DLP is a service that helps you discover, classify, and protect sensitive data. It can inspect text, images, and structured data for sensitive information types (like credit card numbers, social security numbers) and apply de-identification techniques.
What is Network Service Tiers?
- Network Service Tiers let you choose between premium tier (Google's global network, lower latency) and standard tier (uses the public internet for traffic between GCP regions and the user, lower cost). Premium tier is the default.
What is preemptible VM instance? When would you use it?
- A preemptible VM instance is a Compute Engine VM that Compute Engine can terminate at any time (with a 30-second notice) and always terminates within 24 hours of starting. Preemptible VMs (and their successor, Spot VMs) are significantly cheaper than regular instances. You use them for fault-tolerant batch processing jobs or workloads that can tolerate being stopped and restarted.
What is the purpose of Managed Instance Groups (MIGs)?
- Managed Instance Groups (MIGs) are collections of identical VM instances that you can manage as a single entity. They provide features like autoscaling, autohealing (recreating unhealthy VMs), rolling updates, and load balancing integration, simplifying the management of stateless workloads.
How does autoscaling work in Compute Engine MIGs?
- Autoscaling in MIGs automatically adds or removes VM instances from the group based on metrics like CPU utilization, load balancing serving capacity, or Pub/Sub queue size. This ensures your application can handle varying loads efficiently.
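The template → group → autoscaler chain can be sketched end-to-end; all names, the zone, and the thresholds are placeholders:

```shell
# Define the VM shape once in an instance template
gcloud compute instance-templates create demo-template \
  --machine-type=e2-small \
  --image-family=debian-12 \
  --image-project=debian-cloud

# Create a managed instance group of two identical VMs from the template
gcloud compute instance-groups managed create demo-mig \
  --template=demo-template \
  --size=2 \
  --zone=us-central1-a

# Scale between 2 and 10 instances, targeting 60% average CPU utilization
gcloud compute instance-groups managed set-autoscaling demo-mig \
  --zone=us-central1-a \
  --min-num-replicas=2 \
  --max-num-replicas=10 \
  --target-cpu-utilization=0.6
```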
What is Cloud Endpoints? (API Management)
- Cloud Endpoints is a distributed API management system that helps you develop, deploy, protect, and monitor your APIs. It provides features like authentication, monitoring, and logging for APIs built on GCP services.
What is Apigee? How does it differ from Cloud Endpoints? (API Management) (Optional)
- Apigee is a comprehensive, enterprise-grade API management platform. It offers more advanced features than Cloud Endpoints, including API analytics, developer portals, monetization, and security policies. You use Apigee for complex API programs and digital transformation initiatives.
What is Cloud Functions cold start? How can you mitigate it?
- Cold start is the latency incurred when a serverless function (like Cloud Functions or Cloud Run) is invoked after a period of inactivity, because the runtime environment must be initialized first. You can mitigate it by:
- Using minimum instances (for Cloud Run).
- Increasing allocated memory/CPU (can speed up initialization).
- Optimizing function code for faster startup.
- Using event triggers that keep instances warm (less effective than min instances).
What is the difference between Cloud Functions and Cloud Run?
- Cloud Functions: FaaS, designed for short-lived, event-driven tasks. You deploy source code.
- Cloud Run: Serverless Containers, designed for stateless request-driven workloads. You deploy a container image. Offers more flexibility in language/libraries and longer request timeouts.
What is Cloud Tasks? (Task Management)
- Cloud Tasks is a fully managed service that allows you to manage the execution of a large number of distributed tasks. It provides features like rate limiting, retries, and scheduled execution. It's often used with Cloud Functions or other web services to handle background tasks.
What is Workflows? (Orchestration) (Optional)
- Workflows is a fully managed orchestration platform that executes a series of steps defined in a declarative syntax (YAML or JSON). It can combine various GCP services and external APIs into serverless workflows, managing state, retries, and error handling.
What is the purpose of the .gcloudignore file?
- Similar to .gitignore, the .gcloudignore file specifies which files and directories should be ignored when uploading source code to GCP services like Cloud Build, Cloud Functions, or App Engine. This helps speed up deployments and avoid including unnecessary files.
How do you deploy a containerized application to GCP? (Multiple ways)
- Google Kubernetes Engine (GKE)
- Cloud Run
- App Engine Flexible Environment
- Compute Engine (running containers directly)
What is Artifact Registry? How is it used?
- Artifact Registry is a fully managed service for storing, managing, and securing your build artifacts (like Docker images, Maven packages, npm packages). It integrates with Cloud Build and GKE, providing a central repository for your application dependencies and container images.
What is the difference between Cloud Storage and Persistent Disks? (Revisited)
- Cloud Storage: Object storage, unstructured data, accessed via API/HTTP, globally available, strongly consistent, ideal for static assets, backups, data lakes.
- Persistent Disks: Block storage, structured data, attached to VMs, accessed like a local disk, zonal or regional, strongly consistent, ideal for VM operating systems and application data.
What are Signed URLs in Cloud Storage?
- Signed URLs provide limited-time access to a Cloud Storage object using a specific URL. You can generate a signed URL that grants read, write, or delete permissions for a specified duration, without requiring the user to have Google credentials. Useful for granting temporary access to private objects.
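Generating a signed URL can be sketched from the CLI; the bucket, object, and service-account key file are placeholders (the signing identity needs read access to the object):

```shell
# Generate a URL granting 1 hour of read access to a private object
gcloud storage sign-url gs://demo-bucket/report.pdf \
  --private-key-file=sa-key.json \
  --duration=1h
```

Anyone holding the printed URL can fetch the object until the duration expires, with no Google credentials of their own.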
What is the purpose of Object Versioning in Cloud Storage?
- Object Versioning keeps a history of changes to your objects in a bucket. When you overwrite or delete an object, the previous version is kept. This helps protect against accidental deletions or overwrites and allows you to restore previous versions.
What is Cloud Data Fusion? (Data Integration) (Optional)
- Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. It provides a visual interface for designing pipelines and integrates with various data sources and destinations.
What is Data Catalog? (Data Discovery and Governance) (Optional)
- Data Catalog is a fully managed, scalable metadata management service. It allows organizations to discover, manage, and understand their data assets across GCP and other platforms.
What is the purpose of Labels in GCP?
- Labels are key-value pairs that you can attach to GCP resources. They are used for organizing resources, filtering resources, and reporting on costs (cost allocation).
What is the difference between Labels and Tags? (Networking related)
- Labels: Key-value pairs for resource organization and cost allocation. Used by services like Billing, Monitoring, etc.
- Network Tags: Metadata attached to Compute Engine VM instances. Primarily used for applying firewall rules and routes to specific instances.
What is the purpose of the Metadata Server in Compute Engine?
- The Metadata Server is a source of information about a VM instance (instance ID, name, zone, project ID, assigned service account, custom metadata) that the instance can query without requiring authentication. It's commonly used by applications running on the VM to obtain configuration or credentials (via the service account).
What is the difference between a Service and an Ingress in Kubernetes/GKE?
- Service: An abstraction that defines a logical set of Pods and a policy by which to access them. Provides stable IP addresses and DNS names for accessing groups of pods.
- Ingress: Manages external access to services within a cluster, typically HTTP(S). It provides features like load balancing, SSL termination, and name-based virtual hosting.
What is Cloud Functions execution environment?
- Cloud Functions runs your code in a fully managed, stateless execution environment. Each function invocation runs in an isolated environment. The environment scales automatically based on the number of incoming requests.
How do you deploy a web application to App Engine?
- You define your application configuration in an `app.yaml` file, then run `gcloud app deploy` to deploy your application code to App Engine.
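A minimal `app.yaml` sketch for the Standard Environment (the runtime is an assumption; use the one matching your language version):

```yaml
runtime: python312

handlers:
- url: /.*
  script: auto
```

With this file in the project root, running `gcloud app deploy` uploads the code and starts serving it.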
What is the difference between a VPC network and a legacy network? (Networking related)
- VPC Network: Global, single network across regions with regional subnets. More flexible and recommended.
- Legacy Network: A single global network with one IP range and no subnets. Legacy networks cannot be created in new projects and are deprecated.
What is Cloud Router? (Networking related)
- Cloud Router is a fully distributed, managed routing service that uses Border Gateway Protocol (BGP) to exchange routes between your GCP VPC network and your on-premises network (via Cloud Interconnect or Cloud VPN).
What is the purpose of the `default` network in GCP?
- The `default` network is a pre-configured auto mode VPC network created automatically with every new project. It has a subnet in each region and comes with default firewall rules. It's suitable for getting started but is often replaced with a custom VPC network in production environments.
What is the difference between auto mode and custom mode VPC networks?
- Auto Mode: Automatically creates a subnet in each new GCP region. IP ranges are pre-defined. Easier to set up but less flexible.
- Custom Mode: You have full control over creating subnets, specifying IP ranges, and choosing which regions have subnets. Recommended for production environments for better IP space management.
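Creating a custom mode network is a two-step sketch (the network name, region, and CIDR range are illustrative):

```shell
# Create the VPC network without any automatic subnets
gcloud compute networks create my-vpc --subnet-mode=custom

# Add a subnet with an explicitly chosen IP range
gcloud compute networks subnets create my-subnet \
  --network=my-vpc --region=us-central1 --range=10.0.0.0/24
```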
What is Cloud Datastore? (Note: Migrating to Firestore)
- Cloud Datastore was a highly scalable, NoSQL document database. It has largely been superseded by Firestore, which offers similar functionality with improved features and real-time capabilities. Existing Datastore users are encouraged to migrate to Firestore Native mode.
What is the purpose of Stackdriver? (Deprecated term, now Cloud Operations suite)
- Stackdriver was the umbrella term for Google Cloud's monitoring, logging, tracing, and profiling services. It has been rebranded as the Google Cloud Operations suite, encompassing Cloud Monitoring, Cloud Logging, Cloud Trace, Cloud Profiler, and Error Reporting.
What is Error Reporting? (Operations related)
- Error Reporting is a service that counts, analyzes, and aggregates crashes in your running cloud services. It notifies you when new errors occur and provides tools to help you understand and fix them.
What is Cloud Deployment Manager? (Infrastructure as Code) (Optional)
- Cloud Deployment Manager is an infrastructure deployment service that allows you to specify all the resources needed for your application in a declarative format (YAML). It automatically creates and manages those resources for you. (Often compared to Terraform).
What is Terraform? How is it used with GCP? (Infrastructure as Code)
- Terraform is a popular open-source Infrastructure as Code (IaC) tool. You define your infrastructure using HashiCorp Configuration Language (HCL). Terraform can provision and manage resources across various cloud providers, including GCP, allowing you to automate the creation, update, and deletion of your GCP infrastructure.
What is Cloud Build Triggers?
- Cloud Build Triggers automate the start of a build in response to events, such as commits to a Git repository (Cloud Source Repositories, GitHub, Bitbucket) or changes to source files in Cloud Storage.
What is Cloud Deploy? (Continuous Delivery) (Optional)
- Cloud Deploy is a managed continuous delivery service that automates deployment to target environments (like GKE and Cloud Run) in a secure and accelerated way. It manages release pipelines and rollouts.
What is the purpose of the `go get` command in the context of Go on GCP?
- While primarily a Go language command for fetching and installing packages, in the context of GCP it is often used within build processes (such as Cloud Build) to download project dependencies defined in the `go.mod` file.
How do you handle secrets in a Cloud Build pipeline? (Security related)
- Avoid hardcoding secrets in your build definition. Use Secret Manager to store secrets and access them securely within the Cloud Build steps. Cloud Build has built-in integration with Secret Manager.
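A sketch of this pattern, assuming a secret named `api-key` already exists in Secret Manager; the build step receives it as an environment variable (note the `$$` escaping needed so Cloud Build doesn't treat the shell variable as a substitution):

```yaml
steps:
- name: gcr.io/cloud-builders/gcloud
  entrypoint: bash
  args: ['-c', 'echo "deploying with a key of length $${#API_KEY}"']
  secretEnv: ['API_KEY']

availableSecrets:
  secretManager:
  - versionName: projects/$PROJECT_ID/secrets/api-key/versions/latest
    env: API_KEY
```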
What is the Service Directory? (Networking related) (Optional)
- Service Directory is a platform that allows you to store, publish, and discover services. It integrates with DNS and provides a central registry for your services, making it easier for applications to find and connect to each other.
What is the purpose of Private Google Access? (Networking related)
- Private Google Access allows VM instances in a subnet that do not have external IP addresses to access Google APIs and services (like Cloud Storage, BigQuery) using a private IP address. The traffic stays within Google's network.
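Enabling it is a single subnet update (the subnet and region are placeholders):

```shell
gcloud compute networks subnets update my-subnet \
  --region=us-central1 --enable-private-ip-google-access
```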
What is Cloud NAT? (Revisited)
- Cloud NAT enables instances without external IP addresses to initiate connections to the internet. It's a regional resource.
What is the difference between a Global and Regional external IP address?
- Global IP Address: Can be attached to resources that are globally available (like Global HTTP(S) Load Balancers).
- Regional IP Address: Can be attached to resources within a specific region (like Compute Engine VMs, Regional Load Balancers).
What is BigQuery Data Transfer Service? (Data Transfer)
- BigQuery Data Transfer Service is a fully managed service for automating data movement from various sources (like Google Ads, Google Marketing Platform, YouTube, Amazon S3, Teradata, etc.) into BigQuery.
What is Cloud Data Catalog? (Revisited)
- Data Catalog is a metadata management service for discovering, understanding, and managing data assets. It allows you to tag, search, and govern your data across different storage systems.
What are the benefits of using Managed Services on GCP?
- Reduced operational overhead (Google manages patching, scaling, availability).
- Faster time to market.
- Pay-as-you-go pricing.
- Built-in scalability and reliability.
- Integration with other GCP services.
What are the considerations when migrating an application to GCP?
- Migration strategy (Lift & Shift, Replatform, Refactor, Rebuild, Replace).
- Choosing the right GCP services.
- Networking design.
- Data migration strategy.
- Security and compliance requirements.
- Cost estimation and optimization.
- Testing and validation.
What is the purpose of the GCP Free Tier?
- The GCP Free Tier provides a set of resources that are free to use up to specific monthly limits. It includes services like Compute Engine, Cloud Storage, BigQuery, Cloud Functions, etc. It allows users to explore and experiment with GCP without incurring costs.
What is the purpose of the GCP Free Trial?
- The Free Trial provides a credit (e.g., $300) that you can use over a limited time period (e.g., 90 days) to explore and use any GCP service. It's intended for evaluating the platform before committing.
How do you set up budget alerts in GCP? (Cost Management) (Revisited)
- In the Cloud Billing section of the Console, you can create a budget for your billing account or specific projects. You set a target amount and configure alert rules to be notified via email or Pub/Sub when your actual or forecasted spending exceeds a percentage of the budget.
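Budgets can also be scripted; a hedged sketch with `gcloud billing budgets` (the billing account ID and amounts are placeholders):

```shell
gcloud billing budgets create \
  --billing-account=0X0X0X-0X0X0X-0X0X0X \
  --display-name="Monthly project budget" \
  --budget-amount=500USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9
```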
What is the difference between zonal and regional resources in GCP?
- Zonal Resources: Resources located within a single zone (e.g., Compute Engine VM instance, Persistent Disk). A zone failure can affect these resources.
- Regional Resources: Resources distributed across multiple zones within a region (e.g., Regional Managed Instance Group, Regional Persistent Disk, Subnet). Provides higher availability within a region.
- Global Resources: Resources accessible from any region (e.g., VPC Network, Global Load Balancer, Cloud Storage bucket, DNS).
What is the purpose of the `terraform plan` command? (Terraform with GCP) (Optional)
- `terraform plan` creates an execution plan showing exactly what Terraform will do (resources to create, modify, or destroy) to reach the desired state defined in your configuration files. It's a crucial step before applying changes to avoid unexpected modifications.
What is the purpose of the `terraform apply` command? (Terraform with GCP) (Optional)
- `terraform apply` executes the plan generated by `terraform plan`. It provisions and manages the resources defined in your configuration files in the target cloud provider (GCP in this case).
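A minimal Terraform sketch for GCP (the project ID and bucket name are placeholders; bucket names must be globally unique):

```hcl
provider "google" {
  project = "my-project"
  region  = "us-central1"
}

resource "google_storage_bucket" "example" {
  name     = "my-example-bucket-12345"
  location = "US"
}
```

Running `terraform init`, then `terraform plan`, then `terraform apply` against this file creates the bucket.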
How can you ensure high availability for an application on GCP?
- Deploy resources across multiple zones within a region (e.g., using Regional MIGs or GKE clusters spanning multiple zones).
- Use regional services where possible (e.g., Regional Persistent Disks).
- Implement load balancing to distribute traffic across healthy instances.
- Configure autohealing in MIGs or Kubernetes.
- Use highly available database services (Cloud SQL with HA, Cloud Spanner).
- Design for statelessness where possible.
How can you ensure disaster recovery for an application on GCP?
- Implement backups (e.g., Cloud SQL backups, Persistent Disk snapshots, Cloud Storage object versioning).
- Replicate data to another region (e.g., Cloud SQL read replicas in another region, Cloud Storage bucket replication).
- Deploy applications across multiple regions.
- Have a strategy for failing over traffic to the disaster recovery region (e.g., using Global Load Balancing and DNS failover).
- Regularly test your disaster recovery plan.
What is the difference between Cloud Functions (1st gen) and Cloud Functions (2nd gen)?
- Cloud Functions (2nd gen) is built on Cloud Run and Eventarc. It offers longer request timeouts, larger instance sizes, concurrency, and broader event trigger support compared to 1st gen, which is built on a custom architecture. 2nd gen provides better integration with the Google Cloud ecosystem.
What is Eventarc? (Eventing) (Optional)
- Eventarc is a service that allows you to connect events from various sources (GCP services, custom sources) to destinations (like Cloud Functions, Cloud Run, GKE). It provides a standardized way to build event-driven architectures on GCP. Cloud Functions (2nd gen) uses Eventarc for its triggers.
What is Workload Identity Federation? (Security related) (Advanced)
- Workload Identity Federation allows workloads running outside of Google Cloud (e.g., on AWS, Azure, or on-premises) to access GCP resources using short-lived credentials without needing service account keys. It leverages standard identity protocols (like OIDC or SAML).
What is the purpose of the `--scopes` flag in Compute Engine? (Deprecated, now use Service Accounts)
- Historically, you could use the `--scopes` flag when creating a VM to grant it access to specific Google API scopes. The recommended modern approach is to attach a service account to the VM and grant IAM roles to that service account.
What is Cloud Build's builder concept?
- Cloud Build executes build steps using container images called "builders". Google provides builders for common tasks (e.g., `gcr.io/cloud-builders/docker`, `gcr.io/cloud-builders/go`, `gcr.io/cloud-builders/gcloud`), or you can use custom container images as builders.
What is the purpose of `cloudbuild.yaml`?
- `cloudbuild.yaml` is the build configuration file for Cloud Build. It defines the steps that Cloud Build executes, including fetching source, running tests, building container images, and deploying applications.
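A minimal `cloudbuild.yaml` sketch that builds and publishes a container image (the image path is illustrative):

```yaml
steps:
- name: gcr.io/cloud-builders/docker
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA', '.']

# Images listed here are pushed to the registry when the build succeeds
images:
- gcr.io/$PROJECT_ID/my-app:$SHORT_SHA
```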
What is Secret Manager? How does it integrate with other services? (Revisited)
- Secret Manager integrates with services like Cloud Build, Cloud Functions, Cloud Run, and GKE (via the Secret Manager CSI driver) to allow applications to access secrets securely without hardcoding them. Access is controlled via IAM.
What is VPC Flow Logs? (Networking related)
- VPC Flow Logs record a sample of network flows sent from and received by VM instances. They can be exported to Cloud Logging, BigQuery, or Pub/Sub for analysis, helping with network monitoring, security analysis, and troubleshooting.
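Flow logs are enabled per subnet; a one-line sketch (the subnet and region are placeholders):

```shell
gcloud compute networks subnets update my-subnet \
  --region=us-central1 --enable-flow-logs
```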
What is Firewall Insights? (Security related) (Optional)
- Firewall Insights provides insights into your firewall rules, such as identifying shadowed rules (rules that are ineffective because other rules match traffic first) or overprovisioned rules (rules that are too broad). It helps improve your network security posture.
What is the difference between Cloud Storage and a file system?
- Cloud Storage: Object storage with a flat namespace (objects within buckets), accessed via APIs/HTTP, strongly consistent for reads after writes, designed for unstructured data at scale.
- File System: Hierarchical structure (directories and files), accessed via block-level I/O or network protocols (like NFS), strongly consistent, designed for structured data accessed by operating systems and applications.
What is Filestore? When would you use it? (Revisited)
- Filestore is a managed Network Attached Storage (NAS) service based on NFS. You use it when you need a shared file system that can be mounted by multiple Compute Engine VMs, similar to traditional file servers. Suitable for workloads requiring shared storage like rendering farms, genomics analysis, or web serving with shared content.
What is Cloud Functions concurrency? (Cloud Functions 2nd gen)
- Cloud Functions (2nd gen) supports concurrency, meaning a single instance of your function can handle multiple simultaneous requests. This improves efficiency and reduces cold starts. You can configure the maximum number of concurrent requests per instance. (1st gen is single concurrency per instance).
What is the purpose of `package main` and `func main` in a Go application on GCP?
- `package main`: Declares the package as an executable program.
- `func main`: The entry point of the executable. When deploying a Go application to services like Cloud Run or App Engine, the service runs the compiled executable starting from the `main` function. For Cloud Functions, the function you export and specify as the entry point is invoked instead.
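A minimal sketch of a Go entry point as deployed to Cloud Run, which injects the listening port through the `PORT` environment variable:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

// handler responds to every request; a real service would route paths.
func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello from Cloud Run")
}

func main() {
	// Cloud Run sets PORT; default to 8080 for local runs.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```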
How do you handle dependencies for a Go application deployed to Cloud Run or Cloud Functions?
- Use Go Modules. Define dependencies in the `go.mod` file. Cloud Build (or the build process used by Cloud Functions/Cloud Run) automatically downloads and includes these dependencies during the build.
What is the purpose of the `Procfile` in App Engine? (Optional)
- In the App Engine Flexible Environment, the `Procfile` is an optional file that explicitly declares the command to start your application. If it is not provided, App Engine attempts to infer the command.
What is Identity Platform? (Authentication) (Optional)
- Identity Platform is a customer identity and access management (CIAM) platform. It's a fully managed service that adds Google-grade security and scalability to your application's authentication layer. It supports various sign-in methods (email/password, social logins, SAML, OIDC).
What is Cloud Tasks vs. Cloud Pub/Sub? (Task Management vs. Messaging)
- Cloud Pub/Sub: Asynchronous messaging for event-driven architectures. Messages are delivered to available subscribers and redelivered until acknowledged, but there is no built-in rate limiting, scheduled delivery, or delivery-order guarantee (unless ordering keys are used).
- Cloud Tasks: Managed task queue for reliable execution of discrete tasks. Provides features like rate limiting, retries with exponential backoff, and scheduled delivery. Often used to enqueue work to be processed by workers (e.g., Cloud Functions, Cloud Run).
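The difference is visible in usage; a hedged `gcloud` sketch for Cloud Tasks (the queue name, location, and target URL are placeholders):

```shell
# Create a queue
gcloud tasks queues create my-queue --location=us-central1

# Enqueue an HTTP task that will POST to a worker endpoint,
# with retries and rate limits handled by the queue
gcloud tasks create-http-task --queue=my-queue \
  --location=us-central1 --url=https://example.com/worker --method=POST
```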
What is the purpose of data lineage in BigQuery? (Data Governance) (Optional)
- Data lineage in BigQuery tracks how data flows through queries and transformations. It helps understand the origin of data, how it has been modified, and its usage, which is important for data governance, compliance, and debugging.
What is Assured Workloads? (Compliance) (Optional)
- Assured Workloads helps customers run sensitive workloads on GCP while meeting specific compliance and sovereignty requirements (like FedRAMP High, IL4, CJIS, C5, Australia IRAP, Canada PBMM, NGT). It enforces constraints on resource configuration, location, and personnel access.
What is Cloud External Key Manager (EKM)? (Security) (Advanced)
- Cloud EKM allows you to use encryption keys that are managed outside of Google Cloud to protect your data in GCP. This provides greater control over your keys, but you are responsible for the availability and management of the external key management system.
What is the purpose of the Google Cloud Marketplace?
- The Google Cloud Marketplace is a platform where you can quickly deploy pre-configured software packages (like databases, operating systems, developer tools) onto GCP infrastructure. It simplifies the process of setting up complex software stacks.
What is the difference between deploying to App Engine Standard and App Engine Flexible? (Revisited)
- Standard: Sandboxed, scales to zero, constrained runtimes. Good for stateless web apps with predictable scaling needs.
- Flexible: Container-based on VMs, more flexibility, doesn't scale to zero. Good for apps needing custom runtimes, background processes, or using native code.
What is the purpose of the `app.yaml` file in App Engine? (Revisited)
- The `app.yaml` file is the primary configuration file for App Engine applications. It defines the application's settings, including the runtime, handlers for URL paths, scaling settings, environment variables, and more.
What is the difference between a Service Account Key and Workload Identity Federation? (Security) (Revisited)
- Service Account Key: A static credential file (JSON or P12) that grants access to the service account. Can be a security risk if compromised.
- Workload Identity Federation: Allows external identities (from other clouds or on-prem) to impersonate a service account and get short-lived access tokens. Eliminates the need to manage static keys outside of GCP. More secure.
What is the purpose of the `gcloud init` command? (gcloud)
- `gcloud init` initializes, authorizes, and configures the `gcloud` command-line tool. It guides you through setting up your default project, region, and zone.
What is the purpose of the `gcloud config set` command? (gcloud)
- `gcloud config set` sets properties in your `gcloud` configuration, such as the default project, region, zone, or account.
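A typical first-time setup sequence (the project, region, and zone values are placeholders):

```shell
gcloud config set project my-project
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a

# Inspect the active configuration
gcloud config list
```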
What is the purpose of the `gcloud auth activate-service-account` command? (gcloud)
- This command authenticates `gcloud` using a service account key file. It's often used in automated scripts or CI/CD pipelines running outside of GCP.
What is the purpose of the `gcloud auth application-default login` command? (gcloud)
- This command is used to set up Application Default Credentials (ADC) for your user account. ADC allows applications running on your local machine to automatically find your credentials and authenticate with Google Cloud APIs.