AWS DBS真题 No.1-100 (AWS DBS practice questions No.1-100) 2025-03-01 DBS 中英双语,人工翻译,带完整解析 (bilingual Chinese-English, human-translated, with full explanations)

1 / 100 分类: DBS 1. A data engineer is about to perform a major upgrade to the DDL contained within an Amazon Redshift cluster to support a new data warehouse application. The upgrade scripts will include user permission updates, view and table structure changes, as well as additional loading and data manipulation tasks. The data engineer must be able to restore the database to its existing state in the event of issues. Which action should be taken prior to performing this upgrade task? 一名数据工程师即将对Amazon Redshift集群中的DDL进行重大升级,以支持一个新的数据仓库应用程序。升级脚本将包括用户权限更新、视图和表结构更改,以及额外的加载和数据操作任务。数据工程师必须能够在出现问题时将数据库恢复到现有状态。应该在执行此升级任务之前采取什么措施? A. Run an UNLOAD command for all data in the warehouse and save it to S3. A. 执行一个UNLOAD命令,将仓库中的所有数据导出并保存到S3。 B. Create a manual snapshot of the Amazon Redshift cluster. B. 创建 Amazon Redshift 集群的手动快照。 C. Make a copy of the automated snapshot on the Amazon Redshift cluster. C. 在 Amazon Redshift 集群上复制自动快照。 D. Call the waitForSnapshotAvailable command from either the AWS CLI or an AWS SDK. D. 从AWS CLI或AWS SDK中调用waitForSnapshotAvailable命令。 正确答案: B Correct answer is B, as a manual snapshot needs to be taken to be able to restore Redshift to the point before the upgrade. Refer AWS documentation – Redshift Snapshots. Snapshots are point-in-time backups of a cluster. There are two types of snapshots: automated and manual.
Amazon Redshift stores these snapshots internally in Amazon S3 by using an encrypted Secure Sockets Layer (SSL) connection. Amazon Redshift automatically takes incremental snapshots that track changes to the cluster since the previous automated snapshot. Automated snapshots retain all of the data required to restore a cluster from a snapshot. You can create a snapshot schedule to control when automated snapshots are taken, or you can take a manual snapshot any time. When you restore from a snapshot, Amazon Redshift creates a new cluster and makes the new cluster available before all of the data is loaded, so you can begin querying the new cluster immediately. The cluster streams data on demand from the snapshot in response to active queries, then loads the remaining data in the background. When you launch a cluster, you can set the retention period for automated and manual snapshots. You can change the retention period for automated and manual snapshots by modifying the cluster. You can change the retention period for a manual snapshot when you create the snapshot or by modifying the snapshot. You can take a manual snapshot any time. By default, manual snapshots are retained indefinitely, even after you delete your cluster. You can specify the retention period when you create a manual snapshot, or you can change the retention period by modifying the snapshot. If you create a snapshot using the Amazon Redshift console, it defaults the snapshot retention period to 365 days. If a snapshot is deleted, you can’t start any new operations that reference that snapshot. However, if a restore operation is in progress, that restore operation will run to completion. Option A is wrong as it would only copy the data. Option C is wrong as you cannot copy the automated snapshot.
Also, automated snapshots are controlled by time and size, and would not necessarily represent the data as it was immediately before the upgrade. Option D is wrong as waitForSnapshotAvailable needs to be called after triggering the manual snapshot creation, and availability can be verified from the console as well. 正确答案: B 正确答案是B,因为需要手动创建快照才能恢复Redshift到升级前的状态。 参考AWS文档 – Redshift快照。 快照是集群的时间点备份。快照有两种类型:自动快照和手动快照。Amazon Redshift通过使用加密的安全套接字层(SSL)连接,将这些快照存储在Amazon S3内部。 Amazon Redshift会自动创建增量快照,跟踪自上一个自动快照以来对集群所做的更改。自动快照保留所有恢复集群所需的数据。您可以创建快照计划,以控制自动快照的创建时间,或者可以随时创建手动快照。 当您从快照恢复时,Amazon Redshift会创建一个新集群,并在所有数据加载完成之前使新集群可用,这样您就可以立即开始查询新集群。集群按需从快照中流式传输数据,以响应活跃查询,然后在后台加载其余数据。 当您启动集群时,您可以设置自动快照和手动快照的保留期。您可以通过修改集群来更改自动快照和手动快照的保留期。您可以在创建手动快照时设置保留期,或者通过修改快照来更改保留期。 您可以随时创建手动快照。默认情况下,手动快照会被永久保留,即使您删除集群也不会被删除。您可以在创建手动快照时指定保留期,或者通过修改快照来更改保留期。如果您使用Amazon Redshift控制台创建快照,它默认将快照的保留期设置为365天。 如果快照被删除,您无法启动任何引用该快照的新操作。然而,如果恢复操作正在进行,该恢复操作将运行至完成。 选项A是错误的,因为它只是复制数据。 选项C是错误的,因为您不能复制自动快照。此外,自动快照受时间和大小的控制,不会代表升级前的数据。 选项D是错误的,因为waitForSnapshotAvailable必须在触发手动快照创建后调用,并且也可以从控制台验证。 2 / 100 分类: DBS 2. The department of transportation for a major metropolitan area has placed sensors on roads at key locations around the city. The goal is to analyze the flow of traffic and notifications from emergency services to identify potential issues and to help planners correct trouble spots. A data engineer needs a scalable and fault-tolerant solution that allows planners to respond to issues within 30 seconds of their occurrence. Which solution should the data engineer choose? 2. 一个主要大都市区的交通部门在城市周围的关键位置的道路上安装了传感器。目标是分析交通流量和紧急服务的通知,以识别潜在问题,并帮助规划者修正问题点。数据工程师需要一个可扩展且容错的解决方案,使规划者能够在问题发生后的30秒内做出响应。数据工程师应该选择哪个解决方案? A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for analysis. Collect emergency services events with Amazon SQS and store in Amazon DynamoDB for analysis. A. 使用 Amazon Kinesis Firehose 收集传感器数据,并将其存储在 Amazon Redshift 中进行分析。 使用 Amazon SQS 收集紧急服务事件,并将其存储在 Amazon DynamoDB 中进行分析。 B.
Collect the sensor data with Amazon SQS and store in Amazon DynamoDB for analysis. Collect emergency services events with Amazon Kinesis Firehose and store in Amazon Redshift for analysis. B. 使用Amazon SQS收集传感器数据并存储在Amazon DynamoDB中进行分析。 使用Amazon Kinesis Firehose收集紧急服务事件并存储在Amazon Redshift中进行分析。 C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use DynamoDB for analysis. C. 使用Amazon Kinesis Streams收集传感器数据和紧急服务事件,并使用DynamoDB进行分析。 D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use Amazon Redshift for analysis. D. 使用Amazon Kinesis Firehose收集传感器数据和紧急服务事件,并使用Amazon Redshift进行分析。 正确答案: A Correct answer is A as we need to tackle two issues. The first is to capture real-time sensor data and store it for analysis. The second is to respond to emergency notification events with low latency. The first can be handled using Kinesis Firehose to load data into Redshift for analysis. The second can be handled using SQS for notifications and DynamoDB for quick analysis or processing. Refer AWS documentation – Kinesis Firehose FAQs. Amazon Kinesis Data Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MBs and ranges from 1MB to 128MB for the Amazon S3 destination and 1MB to 100MB for the Amazon Elasticsearch Service destination. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds. Please note that in circumstances where data delivery to the destination is falling behind data writing to the delivery stream, Firehose raises the buffer size dynamically to catch up and make sure that all data is delivered to the destination. Option B is wrong as SQS is not suitable for real-time sensor data collection, nor is DynamoDB suitable for analytics.
Also, Redshift with Kinesis would not provide quick handling of data, as Kinesis Firehose works on buffer interval and buffer size. Option C is wrong as DynamoDB is not ideal for batch analytics. Option D is wrong as Kinesis Firehose would not work for emergency services handling, as it works on buffer interval and buffer size. 正确答案: A 正确答案是A,因为我们需要解决两个问题。第一个是捕获实时传感器数据并将其存储以供分析。第二个是以低延迟响应紧急通知事件。第一个问题可以使用Kinesis Firehose将数据加载到Redshift中进行分析。第二个问题可以使用SQS处理通知,使用DynamoDB进行快速分析或处理。 参考AWS文档 – Kinesis Firehose常见问题解答 Amazon Kinesis Data Firehose在将数据传输到目标之前,会将传入的流数据缓冲到一定的大小或一定的时间段。您可以在创建传输流时配置缓冲区大小和缓冲区间隔。缓冲区大小以MB为单位,对于Amazon S3目标范围为1MB到128MB,对于Amazon Elasticsearch Service目标范围为1MB到100MB。缓冲区间隔以秒为单位,范围为60秒到900秒。请注意,在数据传输到目标的速度落后于写入传输流的数据时,Firehose会动态增加缓冲区大小,以赶上并确保所有数据都传送到目标。 选项B是错误的,因为SQS不适合实时传感器数据收集,DynamoDB也不适合分析。此外,Redshift与Kinesis的组合无法提供快速的数据处理,因为Kinesis依赖于缓冲区间隔和缓冲区大小。 选项C是错误的,因为DynamoDB不适合批量分析。 选项D是错误的,因为Kinesis不适用于紧急服务处理,因为它依赖于缓冲区间隔和缓冲区大小。 3 / 100 分类: DBS 3. An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region. Which three steps should the data engineer take to accomplish this task? (Choose three.) 3. 一个 Amazon Redshift 数据库使用 KMS 进行加密。数据工程师需要使用 AWS CLI 在另一个 AWS 区域创建数据库的 KMS 加密快照。数据工程师应该采取哪三个步骤来完成此任务?(选择三项。) A. Create a new KMS key in the destination region. A. 在目标区域创建一个新的KMS密钥。 B. Copy the existing KMS key to the destination region. B. 将现有的 KMS 密钥复制到目标区域。 C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region. C. 使用 CreateSnapshotCopyGrant 允许 Amazon Redshift 使用来自目标区域的 KMS 密钥。 D. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region. D. 使用CreateSnapshotCopyGrant允许Amazon Redshift使用源区域的KMS密钥。 E. In the source region, enable cross-region replication and specify the name of the copy grant created. E. 在源区域,启用跨区域复制并指定已创建的复制授权名称。 F.
In the destination region, enable cross-region replication and specify the name of the copy grant created. F. 在目标区域,启用跨区域复制并指定创建的复制授权名称。 正确答案: A, C, E Correct answers are A, C & E. Option A, as KMS keys are specific to a region and a new key needs to be created in the destination region. Option C, as the grant needs to be provided for Redshift to use the master key in the destination region. Option E, as the replication needs to be enabled on the source region. Refer AWS documentation – Cross Region KMS Encrypted Snapshot & Redshift – Copying AWS KMS-Encrypted Snapshots to Another AWS Region. When you launch an Amazon Redshift cluster, you can choose to encrypt it with a master key from the AWS Key Management Service (AWS KMS). AWS KMS keys are specific to a region. If you want to enable cross-region snapshot copy for an AWS KMS-encrypted cluster, you must configure a snapshot copy grant for a master key in the destination region so that Amazon Redshift can perform encryption operations in the destination region. If you enable copying of Amazon Redshift snapshots to another AWS Region, and the source cluster and its snapshots are encrypted using a master key from AWS KMS, you need to configure a grant for Amazon Redshift to use a master key in the destination AWS Region. This grant enables Amazon Redshift to encrypt snapshots in the destination AWS Region. Option B is wrong as keys are specific to the region; new keys need to be created. Option D is wrong as the grant needs to be provided in the destination region. Option F is wrong as the cross-region replication needs to be enabled in the source region.
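For reference, the three correct steps map onto AWS CLI calls along these lines; the cluster identifier, grant name, key ID, and regions below are illustrative placeholders, not values from the question:

```shell
# Step A - create a new KMS key in the DESTINATION region
aws kms create-key --region us-west-2 \
    --description "Key for Redshift cross-region snapshot copies"

# Step C - in the destination region, create a snapshot copy grant
# so Redshift can use that key there
aws redshift create-snapshot-copy-grant --region us-west-2 \
    --snapshot-copy-grant-name my-snapshot-copy-grant \
    --kms-key-id <destination-kms-key-id>

# Step E - in the SOURCE region, enable cross-region snapshot copy,
# referencing the grant created above
aws redshift enable-snapshot-copy --region ap-southeast-2 \
    --cluster-identifier my-encrypted-cluster \
    --destination-region us-west-2 \
    --snapshot-copy-grant-name my-snapshot-copy-grant
```

Note that the grant is created against the destination region, while enable-snapshot-copy is issued against the source cluster's region.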
正确答案: A, C, E 正确答案是A, C和E,选项A是因为KMS密钥是区域特定的,需要在目标区域创建新的密钥。选项C是因为需要为Redshift提供授权,以便在目标区域使用主密钥。选项E是因为需要在源区域启用复制。 参考AWS文档 – 跨区域KMS加密快照和Redshift – 将AWS KMS加密快照复制到另一个AWS区域。 当您启动Amazon Redshift集群时,您可以选择使用来自AWS密钥管理服务(AWS KMS)的主密钥对其进行加密。AWS KMS密钥是特定于某个区域的。如果您希望启用AWS KMS加密集群的跨区域快照复制,必须为目标区域的主密钥配置快照复制授权,以便Amazon Redshift可以在目标区域执行加密操作。 AWS KMS密钥是特定于AWS区域的。如果您启用将Amazon Redshift快照复制到另一个AWS区域,并且源集群及其快照使用AWS KMS的主密钥加密,则需要为Amazon Redshift配置授权,以便在目标AWS区域使用主密钥。此授权使Amazon Redshift能够在目标AWS区域加密快照。 选项B是错误的,因为密钥是特定于区域的,需要创建新的密钥。 选项D是错误的,因为授权需要在目标区域提供。 选项F是错误的,因为需要在源区域启用跨区域复制。 4 / 100 分类: DBS 4. You have two different groups using Redshift to analyze data of a petabyte-scale data warehouse. Each query issued by the first group takes approximately 1-2 hours to analyze the data, while the second group’s queries only take between 5-10 minutes to analyze data. You don’t want the second group’s queries to wait until the first group’s queries are finished. You need to design a solution so that this does not happen. Which of the following would be the best and cheapest solution to deploy to solve this dilemma? 4. 你有两个不同的团队使用Redshift分析一个PB级数据仓库的数据。第一个团队发出的每个查询大约需要1到2个小时来分析数据,而第二个团队的查询只需要5到10分钟就能分析数据。你不希望第二个团队的查询在第一个团队的查询完成之前等待。你需要设计一个解决方案,确保这种情况不会发生。以下哪种解决方案是最合适且最经济的? A. Create a read replica of Redshift and run the second team’s queries on the read replica. A. 创建一个Redshift的只读副本,并在该副本上运行第二组团队的查询。 B. Create two separate workload management groups and assign them to the respective groups. B. 创建两个独立的工作负载管理组,并将它们分配到各自的组中。 C. Pause the long queries when necessary and resume them when there are no queries happening. C. 在必要时暂停长时间的查询,并在没有查询发生时恢复它们。 D. Start another Redshift cluster from a snapshot for the second team if the current Redshift cluster is busy processing long queries. D.
如果当前的 Redshift 集群正在忙于处理长时间运行的查询,请从快照中启动另一个 Redshift 集群供第二个团队使用。 正确答案: B Correct answer is B as Redshift workload management allows proper usage of cluster.,Refer to the AWS Blog for Redshift to run mixed workloads,Amazon Redshift Workload Management allows you to manage workloads of various sizes and complexity for specific environments. Parameter groups contain WLM configuration, which determines how many query queues are available for processing and how queries are routed to those queues. Following settings are available 正确答案: B 正确答案是B,因为Redshift工作负载管理允许集群的正确使用。请参阅AWS博客了解Redshift如何运行混合工作负载。 Amazon Redshift工作负载管理允许您管理不同规模和复杂度的工作负载,适用于特定环境。参数组包含WLM配置,决定了可用于处理的查询队列数量,以及查询如何被路由到这些队列。以下设置可用: 5 / 100 分类: DBS 5. 5. A telecommunications company needs to predict customer churn (i.e., customers who decide to switch to a competitor). The company has historic records of each customer, including monthly consumption patterns, calls to customer service, and whether the customer ultimately quit the service. All of this data is stored in Amazon S3. The company needs to know which customers are likely going to churn soon so that they can win back their loyalty. What is the optimal approach to meet these requirements? 5. 一家电信公司需要预测客户流失(即决定转向竞争对手的客户)。该公司拥有每个客户的历史记录,包括每月消费模式、客服电话记录以及客户是否最终停止使用该服务。所有这些数据都存储在Amazon S3中。公司需要知道哪些客户可能很快会流失,以便能够挽回他们的忠诚度。为了满足这些需求,最佳的做法是什么? A. A. Use the Amazon Machine Learning service to build the binary classification model based on the dataset stored in Amazon S3. The model will be used regularly to predict churn attribute for existing customers. A. 使用亚马逊机器学习服务,基于存储在亚马逊S3中的数据集构建二分类模型。该模型将定期用于预测现有客户的流失属性。 B. B. Use AWS QuickSight to connect it to data stored in Amazon S3 to obtain the necessary business insight. Plot the churn trend graph to extrapolate churn likelihood for existing customers. B. 使用AWS QuickSight将其连接到存储在Amazon S3中的数据,以获取必要的业务洞察。绘制流失趋势图,推算现有客户的流失可能性。 C. C. Use EMR to run the Hive queries to build a profile of a churning customer. 
Apply a profile to existing customers to determine the likelihood of churn. C. 使用EMR运行Hive查询,建立流失客户的画像。将该画像应用于现有客户,以确定流失的可能性。 D. D. Use a Redshift cluster to COPY the data from Amazon S3. Create a User Defined Function in Redshift that computes the likelihood of churn. D. 使用Redshift集群从Amazon S3复制数据。在Redshift中创建一个用户定义的函数,用于计算客户流失的可能性。 正确答案: A Correct answer is A as the simplest way to build the model is to use the Amazon Machine Learning (Amazon ML), using the binary classification model. As the company has historical data they can learn from and apply it from the new data.,Refer AWS documentation – Predicting Customer Churn with Amazon Machine Learning,Options B, C & D are wrong as they do not provide models or methods to apply to new data. 正确答案: A 正确答案是A,因为构建模型的最简单方法是使用Amazon机器学习(Amazon ML),并采用二分类模型。由于公司有历史数据可以学习并将其应用于新数据。 参考AWS文档 – 使用Amazon机器学习预测客户流失。 选项B、C和D是错误的,因为它们没有提供可以应用于新数据的模型或方法。 6 / 100 分类: DBS 6. 6. Your social media marketing application has a component written in Ruby running on AWS Elastic Beanstalk. This application component posts messages to social media sites in support of various marketing campaigns. Your management now requires you to record replies to these social media messages to analyze the effectiveness of the marketing campaign in comparison to past and future efforts. You’ve already developed a new application component to interface with the social media site APIs in order to read the replies. Which process should you use to record the social media replies in a durable data store that can be accessed at any time for analytics of historical data? 6. 你们的社交媒体营销应用程序包含一个用 Ruby 编写的组件,该组件在 AWS Elastic Beanstalk 上运行。这个应用程序组件向社交媒体网站发布消息,以支持各种营销活动。现在,管理层要求你记录这些社交媒体消息的回复,以便分析营销活动的效果,并与过去和未来的努力进行对比。你已经开发了一个新的应用组件,用于与社交媒体网站的 API 接口,以读取回复。那么,应该使用哪种流程将社交媒体的回复记录在一个耐用的数据存储中,以便随时进行历史数据的分析? A. A. 
Deploy the new application component in an Auto Scaling group of Amazon EC2 instances, read the data from the social media sites, store it with Amazon Elastic Block Store, and use AWS Data Pipeline to publish it to Amazon Kinesis for analytics. A. 在Amazon EC2实例的自动扩展组中部署新的应用程序组件,从社交媒体网站读取数据,使用Amazon Elastic Block Store存储数据,并使用AWS Data Pipeline将其发布到Amazon Kinesis进行分析。 B. B. Deploy the new application component as an Elastic Beanstalk application, read the data from the social media sites, store it in DynamoDB, and use Apache Hive with Amazon Elastic MapReduce for analytics. B. 将新的应用组件部署为Elastic Beanstalk应用,从社交媒体网站读取数据,将其存储在DynamoDB中,并使用Apache Hive与Amazon Elastic MapReduce进行分析。 C. C. Deploy the new application component in an Auto Scaling group of Amazon EC2 instances, read the data from the social media sites, store it in Amazon Glacier, and use AWS Data Pipeline to publish it to Amazon RedShift for analytics. C. 将新的应用组件部署到 Amazon EC2 实例的自动扩展组中,从社交媒体网站读取数据,将其存储在 Amazon Glacier 中,并使用 AWS 数据管道将其发布到 Amazon RedShift 进行分析。 D. D. Deploy the new application component as an Amazon Elastic Beanstalk application, read the data from the social media site, store it with Amazon Elastic Block store, and use Amazon Kinesis to stream the data to Amazon CloudWatch for analytics. D. 将新的应用程序组件作为Amazon Elastic Beanstalk应用程序部署,从社交媒体网站读取数据,将其存储在Amazon Elastic Block存储中,并使用Amazon Kinesis将数据流式传输到Amazon CloudWatch进行分析。 正确答案: B Correct answer is B as the point here is durable data store with any time analytics the best option is to store the data in DynamoDB and use Apache Hive with Amazon Elastic MapReduce for analytics.,Refer AWS documentation – DynamoDB EMR Hive Processing,Option A is wrong as Elastic Block Store is not ideal for storing social media data,Option C is wrong as Amazon Glacier is not an ideal for storing social media data,Option D is wrong as Elastic Block Store is not ideal for storing social media data and CloudWatch is not for analytics. 
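As a sketch of the chosen approach, the replies stored in DynamoDB can be exposed to Hive on EMR as an external table via EMR's DynamoDB storage handler and then queried for historical analytics. Table, column, and attribute names here are hypothetical:

```sql
-- Run in the Hive shell on the EMR cluster.
CREATE EXTERNAL TABLE social_replies (
  reply_id   string,
  message_id string,
  body       string
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "SocialReplies",
  "dynamodb.column.mapping" =
    "reply_id:ReplyId,message_id:MessageId,body:Body"
);

-- Ad-hoc analytics over the durable store, e.g. replies per campaign message:
SELECT message_id, COUNT(*) AS reply_count
FROM social_replies
GROUP BY message_id;
```

Because the table is external, Hive reads the items directly from DynamoDB at query time; the DynamoDB table remains the durable system of record.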
正确答案: B 正确答案是B,因为这里的关键是持久化数据存储和随时分析,最佳选择是将数据存储在DynamoDB中,并使用Apache Hive和Amazon Elastic MapReduce进行分析。参考AWS文档 – DynamoDB EMR Hive处理。 选项A是错误的,因为Elastic Block Store不适合存储社交媒体数据。 选项C是错误的,因为Amazon Glacier不适合存储社交媒体数据。 选项D是错误的,因为Elastic Block Store不适合存储社交媒体数据,且CloudWatch不用于分析。 7 / 100 分类: DBS 7. 7. An organization needs to design and deploy a large-scale data storage solution that will be highly durable and highly flexible with respect to the type and structure of data being stored. The data to be stored will be sent or generated from a variety of sources and must be persistently available for access and processing by multiple applications. What is the most cost-effective technique to meet these requirements? 7. 一个组织需要设计并部署一个大规模的数据存储解决方案,该解决方案在数据存储类型和结构方面具有高度的耐用性和灵活性。要存储的数据将来自各种来源,并且必须始终可用,以便多个应用程序能够访问和处理这些数据。满足这些要求的最具成本效益的技术是什么? A. A. Use Amazon Simple Storage Service (S3) as the actual data storage system, coupled with appropriate tools for ingestion/acquisition of data and for subsequent processing and querying. A. 使用Amazon简单存储服务(S3)作为实际的数据存储系统,并结合适当的工具进行数据的获取/采集,以及随后的处理和查询。 B. B. Deploy a long-running Amazon Elastic MapReduce (EMR) cluster with Amazon Elastic Block Store (EBS) volumes for persistent HDFS storage and appropriate Hadoop ecosystem tools for processing and querying. B. 部署一个长期运行的 Amazon Elastic MapReduce (EMR) 集群,配备 Amazon Elastic Block Store (EBS) 卷用于持久化 HDFS 存储,并提供适当的 Hadoop 生态系统工具用于处理和查询。 C. C. Use Amazon Redshift with data replication to Amazon Simple Storage Service (S3) for comprehensive durable data storage, processing, and querying. C. 使用 Amazon Redshift 与数据复制到 Amazon Simple Storage Service (S3),实现全面的持久数据存储、处理和查询。 D. D. Launch an Amazon Relational Database Service (RDS), and use the enterprise grade and capacity of the Amazon Aurora engine for storage, processing, and querying. D. 
启动亚马逊关系数据库服务(RDS),并利用亚马逊Aurora引擎的企业级性能和容量进行存储、处理和查询。 正确答案: A Correct answer is A as S3 can provide the most cost-effective solution to store data while providing highly durable and highly flexible storage option with respect to the type and structure of data.,Option B is wrong and HDFS would not be a cost-effective option as compared to S3.,Options C & D are wrong as they do not have flexibility in terms of data type and structure and would not be cost-effective as well. 正确答案: A 正确答案是A,因为S3可以提供最具成本效益的数据存储解决方案,同时在数据类型和结构方面提供高度耐用和灵活的存储选项。 选项B是错误的,HDFS与S3相比不会是一个具有成本效益的选择。 选项C和D是错误的,因为它们在数据类型和结构方面缺乏灵活性,也不会是具有成本效益的选择。 8 / 100 分类: DBS 8. 8. You have been asked to handle a large data migration from multiple Amazon RDS MySQL instances to a DynamoDB table. You have been given a short amount of time to complete the data migration. What will allow you to complete this complex data processing workflow? 8. 您被要求处理从多个 Amazon RDS MySQL 实例到 DynamoDB 表的大规模数据迁移。您被给定了很短的时间来完成数据迁移。什么方法将帮助您完成这个复杂的数据处理工作流? A. A. Create an Amazon Kinesis data stream, pipe in all of the Amazon RDS data, and direct the data toward a DynamoDB table. A. 创建一个Amazon Kinesis数据流,将所有Amazon RDS数据传入,并将数据定向到DynamoDB表。 B. B. Write a script in your language of choice, install the script on an Amazon EC2 instance, and then use Auto Scaling groups to ensure that the latency of the migration pipelines never exceeds four seconds in any 15-minute period. B. 用你选择的语言编写一个脚本,将该脚本安装在一个 Amazon EC2 实例上,然后使用自动扩展组确保迁移管道的延迟在任何 15 分钟内都不超过四秒钟。 C. C. Write a bash script to run on your Amazon RDS instance that will export data into DynamoDB. C. 编写一个bash脚本,在您的Amazon RDS实例上运行,将数据导出到DynamoDB。 D. D. Create a data pipeline to export Amazon RDS data and import the data into DynamoDB. D. 
创建一个数据管道,将Amazon RDS数据导出并导入到DynamoDB中。 正确答案: D Correct answer is D as Data Pipeline can be used to import the data from MySQL and Export it to DynamoDB as batch.,Refer AWS documentation – Near Zero Downtime Migration from MySQL to DynamoDB, Data Pipeline Export MySQL & Data Pipeline Import DynamoDB,Option A is wrong as Kinesis data stream cannot emit data directly to DynamoDB table and would need a consumer. Also Kinesis is best for real-time puts,Option B is wrong as it doesn’t define how the migration is happening,Option C is wrong as You do not have access to RDS instance. 正确答案: D 正确答案是 D,因为数据管道可以用来从 MySQL 导入数据,并将其作为批量导出到 DynamoDB。请参考 AWS 文档 – 从 MySQL 到 DynamoDB 的近零停机迁移,数据管道导出 MySQL 和数据管道导入 DynamoDB。 选项 A 错误,因为 Kinesis 数据流不能直接将数据发送到 DynamoDB 表,并且需要一个消费者。此外,Kinesis 最适合实时写入。 选项 B 错误,因为它没有定义迁移的具体方式。 选项 C 错误,因为您无法访问 RDS 实例。 9 / 100 分类: DBS 9. 9. A retailer exports data daily from its transactional databases into an S3 bucket in the Sydney region. The retailer’s Data Warehousing team wants to import this data into an existing Amazon Redshift cluster in their VPC at Sydney. Corporate security policy mandates that data can only be transported within a VPC. What combination of the following steps will satisfy the security policy? Choose 2 answers 9. 一家零售商每天将数据从其事务数据库导出到位于悉尼地区的S3存储桶中。该零售商的数据仓库团队希望将这些数据导入到他们悉尼VPC中现有的Amazon Redshift集群中。公司安全政策要求数据只能在VPC内部传输。以下哪些步骤的组合能够满足安全政策?请选择两个答案。 A. A. Enable Amazon Redshift Enhanced VPC Routing. A. 启用 Amazon Redshift 增强 VPC 路由。 B. B. Create a Cluster Security Group to allow the Amazon Redshift cluster to access Amazon S3. B. 创建一个集群安全组,允许Amazon Redshift集群访问Amazon S3。 C. C. Create a NAT gateway in a public subnet to allow the Amazon Redshift cluster to access Amazon S3. C. 在公共子网中创建一个NAT网关,以允许Amazon Redshift集群访问Amazon S3。 D. D. Create and configure an Amazon S3 VPC endpoint. D. 
创建并配置一个 Amazon S3 VPC 端点。 正确答案: A, D Correct answers are A & D, as Redshift Enhanced VPC Routing helps access AWS services, including S3, through the VPC, without having to route any traffic through the internet. Also, note the region is the same. Refer AWS documentation – Redshift Enhanced VPC Routing. When you use Amazon Redshift Enhanced VPC Routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC. You can then use standard VPC features, such as VPC security groups, network access control lists (ACLs), VPC endpoints, VPC endpoint policies, Internet gateways, and Domain Name System (DNS) servers, to tightly manage the flow of data between your Amazon Redshift cluster and other resources. When you use Enhanced VPC Routing to route traffic through your VPC, you can also use VPC flow logs to monitor COPY and UNLOAD traffic. If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the Internet, including traffic to other services within the AWS network. VPC Endpoints – For traffic to an Amazon S3 bucket in the same region as your cluster, you can create a VPC endpoint to direct traffic directly to the bucket. When you use VPC endpoints, you can attach an endpoint policy to manage access to Amazon S3. Option B is wrong as a cluster security group alone does not keep Redshift-to-S3 traffic inside the VPC. Option C is wrong as a NAT gateway enables connectivity only via the Internet, to other AWS services, or to hosts outside AWS. NAT gateway – To connect to an Amazon S3 bucket in another region or to another service within the AWS network, or to access a host instance outside the AWS network, you can configure a network address translation (NAT) gateway.
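The two chosen steps can be sketched with the AWS CLI; the cluster identifier, VPC ID, and route table ID below are illustrative placeholders:

```shell
# Option A - enable Enhanced VPC Routing on the existing cluster
aws redshift modify-cluster \
    --cluster-identifier dw-cluster \
    --enhanced-vpc-routing

# Option D - create a gateway VPC endpoint for S3 in the cluster's VPC,
# so COPY traffic to the same-region bucket stays inside the VPC
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234 \
    --service-name com.amazonaws.ap-southeast-2.s3 \
    --route-table-ids rtb-0abc1234
```

With both in place, COPY from the Sydney bucket resolves through the endpoint's route-table entry rather than an internet path.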
正确答案: A, D 正确答案是 A 和 D,因为 Redshift 增强型 VPC 路由帮助通过 VPC 访问 AWS 服务,包括 S3,而无需通过互联网路由任何流量。另请注意,区域是相同的。 参考 AWS 文档 – Redshift 增强型 VPC 路由。 当您使用 Amazon Redshift 增强型 VPC 路由时,Amazon Redshift 强制所有 COPY 和 UNLOAD 流量通过您的 Amazon VPC 在您的集群与数据存储库之间传输。现在,您可以使用标准 VPC 功能,如 VPC 安全组、网络访问控制列表(ACLs)、VPC 终端节点、VPC 终端节点策略、互联网网关和域名系统(DNS)服务器,来严格管理您的 Amazon Redshift 集群与其他资源之间的数据流动。当您使用增强型 VPC 路由通过您的 VPC 路由流量时,您还可以使用 VPC 流日志来监控 COPY 和 UNLOAD 流量。 如果没有启用增强型 VPC 路由,Amazon Redshift 会通过互联网路由流量,包括到 AWS 网络内其他服务的流量。 VPC 终端节点 – 对于与集群位于同一区域的 Amazon S3 存储桶的流量,您可以创建 VPC 终端节点将流量直接引导到存储桶。当您使用 VPC 终端节点时,可以附加终端节点策略来管理对 Amazon S3 的访问。 选项 B 错误,因为 Redshift 无法在没有互联网的情况下直接访问 S3。 选项 C 错误,因为 NAT 仅通过互联网或其他 AWS 服务启用对服务的连接。 NAT 网关 – 若要连接到另一区域的 Amazon S3 存储桶或连接到 AWS 网络内的其他服务,或者访问 AWS 网络外的主机实例,您可以配置网络地址转换(NAT)网关。 10 / 100 分类: DBS 10. You have an application that is currently in the development stage but is expected to write 2,400 items per minute to a DynamoDB table, each 2 KB in size or less, and then fluctuate to 4,800 writes of items (of the same size) per minute on weekends. There may be other fluctuations within that range in the future as the application develops. It is important to the success of the application that the vast majority of user requests are met in a cost-effective way. How should this table be created? 10. 你有一个应用程序,目前处于开发阶段,但预计每分钟向DynamoDB表写入2,400个项目,每个项目的大小为2Kb或更小,然后在周末波动到每分钟4,800个项目的写入(大小相同)。随着应用程序的发展,未来可能会在该范围内出现其他波动。为了确保应用程序的成功,大多数用户请求需要以具有成本效益的方式得到满足。这个表应该如何创建? A. Provision a base WCU of 80 and then schedule regular increases to 160 WCUs when a higher load is expected. A. 提供一个基础的80个WCU,然后在预计负载较高时,安排定期增加到160个WCU。 B. Set up an auto-scaling policy on the DynamoDB table that doesn’t let the capacity dip below the usual load and allows it to scale to meet demand. B. 在DynamoDB表上设置自动扩展策略,确保流量不会低于正常负载,并允许其根据需求进行扩展。 C. Enable DynamoDB Streams and have a Lambda function triggered on each change to the table to review the current capacity. C. 启用DynamoDB流,并触发一个Lambda函数,在每次表格发生变化时检查当前的容量。 D.
Provision a base WCU of 160 and then schedule a job that adds 160 more WCUs when a higher load is expected. D. 提供一个基础的WCU为160,然后安排一个任务,在预计更高负载时增加160个WCU。 正确答案: B Correct answer is B as DynamoDB Auto Scaling can help scale as per the demand. Refer AWS documentation – DynamoDB Auto Scaling. Many database workloads are cyclical in nature or are difficult to predict in advance. For example, consider a social networking app where most of the users are active during daytime hours. The database must be able to handle the daytime activity, but there’s no need for the same levels of throughput at night. Another example might be a new mobile gaming app that is experiencing rapid adoption. If the game becomes too popular, it could exceed the available database resources, resulting in slow performance and unhappy customers. These kinds of workloads often require manual intervention to scale database resources up or down in response to varying usage levels. DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, Application Auto Scaling decreases the throughput so that you don’t pay for unused provisioned capacity. Option A is wrong as it is more of a manual effort and not a cost-effective way as compared to B. Option C is wrong as DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours; it cannot help review the current capacity. Option D is wrong as it is not cost-effective to provision the throughput to the maximum required.
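For context on the figures of 80 and 160 WCU in options A and D: one WCU covers one write of up to 1 KB per second, so each 2 KB item costs 2 WCU. A quick sanity check of the arithmetic:

```shell
# 2,400 writes/min / 60 s = 40 writes/s; each 2 KB item costs 2 WCU
baseline=$(( 2400 / 60 * 2 ))
weekend=$(( 4800 / 60 * 2 ))
echo "baseline=${baseline} WCU, weekend peak=${weekend} WCU"   # 80 and 160
```

With auto scaling (option B), these figures would typically bound the scaling policy, e.g. a minimum near 80 WCU and a maximum at or above 160 WCU, so capacity follows demand without manually scheduled changes.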
正确答案: B 正确答案是B,因为DynamoDB自动扩展可以根据需求进行扩展。参见AWS文档 – DynamoDB自动扩展。 许多数据库工作负载具有周期性特征,或者很难提前预测。例如,考虑一个社交网络应用,大多数用户在白天活跃。数据库必须能够处理白天的活动,但晚上无需相同水平的吞吐量。另一个例子可能是一个新兴的手机游戏应用,正在经历快速增长。如果游戏变得过于受欢迎,它可能会超出可用的数据库资源,导致性能缓慢和客户不满。这些类型的工作负载通常需要人工干预来根据使用水平的变化扩展或缩减数据库资源。 DynamoDB自动扩展使用AWS应用程序自动扩展服务,根据实际的流量模式动态调整您预配置的吞吐容量。这使得表或全局二级索引能够增加其预配置的读写容量,以处理突如其来的流量增加,而不会出现限流。当工作负载减少时,应用程序自动扩展会减少吞吐量,以避免为未使用的预配置容量付费。 选项A是错误的,因为这更多是人工操作,相比B来说并不是一种具有成本效益的方法。 选项C是错误的,因为DynamoDB流捕获了任何DynamoDB表中按时间顺序排列的项级修改,并将这些信息存储在日志中,最多24小时。它不能帮助审查当前容量。 选项D是错误的,因为将吞吐量预配置到最大需求并不是一种具有成本效益的方法。 11 / 100 分类: DBS 11. Your company recently purchased five different companies that run different backend databases that include Redshift, MySQL, Hive on EMR and PostgreSQL. You need a single tool that can run queries on all the different platforms for your daily ad-hoc analysis. Which tool enables you to do that? 11. 你们公司最近购买了五家不同的公司,这些公司运行着不同的后台数据库,包括Redshift、MySQL、Hive on EMR和PostgreSQL。你需要一个可以在所有这些不同平台上运行查询的工具,用于每天的临时分析。哪个工具可以帮助你实现这一点? A. Presto A. Presto B. QuickSight B. QuickSight C. Ganglia C. Ganglia D. YARN D. YARN 正确答案: A Correct answer is A as Presto allows ad hoc query analysis over multiple data sources. Refer AWS documentation – Presto. Presto (or PrestoDB) is an open source, distributed SQL query engine, designed from the ground up for fast analytic queries against data of any size. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and relational data sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata. Presto can query data where it is stored, without needing to move data into a separate analytics system.
Query execution runs in parallel over a pure memory-based architecture, with most results returning in seconds.,Option B is wrong as QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization.,Option C is wrong as Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids,Option D is wrong as YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework. 正确答案: A 正确答案是A,因为Presto允许在多个数据源上进行临时查询分析。参考AWS文档 – Presto。 Presto(或PrestoDB)是一个开源的分布式SQL查询引擎,专为快速分析查询设计,支持处理任何大小的数据。它支持非关系型数据源,如Hadoop分布式文件系统(HDFS)、Amazon S3、Cassandra、MongoDB和HBase,以及关系型数据源,如MySQL、PostgreSQL、Amazon Redshift、Microsoft SQL Server和Teradata。 Presto可以在数据存储的位置进行查询,无需将数据移入单独的分析系统。查询执行在纯内存架构上并行运行,大部分结果在几秒钟内返回。 选项B是错误的,因为QuickSight是一个快速的、云支持的商业智能服务,使得将洞察力轻松传递给组织中的每个人变得容易。 选项C是错误的,因为Ganglia是一个可扩展的分布式监控系统,适用于高性能计算系统,如集群和Grid。 选项D是错误的,因为YARN是开源Hadoop分布式处理框架中的资源管理和作业调度技术。 12 / 100 分类: DBS 12. 12. A company has lot of web applications, databases and data warehouse built on Teradata, NoSQL databases, and other types of data stores. They have lot of data assets in terms of logs, documents; excel files, CSV files, PDF documents and others. Web Application has different user workloads at different parts of the day. They are running one of their web application Node.js supported by MongoDB Database. The schema designed is document based. The team wants to migrate the platform on to AWS. Which NoSQL Managed service provides the document management capability? 12. 一家公司拥有大量基于Teradata、NoSQL数据库和其他类型数据存储构建的Web应用程序、数据库和数据仓库。它们拥有大量的数据资产,包括日志、文档、Excel文件、CSV文件、PDF文档等。Web应用程序在一天的不同时间段有不同的用户工作负载。它们正在运行其中一个基于Node.js的Web应用程序,支持MongoDB数据库。所设计的架构是基于文档的。团队希望将平台迁移到AWS。那么,哪个NoSQL托管服务提供文档管理功能? A. A. Amazon Aurora Database, being a multi-modal database support document models and NoSQL requirements A. 
Amazon Aurora 数据库作为一个多模数据库,支持文档模型和 NoSQL 需求。 B. B. Amazon RDS Database, being a multi-modal database support document models and NoSQL requirements B. 亚马逊RDS数据库,作为一个多模式数据库,支持文档模型和NoSQL需求 C. C. Amazon DynamoDB Database, being a document database support document models and NoSQL requirements C. Amazon DynamoDB 数据库,作为一个文档数据库,支持文档模型和 NoSQL 要求 D. D. Amazon Neptune Database, being a graph database support document models and NoSQL requirements D. Amazon Neptune 数据库,作为一个图数据库,支持文档模型和 NoSQL 需求 正确答案: C Correct answer is C as Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.,Option A is wrong as Amazon Aurora (Aurora) is a fully managed relational database engine that’s compatible with MySQL and PostgreSQL. Amazon Aurora supports relational data models and does not support the document model.,Option B is wrong as Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. Amazon RDS supports relational data models and does not support the document model.,Option D is wrong as Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. 正确答案: C 正确答案是C,因为Amazon DynamoDB是一个完全托管的NoSQL数据库服务,提供快速和可预测的性能,并具有无缝的可扩展性。 选项A是错误的,因为Amazon Aurora(Aurora)是一个完全托管的关系数据库引擎,兼容MySQL和PostgreSQL。Amazon Aurora支持关系数据模型,但不支持文档模型。 选项B是错误的,因为Amazon关系数据库服务(Amazon RDS)是一个Web服务,使得在云中设置、操作和扩展关系数据库变得更加容易。Amazon RDS支持关系数据模型,但不支持文档模型。 选项D是错误的,因为Amazon Neptune是一个快速、可靠、完全托管的图数据库服务,使得构建和运行与高度连接的数据集交互的应用变得容易。 13 / 100 分类: DBS 13. 13. An international company has deployed a multi-tier web application that relies on DynamoDB in a single region. For regulatory reasons they need disaster recovery capability in a separate region with a Recovery Time Objective of 2 hours and a Recovery Point Objective of 24 hours.
They should synchronize their data on a regular basis and be able to provision the web application rapidly using CloudFormation. The objective is to minimize changes to the existing web application, control the throughput of DynamoDB used for the synchronization of data and synchronize only the modified elements. Which design would you choose to meet these requirements? 13. 一家国际公司在单一区域部署了一个多层次的网页应用程序,该应用程序依赖于DynamoDB。由于法规要求,他们需要在另一个区域具备灾难恢复能力,恢复时间目标为2小时,恢复点目标为24小时。他们应该定期同步数据,并能够使用CloudFormation快速配置网页应用程序。目标是尽量减少对现有网页应用程序的更改,控制用于数据同步的DynamoDB的吞吐量,并仅同步已修改的元素。为了满足这些要求,您会选择哪种设计方案? A. A. Use AWS Data Pipeline to schedule a DynamoDB cross region copy once a day. Create a ‘Lastupdated’ attribute in your DynamoDB table that would represent the timestamp of the last update and use it as a filter A. 使用AWS数据管道每天安排一次DynamoDB跨区域复制。 在DynamoDB表中创建一个“Lastupdated”属性,该属性表示最后更新时间的时间戳,并将其用作过滤器。 B. B. Use EMR and write a custom script to retrieve data from DynamoDB in the current region using a SCAN operation and push it to DynamoDB in the second region. B. 使用EMR并编写自定义脚本,使用SCAN操作从当前区域的DynamoDB中检索数据,并将其推送到第二个区域的DynamoDB中。 C. C. Use AWS Data Pipeline to schedule an export of the DynamoDB table to S3 in the current region once a day then schedule another task immediately after it that will import data from S3 to DynamoDB in the other region. C. 使用AWS数据管道安排每天一次将DynamoDB表导出到当前区域的S3,然后在其后立即安排另一个任务,将数据从S3导入到另一个区域的DynamoDB。 D. D. Send each update into an SQS queue in the second region; use an auto-scaling group behind the SQS queue to replay the write in the second region. D. 将每个更新发送到第二个区域的SQS队列;使用一个自动扩展组在SQS队列后面重放在第二个区域的写操作。 正确答案: A Correct answer is A as the key requirement here is DR with RTO of 2 hours and a RPO of 24 hours with only the changed items to be replicated. 
DynamoDB cross region copy would help for DR with required RPO and RTO with Lastupdated time would help replicate only updated items.,Refer AWS DynamoDB Data Copy Between Regions Blog,Option B is wrong the scan operation is expensive and time consuming and would not help meet RTO. Also, there is no handling for only updated data.,Option C is wrong is time consuming and would not help meet the RTO. Also, there is no handling for only updated data.,Option D is wrong as this needs update to the application to push data to DynamoDB as well the SQS in a reliable manner. 正确答案: A 正确答案是A,因为这里的关键要求是DR,RTO为2小时,RPO为24小时,并且只复制已更改的项目。DynamoDB跨区域复制可以帮助满足DR要求,同时提供所需的RPO和RTO,Lastupdated时间将帮助仅复制更新的项目。参考AWS DynamoDB跨区域数据复制博客。 选项B是错误的,扫描操作开销大且耗时,不会帮助满足RTO。同时,无法处理仅更新的数据。 选项C是错误的,耗时且无法帮助满足RTO。同时,无法处理仅更新的数据。 选项D是错误的,因为这需要更新应用程序,将数据以可靠的方式推送到DynamoDB以及SQS。 14 / 100 分类: DBS 14. 14. You work for a start-up that tracks commercial delivery trucks via GPS. You receive coordinates that are transmitted from each delivery truck once every 6 seconds. You need to process these coordinates in real-time from multiple sources and load them into Elasticsearch without significant technical overhead to maintain. Which tool should you use to digest the data? 14. 你在一家跟踪商业配送卡车的初创公司工作,通过GPS跟踪卡车的位置。你每6秒钟接收到一次从每辆配送卡车传输的坐标数据。你需要实时处理来自多个来源的这些坐标,并将它们加载到Elasticsearch中,同时不需要太大的技术维护负担。你应该使用哪个工具来处理这些数据? A. A. Amazon Kinesis Firehose A. 亚马逊 Kinesis Firehose B. B. Amazon EMR B. 亚马逊 EMR C. C. AWS Data Pipeline C. AWS 数据管道 D. D. Amazon SQS D. 亚马逊SQS 正确答案: A Correct answer is A as Kinesis Data Firehose can be used to transfer data directly to Elasticsearch, without any handling.,Refer AWS documentation – Kinesis Firehose,Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data stores and analytics tools. 
It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.,Option B is wrong as EMR is for batch analytics.,Option C is wrong as AWS Data Pipeline does not capture real time data. AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.,Option D is wrong as SQS is a message service and would need handling to storage to Elasticsearch. 正确答案: A 正确答案是 A,因为 Kinesis Data Firehose 可以直接将数据传输到 Elasticsearch,而无需任何处理。参考 AWS 文档 – Kinesis Firehose。 Amazon Kinesis Data Firehose 是最简单、可靠地将流数据加载到数据存储和分析工具中的方式。它可以捕获、转换并将流数据加载到 Amazon S3、Amazon Redshift、Amazon Elasticsearch Service 和 Splunk,使得您现有的业务智能工具和仪表盘能够进行近实时分析。它是一个完全托管的服务,能够自动扩展以匹配您的数据吞吐量,并且无需持续管理。它还可以在加载之前对数据进行批处理、压缩、转换和加密,从而最小化目标存储使用的空间并提高安全性。 选项 B 错误,因为 EMR 是用于批量分析的。 选项 C 错误,因为 AWS Data Pipeline 不捕获实时数据。AWS Data Pipeline 是一个网络服务,帮助您在指定的时间间隔内可靠地处理和移动不同 AWS 计算和存储服务之间的数据,以及本地数据源。 选项 D 错误,因为 SQS 是一个消息服务,需要处理才能将数据存储到 Elasticsearch。 15 / 100 分类: DBS 15. 15. A solutions architect works for a company that has a data lake based on a central Amazon S3 bucket. The data contains sensitive information. The architect must be able to specify exactly which files each user can access. Users access the platform through a SAML federation Single Sign On platform. 
The architect needs to build a solution that allows fine grained access control, traceability of access to the objects, and usage of the standard tools (AWS Console, AWS CLI) to access the data. Which solution should the architect build? 15. 一名解决方案架构师为一家基于中央 Amazon S3 存储桶的数据湖公司工作。该数据包含敏感信息。架构师必须能够精确指定每个用户可以访问哪些文件。用户通过 SAML 联邦单点登录平台访问该平台。架构师需要构建一个解决方案,允许细粒度的访问控制、访问对象的可追溯性,并使用标准工具(AWS 控制台、AWS CLI)来访问数据。架构师应构建哪种解决方案? A. A. Use Amazon S3 Server-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing. A. 使用Amazon S3服务器端加密与AWS KMS管理的密钥来存储数据。 使用AWS KMS授权来允许访问平台的特定元素。 使用AWS CloudTrail进行审计。 B. B. Use Amazon S3 Server-Side Encryption with Amazon S3-Managed Keys. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing. B. 使用亚马逊S3服务器端加密,并使用亚马逊S3托管的密钥。设置亚马逊S3访问控制列表(ACL)以允许访问平台的特定元素。使用亚马逊S3访问日志以进行审计。 C. C. Use Amazon S3 Client-Side Encryption with Client-Side Master Key. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing. C. 使用 Amazon S3 客户端端加密与客户端主密钥。设置 Amazon S3 访问控制列表 (ACL),以允许访问平台的特定元素。使用 Amazon S3 访问日志以进行审计。 D. D. Use Amazon S3 Client-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing. D. 使用 Amazon S3 客户端加密与 AWS KMS 管理的密钥来存储数据。使用 AWS KMS 授权来允许访问平台的特定元素。使用 AWS CloudTrail 进行审计。 正确答案: B Correct answer is B as S3 Server Side Encryption with S3 Managed Keys provide encryption. S3 ACLs allows fine grained control access and S3 to access logs would help provide traceability across all tools.,Use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) – Each object is encrypted with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. 
Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.,Option C is wrong as with Client-Side Encryption, the users must have the keys to decrypt the data.,When downloading an object—The client downloads the encrypted object from Amazon S3. Using the material description from the object’s metadata, the client determines which master key to use to decrypt the data key. The client uses that master key to decrypt the data key and then uses the data key to decrypt the object.,Options A & D are wrong as KMS Grants are mainly to provide access to the KMS keys. There is no mention of fine-grained control over the S3 objects. 正确答案: B 正确答案是B,因为S3服务器端加密使用S3管理的密钥提供加密功能。S3访问控制列表(ACLs)允许精细控制访问,S3访问日志有助于提供跨所有工具的可追溯性。 使用Amazon S3管理密钥的服务器端加密(SSE-S3) – 每个对象都会使用唯一的密钥进行加密。作为额外的保障,它会使用定期轮换的主密钥加密该密钥本身。Amazon S3服务器端加密使用最强的块加密算法之一,256位高级加密标准(AES-256)来加密您的数据。 选项C是错误的,因为在客户端加密中,用户必须拥有密钥才能解密数据。 当下载一个对象时——客户端从Amazon S3下载加密对象。通过使用对象元数据中的材料描述,客户端确定使用哪个主密钥来解密数据密钥。客户端使用该主密钥解密数据密钥,然后使用数据密钥解密对象。 选项A和D是错误的,因为KMS授权主要是提供对KMS密钥的访问权限。没有提到对S3对象的精细控制。 16 / 100 分类: DBS 16. 16. A mobile application collects data that must be stored in multiple Availability Zones within five minutes of being captured in the app. What architecture securely meets these requirements? 16. 一个移动应用收集的数据,必须在应用中采集后五分钟内存储到多个可用区中。什么架构能安全地满足这些要求? A. A. The mobile app should write to an S3 bucket that allows anonymous PutObject calls. A. 移动应用应该写入一个允许匿名 PutObject 调用的 S3 存储桶。 B. B. The mobile app should authenticate with an Amazon Cognito identity that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination. B. 移动应用程序应通过已授权写入具有Amazon S3目标的Amazon Kinesis Firehose的Amazon Cognito身份进行身份验证。 C. C. The mobile app should authenticate with an embedded IAM access key that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination. C. 移动应用应使用嵌入的IAM访问密钥进行身份验证,该密钥被授权写入具有Amazon S3目标的Amazon Kinesis Firehose。 D.
D. The mobile app should call a REST-based service that stores data on Amazon EBS. Deploy the service on multiple EC2 instances across two Availability Zones. D. 移动应用程序应调用一个基于REST的服务,该服务将数据存储在Amazon EBS上。将该服务部署在跨两个可用区的多个EC2实例上。 正确答案: B Correct answer is B as it is essential when writing mobile applications that you consider the security of both how the application authenticates and how it stores credentials. Amazon Cognito gives you the ability to securely authenticate pools of users on any type of device at scale.,Option A is wrong as it uses an anonymous Put, which may allow other apps to write counterfeit data;,Option C is wrong as it would put credentials directly into the application, which is strongly discouraged because applications can be decompiled which can compromise the keys.,Option D is wrong as it does not meet our availability requirements: although the EC2 instances are running in different Availability Zones, the EBS volumes attached to each instance only store data in a single Availability Zone. 正确答案: B 正确答案是B,因为在编写移动应用程序时,考虑到应用程序的认证方式和凭证存储方式的安全性至关重要。Amazon Cognito使您能够在任何类型的设备上大规模安全地验证用户池。 选项A是错误的,因为它使用了匿名Put,这可能允许其他应用程序写入伪造数据。 选项C是错误的,因为它会将凭证直接放入应用程序中,这是强烈不建议的,因为应用程序可以被反编译,从而泄露密钥。 选项D是错误的,因为它没有满足我们的可用性要求:尽管EC2实例运行在不同的可用区,但附加到每个实例的EBS卷仅在单一可用区内存储数据。 17 / 100 分类: DBS 17. 17. You are using QuickSight to identify demand trends over multiple months for your top five product lines. Which type of visualization do you choose? 17. 你正在使用QuickSight来识别你前五大产品线在多个月份的需求趋势。你选择哪种类型的可视化? A. A. Scatter Plot A. 散点图 B. B. Pie Chart B. 饼图 C. C. Pivot Table C. 数据透视表 D. D. Line Chart D. 
折线图 正确答案: D Correct answer is D as you need to represent time driven data for demand trends over multiple months, Line chart would be an ideal choice.,Refer AWS documentation – QuickSight Visual Types,Option A is wrong as Scatter plots can help to visualize two or three measures for a dimension.,Option B is wrong as Pie charts can help to compare values for items in a dimension.,Option C is wrong as Pivot tables can help to show measure values for the intersection of two dimensions. 正确答案: D 正确答案是 D,因为你需要表示跨多个月份的需求趋势的时间驱动数据,折线图是理想的选择。 参考 AWS 文档 – QuickSight 可视化类型。 选项 A 错误,因为散点图可以帮助可视化一个维度的两个或三个度量。 选项 B 错误,因为饼图可以帮助比较维度中项目的值。 选项 C 错误,因为透视表可以帮助显示两个维度交集的度量值。 18 / 100 分类: DBS 18. 18. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data be encrypted at rest. Which of the following methods can achieve this? Choose 3 answers 18. 一家公司将数据存储在Amazon简单存储服务(S3)上。该公司的安全政策要求数据在静态时进行加密。以下哪种方法可以实现这一点?请选择3个答案。 A. A. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys. A. 使用Amazon S3服务器端加密,并使用AWS密钥管理服务托管的密钥。 B. B. Use Amazon S3 server-side encryption with customer-provided keys B. 使用Amazon S3服务器端加密与客户提供的密钥 C. C. Use Amazon S3 server-side encryption with EC2 key pair. C. 使用Amazon S3服务器端加密与EC2密钥对。 D. D. Use Amazon S3 bucket policies to restrict access to the data at rest. D. 使用 Amazon S3 存储桶策略来限制对静态数据的访问。 E. E. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key E. 在将数据导入到Amazon S3之前,在客户端使用他们自己的主密钥对数据进行加密 F. F. Use SSL to encrypt the data while in transit to Amazon S3. F. 在数据传输到 Amazon S3 过程中使用 SSL 加密数据。 正确答案: A, B, E Correct answers are A, B & E,Refer to the AWS S3 Protecting Data using Encryption,Data at rest encryption using S3 can be implemented using either Server Side or Client Side encryption. SSE can be implemented using either KMS provided keys (SSE-KMS) or Customer provided keys (SSE-C). 
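A sketch of the request headers the two SSE variants above add to an S3 PUT (key id and key bytes below are illustrative placeholders; with boto3 the same information is passed as `put_object` parameters, and boto3 can derive the base64/MD5 fields from the raw key automatically):

```python
# Request headers behind the two S3 server-side encryption modes.
import base64
import hashlib
import os

def sse_kms_headers(kms_key_id):
    """SSE-KMS: S3 encrypts each object under a KMS-managed key."""
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }

def sse_c_headers(raw_key):
    """SSE-C: the caller supplies a 256-bit key on every request;
    S3 uses it for AES-256 encryption but never stores it."""
    assert len(raw_key) == 32  # AES-256 requires exactly 32 key bytes
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(raw_key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(raw_key).digest()).decode(),
    }

print(sse_kms_headers("alias/warehouse-key"))  # key alias is a made-up example
print(sse_c_headers(os.urandom(32))["x-amz-server-side-encryption-customer-algorithm"])
```

For SSE-S3 the single header `x-amz-server-side-encryption: AES256` suffices, and client-side encryption (option E) happens before the request, so no SSE headers are involved.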
CSE can be implemented by encrypting the data before uploading it to S3 and then decrypting the data after downloading it from S3 at client side.,Option C is wrong as server side encryption doesn’t work with EC2 key pair,Option D is wrong as bucket policies are just to restrict access to S3,Option F is wrong as it targets the data in transit only. 正确答案: A, B, E 正确答案是 A, B 和 E,参见 AWS S3 使用加密保护数据,使用 S3 的静态数据加密可以通过服务器端加密(SSE)或客户端加密(CSE)来实现。SSE 可以通过使用 KMS 提供的密钥(SSE-KMS)或客户提供的密钥(SSE-C)来实现。CSE 可以通过在上传数据到 S3 之前加密数据,然后在从 S3 下载数据后在客户端解密数据来实现。 选项 C 错误,因为服务器端加密不适用于 EC2 密钥对。 选项 D 错误,因为存储桶策略只是用来限制对 S3 的访问。 选项 F 错误,因为它仅针对传输中的数据。 19 / 100 分类: DBS 19. 19. A company that provides economics data dashboards needs to be able to develop software to display rich, interactive, data-driven graphics that run in web browsers and leverage the full stack of web standards (HTML, SVG, and CSS). Which technology provides the most appropriate support for these requirements? 19. 一家公司提供经济数据仪表盘,需要能够开发软件来显示丰富的、互动的、数据驱动的图形,这些图形运行在网页浏览器中,并利用完整的网页标准栈(HTML、SVG 和 CSS)。哪种技术最能满足这些需求? A. A. D3.js A. D3.js B. B. IPython/Jupyter B. IPython/Jupyter C. C. R Studio C. RStudio D. D. Hue D. Hue 正确答案: A Correct answer is A as D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.,Option B is wrong as Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. It is not a rich visualization tool which supports all the web standards.,Option C is wrong as RStudio makes R easier to use.
It includes a code editor, debugging & visualization tools.,Option D is wrong as Hue is the open source analytics workbench designed for fast data discovery, intelligent query assistance, and seamless collaboration 正确答案: A 正确答案是A,因为D3.js是一个基于数据操作文档的JavaScript库。D3帮助您使用HTML、SVG和CSS让数据生动呈现。D3对网页标准的重视让您无需绑定专有框架,即可充分利用现代浏览器的全部能力,它将强大的可视化组件与数据驱动的DOM操作方法结合在一起。 选项B是错误的,因为Jupyter是一个开源Web应用程序,允许您创建和共享包含实时代码、公式、可视化和叙述文本的文档。用途包括:数据清理和转换、数值模拟、统计建模、数据可视化、机器学习等等。它不是一个支持所有网页标准的富可视化工具。 选项C是错误的,因为RStudio使R更易于使用,它包括代码编辑器、调试和可视化工具。 选项D是错误的,因为Hue是一个开源分析工作台,旨在实现快速数据发现、智能查询辅助和无缝协作。 20 / 100 分类: DBS 20. 20. An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customer’s query patterns involve performing many joins with thousands of rows. The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed. Which approach should this customer use? 20. 一家企业客户正在迁移到Redshift,并考虑在其Redshift集群中使用密集存储节点。客户希望迁移50 TB的数据。客户的查询模式涉及执行许多带有数千行的连接操作。客户需要知道在目标Redshift集群中需要多少节点。客户的预算有限,除非绝对必要,否则需要避免进行测试。该客户应该使用哪种方法? A. A. Start with many small nodes. A. 从许多小节点开始。 B. B. Start with fewer large nodes.
B. 从较少的大节点开始。 C. C. Have two separate clusters with a mix of a small and large nodes. C. 拥有两个独立的集群,集群中包含大小节点的混合。 D. D. Insist on performing multiple tests to determine the optimal configuration. D. 坚持进行多次测试以确定最佳配置。 正确答案: A Correct answer is A as the customer is planning to use Dense Storage nodes; they can start with a larger number of small nodes, which would be more cost-effective than large nodes and make it easier to improve query performance and storage.,Refer AWS documentation – Redshift Cluster & Nodes,DS2 node types are optimized for large data workloads and use hard disk drive (HDD) storage. Node types are available in different sizes. DS2 nodes are available in xlarge and 8xlarge sizes.,The number of nodes that you choose depends on the size of your dataset and your desired query performance. Using the dense storage node types as an example, if you have 32 TB of data, you can choose either 16 ds2.xlarge nodes or 2 ds2.8xlarge nodes. If your data grows in small increments, choosing the ds2.xlarge node size allows you to scale in increments of 2 TB. If you typically see data growth in larger increments, a ds2.8xlarge node size might be a better choice.,Because Amazon Redshift distributes and executes queries in parallel across all of a cluster’s compute nodes, you can increase query performance by adding nodes to your cluster. Amazon Redshift also distributes your data across all compute nodes in a cluster. When you run a cluster with at least two compute nodes, data on each node will always be mirrored on disks on another node and you reduce the risk of incurring data loss.,Option B is wrong as with 50TB you would need 4 large nodes and it would not be as cost effective as small nodes.,Options C & D are wrong as they do not meet the customer’s requirements of a limited budget and avoiding tests unless absolutely needed.
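The sizing arithmetic in the explanation above can be sketched directly, using the per-node capacities quoted for the DS2 family (2 TB for ds2.xlarge, 16 TB for ds2.8xlarge):

```python
# Node-count arithmetic for Redshift dense-storage (DS2) sizing.
# Capacities are the figures quoted above; real clusters also need headroom.
import math

NODE_STORAGE_TB = {"ds2.xlarge": 2, "ds2.8xlarge": 16}

def nodes_needed(data_tb, node_type):
    """Smallest node count whose combined storage holds the dataset."""
    return math.ceil(data_tb / NODE_STORAGE_TB[node_type])

# The 32 TB example from the documentation excerpt:
print(nodes_needed(32, "ds2.xlarge"), nodes_needed(32, "ds2.8xlarge"))  # 16 and 2
# The customer's 50 TB: many small nodes vs. the 4 large nodes cited above.
print(nodes_needed(50, "ds2.xlarge"), nodes_needed(50, "ds2.8xlarge"))  # 25 and 4
```

This ignores compression and free-space headroom, which a real migration would budget for on top of the raw dataset size.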
正确答案:A 正确答案是 A,因为客户计划使用密集存储节点,他们可以从更多的小型节点开始,这相比于大型节点更具成本效益,并且更容易提高查询性能和存储能力。 请参考 AWS 文档 – Redshift 集群与节点。DS2 节点类型针对大型数据工作负载进行了优化,并使用硬盘驱动器(HDD)存储。节点类型提供不同的尺寸,DS2 节点可用的尺寸包括 xlarge 和 8xlarge。 您选择的节点数量取决于数据集的大小以及所期望的查询性能。例如,使用密集存储节点类型,如果您的数据量为 32 TB,您可以选择 16 个 ds2.xlarge 节点或 2 个 ds2.8xlarge 节点。如果您的数据增长是小幅递增的,那么选择 ds2.xlarge 节点大小可以让您以 2 TB 的增量扩展。如果您的数据通常以较大增量增长,那么 ds2.8xlarge 节点大小可能是更好的选择。 由于 Amazon Redshift 在集群的所有计算节点上并行分布和执行查询,因此您可以通过向集群添加节点来提高查询性能。Amazon Redshift 还会将您的数据分布到集群中的所有计算节点。当您的集群至少有两个计算节点时,每个节点上的数据都会始终镜像到另一个节点的磁盘上,从而降低数据丢失的风险。 选项 B 是错误的,因为对于 50TB 数据,您需要 4 个大型节点,而这相比于小型节点并不具备成本效益。 选项 C 和 D 是错误的,因为它们不符合客户有限预算的要求,并且需要尽量避免进行测试,除非绝对必要。 21 / 100 分类: DBS 21. 21. ABCD has developed a sensor intended to be placed inside of people’s shoes, monitoring the number of steps taken every day. ABCD is expecting thousands of sensors reporting in every minute and hopes to scale to millions by the end of the year. A requirement for the project is it needs to be able to accept the data, run it through ETL to store in warehouse and archive it on Amazon Glacier, with room for a real-time dashboard for the sensor data to be added at a later date. What is the best method for architecting this application given the requirements? Choose the correct answer: 21. ABCD已经开发了一种传感器,旨在放置在人的鞋子内部,监控每天走的步数。ABCD预计每分钟会有成千上万的传感器报告,并希望到年底能够扩展到数百万个传感器。该项目的要求是,它需要能够接受数据,经过ETL处理后存储到数据仓库,并将其归档到Amazon Glacier中,同时为以后添加传感器数据的实时仪表盘留出空间。根据这些要求,架构此应用程序的最佳方法是什么?请选择正确答案: A. A. Use Amazon Cognito to accept the data when the user pairs the sensor to the phone, and then have Cognito send the data to Dynamodb. Use Data Pipeline to create a job that takes the DynamoDB tablee and sends it to an EMR cluster for ETL, then outputs to Redshift and S3 while, using S3 lifecycle policies to archive on Glacier. A. 使用Amazon Cognito在用户将传感器与手机配对时接收数据,然后让Cognito将数据发送到DynamoDB。使用Data Pipeline创建一个任务,将DynamoDB表格发送到EMR集群进行ETL处理,然后将结果输出到Redshift和S3,同时使用S3生命周期策略将数据归档到Glacier。 B. B. 
Write the sensor data directly to a scaleable DynamoDB; create a data pipeline that starts an EMR cluster using data from DynamoDB and sends the data to S3 and Redshift. B. 将传感器数据直接写入可扩展的DynamoDB;创建一个数据管道,使用来自DynamoDB的数据启动EMR集群,并将数据发送到S3和Redshift。 C. C. Write the sensor data to Amazon S3 with a lifecycle policy for Glacier, create an EMR cluster that uses the bucket data and runs it through ETL. It then outputs that data into Redshift data warehouse. C. 将传感器数据写入 Amazon S3,并为 Glacier 配置生命周期策略,创建一个使用该桶数据并通过 ETL 处理的 EMR 集群。然后将数据输出到 Redshift 数据仓库。 D. D. Write the sensor data directly to Amazon Kinesis and output the data into Amazon S3 creating a lifecycle policy for Glacier archiving. Also, have a parallel processing application that runs the data through EMR and sends to a Redshift data warehouse. D. 将传感器数据直接写入Amazon Kinesis,并将数据输出到Amazon S3,同时为Glacier归档创建生命周期策略。还需要一个并行处理应用程序,将数据通过EMR处理并发送到Redshift数据仓库。 正确答案: D Correct answer is D as the requirement is real time data ingestion and analytics, the best option is to use Kinesis for storing the real time incoming data. The data can then be moved to S3 and analyzed using EMR and Redshift. Data can then be moved to Glacier for archival.,Refer AWS documentation – Kinesis,Amazon Kinesis is a platform for streaming data on AWS, making it easy to load and analyze streaming data, and also providing the ability for you to build custom streaming data applications for specialized needs.,Option A is wrong as Cognito is not suitable for handling real time data,Amazon Cognito lets you easily add user sign-up and sign-in and manage permissions for your mobile and web apps. You can create your own user directory within Amazon Cognito, or you can authenticate users through social identity providers such as Facebook, Twitter, or Amazon; with SAML identity solutions; or by using your own identity system. 
In addition, Amazon Cognito enables you to save data locally on users’ devices, allowing your applications to work even when the devices are offline. You can then synchronize data across users’ devices so that their app experience remains consistent regardless of the device they use.,Option B is wrong as DynamoDB is not suitable for streaming data ingestion and handling.,Option C is wrong as S3 is not an ideal solution to handle this huge amount of requests. 正确答案: D 正确答案是 D,因为要求是实时数据摄取和分析,最佳选择是使用 Kinesis 存储实时传入的数据。然后可以将数据移动到 S3,并使用 EMR 和 Redshift 进行分析。数据随后可以移动到 Glacier 进行归档。 参考 AWS 文档 – Kinesis Amazon Kinesis 是一个用于在 AWS 上进行流数据处理的平台,它使加载和分析流数据变得更加简单,并且提供了为特定需求构建自定义流数据应用程序的能力。 选项 A 错误,因为 Cognito 不适合处理实时数据。 Amazon Cognito 让你轻松地为你的移动应用和 Web 应用添加用户注册和登录功能,并管理权限。你可以在 Amazon Cognito 中创建自己的用户目录,或者通过社交身份提供商(如 Facebook、Twitter 或 Amazon)验证用户;使用 SAML 身份解决方案;或使用你自己的身份系统。此外,Amazon Cognito 使你能够将数据本地保存到用户的设备上,允许你的应用即使在设备离线时也能工作。然后你可以在用户的设备之间同步数据,使他们的应用体验无论使用哪个设备都保持一致。 选项 B 错误,因为 DynamoDB 不适合流数据的摄取和处理。 选项 C 错误,因为 S3 不是处理大量请求的理想解决方案。 22 / 100 分类: DBS 22. 22. You need to visualize data from Spark and Hive running on an EMR cluster. Which of the options is best for an interactive and collaborative notebook for data exploration? 22. 您需要可视化来自运行在EMR集群上的Spark和Hive的数据。以下哪个选项最适合用于数据探索的交互式和协作性笔记本? A. A. Hive A. Hive B. B. D3.js B. D3.js C. C. Kinesis Analytics C. Kinesis 分析 D. D. Zeppelin D. 齐柏林 正确答案: D Correct answer is D as Zeppelin provides data ingestion, data discovery, data analytics and data visualization & collaboration.,Refer documentation – Zeppelin,Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.,Apache Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently Apache Zeppelin supports many interpreters such as Apache Spark, Python, JDBC, Markdown and Shell.,Some basic charts are already included in Apache Zeppelin. 
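Those built-in charts can be fed from any interpreter via Zeppelin's `%table` display hint (tab-separated columns, newline-separated rows). A minimal Python sketch, with the helper name and sample data being my own:

```python
# Zeppelin renders paragraph output that starts with "%table" as an
# interactive table that can be switched to the built-in chart types.
def zeppelin_table(headers, rows):
    """Format rows for Zeppelin's %table display system."""
    lines = ["%table " + "\t".join(headers)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

# Printed inside a Zeppelin paragraph, this drives a bar/line/pie chart.
print(zeppelin_table(["product", "units"], [("alpha", 120), ("beta", 45)]))
```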
Visualizations are not limited to SparkSQL query, any output from any language backend can be recognized and visualized.,Option A is wrong as Hive is data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.,Option B is wrong as D3.js a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS.,Option C is wrong as Kinesis Data Analytics is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. 正确答案: D 正确答案是D,因为Zeppelin提供数据摄取、数据发现、数据分析和数据可视化与协作功能。 请参阅文档 – Zeppelin Zeppelin是一个基于Web的笔记本,能够实现数据驱动、交互式数据分析和协作文档,支持SQL、Scala等语言。 Apache Zeppelin解释器概念允许将任何语言/数据处理后端插件集成到Zeppelin中。目前,Apache Zeppelin支持多种解释器,如Apache Spark、Python、JDBC、Markdown和Shell。 一些基本的图表已经包含在Apache Zeppelin中。可视化不限于SparkSQL查询,任何语言后端的输出都可以被识别并可视化。 选项A是错误的,因为Hive是数据仓库软件,旨在使用SQL帮助读取、写入和管理存储在分布式存储中的大数据集。 选项B是错误的,因为D3.js是一个JavaScript库,用于基于数据操作文档。D3帮助您使用HTML、SVG和CSS将数据呈现出来。 选项C是错误的,因为Kinesis Data Analytics是分析流数据、获取可操作的洞察力并实时响应您的业务和客户需求的最简单方法。 23 / 100 分类: DBS 23. 23. Your company needs to design a data warehouse for a client in the retail industry. The data warehouse will store historic purchases in Amazon Redshift. To comply with PCI:DSS requirements and meet data protection standards, the data must be encrypted at rest and have keys managed by a corporate on-premises HSM. How can you meet these requirements in a cost-effective manner? 23. 你们公司需要为零售行业的客户设计一个数据仓库。该数据仓库将把历史购买数据存储在 Amazon Redshift 中。为了遵守 PCI:DSS 要求并满足数据保护标准,数据必须在静态时加密,并且密钥由公司内部的 HSM 管理。你如何以成本效益的方式满足这些要求? A. A. Use AWS Import/Export to import a company HSM device into AWS alongside the Amazon Redshift cluster, and configure Redshift to use the imported HSM. A. 使用 AWS Import/Export 将公司 HSM 设备导入 AWS,并与 Amazon Redshift 集群一起使用,然后配置 Redshift 使用导入的 HSM。 B. B. Create a VPN connection between a VPC you create in AWS and an on-premises network. 
Then launch the Redshift cluster in the VPC, and configure it to use your corporate HSM. B. 创建一个在AWS中创建的VPC与本地网络之间的VPN连接。然后在VPC中启动Redshift集群,并配置它使用您的企业HSM。 C. C. Use the AWS CloudHSM service to establish a trust relationship between the CloudHSM and the corporate HSM over a Direct Connect connection. Configure Amazon Redshift to use the CloudHSM device. C. 使用AWS CloudHSM服务通过Direct Connect连接在CloudHSM和公司HSM之间建立信任关系。配置Amazon Redshift以使用CloudHSM设备。 D. D. Configure the AWS Key Management Service to point to the corporate HSM device, and then launch the Amazon Redshift cluster with the KMS managing the encryption keys. D. 配置AWS密钥管理服务指向公司HSM设备,然后启动Amazon Redshift集群,并由KMS管理加密密钥。 正确答案: B Correct answer is B as Redshift Cluster can integrate with corporate HSM via VPN in a cost-effective way,Refer AWS documentation – Redshift Encryption,In Amazon Redshift, you can enable database encryption for your clusters to help protect data at rest. When you enable encryption for a cluster, the data blocks and system metadata are encrypted for the cluster and its snapshots.,You can enable encryption when you launch your cluster, or you can modify an unencrypted cluster to use AWS Key Management Service (AWS KMS) encryption. To do so, you can use either an AWS-managed key or a customer-managed key (CMK). When you modify your cluster to enable KMS encryption, Amazon Redshift automatically migrates your data to a new encrypted cluster. Snapshots created from the encrypted cluster are also encrypted.,Amazon Redshift uses a hierarchy of encryption keys to encrypt the database. You can use either AWS Key Management Service (AWS KMS) or a hardware security module (HSM) to manage the top-level encryption keys in this hierarchy. The process that Amazon Redshift uses for encryption differs depending on how you manage keys. Amazon Redshift automatically integrates with AWS KMS but not with an HSM. 
When you use an HSM, you must use client and server certificates to configure a trusted connection between Amazon Redshift and your HSM.,Option A is wrong as importing HSM to AWS would not be secure as it is against the requirement.,Option C is wrong as Direct Connect would be neither cheap nor quick to set up.,Option D is wrong as CloudHSM cannot connect to on-premises.,Does CloudHSM work with on-premises HSMs? – Yes. While CloudHSM does not interoperate directly with on-premises HSMs, you can securely transfer exportable keys between CloudHSM and most commercial HSMs using one of several supported RSA key wrap methods. 正确答案: B 正确答案是B,因为Redshift集群可以通过VPN以具有成本效益的方式与企业HSM集成,参考AWS文档 – Redshift加密。在Amazon Redshift中,您可以为集群启用数据库加密,以帮助保护静态数据。当您为集群启用加密时,数据块和系统元数据会被加密,包括集群及其快照。 您可以在启动集群时启用加密,或者可以修改未加密的集群,使用AWS密钥管理服务(AWS KMS)加密。为此,您可以使用AWS管理的密钥或客户管理的密钥(CMK)。当您修改集群以启用KMS加密时,Amazon Redshift会自动将数据迁移到一个新的加密集群。从加密集群创建的快照也会被加密。 Amazon Redshift使用加密密钥层次结构来加密数据库。您可以使用AWS密钥管理服务(AWS KMS)或硬件安全模块(HSM)来管理此层次结构中的顶级加密密钥。Amazon Redshift用于加密的过程取决于您如何管理密钥。Amazon Redshift自动与AWS KMS集成,但不与HSM集成。当使用HSM时,您必须使用客户端和服务器证书来配置Amazon Redshift与您的HSM之间的可信连接。 选项A错误,因为将HSM导入AWS将不安全,违反了要求。 选项C错误,因为Direct Connect既不便宜,部署也不快。 选项D错误,因为CloudHSM无法连接到本地HSM。 CloudHSM是否与本地HSM兼容? 是的。虽然CloudHSM不能直接与本地HSM互操作,但您可以使用几种支持的RSA密钥封装方法,在CloudHSM和大多数商业HSM之间安全地传输可导出密钥。 24 / 100 分类: DBS 24. 24. A company wants to use Redshift cluster for petabyte-scale data warehousing. Data for processing would be stored on Amazon S3. As a security requirement, the company wants the data to be encrypted at rest. As a solution architect how would you implement the solution? 24. 一家公司希望使用Redshift集群进行PB级数据仓库。处理的数据将存储在Amazon S3上。作为安全要求,公司希望数据在静态时进行加密。作为解决方案架构师,您将如何实现这一解决方案? A. A. Store the data in S3 with Server Side Encryption and copy the data over to Redshift cluster A. 将数据存储在启用服务器端加密的S3中,并将数据复制到Redshift集群。 B. B. Store the data in S3. Launch an encrypted Redshift cluster, copy the data to the Redshift cluster and store back in S3 in encrypted format B.
将数据存储在S3中。启动一个加密的Redshift集群,将数据复制到Redshift集群中,然后以加密格式存回S3。 C. C. Store the data in S3 with Server Side Encryption. Launch an encrypted Redshift cluster and copy the data to the cluster. C. 将数据存储在启用服务器端加密的S3中。启动一个加密的Redshift集群并将数据复制到该集群中。 D. D. Store the data in S3 with Server Side Encryption. Launch a Redshift cluster, copy the data to cluster and enable encryption on the cluster. D. 将数据存储在S3中并启用服务器端加密。启动一个Redshift集群,将数据复制到集群中,并启用集群上的加密。 正确答案: C Correct answer is C as the need is for data at rest encryption. S3 with SSE will help store the data in S3 in encrypted format.,Refer AWS documentation – Redshift Encryption & S3 Encryption,In Amazon Redshift, you can enable database encryption for your clusters to help protect data at rest. When you enable encryption for a cluster, the data blocks and system metadata are encrypted for the cluster and its snapshots.,Encryption is an optional, immutable setting of a cluster. If you want encryption, you enable it during the cluster launch process. To go from an unencrypted cluster to an encrypted cluster or the other way around, unload your data from the existing cluster and reload it in a new cluster with the chosen encryption setting.,Option A is wrong as data is not encrypted in Redshift.,Option B is wrong as data is not encrypted in S3.,Option D is wrong as you cannot enable encryption after Redshift cluster is launched. 正确答案: C 正确答案是C,因为需求是数据静态加密。使用S3与SSE可以帮助将数据以加密格式存储在S3中。 参考AWS文档 – Redshift加密与S3加密 在Amazon Redshift中,您可以为集群启用数据库加密,以帮助保护静态数据。当您为集群启用加密时,数据块和系统元数据会被加密,用于集群及其快照。 加密是集群的可选且不可更改的设置。如果您需要加密,可以在集群启动过程中启用它。要从未加密集群转换为加密集群,或反之,您需要将数据从现有集群卸载并重新加载到新集群中,并选择加密设置。 选项A是错误的,因为Redshift中的数据没有加密。 选项B是错误的,因为S3中的数据没有加密。 选项D是错误的,因为在Redshift集群启动后无法启用加密。 25 / 100 分类: DBS 25. 25. An organization needs a data store to handle the following data types and access patterns: Key-value access pattern Complex SQL queries and transactions Consistent reads Fixed schema Which data store should the organization choose? 25. 
一个组织需要一个数据存储来处理以下数据类型和访问模式: 键值访问模式 复杂的SQL查询和事务 一致性读取 固定的架构 该组织应该选择哪种数据存储? A. A. Amazon S3 A. 亚马逊 S3 B. B. Amazon Kinesis B. 亚马逊 Kinesis C. C. Amazon DynamoDB C. 亚马逊 DynamoDB D. D. Amazon RDS D. 亚马逊关系数据库服务 (Amazon RDS) 正确答案: D Correct answer is D as Amazon RDS handles all these requirements, and although Amazon RDS is not typically thought of as optimized for key-value based access, a schema with a good primary key selection can provide this functionality.,Option A is wrong as Amazon S3 provides no fixed schema and does not have consistent read after PUT support.,Option B is wrong as Amazon Kinesis supports streaming data that is consistent as of a given sequence number but doesn’t provide key/value access.,Option C is wrong as Amazon DynamoDB provides key/value access and consistent reads, it does not support SQL-based queries. 正确答案: D 正确答案是D,因为Amazon RDS处理所有这些要求,尽管Amazon RDS通常不被认为是针对键值访问优化的,但一个良好的主键选择的模式可以提供此功能。 选项A是错误的,因为Amazon S3没有固定模式,并且不支持PUT后的持续读取。 选项B是错误的,因为Amazon Kinesis支持基于给定序列号的一致流数据,但不提供键值访问。 选项C是错误的,因为Amazon DynamoDB提供键值访问和一致性读取,但不支持基于SQL的查询。 26 / 100 分类: DBS 26. 26. A video-sharing mobile application uploads files greater than 10 GB to an Amazon S3 bucket. However, when using the application in locations far away from the S3 bucket region, uploads take extended periods of time, and sometimes fail to complete. Which combination of methods would improve the performance of uploading to the application? (Select TWO.) 26. 一个视频分享移动应用将大于10 GB的文件上传到Amazon S3存储桶。然而,当在距离S3存储桶区域较远的位置使用该应用时,上传需要较长时间,有时甚至无法完成。以下哪种方法的组合可以提高上传性能?(选择两项。) A. A. Configure an S3 bucket in each region to receive the uploads, and use cross-region replication to copy the files to the distribution bucket. A. 在每个区域配置一个 S3 存储桶来接收上传,并使用跨区域复制将文件复制到分发存储桶。 B. B. Modify the application to add random prefixes to the files before uploading. B. 修改应用程序,在上传文件之前为文件添加随机前缀。 C. C. Set up Amazon Route 53 with latency-based routing to route the uploads to the nearest S3 bucket region. C. 
设置Amazon Route 53,使用基于延迟的路由将上传内容路由到最近的S3存储桶区域。 D. D. Enable S3 Transfer Acceleration on the S3 bucket, and configure the application to use the Transfer Acceleration endpoint for uploads. D. 启用S3传输加速功能,并配置应用程序使用传输加速端点进行上传。 E. E. Configure the application to break the video files into chunks and use a multipart upload to transfer files to Amazon S3. E. 配置应用程序,将视频文件拆分成多个块,并使用分段上传将文件传输到Amazon S3。 正确答案: D, E Correct answers are D & E,Option D as S3 Transfer Acceleration helps speed up the upload performance. Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.,Option E as multipart upload helps provide better recoverability. In general, AWS recommends multipart upload for objects larger than 100 MB: it improves throughput, and a failed part can be retried without restarting the entire upload.,Option A is wrong as the mobile application would need to be configured for different endpoints, which does not improve performance. Also, cross-region replication would create duplication and increase cost.,Option B is wrong as random prefixes are no longer needed to improve performance.,Option C is wrong as Route 53 latency-based routing works only with S3 static websites. 正确答案: D, E 正确答案是D和E,选项D因为S3 Transfer Acceleration有助于加速上传性能。 Amazon S3 Transfer Acceleration使文件在长距离之间的客户端和S3存储桶之间进行快速、简便且安全的传输。Transfer Acceleration利用了Amazon CloudFront全球分布的边缘位置。当数据到达边缘位置时,数据将通过优化的网络路径传输到Amazon S3。 选项E因为分段上传有助于提供更好的恢复能力。 一般来说,AWS建议对大于100 MB的对象使用分段上传:它可以提高吞吐量,并且某个分段失败时可以单独重试,而无需重新开始整个上传。 选项A是错误的,因为移动应用程序需要为不同的端点进行配置,并且无法改善性能。此外,跨区域复制会创建重复并增加成本。 选项B是错误的,因为不再需要随机前缀来改善性能。 选项C是错误的,因为Route 53基于延迟的路由仅适用于S3静态网站。 27 / 100 分类: DBS 27. 27. A company is collecting real-time sensitive data using Amazon Kinesis.
As a security requirement, the Amazon Kinesis stream needs to be encrypted. Which approach should be used to accomplish this task? 27. 一家公司正在使用 Amazon Kinesis 收集实时敏感数据。作为安全要求,Amazon Kinesis 流需要进行加密。应该使用哪种方法来完成这个任务? A. A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the producer. A. 在数据进入Amazon Kinesis流之前,在生产者端对数据进行客户端加密。 B. B. Use a partition key to segment the data by MD5 hash function, which makes it undecipherable while in transit. B. 使用分区键通过MD5哈希函数对数据进行分段,从而使其在传输过程中无法被解密。 C. C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the consumer. C. 在数据进入消费者端的Amazon Kinesis流之前,先对数据进行客户端加密。 D. D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while in transit. D. 使用分片来对数据进行分段,分片具有内置功能,在传输过程中使其不可解读。 正确答案: A Correct answer is A as the data can be encrypted using client side encryption. The encryption needs to be done on the producer before the data is pushed to Kinesis Streams.,Refer AWS documentation – Kinesis Encrypt and Decrypt Data,Options B & D are wrong as they do not provide encryption.,Option C is wrong as the encryption needs to happen at the producer side. 正确答案: A 正确答案是 A,因为数据可以使用客户端加密进行加密。加密需要在数据推送到 Kinesis 流之前在生产者端完成。参考 AWS 文档 – Kinesis 加密和解密数据。 选项 B 和 D 是错误的,因为它们不提供加密。 选项 C 是错误的,因为加密需要在生产者端进行。 28 / 100 分类: DBS 28. 28. A customer has a machine learning workflow that consists of multiple quick cycles of reads-writes-reads on Amazon S3. The customer needs to run the workflow on EMR but is concerned that the reads in subsequent cycles will miss new data critical to the machine learning from the prior cycles. How should the customer accomplish this? 28. 一位客户有一个机器学习工作流,该工作流由多个快速的读写读取循环组成,使用的是 Amazon S3。客户需要在 EMR 上运行该工作流,但担心在后续循环中的读取操作会错过来自前一个循环的对机器学习至关重要的新数据。客户应该如何实现这一目标? A. A. Use AWS Data Pipeline to orchestrate the data processing cycles. A. 使用AWS数据管道来协调数据处理周期。 B. B. Turn on EMRFS consistent view when configuring the EMR cluster. 
B. 在配置EMR集群时开启EMRFS一致视图。 C. C. Set hadoop.data.consistency=true in the core-site.xml file. C. 在core-site.xml文件中设置hadoop.data.consistency=true。 D. D. Set hadoop.s3.consistency=true in the core-site.xml file. D. 在core-site.xml文件中设置hadoop.s3.consistency=true。 正确答案: B Correct answer is B as EMRFS Consistent View helps provide a view of the objects in S3 and also tracks the consistency.,Refer AWS documentation – EMRFS Consistent View,EMRFS consistent view is an optional feature available when using Amazon EMR release version 3.2.1 or later. Consistent view allows EMR clusters to check for list and read-after-write consistency for Amazon S3 objects written by or synced with EMRFS. Consistent view addresses an issue that can arise due to the Amazon S3 Data Consistency Model. For example, if you add objects to Amazon S3 in one operation and then immediately list objects in a subsequent operation, the list and the set of objects processed may be incomplete. This is more commonly a problem for clusters that run quick, sequential steps using Amazon S3 as a data store, such as multi-step extract-transform-load (ETL) data processing pipelines.,When you create a cluster with consistent view enabled, Amazon EMR uses an Amazon DynamoDB database to store object metadata and track consistency with Amazon S3. If consistent view determines that Amazon S3 is inconsistent during a file system operation, it retries that operation according to rules that you can define.,With consistent view enabled, EMRFS returns the set of objects listed in an EMRFS metadata store and those returned directly by Amazon S3 for a given path. Because Amazon S3 is still the “source of truth” for the objects in a path, EMRFS ensures that everything in a specified Amazon S3 path is being processed regardless of whether it is tracked in the metadata. However, EMRFS consistent view only ensures that the objects in the folders that you track are checked for consistency. 
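To make the explanation above concrete, here is a minimal sketch of the `emrfs-site` configuration that turns on consistent view when the EMR cluster is created. The retry count and DynamoDB metadata table name shown are illustrative defaults, not values taken from the question:

```python
# Sketch: the EMR configuration object that enables EMRFS consistent view.
# "emrfs-site" / "fs.s3.consistent" follow the EMR classification naming;
# the retry count and metadata table name are illustrative defaults.
emrfs_config = {
    "Classification": "emrfs-site",
    "Properties": {
        "fs.s3.consistent": "true",                # turn on consistent view
        "fs.s3.consistent.retryCount": "5",        # retries when inconsistency is detected
        "fs.s3.consistent.metadata.tableName": "EmrFSMetadata",  # DynamoDB metadata table
    },
}

def consistent_view_enabled(configs):
    """Return True if any EMR configuration block turns on EMRFS consistent view."""
    return any(
        c.get("Classification") == "emrfs-site"
        and c.get("Properties", {}).get("fs.s3.consistent") == "true"
        for c in configs
    )

print(consistent_view_enabled([emrfs_config]))  # True
```

When creating the cluster, an object like this would be passed in the `Configurations` list of the create-cluster call, so every step on the cluster reads and writes S3 through the consistency-tracked EMRFS view.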
正确答案: B 正确答案是 B,因为 EMRFS 一致视图有助于提供 S3 中对象的视图,并且跟踪一致性。参考 AWS 文档 – EMRFS 一致视图。 EMRFS 一致视图是一个可选功能,适用于使用 Amazon EMR 版本 3.2.1 或更高版本。 一致视图允许 EMR 集群检查由 EMRFS 写入或与之同步的 Amazon S3 对象的列表和写后读取一致性。一致视图解决了由于 Amazon S3 数据一致性模型可能引发的问题。例如,如果你在一次操作中将对象添加到 Amazon S3,然后立即在随后的操作中列出对象,那么列出的对象和处理的对象集合可能不完整。这在使用 Amazon S3 作为数据存储的集群中尤其常见,尤其是快速、连续的步骤,例如多步骤的提取-转换-加载(ETL)数据处理管道。 当你创建一个启用了“一致视图”的集群时,Amazon EMR 使用 Amazon DynamoDB 数据库来存储对象元数据并跟踪与 Amazon S3 的一致性。如果一致视图确定在文件系统操作期间 Amazon S3 不一致,它会根据你定义的规则重试该操作。 启用一致视图时,EMRFS 返回在 EMRFS 元数据存储中列出的对象集合和 Amazon S3 为给定路径直接返回的对象。因为 Amazon S3 仍然是该路径中对象的“真实来源”,EMRFS 确保路径中指定的所有内容都在处理过程中,不管它是否在元数据中被跟踪。然而,EMRFS 一致视图仅确保检查你跟踪的文件夹中的对象一致性。 29 / 100 分类: DBS 29. 29. Managers in a company need access to the human resources database that runs on Amazon Redshift, to run reports about their employees. Managers must only see information about their direct reports. Which technique should be used to address this requirement with Amazon Redshift? 29. 公司中的经理需要访问运行在 Amazon Redshift 上的人力资源数据库,以便生成有关员工的报告。 经理只能查看有关直接下属的信息。应使用哪种技术来满足在 Amazon Redshift 中的此要求? A. A. Define an IAM group for each manager with each employee as an IAM user in that group, and use that to limit the access. A. 为每个经理定义一个IAM组,将每个员工作为IAM用户加入该组,并使用它来限制访问。 B. B. Use Amazon Redshift snapshot to create one cluster per manager. Allow the manager to access only their designated clusters. B. 使用Amazon Redshift快照为每个经理创建一个集群。只允许经理访问他们指定的集群。 C. C. Define a key for each manager in AWS KMS and encrypt the data for their employees with their private keys. C. 在AWS KMS中为每个管理员定义一个密钥,并使用他们的私钥加密员工的数据。 D. D. Define a view that uses the employee’s manager name to filter the records based on current user names. D. 
定义一个视图,使用员工的经理姓名根据当前用户名筛选记录。 正确答案: D Correct answer is D as you can create a view in Redshift which filters the records based on the current user name and show only those results for the logged in user.,Option A is wrong as IAM group with users cannot limit the access to the data in Redshift.,Option B is wrong as it does not limit or filter the data and it is not a cost-effective solution.,Option C is wrong as encryption cannot be done at each employee record level. 正确答案: D 正确答案是 D,因为你可以在 Redshift 中创建一个视图,该视图基于当前用户名过滤记录,只显示已登录用户的结果。 选项 A 错误,因为 IAM 组中的用户无法限制对 Redshift 中数据的访问。 选项 B 错误,因为它不能限制或过滤数据,并且不是一种具有成本效益的解决方案。 选项 C 错误,因为加密不能在每个员工记录级别进行。 30 / 100 分类: DBS 30. 30. A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support messages into an Amazon Kinesis Stream. This meets a company objective of using only managed services to reduce operational overhead. The company needs an appropriate architecture that allows support engineers to search on historic cases and find similar issues and their associated responses. Which AWS Lambda action is most appropriate? 30. 一家拥有支持组织的公司需要支持工程师能够搜索历史案例,以便对新提出的问题提供快速响应。该公司已将所有支持消息转发到 Amazon Kinesis Stream 中。这符合公司仅使用托管服务来减少操作开销的目标。公司需要一个合适的架构,允许支持工程师搜索历史案例并找到类似的问题及其相关响应。哪种 AWS Lambda 操作最为合适? A. A. Ingest and index the content into an Amazon Elasticsearch domain. A. 将内容导入并索引到Amazon Elasticsearch域中。 B. B. Stem and tokenize the input and store the results into Amazon ElastiCache. B. 对输入进行词干提取和分词,并将结果存储到 Amazon ElastiCache 中。 C. C. Write data as JSON into Amazon DynamoDB with primary and secondary indexes. C. 将数据以JSON格式写入Amazon DynamoDB,包含主索引和辅助索引。 D. D. Aggregate feedback in Amazon S3 using a columnar format with partitioning. D. 
使用带分区的列式格式在Amazon S3中汇总反馈。 正确答案: A Correct answer is A as Elasticsearch provides full-text search capability and is a fully managed AWS service.,Refer AWS documentation – Elasticsearch Integration,You can load streaming data into your Amazon Elasticsearch Service domain from many different sources. Some sources, like Amazon Kinesis Data Firehose and Amazon CloudWatch Logs, have built-in support for Amazon ES. Others, like Amazon S3, Amazon Kinesis Data Streams, and Amazon DynamoDB, use AWS Lambda functions as event handlers. The Lambda functions respond to new data by processing it and streaming it to your domain.,Option B is wrong as ElastiCache does not provide search capability; it offers key-value lookup and caching.,Option C is wrong as it is neither easy nor cost-efficient to run full-text searches on DynamoDB.,Option D is wrong as S3 does not provide any search capability. 正确答案: A 正确答案是A,因为Elasticsearch提供了全文搜索功能,并且是一个完全托管的AWS服务。 参考AWS文档 – Elasticsearch集成 您可以将流数据从多个不同的来源加载到您的Amazon Elasticsearch Service域中。一些来源,如Amazon Kinesis Data Firehose和Amazon CloudWatch Logs,已经内建对Amazon ES的支持。其他来源,如Amazon S3、Amazon Kinesis Data Streams和Amazon DynamoDB,使用AWS Lambda函数作为事件处理程序。Lambda函数通过处理新数据并将其流式传输到您的域来响应数据。 选项B是错误的,因为ElastiCache不提供搜索功能,它提供的是键值查找和缓存能力。 选项C是错误的,因为在DynamoDB上进行全文搜索既不容易也不具成本效益。 选项D是错误的,因为S3不提供任何搜索功能。 31 / 100 分类: DBS 31. 31. An online retailer is using Amazon DynamoDB to store data related to customer transactions. The items in the table contain several string attributes describing the transaction as well as a JSON attribute containing the shopping cart and other details corresponding to the transaction. Average item size is ~250 KB, most of which is associated with the JSON attribute. The average customer generates ~3 GB of data per month. Customers access the table to display their transaction history and review transaction details as needed.
Ninety percent of the queries against the table are executed when building the transaction history view, with the other 10% retrieving transaction details. The table is partitioned on CustomerID and sorted on transaction date. The client has very high read capacity provisioned for the table and experiences very even utilization, but complains about the cost of Amazon DynamoDB compared to other NoSQL solutions. Which strategy will reduce the cost associated with the client’s read queries while not degrading quality? 31. 一家在线零售商正在使用 Amazon DynamoDB 存储与客户交易相关的数据。表中的项目包含多个字符串属性,用于描述交易,以及一个 JSON 属性,包含购物车和其他与交易相关的详细信息。每个项目的平均大小为 250KB,其中大部分与 JSON 属性相关。每个客户每月生成大约 3GB 的数据。客户访问该表以显示他们的交易历史记录,并根据需要查看交易详细信息。针对该表的 90% 查询是在构建交易历史记录视图时执行的,其余的 10% 用于检索交易详细信息。该表按 CustomerID 分区,并按交易日期排序。客户端为该表预配置了非常高的读取容量,并且利用率非常均匀,但抱怨与其他 NoSQL 解决方案相比,Amazon DynamoDB 的成本过高。哪种策略可以在不降低质量的情况下减少与客户端读取查询相关的成本? A. A. Modify all database calls to use eventually consistent reads and advise customers that transaction history may be one second out-of-date. A. 修改所有数据库调用,使用最终一致性读取,并告知客户交易历史可能会滞后约一秒钟。 B. B. Change the primary table to partition on TransactionID, create a GSI partitioned on customer and sorted on date, project small attributes into GSI, and then query GSI for summary data and the primary table for JSON details. B. 将主表更改为按TransactionID进行分区,创建一个按客户分区并按日期排序的GSI,将小属性投影到GSI中,然后查询GSI以获取汇总数据,查询主表以获取JSON详细信息。 C. C. Vertically partition the table, store base attributes on the primary table, and create a foreign key reference to a secondary table containing the JSON data. Query the primary table for summary data and the secondary table for JSON details. C. 垂直划分表格,将基本属性存储在主表中,并创建一个外键引用指向包含JSON数据的副表。查询主表以获取摘要数据,查询副表以获取JSON详细信息。 D. D. Create an LSI sorted on date, project the JSON attribute into the index, and then query the primary table for summary data and the LSI for JSON details. D. 
创建一个按日期排序的LSI,将JSON属性投影到索引中,然后查询主表的汇总数据和LSI的JSON详情。 正确答案: B Correct answer is B as the key requirement is to reduce cost without affecting quality. The issue here is that the large JSON attribute is always read even though it is not needed 90% of the time. Because the JSON data is huge compared to the other attributes, the provisioned read throughput needed is high. The issue can be resolved by reading only the small summary attributes for the history view and fetching the JSON details only when needed. Creating a GSI that projects just the small attributes to serve the transaction history view, and querying the primary table only for the JSON details, works perfectly.,Option A is wrong as it would affect the quality.,Option C is wrong as DynamoDB does not support foreign key references between tables.,Option D is wrong as the LSI would need to be created with the sort key as transaction ID instead of date; that would allow retrieving the transaction details from the LSI and the summary from the primary table.,Also, note that both options B & D need changes to the base table, as you cannot change the primary key of a table once it is created, nor can you create an LSI after the table is created. 正确答案: B 正确答案是B,因为关键要求是降低成本而不影响质量。这里的问题是,即使90%的时间不需要,庞大的JSON属性也总是被读取。由于JSON数据与其他属性相比非常大,因此所需的预置读取吞吐量很高。这个问题可以通过在历史视图中只读取较小的汇总属性,并仅在需要时获取JSON详情来解决。创建一个只投影小属性的GSI来提供交易历史视图,仅在查询JSON详情时访问主表,可以完美解决问题。 选项A是错误的,因为它会影响质量。 选项C错误,因为DynamoDB不支持表之间的外键引用。 选项D错误,因为LSI需要使用交易ID而不是日期作为排序键;这样才能从LSI中检索交易详情,从主表中获取摘要。 另外,请注意,B和D选项都需要对基础表进行更改,因为表一旦创建就无法更改主键,也无法在表创建后创建LSI。 32 / 100 分类: DBS 32. 32. Your client needs to load a 600 GB file into a Redshift cluster from S3, using the Redshift COPY command. The file has several known (and potentially some unknown) issues that will probably cause the load process to fail. How should the client most efficiently detect load errors without needing to perform cleanup if the load process fails? 32. 您的客户需要使用 Redshift COPY 命令将一个 600 GB 的文件从 S3 加载到 Redshift 集群中。该文件有几个已知的问题(可能还包括一些未知的问题),这些问题很可能导致加载过程失败。客户应该如何最有效地检测加载错误,而无需在加载过程失败时执行清理操作? A. A. Split the 600 GB file into smaller 25 GB chunks and load each separately. A. 将600 GB的文件拆分成更小的25 GB块,并分别加载每个块。 B. B.
Compress the input file before running COPY. B. 在运行COPY之前压缩输入文件。 C. C. Write a script to delete the data from the tables in case of errors. C. 编写脚本以在出现错误时从表中删除数据。 D. D. Use the COPY command with the NOLOAD parameter. D. 使用带有 NOLOAD 参数的 COPY 命令。 正确答案: D Correct answer is D as NOLOAD checks the integrity of all of the data without loading it into the database. The NOLOAD option displays any errors that would occur if you had attempted to load the data. All other options will require subsequent processing on the cluster which will consume resources.,Refer AWS documentation – Data Load Copy Parameters,If you want to validate your data without actually loading the table, use the NOLOAD option with the COPY command. 正确答案: D 正确答案是D,因为 NOLOAD 检查所有数据的完整性,而无需将其加载到数据库中。NOLOAD 选项会显示在尝试加载数据时可能出现的任何错误。所有其他选项都需要在集群上进行后续处理,这将消耗资源。 参考 AWS 文档 – 数据加载复制参数 如果你希望验证数据而不实际加载表格,请使用 COPY 命令中的 NOLOAD 选项。 33 / 100 分类: DBS 33. 33. A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each unit sends its sensor information in JSON format every two seconds for processing and analysis. The company also needs to consume this data to predict possible equipment problems before they occur. A few thousand pre-purchased units will be delivered in the next couple of months. The company expects high market growth in the next year and needs to handle a massive amount of data and scale without interruption. Which ingestion solution should the company use? 33. 一家制造和销售智能空调的公司还提供附加服务,让客户可以在移动应用程序或网页浏览器中查看实时仪表盘。每台空调每两秒钟以JSON格式发送其传感器信息进行处理和分析。公司还需要消耗这些数据,以预测设备可能出现的问题。在接下来的几个月里,几千台预购的空调将被交付。公司预计明年市场增长迅猛,需要处理海量数据并在不中断的情况下进行扩展。公司应该使用哪种数据摄取解决方案? A. A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications for the end-consumer dashboard and anomaly detection workflows. A. 将传感器数据记录写入 Amazon Kinesis Streams。使用 KCL 应用程序处理数据,以供终端消费者仪表盘和异常检测工作流使用。 B. B. 
Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data downstream to the end-consumer dashboard and to the anomaly detection application. B. 每15分钟将传感器数据批量传输到亚马逊简单存储服务(S3)。将数据流向下游,传送到最终消费者仪表板和异常检测应用程序。 C. C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3) as the destination. Consume the data with a KCL application for the end-consumer dashboard and anomaly detection. C. 将传感器数据记录写入 Amazon Kinesis Firehose,使用 Amazon Simple Storage Service (S3) 作为目标。通过 KCL 应用程序消费数据,用于终端用户仪表盘和异常检测。 D. D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-consumer dashboard and anomaly detection application on top of Amazon RDS. D. 将传感器数据记录写入亚马逊关系数据库服务(RDS)。在亚马逊RDS基础上构建最终消费者仪表板和异常检测应用程序。 正确答案: A Correct answer is A as Kinesis Data Streams can help handle the streaming data. Kinesis Streams provides you with the ability to build custom applications to process and analyze streaming data using KCL which can be used for anomaly detection and processing.,Refer AWS documentation – Kinesis Streams,Although you can use Kinesis Data Streams to solve a variety of streaming data problems, a common use is the real-time aggregation of data followed by loading the aggregate data into a data warehouse or map-reduce cluster.,Data is put into Kinesis data streams, which ensures durability and elasticity. The delay between the time a record is put into the stream and the time it can be retrieved (put-to-get delay) is typically less than 1 second. In other words, a Kinesis Data Streams application can start consuming the data from the stream almost immediately after the data is added. The managed service aspect of Kinesis Data Streams relieves you of the operational burden of creating and running a data intake pipeline. You can create streaming map-reduce–type applications. 
The elasticity of Kinesis Data Streams enables you to scale the stream up or down, so that you never lose data records before they expire.,Multiple Kinesis Data Streams applications can consume data from a stream, so that multiple actions, like archiving and processing, can take place concurrently and independently. For example, two applications can read data from the same stream. The first application calculates running aggregates and updates an Amazon DynamoDB table, and the second application compresses and archives data to a data store like Amazon Simple Storage Service (Amazon S3). The DynamoDB table with running aggregates is then read by a dashboard for up-to-the-minute reports.,The Kinesis Client Library enables fault-tolerant consumption of data from streams and provides scaling support for Kinesis Data Streams applications.,Option B is wrong as S3 with batching would not provide near real time analysis on the stream data.,Option C is wrong as Kinesis Firehose would only transfer the data to S3. It does not provide ability for KCL applications to work on the streaming data.,Option D is wrong as RDS is not ideal for real time streaming data. 
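As a sketch of option A's ingestion side, the snippet below shapes one unit's two-second sensor reading into the parameters a producer would pass to the Kinesis Data Streams `PutRecord` API. The stream name, unit ID, and JSON field names are illustrative assumptions, not values from the question:

```python
import json

def build_put_record(stream_name, unit_id, reading):
    """Build the parameters for a Kinesis Data Streams PutRecord call.

    Partitioning on the unit ID spreads thousands of units across shards
    while keeping each unit's readings ordered within its shard.
    """
    return {
        "StreamName": stream_name,
        "PartitionKey": unit_id,                      # groups one unit's records on a shard
        "Data": json.dumps(reading).encode("utf-8"),  # JSON sensor payload as bytes
    }

# Hypothetical reading sent every two seconds by one unit.
params = build_put_record(
    "ac-sensor-stream",
    "unit-0042",
    {"temp_c": 21.5, "ts": "2019-07-01T12:00:02Z"},
)
print(params["PartitionKey"])  # unit-0042
```

One KCL consumer for the dashboard and a second for anomaly detection could then read the same stream independently, as the explanation above notes.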
正确答案: A 正确答案是 A,因为 Kinesis 数据流可以帮助处理流数据。Kinesis Streams 提供了构建自定义应用程序的能力,使用 KCL 处理和分析流数据,这可以用于异常检测和处理。参考 AWS 文档 – Kinesis Streams。 尽管你可以使用 Kinesis 数据流解决各种流数据问题,一个常见的用法是对数据进行实时聚合,然后将聚合数据加载到数据仓库或 MapReduce 集群中。 数据被放入 Kinesis 数据流中,从而确保了持久性和弹性。从将记录放入流中到可以检索记录的时间(放入到获取延迟)通常小于 1 秒。换句话说,Kinesis 数据流应用程序几乎可以在数据添加后立即开始从流中消费数据。Kinesis 数据流的托管服务部分解除了你创建和运行数据输入管道的操作负担。你可以创建流式 MapReduce 类型的应用程序。Kinesis 数据流的弹性使你能够根据需要扩展或缩减流量,因此在数据记录过期之前,你不会丢失数据。 多个 Kinesis 数据流应用程序可以从一个流中消费数据,这样就可以并发且独立地执行多个操作,如归档和处理。例如,两个应用程序可以从同一个流中读取数据。第一个应用程序计算运行中的聚合并更新 Amazon DynamoDB 表,第二个应用程序将数据压缩并归档到像 Amazon Simple Storage Service(Amazon S3)这样的数据存储中。然后,包含运行中聚合的 DynamoDB 表被仪表板读取,用于生成最新的报告。 Kinesis 客户端库支持从流中容错地消费数据,并为 Kinesis 数据流应用程序提供扩展支持。 选项 B 是错误的,因为使用批处理的 S3 不提供对流数据的近实时分析。 选项 C 是错误的,因为 Kinesis Firehose 只会将数据传输到 S3。它不提供 KCL 应用程序对流数据进行操作的能力。 选项 D 是错误的,因为 RDS 不适合实时流数据。 34 / 100 分类: DBS 34. 34. A web application is using Amazon Kinesis Streams for clickstream data that may not be consumed for up to 12 hours. As a security requirement, how can the data be secured at rest within the Kinesis Streams? 34. 一个 web 应用程序正在使用 Amazon Kinesis Streams 处理点击流数据,这些数据可能会在最多 12 小时内没有被消费。作为安全要求,如何在 Kinesis Streams 中确保数据在静态时的安全性? A. A. Enable SSL connections to Kinesis A. 启用SSL连接到Kinesis B. B. Use Amazon Kinesis Consumer Library B. 使用Amazon Kinesis消费者库 C. C. Encrypt the data once it is at rest with a Lambda function C. 使用Lambda函数在数据静止时对其进行加密 D. D. Enable server-side encryption in Kinesis Streams D. 在Kinesis Streams中启用服务器端加密 正确答案: D Correct answer is D as Kinesis support Server Side Encryption with which the data can be encrypted at rest.,Refer AWS documentation – Kinesis Server Side Encryption,Server-side encryption is a feature in Amazon Kinesis Data Streams that automatically encrypts data before it’s at rest by using an AWS KMS customer master key (CMK) you specify. Data is encrypted before it’s written to the Kinesis stream storage layer, and decrypted after it’s retrieved from storage. 
As a result, your data is encrypted at rest within the Kinesis Data Streams service. This allows you to meet strict regulatory requirements and enhance the security of your data.,With server-side encryption, your Kinesis stream producers and consumers don’t need to manage master keys or cryptographic operations. Your data is automatically encrypted as it enters and leaves the Kinesis Data Streams service, so your data at rest is encrypted. AWS KMS provides all the master keys that are used by the server-side encryption feature. AWS KMS makes it easy to use a CMK for Kinesis that is managed by AWS, a user-specified AWS KMS CMK, or a master key imported into the AWS KMS service.,Option A is wrong as SSL/TLS is for Encryption for data in transit,Option B is wrong as Kinesis consumer library is for consumption of data.,Option C is wrong as the data is not encrypted when stored 正确答案: D 正确答案是 D,因为 Kinesis 支持服务器端加密,通过该功能可以对静态数据进行加密。参考 AWS 文档 – Kinesis 服务器端加密。 服务器端加密是 Amazon Kinesis 数据流中的一项功能,它通过使用您指定的 AWS KMS 客户主密钥(CMK)自动加密数据,使数据在静态时保持加密。数据在写入 Kinesis 流存储层之前就会被加密,取出存储后则会被解密。因此,您的数据在 Kinesis 数据流服务中处于静态加密状态。这使您能够满足严格的合规要求并增强数据的安全性。 通过服务器端加密,您的 Kinesis 流生产者和消费者无需管理主密钥或加密操作。数据在进入和离开 Kinesis 数据流服务时会自动加密,因此您的静态数据是加密的。AWS KMS 提供所有由服务器端加密功能使用的主密钥。AWS KMS 使得使用 AWS 管理的 CMK、用户指定的 AWS KMS CMK 或导入到 AWS KMS 服务中的主密钥变得简单。 选项 A 错误,因为 SSL/TLS 用于传输中的数据加密。 选项 B 错误,因为 Kinesis 消费者库用于数据消费。 选项 C 错误,因为数据在存储时未加密。 35 / 100 分类: DBS 35. 35. You’re launching a test Elasticsearch cluster with the Amazon Elasticsearch Service, and you’d like to restrict access to only your office desktop computer that you occasionally share with an intern to allow her to get more experience interacting with Elasticsearch. What’s the easiest way to do this? 35. 你正在使用Amazon Elasticsearch Service启动一个测试Elasticsearch集群,且希望将访问权限限制为仅限你偶尔与实习生共享的办公室桌面电脑,以便让她获得更多与Elasticsearch交互的经验。最简单的方法是什么? A. A. Create a username and password combination to allow you to sign into the cluster. A. 创建一个用户名和密码组合,以便让您能够登录集群。 B. B. 
Create an SSH key and add that to the accepted keys of the Elasticsearch cluster. Then store that SSH key on your desktop and use it to sign in. B. 创建一个SSH密钥,并将其添加到Elasticsearch集群的接受密钥中。然后将该SSH密钥保存在桌面上,并使用它进行登录。 C. C. Create an IAM user and role that allows access to the Elasticsearch cluster. C. 创建一个 IAM 用户和角色,以允许访问 Elasticsearch 集群。 D. D. Create an IP-based resource policy on the Elasticsearch cluster that allows access to requests coming from the IP of the machine. D. 在Elasticsearch集群上创建基于IP的资源策略,允许来自机器IP的请求访问。 正确答案: D Correct answer is D as an IP-based resource policy can restrict access to the specific IP address only.,Refer AWS documentation – Elasticsearch Access Control,IP-based Policies – IP-based policies restrict access to a domain to one or more IP addresses or CIDR blocks. Technically, IP-based policies are not a distinct type of policy. Instead, they are just resource-based policies that specify an anonymous principal and include a special Condition element.,The primary appeal of IP-based policies is that they allow unsigned requests to an Amazon ES domain, which lets you use clients like curl and Kibana or access the domain through a proxy server.,Options A & B are wrong as username/password sign-in and SSH keys do not apply to an Amazon ES domain.,Option C is wrong as you can define identity-based policies, but they would not limit access to the specific IP. 正确答案: D 正确答案是 D,因为基于 IP 的资源策略可以将访问限制为仅特定 IP 地址。请参考 AWS 文档 – Elasticsearch 访问控制。基于 IP 的策略 – 基于 IP 的策略将对域的访问限制到一个或多个 IP 地址或 CIDR 块。从技术上讲,基于 IP 的策略并不是一种独立的策略类型。相反,它们只是基于资源的策略,指定一个匿名主体并包括一个特殊的 Condition 元素。 基于 IP 的策略的主要吸引力在于它们允许对 Amazon ES 域进行未签名的请求,这使得您可以使用像 curl 和 Kibana 这样的客户端,或者通过代理服务器访问该域。 选项 A 和 B 错误,因为用户名/密码登录和 SSH 密钥不适用于 Amazon ES 域。 选项 C 错误,因为您可以定义基于身份的策略,但它不会将访问限制到特定 IP。 36 / 100 分类: DBS 36. 36. Your application development team is building a solution with two applications.
The security team wants each application’s logs to be captured in two different places because one of the applications produces logs with sensitive data. How can you meet the requirements with the least risk and effort? 36. 你的应用开发团队正在构建一个包含两个应用的解决方案。安全团队希望每个应用的日志都能记录在两个不同的地方,因为其中一个应用生成包含敏感数据的日志。你如何以最小的风险和工作量满足这些要求? A. A. Aggregate logs into one file, then use Amazon CloudWatch Logs and then design two CloudWatch metric filters to filter sensitive data from the logs. A. 将日志汇总到一个文件中,然后使用 Amazon CloudWatch Logs,再设计两个 CloudWatch 指标过滤器,从日志中过滤敏感数据。 B. B. Use Amazon CloudWatch logs to capture all logs, write an AWS Lambda function that parses the log file, and move sensitive data to a different log. B. 使用Amazon CloudWatch日志捕获所有日志,编写一个AWS Lambda函数解析日志文件,并将敏感数据移到不同的日志中。 C. C. Add logic to the application that saves sensitive data logs on the Amazon EC2 instances’ local storage, and write a batch script that logs into the EC2 instances and moves sensitive logs to a secure location. C. 向应用程序添加逻辑,将敏感数据日志保存在Amazon EC2实例的本地存储中,并编写一个批处理脚本,该脚本登录到EC2实例并将敏感日志移动到安全位置。 D. D. Use Amazon CloudWatch logs with two log groups, one for each application, and use an AWS IAM policy to control access to the log groups as required. D. 使用 Amazon CloudWatch 日志,创建两个日志组,每个应用程序一个,并使用 AWS IAM 策略根据需要控制对日志组的访问。 正确答案: D Correct answer is D as separate CloudWatch log groups can be created, each with its own access control policy.,Refer AWS documentation – CloudWatch Log Groups,A log group is a group of log streams that share the same retention, monitoring, and access control settings. You can define log groups and specify which streams to put into each group. There is no limit on the number of log streams that can belong to one log group.,Option A is wrong as the log files are still combined and can be accessed by anyone who has access.,Option B is wrong as it is an overhead to create a Lambda function, and the sensitive data would still land in the shared log first.,Option C is wrong as this would need application changes.
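To illustrate how option D's access control could look, the sketch below builds an identity-based IAM policy granting read access to a single CloudWatch Logs log group; the region, account ID, and log group name are placeholders, not values from the question:

```python
import json

def read_policy_for_log_group(region, account_id, group_name):
    """Identity-based IAM policy granting read-only access to one log group."""
    arn = f"arn:aws:logs:{region}:{account_id}:log-group:{group_name}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["logs:GetLogEvents", "logs:FilterLogEvents"],
            "Resource": arn,  # only this group's log streams
        }],
    }

# Placeholder region/account/group; the sensitive application would write to a
# second log group that this policy deliberately omits.
policy = read_policy_for_log_group("us-east-1", "123456789012", "/app/non-sensitive")
print(json.dumps(policy, indent=2))
```

Attaching this policy to a group of engineers lets them read the non-sensitive application's log group while the sensitive one stays restricted to a smaller set of principals.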
正确答案: D 正确答案是D,因为可以创建不同的CloudWatch日志组,并且这些日志组可以拥有独立的访问控制策略。参考AWS文档 – CloudWatch日志组。 日志组是共享相同保留、监控和访问控制设置的日志流的集合。您可以定义日志组并指定将哪些流放入每个组中。一个日志组可以包含任意数量的日志流,没有限制。 选项A是错误的,因为日志文件仍然被合并,并且可以被任何有访问权限的人访问。 选项B是错误的,因为创建Lambda函数是一个额外的开销。 选项C是错误的,因为这将需要对应用程序进行更改。 37 / 100 分类: DBS 37. 37. There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files contain retail order information for the past 2 years. A data engineer needs to run multiple interactive queries to manipulate the data. The Data Engineer has AWS access to spin up an Amazon EMR cluster. The data engineer needs to use an application on the cluster to process this data and return the results in interactive time frame. Which application on the cluster should the data engineer use? 37. 亚马逊S3上有数千个文本文件。文件的总大小为1PB。这些文件包含过去两年的零售订单信息。数据工程师需要运行多个交互式查询来处理数据。数据工程师有AWS访问权限,可以启动一个Amazon EMR集群。数据工程师需要在集群上使用一个应用程序来处理这些数据,并在交互式时间范围内返回结果。数据工程师应该使用集群上的哪个应用程序? A. A. Oozie A. Oozie B. B. Apache Pig with Tachyon B. Apache Pig与Tachyon C. C. Apache Hive C. Apache Hive D. D. Presto D. Presto 正确答案: D Correct answer is D as Presto can help work on Petabytes of data with the interactive ability.,Refer AWS documentation – EMR Presto,Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can process data from multiple data sources including the Hadoop Distributed File System (HDFS) and Amazon S3.,You can quickly and easily create managed Presto clusters from the AWS Management Console, AWS CLI, or the Amazon EMR API. 
Additionally, you can leverage additional Amazon EMR features, including fast Amazon S3 connectivity, integration with Amazon EC2 Spot instances, choice of a wide variety of Amazon EC2 instances, including the memory optimized instances, and resize commands to easily add or remove instances from your cluster. Presto uses a custom query execution engine with operators designed to support SQL semantics. Different from Hive/MapReduce, Presto executes queries in memory, pipelined across the network between stages, thus avoiding unnecessary I/O. The pipelined execution model runs multiple stages in parallel and streams data from one stage to the next as it becomes available. Run interactive queries that directly access data in Amazon S3, save costs using Amazon EC2 Spot instance capacity, use Auto Scaling to dynamically add and remove capacity, and launch long-running or ephemeral clusters to match your workload. You can also add other Hadoop ecosystem applications on your cluster. Option A is wrong as Oozie is primarily a workflow scheduler, not a query engine. Options B & C are wrong as they run on MapReduce and are not suited to interactive queries. 正确答案: D 正确答案是D,因为Presto可以通过交互式能力处理PB级的数据。请参考AWS文档 – EMR Presto。 Presto是一个开源的分布式SQL查询引擎,优化用于低延迟、临时数据分析。它支持ANSI SQL标准,包括复杂查询、聚合、连接和窗口函数。Presto可以处理来自多个数据源的数据,包括Hadoop分布式文件系统(HDFS)和Amazon S3。 您可以通过AWS管理控制台、AWS CLI或Amazon EMR API快速轻松地创建托管的Presto集群。此外,您还可以利用Amazon EMR的其他功能,包括快速的Amazon S3连接、与Amazon EC2 Spot实例的集成、选择各种Amazon EC2实例,包括内存优化实例,以及调整大小命令,以轻松地向集群中添加或删除实例。 Presto使用一个自定义的查询执行引擎,带有支持SQL语义的操作符。与Hive/MapReduce不同,Presto在内存中执行查询,跨网络管道传输各个阶段的数据,从而避免了不必要的I/O。管道执行模型并行运行多个阶段,并在数据可用时将数据从一个阶段流向下一个阶段。 运行直接访问Amazon S3中数据的交互式查询,使用Amazon EC2 Spot实例容量节省成本,使用Auto Scaling动态地添加和删除容量,启动长期运行或临时集群以匹配您的工作负载。您还可以在集群中添加其他Hadoop生态系统应用程序。 选项A是错误的,因为Oozie更多的是一个调度工具。 选项B和C是错误的,因为它们将与MapReduce一起工作。 38 / 100 分类: DBS 38. 38. A company hosts a web application on AWS which uses an RDS instance to store critical data. As part of a security audit, hardening of the RDS instance was recommended.
What actions would help achieve the same? (Select TWO) 38. 一家公司在AWS上托管一个Web应用程序,该应用程序使用RDS实例存储关键数据。作为安全审核的一部分,建议加固RDS实例。哪些操作有助于实现这一目标?(选择两个) A. A. Use Secure Socket Layer (SSL) connections with DB instances A. 使用安全套接字层(SSL)连接与数据库实例。 B. B. Use AWS CloudTrail to track all the SSH access to the RDS instance B. 使用 AWS CloudTrail 跟踪所有对 RDS 实例的 SSH 访问 C. C. Use AWS Inspector to apply patches to the RDS instance C. 使用AWS Inspector对RDS实例应用补丁 D. D. Use RDS encryption to secure the RDS instances and snapshots at rest. D. 使用RDS加密来保护RDS实例和快照的静态数据。 正确答案: A, D Correct answers are A and D, as RDS security can be tightened using SSL connections and encryption at rest. Refer AWS documentation – RDS Security. Option B is wrong as you cannot SSH into an RDS instance, and CloudTrail does not track SSH logins. Option C is wrong as the RDS instance is AWS managed. 正确答案: A, D 正确答案是 A, D,因为可以通过使用 SSL 连接和加密来增强 RDS 安全性。参见 AWS 文档 – RDS 安全性。 选项 B 错误,无法通过 SSH 访问 RDS 实例,并且 CloudTrail 不跟踪 SSH 登录。 选项 C 错误,因为 RDS 实例由 AWS 管理。 39 / 100 分类: DBS 39. 39. A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls from the process to follow to add new users down to the physical controls of the data center, including items like security guards and cameras. How should this control mapping be achieved using AWS? 39. 一名数据工程师选择了 Amazon DynamoDB 作为受监管应用程序的数据存储。这款应用程序必须提交给监管机构进行审查。数据工程师需要提供一个控制框架,列出从添加新用户的过程到数据中心的物理控制(如保安和摄像头)等项的安全控制。应该如何使用 AWS 实现这一控制映射? A. A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided. A. 请求AWS第三方审计报告和/或AWS质量附录,并将AWS的责任映射到必须提供的控制措施。 B. B. Request data center Temporary Auditor access to an AWS data center to verify the control mapping. B. 请求数据中心临时审计员访问AWS数据中心,以验证控制映射。 C. C.
Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the application’s architecture to map to the control framework. C. 请求与Amazon DynamoDB相关的服务级别协议(SLA)和安全指南,并在应用程序架构中定义这些指南,以便映射到控制框架。 D. D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the control that must be provided. D. 请求Amazon DynamoDB系统架构设计,以确定如何将AWS的责任映射到必须提供的控制措施。 正确答案: A Correct answer is A as these are AWS specific and not accessible directly. AWS provides access to third party audit reports to confirm the same.,Refer AWS documentation – Risk Compliance Whitepaper,AWS and its customers share control over the IT environment, both parties have responsibility for managing the IT environment. AWS’ part in this shared responsibility includes providing its services on a highly secure and controlled platform and providing a wide array of security features customers can use. The customers’ responsibility includes configuring their IT environments in a secure and controlled manner for their purposes. While customers don’t communicate their use and configurations to AWS, AWS does communicate its security and control environment relevant to customers. AWS does this by doing the following:,Option B is wrong as AWS does not allow access to the data center.,Option C is wrong as security guidelines is not specific to DynamoDB and it is pretty much customer controlled.,Option D is wrong as AWS does not share any DynamoDB system architecture design document. 正确答案: A 正确答案是A,因为这些是AWS特有的,不能直接访问。AWS提供了第三方审计报告的访问权限以确认这一点。参阅AWS文档 – 风险合规白皮书。 AWS和其客户共享IT环境的控制权,双方都有责任管理IT环境。AWS在这种共享责任中的角色包括在高度安全和受控的平台上提供其服务,并提供一系列安全功能供客户使用。客户的责任包括以安全和受控的方式配置他们的IT环境,以满足他们的需求。虽然客户不向AWS传达他们的使用和配置情况,但AWS会传达与客户相关的安全和控制环境。AWS通过以下方式做到这一点: 选项B是错误的,因为AWS不允许访问数据中心。 选项C是错误的,因为安全指南不是DynamoDB特有的,而且它基本上由客户控制。 选项D是错误的,因为AWS不会共享任何DynamoDB系统架构设计文档。 40 / 100 分类: DBS 40. 40. 
You need to filter and transform incoming messages coming from a smart sensor you have connected with AWS. Once messages are received, you need to store them as time series data in DynamoDB. Which AWS service can you use? 40. 你需要过滤并转换来自你连接到AWS的智能传感器的输入消息。一旦消息被接收,你需要将它们作为时间序列数据存储在DynamoDB中。你可以使用哪个AWS服务? A. A. IoT Device Shadow Service A. 物联网设备影子服务 B. B. Redshift B. Redshift C. C. Kinesis C. Kinesis D. D. IoT Rules Engine D. 物联网规则引擎 正确答案: D Correct answer is D as IoT Rules Engine can be used to capture data from Sensor and data received from the device can be inserted into DynamoDB.,Refer AWS documentation – AWS IoT Rules,Rules give your devices the ability to interact with AWS services. Rules are analyzed and actions are performed based on the MQTT topic stream. You can use rules to support tasks like these:,Option A is wrong as a device’s shadow is a JSON document that is used to store and retrieve current state information for a device. The Device Shadow service maintains a shadow for each device you connect to AWS IoT. You can use the shadow to get and set the state of a device over MQTT or HTTP, regardless of whether the device is connected to the Internet. Each device’s shadow is uniquely identified by the name of the corresponding thing.,Option B is wrong as Redshift is a data warehousing solution.,Option C is wrong as while Kinesis could technically be used as an intermediary between different sources, it isn’t a great way to get data into DynamoDB from an IoT device. 正确答案: D 正确答案是 D,因为 IoT 规则引擎可以用来捕获传感器的数据,并且从设备接收到的数据可以插入到 DynamoDB 中。参考 AWS 文档 – AWS IoT 规则,规则赋予你的设备与 AWS 服务互动的能力。规则会被分析,并根据 MQTT 主题流执行相应的操作。你可以使用规则支持如下任务: 选项 A 错误,因为设备的影像是一个 JSON 文档,用于存储和检索设备的当前状态信息。设备影像服务为你连接到 AWS IoT 的每个设备维护一个影像。你可以使用影像通过 MQTT 或 HTTP 获取和设置设备的状态,无论设备是否连接到互联网。每个设备的影像都通过对应事物的名称唯一标识。 选项 B 错误,因为 Redshift 是一个数据仓库解决方案。 选项 C 错误,因为虽然 Kinesis 在技术上可以作为不同来源之间的中介,但它并不是一个理想的方式来从 IoT 设备将数据传输到 DynamoDB。 41 / 100 分类: DBS 41. 41. 
An administrator is processing events in near real-time using Kinesis streams and Lambda. Lambda intermittently fails to process batches from one of the shards due to a 15-minute time limit. What is a possible solution for this problem? 41. 一名管理员正在使用 Kinesis 流和 Lambda 处理接近实时的事件。由于 15 分钟的时间限制,Lambda 间歇性地无法处理来自其中一个分片的批次。这个问题的一个可能解决方案是什么? A. A. Add more Lambda functions to improve concurrent batch processing. A. 添加更多Lambda函数以提高并发批处理能力。 B. B. Reduce the batch size that Lambda is reading from the stream. B. 减少Lambda从流中读取的批量大小。 C. C. Ignore and skip events that are older than 15 minutes and put them to Dead Letter Queue (DLQ). C. 忽略并跳过超过15分钟的事件,并将它们放入死信队列(DLQ)。 D. D. Configure Lambda to read from fewer shards in parallel. D. 配置Lambda以并行读取更少的分片。 正确答案: B Correct answer is B as Lambda reads in batches from Kinesis from a single shard, and hence it might timeout if the batch of records is huge.,Refer AWS documentation – Lambda with Kinesis,You can use an AWS Lambda function to process records in an Amazon Kinesis data stream. With Kinesis, you can collect data from many sources and process them with multiple consumers. Lambda supports standard data stream iterators and HTTP/2 stream consumers.,Lambda reads records from the data stream and invokes your function synchronously with an event that contains stream records. Lambda reads records in batches and invokes your function to process records from the batch.,Your Lambda function is a consumer application for your data stream. It processes one batch of records at a time from each shard.,For standard iterators, Lambda polls each shard in your Kinesis stream for records at a base rate of once per second. When more records are available, Lambda keeps processing batches until it receives a batch that’s smaller than the configured maximum batch size. 
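The fix in option B is a change to the Lambda event source mapping, not to the stream itself. A minimal sketch of the update request follows; the mapping UUID and the chosen batch size are hypothetical, and the actual call is shown commented out since it requires AWS credentials.

```python
# Sketch of option B: shrink the batch Lambda reads per invocation from Kinesis.
# The mapping UUID and batch size are hypothetical example values.
def kinesis_mapping_update(mapping_uuid: str, batch_size: int) -> dict:
    """Parameters for UpdateEventSourceMapping to reduce the per-invoke batch."""
    return {
        "UUID": mapping_uuid,     # identifies the existing stream->function mapping
        "BatchSize": batch_size,  # fewer records per invocation => shorter runs
    }

params = kinesis_mapping_update("hypothetical-mapping-uuid", batch_size=100)
# boto3.client("lambda").update_event_source_mapping(**params)
```

With smaller batches, each invocation processes fewer records and stays well under the 15-minute limit, at the cost of more frequent invocations.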
The function shares read throughput with other consumers of the shard.,Option A is wrong as adding Lambda function does not reduce the batch of records read from Kinesis.,Option C is wrong as ignoring the data would lead to data loss and does not solve the problem,Option D is wrong as Lambda function reads from a single Kinesis shard. 正确答案: B 正确答案是 B,因为 Lambda 从 Kinesis 中的单个分片按批次读取数据,如果记录批次太大,可能会超时。参考 AWS 文档 – Lambda 与 Kinesis。 您可以使用 AWS Lambda 函数来处理 Amazon Kinesis 数据流中的记录。通过 Kinesis,您可以从多个来源收集数据并通过多个消费者进行处理。Lambda 支持标准数据流迭代器和 HTTP/2 流消费者。 Lambda 从数据流中读取记录,并通过包含流记录的事件同步调用您的函数。Lambda 按批次读取记录,并调用您的函数处理批次中的记录。 您的 Lambda 函数是数据流的消费者应用程序。它一次处理每个分片中的一批记录。 对于标准迭代器,Lambda 以每秒一次的基本速率轮询您的 Kinesis 流中的每个分片。当有更多记录时,Lambda 会继续处理批次,直到收到一个小于配置的最大批次大小的批次。该函数与其他消费者共享分片的读取吞吐量。 选项 A 错误,因为添加 Lambda 函数不会减少从 Kinesis 中读取的记录批次。 选项 C 错误,因为忽略数据会导致数据丢失,并不能解决问题。 选项 D 错误,因为 Lambda 函数是从单个 Kinesis 分片中读取数据。 42 / 100 分类: DBS 42. 42. A company is using Kinesis data streams to store the log data, which is processed by an application every 12 hours. As the data needs to reside in Kinesis data streams for 12 hours, the Security team wants the data to be encrypted at rest. How can it be secured in a most efficient way? 42. 一家公司正在使用 Kinesis 数据流存储日志数据,数据每 12 小时由应用程序处理一次。由于数据需要在 Kinesis 数据流中存储 12 小时,安全团队希望数据在静态时进行加密。如何以最有效的方式保护数据安全? A. A. Kinesis does not support encryption A. Kinesis 不支持加密 B. B. Encrypt using SSL/TLS for encrypting the data. B. 使用SSL/TLS加密数据。 C. C. Encrypt using S3 Server Side Encryption. C. 使用S3服务器端加密进行加密。 D. D. Encrypt using Kinesis Server Side Encryption. D. 使用Kinesis服务器端加密进行加密。 正确答案: D Correct answer is D as Kinesis support Server Side Encryption with which the data can be encrypted at rest.,Refer AWS documentation – Kinesis Server Side Encryption,Server-side encryption is a feature in Amazon Kinesis Data Streams that automatically encrypts data before it’s at rest by using an AWS KMS customer master key (CMK) you specify. 
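Enabling the server-side encryption described above is a single `StartStreamEncryption` call on the existing stream. A sketch follows; the stream name is hypothetical, while `alias/aws/kinesis` is the AWS-managed KMS key for Kinesis (a customer-managed CMK could be supplied instead).

```python
# Sketch of option D: turn on Kinesis server-side encryption with KMS.
# The stream name is a hypothetical example.
def sse_request(stream_name: str, key_id: str) -> dict:
    """Parameters for StartStreamEncryption on an existing data stream."""
    return {
        "StreamName": stream_name,
        "EncryptionType": "KMS",  # the only supported server-side encryption type
        "KeyId": key_id,          # AWS-managed key alias or a customer CMK ARN
    }

req = sse_request("application-log-stream", "alias/aws/kinesis")
# boto3.client("kinesis").start_stream_encryption(**req)
```

Producers and consumers need no code changes; records are encrypted on write and decrypted on read by the service.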
Data is encrypted before it’s written to the Kinesis stream storage layer, and decrypted after it’s retrieved from storage. As a result, your data is encrypted at rest within the Kinesis Data Streams service. This allows you to meet strict regulatory requirements and enhance the security of your data.,With server-side encryption, your Kinesis stream producers and consumers don’t need to manage master keys or cryptographic operations. Your data is automatically encrypted as it enters and leaves the Kinesis Data Streams service, so your data at rest is encrypted. AWS KMS provides all the master keys that are used by the server-side encryption feature. AWS KMS makes it easy to use a CMK for Kinesis that is managed by AWS, a user-specified AWS KMS CMK, or a master key imported into the AWS KMS service.,Option A is wrong as Kinesis supports encryption at rest,Option B is wrong as SSL/TLS is for Encryption for data in transit,Option C is wrong as S3 SSE does not work with Kinesis 正确答案: D 正确答案是D,因为Kinesis支持服务器端加密,可以在数据静止时进行加密。参考AWS文档 – Kinesis服务器端加密。 服务器端加密是Amazon Kinesis数据流中的一项功能,它会在数据静止之前自动加密数据,使用您指定的AWS KMS客户主密钥(CMK)。数据在写入Kinesis流存储层之前会被加密,从存储中检索后再解密。因此,您的数据在Kinesis数据流服务内静止时是加密的。这使得您能够满足严格的合规要求并增强数据的安全性。 使用服务器端加密时,您的Kinesis流生产者和消费者无需管理主密钥或加密操作。您的数据在进入和离开Kinesis数据流服务时会自动加密,因此您的静态数据是加密的。AWS KMS提供了所有用于服务器端加密功能的主密钥。AWS KMS使得使用由AWS管理的CMK、用户指定的AWS KMS CMK或导入到AWS KMS服务中的主密钥变得更加容易。 选项A是错误的,因为Kinesis支持静态数据加密。 选项B是错误的,因为SSL/TLS用于传输中的数据加密。 选项C是错误的,因为S3 SSE不能与Kinesis一起使用。 43 / 100 分类: DBS 43. 43. A company needs a churn prevention model to predict which customers will NOT renew their yearly subscription to the company’s service. The company plans to provide these customers with a promotional offer. A binary classification model that uses Amazon Machine Learning is required. On which basis should this binary classification model be built? 43. 一家公司需要一个流失预防模型,以预测哪些客户不会续订他们每年的订阅服务。公司计划向这些客户提供促销优惠。需要使用亚马逊机器学习的二分类模型。该二分类模型应基于什么基础进行构建? A. A. 
User profiles (age, gender, income, occupation) A. 用户资料(年龄、性别、收入、职业) B. B. Last user session B. 上次用户会话 C. C. Each user time series events in the past 3 months C. 每个用户过去3个月的时间序列事件 D. D. Quarterly results D. 季度结果 正确答案: C Correct answer is C as the time series data regarding the usage of the customer can give insights and help build and train the model,Refer AWS documentation – AWS Machine Learning Churn prediction,Options A & D are wrong as they do not give any idea of customer behaviour.,Option B is wrong as the data is too limited. 正确答案: C 正确答案是C,因为关于客户使用情况的时间序列数据可以提供洞察力,并帮助构建和训练模型,参考AWS文档 – AWS机器学习流失预测。 A和D选项是错误的,因为它们没有提供任何关于客户行为的线索。 B选项是错误的,因为数据过于有限。 44 / 100 分类: DBS 44. 44. A company launched EMR cluster to support their big data analytics requirements. They have multiple data sources built out of S3, SQL databases, MongoDB, Redis, RDS, other file systems. They are looking for distributed processing framework and programming model that helps you do machine learning, stream processing, or graph analytics using Amazon EMR clusters Which EMR Hadoop ecosystem fulfils the requirements? 44. 一家公司启动了 EMR 集群以支持他们的大数据分析需求。他们有多个数据源,包括 S3、SQL 数据库、MongoDB、Redis、RDS 以及其他文件系统。他们正在寻找一个分布式处理框架和编程模型,能够帮助他们使用 Amazon EMR 集群进行机器学习、流处理或图形分析。哪个 EMR Hadoop 生态系统能够满足这些需求? A. A. Apache Hive A. Apache Hive B. B. Apache HBase B. Apache HBase C. C. Apache HCatalog C. Apache HCatalog D. D. Apache Spark D. Apache Spark 正确答案: D Correct answer is D as Apache Spark is a distributed processing framework and programming model that helps you do machine learning, stream processing, or graph analytics using Amazon EMR clusters. Similar to Apache Hadoop, Spark is an open-source,distributed processing system commonly used for big data workloads. However, Spark has several notable differences from Hadoop MapReduce. 
Spark has an optimized directed acyclic graph (DAG) execution engine and actively caches data in-memory, which can boost performance, especially for certain algorithms and interactive queries.,Option A is wrong as Hive is an open-source, data warehouse, and analytic package that runs on top of a Hadoop cluster. Hive scripts use an SQL-like language called Hive QL (query language) that abstracts programming models and supports typical data warehouse interactions. Hive enables you to avoid the complexities of writing Tez jobs based on directed acyclic graphs (DAGs) or MapReduce programs in a lower level computer language, such as Java. Hive extends the SQL paradigm by including serialization formats.,Option B is wrong as HBase is an open source, non-relational, distributed database developed as part of the Apache Software Foundation’s Hadoop project. HBase runs on top of Hadoop Distributed File System (HDFS) to provide non- relational database capabilities for the Hadoop ecosystem. HBase works seamlessly with Hadoop, sharing its file system and serving as a direct input and output to the MapReduce framework and execution engine. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).,Option C is wrong as HCatalog is a tool that allows you to access Hive metastore tables within Pig, Spark SQL, and/or custom MapReduce applications. HCatalog has a REST interface and command line client that allows you to create tables or do other operations. 
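As an illustration of running Spark work on EMR, a PySpark job can be submitted as a cluster step through `command-runner.jar`. The step name, S3 script location, and job flow ID below are hypothetical; the boto3 call is commented out since it needs a live cluster.

```python
# Sketch: submitting a PySpark script to an EMR cluster as a spark-submit step.
# The step name, S3 URI, and cluster ID are hypothetical examples.
def spark_step(script_s3_uri: str) -> dict:
    """Build an EMR step that runs a PySpark script via spark-submit."""
    return {
        "Name": "order-analytics",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # EMR helper for running cluster commands
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
        },
    }

step = spark_step("s3://example-bucket/jobs/orders.py")
# boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```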
正确答案: D 正确答案是D,因为Apache Spark是一个分布式处理框架和编程模型,帮助你使用Amazon EMR集群进行机器学习、流处理或图分析。与Apache Hadoop类似,Spark是一个开源的分布式处理系统,通常用于大数据工作负载。然而,Spark与Hadoop MapReduce有几个显著的区别。Spark具有优化的有向无环图(DAG)执行引擎,并且积极地将数据缓存到内存中,这可以提升性能,尤其是在某些算法和交互式查询中。 选项A是错误的,因为Hive是一个开源的数据仓库和分析包,运行在Hadoop集群之上。Hive脚本使用类似SQL的语言,称为Hive QL(查询语言),它抽象了编程模型,并支持典型的数据仓库交互。Hive使你能够避免编写基于有向无环图(DAG)或MapReduce程序的Tez作业的复杂性,Tez作业使用较低级别的计算机语言,如Java。Hive通过包括序列化格式来扩展SQL范式。 选项B是错误的,因为HBase是一个开源的非关系型分布式数据库,作为Apache Software Foundation的Hadoop项目的一部分进行开发。HBase运行在Hadoop分布式文件系统(HDFS)之上,为Hadoop生态系统提供非关系型数据库功能。HBase与Hadoop无缝协作,共享其文件系统,并作为MapReduce框架和执行引擎的直接输入和输出。HBase还与Apache Hive集成,支持对HBase表进行类似SQL的查询、与基于Hive的表进行联接,并支持Java数据库连接(JDBC)。 选项C是错误的,因为HCatalog是一个工具,允许你在Pig、Spark SQL和/或自定义MapReduce应用程序中访问Hive元存储表。HCatalog具有REST接口和命令行客户端,允许你创建表或执行其他操作。 45 / 100 分类: DBS 45. 45. Your company produces customer-commissioned one-of-a-kind skiing helmets combining high fashion with custom technical enhancements. Customers can show off their individuality on the ski slopes and have access to head-up displays, GPS rear-view cams, and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data-rich and complex, including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards. Assessments are a mixture of human and automated checks. You need to add a new set of assessments to model the failure modes of the custom electronics using GPUs with CUDA across a cluster of servers with low-latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time? 45. 贵公司生产客户定制的独一无二的滑雪头盔,结合了高端时尚与定制技术增强功能。客户可以在滑雪场上展示他们的个性,并且可以使用抬头显示器、GPS后视摄像头以及他们希望嵌入头盔中的任何其他技术创新。当前的制造过程数据丰富且复杂,包括评估,以确保用于组装头盔的定制电子设备和材料符合最高标准。评估是人工和自动评估的结合,您需要添加一套新的评估来模拟定制电子设备的故障模式,使用GPU与CUDA在低延迟网络的服务器集群上进行计算。什么架构可以允许您使用混合方法自动化现有过程,并确保该架构能够支持过程随着时间的推移而发展? A. A.
Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. A. 使用AWS数据管道管理数据和元数据的移动以及评估使用G2实例的自动扩展组并将其放置在一个放置组中。 B. B. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. B. 使用Amazon Simple Workflow (SWF) 来管理评估、数据和元数据的移动。使用放置组中的G2实例自动扩展组。 C. C. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). C. 使用 Amazon Simple Workflow (SWF) 来管理评估、数据和元数据的移动。使用带有 SR-IOV(单根 I/O 虚拟化)的 C3 实例的自动扩展组。 D. D. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). D. 使用AWS数据管道来管理数据和元数据的移动,并且评估使用C3的自动扩展组和SR-IOV(单根I/O虚拟化)。 正确答案: B Key point here: a hybrid workflow with both automated and manual tasks, plus the need for GPUs with CUDA and low-latency networking. Correct answer is B as SWF supports both human and automated assessments, with G2 instances in a placement group providing GPUs and low-latency networking. Options A & D are wrong as AWS Data Pipeline cannot coordinate the human assessments required by the hybrid approach. Options C & D are wrong as C3 instances with SR-IOV provide enhanced networking but no GPUs. 正确答案: B 关键点在于混合工作流,包含自动化和手动任务,并且需要回放功能,同时还需要使用带有低延迟网络的CUDA实例的GPU。正确答案是B,因为SWF提供了同时进行人工和自动化评估的能力,且G2实例在放置组中提供GPU和低延迟网络。 A选项和D选项错误,因为它们涉及了包含人工评估的混合方法。 C选项和D选项错误,因为C3和SR-IOV无法提供GPU,并且增强网络需要被启用。 46 / 100 分类: DBS 46. 46. A company operates an international business served from a single AWS region. The company wants to expand into a new country. The regulator for that country requires the Data Architect to maintain a log of financial transactions in the country within 24 hours of the product transaction. The production application is latency insensitive.
The new country contains another AWS region. What is the most cost-effective way to meet this requirement? 46. 一家公司运营着一个从单一AWS区域提供服务的国际业务。该公司希望扩展到一个新国家。该国的监管机构要求数据架构师在产品交易后的24小时内,维护该国的金融交易日志。生产应用程序对延迟不敏感。新国家包含另一个AWS区域。满足此要求的最具成本效益的方式是什么? A. A. Use CloudFormation to replicate the production application to the new region. A. 使用CloudFormation将生产应用程序复制到新区域。 B. B. Use Amazon CloudFront to serve application content locally in the country; Amazon CloudFront logs will satisfy the requirement. B. 使用Amazon CloudFront在本国本地提供应用内容;Amazon CloudFront日志将满足该要求。 C. C. Continue to serve customers from the existing region while using Amazon Kinesis to stream transaction data to the regulator. C. 在使用Amazon Kinesis将交易数据流式传输到监管机构的同时,继续为现有区域的客户提供服务。 D. D. Use Amazon S3 cross-region replication to copy and persist production transaction logs to a bucket in the new country’s region. D. 使用Amazon S3跨区域复制将生产交易日志复制并持久化到新国家区域的存储桶中。 正确答案: D Correct answer is D as only the logs need to be maintained in the new country, S3 cross region replication can be used to copy the data to the AWS region within the new Country.,Option A is wrong as there is not need for replication the complete application,Option B is wrong as CloudFront logs would only provide access logs and maybe not hold the financial transaction logs.,Option C is wrong as using Kinesis would need to build and host client application and have data storage charges as well. 正确答案: D 正确答案是D,因为只需要在新国家维护日志,可以使用S3跨区域复制将数据复制到新国家的AWS区域。选项A错误,因为不需要复制完整的应用程序。选项B错误,因为CloudFront日志只提供访问日志,可能不包含财务交易日志。选项C错误,因为使用Kinesis需要构建和托管客户端应用程序,并且还会产生数据存储费用。 47 / 100 分类: DBS 47. 47. You have recently joined a startup company building sensors to measure street noise and air quality in urban areas. The company has been running a pilot deployment of around 100 sensors for 3 months. Each sensor uploads 1KB of sensor data every minute to a backend hosted on AWS. 
During the pilot, you measured a peak of 10 IOPS on the database, and you stored an average of 3GB of sensor data per month in the database. The current deployment consists of a load-balanced, auto-scaled ingestion layer using EC2 instances and a PostgreSQL RDS database with 500GB standard storage. The pilot is considered a success and your CEO has managed to get the attention of some potential investors. The business plan requires a deployment of at least 100K sensors, which needs to be supported by the backend. You also need to store sensor data for at least two years to be able to compare year-over-year improvements. To secure funding, you have to make sure that the platform meets these requirements and leaves room for further scaling. Which setup will meet the requirements? 47. 你最近加入了一家创业公司,该公司正在建设用于测量城市地区街道噪音和空气质量的传感器。该公司已经运行了一个大约100个传感器的试点部署,已经持续了3个月。每个传感器每分钟向托管在AWS上的后端上传1KB的传感器数据。在试点期间,你在数据库上测量到了最高10次I/O操作每秒,并且每月在数据库中存储了平均3GB的传感器数据。当前的部署由一个负载均衡的自动扩展的摄取层组成,使用EC2实例和一个500GB标准存储的PostgreSQL RDS数据库。试点被认为是成功的,你的CEO成功吸引了一些潜在投资者的关注。商业计划要求至少部署10万个传感器,并且需要通过后端来支持。你还需要存储至少两年的传感器数据,以便能够比较逐年改进。为了确保资金到位,你必须确保平台满足这些要求并为进一步扩展留下空间。哪种设置将满足这些要求? A. A. Add an SQS queue to the ingestion layer to buffer writes to the RDS instance A. 在数据摄取层添加一个SQS队列,用于缓冲写入RDS实例的操作 B. B. Ingest data into a DynamoDB table and move old data to a Redshift cluster B. 将数据导入DynamoDB表,并将旧数据迁移到Redshift集群 C. C. Replace the RDS instance with a 6-node Redshift cluster with 96TB of storage C. 将RDS实例替换为一个6节点的Redshift集群,具有96TB的存储 D. D. Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS D.
保持当前架构,但将RDS存储升级为3TB,并配置10K预配置IOPS 正确答案: B Key point here is backend supporting the data with 2 years retention and architecture being scalable,Correct answer is B as DynamoDB can be used to support the ingestion throughput via autoscaled instances and later store data into Redshift for analysis,Option A & D are wrong as RDS would not be scalable and performant with high input rate and storage for 2 years,Option C is wrong as Redshift is designed for data warehousing and would not be able to support the ingestion throughput 正确答案: B 这里的关键点是后端支持两年数据保留并且架构可扩展,正确答案是B,因为DynamoDB可以通过自动扩展的实例支持数据摄取吞吐量,并将数据存储到Redshift中进行分析,选项A和D是错误的,因为RDS在面对高输入速率和两年存储时无法提供可扩展性和性能,选项C是错误的,因为Redshift是为数据仓库设计的,无法支持数据摄取吞吐量。 48 / 100 分类: DBS 48. 48. A company receives data sets coming from external providers on Amazon S3. Data sets from different providers are dependent on one another. Data sets will arrive at different times and in no particular order. A data architect needs to design a solution that enables the company to do the following: Rapidly perform cross data set analysis as soon as the data become available Manage dependencies between data sets that arrive at different times Which architecture strategy offers a scalable and cost-effective solution that meets these Requirements? 48. 一家公司从外部提供商处接收来自Amazon S3的数据集。不同提供商的数据集彼此之间是相互依赖的。数据集将在不同的时间到达,并且没有特定的顺序。一位数据架构师需要设计一个解决方案,使公司能够做到以下几点: 一旦数据可用,快速执行跨数据集分析 管理在不同时间到达的数据集之间的依赖关系 哪种架构策略能够提供一个可扩展且具有成本效益的解决方案,满足这些要求? A. A. Maintain data dependency information in Amazon RDS for MySQL. Use an AWS Data Pipeline job to load an Amazon EMR Hive table based on task dependencies and event notification triggers in Amazon S3. A. 在 Amazon RDS for MySQL 中维护数据依赖信息。使用 AWS Data Pipeline 作业,根据任务依赖关系和 Amazon S3 中的事件通知触发器,加载 Amazon EMR Hive 表。 B. B. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon SNS and event notifications to publish data to fleet of Amazon EC2 workers. 
Once the task dependencies have been resolved, process the data with Amazon EMR. B. 在 Amazon DynamoDB 表中维护数据依赖信息。使用 Amazon SNS 和事件通知将数据发布到 Amazon EC2 工作节点群中。任务依赖关系解决后,使用 Amazon EMR 处理数据。 C. C. Maintain data dependency information in an Amazon ElastiCache Redis cluster. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to Redis. Once the task dependencies have been resolved, process the data with Amazon EMR. C. 在 Amazon ElastiCache Redis 集群中维护数据依赖信息。使用 Amazon S3 事件通知触发一个 AWS Lambda 函数,将 S3 对象映射到 Redis。一旦任务依赖关系得到解决,就使用 Amazon EMR 处理数据。 D. D. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to the task associated with it in DynamoDB. Once all task dependencies have been resolved, process the data with Amazon EMR. D. 在Amazon DynamoDB表中维护数据依赖信息。使用Amazon S3事件通知触发一个AWS Lambda函数,将S3对象映射到DynamoDB中与其相关联的任务。一旦所有任务依赖关系解决,使用Amazon EMR处理数据。 正确答案: D Correct answer is D as the data dependencies can be managed in DynamoDB. S3 event notifications can trigger Lambda functions to map the objects and check dependencies; once all are satisfied, the EMR job can be triggered. Option A is wrong as EMR Hive with RDS needs resources running at all times and is not a cost-effective solution. Option B is wrong as a fleet of EC2 servers would not be cost-effective compared to Lambda. Option C is wrong as ElastiCache Redis is not an ideal store for this mapping and would not be cost-effective compared to DynamoDB, a fully managed service. 正确答案: D 正确答案是 D,因为数据依赖可以在 DynamoDB 中管理。S3 事件通知可以触发 Lambda 函数来映射对象并检查依赖关系。一旦所有条件满足,就可以触发 EMR 作业。 选项 A 错误,因为 EMR hive 与 RDS 需要持续运行资源,这不是一种具有成本效益的解决方案。 选项 B 错误,因为与 Lambda 相比,使用 EC2 服务器集群的成本效益较低。 选项 C 错误,因为 ElastiCache Redis 不是理想的映射存储,相比于 DynamoDB(完全托管的服务),它的成本效益较低。 49 / 100 分类: DBS 49. 49. A media advertising company handles a large number of messages sourced from over 200 websites in real time.
Processing latency must be kept low. Based on calculations, a 60-shard Amazon Kinesis stream is more than sufficient to handle the maximum data throughput, even with traffic spikes. The company also uses an Amazon Kinesis Client Library (KCL) application running on Amazon Elastic Compute Cloud (EC2) managed by an Auto Scaling group. Amazon CloudWatch indicates an average of 25% CPU and a modest level of network traffic across all running servers. The company reports a 150% to 200% increase in latency of processing messages from Amazon Kinesis during peak times. There are NO reports of delay from the sites publishing to Amazon Kinesis. What is the appropriate solution to address the latency? 49. 一家媒体广告公司处理来自200多个网站的海量实时消息。处理延迟必须保持较低。根据计算,60分片的Amazon Kinesis流足以处理最大的数据吞吐量,即使在流量高峰期。该公司还使用运行在Amazon Elastic Compute Cloud (EC2)上的Amazon Kinesis客户端库(KCL)应用程序,由自动扩展组进行管理。Amazon CloudWatch显示,所有运行中的服务器的CPU平均使用率为25%,网络流量处于适度水平。公司报告称,在高峰时段,从Amazon Kinesis处理消息的延迟增加了150%到200%。没有报告来自发布到Amazon Kinesis的网站的延迟问题。应对延迟的适当解决方案是什么? A. A. Increase the number of shards in the Amazon Kinesis stream to 80 for greater concurrency. A. 将Amazon Kinesis流中的分片数量增加到80,以提高并发性。 B. B. Increase the size of the Amazon EC2 instances to increase network throughput. B. 增加Amazon EC2实例的大小,以提高网络吞吐量。 C. C. Increase the minimum number of instances in the Auto Scaling group. C. 增加自动伸缩组中的最小实例数量。 D. D. Increase Amazon DynamoDB throughput on the checkpoint table. D. 增加检查点表上Amazon DynamoDB的吞吐量。 正确答案: C Correct answer is C as the shards are more than enough and the EC2 instances utilization is low, the only other reason can be the instances do not match up the shards and single instance is processing multiple shards.,Refer AWS documentation – Kinesis Record Processor Scaling,Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). 
Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard. However, one worker can process any number of shards, so it’s fine if the number of shards exceeds the number of instances.,To scale up processing in your application, you should test a combination of these approaches:,Note that you can use Auto Scaling to automatically scale your instances based on appropriate metrics.,Option A is wrong as shards are more than enough increasing would not improve performance.,Option B is wrong as the network traffic is modest increasing the size would not improve performance.,Option D is wrong as there is no relation of DynamoDB. 正确答案: C 正确答案是C,因为碎片已经足够多,而EC2实例的利用率较低,唯一的其他原因可能是实例没有与碎片匹配,导致单个实例处理多个碎片。请参考AWS文档 – Kinesis记录处理器扩展。 通常,在使用KCL时,您应确保实例的数量不超过碎片的数量(除非用于故障备用)。每个碎片由一个KCL工作者处理,并且有一个相应的记录处理器,因此您无需多个实例来处理一个碎片。然而,一个工作者可以处理任意数量的碎片,因此如果碎片的数量超过实例的数量也是可以的。 为了扩展应用程序中的处理能力,您应测试这些方法的组合: 请注意,您可以使用自动扩展根据适当的指标自动扩展实例。 选项A是错误的,因为碎片已经足够多,增加并不会提高性能。 选项B是错误的,因为网络流量适中,增加大小不会提高性能。 选项D是错误的,因为DynamoDB与此无关。 50 / 100 分类: DBS 50. 50. An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.) 50. 管理员需要为Redshift集群中的架构设计策略。管理员需要确定Redshift架构中表的最佳分布样式。在以下两种情况下,选择EVEN分布最为合适?(选择两项。) A. A. When the tables are highly denormalized and do NOT participate in frequent joins. A. 当表格高度非规范化,并且不参与频繁的连接操作时。 B. B. When data must be grouped based on a specific key on a defined slice. B. 当数据必须基于定义的切片上的特定键进行分组时。 C. C. When data transfer between nodes must be eliminated. C. 当必须消除节点之间的数据传输时。 D. D. When a new table has been loaded and it is unclear how it will be joined to dimension. D. 
当加载了一个新表且不清楚它将如何与维度连接时。 正确答案: A, D Correct answers are A & D as EVEN distribution distributes the data across slides in a round robin fashion and does not participate in joins.,Refer AWS documentation – Redshift Distribution Style,EVEN distribution – The leader node distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column. EVEN distribution is appropriate when a table does not participate in joins or when there is not a clear choice between KEY distribution and ALL distribution.,Option B is wrong as if the data needs to be grouped by specific Key, KEY distribution should be used.,The rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns so that matching values from the common columns are physically stored together.,Option C is wrong as if the data transfer needs to be eliminated, ALL distribution should be used as a copy of all the data is made on all the nodes.,A copy of the entire table is distributed to every node. Where EVEN distribution or KEY distribution place only a portion of a table’s rows on each node, ALL distribution ensures that every row is collocated for every join that the table participates in. 正确答案: A, D 正确答案是 A 和 D,因为 EVEN 分配方式将数据以循环的方式分配到各个分片,并且不参与连接操作。参见 AWS 文档 – Redshift 分配方式。 EVEN 分配 – 领导节点以循环方式将行分配到各个分片,而不考虑任何特定列中的值。当表不参与连接操作时,或者在 KEY 分配和 ALL 分配之间没有明确选择时,EVEN 分配是适合的。 选项 B 错误,因为如果数据需要按特定键进行分组,应使用 KEY 分配。 行是根据某一列中的值进行分配的。领导节点将匹配的值放置在相同的节点分片上。如果您根据连接键分配一对表,领导节点会根据连接列中的值将行在分片上协同分配,以确保来自公共列的匹配值物理上存储在一起。 选项 C 错误,因为如果需要消除数据传输,应该使用 ALL 分配,因为所有节点都会复制一份所有数据。 整个表的副本被分配到每个节点上。与 EVEN 分配或 KEY 分配只将表的一部分行放置在每个节点上不同,ALL 分配确保每个行都在每个表参与连接的情况下被协同分配。 检查 51 / 100 分类: DBS 51. 51. 
A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers. The architect is building a platform where suppliers can view the status of one or more of their shipments. Each supplier can have multiple roles that will only allow access to specific fields in the resulting information. Which strategy allows the appropriate level of access control and requires the LEAST amount of management work? 51. 一位物流组织的解决方案架构师负责将包裹从成千上万的供应商运送到最终客户。该架构师正在构建一个平台,供应商可以查看他们一个或多个货件的状态。每个供应商可以拥有多个角色,这些角色仅允许访问结果信息中的特定字段。哪种策略能够实现适当的访问控制,并且需要最少的管理工作? A. A. Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB Table. Generate temporary AWS credentials for the suppliers’ users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data. A. 将追踪数据发送到 Amazon Kinesis Streams。使用 AWS Lambda 将数据存储到 Amazon DynamoDB 表中。通过 AWS STS 为供应商的用户生成临时 AWS 凭证,指定细粒度的安全策略,仅限访问他们适用的数据。 B. B. Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each supplier’s roles. Generate temporary AWS credentials for the suppliers’ users with AWS STS. Limit access to the appropriate files through security policies. B. 将跟踪数据发送到 Amazon Kinesis Firehose。使用 Amazon S3 通知和 AWS Lambda 准备 Amazon S3 中的文件,为每个供应商的角色提供适当的数据。使用 AWS STS 为供应商的用户生成临时 AWS 凭证。通过安全策略限制对适当文件的访问。 C. C. Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers’ users. Use HBase ACL-based security to limit access for the roles to their specific table and columns. C. 将跟踪数据发送到 Amazon Kinesis Streams。使用 Amazon EMR 和 Spark Streaming 将数据存储到 HBase。为每个供应商创建一个表。使用 HBase Kerberos 集成供应商的用户。使用 HBase 基于 ACL 的安全性来限制角色对其特定表和列的访问。 D. D. Send the tracking data to Amazon Kinesis Firehose. 
Store the data in an Amazon Redshift cluster. Create views for the suppliers’ users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view. D. 将跟踪数据发送到 Amazon Kinesis Firehose。将数据存储在 Amazon Redshift 集群中。为供应商的用户和角色创建视图。允许供应商使用限制在适用视图中的用户访问 Amazon Redshift 集群。 正确答案: A Correct answer is A as DynamoDB can be used to store the data. Access to fields can be controlled using DynamoDB fine grained access control, which can be mapped to IAM role. This solution also requires the least amount of management effort.,Refer AWS documentation – DynamoDB Control Access,In DynamoDB, you have the option to specify conditions when granting permissions using an IAM policy (see Access Control). For example, you can:,Option B is wrong as S3 would not provide fine grained access control of data within the file.,Option C is wrong as although its possible, the option does not satisfy the least amount of management work requirement.,Option D is wrong as Redshift is more for a data warehouse solution and comes with management effort. 正确答案: A 正确答案是A,因为DynamoDB可以用来存储数据。可以通过DynamoDB的细粒度访问控制来控制对字段的访问,该控制可以映射到IAM角色。这个解决方案还需要最少的管理工作。 参见AWS文档 – DynamoDB控制访问 在DynamoDB中,您可以在使用IAM策略授予权限时指定条件(请参见访问控制)。例如,您可以: 选项B是错误的,因为S3无法提供文件内部数据的细粒度访问控制。 选项C是错误的,尽管可以实现,但该选项不符合最少管理工作量的要求。 选项D是错误的,因为Redshift更多是一个数据仓库解决方案,并且需要管理工作。 52 / 100 分类: DBS 52. 52. A utility company is building an application that stores data coming from more than 10,000 sensors. Each sensor has a unique ID and will send a datapoint (approximately 1KB) every 10 minutes throughout the day. Each datapoint contains the information coming from the sensor as well as a timestamp. This company would like to query information coming from a particular sensor for the past week very rapidly and want to delete all the data that is older than 4 weeks. Using Amazon DynamoDB for its scalability and rapidity, how do you implement this in the most cost effective way? 52. 
一家公用事业公司正在构建一个应用程序,用于存储来自超过10,000个传感器的数据。每个传感器都有一个唯一的ID,并且每10分钟会发送一个数据点(大约1KB)到应用程序。每个数据点包含来自传感器的信息以及时间戳。该公司希望能够快速查询过去一周来自特定传感器的信息,并希望删除所有超过4周的数据。该公司使用Amazon DynamoDB来满足其可扩展性和快速性需求,如何以最具成本效益的方式实现这一目标? A. A. One table, with a primary key that is the sensor ID and a sort key that is the timestamp A. 一张表,主键是传感器ID,排序键是时间戳 B. B. One table, with a primary key that is the concatenation of the sensor ID and timestamp B. 一个表,主键是传感器ID和时间戳的拼接 C. C. One table for each week, with a primary key that is the concatenation of the sensor ID and timestamp C. 每周一张表,主键是传感器ID和时间戳的连接 D. D. One table for each week, with a primary key that is the sensor ID and a sort key that is the timestamp D. 每周一个表,主键为传感器ID,排序键为时间戳 正确答案: D Correct answer is D as a composite key with sensor ID and timestamp helps with faster queries. Refer AWS documentation – DynamoDB handling time series data. Options C & D are valid as they keep one table per week. However, with Option C, the concatenated key would make queries slower. The table should be designed with a composite primary key consisting of the sensor ID as the partition key and date/time as the sort key. Options A & B are wrong as one table would not make sense, since we need to query only the past week and retain data for only 4 weeks; this would impact performance. Also, provisioned throughput consumption is based on the size of the deleted items, which is more costly than simply dropping a table. 正确答案: D 正确答案是D,使用包含传感器ID和时间戳的复合键可以帮助提高查询速度,参考AWS文档了解DynamoDB如何处理时间序列数据。选项C和D有效,因为它们为每周维护一个表。然而,选项C中的拼接键会导致查询变慢。 表应设计为使用由传感器ID作为分区键和日期/时间作为排序键的复合主键。 选项A和B是错误的,因为一个表在这种情况下没有意义,因为我们只需要查询过去一周的数据,并且只需要保留4周的数据。这会影响性能。此外,预配置的吞吐量消耗是基于删除项的大小的,其成本比直接删除整个表更高。 53 / 100 分类: DBS 53. 53. You need to provide customers with rich visualizations that allow you to easily connect multiple disparate data sources in S3, Redshift, and several CSV files. Which tool should you use that requires the least setup? 53. 你需要为客户提供丰富的可视化功能,使你能够轻松地连接S3、Redshift和多个CSV文件中的不同数据源。你应该使用哪个工具,要求设置工作最少? A. A. 
Hue on EMR A. Hue on EMR B. B. Redshift B. Redshift C. C. QuickSight C. QuickSight D. D. Elasticsearch D. Elasticsearch 正确答案: C Correct answer is C as QuickSight provides visualization capability with integration with RDS and Redshift. Refer AWS documentation – QuickSight. Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization. As a fully managed service, QuickSight lets you easily create and publish interactive dashboards that include ML Insights. Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites. QuickSight allows you to directly connect to and import data from a wide variety of cloud and on-premises data sources. These include SaaS applications such as Salesforce, Square, ServiceNow, Twitter, Github, and JIRA; 3rd-party databases such as Teradata, MySQL, Postgres, and SQL Server; native AWS services such as Redshift, Athena, S3, RDS, and Aurora; and private VPC subnets. You can also upload a variety of file types including Excel, CSV, and JSON. Option A is wrong as Hue is a web interface for analyzing data with Hadoop. Option B is wrong as Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake. Option D is wrong as Elasticsearch is a fully managed service that makes it easy for you to deploy, secure, and operate Elasticsearch at scale with zero downtime. It does not provide visualization support on its own (it needs to be used with Kibana), nor does it integrate with RDS or Redshift. The data needs to be loaded into Elasticsearch.
正确答案: C 正确答案是 C,因为 QuickSight 提供了与 RDS、Redshift 的集成和可视化功能。请参考 AWS 文档 – QuickSight。 Amazon QuickSight 是一个快速的、基于云的商业智能服务,能够轻松为组织中的每个人提供洞察。 作为一个完全托管的服务,QuickSight 让您可以轻松创建和发布包含 ML Insights 的互动仪表板。然后,您可以在任何设备上访问仪表板,并将其嵌入到您的应用程序、门户和网站中。 QuickSight 允许您直接连接并导入来自各种云端和本地数据源的数据。这些数据源包括 SaaS 应用程序,如 Salesforce、Square、ServiceNow、Twitter、Github 和 JIRA;第三方数据库,如 Teradata、MySQL、Postgres 和 SQL Server;本地 AWS 服务,如 Redshift、Athena、S3、RDS 和 Aurora;以及私有 VPC 子网。您还可以上传多种文件类型,包括 Excel、CSV、JSON 和 Presto。 选项 A 错误,因为 Hue 是用于分析 Hadoop 数据的 Web 界面。 选项 B 错误,因为 Redshift 是一个快速、可扩展的数据仓库,使得分析所有数据变得简单且具有成本效益,适用于数据仓库和数据湖。 选项 D 错误,因为 Elasticsearch 是一个完全托管的服务,能够轻松部署、保护和操作 Elasticsearch,且不会有停机时间。它不提供可视化支持,且需要与 Kibana 一起使用,也没有与 RDS、Redshift 集成。数据需要加载到 Elasticsearch 中。 54 / 100 分类: DBS 54. 54. You need to create a recommendation engine for your e-commerce website that sells over 300 items. The items never change, and the new users need to be presented with the list of all 300 items in order of their interest. Which option do you use to accomplish this? 54. 你需要为你的电子商务网站创建一个推荐引擎,该网站销售超过300个商品。这些商品不会改变,新的用户需要按兴趣顺序呈现所有300个商品的列表。你使用哪个选项来完成这个任务? A. A. Mahout A. Mahout B. B. Spark/Spark MLlib B. Spark/Spark MLlib C. C. Amazon Machine Learning C. 亚马逊机器学习 D. D. RDS MySQL D. RDS MySQL 正确答案: A, B Correct answers are A & B,Option A as Mahout provides recommender engine/collaborative filtering capability,Option B as Spark’s MLlib machine learning library should help with this task. Amazon ML is limited to 100 “categorical’ recommendations, so a custom system is required for this purpose.,Option C is wrong as Amazon ML is limited to 100 “categorical’ recommendations, so a custom system is required for this purpose.,Option D is wrong as RDS MySQL is just a database engine and does not provide analytics capability. 
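To make the collaborative-filtering idea behind Mahout and Spark MLlib concrete, here is a minimal pure-Python sketch (this is not Mahout or MLlib API code; the sample users, ratings, and item names are illustrative only). It ranks every item for a new user by weighting past users' ratings with cosine similarity, the same neighborhood-based approach a recommender engine automates at scale:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def rank_items(new_user, history, items):
    """Rank every item for new_user by similarity-weighted ratings of past users."""
    scores = {}
    for item in items:
        num = den = 0.0
        for ratings in history.values():
            if item in ratings:
                w = cosine(new_user, ratings)
                num += w * ratings[item]
                den += abs(w)
        scores[item] = num / den if den else 0.0
    # Highest predicted interest first; every item appears in the result.
    return sorted(items, key=lambda i: scores[i], reverse=True)

# Toy data: two past users and a new user who has only rated item "a".
history = {"u1": {"a": 5, "b": 1}, "u2": {"a": 4, "c": 5}}
print(rank_items({"a": 5}, history, ["a", "b", "c"]))
```

At 300 fixed items this brute-force pass is trivial; Mahout or MLlib (e.g. ALS) earns its keep when the user/item matrix is large and sparse.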
正确答案: A, B 正确答案是 A 和 B,选项 A 是因为 Mahout 提供了推荐引擎/协同过滤功能,选项 B 是因为 Spark 的 MLlib 机器学习库应该有助于完成此任务。亚马逊 ML 限制为 100 个“类别”推荐,因此需要一个自定义系统来完成此任务。 选项 C 错误,因为亚马逊 ML 限制为 100 个“类别”推荐,因此需要一个自定义系统来完成此任务。 选项 D 错误,因为 RDS MySQL 只是一个数据库引擎,并不提供分析功能。 检查 55 / 100 分类: DBS 55. 55. A web application emits multiple types of events to Amazon Kinesis Streams for operational reporting. Critical events must be captured immediately before processing can continue, but informational events do not need to delay processing. What is the most appropriate solution to record these different types of events? 55. 一个Web应用程序将多种类型的事件发送到Amazon Kinesis Streams进行操作报告。关键事件必须在处理继续之前立即捕获,但信息性事件不需要延迟处理。记录这些不同类型事件的最合适解决方案是什么? A. A. Log all events using the Kinesis Producer Library. A. 使用 Kinesis Producer Library 记录所有事件。 B. B. Log critical events using the Kinesis Producer Library, and log informational events using the PutRecords API method. B. 使用Kinesis生产者库记录关键事件,使用PutRecords API方法记录信息事件。 C. C. Log critical events using the PutRecords API method, and log informational events using the Kinesis Producer Library. C. 使用PutRecords API方法记录关键事件,并使用Kinesis Producer Library记录信息性事件。 D. D. Log all events using the PutRecords API method. D. 使用PutRecords API方法记录所有事件。 正确答案: C Correct answer is C as the core of this question is how to send event messages to Kinesis synchronously vs. asynchronously. The critical events must be sent synchronously, and the informational events can be sent asynchronously. The Kinesis Producer Library (KPL) implements an asynchronous send function, so it can be used for the informational messages. PutRecords is a synchronous send function, so it must be used for the critical events.,Refer AWS documentation – Developing Producers using KPL,Because the KPL may buffer records before sending them to Kinesis Data Streams, it does not force the caller application to block and wait for a confirmation that the record has arrived at the server before continuing execution. 
A call to put a record into the KPL always returns immediately and does not wait for the record to be sent or a response to be received from the server. Instead, a Future object is created that receives the result of sending the record to Kinesis Data Streams at a later time. This is the same behavior as asynchronous clients in the AWS SDK. Option A is wrong as the Kinesis Producer Library sends all events asynchronously. Option B is wrong as critical events need to be sent synchronously and informational events asynchronously. Option D is wrong as PutRecords sends all events synchronously. 正确答案: C 正确答案是C,因为这个问题的核心是如何同步与异步地将事件消息发送到Kinesis。关键事件必须同步发送,信息性事件可以异步发送。Kinesis生产者库(KPL)实现了一个异步发送功能,因此可以用于信息性消息。PutRecords是一个同步发送功能,因此必须用于关键事件。 参考AWS文档 – 使用KPL开发生产者 由于KPL可能在将记录发送到Kinesis数据流之前对记录进行缓冲,它并不强制调用方应用程序阻塞并等待确认记录已到达服务器后再继续执行。调用KPL的put方法始终会立即返回,并且不会等待记录发送或从服务器接收到响应。相反,它会创建一个Future对象,在稍后时间接收将记录发送到Kinesis数据流的结果。这与AWS SDK中的异步客户端行为相同。 选项A是错误的,因为Kinesis生产者库将所有事件异步发送。 选项B是错误的,因为关键事件需要同步发送,信息事件需要异步发送。 选项D是错误的,因为PutRecords将所有事件同步发送。 56 / 100 分类: DBS 56. 56. You have to identify potential fraudulent credit card transactions using Amazon Machine Learning. You have been given historical labeled data that you can use to create your model. You will also need the ability to tune the model you pick. Which model type should you use? 56. 你需要使用亚马逊机器学习来识别潜在的欺诈信用卡交易。你已经获得了可以用来创建模型的历史标记数据。你还需要能够调整所选模型的能力。你应该使用哪种模型类型? A. A. Clustering A. 聚类 B. B. Regression B. 回归 C. C. Binary C. 二元分类 D. D. Cannot be done using Amazon Machine Learning D. 无法使用亚马逊机器学习完成 正确答案: C Correct answer is C as binary classification can be used to predict whether the transaction is fraudulent or not. 正确答案:C 正确答案是C,因为二元分类可用于预测交易是否欺诈。 57 / 100 分类: DBS 57. 57. You’ve been asked by the VP of People to showcase the current breakdown of the headcount for each department within your organization. What chart do you select to do this to make it easy to compare each department? 57. 
人力资源副总裁要求你展示你所在组织各部门当前的人数分布。你选择什么样的图表来展示,以便轻松比较各部门之间的差异? A. A. Line chart A. 折线图 B. B. Column chart B. 柱状图 C. C. Pie chart C. 饼图 D. D. Scatter plot D. 散点图 正确答案: C Correct answer is C as pie charts are best to use when you are trying to compare parts of a whole, which is ideal for this use case. They do not show changes over time. Refer AWS documentation – QuickSight Chart Types. Option A is wrong as line graphs are used to track changes over short and long periods of time. Option B is wrong as a column chart is a data visualization where each category is represented by a rectangle, with the height of the rectangle being proportional to the values being plotted. Option D is wrong as a scatter plot is a two-dimensional data visualization that uses dots to represent the values obtained for two different variables, one plotted along the x-axis and the other plotted along the y-axis. 正确答案: C 正确答案是 C,因为饼图最适合用于比较整体的各个部分,这对于该用例是理想的。饼图不显示随时间变化的情况。 参考AWS文档 – QuickSight 图表类型。 选项 A 错误,因为折线图用于跟踪短期和长期的变化。 选项 B 错误,因为柱状图是一种数据可视化方式,其中每个类别由一个矩形表示,矩形的高度与绘制的值成比例。 选项 D 错误,因为散点图是一种二维数据可视化方式,使用点来表示两个不同变量的值,一个绘制在 x 轴上,另一个绘制在 y 轴上。 58 / 100 分类: DBS 58. 58. An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the company’s DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200GB and is currently provisioned at 10K WCU and 20K RCU. Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.) 58. 一家在线游戏公司使用DynamoDB存储用户活动日志,并且在公司的DynamoDB表上遇到写入限制。该公司的消耗量并未接近预配置容量。该表包含大量的项,并且按用户进行分区,并按日期排序。该表大小为200GB,当前配置为10K WCU和20K RCU。为了确定限流的原因,还需要哪些额外的信息?(选择两个。) A. A. The structure of any GSIs that have been defined on the table A. 在表格上定义的任何GSI的结构 B. B. CloudWatch data showing consumed and provisioned write capacity when writes are being throttled B. 
当写入被限制时,CloudWatch 数据显示已消耗和预配的写入容量 C. C. Application-level metrics showing the average item size and peak update rates for each attribute C. 应用级指标,显示每个属性的平均项大小和峰值更新速率 D. D. The structure of any LSIs that have been defined on the table D. 已在表格上定义的任何LSI的结构 E. E. The maximum historical WCU and RCU for the table E. 表格的最大历史WCU和RCU 正确答案: B, D Correct answers are B & D as the key reason for throttling is hot keys, as the application does not consume the entire provisioned capacity.,Option B as CloudWatch helps shows the stats for consumed vs provisioned throughput capacity.,Option D as an LSI consumes WCU for writes on the primary table.,Refer AWS documentation – Throttled DB & DynamoDB LSI Considerations & DynamoDB CloudWatch,Partitions are usually throttled when they are accessed by your downstream applications much more frequently than other partitions (that is, a “hot” partition), or when workloads rely on short periods of time with high usage (a “burst” of read or write activity). To avoid hot partitions and throttling, you must optimize your table and partition structure.,Distribute your read operations and write operations as evenly as possible across your table. A “hot” partition can degrade the overall performance of your table.,Write Capacity Units – When an item in a table is added, updated, or deleted, updating the local secondary indexes will consume provisioned write capacity units for the table. 
The total provisioned throughput cost for a write is the sum of write capacity units consumed by writing to the table and those consumed by updating the local secondary indexes.,You can monitor ConsumedReadCapacityUnits or ConsumedWriteCapacityUnits over the specified time period, to track how much of your provisioned throughput is being used.,Option A is wrong as GSI does not impact primary table throughput capacity.,Option C is wrong as the provisioned capacity is not exceeded, the average would not be of much help.,Option E is wrong as the provisioned capacity is not exceeded the historical stats would not be of much help. 正确答案: B, D 正确答案是 B 和 D,主要原因是热点键,因应用程序并没有消耗整个预配置的容量。选项 B 因为 CloudWatch 显示了已消耗和预配置的吞吐量容量统计信息。选项 D 因为 LSI 在主表上写入时消耗写入容量单位(WCU)。参见 AWS 文档 – 限流的数据库和 DynamoDB LSI 注意事项以及 DynamoDB CloudWatch。 当分区被下游应用程序访问的频率远高于其他分区(即“热点”分区),或工作负载依赖于短时间内的高使用率(即“突发”读写活动)时,分区通常会被限流。 为了避免热点分区和限流,必须优化您的表和分区结构。尽可能均匀地分布读操作和写操作在表中。一个“热点”分区会降低表的整体性能。 写入容量单位 – 当表中的项被添加、更新或删除时,更新本地二级索引将消耗表的预配置写入容量单位。写入的总预配置吞吐量成本是写入表时消耗的写入容量单位与更新本地二级索引时消耗的容量单位之和。 您可以监控指定时间段内的已消耗读容量单位(ConsumedReadCapacityUnits)或已消耗写容量单位(ConsumedWriteCapacityUnits),以跟踪已使用的预配置吞吐量。 选项 A 错误,因为 GSI 不影响主表的吞吐量容量。 选项 C 错误,因为预配置的容量未被超出,平均值帮助不大。 选项 E 错误,因为预配置的容量未被超出,历史统计信息帮助不大。 检查 59 / 100 分类: DBS 59. 59. A Redshift data warehouse has different user teams that need to query the same table with very different query types. These user teams are experiencing poor performance. Which action improves performance for the user teams in this situation? 59. 一个Redshift数据仓库有不同的用户团队需要以非常不同的查询类型查询相同的表。这些用户团队正在经历性能差的问题。在这种情况下,哪种操作可以改善用户团队的性能? A. A. Create custom table views. A. 创建自定义表格视图。 B. B. Add interleaved sort keys per team. B. 为每个团队添加交错排序键。 C. C. Maintain team-specific copies of the table. C. 维护团队特定的表格副本。 D. D. Add support for workload management queue hopping. D. 
添加对工作负载管理队列跳跃的支持。 正确答案: B Correct answer is B as multiple teams query different columns with different queries it would be best to use Interleaved keys to improve performance. Interleaved keys are provided to help with the limitations of compound keys. They are designed to weigh each column in the key evenly, allowing improved performance regardless of which columns in the key you’re filtering.,Refer AWS documentation – Redshift Interleaved Sort Keys,An interleaved sort gives equal weight to each column, or subset of columns, in the sort key. If multiple queries use different columns for filters, then you can often improve performance for those queries by using an interleaved sort style. When a query uses restrictive predicates on secondary sort columns, interleaved sorting significantly improves query performance as compared to compound sorting.,Options A & C are wrong as they create duplicate copies or refer to the same underlying table and would not improve performance.,Option D is wrong as the key here is queries on same table as Amazon Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries. 正确答案: B 正确答案是B,因为多个团队使用不同的查询查询不同的列,最好使用交错键来提高性能。交错键旨在帮助解决复合键的局限性。它们被设计用来平衡键中每一列的权重,从而无论你过滤的是键中的哪一列,都能提高性能。 参考AWS文档 – Redshift交错排序键 交错排序为排序键中的每一列或列的子集赋予相等的权重。如果多个查询使用不同的列进行过滤,则通过使用交错排序样式通常可以提高这些查询的性能。当查询在次排序列上使用限制性谓词时,交错排序相比复合排序能显著提高查询性能。 选项A和C是错误的,因为它们创建了重复的副本或引用相同的基础表,并不能提高性能。 选项D是错误的,因为这里的键是对同一表的查询,而Amazon Redshift工作负载管理(WLM)使用户能够灵活地管理工作负载中的优先级,从而避免短时间运行的查询被长期运行的查询堵塞在队列中。 60 / 100 分类: DBS 60. 60. A data engineer needs to collect data from multiple Amazon Redshift clusters within a business and consolidate the data into a single central data warehouse. Data must be encrypted at all times while at rest or in flight. What is the most scalable way to build this data collection process? 60. 
一名数据工程师需要从业务中的多个 Amazon Redshift 集群收集数据,并将数据整合到一个单一的中央数据仓库中。在数据静态或传输过程中,必须始终加密数据。构建此数据收集过程的最具可扩展性的方法是什么? A. A. Run an ETL process that connects to the source clusters using SSL to issue a SELECT query for new data, and then write to the target data warehouse using an INSERT command over another SSL secured connection. A. 执行一个ETL过程,使用SSL连接到源集群,发出SELECT查询以获取新数据,然后通过另一个SSL加密的连接使用INSERT命令将数据写入目标数据仓库。 B. B. Use AWS KMS data key to run an UNLOAD ENCRYPTED command that stores the data in an unencrypted S3 bucket; run a COPY command to move the data into the target cluster. B. 使用AWS KMS数据密钥运行UNLOAD ENCRYPTED命令,将数据存储到未加密的S3桶中;运行COPY命令将数据移动到目标集群中。 C. C. Run an UNLOAD command that stores the data in an S3 bucket encrypted with an AWS KMS data key; run a COPY command to move the data into the target cluster. C. 执行一个UNLOAD命令,将数据存储在使用AWS KMS数据密钥加密的S3桶中;执行一个COPY命令将数据移入目标集群。 D. D. Connect to the source cluster over an SSL client connection, and write data records to Amazon Kinesis Firehose to load into your target data warehouse. D. 通过SSL客户端连接连接到源集群,并将数据记录写入Amazon Kinesis Firehose,以便加载到目标数据仓库中。 正确答案: B Correct answer is B as the UNLOAD ENCRYPTED command automatically stores the data encrypted using client-side encryption and uses HTTPS to encrypt the data during the transfer to S3. Refer AWS documentation – Redshift Unloading Data. UNLOAD automatically creates files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE-S3). You can also specify server-side encryption with an AWS Key Management Service key (SSE-KMS) or client-side encryption with a customer-managed key (CSE-CMK). Option C is wrong because the data would not be encrypted in flight, and you cannot encrypt an entire bucket with a KMS key. Options A & D are wrong as the most scalable solutions are the UNLOAD/COPY solutions because they will work in parallel.
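As a sketch of the UNLOAD ENCRYPTED / COPY pair described above, the two statements below are built as Python strings. The bucket, IAM role, table, and key values are placeholders, not real resources; the `MASTER_SYMMETRIC_KEY ... ENCRYPTED` clauses follow Redshift's documented client-side encryption syntax:

```python
# Placeholders only: substitute a real staging bucket, IAM role ARN, and a
# base64-encoded AES-256 root key (e.g. the plaintext output of KMS GenerateDataKey).
S3_PREFIX = "s3://example-staging-bucket/sales_"
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftUnloadRole"
ROOT_KEY = "<base64-encoded-256-bit-key>"

# On each source cluster: data is client-side encrypted on the way out,
# and the transfer to S3 itself goes over HTTPS.
unload_sql = (
    f"UNLOAD ('select * from sales') TO '{S3_PREFIX}' "
    f"IAM_ROLE '{IAM_ROLE}' "
    f"MASTER_SYMMETRIC_KEY '{ROOT_KEY}' ENCRYPTED;"
)

# On the central cluster: COPY decrypts with the same key and loads in parallel.
copy_sql = (
    f"COPY sales FROM '{S3_PREFIX}' "
    f"IAM_ROLE '{IAM_ROLE}' "
    f"MASTER_SYMMETRIC_KEY '{ROOT_KEY}' ENCRYPTED;"
)

print(unload_sql)
print(copy_sql)
```

Because UNLOAD writes one or more files per slice and COPY reads them in parallel, this path scales with the cluster rather than with a single ETL connection, which is the point of the correct answer.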
正确答案: B 正确答案是B,因为UNLOAD ENCRYPTED命令会自动使用客户端加密存储数据,并在传输到S3期间使用HTTPS加密数据。参考AWS文档 – Redshift卸载数据。 UNLOAD会自动使用Amazon S3服务器端加密(SSE-S3)和AWS管理的加密密钥创建文件。你还可以指定使用AWS Key Management Service密钥(SSE-KMS)进行服务器端加密,或者使用客户管理的密钥(CSE-CMK)进行客户端加密。 选项C是错误的,因为数据在传输过程中不会被加密,而且你不能使用KMS密钥加密整个存储桶。 选项A和D是错误的,因为最具可扩展性的解决方案是UNLOAD/COPY解决方案,因为它们可以并行工作。 61 / 100 分类: DBS 61. 61. Your company releases new features with high frequency while demanding high application availability. As part of the application’s A/B testing, logs from each updated Amazon EC2 instance of the application need to be analyzed in near real-time, to ensure that the application is working flawlessly after each deployment. If the logs show any anomalous behavior, then the application version of the instance is changed to a more stable one. Which of the following methods should you use for shipping and analyzing the logs in a highly available manner? 61. 你的公司以高频率发布新功能,同时要求高应用可用性。作为应用程序A/B测试的一部分,需要近实时分析每个更新后的Amazon EC2实例的日志,以确保每次部署后应用程序能够完美运行。如果日志显示任何异常行为,则该实例的应用程序版本会更改为更稳定的版本。以下哪种方法适合以高可用性方式传输和分析日志? A. A. Ship the logs to Amazon S3 for durability and use Amazon EMR to analyze the logs in a batch manner each hour. 将日志传输到Amazon S3以确保持久性,并使用Amazon EMR每小时以批处理方式分析日志。 B. B. Ship the logs to Amazon CloudWatch Logs and use Amazon EMR to analyze the logs in a batch manner each hour. B. 将日志传输到 Amazon CloudWatch Logs,并使用 Amazon EMR 每小时以批处理方式分析日志。 C. C. Ship the logs to an Amazon Kinesis stream and have the consumers analyze the logs in a live manner. C. 将日志发送到Amazon Kinesis流,并让消费者以实时方式分析日志。 D. D. Ship the logs to a large Amazon EC2 instance and analyze the logs in a live manner. D. 将日志传输到一个大型的 Amazon EC2 实例,并以实时方式分析日志。 E. E. Store the logs locally on each instance and then have an Amazon Kinesis stream pull the logs for live analysis E. 
将日志保存在每个实例的本地,然后通过Amazon Kinesis流拉取日志进行实时分析 正确答案: C Correct answer is C as the data can be ingested into the Kinesis streams using agents and the logs can then be analyzed real time.,Refer AWS documentation – Kinesis Serverless log Analytics,Amazon Kinesis Streams enables you to build custom applications that process or analyze streaming data for specialized needs. Amazon Kinesis Streams can continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.,Option A & B are wrong as analyzing the logs every hour does not provide real time capability as required.,Option D is wrong as storing the logs on EC2 instance is not a scalable performant model,Option E is wrong as Amazon Kinesis stream work on the push mechanism, and the data from the EC2 instances need to be ingested into the Kinesis streams. 正确答案: C 正确答案是C,因为数据可以通过代理被摄取到Kinesis流中,然后日志可以实时分析。参考AWS文档 – Kinesis无服务器日志分析。 Amazon Kinesis Streams使您能够构建定制应用程序,以处理或分析流数据,满足特定需求。Amazon Kinesis Streams可以持续捕获并存储每小时从数十万来源(如网站点击流、金融交易、社交媒体信息流、IT日志和位置跟踪事件)中获得的TB级数据。 选项A和B是错误的,因为每小时分析一次日志无法提供所需的实时能力。 选项D是错误的,因为将日志存储在EC2实例上不是一个可扩展的高效模型。 选项E是错误的,因为Amazon Kinesis流是基于推送机制工作的,EC2实例中的数据需要被摄取到Kinesis流中。 62 / 100 分类: DBS 62. 62. Your company is in the process of developing a next generation pet collar that collects biometric information to assist families with promoting healthy lifestyles for their pets. Each collar will push 30kb of biometric data In JSON format every 2 seconds to a collection platform that will process and analyze the data providing health trending information back to the pet owners and veterinarians via a web portal Management has tasked you to architect the collection platform ensuring the following requirements are met. 
Provide the ability for real-time analytics of the inbound biometric data to ensure processing of the biometric data is highly durable, Elastic and parallel. The results of the analytic processing should be persisted for data mining. Which architecture outlined below will meet the initial requirements for the collection platform? 62. 你们公司正在开发一款下一代宠物项圈,收集生物特征信息,以帮助家庭促进宠物健康的生活方式。每个项圈每2秒钟将30KB的生物特征数据以JSON格式推送到一个数据收集平台,该平台将处理和分析数据,并通过一个网络门户向宠物主人和兽医提供健康趋势信息。管理层已委托你设计数据收集平台,确保以下要求得到满足:提供实时分析功能,以确保生物特征数据的处理具有高度的持久性、弹性和并行性。分析处理结果应当持久化,以便数据挖掘。以下哪种架构能够满足收集平台的初步要求? A. A. Utilize S3 to collect the inbound sensor data analyze the data from S3 with a daily scheduled Data Pipeline and save the results to a Redshift Cluster. A. 利用S3收集传入的传感器数据,使用每日调度的数据管道分析来自S3的数据,并将结果保存到Redshift集群中。 B. B. Utilize Amazon Kinesis to collect the inbound sensor data, analyze the data with Kinesis clients and save the results to a Redshift cluster using EMR. B. 利用Amazon Kinesis收集传入的传感器数据,使用Kinesis客户端分析数据,并通过EMR将结果保存到Redshift集群中。 C. C. Utilize SQS to collect the inbound sensor data analyze the data from SQS with Amazon Kinesis and save the results to a Microsoft SQL Server RDS instance. C. 利用SQS收集传入的传感器数据,使用Amazon Kinesis分析来自SQS的数据,并将结果保存到Microsoft SQL Server RDS实例中。 D. D. Utilize EMR to collect the inbound sensor data, analyze the data from EMR with Amazon Kinesis and save the results to DynamoDB. D. 利用EMR收集传入的传感器数据,通过Amazon Kinesis分析EMR中的数据,并将结果保存到DynamoDB中。 正确答案: B Key point here to architect durable collection platform with real time analytics, data mining storage.,Correct answer is B to use Kinesis to capture the data in a elastic, durable and parallel manner. Analyze data with Kinesis clients and store data to Redshift for data mining using EMR.,Option A is wrong as S3 would not be ideal to capture data with that frequency and daily job will not provide real time analytics,Option C is wrong as SQS is not an ideal solution to capture this data and Kinesis clients are required to analyze the data. 
SQL server might not be a scalable option,Option D is wrong as EMR alone is not ideal to capture data and would need specific frameworks like Kafka to capture data for processing. Also real time analytics needs to done using Spark Streaming and not EMR alone. DynamoDB is not for data mining. 正确答案: B 关键点在于设计一个具有实时分析、数据挖掘存储功能的持久化数据收集平台。正确答案是 B,使用 Kinesis 以弹性、持久和并行的方式捕获数据。通过 Kinesis 客户端分析数据,并将数据存储到 Redshift 进行数据挖掘,使用 EMR。 选项 A 错误,因为 S3 不适合以那种频率捕获数据,并且每日任务无法提供实时分析。 选项 C 错误,因为 SQS 不是捕获此数据的理想解决方案,并且需要使用 Kinesis 客户端来分析数据。SQL Server 可能不是一个可扩展的选项。 选项 D 错误,因为仅使用 EMR 不适合捕获数据,并且需要像 Kafka 这样的特定框架来捕获数据进行处理。此外,实时分析需要使用 Spark Streaming,而不是仅使用 EMR。DynamoDB 不适合进行数据挖掘。 63 / 100 分类: DBS 63. 63. A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results. What is the most cost-effective solution to meet these requirements? 63. 一个社交媒体客户拥有来自不同数据源的数据,包括运行MySQL的RDS、Redshift和EMR上的Hive。为了支持更好的分析,客户需要能够分析来自不同数据源的数据并结合结果。满足这些需求的最具成本效益的解决方案是什么? A. A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis. A. 从不同的数据库/数据仓库加载所有数据到 S3。使用 Redshift COPY 命令将数据复制到 Redshift 进行分析。 B. B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector to select from different data sources in a single query. B. 在Hive所在的EMR集群上安装Presto。配置MySQL和PostgreSQL连接器,以便在单个查询中从不同的数据源中选择数据。 C. C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze. C. 启动一个Elasticsearch集群。从所有三个数据源加载数据,并使用Kibana进行分析。 D. D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems. D. 
编写一个程序,在单独的EC2实例上运行,向三个不同的系统发起查询。获取所有三个系统的响应后,汇总结果。 正确答案: B Correct answer is B as Presto can help query over multiple datasources and also provides connectors to interact directly MySQL, Redshift and Hive.,Refer AWS documentation – EMR Presto,Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can process data from multiple data sources including the Hadoop Distributed File System (HDFS) and Amazon S3.,Option A is wrong as data is replicated and is not a cost effective solution.,Option C is wrong as Elasticsearch does not provide analytics capabilities.,Option D is wrong as running on EC2 instances is not a scalable and cost-effective solution. 正确答案: B 正确答案是B,因为Presto可以帮助在多个数据源上进行查询,并且还提供与MySQL、Redshift和Hive直接交互的连接器。参考AWS文档 – EMR Presto。 Presto是一个开源的分布式SQL查询引擎,优化了低延迟的即席数据分析。它支持ANSI SQL标准,包括复杂查询、聚合、连接和窗口函数。Presto可以处理来自多个数据源的数据,包括Hadoop分布式文件系统(HDFS)和Amazon S3。 选项A是错误的,因为数据是被复制的,且不是一种具有成本效益的解决方案。 选项C是错误的,因为Elasticsearch不提供分析功能。 选项D是错误的,因为运行在EC2实例上不是一种可扩展且具有成本效益的解决方案。 64 / 100 分类: DBS 64. 64. Management has requested a comparison of total sales performance in the five North American regions in January. They’re hoping to determine how to allocate a budget to regions based on performance in that single period. What sort of visualization do you use in Amazon QuickSight? 64. 管理层要求对1月份五个北美地区的总销售表现进行比较。他们希望根据该单一时期的表现来决定如何分配预算给各个地区。你会在Amazon QuickSight中使用什么样的可视化方式? A. A. Bar chart A. 条形图 B. B. Line chart B. 折线图 C. C. Stacked area chart C. 堆积面积图 D. D. Histogram D. 
直方图 正确答案: A Correct answer is A as Bar Chart can be used to represent the data for comparison in sales for each region.,Refer AWS documentation – QuickSight Visual Types,Option B is wrong as line charts are used to compare changes in measure values over a period of time,Option C is wrong as a stacked area chart is an extension of a basic area chart to display the evolution of the value of several groups on the same graphic.,Option D is wrong as Histograms are sometimes confused with bar charts. A histogram is used for continuous data, where the bins represent ranges of data, while a bar chart is a plot of categorical variables. 正确答案: A 正确答案是A,因为条形图可以用来表示每个地区销售数据的比较。 请参考AWS文档 – QuickSight可视化类型。 选项B是错误的,因为折线图用于比较一段时间内度量值的变化。 选项C是错误的,因为堆叠面积图是基本面积图的扩展,用于显示多个组在同一图形上的值的演变。 选项D是错误的,因为直方图有时会与条形图混淆。直方图用于表示连续数据,其中箱子代表数据范围,而条形图则是分类变量的图表。 65 / 100 分类: DBS 65. 65. A new client is requesting a tool that will provide fast query performance for enterprise reporting and business intelligence workloads, particularly those involving extremely complex SQL with multiple joins and sub-queries. They also want the ability to give analysts access to a central system through traditional SQL clients that allow them to explore and familiarize themselves with the data. What solution do you initially recommend they investigate? 65. 一位新客户要求提供一个工具,该工具能够为企业报告和商业智能工作负载提供快速的查询性能,特别是涉及多个连接和子查询的极其复杂的SQL。他们还希望能够通过传统的SQL客户端为分析师提供访问中央系统的能力,使他们能够探索和熟悉数据。你最初推荐他们调查哪种解决方案? A. A. SQS A. SQS B. B. Redshift B. Redshift C. C. Athena C. Athena D. D. EMR D. EMR (Elastic MapReduce) 正确答案: B Correct answer is B as Redshift is a fully managed data warehousing solution providing standard SQL interface and ability to run complex queries.,Refer AWS documentation – Redshift,Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. 
It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.,Option A is wrong as SQS does not provide querying capability,Option C is wrong as Athena does not provide complex querying capability,Option D is wrong as EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. 正确答案: B 正确答案是B,因为Redshift是一个完全托管的数据仓库解决方案,提供标准SQL接口和运行复杂查询的能力。请参阅AWS文档 – Redshift。 Amazon Redshift是一个快速的、完全托管的数据仓库,使您能够使用标准SQL和现有的商业智能(BI)工具,简单且经济高效地分析所有数据。它允许您针对PB级结构化数据运行复杂的分析查询,使用先进的查询优化、列存储在高性能本地磁盘上,并进行大规模并行查询执行。 选项A是错误的,因为SQS不提供查询功能。 选项C是错误的,因为Athena不提供复杂查询能力。 选项D是错误的,因为EMR提供一个托管的Hadoop框架,使得跨动态可扩展的Amazon EC2实例处理大量数据变得容易、快速且经济高效。 66 / 100 分类: DBS 66. 66. A company stores data in an S3 bucket. Some of the data contains sensitive information. They need to ensure that the bucket complies with PCI DSS (Payment Card Industry Data Security Standard) compliance standards. Which of the following should be implemented to fulfill this requirement? (Select TWO) 66. 一家公司将数据存储在S3桶中。一些数据包含敏感信息。他们需要确保该桶符合PCI DSS(支付卡行业数据安全标准)合规性要求。为了满足这一要求,以下哪些措施应该实施?(选择两个) A. A. Enable server-side encryption (SSE) for the bucket. A. 为桶启用服务器端加密(SSE)。 B. B. Enable versioning for the bucket B. 为存储桶启用版本控制 C. C. Ensure that access to the bucket is only given to one IAM role C. 确保只有一个IAM角色可以访问该存储桶。 D. D. Ensure that objects from the bucket are requested only via HTTPS D. 确保只有通过 HTTPS 才能请求桶中的对象 正确答案: A, D Correct answers are A & D as one of the requirements is data security with encryption at rest and in transit. 
PCI DSS helps ensure that companies maintain a secure environment for storing, processing, and transmitting credit card information.,Option B is wrong as versioning only helps maintain data versions and is helpful to recover from accidental overwrites or deletions.,Option C is wrong as this is not a requirement, only a best practice. 正确答案: A, D 正确答案是 A 和 D,因为其中一项要求是数据安全,包括静态加密和传输加密。PCI DSS 有助于确保公司为存储、处理和传输信用卡信息提供一个安全的环境。 选项 B 错误,因为版本控制仅有助于维护数据版本,并且有助于从意外覆盖或删除中恢复。 选项 C 错误,因为这不是一个要求,而是最佳实践。 67 / 100 分类: DBS 67. 67. Your company sells consumer devices and needs to record the first activation of all sold devices. Devices are not activated until the information is written to a persistent database. Activation data is very important for your company and must be analyzed daily with a MapReduce job. The execution time of the data analysis process must be less than three hours per day. Devices are usually sold evenly during the year, but when a new device model is out, there is a predictable peak in activations, that is, for a few days there are 10 times or even 100 times more activations than on an average day. Which of the following database and analysis framework combinations would you implement to better optimize costs and performance for this workload? 67. 贵公司销售消费类设备,并需要记录所有销售设备的首次激活信息。设备在信息写入持久化数据库之前不会被激活。激活数据对贵公司非常重要,必须通过MapReduce作业进行每日分析。数据分析过程的执行时间必须少于每天三小时。设备通常在全年均匀销售,但当新设备型号发布时,激活量会出现可预测的峰值,也就是说,在几天内,激活量是平均日的10倍甚至100倍。以下哪种数据库和分析框架将有助于更好地优化此工作负载的成本和性能? A. A. Amazon RDS and Amazon Elastic MapReduce with Spot instances. A. 亚马逊RDS和亚马逊弹性MapReduce与竞价实例。 B. B. Amazon DynamoDB and Amazon Elastic MapReduce with Spot instances. B. 亚马逊 DynamoDB 和亚马逊 Elastic MapReduce 与 Spot 实例。 C. C. Amazon RDS and Amazon Elastic MapReduce with Reserved instances. C. 亚马逊RDS和亚马逊弹性MapReduce与预留实例。 D. D. Amazon DynamoDB and Amazon Elastic MapReduce with Reserved instances D. 
亚马逊DynamoDB和亚马逊Elastic MapReduce与预留实例 正确答案: B The key point here is to optimize cost and performance for the increased workload only, not the existing one,Refer EMR best practices – For unpredictable workloads, the suggested pricing model is Spot or On-Demand.,Correct answer is B and preferred over A as DynamoDB would be preferred over RDS for the throughput supported and Spot instances to reduce cost and handle the temporary workload.,Options C & D are wrong as Reserved instances would be preferred for a consistent and predictable workload and would prove costly in this scenario. 正确答案: B 关键点在于仅针对增加的工作负载优化成本和性能,而不是现有的工作负载。参考 EMR 最佳实践 – 对于不可预测的工作负载,建议的定价模型是 Spot 或按需定价。 正确答案是 B,优于 A,因为 DynamoDB 比 RDS 更适合支持吞吐量,且 Spot 实例有助于降低成本并处理临时工作负载。 选项 C 和 D 错误,因为预留实例适用于一致和可预测的工作负载,而在这种情况下会导致高成本。 68 / 100 分类: DBS 68. 68. Your company is storing millions of sensitive transactions across thousands of 100-GB files that must be encrypted in transit and at rest. Analysts concurrently depend on subsets of files, which can consume up to 5TB of space, to generate simulations that can be used to steer business decisions. You are required to design an AWS solution that can cost-effectively accommodate the long-term storage and in-flight subsets of data. 68. 你的公司正在存储数百万条敏感交易数据,这些数据分布在数千个100GB的文件中,必须在传输和静态时进行加密。分析人员同时依赖部分文件,这些文件最多可占用5TB的空间,用于生成可以用于引导商业决策的模拟。你需要设计一个AWS解决方案,以经济有效的方式容纳长期存储和传输中的数据子集。 A. A. Use Amazon Simple Storage Service (S3) with server-side encryption, and run simulations on subsets in ephemeral drives on Amazon EC2. A. 使用亚马逊简单存储服务(S3)并启用服务器端加密,在亚马逊EC2的临时驱动器上对子集进行模拟。 B. B. Use Amazon S3 with server-side encryption, and run simulations on subsets in-memory on Amazon EC2. B. 使用带有服务器端加密的Amazon S3,并在Amazon EC2上对内存中的子集进行模拟。 C. C. Use HDFS on Amazon EMR, and run simulations on subsets in ephemeral drives on Amazon EC2. C. 在Amazon EMR上使用HDFS,并在Amazon EC2的临时驱动器上对子集进行仿真。 D. D. 
Use HDFS on Amazon Elastic MapReduce (EMR), and run simulations on subsets in-memory on Amazon Elastic Compute Cloud (EC2). D. 在 Amazon Elastic MapReduce (EMR) 上使用 HDFS,并在 Amazon Elastic Compute Cloud (EC2) 上将模拟运行在内存中的子集。 E. E. Store the full data set in encrypted Amazon Elastic Block Store (EBS) volumes, and regularly capture snapshots that can be cloned to EC2 workstations E. 将完整的数据集存储在加密的Amazon Elastic Block Store (EBS)卷中,并定期捕获快照,以便克隆到EC2工作站。 正确答案: A Correct answer is A as S3 with SSE provides encryption at rest and HTTPS can be used to push data to S3 for encryption in transit. S3 provides an option for cost-effective long-term storage. Ephemeral drives would help run simulations and the data would be lost once the EC2 instance is terminated.,Option B is wrong as S3 with SSE provides encryption at rest and HTTPS can be used to push data to S3 for encryption in transit. However, in-memory simulations with 5 TB of data would not be feasible.,Options C & D are wrong as HDFS is not a cost-effective solution as data nodes would be required to store the data and it does not provide encryption by default.,Option E is wrong as EBS for long-term storage is an expensive option. 正确答案: A 正确答案是A,因为启用SSE的S3提供静态加密,并且可以使用HTTPS将数据推送到S3进行传输加密。S3提供了一个成本效益高的长期存储选项。临时驱动器有助于运行模拟,一旦EC2实例终止,数据将丢失。 选项B错误,因为启用SSE的S3提供静态加密,并且可以使用HTTPS将数据推送到S3进行传输加密。然而,在内存中进行5TB数据的模拟是不可行的。 选项C和D错误,因为HDFS不是一个成本效益高的解决方案,因为需要数据节点来存储数据,并且默认情况下不提供加密。 选项E错误,因为将EBS用于长期存储是一种昂贵的选择。 69 / 100 分类: DBS 69. 69. An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived. How should the administrator recommend storing the log data? 69. 一名管理员需要为来自移动设备的事件设计事件日志存储架构。事件数据将在每天由Amazon EMR集群处理,用于聚合报告和分析,然后再进行归档。管理员应该如何推荐存储日志数据? A. A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders. A. 
创建一个Amazon S3存储桶,并按设备将日志数据写入文件夹中。 在设备文件夹上执行EMR任务。 B. B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, write log data to table. Execute the EMR job on the Amazon DynamoDB table. B. 创建一个按设备分区并按日期排序的Amazon DynamoDB表,将日志数据写入表中。 在Amazon DynamoDB表上执行EMR作业。 C. C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder. C. 创建一个 Amazon S3 存储桶,并按天将数据写入文件夹。每天在该文件夹上执行 EMR 作业。 D. D. Create an Amazon DynamoDB table partitioned on EventID, write log data to table. Execute the EMR job on the table. D. 创建一个基于EventID分区的Amazon DynamoDB表,将日志数据写入表中。在该表上执行EMR作业。 正确答案: C Correct answer is C as the EMR job needs to process daily data, it would be best to partition the data by day.,Refer AWS documentation – EMR Best Practices,Data partitioning is an essential optimization to your data processing workflow. Without any data partitioning in place, your data processing job needs to read or scan all available data sets and apply additional filters in order to skip unnecessary data. Such an architecture might work for a low volume of data, but scanning the entire data set is a very time-consuming and expensive approach for larger data sets. Data partitioning lets you create unique buckets of data and eliminate the need for a data processing job to read the entire data set.,Several considerations determine how you partition your data. For instance, if you are processing a time-series data set where you need to process your data once every hour and your data-access pattern is based on time, partitioning your data based on date makes the most sense. An example of such data processing would be processing your daily logs. 
If you have incoming logs from a variety of data sources (web servers, devices, etc.), then creating partitions of data based on the hour of the day gives you a date-based partitioning scheme.,The structure of such a partitioning scheme will look similar to the following:,/data/logs/YYYY-MM-DD-HH/logfiles for this given hour, where YYYY-MM-DD-HH changes based on the current log ingest time.,Option A is wrong as the data needs to be processed by day, it would be best to partition the data by date instead of device ID.,Options B & D are wrong as DynamoDB is not an ideal solution for storage and archival of log data and does not provide easy integration with EMR. 正确答案: C 正确答案是C,因为EMR作业需要处理每日数据,最好按天对数据进行分区。请参考AWS文档 – EMR最佳实践。 数据分区是数据处理工作流中的一项重要优化。如果没有进行任何数据分区,数据处理作业需要读取或扫描所有可用的数据集,并应用额外的过滤器以跳过不必要的数据。这种架构对于小数据量可能有效,但对于较大的数据集,扫描整个数据集是一个非常耗时且昂贵的方法。数据分区可以让你创建唯一的数据桶,并消除数据处理作业需要读取整个数据集的需求。 有几个考虑因素决定了如何对数据进行分区。例如,如果你正在处理一个时间序列数据集,需要每小时处理一次数据,并且你的数据访问模式是基于时间的,那么按日期进行数据分区是最合适的。一个这样的数据处理示例是处理每日日志。如果你有来自各种数据源(如Web服务器、设备等)的日志数据,那么根据一天中的小时来创建数据分区将为你提供基于日期的分区方案。 这种分区方案的结构类似于以下内容: /data/logs/YYYY-MM-DD-HH/logfiles,表示给定小时的日志文件,其中YYYY-MM-DD-HH根据当前的日志接收时间变化。 选项A是错误的,因为数据需要按天处理,最好按日期而不是设备ID对数据进行分区。 选项B和D是错误的,因为DynamoDB并不是存储和归档日志数据的理想解决方案,并且与EMR的集成不方便。 70 / 100 分类: DBS 70. 70. Your company uses DynamoDB to support their mobile application and S3 to host the images and other documents shared between users. DynamoDB has a table with 60 partitions and is being heavily accessed by users. The queries run by users do not fully use the per-partition throughput. However, there are times when, in less than 3 minutes, a heavy load of queries flows in; this happens occasionally. Sometimes there are many background tasks running in the background. How can DynamoDB be configured to handle the workload? 70. 你们公司使用DynamoDB来支持他们的移动应用程序,并使用S3来托管用户之间共享的图像和其他文档。DynamoDB有一个包含60个分区的表,并且正在被用户大量访问。用户执行的查询并没有完全利用每个分区的吞吐量。然而,有时在不到3分钟的时间内,会有大量查询涌入,这种情况偶尔发生。有时后台还会有许多任务在运行。如何配置DynamoDB以处理这些工作负载? A. A. 
Using Burst Capacity effectively A. 有效利用突发容量 (Burst Capacity) B. B. Using Adaptive Capacity B. 使用自适应容量 (Adaptive Capacity) C. C. Design Partition Keys to distribute workload evenly C. 设计分区键以均匀分配工作负载 D. D. Using Write Sharding to distribute Workloads Evenly D. 使用写分片(Write Sharding)均匀分配工作负载 正确答案: A Correct answer is A as DynamoDB burst capacity can retain part of unused provisioned capacity, up to 5 minutes, allowing the application to burst.,Refer AWS documentation – DynamoDB Best Practices,DynamoDB provides some flexibility in your per-partition throughput provisioning by providing burst capacity, as follows. Whenever you are not fully using a partition’s throughput, DynamoDB reserves a portion of that unused capacity for later bursts of throughput to handle usage spikes.,DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you’ve defined for your table.,DynamoDB can also consume burst capacity for background maintenance and other tasks without prior notice.,Option B is wrong as Adaptive Capacity enables your application to continue reading and writing to hot partitions without being throttled, provided that traffic does not exceed your table’s total provisioned capacity or the partition maximum capacity.,Option C is wrong as the partition keys are designed fine, as the application does not consume its total capacity and is not throttled.,Option D is wrong as write sharding only helps distribute write workloads evenly; it does not address bursty reads. 
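The burst-capacity arithmetic described above can be sketched in a few lines. This is a rough illustration of the documented rule (up to 300 seconds of unused provisioned capacity is retained for later spikes), not a DynamoDB API call; the function name and the simplified accounting model are our own.

```python
# Hypothetical illustration of DynamoDB's burst-capacity rule:
# up to 300 seconds of *unused* provisioned capacity is retained
# and can be spent during a spike. Names here are ours, not boto3.

def simulate_burst(provisioned: int, demand: list[int]) -> list[bool]:
    """For each second of demand, report whether the request load
    could be served from provisioned + accumulated burst capacity."""
    MAX_BURST_SECONDS = 300          # DynamoDB retains ~5 minutes of unused capacity
    burst = 0.0
    served = []
    for d in demand:
        if d <= provisioned:
            # unused capacity accrues into the burst bucket (capped)
            burst = min(burst + (provisioned - d),
                        MAX_BURST_SECONDS * provisioned)
            served.append(True)
        else:
            need = d - provisioned
            if need <= burst:
                burst -= need        # spike absorbed by burst capacity
                served.append(True)
            else:
                burst = 0.0          # throttled once burst is exhausted
                served.append(False)
    return served

# A 400-capacity-unit table, mostly idle, then a short 10x spike
# (similar to the occasional sub-3-minute surge in the question):
demand = [100] * 600 + [4000] * 30
result = simulate_burst(400, demand)
```

Under this toy model the 30-second spike is fully absorbed, while a longer spike eventually throttles once the 300-second bucket runs dry, which is why burst capacity suits occasional short surges rather than sustained overload.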
正确答案: A 正确答案是A,因为DynamoDB的突发容量可以保留部分未使用的预配置容量,最长可达5分钟,从而允许应用程序进行突发。 参考AWS文档 – DynamoDB最佳实践,DynamoDB通过提供突发容量为您的每个分区的吞吐量预配置提供了一定的灵活性,如下所示。每当您未完全使用一个分区的吞吐量时,DynamoDB会保留该未使用容量的一部分,以供以后进行吞吐量的突发使用,以应对使用峰值。 DynamoDB目前保留最多五分钟(300秒)的未使用读写容量。在偶尔的读写活动突发期间,这些额外的容量单位可以迅速被消耗——甚至比您为表定义的每秒预配置吞吐量还要快。 DynamoDB还可以在没有事先通知的情况下消耗突发容量用于后台维护和其他任务。 选项B是错误的,因为自适应容量使您的应用程序能够继续读取和写入热分区,而不受限制,前提是流量没有超过表的总预配置容量或分区最大容量。 选项C是错误的,因为分区键设计得很好,因为应用程序并未消耗其总容量且不会被限制。 选项D是错误的,因为写入分片仅有助于均匀分配写入负载。 71 / 100 分类: DBS 71. 71. An online photo album app has a key design feature to support multiple screens (e.g., desktop, mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be saved in different resolutions and layouts. The image-processing Java program takes an average of five seconds per upload, depending on the image size and format. Each image upload captures the following image metadata: user, album, photo label, upload timestamp. The app should support the following requirements: hundreds of user image uploads per second; maximum image upload size of 10 MB; maximum image metadata size of 1 KB; image displayed in optimized resolution on all supported screens no later than one minute after image upload. Which strategy should be used to meet these requirements? 71. 一个在线相册应用程序具有一个关键设计特性,支持多个屏幕(例如,桌面、手机和平板)并提供高质量的显示效果。必须以不同的分辨率和布局保存图像的多个版本。图像处理的Java程序每次上传平均需要五秒钟,具体取决于图像的大小和格式。每次图像上传都会捕获以下图像元数据:用户、相册、照片标签、上传时间戳。 该应用程序应支持以下要求: 每秒数百次用户图像上传 最大图像上传大小为10 MB 最大图像元数据大小为1 KB 在所有支持的屏幕上,图像应在上传后不超过一分钟内以优化的分辨率显示 应该使用哪种策略来满足这些要求? A. A. Write image and metadata to RDS with BLOB data type. Use AWS Data Pipeline to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB. A. 将图像和元数据写入RDS,使用BLOB数据类型。使用AWS数据管道运行图像处理,并将图像输出保存到Amazon S3,将元数据保存到应用程序仓库数据库。 B. B. Write images and metadata to Amazon Kinesis. Use a Kinesis Client Library (KCL) application to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB. B. 
将图像和元数据写入 Amazon Kinesis。使用 Kinesis 客户端库(KCL)应用程序运行图像处理,并将图像输出保存到 Amazon S3,将元数据保存到应用程序存储库数据库。 C. C. Write image and metadata to Amazon Kinesis. Use Amazon Elastic MapReduce (EMR) with Spark Streaming to run image processing and save the images output to Amazon S3 and metadata to app repository DB. C. 将图像和元数据写入 Amazon Kinesis。使用 Amazon Elastic MapReduce (EMR) 配合 Spark Streaming 进行图像处理,并将图像输出保存到 Amazon S3,将元数据保存到应用程序存储库数据库。 D. D. Upload image with metadata to Amazon S3, use Lambda function to run the image processing and save the images output to Amazon S3 and metadata to the app repository DB. D. 将带有元数据的图像上传到Amazon S3,使用Lambda函数运行图像处理,并将图像输出保存到Amazon S3,将元数据保存到应用程序仓库数据库。 正确答案: D Correct answer is D as the images with metadata can be uploaded to S3. S3 can support both the size and request rate. A Lambda function can be triggered to convert and save the image output back to S3 and metadata to app DB.,Option A is wrong as RDS is not ideal storage for images, nor can it handle hundreds of uploads per second.,Options B & C are wrong as Kinesis supports a max record size of 1 MB and would not be able to support 10 MB image files. 正确答案: D 正确答案是 D,因为带有元数据的图像可以上传到 S3。S3 可以支持图像的大小和请求速率。可以触发 Lambda 函数来转换并将图像输出保存回 S3,同时将元数据保存到应用数据库。 选项 A 错误,因为 RDS 不是理想的图像存储方式,也无法处理每秒数百次上传。 选项 B 和 C 错误,因为 Kinesis 支持每条记录最大 1MB 的消息大小,无法支持 10MB 的图像文件。 72 / 100 分类: DBS 72. 72. A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable. What is the appropriate model choice and target attribute combination for this problem? 72. 一家公司正在将 Amazon Machine Learning 用作医疗软件应用的一部分。该应用将根据在血型信息不可用时可以获取的其他临床测试,预测患者最可能的血型。对于这个问题,适当的模型选择和目标属性组合是什么? A. A. Multi-class classification model with a categorical target attribute. A. 带有类别目标属性的多类别分类模型。 B. B. Regression model with a numeric target attribute. B. 
带有数值型目标属性的回归模型。 C. C. Binary Classification with a categorical target attribute. C. 具有分类目标属性的二元分类。 D. D. K-Nearest Neighbors model with a multi-class target attribute. D. K-最近邻模型与多类别目标属性。 正确答案: A Correct answer is A as the blood group types are limited, a multi-class classification model can help classify the result into the blood groups,Option B is wrong as regression is for predicting a numeric value,Option C is wrong as Binary classification can only classify into two classes, such as a yes or no.,Option D is wrong as K-Nearest Neighbours is more for grouping unknown data. 正确答案: A 正确答案是 A,因为血型类型是有限的,多类分类模型可以帮助将结果分类到不同的血型中,选项 B 错误,因为回归模型用于预测数值型目标,选项 C 错误,因为二分类只能分类为“是”或“否”,选项 D 错误,因为 K-最近邻更适用于对未知数据进行分组。 73 / 100 分类: DBS 73. 73. A company is developing a video application that will emit a log stream. Each record in the stream may contain up to 400 KB of data. To improve the video-streaming experience, it is necessary to collect a subset of metrics from the stream to be analyzed for trends over time using complex SQL queries. A Solutions Architect will create a solution that allows the application to scale without customer interaction. Which solution should be implemented to meet these requirements? 73. 一家公司正在开发一款视频应用程序,该程序将发出日志流。日志流中的每条记录可能包含最多400 KB的数据。为了改善视频流体验,有必要从流中收集一部分指标,以便通过复杂的SQL查询分析这些指标的趋势。解决方案架构师将创建一个解决方案,使得该应用程序可以在无需客户互动的情况下进行扩展。为了满足这些要求,应该实施哪种解决方案? A. A. Send the log data to an Amazon Kinesis Data Firehose delivery stream. Use an AWS Lambda function to transform the data. Deliver the data to Amazon Redshift. Query the data in Amazon Redshift. A. 将日志数据发送到 Amazon Kinesis Data Firehose 交付流。 使用 AWS Lambda 函数转换数据。 将数据交付到 Amazon Redshift。 在 Amazon Redshift 中查询数据。 B. B. Send the log data to an Amazon SQS standard queue. Make the queue an event source for an AWS Lambda function that transforms the data and stores it in Amazon Redshift. Query the data in Amazon Redshift. B. 将日志数据发送到 Amazon SQS 标准队列。将该队列作为 AWS Lambda 函数的事件源,该函数转换数据并将其存储在 Amazon Redshift 中。在 Amazon Redshift 中查询数据。 C. C. 
Send the log data to an Amazon CloudWatch Logs log group. Make the log group an event source for an AWS Lambda function that transforms the data and stores it in an Amazon S3 bucket. Query the data with Amazon Athena. C. 将日志数据发送到 Amazon CloudWatch Logs 日志组。将日志组设置为 AWS Lambda 函数的事件源,该函数对数据进行转换并将其存储在 Amazon S3 存储桶中。使用 Amazon Athena 查询数据。 D. D. Send the log data to an Amazon Kinesis data stream. Subscribe an AWS Lambda function to the stream that transforms the data and sends it to a second data stream. Use Amazon Kinesis Data Analytics to query the data in the second stream. D. 将日志数据发送到 Amazon Kinesis 数据流。订阅一个 AWS Lambda 函数到该数据流,函数会转换数据并将其发送到第二个数据流。使用 Amazon Kinesis 数据分析查询第二个数据流中的数据。 正确答案: D Correct answer is D as the data can be captured using Kinesis Data Streams, and Kinesis Data Analytics can be used to query the streaming data using time- or window-based queries to generate trend analysis.,Refer AWS documentation – Streaming Analytics Pipeline,Many Amazon Web Services (AWS) customers use streaming data to gain real-time insight into customer activity and immediate business trends. Streaming data, which is generated continuously from thousands of data sources, includes a wide variety of data such as log files from your mobile or web applications, e-commerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices. This data can help companies make well-informed decisions and proactively respond to changing business conditions.,Amazon Kinesis, a platform for streaming data on AWS, offers powerful services that make it easier to build data processing applications, load massive volumes of streaming data, and analyze it in real time.,Options A, B & C are wrong as they do not provide analytics on streaming data. 
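As a back-of-the-envelope check on why a Kinesis data stream can absorb this kind of workload, a small sizing helper can estimate the required shard count from the documented per-shard write limits (1 MB/s and 1,000 records/s). The helper name and example numbers are ours; this is a sketch, not an AWS API call.

```python
import math

# Back-of-the-envelope shard sizing for a Kinesis data stream.
# Limits below are the documented per-shard write defaults:
#   1 MiB/s of data and 1,000 records/s per shard.

def shards_needed(records_per_sec: int, avg_record_kb: float) -> int:
    """Return the minimum shard count for the write side of a stream."""
    INGEST_MB_PER_SHARD = 1.0
    RECORDS_PER_SHARD = 1000
    by_throughput = (records_per_sec * avg_record_kb / 1024.0) / INGEST_MB_PER_SHARD
    by_count = records_per_sec / RECORDS_PER_SHARD
    return max(1, math.ceil(max(by_throughput, by_count)))

# e.g. 100 log records/s at the 400 KB maximum mentioned in the question:
# 100 * 400 KB ≈ 39.1 MB/s, so the stream is throughput-bound.
n = shards_needed(100, 400)
```

Scaling then just means adding shards (or using on-demand capacity mode), which is what lets the pipeline grow "without customer interaction."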
正确答案: D 正确答案是D,因为数据可以通过Kinesis数据流捕获,Kinesis数据分析可以用于对流数据进行时间或窗口查询,以生成趋势分析。参考AWS文档 – 流媒体分析管道。 许多Amazon Web Services (AWS)客户使用流数据来实时洞察客户活动和即时业务趋势。流数据是从成千上万的数据源连续生成的,包含各种各样的数据,如来自您的移动或Web应用程序的日志文件、电子商务购买、游戏中的玩家活动、社交网络信息、金融交易场所或地理空间服务,以及来自连接设备的遥测数据。这些数据可以帮助公司做出明智的决策,并主动应对不断变化的商业环境。 Amazon Kinesis是AWS上的流数据平台,提供强大的服务,使得构建数据处理应用程序、加载大量流数据并实时分析变得更加容易。 选项A、B和C是错误的,因为它们没有提供对流数据的分析。 74 / 100 分类: DBS 74. 74. A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose? 74. 一家制造公司的数据工程师正在设计一个数据处理平台,该平台接收大量非结构化数据。数据工程师必须在 Amazon Redshift 中填充一个结构良好的星型模式。为了这个目的,最有效的架构策略是什么? A. A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift. A. 使用 Amazon EMR 转换非结构化数据并生成 CSV 数据。 将 CSV 数据复制到 Redshift 中的分析模式。 B. B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema. B. 将非结构化数据加载到Redshift中,并使用字符串解析函数提取结构化数据以插入到分析模式中。 C. C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift. C. 当数据保存到 Amazon S3 时,使用 S3 事件通知和 AWS Lambda 转换文件内容。将数据插入到 Redshift 的分析架构中。 D. D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift. D. 使用AWS Marketplace ETL工具对数据进行规范化,将结果持久化到Amazon S3,并使用AWS Lambda将数据插入Redshift。 正确答案: A Correct answer is A as the data volume is large, it can be processed using EMR to generate structured CSV data and then load the data into Redshift.,Refer AWS documentation – Data Warehousing on AWS,Data in Amazon Redshift must be structured by a defined schema. 
Amazon Redshift doesn’t support an arbitrary schema structure for each row. If your data is unstructured, you can perform extract, transform, and load (ETL) on Amazon EMR to get the data ready for loading into Amazon Redshift. For JSON data, you can store key-value pairs and use the native JSON functions in your queries.,Option B is wrong as unstructured data cannot be loaded into Redshift.,Option C is wrong as Lambda would not be able to handle large amounts of data due to its limitations.,Option D is wrong as a Marketplace ETL tool is not needed and EMR can be used. 正确答案: A 正确答案是 A,因为数据量很大,可以使用 EMR 处理数据,生成结构化的 CSV 数据,然后将数据加载到 Redshift。 参考 AWS 文档 – AWS 上的数据仓库,Amazon Redshift 中的数据必须由定义的模式进行结构化。Amazon Redshift 不支持每一行具有任意模式结构。如果您的数据是非结构化的,您可以在 Amazon EMR 上执行提取、转换和加载 (ETL),以使数据准备好加载到 Amazon Redshift。对于 JSON 数据,您可以存储键值对,并在查询中使用原生 JSON 函数。 选项 B 错误,因为非结构化数据无法加载到 Redshift。 选项 C 错误,因为 Lambda 由于其限制,无法处理大量数据。 选项 D 错误,因为不需要市场上的 ETL 工具,可以使用 EMR。 75 / 100 分类: DBS 75. 75. You are deploying an application to track GPS coordinates of delivery trucks in the United States. Coordinates are transmitted from each delivery truck once every three seconds. You need to design an architecture that will enable real-time processing of these coordinates from multiple consumers. Which service should you use to implement data ingestion? 75. 你正在部署一个应用程序,用于跟踪美国送货卡车的GPS坐标。每辆送货卡车每三秒钟会传输一次坐标。你需要设计一个架构,以便实现来自多个消费者的这些坐标的实时处理。你应该使用哪个服务来实现数据摄取? A. A. Amazon Kinesis A. 亚马逊 Kinesis B. B. AWS Data Pipeline B. AWS 数据管道 C. C. Amazon AppStream C. 亚马逊 AppStream D. D. Amazon Simple Queue Service D. 
亚马逊简单队列服务 正确答案: A The key point here is to address real-time data ingestion.,Correct answer is A as Amazon Kinesis is a platform for streaming data on AWS, making it easy to load and analyze streaming data, and also providing the ability for you to build custom streaming data applications for specialized needs.,Option B is wrong as Data Pipeline is more of an orchestration service and just helps move data between different data stores.,Option C is wrong as Amazon AppStream is an application streaming service that lets you stream your existing resource-intensive applications from the cloud without code modifications.,Option D is wrong as SQS would not be able to handle large-scale real-time ingestion. 正确答案: A 关键点在于处理实时数据摄取。 正确答案是A,Amazon Kinesis是AWS上的一个流数据平台,使得加载和分析流数据变得容易,并且还提供了构建定制流数据应用程序的能力,以满足特殊需求。 选项B是错误的,因为Data Pipeline更多的是一个编排服务,仅仅帮助将数据在不同的数据存储之间传输。 选项C是错误的,因为Amazon AppStream是一个应用程序流服务,允许你从云中流式传输现有的资源密集型应用程序,而无需修改代码。 选项D是错误的,因为SQS无法处理大规模的实时摄取。 76 / 100 分类: DBS 76. 76. You have a customer-facing application running on multiple M3 instances in two AZs. These instances are in an auto-scaling group configured to scale up when load increases. After taking a look at your CloudWatch metrics, you realize that during specific times every single day, the auto-scaling group has a lot more instances than it normally does. Despite this, one of your customers is complaining that the application is very slow to respond during those time periods every day. The application is reading and writing to a DynamoDB table which has 400 Write Capacity Units and 400 Read Capacity Units. The primary key is the company ID, and the table is storing roughly 20 TB of data. Which solution would solve the issue in a scalable and cost-effective manner? 76. 您有一个面向客户的应用程序,在两个可用区的多个M3实例上运行。这些实例属于一个自动扩展组,在负载增加时配置为扩展。当您查看CloudWatch指标时,您意识到每天的特定时间,自动扩展组的实例数量远远多于通常的数量。尽管如此,其中一位客户抱怨说,在这些时间段内,应用程序的响应非常慢。该应用程序正在读取和写入一个DynamoDB表,该表具有400个写入容量单元和400个读取容量单元。主键是公司ID,该表存储了大约20 TB的数据。哪种解决方案可以以可扩展且具有成本效益的方式解决此问题? A. A. 
Use AWS Data Pipeline to migrate your DynamoDB table to a new DynamoDB table with a different primary key that evenly distributes the dataset across the table. A. 使用数据管道将您的DynamoDB表迁移到一个新的DynamoDB表,该表具有不同的主键,能够均匀分布数据集。 B. B. Add a caching layer in front of the web application with ElastiCache Memcached or Redis. B. 在 Web 应用程序前添加一个缓存层,使用 ElastiCache Memcached 或 Redis。 C. C. DynamoDB is not a good solution for this use case. Instead, create a data pipeline to move data from DynamoDB to Amazon RDS, which is more suitable for this. C. DynamoDB 不是这个用例的最佳解决方案。相反,创建一个数据管道,将数据从 DynamoDB 移动到 Amazon RDS,因为后者更适合这个场景。 D. D. Double the number of Read and Write Capacity Units. The DynamoDB table is being throttled when customers from the same company all use the table at the same time. D. 将读写容量单元数量翻倍。当同一公司中的多个客户同时使用 DynamoDB 表时,表的访问被限制。 正确答案: A Correct answer is A as a single company is facing the issue and it would be a hot-key issue because the primary key is the company ID. Data Pipeline can be used to migrate the data.,Option B is wrong as ElastiCache may reduce the load depending upon the queries.,Option C is wrong as RDS would not be able to handle the huge amount of data.,Option D is wrong as this is not a cost-effective solution. 正确答案: A 正确答案是A,因为只有一个公司面临这个问题,而且由于主键是公司ID,这将是一个热点问题。数据管道可以用来迁移数据。 选项B是错误的,因为Elasticache可能会根据查询减少负载。 选项C是错误的,因为RDS无法处理大量数据。 选项D是错误的,因为这不是一个具有成本效益的解决方案。 77 / 100 分类: DBS 77. 77. Your enterprise application requires key-value storage as the database. The data is expected to be about 10 GB the first month and grow to 2 PB over the next two years. There are no other query requirements at this time. What solution would you recommend? 77. 你的企业应用需要使用键值存储作为数据库。预计数据在第一个月约为10 GB,并将在接下来的两年内增长到2 PB。目前没有其他查询要求。你会推荐什么解决方案? A. A. Hive on HDFS A. Hive 在 HDFS 上 B. B. RDS MySQL B. RDS MySQL C. C. HBase on HDFS C. HBase在HDFS上 D. D. Hadoop with Spark D. 
Hadoop 与 Spark 正确答案: C Correct answer is C as HBase on HDFS provides the ability to store large amounts of data in a non-relational key-value format.,Refer AWS documentation – EMR HBase,HBase is an open source, non-relational, distributed database developed as part of the Apache Software Foundation’s Hadoop project. HBase runs on top of Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem.,HBase works seamlessly with Hadoop, sharing its file system and serving as a direct input and output to the MapReduce framework and execution engine. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).,Option B is wrong as RDS would not support huge amounts of data and is a relational database.,Options A & D are wrong as they do not provide a key-value storage format. 正确答案: C 正确答案是C,因为HBase基于HDFS,能够以非关系型键值格式存储大量数据。参考AWS文档 – EMR HBase。 HBase是一个开源的、非关系型的、分布式数据库,是Apache软件基金会Hadoop项目的一部分。HBase运行在Hadoop分布式文件系统(HDFS)之上,为Hadoop生态系统提供非关系型数据库功能。 HBase与Hadoop无缝协作,共享其文件系统,并作为MapReduce框架和执行引擎的直接输入和输出。HBase还与Apache Hive集成,支持对HBase表进行SQL-like查询、与基于Hive的表连接,并支持Java数据库连接(JDBC)。 选项B是错误的,因为RDS不支持大量数据,并且是关系型数据库。 选项A和D是错误的,因为它们不提供键值存储格式。 78 / 100 分类: DBS 78. 78. A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 M files per hour. Each source file is automatically deleted from the local server after an hour. What is the most cost-efficient option to meet these requirements? 78. 一个系统需要将本地应用程序的打印队列文件收集到AWS中的持久存储层。每个打印队列文件为2 KB。该应用程序每小时生成1百万个文件。每个源文件在一小时后会从本地服务器自动删除。最具成本效益的选项是什么? A. A. Write file contents to an Amazon DynamoDB table. A. 将文件内容写入Amazon DynamoDB表。 B. B. Copy files to Amazon S3 Standard Storage. B. 将文件复制到Amazon S3标准存储。 C. C. Write file contents to Amazon ElastiCache. C. 将文件内容写入 Amazon ElastiCache。 D. D. 
Copy files to Amazon S3 Infrequent Access storage. D. 将文件复制到 Amazon S3 不常访问存储。 正确答案: A Correct answer is A as the provisioned throughput required for DynamoDB would be more cost-efficient than the PUT requests for S3 (comparing the DynamoDB write cost against the S3 PUT request cost over 31 days),Option C is wrong as ElastiCache is not ideal storage, as it is intended for caching,Options B & D are wrong as PUT operations on S3 for small files would be expensive as compared to DynamoDB inserts. 正确答案: A 正确答案是 A,因为为 DynamoDB 配置的吞吐量相比 S3 的 PUT 请求将更具成本效益(按31天比较 DynamoDB 与 S3 的成本)。选项 C 错误,因为 ElastiCache 不是理想的存储方案,更适合用于缓存。 选项 B 和 D 错误,因为在 S3 上执行小文件的 PUT 操作比在 DynamoDB 上执行插入操作更昂贵。 79 / 100 分类: DBS 79. 79. A data engineer needs to architect a data warehouse for an online retail company to store historic purchases. The data engineer needs to use Amazon Redshift. To comply with PCI DSS and meet corporate data protection standards, the data engineer must ensure that data is encrypted at rest and that the keys are managed by a corporate on-premises HSM. Which approach meets these requirements in the most cost-effective manner? 79. 一名数据工程师需要为一家在线零售公司设计一个数据仓库,用于存储历史购买数据。该数据工程师需要使用Amazon Redshift。为了遵守PCI DSS并满足企业数据保护标准,数据工程师必须确保数据在静态时被加密,并且密钥由企业本地HSM进行管理。哪种方法能够以最具成本效益的方式满足这些要求? A. A. Create a VPC, and then establish a VPN connection between the VPC and the on-premises network. Launch the Amazon Redshift cluster in the VPC, and configure it to use your corporate HSM. A. 创建一个VPC,然后在VPC与本地网络之间建立VPN连接。在VPC中启动Amazon Redshift集群,并配置它以使用您的企业HSM。 B. B. Use the AWS CloudHSM service to establish a trust relationship between the CloudHSM and the corporate HSM over a Direct Connect connection. Configure Amazon Redshift to use the CloudHSM device. B. 使用AWS CloudHSM服务通过Direct Connect连接在CloudHSM和公司HSM之间建立信任关系。配置Amazon Redshift使用CloudHSM设备。 C. C. 
Configure the AWS Key Management Service to point to the corporate HSM device, and then launch the Amazon Redshift cluster with the KMS managing the encryption keys. C. 配置AWS密钥管理服务指向公司HSM设备,然后启动Amazon Redshift集群,由KMS管理加密密钥。 D. D. Use AWS Import/Export to import the corporate HSM device into the AWS Region where the Amazon Redshift cluster will launch, and configure Redshift to use the imported HSM. D. 使用AWS导入/导出将企业HSM设备导入到Amazon Redshift集群将要启动的AWS区域,并配置Redshift使用导入的HSM。 正确答案: A Correct answer is A as Amazon Redshift can use an on-premises HSM for key management over the VPN, which ensures that the encryption keys are locally managed. Option B is wrong as, although CloudHSM can cluster with an on-premises HSM, key management could then be performed on either the on-premises HSM or CloudHSM, and that doesn’t meet the design goal. Option C is wrong as it does not describe a valid feature of KMS and, even if it were possible, would violate the requirement for the corporate HSM to manage the keys. Option D is wrong as it is not possible: you cannot put hardware into an AWS Region. 正确答案: A 正确答案是A,因为Amazon Redshift可以通过VPN使用本地的HSM进行密钥管理,这确保了加密密钥由本地管理。 选项B是错误的,尽管CloudHSM可以与本地HSM集群,但密钥管理可以在本地HSM或CloudHSM上进行,这不符合设计目标。 选项C是错误的,因为它没有描述KMS的有效功能,并且违反了公司HSM管理密钥的要求,即使它可能是可行的。 选项D是错误的,因为这是不可能的,因为你不能将硬件放入AWS区域。 80 / 100 分类: DBS 80. 80. Your application generates a 1 KB JSON payload that needs to be queued and delivered to EC2 instances for applications. At the end of the day, the application needs to replay the data for the past 24 hours. In the near future, you also need the ability for other multiple EC2 applications to consume the same stream concurrently. What is the best solution for this? 80. 您的应用程序生成一个 1 KB 的 JSON 负载,需要将其排队并传送到 EC2 实例以供应用程序使用。每天结束时,应用程序需要重放过去 24 小时的数据。在不久的将来,您还需要其他多个 EC2 应用程序能够同时消费相同的数据流。对此,最佳解决方案是什么? A. A. Kinesis Data Streams A. Kinesis 数据流 B. B. Kinesis Firehose B. Kinesis Firehose C. C. SNS C. 简单通知服务(SNS) D. D.
SQS 正确答案: A Correct answer is A as Kinesis Data Streams allows replaying the data as well as concurrent access to the same data by multiple Kinesis client applications. Refer AWS documentation – Kinesis Data Streams FAQ. Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs. You can continuously add various types of data such as clickstreams, application logs, and social media to an Amazon Kinesis data stream from hundreds of thousands of sources. Within seconds, the data will be available for your Amazon Kinesis Applications to read and process from the stream. Amazon Kinesis Data Streams enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis data stream (for example, to perform counting, aggregation, and filtering). Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows. Option B is wrong as Kinesis Firehose only allows data transfer to delivery destinations such as S3 and Redshift. It does not provide the replay capability. Option C is wrong as SNS can deliver the message to multiple subscribers; however, it cannot replay the data. Option D is wrong as SQS does not provide the ability to replay messages or concurrent access by multiple consumers.
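As a back-of-the-envelope illustration of why a shard-based stream handles this workload (a sketch with an assumed ingest rate, since the question does not state one): each Kinesis shard accepts up to 1,000 records/s or 1 MiB/s of writes, so a stream of 1 KB payloads is typically bound by the record-rate limit.

```python
import math

def shards_needed(records_per_sec: int, record_size_bytes: int) -> int:
    """Estimate the open shards a Kinesis data stream needs for a write load.

    Each shard accepts up to 1,000 records/s and 1 MiB/s of writes;
    whichever limit binds first determines the shard count.
    """
    by_count = records_per_sec / 1000.0
    by_bytes = (records_per_sec * record_size_bytes) / (1024 * 1024)
    return max(1, math.ceil(max(by_count, by_bytes)))

# 1 KB JSON payloads at an assumed 5,000 records/s: record-rate bound.
print(shards_needed(5000, 1024))  # → 5
```

Retention is a separate setting: records are readable for 24 hours by default, and the limit can be raised to 7 days via extended data retention, which is what makes the end-of-day replay possible.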
正确答案: A 正确答案是A,因为Kinesis数据流允许重放数据并且可以为多个Kinesis客户端应用程序访问相同的数据。参考AWS文档 – Kinesis数据流常见问题解答。 Amazon Kinesis Data Streams使您能够构建自定义应用程序,处理或分析流数据以满足特殊需求。您可以从成千上万个数据源中将各种类型的数据(例如点击流、应用程序日志和社交媒体数据)持续添加到Amazon Kinesis数据流中。几秒钟内,数据将可供您的Amazon Kinesis应用程序从流中读取和处理。 Amazon Kinesis Data Streams实现了流式大数据的实时处理。它提供记录的顺序处理能力,以及按相同顺序读取和/或重放记录到多个Amazon Kinesis应用程序的能力。Amazon Kinesis客户端库(KCL)将所有记录按给定的分区键交付到同一个记录处理器,使得从同一个Amazon Kinesis数据流读取数据的多个应用程序更容易(例如进行计数、聚合和过滤)。 Amazon Simple Queue Service(Amazon SQS)提供可靠、高度可扩展的托管队列,用于在计算机之间传输消息。Amazon SQS使您能够轻松地在分布式应用程序组件之间移动数据,并帮助您构建消息独立处理的应用程序(具有消息级确认/失败语义),例如自动化工作流。 选项B是错误的,因为Kinesis Firehose仅允许将数据传输到S3、Redshift。它不提供重放能力或消息缓冲。 选项C是错误的,因为SNS可以将消息提供给多个订阅者,但它不能重放数据。 选项D是错误的,因为SQS不提供重放或访问多个消费者的能力。 81 / 100 分类: DBS 81. 81. An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens of thousands per second, depending on the day of the week and time of the day. All transaction data must be durably and reliably stored. Certain read operations must be performed with strong consistency. Which solution meets these requirements? 81. 一个组织正在设计一个应用架构。该应用将拥有超过100 TB的数据,并且支持每秒数百到数万个的交易请求,具体取决于星期几和一天中的时间。所有交易数据必须可靠且持久地存储。某些读取操作必须在强一致性的要求下执行。哪种解决方案能够满足这些需求? A. A. Use Amazon DynamoDB as the data store and use strongly consistent reads when necessary. A. 使用 Amazon DynamoDB 作为数据存储,并在必要时使用强一致性读取。 B. B. Use an Amazon Relational Database Service (RDS) instance sized to meet the maximum anticipated transaction rate and with the High Availability option enabled. B. 使用一个适当大小的Amazon关系数据库服务(RDS)实例,以满足最大预期交易率,并启用高可用性选项。 C. C. Deploy a NoSQL data store on top of an Amazon Elastic MapReduce (EMR) cluster, and select the HDFS High Durability option. C. 在 Amazon Elastic MapReduce (EMR) 集群上部署一个 NoSQL 数据存储,并选择 HDFS 高耐久性选项。 D. D. 
Use Amazon Redshift with synchronous replication to Amazon Simple Storage Service (S3) and row-level locking for strong consistency. D. 使用Amazon Redshift与同步复制到Amazon Simple Storage Service (S3)以及行级锁定以确保强一致性。 正确答案: A Correct answer is A as DynamoDB can store and handle the transactions. DynamoDB also supports strongly consistent reads. DynamoDB is also a managed AWS service. Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multiregion, multimaster database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second. Option B is wrong as RDS does not meet the storage and request-handling rate requirements. Option C is wrong as a NoSQL datastore on EMR would need additional resources and management. Also, the EMR HDFS option is not a cost-effective option. Option D is wrong as Redshift is a data warehouse solution and is not ideal for handling high-frequency data collection. 正确答案: A 正确答案是A,因为DynamoDB可以存储和处理事务。DynamoDB还支持强一致性读取。DynamoDB也是一个托管的AWS服务。 Amazon DynamoDB是一个键值和文档数据库,能够在任何规模下提供个位数毫秒级性能。它是一个完全托管的、多区域、多主数据库,具有内置的安全性、备份和恢复功能,并为互联网规模的应用提供内存缓存。DynamoDB每天可以处理超过10万亿个请求,并支持超过每秒2000万个请求的高峰。 选项B是错误的,因为RDS无法满足存储和请求处理速率的需求。 选项C是错误的,因为NoSQL数据存储需要资源和处理。同时,EMR HDFS选项不是一种成本效益高的选择。 选项D是错误的,因为Redshift是一个数据仓库解决方案,并不适合处理高频数据收集。 82 / 100 分类: DBS 82. 82. A data engineer wants to use Amazon Elastic MapReduce for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed. Which action should the data engineer take to meet this requirement? 82. 一名数据工程师希望为一个应用程序使用 Amazon Elastic MapReduce。该数据工程师需要确保它符合监管要求。审计员必须能够随时确认哪些服务器正在运行,以及哪些网络访问控制已部署。数据工程师应该采取什么措施来满足这一要求? A. A.
Provide the auditor IAM accounts with the SecurityAudit policy attached to their group. A. 为审计员提供IAM账户,并将SecurityAudit策略附加到其所在的组。 B. B. Provide the auditor with SSH keys for access to the Amazon EMR cluster. B. 提供审计员SSH密钥以访问Amazon EMR集群。 C. C. Provide the auditor with CloudFormation templates. C. 向审计员提供CloudFormation模板。 D. D. Provide the auditor with access to AWS DirectConnect to use their existing tools. D. 提供审计员访问AWS DirectConnect的权限,以便使用他们现有的工具。 正确答案: A Correct answer is A as the SecurityAudit managed policy can provide the auditors with read-only access to AWS services. Option B is wrong as providing SSH keys is not a good practice. Option C is wrong as it does not mention the cluster was set up using CloudFormation. Also, CloudFormation templates may not give the actual picture of what’s deployed. Option D is wrong as Direct Connect does not provide access to tools, and access is still controlled using IAM. 正确答案: A 正确答案是A,因为SecurityAudit托管策略可以为审计员提供只读访问AWS服务的权限。 选项B是错误的,因为提供SSH密钥不是一个好的做法。 选项C是错误的,因为它没有提到集群是通过CloudFormation设置的。此外,CloudFormation模板可能无法给出实际部署的情况。 选项D是错误的,因为Direct Connect不提供对工具的访问,并且仍然是通过IAM进行控制。 83 / 100 分类: DBS 83. 83. You are using IoT sensors to monitor the movement of a group of hikers on a three-day trek and send the information into a Kinesis stream. They each have a sensor in their shoe, and you know for certain that there is no problem with mobile coverage, so all the data is getting back to the stream. You have used default settings for the stream. At the end of the third day the data is sent to an S3 bucket. When you go to interpret the data in S3, there is only data for the last day and nothing for the first 2 days. Which of the following is the most probable cause of this? 83. 你正在使用物联网传感器监控一组徒步旅行者在三天旅行中的运动情况,并将信息发送到Kinesis流中。每个人的鞋子里都有一个传感器,你可以确定手机信号覆盖没有问题,因此所有数据都能顺利返回流中。你使用了流的默认设置。第三天结束时,数据被发送到一个S3存储桶。当你去S3中解析数据时,只能看到最后一天的数据,前两天的数据却没有。以下哪项是最可能导致这种情况的原因? A. A.
Temporary loss of mobile coverage; although mobile coverage was good in the area, even temporary loss of data will stop the streaming A. 临时失去移动网络信号;尽管该地区的移动信号良好,但即使是短暂的数据丢失也会导致流式传输停止。 B. B. You cannot send Kinesis data to the same bucket on consecutive days if you do not have versioning enabled on the bucket. If you don’t have versioning enabled you would need to define 3 different buckets or else the data is overwritten each day B. 如果您没有在存储桶上启用版本控制,则无法在连续的两天将Kinesis数据发送到同一个存储桶。如果没有启用版本控制,您需要定义三个不同的存储桶,否则数据会在每天覆盖。 C. C. Data records are only accessible for a default of 24 hours from the time they are added to a stream. C. 数据记录仅在添加到流中的24小时内可访问,默认情况下为24小时。 D. D. A sensor probably stopped working on the second day. If one sensor fails, no data is sent to the stream until that sensor is fixed D. 传感器可能在第二天停止工作。如果一个传感器出现故障,在该传感器修复之前,数据不会发送到流中。 正确答案: C Correct answer is C as, by default, Kinesis stores the records for 24 hours only. Refer AWS documentation – Kinesis FAQs. By default, records of a stream are accessible for up to 24 hours from the time they are added to the stream. You can raise this limit to up to 7 days by enabling extended data retention. 正确答案: C 正确答案是C,因为默认情况下,Kinesis 只会存储记录 24 小时。 请参考 AWS 文档 – Kinesis 常见问题解答。 默认情况下,流的记录在添加到流中后的 24 小时内可访问。您可以通过启用延长数据保留来将此限制提高到最多 7 天。 84 / 100 分类: DBS 84. 84. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? 84. 一名研究科学家计划一次性启动一个Elastic MapReduce集群,并受到她经理的鼓励,尽量减少成本。该集群旨在处理200TB的基因组数据,共有100个Amazon EC2实例,预计运行约四小时。生成的数据集必须暂时存储,直到归档到Amazon RDS Oracle实例中。哪种选项能够在满足要求的同时节省最多的费用? A. A.
Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes. A. 将输入和输出文件存储在Amazon S3中。为主节点和核心节点按需部署,任务节点使用竞价实例。 B. B. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. B. 通过为主节点、核心节点和任务节点部署按需、RI 和现货定价模型的组合来进行优化。将摄取和输出文件存储在 Amazon S3 中,并使用生命周期策略将其归档到 Amazon Glacier。 C. C. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. C. 将摄取文件存储在Amazon S3 RRS中,将输出文件存储在S3中。为主节点和核心节点部署预留实例,为任务节点部署按需实例。 D. D. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS D. 按需部署主节点、核心节点和任务节点,并将输入和输出文件存储在 Amazon S3 RRS 中。 正确答案: A The key point here is to save the most money while being able to process the huge amount of data. Correct answer is A as it follows the best practice of using on-demand instances for the master and core nodes and spot instances for the task nodes, which helps reduce cost. Refer AWS documentation – EMR Instances. Option B is wrong as RIs will make it expensive, as there is no consistent, long-term requirement. Option C is wrong as RIs will make it expensive, as there is no consistent, long-term requirement. Option D is wrong as the input should be in S3 Standard, as re-ingesting the input data might end up being more costly than holding the data for a limited time in S3 Standard. 正确答案: A 这里的关键是节省大部分费用,同时能够处理庞大的数据。 正确答案是A,因为它遵循了最佳实践,即为主节点和核心节点使用按需实例,为任务节点使用竞价实例,从而帮助降低成本。 请参考AWS文档 – EMR实例。 选项B是错误的,因为预留实例(RI)会使其变得昂贵,因为没有持续的需求。 选项C是错误的,因为预留实例(RI)会使其变得昂贵,因为没有持续的需求。 选项D是错误的,因为输入数据应该存储在S3标准存储中,因为重新加载输入数据可能比将数据在标准S3中保存有限时间更为昂贵。 85 / 100 分类: DBS 85. 85. A company needs to deploy a data lake solution for their data scientists in which all company data is accessible and stored in a central S3 bucket.
The company segregates the data by business unit, using specific prefixes. Scientists can only access the data from their own business unit. The company needs a single sign-on identity and management solution based on Microsoft Active Directory (AD) to manage access to the data in Amazon S3. Which method meets these requirements? 85. 一家公司需要为其数据科学家部署一个数据湖解决方案,其中所有公司数据都可以访问并存储在一个中央S3桶中。公司按照业务部门划分数据,使用特定的前缀。科学家只能访问自己业务部门的数据。公司需要一个基于Microsoft Active Directory (AD)的单一登录身份和管理解决方案来管理对Amazon S3中数据的访问。哪种方法符合这些要求? A. A. Use AWS IAM Federation functions and specify the associated role based on the users’ groups in AD. A. 使用AWS IAM联邦功能,并根据AD中用户的组指定相关角色。 B. B. Create bucket policies that only allow access to the authorized prefixes based on the users’ group name in Active Directory. B. 创建仅允许访问基于用户在Active Directory中的组名的授权前缀的桶策略。 C. C. Deploy the AD Synchronization service to create AWS IAM users and groups based on AD information. C. 部署AD同步服务,根据AD信息创建AWS IAM用户和组。 D. D. Use Amazon S3 API integration with AD to impersonate the users on access in a transparent manner. D. 使用Amazon S3 API与AD的集成,在访问时以透明的方式模拟用户。 正确答案: A Correct answer is A as Identity Federation allows organizations to associate temporary credentials to users authenticated through an external identity provider such as Microsoft Active Directory (AD). 
These temporary credentials are linked to AWS IAM roles that grant access to the S3 bucket. Refer AWS documentation – S3 Cross Account Access (a role granting S3 bucket access restricted to a prefix can be configured similarly). Option B is wrong because bucket policies are linked to IAM principals and cannot recognize AD attributes. Option C is wrong because AD Synchronization will not sync directly with AWS IAM, and custom synchronization would not result in Amazon S3 being able to see group information. Option D is wrong because there is no feature to integrate Amazon S3 directly with external identity providers. 正确答案: A 正确答案是A,因为身份联合允许组织将临时凭证与通过外部身份提供者(如Microsoft Active Directory (AD))验证的用户相关联。这些临时凭证与AWS IAM角色关联,这些角色授予访问S3存储桶的权限。参考AWS文档 – S3跨账户访问(可以类似配置角色到S3存储桶与前缀访问) 选项B是错误的,因为存储桶策略与IAM主体关联,无法识别AD属性。 选项C是错误的,因为AD同步不会直接与AWS IAM同步,并且自定义同步不会导致Amazon S3能够看到组信息。 选项D是错误的,因为没有将Amazon S3直接与外部身份提供者集成的功能。 86 / 100 分类: DBS 86. 86. A company has several teams of analysts. Each team of analysts has their own cluster. The teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts. Which approach meets the requirement for a centralized metadata layer? 86.
一家公司有多个分析团队。每个分析团队都有自己的集群。团队需要使用Hive、Spark-SQL和Presto在Amazon EMR上运行SQL查询。公司需要启用一个集中式元数据层,将Amazon S3对象作为表暴露给分析人员。哪种方法可以满足集中式元数据层的要求? A. A. EMRFS consistent view with a common Amazon DynamoDB table A. 使用公共 Amazon DynamoDB 表的 EMRFS 一致性视图 B. B. Bootstrap action to change the Hive Metastore to an Amazon RDS database B. 启动操作将 Hive 元数据存储更改为 Amazon RDS 数据库 C. C. s3distcp with the output Manifest option to generate RDS DDL C. 使用带有输出清单选项的s3distcp生成RDS DDL D. D. Naming scheme support with automatic partition discovery from Amazon S3 D. 支持命名方案,并自动发现来自Amazon S3的分区 正确答案: A Correct answer is A as EMRFS consistent view can be configured to use a common DynamoDB table shared by all of the clusters. Refer AWS documentation – EMRFS Metadata. EMRFS consistent view tracks consistency using a DynamoDB table to track objects in Amazon S3 that have been synced with or created by EMRFS. The metadata is used to track all operations (read, write, update, and copy), and no actual content is stored in it. This metadata is used to validate whether the objects or metadata received from Amazon S3 matches what is expected. This confirmation gives EMRFS the ability to check list consistency and read-after-write consistency for new objects EMRFS writes to Amazon S3 or objects synced with EMRFS. Multiple clusters can share the same metadata. 正确答案: A 正确答案是 A,因为 EMRFS 一致性视图可以配置为使用一个由所有集群共享的公共 DynamoDB 表。 参见 AWS 文档 – EMRFS 元数据 EMRFS 一致性视图通过使用 DynamoDB 表来跟踪 Amazon S3 中的对象,这些对象已经与 EMRFS 同步或由 EMRFS 创建。元数据用于跟踪所有操作(读取、写入、更新和复制),并且不会在其中存储实际内容。该元数据用于验证从 Amazon S3 接收的对象或元数据是否与预期的内容匹配。此确认使 EMRFS 能够检查列表一致性,并且可以为 EMRFS 写入到 Amazon S3 或与 EMRFS 同步的新对象提供写后读一致性。 多个集群可以共享相同的元数据。 87 / 100 分类: DBS 87. 87. A company has two batch processing applications that consume financial data about the day’s stock transactions. Each transaction needs to be stored durably and guarantee that a record of each application is delivered so the audit and billing batch processing applications can process the data.
However, the two applications run separately and several hours apart and need access to the same transaction information. After reviewing the transaction information for the day, the information no longer needs to be stored. What is the best way to architect this application? Choose the correct answer from the options below. 87. 一家公司有两个批处理应用程序,它们消耗关于当天股票交易的财务数据。每个交易需要持久化存储,并保证每个应用程序的记录都能够被传递,以便审计和计费批处理应用程序能够处理数据。然而,这两个应用程序是分开运行的,且相隔几个小时,需要访问相同的交易信息。在审查完当天的交易信息后,该信息不再需要存储。最佳的架构方式是什么?请选择以下选项中的正确答案。 A. A. Use SQS for storing the transaction messages. When the billing batch process consumes each message, have the application create an identical message and place it in a different SQS queue for the audit application to use several hours later. A. 使用SQS存储交易消息。当计费批处理进程消费每条消息时,让应用程序创建一条相同的消息,并将其放入另一个SQS队列中,以便审计应用程序在几个小时后使用。 B. B. Use SQS for storing the transaction messages; when the billing batch process performs first and consumes the message, write the code in a way that does not remove the message after it is consumed, so it is available for the audit application several hours later. The audit application can consume the SQS message and remove it from the queue when completed. B. 使用SQS存储事务消息;当账单批处理过程首先执行并消费消息时,请编写代码,以便在消费后不删除消息,这样几小时后审核应用程序仍然可以使用该消息。审核应用程序可以消费SQS消息,并在完成后将其从队列中移除。 C. C. Store the transaction information in a DynamoDB table. The billing application can read the rows, while the audit application will read the rows and then remove the data. C. 将交易信息存储在DynamoDB表中。计费应用程序可以读取这些行,而审计应用程序将读取这些行并删除数据。 D. D. Use Kinesis to store the transaction information. The billing application will consume data from the stream; the audit application can consume the same data several hours later. D. 使用 Kinesis 存储交易信息。计费应用程序将从流中获取数据,审计应用程序可以在几个小时后获取相同的数据。 正确答案: D Correct answer is D as the key points here are the batch applications and the messages being stored durably with guaranteed delivery.
Kinesis can store the data durably and allow access to multiple consumers without any dependencies. Refer AWS documentation – Kinesis Data Streams. Q: How does Amazon Kinesis Data Streams differ from Amazon SQS? Amazon Kinesis Data Streams enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis data stream (for example, to perform counting, aggregation, and filtering). Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows. Option A is wrong as SQS chaining would create a dependency among the consumers. If one consumer fails, the message would not be available for the other consumer, impacting availability. Option B is wrong as, although possible, it is error-prone and requires the application to maintain its read position. Option C is wrong as with DynamoDB the delivery guarantee needs to be handled by the application, and it is not a cost-effective solution. 88 / 100 分类: DBS 88. 88. Your website is serving on-demand training videos to your workforce. Videos are uploaded monthly in high resolution MP4 format.
Your workforce is distributed globally, often on the move, and using company-provided tablets that require the HTTP Live Streaming (HLS) protocol to watch a video. Your company has no video transcoding expertise and, if it were required, you might need to pay for a consultant. How do you implement the most cost-efficient architecture without compromising high availability and quality of video delivery? 88. 您的网站正在为员工提供按需培训视频。视频每月上传一次,采用高分辨率的MP4格式。您的员工分布在全球各地,常常在外出差,并使用公司提供的平板电脑观看视频,这些平板电脑需要HTTP实时流(HLS)协议才能观看视频。贵公司没有视频转码方面的专业知识,因此可能需要支付顾问费用。您如何在不妥协视频交付的高可用性和质量的前提下,实施最具成本效益的架构? A. A. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS. S3 to host videos with Lifecycle Management to archive original files to Glacier after a few days. CloudFront to serve HLS transcoded videos from S3. A. 使用Elastic Transcoder将原始高分辨率MP4视频转码为HLS格式。使用S3托管视频,并通过生命周期管理将原始文件在几天后归档到Glacier。使用CloudFront从S3提供转码后的HLS视频。 B. B. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number of nodes depending on the length of the queue. S3 to host videos with Lifecycle Management to archive all files to Glacier after a few days. CloudFront to serve HLS transcoded videos from Glacier. B. 一个运行在EC2上的视频转码管道,使用SQS分发任务,利用Auto Scaling根据队列长度调整节点数量,S3用于托管视频,并通过生命周期管理在几天后将所有文件归档到Glacier,CloudFront用于从Glacier提供HLS转码视频。 C. C. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS. EBS volumes to host videos and EBS snapshots to incrementally back up original files after a few days. CloudFront to serve HLS transcoded videos from EC2. C. 使用Elastic Transcoder将原始高分辨率的MP4视频转码为HLS,使用EBS卷来托管视频,使用EBS快照在几天后增量备份原始文件。CloudFront从EC2提供转码后的HLS视频服务。 D. D. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number of nodes depending on the length of the queue. EBS volumes to host videos and EBS snapshots to incrementally back up original files after a few days.
CloudFront to serve HLS transcoded videos from EC2. D. 一个在EC2上运行的视频转码流水线,使用SQS分发任务,使用Auto Scaling根据队列长度调整节点数量。EBS卷用于托管视频,EBS快照用于在几天后增量备份原始文件。CloudFront用于从EC2提供HLS转码视频。 正确答案: A The key here is the most cost-efficient solution, given that the company has no video transcoding expertise and would otherwise need to hire a consultant, with global distribution. Correct answer is A as Elastic Transcoder provides an out-of-the-box option to transcode videos into any format without any expertise, with S3 to host videos and CloudFront to serve HLS transcoded videos for global distribution while being cost-efficient. Options B & D are wrong as a video transcoding pipeline on EC2 instances would increase the cost, needing expertise as well as infrastructure. Options C & D are wrong as EBS volumes to host data, with snapshots, would increase the cost. 正确答案: A 关键在于成本高效的解决方案,公司缺乏视频转码专业知识,否则需要雇佣顾问,并且需要全球分发。正确答案是A,因为Elastic Transcoder提供了一种开箱即用的选项,可以将视频转码为任何格式,无需任何专业知识。使用S3托管视频,使用CloudFront提供HLS转码视频以实现全球分发,同时保持成本高效。 选项B和D是错误的,因为视频转码管道需要实例,这会增加成本并需要专业知识以及基础设施。 选项C和D是错误的,因为使用EBS卷托管数据并创建快照会增加成本。 89 / 100 分类: DBS 89. 89. You need to create an Amazon Machine Learning model to predict how many inches of rain will fall in an area based on the historical rainfall data. What type of modeling will you use? 89. 您需要创建一个亚马逊机器学习模型,以预测根据历史降雨数据,一个地区将降下多少英寸的雨。您将使用哪种类型的建模? A. A. Categorical A. 分类的 B. B. Binary B. 二元的 C. C. Regression C. 回归 D. D. Unsupervised D. 无监督 正确答案: C Correct answer is C as supervised learning using regression can help build a model to predict rainfall based on the historical data. Refer documentation – Machine Learning 正确答案: C 正确答案是C,因为使用回归的监督学习可以帮助建立一个基于历史数据预测降雨的模型。 参考文档 – 机器学习 90 / 100 分类: DBS 90. 90. A company has launched an EMR cluster to support their big data analytics requirements. AFS has multiple data sources built out of S3, SQL databases, MongoDB, Redis, RDS, and other file systems. They are looking for a web application to create and share documents that contain live code, equations, visualizations, and narrative text.
Which EMR Hadoop ecosystem fulfils the requirements? 90. 一家公司已经启动了EMR集群,以支持他们的大数据分析需求。AFS构建了多个数据源,包括S3、SQL数据库、MongoDB、Redis、RDS和其他文件系统。他们正在寻找一个Web应用程序,用于创建和共享包含实时代码、方程式、可视化和叙述文本的文档。哪个EMR Hadoop生态系统能满足这些需求? A. A. Apache Hive A. Apache Hive B. B. Apache Hue B. Apache Hue C. C. Jupyter Notebook C. Jupyter Notebook D. D. Apache Presto D. Apache Presto 正确答案: C Correct answer is C as Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. Option A is wrong as Hive is an open-source data warehouse and analytic package that runs on top of a Hadoop cluster. Option B is wrong as Apache Hue is an open-source, web-based, graphical user interface for use with Amazon EMR and Apache Hadoop. It does not provide live code and sharing of documents. Option D is wrong as Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources. 正确答案: C 正确答案是 C,因为 Jupyter Notebook 是一个开源的 web 应用程序,您可以使用它创建和共享包含实时代码、方程式、可视化和叙述文本的文档。 选项 A 错误,因为 Hive 是一个开源的数据仓库和分析包,运行在 Hadoop 集群之上。 选项 B 错误,因为 Apache Hue 是一个开源的基于 web 的图形用户界面,用于与 Amazon EMR 和 Apache Hadoop 配合使用。它不提供实时代码和文档共享功能。 选项 D 错误,因为 Presto 是一个快速的 SQL 查询引擎,专为对来自多个来源的大型数据集执行交互式分析查询而设计。 91 / 100 分类: DBS 91. 91. A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes data loss if an application instance fails, and keeps a durable copy of log data for at least 30 days. What is the simplest architecture that will allow the architect to analyze the logs? 91. 一家公司正在AWS中构建一个新的应用程序。架构师需要设计一个系统来收集应用程序日志事件。该设计应是一个可重复的模式,能够最小化应用程序实例故障时的数据丢失,并保持至少30天的日志数据持久副本。什么是最简单的架构,能够让架构师分析日志? A. A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an Amazon Redshift cluster for analysis. A.
直接将它们写入 Kinesis Firehose。配置 Kinesis Firehose 将事件加载到 Amazon Redshift 集群中进行分析。 B. B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that runs in response to the S3 event to load the events into Amazon Elasticsearch Service for analysis. B. 将它们写入亚马逊简单存储服务(S3)中的文件。编写一个AWS Lambda函数,该函数响应S3事件运行,将事件加载到亚马逊Elasticsearch服务中进行分析。 C. C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data into CloudWatch Logs and subsequently into Amazon Elasticsearch Service. C. 将它们写入本地磁盘,并配置 Amazon CloudWatch Logs 代理将数据加载到 CloudWatch Logs 中,然后再加载到 Amazon Elasticsearch Service 中。 D. D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an Amazon Elastic MapReduce (EMR) cluster for analysis. D. 将它们写入 CloudWatch Logs,并使用 AWS Lambda 函数将它们加载到 Amazon Elastic MapReduce (EMR) 集群中的 HDFS 以进行分析。 正确答案: A Correct answer is A as the simplest approach is to use Firehose to collect the streamed logs and load the data into Redshift for analysis. Refer AWS documentation – Kinesis Data Firehose. Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. Option B is wrong as the log files cannot be written to S3 directly and would need an agent (such as the Kinesis Agent).
Also, you would need to develop the Lambda function to track events and load the data into Elasticsearch. Option C is wrong as the local disk would not provide the durability and might lead to data loss if the instance goes down. Option D is wrong as the Lambda function needs to be developed; also, loading the data into HDFS using Lambda might not be seamless and would need AWS Data Pipeline. 正确答案: A 正确答案是A,因为最简单的方法是使用Firehose流式传输数据来收集日志,并将数据加载到Redshift进行分析。 参考AWS文档 – Kinesis Data Firehose。 Amazon Kinesis Data Firehose是将流数据可靠地加载到数据存储和分析工具的最简单方法。它可以捕获、转换并将流数据加载到Amazon S3、Amazon Redshift、Amazon Elasticsearch Service和Splunk,支持使用您今天已经在使用的现有业务智能工具和仪表板进行接近实时的分析。它是一个完全托管的服务,能够自动扩展以匹配数据吞吐量,并且无需持续管理。它还可以在加载数据之前对其进行批处理、压缩、转换和加密,从而最大限度地减少目标存储的使用量并提高安全性。 选项B是错误的,因为日志文件不能直接写入S3,需要像Kinesis Agent或CloudWatch Logs代理这样的代理。同时,您需要开发Lambda函数来跟踪事件并将数据加载到Elasticsearch。 选项C是错误的,因为本地磁盘无法提供持久性,如果实例崩溃,可能会导致数据丢失。 选项D是错误的,因为需要开发Lambda函数。而且,通过Lambda加载数据到HDFS可能不够无缝,并且需要AWS Data Pipeline。 92 / 100 分类: DBS 92. 92. A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors deliver to AWS IoT. What is a cost-effective way to provide near real-time alerts on the pipeline metrics? 92. 一家大型石油和天然气公司需要在其管道系统的峰值阈值被超过时提供近实时的警报。该公司已开发出一个系统,通过数百万个传感器捕捉管道的指标,如流量、压力和温度。这些传感器将数据传输到AWS IoT。提供管道指标近实时警报的经济有效的方式是什么? A. A. Create an AWS IoT rule to generate an Amazon SNS notification. A. 创建一个AWS IoT规则以生成Amazon SNS通知。 B. B. Store the data points in an Amazon DynamoDB table and poll it for peak metrics data from an Amazon EC2 application. B. 将数据点存储在 Amazon DynamoDB 表中,并从 Amazon EC2 应用程序中轮询获取峰值指标数据。 C. C. Create an Amazon Machine Learning model and invoke it with AWS Lambda. C. 创建一个亚马逊机器学习模型,并通过AWS Lambda调用它。 D. D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk. D.
使用Amazon Kinesis流和基于KCL的应用程序,部署在AWS Elastic Beanstalk上。 正确答案: A Correct answer is A as IoT rules can help evaluate the metrics and send notifications when the peak thresholds are exceeded. The AWS IoT rules engine listens for incoming MQTT messages that match a rule. When a matching message is received, the rule takes some action with the data in the MQTT message (for example, writing data to an Amazon S3 bucket, invoking a Lambda function, or sending a message to an Amazon SNS topic). Refer AWS documentation – IoT Rules. Rules give your devices the ability to interact with AWS services. Rules are analyzed and actions are performed based on the MQTT topic stream. Options B, C & D are wrong as they are not cost-effective and need additional development or involve other services. 正确答案: A 正确答案是A,因为IoT规则可以帮助评估指标,并在超过峰值阈值时发送通知。 AWS IoT规则引擎监听与规则匹配的传入MQTT消息。当收到匹配的消息时,规则会对MQTT消息中的数据执行某些操作(例如,将数据写入Amazon S3存储桶、调用Lambda函数或将消息发送到Amazon SNS主题)。 参考AWS文档 – IoT规则。 规则使您的设备能够与AWS服务进行交互。规则会根据MQTT主题流进行分析,并执行相应的操作。 选项B、C和D是错误的,因为它们不具有成本效益,需要额外的开发或涉及其他服务。 93 / 100 分类: DBS 93. 93. You need real-time reporting on logs generated from your applications. In addition, you need anomaly detection. The processing latency needs to be one second or less. Which option would you choose if your team has no experience with machine learning libraries and doesn't want to have to maintain any software installations yourself? 93. 你需要对应用程序生成的日志进行实时报告。此外,你还需要异常检测。处理延迟需要在一秒钟或更短时间内。如果你的团队没有机器学习库的经验,并且不希望自己维护任何软件安装,你会选择哪个选项? A. A. Kinesis Streams with Kinesis Analytics A. Kinesis 流与 Kinesis 分析 B. B. Kafka B. Kafka C. C. Kinesis Firehose to S3 and Athena C. Kinesis Firehose 到 S3 和 Athena D. D. Spark Streaming with SparkSQL and MLlib D.
使用SparkSQL和MLlib的Spark Streaming 正确答案: A Correct answer is A as Kinesis Data Streams with Kinesis Data Analytics can provide real-time analytics on streaming data using only managed services. Kinesis Data Analytics also offers built-in anomaly detection through the RANDOM_CUT_FOREST SQL function. Refer AWS documentation – Kinesis Data Analytics. Amazon Kinesis Data Analytics is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. Amazon Kinesis Data Analytics reduces the complexity of building, managing, and integrating streaming applications with other AWS services. SQL users can easily query streaming data or build entire streaming applications using templates and an interactive SQL editor. Java developers can quickly build sophisticated streaming applications using open source Java libraries and AWS integrations to transform and analyze data in real-time. Amazon Kinesis Data Analytics takes care of everything required to run your real-time applications continuously and scales automatically to match the volume and throughput of your incoming data. Option B is wrong as Kafka needs to be managed and does not provide analytical capabilities. Option C is wrong as Athena does not work on streaming data; it is more suited to batch analytics. Option D is wrong as Spark Streaming and MLlib would need EMR cluster management and development. 正确答案: A 正确答案是A,因为Kinesis数据流与Kinesis数据分析可以在仅使用托管服务的情况下提供实时流数据分析。Kinesis数据分析还通过RANDOM_CUT_FOREST SQL函数提供内置的异常检测。参考AWS文档 – Kinesis数据分析。Amazon Kinesis数据分析是分析流数据、获取可操作的洞察并实时响应业务和客户需求的最简单方式。Amazon Kinesis数据分析减少了构建、管理流应用程序以及将其与其他AWS服务集成的复杂性。SQL用户可以轻松查询流数据或使用模板和交互式SQL编辑器构建整个流应用程序。Java开发人员可以快速使用开源Java库和AWS集成构建复杂的流应用程序,以实时转换和分析数据。 Amazon Kinesis数据分析负责处理运行实时应用程序所需的一切,并自动扩展以匹配传入数据的量和吞吐量。 选项B是错误的,因为Kafka需要管理,并且不提供分析能力。 选项C是错误的,因为Athena不适用于流数据分析,它更适用于批量分析。 选项D是错误的,因为Spark Streaming和MLlib需要EMR集群管理和开发。 94 / 100 分类: DBS 94. 94. A city has been collecting data on its public bicycle share program for the past three years. The 5PB dataset currently resides on Amazon S3.
The data contains the following datapoints: bicycle origination points; bicycle destination points; mileage between the points; number of bicycle slots available at the station (which is variable based on the station location); number of slots available and taken at a given time. The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier. The new bicycle stations must be located to provide the most riders access to bicycles. How should this task be performed? 94. 一座城市在过去三年里一直在收集其公共自行车共享项目的数据。目前,5PB的数据集存储在Amazon S3上。数据包含以下数据点:自行车起始点;自行车目的地点;两点之间的里程;车站可用的自行车车位数量(根据车站位置有所不同);特定时间内可用和已占用的车位数量。该项目已获得额外资金,以增加可用的自行车站数量。所有数据定期存档到Amazon Glacier。新的自行车站必须选址,以便为最多的骑行者提供自行车。该任务应如何执行? A. A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC2-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization. A. 将数据从Amazon S3移动到Amazon EBS支持的卷中,并使用基于EC2的Hadoop集群和竞价实例运行执行随机梯度下降优化的Spark作业。 B. B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations. B. 使用 Amazon Redshift COPY 命令将数据从 Amazon S3 移动到 Redshift,并执行 SQL 查询,输出最受欢迎的自行车站。 C. C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a Spark streaming job that will move the data into Amazon Kinesis. C. 将数据持久化到 Amazon S3,并使用一个临时的 EMR 集群与竞价实例来运行一个 Spark 流处理任务,将数据迁移到 Amazon Kinesis。 D. D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization over EMRFS. D.
将数据保存在 Amazon S3 上,并使用基于 Amazon EMR 的 Hadoop 集群,利用竞价实例运行一个 Spark 作业,该作业在 EMRFS 上执行随机梯度下降优化。 正确答案: D Correct answer is D as the data is already hosted in S3, and EMR with EMRFS can be used to perform the analysis in place. Option A is wrong as EBS-backed volumes cannot handle the data capacity. Option B is wrong as copying the data into Redshift would duplicate it, and the current size limitation of Redshift is 2PB. Option C is wrong as the answer is incomplete; it does not provide any analysis option. 正确答案: D 正确答案是 D,因为数据已经托管在 S3 中,EMR 可以与 EMRFS 一起使用来就地执行分析。 选项 A 错误,因为 EBS 支持的卷无法处理数据容量。 选项 B 错误,因为将数据复制到 Redshift 会导致数据重复,并且 Redshift 当前的大小限制是 2PB。 选项 C 错误,因为答案不完整,没有提供任何分析选项。 95 / 100 分类: DBS 95. 95. A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide? 95. 一个大型杂货分销商每天从现场接收以gzip压缩格式上传到Amazon S3的CSV文件的消耗报告。这些文件的大小范围从500MB到5GB。这些文件每天由EMR作业处理。最近观察到,文件大小存在变化,且EMR作业执行时间过长。分销商需要在有限的信息下调整和优化数据处理工作流,以提高EMR作业的性能。管理员应该提供什么建议? A. A. Reduce the HDFS block size to increase the number of task processors. A. 将HDFS块大小减小以增加任务处理器的数量。 B. B. Use bzip2 or Snappy rather than gzip for the archives. B. 使用 bzip2 或 Snappy 而不是 gzip 来处理归档文件。 C. C. Decompress the gzip archives and store the data as CSV files. C. 解压缩gzip归档文件并将数据存储为CSV文件。 D. D. Use Avro rather than gzip for the archives. D.
使用 Avro 而不是 gzip 来压缩归档文件。 正确答案: B Correct answer is B as gzip is not an ideal compression format for files larger than 1GB; a compression format that supports splitting, such as bzip2, or a faster codec, such as Snappy, should be used instead. Refer AWS documentation – EMR Best Practices. Depending on how large your aggregated data files are, the compression algorithm becomes an important choice. For instance, if you are aggregating your data (using the ingest tool of your choice) and the aggregated data files are between 500 MB and 1 GB, GZIP compression is an acceptable data compression type. However, if your data aggregation creates files larger than 1 GB, it's best to pick a compression algorithm that supports splitting. What Compression Algorithm Should I Use? Naturally, not all compression algorithms are alike. Consider these potential advantages and disadvantages: As the table below suggests, some compression algorithms are faster. You need to understand your workload in order to decide if faster compression is of any use to you. For example, if your job is CPU bound, faster compression algorithms may not give you enough performance improvement. If you decide compression speed is important, Snappy compression seems to perform faster. Some compression algorithms are slower but offer better space savings, which may be important to you. However, if storage cost is not an important factor, you may want a faster algorithm instead. Importantly, some algorithms allow file output to be split. As discussed earlier, the ability to split your data file affects how you store your data files. If the compression algorithm does not support splitting, you may have to maintain smaller file sizes. However, if your compressed files can be chunked, you may want to store large files for Amazon EMR processing.
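The splittability point above can be illustrated with a small Python sketch that re-encodes a gzip CSV archive as bzip2 so Hadoop can split it across multiple mappers (gzip is not splittable). This is a minimal local illustration using only the standard library; in practice the archives would be streamed from and back to S3, and the function name is hypothetical.

```python
import bz2
import gzip

def recompress_gzip_to_bzip2(gz_bytes: bytes) -> bytes:
    """Re-encode a gzip archive as bzip2, a splittable format,
    so an EMR job can process it with more than one mapper."""
    raw = gzip.decompress(gz_bytes)
    return bz2.compress(raw)

# Round-trip check on a toy CSV payload standing in for a depletion report.
csv_payload = b"store,item,qty\nA,milk,10\nB,eggs,4\n" * 1000
gz = gzip.compress(csv_payload)
bz = recompress_gzip_to_bzip2(gz)
assert bz2.decompress(bz) == csv_payload
```

The same trade-off the explanation describes applies here: bzip2 is slower to compress than gzip, but the resulting files can be split into input splits instead of forcing one mapper per 5GB file.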
正确答案: B 正确答案是B,因为gzip不适合用于大于1GB的文件压缩,应该选择支持分割的压缩格式,例如bzip2,或者像Snappy那样速度更快的压缩算法。参考AWS文档 – EMR最佳实践。 根据你聚合的数据文件的大小,压缩算法是一个重要的选择。例如,如果你正在聚合数据(使用你选择的摄取工具),而聚合后的数据文件大小在500MB到1GB之间,GZIP压缩是一个可接受的数据压缩类型。然而,如果你的数据聚合创建了大于1GB的文件,最好选择一个支持分割的压缩算法。 应该使用什么压缩算法? 自然,并非所有压缩算法都相同。考虑以下潜在的优缺点: 如表格所示,一些压缩算法更快。你需要了解你的工作负载,以决定更快的压缩是否对你有用。例如,如果你的任务是CPU受限的,更快的压缩算法可能不会带来足够的性能提升。如果你决定压缩速度很重要,Snappy压缩似乎表现得更快。 一些压缩算法较慢,但能提供更好的空间节省,这对你可能很重要。然而,如果存储成本不是一个重要因素,你可能更希望选择一个更快的算法。 重要的是,一些算法允许文件输出分割。如前所述,数据文件分割的能力会影响你如何存储数据文件。如果压缩算法不支持分割,你可能需要维持较小的文件大小。然而,如果你的压缩文件可以分块,你可能希望将大文件存储用于Amazon EMR处理。 96 / 100 分类: DBS 96. 96. An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning algorithms and ad-hoc querying. All data will be stored in Amazon S3. Two separate clusters, one for each use case, will be deployed. The data volumes on Amazon S3 are less than 10 GB. How should the administrator align instance types with the cluster's purpose? 96. 一名管理员正在为两个不同的使用场景在Amazon EMR上部署Spark:机器学习算法和临时查询。所有数据将存储在Amazon S3中。每个使用场景将各部署一个独立的集群。Amazon S3上的数据量小于10 GB。管理员应该如何根据集群的目的来选择实例类型? A. A. Machine Learning on C instance types and ad-hoc queries on R instance types A. C 实例类型上的机器学习和 R 实例类型上的临时查询 B. B. Machine Learning on R instance types and ad-hoc queries on G2 instance types B. 在R实例类型上的机器学习和在G2实例类型上的临时查询 C. C. Machine Learning on T instance types and ad-hoc queries on M instance types C. T 实例类型上的机器学习和 M 实例类型上的临时查询 D. D. Machine Learning on D instance types and ad-hoc queries on I instance types D. D 实例类型上的机器学习与 I 实例类型上的临时查询 正确答案: A Correct answer is A as machine learning workloads are usually compute-intensive and ad-hoc queries are suited to memory-optimized instances. Refer AWS documentation – EMR Best Practices. For memory-intensive applications, prefer R type instances over the other instance types. For compute-intensive applications, prefer C type instances.
For applications balanced between memory and compute, prefer M type general-purpose instances. Compute Optimized – High performance web servers, scientific modelling, batch processing, distributed analytics, high-performance computing (HPC), machine/deep learning inference, ad serving, highly scalable multiplayer gaming, and video encoding. Memory Optimized – Instances are well suited for memory-intensive applications such as high performance databases, distributed web scale in-memory caches, mid-size in-memory databases, real time big data analytics, and other enterprise applications. Option B is wrong as G2 is GPU-powered, and the GPU-powered G2 instance family is home to molecular modeling, rendering, machine learning, game streaming, and transcoding jobs that require massive amounts of parallel processing power. Option C is wrong as T instances are ideally suited for general-purpose applications. Option D is wrong as I and D are storage-optimized instances. 正确答案: A 正确答案是A,因为机器学习通常需要大量计算资源,而临时查询适合于内存优化的实例。请参考AWS文档 – EMR最佳实践。 对于内存密集型应用程序,优先选择R类型实例而非其他类型的实例。对于计算密集型应用程序,优先选择C类型实例。对于在内存和计算之间平衡的应用程序,优先选择M类型通用实例。 计算优化型 – 高性能网页服务器、科学建模、批处理、分布式分析、高性能计算(HPC)、机器/深度学习推理、广告服务、高度可扩展的多人游戏以及视频编码。 内存优化型 – 实例非常适合内存密集型应用程序,如高性能数据库、分布式网页规模内存缓存、中型内存数据库、实时大数据分析及其他企业应用。 选项B错误,因为G2是GPU驱动的,GPU驱动的G2实例系列适用于分子建模、渲染、机器学习、游戏流媒体和转码任务,这些任务需要大量并行处理能力。 选项C错误,因为T实例类型理想适用于通用应用程序。 选项D错误,因为I和D是存储优化型实例。 97 / 100 分类: DBS 97. 97. A customer's nightly EMR job processes a single 2-TB data file stored on Amazon Simple Storage Service (S3). The Amazon Elastic MapReduce (EMR) job runs on two On-Demand core nodes and three On-Demand task nodes. Which of the following may help reduce the EMR job completion time? Choose 2 answers 97. 客户的夜间EMR作业处理存储在Amazon Simple Storage Service (S3)上的单个2-TB数据文件。该Amazon Elastic MapReduce (EMR)作业在两个按需核心节点和三个按需任务节点上运行。以下哪项可能有助于减少EMR作业的完成时间?选择两个答案。 A. A. Use three Spot Instances rather than three On-Demand instances for the task nodes. A. 对于任务节点,使用三个Spot实例,而不是三个按需实例。 B. B.
Change the input split size in the MapReduce job configuration. B. 在MapReduce作业配置中更改输入拆分大小。 C. C. Use a bootstrap action to present the S3 bucket as a local filesystem. C. 使用引导操作将S3存储桶呈现为本地文件系统。 D. D. Launch the core nodes and task nodes within an Amazon Virtual Private Cloud. D. 在亚马逊虚拟私有云中启动核心节点和任务节点。 E. E. Adjust the number of simultaneous mapper tasks. E. 调整同时进行的映射任务数量。 F. F. Enable termination protection for the job flow. F. 为作业流启用终止保护。 正确答案: B, E Correct answer is B & E as the key point here is to reduce job completion time. Option B is correct as matching the input split size to the HDFS block size controls the number of map tasks and will help complete the job faster. Option E is correct as tuning the number of simultaneous mapper tasks would help reduce time. Refer to EMR Best Practices. Option A is wrong as Spot instances would help reduce cost but might increase the job completion time. Option C is wrong as it would not help, as the data is already accessible to the nodes. Option D is wrong as the instances would already be in a VPC, and this would not improve job times. Option F is wrong as termination protection would not help, as the instances are not being terminated ad hoc. 正确答案: B, E 正确答案是B和E,因为关键点在于减少作业完成时间。 选项B正确,因为将输入拆分大小与HDFS块大小相匹配可以控制map任务的数量,有助于更快地完成作业。 选项E正确,因为调整同时运行的mapper任务数量有助于减少时间。 参考EMR最佳实践。 选项A是错误的,因为Spot实例有助于降低成本,但可能会增加作业完成时间。 选项C是错误的,因为数据已经可供节点访问,该操作没有帮助。 选项D是错误的,因为实例已经在VPC中,不会改善作业时间。 选项F是错误的,因为这些实例不会被临时终止,终止保护不会起作用。 98 / 100 分类: DBS 98. 98. A travel website needs to present a graphical quantitative summary of its daily bookings to website visitors for marketing purposes. The website has millions of visitors per day, but wants to control costs by implementing the least-expensive solution for this visualization. What is the most cost-effective solution? 98. 一个旅游网站需要向网站访客展示其每日预订的图形化定量摘要,用于营销目的。该网站每天有数百万的访客,但希望通过实施最具成本效益的解决方案来控制费用。最具成本效益的解决方案是什么? A. A. Generate a static graph with a transient EMR cluster daily, and store it in Amazon S3. A. 每天使用临时EMR集群生成一个静态图表,并将其存储在Amazon S3中。 B. B.
Generate a graph using MicroStrategy backed by a transient EMR cluster. B. 使用MicroStrategy生成图表,后端由一个临时的EMR集群支持。 C. C. Implement a Jupyter front-end provided by a continuously running EMR cluster leveraging spot instances for task nodes. C. 实现一个由持续运行的EMR集群提供的Jupyter前端,该集群利用竞价实例作为任务节点。 D. D. Implement a Zeppelin application that runs on a long-running EMR cluster. D. 实现一个在长期运行的EMR集群上运行的Zeppelin应用程序。 正确答案: A Correct answer is A as the most cost-effective solution is to use a transient cluster to create the stats and use S3 to host them. Option B is wrong as using MicroStrategy is an overhead and it's a marketplace product, which would not be a cost-effective solution. Options C & D are wrong as using a long-running or continuous cluster is not cost-effective. 正确答案: A 正确答案是 A,因为最具成本效益的解决方案是使用临时集群来创建统计数据,并使用 S3 托管这些内容。 选项 B 错误,因为使用 MicroStrategy 是一种额外负担,并且它是一个市场产品,这不是一种成本效益高的解决方案。 选项 C 和 D 错误,因为使用长期运行或持续集群并不具有成本效益。 99 / 100 分类: DBS 99. 99. You need to perform ad-hoc business analytics queries on well-structured data. Data comes in constantly at a high velocity. Your business intelligence team can understand SQL. What AWS service(s) should you look to first? 99. 您需要对结构良好的数据执行临时的业务分析查询。数据以高速不断流入。您的商业智能团队可以理解SQL。您应该首先考虑使用哪些AWS服务? A. A. Kinesis Firehose + RDS A. Kinesis Firehose + RDS B. B. Kinesis Firehose + Redshift B. Kinesis Firehose + Redshift C. C. EMR using Hive C. 使用Hive的EMR D. D. EMR running Apache Spark D. EMR运行Apache Spark 正确答案: B The key point is performing ad-hoc analytics on data arriving at high velocity. Correct answer is B as Kinesis Firehose provides a managed service for aggregating streaming data and inserting it into Redshift. Redshift also supports ad-hoc queries over well-structured data using a SQL-compliant wire protocol, so the business team should be able to adopt this system easily. Option A is wrong as RDS would not be suitable for ad-hoc analytics. Options C & D are wrong as EMR does not itself handle data at high velocity.
It would need Kafka or similar frameworks for ingestion. 正确答案: B 关键点是以高速度对数据进行临时分析。正确答案是 B,因为 Kinesis Firehose 提供了一个托管服务,用于聚合流数据并将其插入 Redshift。Redshift 还支持使用兼容 SQL 的协议对结构良好的数据进行临时查询,因此业务团队应该能够轻松采用此系统。 选项 A 错误,因为 RDS 不适合临时分析。 选项 C 和 D 错误,因为 EMR 本身无法处理高速数据,需要 Kafka 或类似的框架进行摄取。 100 / 100 分类: DBS 100. 100. A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as season, movie release, and holiday season. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance. How should the administrator accomplish this task? 100. 一家游戏公司需要正确地扩展其由DynamoDB支持的游戏应用程序。Amazon Redshift包含过去两年的历史数据。游戏流量根据季节、电影上映和假期等各种因素在全年内波动。管理员需要计算每周需要为DynamoDB表预配置多少读取和写入吞吐量。管理员应该如何完成这一任务? A. A. Feed the data into Amazon Machine Learning and build a regression model. A. 将数据输入到亚马逊机器学习,并建立一个回归模型。 B. B. Feed the data into Spark MLlib and build a random forest model. B. 将数据输入到Spark MLlib并构建一个随机森林模型。 C. C. Feed the data into Apache Mahout and build a multi-classification model. C. 将数据输入到Apache Mahout中并构建多分类模型。 D. D. Feed the data into Amazon Machine Learning and build a binary classification model. D. 将数据输入到Amazon机器学习中,并构建一个二元分类模型。 正确答案: A Correct answer is A as a regression model can help predict or forecast values based on the historical dataset. Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y). Option B is wrong as a random forest is not needed, and Spark MLlib needs an EMR cluster to run, while Amazon Machine Learning is quick. Options C & D are wrong as they are only for classification and do not forecast values.
正确答案: A 正确答案是A,因为回归模型可以基于历史数据集帮助预测或预报数值。回归预测建模是近似映射函数(f)的任务,将输入变量(X)映射到连续输出变量(y)。 选项B是错误的,因为不需要随机森林,而且Spark MLlib需要EMR集群才能运行,而Amazon机器学习则很快速。 选项C和D是错误的,因为它们仅用于分类,不能预测数值。