
Cloud Computing (COMM034)

Coursework description, 2023-24

THIS DOCUMENT IS FOR YOUR USE ONLY.

IT MUST NOT BE SENT TO ANY OTHER PERSON OR ORGANISATION.

IT MUST NOT BE UPLOADED TO ANY WEBSITE.

POSTING OF THIS DOCUMENT ON ANY WEBSITE WITHOUT EXPLICIT PERMISSION HAVING FIRST BEEN OBTAINED FROM THE MODULE LEADER IS AN INFRINGEMENT OF COPYRIGHT AND COULD RESULT IN LEGAL ACTION.

Nature of coursework – it is vital you read and understand this section

This is an individual coursework. It is NOT a group coursework, nor something to copy from the web, nor something for ChatGPT or similar tools to do. It is your own individual efforts that will gain marks for you.

Efforts of others on your coursework may become a problem for both you and them, and it is in everybody’s interests that you avoid this. You should also pay very close heed to University guidance on the need to cite text obtained from any other service, such as that produced with LLMs, and note that such content does not constitute your own individual efforts – so should not attract marks.

You may discuss what you are doing with others on the module. BUT do not share report content or code with others or expect others to share with you. Also, do not copy+paste material from elsewhere even if you are then changing some of the words. To use what others have written directly within your report, quote their text properly using “”, attribute it to them, and then discuss or interpret what is important about it for your purpose – this latter part is vital. For code, identify any original source locations in code comments so that the source can be viewed by the assessor, with code comments also identifying what you have added/changed. Similarly, the source of any images that you use but have not created must be stated. Note that any direct inclusion of work by others, including by any past students if you happen to find related content elsewhere, implies that marks should largely be given to them – only your efforts gain marks for you. As students from the recent past will, unhappily, be able to testify, inappropriate copying or use of material from elsewhere can become problematic for progression through the programme.

Note that software tools are in use for detection:

1. Turnitin will detect copying from elsewhere as well as from each other – for copying from each other, this information only becomes available after all submissions have been received.

a. There is no ‘magic number below which everything is acceptable’ in Turnitin. The nature of what is being identified is important.

b. Assessors may also recognise similar patterns between submissions where Turnitin does not, or know source materials that Turnitin doesn’t. Turnitin is not exhaustive, and you should not underestimate the assessors!

2. Code similarity software is also used to detect copying between your code submission and those of others, past and present, with a focus on code added to that already provided in the module.

You should be aware, also, that the module leader was one of the Department’s Academic Integrity Officers for several years, has seen too many misconduct cases, has several research publications in plagiarism detection, and is familiar with the kind of output that software such as ChatGPT produces (noting that it is fairly repetitive and, as a current legal case suggests, can produce plagiarised material).

For more information see, for example: https://exams.surrey.ac.uk/academic-integrity-and-misconduct/plagiarism : “Failure to understand what constitutes plagiarism, pressure of time, or conflicting deadlines for assessed work are not acceptable as sufficient explanations for the submission of plagiarised material.”

Aim

To demonstrate your own understanding of how to critically explain, and construct, a Cloud native API using multiple services across Cloud providers, involving user-specifiable scaling. You will explain, implement, evaluate, and demonstrate an API that supports determining risks and profitability of a trading strategy using specific trading signals in combination with a Monte Carlo method.

Such an API might support, for example, building a trading strategy using trailing stops – though you are not expected to do this. Background reading may, however, help: https://www.investopedia.com/articles/trading/08/trailing-stop-loss.asp

Your API will need to adopt the Approach to meet the provided Requirements.

Making a submission

Submissions will be made via SurreyLearn and comprise two components: 

1. A PDF document of a maximum of 3 pages that conforms to the template provided

2. A Zip file containing the code you need to construct/run your system, usually KBs in size

a. include all code and configuration files used for GAE/Lambda/EC2/EMR/ECS

b. exclude such things as libraries or Python virtual environments or similar that merely bloat the zip file – a zip of MBs in size or larger has been bloated by something that needs to be excluded.

Reminder: individual coursework; must not be worked on in pairs or groups.

Submissions will be evaluated using tools including Turnitin. 

Deadline

Deadlines are given for the module’s Assignment folders on SurreyLearn. Standard University penalties apply to late submissions.

Marks and feedback

Due to cohort size, further increases in allowances made around deadlines by the University, and other factors including a multitude of administrative practices, it may only be possible to return feedback after marks have been officially released by the University. 

Weighting and Composition

Coursework represents 100% of the assessment for this module, and so should be considered as requiring significant effort in order to complete. Weighting for each of the marked components of the submission is shown in the Marking Criteria table at the end of the document. 

Relationship to learning outcomes

LO

Description

LO1: Demonstrated with regard to the selected Cloud services and the software implementation, and related questions of cost and performance.

LO2: Demonstrated with regard to use of GAE and Lambda, as well as choice of the second scalable service, within an industrial/academic problem context.

LO3: Demonstrated with regard to alternative and/or additional services and appropriateness of elaboration of aspects of the system.

LO4: Demonstrated with regard to defining the system in the context of Cloud.

LO5: Demonstrated through the creation and critical evaluation of the software implementation.

A note on linearity of description

Some aspects are described across passages and sections – e.g. Audit. Prior to posing questions, check that you have seen/searched all such mentions within the document first.

Approach

There are no marks available for re-explaining the approach in your submission.

Note that a core of Python code is provided for this approach, within this document, and this Python code must, with changes as necessary, be used for the API created.

i. The approach involves identifying trading signals in financial time series and capturing the risk associated with each signal. Such an assessment might support a subsequent evaluation of a trading strategy.

a. Financial time series here comprise daily data – specifically, a summary of each trading day comprising its Open/High/Low/Close (OHLC) values. OHLC values can readily be produced for other time intervals – e.g. every 15 minutes – when data from which they can be formed are available.

b. OHLC data can be visualised using a “Japanese candlestick”, and certain resulting shapes are interpreted as indicating something about the data that may ‘signal’ making a trade (buy or sell). The figure below provides an example, using real data, of such candlesticks where:

i. Open and Close values provide the top and bottom of the ‘box’ (the body) on each candlestick. If Close is higher than Open, price movement was upwards overall and the body is green (a price rise from the start of the day to the end); if Open is higher than Close, the body is red (a price fall from the start of the day to the end). Other charts, and some pattern naming, might use white for upward and black for downward, or other colour/shading schemes, with candlestick patterns often named according to such colours.

ii. A line projecting from the top of the body indicates that the High was above the respective Open/Close; a line projecting from the bottom of the body indicates that the Low was below the respective Open/Close. Such lines are referred to as the wick or shadow.

iii. Resulting shapes can have various names, such as a Green/Red Marubozu (Japanese for close-cropped, or bald, head) or Spinning Top, and can involve more than one candlestick – for example Harami (Japanese for pregnant) or Three Black Crows. Note how the latter name implies both a number of such candles and the colour/shade scheme.

Here, we’re only going to look at 2 patterns that each involve 3 candlesticks: (i) Three White Soldiers; (ii) Three Black Crows. For the pattern named ‘Black’, look to the red candlesticks; ‘White’ corresponds to green. For the above figure, the combination of the 6th, 7th and 8th candlesticks from the left should fit our expectations for Three ‘Black’ Crows. Code is provided that acts as a detector for such shapes, ignoring colour, in data obtained directly from Yahoo Finance. (The pattern identification code could be more sophisticated. For example, it doesn’t check whether the 2nd and 3rd candles open between the Open and Close – the real body – of the candle before, but we’re keeping matters simpler here.)

Three ‘White’ Soldiers – rising, close values above previous close values and each close above the open.

Expects further rise.


Three ‘Black’ Crows – falling, close values below previous close values and each close below the open.

Expects further fall.


Table 1: Images from https://www.ig.com/uk/trading-strategies/16-candlestick-patterns-every-trader-should-know-180615
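To illustrate the rules in Table 1, here is a minimal sketch of a detector. It is not the code provided with this coursework. It works on plain Python lists of Open and Close values, one entry per trading day, oldest first. The function name `detect_signals` is an assumption made for this example, as is the convention of reporting the index of the third candle as the signal day. Like the simplified rules described above, it does not check where each candle opens relative to the previous real body.

```python
from typing import List, Tuple


def detect_signals(opens: List[float], closes: List[float]) -> List[Tuple[int, str]]:
    """Return (index, "buy" | "sell") for the third candle of each match.

    Three White Soldiers: three consecutive candles that each close above
    their open, with each close above the previous close -> "buy".
    Three Black Crows: the mirror image -> "sell".
    """
    signals = []
    for i in range(2, len(closes)):
        window = range(i - 2, i + 1)  # the three candles ending at day i
        rising_bodies = all(closes[j] > opens[j] for j in window)
        falling_bodies = all(closes[j] < opens[j] for j in window)
        rising_closes = closes[i - 2] < closes[i - 1] < closes[i]
        falling_closes = closes[i - 2] > closes[i - 1] > closes[i]
        if rising_bodies and rising_closes:
            signals.append((i, "buy"))
        elif falling_bodies and falling_closes:
            signals.append((i, "sell"))
    return signals
```

For example, `detect_signals([10, 11, 12, 13, 12, 11], [11, 12, 13, 12, 11, 10])` returns `[(2, 'buy'), (5, 'sell')]`.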

c. What we want to know, here, is how much risk would be associated with each potential signal and whether the patterns might have been profitable some days after the signal. For risk, a Monte Carlo analysis offers one option: we use characteristics of the recent price history to simulate a substantially longer price series, then determine the amount that might be lost at certain confidence levels – for example, this could be discussed between people as: “there is a 95% confidence that no more than 5% of the amount traded based on this signal would be lost, and a 99% confidence that no more than 7.5% of the amount traded would be lost”. It is also possible to determine such things as the average trading risks (at 95% and 99%):

i. If we need a minimum price history of 101 days, including the signal, we first calculate daily returns – the % price change compared to the day before, i.e. (price-previous)/previous – which would offer 100 such values. This is the returns series.

ii. The returns series will be characterised by its mean and standard deviation, and we use a random number generator (Normal/Gaussian distribution) to simulate (generate) a series containing lots of values that closely fit these parameters. This is a simulated returns series.

iii. By sorting a simulated returns series of potential gains and losses, and picking off values at 95% and 99%, we know [theoretical] % changes of interest (see the description of what could be expressed between people). We could extend from here in various ways – using such information to assess, for example, whether to trade only in high, or low, risk situations.

iv. Code is provided that shows how to capture such values. The analysis needs to use high numbers of ‘shots’, but this takes time - and each user is impatient. It is quite possible to parallelise such analysis: multiple new series are produced in parallel, each of which will produce both 95% and 99% values, and average values at 95% and at 99% get reported.
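Steps i to iii can be sketched with the Python standard library alone. This is a hypothetical illustration, not the provided code, and it rests on three assumptions:

- `prices` is a plain list of h closing prices ending at the signal.
- The 95% and 99% values are read from the lower tail of the ascending-sorted simulated returns. They are therefore signed returns, where negative means a loss.
- `seed` exists only to make runs repeatable.

```python
import random
import statistics
from typing import List, Optional, Tuple


def daily_returns(prices: List[float]) -> List[float]:
    """(price - previous) / previous for each consecutive pair: h prices give h - 1 returns."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def simulate_var(prices: List[float], shots: int,
                 seed: Optional[int] = None) -> Tuple[float, float]:
    """Return (var95, var99) from `shots` simulated daily returns."""
    returns = daily_returns(prices)
    mean = statistics.mean(returns)   # needs at least 2 prices
    std = statistics.stdev(returns)   # needs at least 3 prices
    rng = random.Random(seed)
    # Simulated returns series: Normal draws with the same mean and std.
    simulated = sorted(rng.gauss(mean, std) for _ in range(shots))
    # Lower tail: about 5% of simulated returns fall below var95, 1% below var99.
    var95 = simulated[int(shots * 0.05)]
    var99 = simulated[int(shots * 0.01)]
    return var95, var99
```

Only the simulation and sorting depend on the number of shots. In a parallel setting, this is the part each of the r workers would repeat, while the returns, mean and standard deviation need computing only once per signal.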

To determine if a trading strategy is profitable, across all such signals, we can choose how many days after each signal to check on price difference, as though the trade had been exited then.

Again, between people: “the 95% risk of a trade based on this signal was [value]. However, [number of days] days later there would have been a profit/loss of [amount]”; or, with a bit more investigation, “this strategy generates higher profits when the 99% risk is above [value] using [number of days] days”.

For the purposes of this coursework, we are not trying to become financial traders – we want a system that reduces the overall wait time for results from better (larger, so more shots) risk simulation, and that enables the user to identify which signals carry more, or less, risk and whether they were profitable. It would be for the user to try lots of possibilities and determine what works better, so there could be a reasonable amount of use by each potential user.

Requirements – see the API specification and the user scenario, below, which support explanation of these:

i. You must use: (i) GAE, (ii) AWS Lambda, and (iii) one of the other scalable services in AWS: Elastic Compute Cloud (EC2), Elastic MapReduce (EMR) or – should you wish to explore – Elastic Container Service (ECS). Subsequent mentions of scalable services in this document mean Lambda plus your choice from EC2 or EMR or ECS.

ii. Your system must offer a persistent API that the user will interact with – such an interaction might only involve, for example, a series of calls using curl.

iii. The scalable services, and not GAE, must generate simulated risk values – GAE can be used to collect and average simulated risk values.

iv. To avoid credential issues between GAE and AWS, a Lambda function using a suitably defined role must be used to mediate communication with any AWS service other than Lambda.

v. The system must provide for all of the endpoints in the API specification, which involve the following:

a. Initialisation:

i. A way for the user to specify scale out, r, which will enable the analysis to be conducted in parallel at that scale in one of your scalable services, s. Choice of s relates to the two scalable services you have – i.e. if you have chosen EC2, the selection here will be between Lambda and EC2. For the avoidance of doubt, only one scalable service would be in use at any one time.

Note: ‘warm up’ must be implemented for all of the scalable services to assure scale out, and would include readying any data or service connections in advance of any analysis.

ii. A way to check on readiness for analysis with respect to the requested scale out number.

iii. A way to obtain the time and cost required for this ‘warm up’.

iv. A way to obtain relevant endpoint information such that one of the r could be called (tested) directly.

b. For the risk analysis:

i. the system will run analysis using all of the following input parameters:

1. h as the length of price history from which to generate mean and standard deviation;

2. d as the number of data points (shots) to generate in each r for calculating risk via simulated returns;

3. t as buy or sell to allow for separate analysis of each;

4. p as the number of days after which to check price difference for profit (or loss).

Each of the r parallel computations returns its own 95% and 99% risk values for averaging.

c. For output, the system must provide for the following:

i. risk values per signal, and averaging of risk over signals

ii. profit/loss values related to each signal (where possible), and total profit or loss

iii. a chart URL, using either Image Charts or the [old] Google Chart service, showing a line each for the 95% and 99% risk values of the signals, and two lines showing the averages over (1) 95% risk values; (2) 99% risk values. With those 4 lines, higher and lower risk signals should be readily apparent.

iv. total billable time involved, across the r used in parallel, for the analysis, and the cost associated with that time.

v. ‘Audit’ information showing - for all analysis undertaken since the API was deployed - prior selections of s, r, h, d, t, p, and, as calculated in association with these, total profit (or loss), the two averaged risk values, and the billable time and cost - such that you could use this information to estimate costs for much higher numbers of data points (d).

d. Reset – the system must provide a way to ‘zero’ the analysis without needing further warm up.

e. Terminate – the system must be able to ‘terminate’ (so scale-to-zero) in EC2/EMR/ECS so that no further costs would be incurred. Further use would need to start from ‘Initialisation’.
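Output iii can be produced by building a URL for a line chart. The sketch below assumes Image-Charts at `https://image-charts.com/chart`. It uses parameter names from the Google Image Charts conventions that Image-Charts follows:

- `cht` for chart type
- `chs` for size
- `chd` for data
- `chdl` for legend labels

It also assumes that `chds=a`, which requests automatic scaling, is supported; check this against the service documentation. The function name and the four-decimal formatting are illustrative choices. The two averages are drawn as flat lines by repeating each average once per signal.

```python
from typing import List
from urllib.parse import urlencode


def chart_url(var95: List[float], var99: List[float],
              base: str = "https://image-charts.com/chart") -> str:
    """Line chart: per-signal 95%/99% values plus a flat line at each average."""
    if not var95 or len(var95) != len(var99):
        raise ValueError("need one 95% and one 99% value per signal")
    n = len(var95)
    avg95 = sum(var95) / n
    avg99 = sum(var99) / n
    series = [var95, var99, [avg95] * n, [avg99] * n]
    # Text-encoded data: values comma-separated, series separated by '|'.
    data = "|".join(",".join(f"{v:.4f}" for v in s) for s in series)
    params = {
        "cht": "lc",          # line chart
        "chs": "700x300",     # width x height in pixels
        "chd": "t:" + data,
        "chds": "a",          # automatic scaling (assumed supported)
        "chdl": "var95|var99|avg var95|avg var99",  # one legend label per series
    }
    # Keep ':', '|' and ',' unescaped so the data parameter stays readable.
    return base + "?" + urlencode(params, safe=":|,")
```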

Your system may incorporate additional Cloud components, for example storage for the ‘Audit’. However, sensible choices need to be made with respect to the purpose, to avoid overcomplication and with lowest costs in mind. Additional components should not be added unnecessarily.

API Specification

You will need to implement the following API specification – your system must be able to respond to the endpoints that are given, and accept inputs and produce outputs consistent with the JSON format specified.

For reasons of time, we only expect a setup sufficient to work – feel free to perfect it after you receive your marks, but just get it as operational as possible for the deadline.

For ease of reading, double quotes ("") have been removed from the JSON – these will need to be reintroduced throughout. To be clear: something like {var95: 0.3345, var99: 0.3345} would become {"var95": "0.3345", "var99": "0.3345"}; all text used in the JSON must also be in lowercase.
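One way to apply this rule consistently is to pass every response through a small helper before serialising it. The sketch below makes an assumption about how the rule extends to arrays and booleans: every number, boolean and text value becomes a lowercase string, including array elements. `spec_value` and `to_spec_json` are hypothetical names.

```python
import json
from typing import Any


def spec_value(value: Any) -> Any:
    """Recursively turn keys and values into lowercase strings."""
    if isinstance(value, dict):
        return {str(k).lower(): spec_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [spec_value(v) for v in value]
    # Numbers, booleans and text all become lowercase strings,
    # e.g. 0.3345 -> "0.3345", False -> "false", "SELL" -> "sell".
    return str(value).lower()


def to_spec_json(payload: dict) -> str:
    """Serialise a response dict in the quoted, lowercase form described above."""
    return json.dumps(spec_value(payload))
```

For example, `to_spec_json({"var95": 0.3345, "var99": 0.3345})` gives `{"var95": "0.3345", "var99": "0.3345"}`.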

Endpoint required

Description

Method

Inputs

Outputs

/warmup

Warms up to the scale specified for one of the services.

Warmup calls should not wait, so the endpoint should return immediately - the next endpoint will be used for checking.

POST

Format (example):

{s: lambda, r: 3}

Format: {result: ok}

/scaled_ready

Retrieves confirmation that the specified scale is readied for analysis.

Expects /warmup to have been called already (no need to handle situations where it hasn’t).

Can be called many times, with waits between, until the scale is reached – so supports services with a long warmup.

GET

NONE

Format:

Either:

{warm: false}

or

{warm: true}

/get_warmup_cost

Retrieves the total billable time of warmup to the requested scale and the cost related to this.

GET

NONE

Example:

{billable_time: 227.44, cost: 18.33}

/get_endpoints

Retrieves call strings relevant to directly calling each unique endpoint made available at warmup. It would be expected that the returned call string(s), for which there must be at least 1, would be run as a command line – for example, a given call string might have the form: "curl -d {<args>} http://ADDRESS/PATH"

GET

NONE

Format:

{

{endpoint: http….},

{endpoint: http…..},

{ …}

}

/analyse

Runs the analysis such that results from it can be obtained by multiple subsequent calls to the next 6 endpoints in this table.

POST

Format (example):

{h: 101, d: 10000, t: sell, p: 7}

Format:

{result: ok}

/get_sig_vars9599

Retrieves pairs of 95% and 99% VaR values for each signal (averaged over any parallel computations).

GET

NONE

Format (example):

{var95: [0.3345, 0.412, 0.07,…], var99: [0.3345, 0.412, 0.07,…]}

/get_avg_vars9599

Retrieves the average risk values over all signals at each of 95% and 99%.

GET

NONE

Format (example):

{var95: 0.3345, var99: 0.3345}

/get_sig_profit_loss

Retrieves profit/loss values for all signals.

GET

NONE

Format (example):

{profit_loss: [27.2, -51, 8, 3, -12, ...]}

/get_tot_profit_loss

Retrieves total resulting profit/loss.

GET

NONE

Format (example):

{profit_loss: -99.99}

/get_chart_url

Retrieves the URL for a chart generated using the VaR values.

GET

NONE

Format (example):

{url: http…..}

/get_time_cost

Retrieves total billable time for the analysis and the cost related to this.

GET

NONE

Format (example):

{time: 123.45, cost: 88.32}

/get_audit

Retrieves information about all previous runs.

GET

NONE

Format (example):

{

{s: , r: , h: 101, d: 10000, t: sell, p: 7, profit_loss: , av95: , av99: , time: , cost: },

{ … },

}

/reset

Cleans up, as necessary, ready for another analysis, but retaining the warmed-up scale first requested.

Note that, e.g., /get_sig_vars9599 and similar calls should then no longer be able to return results – JSON arrays would be empty.

GET

NONE

Format:

{result: ok}

/terminate

Terminates as necessary in order to scale to zero; further use would require starting again from /warmup.

Termination calls should not wait, so the endpoint should return immediately - the next endpoint will be used for checking.

GET

NONE

Format:

{result: ok}

/scaled_terminated

Retrieves confirmation of scale-to-zero.

Expects that /warmup has already been called (no need to handle situations where it hasn’t).

Could be called many times, with waits between, until all are terminated.

GET

NONE

Format:

Either:

{terminated: false}

or

{terminated: true}

Which data will your system analyse?

Your system will use one of the “Other symbols” identified in the code comment depending on the 2nd character of your Surrey username (i.e. a username such as qq0134):

  • If the second character is from ‘a’ to ‘f’ inclusive: MSFT
  • If the second character is from ‘g’ to ‘l’ inclusive: NVDA
  • If the second character is from ‘m’ to ‘r’ inclusive: GOOG
  • If the second character is anything else: AMZN
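The rule above amounts to a simple range check on the second character. `symbol_for_username` is a hypothetical helper, shown only to make the ranges explicit. Lowercasing the username first is an added assumption.

```python
def symbol_for_username(username: str) -> str:
    """Map the 2nd character of a Surrey username to the symbol to analyse."""
    second = username[1].lower()
    if "a" <= second <= "f":
        return "MSFT"
    if "g" <= second <= "l":
        return "NVDA"
    if "m" <= second <= "r":
        return "GOOG"
    return "AMZN"  # anything else, including digits
```

For the example username qq0134, the second character is ‘q’, so the symbol is GOOG.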

Code for the approach

Create a requirements.txt file that lists the following 3 libraries (known to work at the versions listed – at least yfinance version 0.2.14 seems to believe symbols don’t exist!).

Install using pip3, per Lab 1. Warnings such as “Can’t uninstall ‘pytz’. No files were found to uninstall.” can be safely ignored.

Note that if you wanted to add them to a GAE project, you would have to extend the requirements.txt for that – i.e. ensuring that Flask, gunicorn and boto3 are also included.

The code provided on the next pages should, for initial testing, be put into a single file. This provides for the core of the approach. However, it will be expected that you will ‘take this apart’ to use in your system.

You will need to take the code above and build from it appropriately – some things will come from what you have seen in labs, some would be similar to coursework preparation exercises. There will also be a bit of a ‘stretch’. 

Additional advice (includes hints):

1. Don’t just dump all the code into anything on scalable services (e.g. Lambda).

Little of the code above needs to run in parallel. Consider where the code that needs only to be run once (per session) should be run, and when, within the system. If you run the same thing 10 times that you only needed to run once, it will cost you 10 times as much every time.

2. Related to 1, keep it simple for e.g. Lambda

It is readily possible to avoid using certain libraries (especially Pandas) that will otherwise take you more effort to use in the scalable services (Lambda, EC2 etc). Hence, it is advisable to avoid needing them there (esp. in Lambda)! If you wanted to avoid having to support use of DataFrame in Lambda, look at what, for example, [entry[3] for entry in data.values.tolist()] might offer with respect to the above, but do note that it may be possible to simplify data transmission still further.

3. Lambda for other AWS services

‘event’ can be your friend!

4. Determine how you prefer to work with JSON

There are various ways to work with JSON. For example, with POST in GAE via Flask, first import what you need, e.g. from flask import Flask, jsonify, request, render_template. Then, for simple JSON inputs, you can use: request.get_json().get('parameter'). To test with curl (for example on localhost with a testjsonify endpoint): curl -H "Content-Type: application/json" -d '{"parameter":"64"}' localhost:8080/testjsonify

For outputs, e.g.: return jsonify(username=var_username, email=var_email, id=var_id)

In this example, username, email and id don’t require definitions in the code, as they are just used as the output keys (strings). After each ‘=’, provide variables from the code.
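Hints 1 to 3 can be combined in a worker that needs no Pandas and takes everything it needs from `event`. The sketch below is hypothetical, not a required design, and makes two assumptions:

- The function is invoked directly with a JSON payload (for example via boto3's `invoke`), so `event` is already a dict.
- The caller has computed each signal's mean and standard deviation of returns once. It sends only those pairs (`stats`) plus the shot count `d`.

With this split, each of the r invocations does only the part that benefits from parallelism.

```python
import random
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[float]]:
    """Hypothetical worker: one invocation per r.

    Expects event = {"d": shots, "stats": [[mean, std], ...]}, one pair per
    signal, and returns this worker's 95%/99% values for every signal.
    """
    shots = int(event["d"])
    var95, var99 = [], []
    for mean, std in event["stats"]:
        # Simulated returns series for this signal, sorted ascending.
        simulated = sorted(random.gauss(mean, std) for _ in range(shots))
        var95.append(simulated[int(shots * 0.05)])
        var99.append(simulated[int(shots * 0.01)])
    return {"var95": var95, "var99": var99}
```

The payload then grows with the number of signals rather than with h, and the returned lists hold one value per signal for averaging on the caller's side.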

Brief example user scenario

The user can be considered quite well versed in using such APIs, and happy to use tools such as ‘curl’ for this. 

The user first runs ‘warm up’ for scale out to 4 (r) of the type selected (s). Lambda, EC2, EMR, or any other, are made ready for analysis at that scale.

The user specifies 25,000 shots (d) per r, with a history of 91 days per signal (h) for buy signals (t) and a profit/loss time horizon (p) of 6 days.

In doing so, the user expects that, for each signal, 100,000 shots are being produced in total, and there will be the averaging of 4 values for each of 95% and 99% - each r producing one pair – that results in one 95% and one 99% value per signal.
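The averaging in this scenario can be sketched as follows. The sketch assumes, hypothetically, that each of the r workers returns a dict with a `var95` list and a `var99` list, each holding one value per signal in the same signal order. With r = 4 and d = 25,000, each signal's averaged pair then summarises 4 × 25,000 = 100,000 shots.

```python
from typing import Dict, List


def average_over_workers(results: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average each signal's 95% and 99% values over the r worker results."""
    r = len(results)
    n = len(results[0]["var95"])  # number of signals
    return {
        key: [sum(res[key][i] for res in results) / r for i in range(n)]
        for key in ("var95", "var99")
    }
```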

In addition, the value of profit/loss for each signal is calculated using the difference between the price at the signal and, when available, the price p days after the signal:

  • for a Buy signal, there is profit if the price has moved higher but loss if lower;
  • for Sell, there is profit if the price has moved lower (because buying that corresponds to the sell would be cheaper) but loss if it has moved higher (have to buy at a higher price).
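The two rules above reduce to a single sign flip. This illustration assumes the following:

- `closes` is a list of daily closing prices.
- ‘The price at the signal’ is the Close on the signal day.
- ‘p days later’ means p trading days (rows) later.
- Signals whose exit day is not yet in the data are reported as `None` and left out of the total.

```python
from typing import List, Optional


def profit_loss(closes: List[float], signal_index: int, p: int, t: str) -> Optional[float]:
    """Profit (+) or loss (-) p days after a signal; None if that day is not in the data."""
    exit_index = signal_index + p
    if exit_index >= len(closes):
        return None
    move = closes[exit_index] - closes[signal_index]
    # Buy profits from a rise; sell profits from a fall, so flip the sign.
    return move if t == "buy" else -move


def total_profit_loss(closes: List[float], signal_indices: List[int], p: int, t: str) -> float:
    """Sum over the signals whose exit day is available."""
    values = (profit_loss(closes, i, p, t) for i in signal_indices)
    return sum(v for v in values if v is not None)
```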

Following this analysis, the user will be able to obtain a link to a chart showing risk values – two risk values for each signal, and two lines of averages over each one of these – one over the 95% signal values and one over the 99% signal values. The total value of profit/loss, the risk value pairs for all signals, and other such information that the API can offer, will also be available to the user.

The user could run further analysis, first running reset to ensure there are no lurking results, but for this scenario has done enough, so wants to scale to zero by asking to terminate. This does not, however, delete the Audit, which needs to be stored across uses/sessions (NB: variables in code do not allow for this over extended time periods, and nor do other temporary storage mechanisms that depend on continued running of supporting components).

What to submit

The report must use the coursework template. Template settings for margins, font sizes, font and paragraph spacing, and so on must not be altered. Doing so will impact on structure/quality, and quality forms part of the overall assessment; your submission may also be pasted into the original template and checked for length (see above).

The number of pages stated is an absolute maximum – for the avoidance of doubt, anything appearing on page 4 and beyond will not be marked, since doing so could readily confer an advantage over those who adhere to the limit.

Two columns must be used throughout with only large figures and tables allowed to span two columns.

Nothing stated in the coursework template is to be taken to contradict this coursework description in any way. Furthermore, only the latest available coursework description must be used, along with any communication posted on SurreyLearn about corrections or alterations.

The code submission must include only the files that are directly relevant to assessment – this means you must exclude Python libraries, virtual environments, SDKs or anything else that simply inflates the size of the Zip file. The code submission should be just a few KB in size; anything in MB means something has been added that should not be included.

Submit a written paper, of a maximum of 3 pages, using the template provided, that presents the system. As well as providing a Title, your name, URN, email address and a link to your front-end, this needs to:

a. Explain this system with respect to Cloud. Provide an abstract and introduction which comprehensively and very clearly relate your system to NIST SP 800-145 with respect to:

a. the system (API) developer and

b. the user of the API

Assume that you would write this as though using real AWS, even though you are using AWS Academy, and so there would be no restrictions on what would be accessible.

Note that simply stating all of, or parts of, NIST SP 800-145, or attempting to rewrite any of it, is not sufficient for this.

b. How have you approached the problem? Where the API endpoints specified in this document will need to make calls to services beyond the boundary of the service that is running the API, explain which service(s) are involved, what this supports for this system, what inputs will have to be transmitted, and what outputs they will produce. Be as specific as possible, particularly if the number of inputs/outputs depends on some relevant quantity.

Note that this means explaining what would be expected to be done, and where, for this system, not how. You are to avoid: (i) explaining any cloud service or how it might generally be used or useful; (ii) explaining how to use any of the cloud services; (iii) explaining code or any other technical mechanisms as would relate to implementation; (iv) describing what the user does. Explanations that you should have avoided will occupy space in the submission without attracting marks.

c. What did you complete, or not? State the numbers for requirements i-iv that you consider MET (M), PARTIALLY MET (P), or NOT MET (N), and list the endpoints that are M/P/N. Note that code will be reviewed in regard to what is stated.

d. Do results relate to scale? Show and clearly explain what happens to the time taken for analysis, using billable time related to the scale as well as elapsed clock time, when:

a. the total number of shots is being increased systematically when using r = 4

b. the value of r is being increased systematically with respect to one large total number of shots (addressing scale within AWS Academy limits).

Consider, for example, how you might characterise relationships amongst the data involved here. If no other section needs the space, you can include a maximum of one screenshot showing results from one run of the system. Figures must be explained in the text and carry meaningful captions.

e. How bad might the cloud bills get? Explain what it might cost to run the whole system (i.e. over all services being used by the system) based on a set of assumptions about some number of users spending some time attempting to derive the most profitable trading strategy, where a proportion of users that you specify will use Lambda and the rest will use the 2nd scalable service.

These must be ‘real world’ costs, i.e. assuming all free tier usage is exhausted. Costs should relate only to the region in use for each service.

f. Demonstrate quality in presentation through submission structure, including the abstract and introduction, and quality of writing.
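For section e, the underlying arithmetic can be kept transparent by making every price a parameter. The sketch makes the following assumptions:

- Lambda is charged per request plus per GB-second of duration.
- The second scalable service is modelled as instances billed at an hourly rate, pro-rated by time.
- The function names are hypothetical.
- Any prices passed in must come from current AWS pricing for the region in use, not from this example.

A complete estimate would also need to add GAE, storage and warm-up costs.

```python
def lambda_cost(invocations: int, avg_seconds: float, memory_mb: int,
                price_per_gb_second: float, price_per_million_requests: float) -> float:
    """Request charge plus duration charge in GB-seconds."""
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    requests = invocations / 1_000_000 * price_per_million_requests
    return gb_seconds * price_per_gb_second + requests


def instance_cost(instances: int, hours: float, price_per_hour: float) -> float:
    """Instance-based service (e.g. EC2) billed at an hourly rate pro-rated by time."""
    return instances * hours * price_per_hour


def scenario_cost(users: int, runs_per_user: int, lambda_share: float,
                  cost_per_lambda_run: float, cost_per_instance_run: float) -> float:
    """Total analysis cost when a fraction `lambda_share` of users choose Lambda."""
    lambda_users = users * lambda_share
    other_users = users - lambda_users
    return runs_per_user * (lambda_users * cost_per_lambda_run
                            + other_users * cost_per_instance_run)
```

For instance, with made-up prices, `lambda_cost(1_000_000, 1.0, 1024, 0.00001, 0.2)` comes to about 10.2.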
