
Linear Regression Improving detection of peaks in ECG signal [8 points] ECG is a method for monitoring and recording heart activity, and an example is shown in Fig.1. A device recording the signal works at a certain sampling rate (frequency), and hence the actual signal, plotted as data points, looks as shown in Fig.1B. Peaks of ECG signal can be used, for example, to measure heart rate (average number of beats in a minute, i.e., number of peaks per minute), or the duration of intervals between two peaks that would, in combination with other measurements, enable other estimates of interest

(e.g., blood pressure, heart rate variability). The sampling frequency of a device plays an important role in the quality of the signal. The more often we sample, the more data points we get and, hence, the better the quality of the signal, but probably also the higher the costs (e.g., of the device). The signal shown in Fig.1 was sampled at a sampling rate of 180 Hz. The task is to improve the detection of peaks (both the amplitude and the time precision of the occurrences of peaks) by means of linear regression, or more precisely, by fitting lines through data points around each peak.

Given are the following arrays of data: y – amplitudes of ECG signal, and an array with indices of peaks.

You will have to create an array representing x data points (these are evenly spaced data points, sampled

at frequency 180 Hz). For example, the first peak has an index 24, for which (x, y) = (0.133, 1.965).
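As a concrete check of the x-axis construction (a sketch; the placeholder below stands in for the provided amplitude array y):

```python
import numpy as np

fs = 180.0                  # sampling rate in Hz (given in the task)
y = np.zeros(1440)          # placeholder: 8 s of signal at 180 Hz (the real y is provided)
x = np.arange(len(y)) / fs  # evenly spaced time points: 0, 1/180, 2/180, ...

# The first peak has index 24, i.e. it occurs at t = 24/180 ≈ 0.133 s
print(round(x[24], 3))      # → 0.133
```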

Figure 1: An example of an ECG signal. (A) ECG signal (8 s, blue line), and detected peaks (marked

as orange ’x’). (B) Scatter plot for the very first peak.

Tasks:

1. The linear function that we want to use is f(x) = ax + b. Using the least squares approach, the error

function to minimize is then:

E(a, b) = (1/m) Σᵢ₌₁ᵐ (yᵢ − f(xᵢ))²,

where m is the number of data points we use for regression, and (xᵢ, yᵢ), i = 1, …, m, are the points that we use for fitting.

(A) Derive expressions for a and b in the form of sums by minimizing the cost function E(a, b).

(B) Derive expressions for a and b using the matrix-vector notation. First, rewrite the cost function

as a function of θ, using the L2-norm. Specify how you stack elements in the design matrix

X, and the vector θ (θ should contain parameters a and b, but in what order?), also specify the

dimensions of X, θ and y. Find the gradient of E w.r.t. θ, and do your derivations until you get

the expression for the Moore-Penrose pseudoinverse.

HINT: For a function f dependent on a vector x, with f(x) = ‖Ax − b‖², the gradient is ∇ₓf = 2Aᵀ(Ax − b).
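A minimal sketch of where the derivation in (B) should end up, assuming θ is stacked slope-first (the ordering and the derivation itself are part of the task):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line fit: returns (a, b) for f(x) = a*x + b."""
    X = np.column_stack([x, np.ones_like(x)])  # design matrix, shape (m, 2)
    theta = np.linalg.pinv(X) @ y              # Moore-Penrose pseudoinverse: (X^T X)^{-1} X^T y
    return theta[0], theta[1]

# sanity check on an exact line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
a, b = fit_line(x, 2 * x + 1)
```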

ML1, SS22, Homework 1 Tutor: Lukas Steinwender, l.steinwender@student.tugraz.at

1.2 Smartwatch data [5 points] 3

2. Implement the equations for a and b in the code, either by using the matrix notation or the expressions

with sums that you derived.

3. Derive the expression and implement the function that finds an intersection point of two lines. Include

your derivation in the report (intersection of two lines, e.g., one with parameters a and b, the other

one with parameters c and d).
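Setting ax + b = cx + d and solving gives x = (d − b)/(a − c); a sketch under that assumption (the function name is illustrative, not prescribed by the assignment):

```python
def line_intersection(a, b, c, d):
    """Intersection of y = a*x + b and y = c*x + d (lines must not be parallel)."""
    assert a != c, "parallel lines have no unique intersection"
    x = (d - b) / (a - c)  # from a*x + b = c*x + d
    return x, a * x + b

# example: y = 2x + 1 and y = -x + 4 meet at (1, 3)
px, py = line_intersection(2.0, 1.0, -1.0, 4.0)
```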

4. Implement the function that finds line coefficients for the line left of the peak, and for the line right

of the peak. For each line separately, decide how many points would be appropriate to use (for the

left line, for the right line). Also, decide if it is better to include the peak point in the left line, in the

right line, in both, or not to include it at all. This function should work for a single peak. Report the

number of points used and if the peak was included in the data points for regression lines (for the best

performing results).

5. To measure if there is an improvement in the peak, a function in the code is already provided. What

point is considered to be an improvement of the peak? Explain both parts of the if-statement from the

code.

6. How does the number of points around the peak that we take for line regression, in this case, affect

the results? What happens (in this task, not in general) with the intersection point if the number of

chosen points for regression lines is too high?

7. Report the final score achieved (the percentage of peaks improved).

8. Why is the approach with fitting lines to improve peaks and finding an intersection point, in this case,

preferable over, for example, fitting a parabola and finding its peak?

Bonus tasks [2* points]:

• (Python related) Add an assert command in the function fit_line to check if there are at least two data points given, and an assert command in the function test_fit_line. Include both lines of code in the report.

• Implement in the code: in a single figure, plot ECG signal and all peaks, regression lines for each peak

and improved peaks. Note that the intersection point of two lines is the improved peak (use a special

marker for that, e.g., black circles). In the report, include two figures for time periods [0.0, 8.2] and

[0.05, 0.25]. This can be easily done by simply setting xlim parameter in the plotting function, after

everything is plotted.

1.2 Smartwatch data [5 points]

Given is data from a smartwatch representing the values for 100 subjects (rows) and 8 different variables of

interest (columns): hours sleep, hours work, average pulse, max pulse, exercise duration, exercise intensity,

fitness level, calories burned.

Tasks:

1. Find 3 meaningful linear relations between variables, i.e., which variable can be predicted by another

single one? Fit a line and calculate the MSE (implement the function fit_predict_mse). Calculate

the correlation coefficient (Pearson coefficient; implement the function pearson_coefficient). Visualize

the data (chosen variable 1, variable 2) by means of a scatter plot, and plot the best-fitting line

over it (use the function scatterplot_and_line). Include the plots in the report. State also the

coefficient, MSE and θ for all 3 pairs of variables that you chose.
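A sketch of what the two helper functions could compute, assuming the standard definitions of the Pearson coefficient and the mean squared error (only the function names come from the assignment; the internals are an assumption):

```python
import numpy as np

def pearson_coefficient(u, v):
    """Pearson correlation: cov(u, v) / (std(u) * std(v))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    du, dv = u - u.mean(), v - v.mean()
    return (du * dv).sum() / np.sqrt((du ** 2).sum() * (dv ** 2).sum())

def fit_predict_mse(u, v):
    """Fit v ≈ a*u + b by least squares and return (theta, mse)."""
    X = np.column_stack([u, np.ones_like(u)])
    theta, *_ = np.linalg.lstsq(X, v, rcond=None)
    mse = np.mean((X @ theta - v) ** 2)
    return theta, mse

u = np.array([1.0, 2.0, 3.0, 4.0])
v = 3 * u + 0.5                     # perfectly linear toy data
r = pearson_coefficient(u, v)       # ≈ 1.0 for an exact linear relation
theta, mse = fit_predict_mse(u, v)  # theta ≈ [3.0, 0.5], mse ≈ 0
```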

2. Find 3 different pairs that are not linearly dependent. Repeat the steps as in the previous case (line

fitting, correlation coefficient, scatter plot and a line over it). Include the plots in the report. State

also correlation coefficient, MSE and θ for all 3 pairs of variables that you chose.

3. What can you say from scatter plots? What from the correlation coefficient, i.e., how do you interpret

the values of the Pearson coefficient?



2 Logistic Regression [5 points]

For this task we will use 3 different data sets (X.npy should be used with targets-dataset-1.npy, targets-dataset-2.npy, and targets-dataset-3.npy), and the sklearn library.

X.npy contains two features – values for x1, x2 ∈ {0, …, 29}. The targets are 0 or 1. Our task is to train a

classifier that predicts either Class 0 or Class 1, as shown in Fig.2A-C.

Figure 2: Targets for: (A) Data set 1; (B) Data set 2; (C) Data set 3.

Tasks:

1. Load the data. For each of the three tasks, create an appropriate design matrix X. Include any feature

that you think is necessary. (There is no need to include the zero (dummy) feature, because we will use

LogisticRegression classifier from the sklearn library, for which, by default, the bias term (intercept) is

added to the decision function.) In the report, for each task, state what design matrix you used, that

is, name the features of your design matrices.

2. Split the data set, such that 20% of the data is used for testing. Use the train_test_split function

(already imported).

3. Create a classifier (LogisticRegression classifier). Fit model to the data, calculate accuracy on the train

and test set. In addition, using log_loss from sklearn.metrics, calculate the loss on the train and test set.

(Hint: you will first need to calculate probabilities of predictions on the train and test set.)

Try out different penalties (check the documentation to see what options there are), and report your

final choice (the one that gives you the best accuracy on the test set). If you are not happy with the

final results, and changing the penalty does not help, rethink your design matrices!

Report the accuracy on the train and test set, loss on the train and test set, and what penalty you

used.
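The fit/score/loss steps could look like the sketch below, on toy data standing in for X.npy and the targets (the synthetic data, the random seed, and the chosen penalty are all assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# toy linearly separable data in [0, 29] x [0, 29], standing in for the real data sets
rng = np.random.default_rng(0)
X = rng.uniform(0, 29, size=(200, 2))
t = (X[:, 0] + X[:, 1] > 29).astype(int)

X_train, X_test, t_train, t_test = train_test_split(X, t, test_size=0.2, random_state=0)

clf = LogisticRegression(penalty="l2")  # try other penalties and keep the best one
clf.fit(X_train, t_train)

train_acc = clf.score(X_train, t_train)
test_acc = clf.score(X_test, t_test)
# log_loss needs predicted probabilities, not hard class labels
train_loss = log_loss(t_train, clf.predict_proba(X_train))
test_loss = log_loss(t_test, clf.predict_proba(X_test))
```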

4. For each data set, include in the report 3 plots that you generated using the function plot datapoints

– one should show “Predictions on the train set”, one should be for “Predictions on the test set”, and

the last one for “Predictions on the whole data set” (train and test set merged) that would clearly

show if the task is perfectly solved. (Please include the plots that look the same as those in Fig. 2,

with the difference that instead of targets, you plot the predictions (output of your model)).

5. Report the θ vector, and also the bias term. Hint: check the Attributes of the classifier.

6. When do we use logistic regression?

7. A classifier could predict everything correctly and achieve 100% accuracy, but it can happen that the

loss is not zero. Why? (Hint: conclude from the cost function that is used for logistic regression. When

is the loss zero?)
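Task 7 can be seen numerically: binary cross-entropy is zero only when the predicted probability of the true class is exactly 1, so a correct but not fully confident prediction still incurs loss. A minimal illustration:

```python
import math

# one correctly classified point: true label 1, predicted probability 0.9 (> 0.5, so accuracy is 100%)
p, t = 0.9, 1
loss = -(t * math.log(p) + (1 - t) * math.log(1 - p))  # binary cross-entropy
# loss = -ln(0.9) ≈ 0.105 > 0: the prediction is correct, but the model is not fully confident
```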



3 Gradient descent [7 points]

The following function (called the Eggholder function) should be optimized using the Gradient Descent algorithm:

f(x, y) = −(y + 47) sin(√|x/2 + (y + 47)|) − x sin(√|x − (y + 47)|). (1)

The global minimum of this function is the point (512, 404.2319), for which f(512, 404.2319) = −959.6407.
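As a quick sanity check, Eq. (1) can be implemented and evaluated at the stated minimum (a sketch; the real task implements this inside the provided code skeleton):

```python
import math

def eggholder(x, y):
    """Eggholder function from Eq. (1)."""
    return (-(y + 47) * math.sin(math.sqrt(abs(x / 2 + (y + 47))))
            - x * math.sin(math.sqrt(abs(x - (y + 47)))))

# evaluate at the stated global minimum (512, 404.2319)
val = eggholder(512, 404.2319)  # ≈ -959.64
```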

Tasks:

1. In the code, implement the gradient descent solver.

2. In the code, implement the cost function (Eq. 1).

3. Derive the expressions for the partial derivatives (with respect to x and y) and implement them in the code. Include your derivation in the report (derivations for ∂f(x, y)/∂x and ∂f(x, y)/∂y).

Note: (d/dx)|u| = (u/|u|) · (du/dx).
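A hedged sketch of the resulting partial derivatives, using the given rule for |u| and checked against central finite differences (the derivation itself is still required in the report):

```python
import math

def eggholder(x, y):
    return (-(y + 47) * math.sin(math.sqrt(abs(x / 2 + y + 47)))
            - x * math.sin(math.sqrt(abs(x - y - 47))))

def eggholder_grad(x, y):
    """Analytic gradient, valid where both arguments of |.| are nonzero."""
    u = x / 2 + y + 47
    v = x - y - 47
    su, sv = math.sqrt(abs(u)), math.sqrt(abs(v))
    # d/dx sqrt(|u|) = (u/|u|) / (2 sqrt(|u|)) * du/dx, with du/dx = 1/2, dv/dx = 1
    dfdx = (-(y + 47) * math.cos(su) * (u / abs(u)) / (4 * su)
            - math.sin(sv)
            - x * math.cos(sv) * (v / abs(v)) / (2 * sv))
    # du/dy = 1, dv/dy = -1
    dfdy = (-math.sin(su)
            - (y + 47) * math.cos(su) * (u / abs(u)) / (2 * su)
            + x * math.cos(sv) * (v / abs(v)) / (2 * sv))
    return dfdx, dfdy

# sanity check against central finite differences at a non-problematic point
x0, y0, h = 100.0, 50.0, 1e-6
gx, gy = eggholder_grad(x0, y0)
nx = (eggholder(x0 + h, y0) - eggholder(x0 - h, y0)) / (2 * h)
ny = (eggholder(x0, y0 + h) - eggholder(x0, y0 - h)) / (2 * h)
```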

4. Choose the starting point randomly from the range −512 ≤ x, y ≤ 512. Try out different parameters (number

of iterations, learning rate) for the gradient descent algorithm, and try to find the minimum of the

function. Generate 3 plots showing how the cost changes over iteration (one with slow convergence,

one with smooth (and moderate speed) convergence, and one with fast convergence). Include them in

the report, and for each plot write which learning rate was used.
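The solver itself is just the plain update loop; a sketch, sanity-checked here on a convex bowl rather than on the (much harder) Eggholder function:

```python
def gradient_descent(grad, x0, y0, lr=0.01, n_iter=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    x, y = x0, y0
    for _ in range(n_iter):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y

# sanity check on f(x, y) = x^2 + y^2 (gradient (2x, 2y)); the minimum is (0, 0)
xm, ym = gradient_descent(lambda x, y: (2 * x, 2 * y), 3.0, -4.0, lr=0.1, n_iter=200)
```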

5. Why is this function challenging to optimize? Was it (always) possible to find the global minimum?

6. Is the absolute value function differentiable at all points? What is the meaning of the derivative of the absolute value function that we used above, more precisely, of the term u/|u|?

7. In the report, specify points for which the gradient of the Eggholder function might be problematic.

How can we computationally overcome the problem? Implement it in the code, and describe in the

report your approach. In the code, evaluate the gradient function with 2 problematic points (x, y). In

the report, include also the points that you tested.

Bonus task [1* point]:

• The Gradient Descent algorithm needs a cost function to be specified, together with the gradients of the cost function with respect to all of its variables. If we would like to have a generic Gradient Descent algorithm, i.e., one that works for any cost function without defining its gradients w.r.t. all variables, what could we use? How could we compute the gradients in a general form, i.e., how can we approximate the gradients? (No need to implement anything, just answer the questions.)
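One standard answer is numerical differentiation, e.g. central finite differences, which approximates the gradient from function evaluations alone; a sketch:

```python
def numerical_gradient(f, x, y, h=1e-6):
    """Approximate (df/dx, df/dy) by central finite differences — no analytic gradient needed."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# check on f(x, y) = x^2 + y^2, whose gradient at (1, 2) is (2, 4)
gx, gy = numerical_gradient(lambda x, y: x * x + y * y, 1.0, 2.0)
```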
