Merge pull request #39 from Nixtla/docs/azure-vignette

docs: add azure vignette
Nixtla · Oct 25, 2024 · 8cd89c3
2 parents c2a132f + 9aa3c0e

Showing 19 changed files with 203 additions and 56 deletions.
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -19,6 +19,8 @@ navbar:
       menu:
         - text: "Anomaly Detection"
           href: articles/anomaly-detection.html
+        - text: "Azure Quickstart"
+          href: articles/azure-quickstart.html
         - text: "Cross-Validation"
           href: articles/cross-validation.html
         - text: "Data requirements"

diff --git a/man/figures/azure_deploy.png b/man/figures/azure_deploy.png
diff --git a/man/figures/azure_endpoints.png b/man/figures/azure_endpoints.png
diff --git a/man/figures/azure_landing.png b/man/figures/azure_landing.png
diff --git a/man/figures/azure_models.png b/man/figures/azure_models.png
diff --git a/man/figures/diagram.png b/man/figures/diagram.png
diff --git a/man/figures/diagram_setup.png b/man/figures/diagram_setup.png
diff --git a/vignettes/anomaly-detection.Rmd b/vignettes/anomaly-detection.Rmd
@@ -31,7 +31,7 @@ library(nixtlar)
 ## 1. Anomaly detection
 Anomaly detection plays a crucial role in time series analysis and forecasting. Anomalies, also known as outliers, are unusual observations that don't follow the expected time series patterns. They can be caused by a variety of factors, including errors in the data collection process, unexpected events, or sudden changes in the patterns of the time series. Anomalies can provide critical information about a system, like a potential problem or malfunction. After identifying them, it is important to understand what caused them, and then decide whether to remove, replace, or keep them.
 
-`TimeGPT` has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first. 
+`TimeGPT` has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first. 
 
 ## 2. Load data 
 For this vignette, we'll use the electricity consumption dataset that is included in `nixtlar`, which contains the hourly prices of five different electricity markets. 
@@ -41,10 +41,11 @@ df <- nixtlar::electricity
 head(df)
 ```
 
-## 3. Detect anomalies 
-To detect anomalies, use `nixtlar::nixtla_client_detect_anomalies`, which should include the following parameter:  
+## 3. Detect Anomalies
 
-- **df**: The time series data, either as a data frame, a tibble, or a tsibble. It should include at least a column with the timestamps and a column with the observations. Default names for these columns are `ds` and `y`. If different, please specify their names. If working with multiple series, you also need to include a column with unique identifiers. The default name for this column is `unique_id`. 
+To detect anomalies, use `nixtlar::nixtla_client_detect_anomalies`, which requires the following parameter:
+
+- **df**: The time series data, provided as a data frame, tibble, or tsibble. It must include at least two columns: one for the timestamps and one for the observations. The default names for these columns are `ds` and `y`. If your column names are different, specify them with `time_col` and `target_col`, respectively. If you are working with multiple series, you must also include a column with unique identifiers. The default name for this column is `unique_id`; if different, specify it with `id_col`.
 
 ```{r}
 nixtla_client_anomalies <- nixtlar::nixtla_client_detect_anomalies(df) 

diff --git a/vignettes/azure-quickstart.Rmd b/vignettes/azure-quickstart.Rmd
@@ -0,0 +1,100 @@
+---
+title: "TimeGEN-1 Quickstart (Azure)"
+output: 
+  rmarkdown::html_vignette:
+    toc: true 
+    toc_depth: 2
+vignette: >
+  %\VignetteIndexEntry{Azure Quickstart}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+```{r setup, include=FALSE}
+library(httptest2)
+.mockPaths("../tests/mocks")
+start_vignette(dir = "../tests/mocks")
+
+original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7)
+
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>", 
+  fig.width = 7, 
+  fig.height = 4
+)
+```
+
+```{r}
+library(nixtlar)
+```
+
+TimeGEN-1 is TimeGPT optimized for Azure, Microsoft's cloud computing service. You can access TimeGEN-1 from R via `nixtlar` by following these steps: 
+
+## 1. Set up a TimeGEN-1 endpoint and generate your API key on Azure
+
+- Go to [ml.azure.com](https://ml.azure.com).
+- Sign in or create an account.
+- If you don't have one already, create a workspace. This might require a subscription.
+
+![](../man/figures/azure_landing.png) 
+
+- Click on `Models` in the sidebar and select `TimeGEN` in the model catalog.  
+
+![](../man/figures/azure_models.png) 
+
+- Click `Deploy`. This will create an Endpoint. 
+
+![](../man/figures/azure_deploy.png) 
+
+- Go to your Endpoint in the sidebar. Here you will find your Base URL and the API key. 
+
+![](../man/figures/azure_endpoints.png) 
+
+## 2. Install `nixtlar`
+
+In your favorite R IDE, install `nixtlar` from CRAN or GitHub. 
+
+```{r, eval = FALSE}
+install.packages("nixtlar") # CRAN version
+
+# install.packages("devtools") # needed once for the GitHub version
+devtools::install_github("Nixtla/nixtlar") # development version
+```
+
+## 3. Set up the Base URL and API key 
+
+To do this, use the `nixtla_client_setup` function. 
+
+```{r, eval = FALSE}
+nixtla_client_setup(
+  base_url = "Base URL here", 
+  api_key = "API key here"
+)
+```
+
+## 4. Start making forecasts! 
+
+You are now ready to generate forecasts. We will use the electricity dataset included in `nixtlar`, which contains the hourly prices of five different electricity markets. 
+
+```{r}
+df <- nixtlar::electricity
+nixtla_client_fcst <- nixtla_client_forecast(df, h = 8, level = c(80,95))
+head(nixtla_client_fcst)
+```
+
+We can plot the forecasts with the `nixtla_client_plot` function. 
+
+```{r}
+nixtla_client_plot(df, nixtla_client_fcst, max_insample_length = 200)
+```
+
+To learn more about data requirements and TimeGPT's capabilities, please read the other `nixtlar` vignettes.
+
+## Discover the power of TimeGEN on Azure via `nixtlar`
+
+Deploying TimeGEN via `nixtlar` on Azure allows you to implement robust and scalable forecasting solutions. It simplifies the integration of advanced analytics into your workflows and gives you access to Azure’s infrastructure through a pay-as-you-go service. To learn more, read [TimeGEN-1 on Azure](https://www.nixtla.io/news/timegen1-on-azure).
+
+```{r, include=FALSE}
+options(original_options)
+end_vignette()
+```
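
Steps 3 and 4 of the Azure vignette above can be sketched without typing credentials into a script. This is a minimal sketch, assuming the Base URL and API key are stored in the `NIXTLA_BASE_URL` and `NIXTLA_API_KEY` environment variables (the names used in the Get Started vignette). The helper `azure_credentials()` is hypothetical and not part of `nixtlar`; it only reads and checks the two variables.

```r
# Hypothetical helper (not part of nixtlar): read the Azure endpoint
# credentials from environment variables and fail early if one is missing.
azure_credentials <- function() {
  creds <- list(
    base_url = Sys.getenv("NIXTLA_BASE_URL"),
    api_key  = Sys.getenv("NIXTLA_API_KEY")
  )
  # Sys.getenv() returns "" for unset variables.
  absent <- names(creds)[!nzchar(unlist(creds))]
  if (length(absent) > 0) {
    stop("Missing credentials: ", paste(absent, collapse = ", "))
  }
  creds
}

# Only call nixtlar when both the package and the credentials are available.
if (requireNamespace("nixtlar", quietly = TRUE) &&
    nzchar(Sys.getenv("NIXTLA_BASE_URL")) &&
    nzchar(Sys.getenv("NIXTLA_API_KEY"))) {
  creds <- azure_credentials()
  nixtlar::nixtla_client_setup(
    base_url = creds$base_url,
    api_key  = creds$api_key
  )
}
```

Failing early with a named list of missing variables makes a misconfigured session easier to diagnose than an authentication error returned by the endpoint.
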
diff --git a/vignettes/cross-validation.Rmd b/vignettes/cross-validation.Rmd
@@ -31,7 +31,7 @@ library(nixtlar)
 ## 1. Time series cross-validation 
 Cross-validation is a method for evaluating the performance of a forecasting model. Given a time series, it is carried out by defining a sliding window across the historical data and then predicting the period following it. The accuracy of the model is computed by averaging the accuracy across all the cross-validation windows. This method results in a better estimation of the model’s predictive abilities, since it considers multiple periods instead of just one, while respecting the sequential nature of the data.
 
-`TimeGPT` has a method for performing time series cross-validation, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first.  
+`TimeGPT` has a method for performing time series cross-validation, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first.  
 
 ## 2. Load data 
 For this vignette, we'll use the electricity consumption dataset that is included in `nixtlar`, which contains the hourly prices of five different electricity markets. 
@@ -44,7 +44,7 @@ head(df)
 ## 3. Perform time series cross-validation
 To perform time series cross-validation using `TimeGPT`, use `nixtlar::nixtla_client_cross_validation`. The key parameters of this method are: 
 
-- **df**: The time series data, either as a data frame, a tibble, or a tsibble. It should include at least a column with the datestamps and a column with the observations. Default names for these columns are `ds` and `y`. If different, please specify their names. If working with multiple series, you also need to include a column with unique identifiers. The default name for this column is `unique_id`. 
+- **df**: The time series data, provided as a data frame, tibble, or tsibble. It must include at least two columns: one for the timestamps and one for the observations. The default names for these columns are `ds` and `y`. If your column names are different, specify them with `time_col` and `target_col`, respectively. If you are working with multiple series, you must also include a column with unique identifiers. The default name for this column is `unique_id`; if different, specify it with `id_col`.
 - **h**: The forecast horizon. 
 - **n_windows**: The number of windows to evaluate. Default value is 1. 
 - **step_size**: The gap between each cross-validation window. Default value is `NULL`. 
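
The way `h`, `n_windows`, and `step_size` define the evaluation windows can be sketched in base R. This is an illustration only: `cv_windows()` is a hypothetical helper, not a `nixtlar` function. It assumes that a `NULL` `step_size` spaces consecutive windows `h` steps apart and that the last window ends at the final observation; check the `nixtla_client_cross_validation` documentation for the exact behavior.

```r
# Illustrative only: index ranges of rolling-origin cross-validation windows
# for a single series with n observations.
cv_windows <- function(n, h, n_windows = 1, step_size = NULL) {
  if (is.null(step_size)) step_size <- h  # assumption, see the text above
  # Last training index ("cutoff") of each window, oldest window first.
  cutoffs <- n - h - rev(seq_len(n_windows) - 1) * step_size
  stopifnot(all(cutoffs >= 1))
  data.frame(
    window     = seq_len(n_windows),
    train_end  = cutoffs,
    test_start = cutoffs + 1,
    test_end   = cutoffs + h
  )
}

# Three windows of eight steps each over a series of 100 observations.
cv_windows(n = 100, h = 8, n_windows = 3)
```

Each row describes one window: the model sees observations up to `train_end` and is evaluated on `test_start` through `test_end`.
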

diff --git a/vignettes/data-requirements.Rmd b/vignettes/data-requirements.Rmd
@@ -41,34 +41,34 @@ This vignette explains the data requirements for using any of the core functions
 
 ## 1. Input Requirements
 
-`nixtlar` now supports the following data structures: data frames, tibbles, and tsibbles. The output format will always be a data frame.
+`nixtlar` now supports the following data structures: data frames, tibbles, and tsibbles. The output format will always be a data frame. 
 
 Regardless of your data structure, the following two columns must always be included when using any core functions of `nixtlar`:
 
-- **Date Column**: This column must contain timestamps formatted as `YYYY-MM-DD` or `YYYY-MM-DD hh:mm:ss`, either as character strings or date-time objects. The default name for this column is `ds`. If your dataset uses a different name, please specify it by setting the parameter `time_col="your_time_column_name"`.
+- **Date Column**: This column must contain timestamps formatted as `YYYY-MM-DD` or `YYYY-MM-DD hh:mm:ss`, either as characters or date-time objects. For date-time objects, we recommend using the `as.POSIX*` functions from base R, although `as.Date` is also supported. The default name for this column is `ds`. If your dataset uses a different name, please specify it by setting the parameter `time_col="your_time_column_name"`.
 
 - **Target Column**: This column should contain the numeric target variable for forecasting. The default name for this column is `y`. If your dataset uses a different name, specify it by setting the parameter `target_col="your_target_column_name"`.
 
 ## 2. Multiple Series
 
 If you are working with multiple series, you must include a column with a unique identifier for each series. This column can contain characters or integers, and its default name is `unique_id`. If your dataset uses a different name for the identifier column, please specify it by setting the parameter `id_col="your_id_column_name"`. If your dataset contains only one series and does not need an identifier, set `id_col` to `NULL`.
 
+Please be aware that in earlier versions of `nixtlar`, the default value of `id_col` was `NULL`, but it is now `unique_id`. 
+
 ```{r}
 # sample valid input 
 df <- nixtlar::electricity
 head(df)
 str(df)
 ```
 
-The `id_col` only accepts characters or integers. 
-
 ## 3. Exogenous Variables
 
-When using exogenous variables, `nixtlar` differentiates between historical and future exogenous variables:
+When using exogenous variables, `nixtlar` distinguishes between historical and future exogenous variables:
 
 - **Historical Exogenous Variables**: These should be included in the input data immediately following the `id_col`, `ds`, and `y` columns. If your dataset contains additional columns that are not exogenous variables, you must remove them before using any core functions of `nixtlar`.
 
-- **Future Exogenous Variables**: These correspond to the `X_df` parameter and should cover the entire forecast horizon. This dataset should include columns with the appropriate timestamps and, if available, unique identifiers, formatted as explained in previous sections.
+- **Future Exogenous Variables**: These correspond to the `X_df` parameter and should cover the entire forecast horizon. This dataset must include columns with the appropriate timestamps and, if applicable, unique identifiers, formatted as described in the previous sections.
 
 ```{r}
 # sample valid input with exogenous variables 
@@ -83,11 +83,11 @@ To learn more about how to use exogenous variables, please refer to the [Exogeno
 
 ## 4. Missing values 
 
-When using `TimeGPT` via `nixtlar`, you need to ensure that:
+When using `TimeGPT` via `nixtlar`, ensure the following:
 
-1. **No Missing Values in Target Column**: The target column must not contain any missing values (NA).
+1. **No Missing Values in the Target Column**: The target column must not contain any missing values (`NA`).
 
-2. **Continuous Date Sequence**: The dates must be continuous and without any gaps, from the start date to the end date, matching the frequency of the data.
+2. **Continuous Date Sequence**: The dates must be continuous, without any gaps, from the start date to the end date, matching the frequency of the data.
 
 Currently, **nixtlar** does not provide any functionality to fill missing values or dates. To learn more about this, please refer to the vignette on [Special Topics](https://nixtla.github.io/nixtlar/articles/special-topics.html). 
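
Both requirements can be checked in base R before calling `nixtlar`. The sketch below is a minimal example for a single series with a fixed step between timestamps; `check_series()` and its `step_secs` argument are hypothetical and not part of `nixtlar`.

```r
# Hypothetical helper (not part of nixtlar): count NAs in the target column
# and list the timestamps missing from a regular sequence.
check_series <- function(df, time_col = "ds", target_col = "y",
                         step_secs = 3600) {
  ds <- as.POSIXct(df[[time_col]], tz = "UTC")
  # Full sequence expected between the first and last timestamp.
  expected <- seq(min(ds), max(ds), by = step_secs)
  list(
    n_missing_target = sum(is.na(df[[target_col]])),
    missing_dates    = expected[!as.numeric(expected) %in% as.numeric(ds)]
  )
}

# Toy hourly series with one NA and one missing timestamp (03:00).
toy <- data.frame(
  ds = c("2024-01-01 00:00:00", "2024-01-01 01:00:00",
         "2024-01-01 02:00:00", "2024-01-01 04:00:00"),
  y  = c(10.5, NA, 11.2, 12.0)
)
res <- check_series(toy)
res$n_missing_target
format(res$missing_dates, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

For data with multiple series, the same check would need to run once per value of the identifier column.
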
 

diff --git a/vignettes/exogenous-variables.Rmd b/vignettes/exogenous-variables.Rmd
@@ -32,7 +32,7 @@ library(nixtlar)
 
 Exogenous variables are external factors that provide additional information about the behavior of the target variable in time series forecasting. These variables, which are correlated with the target, can significantly improve predictions. Examples of exogenous variables include weather data, economic indicators, holiday markers, and promotional sales.
 
-`TimeGPT` allows you to include exogenous variables when generating a forecast. This vignette will show you how to include them. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/anomaly-detection.html) vignette first. 
+`TimeGPT` allows you to include exogenous variables when generating a forecast. This vignette will show you how to include them. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first. 
 
 ## 2. Load data 
 
@@ -43,11 +43,11 @@ df_exo_vars <- nixtlar::electricity_exo_vars
 head(df_exo_vars)
 ````
 
-When using exogenous variables, `nixtlar` differentiates between historical and future exogenous variables:
+When using exogenous variables, `nixtlar` distinguishes between historical and future exogenous variables:
 
 - **Historical Exogenous Variables**: These should be included in the input data immediately following the `id_col`, `ds`, and `y` columns. If your dataset contains additional columns that are not exogenous variables, you must remove them before using any core functions of `nixtlar`.
 
-- **Future Exogenous Variables**: These correspond to the `X_df` parameter and should cover the entire forecast horizon. This dataset should include columns with the appropriate timestamps and, if available, unique identifiers, formatted as explained in previous sections.
+- **Future Exogenous Variables**: These correspond to the `X_df` parameter and should cover the entire forecast horizon. This dataset must include columns with the appropriate timestamps and, if applicable, unique identifiers. 
 
 ````{r}
 future_exo_vars <- nixtlar::electricity_future_exo_vars
@@ -63,10 +63,10 @@ fcst_exo_vars <- nixtla_client_forecast(df_exo_vars, h = 24, X_df = future_exo_v
 head(fcst_exo_vars)
 ````
 
-For comparison, we will also generate a forecast without the exogenous variables. 
+For comparison, we will also generate a forecast without exogenous variables. 
 
 ````{r}
-df <- nixtlar::electricity # same dataset but without the exogenous variables
+df <- nixtlar::electricity # same dataset but without exogenous variables
 
 fcst <- nixtla_client_forecast(df, h = 24)
 head(fcst)

diff --git a/vignettes/get-started.Rmd b/vignettes/get-started.Rmd
@@ -35,17 +35,26 @@ First, you need to set up your API key. An API key is a string of characters tha
 
 When using `nixtlar`, there are two ways of setting up your API key: 
 
-### a. Using the `nixtla_set_api_key` function 
+### a. Using the `nixtla_client_setup` function 
 `nixtlar` has a function to easily set up your API key for your current R session. Simply call 
 
 ```{r eval=FALSE}
-nixtla_set_api_key(api_key = "paste your API key here")
+nixtla_client_setup(api_key = "Your API key here")
 ```
 
 Keep in mind that if you close your R session or you re-start it, then you'll need to set up your API key again. 
 
+When using Azure, you also need to add the `base_url` parameter to the `nixtla_client_setup` function. 
+
+```{r eval=FALSE}
+nixtla_client_setup(
+  base_url = "Base URL",
+  api_key = "Your API key here"
+)
+```
+
 ### b. Using an environment variable 
-For a more persistent method that can be used across different projects, set up your API key as environment variable. To do this, you first need to load the `usethis` package. 
+For a more persistent method that can be used across different projects, set up your API key as an environment variable. To do this, first load the `usethis` package. 
 
 ```{r eval=FALSE, message=FALSE}
 library(usethis)
@@ -56,10 +65,20 @@ This will open your `.Reviron` file. Place your API key here and named it `NIXTL
 
 ```{r eval=FALSE}
 # Inside the .Renviron file 
-NIXTLA_API_KEY="paste your API key here"
+NIXTLA_API_KEY="Your API key here"
+```
+
+You'll need to restart R for changes to take effect. Keep in mind that modifying the `.Renviron` file affects all of your R sessions, so if you're not comfortable with this, use the `nixtla_client_setup` function instead. 
+
+If you are using Azure, you also need to specify the `NIXTLA_BASE_URL`. 
+
+```{r eval=FALSE}
+# Inside the .Renviron file 
+NIXTLA_BASE_URL="Base URL"
+NIXTLA_API_KEY="Your API key here"
 ```
 
-You'll need to restart R for changes to take effect. Keep in mind that modifying the `.Renviron` file affects all of your R sessions, so if you're not comfortable with this, set your API key using the `nixtla_set_api_key` function. 
+For details on how to set up your API key, check out the [Setting Up Your API Key](https://nixtla.github.io/nixtlar/articles/setting-up-your-api-key.html) vignette. To learn more about how to use Azure, please refer to the [TimeGEN-1 Quickstart (Azure)](https://nixtla.github.io/nixtlar/articles/azure-quickstart.html) vignette. 
 
 ### Validate your API key 
 If you want to validate your API key, call `nixtla_validate_api_key`. 
@@ -89,7 +108,7 @@ head(nixtla_client_fcst)
 `nixtlar` includes a function to plot the historical data and any output from `nixtla_client_forecast`, `nixtla_client_historic`, `nixtla_client_anomaly_detection` and `nixtla_client_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). 
 
 ```{r}
-nixtla_client_plot(df, nixtla_client_fcst, id_col = "unique_id", max_insample_length = 200)
+nixtla_client_plot(df, nixtla_client_fcst, max_insample_length = 200)
 ```
 
 ```{r, include=FALSE}