# BigQuery - Inserting data into BigQuery
Loading file data into BigQuery lets you insert millions of rows into a BigQuery table efficiently. If you're working with large flat files, this approach significantly increases the number of rows you can insert into a table and reduces how long the insert takes.
The BigQuery connector provides two ways to load file data into BigQuery: from a file you have downloaded in a previous step in your recipe, or from a file that already exists in Google Cloud Storage. We recommend uploading your file to Google Cloud Storage first and then loading it from there into BigQuery.
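Under the hood, the "Load data from Google Cloud Storage into BigQuery" action runs a BigQuery load job against the file's Cloud Storage URI. As a rough sketch of the equivalent call made directly against the BigQuery API (using the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are placeholders, and the connector handles this for you):

```python
from google.cloud import bigquery

# Placeholder identifiers -- substitute your own project, dataset, table, and bucket.
client = bigquery.Client(project="my-billing-project")
table_id = "my-billing-project.my_dataset.my_table"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,       # let BigQuery infer the schema from the file
    skip_leading_rows=1,   # treat the first row as a header
)

# Wildcards are allowed after the bucket name, as with the Source URI input.
uri = "gs://my_sample_bucket/bulk_load_*.csv"

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # block until the load job completes

print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```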
Beyond moving large numbers of rows, this action can also detect changes in the number of columns in an incoming CSV or JSON file and extend the schema of the BigQuery table to accommodate them.
NOTE
BigQuery can detect the schema of incoming CSV and JSON files, so you don't need to provide the table schema. If autodetection is not used, you must define the table schema manually. When files have different columns than the destination table, use Alter table columns when required? to allow BigQuery to extend the columns in the table.
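In BigQuery API terms, Alter table columns when required? corresponds to a load job's schema update options. A minimal sketch of that behaviour using the google-cloud-bigquery Python client (the source URI and table ID are placeholders; the connector configures this for you when the option is enabled):

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # detect the schema of the incoming file
    # Let the load job add new columns (and relax REQUIRED columns to NULLABLE)
    # when the incoming file has columns the destination table does not.
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
        bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
    ],
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my_sample_bucket/events_*.json",     # placeholder source URI
    "my-billing-project.my_dataset.events",    # placeholder destination table
    job_config=job_config,
)
load_job.result()
```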
# Input fields
Field | Description |
---|---|
Project | The project available in the connection to be billed for the query. |
Dataset | The dataset from which the action or trigger pulls the possible tables. |
Table | The table inside the dataset. If you're looking to create a new table, toggle to "Enter table ID" and provide your desired table name. |
Source URI ("Load data from Google Cloud Storage into BigQuery" action only) | The source URI of the file in Google Cloud Storage. The source URI of a file follow this format gs://[BUCKET_NAME]/[FILE_NAME] . * Wild cards are allowed after the bucket name. e.g. gs://my_sample_bucket/bulk_load_*.csv |
File contents ("Load data into BigQuery" action only) | The file contents of the file to stream. This can be in the format - CSV, Datastore backup, New line delimited JSON, AVRO, PARQUET, ORC |
File size ("Load data into BigQuery" action only) | The exact file size of the file being loaded. This is needed for us to stream the data into BigQuery. |
Schema | The schema of the file. Only needed if Autodetect is set to "No". |
Autodetect | Only applies to CSV and JSON files. BigQuery will introspect the file and detect the schema of the file automatically. |
Alter table columns when required? | Allows BigQuery to update the schema of a table if the incoming file does not match the columns in the target table. |
Create disposition | Tells BigQuery to create the table if needed or to throw an error. |
Write disposition | Tells BigQuery to either truncate the table before writing, append the data to the end of the table, or throw an error if the table already exists. |
Destination table | Properties of destination table if it does not exist. Use these inputs to configure the name, description and labels attached to the table. |
Null Marker | Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. |
Field Delimiter | The separator for fields in a CSV file. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is comma (","). |
Skip leading rows | Allows the following values: Unspecified - autodetect tries to detect headers in the first row; if they are not detected, the row is read as data, otherwise data is read starting from the second row. 0 - instructs autodetect that there are no headers and data should be read starting from the first row. N > 0 - autodetect skips N-1 rows and tries to detect headers in row N; if headers are not detected, row N is skipped, otherwise row N is used to extract column names for the detected schema. |
Encoding | The character encoding of the data. The supported values are UTF-8 and ISO-8859-1. The default value is UTF-8. |
Quote | The value that is used to quote data sections in a CSV file. The default value is a double-quote ("). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set Allow Quoted Newlines to true. |
Max Bad Records | The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0. |
Allow Quoted Newlines | Indicates if BigQuery should allow quoted data sections that contain newline characters in a CSV file. The default value is false. |
Allow Jagged Rows | Indicates if BigQuery should accept rows in a CSV file that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records. The default value is false. |
Ignore Unknown Values | Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records. The default value is false. |
Projection Fields | If sourceFormat is set to 'DATASTORE_BACKUP', indicates which entity properties to load into BigQuery from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If no properties are specified, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned in the job result. |
Time Partitioning | Time-based partitioning specification for the destination table. Only ONE of Time partitioning and Range partitioning should be specified. |
Range Partitioning | Range partitioning specification for the destination table. Only ONE of Time partitioning and Range partitioning should be specified. |
Clustering | One or more fields on which data should be clustered. Only top-level, non-repeated, simple-type fields are supported. The ordering of the clustering fields should be prioritized from most to least important for filtering purposes. Refer to the BigQuery documentation on clustered tables for additional information on limitations. |
Destination table encryption configuration | Describes the Cloud KMS encryption key that will be used to protect the destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key. |
Use Avro Logical Types | If set to yes, Avro logical types are interpreted as the corresponding BigQuery data type (for example, TIMESTAMP) instead of the raw type (for example, INTEGER). |
Reference File Schema URL | Provide a reference file with the reader schema. This file is only loaded if it is part of source URIs, but is not loaded otherwise. Enabled for the following formats: AVRO, PARQUET, ORC. |
Hive Partitioning Options | When set, configures hive partitioning support. Not all storage formats support hive partitioning, and requesting hive partitioning on an unsupported format will lead to an error, as will providing an invalid specification. |
Decimal target types | Defines the list of possible SQL data types to which the source decimal values are converted. This list and the precision and the scale parameters of the decimal field determine the target type. Supported types are: NUMERIC, BIGNUMERIC, STRING |
Parquet Options | Additional properties to set if sourceFormat is set to PARQUET. |
Preserve Ascii Control Characters | When sourceFormat is set to CSV, indicates whether embedded ASCII control characters (the first 32 characters in the ASCII table) are preserved. |
Chunk size | The size of the chunks in each packet sent to BigQuery. This usually does not need to be configured. |
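Many of the inputs above map directly onto properties of a BigQuery load job configuration. The sketch below shows how a handful of the CSV-related options translate when calling the BigQuery API directly with the google-cloud-bigquery Python client; all identifiers and values are placeholders, and the connector may group or name these options slightly differently:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    # Schema handling: either autodetect or supply an explicit schema.
    autodetect=True,
    # CSV parsing options (Null Marker, Field Delimiter, Skip leading rows, ...).
    null_marker=r"\N",
    field_delimiter=",",
    skip_leading_rows=1,
    quote_character='"',
    allow_quoted_newlines=True,
    allow_jagged_rows=False,
    max_bad_records=0,
    ignore_unknown_values=False,
    # Create and write dispositions.
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Optional partitioning and clustering for the destination table.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="created_at",  # placeholder partition column
    ),
    clustering_fields=["customer_id"],  # placeholder clustering column
)

load_job = client.load_table_from_uri(
    "gs://my_sample_bucket/bulk_load_*.csv",   # placeholder source URI
    "my-billing-project.my_dataset.my_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()
```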
# Output fields
Field | Description |
---|---|
ID | The ID of the load job created. |
Kind | The kind of BigQuery job created. |
Self link | The link to the load job in the BigQuery console. |
Job reference | Contains data about the job created. |
State | Whether the job has completed. This should always be "Done". |
Statistics | Statistics associated with the load job created. |
Status | Contains information about any errors in the load job. |
User email | Email of the user who created the job. This would be the email of the user authenticated to this Workato connection. |
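These output fields correspond to properties of the load job resource that BigQuery returns. As a hedged illustration of reading the equivalent values with the google-cloud-bigquery Python client (the job ID and location are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder job ID and location -- the connector exposes these via the
# "ID" output field and the job reference.
load_job = client.get_job("my_load_job_id", location="US")

print(load_job.job_id)       # ID
print(load_job.self_link)    # Self link
print(load_job.state)        # State -- "DONE" once the job has finished
print(load_job.user_email)   # User email of the authenticated identity
print(load_job.output_rows)  # one of the load job statistics

# Status: errors, if any, recorded against the job.
if load_job.error_result:
    print(load_job.error_result["message"])
```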