
Commit 3bc8141

Custom encryption not EE, read S3 file, end discount for Team plan (windmill-labs#529)
* Custom encryption not EE, read S3 file, end discount for Team plan
* Fix build
1 parent 284f685 commit 3bc8141

File tree

11 files changed: +966 -155 lines


docs/core_concepts/11_persistent_storage/index.mdx

Lines changed: 262 additions & 3 deletions
@@ -238,7 +238,7 @@ For best performance, [install MinIO locally](https://min.io/docs/minio/kubernet
[MinIO](https://min.io/) is an open-source, high-performance, and scalable object storage server that is compatible with Amazon S3 APIs, designed for building private and public cloud storage solutions.

- Then from Windmill, just [fill the S3 resource type](../../integrations/s3.md).
+ Then from Windmill, just [fill the S3 resource type](../../integrations/s3.mdx).
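
The resource only captures connection details. As a rough sketch, an S3 resource pointed at a local MinIO server might contain something like the following (field names follow Windmill's `s3` resource type; all values here are hypothetical — verify them against your instance):

```python
# Hypothetical connection details for a local MinIO server. Field names
# follow Windmill's s3 resource type; check them against your instance.
minio_s3_resource = {
    "bucket": "windmill",
    "region": "us-east-1",         # MinIO accepts an arbitrary region value
    "endPoint": "localhost:9000",  # MinIO's S3-compatible API endpoint
    "useSSL": False,
    "accessKey": "minioadmin",
    "secretKey": "minioadmin",
    "pathStyle": True,             # MinIO is typically addressed path-style
}
```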

#### Azure Blob

@@ -254,7 +254,7 @@ Then from Windmill, just [fill the S3 resource type](../../integrations/s3.md).
### Connect your Windmill workspace to your S3 bucket or your Azure Blob storage

- Once you've created an [S3 or Azure Blob resource](../../integrations/s3.md) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.
+ Once you've created an [S3 or Azure Blob resource](../../integrations/s3.mdx) in Windmill, go to the workspace settings > S3 Storage. Select the resource and click Save.

![S3 storage workspace settings](./workspace_settings.png)
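
Scripts can read the saved resource back through the Windmill client. A minimal sketch, assuming a resource stored at the hypothetical path `u/user/my_s3_resource`:

```python
import wmill


def main():
    # Hypothetical resource path; use the path of the resource you saved.
    s3_resource = wmill.get_resource("u/user/my_s3_resource")
    # The resource holds the connection details, e.g. the target bucket.
    return s3_resource["bucket"]
```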

@@ -275,7 +275,266 @@ When a script outputs a S3 file, it can be downloaded or previewed directly in W
![S3 file download](./file_download.png)

- For more info on how to use files and S3 files in Windmill, see [Handling files and binary data](/docs/core_concepts/files_binary_data)

#### Read a file from S3 within a script

<Tabs className="unique-tabs">

<TabItem value="bun" label="TypeScript (Bun)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```ts
import * as wmill from 'windmill-client';
import { S3Object } from 'windmill-client';

export async function main(input_file: S3Object) {
  // Load the entire file_content as a Uint8Array
  const file_content = await wmill.loadS3File(input_file);

  const decoder = new TextDecoder();
  const file_content_str = decoder.decode(file_content);
  console.log(file_content_str);

  // Or load the file lazily as a Blob
  let fileContentBlob = await wmill.loadS3FileStream(input_file);
  console.log(await fileContentBlob.text());
}
```

</TabItem>

<TabItem value="deno" label="TypeScript (Deno)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```ts
import * as wmill from 'npm:windmill-client@1.253.7';
import { S3Object } from 'npm:windmill-client@1.253.7';

export async function main(input_file: S3Object) {
  // Load the entire file_content as a Uint8Array
  const file_content = await wmill.loadS3File(input_file);

  const decoder = new TextDecoder();
  const file_content_str = decoder.decode(file_content);
  console.log(file_content_str);

  // Or load the file lazily as a Blob
  let fileContentBlob = await wmill.loadS3FileStream(input_file);
  console.log(await fileContentBlob.text());
}
```

</TabItem>

<TabItem value="python" label="Python" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```python
#requirements:
#wmill>=1.251.7
import wmill
from wmill import S3Object


def main(input_file: S3Object):
    # Load the entire file_content as a bytes array
    file_content = wmill.load_s3_file(input_file)
    print(file_content.decode('utf-8'))

    # Or load the file lazily as a BufferedReader:
    with wmill.load_s3_file_reader(input_file) as file_reader:
        print(file_reader.read())
```

</TabItem>
</Tabs>

![Read S3 file](../18_files_binary_data/s3_file_input.png)
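
The `input_file` argument above is a thin wrapper around the object's key inside the workspace bucket. As a minimal sketch, constructing one by hand (the key is hypothetical) looks like:

```python
from wmill import S3Object

# An S3Object only carries the key of the object within the workspace bucket.
input_file = S3Object(s3="path/to/input_file.txt")
```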

#### Create a file in S3 within a script

<Tabs className="unique-tabs">

<TabItem value="bun" label="TypeScript (Bun)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```ts
import * as wmill from 'windmill-client';
import { S3Object } from 'windmill-client';

export async function main(s3_file_path: string) {
  const s3_file_output: S3Object = {
    s3: s3_file_path
  };

  const file_content = 'Hello Windmill!';
  // file_content can be either a string or ReadableStream<Uint8Array>
  await wmill.writeS3File(s3_file_output, file_content);
  return s3_file_output;
}
```

</TabItem>

<TabItem value="deno" label="TypeScript (Deno)" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```ts
import * as wmill from 'npm:windmill-client@1.253.7';
import { S3Object } from 'npm:windmill-client@1.253.7';

export async function main(s3_file_path: string) {
  const s3_file_output: S3Object = {
    s3: s3_file_path
  };

  const file_content = 'Hello Windmill!';
  // file_content can be either a string or ReadableStream<Uint8Array>
  await wmill.writeS3File(s3_file_output, file_content);
  return s3_file_output;
}
```

</TabItem>

<TabItem value="python" label="Python" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```python
#requirements:
#wmill>=1.251.7
import wmill
from wmill import S3Object


def main(s3_file_path: str):
    s3_file_output = S3Object(s3=s3_file_path)

    file_content = b"Hello Windmill!"
    # file_content can be either bytes or a BufferedReader
    wmill.write_s3_file(s3_file_output, file_content)
    return s3_file_output
```

</TabItem>
</Tabs>

![Write to S3 file](../18_files_binary_data/s3_file_output.png)

:::info
Certain file types, typically parquet files, can be directly rendered by Windmill.
:::

For more info on how to use files and S3 files in Windmill, see [Handling files and binary data](/docs/core_concepts/files_binary_data).

### Windmill embedded integration with Polars and DuckDB for data pipelines

ETLs can be easily implemented in Windmill using its integration with Polars and DuckDB, which facilitates working with tabular data. In this case, you don't need to manually interact with the S3 bucket; Polars/DuckDB does it natively and efficiently. Reading and writing datasets to S3 can be done seamlessly.

<Tabs className="unique-tabs">
<TabItem value="polars" label="Polars" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>

```python
#requirements:
#polars==0.20.2
#s3fs==2023.12.0
#wmill>=1.229.0

import wmill
from wmill import S3Object
import polars as pl
import s3fs


def main(input_file: S3Object):
    bucket = wmill.get_resource("<PATH_TO_S3_RESOURCE>")["bucket"]

    # this will default to the workspace s3 resource
    storage_options = wmill.polars_connection_settings().storage_options
    # this will use the designated resource
    # storage_options = wmill.polars_connection_settings("<PATH_TO_S3_RESOURCE>").storage_options

    # input is a parquet file, we use read_parquet in lazy mode.
    # Polars can read various file types, see
    # https://pola-rs.github.io/polars/py-polars/html/reference/io.html
    input_uri = "s3://{}/{}".format(bucket, input_file["s3"])
    input_df = pl.read_parquet(input_uri, storage_options=storage_options).lazy()

    # process the Polars dataframe. See Polars docs:
    # for dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html
    # for lazy dataframe: https://pola-rs.github.io/polars/py-polars/html/reference/lazyframe/index.html
    output_df = input_df.collect()
    print(output_df)

    # To write back the result to S3, Polars needs an s3fs connection
    s3 = s3fs.S3FileSystem(**wmill.polars_connection_settings().s3fs_args)
    output_file = "output/result.parquet"
    output_uri = "s3://{}/{}".format(bucket, output_file)
    with s3.open(output_uri, mode="wb") as output_s3:
        # persist the output dataframe back to S3 and return it
        output_df.write_parquet(output_s3)

    return S3Object(s3=output_file)
```

</TabItem>
<TabItem value="duckdb" label="DuckDB" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
473+
474+
```python
475+
#requirements:
476+
#wmill>=1.229.0
477+
#duckdb==0.9.1
478+
479+
import wmill
480+
from wmill import S3Object
481+
import duckdb
482+
483+
484+
def main(input_file: S3Object):
485+
bucket = wmill.get_resource("u/admin/windmill-cloud-demo")["bucket"]
486+
487+
# create a DuckDB database in memory
488+
# see https://duckdb.org/docs/api/python/dbapi
489+
conn = duckdb.connect()
490+
491+
# this will default to the workspace s3 resource
492+
args = wmill.duckdb_connection_settings().connection_settings_str
493+
# this will use the designated resource
494+
# args = wmill.duckdb_connection_settings("<PATH_TO_S3_RESOURCE>").connection_settings_str
495+
496+
# connect duck db to the S3 bucket - this will default to the workspace s3 resource
497+
conn.execute(args)
498+
499+
input_uri = "s3://{}/{}".format(bucket, input_file["s3"])
500+
output_file = "output/result.parquet"
501+
output_uri = "s3://{}/{}".format(bucket, output_file)
502+
503+
# Run queries directly on the parquet file
504+
query_result = conn.sql(
505+
"""
506+
SELECT * FROM read_parquet('{}')
507+
""".format(
508+
input_uri
509+
)
510+
)
511+
query_result.show()
512+
513+
# Write the result of a query to a different parquet file on S3
514+
conn.execute(
515+
"""
516+
COPY (
517+
SELECT COUNT(*) FROM read_parquet('{input_uri}')
518+
) TO '{output_uri}' (FORMAT 'parquet');
519+
""".format(
520+
input_uri=input_uri, output_uri=output_uri
521+
)
522+
)
523+
524+
conn.close()
525+
return S3Object(s3=output_file)
526+
```
527+
528+
</TabItem>
529+
</Tabs>
530+

:::info

Polars and DuckDB need to be configured to access S3 within the Windmill script. The job needs access to the S3 resource, which must either be accessible to the user running the job or be [set as public in the workspace settings](/docs/core_concepts/persistent_storage#connect-your-windmill-workspace-to-your-s3-bucket-or-your-azure-blob-storage).

:::

For more info on Data Pipelines in Windmill, see [Data Pipelines](../27_data_pipelines/index.mdx).

## Structured Databases: Postgres (Supabase, Neon.tech)

docs/core_concepts/18_files_binary_data/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -193,7 +193,7 @@ Certain file types, typically parquet files, can be directly rendered by Windmil
### Windmill embedded integration with Polars and DuckDB for data pipelines

- ETL can be easily implemented in Windmill using its integration with Polars and DuckDB for facilitate working with tabular data. In this case, you don't need to manually interact with the S3 bucket, Polars/DuckDB does it natively and in a efficient way. Reading and Writing datasets to S3 can be done seamlessly.
+ ETLs can be easily implemented in Windmill using its integration with Polars and DuckDB, which facilitates working with tabular data. In this case, you don't need to manually interact with the S3 bucket; Polars/DuckDB does it natively and efficiently. Reading and writing datasets to S3 can be done seamlessly.

<Tabs className="unique-tabs">
<TabItem value="polars" label="Polars" attributes={{className: "text-xs p-4 !mt-0 !ml-0"}}>
@@ -303,7 +303,7 @@ def main(input_file: S3Object):
:::info

- Polars and DuckDB needs to be configured to access S3 within the Windmill script. The job will need to accessed the S3 resources, which either needs to be accessible to the user running the job, or the S3 resource needs to be [set as public in the workspace settings](/docs/core_concepts/persistent_storage#connect-your-windmill-workspace-to-your-s3-bucket-or-your-azure-blob-storage).
+ Polars and DuckDB need to be configured to access S3 within the Windmill script. The job needs access to the S3 resource, which must either be accessible to the user running the job or be [set as public in the workspace settings](/docs/core_concepts/persistent_storage#connect-your-windmill-workspace-to-your-s3-bucket-or-your-azure-blob-storage).

:::
